A bit of fun for a Friday. I bought a Kindle 3 when it launched in the UK in August 2010. Over the past 9 months, my impression is that I've been buying more books than I used to, and that they're mostly Kindle books. I have a second hunch, which is that I'm also reading more than I used to, but we won't cover that today.
To check whether my intuition was correct, I decided to take a look at my Amazon order history. I do buy books from other places, but those purchases are a minority and tend to be photography books, a category I'm excluding from this analysis because they're not the kind of book I'd buy on a Kindle. All other types of book are included, even cookery and programming books.
The first task was to grab my order history. US customers have it easy – Amazon.com have a reporting facility that lets you download all your orders by year. Alas, this doesn't currently work for the UK site, and there isn't an API, so I resorted to scraping my order history using Python. I'll cover this in more detail in a later post, but let's just say that Mechanize and BeautifulSoup are awesome for this kind of thing. Mechanize pretends to be a browser, so it lets you authenticate with Amazon and get Python into the good stuff. BeautifulSoup then parses the returned HTML into a tag tree that you can search to grab the elements of interest.
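For a flavour of the "grab elements of interest" step, here is a minimal sketch that pulls out the text of every element carrying a given class name. It uses the standard library's html.parser rather than BeautifulSoup so that it runs without extra installs; with BeautifulSoup the rough equivalent is soup.find_all(class_="title"). The class names used here ("order", "title", "price") are hypothetical, not Amazon's actual markup, and the parser assumes well-formed HTML where every non-void tag has a matching end tag.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextParser(HTMLParser):
    """Collect the text of each outermost element whose class list contains `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.open_tags = 0      # current nesting depth of non-void elements
        self.capture_at = None  # depth of the element being captured, if any
        self.buf = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.open_tags += 1
        classes = (dict(attrs).get("class") or "").split()
        if self.capture_at is None and self.wanted in classes:
            self.capture_at = self.open_tags
            self.buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.capture_at == self.open_tags:
            # Closing the captured element: normalise whitespace and store its text.
            self.results.append(" ".join("".join(self.buf).split()))
            self.capture_at = None
        self.open_tags -= 1

    def handle_data(self, data):
        if self.capture_at is not None:
            self.buf.append(data)


def find_class_text(page, wanted):
    """Return the text of every element with class `wanted` in the HTML string `page`."""
    parser = ClassTextParser(wanted)
    parser.feed(page)
    parser.close()
    return parser.results
```

For example, find_class_text('<div class="order"><span class="title">Some Book</span></div>', "title") returns ['Some Book'].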
Thankfully, the updated physical order history uses IDs and class names, which makes it a little easier to home in on different parts of an order, so this wasn't too tricky. The Kindle order history is another matter, though: nested tables with no identifiers, so the only way I found to pick out an order block was to grab table rows with bgcolor='#ffffff'! Not pretty. The Kindle order page also doesn't give any information about price. Although I didn't need order totals for the visualisation below, I did need prices for the Kindle books, because a large chunk of my downloads will have been free, out-of-print editions, and including those wouldn't have been a fair comparison. So, to get prices, I had to send a further request for each individual order page in the Kindle order history.
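The Kindle side needs two stages: pick out the white order rows from the list page, then fetch each order's own page to read the price. The sketch below assumes that each matching row contains a link to its order page and that the price appears on that page as a pound amount such as £3.49; both are assumptions about the markup, not something Amazon documents. The fetch argument is a hypothetical stand-in for whatever retrieves a page as a string (for instance, a small wrapper around a logged-in Mechanize browser), which also makes the logic easy to test with canned HTML. A rough BeautifulSoup equivalent of the row lookup is soup.find_all('tr', bgcolor='#ffffff').

```python
import re
from decimal import Decimal
from html import unescape
from html.parser import HTMLParser

# Assumed price format on an order page, e.g. "£3.49" (after unescaping &pound;).
PRICE_RE = re.compile(r"£\s*(\d+(?:\.\d{2})?)")


class WhiteRowLinks(HTMLParser):
    """Collect the first link inside each <tr bgcolor="#ffffff"> row.

    Inner <tr> tags are counted so that rows of nested tables do not end the
    outer order row early. Assumes every <tr> has a matching </tr>.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0       # <tr> nesting inside a matched row; 0 = outside one
        self.current = None  # first href seen in the current matched row
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            if self.depth:
                self.depth += 1
            elif (attrs.get("bgcolor") or "").lower() == "#ffffff":
                self.depth = 1
                self.current = None
        elif tag == "a" and self.depth and self.current is None and attrs.get("href"):
            self.current = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "tr" and self.depth:
            self.depth -= 1
            if self.depth == 0 and self.current:
                self.links.append(self.current)


def parse_price(page):
    """Return the first pound amount on an order page as a Decimal, or None."""
    match = PRICE_RE.search(unescape(page))
    return Decimal(match.group(1)) if match else None


def kindle_prices(list_page, fetch):
    """Map each order link on the Kindle list page to its price.

    `fetch` is any callable that takes a URL and returns that page's HTML as a str.
    """
    parser = WhiteRowLinks()
    parser.feed(list_page)
    parser.close()
    return {url: parse_price(fetch(url)) for url in parser.links}
```

Passing fetch in as a parameter keeps the parsing separate from the network code, so the same functions work whether the pages come from a live session or from files saved to disk.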
A little while later, the two scripts had given me 581 items ordered since 2000 (including the free eBooks)! That total includes non-book orders from Amazon, but helpfully, it appears that an ASIN starts with a B when the product ID isn't an ISBN, i.e. when the item isn't a book. This made it easy to separate the two. I then manually removed anything that looked like a photography book and brought the data into Tableau.
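The filtering rules here are simple enough to write down directly. This sketch assumes a hypothetical Item record holding an ASIN, a source ("physical" or "kindle") and a price (None when no price was found). Because Kindle editions also carry B-prefixed ASINs, it applies the B rule only to physical orders, and it drops free Kindle downloads so they don't inflate the comparison. The manual removal of photography books isn't modelled.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class Item(NamedTuple):
    """One scraped order line (a hypothetical shape, not Amazon's own format)."""
    asin: str
    source: str               # "physical" or "kindle"
    price: Optional[Decimal]  # None if no price was found


def is_book(item):
    """Apply the B-prefix rule to physical orders only.

    Kindle editions also have B-prefixed ASINs, so every Kindle item counts as a book.
    """
    if item.source == "kindle":
        return True
    return not item.asin.upper().startswith("B")


def comparable_books(items):
    """Keep books only, dropping free (or unpriced) Kindle downloads."""
    return [
        i for i in items
        if is_book(i)
        and (i.source != "kindle" or (i.price is not None and i.price > 0))
    ]
```

A stricter alternative would be to match physical ASINs against the ISBN-10 pattern (nine digits followed by a digit or X), but the B prefix is the rule described above and needs no extra bookkeeping.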
Surprise! In the comparison period (it's only been 9 months since I got my Kindle, so I'm only comparing September-May each year), my Kindle purchases per month nearly mirror my physical book purchases from the same months last year. The total for this year is higher, but looking further back, my Amazon book buying has increased steadily year on year, so there's no real justification for saying that the Kindle has affected how many books I buy overall, though it's clear that the majority of my purchases are now Kindle books.
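The charts themselves were built in Tableau, but the comparison window is easy to express in code. This hypothetical sketch maps each order date to the September-May season it falls in (labelled by the year the season starts, so September 2010 to May 2011 is season 2010), drops June-August orders, and counts purchases per season and format. Orders are assumed to be (date, source) pairs, with source being "kindle" or "physical".

```python
from collections import Counter
from datetime import date


def season(d):
    """Return the start year of the September-May window containing d, or None for June-August."""
    if d.month >= 9:
        return d.year
    if d.month <= 5:
        return d.year - 1
    return None


def counts_by_season(orders):
    """Count orders per (season, source), ignoring anything bought in June-August."""
    counts = Counter()
    for order_date, source in orders:
        s = season(order_date)
        if s is not None:
            counts[(s, source)] += 1
    return counts
```

Dividing a season's count by nine gives the average per month, which puts this year's Kindle purchases and earlier years' physical purchases on the same footing.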
One assumption quashed – next time we’ll look at the Python scripts, and then take a look at cumulative order costs over 10 years!
[admin note: migrated from FindingVirtue to Ixyl in April 2019]