Free-ebooks Tracker

I’d recently subscribed to Booktrakr – a service in beta that retrieves online book data and presents it to you in a daily email as well as a dashboard in the browser. It’s a good and useful idea, the only caveat being that you may not want to give out your passwords, which they need to get the data for you.

After a couple of weeks I realized that the service didn’t quite meet my needs. Understandably, it’s focused on sales and is intended for people who are selling their books through the regular book and e-book sales channels such as Amazon, iTunes, and Barnes & Noble. That’s fine for them, but since I give away all of my ebooks, and do that through a number of outlets that Booktrakr doesn’t cover, I thought that maybe I could just do the same thing myself, customized for my particular needs. It turned out that some of the work I’ve been doing for my day job came in quite handy.

I’d been building an internal daily reports dashboard for my company, using the standard LAMP setup (Linux, Apache, MySQL, PHP on Ubuntu) along with a JavaScript framework for UI elements. I also use Selenium WebDriver for navigating and scraping web pages, and it turns out these are exactly the elements needed for my own personal Free-ebooks Tracker project.
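
To give a feel for the scraping step, here is a minimal sketch in Python rather than the PHP the project itself uses. It pulls a download count out of a page's source with a regular expression. The HTML snippet, the "Downloads:" label, and the function name extract_download_count are all made up for illustration; every real site lays out its numbers differently and would need its own pattern.

```python
import re
from typing import Optional

# Hypothetical page source. In practice this string would come from the
# browser, for example Selenium WebDriver's page_source attribute.
SAMPLE_HTML = """
<div class="stats">
  <span class="label">Downloads:</span>
  <span class="value">12,345</span>
</div>
"""

# Assumed layout: a "Downloads:" label, then the number in the next <span>.
DOWNLOADS_RE = re.compile(
    r"Downloads:\s*</span>\s*<span[^>]*>\s*(\d[\d,]*)",
    re.IGNORECASE,
)


def extract_download_count(page_source: str) -> Optional[int]:
    """Return the download count found in page_source, or None if absent."""
    match = DOWNLOADS_RE.search(page_source)
    if match is None:
        # The layout changed or the page did not load properly.
        return None
    # Strip thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(extract_download_count(SAMPLE_HTML))
```

Returning None instead of raising an error lets a daily run skip a site whose page failed to load and carry on with the rest.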

A few of the data points require downloading and parsing spreadsheets (Smashwords’ affiliates, quarterly or annually, and Kindle Direct, at least for starting out accumulating the history) but the rest of it is visiting web pages, scraping the page source for the reported downloaded quantities, and some programming to stash all this in a database and perform some differential operations. It’s not perfect. Some of the figures are misleading because I just added some sites in the past few days. Over the next several days it will all sort itself out in terms of totals and recent history. Also, the browser can crap out sometimes (WebDriver is far from flawless) so there are days when reports don’t happen for some websites, but for the most part, I’m now sending myself daily reports and presenting the information in tidy little tables at the Pigeonweather Cloud site on Amazon Web Services.
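
The "differential operations" boil down to storing each site's running total per day and subtracting the previous reading. Here is a rough sketch of that idea, using Python and SQLite as stand-ins for the PHP and MySQL in the real setup. The table name totals, its columns, the site name, and the helper functions are all assumptions made for this example, not the actual schema.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one row per site per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE totals ("
        " site TEXT NOT NULL,"
        " day TEXT NOT NULL,"      # ISO date, e.g. 2013-05-01, so text order is date order
        " total INTEGER NOT NULL,"  # cumulative downloads reported by the site
        " PRIMARY KEY (site, day))"
    )
    return conn


def record_total(conn: sqlite3.Connection, site: str, day: str, total: int) -> None:
    """Store (or overwrite) the cumulative total scraped for a site on a day."""
    conn.execute(
        "INSERT OR REPLACE INTO totals (site, day, total) VALUES (?, ?, ?)",
        (site, day, total),
    )


def downloads_since_last_reading(
    conn: sqlite3.Connection, site: str, day: str
) -> Optional[int]:
    """Difference between this day's total and the most recent earlier one.

    Returns None when either reading is missing, which is the case for a
    newly added site or a day when the scrape failed.
    """
    current = conn.execute(
        "SELECT total FROM totals WHERE site = ? AND day = ?", (site, day)
    ).fetchone()
    previous = conn.execute(
        "SELECT total FROM totals WHERE site = ? AND day < ?"
        " ORDER BY day DESC LIMIT 1",
        (site, day),
    ).fetchone()
    if current is None or previous is None:
        return None
    return current[0] - previous[0]


if __name__ == "__main__":
    db = make_db()
    record_total(db, "example-site", "2013-05-01", 100)
    record_total(db, "example-site", "2013-05-03", 130)  # the 2nd was missed
    print(downloads_since_last_reading(db, "example-site", "2013-05-03"))
```

Comparing against the most recent earlier reading, rather than strictly yesterday's, means a day when the browser gave up just gets folded into the next successful report instead of leaving a hole.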

The totals do not include the various pirate sites where I’ve seen some of my books, nor the Scribd copies that other people have uploaded, and I haven’t yet included some other sites (including Lulu, Liibook, XinXii) where downloads have been relatively few in any case. I don’t intend to go into business doing this kind of thing, but if someone is interested in setting up their own, I could help with some advice and source code.

