4 May 2012

In Celebration of Electro-Hippies & International Day Against DRM

In celebration of International Day Against DRM I explore my inner 'electro-hippie' open-source soul/self and investigate the DRM debate. 
Prepare pitchforks and torches!
So in my eagerness I created two columns, drawing a neat line down my diary page and writing on either side: 'pro-DRM' and 'anti-DRM'. I set out to explore the arguments for and against DRM, and learned a lot from some smart fellows (see the Readings below). It must be a flavour of the times, but by the time I had finished reading all of the articles I didn't have a single entry under 'pro-DRM'.

So begins my tirade and plans to single-handedly destroy the evil monopoly of Amazon and others who dare to implement DRM! No, but here's my list of 'anti-DRMs', each of which, in its own way, turns out to be a positive rather than an anti.

1) Sell from your own website!

If you don't wrap your eBooks in DRM, it's quite simple to sell content to your readers from your site. An initial payment gateway restricts access and is unlocked through payment, but once the reader has paid, they happily download and share their eBooks across devices. It is possible to sell DRMd content from your site, but this requires integration with Adobe Content Server 4 (for ePubs and PDFs) which is costly. Once you start to sell from your own site without DRM, the veil lifts and you realise that it's entirely possible to sell content in snippets and chunks, or even on subscription and pay-per-view models.

2) 'Lower' discounts

If you sell from your own site, you don't have to give anyone (you gonna give yourself?) massive discounts. Hence, you can also pretty much match (and better) any lowest price offered by any retailer.

3) DRM doesn't curb piracy

What? Really? It's not really that hard to believe. It's really, really easy to pirate eBooks, and if you're going to pirate, you're going to pirate; you will find a way.

4) Just make it available at the right price

If you provide your content cheaply enough, and if it's easy enough to purchase and use (i.e. put on any device you want), people will pay for it. Amazon and Apple, among others, have thankfully made it possible for consumers to purchase media at the click of a button. It is this ease of use and convenience that people pay for. Personally, I am aware that I can download a pirated copy of a book through torrents or news services pretty easily, but I don't, because a book is easy to find and purchase - it's just easier to go to Amazon. Trawling through torrents is time-consuming, and a little irritating. Torrent sites in general aren't appealing - they're shoddy-looking and the results are painstaking to filter through (call me lazy!). If, however, there were a book that I DESPERATELY wanted to read (we all have these book imperatives), and it would take several weeks to arrive if ordered through Kalahari.net, and it was either not available anywhere as an eBook or available only for US$50 - then said torrents might receive a visit from a certain 'customer'. The message here is simple though: if you put your content on the right platforms and make it available at the right prices - they will come.

5) Why should readers be locked into a particular ecosystem?

If I don't like Kindle for iPad (which I don't particularly, thank you very much!) and I'd rather read my purchased book somewhere else, in any new, more awesome .Mobi reader that might be developed, then why should I be restricted to Amazon's ecosystem? Even if it's only the fact that I can't alter screen brightness from within the App or system I'm reading in that annoys me about the particular reading system - or the fact that I don't only want serif or sans-serif but want a wider selection of fonts to choose from - I should be able to enjoy consuming the content in the way that I want to.

6) Like physical books are fool-proof?

They aren't. Many physical books are now pirated eBooks because the publisher didn't make them available to the market. Sure, I can send 100 eBooks out instead of 5 hard covers at the click of a button (I sure as hell am not going to make 100 printouts), but the fact is that most people are not hard 'infringers' - most people are not likely to send a book they enjoyed out to 100 people. Yes, this will happen, and YES, let's think about this as free marketing - or let's 'write it off' as free marketing. Most people are likely to recommend a book to family and friends and share a DRM-free book with only a select few people. At least then the recipients of the generosity will have discovered the author's work and are more likely to favour the ease and convenience of purchasing the author's next or backlist titles through Amazon or wherever else.

7) I'll sure as hell pay for Properly-Produced eBooks

Have you seen some of the dog's breakfasts that come through the torrents or that are created from PDF using tools like Calibre?

8) Pirated editions of books are more versatile

I can read them on more devices. Therefore, I might, as a willing payer for eBook content, just prefer a pirated eBook because I know I can read it anywhere, on any device. Sale missed.

9) Reading anywhere on any device!

People have time to burn. At the airport, while waiting for gran, while at work. With a DRM-free book, I can put my paid-for content in the cloud (yes, it's possible through Amazon), say, for example, on Bookworm, and read the ePub on my Motorola phone at the airport. But with some content you can't do this (unless the retailers have plans to make an App for Motorola?).

10) You can mix & match non-DRMd stuff

Teachers, for example, would want to cut and paste at will - intermingling their own notes and lesson plans with eBooks, for example. With DRMd content? Nope.

11) Cut, Copy, Paste (into FB status or on Twitter)

Non-DRMd content can be copied and pasted. It makes no sense to limit the copying of snippets or chunks of text. What if, even, I wanted to reproduce an entire chapter on my blog as an amateur writer to demonstrate how awesome an author is? This is publicity. If copying and pasting text means being able to share more easily on Facebook and Twitter and anywhere else, it should be allowed, even at the risk of readers copying entire texts (they would find other means to copy it if they really wanted to, in any case).

Ok, so having gone through all of the above, I still can't find much use in DRM! The major disadvantage that the pedigree chums who wrote the articles in the Readings below are on about is that DRM feeds into Amazon's monopoly. DRM for publishers means protecting content from piracy. But DRM for Amazon - does it mean helping publishers protect their content? Maybe that was a selling point for them in the beginning, but the short answer is that DRM for Amazon means locking you into their ecosystem. Once you're in, their recommendations engine does a great job of keeping you there (and it's a mixed blessing, because Amazon DOES help publishers to make digital sales). On the whole though, electro-hippie quenched, I have to say that anti-DRMers make a pretty solid case!

Readings

6 Mar 2012

How Does Amazon Recommendations Work?

I had the strange feeling that just by Googling and YouTubing this question, my IP address, Google account and any possible smatterings of data that could possibly be associated with my various browsing sessions, accounts, purchase histories and published social media data were being deftly shuffled through the nimble fingers of The Brain at Amazon HQ and analysed in real time, resulting in me - or at least everything knowable about me online - being put into an alert box and black-listed as a corporate espionage agent. Bots would follow my every online move from here on - and in fact every physical one too, via GPS from my iPhone. And then I decided to summarise what paltry little stuff I could find and publish it on my paltry little blog. Paranoid much?


This is absolutely ridiculous, but data is not, and it is becoming everything.


I was lucky enough to attend O'Reilly Media's Tools of Change Conference recently in New York, and there the talk was all about how 'Data' is the new sexy thing in publishing. I shit you not. Data.


Twitter recently announced that it's selling archived Tweets to a company called Datasift to be 'mechanically harvested' as Grace Dent writes here. There was an interesting presentation at the conference by Andrew Savikas of Safari Books Online looking at what they are up to in terms of analysing and interpreting customer data in order to make recommendations and to improve products.


Safari Online offers a 'usage-based payment model', built on paying per page, which Savikas explained works out better for publishers in the long run. He demonstrated that at US$0.07 (7 cents) per page, a 400-page book that might, for example, be priced at US$15 outright earns 400 x US$0.07 = US$28 if the whole book is read.
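To check the sums, here is a rough sketch of the pay-per-page arithmetic. It assumes the rate is US$0.07 (seven cents) per page, which is the only reading under which 400 pages come to US$28. The function names are my own, not anything Safari publishes.

```python
from decimal import Decimal
from math import ceil

# Figures from the example above; the per-page rate is read as US$0.07 (an assumption).
RATE_PER_PAGE = Decimal("0.07")
LIST_PRICE = Decimal("15")
PAGES_IN_BOOK = 400

def usage_revenue(pages_read, rate_per_page=RATE_PER_PAGE):
    """Revenue in US$ when a reader pays for each page they read."""
    return pages_read * rate_per_page

def break_even_pages(list_price=LIST_PRICE, rate_per_page=RATE_PER_PAGE):
    """Smallest number of pages read at which usage revenue reaches the list price."""
    return ceil(list_price / rate_per_page)

full_read = usage_revenue(PAGES_IN_BOOK)
print(f"Whole book read: US${full_read}")
print(f"Break-even point: {break_even_pages()} pages")
```

On these numbers the publisher only comes out ahead of the US$15 list price once a reader has paid for 215 pages, a little over half the book.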


By using the model, Safari Online can track which parts of a book are hotspots, and which are not so essential, and use this to improve their products. Pretty nifty! I can imagine this in education. It would be fairly easy to determine which sections are more crucial than others, even before data, but it might be more interesting to see which parts are being used more or less thoroughly. For example, a quick-fix list summary or a video illustrating the water cycle in a textbook might be much more popular than the chapters around it.
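I have no idea how Safari actually stores or crunches this, but the basic idea of spotting hotspots can be sketched with nothing more than a tally. The page-view log and section names below are invented for illustration.

```python
from collections import Counter

# Hypothetical page-view log: (reader id, section of the book that was opened).
page_views = [
    ("r1", "Ch 1: Introduction"),
    ("r1", "Ch 3: Water cycle video"),
    ("r2", "Ch 3: Water cycle video"),
    ("r3", "Ch 3: Water cycle video"),
    ("r2", "Ch 2: Summary list"),
    ("r3", "Ch 2: Summary list"),
]

def section_hotspots(page_views):
    """Return (section, views, share of all views), most-viewed section first."""
    counts = Counter(section for _, section in page_views)
    total = sum(counts.values())
    return [(section, n, n / total) for section, n in counts.most_common()]

hotspots = section_hotspots(page_views)
for section, views, share in hotspots:
    print(f"{section}: {views} views ({share:.0%})")
```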


Using data could help to improve books. Safari Online can also track users by country. Broken down to smaller geographic locations if possible, this would help publishers who still rely heavily on 'traditional' marketing and advertising to target their campaigns (billboards, physical displays and bookstores, for example).


They also track usage of books by the time of day, giving a better indication of when might be a good time to make suggestions to customers and also some insights into what kinds of books might be more for leisure or work.


But now that my little side-track is over, I'll return to the Amazon question. What makes the recommendation engine purr?


There's very little information aside from an old paper on how Amazon used to do recommendations, which gives an overall perspective of some models, and some idea of just how flipping difficult it must be to offer real-time recommendations to customers. In that old paper, the authors describe distinct methods. Let's take a case.


Option 1


Gwede is using OnlineBookSeller.


OnlineBookSeller finds another customer, Jacob, with a similar purchasing and reviewing history. Jacob has bought similar books, and reviewed similar books to Gwede.


OnlineBookSeller takes those books that Jacob has bought but that Gwede hasn't yet, and puts them into a nice little recommendations bucket.
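A minimal sketch of Option 1, assuming all we know is who bought what (review history is left out to keep it short). The customers, titles, the third customer Thandi and the use of Jaccard overlap as the 'similarity' measure are my own assumptions, not anything Amazon has confirmed.

```python
# Hypothetical purchase histories for OnlineBookSeller.
purchases = {
    "Gwede": {"Book A", "Book B", "Book C"},
    "Jacob": {"Book A", "Book B", "Book D", "Book E"},
    "Thandi": {"Book C", "Book F"},
}

def jaccard(a, b):
    """Overlap between two sets of titles: shared titles / all titles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar_customer(target, purchases):
    """Find the other customer whose purchases overlap most with the target's."""
    others = (name for name in purchases if name != target)
    return max(others, key=lambda name: jaccard(purchases[target], purchases[name]))

def recommend_from_neighbour(target, purchases):
    """Recommend what the most similar customer bought that the target has not."""
    neighbour = most_similar_customer(target, purchases)
    return sorted(purchases[neighbour] - purchases[target])

print(most_similar_customer("Gwede", purchases))
print(recommend_from_neighbour("Gwede", purchases))
```

Note that every recommendation means comparing Gwede against every other customer, which hints at why this gets painful with millions of shoppers.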


Option 2


OnlineBookSeller finds a group or cluster of users similar to Gwede, based on their purchase and rating histories. OnlineBookSeller looks at the items that Gwede has not yet bought but that the cluster has collectively bought and positively reviewed, and recommends them to Gwede.
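Option 2 can be sketched in the same spirit. A real system would presumably build its clusters ahead of time with a proper clustering algorithm; here the 'cluster' is simply everyone whose history overlaps enough with Gwede's, which is a deliberate simplification. The ratings, the 0.3 threshold and the 4-star cut-off are all made up.

```python
from collections import Counter

# Hypothetical star ratings (1-5) per customer.
ratings = {
    "Gwede": {"Book A": 5, "Book B": 4},
    "Jacob": {"Book A": 5, "Book B": 4, "Book D": 5},
    "Naledi": {"Book A": 4, "Book B": 5, "Book D": 4, "Book E": 2},
    "Sipho": {"Book F": 5, "Book G": 4},
}

def overlap(a, b):
    """Share of titles two customers have in common (Jaccard)."""
    titles_a, titles_b = set(a), set(b)
    union = titles_a | titles_b
    return len(titles_a & titles_b) / len(union) if union else 0.0

def cluster_for(target, ratings, threshold=0.3):
    """Stand-in for a cluster: every customer similar enough to the target."""
    return [name for name in ratings
            if name != target and overlap(ratings[target], ratings[name]) >= threshold]

def recommend_from_cluster(target, ratings, min_rating=4, threshold=0.3):
    """Titles the cluster rated well that the target has not bought, most votes first."""
    owned = set(ratings[target])
    votes = Counter()
    for member in cluster_for(target, ratings, threshold):
        for title, stars in ratings[member].items():
            if title not in owned and stars >= min_rating:
                votes[title] += 1
    return [title for title, _ in votes.most_common()]

print(cluster_for("Gwede", ratings))
print(recommend_from_cluster("Gwede", ratings))
```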


Option 3


Gwede bought How to Win Friends and Influence People recently. OnlineBookSeller runs an active search on keywords, subject codes or the author's name (here, Dale Carnegie), and returns the results to Gwede. The recommendation might, for example, be How to Stop Worrying and Start Living, also by Dale Carnegie.
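Option 3 needs no customer data at all, only the catalogue. In this sketch the subject tags, the third catalogue entry and the scoring (an author match counts double a shared subject) are my own inventions for the sake of the example.

```python
# Hypothetical catalogue; the subject tags are invented for illustration.
catalogue = [
    {"title": "How to Win Friends and Influence People",
     "author": "Dale Carnegie", "subjects": {"self-help", "communication"}},
    {"title": "How to Stop Worrying and Start Living",
     "author": "Dale Carnegie", "subjects": {"self-help"}},
    {"title": "Hypothetical Cookbook",
     "author": "A. N. Other", "subjects": {"cooking"}},
]

def recommend_by_metadata(purchased_title, catalogue):
    """Rank other titles by shared author and shared subject tags."""
    bought = next(book for book in catalogue if book["title"] == purchased_title)
    scored = []
    for book in catalogue:
        if book["title"] == purchased_title:
            continue
        # Arbitrary weighting: same author is worth 2, each shared subject 1.
        score = 2 * (book["author"] == bought["author"])
        score += len(book["subjects"] & bought["subjects"])
        if score > 0:
            scored.append((score, book["title"]))
    return [title for _, title in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]

print(recommend_by_metadata("How to Win Friends and Influence People", catalogue))
```

This is also where sloppy publisher metadata bites: a title with no usable author or subject data never scores above zero and never gets recommended.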


Option 4


Amazon's 'old' method, called 'item-to-item collaborative filtering', builds recommendations by comparing items with one another instead of comparing customers. It builds lists of similar items, including items that were bought as pairs and are therefore likely 'next purchases'.
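Here is a toy version of the item-to-item idea, assuming similarity between two books is measured by how often the same customers bought both (a cosine-style score over co-purchase counts). The purchase data is invented, and this is a sketch of the general technique rather than Amazon's actual code.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase histories.
purchases = {
    "Gwede": {"Book A", "Book B"},
    "Jacob": {"Book A", "Book B", "Book C"},
    "Naledi": {"Book B", "Book C"},
    "Sipho": {"Book A", "Book D"},
}

def build_similar_items(purchases):
    """Build the item-to-item table: similarity of every pair bought together."""
    buyers = defaultdict(int)           # how many customers bought each item
    bought_together = defaultdict(int)  # how many customers bought both items
    for basket in purchases.values():
        for item in basket:
            buyers[item] += 1
        for a, b in combinations(sorted(basket), 2):
            bought_together[(a, b)] += 1
    similar = defaultdict(dict)
    for (a, b), together in bought_together.items():
        score = together / sqrt(buyers[a] * buyers[b])
        similar[a][b] = score
        similar[b][a] = score
    return similar

def recommend_from_items(customer, purchases, similar):
    """Score unowned items by their similarity to what the customer already owns."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other, score in similar.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    return sorted(scores, key=lambda title: (-scores[title], title))

similar = build_similar_items(purchases)
print(recommend_from_items("Gwede", purchases, similar))
```

The expensive part (building the table) can be done in advance, so the recommendation itself is only a lookup over the handful of books a customer owns. That would explain the appeal for a shop that has to respond in real time.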


I've tried to do some digging, and various people say a number of things about how this may or may not have changed. Some say that Amazon is still using 'item-to-item collaborative filtering'. I feel Amazon would do better to use a hybrid of all sorts of different algorithms, including not only item-to-item, but also categories of consumers. Consumers who spend little time on the site but whose visits nearly always end in a purchase, for example, know what they want. These consumers probably hardly ever click on their recommendations. Option 1 might fit this kind of consumer.


Another customer might display more 'herd-like' behaviour and fit into a neat genre category of book-buyer-types, in which case clusters might have more applicability, or at least be less likely to be wrong.


Yet others might never have made a single Amazon purchase in their lives, in which case given the lack of data, Option 3 would work well.


I'm sure a myriad other factors could be involved, such as observing sources for traffic to the website, for example keyword searches in search engines or references from particular pages that might result in buying trends. (Referrals, for example from particular websites or author blogs.)


It's fun, although pointless, to speculate, but what we can learn (or not) is:


- If Option 4 above is one way Amazon makes recommendations, create simultaneous price/discount promotions with great marketing and publicity drives for authors with more than one book. "If you buy and review these 2 books by Gwede at Amazon, we'll give you a discount coupon for books direct from our website on topic X". While you're at it, see if it's desirable and feasible to match metadata and subject categories for those items.


- Rankings and reviews are crucial. Get people to review your titles positively, and get a lot of reviews. Running competitions to drive reviews (and make them shareable in and of themselves - for example a really funny video linked to the competition that is likely to be shared on its own), might be an excellent way to increase your book's ranking.


- All of the above thinking on algorithms and metadata simply reinforces the fact that publisher metadata has to be accurate and thought out. For example, if Amazon is creating lists of similar items, it is likely to be looking at the category or subject of the book. Instead of simply categorising a book as 'Fiction', it should be broken down as far as possible: '#4 in Books > Teens > Science Fiction & Fantasy > Science Fiction' OR '#4 in Books > Teens > Science Fiction & Fantasy > Fantasy', as the sales ranking for Suzanne Collins' Catching Fire illustrates. Improving metadata helps Amazon do its wonderful job of recommendations.


- The more copies you sell, the more copies you sell. (The more consumers Amazon is likely to pair your title with, and the more "bought-with" products it's going to pair it with.)


- Collect data. Collect and analyse data from your own site. Have log-in access to your site and a facility to review and rate books.


- Use data to improve products.


Some useful reading


1. The Creative Penn blog post


2. Google supposedly recycles Amazon's algorithm for YouTube 


3. An old paper on Amazon's item-to-item algorithm