This is an eye-opening account of how Tom Owad was able to data mine Amazon’s wish list database to build a profile of “subversives” based on the books they had asked for. It makes you think twice before adding a book to your wishlist.
Using a pair of 5-year-old computers, two home DSL connections, 42 hours of computer time, and 5 man hours, I now had documents describing the reading preferences of 260,000 U.S. citizens.
I downloaded all the files to an external 120 GB Firewire drive in UFS format. The raw data occupied little more than 5 GB. I initially wanted to move all the files into a single directory to facilitate searching, but as the directory contents exceeded 100,000 items, the speed became glacially slow, so I kept the data divided into chunks of 25,000 wishlists.
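For anyone curious what that chunking step might look like, here is a minimal sketch in Python. It is not the author's actual script (the account doesn't show one); the function name `shard_files` and the `chunk_0000`-style directory names are made up for illustration. The only detail taken from the text is the 25,000-files-per-directory figure.

```python
from pathlib import Path
import shutil

# 25,000 wishlists per directory, the figure given in the account.
CHUNK_SIZE = 25_000


def shard_files(src_dir, dest_root, chunk_size=CHUNK_SIZE):
    """Move files from src_dir into numbered subdirectories of dest_root.

    Hypothetical helper: each subdirectory receives at most chunk_size
    files, so no single directory grows large enough to slow down listing.
    Returns the number of chunk directories used.
    """
    src = Path(src_dir)
    dest = Path(dest_root)

    # Snapshot the file list first so moving files does not disturb iteration.
    files = sorted(p for p in src.iterdir() if p.is_file())

    for index, path in enumerate(files):
        # Integer division maps files 0..chunk_size-1 to chunk 0, and so on.
        chunk_dir = dest / f"chunk_{index // chunk_size:04d}"
        chunk_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(chunk_dir / path.name))

    # Ceiling division: a partial final chunk still counts as a directory.
    return (len(files) + chunk_size - 1) // chunk_size
```

Calling `shard_files("raw", "sharded")` on a folder of downloaded files would leave `sharded/chunk_0000`, `sharded/chunk_0001`, and so on, each holding up to 25,000 files. The directory paths here are placeholders.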
Next comes the fun part – what books are most dangerous? So many to choose from. Here’s a sample of the list I made. Feel free to make up your own list if you decide to try some data mining. Send it to the FBI. I’m sure they’ll appreciate your help in fighting terrorism.
[Via Boing Boing]