What we learned from 5 million books

Google has scanned millions of books and assembled a large dataset spanning many years of written output. To the extent that writing captures human culture, the dataset is large enough to serve as a fairly representative sample of that culture.

In this talk, the speakers present several analyses that can be run on such a dataset. They call the approach culturomics. It is a bit like genomics, except that the raw data are words rather than DNA sequences.
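One of the simplest analyses of this kind tracks how often a word appears each year, relative to all the words printed that year. Normalizing by the yearly total keeps the trend from simply reflecting the fact that more books were published later. The sketch below is a minimal illustration that assumes the corpus is available as hypothetical (year, text) pairs. The `relative_frequency` function and the toy corpus are invented for illustration and are not real data or the speakers' actual pipeline.

```python
from collections import Counter


def relative_frequency(corpus, word):
    """Return {year: fraction of that year's tokens equal to `word`}.

    `corpus` is assumed to be an iterable of (year, text) pairs.
    Tokenization is a naive lowercase whitespace split.
    """
    totals = Counter()  # total tokens seen per year
    hits = Counter()    # occurrences of `word` per year
    target = word.lower()
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Divide by the yearly total so years with more text are not favored.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical toy corpus of (publication year, text) pairs, not real data.
toy_corpus = [
    (1900, "the telegraph carried the news"),
    (1900, "a letter arrived by post"),
    (1950, "the radio and the telegraph"),
    (2000, "the internet replaced the telegraph"),
]

print(relative_frequency(toy_corpus, "telegraph"))  # {1900: 0.1, 1950: 0.2, 2000: 0.2}
```

The word appears once in each year. Its share still doubles after 1900 because that year has twice as many tokens in total, which is exactly the effect the normalization is meant to capture.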

As these sorts of analyses become more sophisticated, the results could become very interesting. Here is a taste of things to come.

