Accessing and analyzing my own personal account information on large sites

I program strictly as a hobby. After I did some amateurish natural language analysis here, people pointed me to the NLTK library. That gave me the idea of writing my own OAuth module to access my personal account data on sites such as Google and Microsoft and do some crunching on it.

Since this is just for fun so far, I’m less interested in use cases than in learning how to do things. Can anyone point me in the right direction for developing the skills needed to:

  • obtain all of my personal account data from the databases of various services
  • do natural language analysis of it
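
To make the second goal concrete, here is a minimal sketch of the kind of crunching I have in mind, assuming the account data has already been exported to plain text. It uses only Python's standard library rather than NLTK, and the sample messages and the function name `top_words` are made up for illustration.

```python
import re
from collections import Counter


def top_words(texts, n=3):
    """Return the n most common lowercase words across all texts."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer: runs of letters and apostrophes.
        # A real NLP library would handle tokenization far better.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)


# Hypothetical sample standing in for exported account data.
messages = [
    "Lunch on Friday?",
    "Friday works, see you at lunch.",
    "Running late on Friday.",
]

print(top_words(messages, n=2))
```

Something like this is where I would start before moving on to proper tokenization, stop-word removal, and so on.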