donderdag 21 februari 2013

Data Journalism Tricks

Scrapping data from the web is not easy. Google spreadsheets in Google drive are a great help. Here are some tricks.

Importing a table
Let's start easy. I was doing some research about Dutch municipalities and their mayors. Question was: Is the distribution of political parties over the mayors resembling the distribution in the parliament? So you need a list mayors and their party. Wikipedia is an option and works fine as example. Here is the page: http://nl.wikipedia.org/wiki/Lijst_van_huidige_burgemeesters_in_Nederland . Now there are several possibilities to get this in a spreadsheet: copy/paste or using OutWit Hub. Directly importing it in Google spreadsheets goes like this. Open a new spreadsheet and type in the first cell:

=ImportHtml("http://nl.wikipedia.org/wiki/Lijst_van_huidige_burgemeesters_in_Nederland","table",2)


 The snapshot shows that all the mayors of municipalities starting with a B are download. Change 2 into 1 and you get the A's.








GeoCoding
Putting data on a map is a great trick for visualizing. In order to do that you must know the position of your data on the map. Let's take the list of mayors (B's) and you want to show from the municipalities on the map the political party of the mayor.
You need longitude and latitude of the cities.  The first city is Baarle-Nassau, what is longitude and latitude?

Type in the first cell of the spreadsheet:

=JOIN(",", ImportXML("http://open.mapquestapi.com/search?format=xml&q=Baarle-Nassau", "//place[1]/@lat | //place[1]/@lon"))

Mapquest will lookup latitude and longitude and put the outcome in the first cell separated by a comma. However I have a whole list, for example all cities starting with a B in the first column. In the second column I want lat. and lon.  Type in the first cell of column B the following:


=JOIN(",", ImportXML(CONCATENATE("http://open.mapquestapi.com/search?format=xml&q=",A2), "//place[1]/@lat | //place[1]/@lon"))


 If Baarle-Nassau is in cell A2 and you put this formula in H2, Mapquest will lookup the lon and lat for A2 and put the outcome in H2. If you have a whole list of names in column A; simple copy down the formula in H.
Next export the spreadsheet to Google Fusion Tables and use your column with lon. and lat to put markers on the map.





You want an other goldmine for wrapping in data nuggets: http://schoolofdata.org/handbook/ .
There are other ways to do geocoding, for example using Google Refine. Here is an excellent tutorial: http://blog.ouseful.info/2013/02/20/geocoding-using-the-google-maps-geocoder-via-openrefine/






Geen opmerkingen:

Een reactie posten

Opmerking: Alleen leden van deze blog kunnen een reactie posten.