Tuesday 9 January 2018

Data journalism training program 2018

Data Journalism Basic DDJ1
Data Journalism Advanced DDJ2
Goals (DDJ1): an introduction to the basic tools and skills for data journalism, focusing on single-variable analysis and visualization.

The training has the following elements:
- What is data journalism?
- Finding data
- Downloading, scraping, handling PDF data
- Analyzing data with Excel or Calc
- Visualizing data in Excel or Calc, using Tableau, plotly and Google Fusion Tables
- Building the story

OutWit Hub

Examples for training

1. Data on Dutch municipalities and mayors
- finding data on Dutch mayors / Crown-appointed mayors (Dutch search term: kroonbenoemde burgemeesters)
- converting the PDF document to Excel
- inspecting the data in Excel; variables: party, gender, age (experience)
- calculating key figures
- making graphs in Excel
- interactive graphs for online use in Tableau
- what is the conclusion and the story
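
The "calculating key figures" step can be sketched outside a spreadsheet as well. The course itself uses Excel or Calc for this; the Python below is only an illustration, and the records are made-up placeholders (hypothetical parties "A", "B", "C"), not real data on mayors.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records (party, gender, age); made-up values for illustration only.
mayors = [
    ("A", "m", 58),
    ("A", "f", 49),
    ("B", "m", 61),
    ("B", "m", 55),
    ("C", "f", 52),
]

# Count mayors per party, the share of women, and summarize age.
party_counts = Counter(party for party, _, _ in mayors)
share_women = sum(1 for _, gender, _ in mayors if gender == "f") / len(mayors)
ages = [age for _, _, age in mayors]

key_figures = {
    "mayors_per_party": dict(party_counts),
    "share_women": share_women,
    "mean_age": mean(ages),
    "median_age": median(ages),
}

for name, value in key_figures.items():
    print(name, value)
```

The same figures map onto COUNTIF, AVERAGE and MEDIAN in a spreadsheet.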

2. Largest cities
- Country-Wiki page
- scrape table
- clean up data and variables
- key figures
- map and charts for largest cities in Tableau
- story building
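
The "scrape table" and "clean up data" steps can be sketched as follows. The course uses OutWit Hub for scraping; this Python version is only an assumption-laden illustration that parses a small inline HTML table (placeholder cities and populations, not real figures) instead of fetching a live page.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every table cell, grouped per row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment; the names and numbers are placeholders.
page = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>City A</td><td>1,200,000</td></tr>
  <tr><td>City B</td><td> 850,000 </td></tr>
</table>
"""

parser = TableParser()
parser.feed(page)

# Clean-up: split off the header and turn "1,200,000" into an integer.
header, *records = parser.rows
cities = [
    {"city": name, "population": int(pop.replace(",", ""))}
    for name, pop in records
]
print(header)
print(cities)
```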

3. Sovereign ratings
- trading economics
- scrape table and clean up
- summarize with pivot tables
- key figures
- charts and maps of sovereign ratings in Tableau
- story building
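
What a pivot table does in the "summarize with pivot tables" step can be shown in a few lines. This is a Python sketch rather than the spreadsheet used in the course, and the rows are hypothetical placeholders, not actual ratings.

```python
from collections import defaultdict

# Hypothetical rows (country, region, rating); placeholders, not real ratings.
rows = [
    ("Country 1", "Europe", "AAA"),
    ("Country 2", "Europe", "AA"),
    ("Country 3", "Africa", "BB"),
    ("Country 4", "Africa", "BB"),
    ("Country 5", "Europe", "AAA"),
]

# Pivot: regions as rows, ratings as columns, number of countries as values.
pivot = defaultdict(lambda: defaultdict(int))
for _, region, rating in rows:
    pivot[region][rating] += 1

ratings = sorted({rating for _, _, rating in rows})
print("region".ljust(10) + "".join(r.rjust(5) for r in ratings))
for region in sorted(pivot):
    counts = "".join(str(pivot[region][r]).rjust(5) for r in ratings)
    print(region.ljust(10) + counts)
```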

4. Working with maps
- marker and polygon maps in Tableau
- adding GPS data to maps
- finding shapefile (.shp) maps and visualizing them in QGIS

5. Print and online
- visualizations and resolution of charts and maps

6. Moving charts in Tableau
- using GDP and life expectancy for a group of countries and years: one scatter plot per year

Topics on demand
- finding shapefile (.shp) maps and visualizing them in QGIS

Examples of DDJ projects
- water levels in Cape Town dams
- crime rates for CPT precincts
- water points in Tanzania
- backgrounding members of the house of commons in NL
- crime rates in Dutch cities
- quality of hospitals in NL
- election data in Namibia
- Twitter relations between reporters and members of parliament
- femicide in Tbilisi, Georgia
- SA elections on the map
- text analysis of the Source

Goals (DDJ2): using basic statistics for analysis in data journalism, with analysis of more than one variable.
Using R for analysis.
Making charts in R with ggplot and exporting them to plotly or googleVis.
Building an advanced scraper for more than one page.

The training has the following elements:
- what is R and using R: introduction
- basic statistics
- advanced scraping
- making advanced plots and maps

R installed, with packages
RStudio
OutWit Hub

Examples for training

1. Data on Dutch municipalities and mayors, part 2
- building an advanced scraper for Dutch municipalities: finding income, unemployment, house value, cars
- adding these to the data from part 1
- importing xls data in R
- univariate analysis and charts in R
- multivariate analysis in R, focusing on linear models:
  income – house value and party
  unemployment – party
  gender and party
  testing relationships: correlation, chi-square
- exporting plots and charts to plotly
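
The two tests named above, correlation and chi-square, come down to simple arithmetic. In the course they are done in R (for example with cor() and chisq.test()); the Python below is only a sketch of the underlying calculation, with hypothetical numbers and helper names (pearson_r, chi_square) chosen for this illustration. The chi-square statistic here is computed without a continuity correction.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def chi_square(table):
    """Chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical municipal data: income and house value (both in thousands).
income = [20, 25, 30, 35, 40]
house_value = [150, 180, 210, 260, 300]
r = pearson_r(income, house_value)

# Hypothetical counts: rows are gender, columns are two parties.
gender_by_party = [[10, 20], [20, 10]]
chi2 = chi_square(gender_by_party)

print(round(r, 3), round(chi2, 3))
```

A correlation near 1 or -1 points to a strong linear relationship; the chi-square statistic still has to be compared with the critical value for the table's degrees of freedom before drawing a conclusion.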

2. Development data / World Bank
- GDP and life expectancy
- GDP and mobile phones
- GDP and financial inclusion
- sovereign debt to GDP
- for a set of countries and a number of years

- downloading data in xls
- cleaning data
- importing data as a data matrix
- Analysis and charts in R, exporting to plotly

3. Building an advanced scraper for key economic data for countries from Trading Economics

4. Working with databases
- spreadsheets versus databases
- making a database and adding new records
- importing xls files into a database
- queries and exporting
- joins
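
The database steps above (creating tables, adding records, querying, joining, exporting) can be sketched with SQLite. The outline does not say which database is used in the course, so SQLite is an assumption here, as are the table names, column names and values, which are all hypothetical.

```python
import csv
import io
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE municipality (
        id INTEGER PRIMARY KEY,
        name TEXT,
        income REAL
    );
    CREATE TABLE mayor (
        id INTEGER PRIMARY KEY,
        municipality_id INTEGER REFERENCES municipality(id),
        party TEXT
    );
    """
)

# New records (hypothetical placeholder values).
conn.executemany(
    "INSERT INTO municipality VALUES (?, ?, ?)",
    [(1, "Town A", 30.0), (2, "Town B", 25.0)],
)
conn.executemany(
    "INSERT INTO mayor VALUES (?, ?, ?)",
    [(1, 1, "Party X"), (2, 2, "Party Y")],
)

# Query with a join: combine each municipality with its mayor's party.
rows = conn.execute(
    """
    SELECT m.name, y.party, m.income
    FROM municipality AS m
    JOIN mayor AS y ON y.municipality_id = m.id
    ORDER BY m.name
    """
).fetchall()

# Export the result as CSV text (a file object works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "party", "income"])
writer.writerows(rows)
export_text = buffer.getvalue()
print(export_text)

conn.close()
```

The join is what a spreadsheet cannot do comfortably: the two tables stay separate and are only combined at query time.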

Topics on demand
- analyzing text with R for building word frequencies, word tree, word cloud, clustering
example: 100 articles published by the Source

- social network analysis of Twitter users with Gephi
example: Twitter relations between reporters and members of parliament in NL

Wednesday 27 December 2017


I wanted to start 2018 with a larger view of the connected world. Here is my new 28-inch Samsung UHD monitor. Here are the specs: http://www.samsung.com/nl/monitors/uhd-ue590/LU28E590DSEN/

Wednesday 6 December 2017

Thursday 23 November 2017


This is enough... After a year of struggling with Windows 10 fast boot, secure boot, registered keys, UEFI files and endless updating, I wiped the whole hard drive. Back to Linux again: I installed Mint 17.3. All the data journalism tools are working: Excel (Office Online) or Calc (OpenOffice), OutWit Hub or Table Capture for Chrome, Refine, Tabula, QGIS, R and RStudio. Just in case, I am running Win10 in VirtualBox.
One small problem: I love Tableau Public for visualizations, but it is not available for Linux (yet). So I am starting with plot.ly, which has better integration with R.

Thursday 16 November 2017

Word frequencies of headlines of the Source

This is a work in progress!
Last month's headlines in The Source.
Here are the top word frequencies (wf > 5) as a bar graph and a word cloud.
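
Counting word frequencies and keeping only words above a threshold can be sketched as follows. The original analysis was done in R; this Python version is only an illustration. The headlines and the stop-word list are hypothetical placeholders, and because the sample is tiny the threshold is set to 1 instead of the 5 used above.

```python
import re
from collections import Counter

# Hypothetical placeholder headlines, not actual headlines from The Source.
headlines = [
    "Water crisis deepens in the city",
    "City council debates water tariffs",
    "New water plan for the city",
]

# Small, hand-picked stop-word list (an assumption for this sketch).
STOPWORDS = {"the", "a", "of", "in", "and", "to", "for"}


def word_frequencies(texts, threshold):
    """Return (word, count) pairs with count > threshold, most frequent first."""
    words = []
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(token for token in tokens if token not in STOPWORDS)
    counts = Counter(words)
    return [(word, n) for word, n in counts.most_common() if n > threshold]


top = word_frequencies(headlines, threshold=1)

# Text-only bar graph of the result.
for word, n in top:
    print(f"{word:<10}{'#' * n}")
```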

Monday 18 September 2017


Visualizations such as maps or charts are the end product of many data journalism projects. Problems emerge when you want to explain the structure of the research and its different steps, or when you want to make the research more transparent by sharing the outcome of each step. For example, I use R for analysis and plotly for visualizations, and to show the different steps I have a text in markdown describing the whole process. During a lecture or a training you then have to switch from one application to another. Jupyter notebooks solve this problem: by using different kernels in the notebook you can show text in markdown, calculations in R, and visualizations in plotly. You can also share the notebook together with the data, so anybody who downloads it can follow the research process step by step.