New data files for analysis of Wikimedia traffic

Today I released two new JSON files: one with demographics data from the World Bank, and a second with a subset of the first, augmented with Wikimedia page view counts.

Both complement the visualization Wikipedia Views Visualized (aka WiViVi), but both can be used in other contexts as well.

1) World Bank demographics data

This file, world-bank-demographics.json, is the result of harvesting World Bank API files.

It contains yearly (!) figures for four metrics (more could be added rather easily):

– population counts,
– percentage internet users,
– percentage mobile subscriptions,
– GDP per capita.

I used this demographics file to publish a set of charts (many more on meta).

[Charts: World Bank internet users per 100, by region; World Bank mobile subscriptions per 100, by region]

Details: World Bank files come in different formats (some CSV, some JSON) and use a variety of indexes (some use ISO 3166-1 alpha-2 codes, others alpha-3). Script 1) first normalizes the data, then aggregates, filters and indexes them.
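The normalization step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual script: the lookup table, the function names, the record layout and the sample values are all assumptions made for the example.

```python
# Hypothetical excerpt of an ISO 3166-1 alpha-3 -> alpha-2 lookup table.
# A real table would cover every country; these three entries are only for illustration.
ALPHA3_TO_ALPHA2 = {"NLD": "NL", "USA": "US", "IND": "IN"}


def normalize_code(code):
    """Return the alpha-2 form of a country code, or None if it is unknown."""
    code = code.strip().upper()
    if len(code) == 2:
        return code  # already alpha-2
    return ALPHA3_TO_ALPHA2.get(code)


def merge_records(records):
    """Index (code, year, metric, value) tuples as country -> metric -> year -> value.

    Records with an unknown country code or a missing value are filtered out.
    """
    merged = {}
    for code, year, metric, value in records:
        alpha2 = normalize_code(code)
        if alpha2 is None or value is None:
            continue
        merged.setdefault(alpha2, {}).setdefault(metric, {})[int(year)] = value
    return merged
```

Feeding it records keyed by "NLD" and by "nl" yields a single "NL" entry, which is the point of normalizing before aggregating.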

2) Global Wikipedia page views data, incl. demographics data

This file, datamaps-data.json, contains the equivalent of 3 rather complex (*) CSV files which feed WiViVi. This new format brings together demographics data and page views (by country, by region, and by language), and also adds meta info. This JSON format is meant for external use, as for some it is much easier to parse than the 3 CSV files which WiViVi uses itself ((*) the CSV files use nested delimiters).
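Since the layout of the JSON file is not spelled out here, a first step for external users could be to print an outline of its keys. The helper below is a generic sketch that assumes nothing about the actual structure of datamaps-data.json; the function names are made up for this example.

```python
import json


def outline(node, max_depth=2, depth=0):
    """Describe keys and value types of parsed JSON, down to max_depth levels."""
    if isinstance(node, dict) and depth < max_depth:
        return {key: outline(value, max_depth, depth + 1) for key, value in node.items()}
    if isinstance(node, list) and node and depth < max_depth:
        # Describe only the first element, assuming list items share one shape.
        return [outline(node[0], max_depth, depth + 1)]
    return type(node).__name__


def outline_file(path, max_depth=2):
    """Load a JSON file and return its outline."""
    with open(path, encoding="utf-8") as handle:
        return outline(json.load(handle), max_depth)
```

For example, print(json.dumps(outline_file("datamaps-data.json"), indent=2)) would show the top two levels of the file, assuming it has been downloaded to the working directory.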

Notes:
A) JSON file 1) replaces two CSV files which up to now were filled from Wikipedia pages, one on population counts, one on internet users.
B) Although Wikipedia lists nowadays also use World Bank data, this is not done consistently; see talk pages here and here (sections ‘Wikipedia vs World Bank’).
C) For scripts and data files see GitHub: 1) here and 2) here
