New data files for analysis of Wikimedia traffic

Today I released two new JSON files: one with demographics data from the World Bank, and a second with a subset of the first, augmented with Wikimedia page view counts.

Both complement the visualization Wikipedia Views Visualized (aka WiViVi), but can be used in other contexts as well.

1) World Bank demographics data

The file world-bank-demographics.json is the result of harvesting World Bank API files.

It contains yearly (!) figures for four metrics (more could be added rather easily):

– population counts,
– percentage internet users,
– percentage mobile subscriptions,
– GDP per capita.

I used this demographics file to publish a set of charts (many more on meta).

World_Bank_internet_users_per_100_-_Regions World_Bank_mobile_subscriptions_per_100_-_Regions

Details: World Bank files have different formats (some csv, some json) and use a variety of indexes (some use ISO 3166-1 alpha-2 codes, others alpha-3 codes). Script 1) first normalizes the data, which are then aggregated, filtered and indexed.
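The normalization step can be illustrated with a minimal sketch. This is not the actual script: the lookup table, function names and sample figures below are assumptions made up for illustration. It shows how sources indexed by either alpha-2 or alpha-3 codes can be merged under one alpha-2 key.

```python
# Hypothetical lookup table; a real script would cover all ISO 3166-1 codes.
ALPHA3_TO_ALPHA2 = {"NLD": "NL", "USA": "US", "IND": "IN"}


def normalize_code(code):
    """Return the alpha-2 code for an alpha-2 or alpha-3 input."""
    code = code.strip().upper()
    if len(code) == 2:
        return code
    if len(code) == 3 and code in ALPHA3_TO_ALPHA2:
        return ALPHA3_TO_ALPHA2[code]
    raise ValueError(f"unknown country code: {code!r}")


def merge_metrics(sources):
    """Turn {metric: {code: {year: value}}} into {alpha2: {metric: {year: value}}}."""
    merged = {}
    for metric, by_country in sources.items():
        for code, by_year in by_country.items():
            country = merged.setdefault(normalize_code(code), {})
            country.setdefault(metric, {}).update(by_year)
    return merged


# Made-up sample input: one source keyed by alpha-3, another by alpha-2.
sources = {
    "population": {"NLD": {2015: 16_900_000}},
    "internet_users_pct": {"NL": {2015: 93.1}},
}
merged = merge_metrics(sources)
```

After merging, both metrics for the Netherlands sit under the single key "NL", regardless of which code the source file used.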

2) Global Wikipedia page views data, incl demographics data

This file datamaps-data.json contains the equivalent of 3 rather complex (*) csv files which feed WiViVi. This new format brings together demographics data and pageviews (by country, by region, and by language), and also adds additional meta info. This json format is meant for external use, as it’s much easier to parse for some than the 3 csv files which WiViVi uses itself (the csv files use nested delimiters).
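To show why nested delimiters are harder to parse than json, here is a small sketch. The csv layout and the json field names are invented for this example and do not reflect the real WiViVi files or datamaps-data.json.

```python
import json

# Hypothetical csv line with nested delimiters: commas between fields,
# '|' between language entries, ':' between language code and view count.
csv_line = "NL,Netherlands,nl:700|en:250|de:50"


def parse_nested(line):
    """Parse the hypothetical nested-delimiter line into a dictionary."""
    code, name, views = line.split(",", 2)
    per_language = {}
    for entry in views.split("|"):
        lang, count = entry.split(":")
        per_language[lang] = int(count)
    return {"code": code, "name": name, "views": per_language}


# The same record as json needs no custom parsing (field names are made up).
json_text = '{"code": "NL", "name": "Netherlands", "views": {"nl": 700, "en": 250, "de": 50}}'

record_from_csv = parse_nested(csv_line)
record_from_json = json.loads(json_text)
```

Both routes yield the same record, but the csv route needs a hand-written parser that knows every delimiter level, while the json route is a single library call.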

Notes:
A) Json file 1) replaces two csv files which up to now were filled from Wikipedia pages, one on population counts, one on internet users.
B) Although Wikipedia lists nowadays also use World Bank data, this is not consistently done, see talk pages here and here (sections ‘Wikipedia vs World Bank’).
C) For scripts and data files see GitHub: 1) here and 2) here 

Posted in Nice Charts, Wikimedia View(er)s

Wiki Loves Monuments 2017

Again in 2017 Wiki Loves Monuments (WLM) has been a top ranking project community initiative in terms of attention raised.

Here are further stats on that contest. The charts follow the layout of earlier years. The data have been aggregated from the WLM stats tool wikiloves (which itself does great reporting) into spreadsheets 1 and 2.

Participating countries in 2017 WLM contest

Map of countries participating in Wiki Loves Monuments 2017

 

 


Some charts are about image uploads.
One is about image uploaders, also known as contributors.


Countries

Participants-v3


Participants per year to Wiki Loves Monuments contest (click to zoom)

With 52 participating countries, 10 more than in 2016, the 2017 contest ranks a close second after 2013, when 53 countries participated. (See 1st table). 6 contenders participated for the first time: Australia, Croatia, Dutch Caribbean (Aruba participated as a country earlier), Finland, Saudi Arabia and Uganda.

In those 8 years since WLM started these countries participated most often:
7x France, Germany, Norway, Russia, Spain, Sweden
6x Austria, Israel, Italy, Netherlands, Slovakia, Ukraine

The contest ran in different countries during different periods (mostly because different calendars are in use, and the aim is to run the contest for a full calendar month).

 


Uploads

In 2017 a total of 245,168 images were uploaded, which is 12% fewer than in 2016.

WLM_uploads_per_year_2010-2017-v2


In 2017 Ukraine contributed most images: 37,592

WLM_uploads_by_country_2017-v2

WLM_uploads_by_country_cumulative_top20_2010_2017-v2

WLM_uploads_by_country_year_by_year_2010_2017-v2


Contributors

Where in 2016 India and the United States shared an ex aequo first place
with 1784 uploaders, this year the United States easily took first place
with 1418 contributors. India came second, with 1130.

WLM_contributors_by_country_2017-v2

WLM_uploaders_by_country_year_by_year_2010_2017_top_10_v2


The following chart is new this year. We know for each country how many people contributed images. We don’t know how many of those people were foreign visitors (given the theme of the contest probably most of those were tourists). This proportion may vary widely per country. This is actually relevant for all presented charts, but here more than in other charts, as demographics are here an explicit part of the equation. It shows how in statistics, just like in photography, point-of-view and perspective can greatly influence the final picture.
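The metric behind this chart is simple arithmetic, sketched below. The contributor counts are the 2017 figures quoted above; the population figures are rounded values that I assume here for illustration only, not the data used for the chart.

```python
def contributors_per_million(contributors, population):
    """Number of contributors per one million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return contributors / population * 1_000_000


# Contributor counts from this post; populations are rounded,
# illustrative assumptions (not taken from the chart's data).
countries = {
    "United States": (1418, 325_000_000),
    "India": (1130, 1_339_000_000),
}

per_million = {
    name: contributors_per_million(contributors, population)
    for name, (contributors, population) in countries.items()
}
```

Under these assumed populations the United States would end up well above India per capita, even though the absolute counts are fairly close.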

WLM_contributors_per_million_population_2017


Edit activity on Commons

Two Wikistats diagrams: every year the Wiki Loves Monuments contest brings peak activity on Commons. The second peak earlier in the year, mostly since 2014, is the result of the Wiki Loves Earth contest, which has somewhat different dynamics: WLE runs for a longer period than WLM, and in 2017 WLE had 52% more contributors than WLM, but 47% fewer uploads. See also: this Wiki-loves yearly results page.

Charts also available on Wikimedia Commons


PlotUploadsCOMMONSupdated

Data for 2017 not yet available on last chart.

Posted in Nice Charts, Wiki Loves Monuments

Browse winning Wiki Loves Monuments images offline

wlm_2016_in_aks_the_reflection_taj_mahal

Click to show full size (1136×640), e.g. for iPhone 5

 

The pages on Wikimedia Commons which list the winners of the yearly contests [1] contain a feature ‘Watch as Slideshow!’. Works great.

However, wouldn’t it be nice if you could also show these images offline (outside a browser), annotated and resized for minimal footprint?

Most end-of-year vacations I do a hobby project for Wikipedia. This time I worked on a script [2] [3] to make the above happen. The script does the following:

  • Download all images from Wiki Loves Monuments winners pages [1]
  • Collect image, author and license info for each image on those winners pages
  • or if not available there, collect these meta data from the upload pages on Commons
  • Resize the images so they are exactly the required size
  • Annotate the image unobtrusively in a matching font size:
    contest year, country, title, author, license
wlm-annotations

Font size used for 2560×1600 image

 

  • Prefix the downloaded image for super easy filtering on year and/or country

wlm-winners-file-list-detail
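The resize, annotate and prefix steps above could be sketched roughly as follows. This is not the actual script: the file naming scheme, the font size rule and the choice to crop to the exact size are my assumptions here. The sketch only builds the ImageMagick convert command line and does not run it.

```python
def target_name(year, country, title):
    """Hypothetical naming scheme: year and country first, for easy filtering."""
    safe_title = "_".join(title.split())
    return f"{year}_{country}_{safe_title}.jpg"


def convert_command(src, dst, width, height, caption):
    """Build an ImageMagick 'convert' command that resizes and annotates."""
    size = f"{width}x{height}"
    pointsize = max(10, height // 80)  # assumed rule: font scales with height
    return [
        "convert", src,
        "-resize", size + "^",      # scale so the image covers the target box
        "-gravity", "center",
        "-extent", size,            # crop to exactly width x height
        "-gravity", "southwest",    # place the caption in the lower left corner
        "-fill", "white",
        "-pointsize", str(pointsize),
        "-annotate", "+10+10", caption,
        dst,
    ]


name = target_name(2016, "India", "Taj Mahal reflection")
command = convert_command("input.jpg", name, 1920, 1080,
                          "WLM 2016, India, Taj Mahal reflection")
```

The resulting list can be passed to subprocess.run on a machine where convert is installed.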


I pre-rendered several sets with common image sizes, ready for download. You can request an extra set for other common screen sizes [4] [5]:

wlm_download_folder


For instance the 1920×1080 set is ideal for HDTV (e.g. for Apple TV screensaver) or large iPhones. On a TV the texts are readable as they are; on a phone some manual zooming is needed (but unobtrusiveness is key).

[1] 2010 2011 2012 2013 2014 2015 2016
[2] The script has been tested on Windows 10.
Prerequisites: curl and ImageMagick’s convert (in same folder).
[3] I am actually already rewriting the script, separating it into two scripts, to make it more modular and more generally applicable. First script will extract information from WLM/WLE (WLA?) winners pages and image upload pages, and generate a csv file. Second script will read this csv, download images, resize and annotate them. I will announce the git url here when done.
[4] 4K is a bit too large for easy upload. I may do that later when the script can also run on WMF servers.
[5] Current sets are optimal for e.g. HDTV and new iPhones (again, others may follow):
1920×1080 HDTV and iPhone 6+/7+
1334×750 iPhone 6/6s/7
1136×640 iPhone 5/5s 

Posted in Wiki Loves Monuments

Wiki Loves Monuments 2016

In 2016 Wiki Loves Monuments (WLM) has been a top ranking project community initiative in terms of attention raised.

Here are further stats on that contest. The charts follow the layout used in this blog in earlier years, but the data have now been collected from another WLM stats tool, wlm-stats. For added depth see also this Wikimedia blog post.

Participating countries in 2016 WLM contest

Map of countries participating in Wiki Loves Monuments 2016


Some charts are about image uploads.
One is about image uploaders, also known as contributors.

Countries

With 44 participating countries, 9 more than in 2015, the 2016 contest ranks second after 2013, when 53 countries participated. (See first table). 8 countries participated for the first time: Bangladesh, Georgia, Greece, Malta, Morocco, Nigeria, Peru and South Korea.

In those 7 years since WLM started 7 countries participated 6 times: Belgium, France, Germany, Norway, Russia, Spain, Sweden.

The contest ran in different countries during different periods (mostly because different calendars are in use, and the aim is to run the contest for a full calendar month).

List of countries that participated, per year

Participants per year to Wiki Loves Monuments contest (click to zoom)


Uploads

 

In 2016 a total of 277,406 images were uploaded, which is 20% more than in 2015.

WLM_uploads_per_year_fixed


In 2016 Germany contributed most images: 38,809

wlm_uploads_by_country_2016


wlm_uploads_by_country_cumulative

wlm_uploads_by_country_cumulative_2010_2016


Contributors

In 2016 India and United States excelled in number of uploaders: 1784 vs 1783. As the measured numbers fluctuate a bit over time (there is always ongoing vetting), I suggest we call this an ex aequo first place.

wlm_contributors_by_country_2016
wlm_uploaders_by_country_year_by_year_2010_2016_top_10


Edit activity on Commons

Two Wikistats diagrams: every year the Wiki Loves Monuments contest brings peak activity on Commons. The second peak earlier in the year, mostly since 2014, is the result of the Wiki Loves Earth contest.

Charts also available on Wikimedia Commons

PlotEditorsCOMMONS_updated


In 2016 the September peak (WLM) in uploads is again much more visible than the June peak (WLE). See also: this Wiki-loves yearly results page.

PlotUploadsCOMMONSupdated

Posted in Nice Charts, Wiki Loves Monuments

Wikistats’ days will be over soon. Long live Wikistats 2.0!

(tl;dr) Vote on Wikistats reports you want to see migrated

Dear Wikistats users,

With a mixture of melancholy and relief, I announce my withdrawal from the Wikistats project at the end of this summer, thirteen years after I started it. I will continue doing other stats work for WMF.

Wikistats has been a labor of love, and was built in close cooperation with the Wikimedia community. There are aspects of Wikistats in which I still take pride: equal treatment of all projects, some level of multi-language support, all dump-based metrics available for all years since 2001, to name a few.

Other aspects were harder to like, and even grew from a nuisance into a pain over the years: the scripts are monolithic and really hard to maintain, even for me, as they grew increasingly complex, with hardly any documentation. I’ve never made a secret of those deficiencies. As I was the sole maintainer for many years, besides doing other stats work, rewriting Wikistats to make it future proof was simply out of the question.

Over the last half year the WMF Analytics Team migrated the data feed for Wikistats traffic reports to Hadoop, and built some awesome new reports. Other reports were upgraded. In the coming months my colleagues will focus on replacing a selection of the remaining Wikistats reports, with priorities yet to be decided, based on your feedback. Of course the Wikistats scripts will still be available for reuse on other projects, but I have recommended against investing in their maintenance at WMF. That might have been the better choice years ago, but we passed that point.

Half a year ago I asked for your input in a survey on which traffic reports should be migrated first. Now I want to ask you: which Wikistats content and activity reports (aka dump reports) would you want to see continued in a new form (probably with more awesome improvements)?

Please visit this new survey which contains a list of available reports, and state your preferences.

Thank you!

Erik Zachte

Posted in uncategorized | 17 Comments

Wikistats upgraded to new page view definition

tl;dr New and upgraded Wikistats data files, reports and charts, with cleaner metrics.

Recently the Wikimedia Foundation upgraded its data feeds for hourly page view counts, using a new definition which excludes crawler traffic. These new data are now available from May onwards (backfilled). A big THANK YOU to the Analytics Team and Oliver Keyes.

SnippetAnalyticsTeam
16_Oliver_Keyes_-_Wikimedia_Foundation_016


As a prominent consumer of these data feeds, Wikistats couldn’t stay behind. Besides a major upgrade to existing files, charts and reports (see below), new charts were added as well.

New charts

There are now charts for overall totals per project, for 6 metrics: Total Editors, Total Edits, Total Articles, New Articles, Total Page views and Active Wikis.

Some examples:

Often in the news: the editor trend on Wikipedia. Here all Wikipedias taken together show a pretty slow overall drop since 2007 (a sharper decline on English Wikipedia is almost offset by increases on other Wikipedias). The trend for very active editors is totally flat.



This chart shows how editor activity spiked on Wikivoyage, early in 2013, just after the fork from Wikitravel.



Early in 2013 all interwikis were migrated by bots to Wikidata, which caused lots of bot activity. 



A new metric, Active Wikis, indicates per project how many wikis are actually being maintained. The threshold of 3+ active editors per wiki is of course arbitrary. Open for discussion.



BIG CHANGE: In the new data feeds all crawler traffic is (finally) filtered out. Overall this results in a drop of around 20% in page views (or actually bot requests). The drop is larger on small wikis and projects.

Upgraded reports

Wikistats uses the upgraded data feeds to produce several sets of reports:

  • Monthly pageview reports for all wikis and projects, normalized/raw, mobile/non-mobile/combined. You’ll see a drop in page views after the upgrade in May 2015, but much of that is due to the new definition filtering crawler requests.
    SnippetMonthlyPageViews
    Transition to new page view definition is clearly marked.

  • Summaries per wiki, and now also per project.
    SnippetSummaryAllWIkis

 

Upgraded downloads

Daily/monthly per article page views

SnippetDownloads
Wikistats grabs the upgraded hourly files, both the page views per article and per wiki, and aggregates each into several larger entities. For per-article page views, the daily and monthly aggregates now also use the new feed.

Page view totals per wiki (aka ‘project totals’)

SnippetDownloads2

Wikistats collects the hourly per-wiki ‘projectview’ files, packages them into a yearly tar file, and produces a large set of csv files available for download as one zip file (both of these, input and output, are here). The csv files include totals per wiki per hour, day, day of week, month, and more, and contain separate counts for WMF’s mobile and non-mobile sites.
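The kind of aggregation described here can be sketched as below. This is an illustration only, not the Wikistats code: the input is assumed to be already parsed into (wiki, timestamp, views) tuples, and the sample counts are made up.

```python
from collections import defaultdict
from datetime import datetime


def aggregate(hourly):
    """Sum hourly (wiki, timestamp, views) records per day, month and weekday."""
    per_day = defaultdict(int)
    per_month = defaultdict(int)
    per_weekday = defaultdict(int)  # weekday as integer, 0 = Monday
    for wiki, timestamp, views in hourly:
        per_day[(wiki, timestamp.date().isoformat())] += views
        per_month[(wiki, timestamp.strftime("%Y-%m"))] += views
        per_weekday[(wiki, timestamp.weekday())] += views
    return dict(per_day), dict(per_month), dict(per_weekday)


# Made-up sample records for one wiki.
hourly = [
    ("en.wikipedia", datetime(2015, 5, 1, 0), 100),
    ("en.wikipedia", datetime(2015, 5, 1, 1), 150),
    ("en.wikipedia", datetime(2015, 5, 2, 0), 200),
]
per_day, per_month, per_weekday = aggregate(hourly)
```

Each dictionary maps directly onto one csv file, with one row per key. Separate mobile and non-mobile counts could be handled by adding the site type to the key.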

Upgraded process flow

This diagram shows the upgraded process flow, and all files involved.

Monthly Pageview Reports

Thanks for your patience.

Posted in Nice Charts, Wikimedia Edit(or)s, Wikimedia View(er)s, Wikistats Reports

Active editor trends as year-over-year changes

For many years I have published active editor trends for all Wikimedia wikis, see e.g. these summaries. Here I’d like to present these same editor trends in a slightly different way, which may help to show there is some cause for optimism, at least for the largest Wikipedias.

In the conventional charts it may be a bit difficult to see if a growth or decline is speeding up or slowing down. This is easier to see when we plot year over year changes (YoY) rather than the absolute values.
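As a minimal sketch of the calculation, assume YoY is the ratio between a month’s value and the value for the same month one year earlier (which matches YoY values such as 0.95 or 1.02 hovering around 1). The function name and the sample numbers are made up for illustration.

```python
def year_over_year(monthly):
    """Return YoY ratios for a chronological list of monthly values.

    The first twelve months have no value a year earlier, so they yield None.
    """
    result = []
    for i, value in enumerate(monthly):
        if i < 12 or monthly[i - 12] == 0:
            result.append(None)
        else:
            result.append(value / monthly[i - 12])
    return result


# Made-up example: a flat year of 1000 editors, followed by a flat year of 950.
monthly = [1000] * 12 + [950] * 12
yoy = year_over_year(monthly)
```

A YoY below 1 means decline compared to a year earlier, above 1 means growth. A YoY that rises from 0.90 towards 1 therefore signals a decline that is slowing down, which is hard to see in the absolute values.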

We’ll start with a totally hypothetical idealized example without deeper significance (parameters manually tweaked), just to demonstrate how absolute and YoY values are connected.
Active_editors_YoY0
In the above diagram both ways of presenting the data have been combined. The red trend line shows absolute values (vertical scale at the left). The black line is the YoY trend (vertical scale at the right).
In the following real-world example the red trend line is the all too familiar explosive growth followed by a very slow but persistent decline in active editors on the English Wikipedia. The decline starts in November 2007, and is rather consistent in the following years, with YoY mostly between 0.90 and 0.99 for the next seven years.
Active_editors_YoY1
A bit more precise: average YoY for 2008 and following years is
0.906, 0.945, 0.921, 0.976, 0.932, 0.942, 0.998.
For the last nine months YoY for the English Wikipedia is above the flat trend (YoY > 1)!
Active_editors_YoY1b
How the YoY value is derived from absolute values.

With the scale of these first diagrams the subtle fluctuations in YoY after 2007 are too small to see. We need to change the vertical scale for that.
The following three charts show the eight largest Wikipedias in terms of active editors. Each chart shows a different selection.
Active_editors_YoY2

Three Wikipedias with the largest number of editors.
Active_editors_YoY3

Largest Wikipedias with growing editor base in last 12 months  (avg YoY > 1).
Active_editors_YoY4

Largest Wikipedias with a still (barely) declining editor base in last 12 months (avg YoY < 1).

Files with active editors (5+ and 100+ edits per month, absolute and YoY) are available for download at http://dumps.wikimedia.org/other/pagecounts-ez/wikistats/ (see csv_wp_active_editors.zip, and similar).

Posted in uncategorized | 1 Comment

New Wikistats report, for once about Wikistats itself

There is a new Wikistats report, which as an exception (and one-off) reports about Wikistats itself. It shows which reports on stats.wikimedia.org are most popular, how many ‘unique’ (sort of) people requested those reports, and how often.

To this end all traffic to stats.wikimedia.org in April 2015 has been analyzed. A pretty rigorous filtering process removed most bot traffic (perhaps even erring on the side of low counts). First all explicit bot traffic was removed (based on the user agent string), then lots of implicit bot traffic was filtered as well (where request patterns showed bot-like behavior). In the end only 3.2% of all html requests to stats.wikimedia.org qualified for the analysis, and only 78% of the ip addresses (see footer notes).
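The two filtering stages could look roughly like the sketch below. It is not the filter that was actually used: the user agent pattern, the per-address request threshold and the record layout are all assumptions for illustration, and real bot-like behavior would be judged on richer request patterns than a simple count.

```python
import re
from collections import Counter

# Assumed pattern for explicit bots; real lists are much longer.
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def filter_requests(requests, max_per_ip=100):
    """Drop explicit bots by user agent, then addresses with too many requests."""
    # Stage 1: explicit bots, recognized by their user agent string.
    stage1 = [r for r in requests if not BOT_PATTERN.search(r["user_agent"])]
    # Stage 2: implicit bots, approximated here by an assumed request threshold.
    per_ip = Counter(r["ip"] for r in stage1)
    return [r for r in stage1 if per_ip[r["ip"]] <= max_per_ip]


# Made-up sample requests.
requests = (
    [{"ip": "10.0.0.1", "user_agent": "Mozilla/5.0"}]
    + [{"ip": "10.0.0.2", "user_agent": "Googlebot/2.1"}]
    + [{"ip": "10.0.0.3", "user_agent": "Mozilla/5.0"}] * 3
)
kept = filter_requests(requests, max_per_ip=2)
```

With this sample only the single request from 10.0.0.1 survives: 10.0.0.2 is an explicit bot, and 10.0.0.3 exceeds the (deliberately low) threshold.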

Most table rows are about a functionally equivalent set of reports, with first three columns showing the overall total for the entire set. The right-most column lists the 10 most popular unique files within that set, with number of requests per file. For conciseness those top 10 files are only shown when you hover over the first link in that column.

The files have been distributed over two tables which reflects the most important dichotomy in Wikistats: reports are about
– database content and content creators, with counts distilled from xml dumps, or
– site traffic, with counts distilled from Kraken (either via 1:1000 sampled log, or hourly aggregations)

For more, see this Wikistats Overview diagram (the new report cross-links to this diagram in column ‘srce’).

These numbers should not be given too much weight as a measure of the relative importance of any report. Popularity of a report is just one factor in the weighing process.

Note: Unique visitors is by necessity an approximation. Some people may have accessed the site several times over the month, using a provider which hands out dynamic ip addresses. But on the assumption that few people will visit the site on more than one occasion and also have a dynamic address, that may not affect the overall counts that much. The relative popularity of different reports will be even less affected.

http://stats.wikimedia.org/wikistats-traffic-2015-04.html

 

Posted in Wikimedia View(er)s, Wikistats Reports | 1 Comment

Wiki Loves Africa 2014 – Celebrating African Cuisine

The first Wiki Loves Africa media contest was held in October and November 2014.

People could contribute with photos, videos and interviews. There was a great response in many countries all over Africa, with overall 873 unique contributors. Soon there will be winners and prizes. The organizers should feel proud of what they accomplished.

Food from Tunisia
Kaouther Bedoui CC BY-SA 4.0
Tajines_in_a_pottery_shop_in_Morocco
Jafri Ali CC BY-SA 4.0

Thanks to Romaine for supplying many country specific templates and categories on short notice. These can be counted with a script. Here are charts for contributions/uploads and contributors per country.

WLA uploads 2014
Click to show full size
WLA contributors 2014
Click to show full size

Images on this page are from the first jury selection. For all images browse Wiki Loves Africa categories.

By the way, did you know? “The chick which is always near its mother eats the best part of the grasshopper”. (Kenyan proverb)

Posted in uncategorized | 1 Comment

Wiki Loves Monuments 2014

Here are the results of the Wiki Loves Monuments (WLM) 2014 contest.

Map

Some charts are about image uploads.
Some about image uploaders, also known as contributors.

Countries

With 41 participating countries, 11 fewer than in 2013, the contest is again an awesome achievement. (See first table). 8 countries participated for the first time: Albania, Iraq, Ireland, Kosovo, Lebanon, Macedonia, Pakistan, Palestinian Territory.

In those 5 years since WLM started 14 countries each participated 4 times: Austria, Belgium, Estonia, France, Germany, Luxembourg, The Netherlands, Norway, Poland, Romania, Russia, Spain, Sweden and Switzerland.

The contest ran in different countries during different periods.

Participants

Uploads

The number of images uploaded was 268,667, which is 72% of last year’s record count (375,160).

WLM_2014_uploads_per_year


All contests taken together, Poland contributed by far the most images:
a whopping 161,250.

WLM_2014_uploads_by_country_cumulative


Same data as previous chart, with yearly results unstacked.

WLM_2014_uploads_by_country


Ukraine contributed most images in 2014: 46,350.

WLM_2014_uploads_2014

Contributors

Italy ranked first in number of contributors in 2014: 1045.

WLM_2014_uploaders_per_year


The largest volunteer base of any year in any country is still India,
where 2089 volunteers contributed to the contest in 2012.

UploadsByCountry-2010-2014-Top25


As every year, the peak in activity around WLM is easily detectable
in Wikistats charts for Commons, e.g.

PlotEditorsCOMMONS

Posted in Nice Charts, Wiki Loves Monuments, Wikimedia Edit(or)s | 1 Comment