Back to volunteer mode

 

[TL;DR] I retired as Data Analyst for the Wikimedia Foundation

 

Dear friends, wikimedians, colleagues, stats lovers,

 

This week I retired from my job as Data Analyst for the Wikimedia Foundation (WMF). I will continue to contribute as a volunteer, but my focus will be less on stats. I will leave that to my wonderful colleagues in the WMF Analytics Team, who have already done a great job migrating and redesigning many stats reports, in particular traffic stats [1]. I wish the team well in executing the second phase of the transition, the dump-based Wikistats reports [2], which is delayed but ongoing.

 

Last week I published the final version of the Wikistats 1 reports. The reason: the scripts were never designed for maintainability, so a handover was always deemed infeasible [3]. Continued publication of non-curated reports would be a liability, and is therefore not an option.

 

Here are some parting thoughts on the role of stats, within our movement and in general:

 

Rationale for Wikistats

The rationale for Wikistats, as I see it: in the early years it served to convince ourselves and the world (mainly reporters) that our dramatic growth was real. Today it serves foremost to show contributors that they have ample reason to be proud, that their contributions made, and still make, an astonishing difference in the world.

 

On my role as Data Analyst

Part of the role of a Data Analyst is to get the numbers right. Another part is to bring out overall patterns and facilitate insights [4]. Helping people see the wood for the trees is a daunting task, even in the best of circumstances; in other words, the job is to present information rather than raw data. I know my endless tables were often seen as uninformative. They were indeed somewhat dull intermediate aggregations (though not raw data). My trend charts and visualizations, in contrast, deserve to be judged differently: they are by nature more ambitious, and are optimized for a different audience.

 

BTW, Wikistats 1 too often made it hard to export these numbers for further analysis elsewhere. Wikistats 2 does a much better job here, as it does in other respects.

 

On bringing perspective

Too often people (who took on the role of experts) raised alarms about the future of Wikipedia, be it in a PhD thesis, an interview on a major news network, or a strategy report. After 2007 a decline in contributors set in on English Wikipedia, but it progressed extremely slowly. This decline never happened in many other languages, and it may be attributed partly to the movement coming of age and/or the novelty effect wearing off. Still, I remember comments like ‘Wikipedia is nose-diving’, ‘Volunteers are leaving Wikipedia in droves’ and ‘Wikipedia may well cease to exist in 5 years’ time’ (said over 10 years ago). I regard these as unhelpful hyperbole, irresponsible over-extrapolations. I did my best to put them into perspective.

 

On memes

Wikipedia took the world by storm with astounding growth figures. There is an element of addiction in surpassing expectations time and again. It may help our fundraiser to dazzle the public with ever higher numbers, but we risk losing credibility. Some stats went in that direction, at least at the meme level. We still boast about our numbers, using ever more lenient inclusion criteria. Prime example: ‘300 Wikipedias’ is what one might even call ‘fake stats’. Make that 150 active wikis (with 5+ active contributors), or at most 175 with at least 3 active contributors [5] (is a wiki with only one or two active contributors even a functioning wiki?).
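To show how much the inclusion threshold alone moves the headline number, here is a minimal Python sketch. The wiki codes and contributor counts are made up for illustration, and the definition of an ‘active contributor’ (for instance, a minimum number of edits in a month) is assumed to be settled upstream.

```python
from collections.abc import Mapping


def count_active_wikis(active_contributors: Mapping[str, int], min_active: int) -> int:
    """Count wikis with at least `min_active` active contributors."""
    return sum(1 for n in active_contributors.values() if n >= min_active)


# Hypothetical sample: wiki code -> active contributors in one month.
sample = {"wiki_a": 1200, "wiki_b": 7, "wiki_c": 4, "wiki_d": 2, "wiki_e": 0}

print(len(sample))  # 5 wikis exist
for threshold in (1, 3, 5):
    print(threshold, count_active_wikis(sample, threshold))
# 1 4
# 3 3
# 5 2
```

The same set of wikis yields 5, 4, 3 or 2 depending on the cutoff, which is why the criterion should be stated next to the number.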

 

At Wikimania London (2014) I talked about how we should err on the side of modesty. That message never got across. I started a discussion on this within WMF but failed to bring it to fruition. My bad.

 

Thanks much!

In the early years of Wikistats, as a volunteer, I was naturally self-steering. I always closely followed mailing list discussions, in order to contribute stats wherever our community seemed passionate about something. I joined WMF as an employee early on, over 10 years ago. I am very grateful that in my WMF years I was allowed to steer mostly by my own intuition, even as my work became embedded in a rapidly expanding organization.

 

Analysis requires dialogue and constructive criticism. I am grateful for every inquisitive remark, expression of doubt, demand for proof, and challenge with a different interpretation that I received over the years.

 

I want to thank my colleagues at WMF for their friendly cooperation and support. I also learned so much from you all. You guys rock!

 

I want to thank Wikimedian Nemo specifically for volunteering many years of support for Wikistats: educating others, tracking bugs, doing housekeeping, making informed inquiries. Nemo, nobody (pun intended) motivated me to continue like you did.

 

Work ahead

In 2019 I hope to continue with a volunteer tree-mapping project for Leiden, which my wife Carolina and I started last summer. It’s a mashup of OpenStreetMap, Commons and Wikipedia [7]. We first focused on a lovely park in the center of Leiden (once a cemetery) and hope to extend the project to the larger park around our city center (Singelpark), which our city council aims to rebrand as ‘the longest park in The Netherlands’.


Bits of advice for fellow data analysts

Details pollute. Daily stats make monthly trends disappear. Reporting on short periods makes readers zoom in on what is capricious; it makes them waste their energy, and possibly become alarmed about non-events.
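As a small illustration of trading detail for trend, the sketch below rolls daily counts up into monthly totals. The input format (a mapping from calendar date to count) and the function name are assumptions for this example, not how Wikistats stored its data.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily: dict[date, int]) -> dict[tuple[int, int], int]:
    """Roll daily counts up into (year, month) totals, smoothing out day-to-day noise."""
    totals: defaultdict[tuple[int, int], int] = defaultdict(int)
    for day, count in daily.items():
        totals[(day.year, day.month)] += count
    return dict(sorted(totals.items()))


# Hypothetical daily counts with a sharp one-day dip on 2 January.
daily = {
    date(2019, 1, 1): 100,
    date(2019, 1, 2): 40,
    date(2019, 1, 3): 110,
    date(2019, 2, 1): 95,
    date(2019, 2, 2): 105,
    date(2019, 2, 3): 50,
}
print(monthly_totals(daily))  # {(2019, 1): 250, (2019, 2): 250}
```

The one-day dips that would draw attention in a daily chart vanish in the monthly view.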

 

When definitions are ambiguous or evolving (even when algorithms and data feeds are reliable), readers can have difficulty making sense of the data. That is always a hazard.

 

If you set the rules for condensing stats into red and green emoticons, you do not facilitate strategic decisions, you make them. 

 

After several alarms were raised over drops in monthly page views (usually in February, the shortest month), I introduced normalized monthly counts. This is easy for page views (scale each month’s count to a fixed month length), less so for unique visitors, but doable (count only the first 28 days of each month). So far this hasn’t made it into Wikistats 2.
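A minimal Python sketch of both normalizations follows. It assumes page views are scaled to a 30-day month (the target length is an assumption here; any fixed length works) and, for unique visitors, a hypothetical input of per-day sets of visitor ids. Real unique-visitor figures are often estimated in other ways, so treat this only as an illustration of the 28-day window.

```python
import calendar


def normalized_monthly_views(total_views: int, year: int, month: int, norm_days: int = 30) -> float:
    """Scale a month's page-view total to a month of `norm_days` days."""
    days_in_month = calendar.monthrange(year, month)[1]  # (first weekday, number of days)
    return total_views * norm_days / days_in_month


def uniques_first_28_days(daily_visitors: dict[int, set[str]]) -> int:
    """Count distinct visitors seen on days 1..28 only, so every month covers the same span."""
    seen: set[str] = set()
    for day, visitors in daily_visitors.items():
        if 1 <= day <= 28:
            seen |= visitors
    return len(seen)


# A raw "drop" from January to February that disappears after normalization.
print(normalized_monthly_views(3_100_000, 2019, 1))  # 3000000.0
print(normalized_monthly_views(2_800_000, 2019, 2))  # 3000000.0

# Hypothetical visitor ids per day of month; days 29 and 31 are ignored.
print(uniques_first_28_days({1: {"a", "b"}, 15: {"b", "c"}, 29: {"d"}, 31: {"e"}}))  # 3
```

Scaling works for page views because they are additive per day; unique visitors are not, which is why the sketch restricts the window instead of rescaling the count.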

 

When one compares wikis by word count, it is all too easy to overlook that the average number of bytes per word differs per language, e.g. for ideographic languages. I learned that the hard way.
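To illustrate the pitfall, the sketch below derives a bytes-per-word factor per language from a calibration sample and uses it to estimate word counts from byte sizes. The one-word samples and the wiki size are toy values chosen for illustration, and treating 维基百科 as a single word is a segmentation assumption (Chinese text has no spaces between words).

```python
def bytes_per_word(sample_text: str, word_count: int) -> float:
    """Average UTF-8 bytes per word in a calibration sample whose words were counted separately."""
    return len(sample_text.encode("utf-8")) / word_count


def estimate_word_count(wiki_bytes: int, avg_bytes_per_word: float) -> float:
    """Estimate the number of words in a wiki from its byte size and a per-language factor."""
    return wiki_bytes / avg_bytes_per_word


# Toy calibration samples, each treated as a single word.
en_factor = bytes_per_word("Wikipedia", 1)  # 9.0: ASCII letters take 1 byte each
zh_factor = bytes_per_word("维基百科", 1)     # 12.0: these CJK characters take 3 bytes each

size = 1_200_000  # hypothetical wiki size in bytes
print(estimate_word_count(size, en_factor))  # wrong factor for a Chinese wiki
print(estimate_word_count(size, zh_factor))  # 100000.0
```

Applying the English factor to a Chinese wiki overstates its word count by a third in this toy case; the factor has to be measured per language.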


Clever stats may produce insights. Responsible stats also track how these insights may evolve. Responsible stats are hard work.

 

Final thought

The Wikimedia Foundation and the Wikimedia community at times live in a somewhat uneasy coexistence. There is certainly an uneven distribution of power, and of access to our funds. Common ground is found in idealism, responsibility and hard work. Yay for idealism. Feel compassion for those who suffer from a lack of it.

 

Erik Zachte mail erikzachte@### (no spam: ### = infodisiac.com)

 

(I wanted to post this on my Infodisiac.com blog, but that blog is out of order at the moment, so I’ll upload it as an HTML page.) Feel free to comment on my post on Facebook.

 

==

 

Notes:

 

[1] https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os

 

[2] https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2

 

[3] Instead, these scripts grew organically, with me taking shortcuts; I mostly wrote them in my free time while I was a part-timer at KLM airlines. Also, the scripts date from a different era (the oldest are from 2003), and hence were built for a totally different infrastructure. An overhaul was decided on 5+ years ago, and it is finally happening.

 

[4] I did not always succeed. Like Mike Godwin (who reminded me of this over the years, alas to no avail), I’m still puzzled by the suddenness of the trend reversal in monthly editors in 2007. Why did the growth stop so abruptly, at least on English Wikipedia?

 

[5] https://stats.wikimedia.org/EN/PlotActivityZZ.png

 

[6] To his credit, former WMF deputy director Erik Moeller always encouraged me to be open and candid about stats hiccups, which he himself discovered from time to time (mostly in the manual reports). I was happy to oblige.

 

[7] http://infodisiac.com/bomen/groenesteeg.html (also click on ‘Info’ top right)