[TL;DR] I retired as Data Analyst for the Wikimedia Foundation
Dear friends, wikimedians,
colleagues, stats lovers,
This week I retired from my job
as Data Analyst for the Wikimedia Foundation (WMF). I will continue to
contribute as a volunteer, but my focus will be less on stats. I will
leave that to my wonderful colleagues in the WMF Analytics Team, who have already
done a great job migrating and redesigning many stats reports, in particular
traffic stats [1]. I wish the team well with executing the second phase of the
transition, the dump-based Wikistats reports [2], which is delayed, but
ongoing.
Last week I published the final
version of Wikistats 1 reports. Reason: the scripts were never designed for
maintainability, so handover was always deemed infeasible [3]. Continued
publication of non-curated reports would be a liability, and therefore not an
option.
Here are some parting thoughts on the
role of stats, within our movement and in general:
The rationale of Wikistats as I
see it: in early years it served to convince ourselves and the world (reporters
mainly) that our dramatic growth was real. Today it serves foremost to motivate
contributors, by showing that they have ample reason to be proud, that their contributions
made and make an astonishing difference in the world.
Part of the role of Data Analyst
is to get the numbers right. Part is to bring out overall patterns, and facilitate
insights. [4] It’s a daunting task to help people see the
wood for the trees, even in the best of circumstances: in other words, to
present information rather than raw data. I know my endless tables were often
seen as non-informative. These were indeed somewhat dull intermediate
aggregations (not raw data). In contrast, my trend charts and visualizations
should be valued differently. Those are by nature more ambitious, and are even
optimized for a different audience.
BTW Wikistats 1 too often proved
a hindrance to exporting these numbers for further analysis elsewhere. Wikistats 2
is doing a much better job here, as it is in other aspects.
Too often people (who took the
role of experts) raised alarms about the future of Wikipedia, be it in a PhD
thesis, an interview at a major news network, or a strategy report. After
2007 a decline in contributors on English Wikipedia set in, but it continued
extremely slowly. The alarms came even though such a decline never happened in many other languages, and even
though it may be attributed partly to a coming of age of the movement, and/or
the novelty effect wearing off. I remember comments like ‘Wikipedia is
nose-diving’, ‘Volunteers are leaving Wikipedia in droves’ and ‘Wikipedia may
well cease to exist in 5 years’ time’ (said over 10
years ago). I regard these as unhelpful hyperbole, irresponsible
over-extrapolations. I did my best to put this into perspective.
Wikipedia took the world by storm
with astounding growth figures. There is an element of addiction in surpassing
expectations time and again. It may help us with the fundraiser to dazzle the
public with ever higher numbers, but we might lose credibility. Some stats
went in that direction, at least on the meme level. We still boast about our
numbers, using ever more extreme inclusion criteria. Prime example: 300 Wikipedias is what one
might even call ‘fake stats’. Make that 150 active wikis (with 5+ active
contributors), or at most 175 with at least 3 active contributors [5] (is a
wiki with one or two active contributors even a functioning wiki?).
At Wikimania London (2014) I
talked about how we should err on the side of modesty. That message never came
across. I started to have a discussion on this within WMF but failed to bring
this to fruition. My bad.
In the early years of Wikistats, as a
volunteer, I was naturally self-steering. I always closely followed mailing
list discussions, in order to contribute stats wherever our community
seemed passionate. I joined WMF as an employee early on, over 10 years
ago. I am very grateful that in my WMF years I was allowed to steer mostly by my own
intuition, even when my work was embedded in such a rapidly expanding organization.
Analysis requires dialogue and
constructive criticism. I am grateful for every inquisitive remark, expression
of doubt, demand for proof, and challenge with a different interpretation that I received
over the years.
I want to thank my colleagues at
WMF for their friendly cooperation and support. I also learned so much from you
all. You guys rock!
I want to thank wikimedian Nemo
specifically, for volunteering many years of support for Wikistats, educating
others, tracking bugs, doing housekeeping, making informed inquiries. Nemo,
nobody (pun intended) motivated me to continue like you did.
Details pollute. Daily stats make monthly trends
disappear. Reporting on short periods makes readers zoom in on what is
capricious; it makes them waste their energy, and possibly be alarmed about
non-events.
When definitions are ambiguous or
evolving (even when algorithms and data feeds are reliable) readers can have
difficulty making sense of the data. Always a hazard.
If you set the rules for condensing
stats into red and green emoticons, you do not facilitate strategic decisions,
you make them.
After several alarms were raised over
our drops in monthly page views (usually in February) I introduced normalized
monthly counts. This is easy for pageviews, less so for unique visitors, but
doable (count only first 28 days of each month). So far this hasn’t made it
into Wikistats 2.
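
The normalization described above can be sketched in Python as follows. This is only an illustration, not the actual Wikistats code: the function names, the 30-day target month length, and the shape of the input data (a mapping from date to a set of visitor ids) are assumptions made for the example.

```python
import calendar
from datetime import date


def normalize_pageviews(year, month, raw_count, target_days=30):
    """Scale a monthly page view total to a month of fixed length.

    target_days=30 is an assumption for this sketch; any fixed length
    makes February comparable with longer months.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return raw_count * target_days / days_in_month


def uniques_first_28_days(daily_visitors, year, month):
    """Count distinct visitors over the first 28 days of a month.

    daily_visitors is a hypothetical mapping: date -> set of visitor ids.
    Unique counts cannot simply be scaled, so the window is cut instead.
    """
    seen = set()
    for day, visitors in daily_visitors.items():
        if day.year == year and day.month == month and day.day <= 28:
            seen |= visitors
    return len(seen)


# Made-up sample data, for illustration only.
sample = {
    date(2019, 1, 1): {"a", "b"},
    date(2019, 1, 28): {"b", "c"},
    date(2019, 1, 29): {"d"},  # falls outside the 28-day window
}
february = normalize_pageviews(2019, 2, 2800)
january = normalize_pageviews(2019, 1, 3100)
january_uniques = uniques_first_28_days(sample, 2019, 1)
```

With the same daily average, February and January end up with the same normalized total, so the yearly February dip no longer shows up as a drop.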
When one compares wikis by word
count, it is so much easier not to factor in average byte count per word per
language, e.g. for ideographic languages. I learned that the hard way.
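
One possible reading of this point, as a minimal Python sketch: when word counts are estimated from text size, each language needs its own bytes-per-word factor. The function names are hypothetical, and the sizes and factors below are made-up placeholders, not measured values for any real wiki or language.

```python
def estimate_word_count(text_bytes, avg_bytes_per_word):
    """Estimate the number of words from a text size in bytes.

    avg_bytes_per_word is a per-language factor supplied by the caller.
    """
    if avg_bytes_per_word <= 0:
        raise ValueError("avg_bytes_per_word must be positive")
    return text_bytes / avg_bytes_per_word


def compare_word_counts(text_bytes_by_wiki, bytes_per_word_by_wiki):
    """Return estimated word counts per wiki, each with its own factor."""
    return {
        wiki: estimate_word_count(size, bytes_per_word_by_wiki[wiki])
        for wiki, size in text_bytes_by_wiki.items()
    }


# Placeholder inputs, made up for illustration (not measured values).
sizes = {"wiki_a": 6000, "wiki_b": 6000}
factors = {"wiki_a": 6.0, "wiki_b": 3.0}
estimates = compare_word_counts(sizes, factors)
```

Two wikis of identical byte size can differ by a factor of two in estimated words once the per-language factor is applied, which is exactly what a single global factor would hide.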
Clever stats may produce insights. Responsible stats also track how these
insights may evolve. Responsible stats are hard work.
The Wikimedia Foundation and
the Wikimedia community at times live in a somewhat uneasy co-existence. For sure
there is an uneven distribution of power, and of access to our funds. Common
ground is found in idealism, responsibility and hard work. Yay for idealism.
Feel compassion for those who suffer from a lack of it.
Erik Zachte mail erikzachte@### (no spam: ### = infodisiac.com)
(I
wanted to post this on my Infodisiac.com blog, but that blog is out of order
at the moment, so I’ll upload it as an HTML page.) Feel free to comment on my post
on Facebook.
==
Notes:
[1] https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os
[2] https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2
[3] Instead these scripts grew
organically, with me taking shortcuts; I mostly wrote them in my free time while I was
a part-timer at KLM airlines. Also, the scripts date from a different era (the oldest
are from 2003), and hence were built for a totally different infrastructure.
Overhaul was decided 5+ years ago, and it finally is happening.
[4] I did not always succeed.
Like Mike Godwin (who reminded me over the years, alas to no avail), I’m still
puzzled by the suddenness of the trend reversal in monthly editors in 2007. Why
did the growth stop so abruptly, at least on English Wikipedia?
[5] https://stats.wikimedia.org/EN/PlotActivityZZ.png
[6] To his credit former WMF deputy
director Erik Moeller always encouraged me to be open and candid about stats
hiccups, which he discovered himself from time to time (mostly in the manual
reports). I was happy to oblige.
[7] http://infodisiac.com/bomen/groenesteeg.html (also
click on ‘Info’ top right)