Here is an interesting albeit hypothetical question: “If all volunteer work done by the Wikimedia community had been done instead by a commercial company, how much would that have cost?” Of course there is no definitive answer to this, given the number of unknowns, but it is a nice challenge to reach a crude approximation. There is some relevance in this for the Wikimedia Foundation audit process.
First let me state that this question is not only hypothetical, it is impossible to answer precisely. Let us assume for the sake of argument that a crowd of contractors had indeed been given this enormous task, and had somehow, after a year of toiling, produced exactly the same wikis we have today. Still, those products would each have been a facsimile at best, inferior to the ‘real thing’ in several critical respects. They would have been void of any inspirational value, which is maybe our most crucial asset today, as I truly believe that our endeavor is a beacon of hope in a segregated and divided world. Neither would we have functioned as a role model for similar movements that now follow our example (to name one favourite of mine: openstreetmap.org is building an awesome free map of the whole world, in many cases footstep by footstep).
Back to our original question: “How much work was done by Wikimedia volunteers from July 2007 till June 2008, and how should that be valued monetarily?” As we all know, there are myriad ways in which volunteers contribute. For most of these I do not even have a clue yet how to quantify them (suggestions welcome, see below). But as a starter I want to present two approaches to estimate very conservatively how much time our community spent in the period under scrutiny on producing the texts that we have, and then to value that tentatively in monetary terms. The latter of course depends entirely on which country hosts that fictive company. For now let’s assume the United States.
Please do not propagate the numbers I present below right away. I may adapt them based on community feedback.
Approach I: article edits
In those 12 months July 2007-June 2008 all projects together received 133 million edits, distributed as follows:
- Wikispecial: 582 thousand (commons, species, meta, …)
These are edits on namespace 0 (real article edits, no talk pages, no category pages, etc). I would estimate the other namespaces account for another 2-5%, but let us skip these for now. I would have preferred to use edit counts that include all namespaces, possibly with the exception of discussion pages, but wikistats does not count anything but namespace 0 right now (an anachronism really, which should be changed).
Assume +0% for other namespaces
See this Bot Activity Matrix: wikistats knows the percentage of bot activity per project/language, but only as an average over the total lifetime of a language project. Bot activity grows over the years, so the percentage for the period under scrutiny might be somewhat higher. The largest Wikipedias for which numbers are known (excluding English) have relatively few bot edits (German 8%, Japanese 6%); for the next largest 15 projects it is between 22% and 30%. I assume the English Wikipedia has more in common with German and Japanese than with smaller Wikipedias, but let us be reasonably conservative.
Assume -20% for bots
This leaves 107 million manual edits on the article namespace in one year.
The big question is: how much time did an average manual edit take? 30 seconds? 5 minutes? There are many fast edits for spell checking and vandal patrol (though not all vandalism corrections are that easy). For argument’s sake let us assume the average edit took 3 minutes.
There are 525,600 minutes in a year. So if one person edited continuously, all day long, every day, no sleep, it would take him/her about 610 years to make 107 million edits.
Now let’s assume a workday of 8 hours, 5 workdays a week, 50 weeks per year. The same worker would now need roughly 2700 years to do all these edits alone: 107 million edits / 40,000 edits per year (60*8*5*50/3) ≈ 2675 FTEs. At a yearly wage of $50,000 that would total about $134 million.
Again: the big unknown is how much time an average edit takes. It could be 2 minutes (≈ $89 million), or 5 minutes (≈ $223 million), or more (read the article section or diff, think, check facts, edit, preview, correct, save).
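The arithmetic above can be sketched in a few lines of Python. The edit count, wage and minutes-per-edit figures are the assumptions stated in the text; small differences from the rounded numbers above are just rounding:

```python
# Back-of-the-envelope cost model for Approach I (all figures from the text).
MINUTES_PER_WORK_YEAR = 60 * 8 * 5 * 50  # 8 h/day, 5 days/week, 50 weeks

def edit_cost(edits, minutes_per_edit, yearly_wage=50_000):
    """Return (full-time equivalents, total cost in dollars)."""
    ftes = edits * minutes_per_edit / MINUTES_PER_WORK_YEAR
    return ftes, ftes * yearly_wage

manual_edits = 107_000_000  # 133M article edits minus ~20% bot edits
for minutes in (2, 3, 5):
    ftes, cost = edit_cost(manual_edits, minutes)
    print(f"{minutes} min/edit: {ftes:,.0f} FTEs, ${cost / 1e6:,.0f} million")
```

This prints roughly 1,783 / 2,675 / 4,458 FTEs, i.e. about $89, $134 and $223 million for 2, 3 and 5 minutes per edit respectively.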
Provisos / Assumptions
- See the table with monthly edits on Wikipedia: the latest wikistats data are for May 2008. For each project I have normalized counts to 12 months: for most projects that meant the total for Jul ’07-May ’08 * 12/11; for the German Wikipedia, Jul ’07-Feb ’08 * 12/8.
- For the English Wikipedia I have assumed a flat growth rate since the last known data, and taken Jun ’06-Sep ’06 (4 months) * 3. No other project has failed to see an increase in edits in the last 2 years, except German (0.98 times as many average edits per month in Jul ’07-Feb ’08 versus Jun ’06-Sep ’06). Other top-10 projects saw growth ratios between 1.13 and 2.41.
Approach II: word counts
Here is a totally different approach, which should yield very conservative numbers, but with an easier conversion to dollar value. The idea is a two-step process:
First, count how many words were added in the 12 months July 2007-June 2008. Of course many texts have been edited time and again, and many more words have been written than have survived till the end of the period under investigation, but that makes this estimate a truly conservative one! By the way, the wikistats word count is already conservative: it counts words after stripping markup, links, etc.
Secondly, let us assume that producing these texts took at least as much time and equal expertise as translating them would have. See for an example this UK table of tariffs (does anyone know a good US example?).
Now we make the following calculation: words added times the commercial translation tariff per word → a low estimate for the cost to deliver the original texts.
Note: the tariff for a good translator is higher than one would intuitively guess. Translating is not just reading while breathing in and writing while breathing out, another dollar/pound gained; it often involves lots of research to find the best translation, which makes it quite comparable with editing Wikipedia.
In those 12 months Jul 2007-Jun 2008 all projects together grew by 1717 million (1.7 billion) new words, distributed as follows:
- Wikipedia: 1397 million (1.4 billion)
- Wikispecial: 12 million (commons, species, meta, …)
Most words in Wikisource are not original content (by definition), so let us subtract Wikisource (I know editing Wikisource is real work, but please allow me to take a few shortcuts here).
Assume -212 million for Wikisource
These are word counts for namespace 0 (real article content, no talk pages, no category pages, etc). I would estimate the other namespaces account for 2-5%, but let us skip these for now. Again, I would have preferred to use word counts that include all namespaces, possibly with the exception of discussion pages, but wikistats does not count anything but namespace 0 right now (an anachronism really, which should be changed).
Assume +0% for other namespaces
I have no idea how many words have been added by bots (many geographic articles were added by bots). Another reason to upgrade wikistats some day and filter out bot edits. A wild guess:
Assume -50% for bots
Let us be extra conservative and assume 10% of the remaining words would still be rejected as not being proper words (but rather syntactical elements).
Assume -10% for word ‘pollution’
That leaves 677 million words added manually in those 12 months.
At a translator rate (which we assume also holds as a copy-edit tariff) of $200 per 1000 words (just a rough guess, derived from the above-mentioned list) this would have cost $135 million to write manually.
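The whole chain of deductions in Approach II fits in a short Python sketch; every figure below is an assumption taken from the text:

```python
# Approach II: value the words added at a translation tariff (figures from the text).
words_added = 1_717_000_000   # all projects, Jul 2007 - Jun 2008
words_added -= 212_000_000    # subtract Wikisource (mostly not original content)
words_added *= 0.5            # wild guess: half of the words were added by bots
words_added *= 0.9            # -10% for word 'pollution'

tariff_per_word = 200 / 1000  # $200 per 1000 words
cost = words_added * tariff_per_word
print(f"{words_added / 1e6:,.0f} million words -> ${cost / 1e6:,.0f} million")
```

This reproduces the 677 million manually added words and the resulting $135 million estimate.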
Feedback and suggestions
I welcome feedback on these approaches: the whole idea, the two approaches presented, the figures assumed, and especially the assumptions for wages and tariffs (does anyone know a good site with US translator tariffs?).
I welcome suggestions for approaches to quantify other aspects of our endeavor. To name a few:
- number of lines of code committed in this time window
- number of hours of volunteer system administrator work
- number of hours of volunteer outreach work
- average time spent on producing and uploading multimedia
Added October 1
From July 2007 till June 2008, 1,288 thousand (1.3 million) binaries (images, movies and sound files) were added to Commons (again, this includes extrapolation for the last month, for which wikistats data are not yet available). Of course other Wikimedia wikis also received multimedia uploads, but far fewer, and some of those may have been copied/moved to Commons by bots later, so let us skip those, again to be conservative in our estimates.
Even more than with texts, it is hard to quantify the effort spent on producing those multimedia files. Some maps and information graphics will have cost hours or even days to produce; the same goes for audio files. Clearly preparing (cropping/resizing), uploading and tagging those files was just a (minor?) part of the effort. How much time did the average photographer need to shoot the picture? Should we factor in the time spent producing all those pictures from the same photo session that were not uploaded, but that were needed to make the best ones stand out? Compare a professional photographer, who charges not just for the time needed to make the final selection, but for the time spent on the whole shooting session.
What about quality of the pictures? Our best pictures are certainly top notch. How about the average picture? Is the share of pictures that can compare favorably with professional content published elsewhere comparable with the share of article texts that would survive such scrutiny?
If we estimate the average production and handling time needed for each binary upload to Commons at 5 minutes, then a contractor working 50 weeks of 40 hours each year would have needed 54 years. If we estimate instead that the total time spent per upload was one hour, we are talking about 644 years of work. This is similar to approach I above (based on text edits): there we counted raw work time. In approach II we focussed on the net result at the end of the year: just as many text edits left no trace in the final articles, many mediocre pictures have never been used in any of our articles, or have been replaced by better material. Right now there is no statistic telling us which part of the images on Commons is actually used in our projects.
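The two contractor-year figures above follow from the same work-year assumptions used in Approach I (50 weeks of 40 hours); a minimal sketch:

```python
# Contractor-years for the 1.3 million Commons uploads (figures from the text).
uploads = 1_288_000
HOURS_PER_WORK_YEAR = 8 * 5 * 50  # 2,000 hours: 50 weeks of 40 hours

for minutes_per_upload in (5, 60):
    years = uploads * minutes_per_upload / 60 / HOURS_PER_WORK_YEAR
    print(f"{minutes_per_upload} min/upload: {years:,.0f} contractor-years")
```

At 5 minutes per upload this gives about 54 contractor-years; at one hour per upload, 644.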