Quantifying volunteer contribution

Here is an interesting albeit hypothetical question: “If all volunteer work done by the Wikimedia community had been done instead by a commercial company, how much would that have cost?” Of course there is no definitive answer to this, given the number of unknowns, but it is a nice challenge to reach a crude approximation. There is some relevance in this for the Wikimedia Foundation audit process.

First let me state that this question is not only hypothetical, it is impossible. Let us assume for the sake of the argument that a crowd of contractors had indeed been given this enormous task, and had somehow after a year of toiling produced exactly the same wikis we have today. Still those products would each have been a facsimile at best, inferior to the ‘real thing’ in several critical aspects. They would have been void of any inspirational value, which is maybe our most crucial asset today, as I truly believe that our endeavor is a beacon of hope in a segregated and divided world. Neither would we have functioned as a role model for similar movements that now follow our example (to name one favourite of mine: openstreetmap.org is building an awesome free map of the whole world, in many cases footstep by footstep).

Back to our original question: “How much work was done by Wikimedia volunteers from July 2007 till June 2008, and how should that be valued monetarily?” As we all know there are myriad ways in which volunteers contribute. For most of these I do not yet have a clue how to quantify them (suggestions welcome, see below). But as a starter I want to present two approaches to estimate very conservatively how much time our community spent in the period under scrutiny on producing the texts that we have, and then tentatively value that in monetary terms. The latter of course depends entirely on which country hosts that fictive company. For now let’s assume the United States.

Please do not propagate the numbers I present below right away. I may adapt them based on community feedback.


Approach I: article edits

In those 12 months July 2007-June 2008 all projects together received 133 million edits, distributed as follows:

Wikipedia: 124 million
Wiktionary: 7 million
Wikisource: 730 thousand
Wikispecial: 582 thousand (commons, species, meta, …)
Wikiquote: 532 thousand
Wikibooks: 477 thousand
Wikinews: 214 thousand
Wikiversity: 137 thousand
Total: 133 million

These are edits on namespace 0 (real article edits, no talk pages, no category pages, etc). I would estimate the other namespaces account for 2-5% but let us skip these for now. I would have preferred to use edit counts that include all namespaces, possibly with the exception of discussion pages, but wikistats does not count anything but namespace 0 right now (an anachronism really, should be changed).

Assume +0% for other namespaces

See this Bot Activity Matrix: wikistats knows the percentage of bot activity per project/language, but only as an average over the total lifetime of a language project. Bot activity grows over the years, so the percentage for the period under scrutiny might be somewhat higher. The largest Wikipedias for which numbers are known (excl. English) have relatively few bot edits (German 8%, Japanese 6%); for the next largest 15 projects it is between 22% and 30%. I assume English Wikipedia has more in common with German and Japanese than with smaller Wikipedias, but let us be reasonably conservative.

Assume -20% for bots

This leaves 107 million manual edits on the article namespace in one year.
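
The arithmetic so far can be replayed in a few lines. This is only a sketch: the per-project figures are the rounded ones listed above, and the names `EDITS`, `OTHER_NAMESPACES`, `BOT_SHARE` and `manual_edits` are made up for this illustration.

```python
# Namespace-0 edits per project family, Jul 2007 - Jun 2008
# (rounded figures copied from the list above).
EDITS = {
    "Wikipedia": 124_000_000,
    "Wiktionary": 7_000_000,
    "Wikisource": 730_000,
    "Wikispecial": 582_000,
    "Wikiquote": 532_000,
    "Wikibooks": 477_000,
    "Wikinews": 214_000,
    "Wikiversity": 137_000,
}

OTHER_NAMESPACES = 0.00  # assumption from the text: +0% for other namespaces
BOT_SHARE = 0.20         # assumption from the text: -20% for bots


def manual_edits(edits, other_ns=OTHER_NAMESPACES, bot_share=BOT_SHARE):
    """Estimated number of manual edits after both adjustments."""
    total = sum(edits.values())
    return total * (1 + other_ns) * (1 - bot_share)


total_edits = sum(EDITS.values())
print(f"all edits:    {total_edits / 1e6:.1f} million")
print(f"manual edits: {manual_edits(EDITS) / 1e6:.1f} million")
```

Changing `BOT_SHARE` shows how sensitive the result is to that one guess.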

The big question is: how much time did an average manual edit take? 30 seconds? 5 minutes? There are many fast edits for spellchecking and vandal patrol (though not all vandal corrections are that easy). For argument’s sake let us assume the average edit took 3 minutes.

There are 525600 minutes in a year. So if one person edited continuously, all day long, every day, no sleep, it would take him/her 609 years to make 107 million edits.

Now let’s assume a workday of 8 hours, 5 workdays a week, 50 weeks per year. The same worker would now need 2667 years to do all these edits alone: roughly 107 million manual edits / 40000 edits per year (60*8*5*50/3) = 2667 FTEs. At a yearly wage of $50,000 that would total $132 million.

Again: the big unknown is how much time an average edit takes. It could be 2 minutes ($88 million), or 5 minutes ($220 million) or more (read article section or article diff, think, check facts, edit, preview, correct, save).
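
To see how strongly the outcome depends on the assumed minutes per edit, here is a small sketch of the same calculation. It starts from the rounded figure of 107 million manual edits, so its results differ slightly from the dollar amounts quoted above; all function and variable names are invented for the example.

```python
MANUAL_EDITS = 107_000_000       # rounded figure from the text
YEARLY_WAGE_USD = 50_000         # assumed US wage from the text
MINUTES_PER_CALENDAR_YEAR = 525_600


def workyear_minutes(hours_per_day=8, days_per_week=5, weeks_per_year=50):
    """Working minutes in one full-time year (120000 with the defaults)."""
    return 60 * hours_per_day * days_per_week * weeks_per_year


def nonstop_years(edits, minutes_per_edit):
    """Years one person would need editing around the clock."""
    return edits * minutes_per_edit / MINUTES_PER_CALENDAR_YEAR


def fte_and_cost(edits, minutes_per_edit, yearly_wage=YEARLY_WAGE_USD):
    """Full-time work years needed, and their cost in dollars."""
    ftes = edits * minutes_per_edit / workyear_minutes()
    return ftes, ftes * yearly_wage


for minutes in (2, 3, 5):
    ftes, cost = fte_and_cost(MANUAL_EDITS, minutes)
    print(f"{minutes} min/edit: {ftes:,.0f} FTEs, ${cost / 1e6:,.0f} million")
```

The cost scales linearly with the minutes-per-edit guess, which is why that single assumption dominates the estimate.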

Provisos / Assumptions

  • See table with monthly edits on Wikipedia: the latest wikistats data are for May 2008. For each project I have normalized counts to 12 months: for most projects that meant the total for Jul’07-May’08 * 12/11, for the German Wikipedia Jul’07-Feb’08 * 12/8.
  • For English Wikipedia I have assumed a flat growth rate since the last known data, and taken Jun’06-Sep’06 (4 months) * 3. There is no other project that has not seen an increase in edits in the last 2 years, except German (0.98 times as many average edits per month in Jul’07-Feb’08 versus Jun’06-Sep’06). Other top 10 projects saw growth ratios between 1.13 and 2.41.
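
The normalization to 12 months described in these provisos is a simple linear scaling. A minimal sketch follows; the function name `annualise` and the sample edit counts are hypothetical, only the month counts (11, 8 and 4) come from the text.

```python
def annualise(observed_total, months_observed, months_target=12):
    """Scale a count observed over some months linearly to a full year."""
    if months_observed <= 0:
        raise ValueError("months_observed must be positive")
    return observed_total * months_target / months_observed


# Made-up totals, only to show the three scaling factors used in the text.
print(annualise(11_000_000, 11))  # most projects: Jul'07-May'08 * 12/11
print(annualise(8_000_000, 8))    # German Wikipedia: Jul'07-Feb'08 * 12/8
print(annualise(4_000_000, 4))    # English Wikipedia: Jun'06-Sep'06 * 3
```

Linear scaling assumes the missing months look like the observed ones, which is exactly the flat-growth assumption stated for English Wikipedia.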

Approach II: word counts

Here is a totally different approach, which should yield very conservative numbers, but with an easier conversion to dollar value. The idea is a two-step process:

First count how many words have been added in 12 months July 2007-June 2008. Of course many texts have been edited time and again, and many more words have been written than have survived till the end of the period under investigation, but that makes this estimate truly a conservative one! By the way, the wikistats word count is already conservative: it counts words after stripping markup text, links etc.

Secondly let us assume that adding these texts took at least as much time and equal expertise as it would have taken to translate them. See for an example this UK table of tariffs (does anyone know a good US example?)

Now we make the following calculation: words added times commercial translation tariff per word -> low estimate for costs to deliver the original texts.

Note: the tariff for a good translator is higher than one would intuitively guess. Translating is not just reading while breathing in and writing while breathing out and another dollar/pound gained; it often involves lots of research to find the best translation, which makes it more comparable with editing Wikipedia.

In those 12 months Jul 2007-Jun 2008 all projects together grew by 1717 million (1.7 billion) new words, distributed as follows:

Wikipedia: 1397 million (1.4 billion)
Wikisource: 212 million
Wiktionary: 57 million
Wikibooks: 18 million
Wikispecial: 12 million (commons, species, meta, …)
Wikiquote: 11 million
Wikinews: 7 million
Wikiversity: 5 million

Most words in Wikisource are not original content (by definition), so let us subtract Wikisource (I know editing Wikisource is real work, but please allow me to take a few shortcuts here).

Assume -212 million for Wikisource

These are word counts for namespace 0 (real articles, no talk pages, no category pages, etc). I would estimate the other namespaces account for 2-5% but let us skip these for now. Again, I would have preferred to use word counts that include all namespaces, possibly with the exception of discussion pages, but wikistats does not count anything but namespace 0 right now (an anachronism really, should be changed).

Assume +0% for other namespaces

I have no idea how many words have been added by bots (many geographic articles were added by bots). Another reason to upgrade wikistats some day and filter bot edits. Wild guess:

Assume -50% for bots

Let us be extra conservative and assume 10% of words would still be rejected as not being proper words (rather syntactical elements).

Assume -10% for word ‘pollution’

That leaves 677 million words added manually in those 12 months.

At a translator rate (which we assume equals the copy-edit tariff) of $200 per 1000 words (just a rough guess, derived from the above-mentioned list) this would have cost $135 million to write manually.
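
The chain of deductions in this second approach can be written out as follows. It is a sketch under the assumptions stated above (Wikisource excluded, +0% other namespaces, -50% bots, -10% word ‘pollution’, $200 per 1000 words); the names are invented for illustration.

```python
WORDS_ADDED_MILLION = 1717   # total new words as stated in the text
WIKISOURCE_MILLION = 212     # subtracted: mostly not original content
RATE_USD_PER_1000_WORDS = 200  # rough guess from the text


def manual_words_million(total, wikisource, other_ns=0.0,
                         bot_share=0.5, pollution=0.10):
    """Millions of words assumed to be written by hand."""
    return (total - wikisource) * (1 + other_ns) * (1 - bot_share) * (1 - pollution)


def writing_cost_usd(words_million, rate_per_1000=RATE_USD_PER_1000_WORDS):
    """Cost in dollars at a per-1000-words tariff."""
    return words_million * 1_000_000 / 1000 * rate_per_1000


words = manual_words_million(WORDS_ADDED_MILLION, WIKISOURCE_MILLION)
cost = writing_cost_usd(words)
print(f"manual words: {words:.0f} million")
print(f"cost: ${cost / 1e6:.0f} million")
```

With a different tariff, for example the much lower per-word rates some commenters suggest, only `RATE_USD_PER_1000_WORDS` needs to change.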


Feedback and suggestions

I welcome feedback on these approaches: the whole idea, the two approaches presented, the figures assumed, and especially the assumptions for wages and tariffs (does anyone know a good site with US translator tariffs?).

I welcome suggestions for approaches to quantify other aspects of our endeavor. To name a few:

  • number of lines of code committed in this time window
  • number of hours of volunteer system administrator work
  • number of hours of volunteer outreach work
  • average time spent on producing and uploading multimedia

Multimedia content

Added October 1
From July 2007 till June 2008 1,288 thousand (1.3 million) binaries (images, movies and sound files) were added to Commons (again this includes extrapolation for the last month, for which wikistats data are not yet available). Of course other Wikimedia wikis also received multimedia uploads but far fewer, and part of those may have been copied/moved to Commons by bots later, so let us skip those, again to be conservative in our estimates.

Even more than with texts it is hard to quantify the effort spent on producing those multimedia files. Some maps and information graphics will have taken hours or even days to produce, same with audio files. Clearly preparing (cropping/resizing), uploading and tagging those files was just a (minor?) part of the effort. How much time did the average photographer need to shoot the picture? Should we factor in the time spent to produce all those pictures from the same photo session that were not uploaded, but that were needed to make the best ones stand out? Compare a professional photographer, who does not charge only for the time needed to make the final selection, but rather for the time spent on the whole shooting session.

What about quality of the pictures? Our best pictures are certainly top notch. How about the average picture? Is the share of pictures that can compare favorably with professional content published elsewhere comparable with the share of article texts that would survive such scrutiny?

If we estimate the average production and handling time needed for each binary upload to Commons as 5 minutes, then a contractor working 50 weeks of 40 hours each year would have needed 54 years. If we estimate instead that the total time spent per upload was one hour, we are talking about 644 years of work. This is similar to approach I above (based on text edits): we counted raw work time there. In approach II we focussed on the net result at the end of the year: just as many text edits left no trace in the final articles, many mediocre pictures have never been used in any of our articles or have been replaced by better material. Right now there is no statistic telling us which part of the images on Commons is actually used in our projects.
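
The two multimedia figures follow from the same work-year arithmetic as approach I. A short sketch, with invented names and the upload count taken from the text:

```python
UPLOADS = 1_288_000              # binaries added to Commons, from the text
WORK_HOURS_PER_YEAR = 50 * 40    # 50 weeks of 40 hours


def contractor_years(uploads, minutes_per_upload):
    """Full-time work years needed at a given average time per upload."""
    hours = uploads * minutes_per_upload / 60
    return hours / WORK_HOURS_PER_YEAR


for minutes in (5, 60):
    years = contractor_years(UPLOADS, minutes)
    print(f"{minutes} min/upload: {years:.0f} years")
```

As with text edits, the result is only as good as the guessed average time per upload.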


8 Responses to Quantifying volunteer contribution

  1. Anthony says:

    I think this is interesting. I tried to quantify this for the work from the beginning, and came up with $100-250 million.

    $50,000 a year seems like way too much to pay someone for something people are willing to do for free. If you’re going to pay people that much, you should expect much better quality work, and/or much less time.

    As for the per-word estimate, I think your cost per word is way too high. Mahalo is paying $10-15 for 300 words. I’m not sure that paid translation is comparable to encyclopedia writing in the first place, but comparing paid translators to people who contribute to an encyclopedia for free certainly doesn’t make any sense.

  2. Discounting Wikisource certainly caught my eye! :-) You could cost it based on the stats for the “Page” namespace.

    http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics

    The counts for “proofread” and “validated” can be costed using typical proofreading labour costs. The “validated” counts are a second pass of proofreading, so those numbers are added to the counts for “proofread”.

    Wikisource has free content translations, so you could quantify the time invested into those and cost them. Here they are on English Wikisource.

    http://en.wikisource.org/wiki/Category:Wikisource_translations

    The French project is interwiki linked from there, and I have asked for other projects to categorise their free translations. See:

    http://wikisource.org/wiki/Wikisource:Scriptorium#Wikisource_translations

    Another important “cost” that is very difficult to quantify is the time invested in design, and project management. Project management can be estimated, and there are a lot of good methods of doing this. The “design” cost is much harder to quantify. Simple concepts like “[citation needed]” are the result of a lot of work over many years. The copyright templates on Commons and Wikisource are a significant amount of research.

  3. Erik says:

    Anthony, your post is excellent material for comparison.

    > $50,000 a year seems like way too much to pay someone for something people are willing to do for free.

    That is why the question is a hypothetical one. The whole point is though: were it not done for free how much would it have cost?

    Chunks of the largest wikipedias can compete with best of class paper encyclopedias, often written by academics. Other chunks cover pop culture better than many magazines. On average should we compare with the salary of academics or reporters or teachers?

    I can see that the ‘wiki way’ may mean more edits are needed to produce good results.

    In that respect the second approach works better: it does not look at how much work went into editing, it merely looks at the net result at the end of the measured period.

    Also in this sense the comparison with a professional copy editor or translator holds: they are not paid per word written but per word published.

    As explained in the post I used translator tariffs because they are easy to obtain and help to establish a conservative estimate: surely writing an article costs more than translating it (to a comparable modern language).

  4. pfctdayelise says:

    For Wikimedia Commons you should count the Image: namespace edits. Not that I really understand why you would exclude any namespaces, if your hypothetical situation is “exactly the same wikis as we have today”. Talk pages, policy pages, categories, templates are all part of that.

  5. Tgr says:

    The last sentence before the second table is confusing, did it get copied from the first section accidentally?

    Edit count calculations seem unrealistic, because the wiki process is a very ineffective one: a staff of paid editors would not spend time on edit warring, vandalising articles and then reverting them, writing inferior versions of articles before writing the final one etc. It would be interesting to know what fraction of the total edits is spent on things like that. I think there was a paper about what fraction of edits survive to the final text, it would be interesting to see numbers based on that.

  6. Erik says:

    > Not that I really understand why you would exclude any namespaces

    I added the following explanation: “I would have preferred to use edit/word counts that include all namespaces, possibly with exception of discussion pages, but wikistats does not count anything but namespace 0 right now (an anachronism really, should be changed).”

    > For Wikimedia Commons you should count the Image: namespace edits.

    Certainly that should be counted! Though I like to think of that as a special case with completely different dynamics. Any suggestion for “average time spent on producing and uploading multimedia”? I’ll get back on this one.

  7. Erik says:

    > The last sentence before the second table is confusing, did it get copied..

    Thanks, I made a correction.

    > Edit count calculations seem unrealistic, because the wiki process is a very ineffective one: …

    Agreed, I think the first approach yields some cute trivia like ‘it would have taken one person x years to do this’, forgetting that one person would have spent much less time reverting his or her own edits (and of course would have argued a lot less with him/herself in public, but discussions were not counted anyway).

  8. Erik says:

    John, thanks for your informed comments on Wikisource. Your suggestions certainly help to operationalise estimates of work spent on Wikisource. Yet I find it difficult to translate those in quick (and very dirty) estimates of time spent, without a script that counts how many words fall in each category.

    Your second point is equally valid. After all estimates have been added we could assume an overhead for project management (10% ?) and design (15% ?). Actually most templates (which have not been counted anyway, see explanation in main text), would probably have required a design time that far exceeds that of all other texts and dwarfs the time that was needed to add the actual contents.
