In the week before the Wikimedia Developer Conference in Berlin in April 2009 I published a set of reports that analyzed Wikimedia’s traffic. I quote from the blog post that announced these reports: “Fact: index.php with action=edit is issued roughly 6 million times a day, and action=submit only 1 million times a day, so it seems only 1 in 6 edit requests leads to a database update (previews?).” 1 Although I did not draw any firm conclusions from this in the blog 2, I was tempted to conclude that this large mismatch between edits and submits reflected badly on the editing process: people apparently ran into so many difficulties that many or even most of them aborted the procedure. And from what I heard in Berlin, many readers of the blog drew the same conclusion. 6:1 edits:submits, what a shame!
1: In the May edition of the report the ratio is even 7:1.
2: I did suggest edit previews might be a part of the explanation, which was wrong: edit previews are issued with action=submit.
So when Arash Boostani and I brainstormed at the conference on how to measure any usability improvements in the edit process, it occurred to us that changes in the edit:submit ratio would be a suitable metric to watch. Suppose the Usability Initiative could bring the ratio down from 6:1 to 5:1, or even 4:1 or lower, that would be something! Of course such a major change in the edit:submit ratio would be a very ambitious goal, and by no means to be underestimated. But at least it seemed a more discriminating metric than changes in the total number of editors and/or edits. The latter figures will no doubt change as well through usability improvements, but so many other factors influence them at the same time that they will be hard to interpret.
On further investigation the raw edit:submit ratio seemed a very crude indicator. Surely it must be possible to refine this metric, by excluding irrelevant edits that never would lead to a submit, and hence are not influenced by usability issues. And actually the refined metric is much less disheartening. The remainder of this post suggests how to refine the metric. Of course feedback is most welcome, as always!
So here are some filters I applied to the log before counting:
- Only edit/submits on the English Wikipedia are taken into account: funds for the Usability Initiative are earmarked (even when other wikis will benefit).
- Only human edits are relevant. This may seem obvious but it makes a huge difference, more about that below.
- Only edits for page texts (tech: mime type ‘text/html’) are counted (not for stylesheets and other geeky stuff).
- Only edits that were issued from the English Wikipedia count: a minor fraction of edits has other Wikimedia wikis as referer (~starter page).
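The four filters above could be sketched in Python roughly as follows. This is only an illustration, not the script used for the reports: the record layout (a dict with `url`, `referer`, `mime` and `is_bot` keys) is an assumption, and how bots are actually recognised in the log is not specified here, so `is_bot` is a hypothetical flag.

```python
from urllib.parse import urlparse, parse_qs

ENWIKI_HOST = "en.wikipedia.org"

def is_relevant_edit(record):
    """Return True if a sampled log record passes the four filters above.

    `record` is a hypothetical dict with keys url, referer, mime, is_bot.
    """
    url = urlparse(record["url"])
    params = parse_qs(url.query)
    action = params.get("action", [""])[0]
    referer_host = urlparse(record["referer"]).netloc
    return (
        url.netloc == ENWIKI_HOST            # English Wikipedia only
        and action in ("edit", "submit")     # edit or submit requests
        and not record["is_bot"]             # human edits only (assumed flag)
        and record["mime"] == "text/html"    # page texts only
        and referer_host == ENWIKI_HOST      # issued from the English Wikipedia
    )
```

A record would then be counted only when `is_relevant_edit(record)` is true.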
After applying the above filters 18269 records in the sampled log (where each record stands for 1000 records in the full log) still qualify.
Ideally we should also exclude all edits which are unintentional, for which no submit is to be expected. A whopping 8482 or 46% of the remaining log lines have the parameter ‘redlink=..’. Relatively speaking almost none of these lead to a submit: even with, say, 100,000 new articles per month on the English Wikipedia, 8482 x 1000 - 100,000 = 8,382,000 or 98.8% of these clicks on a red link do not lead to a submit. This has nothing to do with usability in the edit process, more with usability in the browse process: most users don’t know what red links stand for.
- Only lines without ‘redlink=..’ are counted
This narrows the number of relevant edit requests in the sampled log considerably, and leaves 9787.
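For readers who want to check the red link arithmetic, here is a small sketch. The line counts come from the text; the figure of 100,000 new articles per month is the deliberately generous assumption made above, and the variable names are mine.

```python
SAMPLING_FACTOR = 1000          # each sampled line stands for 1000 requests
filtered_lines = 18269          # sampled lines left after the first filters
redlink_lines = 8482            # of those, lines with 'redlink=..'
assumed_new_articles = 100_000  # generous monthly estimate (assumption)

redlink_requests = redlink_lines * SAMPLING_FACTOR
without_submit = redlink_requests - assumed_new_articles

share_of_log = redlink_lines / filtered_lines
share_without_submit = without_submit / redlink_requests
remaining_lines = filtered_lines - redlink_lines

print(f"red link share of filtered log: {share_of_log:.0%}")
print(f"red link clicks without submit: {without_submit:,} ({share_without_submit:.1%})")
print(f"sampled lines left after filter: {remaining_lines}")
```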
Besides ‘edits’ that are almost certainly done by novices (the red links) there are edits where the opposite is true. Most users that edit an old revision from the history page do so to revert. Also, undoing a set of revisions in one click is not for the uninitiated. So lines with ‘old=..’ or ‘undo=..,undoafter=..’ could be filtered out to zoom in as much as possible on ‘edits meant to lead to a submit’, issued by an undefined user base.
Of course when one takes out edits by almost certainly experienced users, one should subtract the resulting submits as well. And this time assume 100% submits, as a hopefully reasonable approximation.
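The adjustment just described can be written down as a small helper. This is a sketch under the stated assumption that every edit by an experienced user (old revision or undo) leads to exactly one submit; the function name and the numbers in the example are made up for illustration and are not taken from the log.

```python
def refined_ratio(edits, submits, expert_edits):
    """Edit:submit ratio after removing edits by presumably experienced users.

    Assumes each removed edit produced exactly one submit (100% submits),
    so the same count is subtracted from both sides.
    """
    remaining_edits = edits - expert_edits
    remaining_submits = submits - expert_edits
    if remaining_submits <= 0:
        raise ValueError("no submits left after removing expert edits")
    return remaining_edits / remaining_submits

# Illustrative numbers only: 6000 edits, 1000 submits, 500 expert edits.
example = refined_ratio(6000, 1000, 500)
print(f"{example:.1f}:1")
```

Note that as long as there are more edits than submits, subtracting the same number from both sides raises the ratio for the remaining users, which is exactly why this filter helps to zoom in on the less experienced user base.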
After scanning the 1:1000 sampled log for the whole month of May the following combinations of parameters were found 10 times or more in 18269 lines (multiply all counts x 1000, so 18.3 million edit requests were issued for the English Wikipedia in May, out of 214 million for Wikimedia projects or .