Archive for the ‘Wikipedia Writings’ Category

Statistics

Friday, November 27th, 2009

It seems to have been a busy week for all things Wikipedia. The Wall Street Journal recently contained an article describing how the number of users had fallen by about 49,000 in the previous three months. The article was based on the research and doctoral thesis of Dr. Felipe Ortega and constructed over a period of three years. It’s as meaty a document as you would expect for a thesis, with the conclusions it describes taking some time to fully digest.

Erik Moeller and Erik Zachte, the Wikimedia Foundation’s Deputy Director and Data Analyst respectively, recently rebuffed the report’s findings in a blogpost, stating that they measure a contributor as someone who has made five edits instead of just a single one, and that those numbers are holding somewhat steady. The blogpost also contains a small number of other statistics, but doesn’t comprehensively respond to the issue of contributor decline:

  • The comScore Media Metrix indicates that the site has received 6% more visits from September to October. This is a very small snapshot in time. There may be seasonal trends or other factors that have influenced this growth. A quarterly year-on-year growth figure would be more appropriate here, along with comparisons against the growth in internet availability worldwide. Regardless, the number of visitors does not translate directly to the number of readers that you have – for that, more in-depth metrics are required involving pageviews per visit, length of visit and so on.
  • Churn figures for the number of contributors are not available. While the number of active contributors has been holding steady at about 40,000, there’s no information about how many leave and join over a quarterly period. Service providers actively track churn rates as an indicator of user dissatisfaction and a sign that the user experience needs to be improved.
  • The discrepancy between Dr. Ortega’s results and WMF’s own figures indicates a large volume of failed conversions from a reader to a full contributor. Analysis and understanding of what causes these failed conversions should be something that the WMF investigates and reports on, together with plans on how to improve the conversion rate. Again, the conversion rate is something that should be tracked in the same way as churn rate.
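
To make the churn and conversion points concrete, here is a minimal Python sketch of how those two rates could be computed from quarterly counts. The class name, field names and every number in it are my own hypothetical illustrations, not figures from Dr. Ortega or the WMF. The point is only that a headline total can hold steady while a sizeable share of contributors turns over underneath it.

```python
from dataclasses import dataclass


@dataclass
class QuarterSnapshot:
    """Contributor counts for one quarter. All fields are hypothetical."""
    active_at_start: int     # contributors meeting the activity threshold at quarter start
    left: int                # of those, how many dropped below the threshold
    joined: int              # newcomers who reached the threshold during the quarter
    first_time_editors: int  # people who made their first edit during the quarter


def churn_rate(q: QuarterSnapshot) -> float:
    """Fraction of the starting contributors lost during the quarter."""
    return q.left / q.active_at_start


def conversion_rate(q: QuarterSnapshot) -> float:
    """Fraction of first-time editors who went on to become active contributors."""
    return q.joined / q.first_time_editors


def active_at_end(q: QuarterSnapshot) -> int:
    """Headline figure: active contributors at the end of the quarter."""
    return q.active_at_start - q.left + q.joined


if __name__ == "__main__":
    # Made-up numbers: the headline total is flat, yet churn is far from zero.
    example = QuarterSnapshot(
        active_at_start=40_000, left=6_000, joined=6_000, first_time_editors=120_000
    )
    print(f"active at end: {active_at_end(example)}")
    print(f"churn rate: {churn_rate(example):.1%}")
    print(f"conversion rate: {conversion_rate(example):.1%}")
```

Tracked quarter on quarter, these two ratios would say far more about contributor health than the steady headline total does on its own.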

The blogpost from the Wikimedia Foundation has also included a small amount of information about their strategy plans for future growth. I’ve already commented on these in other blogposts, but I’ll summarise them here:

  • The usability initiative is unlikely to deliver sweeping benefits as it is focused purely on the MediaWiki interface. A large proportion of the user experience is made up of the way information is presented, and that presentation is controlled by the active contributors instead of the Foundation. Content presentation has already been declared out of scope by the usability initiative, and as such the entire end-to-end experience will not improve.
  • The strategic planning initiative is heavily siloed, and as such is unlikely to result in a cohesive end-to-end strategy without monumental effort. It’s also inefficiently organised and doesn’t work well at helping innovative new ideas to float to the top.
  • The outreach programme is heavily biased towards pursuing cultural and historical repositories such as galleries and museums. There are few if any controlled studies into reaching out to the public at large, understanding their perceptions and concerns and working to bridge the gap between repositories and the public. Usability testing, public focus group work, user experience feedback and so on should all be part of what the Foundation is trying to achieve.

I can understand that what the Foundation is doing is important, but it is concerning that there remain glaring gaps in both the metrics presented and the future plans offered. If they are serious about changing how they operate, the user experience should be an integral part of that. After all, every single piece of content they currently hold can be copied under Creative Commons. The MediaWiki software is available free under the GNU General Public License. The only unique aspects of Wikipedia are the name, the contributors and the experience. If someone else can deliver something better in one or two of those areas, the Wikimedia Foundation may be in big trouble.

Experience

Thursday, November 26th, 2009

This article is a continuation of my series on Wikipedia, its design and ongoing strategy. This one focuses on the reader – the person who reads the occasional article or searches for something particular. Contributors, administrators, technologies and other options will follow later.

I was at my local Tesco doing some shopping for the weekend while I was thinking about this topic. While I was bagging up my groceries the checkout assistant ran my card, handed me my receipt and also gave me a small card. I’ve included a photo of it here as it’s one of those things that almost seems unreal – here you have a multinational corporation going out of its way to seek comments from customers. The reverse of the card lists five different ways you can provide your feedback, from sending a text message from your mobile phone to filling out a comments card in store.

You can bet that Tesco are going to act on the information they receive. The Clubcard phenomenon that started back in 1995 was designed in part to provide customers with offers that were meaningful to them as individuals, but also to allow a supermarket chain to start building analytics on their customer base. It’s very comparable to the kind of information that websites have been gathering on their visitors, being able to see how small changes make a difference in areas such as site navigation, advertising response rates and so on. It almost feels like crowdsourcing, with a specific goal set and a wide range of possible contributors targeted. Almost, even, like Wikipedia. Hold on to that thought of feedback though – I’ll be coming back to it later.

The appearance of Wikipedia is something that almost anyone who has used the Internet is familiar with. The same boxy layout with puzzle-globe logo in the corner has been around for a number of years. As far as providing the core information that a reader would need, you would think that a well laid-out article, some references and images would be all the casual reader needs. Unfortunately it’s become clear that someone just passing through and spending five minutes to glance at an article would benefit from a shedload more information in an easy-to-digest format. Before we talk about what’s missing, let’s remind ourselves what a Wikipedia article looks like.

Firstly, let’s guess how good this article is. It’s engaging, well sourced, clear and deep in coverage. But do site contributors agree that it is of high quality, or is it an article that looks good but is in fact riddled with problems? The only clue in this case is the star that I’ve circled in red in the top right corner. In this case we’re looking at a Featured article, one of only roughly 2,700 articles out of a library of over 3 million. That means the chance of randomly stumbling on a featured article is nearly one in a thousand, and even then you have to notice the small bronze star and understand what it means. If an article doesn’t have this mark, the reader is left to draw their own conclusions about quality from whatever they can infer from the content.

Interestingly though, each article on Wikipedia is graded on a quality scale. This information can quickly help a reader work out how much caution they should treat an article with, how much work has been put into it and if there are any glaring issues. Unfortunately, information on article quality is tucked away on the discussion page where the casual reader is unlikely to find it. There are add-ons available that extract this information and present it on the article page, but they’re optional and require you to have a registered account in order to use them. If Wikipedia is to restore confidence in what it is offering, it really needs to consider making article quality much clearer.

Work is also underway by outside organisations such as the University of California to analyse the history of articles and track changes. Powerful tools allow readers to instantly identify suspect information. They might not be able to do much about it, but at least they can be aware that the information needs further verification before repeating it elsewhere. The WikiTrust extension is currently developed as a plugin to MediaWiki and could possibly be computationally expensive – it may be something that the developers would want to draw in as core capability, again to improve the level of information presented to the reader.

Now, remember that thought about feedback? Well, it’s probably no news at all that collecting feedback from users is something that has been happening on the web for years. Amazon started collecting reviews for books, but it was only when they started asking readers if they found the reviews helpful that the reviews started to take off. By displaying the most helpful reviews, Amazon managed to improve the quality of information presented to the reader through a completely automated system. In the same way, Wikipedia could use this technique to gain metrics on its articles, not just to see if someone reads them but to see if people like them. It could even pop up a simple request for more information if someone doesn’t like an article, and display all feedback prominently and centrally. The problem with Wikipedia’s current model is that it encourages feedback on article discussion pages. For an article in the doldrums that’s rarely examined, discussion page feedback can go largely ignored. By collecting it all centrally, the volunteer editors can gain instant feedback on what articles are liked and disliked, and prioritise their work accordingly. Wikipedia really needs to start collecting reader information as much as possible as soon as possible, building a database of information and then acting on what the reports reveal. If the site is still visited by hundreds of thousands of visitors every month, collecting more information than a simple page impression is vital.
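
As a rough illustration of what collecting feedback centrally could look like, here is a small Python sketch that tallies "was this article helpful?" votes and ranks articles by how confidently readers dislike them. Everything here is an assumption of mine: the `FeedbackLog` class and its method names are hypothetical, and the ranking uses the lower bound of the Wilson score interval, a standard statistical technique I have swapped in for illustration. It is not how Amazon or Wikipedia actually do it.

```python
import math
from collections import defaultdict


def wilson_lower_bound(hits: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the proportion hits/total.

    z = 1.96 corresponds to roughly 95% confidence. With no votes the
    bound is 0.0, so unrated articles never jump the queue.
    """
    if total == 0:
        return 0.0
    p = hits / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / denom


class FeedbackLog:
    """Hypothetical central store of yes/no reader votes, keyed by article title."""

    def __init__(self) -> None:
        # title -> [helpful_count, unhelpful_count]
        self._votes = defaultdict(lambda: [0, 0])

    def record(self, title: str, helpful: bool) -> None:
        self._votes[title][0 if helpful else 1] += 1

    def needs_attention(self) -> list:
        """Titles ordered from most to least confidently disliked."""
        def score(title: str) -> float:
            helpful, unhelpful = self._votes[title]
            return wilson_lower_bound(unhelpful, helpful + unhelpful)
        return sorted(self._votes, key=score, reverse=True)


if __name__ == "__main__":
    log = FeedbackLog()
    # Made-up votes for three made-up articles.
    for title, helpful, unhelpful in [("A", 90, 10), ("B", 1, 1), ("C", 10, 90)]:
        for _ in range(helpful):
            log.record(title, True)
        for _ in range(unhelpful):
            log.record(title, False)
    print(log.needs_attention())
```

Using a confidence bound instead of a raw percentage means an article with one bad vote out of two doesn't outrank one that ninety readers out of a hundred found unhelpful, which is the kind of prioritisation a volunteer editor would want.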

There’s a whole other piece to the puzzle though, and that’s usability. Making web pages and sites usable by a wide audience is seen as so important that the US Government has an extensive resource on the subject. In it, two things are mentioned that are seen as critical to success: full end-to-end testing, and using groups of people unfamiliar with the product being tested. This means that to undertake meaningful usability analysis, groups of people that are completely new to Wikipedia should be let loose on the entire project and test a series of use cases by following a script. Unfortunately, Wikipedia’s own usability project falls short of these requirements:

The Wikipedia Usability Initiative is realized by a grant from the U.S.-based Stanton Foundation. The goal of this initiative is to measurably increase the usability of Wikipedia for new contributors by improving the underlying software on the basis of user behavioral studies, thereby reducing barriers to public participation. With an initial focus on English Wikipedia, eventually this research and development will be implemented across all languages and possibly to other Wikimedia projects.

Why is this, you might ask? The truth is, much of the user experience isn’t controlled by the Wikimedia Foundation who maintain Wikipedia, but is crowdsourced by the community of contributors that work on the site regularly, and any changes have to be agreed by consensus. This means that an outside view is rarely if ever heard, and an internal feedback loop develops where decisions are taken in the interest of the community. These might not be in the interest of readers, especially if the readers don’t make their opinions heard. Wikipedia desperately needs a proper usability initiative, with wide-ranging ability to test all aspects of the project and not just the core interface or underlying software.

There are of course many other areas where Wikipedia can improve its reader experience. Detecting first-time visitors and providing them with an introduction is one, while introducing a recommender system to analyse what an individual reader is reading and suggest other articles they might like to look at is another. Others include making more use of metadata embedded in article infoboxes in order to provide more powerful searching, such as “tenor opera singers born before 1945” or “mountain peaks above x meters above sea level in Europe”, and being able to identify how many articles a citation is used in. All these things are additional features though – items that go above the baseline user experience and offer additional capability. It’s why I haven’t gone into them in detail here – in my opinion there’s a need to take care of the core features first, with the value-added stuff being able to follow on later.

Feedback

Thursday, November 19th, 2009

For about a month now, the Wikimedia Foundation’s Strategy Wiki has been trundling along in a desperate attempt to try and work out how to evolve over the next five years. Trouble is, it’s become apparent that the whole process has rapidly devolved into minutiae, with every tiny detail being closely examined to try and identify a way forward. It’s really a mirror of how the concept of using a wiki has become wedded to almost everything the Foundation does, with additional requirements being plugged into the MediaWiki framework no matter how clumsily they fit. It’s not a recipe for finding good ideas, it’s a recipe for reams of documentation and ideas with no clear way to separate the wheat from the chaff.

So how should it be done? Research into crowdsourcing techniques indicates that it works best when you have a large number of people, preferably removed from the immediate problem, who then submit ideas on how to solve it. Others can then comment on those ideas and provide feedback, while even the most meagre of participants can give an idea a simple thumbs up/down to help rank and sort ideas. A great example of this is WebStorm, which is ideal for collecting a large number of ideas on a general topic and allowing them to be weighted by participants. Another possibility is InnoCentive, which also specialises in capturing ideas and helping organisations work out solutions to their problems. The important thing is that there are common web themes and platforms out there that, with a little research, demonstrate how to do this kind of work; falling back on a wiki seems somewhat clumsy by comparison.

What the Wikimedia Foundation desperately needs are cohesive strategies that tackle the bigger issues they face in a unified way, not in a fragmented and piecemeal approach. It needs to engage with its readership more meaningfully and not just rely on page impressions or Alexa rankings as an indication of how it is performing. The Foundation’s biggest asset is that it faces very little competition, while its biggest weakness remains what would happen if someone else produced something that was easier to use and easier to participate in. This is critical – all Wikipedia content is licensed as freely available under a Creative Commons agreement, making it quite trivial for someone to assemble a better framework purely in order to pull in editors and lure others away from Wikimedia projects. The feedback gained from the silent readership could be something as trivial as “Was this article helpful to you? Yes/No” – you know, the kind of thing Amazon has been doing for years for weighting reviews and which Facebook uses successfully for calculating advert popularity. It could be something as complex as promoting the use of talk pages, or organising the global Wikimedia Chapters to go out and engage the public. Hold focus groups, ask members of the public to participate, that kind of thing. Without this external view to help shape and mold an organisation’s perception of itself, it just becomes an internal feedback loop that reinforces already held beliefs.

So what should the Wikimedia Foundation be doing? Over the next few posts, I’ll be looking at one aspect of the service and describing how things could be improved. None of it will be rocket science and none of it will be demanding the impossible, but all of it should be focusing on making the experience better for the readers and editors alike. If you think something’s worthwhile, feel free to shout out in the comments.

Regression

Thursday, September 17th, 2009

For anyone watching the print and online media organisations, it’s clear that change is in the air. Printed newspapers, struggling to compete against a tide of web based news agencies that deliver their content for free and update regularly, are having to change their business models in order to survive. Many have shut down completely, while others have cut back on the number of journalists or reduced the number of editions. Many cities in the United States have woken up to no longer having a local newspaper covering local issues. As advertising revenues fell during the economic slowdown, the writing was on the wall.

But of course, you know this already. It’s all well documented, well understood and well publicised. We get that media in general is having a hard time of late. We appreciate it, feel sorry for them, but move on.


News Corporation: expanding the use of paywalls

There is one interesting nugget to the tale though. Rupert Murdoch’s News Corporation is looking at monetising its news organisations. This means that much like you’d pay for a newspaper, you’d pay to access content on their website. While this itself is nothing new (the model is already used on the Wall Street Journal and the Financial Times), the spread to other more mainstream publications is interesting. By their very nature though, mainstream content has more news agencies devoted to it. Given the choice, I suspect most readers would prefer their generic news to be free rather than having to fork out for it. This leaves media in a rather precarious position – what is it that they produce that readers feel is worth paying for?

For me it’s tied in to the magazines I read, the websites I browse and the podcasts I watch or listen to. I rarely if ever buy a newspaper these days – almost all the latest information can be reached on my iPhone as I’m heading in to work, and I can even tailor it to my interests. For me the real value of a journalist isn’t being first with the news, it’s about having a unique opinion or a novel insight on things. It’s about being able to share your opinions and insights with an anonymous reader in an engaging and clear manner, all the while being able to reason your thoughts with facts and examples. For me it’s also the one thing that doesn’t decay with time. A journalist’s thoughts at that moment, captured on a page, are worth preserving.

Wikipedia: relies on volunteer contributions

Which brings me neatly round to the second half of this topic, the crowdsourcing project known as Wikipedia. Over the years, the world’s biggest online encyclopedia has needed to develop content policies in order to avoid promoting hoaxes and hosting inaccurate and in some cases libellous material. The key policy in this case is regarding the verifiability of information:

The threshold for inclusion in Wikipedia is verifiability, not truth—that is, whether readers are able to check that material added to Wikipedia has already been published by a reliable source, not whether we think it is true. Editors should provide a reliable source for quotations and for any material that is challenged or likely to be challenged, or the material may be removed.

This means that almost every Wikipedia article has references at the bottom of it so that readers can verify what the article states. Being an online encyclopedia, many of these references are online sources that have previously been vetted for accuracy and reliability. They almost act as referrals, taking readers from the encyclopedia article to the reference material used to make it. It’s at this point that the idea of news agencies using paywalls to charge for and gate access to their content breaks the process of creating and updating articles.

Wikipedia relies on volunteers in order to produce and maintain articles, and while they’ll happily donate their free time, it’s fair to say that producing good quality new content is a lengthy process and hard work. Once you start asking these volunteers to fork out for subscriptions to publications in order to research a story, several things may start to occur. The most obvious one is that the charging organisations get fewer citations in encyclopedia articles, leading to fewer referrals and fewer page impressions. A more subtle effect though is subjective bias creeping in to articles, particularly if those media organisations that elect to charge have a similar political leaning. There’s also the reduction in volunteer workforce if only some of them are able to afford to maintain online media subscriptions and yet have the free time to work substantially on article content – it adds a tilt on a formerly level playing field. Finally, there’s this whole verifiability aspect – how can a casual reader verify a fact if the source material is hidden behind a digital subscription?

At the end of the day, erecting a wall around content is anathema to me. It breaks the foundation of being able to freely construct webs of linked pages that take a reader on a journey from one website to another. It hearkens back to a darker age of the Internet when walled gardens, content portals and gateways were the modus operandi instead of the open access service we have today. More than that though, it means that there are creative, insightful people out there whose opinions I will never read. Not because I’ll never stumble upon their work, but because their work will be squirreled behind a paywall beyond my sight.

Complexity

Thursday, August 13th, 2009

There’s been a fair amount of discussion recently on some research being undertaken by the Augmented Social Cognition team at the Palo Alto Research Center. They have a simple mission: “understanding how groups remember, think and reason”. With this in mind, they recently started data-mining Wikipedia’s vast archive of user interactions in order to spot trends and understand what it says about Wikipedia. The statistics that have been heavily cited were that the growth rate of Wikipedia had slowed dramatically, both in the total number of edits and the total number of active editors. A follow-up study indicated that the only group to have increased output was those with 1000 or more edits, while users with very low edit counts were up to 25 times more likely to have their changes removed or “reverted” than more seasoned users.

Even more interesting are the responses from Ed H Chi on the results he provided to New Scientist and The Guardian. Chi says that there is “evidence of growing resistance from the Wikipedia community to new content” and that “Over time the quality may degrade”. He adds “To power users it feels like Wikipedia operates in the way it always has – but for the newcomers or the occasional users, they feel like the resistance in the community has definitely changed”. Startling stuff indeed.

Even more curious is how the growth model has changed over time. Chi starts off by stating that Wikipedia followed the hockey-stick growth that other popular sites such as Twitter and Facebook also experienced. However, over the last few years, the data no longer fit the model. In fact, following their initial survey, he likened it to a population growth curve, where there’s a resource constraint that the population encounters. “As you run out of food, people start competing for that food, and that results in a slowdown in population growth and means that the stronger, more well-adapted part of the population starts to have more power.”
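
For anyone who wants to see the difference between the two curves, here is a minimal Python sketch contrasting unconstrained "hockey-stick" growth with the textbook logistic model of a population hitting a resource limit. This is my own illustration and not the model Chi’s team fitted; the function names, the growth rate and the `capacity` figure are all hypothetical.

```python
def exponential_growth(initial: float, rate: float, steps: int) -> list:
    """Unconstrained growth: the population grows by `rate` every step."""
    sizes = [initial]
    for _ in range(steps):
        sizes.append(sizes[-1] * (1 + rate))
    return sizes


def logistic_growth(initial: float, rate: float, capacity: float, steps: int) -> list:
    """Resource-limited growth (discrete logistic model).

    `capacity` stands in for the limiting resource: the closer the
    population gets to it, the smaller each step's increase becomes.
    """
    sizes = [initial]
    for _ in range(steps):
        current = sizes[-1]
        sizes.append(current + rate * current * (1 - current / capacity))
    return sizes


if __name__ == "__main__":
    # Made-up parameters, chosen only to make the two shapes easy to compare.
    free = exponential_growth(initial=100, rate=0.5, steps=40)
    limited = logistic_growth(initial=100, rate=0.5, capacity=40_000, steps=40)
    for step in (0, 10, 20, 30, 40):
        print(step, round(free[step]), round(limited[step]))
```

The two series look alike in the early steps, which is why the hockey-stick description fits at first; it is only as the population nears the limit that the logistic curve flattens out.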

So, what is the limiting resource that has dramatically changed the editing demographic? Well, for a year I heavily edited Wikipedia, starting there in Feb 2008 and becoming an admin in August last year. About a year later in March 2009 I stopped editing and retired. It wasn’t that Wikipedia no longer held an interest for me – far from it. It was just that I felt I no longer had anything to offer the site as it was apart from performing the routine administrative tasks of blocking vandals and removing content. In that time though, I learned a huge variety of policies, content guidelines and regulations about what could and could not be placed on Wikipedia. It’s this area that I think is the finite resource that Ed Chi talks about.

Let’s be honest, Wikipedia contains mistakes. Some of those mistakes are trivial, but some have caused real harm to companies, institutions or living people. As a result, policies sprang up about how content should be sourced and referenced, so that it can be verified to be true. Anything that doesn’t conform to these policies gets removed sooner or later, including additions that might be accurate but that remain unsourced. More than this though, Wikipedia editors will argue passionately and at length on a huge range of topics from date formatting to fringe science topics, and as such a range of peripheral-content policies such as a manual of style together with further processes for removing content have sprung up. There are even processes specifically designed for dispute resolution, as well as the management and eventual block of users who don’t adhere to the many rules now in existence. This is the limiting factor, the finite resource that Chi refers to in his research.

The inability of the average new population member (Wikipedia editor) to quickly and easily understand all of these rules before they start editing is a real handicap to the further growth of the project. It’s what has changed the growth curve – as further complexity has been added to the project, the ability of a new user to participate effectively has been reduced. It’s why sites like Facebook and Twitter still enjoy phenomenal growth – they rely heavily on making the experience as easy and straightforward as possible for their users, and anything that adds complexity to the experience is removed. Conversely, Wikipedia requires users to become familiar with its complexity before they even start contributing.

So, how does Wikipedia return to the meteoric growth it once enjoyed? As with so many other services, simplification is the key here. While some veteran users are comfortable with how the rules set has evolved, it has reached the stage where the barrier to new participants is too high to be able to recruit at a sustaining rate. Without being able to convert more new users into longtime editors through a simplification of the rules set, growth is likely to tail off and eventually decay. Whether this can be achieved by Wikipedia, or by someone else bringing in a new model and user experience remains to be seen.

