Is Wikipedia growing a longer tail?

Published on May 13, 2009 in GROUP (International Conference on Supporting Group Work)
· DOI: 10.1145/1531674.1531690
Shyong K. Lam
Estimated H-index: 16
(UMN: University of Minnesota),
John Riedl
Estimated H-index: 60
(UMN: University of Minnesota)
Wikipedia has millions of articles, many of which receive little attention. One group of Wikipedians believes these obscure entries should be removed because they are uninteresting and neglected; these are the deletionists. Other Wikipedians disagree, arguing that this long tail of articles is precisely Wikipedia's advantage over other encyclopedias; these are the inclusionists. This paper looks at two overarching questions in the debate between deletionists and inclusionists: (1) What are the implications for the long tail of the evolving standards for article birth and death? (2) How is viewership affected by the decreasing notability of articles in the long tail? The answers to five detailed research questions inspired by these overarching questions should help better frame this debate and provide insight into how Wikipedia is evolving.
  • References (17)
  • Citations (40)
#1 Aaron Clauset (UNM: University of New Mexico), H-Index: 33
#2 Cosma Rohilla Shalizi (CMU: Carnegie Mellon University), H-Index: 25
Last. Mark Newman, H-Index: 91
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power...
5,217 Citations
Aug 24, 2008 in KDD (Knowledge Discovery and Data Mining)
#1 Fei Wu (UW: University of Washington), H-Index: 13
#2 Raphael Hoffmann (UW: University of Washington), H-Index: 14
Last. Daniel S. Weld (UW: University of Washington), H-Index: 74
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper present...
122 Citations
Feb 11, 2008 in WSDM (Web Search and Data Mining)
#1 Nitin Agarwal (ASU: Arizona State University), H-Index: 19
#2 Huan Liu (ASU: Arizona State University), H-Index: 88
Last. Philip S. Yu (UIC: University of Illinois at Chicago), H-Index: 117
(4 authors in total)
Blogging has become a popular way for Web users to publish information on the Web. Bloggers write blog posts, share their likes and dislikes, voice their opinions, provide suggestions, report news, and form groups in the Blogosphere. Bloggers form virtual communities of similar interests. Activities in the Blogosphere affect the external world. One way to understand the development of the Blogosphere is to find influential blog sites. There are many non-influential blog sites which form the "th...
440 Citations
Feb 11, 2008 in WSDM (Web Search and Data Mining)
#1 Ba-Quy Vuong (NTU: Nanyang Technological University), H-Index: 10
#2 Ee-Peng Lim (NTU: Nanyang Technological University), H-Index: 46
Last. Kuiyu Chang (NTU: Nanyang Technological University), H-Index: 22
(6 authors in total)
Wikipedia is a very large and successful Web 2.0 example. As the number of Wikipedia articles and contributors grows at a very fast pace, there are also increasing disputes occurring among the contributors. Disputes often happen in articles with controversial content. They also occur frequently among contributors who are "aggressive" or controversial in their personalities. In this paper, we aim to identify controversial articles in Wikipedia. We propose three models, namely the Basic model an...
87 Citations
#1 Andrea Forte (Georgia Institute of Technology), H-Index: 22
#2 Amy Bruckman (Georgia Institute of Technology), H-Index: 35
How does "self-governance" happen in Wikipedia? Through in-depth interviews with eleven individuals who have held a variety of responsibilities in the English Wikipedia, we obtained rich descriptions of how various forces produce and regulate social structures on the site. Our analysis describes Wikipedia as an organization with highly refined policies, norms, and a technological architecture that supports organizational ideals of consensus building and discussion. We describe how governance in ...
111 Citations
Jan 1, 2008 in ICWSM (International Conference on Weblogs and Social Media)
#1 Ivan Beschastnikh (UW: University of Washington), H-Index: 21
#2 Travis Kriplean (UW: University of Washington), H-Index: 12
Last. David W. McDonald (UW: University of Washington), H-Index: 33
While previous studies have used the Wikipedia dataset to provide an understanding of its growth, there have been few attempts to quantitatively analyze the establishment and evolution of the rich social practices that support this editing community. One such social practice is the enactment and creation of Wikipedian policies. We focus on the enactment of policies in discussions on the talk pages that accompany each article. These policy citations are a valuable micro-to-macro connection betwee...
83 Citations
#1 Travis Kriplean (UW: University of Washington), H-Index: 12
#2 Ivan Beschastnikh (UW: University of Washington), H-Index: 21
Last. Scott A. Golder (HP: Hewlett-Packard), H-Index: 14
(4 authors in total)
When large groups cooperate, issues of conflict and control surface because of differences in perspective. Managing such diverse views is a persistent problem in cooperative group work. The Wikipedian community has responded with an evolving body of policies that provide shared principles, processes, and strategies for collaboration. We employ a grounded approach to study a sample of active talk pages and examine how policies are employed as contributors work towards consensus. Although policies...
86 Citations
#1 Reid Priedhorsky (UMN: University of Minnesota), H-Index: 15
#2 Jilin Chen (UMN: University of Minnesota), H-Index: 19
Last. John Riedl (UMN: University of Minnesota), H-Index: 60
(6 authors in total)
Wikipedia's brilliance and curse is that any user can edit any of the encyclopedia entries. We introduce the notion of the impact of an edit, measured by the number of times the edited version is viewed. Using several datasets, including recent logs of all article views, we show that an overwhelming majority of the viewed words were written by frequent editors and that this majority is increasing. Similarly, using the same impact measure, we show that the probability of a typical article view be...
276 Citations
#1 Dennis M. Wilkinson (HP: Hewlett-Packard), H-Index: 16
#2 Bernardo A. Huberman (HP: Hewlett-Packard), H-Index: 73
The rise of the Internet has enabled collaboration and cooperation on an unprecedentedly large scale. The online encyclopedia Wikipedia, which presently comprises 7.2 million articles created by 7.04 million distinct editors, provides a consummate example. We examined all 50 million edits made to the 1.5 million English-language Wikipedia articles and found that the high-quality articles are distinguished by a marked increase in number of edits, number of editors, and intensity of cooperative beha...
201 Citations
Apr 29, 2007 in CHI (Human Factors in Computing Systems)
#1 Aniket Kittur (UCLA: University of California, Los Angeles), H-Index: 33
#2 Bongwon Suh (PARC), H-Index: 20
Last. Ed H. Chi (PARC), H-Index: 45
(4 authors in total)
Wikipedia, a wiki-based encyclopedia, has become one of the most successful experiments in collaborative knowledge building on the Internet. As Wikipedia continues to grow, the potential for conflict and the need for coordination increase as well. This article examines the growth of such non-direct work and describes the development of tools to characterize conflict and coordination costs in Wikipedia. The results may inform the design of new collaborative knowledge systems.
422 Citations
Cited By (40)
May 13, 2019 in WWW (The Web Conference)
#1 Ali Javanmardi (UWO: University of Western Ontario), H-Index: 1
#2 Lu Xiao (SU: Syracuse University), H-Index: 12
1 Citation
#1 Lu Xiao (SU: Syracuse University), H-Index: 12
#2 Niraj Sitaula (SU: Syracuse University), H-Index: 1
Wikipedia provides a discussion forum, namely the Articles for Deletion (AfD) forum, for people to deliberate about whether or not an article should be deleted from the site. In this paper, we present an interesting correlation between the outcomes of the discussion and the number of sentiments of different intensity in the comments. We performed sentiment analysis on 37,761 AfD discussions with 156,415 top-level comments and explored the relationship between outcomes of the discussion and sentiments in the comments. ...
#1 Ramine Tinati (University of Southampton), H-Index: 11
#2 Markus Luczak-Roesch (UVic: University of Victoria), H-Index: 1
Wikipedia represents a successful peer-produced knowledge resource constructed via the endeavours of millions of volunteers. We examine the activity of Wikipedia by analysing WikiProjects, a community-driven feature which allows communities of Wikipedians to coordinate their efforts in order to improve or produce Wikipedia articles. We harvested the content of over 600 active Wikipedia projects, which comprised over 100 million edits and 15 million Talk entries, associated with over 1.5 ...
4 Citations
#1 Quang-Vinh Dang, H-Index: 5
Wikipedia is a great example of large-scale collaboration, where people from all over the world together build the largest and perhaps the most important repository of human knowledge in history. However, a number of studies have shown that the quality of Wikipedia articles is not equally distributed. While many articles are of good quality, many others need to be improved. Assessing the quality of Wikipedia articles is very important for guiding readers towards articles of high quality and suggestin...
16 Citations
Apr 11, 2016 in WWW (The Web Conference)
#1 Ramine Tinati (University of Southampton), H-Index: 11
#2 Markus Luczak-Roesch (University of Southampton), H-Index: 7
Last. Wendy Hall (University of Southampton), H-Index: 47
This paper documents a study of the real-time Wikipedia edit stream containing over 6 million edits on 1.5 million English Wikipedia articles, during 2015. We focus on answering questions related to identification and use of information cascades between Wikipedia articles, based on author editing activity. Our findings show that by constructing information cascades between Wikipedia articles using editing activity, we are able to construct an alternative linking structure in comparison to the em...
9 Citations
Aug 1, 2015 in VLDB (Very Large Data Bases)
#1 Senjuti Basu Roy (UW: University of Washington), H-Index: 18
#2 Ioanna Lykourentzou (IRIA: French Institute for Research in Computer Science and Automation), H-Index: 15
Last. Gautam Das (UTA: University of Texas at Arlington), H-Index: 47
(5 authors in total)
We present SmartCrowd, a framework for optimizing task assignment in knowledge-intensive crowdsourcing (KI-C). SmartCrowd distinguishes itself by formulating, for the first time, the problem of worker-to-task assignment in KI-C as an optimization problem, by proposing efficient adaptive algorithms to solve it and by accounting for human factors, such as worker expertise, wage requirements, and availability inside the optimization process. We present rigorous theoretical analyses of the task assi...
56 Citations
Nov 24, 2014 in DISC (International Symposium on Distributed Computing)
#1 Zheng Zheng Ouyang, H-Index: 1
Evaluation and performance analysis of an online collaborative project are never easy tasks because the massive human involvement and other qualitative factors are hard to assess. To figure out the relationship between human-related factors and the quality of collaboration outcomes, we propose an effective formal approach to estimate the human involvement in the collaboration process and test our method on 100 articles extracted from Wikipedia and Scholarpedia, the qualities of whose historical cont...
1 Citation
#1 Simon DeDeo, H-Index: 17
Group-level cognitive states are widely observed in human social systems, but their discussion is often ruled out a priori in quantitative approaches. In this paper, we show how reference to the irreducible mental states and psychological dynamics of a group is necessary to make sense of large scale social phenomena. We introduce the problem of mental boundaries by reference to a classic problem in the evolution of cooperation. We then provide an explicit quantitative example drawn from ongoing ...
7 Citations