A comparison of methods for collecting web citation data for academic organizations

Published on Aug 1, 2011 in Journal of the Association for Information Science and Technology 2.738
· DOI: 10.1002/asi.21571
Mike Thelwall (University of Wolverhampton), Estimated H-index: 65
Pardeep Sud (University of Wolverhampton), Estimated H-index: 9
The primary webometric method for estimating the online impact of an organization is to count links to its website. Link counts have been available from commercial search engines for over a decade but this was set to end by early 2012 and so a replacement is needed. This article compares link counts to two alternative methods: URL citations and organization title mentions. New variations of these methods are also introduced. The three methods are compared against each other using Yahoo!. Two of the three methods (URL citations and organization title mentions) are also compared against each other using Bing. Evidence from a case study of 131 UK universities and 49 US Library and Information Science (LIS) departments suggests that Bing's Hit Count Estimates (HCEs) for popular title searches are not useful for webometric research but that Yahoo!'s HCEs for all three types of search and Bing's URL citation HCEs seem to be consistent. For exact URL counts the results of all three methods in Yahoo! and both methods in Bing are also consistent. Four types of accuracy factors are also introduced and defined: search engine coverage, search engine retrieval variation, search engine retrieval anomalies, and query polysemy. © 2011 Wiley Periodicals, Inc.
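The three data collection methods named in the abstract differ mainly in the query sent to the search engine. The sketch below shows one way such queries could be assembled. The operator syntax (`linkdomain:`, `-site:`), the function name `build_queries`, and the example domain and title are illustrative assumptions, not details taken from the abstract, and real search engines may accept different operators.

```python
def build_queries(domain: str, title: str) -> dict[str, str]:
    """Return one illustrative query string per method for an organization.

    Assumption: the operators used here (linkdomain:, -site:) are
    hypothetical examples of search engine syntax, not confirmed by the source.
    """
    # Exclude the organization's own pages so only external mentions count.
    exclude_self = f"-site:{domain}"
    return {
        # Link count: pages that hyperlink to the organization's site.
        "link_count": f"linkdomain:{domain} {exclude_self}",
        # URL citation: pages that mention the URL as text, linked or not.
        "url_citation": f'"{domain}" {exclude_self}',
        # Title mention: pages that mention the organization's name.
        "title_mention": f'"{title}" {exclude_self}',
    }


if __name__ == "__main__":
    queries = build_queries("wlv.ac.uk", "University of Wolverhampton")
    for method, query in queries.items():
        print(f"{method}: {query}")
```

A title such as "University of Wolverhampton" is fairly distinctive, but a short or ambiguous title would match unrelated pages, which is the query polysemy factor the abstract lists.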
  • References (70)
  • Citations (54)
#1 Mike Thelwall (University of Wolverhampton), H-Index: 65
Purpose – Link analysis is an established topic within webometrics. It normally uses counts of links between sets of web sites or to sets of web sites. These link counts are derived from web crawlers or commercial search engines with the latter being the only alternative for some investigations. This paper compares link counts with URL citation counts in order to assess whether the latter could be a replacement for the former if the major search engines withdraw their advanced hyperlink search f...
25 Citations
#1 Liwen Vaughan (UWO: University of Western Ontario), H-Index: 24
#2 Justin You, H-Index: 4
Web hyperlink analysis has been a key topic of Webometric research. However, inlink data collection from commercial search engines has been limited to only one source in recent years, which is not a promising prospect for the future development of the field. We need to tap into other Web data sources and to develop new methods. Toward this end, we propose a new Webometrics concept that is based on words rather than inlinks on Webpages. We propose that word co-occurrences on Webpages can be a mea...
30 Citations
#1 Yves Gingras (UQAM: Université du Québec à Montréal), H-Index: 33
#2 Matthew L. Wallace (UQAM: Université du Québec à Montréal), H-Index: 7
We propose a comprehensive bibliometric study of the profile of Nobel Prize winners in chemistry and physics from 1901 to 2007, based on citation data available over the same period. The data allows us to observe the evolution of the profiles of winners in the years leading up to—and following—nominations and awarding of the Nobel Prize. The degree centrality and citation rankings in these fields confirm that the Prize is awarded at the peak of the winners’ citation history, despite a brief Halo...
36 Citations
#1 Kayvan Kousha (UT: University of Tehran), H-Index: 23
#2 Mike Thelwall (Information Technology University), H-Index: 65
Last. Somayeh Rezaie (Shahid Beheshti University), H-Index: 3
Previous research has shown that citation data from different types of Web sources can potentially be used for research evaluation. Here we introduce a new combined Integrated Online Impact (IOI) indicator. For a case study, we selected research articles published in the Journal of the American Society for Information Science & Technology (JASIST) and Scientometrics in 2003. We compared the citation counts from Web of Science (WoS) and Scopus with five online sources of citation data including G...
52 Citations
Purpose – The purpose of this paper is to provide an alternative, although complementary, system for the evaluation of the scholarly activities of academic organizations, scholars and researchers, based on web indicators, in order to speed up the change of paradigm in scholarly communication towards a new fully electronic twenty‐first century model. Design/methodology/approach – In order to achieve these goals, a new set of web indicators has been introduced, obtained mainly from data gathered fr...
44 Citations
In this study we investigated the stemming mechanisms of Google. We used its web interface and submitted many queries via a program. Stemming is the process of correlating morphologically similar words with one another. Search engines use stemming to match documents having one form of a word with queries having another form of the same word. We investigated the stemming mechanism of Google for three classes of words: singulars/plurals, combined words, and verbs with many postfixes. Our results i...
15 Citations
#1 Judit Bar-Ilan (BIU: Bar-Ilan University), H-Index: 35
#2 Bluma C. Peritz (HUJI: Hebrew University of Jerusalem), H-Index: 13
The universe of information has been enriched by the creation of the World Wide Web, which has become an indispensable source for research. Since this source is growing at an enormous speed, an in-depth look at its performance to create a method for its evaluation has become necessary; however, growth is not the only process that influences the evolution of the Web. During their lifetime, Web pages may change their content and links to and from other Web pages, be duplicated or moved to a different ...
11 Citations
This study investigates the accuracy of search engine hit counts for search queries. We investigate the accuracy of hit counts for Google, Yahoo and Microsoft Live Search, and the accuracy of single and multiple term queries. In addition, we investigate the consistency of hit count estimates for 15 days. The results show that all three provide estimates for the number of matching documents and the estimation patterns of their counting algorithms differ greatly. The accuracy of hit counts for mul...
51 Citations
#1 Kayvan Kousha (UT: University of Tehran), H-Index: 23
#2 Mike Thelwall (Information Technology University), H-Index: 65
In both the social sciences and the humanities, books and monographs play significant roles in research communication. The absence of citations from most books and monographs from the Thomson Reuters-Institute for Scientific Information databases (ISI) has been criticized, but attempts to include citations from or to books in the research evaluation of the social sciences and humanities have not led to widespread adoption. This article assesses whether Google Book Search (GBS) can partially fill...
78 Citations
#1 Judit Bar-Ilan (BIU: Bar-Ilan University), H-Index: 35
#2 Bluma C. Peritz (HUJI: Hebrew University of Jerusalem), H-Index: 13
The World Wide Web is growing at an enormous speed, and has become an indispensable source for information and research. New pages are constantly added, but there are additional processes as well: pages are moved or removed and/or their content changes. We report here the results of an eight year long project started in 1998, when multiple search engines were used to identify a set of pages containing the term informetrics. Data collection was repeated once a year for the last eight years (with ...
27 Citations
Cited By (54)
The purpose of this study is to analyse the correlation between content and traffic of 21,485 academic websites (universities and research institutes). The achieved result is used as an indicator which shows the performance of the websites for attracting more visitors. This inspires a best practice for developing new websites or promoting the traffic of the existing websites. At the first step, content of the site is divided into three major items which are: Size, Papers and Rich Files. Then, th...
1 Citation
This paper reports on the real-time public responses to the eruption disaster on Mt. Kusatsu-Shirane in January 2018. This volcanic disaster attracted significant media coverage, with many people again acknowledging the risk of volcanic hazards from the many news reports. These days, as it has become increasingly common for people to express their opinions using online sites, online reader comments can be important sources of public opinion. This paper used a content analysis approach t...
#1 Junwei Ma, H-Index: 1
#2 Jianhua Wang, H-Index: 1
Last. Philip Szmedra, H-Index: 1
The identification of a sustainable competitive position in enterprises is a common concern of information science and sustainable development research. Achievement of sustainable competitive position and superior performance is the first priority of business organizations. However, the existing research focuses on technology competition and light market competition. The research proposes a comprehensive framework for identifying an enterprise’s sustainable competitive position based on a two-di...
1 Citation
#1 Rafael Ball, H-Index: 1
Bibliometric indicators form the basis for measuring scientific research. In the 1960s, Eugene Garfield first developed the Impact Factor, which was conceived as a tool for librarians and library management. Quickly, however, this indicator was used to assess the quality of journals and the scientific articles it contained. The other basic indicators are the amount of scientific output and the citation rate, as well as an immense amount of derivatives of these indicators. In addition to the Hirs...
1 Citation
Numerous web co-link studies have analyzed a wide variety of websites ranging from those in the academic and business arena to those dealing with politics and governments. Such studies uncover rich information about these organizations. In recent years, however, there has been a dearth of co-link analysis, mainly due to the lack of sources from which co-link data can be collected directly. Although several commercial services such as Alexa provide inlink data, none provide co-link data. ...
#1 Cristina I. Font-Julian (Polytechnic University of Valencia), H-Index: 1
#2 José-Antonio Ontalba-Ruipérez (Polytechnic University of Valencia), H-Index: 4
Last. Enrique Orduña-Malea (Polytechnic University of Valencia), H-Index: 13
Purpose – The purpose of this paper is to determine the effect of the chosen search engine results page (SERP) on the website-specific hit count estimation indicator. Design/methodology/approach – A sample of 100 Spanish rare disease association websites is analysed, obtaining the website-specific hit count estimation for the first and last SERPs in two search engines (Google and Bing) at two different periods in time (2016 and 2017). Findings – It has been empirically demonstrated that there are diff...
1 Citation
The main objectives of this chapter are the extraction and analysis of a wide range of web metrics (size, mention, usage, and formal aspects) relating to the web spaces of a sample of 184 international biotechnology companies, using a set of horizontal web sources. The central theme of the chapter is not the analysis of the biotechnology sector in itself but rather the study of the properties of the web metrics obtained from various statistical analyses. The results demonstrate the enormous depe...
#1 Carlos Olmeda-Gómez (ISCIII: Carlos III Health Institute), H-Index: 8
#2 María Antonia Ovalle-Perandones (ISCIII: Carlos III Health Institute), H-Index: 1
Last. Antonio Perianes-Rodríguez (ISCIII: Carlos III Health Institute), H-Index: 11
This paper discusses the thematic backdrop for Spanish library and information science output. It draws from Web of Science records on papers authored by researchers at Spanish institutions and published under the category ‘Information Science & Library Science’ between 1985 and 2014. Two analytical techniques were used, one based on co-keyword and the other on document co-citation networks. Burst detection was applied to noun phrases and references of the intellectual base. Co-citation analysis...
10 Citations
#1 Enrique Orduña-Malea (Polytechnic University of Valencia), H-Index: 13
#2 Mike Thelwall, H-Index: 65
Last. Kayvan Kousha, H-Index: 23
Patents sometimes cite webpages either as general background to the problem being addressed or to identify prior publications that limit the scope of the patent granted. Counts of the number of patents citing an organization's website may therefore provide an indicator of its technological capacity or relevance. This article introduces methods to extract URL citations from patents and evaluates the usefulness of counts of patent web citations as a technology indicator. An analysis of patents cit...
4 Citations