Can we use Google Scholar to identify highly-cited documents?

Published on Feb 1, 2017 in Journal of Informetrics (3.879)
DOI: 10.1016/j.joi.2016.11.008
Alberto Martín-Martín (UGR: University of Granada), Estimated H-index: 10
Enrique Orduña-Malea (Polytechnic University of Valencia), Estimated H-index: 13
+ 1 author
Emilio Delgado López-Cózar (UGR: University of Granada), Estimated H-index: 22
The main objective of this paper is to empirically test whether the identification of highly-cited documents through Google Scholar is feasible and reliable. To this end, we carried out a longitudinal analysis (1950–2013), running a generic query (filtered only by year of publication) to minimise the effects of academic search engine optimisation. This gave us a final sample of 64,000 documents (1,000 per year). The strong correlation between a document’s citations and its position in the search results (r = −0.67) led us to conclude that Google Scholar is able to identify highly-cited papers effectively. This, combined with Google Scholar’s unique coverage (no restrictions on document type and source), makes the academic search engine an invaluable tool for bibliometric research relating to the identification of the most influential scientific documents. We find evidence, however, that Google Scholar ranks those documents whose language (or geographical web domain) matches the user’s interface language higher than could be expected based on citations. Nonetheless, this language effect and other factors related to Google Scholar’s operation, i.e. the proper identification of versions and the date of publication, only have an incidental impact. They do not compromise the ability of Google Scholar to identify the highly-cited papers.
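The relationship the abstract reports, between a document's position in the results and its citation count, can be illustrated with a small sketch. The abstract does not say which correlation coefficient was used, so Spearman's rank correlation is an assumption here, and the positions and citation counts are invented toy values, not data from the study. Only the sample-size arithmetic (64 years at 1,000 documents each) comes from the abstract.

```python
from statistics import mean

def rankdata(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rankdata(x), rankdata(y))

# Sample size stated in the abstract: one query per year, 1950 to 2013 inclusive.
sample_size = len(range(1950, 2014)) * 1000

# Hypothetical toy data: result position (1 = top) and citation count.
positions = [1, 2, 3, 4, 5, 6]
citations = [900, 950, 400, 420, 100, 150]
rho = spearman(positions, citations)
print(sample_size, round(rho, 2))
```

A negative coefficient means that documents nearer the top of the results (smaller position number) tend to have more citations, which is the direction of the r = −0.67 reported above.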
  • References (56)
  • Citations (26)
#1 Fiorenzo Franceschini (Polytechnic University of Turin), H-Index: 25
#2 Domenico Augusto Francesco Maisano (Polytechnic University of Turin), H-Index: 20
Last: Luca Mastrogiacomo (Polytechnic University of Turin), H-Index: 18
(3 authors in total)
In the last decade, a growing number of studies focused on the qualitative/quantitative analysis of bibliometric-database errors. Most of these studies relied on the identification and (manual) examination of relatively limited samples of errors.
33 Citations
#1 Alberto Martín-Martín (UGR: University of Granada), H-Index: 10
#2 Enrique Orduña-Malea (Polytechnic University of Valencia), H-Index: 13
Last: Emilio Delgado López-Cózar (UGR: University of Granada), H-Index: 22
(4 authors in total)
A study released by the Google Scholar team found an apparently increasing fraction of citations to old articles from studies published in the last 24 years (1990–2013). To demonstrate this finding we conducted a complementary study using a different data source (Journal Citation Reports), metric (aggregate cited half-life), time span (2003–2013), and set of categories (53 Social Science subject categories and 167 Science subject categories). Although the results obtained confirm and reinfor...
6 Citations
#1 Anne-Wil Harzing (Middlesex University), H-Index: 46
#2 Satu Alakangas (University of Melbourne), H-Index: 4
This article aims to provide a systematic and comprehensive comparison of the coverage of the three major bibliometric databases: Google Scholar, Scopus and the Web of Science. Based on a sample of 146 senior academics in five broad disciplinary areas, we therefore provide both a longitudinal and a cross-disciplinary comparison of the three databases. Our longitudinal comparison of eight data points between 2013 and 2015 shows a consistent and reasonably stable quarterly growth for both publicat...
203 Citations
#1 Enrique Orduña-Malea (Polytechnic University of Valencia), H-Index: 13
#2 Juan Manuel Ayllon (UGR: University of Granada), H-Index: 6
Last: Emilio Delgado López-Cózar (UGR: University of Granada), H-Index: 22
(4 authors in total)
The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose various methods to estimate the current size (number of indexed documents) of Google Scholar (May 2014) and to determine its validity, precision and reliability. To do this, we present, apply and discuss three empirical method...
40 Citations
#1 Simona Ştirbu (University of Liège), H-Index: 1
#2 Paul Thirion (University of Liège), H-Index: 4
Last: Ninfa Greco (University of Liège), H-Index: 3
(5 authors in total)
This study aims to highlight what benefits, if any, Google Scholar (GS) has for academic literature searches in the field of geography, compared to three commercial bibliographic databases: Web of Science (WoS), FRANCIS (multidisciplinary databases) and GeoRef (specialized in geosciences). This study focuses exclusively on evaluating the results, and not the features, of GS and the databases under examination. To ensure a valid comparison, identical bibliographic searches were applied u...
11 Citations
250 Citations
#1 Madian Khabsa (PSU: Pennsylvania State University), H-Index: 13
#2 C. Lee Giles (PSU: Pennsylvania State University), H-Index: 66
The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available since they do not require a subscription or payment of any k...
134 Citations
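The capture/recapture idea mentioned in the entry above can be sketched in a few lines. The excerpt does not name the estimator the authors used, so the two-sample Lincoln-Petersen estimator below is an assumption chosen because it is the simplest form of the method. The function name and the toy counts are hypothetical. Only the final percentage check uses figures quoted in the excerpt (27 million out of 114 million).

```python
def lincoln_petersen(n1, n2, overlap):
    """Estimate a population size from two samples and their overlap.

    n1, n2  -- number of items found by each source (e.g. two search engines)
    overlap -- number of items found by both sources
    """
    if overlap <= 0:
        raise ValueError("the estimator needs a non-empty overlap")
    return n1 * n2 / overlap

# Hypothetical toy counts: source A indexes 80 documents, source B indexes 60,
# and 40 documents appear in both.
estimated_total = lincoln_petersen(80, 60, 40)

# Check of the share quoted in the excerpt: 27 million of 114 million documents.
free_share_percent = round(27 / 114 * 100)
print(estimated_total, free_share_percent)
```

The estimator assumes that the two sources sample documents independently, which is a strong assumption for search engines that crawl overlapping parts of the web.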
#1 Emilio Delgado López-Cózar (UGR: University of Granada), H-Index: 22
#2 Nicolás Robinson-García (UGR: University of Granada), H-Index: 15
Last: Daniel Torres-Salinas (University of Navarra), H-Index: 19
(3 authors in total)
Google Scholar has been well received by the research community. Its promises of free, universal, and easy access to scientific literature coupled with the perception that it covers the social sciences and the humanities better than other traditional multidisciplinary databases have contributed to the quick expansion of Google Scholar Citations and Google Scholar Metrics: 2 new bibliometric products that offer citation data at the individual level and at journal level. In this article, we show t...
95 Citations
#1 Enrique Orduña-Malea (Polytechnic University of Valencia), H-Index: 13
#2 Emilio Delgado López-Cózar (UGR: University of Granada), H-Index: 22
In November 2012 the Google Scholar Metrics (GSM) journal rankings were updated, making it possible to compare bibliometric indicators in the ten languages indexed (and their stability) with the April 2012 version. The h-index and h-5 median of 1,000 journals were analysed, comparing their averages, maximum and minimum values and the correlation coefficient within rankings. The bibliometric figures grew significantly. In just seven and a half months the h-index of the journals increased by 15% ...
23 Citations
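The two indicators named in the entry above can be computed as follows. This is a minimal sketch: it takes the h-index as the largest h such that h articles have at least h citations each, and the h-median as the median citation count of those h articles. The function names and citation counts are hypothetical, and the input is assumed to be already restricted to the five-year window that the "5" in h5 refers to.

```python
from statistics import median

def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def h_core_median(citations):
    """Median citation count of the articles that make up the h-core."""
    ranked = sorted(citations, reverse=True)
    core = ranked[:h_index(citations)]
    return median(core) if core else 0

# Hypothetical citation counts for one journal's articles.
journal_citations = [10, 8, 5, 4, 3]
print(h_index(journal_citations), h_core_median(journal_citations))
```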
#1 J. C. Winter (TU Delft: Delft University of Technology), H-Index: 3
#2 Amir A. Zadpoor (TU Delft: Delft University of Technology), H-Index: 39
Last: Dimitra Dodou (TU Delft: Delft University of Technology), H-Index: 18
(3 authors in total)
Web of Science (WoS) and Google Scholar (GS) are prominent citation services with distinct indexing mechanisms. Comprehensive knowledge about the growth patterns of these two citation services is lacking. We analyzed the development of citation counts in WoS and GS for two classic articles and 56 articles from diverse research fields, making a distinction between retroactive growth (i.e., the relative difference between citation counts up to mid-2005 measured in mid-2005 and citation counts up t...
109 Citations
Cited By (26)
#1 Bo-Christer Björk, H-Index: 29
Last: J. Tuomas Harviainen (UTA: University of Tampere), H-Index: 7
(3 authors in total)
Predatory journals are Open Access journals of highly questionable scientific quality. Such journals pretend to use peer review for quality assurance, and spam academics with requests for submissions, in order to collect author payments. In recent years predatory journals have received a lot of negative media. While much has been said about the harm that such journals cause to academic publishing in general, an overlooked aspect is how much articles in such journals are actually read and in part...
1 Citations
#1 Omid Mazandarani (IAU: Islamic Azad University), H-Index: 1
This manuscript examines the status of L2 teacher education research over the past 15 years from three perspectives. First, it examines the status of L2 teacher education research vis-a-vis...
As much as 94% of all sales effort results in outcomes that can be perceived as denoting failure. This article presents findings from a systematic review of literature which identifies that...
#1 Casey Eaton (UAH: University of Alabama in Huntsville)
#2 Amanda Banks (UAH: University of Alabama in Huntsville)
Last: Kristin Weger (UAH: University of Alabama in Huntsville)
(4 authors in total)
#1 Viviane Couzinet, H-Index: 2
Last: Icléia Thiesen, H-Index: 1
(3 authors in total)
#1 Cheng Zhang (A&M: Texas A&M University), H-Index: 4
#2 Chao Fan (A&M: Texas A&M University), H-Index: 3
Last: Ali Mostafavi (A&M: Texas A&M University), H-Index: 9
(5 authors in total)
Social media offers participatory and collaborative structure and collective knowledge building capacity to the public information and warning approaches. Therefore, the author envisions the intelligent public information and warning in disaster based on social media, which has three functions: (1) efficiently and effectively acquiring disaster situational awareness information, (2) supporting self-organized peer-to-peer help activities, and (3) enabling the disaster management agencies...
3 Citations
#1 Kenneth Tay (NTU: Nanyang Technological University)
#2 Leonard Tan (NTU: Nanyang Technological University), H-Index: 6
Last: Wilson Wen Bin Goh (NTU: Nanyang Technological University), H-Index: 4
(3 authors in total)
This PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) review examines collective flow experiences in music contexts. Articles (N = 598) were searched using a publicly ava...
#1 Ehsan Jozaghi (UBC: University of British Columbia), H-Index: 10
Many countries around the globe have seen increases in the enrollment of female and visible minorities in postsecondary education. Therefore, it is critical to evaluate whether recent demographic c...
1 Citations