
Types of DOI errors of cited references in Web of Science with a cleaning method

Published on Jul 11, 2019 in Scientometrics 2.77
· DOI: 10.1007/s11192-019-03162-4
Shuo Xu (Beijing University of Technology), Liyuan Hao (Beijing University of Technology), + 2 authors, Hongshen Pang (SZU: Shenzhen University; estimated H-index: 1)
Abstract
Although bibliographic databases such as Web of Science (WoS) have greatly promoted the development of scientometrics and informetrics, they are not free of errors. The main purpose of this work is to determine which types of DOI errors occur in cited references, how often each type occurs, and whether these errors can be corrected automatically. After careful analysis, several classic DOI errors of cited references, such as prefix-, suffix- and other-type errors, are identified. Then, a cleaning method based on regular expressions is put forward. Experimental results on bibliographic data in the gene editing field from the WoS database indicate that our cleaning approach can largely improve the quality of DOI names of cited references.
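As a rough illustration of what a regular-expression-based cleaning step might look like, the sketch below strips some leading and trailing debris from a DOI string and then checks its overall shape. The patterns, the function name `clean_doi`, and the example defects are assumptions made for illustration; the abstract does not spell out the paper's actual rules.

```python
import re

# Hypothetical patterns for illustration only; not the rules from the paper.
# Leading debris: a "doi" label or a resolver URL in front of the "10." prefix.
_LEADING = re.compile(
    r"^\s*(?:https?://(?:dx\.)?doi\.org/|doi\s*:?\s*)+", re.IGNORECASE
)
# Trailing debris: whitespace or punctuation glued to the end of the suffix.
# Assumption: a real DOI suffix does not end in one of these characters.
_TRAILING = re.compile(r"[\s.,;]+$")
# Minimal shape check: "10.", a numeric registrant code, "/", a non-empty suffix.
_SHAPE = re.compile(r"^10\.\d{4,9}(?:\.\d+)*/\S+$")


def clean_doi(raw):
    """Return a normalized DOI string, or None if it still looks malformed."""
    doi = _LEADING.sub("", raw)
    doi = _TRAILING.sub("", doi)
    return doi if _SHAPE.match(doi) else None
```

Under these assumptions, `clean_doi("DOI 10.1007/s11192-019-03162-4")` returns the bare DOI name, while a string that lacks the `10.` prefix altogether returns `None` and would need a different repair strategy or manual review.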
References (26)
Published on Jun 1, 2019 in Scientometrics 2.77
Junwen Zhu (ECNU: East China Normal University; estimated H-index: 2),
Fang Liu (Zhejiang University of Finance and Economics; estimated H-index: 3),
Weishu Liu (Zhejiang University of Finance and Economics; estimated H-index: 8)
With the flourishing of scientific literature, the Digital Object Identifier (DOI) is increasingly adopted in academia to uniquely identify research articles. By using Web of Science’s DOI search, we find that millions of DOI names appear to begin with an alphabetic character, which violates the naming rule of DOI. In this study, we try to uncover the secrets behind Web of Science’s DOI search and explain this mysterious phenomenon. A similar investigation is also conducted by using the Scopus datab...
Published on Apr 1, 2019 in Scientometrics 2.77
Erwin Krauskopf (Andrés Bello National University; estimated H-index: 7)
In a recent paper, a group of researchers estimated various bibliometric indicators for the Spanish journal Enfermeria Nefrologica using the software “Publish or Perish”, retrieving data exclusively from Google Scholar. Since their study revealed an unusually high number of citations for the documents published by the journal, we became interested in repeating the bibliometric analysis using data from Scopus. Surprisingly, our analysis revealed a high variability in the number of documents publish...
Published on Feb 1, 2019 in Scientometrics 2.77
Junwen Zhu (ECNU: East China Normal University; estimated H-index: 2),
Guangyuan Hu (SUFE: Shanghai University of Finance and Economics; estimated H-index: 7),
Weishu Liu (Zhejiang University of Finance and Economics; estimated H-index: 8)
As a unique and permanent alphanumeric string for identifying objects, the digital object identifier (DOI) has been increasingly used to identify academic publications. Previous studies have reported the incorrect assignment of a single DOI name to multiple papers in the Scopus database, yet it remains unknown whether this also holds in other datasets. In this paper we found that incorrect DOI names are also problematic in the Web of Science, but with different errors of duplicate DOI names. Tentative solutions are...
Published on Oct 1, 2018 in Scientometrics 2.77
Shuo Xu (Beijing University of Technology; estimated H-index: 1),
Junwan Liu (Beijing University of Technology; estimated H-index: 1),
+ 3 authors,
Hongshen Pang (SZU: Shenzhen University; estimated H-index: 1)
It is increasingly important to automatically identify thematic structures from massive scientific literature. Interdisciplinarity leaves thematic structures without natural boundaries. In this work, the identification of thematic structures is regarded as an overlapping community detection problem on a large-scale citation-link network. A mixed-membership stochastic blockmodel, armed with a stochastic variational inference algorithm, is utilized to detect the overlapping thematic structure...
Published on Aug 1, 2018 in Journal of Informetrics 3.88
Weishu Liu (Zhejiang University of Finance and Economics; estimated H-index: 8),
Guangyuan Hu (SUFE: Shanghai University of Finance and Economics; estimated H-index: 7),
Li Tang (Fudan University; estimated H-index: 13)
Bibliometric analysis is increasingly used to evaluate and compare research performance across geographical regions. However, the problem of missing information from author addresses has not attracted sufficient attention from scholars and practitioners. This study probes the missing data problem in the three core journal citation databases of Web of Science (WoS). Our findings reveal that from 1900 to 2015 over one-fifth of the publications indexed in WoS have completely missing inform...
Published on Mar 1, 2017
Li Tang (Fudan University; estimated H-index: 13),
Guangyuan Hu (SUFE: Shanghai University of Finance and Economics; estimated H-index: 7),
Weishu Liu (SJTU: Shanghai Jiao Tong University; estimated H-index: 8)
Thomson Reuters's Web of Science (WoS) began systematically collecting acknowledgment information in August 2008. Since then, bibliometric analysis of funding acknowledgment (FA) has been growing and has aroused intense interest and attention from both academia and policy makers. Examining the distribution of FA by citation index database, by language, and by acknowledgment type, we noted coverage limitations and potential biases in each analysis. We argue that despite its great value, bibliometric ...
Published on Mar 1, 2017 in Scientometrics 2.77
Christophe Boudry (CNAM: Conservatoire national des arts et métiers; estimated H-index: 6),
Ghislaine Chartron (CNAM: Conservatoire national des arts et métiers; estimated H-index: 4)
Digital object identifiers (DOIs) were launched in 1997 to facilitate the long-term access and identification of objects in digital environments. The objective of the present investigation is to assess the DOI availability of articles in biomedical journals indexed in the PubMed database and to complete this investigation with a geographical analysis of journals by the country of publisher. Articles were randomly selected from PubMed using their PubMed identifier and were downloaded from and pro...
Published on Apr 19, 2016 in PLOS ONE 2.78
Markus Goldstein (Kyushu University; estimated H-index: 8),
Seiichi Uchida (Kyushu University; estimated H-index: 20)
Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Do...
Published on Feb 1, 2016 in Journal of Informetrics 3.88
Fiorenzo Franceschini (Polytechnic University of Turin; estimated H-index: 24),
Domenico Augusto Francesco Maisano (Polytechnic University of Turin; estimated H-index: 18),
Luca Mastrogiacomo (Polytechnic University of Turin; estimated H-index: 16)
Recent studies have shown that the Scopus bibliometric database is probably less accurate than one thinks. As further evidence of this fact, this paper presents a structured collection of several weird typologies of database errors, which can therefore be classified as horrors. Some of them concern the incorrect indexing of so-called Online-First papers, duplicate publications, and the missing/incorrect indexing of references. A crucial point is that most of these errors could probably be avoid...
Cited By (0)
Published in Journal of Informetrics 3.88
Feifei Wang (Beijing University of Technology; estimated H-index: 1),
Chenran Jia (Beijing University of Technology), + -3 Authors, Chenyuyan Yang
Constructing academic networks to explore intellectual structure and realize academic community detection, which can promote scientific research innovation and discipline progress, constitutes an important research topic. In this study, tripartite citation is fused with co-citation and coupling relations as a way of weighting the strength of direct citations, and all-author tripartite citation networks were constructed to reflect the contributions of all authors to the resulting publications. F...