Data-driven predictions in the science of science

Published on Feb 3, 2017 in Science (impact factor 41.04) · DOI: 10.1126/science.aal4217
Aaron Clauset (Estimated H-index: 30; CU: University of Colorado Boulder)
Daniel B. Larremore (Estimated H-index: 14; SFI: Santa Fe Institute)
Roberta Sinatra (Estimated H-index: 16; NU: Northeastern University)
Abstract
The desire to predict discoveries—to have some idea, in advance, of what will be discovered, by whom, when, and where—pervades nearly all aspects of modern science, from individual scientists to publishers, from funding agencies to hiring committees. In this Essay, we survey the emerging and interdisciplinary field of the “science of science” and what it teaches us about the predictability of scientific discovery. We then discuss future opportunities for improving predictions derived from the science of science and its potential impact, positive and negative, on the scientific community.
References (25)
Samuel F. Way (Estimated H-index: 6; CU: University of Colorado Boulder)
Allison C. Morgan (Estimated H-index: 2; CU: University of Colorado Boulder)
+ 1 author
Daniel B. Larremore (Estimated H-index: 14; CU: University of Colorado Boulder)
A scientist may publish tens or hundreds of papers over a career, but these contributions are not evenly spaced in time. Sixty years of studies on career productivity patterns in a variety of fields suggest an intuitive and universal pattern: Productivity tends to rise rapidly to an early peak and then gradually declines. Here, we test the universality of this conventional narrative by analyzing the structures of individual faculty productivity time series, constructed from over 200,000 publicat...
Published in 2016 in arXiv: Digital Libraries
Filippo Radicchi (Estimated H-index: 27)
Alexander Weissman (Estimated H-index: 4)
Johan Bollen (Estimated H-index: 29)
Citations are commonly held to represent scientific impact. To date, however, there is no empirical evidence in support of this postulate that is central to research assessment exercises and Science of Science studies. Here, we report on the first empirical verification of the degree to which citation numbers represent scientific impact as it is actually perceived by experts in their respective field. We run a large-scale survey of about 2000 corresponding authors who performed a pairwise impact...
Published on Nov 4, 2016 in Science (impact factor 41.04)
Roberta Sinatra (Estimated H-index: 16; NU: Northeastern University)
Dashun Wang (Estimated H-index: 14; NU: Northwestern University)
+ 2 authors
Albert-László Barabási (Estimated H-index: 115)
Are there quantifiable patterns behind a successful scientific career? Sinatra et al. analyzed the publications of 2887 physicists, as well as data on scientists publishing in a variety of fields. When productivity (which is usually greatest early in the scientist's professional life) is accounted for, the paper with the greatest impact occurs randomly in a scientist's career. However, the process of generating a high-impact paper is not an entirely random one. The authors developed a quantitati...
Published on Oct 26, 2016 in Nature (impact factor 43.07)
Brendan Maher (Estimated H-index: 12)
Miquel Sureda Anfres (Estimated H-index: 1)
Published on Oct 1, 2016 in Management Science (impact factor 4.22)
Kevin J. Boudreau (Estimated H-index: 5; Harvard University)
Eva C. Guinan (Estimated H-index: 43; Harvard University)
+ 1 author
Christoph Riedl (Estimated H-index: 15; NU: Northeastern University)
Selecting among alternative projects is a core management task in all innovating organizations. In this paper, we focus on the evaluation of frontier scientific research projects. We argue that the "intellectual distance" between the knowledge embodied in research proposals and an evaluator's own expertise systematically relates to the evaluations given. To estimate relationships, we designed and executed a grant proposal process at a leading research university in which we randomized the assign...
Published on Sep 1, 2016 in Royal Society Open Science (impact factor 2.52)
Paul E. Smaldino (Estimated H-index: 13; UC: University of California)
Richard McElreath (Estimated H-index: 35; MPG: Max Planck Society)
Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career ...
Samuel F. Way (Estimated H-index: 6; CU: University of Colorado Boulder)
Daniel B. Larremore (Estimated H-index: 14; SFI: Santa Fe Institute)
Aaron Clauset (Estimated H-index: 30; CU: University of Colorado Boulder)
Women are dramatically underrepresented in computer science at all levels in academia and account for just 15% of tenure-track faculty. Understanding the causes of this gender imbalance would inform both policies intended to rectify it and employment decisions by departments and individuals. Progress in this direction, however, is complicated by the complexity and decentralized nature of faculty hiring and the non-independence of hires. Using comprehensive data on both hiring outcomes and schola...
Published on Jan 1, 2016 in California Law Review (impact factor 3.33)
Solon Barocas (Estimated H-index: 10; Microsoft)
Andrew D. Selbst (Estimated H-index: 4; Yale University)
Advocates of algorithmic techniques like data mining argue that these techniques eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with. Data is frequently imperfect in ways that allow these algorithms to inherit the prejudices of prior decision makers. In other cases, data may simply reflect the widespread biases that persist in society at large. In still others, data mining can discover surprisingly useful regularities that are reall...
Published on Nov 16, 2015 in PLOS ONE (impact factor 2.78)
João Moreira (Estimated H-index: 6; NU: Northwestern University)
Xiao Han T. Zeng (Estimated H-index: 6; NU: Northwestern University)
Luís A. Nunes Amaral (Estimated H-index: 56)
How to quantify the impact of a researcher’s or an institution’s body of work is a matter of increasing importance to scientists, funding agencies, and hiring committees. The use of bibliometric indicators, such as the h-index or the Journal Impact Factor, have become widespread despite their known limitations. We argue that most existing bibliometric indicators are inconsistent, biased, and, worst of all, susceptible to manipulation. Here, we pursue a principled approach to the development of a...
Published on Oct 1, 2015 in Social Networks (impact factor 2.95)
Feng Shi (Estimated H-index: 5; U of C: University of Chicago)
Jacob G. Foster (Estimated H-index: 9; UCLA: University of California, Los Angeles)
James A. Evans (Estimated H-index: 17; U of C: University of Chicago)
Science is a complex system. Building on Latour's actor network theory, we model published science as a dynamic hypergraph and explore how this fabric provides a substrate for future scientific discovery. Using millions of abstracts from MEDLINE, we show that the network distance between biomedical things (i.e., people, methods, diseases, chemicals) is surprisingly small. We then show how science moves from questions answered in one year to problems investigated in the next through a we...
Cited By (32)
Published on Dec 1, 2019 in EPJ Data Science (impact factor 3.26)
Weihua Li (Estimated H-index: 4; UCL: University College London)
Tomaso Aste (Estimated H-index: 29; UCL: University College London)
+ 1 author
Giacomo Livan (Estimated H-index: 8; UCL: University College London)
The growing importance of citation-based bibliometric indicators in shaping the prospects of academic careers incentivizes scientists to boost the numbers of citations they receive. Whereas the exploitation of self-citations has been extensively documented, the impact of reciprocated citations has not yet been studied. We study reciprocity in a citation network of authors, and compare it with the average reciprocity computed in an ensemble of null network models. We show that obtaining citations...
Published on Dec 1, 2019 in EPJ Data Science (impact factor 3.26)
Alberto Aleta (Estimated H-index: 3; University of Zaragoza)
Sandro Meloni (Estimated H-index: 1; University of Zaragoza)
+ 1 author
Yamir Moreno (Estimated H-index: 53; University of Zaragoza)
In the book The Essential Tension (1979) Thomas Kuhn described the conflict between tradition and innovation in scientific research—i.e., the desire to explore new promising areas, counterposed to the need to capitalize on the work done in the past. While it is probable that along their careers many scientists felt this tension, only few works have tried to quantify it. Here, we address this question by analyzing a large-scale dataset, containing all the papers published by the American Physical...
Published on Dec 10, 2018 in arXiv: Physics and Society
An Zeng (Estimated H-index: 17)
Zhesi Shen (Estimated H-index: 6)
+ 5 authors
Shlomo Havlin (Estimated H-index: 99)
We analyze the publication records of individual scientists, aiming to quantify the topic switching dynamics of scientists and its influence. For each scientist, the relations among her publications are characterized via shared references. We find that the co-citing network of the papers of a scientist exhibits a clear community structure where each major community represents a research topic. Our analysis suggests that scientists tend to have a narrow distribution of the number of topics. Howev...
Vahan Nanumyan (Estimated H-index: 3)
Christian Zingg (ETH Zurich)
Frank Schweitzer (Estimated H-index: 38; ETH Zurich)
To what extent is the citation rate of new papers influenced by the past social relations of their authors? To answer this question, we present a data-driven analysis of nine different physics journals. Our analysis is based on a two-layer network representation constructed from two large-scale data sets, INSPIREHEP and APS. The social layer contains authors as nodes and coauthorship relations as links. This allows us to quantify the social relations of each author, prior to the publication of a...
Published on Sep 1, 2019 in Technological Forecasting and Social Change (impact factor 3.81)
Jianguo Xu (Estimated H-index: 1; National University of Defense Technology)
Lixiang Guo (National University of Defense Technology)
+ 2 authors
Li Mengjun (Estimated H-index: 2; National University of Defense Technology)
It is imperative and arduous to acquire product and business intelligence of global technical market. In this paper, a deep learning methodology is proposed to automatically extract and discover vital technical information from large-scale news dataset. More specifically, six kinds of technical elements are first defined to provide the concrete syntax information. Next, the CRF-BiLSTM approach is used to automatically extract technical entities, in which a conditional random field (CRF)...
Brian C. Thomas (Estimated H-index: 47)
Harley Thronson (Estimated H-index: 4)
+ 5 authors
Giulio Varsi (Estimated H-index: 3)
Science funding agencies (NASA, DOE, and NSF), the science community, and the US taxpayer have all benefited enormously from the several-decade series of National Academies Decadal Surveys. These Surveys are one of the primary means whereby these agencies may align multi-year strategic priorities and funding to guide the scientific community. They comprise highly regarded subject matter experts whose goal is to develop a set of science and program priorities that are recommended for major invest...
Published on May 1, 2019 in Scientometrics (impact factor 2.77)
Rasmus Bjørk (Estimated H-index: 3; DTU: Technical University of Denmark)
Nobel Laureates are used as a proxy to study at what age scientists produce their most groundbreaking work. We determine the average age of Nobel Laureates at the time that their Prize-winning research was conducted. This is done using the Advanced Information document with scientific background information published by the Nobel Foundation for every awarded Nobel Prize since 1995 for physics and economics, 2000 for chemistry and 2006 for physiology or medicine. For all Laureates their average a...
Published on May 1, 2019 in Perspectives on Psychological Science (impact factor 8.19)
Thomas T. Hills (Estimated H-index: 22; Warw.: University of Warwick)
There are well-understood psychological limits on our capacity to process information. As information proliferation—the consumption and sharing of information—increases through social media and other communications technology, these limits create an attentional bottleneck, favoring information that is more likely to be searched for, attended to, comprehended, encoded, and later reproduced. In information-rich environments, this bottleneck influences the evolution of information via four forces o...
Agnieszka Geras
Grzegorz Siudem (Estimated H-index: 2)
Marek Gagolewski (Estimated H-index: 7; PAN: Polish Academy of Sciences)