Evaluating the impact of scientific research is a notoriously difficult problem with no standard solution. Nevertheless, making such evaluations has become increasingly important, as more universities, research administrators, research agencies, funding and government organizations and, ultimately, taxpayers want to assess the results of public and private research.
The pressure is on to find a way to measure the value of a researcher’s published work. But this has led to oversimplified and ultimately incorrect methods.
“Technically incorrect use of bibliometric indicators has caused great concern in the scholarly community,” says Gianluca Setti, vice president of IEEE Publication Services and Products.
As an example, some employers are using a single bibliometric indicator of a journal, the Thomson Reuters Impact Factor (IF), as a gauge for evaluating each individual paper published in the journal and the researchers who authored it. Setti points out that the IF was not designed for that purpose. In fact, it was introduced to help librarians decide whether to renew journal subscriptions.
Thanks to the “scientific recognition” attached to a citation, the IF is indeed a legitimate proxy for the relative importance of a journal within its field: the more citations, the higher the IF and the more “important” that journal is.
Currently, the IF of individual publications is being misused, Setti says, when it alone is employed to assess the performance of a researcher not only for salary increases but also for decisions on hiring, promotion, and tenure. In medicine, biology, and other areas, the situation is even worse, according to Setti, because of the practice of computing a single indicator to rank individual performance by totaling (or averaging) the IFs of the publications produced by a scientist in a given period. Doing so has no significance from a bibliometric point of view, he says.
Setti points to several problems with using the IF as a gold standard for assessing the quality of research.
First, the IF of a scholarly journal is a measure reflecting the average number of citations to articles it contains. Yet the number of citations is not evenly distributed but skewed: In each journal, only a few articles receive an appreciable number of citations, while most articles are cited only a few times, if at all, Setti notes.
As a consequence, basic statistics is enough to show that an average measure like the IF, on which the reputation of a journal is based, says very little about the quality (as measured, for example, by the number of citations) of any specific article in that journal.
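The statistical point can be seen in a small sketch. The citation counts below are invented for illustration and do not come from any real journal, and the plain mean is only a stand-in for the IF, whose actual calculation uses a specific citation window.

```python
from statistics import mean, median

# Hypothetical citation counts for the 10 articles of one journal
# (illustrative numbers only, chosen to be skewed).
citations = [0, 0, 1, 1, 2, 2, 3, 4, 12, 75]

# IF-like figure: the average number of citations per article.
journal_average = mean(citations)

# What a "typical" article actually receives.
typical_article = median(citations)

# How many articles fall below the journal-wide average.
below_average = sum(1 for c in citations if c < journal_average)

print(f"average citations per article: {journal_average}")
print(f"median citations per article:  {typical_article}")
print(f"articles below the average:    {below_average} of {len(citations)}")
```

In this made-up sample, one heavily cited article pulls the average well above what most articles receive, which is why the journal-level figure is a poor guide to any single paper.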
Second, the IF has several bibliometric weaknesses that have been criticized by the scientific community. As a result, improved indicators have been introduced.
- The Eigenfactor score, developed by Jevin West and Carl Bergstrom at the University of Washington, in Seattle, computes the ranking of a scientific journal based on an algorithm similar to the one used by Google to rank Web pages as a result of a search. Journals are rated according to the number of citations their articles attract, with citations from highly ranked journals weighted to make a larger contribution to the Eigenfactor. Citations by authors to their own articles are excluded. Furthermore, unlike the IF, the Eigenfactor measures the performance of the journal as a whole; as such, it tends to be larger for journals publishing a substantial number of papers.
- The Article Influence Score is computed by normalizing the Eigenfactor to the number of papers published in a specific journal to obtain a measure of the average impact of an individual article.
- The Scimago Journal Ranking is similar to the Article Influence Score but with the partial inclusion of self-citations, to better evaluate—the thinking goes—the impact of journals that are the sole reference of a small scientific community.
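The idea behind these indicators can be sketched in a few lines. What follows is a simplified, PageRank-style illustration, not the actual Eigenfactor, Article Influence, or Scimago algorithm: the function names, the damping value of 0.85, the citation matrix, and the paper counts are all assumptions made up for the example.

```python
def influence_scores(citations, damping=0.85, iterations=100):
    """Simplified PageRank-style journal ranking (illustrative only).

    citations[i][j] is the number of citations from journal i to journal j.
    """
    n = len(citations)
    # Exclude self-citations by zeroing the diagonal.
    links = [[0 if i == j else citations[i][j] for j in range(n)]
             for i in range(n)]
    out_totals = [sum(row) for row in links]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_totals[i] == 0:
                # A journal that cites nobody spreads its weight evenly.
                for j in range(n):
                    new_scores[j] += damping * scores[i] / n
            else:
                # A citation passes on weight in proportion to the citing
                # journal's own score, so highly ranked journals count more.
                for j in range(n):
                    share = links[i][j] / out_totals[i]
                    new_scores[j] += damping * scores[i] * share
        scores = new_scores
    return scores


def per_article(scores, paper_counts):
    """Normalize whole-journal scores by the number of papers published."""
    return [s / p for s, p in zip(scores, paper_counts)]


# Hypothetical citation matrix for three journals (rows cite columns).
journal_citations = [
    [50, 10, 0],
    [20, 0, 5],
    [30, 10, 0],
]
# Hypothetical number of papers published by each journal.
paper_counts = [200, 50, 50]

scores = influence_scores(journal_citations)
per_paper = per_article(scores, paper_counts)
print("whole-journal scores:", [round(s, 3) for s in scores])
print("per-article scores:  ", [round(s, 5) for s in per_paper])
```

Because the diagonal is dropped, the 50 self-citations of the first journal have no effect on its score in this sketch. Dividing by the paper count shows how a whole-journal score and a per-article score can rank the same journals differently.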
One of the scientific community’s main bibliometric concerns is that a journal’s impact is multidimensional and cannot be captured by any single bibliometric indicator. For instance, the Article Influence Score and the IF need to be considered together rather than either one alone.
Another problem is that the misuse of a single indicator to evaluate the impact of research has led to manipulation, mainly by artificially inflating the number of self-citations. Although citing oneself is legitimate in cases of previous relevant work in the same area or when a scientist is part of a large research group, recently the number of self-citations for some journals has increased dramatically, according to Setti. In some cases, this has led Thomson Reuters to exclude some publications from its Journal Citation Reports.
Over the years, the IF has become the single most widely used factor measuring an article’s impact. And therein lies the problem.
“The IF can simply not be used for this purpose,” Setti says. “And what is worse, its use leads to many unintended consequences, including the manipulation of the indicator.” A better measure for the impact of an individual research article is simply the actual number of citations it has received, he says. Yet he believes reducing impact evaluation to that simple number is also inappropriate.
That’s because citation practices can vary widely across disciplines and subdisciplines. What’s more, the number of authors contributing to a specific field can be vastly different. And the count can include citations to poor work or even to incorrect results.
In short, Setti says, even if bibliometrics and citation analysis can be used as an additional source of information, nothing can replace human judgment through a fair peer-review process in assessing the impact of a research article or of a scientist.