Ever been frustrated with the IEEE Xplore Digital Library because you couldn’t find a paper when you searched the author’s name? Or maybe you’re an author who searched for one of your own papers and came up empty-handed.
This problem is not unique to IEEE Xplore; all digital libraries are saddled with this issue. That’s because authors use variants of their names on their research papers. An author may spell out a full first name on one paper but use a middle name or their initials on others. Or a researcher may use a shortened version of a first name (say, Kathy rather than Katherine) or even a nickname. And authors’ last names often change because of marriage or divorce. These variations create inconsistencies in author records that can make finding all the papers written by an author tedious and frustrating.
IEEE has felt this frustration, and it recently upgraded IEEE Xplore’s author search function by creating a master record for each author that lists all variants of that person’s name. This has made for faster and more targeted results. Now when someone searches for an author, up comes a list of all versions of the author’s name, along with a count of the articles found under those names, as well as their titles. Authors’ affiliations are included, too.
“Improving the author search in IEEE Xplore has been No. 1 on the users’ wish list of improvements,” says Prakash Bellur, senior director of IEEE platform design, in Piscataway, N.J. “This was the right thing to do, so we made the investment and spent the time to clean up the data. If we had delayed, the problem would only have grown because the more authors you have, the larger the cleanup effort.”
Starting in 2012, IEEE’s database of 10 million author records underwent a massive cleanup process known as disambiguation. Algorithms compared the author or authors of each article against the authors of other articles who had similar names. The algorithms made certain assumptions based on the paper’s topic, citations, and the authors’ organizational affiliations, according to Gerry Grenier, senior director of IEEE publishing technology, also in Piscataway.
For example, Robert K. Smith, who writes about robotics and works at Georgia Tech, is most likely the same R.K. Smith who also writes on that topic and works at the same university. If enough criteria match, the system creates a master record for the author and assigns it a unique identifier. After the cleanup, the database had been boiled down to 4 million unique author records. As new authors and their articles are added to IEEE Xplore, their records also undergo the matching process.
“That uniqueness number will probably go down even further as we continue to clean up,” Grenier says. “There may still be anomalies or inconsistencies, but this is a work in progress.”
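To make the matching idea concrete, here is a minimal sketch in Python of how records with similar names might be grouped under a master record. It is an illustration only, not IEEE's actual algorithm: the `AuthorRecord` fields, the function names, the name-comparison rule, and the `min_matches` threshold are all assumptions, and the citation criterion mentioned above is left out for brevity.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class AuthorRecord:
    """One author entry as it appears on a single paper (hypothetical fields)."""
    name: str
    topic: str
    affiliation: str


def name_parts(name):
    """Split 'Robert K. Smith' or 'R.K. Smith' into (given-name tokens, last name)."""
    tokens = name.replace(".", ". ").split()
    given = [t.rstrip(".").lower() for t in tokens[:-1]]
    return given, tokens[-1].lower()


def given_match(x, y):
    """An initial matches any name with the same first letter; full names must be equal."""
    if len(x) == 1 or len(y) == 1:
        return x[0] == y[0]
    return x == y


def names_compatible(a, b):
    """Simplified rule: same last name, same number of given names, each pair compatible."""
    given_a, last_a = name_parts(a)
    given_b, last_b = name_parts(b)
    if last_a != last_b or len(given_a) != len(given_b):
        return False
    return all(given_match(x, y) for x, y in zip(given_a, given_b))


def same_author(a, b, min_matches=2):
    """Treat two records as one person if names are compatible and enough criteria agree."""
    if not names_compatible(a.name, b.name):
        return False
    criteria = [
        a.topic.lower() == b.topic.lower(),
        a.affiliation.lower() == b.affiliation.lower(),
    ]
    return sum(criteria) >= min_matches


def build_master_records(records):
    """Group records under master IDs; each ID maps to the records merged into it."""
    masters = {}
    ids = count(1)
    for rec in records:
        for members in masters.values():
            if any(same_author(rec, m) for m in members):
                members.append(rec)
                break
        else:
            # No existing master matched, so start a new one with a fresh identifier.
            masters[next(ids)] = [rec]
    return masters


records = [
    AuthorRecord("Robert K. Smith", "robotics", "Georgia Tech"),
    AuthorRecord("R.K. Smith", "robotics", "Georgia Tech"),
    AuthorRecord("R.K. Smith", "power systems", "MIT"),
]
masters = build_master_records(records)
for master_id, members in masters.items():
    print(master_id, sorted({m.name for m in members}), len(members))
```

In this sketch the two Georgia Tech robotics records end up under one master record, while the R.K. Smith at a different institution writing on a different topic gets a separate one. A production system would need far more forgiving name handling (nicknames, changed surnames, missing middle names) and additional evidence before merging.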
“The author disambiguation work is the first phase in improving the IEEE Xplore experience for authors and researchers,” Bellur continues. Future plans include individual profile pages that authors can update to ensure information is accurate and current.