Integrating the ReCiter author disambiguation engine with a faculty information system


  • Libraries and other administrative departments at medical universities are regularly called upon to produce reports detailing scholarly publications authored by members of their scholarly community.

    ORCID is touted as a solution to the problem of author disambiguation, and Weill Cornell Medical Library has explored this option. Despite growing interest, our analyses have shown that ORCID's publication lists for the average person remain unreliable. Publisher mandates appear to have improved accuracy, but it's rare for all authors of a publication to be indexed with an ORCID iD. Practically speaking, we have neither the staff to manually assert publications on behalf of thousands of people nor the authority to require those people to maintain their own profiles. Indeed, we have even less influence over non-employees such as residents and voluntary faculty, as well as inactive people such as alumni and historical faculty, all of whom we're called to report upon.

    For this reason, Weill Cornell Medicine has continued to pursue development of ReCiter, a homegrown Java-based tool which uses institutionally-maintained identity data to perform author name disambiguation using records harvested from PubMed. ReCiter employs 15 separate strategies for disambiguation including department name, known co-investigators, and year of degree.
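
    To make the idea of combining several disambiguation strategies concrete, the following is a minimal sketch in Python. It is not ReCiter's actual implementation (ReCiter is written in Java and uses 15 strategies): the class names, fields, weights, threshold, and the ten-year degree rule are all hypothetical, chosen only to illustrate how identity data can be scored against candidate records.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Identity:
    """Institutionally maintained facts about one person (hypothetical fields)."""
    departments: set = field(default_factory=set)
    co_investigators: set = field(default_factory=set)
    degree_year: Optional[int] = None


@dataclass
class Article:
    """A candidate record, e.g. as harvested from PubMed (hypothetical fields)."""
    pmid: str
    author_names: set
    affiliation: str
    year: int


def department_match(identity, article):
    # Evidence: a known department name appears in the affiliation string.
    text = article.affiliation.lower()
    return 1.0 if any(d.lower() in text for d in identity.departments) else 0.0


def co_investigator_match(identity, article):
    # Evidence: at least one known co-investigator is among the authors.
    return 1.0 if identity.co_investigators & article.author_names else 0.0


def degree_year_plausible(identity, article):
    # Assumed rule: an article published more than ten years before the
    # person's degree year counts against the match.
    if identity.degree_year is None:
        return 0.0
    return 1.0 if article.year >= identity.degree_year - 10 else -1.0


# Illustrative weights only; they are not taken from ReCiter.
STRATEGIES = [
    (department_match, 2.0),
    (co_investigator_match, 1.5),
    (degree_year_plausible, 1.0),
]


def score(identity, article):
    """Weighted sum of the evidence returned by every strategy."""
    return sum(weight * strategy(identity, article) for strategy, weight in STRATEGIES)


def suggest(identity, articles, threshold=2.0):
    """Return the PMIDs whose combined score reaches the (assumed) threshold."""
    return [a.pmid for a in articles if score(identity, a) >= threshold]
```

    Keeping each strategy as a small independent function means new kinds of evidence can be added, or their weights tuned, without touching the rest of the scoring logic.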

    Fundamentally speaking, ReCiter is a publication suggestion engine. Given a full complement of identity data, it returns suggestions that are typically around 90-95% accurate. What it has lacked to date is integration with an application whose user interface captures feedback from its various end users, including faculty, PhD students, administrators, and proxies.

    In the last year, we have ramped up our "Academic Staff Management System" (ASMS) initiative. ASMS is a homegrown PHP-based system that gives faculty, postdocs, other academics, and their administrators a single view of key information such as appointments, educational background, board certifications, licensure, grants, and contracts. It is also an appropriate system in which to collect feedback on ReCiter's suggested publications.

    For our presentation, we will demonstrate a proof of concept in which:
    - ReCiter is regularly updated with data from systems of record.
    - ReCiter makes suggestions for a specified group of individuals on a recurrent basis.
    - These suggestions are harvested by ASMS.
    - Administrative users (and eventually end users themselves) log in to ASMS to provide feedback on these suggestions.
    - That feedback is harvested by ReCiter and used to make increasingly accurate suggestions going forward.
    - Once a suggestion has either been validated or gone without a response for a set period of time, we feed its publication metadata to VIVO.
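
    The last three steps of this loop can be sketched as follows. This is an illustrative Python model only, not code from ReCiter or ASMS: the `Suggestion` record, the feedback labels, the 30-day grace period, and the choice to withhold rejected suggestions from VIVO are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Suggestion:
    """One suggested publication for one person (hypothetical record shape)."""
    person_id: str
    pmid: str
    suggested_on: date
    feedback: Optional[str] = None  # "accepted", "rejected", or None if unanswered


def feedback_for_engine(suggestions):
    """Group answered suggestions by person so the engine can learn from them."""
    result = {}
    for s in suggestions:
        if s.feedback is not None:
            result.setdefault(s.person_id, {"accepted": [], "rejected": []})
            result[s.person_id][s.feedback].append(s.pmid)
    return result


def ready_for_vivo(s, today, grace=timedelta(days=30)):
    """Validated suggestions go out at once; unanswered ones after the grace period."""
    if s.feedback == "accepted":
        return True
    if s.feedback == "rejected":
        return False  # assumption: rejected suggestions are never exported
    return today - s.suggested_on >= grace


def select_for_vivo(suggestions, today, grace=timedelta(days=30)):
    """Return the PMIDs whose metadata would be fed to VIVO on the given day."""
    return [s.pmid for s in suggestions if ready_for_vivo(s, today, grace)]
```

    Treating "no response" as a separate state from "rejected" is what lets the time-based fallback work: silence eventually counts as acceptance, while an explicit rejection never does.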

    See data flow diagram:


  • Albert, Paul
  • Bales, Michael
  • Lin, Jie

publication date

  • August 11, 2017