
Calculating the H-index of a researcher

This notebook shows how to use Python and the Dimensions Analytics API to calculate the H-index of a researcher.


The h-index is an author-level metric that attempts to measure both the productivity and citation impact of the publications of a scientist or scholar. The index is based on the set of the scientist’s most cited papers and the number of citations that they have received in other publications.

A more precise definition:

The h-index is defined as the maximum value of h such that the given author/journal has published h papers that have each been cited at least h times.

How to calculate it:

Formally, if f is the function that gives the number of citations for each publication, we compute the h-index as follows. First, we order the values of f from largest to smallest. Then we look for the last position at which f is greater than or equal to the position; we call this position h. For example, if we have a researcher with 5 publications A, B, C, D, and E with 10, 8, 5, 4, and 3 citations, respectively, the h-index is equal to 4 because the 4th publication has 4 citations and the 5th has only 3. In contrast, if the same publications have 25, 8, 5, 3, and 3 citations, then the index is 3 because the fourth paper has only 3 citations (source: Wikipedia).
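
The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration only; the name `h_index_sketch` is hypothetical, and the function sorts its input itself so the order of the list passed in does not matter.

```python
def h_index_sketch(citations):
    """Return the h-index for a list of citation counts (hypothetical helper)."""
    # Order the citation counts from largest to smallest
    ranked = sorted(citations, reverse=True)
    h = 0
    # Walk the ranked list; positions are 1-based
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position  # this position still satisfies f >= position
        else:
            break  # counts only decrease from here, so no later position qualifies
    return h


# The two examples from the text
print(h_index_sketch([10, 8, 5, 4, 3]))  # 4
print(h_index_sketch([25, 8, 5, 3, 3]))  # 3
```

An empty list yields 0, since no position satisfies the condition.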


# @markdown Click the 'play' button on the left (or shift+enter) after entering your API credentials

username = "" #@param {type: "string"}
password = "" #@param {type: "string"}
endpoint = "https://app.dimensions.ai"

!pip install dimcli -U --quiet

# import all libraries and login
import dimcli
from dimcli.shortcuts import *
dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()

import pandas as pd

Selecting a researcher

Let’s pick a researcher, e.g. Michael Boutros (ur.01357111535.49), and save the researcher ID in a variable that can be referenced later.

Try modifying the researcher ID below to get different results!

RESEARCHER = "ur.01357111535.49"

The H-Index function

The H-index function takes a list of citation counts, sorted in descending order, and outputs the h-index value as explained above:

def the_H_function(sorted_citations_list, n=1):
    """From a list of integers [n1, n2 ..] representing publication citations,
    sorted in descending order, return the last 1-based position whose
    citation count is >= that position (i.e. the h-index).

    >>> the_H_function([10, 8, 5, 4, 3])
    4
    >>> the_H_function([25, 8, 5, 3, 3])
    3
    >>> the_H_function([1000, 20])
    2
    """
    # n is the 1-based position of the first element of the remaining list
    if sorted_citations_list and sorted_citations_list[0] >= n:
        return the_H_function(sorted_citations_list[1:], n+1)
    # Either the list is exhausted or the count fell below the position
    return n-1

The H-index function is generic: it accepts any list of numbers representing publication citations, provided the list is sorted in descending order.

Getting citations data from Dimensions

To pass some real-world data to the H-index function, we can use the Dimensions API to extract the citation counts of a researcher's publications (up to the 1000 records allowed by the query limit), like this:

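def get_pubs_citations(researcher_id):
    # Request citation counts only; `limit 1000` caps the result at 1000 publications
    q = """search publications where researchers.id = "{}" return publications[times_cited] sort by times_cited limit 1000"""
    pubs = dsl.query(q.format(researcher_id))
    # Treat missing counts as 0 and sort in descending order, as the H-index function expects
    citations = pubs.as_dataframe().fillna(0)['times_cited']
    return sorted(citations, reverse=True)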
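
To see what the clean-up step in this function amounts to without calling the API, the sketch below applies the same idea to a small hand-made list of records. The record layout (dictionaries with an optional `times_cited` key) and the name `citations_from_records` are assumptions made for illustration and may not match the actual API payload.

```python
def citations_from_records(records):
    """Turn publication records into a descending list of citation counts."""
    # A missing or null `times_cited` is counted as 0, similar in spirit to fillna(0)
    counts = [int(rec.get("times_cited") or 0) for rec in records]
    return sorted(counts, reverse=True)


# Hypothetical records, not real Dimensions data
sample = [
    {"times_cited": 4},
    {},                     # no citation count available
    {"times_cited": 12},
    {"times_cited": None},  # explicit null
]
print(citations_from_records(sample))  # [12, 4, 0, 0]
```

Counting missing values as 0 is harmless for the h-index, because a publication with zero citations can never satisfy the "at least h citations" condition for any position.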

Wrapping things up

Finally, we combine the two functions to calculate the H-Index for a specific researcher:

print("H_index is:", the_H_function(get_pubs_citations(RESEARCHER)))
H_index is: 53


The Dimensions Analytics API makes it possible to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials, and much more.