Journal Profiling Part 2: Impact Metrics

This Python notebook shows how to use the Dimensions Analytics API to extract publications data for a specific journal, as well as its authors and affiliations.

This tutorial is the second in a series that uses the extracted data to generate a ‘journal profile’ report. See the API Lab homepage for the other tutorials in this series.

In this notebook we are going to:

  • Load the researchers data previously extracted

  • Enrich it by building a dataset focused on their impact in terms of number of papers, citations, etc.

  • Visualize the results with Plotly for a quick overview

Prerequisites: Installing the Dimensions Library and Logging in


!pip install dimcli plotly tqdm -U --quiet
import dimcli
from dimcli.shortcuts import *
dimcli.login()  # with no arguments, credentials are read from the local dsl.ini file
dsl = dimcli.Dsl()

# load common libraries
import time
import sys
import os
import json
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
from tqdm.notebook import tqdm as progress

# charts libs
import plotly.express as px
if 'google.colab' not in sys.modules:
    # make JS dependencies local (needed by HTML exports)
    from plotly.offline import init_notebook_mode
    init_notebook_mode(connected=True)
# create output data folder
FOLDER_NAME = "journal-profile-data"
if not os.path.exists(FOLDER_NAME):
    os.mkdir(FOLDER_NAME)

def save(df, filename_dot_csv):
    """Save a DataFrame as CSV inside the output data folder."""
    df.to_csv(os.path.join(FOLDER_NAME, filename_dot_csv), index=False)
DimCli v0.6.8.1 - Succesfully connected to <https://app.dimensions.ai> (method: dsl.ini file)

Measuring the Impact of Researchers within a Journal

Goal: from the list of authors and publications we previously extracted, we want to create a new dataset focused on researchers, with the following information:

  • number of papers

  • citations mean

  • altmetric mean

  • last publication year

This data will allow us to assess the ‘impact’ of each researcher within the journal.
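For reference, here is one way to write out these metrics for a researcher r, where P_r is the set of r's papers in the journal and c_p, a_p and y_p are the citation count, Altmetric score and publication year of paper p. The averages are arithmetic means, matching the citations_mean and altmetric_mean columns that the code produces; since empty values are replaced with 0 when the data is loaded, papers without a score still count in the denominator.

$$
\text{pubs}_r = |P_r|, \qquad
\text{citations\_mean}_r = \frac{1}{|P_r|}\sum_{p \in P_r} c_p, \qquad
\text{altmetric\_mean}_r = \frac{1}{|P_r|}\sum_{p \in P_r} a_p, \qquad
\text{last\_pub\_year}_r = \max_{p \in P_r} y_p
$$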

Load the publications and authors data previously saved

NOTE: If you are using Google Colab or don’t have the data available, do the following:

  1. Open the ‘Files’ panel in Google Colab and create a new folder called journal-profile-data.

  2. Grab this file, unzip it, open the enclosed folder and upload the files called 1_publications.csv and 1_publications_authors.csv to Google Colab (via the ‘Upload’ menu or by dragging them into the panel window).

  3. Move the files into the journal-profile-data folder you just created.
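Before loading, it can help to check that both input files are actually in place. The helper below is a hypothetical addition, not part of the original notebook: it simply lists any of the two expected CSV files that are missing from the journal-profile-data folder.

```python
import os

# Hypothetical helper: the two files produced by Part 1 of this series.
REQUIRED_FILES = ("1_publications.csv", "1_publications_authors.csv")

def missing_inputs(folder, required=REQUIRED_FILES):
    """Return the required file names that are not present inside `folder`."""
    return [name for name in required
            if not os.path.exists(os.path.join(folder, name))]

missing = missing_inputs("journal-profile-data")
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```

If anything is reported as missing, follow the steps in the note above before running the next cell.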

dfpubs = pd.read_csv(FOLDER_NAME+"/1_publications.csv")
authors = pd.read_csv(FOLDER_NAME+"/1_publications_authors.csv")
# replace empty values with 0 to allow bulk calculations
dfpubs = dfpubs.fillna(0)
authors = authors.fillna(0)

Isolate the Researchers data (i.e. authors with a researcher ID)

researchers = authors.query("researcher_id!=0")
print("Researchers total:",  len(researchers))
Researchers total: 33377
Preview of researchers, with columns: first_name, last_name, initials, corresponding, orcid, current_organization_id, researcher_id, affiliations, is_bogus, pub_id
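To see why filling empty values with 0 makes the query above work, here is a minimal sketch on a made-up three-row table (names and IDs are invented for illustration, and prefixed with toy_ so the real notebook variables are not overwritten). A missing researcher ID becomes 0, so `researcher_id!=0` keeps only authors that have a Dimensions researcher ID; the sketch also shows an equivalent `notna()` filter applied before any filling.

```python
import pandas as pd

# Made-up authors table: the second author has no Dimensions researcher ID.
toy_authors = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cara"],
    "researcher_id": ["ur.001", None, "ur.002"],
    "pub_id": ["pub.1", "pub.1", "pub.2"],
})

# Equivalent filter on the raw data, before any filling.
with_id = toy_authors[toy_authors["researcher_id"].notna()]

# Same steps as the notebook: fill empty values with 0, then drop the zeros.
toy_researchers = toy_authors.fillna(0).query("researcher_id!=0")

print(len(toy_researchers))  # 2
```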
Enrich the data with Impact Statistics

First, let’s group the rows by researcher ID to remove duplicates and count each researcher’s publications:

researchers_impact = (
    researchers[['researcher_id', 'pub_id']]
    .groupby('researcher_id', as_index=False)
    .count()
    .sort_values(by=['pub_id'], ascending=False)
    .reset_index(drop=True)
)
researchers_impact.rename(columns={"pub_id": "pubs"}, inplace=True)
Preview of researchers_impact, with columns: researcher_id, pubs
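The counting step can be checked on a tiny made-up table of researcher/publication pairs (IDs invented for illustration). Each row is one authorship, so `count()` returns the number of papers per researcher, and the sort puts the most prolific researcher first, mirroring the cell above.

```python
import pandas as pd

# Made-up authorship rows: one row per (researcher, publication) pair.
toy = pd.DataFrame({
    "researcher_id": ["ur.001", "ur.002", "ur.001", "ur.003", "ur.001", "ur.002"],
    "pub_id":        ["pub.1",  "pub.1",  "pub.2",  "pub.3",  "pub.4",  "pub.4"],
})

# One row per researcher with their number of authorships, most prolific first.
toy_impact = (
    toy[["researcher_id", "pub_id"]]
    .groupby("researcher_id", as_index=False)
    .count()
    .rename(columns={"pub_id": "pubs"})
    .sort_values(by="pubs", ascending=False)
    .reset_index(drop=True)
)
print(toy_impact)
```

Note that `count()` counts rows: if the same researcher ID could appear twice on one paper, `nunique()` would give the number of distinct papers instead.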
Second, for each researcher ID we can retrieve all of their publications in order to calculate the following metrics:

  • citations mean

  • altmetric mean

  • last publication year

Also, we add a new field with the researcher’s Dimensions URL, as it can be handy later on to browse their publications online.


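def get_name_surname(researcher_id):
    """
    Return the most common spelling of a researcher's name in the authors data,
    to smooth over spelling variants.
    >>> get_name_surname("ur.0723426172.10")
    'Kari Stefansson'
    """
    q = "researcher_id=='%s'" % researcher_id
    x = researchers.query(q)['first_name'].value_counts().idxmax()
    y = researchers.query(q)['last_name'].value_counts().idxmax()
    return f"{x} {y}"

def dimensions_url(researcher_id):
    return f"https://app.dimensions.ai/discover/publication?and_facet_researcher={researcher_id}"

fullnames, citations, altmetric, last_year, urls = [], [], [], [], []

for i, row in progress(researchers_impact.iterrows(), total=researchers_impact.shape[0]):
    q = "researcher_id=='%s'" % row['researcher_id']
    pub_ids = list(researchers.query(q)['pub_id'])
    # this researcher's papers; assumes 1_publications.csv has
    # 'id', 'times_cited', 'altmetric' and 'year' columns
    subset = dfpubs[dfpubs['id'].isin(pub_ids)]
    fullnames.append(get_name_surname(row['researcher_id']))
    citations.append(subset['times_cited'].mean())
    altmetric.append(subset['altmetric'].mean())
    last_year.append(subset['year'].max())
    urls.append(dimensions_url(row['researcher_id']))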

researchers_impact['full_name'] = fullnames
researchers_impact['citations_mean'] = citations
researchers_impact['altmetric_mean'] = altmetric
researchers_impact['last_pub_year'] = last_year
researchers_impact['url'] = urls
# finally..
print("Researchers total:",  len(researchers_impact))

Researchers total: 18524
Preview of researchers_impact, with columns: researcher_id, pubs, full_name, citations_mean, altmetric_mean, last_pub_year, url
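The row-by-row loop above runs one `query` per researcher, which can be slow with tens of thousands of IDs. A possible alternative, sketched below on made-up data, is to merge the authorship rows with the publications table once and aggregate with a single `groupby`. This is not the notebook's own method, and it assumes the publications table has `id`, `times_cited`, `altmetric` and `year` columns. Names are resolved as in `get_name_surname`, by taking the most frequent spelling.

```python
import pandas as pd

# Made-up publications table (column names assumed, see above).
toy_pubs = pd.DataFrame({
    "id": ["pub.1", "pub.2", "pub.3"],
    "times_cited": [10, 30, 5],
    "altmetric": [2, 0, 8],
    "year": [2015, 2019, 2017],
})
# Made-up authorship rows: ur.001 appears once with a variant spelling "A.".
toy_researchers = pd.DataFrame({
    "researcher_id": ["ur.001", "ur.001", "ur.001", "ur.002", "ur.002"],
    "first_name": ["Ann", "Ann", "A.", "Bob", "Bob"],
    "last_name": ["Lee", "Lee", "Lee", "Ray", "Ray"],
    "pub_id": ["pub.1", "pub.2", "pub.3", "pub.2", "pub.3"],
})

def most_common(s):
    """Most frequent value in a Series (the idea behind get_name_surname)."""
    return s.value_counts().idxmax()

# Attach each authorship row to its publication's metrics.
merged = toy_researchers.merge(toy_pubs, left_on="pub_id", right_on="id", how="left")

# One pass over all researchers using named aggregation.
impact = (
    merged.groupby("researcher_id")
    .agg(
        pubs=("pub_id", "count"),
        first_name=("first_name", most_common),
        last_name=("last_name", most_common),
        citations_mean=("times_cited", "mean"),
        altmetric_mean=("altmetric", "mean"),
        last_pub_year=("year", "max"),
    )
    .reset_index()
)
impact["full_name"] = impact["first_name"] + " " + impact["last_name"]
impact["url"] = ("https://app.dimensions.ai/discover/publication?and_facet_researcher="
                 + impact["researcher_id"])
impact = (impact.drop(columns=["first_name", "last_name"])
          .sort_values(by="pubs", ascending=False)
          .reset_index(drop=True))
print(impact)
```

Averages here are taken over authorship rows, so a researcher listed twice on the same paper would have that paper counted twice.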
Save the data

save(researchers_impact, "2_researchers_impact_metrics.csv")

A Couple of Data Visualizations

top100 = researchers_impact[:100]

fig = px.bar(top100,
             x="full_name", y="pubs",
             hover_data=['citations_mean', 'altmetric_mean'],
             title="Researchers Impact - top 100")
fig.show()

fig = px.scatter(top100,
                 x="citations_mean", y="altmetric_mean",
                 hover_data=['pubs', 'citations_mean', 'altmetric_mean'],
                 title="Researchers Impact (citations vs altmetric)")
fig.show()

fig = px.scatter(top100,
                 x="citations_mean", y="altmetric_mean", color="last_pub_year",
                 hover_data=['pubs', 'citations_mean', 'altmetric_mean'],
                 title="Researchers Impact (citations vs altmetric) by last publication year")
fig.show()


