
Clinical Trials by Volume of Publications

This notebook shows how to use the Dimensions Analytics API to retrieve a list of clinical trial records and then sort them by the total number of publications they cite.

import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))
This notebook was last run on Jan 25, 2022


This notebook assumes you have installed the Dimcli library and are familiar with the ‘Getting Started’ tutorial.

!pip install dimcli plotly tqdm -U --quiet

import dimcli
from dimcli.utils import *

import os, sys, time, json
from tqdm.notebook import tqdm
import pandas as pd
import plotly.express as px
from plotly.offline import plot
if 'google.colab' not in sys.modules:
  # make js dependencies local / needed by html exports
  from plotly.offline import init_notebook_mode
  init_notebook_mode(connected=True)
print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
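if 'google.colab' in sys.modules:
  # on Colab, ask for the API key interactively
  import getpass
  KEY = getpass.getpass(prompt='API Key: ')
  dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
  # elsewhere, an empty key makes dimcli fall back to the local config file
  KEY = ""
  dimcli.login(key=KEY, endpoint=ENDPOINT)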
dsl = dimcli.Dsl()
Searching config file credentials for 'https://app.dimensions.ai' endpoint..
Logging in..
Dimcli - Dimensions API Client (v0.9.6)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.0
Method: dsl.ini file

Query for Clinical Trials

q = """search clinical_trials where category_rcdc.name="Multiple Sclerosis"
        and active_years=[2017, 2018, 2019]
        return clinical_trials[basics+publication_ids]"""
df = dsl.query_iterative(q).as_dataframe()
Starting iteration with limit=1000 skip=0 ...
0-1000 / 3353 (3.08s)
1000-2000 / 3353 (3.93s)
2000-3000 / 3353 (2.83s)
3000-3353 / 3353 (1.22s)
Records extracted: 3353
id investigators title active_years publication_ids
0 UMIN000045085 [[Mostafa Sarabzadeh, , Public Contact, Natio... Neurophysiological effects of a high-speed neu... NaN NaN
1 UMIN000044955 [[Seina Toneri, , Public Contact, Mebix, Inc.... A multicentre, retrospective study in patients... NaN NaN
2 UMIN000043910 [[Hiroaki Yokote, , Public Contact, Nitobe Me... Association between brain atrophy and intestin... NaN NaN
3 UMIN000038903 [[Takami Ishizuka, , Public Contact, National... Efficacy and Safety of OCH-NCNP1 in patients w... NaN NaN
4 UMIN000038431 [[Mohammad Bayattork, , Public Contact, Unive... Twelve weeks of Pilates training improved func... NaN NaN
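
The query above hard-codes the category and the years. As a minimal sketch, the same DSL string could be assembled from parameters so the analysis can be repeated for other categories or time windows. The helper name build_trials_query and its arguments are hypothetical and not part of Dimcli; the sketch also assumes the category name contains no double quotes.

```python
def build_trials_query(category, years, fields="basics+publication_ids"):
    """Return a DSL search string for clinical trials in one RCDC category.

    Hypothetical helper: it only formats the query text used in this notebook.
    Assumes `category` contains no double quotes.
    """
    year_list = ", ".join(str(y) for y in years)
    return (
        f'search clinical_trials where category_rcdc.name="{category}"\n'
        f'        and active_years=[{year_list}]\n'
        f'        return clinical_trials[{fields}]'
    )


# Rebuild the query used in this notebook
q = build_trials_query("Multiple Sclerosis", [2017, 2018, 2019])
print(q)
```

The resulting string can be passed to dsl.query_iterative(q) exactly as in the cell above.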

Counting publications per clinical trial

Before we can count publications, we need to make sure that every value is ‘countable’. Trials without linked publications have a missing value (NaN) in publication_ids, so we first replace those missing values with empty lists.

# replace empty values with empty lists so that they can be counted
for row in df.loc[df.publication_ids.isnull(), 'publication_ids'].index:
    df.at[row, 'publication_ids'] = []

Now we can count the publications for each trial.

# create new column
df['pubs_tot'] = df['publication_ids'].apply(len)
# sort
df.sort_values("pubs_tot", ascending=False, inplace=True)
id investigators title active_years publication_ids pubs_tot
521 NCT04073940 [[Citlali Lopez-Ortiz, PhD, MA, Contact, Unive... Exploration of Brain Changes Due to a Targeted... [2019, 2020, 2021] [pub.1100827489, pub.1035450339, pub.103700030... 99
648 NCT03782246 [[Dawn Ehde, PhD, Principal Investigator, Univ... Mindfulness-based Cognitive Therapy and Cognit... [2018, 2019, 2020, 2021, 2022] [pub.1059401852, pub.1001854818, pub.102868830... 72
327 NCT04550455 [[Keith R Edwards, MD, Study Director, MS Cent... A Prospective Biomarker Study in Active Second... [2020, 2021, 2022, 2023, 2024, 2025] [pub.1104144237, pub.1122095683, pub.101866541... 52
1313 NCT02104661 [[Gavin Givannoni, , Principal Investigator, Q... OxCarbazepine as a Neuroprotective Agent in MS... [2014, 2015, 2016, 2017, 2018] [pub.1035408940, pub.1011294811, pub.100257806... 48
947 NCT03004079 [[Myla Goldman, MD, Principal Investigator, Un... Assessment of the Clinical Importance of Insul... [2016, 2017, 2018, 2019, 2020] [pub.1001068458, pub.1013002909, pub.103549213... 46
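
An alternative to rewriting the missing values in place is to make the counting function itself tolerant of them. The sketch below is one possible approach, not the notebook's own method: count_publications is a hypothetical helper, and the publication IDs in the example are made up for illustration.

```python
def count_publications(value):
    """Return the number of publication IDs, treating missing values as zero.

    Hypothetical helper: anything that is not a list or tuple
    (None, NaN, and so on) counts as "no publications".
    """
    if isinstance(value, (list, tuple)):
        return len(value)
    return 0


# Made-up values mimicking what the publication_ids column can contain
examples = [["pub.A", "pub.B"], [], None, float("nan")]
counts = [count_publications(v) for v in examples]
print(counts)
```

With the DataFrame from this notebook, the helper could be applied as df['pubs_tot'] = df['publication_ids'].apply(count_publications), which leaves the original publication_ids column untouched.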

A simple data visualization

px.bar(df[:200], x="id", y="pubs_tot",
      hover_name="title", hover_data=["active_years"])


The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.