Clinical Trials by Volume of Publications

This notebook shows how to use the Dimensions Analytics API to retrieve a list of clinical trial records and then sort them by the total number of publications they cite.

[2]:

import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))

==
CHANGELOG
This notebook was last run on Jan 25, 2022
==

Prerequisites

This notebook assumes you have installed the Dimcli library and are familiar with the ‘Getting Started’ tutorial.

[3]:

!pip install dimcli plotly tqdm -U --quiet

import dimcli
from dimcli.utils import *

import os, sys, time, json
from tqdm.notebook import tqdm
import pandas as pd
import plotly.express as px
from plotly.offline import plot
if 'google.colab' not in sys.modules:
  # make js dependencies local / needed by html exports
  from plotly.offline import init_notebook_mode
  init_notebook_mode(connected=True)

print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  KEY = getpass.getpass(prompt='API Key: ')
  dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
  KEY = ""
  dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()

Searching config file credentials for 'https://app.dimensions.ai' endpoint..

==
Logging in..
Dimcli - Dimensions API Client (v0.9.6)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.0
Method: dsl.ini file

Query for Clinical Trials

[4]:

q = """search clinical_trials where category_rcdc.name="Multiple Sclerosis"
        and active_years=[2017, 2018, 2019]
        return clinical_trials[basics+publication_ids]"""
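To run the same search for another category or set of years, the query string can be assembled from parameters. The sketch below is only an illustration: `build_trials_query` is a hypothetical helper, not part of Dimcli, and it simply fills in the query text shown above without contacting the API.

```python
def build_trials_query(category, years):
    """Hypothetical helper: fill in the query template used in this notebook."""
    years_list = ", ".join(str(y) for y in years)
    return (
        f'search clinical_trials where category_rcdc.name="{category}"\n'
        f'        and active_years=[{years_list}]\n'
        f'        return clinical_trials[basics+publication_ids]'
    )

# Reproduces the query text used in this notebook
q = build_trials_query("Multiple Sclerosis", [2017, 2018, 2019])
print(q)
```

The resulting string can be passed to `dsl.query_iterative` in the same way as the hand-written query.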

[5]:

df = dsl.query_iterative(q).as_dataframe()
df.head()

Starting iteration with limit=1000 skip=0 ...
0-1000 / 3353 (3.08s)
1000-2000 / 3353 (3.93s)
2000-3000 / 3353 (2.83s)
3000-3353 / 3353 (1.22s)
===
Records extracted: 3353

[5]:

	id	investigators	title	active_years	publication_ids
0	UMIN000045085	[[Mostafa Sarabzadeh, , Public Contact, Natio...	Neurophysiological effects of a high-speed neu...	NaN	NaN
1	UMIN000044955	[[Seina Toneri, , Public Contact, Mebix, Inc....	A multicentre, retrospective study in patients...	NaN	NaN
2	UMIN000043910	[[Hiroaki Yokote, , Public Contact, Nitobe Me...	Association between brain atrophy and intestin...	NaN	NaN
3	UMIN000038903	[[Takami Ishizuka, , Public Contact, National...	Efficacy and Safety of OCH-NCNP1 in patients w...	NaN	NaN
4	UMIN000038431	[[Mohammad Bayattork, , Public Contact, Unive...	Twelve weeks of Pilates training improved func...	NaN	NaN

Counting publications per clinical trial

Before counting publications, we need every value in publication_ids to be countable. Trials with no linked publications come back as missing values (shown as NaN in the table above), so we first replace those with empty lists.

[6]:

# replace empty values with empty lists so that they can be counted
for row in df.loc[df.publication_ids.isnull(), 'publication_ids'].index:
    df.at[row, 'publication_ids'] = []
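The same normalization can be tried without an API connection. The following is a minimal sketch on made-up data (the `toy` DataFrame and its IDs are invented for illustration); it uses `apply` with a type check as an alternative to the row-by-row loop above.

```python
import pandas as pd

# Toy data standing in for the API result; IDs are invented for illustration
toy = pd.DataFrame({
    "id": ["T1", "T2", "T3"],
    "publication_ids": [["pub.1", "pub.2"], float("nan"), ["pub.3"]],
})

# Keep real lists as they are; replace anything else (NaN here) with an empty list
toy["publication_ids"] = toy["publication_ids"].apply(
    lambda ids: ids if isinstance(ids, list) else []
)

print(toy["publication_ids"].tolist())
```

Each row gets its own new empty list, so later in-place changes to one row's list do not leak into the others.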

Now we can count the publications for each trial.

[7]:

# create new column
df['pubs_tot'] = df['publication_ids'].apply(len)
# sort
df.sort_values("pubs_tot", ascending=False, inplace=True)
df.head(5)

[7]:

	id	investigators	title	active_years	publication_ids	pubs_tot
521	NCT04073940	[[Citlali Lopez-Ortiz, PhD, MA, Contact, Unive...	Exploration of Brain Changes Due to a Targeted...	[2019, 2020, 2021]	[pub.1100827489, pub.1035450339, pub.103700030...	99
648	NCT03782246	[[Dawn Ehde, PhD, Principal Investigator, Univ...	Mindfulness-based Cognitive Therapy and Cognit...	[2018, 2019, 2020, 2021, 2022]	[pub.1059401852, pub.1001854818, pub.102868830...	72
327	NCT04550455	[[Keith R Edwards, MD, Study Director, MS Cent...	A Prospective Biomarker Study in Active Second...	[2020, 2021, 2022, 2023, 2024, 2025]	[pub.1104144237, pub.1122095683, pub.101866541...	52
1313	NCT02104661	[[Gavin Givannoni, , Principal Investigator, Q...	OxCarbazepine as a Neuroprotective Agent in MS...	[2014, 2015, 2016, 2017, 2018]	[pub.1035408940, pub.1011294811, pub.100257806...	48
947	NCT03004079	[[Myla Goldman, MD, Principal Investigator, Un...	Assessment of the Clinical Importance of Insul...	[2016, 2017, 2018, 2019, 2020]	[pub.1001068458, pub.1013002909, pub.103549213...	46
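The count-and-sort step can also be checked on a small made-up example. This sketch assumes the missing values have already been replaced with empty lists; the `toy` data and the `ranked` name are illustrative only.

```python
import pandas as pd

# Toy data with publication lists already normalized (no NaN values)
toy = pd.DataFrame({
    "id": ["T1", "T2", "T3"],
    "publication_ids": [["pub.1"], [], ["pub.2", "pub.3", "pub.4"]],
})

# Count the publications per trial, then rank trials from most to fewest
toy["pubs_tot"] = toy["publication_ids"].apply(len)
ranked = toy.sort_values("pubs_tot", ascending=False)

print(ranked[["id", "pubs_tot"]])
```

Unlike the cell above, this version assigns the sorted result to a new variable instead of using `inplace=True`, which leaves the original row order available for comparison.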

A simple data visualization

[8]:

px.bar(df[:200], x="id", y="pubs_tot",
      hover_name="title", hover_data=["active_years"])

Note

The Dimensions Analytics API lets you carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.