
Clinical Trials by Volume of Publications

This notebook shows how to use the Dimensions Analytics API to get a list of clinical trial records and then sort them by the number of publications listed in each record's publication_ids field.

Load libraries and log in

[1]:
# @markdown # Get the API library and login
# @markdown Click the 'play' button on the left (or shift+enter) after entering your API credentials

username = "" #@param {type: "string"}
password = "" #@param {type: "string"}
endpoint = "https://app.dimensions.ai" #@param {type: "string"}


!pip install dimcli plotly_express -U --quiet
import dimcli
from dimcli.shortcuts import *
dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()

#
# load common libraries
import time
import sys
import json
import pandas as pd
from pandas import json_normalize  # top-level import, available in pandas >= 1.0
from tqdm.notebook import tqdm  # replaces the deprecated tqdm_notebook

#
# charts libs
import plotly_express as px
if 'google.colab' not in sys.modules:
  # make js dependencies local / needed by html exports
  from plotly.offline import init_notebook_mode
  init_notebook_mode(connected=True)
DimCli v0.6.1.2 - Succesfully connected to <https://app.dimensions.ai> (method: manual login)

Query for Clinical Trials

[2]:
q = """search clinical_trials where category_rcdc.name="Multiple Sclerosis"
        and active_years=[2017, 2018, 2019]
        return clinical_trials[basics+publication_ids]"""
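
To rerun the same search for another category or set of years, the query string can be assembled from parameters. The helper below is a hypothetical convenience and not part of dimcli. It only reproduces the syntax of the query above, and it assumes the category name contains no double quotes, since it does no escaping.

```python
def build_trials_query(category, years, fields="basics+publication_ids"):
    """Assemble a DSL query string shaped like the one in this notebook.

    Hypothetical helper: it is not part of dimcli and performs no
    escaping, so `category` must not contain double quotes.
    """
    year_list = ", ".join(str(y) for y in years)
    return (
        f'search clinical_trials where category_rcdc.name="{category}"\n'
        f"        and active_years=[{year_list}]\n"
        f"        return clinical_trials[{fields}]"
    )


# Rebuild the query used in this notebook.
q = build_trials_query("Multiple Sclerosis", [2017, 2018, 2019])
print(q)
```

The resulting string can be passed to dsl.query_iterative(q) exactly as in the next cell.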
[13]:
df = dsl.query_iterative(q).as_dataframe()
df.head()
1000 / 2132
2000 / 2132
2132 / 2132
[13]:
   active_years                                       id           publication_ids   investigator_details                               title
0  [2005, 2006, 2007, 2008, 2009]                     NCT00257855  [pub.1016078006]  [[Raju Kapoor, MD PhD, Study Director, Nationa...  A Randomised Controlled Trial of Neuroprotecti...
1  [2003, 2004, 2005, 2006]                           NCT00260741  None              [[Mark Agius, MD, Principal Investigator, Univ...  Cannabis for Spasticity in Multiple Sclerosis:...
2  [2006, 2007, 2008, 2009, 2010, 2011]               NCT00261326  [pub.1014996254]  [[Jette L Frederiksen, DrMed, Study Director, ...  Simvastatin Treatment of Patients With Acut Op...
3  [2000, 2001, 2002, 2003, 2004, 2005, 2006, 200...  NCT00262314  None              [[Randy Bennett, , Study Director, EMD Serono,...  Prospective, Open-label Tolerability and Safet...
4  None                                               NCT00267319  None              [[Zuzana Priborska, , Study Director, Sanofi, ...  Fatigue Outcomes of Copaxone Users in Relapsin...
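
The progress lines printed by query_iterative (1000 / 2132, 2000 / 2132, 2132 / 2132) indicate that the results arrive in batches. The sketch below illustrates that general paging pattern with a fake data source. It is not dimcli's actual implementation: fetch_page, its limit and skip arguments, and the batch size of 1000 are assumptions made for the example.

```python
def fetch_all(fetch_page, limit=1000):
    """Collect all records by requesting `limit` rows at a time.

    `fetch_page(limit, skip)` is a hypothetical callable that returns
    (records, total_count); it stands in for a single API call.
    """
    records, skip = [], 0
    while True:
        page, total = fetch_page(limit, skip)
        records.extend(page)
        skip += len(page)
        print(f"{len(records)} / {total}")
        # Stop when everything has arrived, or if a page comes back empty.
        if not page or len(records) >= total:
            return records


# Fake data source with 2132 rows, mirroring the total reported above.
DATA = [{"id": f"rec{i}"} for i in range(2132)]


def fake_fetch(limit, skip):
    return DATA[skip:skip + limit], len(DATA)


all_records = fetch_all(fake_fetch)
```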

Counting publications per clinical trial

Before we can count publications, we need every value in publication_ids to be ‘countable’. Trials with no publications have None in that column, and calling len() on None raises a TypeError, so we first replace all None values with empty lists.

[15]:
# replace empty values with empty lists so that they can be counted
for row in df.loc[df.publication_ids.isnull(), 'publication_ids'].index:
    df.at[row, 'publication_ids'] = []

Now we can count the publications for each trial and sort the trials by that count.

[16]:
# create new column
df['pubs_tot'] = df['publication_ids'].apply(len)
# sort
df.sort_values("pubs_tot", ascending=False, inplace=True)
df.head(5)
[16]:
      active_years                    id           publication_ids                                    investigator_details                               title                                              pubs_tot
1695  [2019, 2020, 2021]              NCT04073940  [pub.1022342218, pub.1010367173, pub.104532283...  None                                               Exploration of Brain Changes Due to a Targeted...  92
1622  [2018, 2019, 2020, 2021, 2022]  NCT03782246  [pub.1028617232, pub.1035385239, pub.103188188...  [[Dawn Ehde, PhD, Principal Investigator, Univ...  Mindfulness-based Cognitive Therapy and Cognit...  72
383   [2014, 2015, 2016, 2017, 2018]  NCT02104661  [pub.1042342392, pub.1016078006, pub.102366073...  [[Gavin Givannoni, , Principal Investigator, Q...  OxCarbazepine as a Neuroprotective Agent in MS...  48
625   [2016, 2017, 2018, 2019, 2020]  NCT03004079  [pub.1033284389, pub.1053353274, pub.107152083...  [[Myla Goldman, MD, Principal Investigator, Un...  Assessment of the Clinical Importance of Insul...  46
309   [2014, 2015]                    NCT02367222  [pub.1026901855, pub.1051111163, pub.106264497...  [[Salah Mahmud, MD, PhD, Principal Investigato...  An Observational Retrospective Database Analys...  43
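
The two steps above (replace None with an empty list, then count and sort) can also be tried without API access. The following is a minimal plain-Python sketch of the same logic on a list of dictionaries. The trial and publication identifiers are made up for illustration and do not come from Dimensions.

```python
# Made-up records shaped like the API results (illustrative values only).
records = [
    {"id": "trial-a", "publication_ids": ["pub.1", "pub.2"]},
    {"id": "trial-b", "publication_ids": None},
    {"id": "trial-c", "publication_ids": ["pub.3", "pub.4", "pub.5"]},
]

for rec in records:
    # Treat a missing list as "no publications" so len() is always valid.
    pubs = rec.get("publication_ids") or []
    rec["pubs_tot"] = len(pubs)

# Sort by publication count, highest first.
ranked = sorted(records, key=lambda r: r["pubs_tot"], reverse=True)
for rec in ranked:
    print(rec["id"], rec["pubs_tot"])
```

The `or []` idiom plays the same role as the loop over df.publication_ids.isnull() in the DataFrame version: it guarantees that every value has a length before counting.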

A simple data visualization

[24]:
px.bar(df[:200], x="id", y="pubs_tot",
      hover_name="title", hover_data=["active_years"])


Note

The Dimensions Analytics API lets you carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.
