
Working with lists in the Dimensions API

In this notebook we are going to show:

  • How to use lists in order to write more efficient DSL queries

  • How lists can be used to chain the results of one query into another

  • How these methods can be used for real-world applications, e.g., getting publications/patents/grants that cite my publications

Prerequisites

This notebook assumes you have installed the Dimcli library and are familiar with the Getting Started tutorial.

[52]:
!pip install dimcli -U --quiet

import dimcli
from dimcli.shortcuts import *
import sys
import json
import pandas as pd
import numpy as np

print("==\nLogging in..")
# https://github.com/digital-science/dimcli#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  USERNAME = getpass.getpass(prompt='Username: ')
  PASSWORD = getpass.getpass(prompt='Password: ')
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)
else:
  USERNAME, PASSWORD  = "", ""
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)
dsl = dimcli.Dsl()
==
Logging in..
Dimcli - Dimensions API Client (v0.7.4.2)
Connected to: https://app.dimensions.ai - DSL v1.27
Method: dsl.ini file

1. How do we use lists in the Dimensions API?

We use lists in the API because they make queries easier to read and easier to work with.

Here is a query without lists.

How many publications were produced by either Monash University or the University of Melbourne (grid.1002.3, grid.1008.9) in either 2019 or 2020? Be really careful with your brackets!

[2]:
%%dsldf

search publications
where
      (
          research_orgs.id = "grid.1008.9"
       or research_orgs.id = "grid.1002.3"
       )
  and (
          year = 2019
       or year = 2020
       )
return publications
limit 1

Returned Publications: 1 (total = 39938)
Time: 0.45s
[2]:
title pages author_affiliations year issue id type volume journal.id journal.title
0 “Surviving not thriving”: experiences of healt... 809-823 [[{'first_name': 'Madelaine', 'last_name': 'Sm... 2020 1 pub.1128910744 article 25 jour.1097842 International Journal of Adolescence and Youth

The query above could get really messy. What if I wanted 20 institutions? What if I wanted the last ten years? (or, or, or, or, or…) and (or, or, or, or, or)

By using lists, we can quickly add a large number of conditions by means of an easy-to-read square-brackets notation:

[3]:
%%dsldf
search publications
where research_orgs.id in ["grid.1008.9","grid.1002.3"]
  and year in [2019:2020]
return publications[id]
limit 100
Returned Publications: 100 (total = 39938)
Time: 0.45s
[3]:
id
0 pub.1128910744
1 pub.1125408894
2 pub.1125679504
3 pub.1121881108
4 pub.1125399511
... ...
95 pub.1130154967
96 pub.1130473004
97 pub.1130565969
98 pub.1129877063
99 pub.1130210026

100 rows × 1 columns
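
A note on the year filter: [2019:2020] is the DSL’s colon shorthand for an inclusive range. As far as we know, an explicit list works equally well, so the same filter could also be written as follows (a sketch shown only to illustrate the equivalent notation, not run here):

search publications
where research_orgs.id in ["grid.1008.9","grid.1002.3"]
  and year in [2019, 2020]
return publications[id]
limit 100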

2. What are all the things that we can make lists of in the Dimensions API?

What are the internal entities that we might put in a list?

[4]:
%dsldocs
dsl_last_results[dsl_last_results['is_entity']==True]
[4]:
sources field type description is_filter is_entity is_facet
6 publications category_bra categories `Broad Research Areas <https://dimensions.fres... True True True
7 publications category_for categories `ANZSRC Fields of Research classification <htt... True True True
8 publications category_hra categories `Health Research Areas <https://dimensions.fre... True True True
9 publications category_hrcs_hc categories `HRCS - Health Categories <https://dimensions.... True True True
10 publications category_hrcs_rac categories `HRCS – Research Activity Codes <https://dimen... True True True
... ... ... ... ... ... ... ...
284 datasets research_org_cities cities City of the organisations the publication auth... True True True
285 datasets research_org_countries countries Country of the organisations the publication a... True True True
286 datasets research_org_states states State of the organisations the publication aut... True True True
287 datasets research_orgs organizations GRID organisations linked to the publication a... True True True
288 datasets researchers researchers Dimensions researchers IDs associated to the d... True True True

101 rows × 7 columns

What about lists of IDs?

[5]:
%dsldocs
dsl_last_results[dsl_last_results['field'].str.contains('id')==True]
[5]:
sources field type description is_filter is_entity is_facet
1 publications altmetric_id integer AltMetric Publication ID True False False
25 publications id string Dimensions publication ID. True False False
34 publications pmcid string PubMed Central ID. True False False
35 publications pmid string PubMed ID. True False False
39 publications reference_ids string Dimensions publication ID for publications in ... True False False
51 publications supporting_grant_ids string Grants supporting a publication, returned as a... True False False
89 grants id string Dimensions grant ID. True False False
111 patents associated_grant_ids string Dimensions IDs of the grants associated to the... True False False
120 patents cited_by_ids string Dimensions IDs of the patents that cite this p... True False False
133 patents id string Dimensions patent ID True False False
143 patents publication_ids string Dimensions IDs of the publications related to ... True False False
146 patents reference_ids string Dimensions IDs of the patents which are cited ... True False False
154 clinical_trials associated_grant_ids string Dimensions IDs of the grants associated to the... True False False
171 clinical_trials id string Dimensions clinical trial ID True False False
177 clinical_trials publication_ids string Dimensions IDs of the publications related to ... True False False
193 policy_documents id string Dimensions policy document ID True False False
195 policy_documents publication_ids string Dimensions IDs of the publications related to ... True False False
208 researchers id string Dimensions researcher ID. True False False
212 researchers nih_ppid string The PI Profile ID (i.e., ppid) is a Researcher... True False False
214 researchers orcid_id string `ORCID <https://orcid.org/>`_ ID. True False False
221 organizations cnrs_ids string CNRS IDs for this organization True False False
225 organizations external_ids_fundref string Fundref IDs for this organization True False False
226 organizations hesa_ids string HESA IDs for this organization True False False
227 organizations id string GRID ID of the organization. E.g., "grid.26999... True False False
228 organizations isni_ids string ISNI IDs for this organization True False False
239 organizations organization_child_ids string Child organization IDs True False False
240 organizations organization_parent_ids string Parent organization IDs True False False
241 organizations organization_related_ids string Related organization IDs True False False
242 organizations orgref_ids string OrgRef IDs for this organization True False False
244 organizations ror_ids string ROR IDs for this organization True False False
248 organizations ucas_ids string UCAS IDs for this organization True False False
249 organizations ukprn_ids string UKPRN IDs for this organization True False False
250 organizations wikidata_ids string WikiData IDs for this organization True False False
252 datasets associated_grant_ids string The Dimensions IDs of the grants linked to the... True False False
254 datasets associated_publication_id string The Dimensions ID of the publication linked to... True False False
276 datasets id string Dimensions dataset ID. True False False
283 datasets repository_id string The ID of the repository of the dataset. True False True

What are the external entities that we can put in a list? (A DOI example follows this list.)

  • a list of ISSNs

  • a list of external grant IDs

  • a list of DOIs

  • a list of categories
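
For example, filtering publications by a list of DOIs looks just like the GRID-ID filters above (a minimal sketch; the two DOIs are arbitrary placeholders, and we assume the standard doi filter field on publications):

search publications
where doi in ["10.1038/nature12373", "10.1126/science.1260419"]
return publications[id+doi+title]
limit 10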

3. Making a list from the results of a query

The list syntax for the Dimensions API is the same as the list syntax for JSON, so we can use Python’s json-to-string functions to make a list of IDs for us from the previous query.

Let’s run our example query again.

[6]:
%%dsldf
search publications
where research_orgs.id in ["grid.1008.9","grid.1002.3"]
  and year in [2019:2020]
return publications[id]
limit 100
Returned Publications: 100 (total = 39938)
Time: 0.51s
[6]:
id
0 pub.1128910744
1 pub.1125408894
2 pub.1125679504
3 pub.1121881108
4 pub.1125399511
... ...
95 pub.1130154967
96 pub.1130473004
97 pub.1130565969
98 pub.1129877063
99 pub.1130210026

100 rows × 1 columns

[7]:
json.dumps(list(dsl_last_results.id))


[7]:
'["pub.1128910744", "pub.1125408894", "pub.1125679504", "pub.1121881108", "pub.1125399511", "pub.1125025149", "pub.1129750783", "pub.1129934894", "pub.1116652110", "pub.1130428426", "pub.1125617654", "pub.1125508088", "pub.1125663277", "pub.1127397677", "pub.1128670172", "pub.1129738952", "pub.1129927065", "pub.1125488818", "pub.1127579881", "pub.1129330483", "pub.1129969190", "pub.1124060229", "pub.1130469744", "pub.1128526965", "pub.1127449188", "pub.1128910055", "pub.1129152052", "pub.1128835792", "pub.1128553673", "pub.1127171381", "pub.1128188711", "pub.1124550366", "pub.1127403602", "pub.1126704737", "pub.1126705781", "pub.1126697386", "pub.1128490805", "pub.1127453116", "pub.1128581013", "pub.1125617777", "pub.1125902856", "pub.1125545402", "pub.1128250005", "pub.1129711942", "pub.1124633600", "pub.1125753634", "pub.1125756054", "pub.1128670125", "pub.1127351832", "pub.1125664038", "pub.1126125606", "pub.1124372639", "pub.1124670593", "pub.1127514900", "pub.1126502051", "pub.1126013995", "pub.1129038309", "pub.1129100811", "pub.1126702062", "pub.1129384636", "pub.1129492233", "pub.1124830567", "pub.1129357934", "pub.1129327398", "pub.1125516023", "pub.1130259916", "pub.1130003198", "pub.1130128246", "pub.1129417550", "pub.1128978928", "pub.1129155555", "pub.1130240573", "pub.1130286746", "pub.1125502024", "pub.1129668495", "pub.1125609619", "pub.1127726353", "pub.1124284911", "pub.1124342595", "pub.1124132392", "pub.1125488824", "pub.1128939571", "pub.1125488819", "pub.1125152336", "pub.1129605398", "pub.1128877084", "pub.1129604079", "pub.1127419474", "pub.1125402193", "pub.1129847614", "pub.1128556442", "pub.1124885124", "pub.1129640786", "pub.1127252780", "pub.1130469679", "pub.1130154967", "pub.1130473004", "pub.1130565969", "pub.1129877063", "pub.1130210026"]'

Let’s try to use this list of IDs.

Unfortunately, you can’t just put your results directly into the query:

[8]:
%%dsldf
  search publications
  where id in [json.dumps(list(dsl_last_results.id))]

  return publications

Returned Errors: 1
Time: 0.45s
4 QuerySyntaxErrors found
4 ParserErrors found
  * [Line 2:15] ('json') no viable alternative at input '[json'
  * [Line 2:26] ('list') extraneous input 'list' expecting {'for', 'in', '('}
  * [Line 2:31] ('dsl_last_results') mismatched input 'dsl_last_results' expecting {'for', 'in', '('}
  * [Line 2:52] (']') extraneous input ']' expecting 'return'

…so let’s get our results back again

[9]:
%%dsldf
search publications
where research_orgs.id in ["grid.1008.9","grid.1002.3"]
  and year in [2019:2020]
return publications[id]
limit 100
Returned Publications: 100 (total = 39938)
Time: 0.47s
[9]:
id
0 pub.1128910744
1 pub.1125408894
2 pub.1125679504
3 pub.1121881108
4 pub.1125399511
... ...
95 pub.1130154967
96 pub.1130473004
97 pub.1130565969
98 pub.1129877063
99 pub.1130210026

100 rows × 1 columns

…and use the Python way of calling the Dimensions API instead:

[10]:
dsl.query(f"""

 search publications
  where id in {json.dumps(list(dsl_last_results.id))}

  return publications


""").as_dataframe()

f"""

 search publications
  where id in {json.dumps(list(dsl_last_results.id))}

  return publications


"""
Returned Publications: 20 (total = 100)
Time: 0.55s
[10]:
'\n\n search publications\n  where id in ["pub.1128910744", "pub.1125408894", "pub.1125679504", "pub.1121881108", "pub.1125399511", "pub.1125025149", "pub.1129750783", "pub.1129934894", "pub.1116652110", "pub.1130428426", "pub.1125617654", "pub.1125508088", "pub.1125663277", "pub.1127397677", "pub.1128670172", "pub.1129738952", "pub.1129927065", "pub.1125488818", "pub.1127579881", "pub.1129330483", "pub.1129969190", "pub.1124060229", "pub.1130469744", "pub.1128526965", "pub.1127449188", "pub.1128910055", "pub.1129152052", "pub.1128835792", "pub.1128553673", "pub.1127171381", "pub.1128188711", "pub.1124550366", "pub.1127403602", "pub.1126704737", "pub.1126705781", "pub.1126697386", "pub.1128490805", "pub.1127453116", "pub.1128581013", "pub.1125617777", "pub.1125902856", "pub.1125545402", "pub.1128250005", "pub.1129711942", "pub.1124633600", "pub.1125753634", "pub.1125756054", "pub.1128670125", "pub.1127351832", "pub.1125664038", "pub.1126125606", "pub.1124372639", "pub.1124670593", "pub.1127514900", "pub.1126502051", "pub.1126013995", "pub.1129038309", "pub.1129100811", "pub.1126702062", "pub.1129384636", "pub.1129492233", "pub.1124830567", "pub.1129357934", "pub.1129327398", "pub.1125516023", "pub.1130259916", "pub.1130003198", "pub.1130128246", "pub.1129417550", "pub.1128978928", "pub.1129155555", "pub.1130240573", "pub.1130286746", "pub.1125502024", "pub.1129668495", "pub.1125609619", "pub.1127726353", "pub.1124284911", "pub.1124342595", "pub.1124132392", "pub.1125488824", "pub.1128939571", "pub.1125488819", "pub.1125152336", "pub.1129605398", "pub.1128877084", "pub.1129604079", "pub.1127419474", "pub.1125402193", "pub.1129847614", "pub.1128556442", "pub.1124885124", "pub.1129640786", "pub.1127252780", "pub.1130469679", "pub.1130154967", "pub.1130473004", "pub.1130565969", "pub.1129877063", "pub.1130210026"]\n\n  return publications\n\n\n'

Putting both parts of this example together

[11]:
# Step 1. Get the list of publications..

pubs = dsl.query("""
                  search publications
                    where research_orgs.id in ["grid.1008.9","grid.1002.3"]
                      and year in [2019:2020]
                    return publications[id]
                    limit 100
                """).as_dataframe()

# Step 2. Put the list into the next query...

dsl.query_iterative(f"""
                 search publications
                    where id in {json.dumps(list(pubs.id))}
                    return publications
""").as_dataframe().head(5)
Returned Publications: 100 (total = 39938)
Time: 0.47s
Starting iteration with limit=1000 skip=0 ...
0-100 / 100 (0.98s)
===
Records extracted: 100
[11]:
id title volume issue pages type year author_affiliations journal.id journal.title
0 pub.1128910744 “Surviving not thriving”: experiences of healt... 25 1 809-823 article 2020 [[{'first_name': 'Madelaine', 'last_name': 'Sm... jour.1097842 International Journal of Adolescence and Youth
1 pub.1125408894 Posttraumatic anger: a confirmatory factor ana... 11 1 1731127 article 2020 [[{'first_name': 'Grazia', 'last_name': 'Cesch... jour.1045059 European Journal of Psychotraumatology
2 pub.1125679504 Direct assessment of mental health and metabol... 13 1 1732665 article 2020 [[{'first_name': 'Peter S', 'last_name': 'Azzo... jour.1041075 Global Health Action
3 pub.1121881108 The large-scale implementation and evaluation ... 25 1 1-11 article 2020 [[{'first_name': 'Bengianni', 'last_name': 'Pi... jour.1097842 International Journal of Adolescence and Youth
4 pub.1125399511 Structural brain changes with lifetime trauma ... 11 1 1733247 article 2020 [[{'first_name': 'Marie-Laure', 'last_name': '... jour.1045059 European Journal of Psychotraumatology

4. Doing something useful: get all the publications that cite my publications

[12]:
pubs = dsl.query("""
                  search publications
                    where research_orgs.id in ["grid.1008.9","grid.1002.3"]
                      and year in [2019:2020]
                    return publications[id]
                    limit 100
                """)

mypubslist = json.dumps(list(pubs.as_dataframe().id))

dsl.query_iterative(f"""
                 search publications
                    where reference_ids in {mypubslist}
                    return publications
""").as_dataframe().head()
Returned Publications: 100 (total = 39938)
Time: 0.46s
Starting iteration with limit=1000 skip=0 ...
0-47 / 47 (0.61s)
===
Records extracted: 47
[12]:
title pages author_affiliations year issue id type volume journal.id journal.title
0 Non-photochemical quenching, a non-invasive pr... 32-43 [[{'first_name': 'Pranali', 'last_name': 'Deor... 2020 1 pub.1125663277 article 1 NaN NaN
1 Distribution and pyrethroid resistance status ... 213 [[{'first_name': 'Hitoshi', 'last_name': 'Kawa... 2020 1 pub.1126895840 article 13 jour.1039458 Parasites & Vectors
2 Reducing ignorance about who dies of what: res... 58 [[{'first_name': 'Alan D.', 'last_name': 'Lope... 2020 1 pub.1125488824 article 18 jour.1032885 BMC Medicine
3 Wolbachia: a possible weapon for controlling d... 50 [[{'first_name': 'Sujan', 'last_name': 'Khadka... 2020 1 pub.1128670136 article 48 jour.1312315 Tropical Medicine and Health
4 Can increasing years of schooling reduce type ... 12908 [[{'first_name': 'Charleen D.', 'last_name': '... 2020 1 pub.1129772083 article 10 jour.1045337 Scientific Reports

5. How long can lists get?

It depends partly on the overall query-string length; in addition, the in filter operator accepts at most 511 items (0 < items < 512, as the error below shows).

This won’t work

[13]:
pubs = dsl.query("""
                  search publications
                    where research_orgs.id in ["grid.1008.9","grid.1002.3"]
                      and year in [2019:2020]
                    return publications[id]
                    limit 1000
                """)

mypubslist = json.dumps(list(pubs.as_dataframe().id))

dsl.query(f"""
                 search publications
                    where reference_ids in {mypubslist}
                    return publications
""").as_dataframe()
Returned Publications: 1000 (total = 39938)
Time: 0.50s
Returned Errors: 1
Time: 1.11s
Semantic Error
Semantic errors found:
        Filter operator 'in' requires 0 < items < 512. '1000 is out of this range'.

This will

[14]:
pubs = dsl.query("""
                  search publications
                    where research_orgs.id in ["grid.1008.9","grid.1002.3"]
                      and year in [2019:2020]
                    return publications[id]
                    limit 250
                """)

mypubslist = json.dumps(list(pubs.as_dataframe().id))

dsl.query(f"""
                 search publications
                    where reference_ids in {mypubslist}
                    return publications
""").as_dataframe().head(2)
Returned Publications: 250 (total = 39938)
Time: 0.59s
Returned Publications: 20 (total = 168)
Time: 0.98s
[14]:
title pages author_affiliations year issue id type volume journal.id journal.title
0 Non-photochemical quenching, a non-invasive pr... 32-43 [[{'first_name': 'Pranali', 'last_name': 'Deor... 2020 1 pub.1125663277 article 1 NaN NaN
1 Predicting breast cancer risk using interactin... 11044 [[{'first_name': 'Hamid', 'last_name': 'Behrav... 2020 1 pub.1129015188 article 10 jour.1045337 Scientific Reports
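
The pattern, then, is to keep each list below the cap. If you ever need to batch a plain Python list of IDs yourself, here is a minimal sketch (the batches helper is our own, not part of Dimcli, which provides the chunks method shown next):

def batches(ids, size=250):
    # Yield successive slices of at most `size` items, keeping each
    # DSL 'in' filter safely under the 512-item limit.
    for i in range(0, len(ids), size):
        yield ids[i:i + size]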

What if I need a very long list?

The Dimcli library can break up your query results into chunks.

We then loop through each chunk, get the results, and stick them back together again at the end.

[15]:
# Step 1 - same as before - except now we want the results in chunks

pubs_chunks = dsl.query("""
                  search publications
                    where research_orgs.id in ["grid.1008.9","grid.1002.3"]
                      and year in [2019:2020]
                    return publications[id]
                    limit 1000
                """).chunks(250)

# Step 2 - almost the same as before - except now we use a for loop to loop through our results

query_results = []

for c in pubs_chunks:

      mypubslist = json.dumps(list(pd.DataFrame(c).id))

      query_results.append(

                  dsl.query_iterative(f"""
                        search publications
                            where reference_ids in {mypubslist}
                            return publications
                        """).as_dataframe()
      )

# Step 3 - join our results back together again, and get rid of duplicates

pd.concat(query_results).\
   drop_duplicates(subset='id').\
   head(2)

Returned Publications: 1000 (total = 39938)
Time: 0.87s
Starting iteration with limit=1000 skip=0 ...
0-167 / 167 (2.92s)
===
Records extracted: 167
Starting iteration with limit=1000 skip=0 ...
0-218 / 218 (1.16s)
===
Records extracted: 218
Starting iteration with limit=1000 skip=0 ...
0-262 / 262 (1.29s)
===
Records extracted: 262
Starting iteration with limit=1000 skip=0 ...
0-74 / 74 (0.75s)
===
Records extracted: 74
[15]:
id title volume issue pages type year author_affiliations journal.id journal.title
0 pub.1125663277 Non-photochemical quenching, a non-invasive pr... 1 1 32-43 article 2020 [[{'first_name': 'Pranali', 'last_name': 'Deor... NaN NaN
1 pub.1126895840 Distribution and pyrethroid resistance status ... 13 1 213 article 2020 [[{'first_name': 'Hitoshi', 'last_name': 'Kawa... jour.1039458 Parasites & Vectors

6. What if I want to get the researchers associated with the publications that cite my institution?

[16]:
# Step 1 - same as before

pubs_chunks = dsl.query("""
                  search publications
                    where research_orgs.id in ["grid.1008.9","grid.1002.3"]
                      and year in [2019:2020]
                    return publications[id]
                    limit 1000
                """).chunks(250)

query_results = []

# Step 2 - same as before, but now we return researchers instead of publications

for c in pubs_chunks:

      mypubslist = json.dumps(list(pd.DataFrame(c).id))

      query_results.append(

                  dsl.query(f"""
                        search publications
                            where reference_ids in {mypubslist}
                            return researchers limit 1000
                        """).as_dataframe()
      # Warning: if this query involves more than 1000 researchers, you will miss some
      )

# Step 3 - join the queries back together, this time using a groupby statement to sum the counts for each researcher

my_researchers = pd.concat(query_results).\
                 groupby(['id','first_name','last_name']).\
                 agg({'count':'sum'}).\
                 sort_values(by='count', ascending=False).\
                 head(10)

Returned Publications: 1000 (total = 39938)
Time: 0.46s
Returned Researchers: 1000
Time: 1.48s
Returned Researchers: 1000
Time: 1.43s
Returned Researchers: 1000
Time: 1.29s
Returned Researchers: 293
Time: 1.02s

7. What if I want to get all the researchers associated with the publications that cite my institution?

[35]:
# Step 1 - same as before

pubs_chunks = dsl.query("""
                  search publications
                    where research_orgs.id in ["grid.1008.9","grid.1002.3"]
                      and year in [2019:2020]
                    return publications[id]
                    limit 1000
                """).chunks(250)

query_results = []

# Step 2 - almost the same as before -
#          except now we use the as_dataframe_authors method

for c in pubs_chunks:

      mypubslist = json.dumps(list(pd.DataFrame(c).id))

      query_results.append(

                  dsl.query_iterative(f"""
                        search publications
                            where reference_ids in {mypubslist}
                            return publications[id+title+authors]
                        """).as_dataframe_authors() # I have changed this line from as_dataframe to as_datframe_authors
      )

# Step 3 - join the publications back together

researcher_pubs = pd.concat(query_results).\
                drop_duplicates(subset=['researcher_id','pub_id'])


# Step 4 - count up the publications using a groupby statement

my_researchers = researcher_pubs[researcher_pubs['researcher_id'] != ''].\
    groupby(['researcher_id']).\
    agg({'first_name':'max','last_name':'max','pub_id':'count'}).\
    sort_values(by='pub_id', ascending=False).\
    reset_index()

my_researchers.\
    head(10)

Returned Publications: 1000 (total = 39938)
Time: 0.46s
Starting iteration with limit=1000 skip=0 ...
0-167 / 167 (1.42s)
===
Records extracted: 167
Starting iteration with limit=1000 skip=0 ...
0-218 / 218 (1.11s)
===
Records extracted: 218
Starting iteration with limit=1000 skip=0 ...
0-262 / 262 (1.26s)
===
Records extracted: 262
Starting iteration with limit=1000 skip=0 ...
0-74 / 74 (0.87s)
===
Records extracted: 74
[35]:
researcher_id first_name last_name pub_id
0 ur.013035660527.88 Paul M. Thompson 16
1 ur.0735661630.67 Neda Jahanshad 11
2 ur.010365645710.55 Sophia I. Thomopoulos 9
3 ur.0627614723.01 Dan J. Stein 9
4 ur.01015666354.63 Dick J. Veltman 8
5 ur.01153670556.36 Carlos M. Grilo 7
6 ur.01117110035.48 Nynke A. Groenewold 6
7 ur.0652517237.06 Tomáš Paus 6
8 ur.0750404115.06 Suzanne Fraser 5
9 ur.01056633333.53 Lars T. Westlye 5

8. …and if we want details about our researchers, we can put our list of researchers into the researchers API

See the researcher source docs for more details.

[39]:
## First, we need to chunk up our researcher list

query_results = []

for g, rschr in my_researchers.groupby(np.arange(len(my_researchers)) // 250):
          # Grouping by np.arange(len(my_researchers)) // 250 labels rows
          # 0,0,...,1,1,... so each group holds at most 250 researchers -
          # this does *almost* the same thing as the chunks command used above

     myreslist = json.dumps(list(rschr.researcher_id))

     query_results.append(

                  dsl.query_iterative(f"""
                        search researchers
                            where id in {myreslist}
                            return researchers
                        """).as_dataframe() #
      )


pd.concat(query_results).head()
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (2.62s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.47s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.63s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.38s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.29s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.24s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.12s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.22s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.77s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.12s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.12s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.25s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.16s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.15s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.11s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.16s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.14s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.17s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (1.27s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-40 / 40 (0.63s)
===
Records extracted: 40
[39]:
id first_name last_name research_orgs orcid_id
0 ur.01262341717.29 Eric T Hahnen [{'id': 'grid.411668.c', 'types': ['Healthcare... NaN
1 ur.01264721100.06 Daniel Barrowdale [{'id': 'grid.5335.0', 'types': ['Education'],... [0000-0003-1661-3939]
2 ur.0743471221.03 Tjoung-Won Park-Simon [{'id': 'grid.4562.5', 'types': ['Education'],... [0000-0002-2863-3040]
3 ur.01105220501.02 Maria Adelaide Caligo [{'id': 'grid.5395.a', 'types': ['Education'],... [0000-0003-0589-1829]
4 ur.01116254763.53 Jonathan P Tyrer [{'id': 'grid.470869.4', 'types': ['Facility']... NaN

9. Patents example (patents -> publications)

Using the same method, we can retrieve all patents citing publications from my institution.

[40]:
%dsldocs patents
dsl_last_results[dsl_last_results['field']=='publication_ids']
[40]:
sources field type description is_filter is_entity is_facet
39 patents publication_ids string Dimensions IDs of the publications related to ... True False False
[41]:
# Step 1 - same as before - except now we want the results in chunks

pubs_chunks = dsl.query_iterative("""
                  search publications
                    where research_orgs.id in ["grid.1008.9"]
                      and year = 2015
                    return publications[id]
                """).chunks(250)

# Step 2 - almost the same as before - except now we use a for loop to loop through our results
# We changed 2 things below: publications was replaced with patents, and reference_ids was replaced by publication_ids

query_results = []

for c in pubs_chunks:

      mypubslist = json.dumps(list(pd.DataFrame(c).id))

      query_results.append(

                  dsl.query_iterative(f"""
                        search patents
                            where publication_ids in {mypubslist}
                            return patents
                        """).as_dataframe()
      )

# Step 3 - join our results back together again, and get rid of duplicates

cited_patents = pd.concat(query_results).\
   drop_duplicates(subset='id')

cited_patents.head(2)
Starting iteration with limit=1000 skip=0 ...
0-1000 / 8388 (2.17s)
1000-2000 / 8388 (1.40s)
2000-3000 / 8388 (1.14s)
3000-4000 / 8388 (2.58s)
4000-5000 / 8388 (1.09s)
5000-6000 / 8388 (1.15s)
6000-7000 / 8388 (0.95s)
7000-8000 / 8388 (1.12s)
8000-8388 / 8388 (0.89s)
===
Records extracted: 8388
Starting iteration with limit=1000 skip=0 ...
0-4 / 4 (1.16s)
===
Records extracted: 4
Starting iteration with limit=1000 skip=0 ...
===
Records extracted: 0
Starting iteration with limit=1000 skip=0 ...
===
Records extracted: 0
Starting iteration with limit=1000 skip=0 ...
0-3 / 3 (0.69s)
===
Records extracted: 3
Starting iteration with limit=1000 skip=0 ...
0-4 / 4 (0.77s)
===
Records extracted: 4
Starting iteration with limit=1000 skip=0 ...
0-2 / 2 (0.78s)
===
Records extracted: 2
Starting iteration with limit=1000 skip=0 ...
0-5 / 5 (0.71s)
===
Records extracted: 5
Starting iteration with limit=1000 skip=0 ...
0-3 / 3 (0.78s)
===
Records extracted: 3
Starting iteration with limit=1000 skip=0 ...
0-9 / 9 (0.73s)
===
Records extracted: 9
Starting iteration with limit=1000 skip=0 ...
0-2 / 2 (0.68s)
===
Records extracted: 2
Starting iteration with limit=1000 skip=0 ...
0-2 / 2 (0.93s)
===
Records extracted: 2
Starting iteration with limit=1000 skip=0 ...
0-3 / 3 (0.74s)
===
Records extracted: 3
Starting iteration with limit=1000 skip=0 ...
0-1 / 1 (0.72s)
===
Records extracted: 1
Starting iteration with limit=1000 skip=0 ...
0-3 / 3 (0.69s)
===
Records extracted: 3
Starting iteration with limit=1000 skip=0 ...
0-2 / 2 (0.69s)
===
Records extracted: 2
Starting iteration with limit=1000 skip=0 ...
0-3 / 3 (0.77s)
===
Records extracted: 3
Starting iteration with limit=1000 skip=0 ...
0-4 / 4 (0.69s)
===
Records extracted: 4
Starting iteration with limit=1000 skip=0 ...
0-4 / 4 (0.67s)
===
Records extracted: 4
Starting iteration with limit=1000 skip=0 ...
0-8 / 8 (0.70s)
===
Records extracted: 8
Starting iteration with limit=1000 skip=0 ...
0-3 / 3 (0.71s)
===
Records extracted: 3
Starting iteration with limit=1000 skip=0 ...
0-2 / 2 (1.15s)
===
Records extracted: 2
Starting iteration with limit=1000 skip=0 ...
0-5 / 5 (0.84s)
===
Records extracted: 5
Starting iteration with limit=1000 skip=0 ...
0-6 / 6 (0.80s)
===
Records extracted: 6
Starting iteration with limit=1000 skip=0 ...
0-3 / 3 (0.74s)
===
Records extracted: 3
Starting iteration with limit=1000 skip=0 ...
0-11 / 11 (0.91s)
===
Records extracted: 11
Starting iteration with limit=1000 skip=0 ...
0-2 / 2 (0.74s)
===
Records extracted: 2
Starting iteration with limit=1000 skip=0 ...
0-8 / 8 (0.69s)
===
Records extracted: 8
Starting iteration with limit=1000 skip=0 ...
===
Records extracted: 0
Starting iteration with limit=1000 skip=0 ...
0-5 / 5 (0.86s)
===
Records extracted: 5
Starting iteration with limit=1000 skip=0 ...
0-4 / 4 (0.81s)
===
Records extracted: 4
Starting iteration with limit=1000 skip=0 ...
0-2 / 2 (0.70s)
===
Records extracted: 2
Starting iteration with limit=1000 skip=0 ...
0-1 / 1 (0.68s)
===
Records extracted: 1
Starting iteration with limit=1000 skip=0 ...
0-1 / 1 (0.69s)
===
Records extracted: 1
Starting iteration with limit=1000 skip=0 ...
===
Records extracted: 0
[41]:
title publication_date assignee_names year assignees inventor_names times_cited filing_status id granted_year
0 COMPOSITIONS WITH SPECIFIC OLIGOSACCHARIDES TO... 2017-08-03 [NESTEC SA] 2017 [{'id': 'grid.419905.0', 'city_name': 'Vevey',... [BLANCHARD, CARINE, NEMBRINI, Chiara] 0.0 Application WO-2017129644-A1 NaN
1 COMPOSITION FOR USE IN THE PREVENTION AND/OR T... 2017-08-03 [NESTEC SA] 2017 [{'id': 'grid.419905.0', 'city_name': 'Vevey',... [BLANCHARD, CARINE, NEMBRINI, Chiara] 0.0 Application WO-2017129642-A1 NaN
[42]:
%dsldocs patents
dsl_last_results[dsl_last_results['type']=='organizations']
[42]:
sources field type description is_filter is_entity is_facet
6 patents assignees organizations Disambiguated GRID organisations who own or ha... True True True
19 patents current_assignees organizations Disambiguated GRID organisations currenlty own... True True True
26 patents funders organizations GRID organisations funding the patent. True True True
35 patents original_assignees organizations Disambiguated GRID organisations that first ow... True True True
[43]:
import json
cited_patents_assignees = cited_patents.explode('assignees')

# Pull the GRID ID and name out of each assignee dict;
# rows with no disambiguated assignee get a 0 placeholder.
cited_patents_assignees['assignee_grid_id'] = cited_patents_assignees['assignees'].\
    apply(lambda g: g['id'] if type(g) == dict else 0 )

cited_patents_assignees['assignee_name'] = cited_patents_assignees['assignees'].\
    apply(lambda g: g['name'] if type(g) == dict else 0 )

cited_patents_assignees.\
    groupby(['assignee_grid_id','assignee_name']).\
    agg({'id':'count'}).\
    sort_values(by='id', ascending=False).\
    head(20)
[43]:
id
assignee_grid_id assignee_name
0 0 29
grid.428999.7 Pasteur Institute 5
grid.1058.c Murdoch Children's Research Institute 4
grid.453773.1 Wisconsin Alumni Research Foundation 3
grid.419905.0 Nestlé (Switzerland) 3
grid.420918.6 Imperial Innovations (United Kingdom) 3
grid.1055.1 Peter MacCallum Cancer Centre 3
grid.25879.31 University of Pennsylvania 2
grid.1042.7 Walter and Eliza Hall Institute of Medical Research 2
grid.4444.0 French National Centre for Scientific Research 2
grid.452266.1 Campus Science Support Facilities 2
grid.1003.2 University of Queensland 2
grid.420377.5 NEC (Japan) 2
grid.420214.1 Sanofi (Germany) 2
grid.419859.8 NEC Corporation of America 2
grid.419318.6 Intel (United States) 2
grid.417521.4 Institute of Molecular Biotechnology 2
grid.29857.31 Pennsylvania State University 2
grid.431532.2 Mesoblast (United States) 2
grid.7429.8 French Institute of Health and Medical Research 2

10. Clinical Trials (clinical trials -> publications)

Using the same method, we can retrieve all clinical trials citing publications from my institution.

[44]:
%dsldocs clinical_trials
dsl_last_results[dsl_last_results['field']=='research_orgs']
[44]:
sources field type description is_filter is_entity is_facet
29 clinical_trials research_orgs organizations GRID organizations involved, e.g. as sponsors ... True True True
[45]:
# Step 1 - same as before - except now we want the results in chunks

clinical_trials_chunks = dsl.query_iterative("""
                  search publications
                    where research_orgs.id in ["grid.1008.9"]
                      and year = 2015
                    return publications[id]
                """).chunks(400)

# Step 2 - almost the same as before - except now we use a for loop to loop through our results
# We changed 2 things below: publications was replaced with clinical_trials, and reference_ids was replaced by publication_ids

query_results = []

for c in clinical_trials_chunks:

      mypubslist = json.dumps(list(pd.DataFrame(c).id))

      query_results.append(

                  dsl.query_iterative(f"""
                        search clinical_trials
                            where publication_ids in {mypubslist}
                            return clinical_trials[all]
                        """).as_dataframe()
      )

# Step 3 - join our results back together again, and get rid of duplicates

cited_clinical_trials = pd.concat(query_results).\
   drop_duplicates(subset='id')

cited_clinical_trials.head(2)
Starting iteration with limit=1000 skip=0 ...
0-1000 / 8388 (0.46s)
1000-2000 / 8388 (0.48s)
2000-3000 / 8388 (0.48s)
3000-4000 / 8388 (0.48s)
4000-5000 / 8388 (0.47s)
5000-6000 / 8388 (0.48s)
6000-7000 / 8388 (0.52s)
7000-8000 / 8388 (0.50s)
8000-8388 / 8388 (0.47s)
===
Records extracted: 8388
Starting iteration with limit=1000 skip=0 ...
0-12 / 12 (1.20s)
===
Records extracted: 12
Starting iteration with limit=1000 skip=0 ...
0-18 / 18 (1.36s)
===
Records extracted: 18
Starting iteration with limit=1000 skip=0 ...
0-25 / 25 (1.40s)
===
Records extracted: 25
Starting iteration with limit=1000 skip=0 ...
0-9 / 9 (1.13s)
===
Records extracted: 9
Starting iteration with limit=1000 skip=0 ...
0-8 / 8 (1.06s)
===
Records extracted: 8
Starting iteration with limit=1000 skip=0 ...
0-15 / 15 (1.11s)
===
Records extracted: 15
Starting iteration with limit=1000 skip=0 ...
0-5 / 5 (1.71s)
===
Records extracted: 5
Starting iteration with limit=1000 skip=0 ...
0-8 / 8 (1.01s)
===
Records extracted: 8
Starting iteration with limit=1000 skip=0 ...
0-13 / 13 (1.17s)
===
Records extracted: 13
Starting iteration with limit=1000 skip=0 ...
0-12 / 12 (1.06s)
===
Records extracted: 12
Starting iteration with limit=1000 skip=0 ...
0-14 / 14 (1.02s)
===
Records extracted: 14
Starting iteration with limit=1000 skip=0 ...
0-15 / 15 (1.08s)
===
Records extracted: 15
Starting iteration with limit=1000 skip=0 ...
0-6 / 6 (0.96s)
===
Records extracted: 6
Starting iteration with limit=1000 skip=0 ...
0-6 / 6 (1.00s)
===
Records extracted: 6
Starting iteration with limit=1000 skip=0 ...
0-16 / 16 (1.33s)
===
Records extracted: 16
Starting iteration with limit=1000 skip=0 ...
0-12 / 12 (1.13s)
===
Records extracted: 12
Starting iteration with limit=1000 skip=0 ...
0-10 / 10 (1.23s)
===
Records extracted: 10
Starting iteration with limit=1000 skip=0 ...
0-13 / 13 (1.10s)
===
Records extracted: 13
Starting iteration with limit=1000 skip=0 ...
0-2 / 2 (0.84s)
===
Records extracted: 2
Starting iteration with limit=1000 skip=0 ...
===
Records extracted: 0
Starting iteration with limit=1000 skip=0 ...
0-3 / 3 (0.90s)
===
Records extracted: 3
[45]:
phase investigators abstract active_years FOR RCDC researchers organizations HRCS_HC dimensions_url ... publication_ids category_for HRCS_RAC funder_groups category_hrcs_rac mesh_terms associated_grant_ids category_icrp_ct category_icrp_cso acronym
0 N/A [[Beverley-Ann Biggs, Prof, Contact person for... In this study, we will compare the effects of ... [2010] [{'id': '3177', 'name': '1117 Public Health an... [{'id': '388', 'name': 'Nutrition'}, {'id': '4... [{'id': 'ur.015271425054.80', 'first_name': 'B... [{'id': 'grid.431143.0', 'name': 'National Hea... [{'id': '908', 'name': 'Reproductive Health an... https://app.dimensions.ai/details/clinical_tri... ... [pub.1029788523, pub.1032854231, pub.100861447... [{'id': '3177', 'name': '1117 Public Health an... NaN NaN NaN NaN NaN NaN NaN NaN
1 Phase 1/2 [[Steven Deeks, MD, Principal Investigator, Un... The purpose of this study is to determine the ... [2013, 2014] [{'id': '3114', 'name': '1108 Medical Microbio... [{'id': '533', 'name': 'Infectious Diseases'},... [{'id': 'ur.012534455317.24', 'first_name': 'S... [{'id': 'grid.266102.1', 'name': 'University o... [{'id': '898', 'name': 'Infection'}] https://app.dimensions.ai/details/clinical_tri... ... [pub.1032634520] [{'id': '3114', 'name': '1108 Medical Microbio... [{'id': '10601', 'name': '6.1 Pharmaceuticals'... [{'id': '5', 'name': 'NIH'}, {'id': '25', 'nam... [{'id': '10500', 'name': '5 Development of Tre... [Acquired Immunodeficiency Syndrome, HIV Infec... [grant.2420809] NaN NaN NaN

2 rows × 40 columns

[46]:
%dsldocs clinical_trials
dsl_last_results[dsl_last_results['type']=='organizations']
[46]:
sources field type description is_filter is_entity is_facet
18 clinical_trials funders organizations GRID funding organisations that are involved w... True True True
29 clinical_trials research_orgs organizations GRID organizations involved, e.g. as sponsors ... True True True
[47]:
cited_clinical_trials_orgs = cited_clinical_trials.explode('research_orgs')

cited_clinical_trials_orgs['research_orgs_grid_id'] = cited_clinical_trials_orgs['research_orgs'].\
    apply(lambda g: g['id'] if type(g) == dict else 0 )

cited_clinical_trials_orgs['research_orgs_name'] = cited_clinical_trials_orgs['research_orgs'].\
    apply(lambda g: g['name'] if type(g) == dict else 0 )

cited_clinical_trials_orgs.\
    groupby(['research_orgs_grid_id','research_orgs_name']).\
    agg({'id':'count'}).\
    sort_values(by='id', ascending=False).\
    head(20)
[47]:
id
research_orgs_grid_id research_orgs_name
grid.431143.0 National Health and Medical Research Council 11
grid.1008.9 University of Melbourne 11
grid.416153.4 Royal Melbourne Hospital 6
grid.21107.35 Johns Hopkins University 6
0 0 5
grid.266102.1 University of California, San Francisco 5
grid.1058.c Murdoch Children's Research Institute 5
grid.1055.1 Peter MacCallum Cancer Centre 5
grid.1002.3 Monash University 5
grid.413249.9 Royal Prince Alfred Hospital 4
grid.84393.35 Hospital Universitari i Politècnic La Fe 4
grid.416259.d Royal Women's Hospital 4
grid.419681.3 National Institute of Allergy and Infectious Diseases 4
grid.416100.2 Royal Brisbane and Women's Hospital 4
grid.277151.7 Centre Hospitalier Universitaire de Nantes 4
grid.1003.2 University of Queensland 4
grid.1623.6 The Alfred Hospital 4
grid.5650.6 Academic Medical Center 4
grid.411109.c Virgen del Rocío University Hospital 4
grid.412687.e Ottawa Hospital 3

11. Grants (publications -> grants)

Using the same method, we can retrieve all grants funding publications from my institution.

[48]:
%dsldocs publications
dsl_last_results[dsl_last_results['field'].str.contains('ids')]
[48]:
sources field type description is_filter is_entity is_facet
39 publications reference_ids string Dimensions publication ID for publications in ... True False False
51 publications supporting_grant_ids string Grants supporting a publication, returned as a... True False False
[49]:
# Step 1 - as before, get the publications - this time returning their supporting grant IDs too

publications = dsl.query_iterative("""
                  search publications
                    where research_orgs.id in ["grid.1008.9"]
                      and year = 2020
                    return publications[id+supporting_grant_ids]
                """).as_dataframe()

# Step 2 - we can get the grant IDs directly from publications this time,
# so we pull the grants metadata using these identifiers.

pubs_grants = publications.explode('supporting_grant_ids')

grants_from_pubs = pd.DataFrame(pubs_grants.supporting_grant_ids.unique()).\
                   dropna().\
                   rename(columns={0:'id'})

query_results = []

for g, gnts in grants_from_pubs.groupby(np.arange(len(grants_from_pubs)) // 250):
          # This does *almost* the same thing as the chunks command used above

      myglist = json.dumps(list(gnts.id))

      query_results.append(

                  dsl.query_iterative(f"""
                        search grants
                            where id in {myglist}
                          return grants[all]
                        """).as_dataframe()
      )

# Step 3 - join our results back together again, and get rid of duplicates

grant_details = pd.concat(query_results).\
   drop_duplicates(subset='id')

grant_details.head(5)
Starting iteration with limit=1000 skip=0 ...
0-1000 / 10740 (2.57s)
1000-2000 / 10740 (1.55s)
2000-3000 / 10740 (1.03s)
3000-4000 / 10740 (0.97s)
4000-5000 / 10740 (1.16s)
5000-6000 / 10740 (2.27s)
6000-7000 / 10740 (0.96s)
7000-8000 / 10740 (0.97s)
8000-9000 / 10740 (0.96s)
9000-10000 / 10740 (1.00s)
10000-10740 / 10740 (0.95s)
===
Records extracted: 10740
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (2.85s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-249 / 249 (2.21s)
===
Records extracted: 249
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (2.34s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-249 / 249 (2.33s)
===
Records extracted: 249
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (3.55s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-249 / 249 (5.50s)
===
Records extracted: 249
Starting iteration with limit=1000 skip=0 ...
0-248 / 248 (3.39s)
===
Records extracted: 248
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (3.06s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (2.95s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-249 / 249 (3.08s)
===
Records extracted: 249
Starting iteration with limit=1000 skip=0 ...
0-249 / 249 (3.34s)
===
Records extracted: 249
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (2.94s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-249 / 249 (2.87s)
===
Records extracted: 249
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (2.85s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-249 / 249 (2.86s)
===
Records extracted: 249
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (3.12s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (3.11s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-250 / 250 (2.95s)
===
Records extracted: 250
Starting iteration with limit=1000 skip=0 ...
0-249 / 249 (2.99s)
===
Records extracted: 249
Starting iteration with limit=1000 skip=0 ...
0-249 / 249 (2.90s)
===
Records extracted: 249
Starting iteration with limit=1000 skip=0 ...
0-29 / 29 (0.77s)
===
Records extracted: 29
[49]:
project_num funding_org_name abstract HRCS_RAC FOR funding_jpy researchers funding_nzd original_title funding_cad ... terms foa_number category_for category_bra category_hra category_icrp_ct category_icrp_cso funding_org_acronym funding_org_city category_sdg
0 R01HG010480 National Human Genome Research Institute Abstract Modern genome-wide association studie... [{'id': '10201', 'name': '2.1 Biological and e... [{'id': '2358', 'name': '0104 Statistics'}, {'... 120732088.0 [{'id': 'ur.013651144457.90', 'first_name': 'N... 1675920.0 Robust Methods for Polygenic Analysis to Infor... 1459322.0 ... [robust method, polygenic analysis, analysis, ... PA-18-484 [{'id': '2201', 'name': '01 Mathematical Scien... NaN NaN NaN NaN NaN NaN NaN
1 R21NS107739 National Institute of Neurological Disorders a... ABSTRACT Epilepsy is a devastating neurologica... NaN [{'id': '3120', 'name': '1109 Neurosciences'}] 48433820.0 [{'id': 'ur.01011324163.07', 'first_name': 'Ca... 674575.0 Prediction of seizure lateralization and posto... 589505.0 ... [seizure lateralization, lateralization, posto... PA-18-358 [{'id': '2211', 'name': '11 Medical and Health... [{'id': '4001', 'name': 'Clinical Medicine and... [{'id': '3901', 'name': 'Clinical'}] NaN NaN NaN NaN NaN
2 1158127 National Health and Medical Research Council Genome-wide association studies for psychiatri... [{'id': '10201', 'name': '2.1 Biological and e... [{'id': '2620', 'name': '0604 Genetics'}, {'id... 72444320.0 [{'id': 'ur.01074226355.01', 'first_name': 'Sa... 1008990.0 Using Statistical Genetics to Elucidate the Ef... 881753.0 ... [statistical genetics, genetics, effect, assoc... Not available [{'id': '2620', 'name': '0604 Genetics'}, {'id... [{'id': '4000', 'name': 'Basic Science'}] [{'id': '3900', 'name': 'Biomedical'}] NaN NaN NaN NaN NaN
3 1154217 National Health and Medical Research Council Arthritis and musculoskeletal conditions are a... [{'id': '10201', 'name': '2.1 Biological and e... [{'id': '3053', 'name': '1103 Clinical Science... 54872472.0 [{'id': 'ur.016155464604.94', 'first_name': 'R... 764253.0 Lifestyle management of knee osteoarthritis: c... 667878.0 ... [lifestyle management, management, knee osteoa... Not available [{'id': '3053', 'name': '1103 Clinical Science... [{'id': '4001', 'name': 'Clinical Medicine and... [{'id': '3901', 'name': 'Clinical'}] NaN NaN NaN NaN NaN
4 1159261 National Health and Medical Research Council Preeclampsia and fetal growth restriction are ... NaN [{'id': '3158', 'name': '1114 Paediatrics and ... 36628680.0 [{'id': 'ur.010007620107.38', 'first_name': 'T... 510157.0 New diagnostics to predict preelampsia and fet... 445824.0 ... [new diagnostics, diagnostics, fetal growth re... Not available [{'id': '3158', 'name': '1114 Paediatrics and ... [{'id': '4000', 'name': 'Basic Science'}] [{'id': '3900', 'name': 'Biomedical'}] NaN NaN NaN NaN NaN

5 rows × 58 columns

[50]:
pubs_grants.groupby('supporting_grant_ids').\
    agg({'id':'count'}).\
    reset_index().\
    rename(columns={'id':'pubs','supporting_grant_ids':'id'}).\
    merge(grant_details[['id','original_title','funding_usd']],
          on='id').\
    sort_values(by='pubs', ascending=False)

[50]:
id pubs original_title funding_usd
2213 grant.6711717 35 ARC Centre of Excellence in Exciton Science 22669994.0
1210 grant.3931418 31 ARC Centre of Excellence in Convergent Bio-Nan... 19674412.0
3478 grant.7874297 23 Advancing Nanomedicine through Particle Techno... 617151.0
1146 grant.3860228 22 ENIGMA Center for Worldwide Medicine, Imaging ... 10065774.0
4159 grant.7878111 22 Novel therapies, risk pathways and prevention ... 643236.0
... ... ... ... ...
1953 grant.5498709 1 Non-Invasive Imaging of Glymphatic Clearance: ... 411126.0
1954 grant.5498721 1 Mitochondrial ubiquitin dynamics and apoptotic... 754966.0
1955 grant.5499803 1 EMPOWER: Early Signs Monitoring to Prevent Rel... 1105477.0
1956 grant.5503573 1 Molecular Regulation of AEP during Ageing 3307629.0
5017 grant.8681118 1 Development of molecular markers for applicati... NaN

5018 rows × 4 columns

Why didn’t I use resulting_publication_ids? Because that field is deprecated, as the warning below shows.

[51]:
%%dsldf

search grants
where resulting_publication_ids in ["pub.1005269097"]
Returned Grants: 3 (total = 3)
Time: 0.46s
WARNINGS [1]
Field 'resulting_publication_ids' is deprecated. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
[51]:
start_date language id original_title title_language title active_year start_year funding_org_name end_date project_num funders
0 2018-12-15 en grant.4320525 Nanoscale X-Ray Imaging and Dynamics of Electr... en Nanoscale X-Ray Imaging and Dynamics of Electr... [2018, 2019, 2020, 2021] 2018 Office of Basic Energy Sciences 2021-12-14 DE-SC0001805 [{'id': 'grid.452988.a', 'types': ['Government...
1 2014-08-15 en grant.3660654 Strain-induced modification of nanoscale mater... en Strain-induced modification of nanoscale mater... [2014, 2015, 2016, 2017, 2018] 2014 Directorate for Mathematical & Physical Sciences 2018-07-31 1411335 [{'id': 'grid.457875.c', 'types': ['Government...
2 2009-08-15 en grant.3100327 Magnetic Transition Metal Nanowires en Magnetic Transition Metal Nanowires [2009, 2010, 2011, 2012, 2013] 2009 Directorate for Mathematical & Physical Sciences 2013-09-30 0906957 [{'id': 'grid.457875.c', 'types': ['Government...

Conclusions

Lists are a simple data structure with a great number of applications.

When used in conjunction with the DSL, they make it easy to chain the results of one query into another, e.g. in order to navigate through the links available in Dimensions (from publications to grants, patents, etc.).

See also this patents tutorial or this clinical trials tutorial for more in-depth applications of the queries discussed above.



Note

The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. See also the associated GitHub repository for examples, the source code of these tutorials, and much more.
