Exploring The Dimensions Search Language (DSL) - Quick Intro

This Notebook takes you through the basics of using the Dimensions API.

In this tutorial we leverage the capabilities of the Dimcli library in the context of Jupyter Notebooks. Dimcli is an open source Python library that simplifies common operations like logging in, querying and displaying results.

Prerequisites

This notebook assumes you have installed the Dimcli library and are familiar with the Getting Started tutorial.

[1]:
!pip install dimcli -U --quiet

import dimcli
from dimcli.shortcuts import *
import sys

print("==\nLogging in..")
# https://github.com/digital-science/dimcli#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  USERNAME = getpass.getpass(prompt='Username: ')
  PASSWORD = getpass.getpass(prompt='Password: ')
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)
else:
  USERNAME, PASSWORD  = "", ""
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)
dsl = dimcli.Dsl()
==
Logging in..
Dimcli - Dimensions API Client (v0.7.4.2)
Connected to: https://app.dimensions.ai - DSL v1.27
Method: dsl.ini file

What the query statistics refer to

When performing a DSL search, a _stats object is returned. It contains some useful information, e.g. the total number of records available for a search.

[2]:
res1 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications""", verbose=False)
print(res1.stats)  # shorthand for res1.json['_stats']
{'total_count': 3727}

It is important to note though that the total number always refers to the main source one is searching for, not necessarily the results being returned. For example, in this query we return researchers linked to publications:

[3]:
res2 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers""", verbose=False)
print(res2.stats)
{'total_count': 3727}

Still 3727 records! That’s because the total count always refers to the main object type one is searching for (here, publications), not to the facet being returned (here, researchers).
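
To make the distinction concrete, here is a minimal sketch that compares the number of records in a batch with total_count. The payload dictionary is hand-made sample data shaped like a DSL JSON response (a _stats object plus a list keyed by the returned object type); its values are illustrative rather than real API output, and summarize is a hypothetical helper, not part of Dimcli.

```python
def summarize(payload, result_key):
    """Return (records_in_batch, total_count) for a DSL-style JSON payload."""
    records = payload.get(result_key, [])
    total = payload.get("_stats", {}).get("total_count")
    return len(records), total


# Hand-made sample shaped like a response to `return researchers`
# (assumption: illustrative values only, not real API output).
payload = {
    "_stats": {"total_count": 3727},
    "researchers": [
        {"id": "ur.0001", "count": 12},
        {"id": "ur.0002", "count": 9},
    ],
}

in_batch, total = summarize(payload, "researchers")
# total_count describes the publications matched, not the researchers listed.
print(f"{in_batch} researchers returned; total_count = {total} publications")
```

With a live session, the same helper could presumably be applied to the json attribute of a query result, since the stats shortcut shown earlier reads from that dictionary.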

Tip: this basic information about objects returned is also available via the count_batch and count_total methods of the query results object.

[4]:
result = dsl.query("""
     search publications
       for "malaria AND congo"
     return publications[basics]
     limit 30
""", verbose=False)
# print some stats using the Result object
print("Results in this batch: ", result.count_batch)
print("Results in total: ", result.count_total)
print("Errors: ",result.errors)
Results in this batch:  30
Results in total:  71812
Errors:  None

Working with fields

Note: in the following examples we use the magic command %%dsldf for quicker querying.

Control the fields you return

[5]:
%%dsldf

search publications
return publications[id+title+year+doi]
limit 5
Returned Publications: 5 (total = 112275334)
Time: 1.41s
[5]:
title year id doi
0 Literature 2020 pub.1125632078 10.1515/9783110823547-013
1 To start or to complete? – Challenges in imple... 2020 pub.1124099280 10.1080/16549716.2019.1704540
2 Long-term trends in seasonality of mortality i... 2020 pub.1124649186 10.1080/16549716.2020.1717411
3 Eine Warnung an alle, dy sych etwaz duncken: D... 2020 pub.1125632729 10.1515/9783110950762-012
4 Marienklagen und Pietà 2020 pub.1125635978 10.1515/9783110922035-011
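
When the list of fields is decided at runtime, the return statement can be assembled from a Python list. The following is a small sketch of that idea; return_clause is a hypothetical helper written for this tutorial, not a Dimcli function, and it only builds the query text without contacting the API.

```python
def return_clause(source, fields, limit=None):
    """Build a DSL return statement such as `return publications[id+title]`."""
    if not fields:
        raise ValueError("at least one field is required")
    # Fields are joined with '+', matching the syntax used in the cell above.
    clause = f"return {source}[{'+'.join(fields)}]"
    if limit is not None:
        clause += f" limit {limit}"
    return clause


query = "search publications\n" + return_clause(
    "publications", ["id", "title", "year", "doi"], limit=5
)
print(query)
```

The resulting string can then be passed to dsl.query in the same way as the hand-written queries in this notebook.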

Make a mistake, and the DSL will tell you which fields you could have used

[6]:
%%dsldf

search publications
return publications[dois]
limit 100
Returned Errors: 1
Time: 0.45s
Semantic Error
Semantic errors found:
        Field / Fieldset 'dois' is not present in Source 'publications'. Available fields: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,altmetric,altmetric_id,author_affiliations,authors,book_doi,book_series_title,book_title,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,concepts,concepts_scores,date,date_inserted,dimensions_url,doi,field_citation_ratio,funder_countries,funders,id,issn,issue,journal,journal_lists,linkout,mesh_terms,open_access,open_access_categories,pages,pmcid,pmid,proceedings_title,publisher,recent_citations,reference_ids,referenced_pubs,references,relative_citation_ratio,research_org_cities,research_org_countries,research_org_country_names,research_org_names,research_org_state_codes,research_org_state_names,research_orgs,researchers,resulting_publication_doi,supporting_grant_ids,terms,times_cited,title,type,volume,year and available fieldsets: all,basics,book,categories,extras
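
A misspelled field can also be caught before the query is sent. The sketch below checks requested field names against a short list copied from the error message above and uses the standard-library difflib module to suggest close matches. AVAILABLE is deliberately a small subset of the real field list, and check_fields is a hypothetical helper, not part of Dimcli.

```python
import difflib

# Assumption: a small subset of the fields listed in the error message above.
AVAILABLE = ["authors", "date", "doi", "id", "journal",
             "pages", "title", "type", "volume", "year"]


def check_fields(requested, available):
    """Map each unknown field name to a list of close matches (possibly empty)."""
    problems = {}
    for name in requested:
        if name not in available:
            problems[name] = difflib.get_close_matches(name, available, n=1)
    return problems


print(check_fields(["id", "title", "dois"], AVAILABLE))
```

An empty dictionary means every requested field is in the list; the server-side error message remains the authoritative reference for which fields exist.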

Get all fields

[7]:
%%dsldf

search publications
  for "malaria"
return publications[all]
limit 1
Returned Publications: 1 (total = 786126)
Time: 0.92s
WARNINGS [10]
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_RAC' is deprecated in favor of category_hrcs_rac. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'category_ua' is deprecated in favor of category_uoa. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_HC' is deprecated in favor of category_hrcs_hc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'open_access' is deprecated in favor of open_access_categories. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR_first' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
[7]:
title terms type recent_citations referenced_pubs year category_ua research_org_countries research_org_cities linkout ... category_uoa publisher RCDC category_hra altmetric_id concepts pmid research_org_state_names journal.id journal.title
0 Long-term trends in seasonality of mortality i... [patterns, mortality, Sub-Saharan Africa, chan... article 1 [{'id': 'pub.1070577469', 'doi': '10.2307/4148... 2020 [{'id': '30002', 'name': 'A02 Public Health, H... [{'id': 'US', 'name': 'United States'}, {'id':... [{'id': 2792073, 'name': 'Louvain-la-Neuve'}, ... https://www.tandfonline.com/doi/pdf/10.1080/16... ... [{'id': '30002', 'name': 'A02 Public Health, H... Taylor & Francis [{'id': '547', 'name': 'Pediatric'}] [{'id': '3903', 'name': 'Population & Society'}] 75135566 [cause-specific mortality, cause mortality, ep... 32027239 [New Jersey] jour.1041075 Global Health Action

1 rows × 51 columns
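
The deprecation warnings above follow a regular pattern, so they can be turned into an old-to-new field mapping. This is a minimal sketch that parses warning lines copied from the output above with a regular expression; how to obtain the warnings programmatically from a result object is not covered here, and deprecated_fields is a hypothetical helper.

```python
import re

# Matches lines such as: Field 'terms' is deprecated in favor of concepts.
DEPRECATION = re.compile(
    r"Field '(?P<old>\w+)' is deprecated in favor of (?P<new>\w+)"
)


def deprecated_fields(warnings):
    """Return a dict mapping each deprecated field to its replacement."""
    mapping = {}
    for line in warnings:
        match = DEPRECATION.search(line)
        if match:
            mapping[match.group("old")] = match.group("new")
    return mapping


# Two warning lines copied from the output above.
warnings = [
    "Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details",
    "Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details",
]

mapping = deprecated_fields(warnings)
fields = ["title", "terms", "references", "doi"]
# Keep only the fields that are not flagged as deprecated.
current = [f for f in fields if f not in mapping]
print(mapping)
print(current)
```

Listing the non-deprecated fields explicitly, instead of requesting the all fieldset, is one way to avoid these warnings.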

Search for publications by a specific researcher id

[11]:
%%dsldf

search publications
where researchers.id = "ur.013514345521.07"
return publications[doi+researchers]
limit 1
Returned Publications: 1 (total = 16)
Time: 0.48s
[11]:
doi researchers
0 10.1038/s41385-020-0334-2 [{'id': 'ur.015441462403.62', 'last_name': 'Be...

Sources VS Facets

One of the queries above is using the researchers facet of the publications source.

In general, source queries can return up to 1000 records per call. For example, this query returns an error because its limit is too high:

[12]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 2000
  """)
Returned Errors: 1
Time: 0.45s
Semantic Error
Semantic errors found:
        Limit 2000 exceeds maximum allowed limit 1000
[12]:
<dimcli.DslDataset object #4523680624. Errors: 1>

You can paginate through source results up to 50000 rows

With sources, you can use the limit/skip syntax in order to paginate through results:

[13]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 1000 skip 1000
  """)
Returned Publications: 1000 (total = 3727)
Time: 2.38s
[13]:
<dimcli.DslDataset object #4506154128. Records: 1000/3727>
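
The limit/skip pattern can be generated for every page of a result set. The sketch below only builds the query strings and does not contact the API; paginated_queries is a hypothetical helper, and the 1000-record limit and 50000-row ceiling are the values quoted in this section.

```python
def paginated_queries(base_query, total, limit=1000, max_rows=50000):
    """Yield one query string per page, using the DSL limit/skip syntax."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    reachable = min(total, max_rows)
    for skip in range(0, reachable, limit):
        # Shrink the last page so limit + skip never exceeds the reachable rows.
        page_limit = min(limit, reachable - skip)
        yield f"{base_query} limit {page_limit} skip {skip}"


base = """search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications"""

queries = list(paginated_queries(base, total=3727))
for q in queries:
    # Show only the last line of each query, which carries limit and skip.
    print(q.splitlines()[-1].strip())
```

Each string can be passed to dsl.query in turn, with the total taken from the count_total attribute of a first result. Dimcli also provides a query_iterative method intended for this kind of loop; see its documentation for the exact behavior.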

You can return max 1000 facet rows

It is important to remember that the skip operation is not available for facets, so a facet query can never return more than 1000 records.

[14]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1 skip 1000
  """)
Returned Errors: 1
Time: 0.44s
Semantic Error
Semantic errors found:
        Offset is not supported for facet results
[14]:
<dimcli.DslDataset object #4770861936. Errors: 1>

While this works…

[15]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1000
  """)
Returned Researchers: 1000
Time: 1.64s
[15]:
<dimcli.DslDataset object #4523681536. Records: 1000/3727>

Just make a mistake, and you will get the complete list of available facets

[16]:
dsl.query("""
search publications
return years
""")
Returned Errors: 1
Time: 0.46s
Semantic Error
Semantic errors found:
        Facet 'years' is not present in source 'publications'. Available facets are: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,experts,funder_countries,funders,journal,journal_lists,mesh_terms,open_access_categories,pf01,publisher,referenced_pubs,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,times_cited,type,year
[16]:
<dimcli.DslDataset object #4779131712. Errors: 1>


Note

The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.