
Exploring The Dimensions Search Language (DSL) - Quick Intro

This Notebook takes you through the basics of using the Dimensions API.

In this tutorial we leverage the capabilities of the Dimcli library in the context of Jupyter Notebooks. Dimcli is an open source Python library that simplifies common operations like logging in, querying and displaying results.

Prerequisites

This notebook assumes you have installed the Dimcli library and are familiar with the Getting Started tutorial.

[1]:
!pip install dimcli -U --quiet
[2]:
username = ""
password = ""
endpoint = "https://app.dimensions.ai"

# import all libraries and login
import dimcli
dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()
Dimcli - Dimensions API Client (v0.6.9)
Connected to endpoint: https://app.dimensions.ai - DSL version: 1.24
Method: dsl.ini file

What the query statistics refer to

When performing a DSL search, a _stats object is returned, which contains useful information such as the total number of records available for the search.

[3]:
res1 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications""", verbose=False)
print(res1.stats)  # shorthand for res1.json['_stats']
{'total_count': 3769}

It is important to note though that the total number always refers to the main source one is searching for, not necessarily the results being returned. For example, in this query we return researchers linked to publications:

[4]:
res2 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers""", verbose=False)
print(res2.stats)
{'total_count': 3769}

Still 3769 records! That’s because the total count always refers to the main object type one is searching for (here, publications), not to the facet being returned.
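
The difference between the two numbers can be made concrete with a hand-built payload. This is a minimal sketch: it assumes the raw JSON keeps its rows under a key named after the returned object, next to the `_stats` entry mentioned earlier. The sample data and the `describe_counts` helper are illustrative and not part of Dimcli.

```python
def describe_counts(payload, returned_key):
    """Return (rows present in the payload, total_count of the searched source)."""
    rows = payload.get(returned_key, [])
    total = payload["_stats"]["total_count"]
    return len(rows), total


# Hypothetical, trimmed-down payload for a `return researchers` query.
sample = {
    "_stats": {"total_count": 3769},                  # counts matching publications
    "researchers": [{"id": "ur.1"}, {"id": "ur.2"}],  # made-up facet rows
}

rows, total = describe_counts(sample, "researchers")
print(f"{rows} researcher rows returned; {total} publications matched")
```

The two values answer different questions, so neither should be used as a stand-in for the other.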

Tip: this basic information about objects returned is also available via the count_batch and count_total methods of the query results object.

[5]:
result = dsl.query("""
     search publications
       for "malaria AND congo"
     return publications[basics]
     limit 30
""", verbose=False)
# print some stats using the Result object
print("Results in this batch: ", result.count_batch)
print("Results in total: ", result.count_total)
print("Errors: ",result.errors)
Results in this batch:  30
Results in total:  66828
Errors:  None

Working with fields

Note: in the following examples we use the magic command %%dsldf for quicker querying.

Control the fields you return

[6]:
%%dsldf

search publications
return publications[id+title+year+doi]
limit 5
Returned Publications: 5 (total = 109848482)
[6]:
title doi year id
0 Visual research on the trustability of classic... 10.15672/hujms.630402 2020 pub.1125931386
1 5. ‘Martyrs of Love’. Genesis, Development and... 10.1515/9789048540211-008 2020 pub.1125801610
2 Introduction: Murra, Materialism, Anthropology... 10.7591/9781501734977-002 2020 pub.1125788851
3 22. Structure and application of the slanting ... 10.7591/9781501737688-031 2020 pub.1125789246
4 4. Perpetual Contest 10.1515/9789048540211-007 2020 pub.1125801609
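
The bracketed field list is simply field names joined with `+`, so it can be assembled from a Python list when the set of fields is decided at runtime. The `return_clause` helper below is a hypothetical convenience, not a Dimcli function.

```python
def return_clause(source, fields):
    """Build e.g. 'return publications[id+title]' from a list of field names."""
    if not fields:
        return f"return {source}"
    return f"return {source}[{'+'.join(fields)}]"


query = "\n".join([
    "search publications",
    return_clause("publications", ["id", "title", "year", "doi"]),
    "limit 5",
])
print(query)
```

The resulting string is plain DSL and could be passed to `dsl.query(query)` like any hand-written query.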

Make a mistake, and the DSL will tell you which fields you could have used

[7]:
%%dsldf

search publications
return publications[dois]
limit 100
Returned Errors: 1
Semantic Error
Semantic errors found:
        Field / Fieldset 'dois' is not present in Source 'publications'. Available fields: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,altmetric,altmetric_id,author_affiliations,authors,book_doi,book_series_title,book_title,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,concepts,date,date_inserted,doi,field_citation_ratio,funder_countries,funders,id,issn,issue,journal,journal_lists,linkout,mesh_terms,open_access,open_access_categories,pages,pmcid,pmid,proceedings_title,publisher,recent_citations,reference_ids,references,relative_citation_ratio,research_org_cities,research_org_countries,research_org_country_names,research_org_names,research_org_state_codes,research_org_state_names,research_orgs,researchers,resulting_publication_doi,supporting_grant_ids,terms,times_cited,title,type,volume,year and available fieldsets: all,basics,book,categories,extras
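
Since the error message lists every valid name, a near-miss such as `dois` can be matched against that list automatically. The sketch below uses the standard library's `difflib` on a subset of the fields printed above; `suggest_fields` is an illustrative helper and does not parse the error message itself.

```python
import difflib

# Subset of the field names listed in the error message above.
AVAILABLE_FIELDS = [
    "authors", "book_doi", "date", "doi", "id", "issn",
    "journal", "pages", "title", "type", "year",
]


def suggest_fields(wrong_name, available=AVAILABLE_FIELDS, n=3):
    """Return up to n valid field names that closely resemble wrong_name."""
    return difflib.get_close_matches(wrong_name, available, n=n, cutoff=0.6)


print(suggest_fields("dois"))
```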

Get all fields

[8]:
%%dsldf

search publications
  for "malaria"
return publications[all]
limit 1
Returned Publications: 1 (total = 756489)
WARNINGS [10]
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'open_access' is deprecated in favor of open_access_categories. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_RAC' is deprecated in favor of category_hrcs_rac. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'category_ua' is deprecated in favor of category_uoa. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR_first' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_HC' is deprecated in favor of category_hrcs_hc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
[8]:
doi open_access_categories volume date date_inserted recent_citations type category_sdg altmetric year ... pmcid title research_org_country_names FOR_first pages FOR pmid id journal.id journal.title
0 10.1080/16549716.2019.1711335 [{'id': 'oa_all', 'description': 'Article is f... 13 2020-12-31 2020-01-21 0 article [] 13.0 2020 ... PMC7006634 The gender responsiveness of social marketing ... [Switzerland] [{'id': '2211', 'name': '11 Medical and Health... 1711335 [{'id': '3177', 'name': '1117 Public Health an... 31955668 pub.1124196727 jour.1041075 Global Health Action

1 rows × 51 columns
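
The warnings above pair each deprecated field with its replacement. One way to keep an explicit field list free of deprecated names is to map old names to new ones before building the query. This is a sketch: the mapping is copied from the warnings shown here and may differ in other DSL versions, and `modernize` is a hypothetical helper.

```python
# Deprecated field -> replacement, copied from the warnings above.
DEPRECATED = {
    "FOR": "category_for",
    "FOR_first": "category_for",
    "HRCS_HC": "category_hrcs_hc",
    "HRCS_RAC": "category_hrcs_rac",
    "RCDC": "category_rcdc",
    "author_affiliations": "authors",
    "category_ua": "category_uoa",
    "open_access": "open_access_categories",
    "references": "reference_ids",
    "terms": "concepts",
}


def modernize(fields):
    """Swap deprecated names for their replacements, dropping duplicates."""
    seen, result = set(), []
    for name in fields:
        current = DEPRECATED.get(name, name)
        if current not in seen:
            seen.add(current)
            result.append(current)
    return result


print("+".join(modernize(["doi", "FOR", "FOR_first", "terms", "concepts"])))
```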

Or filter publications by a specific researcher ID

[12]:
%%dsldf

search publications
where researchers.id = "ur.013514345521.07"
return publications[doi+researchers]
limit 1
Returned Publications: 1 (total = 15)
[12]:
doi researchers
0 10.12928/telkomnika.v17i5.12802 [{'id': 'ur.013505711524.10', 'first_name': 'R...
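
The researchers column holds a list of dictionaries per publication, so the IDs usually need to be pulled out before they can be compared or counted. The sketch below works on a hand-built record: the DOI and the first ID come from the output above, while the second entry is an assumption made for the sake of the example.

```python
def researcher_ids(record):
    """Collect the 'id' of every researcher dict attached to one publication."""
    return [r["id"] for r in record.get("researchers", []) if "id" in r]


# Illustrative record; only the DOI and the first ID appear in the output above.
record = {
    "doi": "10.12928/telkomnika.v17i5.12802",
    "researchers": [
        {"id": "ur.013505711524.10"},
        {"id": "ur.013514345521.07"},  # assumed co-author entry
    ],
}

print(researcher_ids(record))
```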

Sources VS Facets

One of the queries above is using the researchers facet of the publications source.

In general, source queries can return up to 1000 records per query. For example, this returns an error:

[13]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 2000
  """)
Returned Errors: 1
Semantic Error
Semantic errors found:
        Limit 2000 exceeds maximum allowed limit 1000
[13]:
<dimcli.DslDataset object #4733978448. Errors: 1>

You can paginate through source results up to 50000 rows

With sources, you can use the limit/skip syntax in order to paginate through results:

[14]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 1000 skip 1000
  """)
Returned Publications: 1000 (total = 3769)
[14]:
<dimcli.DslDataset object #4748413712. Records: 1000/3769>
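
Paging through a whole result set means repeating the query with a growing skip value. The generator below is a minimal sketch that only builds the query strings, using the 1000-row limit and the 50000-row ceiling described in this section; `page_queries` is a hypothetical helper and sends nothing to the API.

```python
MAX_LIMIT = 1000   # per-query cap reported by the DSL error above
MAX_ROWS = 50000   # pagination ceiling stated in this section


def page_queries(base_query, total_count, limit=MAX_LIMIT):
    """Yield one query string per page, appending 'limit N skip M'."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    limit = min(limit, MAX_LIMIT)
    reachable = min(total_count, MAX_ROWS)
    for skip in range(0, reachable, limit):
        # Shrink the last page so no query asks for rows past the ceiling.
        page_limit = min(limit, reachable - skip)
        yield f"{base_query} limit {page_limit} skip {skip}"


base = (
    "search publications "
    'where year in [2013:2018] and research_orgs="grid.258806.1" '
    "return publications"
)
pages = list(page_queries(base, total_count=3769))
for q in pages:
    print(q.split("return publications ")[1])
```

Each generated string could be passed to `dsl.query` in turn, with `total_count` taken from the first response. Dimcli also provides a `query_iterative` method intended to automate this kind of loop; check the documentation of your installed version for its exact options.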

You can return max 1000 facet rows

It is important to remember that when using facets you cannot use the skip operation, so the maximum number of records is always 1000.

[15]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1 skip 1000
  """)
Returned Errors: 1
Semantic Error
Semantic errors found:
        Offset is not supported for facet results
[15]:
<dimcli.DslDataset object #4748450064. Errors: 1>
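
These paging rules can be checked client-side before a query is sent. The function below is a sketch that mirrors the limits described in this section. Treating the 50000-row ceiling as `limit + skip <= 50000` is an assumption, and `check_paging` is a hypothetical helper, not part of Dimcli.

```python
def check_paging(limit, skip=0, is_facet=False):
    """Raise ValueError for paging options this section describes as rejected."""
    if limit > 1000:
        raise ValueError(f"Limit {limit} exceeds maximum allowed limit 1000")
    if is_facet and skip:
        raise ValueError("Offset is not supported for facet results")
    # Assumption: the ceiling applies to the last row requested.
    if not is_facet and limit + skip > 50000:
        raise ValueError("Sources can only be paginated up to 50000 rows")


check_paging(1000, skip=1000)  # source query: accepted
try:
    check_paging(1, skip=1000, is_facet=True)  # facet query with skip: rejected
except ValueError as err:
    print(err)
```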

While this works…

[16]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1000
  """)
Returned Researchers: 1000
[16]:
<dimcli.DslDataset object #4753798096. Records: 1000/3769>

Just make a mistake, and you will get the complete list of available facets

[17]:
dsl.query("""
search publications
return years
""")
Returned Errors: 1
Semantic Error
Semantic errors found:
        Facet 'years' is not present in source 'publications'. Available facets are: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,experts,funder_countries,funders,journal,journal_lists,mesh_terms,open_access_categories,pf01,publisher,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,times_cited,type,year
[17]:
<dimcli.DslDataset object #4753799888. Errors: 1>


Note

The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.
