The Dimcli Python library: Magic Commands
The purpose of this notebook is to show how to use Dimcli magic commands.
IPython (Jupyter) magic commands are essentially shortcuts that let you perform common operations without typing much code.
For example, Dimcli magic commands can be used to quickly launch queries or to retrieve API documentation.
Magic commands can be very useful when testing things out, e.g. while trying out a new query or checking what data is available in Dimensions on a certain topic.
Prerequisites
This notebook assumes you have installed the Dimcli library and are familiar with the Getting Started tutorial.
[1]:
!pip install dimcli --quiet
import dimcli
from dimcli.utils import *
import sys
#
print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
    import getpass
    KEY = getpass.getpass(prompt='API Key: ')
    dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
    KEY = ""
    dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()
==
Logging in..
Dimcli - Dimensions API Client (v0.8.2)
Connected to: https://app.dimensions.ai - DSL v1.28
Method: dsl.ini file
[2]:
# !pip install dimcli -U --quiet
[3]:
# username = ""
# password = ""
# endpoint = "https://app.dimensions.ai"
# # import all libraries and login
# import dimcli
# dimcli.login(username, password, endpoint)
# dsl = dimcli.Dsl()
Dimcli ‘magic’ commands
Dimcli includes 5 types of magic commands:
%dsl can be used to run an API query
%dslloop can be used to run an API query using pagination (i.e. iterating up to 50k records)
%dsldf can be used to run an API query and transform the JSON data to a dataframe
%dslloopdf can be used to run a paginated API query and transform the JSON data to a dataframe
%dsldocs can be used to programmatically extract API schema information
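As a rough guide to how the query commands listed above relate to regular method calls, here is a minimal sketch. It assumes that `query` and `query_iterative` (both mentioned in this notebook) are the underlying calls, and that the returned object offers an `as_dataframe()` method, which is an assumption here. `run_like_magic`, `FakeDsl` and `FakeResult` are hypothetical names used only so that the example runs without API credentials; this is not how the magics are actually implemented.

```python
def run_like_magic(dsl, query, loop=False, as_df=False):
    """Run `query` roughly the way the corresponding magic is described.

    `dsl` is any object exposing `query` and `query_iterative`
    (for example a `dimcli.Dsl()` instance after login).
    """
    # %dslloop / %dslloopdf page through all results; %dsl / %dsldf run once.
    result = dsl.query_iterative(query) if loop else dsl.query(query)
    # %dsldf / %dslloopdf return a dataframe instead of the dataset wrapper.
    # ASSUMPTION: the returned object has an `as_dataframe()` method.
    return result.as_dataframe() if as_df else result


# Hypothetical stand-ins so the sketch runs without API credentials.
class FakeResult:
    def __init__(self, mode):
        self.mode = mode

    def as_dataframe(self):
        return f"dataframe from {self.mode}"


class FakeDsl:
    def query(self, q):
        return FakeResult("query")

    def query_iterative(self, q):
        return FakeResult("query_iterative")


q = 'search publications for "malaria" return publications'
fake = FakeDsl()
print(run_like_magic(fake, q).mode)                    # like %dsl
print(run_like_magic(fake, q, loop=True).mode)         # like %dslloop
print(run_like_magic(fake, q, as_df=True))             # like %dsldf
print(run_like_magic(fake, q, loop=True, as_df=True))  # like %dslloopdf
```

With a real connection, passing a logged-in `dimcli.Dsl()` object in place of `FakeDsl()` would be the natural substitution.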
Tip: Accessing data returned by magic queries
By default the results of magic command queries are saved into a variable called dsl_last_results:
[4]:
%dsl search publications for "something" return publications limit 1
type(dsl_last_results)
Returned Publications: 1 (total = 6055557)
Time: 1.18s
[4]:
dimcli.core.api.DslDataset
Note: a Dimcli DslDataset object is a wrapper around the raw JSON data, which provides various functionalities (e.g. counting objects, returning dataframes).
[5]:
print(dsl_last_results.publications[0]['title'])
The Prosimetrum Form 1: Verses as the Voice of the Past
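To illustrate what "a wrapper around the raw JSON data" can mean in practice, here is a toy sketch. `TinyDataset` is a hypothetical class written for this example and is not Dimcli's DslDataset; it only shows how top-level JSON keys (such as `publications`) could be exposed as attributes and counted.

```python
class TinyDataset:
    """Toy stand-in for a result wrapper; NOT Dimcli's DslDataset."""

    def __init__(self, data):
        self.json = data  # keep the raw JSON dict accessible

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails:
        # expose top-level JSON keys such as `publications` as attributes.
        try:
            return self.__dict__["json"][name]
        except KeyError:
            raise AttributeError(name) from None

    def count(self, key):
        """Number of records stored under `key` (0 if the key is absent)."""
        return len(self.json.get(key, []))


# Hand-written sample shaped like the query result shown above.
raw = {
    "publications": [
        {"title": "The Prosimetrum Form 1: Verses as the Voice of the Past"}
    ]
}
ds = TinyDataset(raw)
print(ds.publications[0]["title"])
print(ds.count("publications"))
```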
1. Simple queries with %dsl or %%dsl
These commands run the API query that you type after %dsl.
Moreover, if you press ‘tab’ after the command, you can also take advantage of a custom DSL autocompleter.
These commands are shortcuts for the standard syntax:
dsl = dimcli.Dsl()
dsl.query("...<some dsl query>...")
Single-line version: ``%dsl``
[6]:
%dsl search publications where journal.title="Nature Energy" return publications
Returned Publications: 20 (total = 1082)
Time: 0.70s
WARNINGS [1]
Please review your query, as it contains an entity filter (journal.title) that can lead to incomplete results. More details on https://docs.dimensions.ai/dsl/language.html#literal-fields-vs-entity-fields
[6]:
<dimcli.DslDataset object #4398528256. Records: 20/1082>
Multi-line version: ``%%dsl``
You can split the query into multiple lines, only this time you need to use the %%dsl command (two %):
[7]:
%%dsl
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications[title]
Returned Publications: 20 (total = 3727)
Time: 0.49s
[7]:
<dimcli.DslDataset object #4398527008. Records: 20/3727>
Note: the autocompleter is available only with single-line queries.
2. Loop queries with %dslloop or %%dslloop
This magic command automatically loops over all the pages of a result set, until all possible records have been returned.
This is a short version of the Dimcli.Dsl.query_iterative method, which takes care of timing queries appropriately and aggregating results within a single object (see the Dimcli Library: Installation and Querying notebook for more details).
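The looping idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: `fetch_all`, `fetch_page` and `fake_page` are hypothetical names, the fake backend replaces the real API, and the pauses between requests that a real client needs are left out. The 1000-record page size and the 50k cap mirror the figures mentioned in this notebook.

```python
def fetch_all(fetch_page, limit=1000, max_records=50000):
    """Collect records page by page until the result set is exhausted.

    `fetch_page(limit, skip)` is a hypothetical callable returning
    `(records, total_count)` for one page.
    """
    records, skip = [], 0
    while skip < max_records:
        page, total = fetch_page(limit, skip)
        records.extend(page)
        print(f"{skip}-{skip + len(page)} / {total}")
        skip += limit
        # Stop when the last page was reached or nothing came back.
        if not page or skip >= total:
            break
    return records


# Fake backend with 2496 records, standing in for the API.
DATA = [{"id": i} for i in range(2496)]


def fake_page(limit, skip):
    return DATA[skip:skip + limit], len(DATA)


all_records = fetch_all(fake_page)
print("Records extracted:", len(all_records))
```

Each call asks for the next slice via `skip`, and the partial results are accumulated in a single list, which is the same aggregation idea described above.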
Single-line version: ``%dslloop``
[8]:
%dslloop search publications for "malaria AND Egypt" where year=2015 return publications
Starting iteration with limit=1000 skip=0 ...
0-1000 / 2496 (4.16s)
1000-2000 / 2496 (1.73s)
2000-2496 / 2496 (0.91s)
===
Records extracted: 2496
[8]:
<dimcli.DslDataset object #4658682560. Records: 2496/2496>
Multi-line version: ``%%dslloop``
[9]:
%%dslloop
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications
Starting iteration with limit=1000 skip=0 ...
0-1000 / 3727 (2.05s)
1000-2000 / 3727 (4.34s)
2000-3000 / 3727 (2.01s)
3000-3727 / 3727 (1.71s)
===
Records extracted: 3727
[9]:
<dimcli.DslDataset object #4398004736. Records: 3727/3727>
Like before, the results of a loop query are stored into the dsl_last_results variable.
[10]:
dsl_last_results.stats
[10]:
{'total_count': 3727}
3. Returning dataframes: %dsldf and %%dsldf
These magic commands are similar to the ones above, only they transform the data directly into Pandas dataframe objects.
Dataframes are then easy to sort, analyse, export as CSV and use within visualisation software.
Single-line version: ``%dsldf``
[11]:
%dsldf search publications where journal.id="jour.1136447" return publications
Returned Publications: 20 (total = 1082)
Time: 0.49s
[11]:
 | title | pages | author_affiliations | year | id | type | journal.id | journal.title | issue | volume |
---|---|---|---|---|---|---|---|---|---|---|
0 | The role of exciton lifetime for charge genera... | 1-9 | [[{'first_name': 'Andrej', 'last_name': 'Class... | 2020 | pub.1130455861 | article | jour.1136447 | Nature Energy | NaN | NaN |
1 | A global analysis of the progress and failure ... | 1-8 | [[{'first_name': 'Galina', 'last_name': 'Alova... | 2020 | pub.1130455686 | article | jour.1136447 | Nature Energy | NaN | NaN |
2 | Effects of technology complexity on the emerge... | 1-11 | [[{'first_name': 'Kavita', 'last_name': 'Suran... | 2020 | pub.1130455730 | article | jour.1136447 | Nature Energy | NaN | NaN |
3 | The short-term costs of local content requirem... | 1-9 | [[{'first_name': 'Benedict', 'last_name': 'Pro... | 2020 | pub.1130456280 | article | jour.1136447 | Nature Energy | NaN | NaN |
4 | How to split an exciton | 1-2 | [[{'first_name': 'Tracey M.', 'last_name': 'Cl... | 2020 | pub.1130456254 | article | jour.1136447 | Nature Energy | NaN | NaN |
5 | Operando decoding of chemical and thermal even... | 1-10 | [[{'first_name': 'Jiaqiang', 'last_name': 'Hua... | 2020 | pub.1130291960 | article | jour.1136447 | Nature Energy | NaN | NaN |
6 | Molecularly engineered photocatalyst sheet for... | 1-8 | [[{'first_name': 'Qian', 'last_name': 'Wang', ... | 2020 | pub.1130290805 | article | jour.1136447 | Nature Energy | NaN | NaN |
7 | Sacrificing nothing to reduce CO2 | 1-2 | [[{'first_name': 'Tuo', 'last_name': 'Wang', '... | 2020 | pub.1130292228 | article | jour.1136447 | Nature Energy | NaN | NaN |
8 | Benefits and costs of a utility-ownership busi... | 1-9 | [[{'first_name': 'Galen', 'last_name': 'Barbos... | 2020 | pub.1130148818 | article | jour.1136447 | Nature Energy | NaN | NaN |
9 | Realizing high zinc reversibility in rechargea... | 1-7 | [[{'first_name': 'Lin', 'last_name': 'Ma', 'co... | 2020 | pub.1130097386 | article | jour.1136447 | Nature Energy | NaN | NaN |
10 | Challenges and prospects for negawatt trading ... | 1-8 | [[{'first_name': 'Wayes', 'last_name': 'Tushar... | 2020 | pub.1130043666 | article | jour.1136447 | Nature Energy | NaN | NaN |
11 | Diagnosing and correcting anode-free cell fail... | 1-10 | [[{'first_name': 'A. J.', 'last_name': 'Louli'... | 2020 | pub.1130003672 | article | jour.1136447 | Nature Energy | NaN | NaN |
12 | Molecular engineering of dispersed nickel phth... | 1-9 | [[{'first_name': 'Xiao', 'last_name': 'Zhang',... | 2020 | pub.1130002178 | article | jour.1136447 | Nature Energy | NaN | NaN |
13 | Five thermal energy grand challenges for decar... | 1-3 | [[{'first_name': 'Asegun', 'last_name': 'Henry... | 2020 | pub.1130002495 | article | jour.1136447 | Nature Energy | NaN | NaN |
14 | Quantification beyond expenditure | 1-2 | [[{'first_name': 'Harriet', 'last_name': 'Thom... | 2020 | pub.1129934526 | article | jour.1136447 | Nature Energy | NaN | NaN |
15 | Impacts of climate change on energy systems in... | 1-9 | [[{'first_name': 'Seleshi G.', 'last_name': 'Y... | 2020 | pub.1129832206 | article | jour.1136447 | Nature Energy | NaN | NaN |
16 | Leaving the competition in its wake | 555-556 | [[{'first_name': 'Ian D.', 'last_name': 'Broad... | 2020 | pub.1129666495 | article | jour.1136447 | Nature Energy | 8 | 5 |
17 | Understanding and applying coulombic efficienc... | 561-568 | [[{'first_name': 'Jie', 'last_name': 'Xiao', '... | 2020 | pub.1128743948 | article | jour.1136447 | Nature Energy | 8 | 5 |
18 | A holistic approach to interface stabilization... | 596-604 | [[{'first_name': 'Zonghao', 'last_name': 'Liu'... | 2020 | pub.1129487556 | article | jour.1136447 | Nature Energy | 8 | 5 |
19 | Energy justice towards racial justice | 551-551 | NaN | 2020 | pub.1130097517 | article | jour.1136447 | Nature Energy | 8 | 5 |
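Columns such as journal.id and journal.title in the table above presumably come from a nested `journal` object in the JSON being flattened into dotted column names. The sketch below imitates that step with the standard library only, then sorts the rows and writes them as CSV. `flatten` is a hypothetical helper, not part of Dimcli or Pandas, and the two records are hand-copied from the table above with an assumed nesting.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


# Two records copied from the table above; the nesting is an assumption.
records = [
    {"id": "pub.1130456254", "title": "How to split an exciton", "year": 2020,
     "journal": {"id": "jour.1136447", "title": "Nature Energy"}},
    {"id": "pub.1130097517", "title": "Energy justice towards racial justice",
     "year": 2020,
     "journal": {"id": "jour.1136447", "title": "Nature Energy"}},
]

# Sort by title, then export as CSV into an in-memory buffer.
rows = sorted((flatten(r) for r in records), key=lambda r: r["title"])

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With a real dataframe, the sorting and CSV export would normally be done with Pandas itself; the plain-Python version is used here only to keep the example free of dependencies.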
Multi-line version: ``%%dsldf``
You can split the query into multiple lines, only this time you need to use the %%dsldf command (two %):
[12]:
%%dsldf
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications[title+year+times_cited] sort by times_cited
Returned Publications: 20 (total = 3727)
Time: 0.50s
[12]:
 | year | times_cited | title |
---|---|---|---|
0 | 2014 | 504 | CH3NH3SnxPb(1-x)I3 Perovskite Solar Cells Cove... |
1 | 2015 | 503 | Asymmetric Supercapacitors Using 3D Nanoporous... |
2 | 2016 | 323 | Hierarchical Gaussian Descriptor for Person Re... |
3 | 2018 | 322 | Brain Intelligence: Go beyond Artificial Intel... |
4 | 2014 | 318 | Improved understanding of the electronic and e... |
5 | 2014 | 248 | Underwater image dehazing using joint trilater... |
6 | 2013 | 247 | Comparative study of ceramic and single crysta... |
7 | 2016 | 236 | Flexible Graphene-Based Supercapacitors: A Review |
8 | 2017 | 202 | Highly Luminescent Phase-Stable CsPbI3 Perovsk... |
9 | 2014 | 187 | Recent Progress of Counter Electrode Catalysts... |
10 | 2018 | 183 | Motor Anomaly Detection for Unmanned Aerial Ve... |
11 | 2018 | 177 | Low illumination underwater light field images... |
12 | 2014 | 173 | Hole-Conductor-Free, Metal-Electrode-Free TiO2... |
13 | 2016 | 159 | Implementation of Super-Twisting Control: Supe... |
14 | 2013 | 149 | Study of rare-earth-doped scintillators |
15 | 2015 | 142 | Low-Temperature and Solution-Processed Amorpho... |
16 | 2015 | 126 | Insight into Perovskite Solar Cells Based on S... |
17 | 2014 | 125 | All-Solid Perovskite Solar Cells with HOCO-R-N... |
18 | 2016 | 117 | Photoelectrochemical CO2 reduction by a p-type... |
19 | 2016 | 117 | Fermi-level-dependent charge-to-spin current c... |
Note: the autocompleter is available only with single-line queries.
4. Looped dataframe queries: %dslloopdf and %%dslloopdf
These commands behave just like the dataframe magics above, only they trigger an iterative query that will attempt to extract all records available for a chosen DSL query, up to the maximum limit of 50k.
[13]:
%dslloopdf search publications for "malaria AND Egypt" where year=2015 return publications
Starting iteration with limit=1000 skip=0 ...
0-1000 / 2496 (3.11s)
1000-2000 / 2496 (1.83s)
2000-2496 / 2496 (0.90s)
===
Records extracted: 2496
[13]:
 | type | volume | pages | author_affiliations | id | year | title | issue | journal.id | journal.title |
---|---|---|---|---|---|---|---|---|---|---|
0 | chapter | 65 | 1002-1012 | [[{'first_name': '', 'last_name': 'UN', 'corre... | pub.1090179052 | 2015 | Population | NaN | NaN | NaN |
1 | chapter | NaN | 127-136 | NaN | pub.1007697326 | 2015 | D | NaN | NaN | NaN |
2 | chapter | 65 | 902-936 | [[{'first_name': '', 'last_name': 'UN', 'corre... | pub.1090180227 | 2015 | International trade, finance and transport | NaN | NaN | NaN |
3 | chapter | 65 | 753-785 | [[{'first_name': '', 'last_name': 'UN', 'corre... | pub.1090179484 | 2015 | Human rights country situations | NaN | NaN | NaN |
4 | chapter | NaN | 1-80 | NaN | pub.1086527362 | 2015 | Part I. Introduction to Applied Mathematics | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2491 | chapter | NaN | 101-147 | [[{'first_name': 'A.A.', 'last_name': 'Gajadha... | pub.1028227767 | 2015 | 6 Foodborne apicomplexan protozoa Coccidia | NaN | NaN | NaN |
2492 | book | 16 | NaN | NaN | pub.1042689035 | 2015 | Sustainable Agriculture Reviews, Cereals | NaN | NaN | NaN |
2493 | chapter | 88 | 165-241 | [[{'first_name': 'Rafael', 'last_name': 'Toled... | pub.1033609880 | 2015 | Chapter Five Strongyloidiasis with Emphasis on... | NaN | NaN | NaN |
2494 | book | 4 | NaN | NaN | pub.1009592139 | 2015 | Urban Vulnerability and Climate Change in Afri... | NaN | NaN | NaN |
2495 | book | 335 | NaN | NaN | pub.1008794359 | 2015 | Proceedings of Fourth International Conference... | NaN | NaN | NaN |
2496 rows × 10 columns
5. Getting API schema documentation with %dsldocs
The %dsldocs magic prints out information about the fields and entities available via the Dimensions Search Language. This command returns a tabular version of the data model specs available online (in case you are interested, this is possible thanks to the describe DSL command).
For example, if you pass a source name like grants, you get back a table showing all fields available for that source.
[14]:
%dsldocs grants
[14]:
 | sources | field | type | description | is_filter | is_entity | is_facet |
---|---|---|---|---|---|---|---|
0 | grants | abstract | string | Abstract or summary from a grant proposal. | False | False | False |
1 | grants | active_year | integer | List of active years for a grant. | True | False | True |
2 | grants | category_bra | categories | `Broad Research Areas <https://dimensions.fres... | True | True | True |
3 | grants | category_for | categories | `ANZSRC Fields of Research classification <htt... | True | True | True |
4 | grants | category_hra | categories | `Health Research Areas <https://dimensions.fre... | True | True | True |
5 | grants | category_hrcs_hc | categories | `HRCS - Health Categories <https://dimensions.... | True | True | True |
6 | grants | category_hrcs_rac | categories | `HRCS – Research Activity Codes <https://dimen... | True | True | True |
7 | grants | category_icrp_cso | categories | `ICRP Common Scientific Outline <https://dimen... | True | True | True |
8 | grants | category_icrp_ct | categories | `ICRP Cancer Types <https://dimensions.freshde... | True | True | True |
9 | grants | category_rcdc | categories | `Research, Condition, and Disease Categorizati... | True | True | True |
10 | grants | category_sdg | categories | SDG - Sustainable Development Goals | True | True | True |
11 | grants | category_uoa | categories | `Units of Assessment <https://dimensions.fresh... | True | True | True |
12 | grants | concepts | string | Concepts describing the main topics of a grant... | False | False | False |
13 | grants | date_inserted | date | Date when the record was inserted into Dimensi... | True | False | False |
14 | grants | dimensions_url | string | Link pointing to the Dimensions web application | False | False | False |
15 | grants | end_date | date | Date when the grant ends. | True | False | False |
16 | grants | foa_number | string | The funding opportunity announcement (FOA) num... | True | False | False |
17 | grants | funder_countries | countries | The country linked to the organisation funding... | True | True | True |
18 | grants | funders | organizations | The organisation funding the grant. This is no... | True | True | True |
19 | grants | funding_aud | float | Funding amount awarded in AUD. | True | False | False |
20 | grants | funding_cad | float | Funding amount awarded in CAD. | True | False | False |
21 | grants | funding_chf | float | Funding amount awarded in CHF. | True | False | False |
22 | grants | funding_currency | string | Original funding currency. | True | False | True |
23 | grants | funding_eur | float | Funding amount awarded in EUR. | True | False | False |
24 | grants | funding_gbp | float | Funding amount awarded in GBP. | True | False | False |
25 | grants | funding_jpy | float | Funding amount awarded in JPY. | True | False | False |
26 | grants | funding_nzd | float | Funding amount awarded in NZD. | True | False | False |
27 | grants | funding_org_acronym | string | Acronym for funding organisation. | True | False | True |
28 | grants | funding_org_city | string | City name for funding organisation. | True | False | True |
29 | grants | funding_org_name | string | Name of funding organisation. | True | False | True |
30 | grants | funding_usd | float | Funding amount awarded in USD. | True | False | False |
31 | grants | grant_number | string | Grant identifier, as provided by the source (e... | True | False | False |
32 | grants | id | string | Dimensions grant ID. | True | False | False |
33 | grants | investigator_details | json | Additional details about investigators, includ... | True | False | False |
34 | grants | language | string | Grant original language, as ISO 639-1 language... | True | False | True |
35 | grants | language_title | string | ISO 639-1 language code for the original grant... | True | False | True |
36 | grants | linkout | string | Original URL for the grant. | False | False | False |
37 | grants | original_title | string | Title of the grant in its original language. | False | False | False |
38 | grants | research_org_cities | cities | City of the research organisations receiving t... | True | True | True |
39 | grants | research_org_countries | countries | Country of the research organisations receivin... | True | True | True |
40 | grants | research_org_names | string | Names of organizations investigators are affil... | True | False | False |
41 | grants | research_org_state_codes | states | State of the organisations receiving the grant... | True | True | True |
42 | grants | research_orgs | organizations | GRID organisations receiving the grant (note: ... | True | True | True |
43 | grants | researchers | researchers | Dimensions researchers IDs associated to the g... | True | True | True |
44 | grants | start_date | date | Date when the grant starts, in the format 'YYY... | True | False | False |
45 | grants | start_year | integer | Year when the grant starts. | True | False | True |
46 | grants | title | string | Title of the grant in English (if the grant la... | False | False | False |
Similarly, for objects of type ‘Entity’, e.g. countries:
[15]:
%dsldocs countries
[15]:
 | entities | field | type | description | is_filter | is_entity | is_facet |
---|---|---|---|---|---|---|---|
0 | countries | id | string | GeoNames country code (eg 'US' for `geonames:6... | True | False | False |
1 | countries | name | string | GeoNames country name. | True | False | False |
But don’t worry if you don’t get it right: if you pass a wrong object name, the full list of available sources and entities is printed. Note that in the run shown below (Dimcli v0.8.2) the list is followed by a KeyError traceback.
[16]:
%dsldocs unknown
Can't recognize this object. Dimcli knows about:
Sources=[publications - grants - patents - clinical_trials - policy_documents - researchers - organizations - datasets] Entities=[categories - cities - countries - journals - org_groups - states - publication_links - open_access]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-16-e3d3c8c65656> in <module>
----> 1 get_ipython().run_line_magic('dsldocs', 'unknown')
~/Envs/jupyterlab/lib/python3.8/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
2324 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2325 with self.builtin_trap:
-> 2326 result = fn(*args, **kwargs)
2327 return result
2328
<decorator-gen-130> in dsldocs(self, line)
~/Envs/jupyterlab/lib/python3.8/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
~/Envs/jupyterlab/lib/python3.8/site-packages/dimcli/jupyter/magics.py in dsldocs(self, line)
144 d = {header: [], 'field': [], 'type': [], 'description':[], 'is_filter':[], 'is_entity': [], 'is_facet':[],}
145 for S in docs_for:
--> 146 for x in sorted(res.json[header][S]['fields']):
147 d[header] += [S]
148 d['field'] += [x]
KeyError: 'unknown'
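The traceback above shows the lookup `res.json[header][S]['fields']` failing on an unrecognised name. A guarded lookup is one way to avoid that kind of error. The sketch below is hypothetical and is not a patch to Dimcli: `fields_for` is an invented helper, and the schema dictionary is a tiny hand-written stand-in whose shape is inferred from the traceback line.

```python
def fields_for(schema, name):
    """Return (header, sorted field names) for `name`, or None if unknown.

    ASSUMPTION: `schema` looks like
    {"sources": {name: {"fields": {...}}}, "entities": {...}}.
    """
    for header in ("sources", "entities"):
        if name in schema.get(header, {}):
            return header, sorted(schema[header][name]["fields"])
    return None


# Tiny hand-written schema; field names are taken from the tables above.
schema = {
    "sources": {"grants": {"fields": {"title": {}, "abstract": {}, "id": {}}}},
    "entities": {"countries": {"fields": {"name": {}, "id": {}}}},
}

for name in ("grants", "countries", "unknown"):
    found = fields_for(schema, name)
    if found is None:
        known = sorted(schema["sources"]) + sorted(schema["entities"])
        print(f"Can't recognize '{name}'. Known objects: {known}")
    else:
        print(name, "->", found)
```

Checking membership before indexing lets the caller print the helpful list and return cleanly instead of raising.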
Finally, if no object is requested, the full documentation for all the sources gets returned.
[17]:
%dsldocs
[17]:
 | sources | field | type | description | is_filter | is_entity | is_facet |
---|---|---|---|---|---|---|---|
0 | publications | altmetric | float | Altmetric attention score. | True | False | False |
1 | publications | altmetric_id | integer | AltMetric Publication ID | True | False | False |
2 | publications | authors | json | Ordered list of authors names and their affili... | True | False | False |
3 | publications | book_doi | string | The DOI of the book a chapter belongs to (note... | True | False | False |
4 | publications | book_series_title | string | The title of the book series book, belong to. | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... |
286 | datasets | research_org_states | states | State of the organisations the publication aut... | True | True | True |
287 | datasets | research_orgs | organizations | GRID organisations linked to the publication a... | True | True | True |
288 | datasets | researchers | researchers | Dimensions researchers IDs associated to the d... | True | True | True |
289 | datasets | title | string | Title of the dataset. | False | False | False |
290 | datasets | year | integer | Year of publication of the dataset. | True | False | True |
291 rows × 7 columns
Note
The Dimensions Analytics API lets you carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.