The Dimcli Python library: Magic Commands
The purpose of this notebook is to show how to use Dimcli magic commands.
Magic commands are shortcuts provided by IPython (and therefore available in Jupyter notebooks) that let you perform common operations without typing much code.
For example, Dimcli magic commands can be used to quickly launch queries or to retrieve API documentation.
Magic commands can be very useful when testing things out, e.g. while trying out a new query or checking what data is available in Dimensions on a certain topic.
[1]:
import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))
==
CHANGELOG
This notebook was last run on Jan 24, 2022
==
Prerequisites
This notebook assumes you have installed the Dimcli library and are familiar with the ‘Getting Started’ tutorial.
[1]:
!pip install dimcli --quiet
import dimcli
from dimcli.utils import *
import sys
#
print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
    import getpass
    KEY = getpass.getpass(prompt='API Key: ')
    dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
    KEY = ""
    dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()
Searching config file credentials for 'https://app.dimensions.ai' endpoint..
==
Logging in..
Dimcli - Dimensions API Client (v0.9.6)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.0
Method: dsl.ini file
Dimcli ‘magic’ commands
Dimcli includes 5 types of magic commands:
%dsl: runs an API query
%dslloop: runs an API query using pagination (i.e. iterating up to 50k records)
%dsldf: runs an API query and transforms the JSON data into a dataframe
%dslloopdf: runs a paginated API query and transforms the JSON data into a dataframe
%dsldocs: programmatically extracts API schema information
Tip: Accessing data returned by magic queries
By default, the results of magic command queries are saved into a variable called dsl_last_results:
[4]:
%dsl search publications for "something" return publications limit 1
type(dsl_last_results)
Returned Publications: 1 (total = 6938991)
Time: 1.87s
[4]:
dimcli.core.api.DslDataset
Note: a Dimcli DslDataset object is a wrapper around the raw JSON data that provides various utilities (e.g. counting objects, returning dataframes, etc.).
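To make the idea of a "wrapper around the raw JSON data" more concrete, here is a toy sketch. It is not Dimcli's DslDataset implementation: the class name JsonResultWrapper, its count method and the exact payload shape are assumptions made for illustration, with sample values borrowed from the outputs in this section.

```python
# Toy illustration only: this is NOT how Dimcli implements DslDataset.
class JsonResultWrapper:
    """Hypothetical minimal wrapper around a raw JSON query result."""

    def __init__(self, data):
        self.json = data  # keep the raw payload accessible

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so top-level keys
        # such as "publications" can be read as attributes.
        try:
            return self.json[name]
        except KeyError:
            raise AttributeError(name) from None

    def count(self, key):
        """Number of records returned under the given top-level key."""
        return len(self.json.get(key, []))


# Assumed payload shape, filled with values from this notebook's outputs.
raw = {
    "publications": [
        {"title": "Which Factor Influences Environmental Care Characters More: "
                  "Knowledge of Issue or Demographic Factors?"}
    ],
    "_stats": {"total_count": 6938991},
}

results = JsonResultWrapper(raw)
print(results.publications[0]["title"])
print(results.count("publications"), "of", results.json["_stats"]["total_count"])
```

The point of such a wrapper is convenience: the raw JSON stays available, while common operations get short, readable names.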
[5]:
print(dsl_last_results.publications[0]['title'])
Which Factor Influences Environmental Care Characters More: Knowledge of Issue or Demographic Factors?
1. Simple queries with %dsl or %%dsl
These commands let you run an API query by typing it after %dsl.
Moreover, if you press ‘tab’ after the command, you can also take advantage of a custom DSL autocompleter.
These commands are shortcuts for the standard syntax:
dsl = dimcli.Dsl()
dsl.query("...<some dsl query>...")
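Since the standard syntax takes the query as a plain string, queries can also be assembled programmatically before being passed to dsl.query. The sketch below shows one way to do that; build_search is a hypothetical helper written for this example, not part of Dimcli, and it only covers the simple query shapes used in this notebook.

```python
def build_search(source, where=None, fields=None, limit=None):
    """Assemble a simple DSL search string (hypothetical helper, not part of Dimcli)."""
    parts = [f"search {source}"]
    if where:
        parts.append(f"where {where}")
    # Optional field selection, e.g. "return publications[title+year]"
    selection = f"[{'+'.join(fields)}]" if fields else ""
    parts.append(f"return {source}{selection}")
    if limit is not None:
        parts.append(f"limit {limit}")
    return " ".join(parts)


q = build_search("publications", where='journal.title="Nature Energy"')
print(q)

# With a logged-in session, the string could then be passed to the standard call:
# dsl = dimcli.Dsl()
# data = dsl.query(q)
```

The last lines are left as comments because they need valid API credentials to run.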
Single-line version: %dsl
[6]:
%dsl search publications where journal.title="Nature Energy" return publications
Returned Publications: 20 (total = 1359)
Time: 0.84s
[6]:
<dimcli.DslDataset object #4423808816. Records: 20/1359>
Multi-line version: %%dsl
You can split the query across multiple lines; in that case, use the %%dsl command (two % signs):
[7]:
%%dsl
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications[title]
Returned Publications: 20 (total = 5807)
Time: 1.98s
[7]:
<dimcli.DslDataset object #4561410032. Records: 20/5807>
Note: the autocompleter is available only with single-line queries.
2. Loop queries with %dslloop or %%dslloop
This magic command automatically loops over all the pages of a result set until all possible records have been returned.
It is a shortcut for the dimcli.Dsl.query_iterative method, which takes care of timing queries appropriately and aggregating results into a single object (see the Dimcli Library: Installation and Querying notebook for more details).
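The looping described above can be pictured as a limit/skip pagination loop. The sketch below is a simplified stand-in, not the library's code: fetch_page fakes an API call, loop_query is a hypothetical name, and the pauses between requests that the real method handles are left out. The page size of 1000, the 50k cap and the total of 2699 are taken from this notebook.

```python
def fetch_page(total, limit, skip):
    """Stand-in for one API call: returns fake record ids for the requested window."""
    return list(range(skip, min(skip + limit, total)))


def loop_query(total, limit=1000, max_records=50_000):
    """Collect records page by page until the result set or the cap is exhausted."""
    records, pages = [], []
    skip = 0
    ceiling = min(total, max_records)  # never go past the 50k cap
    while skip < ceiling:
        page = fetch_page(ceiling, limit, skip)
        records.extend(page)
        pages.append((skip, skip + len(page)))
        skip += limit
    return records, pages


records, pages = loop_query(total=2699)
for start, end in pages:
    print(f"{start}-{end} / 2699")
print("Records extracted:", len(records))
```

A result set of 2699 records therefore needs three requests, while anything larger than 50k records is truncated at the cap.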
Single-line version: %dslloop
[8]:
%dslloop search publications for "malaria AND Egypt" where year=2015 return publications
Starting iteration with limit=1000 skip=0 ...
0-1000 / 2699 (3.15s)
1000-2000 / 2699 (2.91s)
2000-2699 / 2699 (2.84s)
===
Records extracted: 2699
[8]:
<dimcli.DslDataset object #4423455024. Records: 2699/2699>
Multi-line version: %%dslloop
[9]:
%%dslloop
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications
Starting iteration with limit=1000 skip=0 ...
0-1000 / 5807 (1.73s)
1000-2000 / 5807 (1.61s)
2000-3000 / 5807 (1.58s)
3000-4000 / 5807 (1.55s)
4000-5000 / 5807 (2.25s)
5000-5807 / 5807 (1.47s)
===
Records extracted: 5807
[9]:
<dimcli.DslDataset object #4561410992. Records: 5807/5807>
As before, the results of a loop query are stored in the dsl_last_results variable.
[10]:
dsl_last_results.stats
[10]:
{'total_count': 5807}
3. Returning dataframes: %dsldf and %%dsldf
These magic commands are similar to the ones above, except that they transform the data directly into Pandas dataframe objects.
Dataframes are then easy to sort, analyse, export as CSV and use within visualisation software.
Single-line version: %dsldf
[11]:
%dsldf search publications where journal.id="jour.1136447" return publications
Returned Publications: 20 (total = 1359)
Time: 19.09s
[11]:
 | authors | id | pages | title | type | year | journal.id | journal.title | issue | volume |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Zurich', 'city_id... | pub.1144714324 | 1-10 | Techno-economic analysis of renewable fuels fo... | article | 2022 | jour.1136447 | Nature Energy | NaN | NaN |
1 | [{'affiliations': [{'city': 'Stanford', 'city_... | pub.1144625544 | 1-13 | Rational solvent molecule tuning for high-perf... | article | 2022 | jour.1136447 | Nature Energy | NaN | NaN |
2 | [{'affiliations': [{'city': 'Shanghai', 'city_... | pub.1144466459 | 1-9 | Toxic potency-adjusted control of air pollutio... | article | 2022 | jour.1136447 | Nature Energy | NaN | NaN |
3 | [{'affiliations': [{'city': 'State College', '... | pub.1144465259 | 1-7 | Integrated hydrological, power system and econ... | article | 2022 | jour.1136447 | Nature Energy | NaN | NaN |
4 | [{'affiliations': [{'city': 'Taiyuan', 'city_i... | pub.1144361028 | 1-10 | Fuel cells with an operational range of –20 °C... | article | 2022 | jour.1136447 | Nature Energy | NaN | NaN |
5 | [{'affiliations': [{'city': 'Waterloo', 'city_... | pub.1144359921 | 1-11 | High areal capacity, long cycle life 4 V ceram... | article | 2022 | jour.1136447 | Nature Energy | NaN | NaN |
6 | [{'affiliations': [{'city': 'Chapel Hill', 'ci... | pub.1144054488 | 1-9 | Evolution of defects during the degradation of... | article | 2021 | jour.1136447 | Nature Energy | NaN | NaN |
7 | [{'affiliations': [{'city': 'Darmstadt', 'city... | pub.1144037103 | 1-2 | Whittling iridium down to size | article | 2021 | jour.1136447 | Nature Energy | NaN | NaN |
8 | [{'affiliations': [{'city': 'Dalian', 'city_id... | pub.1143966904 | 1154-1163 | Ti1–graphene single-atom material for improved... | article | 2021 | jour.1136447 | Nature Energy | 12 | 6 |
9 | [{'affiliations': [{'city': 'Kyoto', 'city_id'... | pub.1143964787 | 1176-1187 | Overcoming humidity-induced swelling of graphe... | article | 2021 | jour.1136447 | Nature Energy | 12 | 6 |
10 | [{'affiliations': [{'city': 'Canberra', 'city_... | pub.1143932020 | 1-12 | Energy insecurity during temperature extremes ... | article | 2021 | jour.1136447 | Nature Energy | NaN | NaN |
11 | [{'affiliations': [{'city': 'Erlangen', 'city_... | pub.1143931976 | 1-9 | A bilayer conducting polymer structure for pla... | article | 2021 | jour.1136447 | Nature Energy | NaN | NaN |
12 | [{'affiliations': [{'city': 'Leeds', 'city_id'... | pub.1143836040 | 1188-1197 | Characterizing the energy use of disabled peop... | article | 2021 | jour.1136447 | Nature Energy | 12 | 6 |
13 | [{'affiliations': [{'city': 'Ulsan', 'city_id'... | pub.1143836012 | 1164-1175 | Subnano-sized silicon anode via crystal growth... | article | 2021 | jour.1136447 | Nature Energy | 12 | 6 |
14 | [{'affiliations': [{'city': 'Geneva', 'city_id... | pub.1143835741 | 1-9 | Integration of prosumer peer-to-peer trading d... | article | 2021 | jour.1136447 | Nature Energy | NaN | NaN |
15 | [{'affiliations': [{'city': 'Berlin', 'city_id... | pub.1143833707 | 1-9 | An open-access database and analysis tool for ... | article | 2021 | jour.1136447 | Nature Energy | NaN | NaN |
16 | [{'affiliations': [{'city': 'Davis', 'city_id'... | pub.1143833650 | 1-2 | A dataquake for solar cells | article | 2021 | jour.1136447 | Nature Energy | NaN | NaN |
17 | [{'affiliations': [{'city': 'Dresden', 'city_i... | pub.1143833612 | 1092-1093 | Upscaling sub-nano-sized silicon particles | article | 2021 | jour.1136447 | Nature Energy | 12 | 6 |
18 | [{'affiliations': [{'city': 'Galway', 'city_id... | pub.1143780310 | 1094-1095 | Cooking fuel switch or mix | article | 2021 | jour.1136447 | Nature Energy | 12 | 6 |
19 | [{'affiliations': [{'city': 'Messina', 'city_i... | pub.1143745033 | 1096-1097 | Fluorine-doping boosts performance | article | 2021 | jour.1136447 | Nature Energy | 12 | 6 |
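The journal.id and journal.title columns in the table above suggest that nested JSON objects are flattened into dotted column names. The sketch below mimics that idea using only the standard library, then sorts the rows and exports them as CSV. It is an approximation for illustration: Dimcli relies on pandas for the real conversion, the flatten helper is hypothetical, and the nested shape of the journal field is an assumption. The record values are copied from the table.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys; lists are left untouched."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Two records with an assumed nested "journal" object (values from the table above).
records = [
    {"id": "pub.1144037103", "title": "Whittling iridium down to size", "year": 2021,
     "journal": {"id": "jour.1136447", "title": "Nature Energy"}},
    {"id": "pub.1143833650", "title": "A dataquake for solar cells", "year": 2021,
     "journal": {"id": "jour.1136447", "title": "Nature Energy"}},
]

rows = [flatten(r) for r in records]
rows.sort(key=lambda r: r["title"])  # sorting, as one would with a dataframe

# Export to CSV in memory; a real file handle would work the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With pandas available, the dataframe returned by %dsldf makes the same steps shorter, since sorting and CSV export are built-in dataframe methods.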
Multi-line version: %%dsldf
You can split the query across multiple lines; in that case, use the %%dsldf command (two % signs):
[12]:
%%dsldf
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications[title+year+times_cited] sort by times_cited
Returned Publications: 20 (total = 5807)
Time: 0.58s
[12]:
 | times_cited | title | year |
---|---|---|---|
0 | 744 | Asymmetric Supercapacitors Using 3D Nanoporous... | 2015 |
1 | 710 | CH3NH3Sn x Pb(1–x)I3 Perovskite Solar Cells Co... | 2014 |
2 | 564 | Brain Intelligence: Go beyond Artificial Intel... | 2017 |
3 | 503 | Highly Luminescent Phase-Stable CsPbI3 Perovsk... | 2017 |
4 | 457 | Improved Understanding of the Electronic and E... | 2014 |
5 | 456 | Long Noncoding RNA NEAT1-Dependent SFPQ Reloca... | 2014 |
6 | 403 | Hierarchical Gaussian Descriptor for Person Re... | 2016 |
7 | 387 | Flexible Graphene-Based Supercapacitors: A Review | 2016 |
8 | 320 | Comparative study of ceramic and single crysta... | 2013 |
9 | 318 | Underwater image dehazing using joint trilater... | 2014 |
10 | 296 | Motor Anomaly Detection for Unmanned Aerial Ve... | 2017 |
11 | 263 | Implementation of Super-Twisting Control: Supe... | 2016 |
12 | 222 | Hole-Conductor-Free, Metal-Electrode-Free TiO2... | 2014 |
13 | 218 | Recent Progress of Counter Electrode Catalysts... | 2014 |
14 | 216 | Colloidal Synthesis of Air-Stable Alloyed CsSn... | 2017 |
15 | 209 | Low illumination underwater light field images... | 2018 |
16 | 203 | Photoelectrochemical CO2 reduction by a p-type... | 2016 |
17 | 203 | Fermi-level-dependent charge-to-spin current c... | 2016 |
18 | 193 | Wound intensity correction and segmentation wi... | 2016 |
19 | 192 | Low-Temperature and Solution-Processed Amorpho... | 2015 |
Note: the autocompleter is available only with single-line queries.
4. Looped dataframe queries: %dslloopdf and %%dslloopdf
These commands behave just like the dataframe magics above, except that they run an iterative query that attempts to extract all records available for a given DSL query, up to the maximum limit of 50k.
[13]:
%dslloopdf search publications for "malaria AND Egypt" where year=2015 return publications
Starting iteration with limit=1000 skip=0 ...
0-1000 / 2699 (2.18s)
1000-2000 / 2699 (2.01s)
2000-2699 / 2699 (2.58s)
===
Records extracted: 2699
[13]:
 | authors | id | pages | title | type | year | volume | issue | journal.id | journal.title |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1142494539 | 473-520 | Literatur | chapter | 2015 | NaN | NaN | NaN | NaN |
1 | NaN | pub.1142492136 | NaN | Lexikon der Mensch-Tier-Beziehungen | book | 2015 | Band 1 | NaN | NaN | NaN |
2 | NaN | pub.1142474104 | NaN | Die Erforschung der Kolonien, Expeditionen und... | book | 2015 | Band 75 | NaN | NaN | NaN |
3 | NaN | pub.1142467346 | NaN | Vom Geist des Bauches, Für eine Philosophie de... | book | 2015 | NaN | NaN | NaN | NaN |
4 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1142370689 | 1-19 | Introduction | chapter | 2015 | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2694 | [{'affiliations': [{'city': 'Tampa', 'city_id'... | pub.1000633746 | 477-500 | Zika Virus | chapter | 2015 | NaN | NaN | NaN | NaN |
2695 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1000392107 | 224-274 | Chapter 4 Mitigation | chapter | 2015 | NaN | NaN | NaN | NaN |
2696 | [{'affiliations': [{'city': 'Buea', 'city_id':... | pub.1000250057 | 26580-26595 | The chemistry and biological activities of nat... | article | 2015 | 5 | 34 | jour.1046724 | RSC Advances |
2697 | [{'affiliations': [{'city': 'New York City', '... | pub.1000241832 | 59-73 | Peacekeeping and the Rule of Law: Challenges P... | chapter | 2015 | NaN | NaN | NaN | NaN |
2698 | NaN | pub.1000058849 | NaN | Handbook of Sustainable Luxury Textiles and Fa... | book | 2015 | NaN | NaN | NaN | NaN |
2699 rows × 10 columns
5. Getting API schema documentation with %dsldocs
The %dsldocs magic prints out information about the fields and entities available via the Dimensions Search Language. This command returns a tabular version of the data model specs available online (in case you are interested, this is possible thanks to the DSL describe command).
For example, if you pass a source name like grants, you get back a table showing all the fields available for that source.
[14]:
%dsldocs grants
[14]:
 | sources | field | type | description | is_filter | is_entity | is_facet |
---|---|---|---|---|---|---|---|
0 | grants | abstract | string | Abstract or summary from a grant proposal. | False | False | False |
1 | grants | active_year | integer | List of active years for a grant. | True | False | True |
2 | grants | category_bra | categories | `Broad Research Areas <https://dimensions.fres... | True | True | True |
3 | grants | category_for | categories | `ANZSRC Fields of Research classification <htt... | True | True | True |
4 | grants | category_hra | categories | `Health Research Areas <https://dimensions.fre... | True | True | True |
5 | grants | category_hrcs_hc | categories | `HRCS - Health Categories <https://dimensions.... | True | True | True |
6 | grants | category_hrcs_rac | categories | `HRCS – Research Activity Codes <https://dimen... | True | True | True |
7 | grants | category_icrp_cso | categories | `ICRP Common Scientific Outline <https://dimen... | True | True | True |
8 | grants | category_icrp_ct | categories | `ICRP Cancer Types <https://dimensions.freshde... | True | True | True |
9 | grants | category_rcdc | categories | `Research, Condition, and Disease Categorizati... | True | True | True |
10 | grants | category_sdg | categories | SDG - Sustainable Development Goals | True | True | True |
11 | grants | category_uoa | categories | `Units of Assessment <https://dimensions.fresh... | True | True | True |
12 | grants | concepts | json | Concepts describing the main topics of a publi... | True | False | False |
13 | grants | concepts_scores | json | Relevancy scores for `concepts`. | True | False | False |
14 | grants | date_inserted | date | Date when the record was inserted into Dimensi... | True | False | False |
15 | grants | dimensions_url | string | Link pointing to the Dimensions web application | False | False | False |
16 | grants | end_date | date | Date when the grant ends. | True | False | False |
17 | grants | foa_number | string | The funding opportunity announcement (FOA) num... | True | False | False |
18 | grants | funder_countries | countries | The country linked to the organisation funding... | True | True | True |
19 | grants | funders | organizations | The organisation funding the grant. This is no... | True | True | True |
20 | grants | funding_aud | float | Funding amount awarded in AUD. | True | False | False |
21 | grants | funding_cad | float | Funding amount awarded in CAD. | True | False | False |
22 | grants | funding_chf | float | Funding amount awarded in CHF. | True | False | False |
23 | grants | funding_currency | string | Original funding currency. | True | False | True |
24 | grants | funding_eur | float | Funding amount awarded in EUR. | True | False | False |
25 | grants | funding_gbp | float | Funding amount awarded in GBP. | True | False | False |
26 | grants | funding_jpy | float | Funding amount awarded in JPY. | True | False | False |
27 | grants | funding_nzd | float | Funding amount awarded in NZD. | True | False | False |
28 | grants | funding_org_acronym | string | Acronym for funding organisation. | True | False | True |
29 | grants | funding_org_city | string | City name for funding organisation. | True | False | True |
30 | grants | funding_org_name | string | Name of funding organisation. | True | False | True |
31 | grants | funding_usd | float | Funding amount awarded in USD. | True | False | False |
32 | grants | grant_number | string | Grant identifier, as provided by the source (e... | True | False | False |
33 | grants | id | string | Dimensions grant ID. | True | False | False |
34 | grants | investigators | json | Additional details about investigators, includ... | True | False | False |
35 | grants | language | string | Grant original language, as ISO 639-1 language... | True | False | True |
36 | grants | language_title | string | ISO 639-1 language code for the original grant... | True | False | True |
37 | grants | linkout | string | Original URL for the grant. | False | False | False |
38 | grants | original_title | string | Title of the grant in its original language. | False | False | False |
39 | grants | research_org_cities | cities | City of the research organisations receiving t... | True | True | True |
40 | grants | research_org_countries | countries | Country of the research organisations receivin... | True | True | True |
41 | grants | research_org_names | string | Names of organizations investigators are affil... | True | False | False |
42 | grants | research_org_state_codes | states | State of the organisations receiving the grant... | True | True | True |
43 | grants | research_orgs | organizations | GRID organisations receiving the grant (note: ... | True | True | True |
44 | grants | researchers | researchers | Dimensions researchers IDs associated to the g... | True | True | True |
45 | grants | start_date | date | Date when the grant starts, in the format 'YYY... | True | False | False |
46 | grants | start_year | integer | Year when the grant starts. | True | False | True |
47 | grants | title | string | Title of the grant in English (if the grant la... | False | False | False |
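Because the output above is tabular, it lends itself to quick filtering, for instance to list the fields that can be used as facets. The sketch below does this with plain dictionaries holding a handful of rows copied from the table; the variable names are made up for this example, and reading is_filter as "can be used in a where clause" is an assumption.

```python
# A few rows copied from the grants table above, as plain dicts.
grants_fields = [
    {"field": "abstract", "type": "string", "is_filter": False, "is_facet": False},
    {"field": "active_year", "type": "integer", "is_filter": True, "is_facet": True},
    {"field": "funding_usd", "type": "float", "is_filter": True, "is_facet": False},
    {"field": "funding_currency", "type": "string", "is_filter": True, "is_facet": True},
    {"field": "title", "type": "string", "is_filter": False, "is_facet": False},
]

# Keep only the field names matching each condition.
facets = [row["field"] for row in grants_fields if row["is_facet"]]
not_filterable = [row["field"] for row in grants_fields if not row["is_filter"]]

print("Facet fields:", facets)
print("Not filterable:", not_filterable)
```

The same selection can be made on the dataframe itself with a boolean mask on the is_facet or is_filter columns.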
Similarly, for objects of type ‘Entity’, e.g. countries:
[15]:
%dsldocs countries
[15]:
 | entities | field | type | description | is_filter | is_entity | is_facet |
---|---|---|---|---|---|---|---|
0 | countries | id | string | GeoNames country code (eg 'US' for `geonames:6... | True | False | False |
1 | countries | name | string | GeoNames country name. | True | False | False |
But don’t worry if you don’t get it right: if you pass an unrecognised object name, the full list of available sources and entities is printed (in the run below, with Dimcli v0.9.6, a KeyError traceback follows that list).
[17]:
%dsldocs unknown
Can't recognize this object. Dimcli knows about:
Sources=[clinical_trials - datasets - grants - organizations - patents - policy_documents - publications - reports - researchers - source_titles] Entities=[categories - cities - countries - journals - open_access - publication_links - states]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/var/folders/zk/bxslv_1d01b983n6l5ky91b80000gn/T/ipykernel_28928/3474323623.py in <module>
----> 1 get_ipython().run_line_magic('dsldocs', 'unknown')
~/Envs/jupyterlab/lib/python3.9/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
2362 kwargs['local_ns'] = self.get_local_scope(stack_depth)
2363 with self.builtin_trap:
-> 2364 result = fn(*args, **kwargs)
2365 return result
2366
~/Envs/jupyterlab/lib/python3.9/site-packages/decorator.py in fun(*args, **kw)
230 if not kwsyntax:
231 args, kw = fix(args, kw, sig)
--> 232 return caller(func, *(extras + args), **kw)
233 fun.__name__ = func.__name__
234 fun.__doc__ = func.__doc__
~/Envs/jupyterlab/lib/python3.9/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
~/Envs/jupyterlab/lib/python3.9/site-packages/dimcli/jupyter/magics.py in dsldocs(self, line)
364 d = {header: [], 'field': [], 'type': [], 'description':[], 'is_filter':[], 'is_entity': [], 'is_facet':[],}
365 for S in docs_for:
--> 366 for x in sorted(res.json[header][S]['fields']):
367 d[header] += [S]
368 d['field'] += [x]
KeyError: 'unknown'
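One way to avoid the traceback above is to check a name against the known sources and entities before calling %dsldocs. The sketch below hard-codes the two lists exactly as printed in the message above; they may differ in other API versions, and the classify helper is hypothetical.

```python
# Names copied from the message printed above (may vary across API versions).
SOURCES = ["clinical_trials", "datasets", "grants", "organizations", "patents",
           "policy_documents", "publications", "reports", "researchers", "source_titles"]
ENTITIES = ["categories", "cities", "countries", "journals", "open_access",
            "publication_links", "states"]


def classify(name):
    """Return 'source', 'entity', or None for an unrecognised name."""
    if name in SOURCES:
        return "source"
    if name in ENTITIES:
        return "entity"
    return None


for candidate in ["grants", "countries", "unknown"]:
    kind = classify(candidate)
    if kind is None:
        print(f"{candidate!r} is not a known source or entity")
    else:
        print(f"{candidate!r} is a {kind}")
```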
Finally, if no object is requested, the full documentation for all the sources gets returned.
[18]:
%dsldocs
[18]:
 | sources | field | type | description | is_filter | is_entity | is_facet |
---|---|---|---|---|---|---|---|
0 | clinical_trials | abstract | string | Abstract or description of the clinical trial. | False | False | False |
1 | clinical_trials | acronym | string | Acronym of the clinical trial. | True | False | False |
2 | clinical_trials | active_years | integer | List of active years for a clinical trial. | True | False | True |
3 | clinical_trials | altmetric | float | Altmetric Attention Score. | True | False | False |
4 | clinical_trials | associated_grant_ids | string | Dimensions IDs of the grants associated to the... | True | False | False |
... | ... | ... | ... | ... | ... | ... | ... |
349 | source_titles | sjr | float | SJR indicator (SCImago Journal Rank). This ind... | True | False | False |
350 | source_titles | snip | float | SNIP indicator (source normalized impact per p... | True | False | False |
351 | source_titles | start_year | integer | Year when the source started publishing. | True | False | True |
352 | source_titles | title | string | The title of the source. | False | False | False |
353 | source_titles | type | string | The source type: one of `book_series`, `procee... | True | False | True |
354 rows × 7 columns
Note
The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.