The Dimcli Python library: Magic Commands
The purpose of this notebook is to show how to use Dimcli magic commands.
IPython (Jupyter) magic commands are shortcuts that let you perform common operations without having to type much code.
For example, Dimcli magic commands can be used to quickly launch queries or to retrieve API documentation.
Magic commands are especially useful when testing things out, e.g. while trying out a new query or checking what data is available in Dimensions on a certain topic.
[1]:
import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))
==
CHANGELOG
This notebook was last run on Jul 28, 2023
==
Prerequisites
This notebook assumes you have installed the Dimcli library and are familiar with the ‘Getting Started’ tutorial.
[2]:
!pip install dimcli --quiet
import dimcli
from dimcli.utils import *
import sys
#
print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
    import getpass
    KEY = getpass.getpass(prompt='API Key: ')
    dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
    KEY = ""
    dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()
==
Logging in..
API Key: ··········
Dimcli - Dimensions API Client (v1.1)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.7
Method: manual login
Dimcli ‘magic’ commands
Dimcli includes 5 types of magic commands:
- %dsl: runs an API query
- %dslloop: runs an API query using pagination (i.e. iterating through the results, up to 50k records)
- %dsldf: runs an API query and transforms the JSON data into a dataframe
- %dslloopdf: runs a paginated API query and transforms the JSON data into a dataframe
- %dsldocs: programmatically extracts API schema information
Tip: Accessing data returned by magic queries
By default, the results of magic command queries are saved into a variable called dsl_last_results:
[3]:
%dsl search publications for "something" return publications limit 1
type(dsl_last_results)
Returned Publications: 1 (total = 7968398)
Time: 0.55s
WARNINGS [1]
Field current_organization_id of the authors field is deprecated and will be removed in the next major release.
[3]:
dimcli.core.api.DslDataset
Note: a Dimcli DslDataset object is a wrapper around the raw JSON data, which provides various functionalities (e.g. counting objects, returning dataframes etc.).
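To make the "wrapper around the raw JSON" idea concrete, here is a minimal stdlib sketch that works on a plain dictionary instead of a DslDataset. It assumes the raw response holds a list of records keyed by the source name (e.g. `publications`) plus a `_stats` object with a `total_count`, consistent with the `stats` output and the "total = ..." counts printed in this notebook. The helper `summarise` is hypothetical and is not part of Dimcli.

```python
# Raw JSON shaped like a Dimensions DSL response (layout is an assumption):
# a list of records under the source name, plus a "_stats" block.
raw = {
    "_stats": {"total_count": 7968398},
    "publications": [
        {"title": "Assessing the pragmatic competence of Arab learners "
                  "of English: The case of apology"},
    ],
}


def summarise(data, source):
    """Return (records returned in this page, total records matching the query)."""
    records = data.get(source, [])
    total = data.get("_stats", {}).get("total_count", 0)
    return len(records), total


returned, total = summarise(raw, "publications")
print(f"Returned publications: {returned} (total = {total})")
print(raw["publications"][0]["title"])
```

A DslDataset adds conveniences on top of this kind of structure, such as attribute access (`dsl_last_results.publications`) and the `stats` property used later on.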
[4]:
print(dsl_last_results.publications[0]['title'])
Assessing the pragmatic competence of Arab learners of English: The case of apology
1. Simple queries with %dsl or %%dsl
These commands let you run an API query directly after typing %dsl.
Moreover, if you press ‘tab’ after the command, you can also take advantage of a custom DSL autocompleter.
These commands are shortcuts for the standard syntax:
dsl = dimcli.Dsl()
dsl.query("...<some dsl query>...")
Single-line version: ``%dsl``
[5]:
%dsl search publications where journal.title="Nature Energy" return publications
Returned Publications: 20 (total = 1705)
Time: 1.31s
WARNINGS [1]
Field current_organization_id of the authors field is deprecated and will be removed in the next major release.
[5]:
<dimcli.DslDataset object #139304202085152. Records: 20/1705>
Multi-line version: ``%%dsl``
You can split the query into multiple lines, but in this case you need to use the %%dsl command (two %):
[6]:
%%dsl
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications[title]
Returned Publications: 20 (total = 6369)
Time: 0.28s
[6]:
<dimcli.DslDataset object #139304202086784. Records: 20/6369>
Note: the autocompleter is available only with single-line queries.
2. Loop queries with %dslloop or %%dslloop
This magic command automatically loops over all the pages of a results set, until all possible records have been returned.
This is a shortcut for the dimcli.Dsl.query_iterative method, which takes care of timing queries appropriately and aggregating results within a single object (see the Dimcli Library: Installation and Querying notebook for more details).
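The paging that %dslloop automates can be pictured as a series of `limit`/`skip` windows over the full result set. The sketch below is a hypothetical helper, not Dimcli's implementation: it computes those windows for a given total and appends a `limit ... skip ...` clause to a query string. The page size of 1000 and the 50k ceiling are taken from the output and text in this notebook; exactly how Dimcli assembles each paged query is an assumption.

```python
def page_windows(total, page_size=1000, max_records=50_000):
    """Yield (skip, end) pairs covering min(total, max_records) records."""
    stop = min(total, max_records)
    skip = 0
    while skip < stop:
        end = min(skip + page_size, stop)
        yield skip, end
        skip = end


def paged_query(query, skip, page_size=1000):
    """Append a DSL limit/skip clause to a 'search ... return ...' query."""
    return f"{query} limit {page_size} skip {skip}"


base = 'search publications for "malaria AND Egypt" where year=2015 return publications'
for skip, end in page_windows(2852):
    print(f"{skip}-{end} / 2852 ->", paged_query(base, skip))
```

For the 2,852 records in the example above, this produces the same three windows that %dslloop reports: 0-1000, 1000-2000 and 2000-2852.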
Single-line version: ``%dslloop``
[7]:
%dslloop search publications for "malaria AND Egypt" where year=2015 return publications
Starting iteration with limit=1000 skip=0 ...
0-1000 / 2852 (1.48s)
1000-2000 / 2852 (1.16s)
2000-2852 / 2852 (1.06s)
===
Records extracted: 2852
Warnings: 3
[7]:
<dimcli.DslDataset object #139304216831904. Records: 2852/2852>
Multi-line version: ``%%dslloop``
[8]:
%%dslloop
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications
Starting iteration with limit=1000 skip=0 ...
0-1000 / 6369 (1.75s)
1000-2000 / 6369 (1.14s)
2000-3000 / 6369 (1.19s)
3000-4000 / 6369 (1.03s)
4000-5000 / 6369 (2.06s)
5000-6000 / 6369 (1.20s)
6000-6369 / 6369 (3.98s)
===
Records extracted: 6369
Warnings: 7
[8]:
<dimcli.DslDataset object #139304216833344. Records: 6369/6369>
Like before, the results of a loop query are stored in the dsl_last_results variable.
[9]:
dsl_last_results.stats
[9]:
{'total_count': 6369}
3. Returning dataframes: %dsldf and %%dsldf
These magic commands are similar to the ones above, except that they transform the data directly into pandas dataframe objects.
Dataframes are then easy to sort, analyse, export as CSV and use within visualisation software.
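The dotted column names in the tables that follow (such as `journal.id` and `journal.title`) come from flattening nested JSON objects into columns. As a rough, stdlib-only illustration of that step, and of the sorting and CSV export just mentioned, the sketch below flattens nested dictionaries into dotted keys while leaving lists (like `authors`) untouched. It is a simplified stand-in, similar in spirit to pandas' `json_normalize`, not Dimcli's code, and the sample records are hypothetical.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys; lists and scalars are kept as-is."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical records, loosely shaped like the publications returned above.
records = [
    {"id": "pub.A", "title": "Paper A", "times_cited": 12,
     "journal": {"id": "jour.1136447", "title": "Nature Energy"}},
    {"id": "pub.B", "title": "Paper B", "times_cited": 40,
     "journal": {"id": "jour.1136447", "title": "Nature Energy"}},
]

rows = [flatten(r) for r in records]
# Sort by citations, highest first (like "sort by times_cited" in the DSL).
rows.sort(key=lambda row: row["times_cited"], reverse=True)

# Export to CSV in memory; use open("file.csv", "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With a real dataframe returned by %dsldf, the equivalent operations are pandas' own sorting and CSV export methods.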
Single-line version: ``%dsldf``
[10]:
%dsldf search publications where journal.id="jour.1136447" return publications
Returned Publications: 20 (total = 1705)
Time: 0.26s
WARNINGS [1]
Field current_organization_id of the authors field is deprecated and will be removed in the next major release.
[10]:
| | id | title | authors | pages | type | year | journal.id | journal.title | issue | volume |
---|---|---|---|---|---|---|---|---|---|---|
0 | pub.1162679092 | Climate change impacts on planned supply–deman... | [{'affiliations': [{'city': 'Beijing', 'city_i... | 1-11 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
1 | pub.1162678127 | Unequal residential heating burden caused by c... | [{'affiliations': [{'city': 'Beijing', 'city_i... | 1-10 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
2 | pub.1161701679 | Sodium-ion batteries: capturing and reducing d... | [{'affiliations': [{'city': 'Daejeon', 'city_i... | 1-2 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
3 | pub.1160834246 | Insights into advanced models for energy pover... | [{'affiliations': [{'city': 'Ljubljana', 'city... | 1-3 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
4 | pub.1160826991 | Using narratives to infer preferences in under... | [{'affiliations': [{'city': 'Zurich', 'city_id... | 1-13 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
5 | pub.1160822084 | Contextualizing coal communities for Australia... | [{'affiliations': [{'city': 'Canberra', 'city_... | 1-3 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
6 | pub.1160811717 | Connecting women in the hydrogen world | [{'affiliations': [{'city': 'Berlin', 'city_id... | 1-1 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
7 | pub.1160805012 | Silicon solar cells step up | [{'affiliations': [{'city': 'Sydney', 'city_id... | 1-2 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
8 | pub.1160649649 | Identifying the intrinsic anti-site defect in ... | [{'affiliations': [{'city': 'Beijing', 'city_i... | 1-9 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
9 | pub.1160568081 | Engineering relaxors by entropy for high energ... | [{'affiliations': [{'city': 'Beijing', 'city_i... | 1-9 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
10 | pub.1160403680 | 2D/3D heterojunction engineering at the buried... | [{'affiliations': [{'city': 'Chongqing', 'city... | 1-10 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
11 | pub.1160394361 | Diversifying the solvent | [{'affiliations': [{'city': 'Singapore', 'city... | 1-2 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
12 | pub.1160392655 | Understanding hydrogen electrocatalysis by pro... | [{'affiliations': [{'city': 'Boston', 'city_id... | 1-11 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
13 | pub.1160386348 | High-entropy electrolytes for practical lithiu... | [{'affiliations': [{'city': 'Stanford', 'city_... | 1-13 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
14 | pub.1160383454 | Increasing the reach of low-income energy prog... | [{'affiliations': [{'city': 'Chicago', 'city_i... | 1-9 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
15 | pub.1160378064 | A Li-rich layered oxide cathode with negligibl... | [{'affiliations': [{'city': 'Hong Kong', 'city... | 1-10 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
16 | pub.1160324074 | Reduction of bulk and surface defects in inver... | [{'affiliations': [{'city': 'Wuhan', 'city_id'... | 1-11 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
17 | pub.1160323839 | Addendum to: Understanding environmental trade... | [{'affiliations': [{'city': 'Freiburg', 'city_... | 1-2 | article | 2023 | jour.1136447 | Nature Energy | NaN | NaN |
18 | pub.1162678453 | Devices for Li-mediated synthesis | [{'affiliations': [{'city': None, 'city_id': N... | 641-641 | article | 2023 | jour.1136447 | Nature Energy | 7 | 8 |
19 | pub.1162678348 | Granularity and green recovery | [{'affiliations': [{'city': None, 'city_id': N... | 642-642 | article | 2023 | jour.1136447 | Nature Energy | 7 | 8 |
Multi-line version: ``%%dsldf``
You can split the query into multiple lines, but in this case you need to use the %%dsldf command (two %):
[11]:
%%dsldf
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications[title+year+times_cited] sort by times_cited
Returned Publications: 20 (total = 6369)
Time: 0.26s
[11]:
| | title | times_cited | year |
---|---|---|---|
0 | Asymmetric Supercapacitors Using 3D Nanoporous... | 849 | 2015 |
1 | Brain Intelligence: Go beyond Artificial Intel... | 819 | 2017 |
2 | CH3NH3Sn x Pb(1–x)I3 Perovskite Solar Cells Co... | 818 | 2014 |
3 | Highly Luminescent Phase-Stable CsPbI3 Perovsk... | 697 | 2017 |
4 | Improved Understanding of the Electronic and E... | 558 | 2014 |
5 | Long Noncoding RNA NEAT1-Dependent SFPQ Reloca... | 533 | 2014 |
6 | Pt‐Free Counter Electrode for Dye‐Sensitized S... | 506 | 2014 |
7 | Flexible Graphene-Based Supercapacitors: A Review | 481 | 2016 |
8 | Hierarchical Gaussian Descriptor for Person Re... | 468 | 2016 |
9 | Comparative study of ceramic and single crysta... | 396 | 2013 |
10 | Motor Anomaly Detection for Unmanned Aerial Ve... | 392 | 2017 |
11 | Implementation of Super-Twisting Control: Supe... | 358 | 2016 |
12 | Underwater image dehazing using joint trilater... | 354 | 2014 |
13 | Development of X-ray-induced afterglow charact... | 314 | 2014 |
14 | Fermi-level-dependent charge-to-spin current c... | 293 | 2016 |
15 | Colloidal Synthesis of Air-Stable Alloyed CsSn... | 290 | 2017 |
16 | Thermal diodes, regulators, and switches: Phys... | 287 | 2017 |
17 | Photoelectrochemical CO2 reduction by a p-type... | 278 | 2016 |
18 | Hole-Conductor-Free, Metal-Electrode-Free TiO2... | 252 | 2014 |
19 | Low illumination underwater light field images... | 235 | 2018 |
Note: the autocompleter is available only with single-line queries.
4. Looped dataframe queries: %dslloopdf and %%dslloopdf
These commands behave just like the dataframe magics above, except that they trigger an iterative query that attempts to extract all the records available for a given DSL query, up to the maximum limit of 50k.
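Conceptually, a looped query fetches page after page, concatenates the records, and stops once the reported total (or the 50k ceiling) is reached or a page comes back empty. The sketch below shows that control flow with a hypothetical `fetch_page(skip, limit)` callable standing in for a real API call. It is an illustration under those assumptions, not Dimcli's `query_iterative` implementation.

```python
def fetch_all(fetch_page, page_size=1000, max_records=50_000):
    """Collect records page by page.

    fetch_page(skip, limit) is a hypothetical callable returning
    (records, total_count) for one page of results.
    """
    collected = []
    skip = 0
    while skip < max_records:
        limit = min(page_size, max_records - skip)
        records, total = fetch_page(skip, limit)
        if not records:  # nothing more to fetch
            break
        collected.extend(records)
        skip += len(records)
        if skip >= total:  # every matching record has been retrieved
            break
    return collected


# Fake data source standing in for the API: 2852 numbered records.
FAKE = [{"id": f"pub.{i}"} for i in range(2852)]


def fake_fetch(skip, limit):
    return FAKE[skip:skip + limit], len(FAKE)


everything = fetch_all(fake_fetch)
print(len(everything))
```

Once all pages are collected, turning the list of records into a single dataframe is one step at the end, which is what makes the looped dataframe magics convenient.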
[12]:
%dslloopdf search publications for "malaria AND Egypt" where year=2015 return publications
Starting iteration with limit=1000 skip=0 ...
0-1000 / 2852 (1.68s)
1000-2000 / 2852 (1.21s)
2000-2852 / 2852 (3.93s)
===
Records extracted: 2852
Warnings: 3
[12]:
| | id | title | type | year | authors | pages | volume | issue | journal.id | journal.title |
---|---|---|---|---|---|---|---|---|---|---|
0 | pub.1154679158 | The Sound of the Sundial | book | 2015 | NaN | NaN | NaN | NaN | NaN | NaN |
1 | pub.1154675682 | VI The Witch Savitri | chapter | 2015 | [{'affiliations': [], 'corresponding': '', 'cu... | 93-126 | NaN | NaN | NaN | NaN |
2 | pub.1142494539 | Literatur | chapter | 2015 | [{'affiliations': [], 'corresponding': '', 'cu... | 473-520 | NaN | NaN | NaN | NaN |
3 | pub.1142492136 | Lexikon der Mensch-Tier-Beziehungen | book | 2015 | NaN | NaN | Band 1 | NaN | NaN | NaN |
4 | pub.1142474104 | Die Erforschung der Kolonien, Expeditionen und... | book | 2015 | NaN | NaN | Band 75 | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2847 | pub.1000633746 | Zika Virus | chapter | 2015 | [{'affiliations': [{'city': 'Tampa', 'city_id'... | 477-500 | NaN | NaN | NaN | NaN |
2848 | pub.1000392107 | Chapter 4 Mitigation | chapter | 2015 | [{'affiliations': [], 'corresponding': '', 'cu... | 224-274 | NaN | NaN | NaN | NaN |
2849 | pub.1000250057 | The chemistry and biological activities of nat... | article | 2015 | [{'affiliations': [{'city': 'Buea', 'city_id':... | 26580-26595 | 5 | 34 | jour.1046724 | RSC Advances |
2850 | pub.1000241832 | Peacekeeping and the Rule of Law: Challenges P... | chapter | 2015 | [{'affiliations': [{'city': 'New York City', '... | 59-73 | NaN | NaN | NaN | NaN |
2851 | pub.1000058849 | Handbook of Sustainable Luxury Textiles and Fa... | book | 2015 | NaN | NaN | NaN | NaN | NaN | NaN |
2852 rows × 10 columns
5. Getting API schema documentation with %dsldocs
The %dsldocs magic prints out information about the fields and entities available via the Dimensions Search Language (DSL). The command returns a tabular version of the online data model specifications (in case you are interested, this is made possible by the DSL describe command).
For example, if you pass a source name like grants, you get back a table showing all the fields available for that source.
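The table is built from the JSON returned by the DSL describe command. Judging from the traceback further down (which indexes `res.json[header][S]['fields']`), that JSON groups fields under a `sources` or `entities` key, then by object name. The sketch below assumes that layout, plus per-field keys named after the table columns (`type`, `description`, `is_filter`, `is_entity`, `is_facet`), and turns it into sorted rows. The sample schema is a hypothetical fragment built from descriptions shown in the tables, not real API output, and `docs_rows` is not a Dimcli function.

```python
# Hypothetical fragment of describe output (layout assumed, see above).
schema = {
    "sources": {
        "grants": {
            "fields": {
                "abstract": {"type": "string",
                             "description": "Abstract or summary from a grant proposal.",
                             "is_filter": False, "is_entity": False, "is_facet": False},
                "active_year": {"type": "integer",
                                "description": "List of active years for a grant.",
                                "is_filter": True, "is_entity": False, "is_facet": True},
            }
        }
    },
    "entities": {
        "countries": {
            "fields": {
                "name": {"type": "string", "description": "GeoNames country name.",
                         "is_filter": True, "is_entity": False, "is_facet": False},
            }
        }
    },
}

COLUMNS = ["type", "description", "is_filter", "is_entity", "is_facet"]


def docs_rows(schema, name):
    """Return (header, rows) for a source or entity, with fields sorted by name."""
    header = "sources" if name in schema["sources"] else "entities"
    fields = schema[header][name]["fields"]  # raises KeyError for unknown names
    rows = []
    for field in sorted(fields):
        info = fields[field]
        rows.append([name, field] + [info.get(col) for col in COLUMNS])
    return header, rows


header, rows = docs_rows(schema, "grants")
print(header, "| field |", " | ".join(COLUMNS))
for row in rows:
    print(" | ".join(str(cell) for cell in row))
```

The same function handles entities such as countries, because it picks the `entities` group when the name is not a source.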
[13]:
%dsldocs grants
[13]:
| | sources | field | type | description | is_filter | is_entity | is_facet |
---|---|---|---|---|---|---|---|
0 | grants | abstract | string | Abstract or summary from a grant proposal. | False | False | False |
1 | grants | active_year | integer | List of active years for a grant. | True | False | True |
2 | grants | category_bra | categories | `Broad Research Areas <https://dimensions.fres... | True | True | True |
3 | grants | category_for | categories | ANZSRC Fields of Research classification (alia... | True | True | True |
4 | grants | category_for_2008 | categories | `ANZSRC Fields of Research classification <htt... | True | True | True |
5 | grants | category_for_2020 | categories | `ANZSRC Fields of Research classification <htt... | True | True | True |
6 | grants | category_hra | categories | `Health Research Areas <https://dimensions.fre... | True | True | True |
7 | grants | category_hrcs_hc | categories | `HRCS - Health Categories <https://dimensions.... | True | True | True |
8 | grants | category_hrcs_rac | categories | `HRCS – Research Activity Codes <https://dimen... | True | True | True |
9 | grants | category_icrp_cso | categories | `ICRP Common Scientific Outline <https://dimen... | True | True | True |
10 | grants | category_icrp_ct | categories | `ICRP Cancer Types <https://dimensions.freshde... | True | True | True |
11 | grants | category_rcdc | categories | `Research, Condition, and Disease Categorizati... | True | True | True |
12 | grants | category_sdg | categories | SDG - Sustainable Development Goals | True | True | True |
13 | grants | category_uoa | categories | `Units of Assessment <https://dimensions.fresh... | True | True | True |
14 | grants | concepts | json | Concepts describing the main topics of a publi... | True | False | False |
15 | grants | concepts_scores | json | Relevancy scores for `concepts`. | True | False | False |
16 | grants | date_inserted | date | Date when the record was inserted into Dimensi... | True | False | False |
17 | grants | dimensions_url | string | Link pointing to the Dimensions web application | False | False | False |
18 | grants | end_date | date | Date when the grant ends. | True | False | False |
19 | grants | foa_number | string | The funding opportunity announcement (FOA) num... | True | False | False |
20 | grants | funder_org_acronym | string | None | True | False | True |
21 | grants | funder_org_cities | cities | City name for funding organisation. | True | True | True |
22 | grants | funder_org_countries | countries | The country linked to the organisation funding... | True | True | True |
23 | grants | funder_org_name | string | Name of funding organisation. | True | False | True |
24 | grants | funder_org_states | states | State name for funding organisation. | True | True | True |
25 | grants | funder_orgs | organizations | The organisation funding the grant. This is no... | True | True | True |
26 | grants | funding_aud | float | Funding amount awarded in AUD. | True | False | False |
27 | grants | funding_cad | float | Funding amount awarded in CAD. | True | False | False |
28 | grants | funding_chf | float | Funding amount awarded in CHF. | True | False | False |
29 | grants | funding_cny | float | Funding amount awarded in CNY. | True | False | False |
30 | grants | funding_currency | string | Original funding currency. | True | False | True |
31 | grants | funding_eur | float | Funding amount awarded in EUR. | True | False | False |
32 | grants | funding_gbp | float | Funding amount awarded in GBP. | True | False | False |
33 | grants | funding_jpy | float | Funding amount awarded in JPY. | True | False | False |
34 | grants | funding_nzd | float | Funding amount awarded in NZD. | True | False | False |
35 | grants | funding_schemes | string | Information that the data sources provide rega... | True | False | False |
36 | grants | funding_usd | float | Funding amount awarded in USD. | True | False | False |
37 | grants | id | string | Dimensions grant ID. | True | False | False |
38 | grants | investigators | json | Additional details about investigators, includ... | True | False | False |
39 | grants | keywords | string | Keywords provided by the original data source. | True | False | True |
40 | grants | language | string | Grant original language, as ISO 639-1 language... | True | False | True |
41 | grants | language_title | string | ISO 639-1 language code for the original grant... | True | False | True |
42 | grants | linkout | string | Original URL for the grant. | False | False | False |
43 | grants | original_title | string | Title of the grant in its original language. | False | False | False |
44 | grants | project_numbers | json | Grant identifiers, as provided by the source (... | True | False | False |
45 | grants | research_org_cities | cities | City of the research organisations receiving t... | True | True | True |
46 | grants | research_org_countries | countries | Country of the research organisations receivin... | True | True | True |
47 | grants | research_org_names | string | Names of organizations investigators are affil... | True | False | False |
48 | grants | research_org_state_codes | states | State of the organisations receiving the grant... | True | True | True |
49 | grants | research_orgs | organizations | GRID organisations receiving the grant (note: ... | True | True | True |
50 | grants | researchers | researchers | Dimensions researchers IDs associated to the g... | True | True | True |
51 | grants | score | float | For full-text queries, the relevance score is ... | True | False | False |
52 | grants | start_date | date | Date when the grant starts, in the format 'YYY... | True | False | False |
53 | grants | start_year | integer | Year when the grant starts. | True | False | True |
54 | grants | title | string | Title of the grant in English (if the grant la... | False | False | False |
Similarly, for objects of type ‘Entity’, e.g. countries:
[14]:
%dsldocs countries
[14]:
| | entities | field | type | description | is_filter | is_entity | is_facet |
---|---|---|---|---|---|---|---|
0 | countries | id | string | GeoNames country code (eg 'US' for `geonames:6... | True | False | False |
1 | countries | name | string | GeoNames country name. | True | False | False |
But don’t worry if you don’t get it right: if you pass an unrecognised object name, the full list of available sources and entities is printed (in the run below, a KeyError traceback follows the list).
[15]:
%dsldocs unknown
Can't recognize this object. Dimcli knows about:
Sources=[clinical_trials - datasets - funder_groups - grants - organizations - patents - policy_documents - publications - reports - research_org_groups - researchers - source_titles] Entities=[categories - cities - countries - journals - open_access - publication_links - repositories - states]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-15-e3d3c8c65656> in <cell line: 1>()
----> 1 get_ipython().run_line_magic('dsldocs', 'unknown')
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
2416 kwargs['local_ns'] = self.get_local_scope(stack_depth)
2417 with self.builtin_trap:
-> 2418 result = fn(*args, **kwargs)
2419 return result
2420
<decorator-gen-126> in dsldocs(self, line, cell)
/usr/local/lib/python3.10/dist-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
/usr/local/lib/python3.10/dist-packages/dimcli/jupyter/magics.py in dsldocs(self, line, cell)
449 d = {header: [], 'field': [], 'type': [], 'description':[], 'is_filter':[], 'is_entity': [], 'is_facet':[],}
450 for S in docs_for:
--> 451 for x in sorted(res.json[header][S]['fields']):
452 d[header] += [S]
453 d['field'] += [x]
KeyError: 'unknown'
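The KeyError above is raised after the list of known objects has already been printed. If you build similar lookups yourself, checking the name first avoids it. Below is a minimal sketch, assuming a describe-style dictionary with `sources` and `entities` keys as suggested by the traceback; the functions `find_object` and `describe_or_list` and the trimmed sample dictionary are hypothetical, and the message only loosely mirrors the one printed by %dsldocs.

```python
def find_object(schema, name):
    """Return 'sources' or 'entities' for a known name, else None."""
    for header in ("sources", "entities"):
        if name in schema.get(header, {}):
            return header
    return None


def describe_or_list(schema, name):
    """Return the sorted field names, or a message listing known objects."""
    header = find_object(schema, name)
    if header is None:
        sources = " - ".join(sorted(schema.get("sources", {})))
        entities = " - ".join(sorted(schema.get("entities", {})))
        return f"Can't recognize this object. Sources=[{sources}] Entities=[{entities}]"
    return sorted(schema[header][name]["fields"])


# Hypothetical, heavily trimmed schema for illustration.
schema = {
    "sources": {"grants": {"fields": {"title": {}, "abstract": {}}},
                "publications": {"fields": {"doi": {}}}},
    "entities": {"countries": {"fields": {"id": {}, "name": {}}}},
}

print(describe_or_list(schema, "grants"))
print(describe_or_list(schema, "unknown"))
```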
Finally, if no object is requested, the full documentation for all sources is returned.
[16]:
%dsldocs
[16]:
| | sources | field | type | description | is_filter | is_entity | is_facet |
---|---|---|---|---|---|---|---|
0 | clinical_trials | abstract | string | Abstract or description of the clinical trial. | False | False | False |
1 | clinical_trials | acronym | string | Acronym of the clinical trial. | True | False | False |
2 | clinical_trials | active_years | integer | List of active years for a clinical trial. | True | False | True |
3 | clinical_trials | altmetric | float | Altmetric Attention Score. | True | False | False |
4 | clinical_trials | associated_grant_ids | string | Dimensions IDs of the grants associated to the... | True | False | False |
... | ... | ... | ... | ... | ... | ... | ... |
400 | source_titles | sjr | float | SJR indicator (SCImago Journal Rank). This ind... | True | False | False |
401 | source_titles | snip | float | SNIP indicator (source normalized impact per p... | True | False | False |
402 | source_titles | start_year | integer | Year when the source started publishing. | True | False | True |
403 | source_titles | title | string | The title of the source. | False | False | False |
404 | source_titles | type | string | The source type: one of `book_series`, `procee... | True | False | True |
405 rows × 7 columns
Note
The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.