The Dimcli Python library: Installation and Querying¶
The purpose of this notebook is to show how to use Dimcli. Dimcli is an open source Python client for accessing the Dimensions Analytics API. It makes it easier to authenticate against the API, send queries to it and process the JSON data being returned.
Dimcli also includes a command line interface (CLI) that aims to simplify learning the grammar of the Dimensions Search Language (DSL). Running
dimcli from the terminal opens up an interactive query console with syntax autocomplete, persistent history across sessions, pretty-printing and preview of JSON results, export to HTML and CSV, and more.
You can install Dimcli as follows from a Jupyter notebook:
!pip install dimcli -U
Then, each time you want to use it within a notebook, you can load it like this:
import dimcli
There are different ways to authenticate with the Dimensions API using Dimcli. The easiest is to pass your credentials explicitly, like this:
dimcli.login(username="email@example.com", password="my-secret-password", endpoint="https://your-url.dimensions.ai")
NOTE: if you use a key instead of a username and password to authenticate, you’d do the following:
dimcli.login(key="my-secret-key", endpoint="https://your-url.dimensions.ai")
This method can be handy if you want to log in quickly and cannot save a credentials file. However, it is not ideal if you want to protect your credentials, especially within a shared environment.
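One way to keep credentials out of the notebook text while still passing them explicitly is to read them from environment variables. The following is a minimal sketch, not part of Dimcli: the helper name credentials_from_env and the variable names DIMENSIONS_USERNAME, DIMENSIONS_PASSWORD and DIMENSIONS_ENDPOINT are assumptions made for illustration. Only the keyword names (username, password, endpoint) come from the login call shown above.

```python
import os


def credentials_from_env(environ=None):
    """Collect login keyword arguments from environment variables.

    The variable names used here are hypothetical, not a Dimcli convention.
    """
    env = os.environ if environ is None else environ
    required = ("DIMENSIONS_USERNAME", "DIMENSIONS_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "username": env["DIMENSIONS_USERNAME"],
        "password": env["DIMENSIONS_PASSWORD"],
        # Fall back to the endpoint used elsewhere in this notebook.
        "endpoint": env.get("DIMENSIONS_ENDPOINT", "https://app.dimensions.ai"),
    }


# In a notebook, assuming the variables are set and dimcli is imported:
# dimcli.login(**credentials_from_env())
```

The credentials still live in the process environment, so this only avoids saving them in the notebook file itself.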
More secure method: storing a private credentials file¶
Dimcli allows you to store your access credentials (e.g. email and password) in a file on your computer, so that you don’t have to type them in each time.
If you have access to a terminal prompt, you can use Dimcli’s setup assistant to automatically create the API credentials file (for more info see also the docs):
dimcli --init
Alternatively, if you don’t have access to a terminal prompt, you can create the credentials file manually as follows:
create a file called
dsl.ini in the uppermost directory where your notebooks are located
open the file and add your credentials, making sure that
the text structure is exactly the same as below (in particular, don’t change the
instance.live directive unless you know what you’re doing!)
you update the login and password fields as needed!
This is what the
dsl.ini file should look like:
[instance.live]
url=https://app.dimensions.ai
login=firstname.lastname@example.org
password=yourpasswordhere
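If you prefer to create the file from Python rather than by hand, the manual steps above can be sketched with the standard library configparser module. This is an illustrative sketch, not a Dimcli feature: the helper name write_dsl_ini is hypothetical, and the section and key names (instance.live, url, login, password) are taken from the file layout described above.

```python
import configparser
from pathlib import Path


def write_dsl_ini(path, url, login, password):
    """Write a credentials file with a single [instance.live] section."""
    # interpolation=None so that a '%' in the password is stored verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["instance.live"] = {
        "url": url,
        "login": login,
        "password": password,
    }
    with open(path, "w") as handle:
        # Write 'key=value' without spaces, matching the layout above.
        config.write(handle, space_around_delimiters=False)
    return Path(path)


# Example call (placeholder values, adjust the path to your notebooks folder):
# write_dsl_ini("dsl.ini", "https://app.dimensions.ai",
#               "firstname.lastname@example.org", "yourpasswordhere")
```

The file contains your password in plain text, so keep it out of shared folders and version control.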
Then, all you have to do is:
dimcli.login()
Dimcli - Dimensions API Client (v0.6.9)
Connected to endpoint: https://app.dimensions.ai - DSL version: 1.24
Method: dsl.ini file
Dimcli provides a few handy shortcuts for querying the API.
dsl = dimcli.Dsl()
data = dsl.query("""search publications for "black holes" return publications""")
Returned Publications: 20 (total = 1561596)
By default, Dimcli prints out a short statement with info about the query. You can turn that off by passing the argument verbose=False:
data = dsl.query("""search publications for "black holes" return publications""", verbose=False) # no feedback this time!
The raw JSON data is accessible via the
json property of the resulting object.
data.json.keys()
dict_keys(['_stats', '_version', 'publications'])
The main JSON keys of the returned data are also accessible as properties of the resulting object.
The count_batch and count_total properties provide quick shortcuts to find out how many records are available:
print("We got", data.count_batch, "results out of", data.count_total)
We got 20 results out of 1561596
If the query returns an error, the
errors and errors_string properties can be handy too:
# PS: errors are printed out by default
data = dsl.query("""search publications for "black holes" return spaceships""")
Returned Errors: 1
Semantic Error
Semantic errors found: Facet 'spaceships' is not present in source 'publications'. Available facets are: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,experts,funder_countries,funders,journal,journal_lists,mesh_terms,open_access_categories,pf01,publisher,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,times_cited,type,year
Semantic Error
Semantic errors found: Facet 'spaceships' is not present in source 'publications'. Available facets are: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,experts,funder_countries,funders,journal,journal_lists,mesh_terms,open_access_categories,pf01,publisher,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,times_cited,type,year
Iterative querying (loops)¶
Dimcli includes a utility method for looping over a query that produces more than 1000 results (the max number of records a single query can return). This method basically generates a loop in the background, which goes through all results available for a query using the limit/skip syntax.
Looped queries are very useful e.g. if you want to quickly extract a full dataset. There are a few things to remember though:
Queries are spaced one second apart, in order to comply with the API limit of 30 queries per minute.
The results are collated into a single
dimcli.Dataset object (same as with normal querying) that can be accessed via the methods illustrated above, so no extra aggregation step is needed when the query completes.
You can use
verbose=False to turn off the notifications, e.g. within a larger script with multiple steps.
You can pass
limit=500 (or any other number <= 1000) to specify how many records to extract per iteration; the default is 1000 (the maximum). This can be handy, e.g., when a query is particularly long and may impact the API performance (or cause an error); in this case, returning fewer records per iteration reduces the load on the API server.
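To get a feel for how the limit setting affects a looped query, the number of iterations is simply the total record count divided by the batch size, rounded up. The sketch below is plain arithmetic and does not call Dimcli; the helper name iterations_needed and the record total are illustrative assumptions.

```python
import math


def iterations_needed(total_records, limit=1000):
    """Return how many limit/skip queries are needed to fetch every record."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    return math.ceil(total_records / limit)


print(iterations_needed(3385))       # 4
print(iterations_needed(3385, 500))  # 7
```

With roughly one second between queries, a smaller limit therefore means more iterations and a longer overall extraction.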
data = dsl.query_iterative("""search publications for "black holes" where year=1990 and times_cited > 10 return publications""")
1000 / ...
1000 / 3385
2000 / 3385
3000 / 3385
3385 / 3385
===
Records extracted: 3385
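The background loop described in this section can be sketched roughly as follows. This is not Dimcli's actual implementation, only an illustration of the limit/skip idea. The names query_all, run_query and fake_run_query are hypothetical, and run_query is assumed to return a pair made of the records in the current batch and the total number of matching records. The fake backend stands in for the API so that the sketch runs offline.

```python
import time


def query_all(run_query, base_query, limit=1000, pause=1.0):
    """Fetch all records for a query by paging with limit/skip."""
    records = []
    skip = 0
    while True:
        # Append the paging clause to the base query for this iteration.
        batch, total = run_query(f"{base_query} limit {limit} skip {skip}")
        records.extend(batch)
        skip += limit
        if not batch or skip >= total:
            break
        time.sleep(pause)  # wait between queries, as described above
    return records


def fake_run_query(query, total=3385):
    """Hypothetical stand-in for the API: returns (batch, total)."""
    parts = query.split()
    limit = int(parts[parts.index("limit") + 1])
    skip = int(parts[parts.index("skip") + 1])
    ids = range(skip, min(skip + limit, total))
    return [{"id": i} for i in ids], total


records = query_all(fake_run_query, "search publications return publications", pause=0)
print("Records extracted:", len(records))  # Records extracted: 3385
```

In real use you would call dsl.query_iterative as shown above rather than writing this loop yourself.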
Command line interface¶
If you have access to a command prompt or terminal, Dimcli includes a handy command line interface that lets you query the Dimensions API interactively, much like a query console.
The CLI has several features but, most importantly, it lets you use the TAB key to autocomplete your queries (based on the latest API syntax and fields), which makes it an ideal tool for both new and expert users.
Running the CLI¶
In a JupyterLab environment, for example, use the menu
File > New > Terminal to open a terminal window. Then simply type
dimcli to start:
$ dimcli
Dimcli - Dimensions API Client (v0.6.6.5)
Welcome! Type help for more info.
Using endpoint: https://app.dimensions.ai - DSL version: 1.23.1
> help
COMMANDS LIST
====================
All special commands start with '.'
----
>>> help: show this help message
----
>>> <tab>: autocomplete.
----
>>> .docs: print out documentation for DSL data objects.
>>> .export_as_json: save results from last query as JSON file.
>>> .export_as_csv: save results from last query as CSV file.
>>> .export_as_html: save results from last query as HTML file.
>>> .export_as_bar_chart: save results from last query as Plotly bar chart.
>>> .show [optional: N]: print N results from last query, trying to build URLs for objects. Default N=10.
>>> .json_compact: print results of last query as single-line JSON.
>>> .json_full: print results of last query as formatted JSON.
>>> .url: resolve a Dimensions ID into a public URL.
----
>>> <Ctrl-o>: search docs online.
>>> <Ctrl-c>: abort query.
>>> <Ctrl-d>: exit console.
----
>>> quit: exit console
====================
See the video below for a quick demonstration of the query autocomplete functionality, or visit the GitHub repo for more information.
from IPython.display import Image
Image(url="http://api-sample-data.dimensions.ai/videos/dimcli_animated.gif", width=800)
The Dimensions Analytics API lets you carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.