
The Dimcli Python library: Installation and Querying

The purpose of this notebook is to show how to use Dimcli. Dimcli is an open source Python client for accessing the Dimensions Analytics API. It makes it easier to authenticate against the API, send queries to it and process the JSON data being returned.

Dimcli also includes a command line interface (CLI) that aims to simplify learning the grammar of the Dimensions Search Language (DSL). Running dimcli from the terminal opens an interactive query console with syntax autocomplete, persistent history across sessions, pretty-printing and preview of JSON results, export to HTML and CSV, and more.

This guide assumes that you already have a working Python 3 environment and pip (the Python package manager) installed. For more background, see this link.

Installation

You can install Dimcli as follows from a Jupyter notebook:

[ ]:
!pip install dimcli -U

Then, each time you want to use it within a notebook, you can load it like this:

[2]:
import dimcli

Authentication

There are different ways to authenticate with the Dimensions API using Dimcli. The easiest is to pass your credentials explicitly, like this:

[ ]:
dimcli.login(username="dimensions-username@me.com",
             password="my-secret-password",
             endpoint="https://your-url.dimensions.ai")

Note: if you use a key instead of a username and password to authenticate, do the following:

[ ]:
dimcli.login(key="my-secret-key",
             endpoint="https://your-url.dimensions.ai")

This method can be handy if you want to log in quickly and cannot save a credentials file. However, it is not ideal if you want to protect your credentials, especially within a shared environment.
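One way to keep credentials out of the notebook itself is to read them from environment variables and pass them to dimcli.login(). The following is a minimal sketch, not part of Dimcli: the helper login_kwargs_from_env and the variable names DIMENSIONS_KEY, DIMENSIONS_USERNAME, DIMENSIONS_PASSWORD and DIMENSIONS_ENDPOINT are hypothetical and chosen only for illustration.

```python
import os


def login_kwargs_from_env(environ=os.environ):
    """Build keyword arguments for dimcli.login() from environment variables.

    The variable names used here are hypothetical; pick whatever
    naming convention suits your environment.
    """
    # Fall back to the endpoint shown in the dsl.ini example of this guide.
    endpoint = environ.get("DIMENSIONS_ENDPOINT", "https://app.dimensions.ai")

    # Prefer key-based authentication when a key is available.
    key = environ.get("DIMENSIONS_KEY")
    if key:
        return {"key": key, "endpoint": endpoint}

    try:
        return {
            "username": environ["DIMENSIONS_USERNAME"],
            "password": environ["DIMENSIONS_PASSWORD"],
            "endpoint": endpoint,
        }
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# In a notebook you would then call:
#   import dimcli
#   dimcli.login(**login_kwargs_from_env())
```

The credentials still need to be set somewhere outside the notebook (for example in the shell that starts Jupyter), so this only helps if that place is itself private.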

More secure method: storing a private credentials file

Dimcli allows you to store your access credentials (e.g. email and password) in a file on your computer, so that you don’t have to type them in each time.

If you have access to a terminal prompt, you can use Dimcli’s setup assistant to automatically create the API credentials file (for more info see also the docs):

dimcli --init

Alternatively, if you don’t have access to a terminal prompt, you can create the credentials file manually as follows:

  • create a file called dsl.ini in the uppermost directory where your notebooks are located

  • open the file and add your credentials, making sure that

    • the text structure is exactly the same as below (in particular, don’t change the instance.live directive unless you know what you’re doing!)

    • you update the login and password fields as needed!

This is what the dsl.ini file should look like:

[instance.live]
url=https://app.dimensions.ai
login=user@mail.com
password=yourpasswordhere
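If you prefer to create the file from Python rather than by hand, the standard library's configparser can write the same layout. This is only a sketch: the helper write_credentials_file is hypothetical (it is not a Dimcli function), and the chmod call is an extra precaution that has limited effect on Windows.

```python
import configparser
from pathlib import Path


def write_credentials_file(path, login, password, url="https://app.dimensions.ai"):
    """Write a dsl.ini file containing a single [instance.live] section."""
    # interpolation=None lets passwords contain '%' without special handling.
    config = configparser.ConfigParser(interpolation=None)
    config["instance.live"] = {"url": url, "login": login, "password": password}

    path = Path(path)
    with path.open("w") as handle:
        # space_around_delimiters=False writes "key=value", as in the layout above.
        config.write(handle, space_around_delimiters=False)

    # Make the file readable and writable by the current user only.
    path.chmod(0o600)
    return path


# Example (writes dsl.ini in the current directory):
#   write_credentials_file("dsl.ini", "user@mail.com", "yourpasswordhere")
```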

Then, all you have to do is:

[6]:
dimcli.login()
Dimcli - Dimensions API Client (v0.6.9)
Connected to endpoint: https://app.dimensions.ai - DSL version: 1.24
Method: dsl.ini file

Querying

Dimcli provides a few handy shortcuts for querying the API.

Simple Querying

[7]:
dsl = dimcli.Dsl()
data = dsl.query("""search publications for "black holes" return publications""")
Returned Publications: 20 (total = 1561596)

By default, Dimcli prints out a short statement with info about the query. You can turn that off by passing the argument verbose=False.

[8]:
data = dsl.query("""search publications for "black holes" return publications""", verbose=False)
# no feedback this time!

The raw JSON data is accessible via the json property of the resulting object.

[9]:
data.json.keys()
[9]:
dict_keys(['_stats', '_version', 'publications'])

The main JSON keys of the returned data are also accessible as properties:

[10]:
len(data.publications)
[10]:
20

The count_batch and count_total properties provide quick shortcuts to find out how many records were returned and how many are available in total:

[11]:
print("We got", data.count_batch, "results out of", data.count_total)
We got 20 results out of 1561596
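Since the returned data is JSON, the records can be processed with ordinary Python tools. The sketch below assumes that data.publications behaves like a list of dictionaries; the sample records and the field names id, title and year are made up for illustration and are not taken from a real query.

```python
from collections import Counter

# Made-up records shaped like the items one might find in data.publications.
# The field names (id, title, year) are assumptions for this example.
sample_publications = [
    {"id": "pub.1", "title": "First example", "year": 1990},
    {"id": "pub.2", "title": "Second example", "year": 1990},
    {"id": "pub.3", "title": "Third example"},  # a field may be missing
]


def count_by_year(publications):
    """Count records per year, grouping records without a year under 'unknown'."""
    return Counter(record.get("year", "unknown") for record in publications)


counts = count_by_year(sample_publications)
```

With real results you would call count_by_year(data.publications) instead of using the sample list.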

If the query returns an error, the errors and errors_string properties can be handy too:

[12]:
# Note: errors are printed out by default
data = dsl.query("""search publications for "black holes" return spaceships""")
Returned Errors: 1
Semantic Error
Semantic errors found:
        Facet 'spaceships' is not present in source 'publications'. Available facets are: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,experts,funder_countries,funders,journal,journal_lists,mesh_terms,open_access_categories,pf01,publisher,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,times_cited,type,year
[13]:
print(data.errors_string)
Semantic ErrorSemantic errors found:
        Facet 'spaceships' is not present in source 'publications'. Available facets are: FOR,FOR_first,HRCS_HC,HRCS_RAC,RCDC,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_ua,category_uoa,experts,funder_countries,funders,journal,journal_lists,mesh_terms,open_access_categories,pf01,publisher,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,times_cited,type,year
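In a longer script it can be useful to stop as soon as a query fails, rather than carrying on with an empty result. A minimal sketch of such a guard follows; QueryFailed and query_or_raise are hypothetical names, and the code assumes that the errors property is empty (or None) when the query succeeded.

```python
class QueryFailed(RuntimeError):
    """Raised when a DSL query comes back with errors."""


def query_or_raise(dsl, query):
    """Run a query and raise QueryFailed if the result reports errors.

    `dsl` is expected to be a dimcli.Dsl() instance, or any object
    exposing a compatible query() method.
    """
    # verbose=False turns off the short summary statement.
    data = dsl.query(query, verbose=False)
    # Assumption: `errors` is falsy when the query succeeded.
    if data.errors:
        raise QueryFailed(data.errors_string)
    return data


# Example:
#   data = query_or_raise(dsl, 'search publications for "black holes" return publications')
```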

Iterative querying (loops)

Dimcli includes a utility method for looping over a query that produces more than 1000 results (the max number of records a single query can return). This method basically generates a loop in the background, which goes through all results available for a query using the limit/skip syntax.

Looped queries are very useful e.g. if you want to quickly extract a full dataset. There are a few things to remember though:

  • Queries are sent one second apart, so as to comply with the API limit of 30 queries per minute.

  • The results are collated into a single dimcli.Dataset object (same as with normal querying) that can be accessed via the methods illustrated above, so no extra aggregation step is needed when the query completes.

  • You can use verbose=False to turn off the notifications, e.g. within a larger script with multiple steps.

  • You can pass limit=500 (or any other number <= 1000) to specify how many records to extract per iteration; the default is 1000 (the maximum). This can be handy e.g. when a query is particularly long and may impact the API performance (or cause an error); in this case, returning fewer records per iteration helps keep the API server from getting overloaded.

[14]:
data = dsl.query_iterative("""search publications for "black holes" where year=1990 and times_cited > 10 return publications""")
1000 / ...
1000 / 3385
2000 / 3385
3000 / 3385
3385 / 3385
===
Records extracted: 3385
[15]:
len(data.publications)
[15]:
3385
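For readers curious about what happens in the background, the limit/skip loop described above can be sketched roughly as follows. This is an illustration of the general paging technique, not Dimcli's actual implementation: fetch_all is a hypothetical helper, and run_query stands for any callable that takes a full query string and returns a (records, total) pair.

```python
import time


def fetch_all(run_query, query, limit=1000, pause=1.0):
    """Collect all records for `query` by paging through them with limit/skip.

    `run_query` is a hypothetical callable: it receives the full query
    string and returns a tuple (records_in_this_batch, total_available).
    """
    records = []
    skip = 0
    while True:
        # Append the paging clause to the base query.
        batch, total = run_query(f"{query} limit {limit} skip {skip}")
        records.extend(batch)
        skip += limit
        # Stop when everything has been fetched or a batch comes back empty.
        if not batch or skip >= total:
            break
        # Wait between requests to stay within the API rate limit.
        time.sleep(pause)
    return records
```

In practice there is no need to write this yourself: dsl.query_iterative() does the looping for you, and passing limit=500 to it changes the batch size as described in the list above.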

Command line interface

If you have access to a command prompt or terminal, Dimcli includes a handy command line interface that lets you query the Dimensions API interactively, much like a query console.

The CLI has several features but, most importantly, it lets you use the TAB key to autocomplete your queries (based on the latest API syntax and fields), which makes it an ideal tool for both newcomers and expert users.

Running the CLI

In a JupyterLab environment, for example, use the menu File > New > Terminal to open a terminal window. Then simply type dimcli to start:

$ dimcli

Dimcli - Dimensions API Client (v0.6.6.5)
Welcome! Type help for more info.
Using endpoint: https://app.dimensions.ai - DSL version: 1.23.1

> help

COMMANDS LIST
====================
All special commands start with '.'
----
>>> help: show this help message
----
>>> <tab>:  autocomplete.
----
>>> .docs: print out documentation for DSL data objects.
>>> .export_as_json: save results from last query as JSON file.
>>> .export_as_csv: save results from last query as CSV file.
>>> .export_as_html: save results from last query as HTML file.
>>> .export_as_bar_chart: save results from last query as Plotly bar chart.
>>> .show [optional: N]: print N results from last query, trying to build URLs for objects. Default N=10.
>>> .json_compact: print results of last query as single-line JSON.
>>> .json_full: print results of last query as formatted JSON.
>>> .url: resolve a Dimensions ID into a public URL.
----
>>> <Ctrl-o>: search docs online.
>>> <Ctrl-c>: abort query.
>>> <Ctrl-d>: exit console.
----
>>> quit: exit console
====================

See the animation below for a quick demonstration of the query autocomplete functionality, or visit the GitHub repo for more information.

[16]:
from IPython.display import Image
Image(url="http://api-sample-data.dimensions.ai/videos/dimcli_animated.gif", width=800)

Note

The Dimensions Analytics API lets you carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.
