Exploring The Dimensions Search Language (DSL) - Deep Dive¶
This tutorial provides a detailed walkthrough of the most important features of the Dimensions Search Language.
This tutorial is based on the Query Syntax section of the official documentation, so it can be used as an interactive version of that section: it lets you try out the various DSL queries presented there.
What is the Dimensions Search Language?¶
The DSL aims to capture the type of interaction with Dimensions data that users are accustomed to performing graphically via the web application, and enable web app developers, power users, and others to carry out such interactions by writing query statements in a syntax loosely inspired by SQL but particularly suited to our specific domain and data organization.
Note: this notebook uses the Python programming language; however, the DSL queries themselves are not Python-specific and can be reused with any other API client.
[1]:
import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))
==
CHANGELOG
This notebook was last run on Jan 26, 2022
==
Prerequisites¶
This notebook assumes you have installed the Dimcli library and are familiar with the ‘Getting Started’ tutorial.
[2]:
!pip install dimcli --quiet
import dimcli
from dimcli.utils import *
import json
import sys
import pandas as pd
#
print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
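if 'google.colab' in sys.modules:
    # running in Colab: ask for the API key interactively
    import getpass
    KEY = getpass.getpass(prompt='API Key: ')
    dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
    # running locally: an empty key makes dimcli fall back to the dsl.ini file
    KEY = ""
    dimcli.login(key=KEY, endpoint=ENDPOINT)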
dsl = dimcli.Dsl()
Searching config file credentials for 'https://app.dimensions.ai' endpoint..
==
Logging in..
Dimcli - Dimensions API Client (v0.9.6)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.0
Method: dsl.ini file
Sections Index¶
Basic query structure
Full-text searching
Field searching
Searching for researchers
Returning results
Aggregations
1. Basic query structure¶
DSL queries consist of two required components: a search phrase that indicates the scientific records to be searched, and one or more return phrases which specify the contents and structure of the desired results.
The simplest valid DSL query is of the form search <source> return <result>:
[3]:
%%dsldf
search grants return grants limit 5
Returned Grants: 5 (total = 6115921)
Time: 0.73s
[3]:
active_year | end_date | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2022, 2023, 2024, 2025, 2026, 2027] | 2027-12-01 | [{'acronym': 'ARC', 'city_name': 'Canberra', '... | Australian Research Council | FL210100107 | grant.9782236 | en | Tracking nanoparticles: from cell culture to i... | 2022-12-01 | 2022 | Tracking nanoparticles: from cell culture to i... |
1 | [2022, 2023] | 2023-05-31 | [{'acronym': 'NSF GEO', 'city_name': 'Arlingto... | Directorate for Geosciences | 2127438 | grant.9752271 | en | NNA Planning: Developing community frameworks ... | 2022-12-01 | 2022 | NNA Planning: Developing community frameworks ... |
2 | [2022, 2023, 2024, 2025, 2026, 2027] | 2027-11-30 | [{'acronym': 'ERC', 'city_name': 'Brussels', '... | European Research Council | 101019146 | grant.9708025 | en | Seeking Constraints on Open Ocean Biocalcifica... | 2022-12-01 | 2022 | Seeking Constraints on Open Ocean Biocalcifica... |
3 | [2022, 2023, 2024, 2025, 2026, 2027] | 2027-11-30 | [{'acronym': 'ERC', 'city_name': 'Brussels', '... | European Research Council | 101003021 | grant.9661402 | en | New frontiers in advanced glycotherapy for cancer | 2022-12-01 | 2022 | New frontiers in advanced glycotherapy for cancer |
4 | [2022, 2023, 2024, 2025] | 2025-10-31 | [{'acronym': 'NSF MPS', 'city_name': 'Arlingto... | Directorate for Mathematical & Physical Sciences | 2105918 | grant.9890102 | en | RUI: Exciton-Phonon Interactions in Solids bas... | 2022-11-15 | 2022 | RUI: Exciton-Phonon Interactions in Solids bas... |
search source¶
A query must begin with the word search followed by a source name, i.e. the name of a type of scientific record, such as grants or publications.
What are the sources available? See the data sources section of the documentation.
Alternatively, we can use the ‘schema’ API (describe) to return this information programmatically:
[4]:
dsl.query("describe schema")
[4]:
<dimcli.DslDataset object #4642007888. Dict keys: 'entities', 'sources'>
A more useful query might also make use of the optional for and where phrases to limit the set of records returned.
[5]:
%%dsldf
search grants for "lung cancer"
where active_year=2000
return grants limit 5
Returned Grants: 5 (total = 1764)
Time: 0.69s
[5]:
active_year | end_date | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2000, 2001, 2002] | 2002-01-01 | [{'acronym': 'NHLBI', 'city_name': 'Bethesda',... | National Heart Lung and Blood Institute | F32HL010455 | grant.2386513 | en | ROLE OF CD44 ISOFORMS IN ENDOTHELIAL CELL DAMAGE | 2000-12-31 | 2000 | ROLE OF CD44 ISOFORMS IN ENDOTHELIAL CELL DAMAGE |
1 | [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007] | 2007-11-30 | [{'acronym': 'NHLBI', 'city_name': 'Bethesda',... | National Heart Lung and Blood Institute | R01HL066221 | grant.2537801 | en | GENETIC ANALYSIS OF EPHRIN-EPH SIGNALING IN AN... | 2000-12-18 | 2000 | GENETIC ANALYSIS OF EPHRIN-EPH SIGNALING IN AN... |
2 | [2000, 2001, 2002, 2003, 2004] | 2004-11-30 | [{'acronym': 'NHLBI', 'city_name': 'Bethesda',... | National Heart Lung and Blood Institute | R01HL063695 | grant.2537116 | en | ESTROGEN, ANGIOGENESIS AND ENDOTHELIAL PROGENI... | 2000-12-18 | 2000 | ESTROGEN, ANGIOGENESIS AND ENDOTHELIAL PROGENI... |
3 | [2000, 2001, 2002, 2003, 2004, 2005, 2006, 200... | 2017-12-31 | [{'acronym': 'NHLBI', 'city_name': 'Bethesda',... | National Heart Lung and Blood Institute | R01HL062244 | grant.2536777 | en | Synthetic Heparan Sulfate: Probing Biosynthesi... | 2000-12-15 | 2000 | Synthetic Heparan Sulfate: Probing Biosynthesi... |
4 | [2000, 2001] | 2001-02-28 | [{'acronym': 'RWJF', 'city_name': 'Princeton',... | Robert Wood Johnson Foundation | 41067 | grant.8616620 | en | SmokeLess States Program - Implementation | 2000-12-01 | 2000 | SmokeLess States Program - Implementation |
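The structure described above (a required search phrase, optional for and where phrases, and a required return phrase) can be sketched as a small string builder. This is only an illustration: build_query is a hypothetical helper, not part of dimcli, and it assumes the search term contains no characters that need escaping.

```python
def build_query(source, term=None, where=None, limit=None):
    """Assemble a DSL query string (hypothetical helper, not a dimcli function)."""
    parts = [f"search {source}"]           # required: search <source>
    if term is not None:
        parts.append(f'for "{term}"')      # optional: full-text search term
    if where is not None:
        parts.append(f"where {where}")     # optional: field filter
    parts.append(f"return {source}")       # required: return <result>
    if limit is not None:
        parts.append(f"limit {limit}")
    return " ".join(parts)


query = build_query("grants", term="lung cancer", where="active_year=2000", limit=5)
print(query)
```

The resulting string can be passed to dsl.query() or pasted into a %%dsldf cell.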
return result (source or facet)¶
The most basic return phrase consists of the keyword return followed by the name of a record or facet to be returned.
This must be the name of the source used in the search phrase, or the name of a facet of that source.
[6]:
%%dsldf
search grants for "laryngectomy"
return grants limit 5
Returned Grants: 5 (total = 127)
Time: 0.54s
[6]:
active_year | end_date | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2021, 2022, 2023, 2024, 2025] | 2025-09-30 | [{'acronym': 'NCN', 'city_name': 'Krakow', 'co... | National Science Center | 2020/39/O/HS6/01774 | grant.9750968 | pl | Psychologia ekologiczna i enaktywizm w praktyc... | 2021-07-07 | 2021 | Ecological psychology and enactivism in resear... |
1 | [2021, 2022, 2023, 2024, 2025, 2026] | 2026-03-31 | [{'acronym': 'NIDCD', 'city_name': 'Bethesda',... | National Institute on Deafness and Other Commu... | R01DC019352 | grant.9643970 | en | Reducing social isolation for adults with chro... | 2021-04-05 | 2021 | Reducing social isolation for adults with chro... |
2 | [2021, 2022, 2023, 2024] | 2024-03-31 | [{'acronym': 'JSPS', 'city_name': 'Tokyo', 'co... | Japan Society for the Promotion of Science | 21K17363 | grant.9692444 | ja | 喉頭摘出術後の無喉頭者の嗅覚リハビリテーション効果について | 2021-04-01 | 2021 | About the olfactory rehabilitation effect of l... |
3 | [2021, 2022, 2023, 2024, 2025] | 2025-03-31 | [{'acronym': 'JSPS', 'city_name': 'Tokyo', 'co... | Japan Society for the Promotion of Science | 21K10776 | grant.9685857 | ja | 喉頭がん、下咽頭がんにより喉頭摘出術を受けた患者に対する携帯型嗅覚知覚装置の開発 | 2021-04-01 | 2021 | Development of a portable sensory sensory devi... |
4 | [2021, 2022, 2023, 2024, 2025] | 2025-03-31 | [{'acronym': 'JSPS', 'city_name': 'Tokyo', 'co... | Japan Society for the Promotion of Science | 21K10721 | grant.9685802 | ja | 喉頭摘出者の食道発声トレーニングプログラムの構築と効果の検証 | 2021-04-01 | 2021 | Construction and effectiveness verification of... |
For example, let’s see which facets are available for the grants source:
[7]:
fields = dsl.query("describe schema")['sources']['grants']['fields']
[x for x in fields if fields[x]['is_facet']]
[7]:
['active_year',
'category_bra',
'category_for',
'category_hra',
'category_hrcs_hc',
'category_hrcs_rac',
'category_icrp_cso',
'category_icrp_ct',
'category_rcdc',
'category_sdg',
'category_uoa',
'funder_countries',
'funders',
'funding_currency',
'funding_org_acronym',
'funding_org_city',
'funding_org_name',
'language',
'language_title',
'research_org_cities',
'research_org_countries',
'research_org_state_codes',
'research_orgs',
'researchers',
'start_year']
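The facet lookup above is a plain dictionary filter, so it can be tried without an API connection. The sketch below uses a hand-written, hypothetical excerpt shaped like the fields mapping of the schema response; only the is_facet flag is modelled, and the four entries are illustrative rather than a complete schema.

```python
# Hypothetical excerpt shaped like
# dsl.query("describe schema")['sources']['grants']['fields']
fields = {
    "active_year": {"is_facet": True},
    "title": {"is_facet": False},
    "funders": {"is_facet": True},
    "abstract": {"is_facet": False},
}

# Keep only the field names flagged as facets
facets = sorted(name for name, info in fields.items() if info.get("is_facet", False))
print(facets)
```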
2. Full-text Searching¶
Full-text search or keyword search finds all instances of a term (keyword) in a document, or group of documents.
Full-text search relies on search indexes, each of which can target a specific section of a document, e.g. its abstract, authors, or full text.
[8]:
%%dsldf
search publications
in full_data for "moon landing"
return publications limit 5
Returned Publications: 5 (total = 198654)
Time: 1.49s
[8]:
authors | id | pages | title | type | volume | year | journal.id | journal.title | issue | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Noumea', 'city_id... | pub.1144582099 | 106194 | Estimating post-release mortality of long-line... | article | 249 | 2022 | jour.1032122 | Fisheries Research | NaN |
1 | [{'affiliations': [{'name': 'ESSEC Business Sc... | pub.1141367962 | 1-20 | Optimization in Multimodal Freight Transportat... | article | 299 | 2022 | jour.1027009 | European Journal of Operational Research | 1 |
2 | [{'affiliations': [{'city': 'Beijing', 'city_i... | pub.1144677835 | 197-208 | Adaptive control of hypersonic vehicles with u... | article | 193 | 2022 | jour.1134138 | Acta Astronautica | NaN |
3 | [{'affiliations': [{'city': 'Shanghai', 'city_... | pub.1144650554 | 106956 | Constructing Highly Tribopositive Elastic Yarn... | article | 94 | 2022 | jour.1051417 | Nano Energy | NaN |
4 | [{'affiliations': [{'city': 'Chengdu', 'city_i... | pub.1144554233 | 108783 | Nonlinear vibration model and response charact... | article | 169 | 2022 | jour.1042338 | Mechanical Systems and Signal Processing | NaN |
2.1 in [search index]¶
This optional phrase consists of the particle in followed by a term indicating a search index, specifying for example whether the search is limited to full text, title and abstract only, or title only.
[9]:
%%dsldf
search grants
in title_abstract_only for "something"
return grants limit 5
Returned Grants: 5 (total = 11654)
Time: 0.54s
[9]:
active_year | end_date | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2022, 2023, 2024] | 2024-08-31 | [{'acronym': 'EC', 'city_name': 'Brussels', 'c... | European Commission | 101018408 | grant.9662248 | en | Conceptual Engineering, Inquiry and Communication | 2022-09-01 | 2022 | Conceptual Engineering, Inquiry and Communication |
1 | [2022, 2023, 2024] | 2024-08-31 | [{'acronym': 'EC', 'city_name': 'Brussels', 'c... | European Commission | 101031970 | grant.9652677 | en | Molecular dynamics of size sensing at the sing... | 2022-09-01 | 2022 | Molecular dynamics of size sensing at the sing... |
2 | [2022, 2023, 2024, 2025] | 2025-04-30 | [{'acronym': 'EC', 'city_name': 'Brussels', 'c... | European Commission | 101019008 | grant.9662230 | en | Yamatology of the Axis. Japan as a Nazi-Fascis... | 2022-05-01 | 2022 | Yamatology of the Axis. Japan as a Nazi-Fascis... |
3 | [2022] | 2022-09-30 | [{'acronym': 'SNF', 'city_name': 'Bern', 'coun... | Swiss National Science Foundation | 205597 | grant.9943737 | en | Innovation performance and knowledge spillover... | 2022-04-01 | 2022 | Innovation performance and knowledge spillover... |
4 | [2022, 2023, 2024] | 2024-12-31 | [{'acronym': 'EPSRC', 'city_name': 'Swindon', ... | Engineering and Physical Sciences Research Cou... | EP/W002817/1 | grant.9944524 | en | The Farey framework for SL2-tilings | 2022-01-01 | 2022 | The Farey framework for SL2-tilings |
For example, let’s see which search fields are available for the grants source:
[10]:
dsl.query("describe schema")['sources']['grants']['search_fields']
[10]:
['full_data', 'concepts', 'title_only', 'title_abstract_only', 'investigators']
[11]:
%%dsldf
search grants
in full_data for "graphene AND computer AND iron"
return grants limit 5
Returned Grants: 5 (total = 13)
Time: 0.55s
[11]:
active_year | end_date | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2020, 2021] | 2021-03-31 | [{'acronym': 'NSERC', 'city_name': 'Ottawa', '... | Natural Sciences and Engineering Research Council | N/A | grant.9873234 | en | Phase transitions in two-dimensional ultrathin... | 2020-04-01 | 2020 | Phase transitions in two-dimensional ultrathin... |
1 | [2019, 2020] | 2020-03-31 | [{'acronym': 'NSERC', 'city_name': 'Ottawa', '... | Natural Sciences and Engineering Research Council | N/A | grant.9445377 | en | Phase transitions in two-dimensional ultrathin... | 2019-04-01 | 2019 | Phase transitions in two-dimensional ultrathin... |
2 | [2019, 2020, 2021] | 2021-12-31 | [{'acronym': 'RSF', 'city_name': 'Moscow', 'co... | Russian Science Foundation | 19-43-04129 | grant.8413990 | en | Weyl and Dirac semimetals and beyond - predict... | 2019-01-01 | 2019 | Weyl and Dirac semimetals and beyond - predict... |
3 | [2018] | 2018-12-31 | [{'acronym': 'RFBR', 'city_name': 'Moscow', 'c... | Russian Foundation for Basic Research | 18-02-20097 | grant.8731867 | ru | Проект организации 18-ой Международной конфере... | 2018-01-01 | 2018 | Project of the organization of the 18th Intern... |
4 | [2016] | 2016-12-31 | [{'acronym': 'MNiSW', 'city_name': 'Warsaw', '... | Ministry of Science and Higher Education | 4491/E-370/S/2016 | grant.7397800 | pl | Dotacja podmiotowa na utrzymanie potencjału ba... | 2016-02-22 | 2016 | Subject subsidy for maintaining the research p... |
Special search indexes for person names make it possible to run full-text searches on publications authors or grants investigators. Please see the Searching for researchers section below for more information on how searches work in this case.
[12]:
%dsldf search publications in authors for "\"Jennifer A Doudna\"" return publications limit 5
Returned Publications: 5 (total = 387)
Time: 0.69s
[12]:
authors | id | pages | title | type | year | journal.id | journal.title | issue | volume | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'San Francisco', '... | pub.1144086202 | 2021.12.20.21268048 | Omicron mutations enhance infectivity and redu... | preprint | 2021 | jour.1369542 | medRxiv | NaN | NaN |
1 | [{'affiliations': [{'city': 'Berkeley', 'city_... | pub.1143670587 | 2021.12.06.471469 | A naturally DNase-free CRISPR-Cas12c enzyme si... | preprint | 2021 | jour.1293558 | bioRxiv | NaN | NaN |
2 | [{'affiliations': [{'city': 'Berkeley', 'city_... | pub.1143662570 | 34-47 | Species- and site-specific genome editing in c... | article | 2021 | jour.1052984 | Nature Microbiology | 1 | 7 |
3 | [{'affiliations': [{'city': 'Berkeley', 'city_... | pub.1142569174 | 100527-100527 | Optimizing COVID-19 control with asymptomatic ... | article | 2021 | jour.1040589 | Epidemics | NaN | 37 |
4 | [{'affiliations': [{'city': 'Berkeley', 'city_... | pub.1142452381 | e0258263 | LuNER: Multiplexed SARS-CoV-2 detection in cli... | article | 2021 | jour.1037553 | PLOS ONE | 11 | 16 |
2.2 for "search term"¶
This optional phrase consists of the keyword for followed by a search term string, enclosed in double quotes (").
Strings in double quotes can contain nested quotes escaped by a backslash \. This ensures that the string in nested double quotes is searched for as a single phrase, not as multiple words.
An example of a phrase: "\"Machine Learning\"": results must contain Machine Learning as a phrase.
[13]:
%dsldf search publications for "\"Machine Learning\"" return publications limit 5
Returned Publications: 5 (total = 1750245)
Time: 1.20s
[13]:
authors | id | issue | pages | title | type | volume | year | journal.id | journal.title | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Barcelona', 'city... | pub.1141731106 | 9 | 1-38 | Computing Graph Neural Networks: A Survey from... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
1 | [{'affiliations': [{'city': 'Barcelona', 'city... | pub.1141731105 | 9 | 1-35 | A Survey on Uncertainty Estimation in Deep Lea... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
2 | [{'affiliations': [{'city': 'Pittsburgh', 'cit... | pub.1141731091 | 9 | 1-36 | A Survey on Data-driven Network Intrusion Dete... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
3 | [{'affiliations': [{'city': "Xi'an", 'city_id'... | pub.1141731088 | 9 | 1-40 | A Survey of Deep Active Learning | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
4 | [{'affiliations': [{'city': 'Kuwait City', 'ci... | pub.1141731057 | 9 | 1-35 | Design Guidelines for Cooperative UAV-supporte... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
Example of multiple keywords: "Machine Learning": this searches for the keywords independently.
[14]:
%dsldf search publications for "Machine Learning" return publications limit 5
Returned Publications: 5 (total = 3301914)
Time: 0.71s
[14]:
authors | id | issue | pages | title | type | volume | year | journal.id | journal.title | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Barcelona', 'city... | pub.1141731106 | 9 | 1-38 | Computing Graph Neural Networks: A Survey from... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
1 | [{'affiliations': [{'city': 'Barcelona', 'city... | pub.1141731105 | 9 | 1-35 | A Survey on Uncertainty Estimation in Deep Lea... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
2 | [{'affiliations': [{'city': 'Pittsburgh', 'cit... | pub.1141731091 | 9 | 1-36 | A Survey on Data-driven Network Intrusion Dete... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
3 | [{'affiliations': [{'city': "Xi'an", 'city_id'... | pub.1141731088 | 9 | 1-40 | A Survey of Deep Active Learning | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
4 | [{'affiliations': [{'city': 'Kuwait City', 'ci... | pub.1141731057 | 9 | 1-35 | Design Guidelines for Cooperative UAV-supporte... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
Note: Special characters, such as any of ^ " : ~ \ [ ] { } ( ) ! | & +, must be escaped by a backslash \. Also, please note the escaping rules of Python (or other languages). For example, when writing a query with escaped quotes, such as search publications for "\"phrase 1\" AND \"phrase 2\"", in Python it is necessary to escape the backslashes as well, so it would look like: 'search publications for "\\"phrase 1\\" AND \\"phrase 2\\""'.
See the official docs for more details.
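As a quick check of the Python escaping rule above, the following sketch compares the doubled-backslash form with a raw string literal, which is a standard Python alternative that avoids the doubling. Both variable names are illustrative.

```python
# Regular string literal: each backslash must be doubled
escaped = 'search publications for "\\"phrase 1\\" AND \\"phrase 2\\""'

# Raw string literal: backslashes are kept as written
raw = r'search publications for "\"phrase 1\" AND \"phrase 2\""'

# Both literals produce the same query text, with four backslashes in total
print(escaped == raw)
print(raw)
```

Either form can be sent to the API; the raw string is usually easier to read when a query contains several nested phrases.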
2.3 Boolean Operators¶
A search term can consist of multiple keywords or phrases connected using boolean logic operators, e.g. AND, OR and NOT.
[15]:
%dsldf search publications for "(dose AND concentration)" return publications limit 5
Returned Publications: 5 (total = 6050778)
Time: 0.99s
[15]:
authors | id | pages | title | type | year | journal.id | journal.title | volume | |
---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Davangere', 'city... | pub.1143861372 | 1-14 | Screening of Antibacterial and Antioxidant Act... | article | 2022 | jour.1047355 | BioNanoScience | NaN |
1 | [{'affiliations': [{'city': 'Pune', 'city_id':... | pub.1144548884 | 17-22 | Daily oral Vitamin D3 without concomitant ther... | article | 2022 | NaN | NaN | 2 |
2 | [{'affiliations': [{'city': 'Jatobá', 'city_id... | pub.1144546526 | 100051 | Nutritional feed additives reduce the adverse ... | article | 2022 | jour.1400582 | Fish and Shellfish Immunology Reports | 3 |
3 | [{'affiliations': [{'city': 'Lecce', 'city_id'... | pub.1144545525 | 100071 | Income-dependent expansion of electricity dema... | article | 2022 | jour.1400160 | Energy and Climate Change | 3 |
4 | [{'affiliations': [{'city': 'Tokyo', 'city_id'... | pub.1144519018 | 100049 | Alteration of hemoglobin ß gene expression in ... | article | 2022 | jour.1400582 | Fish and Shellfish Immunology Reports | 3 |
When specifying Boolean operators with keywords such as AND, OR and NOT, the keywords must appear in all uppercase.
The operators available are shown in the table below.
Boolean Operator | Alternative Symbol | Description |
---|---|---|
AND | && | Requires both terms on either side of the Boolean operator to be present for a match. |
NOT | ! | Requires that the following term not be present. |
OR | || | Requires that either term (or both terms) be present for a match. |
 | + | Requires that the following term be present. |
 | - | Prohibits the following term (that is, matches on fields or documents that do not include that term). |
[16]:
%dsldf search publications for "(dose OR concentration) AND (-malaria +africa)" return publications limit 5
Returned Publications: 5 (total = 1644481)
Time: 0.88s
[16]:
authors | id | pages | title | type | volume | year | journal.id | journal.title | issue | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Baltimore', 'city... | pub.1144345471 | 100059 | Reliability and validity of a perinatal depres... | article | 2 | 2022 | jour.1410012 | SSM - Mental Health | NaN |
1 | [{'affiliations': [{'city': 'Clemson', 'city_i... | pub.1144477432 | 100074 | Detection of Trypanosoma brucei by microwave c... | article | 4 | 2022 | jour.1386020 | Sensors and Actuators Reports | NaN |
2 | [{'affiliations': [{'city': 'Beijing', 'city_i... | pub.1144284673 | 1174-1190 | Examining and applying the theory of “explorin... | article | 8 | 2022 | jour.1150945 | Energy Reports | NaN |
3 | [{'affiliations': [{'city': 'Beijing', 'city_i... | pub.1143898003 | 161-182 | A review of regional energy internet in smart ... | article | 8 | 2022 | jour.1150945 | Energy Reports | NaN |
4 | [{'affiliations': [{'city': 'Giza', 'city_id':... | pub.1144780769 | 101687 | A Unified index of water resources systems vul... | article | 13 | 2022 | jour.1139894 | Ain Shams Engineering Journal | 5 |
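When a query combines many phrases, it can help to generate the boolean expression from Python lists. The sketch below is a minimal illustration: quote_phrase and any_of are hypothetical helpers (not dimcli functions), and they assume the phrases themselves contain no special characters that would need further escaping.

```python
def quote_phrase(phrase):
    """Wrap a phrase in backslash-escaped double quotes so it is searched as a unit."""
    return '\\"' + phrase + '\\"'


def any_of(phrases):
    """Join quoted phrases with OR inside parentheses."""
    return "(" + " OR ".join(quote_phrase(p) for p in phrases) + ")"


names = any_of(["coronavirus", "corona virus"])
places = "(Wuhan OR China OR novel)"  # single keywords need no quoting
query = f'search publications in full_data for "{names} AND {places}" return publications'
print(query)
```

Note that the operators are written in uppercase, as the DSL requires.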
Combining keywords and boolean operators makes it possible to construct rather sophisticated queries. For example, here’s a real-world query used to extract publications related to COVID-19.
[17]:
q_inner = """ "2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR "HCoV-2019" OR "hcov" OR "NCOVID-19" OR
"severe acute respiratory syndrome coronavirus 2" OR "severe acute respiratory syndrome corona virus 2"
OR (("coronavirus" OR "corona virus") AND (Wuhan OR China OR novel)) """
# tip: dsl_escape is a dimcli utility function for escaping special characters
q_outer = f"""search publications in full_data for "{dsl_escape(q_inner)}" return publications"""
print(q_outer)
dsl.query(q_outer)
search publications in full_data for " \"2019-nCoV\" OR \"COVID-19\" OR \"SARS-CoV-2\" OR \"HCoV-2019\" OR \"hcov\" OR \"NCOVID-19\" OR
\"severe acute respiratory syndrome coronavirus 2\" OR \"severe acute respiratory syndrome corona virus 2\"
OR ((\"coronavirus\" OR \"corona virus\") AND (Wuhan OR China OR novel)) " return publications
Returned Publications: 20 (total = 937518)
Time: 4.56s
[17]:
<dimcli.DslDataset object #4644757408. Records: 20/937518>
2.4 Wildcard Searches¶
The DSL supports single and multiple character wildcard searches within single terms. Wildcard characters can be applied to single terms, but not to search phrases.
[18]:
%dsldf search publications in title_only for "ital? malaria" return publications limit 5
Returned Publications: 5 (total = 151)
Time: 0.90s
[18]:
authors | id | pages | title | type | year | journal.id | journal.title | issue | volume | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Milan', 'city_id'... | pub.1144404199 | 1-8 | The value of lamp to rule out imported malaria... | article | 2022 | jour.1012240 | Infectious Diseases | NaN | NaN |
1 | [{'affiliations': [{'city': 'Foggia', 'city_id... | pub.1143322096 | 1521 | Entomological Surveillance in Former Malaria-e... | article | 2021 | jour.1047674 | Pathogens | 11 | 10 |
2 | [{'affiliations': [{'city': 'Rome', 'city_id':... | pub.1136415339 | 621974 | Plasmodium matutinum Transmitted by Culex pipi... | article | 2021 | jour.1052367 | Frontiers in Veterinary Science | NaN | 8 |
3 | [{'affiliations': [{'city': 'Rome', 'city_id':... | pub.1133261890 | NaN | Artemisinin resistance surveillance in African... | article | 2020 | jour.1112262 | Journal of Travel Medicine | 5 | 28 |
4 | NaN | pub.1132438137 | 9744-9748 | Does Living in Previously Exposed Malaria or W... | article | 2020 | jour.1278986 | Biointerface Research in Applied Chemistry | 2 | 11 |
[19]:
%dsldf search publications in title_only for "it* malaria" return publications limit 5
Returned Publications: 5 (total = 1710)
Time: 0.65s
[19]:
authors | id | issue | pages | title | type | volume | year | journal.id | journal.title | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Rāiganj', 'city_i... | pub.1144613444 | 1 | 630 | Study of epidemiological behaviour of malaria ... | article | 12 | 2022 | jour.1045337 | Scientific Reports |
1 | [{'affiliations': [{'city': 'Paris', 'city_id'... | pub.1144507923 | 1 | e0262018 | Retrospective study of toxoplasmosis prevalenc... | article | 17 | 2022 | jour.1037553 | PLOS ONE |
2 | [{'affiliations': [{'city': 'Manaus', 'city_id... | pub.1144484495 | NaN | NaN | Essential Oil of Piper Purusanum C.DC (Piperac... | preprint | NaN | 2022 | jour.1380788 | Research Square |
3 | [{'affiliations': [{'city': 'Rio de Janeiro', ... | pub.1144409586 | 1 | 6 | Naturally acquired antibody response to a Plas... | article | 21 | 2022 | jour.1030597 | Malaria Journal |
4 | [{'affiliations': [{'city': 'Milan', 'city_id'... | pub.1144404199 | NaN | 1-8 | The value of lamp to rule out imported malaria... | article | NaN | 2022 | jour.1012240 | Infectious Diseases |
Wildcard Search Type | Special Character | Example |
---|---|---|
Single character - matches a single character | ? | The search string ital? matches terms such as italy. |
Multiple characters - matches zero or more sequential characters | * | The wildcard search it* matches any term beginning with it, such as italy or italian. |
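To get a feel for the two wildcard characters, the sketch below uses Python's standard fnmatch module, whose ? and * follow the same single-character and multiple-character idea. This is only an analogy run on a hypothetical word list: the DSL matches terms inside its own search index, which may normalize text differently.

```python
from fnmatch import fnmatchcase

# Hypothetical terms, standing in for words found in publication titles
terms = ["italy", "ital", "italic", "italian", "its"]

# "?" stands for exactly one character
single = [t for t in terms if fnmatchcase(t, "ital?")]

# "*" stands for zero or more characters
multi = [t for t in terms if fnmatchcase(t, "it*")]

print(single)
print(multi)
```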
2.5 Proximity Searches¶
A proximity search looks for terms that are within a specific distance from one another.
To perform a proximity search, add the tilde character ~ and a numeric value to the end of a search phrase. For example, to search for formal and model within 10 words of each other in a document, use the search:
[20]:
%dsldf search publications for "\"formal model\"~10" return publications limit 5
Returned Publications: 5 (total = 550689)
Time: 1.48s
[20]:
authors | id | issue | pages | title | type | volume | year | journal.id | journal.title | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Linz', 'city_id':... | pub.1141731109 | 9 | 1-35 | Adversary Models for Mobile Device Authentication | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
1 | [{'affiliations': [{'city': 'Singapore', 'city... | pub.1139789629 | 7 | 1-38 | A Survey of Smart Contract Formal Specificatio... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
2 | [{'affiliations': [{'city': 'Valencia', 'city_... | pub.1144441268 | 3 | 100187 | Institutional factors affecting entrepreneursh... | article | 28 | 2022 | jour.1357518 | Investigaciones Europeas de Dirección y Econom... |
3 | [{'affiliations': [{'city': 'Shanghai', 'city_... | pub.1144280894 | 7 | 1-20 | Research on Evaluation of Intelligent Manufact... | article | 30 | 2022 | jour.1140896 | Journal of Global Information Management |
4 | [{'affiliations': [{'city': 'Buffalo', 'city_i... | pub.1144816193 | NaN | 102306 | Task allocation and planning for product disas... | article | 76 | 2022 | jour.1044008 | Robotics and Computer-Integrated Manufacturing |
[21]:
%dsldf search publications for "\"digital humanities\"~5 +ontology" return publications limit 5
Returned Publications: 5 (total = 12057)
Time: 1.04s
[21]:
authors | id | issue | pages | title | type | volume | year | journal.id | journal.title | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Ljubljana', 'city... | pub.1144807122 | 1 | 345-369 | COVID-19, Digital Tracking Control and Chinese... | article | 10 | 2022 | jour.1371758 | Asian Studies |
1 | [{'affiliations': [{'city': 'Wuhan', 'city_id'... | pub.1144781071 | 1 | 7-15 | Reuse‐oriented data publishing: How to make th... | article | 35 | 2022 | jour.1043368 | Learned Publishing |
2 | [{'affiliations': [{'city': 'Isparta', 'city_i... | pub.1144707794 | NaN | 1-21 | Automatic and intelligent content visualizatio... | article | NaN | 2022 | jour.1104357 | Neural Computing and Applications |
3 | NaN | pub.1144681286 | NaN | NaN | Handbook of the American Short Story | book | 15 | 2022 | NaN | NaN |
4 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1144619382 | NaN | NaN | The Performance of Sculpture in Renaissance Ve... | monograph | NaN | 2022 | NaN | NaN |
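The proximity syntax used in the two queries above can be generated with a small helper, sketched here. proximity is a hypothetical function name (not part of dimcli), and the phrase is assumed to contain no special characters that need escaping.

```python
def proximity(phrase, distance):
    """Return a phrase in escaped quotes followed by ~distance (hypothetical helper)."""
    if distance < 0:
        raise ValueError("distance must be zero or a positive integer")
    return f'\\"{phrase}\\"~{distance}'


term = proximity("formal model", 10)
query = f'search publications for "{term}" return publications limit 5'
print(query)
```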
The distance is counted as the number of term movements needed to match the specified phrase. In the example above, if formal and model were 10 spaces apart in a field, but model appeared before formal, more than 10 term movements would be required to move the terms together and position model to the right of formal with a space in between.
3. Field Searching¶
Field searching lets you use a specific field of a source as a query filter. For example, this can be a Literal field such as the type of a publication, its date, mesh terms, etc. Or it can be an Entity field, such as the journal title for a publication, the country name of its author affiliations, etc.
What are the fields available for each source? See the data sources section of the documentation.
Alternatively, we can use the ‘schema’ API (describe) to return this information programmatically:
[22]:
%dsldocs publications
[22]:
sources | field | type | description | is_filter | is_entity | is_facet | |
---|---|---|---|---|---|---|---|
0 | publications | abstract | string | The publication abstract. | False | False | False |
1 | publications | acknowledgements | string | The acknowledgements section text as found in ... | False | False | False |
2 | publications | altmetric | float | Altmetric Attention Score. | True | False | False |
3 | publications | altmetric_id | integer | Altmetric Publication ID | True | False | False |
4 | publications | arxiv_id | string | The publications arXiv identifier (e.g. ‘arXiv... | True | False | False |
... | ... | ... | ... | ... | ... | ... | ... |
62 | publications | times_cited | integer | Number of citations (note: does not support em... | True | False | True |
63 | publications | title | string | Title of a publication. | False | False | False |
64 | publications | type | string | Publication type (one of: article, chapter, pr... | True | False | True |
65 | publications | volume | string | Publication volume. | True | False | False |
66 | publications | year | integer | The year for the version of record of publicat... | True | False | True |
67 rows × 7 columns
3.1 where¶
This optional phrase consists of the keyword where
followed by a filters
phrase consisting of DSL filter expressions, as described below.
[23]:
%dsldf search publications where type = "book" return publications limit 5
Returned Publications: 5 (total = 596368)
Time: 0.56s
[23]:
id | title | type | year | |
---|---|---|---|---|
0 | pub.1132180584 | De consensu evangelistarum | book | 2022 |
1 | pub.1144839778 | PESQUISAS EM PRÁTICAS DISCURSIVAS, SENTIDOS E ... | book | 2022 |
2 | pub.1144311572 | L'anti-manuel de management dans les EHPAD | book | 2022 |
3 | pub.1144411807 | Roots and Trajectories of Violent Extremism an... | book | 2022 |
4 | pub.1144515956 | Land Use Change and Its Ecological Effects in ... | book | 2022 |
If a for
phrase is also used in a filtered query, the system will first apply the filters, and then search the resulting restricted set of documents for the search term
.
[24]:
%dsldf search publications for "malaria" where type = "book" return publications limit 5
Returned Publications: 5 (total = 22341)
Time: 0.61s
[24]:
id | title | type | year | volume | |
---|---|---|---|---|---|
0 | pub.1144315336 | Invertebrate Medicine | book | 2022 | NaN |
1 | pub.1144494935 | Metallosurfactants | book | 2022 | NaN |
2 | pub.1144493884 | Probiotics, Prebiotics and Synbiotics | book | 2022 | NaN |
3 | pub.1144718404 | Zusammenfassung | book | 2022 | Band 10 |
4 | pub.1144752324 | Handbook of Biomass Valorization for Industria... | book | 2022 | NaN |
3.2 in¶
For convenience, the DSL also supports shorthand notation for filters where a particular field should be restricted to a specified range or list of values (although the same logic may be expressed using complex filters as shown below).
Syntax: a range filter consists of the field
name, the keyword in
, and a range of values enclosed in square brackets ([]
), where the range consists of a low
value, colon :
, and a high
value.
[25]:
%%dsldf
search grants
for "malaria"
where start_year in [ 2010 : 2015 ]
return grants limit 5
Returned Grants: 5 (total = 3281)
Time: 0.60s
[25]:
active_year | end_date | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2015, 2016, 2017] | 2017-11-30 | [{'acronym': 'NIAID', 'city_name': 'Bethesda',... | National Institute of Allergy and Infectious D... | R21AI120981 | grant.4729738 | en | Bloodborne tropical pathogen detection using m... | 2015-12-28 | 2015 | Bloodborne tropical pathogen detection using m... |
1 | [2015, 2016, 2017, 2018, 2019] | 2019-02-28 | [{'acronym': 'NIAID', 'city_name': 'Bethesda',... | National Institute of Allergy and Infectious D... | R21AI120973 | grant.4729736 | en | Field-deployable Assay for Differential Diagno... | 2015-12-24 | 2015 | Field-deployable Assay for Differential Diagno... |
2 | [2015, 2016, 2017, 2018] | 2018-11-30 | [{'acronym': 'NIAID', 'city_name': 'Bethesda',... | National Institute of Allergy and Infectious D... | R21AI109439 | grant.4729699 | en | T cell driven antigen discovery for vaccine ca... | 2015-12-21 | 2015 | T cell driven antigen discovery for vaccine ca... |
3 | [2015, 2016, 2017, 2018] | 2018-12-18 | [{'acronym': 'VolkswagenStiftung', 'city_name'... | Volkswagen Foundation | 91488 | grant.4854433 | en | Senior Fellowship for Dr. Eduardo Samo Gudo: E... | 2015-12-18 | 2015 | Senior Fellowship for Dr. Eduardo Samo Gudo: E... |
4 | [2015, 2016, 2017, 2018, 2019] | 2019-09-30 | [{'acronym': 'NIFA', 'city_name': 'Washington ... | National Institute of Food and Agriculture | N/A | grant.8821176 | en | Biology, Ecology & Management of Emerging Dise... | 2015-12-10 | 2015 | Biology, Ecology & Management of Emerging Dise... |
Syntax: a list filter consists of the field
name, the keyword in
, and a list of one or more value
s enclosed in square brackets ([]
), where values are separated by commas (,
):
[26]:
%%dsldf
search grants
for "malaria"
where research_org_names in [ "UC Berkeley", "UC Davis", "UCLA" ]
return grants limit 5
Returned Grants: 0
Time: 0.68s
[26]:
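The two `in` shorthands above can also be assembled programmatically. The following is a minimal sketch in plain Python; the helper names (`range_filter`, `range_filter_expanded`, `list_filter`) are hypothetical and not part of dimcli or the DSL. The expanded form assumes the range is inclusive at both ends, which the 2015 rows in the range output above suggest for the upper bound.

```python
def range_filter(field, low, high):
    """Shorthand range filter: field in [low:high]."""
    return f"{field} in [{low}:{high}]"


def range_filter_expanded(field, low, high):
    """The same range spelled out with comparison operators (assumed inclusive)."""
    return f"{field} >= {low} and {field} <= {high}"


def list_filter(field, values):
    """List filter: strings are double-quoted, integers are left bare."""
    rendered = ", ".join(f'"{v}"' if isinstance(v, str) else str(v) for v in values)
    return f"{field} in [{rendered}]"


# Rebuild the range query from the example above as a single string.
query = (
    'search grants for "malaria" '
    f"where {range_filter('start_year', 2010, 2015)} "
    "return grants limit 5"
)
print(query)
print(range_filter_expanded("start_year", 2010, 2015))
print(list_filter("research_org_names", ["UC Berkeley", "UC Davis", "UCLA"]))
```

The resulting strings are ordinary DSL text, so they can be pasted into a `%%dsldf` cell or sent through any other API client.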
3.3 count - filter function¶
The filter function count
is supported on some fields in publications (e.g. researchers
and research_orgs
).
Use of this filter is shown in the example below:
[27]:
%%dsldf
search publications
for "malaria"
where count(research_orgs) > 5
return research_orgs limit 5
Returned Research_orgs: 5
Time: 0.81s
[27]:
city_name | count | country_name | id | latitude | linkout | longitude | name | state_name | types | acronym | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Oxford | 2321 | United Kingdom | grid.4991.5 | 51.753437 | [http://www.ox.ac.uk/] | -1.254010 | University of Oxford | Oxfordshire | [Education] | NaN |
1 | London | 2043 | United Kingdom | grid.8991.9 | 51.520900 | [http://www.lshtm.ac.uk/] | -0.130700 | London School of Hygiene & Tropical Medicine | Camden | [Education] | LSHTM |
2 | Cambridge | 1901 | United States | grid.38142.3c | 42.377052 | [http://www.harvard.edu/] | -71.116650 | Harvard University | Massachusetts | [Education] | NaN |
3 | Baltimore | 1121 | United States | grid.21107.35 | 39.328888 | [https://www.jhu.edu/] | -76.620280 | Johns Hopkins University | Maryland | [Education] | JHU |
4 | London | 1110 | United Kingdom | grid.7445.2 | 51.498600 | [http://www.imperial.ac.uk/] | -0.175478 | Imperial College London | Westminster | [Education] | NaN |
Number of publications with more than 50 researchers.
[28]:
%%dsldf
search publications
for "malaria"
where count(researchers) > 50
return publications limit 5
Returned Publications: 5 (total = 323)
Time: 1.16s
[28]:
authors | id | issue | title | type | volume | year | journal.id | journal.title | pages | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1144290266 | 3 | Cancer Incidence, Mortality, Years of Life Los... | article | 8 | 2021 | jour.1051466 | JAMA Oncology | NaN |
1 | [{'affiliations': [{'city': 'Washington D.C.',... | pub.1143711356 | 12 | SARS-CoV-2 ferritin nanoparticle vaccines elic... | article | 37 | 2021 | jour.1046010 | Cell Reports | 110143-110143 |
2 | [{'affiliations': [{'city': 'Modena', 'city_id... | pub.1143715249 | 12 | Guidelines for the use of flow cytometry and c... | article | 51 | 2021 | jour.1054998 | European Journal of Immunology | 2708-3145 |
3 | [{'affiliations': [], 'corresponding': True, '... | pub.1143609205 | 1 | The global burden of adolescent and young adul... | article | 23 | 2021 | jour.1023279 | The Lancet Oncology | 27-52 |
4 | [{'affiliations': [{'city': 'Dhaka', 'city_id'... | pub.1143648995 | NaN | Global research priorities on COVID-19 for mat... | article | 11 | 2021 | jour.1046459 | Journal of Global Health | 04071 |
Number of publications with more than one researcher.
[29]:
%%dsldf
search publications
where count(researchers) > 1
return funders limit 5
Returned Funders: 5
Time: 2.10s
[29]:
acronym | city_name | count | country_name | id | latitude | linkout | longitude | name | types | state_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | NSFC | Beijing | 2440430 | China | grid.419696.5 | 40.005177 | [http://www.nsfc.gov.cn/publish/portal1/] | 116.339830 | National Natural Science Foundation of China | [Government] | NaN |
1 | EC | Brussels | 849457 | Belgium | grid.270680.b | 50.851650 | [http://ec.europa.eu/index_en.htm] | 4.363670 | European Commission | [Government] | NaN |
2 | MOST | Beijing | 760688 | China | grid.424020.0 | 39.827835 | [http://www.most.gov.cn/eng/] | 116.316284 | Ministry of Science and Technology of the Peop... | [Government] | NaN |
3 | JSPS | Tokyo | 656041 | Japan | grid.54432.34 | 35.687160 | [http://www.jsps.go.jp/] | 139.740390 | Japan Society for the Promotion of Science | [Nonprofit] | NaN |
4 | NCI | Bethesda | 593537 | United States | grid.48336.3a | 39.004326 | [http://www.cancer.gov/] | -77.101190 | National Cancer Institute | [Government] | Maryland |
International collaborations: number of publications with more than one author and affiliations located in more than one country.
[30]:
%%dsldf
search publications
where count(researchers) > 1
and count(research_org_countries) > 1
return funders limit 5
Returned Funders: 5
Time: 1.00s
[30]:
acronym | city_name | count | country_name | id | latitude | linkout | longitude | name | types | |
---|---|---|---|---|---|---|---|---|---|---|
0 | NSFC | Beijing | 574560 | China | grid.419696.5 | 40.005177 | [http://www.nsfc.gov.cn/publish/portal1/] | 116.339830 | National Natural Science Foundation of China | [Government] |
1 | EC | Brussels | 440042 | Belgium | grid.270680.b | 50.851650 | [http://ec.europa.eu/index_en.htm] | 4.363670 | European Commission | [Government] |
2 | DFG | Bonn | 192288 | Germany | grid.424150.6 | 50.699340 | [http://www.dfg.de/en/] | 7.147797 | German Research Foundation | [Nonprofit] |
3 | MOST | Beijing | 186241 | China | grid.424020.0 | 39.827835 | [http://www.most.gov.cn/eng/] | 116.316284 | Ministry of Science and Technology of the Peop... | [Government] |
4 | JSPS | Tokyo | 170517 | Japan | grid.54432.34 | 35.687160 | [http://www.jsps.go.jp/] | 139.740390 | Japan Society for the Promotion of Science | [Nonprofit] |
Domestic collaborations: number of publications with more than one author and affiliations located in exactly one country.
[31]:
%%dsldf
search publications
where count(researchers) > 1
and count(research_org_countries) = 1
return funders limit 5
Returned Funders: 5
Time: 1.82s
[31]:
acronym | city_name | count | country_name | id | latitude | linkout | longitude | name | types | state_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | NSFC | Beijing | 1812269 | China | grid.419696.5 | 40.005177 | [http://www.nsfc.gov.cn/publish/portal1/] | 116.339830 | National Natural Science Foundation of China | [Government] | NaN |
1 | MOST | Beijing | 560284 | China | grid.424020.0 | 39.827835 | [http://www.most.gov.cn/eng/] | 116.316284 | Ministry of Science and Technology of the Peop... | [Government] | NaN |
2 | JSPS | Tokyo | 454249 | Japan | grid.54432.34 | 35.687160 | [http://www.jsps.go.jp/] | 139.740390 | Japan Society for the Promotion of Science | [Nonprofit] | NaN |
3 | NCI | Bethesda | 433280 | United States | grid.48336.3a | 39.004326 | [http://www.cancer.gov/] | -77.101190 | National Cancer Institute | [Government] | Maryland |
4 | EC | Brussels | 389598 | Belgium | grid.270680.b | 50.851650 | [http://ec.europa.eu/index_en.htm] | 4.363670 | European Commission | [Government] | NaN |
3.4 Filter Operators¶
A simple filter expression consists of a field
name, an in-/equality operator op
, and the desired field value
.
The value
must be a string
enclosed in double quotes ("
) or an integer (e.g. 1234
).
The available operators are:
op | meaning |
---|---|
= | is (or contains if the given field is multi-value) |
!= | is not |
> | is greater than |
< | is less than |
>= | is greater than or equal to |
<= | is less than or equal to |
~ | partially matches (see partial-string-matching below) |
is empty | is empty (see emptiness-filters below) |
is not empty | is not empty (see emptiness-filters below) |
A couple of examples:
[32]:
%dsldf search datasets where year > 2010 and year < 2012 return datasets limit 5
Returned Datasets: 5 (total = 158057)
Time: 0.65s
[32]:
authors | id | title | year | |
---|---|---|---|---|
0 | [{'name': 'Tsuji, Kazuki'}, {'name': 'Kikuta, ... | dataset.999 | DiaBroodproduction Pupae | 2011 |
1 | [{'name': 'Tsuji, Kazuki'}, {'name': 'Kikuta, ... | dataset.998 | Data from: Determination of the cost of worker... | 2011 |
2 | [{'name': 'Wessex Archaeology'}, {'name': 'Wes... | dataset.99758 | Stricklands, Chapel Road, Southampton (OASIS I... | 2011 |
3 | [{'name': 'Vijendravarma, Roshan K.'}, {'name'... | dataset.997 | critical weight data | 2011 |
4 | [{'name': 'Vijendravarma, Roshan K.'}, {'name'... | dataset.996 | Data from: Chronic malnutrition favours smalle... | 2011 |
[33]:
%dsldf search patents where assignees != "grid.410484.d" return patents limit 5
Returned Patents: 5 (total = 142264864)
Time: 1.21s
[33]:
assignee_names | filing_status | id | inventor_names | publication_date | times_cited | title | year | |
---|---|---|---|---|---|---|---|---|
0 | [KHASHOGGI E IND] | Application | ZW-9994-A1 | [ANDERSEN PER JUST, HODSON SIMON K] | 1994-09-28 | 0 | Sealable liquid-tight, thin-walled containers ... | 1994 |
1 | [H L & H TIMBER PROD] | Application | ZW-9993-A1 | [FRANS ROELOF PETRUS PIENAAR, RICHARD GEORGE K... | 1993-12-15 | 0 | SPACER ASSEMBLY AND METHOD | 1993 |
2 | [GLAVERBEL] | Application | ZW-9992-A1 | [JEAN-PIERRE MEYNCKENS, LEON-PHILIPPE MOTTET] | 1993-07-28 | 0 | PROCESS AND MIXTURE FOR FORMING A COHERENT REF... | 1992 |
3 | [SENTRACHEM LTD] | Application | ZW-9991-A1 | [ANTHONY PATRICK REYNOLDS, MARK BURDON COCKSED... | 1991-10-16 | 0 | INORGANIC FLOCCULANT MANUFACTURE | 1991 |
4 | [DANTEX EXPLOSIVES] | Application | ZW-9990-A1 | [LEON MICHAEL ZIMMERMANN] | 1990-10-31 | 0 | EXPLOSIVE COMPOSITION | 1990 |
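The quoting rule for simple filters (strings in double quotes, integers bare) is easy to get wrong when queries are built from variables. The sketch below shows one way to render such expressions; `simple_filter` and `COMPARISON_OPS` are hypothetical names introduced for illustration, and only the six comparison operators are covered.

```python
COMPARISON_OPS = {"=", "!=", ">", "<", ">=", "<="}


def simple_filter(field, op, value):
    """Render `field op value`, quoting strings and leaving integers bare."""
    if op not in COMPARISON_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (str, int)):
        raise TypeError("value must be a string or an integer")
    rendered = f'"{value}"' if isinstance(value, str) else str(value)
    return f"{field} {op} {rendered}"


# Combine two simple filters with `and`, as in the datasets example above.
where_clause = " and ".join(
    [simple_filter("year", ">", 2010), simple_filter("year", "<", 2012)]
)
print(f"search datasets where {where_clause} return datasets limit 5")
print(simple_filter("assignees", "!=", "grid.410484.d"))
```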
3.5 Partial string matching with ~¶
The ~
operator indicates that the given field
need only partially, instead of exactly, match the given string
(the value
used with this operator must be a string
, not an integer).
For example, the filter where research_orgs.name~"Saarland Uni"
would match both the organization named “Saarland University” and the one named “Universitätsklinikum des Saarlandes”, and any other organization whose name includes the terms “Saarland” and “Uni” (the order is unimportant).
[34]:
%%dsldf
search patents
where assignee_names ~ "IBM"
return assignees limit 5
Returned Assignees: 5
Time: 4.83s
[34]:
city_name | count | country_name | id | latitude | linkout | longitude | name | state_name | types | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Armonk | 101184 | United States | grid.410484.d | 41.108540 | [http://www.ibm.com/] | -73.720470 | IBM (United States) | New York | [Company] |
1 | Winchester | 5830 | United Kingdom | grid.14648.3f | 51.026752 | [https://www.ibm.com/in-en] | -1.397260 | IBM (United Kingdom) | Hampshire | [Company] |
2 | Böblingen | 3806 | Germany | grid.424815.e | 48.673832 | [http://www.ibm.com/de/de/] | 9.034824 | IBM (Germany) | NaN | [Company] |
3 | Paris | 1511 | France | grid.424192.8 | 48.843975 | [https://www.ibm.com/fr-fr/] | 2.396280 | IBM (France) | NaN | [Company] |
4 | Markham | 1359 | Canada | grid.292504.8 | 43.819103 | [http://www.ibm.com/ca/en/] | -79.333930 | IBM (Canada) | Ontario | [Company] |
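To make the "Saarland Uni" behaviour described above more concrete, here is a rough local model of it in plain Python. It assumes that each query term only needs to be a prefix of some word in the field value, in any order and ignoring case. This is an illustration of the description, not the matching algorithm actually used by the DSL engine, which may tokenize or normalize text differently; `partially_matches` is a hypothetical helper.

```python
import re


def partially_matches(name, query):
    """Return True if every query term is a prefix of some word in `name`."""
    words = re.findall(r"\w+", name.lower())
    terms = re.findall(r"\w+", query.lower())
    return all(any(word.startswith(term) for word in words) for term in terms)


for org in [
    "Saarland University",
    "Universitätsklinikum des Saarlandes",
    "IBM (United States)",
]:
    print(org, partially_matches(org, "Saarland Uni"))
```

Under this model the first two names match and the third does not, because "Uni" is a prefix of "United" but no word starts with "Saarland".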
3.6 Emptiness filters is empty¶
To keep only records in which a specific field has a value, or only those in which that field is empty, use a filter such as where research_orgs is not empty or where issn is empty.
[35]:
%%dsldf
search publications
for "iron graphene"
where researchers is empty
and research_orgs is not empty
return publications[id+title+researchers+research_orgs+type] limit 5
Returned Publications: 5 (total = 5298)
Time: 1.45s
[35]:
id | research_orgs | title | type | |
---|---|---|---|---|
0 | pub.1144344172 | [{'acronym': 'USP', 'city_name': 'São Paulo', ... | Electrochemical sensor for isoniazid detection... | article |
1 | pub.1143970005 | [{'acronym': 'KFUPM', 'city_name': 'Dhahran', ... | A review on underground hydrogen storage: Insi... | article |
2 | pub.1144588192 | [{'city_name': 'Shanghai', 'country_name': 'Ch... | Destructing surfactant network in nanoemulsion... | article |
3 | pub.1144553605 | [{'acronym': 'SIT', 'city_name': 'Hoboken', 'c... | Release of Pb adsorbed on graphene oxide surfa... | article |
4 | pub.1144390006 | [{'city_name': 'Semenyih', 'country_name': 'Ma... | Synthesis of a highly recoverable 3D MnO2/rGO ... | article |
4. Searching for Researchers¶
The DSL offers different mechanisms for searching for researchers (e.g. publication authors, grant investigators), each of them presenting specific advantages.
4.1 Exact name searches¶
Special full-text indices allow you to look up a researcher’s name and surname exactly as they appear in the source documents they derive from.
This approach has a broad scope, as it searches the full collection of Dimensions documents irrespective of whether a researcher was successfully disambiguated (and hence given a Dimensions ID). On the other hand, this approach will only match names as they appear in the source document, so different spellings or initials are not necessarily returned via a single query.
search in [authors|investigators|inventors]
It is possible to look up publication authors using a specific search index called authors.
This method expects case-insensitive phrases in the format "<first name> <last name>", or in reverse order. Note that quotes nested inside a double-quoted string must always be escaped with a backslash (\).
[36]:
%dsldf search publications in authors for "\"Charles Peirce\"" return publications limit 5
Returned Publications: 5 (total = 267)
Time: 0.54s
[36]:
authors | id | issue | pages | title | type | volume | year | journal.id | journal.title | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1141018395 | 4 | 81-91 | PAP : prolégomènes à une apologie du pragmatisme | article | N° 163 | 2021 | jour.1142204 | Cahiers philosophiques |
1 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1140830594 | NaN | 63-71 | COMO TEORIZAR | chapter | NaN | 2021 | NaN | NaN |
2 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1140830593 | NaN | 55-61 | INDUÇÃO ABDUTÓRIA | chapter | NaN | 2021 | NaN | NaN |
3 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1137292963 | NaN | 256-279 | 37 Lowell Lecture IV | chapter | NaN | 2021 | NaN | NaN |
4 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1137292823 | NaN | 437-438 | Name Index | chapter | NaN | 2021 | NaN | NaN |
Instead of the first name, initials can also be used. These are examples of valid researcher search phrases:
\"Peirce, Charles S.\"
\"Charles S. Peirce\"
\"CS Peirce\"
\"Peirce CS\"
\"C S Peirce\"
\"Peirce C S\"
\"C Peirce\"
\"Peirce C\"
\"Charles Peirce\"
\"Peirce Charles\"
Warning: In order to produce valid results, an author or investigator search query must contain at least two components (e.g., name and surname, either in full or as initials).
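The escaping and the two-component rule can be wrapped in a small helper when author queries are generated from a list of names. This is a sketch only: `exact_name_phrase` and `author_query` are hypothetical names, and the check simply counts whitespace- or comma-separated parts.

```python
def exact_name_phrase(name):
    """Wrap a name in escaped nested quotes, e.g. "\\"Charles Peirce\\""."""
    if '"' in name:
        raise ValueError("name must not contain double quotes")
    parts = name.replace(",", " ").split()
    if len(parts) < 2:
        raise ValueError("name needs at least two components, e.g. 'C Peirce'")
    # Outer quotes delimit the DSL string; inner quotes are backslash-escaped.
    return '"\\"' + name + '\\""'


def author_query(name, limit=5):
    """Build a publications query against the `authors` search index."""
    return (
        f"search publications in authors for {exact_name_phrase(name)} "
        f"return publications limit {limit}"
    )


print(author_query("Charles Peirce"))
print(author_query("Peirce, Charles S."))
```

The printed text is the DSL query itself, so it can be used as-is in a `%dsldf` cell or passed as a string to another API client.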
An investigators search is similar to an authors search, except that it searches grants and clinical trials using a separate search index, investigators, and patents using the index inventors.
[37]:
%%dsldf
search clinical_trials in investigators for "\"John Smith\""
return clinical_trials limit 5
Returned Clinical_trials: 5 (total = 7)
Time: 1.00s
[37]:
active_years | id | investigators | title | |
---|---|---|---|---|
0 | [2021, 2022, 2023] | NCT05110248 | [[Bryn M Horsington, , Contact, Gemini One, 55... | Research and Development of Novel Quantitative... |
1 | [2019, 2020] | NCT04107519 | [[Herman Taylor, MD, MPH, Principal Investigat... | Goal-Directed Resilience Training to Mitigate ... |
2 | [2019, 2020, 2021] | NCT04072380 | [[Sallie Pulliam, , Contact, Pinnacle Research... | A Phase 2, Double-blind, Placebo-controlled, P... |
3 | [2019, 2020, 2021, 2022, 2023] | NCT03694600 | [[Alamdar Rizvi, MS, Study Director, Laborator... | Prospective Clinical Trial to Detect Liver Can... |
4 | [2018, 2019, 2020, 2021, 2022, 2023] | NCT03653832 | [[Timothy Walsh, MBChB MD MSc, Principal Inves... | Alpha 2 Agonists for Sedation to Produce Bette... |
[38]:
%%dsldf
search grants in investigators for "\"Satoko Shimazaki\""
return grants limit 5
Returned Grants: 4 (total = 4)
Time: 0.54s
[38]:
active_year | end_date | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2021, 2022] | 2022-08-31 | [{'acronym': 'NEH', 'city_name': 'Washington D... | National Endowment for the Humanities | FEL-263245-19 | grant.7925589 | en | Kabuki Actors, Print Technology, and the Theat... | 2021-09-01 | 2021 | Kabuki Actors, Print Technology, and the Theat... |
1 | [2018, 2019, 2020, 2021] | 2021-03-31 | [{'acronym': 'JSPS', 'city_name': 'Tokyo', 'co... | Japan Society for the Promotion of Science | 18K00431 | grant.7527261 | ja | 古・中英語期における女性聖人伝の系譜研究:Aelfricのテクストと言語を中心に | 2018-04-01 | 2018 | Genealogy research on female saints in the Pal... |
2 | [2015, 2016, 2017, 2018] | 2018-03-31 | [{'acronym': 'JSPS', 'city_name': 'Tokyo', 'co... | Japan Society for the Promotion of Science | 15K02313 | grant.5858713 | en | Images of Women in the Old English Lives of Sa... | 2015-04-01 | 2015 | Images of Women in the Old English Lives of Sa... |
3 | [2012, 2013, 2014, 2015] | 2015-03-31 | [{'acronym': 'JSPS', 'city_name': 'Tokyo', 'co... | Japan Society for the Promotion of Science | 24520310 | grant.6086985 | en | Reception and Transfromation of the Images of ... | 2012-04-01 | 2012 | Reception and Transfromation of the Images of ... |
[39]:
%%dsldf
search patents in inventors for "\"John Smith\""
return patents limit 5
Returned Patents: 5 (total = 724)
Time: 0.72s
[39]:
assignee_names | filing_status | id | inventor_names | publication_date | times_cited | title | year | assignees | |
---|---|---|---|---|---|---|---|---|---|
0 | [KLEEN TEX IND INC] | Application | ZA-989065-B | [GORDON DAVID E, SMITH JOHN, LEVESQUE JIM, MCH... | 1999-06-30 | 0 | Track control floor mats and applications ther... | 1998 | NaN |
1 | [P I P PROPERTY MANAGERS CC] | Application | ZA-200303048-B | [SMITH JOHN, ROOS ANDREW IAN, ASHURST WILLIAM ... | 2003-10-22 | 2 | System for and method of transferring informat... | 2003 | NaN |
2 | [MILNES PTY LTD] | Application | ZA-200002202-B | [RICKARDS GARY, SMITH JOHN] | 2000-11-23 | 0 | Pipe clamp. | 2000 | NaN |
3 | [SMITH SOLUTIONS LP] | Application | WO-2021100020-A1 | [SMITH JOHN] | 2021-05-27 | 0 | SOLAR BLOCKER | 2020 | NaN |
4 | [IBM CHINA INVESTMENT CO LTD, IBM UK, IBM] | Application | WO-2019224650-A1 | [CONNELL II JONATHAN, PANKANTI SHARATHCHANDRA,... | 2019-11-28 | 0 | GENERATING A TEXTUAL DESCRIPTION OF AN IMAGE U... | 2019 | [{'city_name': 'Armonk', 'country_name': 'Unit... |
4.2 Fuzzy Searches¶
This type of search is similar to full-text search, with the difference that it allows searching by only a part of a name, e.g. only the ‘last name’ of a person, by using the where
clause.
Note At this moment, this type of search is only available for publications
. Other sources will add this option in the future.
For example:
[40]:
%%dsldf
search publications where authors = "Hawking"
return publications[id+doi+title+authors] limit 5
Returned Publications: 5 (total = 2103)
Time: 2.69s
[40]:
authors | doi | id | title | |
---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Ghent', 'city_id'... | 10.1016/j.geomorph.2021.108080 | pub.1144137800 | Long-lasting impacts of a 20th century glacial... |
1 | [{'affiliations': [{'city': "Xi'an", 'city_id'... | 10.1038/s41467-022-28032-1 | pub.1144813161 | Globally elevated chemical weathering rates be... |
2 | [{'affiliations': [{'city': 'Marseille', 'city... | 10.1007/jhep01(2022)063 | pub.1144755056 | Search for exotic decays of the Higgs boson in... |
3 | [{'affiliations': [{'city': 'Marseille', 'city... | 10.1140/epjc/s10052-021-09807-0 | pub.1144445886 | Performance of the ATLAS Level-1 topological t... |
4 | [{'affiliations': [{'city': 'Marseille', 'city... | 10.1103/physrevd.105.012006 | pub.1144591364 | Search for Higgs boson decays into a pair of p... |
Generally speaking, using a where
clause to search authors is less precise than using the relevant exact-search syntax.
On the other hand, using a where
clause can be handy if one wants to combine an author search with another full-text search index.
For example:
[41]:
%%dsldf
search publications
in title_abstract_only for "dna replication"
where authors = "smith"
return publications limit 5
Returned Publications: 5 (total = 1637)
Time: 1.14s
[41]:
authors | id | title | type | year | journal.id | journal.title | issue | pages | volume | |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Seattle', 'city_i... | pub.1144684784 | Dynamic configurations of meiotic DNA-break ho... | article | 2022 | jour.1006098 | Journal of Cell Science | NaN | NaN | NaN |
1 | [{'affiliations': [{'city': 'Urbana', 'city_id... | pub.1144817613 | Fundamental behaviors emerge from simulations ... | article | 2022 | jour.1019114 | Cell | 2 | 345-360.e28 | 185 |
2 | [{'affiliations': [{'city': 'Jackson', 'city_i... | pub.1143745236 | Epigenome-wide association study of serum urat... | article | 2021 | jour.1043282 | Nature Communications | 1 | 7173 | 12 |
3 | [{'affiliations': [{'city': 'Rochester', 'city... | pub.1142528147 | The mitochondrial iron transporter ABCB7 is re... | article | 2021 | jour.1046517 | eLife | NaN | e69621 | 10 |
4 | [{'affiliations': [{'city': 'London', 'city_id... | pub.1142494563 | Pre-existing polymerase-specific T cells expan... | article | 2021 | jour.1018957 | Nature | 7891 | 110-117 | 601 |
4.3 Using the disambiguated Researchers database¶
The Dimensions Researchers source is a database of researcher information algorithmically extracted and disambiguated from all of the other content sources (publications, grants, clinical trials, etc.).
By using the researchers source it is possible to match an ‘aggregated’ person object that links together multiple publication authors, grant investigators, etc., irrespective of the form their names take in the original source documents.
However, this database does not contain all the author and investigator information available in Dimensions. For example, think of authors of older publications, authors with very common names that are difficult to disambiguate, or very new authors who have only one or a few publications. In such cases, using the full-text authors search might be more appropriate.
Examples:
[42]:
%%dsldf
search researchers for "\"Satoko Shimazaki\""
return researchers[basics+obsolete]
Returned Researchers: 4 (total = 4)
Time: 0.61s
[42]:
first_name | id | last_name | obsolete | research_orgs | |
---|---|---|---|---|---|
0 | Satoko | ur.07751146721.59 | Shimazaki | 0 | NaN |
1 | Satoko | ur.015527473602.63 | Shimazaki | 0 | [{'acronym': 'UCB', 'city_name': 'Boulder', 'c... |
2 | Satoko | ur.014307627665.09 | Shimazaki | 0 | [{'acronym': 'UCLA', 'city_name': 'Los Angeles... |
3 | Satoko | ur.010537333602.30 | Shimazaki | 1 | NaN |
NOTE pay attention to the obsolete field. This indicates the researcher ID status: 0 means that the researcher ID is still active, 1 means that the researcher ID is no longer valid. This is due to the ongoing process of refinement of Dimensions researchers.
Hence the query above is best written like this:
[43]:
%%dsldf
search researchers where obsolete=0 for "\"Satoko Shimazaki\""
return researchers[basics+obsolete]
Returned Researchers: 3 (total = 3)
Time: 1.00s
[43]:
first_name | id | last_name | obsolete | research_orgs | |
---|---|---|---|---|---|
0 | Satoko | ur.07751146721.59 | Shimazaki | 0 | NaN |
1 | Satoko | ur.015527473602.63 | Shimazaki | 0 | [{'acronym': 'UCB', 'city_name': 'Boulder', 'c... |
2 | Satoko | ur.014307627665.09 | Shimazaki | 0 | [{'acronym': 'UCLA', 'city_name': 'Los Angeles... |
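If records were already fetched without the `obsolete=0` filter, the same clean-up can be done client-side. The sketch below uses the four rows from the first Shimazaki query as plain dictionaries (only the fields needed here); `active_only` is a hypothetical helper, and it deliberately keeps only records whose `obsolete` value is exactly 0.

```python
# Rows copied from the unfiltered researchers query shown earlier.
records = [
    {"first_name": "Satoko", "id": "ur.07751146721.59", "last_name": "Shimazaki", "obsolete": 0},
    {"first_name": "Satoko", "id": "ur.015527473602.63", "last_name": "Shimazaki", "obsolete": 0},
    {"first_name": "Satoko", "id": "ur.014307627665.09", "last_name": "Shimazaki", "obsolete": 0},
    {"first_name": "Satoko", "id": "ur.010537333602.30", "last_name": "Shimazaki", "obsolete": 1},
]


def active_only(researchers):
    """Keep researcher records explicitly flagged as active (obsolete == 0)."""
    return [r for r in researchers if r.get("obsolete") == 0]


for researcher in active_only(records):
    print(researcher["id"])
```

Filtering in the query itself remains preferable, since it avoids transferring records that will be discarded.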
With Researchers
, one can use other fields as well:
[44]:
%%dsldf
search researchers
where obsolete=0 and last_name="Shimazaki"
return researchers[basics] limit 5
Returned Researchers: 5 (total = 479)
Time: 0.70s
[44]:
first_name | id | last_name | research_orgs | |
---|---|---|---|---|
0 | Yoshiaki | ur.07777053663.32 | Shimazaki | [{'acronym': 'KUIS', 'city_name': 'Miki', 'cou... |
1 | Takanori | ur.07761700432.36 | Shimazaki | [{'city_name': 'Sendai', 'country_name': 'Japa... |
2 | Toshiharu | ur.07755624517.66 | Shimazaki | [{'city_name': 'Toyama', 'country_name': 'Japa... |
3 | Satoko | ur.07751146721.59 | Shimazaki | NaN |
4 | Junya | ur.0773771745.48 | Shimazaki | [{'city_name': 'Suita', 'country_name': 'Japan... |
5. Returning results¶
After the search
phrase, a query must contain one or more return
phrases, specifying the content and format of the information that should be returned.
5.1 Returning Multiple Sources¶
Multiple results cannot be returned in a single return phrase. To return several results from one query, add one return phrase per result.
[45]:
%%dsldf
search publications
return funders limit 5
return research_orgs limit 5
return year
Returned Funders: 5
Returned Research_orgs: 5
Returned Year: 20
Time: 3.56s
[Warning] Dataframe created from first available key, but more than one JSON key found: ['funders', 'research_orgs', 'year']
[45]:
acronym | city_name | count | country_name | id | latitude | linkout | longitude | name | types | state_name | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | NSFC | Beijing | 2619510 | China | grid.419696.5 | 40.005177 | [http://www.nsfc.gov.cn/publish/portal1/] | 116.339830 | National Natural Science Foundation of China | [Government] | NaN |
1 | EC | Brussels | 893414 | Belgium | grid.270680.b | 50.851650 | [http://ec.europa.eu/index_en.htm] | 4.363670 | European Commission | [Government] | NaN |
2 | MOST | Beijing | 806641 | China | grid.424020.0 | 39.827835 | [http://www.most.gov.cn/eng/] | 116.316284 | Ministry of Science and Technology of the Peop... | [Government] | NaN |
3 | JSPS | Tokyo | 712701 | Japan | grid.54432.34 | 35.687160 | [http://www.jsps.go.jp/] | 139.740390 | Japan Society for the Promotion of Science | [Nonprofit] | NaN |
4 | NCI | Bethesda | 623054 | United States | grid.48336.3a | 39.004326 | [http://www.cancer.gov/] | -77.101190 | National Cancer Institute | [Government] | Maryland |
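The warning in the output above says that only the first result key was turned into a dataframe. When working with the raw JSON instead, each key can be handled separately. The sketch below runs on a hand-written dictionary shaped like such a response: the `funders` rows reuse two values from the table above, the other rows are placeholders, and the `_stats` entry and the `split_results` helper are assumptions made for illustration.

```python
def split_results(payload):
    """Map each result key to its list of records, skipping metadata keys."""
    return {
        key: value
        for key, value in payload.items()
        if not key.startswith("_") and isinstance(value, list)
    }


payload = {
    "_stats": {"total_count": 0},  # assumed metadata entry, placeholder value
    "funders": [
        {"acronym": "NSFC", "count": 2619510},
        {"acronym": "EC", "count": 893414},
    ],
    "research_orgs": [{"name": "Example Org"}],  # placeholder row
    "year": [{"id": 2000, "count": 0}],  # placeholder row
}

tables = split_results(payload)
for key, rows in tables.items():
    print(key, len(rows))
```

Each list in `tables` can then be loaded into its own dataframe if needed.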
5.2 Returning Specific Fields¶
For control over which information from each given record
will be returned, a source
or entity
name in the results
phrase can be optionally followed by a specification of fields
and fieldsets
to be included in the JSON results for each retrieved record.
The fields specification may be an arbitrary list of field names enclosed in brackets ([ , ]), with field names separated by a plus sign (+). A minus sign (-) can be used to exclude a field or a fieldset from the result. Field names thus listed within brackets must be “known” to the DSL, and therefore only a subset of fields may be used in this syntax (see note below).
[46]:
%%dsldf
search grants
return grants[grant_number + title + language] limit 5
Returned Grants: 5 (total = 6115921)
Time: 0.59s
[46]:
grant_number | language | title | |
---|---|---|---|
0 | FL210100107 | en | Tracking nanoparticles: from cell culture to i... |
1 | 2127438 | en | NNA Planning: Developing community frameworks ... |
2 | 101019146 | en | Seeking Constraints on Open Ocean Biocalcifica... |
3 | 101003021 | en | New frontiers in advanced glycotherapy for cancer |
4 | 2105918 | en | RUI: Exciton-Phonon Interactions in Solids bas... |
[47]:
%%dsldf
search clinical_trials
return clinical_trials [id+ title + acronym + phase] limit 5
Returned Clinical_trials: 5 (total = 685074)
Time: 0.53s
[47]:
acronym | id | phase | title | |
---|---|---|---|---|
0 | HFHRV | UMIN000046696 | N/A | Prediction of development and severity of hear... |
1 | Examination of stimulation brightness of multi... | UMIN000046695 | N/A | Examination of stimulation brightness of multi... |
2 | FJ logic test | UMIN000046689 | N/A | Effect on logical thinking ability and cogniti... |
3 | A study to evaluate the effect after using dia... | UMIN000046688 | N/A | A study to evaluate the effect after using dia... |
4 | The effect of Dipeptidyl Peptidase 4 inhibitor... | UMIN000046686 | N/A | The effect of Dipeptidyl Peptidase 4 inhibitor... |
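A field specification can be generated from a list of names rather than typed by hand. The following sketch follows the plus/minus description above; `fields_spec` is a hypothetical helper, and the exclusion form it produces is written from that description only and has not been checked against the API.

```python
def fields_spec(include, exclude=()):
    """Build a bracketed field specification such as [id+title-abstract]."""
    if not include:
        raise ValueError("at least one field or fieldset must be included")
    spec = "+".join(include)
    for name in exclude:
        spec += f"-{name}"
    return f"[{spec}]"


grant_fields = fields_spec(["grant_number", "title", "language"])
print(f"search grants return grants{grant_fields} limit 5")
print(fields_spec(["basics", "times_cited"], exclude=["title"]))
```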
Shortcuts: fieldsets
The fields specification may be the name of a pre-defined fieldset
(e.g. extras
, basics
). These are shortcuts that can be handy when testing out new queries, for example.
NOTE In general, when writing code used in integrations or long-standing extraction scripts, it is best to return specific fields rather than a predefined set. This also has the advantage of making queries faster by avoiding the extraction of unnecessary data.
[48]:
%%dsldf
search grants
return grants [basics] limit 5
Returned Grants: 5 (total = 6115921)
Time: 0.57s
[48]:
| | active_year | end_date | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2022, 2023, 2024, 2025, 2026, 2027] | 2027-12-01 | [{'acronym': 'ARC', 'city_name': 'Canberra', '... | Australian Research Council | FL210100107 | grant.9782236 | en | Tracking nanoparticles: from cell culture to i... | 2022-12-01 | 2022 | Tracking nanoparticles: from cell culture to i... |
1 | [2022, 2023] | 2023-05-31 | [{'acronym': 'NSF GEO', 'city_name': 'Arlingto... | Directorate for Geosciences | 2127438 | grant.9752271 | en | NNA Planning: Developing community frameworks ... | 2022-12-01 | 2022 | NNA Planning: Developing community frameworks ... |
2 | [2022, 2023, 2024, 2025, 2026, 2027] | 2027-11-30 | [{'acronym': 'ERC', 'city_name': 'Brussels', '... | European Research Council | 101019146 | grant.9708025 | en | Seeking Constraints on Open Ocean Biocalcifica... | 2022-12-01 | 2022 | Seeking Constraints on Open Ocean Biocalcifica... |
3 | [2022, 2023, 2024, 2025, 2026, 2027] | 2027-11-30 | [{'acronym': 'ERC', 'city_name': 'Brussels', '... | European Research Council | 101003021 | grant.9661402 | en | New frontiers in advanced glycotherapy for cancer | 2022-12-01 | 2022 | New frontiers in advanced glycotherapy for cancer |
4 | [2022, 2023, 2024, 2025] | 2025-10-31 | [{'acronym': 'NSF MPS', 'city_name': 'Arlingto... | Directorate for Mathematical & Physical Sciences | 2105918 | grant.9890102 | en | RUI: Exciton-Phonon Interactions in Solids bas... | 2022-11-15 | 2022 | RUI: Exciton-Phonon Interactions in Solids bas... |
[49]:
%%dsldf
search publications
return publications [basics+times_cited] limit 5
Returned Publications: 5 (total = 124848001)
Time: 1.29s
[49]:
| | authors | id | issue | pages | times_cited | title | type | volume | year | journal.id | journal.title |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1144593888 | 3 | 284-292 | 0 | Profile of ectoparasites and biometric conditi... | article | 10 | 2022 | jour.1150855 | Depik Jurnal |
1 | [{'affiliations': [{'city': 'Shanghai', 'city_... | pub.1144587500 | NaN | 1-16 | 0 | Experimental study of stratified lean burn cha... | article | NaN | 2022 | jour.1136510 | Frontiers in Energy |
2 | [{'affiliations': [{'city': 'Budapest', 'city_... | pub.1144327837 | NaN | NaN | 0 | Statistical approaches to explore the linkages... | article | NaN | 2022 | jour.1271150 | Journal of Water Supply Research and Technolog... |
3 | [{'affiliations': [{'city': 'Hangzhou', 'city_... | pub.1141731113 | 9 | 1-40 | 0 | Opportunities and Challenges in Code Search Tools | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
4 | [{'affiliations': [{'city': 'Melbourne', 'city... | pub.1141731112 | 9 | 1-36 | 0 | Ransomware Mitigation in the Modern Era: A Com... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
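The fields specification may also be all, which is meant to indicate that all fields available for the given source should be returned. Note, however, that with the DSL version used for this run (v2.0) the query below is rejected: the error message lists the available fields and fieldsets, and all is not among them.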
[50]:
%%dsldf
search publications
return publications [all] limit 5
Returned Errors: 1
Time: 0.48s
1 QueryError found
Semantic errors found:
Field / Fieldset 'all' is not present in Source 'publications'. Available fields: abstract,acknowledgements,altmetric,altmetric_id,arxiv_id,authors,authors_count,book_doi,book_series_title,book_title,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_uoa,clinical_trial_ids,concepts,concepts_scores,date,date_inserted,date_online,date_print,dimensions_url,doi,field_citation_ratio,funder_countries,funders,id,issn,issue,journal,journal_lists,journal_title_raw,linkout,mesh_terms,open_access,pages,pmcid,pmid,proceedings_title,publisher,recent_citations,reference_ids,referenced_pubs,relative_citation_ratio,research_org_cities,research_org_countries,research_org_country_names,research_org_names,research_org_state_codes,research_org_state_names,research_orgs,researchers,resulting_publication_doi,source_title,subtitles,supporting_grant_ids,times_cited,title,type,volume,year and available fieldsets: basics,book,categories,extras
5.3 Returning Facets¶
In addition to returning source records matching a query, it is possible to facet on the entity fields related to a particular source and return only those entity values as an aggregated view of the related source data. This operation is similar to a group by or a pivot table.
Warning Faceting can return up to a maximum of 1000 results. This is to ensure adequate performance with all queries. Furthermore, although the limit operator is allowed, the skip operator cannot be used.
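To make the group by analogy concrete, here is a purely local sketch that facets a handful of made-up records with the standard library. The records, the grid.A/grid.B/grid.C identifiers and the facet function are all hypothetical; the real aggregation happens on the Dimensions servers, not in Python.

```python
from collections import Counter

# Hypothetical records: each publication lists the IDs of its research organizations.
publications = [
    {"id": "pub.1", "research_orgs": ["grid.A", "grid.B"]},
    {"id": "pub.2", "research_orgs": ["grid.A"]},
    {"id": "pub.3", "research_orgs": ["grid.B", "grid.C"]},
    {"id": "pub.4", "research_orgs": ["grid.A"]},
]


def facet(records, field, limit=20):
    """Count how many records mention each value of a multi-value field."""
    counts = Counter(value for rec in records for value in rec.get(field, []))
    # most_common() sorts by descending count, like the default facet ordering.
    return [{"id": value, "count": n} for value, n in counts.most_common(limit)]


top_orgs = facet(publications, "research_orgs", limit=2)
print(top_orgs)
```

Each returned item pairs an entity with the number of matching source records, which is the role the count column plays in the facet outputs of this section.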
[51]:
%%dsldf
search publications
for "coronavirus"
return research_orgs limit 5
Returned Research_orgs: 5
Time: 0.59s
[51]:
| | city_name | count | country_name | id | latitude | linkout | longitude | name | state_name | types | acronym |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Cambridge | 7327 | United States | grid.38142.3c | 42.377052 | [http://www.harvard.edu/] | -71.116650 | Harvard University | Massachusetts | [Education] | NaN |
1 | Oxford | 4357 | United Kingdom | grid.4991.5 | 51.753437 | [http://www.ox.ac.uk/] | -1.254010 | University of Oxford | Oxfordshire | [Education] | NaN |
2 | Baltimore | 4071 | United States | grid.21107.35 | 39.328888 | [https://www.jhu.edu/] | -76.620280 | Johns Hopkins University | Maryland | [Education] | JHU |
3 | Toronto | 3876 | Canada | grid.17063.33 | 43.661667 | [http://www.utoronto.ca/] | -79.395000 | University of Toronto | Ontario | [Education] | NaN |
4 | London | 3477 | United Kingdom | grid.83440.3b | 51.524470 | [http://www.ucl.ac.uk/] | -0.133982 | University College London | NaN | [Education] | UCL |
[52]:
%%dsldf
search publications
for "coronavirus"
return research_org_countries limit 5
return year limit 5
return category_for limit 5
Returned Category_for: 5
Returned Research_org_countries: 5
Returned Year: 5
Time: 0.91s
[Warning] Dataframe created from first available key, but more than one JSON key found: ['category_for', 'research_org_countries', 'year']
[52]:
| | count | id | name |
---|---|---|---|
0 | 290203 | 2211 | 11 Medical and Health Sciences |
1 | 116222 | 3177 | 1117 Public Health and Health Services |
2 | 88129 | 3053 | 1103 Clinical Sciences |
3 | 46291 | 2206 | 06 Biological Sciences |
4 | 39871 | 3114 | 1108 Medical Microbiology |
For control over the organization and headers of the JSON query results, the return keyword in a return phrase may be followed by the keyword in and then a group name for this group of results, where the group name is enclosed in double quotes (").
Also, one can define aliases that replace the default JSON field names with other ones provided by the user.
See the official documentation for more details about this feature.
[53]:
%%dsl
search publications
return in "facets" funders
return in "facets" research_orgs
Returned Facets: 2
Time: 2.74s
[53]:
<dimcli.DslDataset object #4694861376. Records: 2/124848001>
5.4 What the query statistics refer to - sources VS facets¶
When performing a DSL search, a _stats object is returned, which contains some useful information, e.g. the total number of records available for a search.
[54]:
%%dsldf
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return publications limit 5
Returned Publications: 5 (total = 5807)
Time: 0.68s
[54]:
| | authors | id | pages | title | type | year | volume | issue | journal.id | journal.title |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [{'city': 'Kitakyushu', 'cit... | pub.1113308928 | 123-127 | A Hybrid DCT-CLAHE Approach for Brightness Enh... | proceeding | 2018 | NaN | NaN | NaN | NaN |
1 | [{'affiliations': [{'city': 'Khulna', 'city_id... | pub.1112614472 | 1-5 | Saliency Detection using Boundary Aware Region... | proceeding | 2018 | 00 | NaN | NaN | NaN |
2 | [{'affiliations': [{'city': 'Kitakyushu', 'cit... | pub.1110958161 | 39 | Optimized coordinated control of LFC and SMES ... | article | 2018 | 3 | 1 | jour.1157179 | Protection and Control of Modern Power Systems |
3 | [{'affiliations': [{'city': 'El Paso', 'city_i... | pub.1110932965 | 1445-1452 | Electrostatic Discharge Threshold on Coverglas... | article | 2018 | 47 | 2 | jour.1031080 | IEEE Transactions on Plasma Science |
4 | [{'affiliations': [{'city': 'Kitakyushu', 'cit... | pub.1110012351 | 518-526 | The Role of Lanthanum in a Nickel Oxide‐Based ... | article | 2018 | 12 | 2 | jour.1297486 | ChemSusChem |
It is important to note though that the total number always refers to the main source, never the facets one is searching for.
For example, in this query we return researchers
linked to publications:
[55]:
%%dsldf
search publications
where year in [2013:2018] and research_orgs="grid.258806.1"
return researchers limit 5
Returned Researchers: 5
Time: 0.77s
[55]:
| | count | first_name | id | last_name | orcid_id | research_orgs |
---|---|---|---|---|---|---|
0 | 152 | Shuzi | ur.01055753603.27 | Hayase | [0000-0002-0756-4808] | [grid.419082.6, grid.266298.1, grid.410825.a, ... |
1 | 108 | Ting-Li | ur.01144540527.52 | Ma | [0000-0002-3310-459X] | [grid.11135.37, grid.258806.1, grid.411485.d, ... |
2 | 107 | Masayuki | ur.011212042763.67 | Hikita | NaN | [grid.27476.30, grid.462727.2, grid.258806.1] |
3 | 101 | Huimin | ur.016357156077.09 | Lu | [0000-0001-9794-3221] | [grid.9227.e, grid.16821.3c, grid.1024.7, grid... |
4 | 100 | M Kozako M | ur.07644453127.11 | Kozako | NaN | [grid.462727.2, grid.482504.f, grid.4444.0, gr... |
NOTE: facet results can be 1000 at most (due to performance limitations) so if there are more than 1000 it is not possible to know the total number.
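The distinction can be sketched on a hand-written payload. The dictionary below is not a real API response: it assumes that the raw JSON carries the total under _stats.total_count (an assumption about the payload layout, so check the response you actually receive) and reuses the figures from the two queries above. The summarize helper is hypothetical.

```python
# Hand-written stand-in for a facet response; the key layout is an assumption.
payload = {
    "_stats": {"total_count": 5807},  # total of the main source (publications)
    "researchers": [
        {"id": "ur.01055753603.27", "count": 152},
        {"id": "ur.01144540527.52", "count": 108},
    ],
}


def summarize(payload, result_key):
    """Report how many facet items came back next to the main source total."""
    total = payload.get("_stats", {}).get("total_count")
    returned = len(payload.get(result_key, []))
    return {"returned": returned, "source_total": total}


summary = summarize(payload, "researchers")
print(summary)
```

The source_total value counts publications, not researchers, which is why it cannot be used to work out how many facet items exist in total.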
5.5 Paginating Results¶
At the end of a return phrase, the user can specify the maximum number of results to be returned and the number of top records to skip before the first returned record. This makes it possible, for example, to retrieve large result sets page by page (i.e. “paging” results), as described below.
This is done using the keyword limit
followed by the maximum number of results to return, optionally followed by the keyword skip
and the number of results to skip (the offset).
[56]:
%%dsldf
search publications return publications limit 10
Returned Publications: 10 (total = 124848001)
Time: 0.59s
[56]:
| | authors | id | issue | pages | title | type | volume | year | journal.id | journal.title |
---|---|---|---|---|---|---|---|---|---|---|
0 | [{'affiliations': [], 'corresponding': '', 'cu... | pub.1144593888 | 3 | 284-292 | Profile of ectoparasites and biometric conditi... | article | 10 | 2022 | jour.1150855 | Depik Jurnal |
1 | [{'affiliations': [{'city': 'Shanghai', 'city_... | pub.1144587500 | NaN | 1-16 | Experimental study of stratified lean burn cha... | article | NaN | 2022 | jour.1136510 | Frontiers in Energy |
2 | [{'affiliations': [{'city': 'Budapest', 'city_... | pub.1144327837 | NaN | NaN | Statistical approaches to explore the linkages... | article | NaN | 2022 | jour.1271150 | Journal of Water Supply Research and Technolog... |
3 | [{'affiliations': [{'city': 'Hangzhou', 'city_... | pub.1141731113 | 9 | 1-40 | Opportunities and Challenges in Code Search Tools | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
4 | [{'affiliations': [{'city': 'Melbourne', 'city... | pub.1141731112 | 9 | 1-36 | Ransomware Mitigation in the Modern Era: A Com... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
5 | [{'affiliations': [{'city': 'Sydney', 'city_id... | pub.1141731111 | 9 | 1-38 | Service Computing for Industry 4.0: State of t... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
6 | [{'affiliations': [{'city': 'Berlin', 'city_id... | pub.1141731110 | 9 | 1-38 | Handling Iterations in Distributed Dataflow Sy... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
7 | [{'affiliations': [{'city': 'Linz', 'city_id':... | pub.1141731109 | 9 | 1-35 | Adversary Models for Mobile Device Authentication | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
8 | [{'affiliations': [{'city': 'Amberg', 'city_id... | pub.1141731108 | 9 | 1-33 | A Survey on Client Throughput Prediction Algor... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
9 | [{'affiliations': [{'city': 'Genoa', 'city_id'... | pub.1141731107 | 9 | 1-33 | Gotta CAPTCHA ’Em All: A Survey of 20 Years of... | article | 54 | 2022 | jour.1119907 | ACM Computing Surveys |
If paging information is not provided, the default values limit 20 skip 0 are used, so the two following queries are equivalent:
search publications return publications
search publications return publications limit 20 skip 0
Combining limit
and skip
across multiple queries enables paging or batching of results; e.g. to retrieve 30 grant records divided into 3 pages of 10 records each, the following three queries could be used:
return grants limit 10 => get 1st 10 records for page 1 (skip 0, by default)
return grants limit 10 skip 10 => get next 10 for page 2; skip the 10 we already have
return grants limit 10 skip 20 => get another 10 for page 3, for a total of 30
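The three queries above can be generated programmatically. This is a minimal sketch: the generator name paged_queries and its parameters are hypothetical, and it only builds the query strings without contacting the API.

```python
def paged_queries(base_query, page_size, pages):
    """Yield one DSL query string per page, advancing `skip` by `page_size`."""
    for page in range(pages):
        yield f"{base_query} limit {page_size} skip {page * page_size}"


queries = list(paged_queries("search grants return grants", page_size=10, pages=3))
for q in queries:
    print(q)
```

Each string could then be run in turn with dsl.query(...), stopping early once a page comes back with fewer records than the page size.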
5.6 Sorting Results¶
A sort order for the results in a given return phrase can be specified with the keyword sort by followed by the name of:
* a field (in the case that a source is being requested)
* an indicator (aggregation) (in the case that one or more facets are being requested).
By default, the result set of full text queries (search ... for "full text query") is sorted by “relevance”. Additionally, it is possible to specify the sort order using the asc or desc keywords. By default, descending order is selected.
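A small sketch of how a script might build the sort phrase while guarding against a mistyped direction. The helper name sort_clause is hypothetical; it only formats a string using the sort by, asc and desc keywords described above.

```python
def sort_clause(field, order="desc"):
    """Return a `sort by` phrase; `desc` mirrors the default described above."""
    if order not in ("asc", "desc"):
        raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
    return f"sort by {field} {order}"


query = f'search grants for "nanomaterials" return grants {sort_clause("title")} limit 5'
print(query)
```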
[57]:
%%dsldf
search grants
for "nanomaterials"
return grants sort by title desc limit 5
Returned Grants: 5 (total = 20429)
Time: 0.71s
[57]:
| | active_year | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title | end_date |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2015] | [{'acronym': 'DFG', 'city_name': 'Bonn', 'coun... | German Research Foundation | 280331443 | grant.4841519 | en | Transmissionselektronenmikroskop | 2015-01-01 | 2015 | Transmissionselektronenmikroskop | NaN |
1 | [2012] | [{'acronym': 'DFG', 'city_name': 'Bonn', 'coun... | German Research Foundation | 220923099 | grant.4823271 | de | Transmissionselektronenmikroskop | 2012-01-01 | 2012 | Transmissionselektronenmikroskop | NaN |
2 | [2011, 2012, 2013, 2014, 2015] | [{'acronym': 'BELSPO', 'city_name': 'Brussels'... | Belgian Federal Science Policy Office | 3E120109 | grant.6774902 | en | Snowcontrol. | 2011-06-16 | 2011 | Snowcontrol. | 2015-06-13 |
3 | [2015, 2016] | [{'acronym': 'FNP', 'city_name': 'Warsaw', 'co... | Foundation for Polish Science | START 79.2015 | grant.9182996 | pl | Stypendium Naukowe START | 2015-06-01 | 2015 | START Scholarship | 2016-06-01 |
4 | [2014, 2015] | [{'acronym': 'FNP', 'city_name': 'Warsaw', 'co... | Foundation for Polish Science | START 81.2014 | grant.9182975 | pl | Stypendium Naukowe START | 2014-06-01 | 2014 | START Scholarship | 2015-06-01 |
[58]:
%%dsldf
search grants
for "nanomaterials"
return grants sort by relevance desc limit 5
Returned Grants: 5 (total = 20429)
Time: 0.56s
[58]:
| | active_year | funders | funding_org_name | grant_number | id | language | original_title | start_date | start_year | title | end_date |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [2016] | [{'acronym': 'BELSPO', 'city_name': 'Brussels'... | Belgian Federal Science Policy Office | 37_3655 | grant.8597838 | en | "Bottom up" self-assembly of π-functional nano... | 2016-11-09 | 2016 | "Bottom up" self-assembly of π-functional nano... | NaN |
1 | [2005, 2006, 2007, 2008, 2009, 2010, 2011] | [{'acronym': 'Royal Society', 'city_name': 'Lo... | Royal Society | 7780-1 | grant.7928225 | en | NANOMATERIALS FOR BIOMOLECULAR SCIENCES AND NA... | 2005-10-01 | 2005 | NANOMATERIALS FOR BIOMOLECULAR SCIENCES AND NA... | 2011-03-01 |
2 | [2011, 2012, 2013, 2014] | [{'acronym': 'Royal Society', 'city_name': 'Lo... | Royal Society | 7780 | grant.7928224 | en | Nanomaterials for Biomolecular and Biomedical ... | 2011-03-01 | 2011 | Nanomaterials for Biomolecular and Biomedical ... | 2014-03-01 |
3 | [2005, 2006] | [{'acronym': 'ITC', 'city_name': 'Hong Kong', ... | Innovation and Technology Commission | InP/006/05 | grant.7166197 | en | Institute of NanoMaterials and NanoTechnology ... | 2005-06-24 | 2005 | Institute of NanoMaterials and NanoTechnology ... | 2006-06-23 |
4 | [2011] | [{'acronym': 'CFI', 'city_name': 'Ottawa', 'co... | Canada Foundation for Innovation | CFI9790 | grant.6767895 | en | Platform for nanomaterials excitation and in-s... | 2011-06-14 | 2011 | Platform for nanomaterials excitation and in-s... | NaN |
Number of citations per publication
[59]:
%%dsldf
search publications
return publications [doi + times_cited]
sort by times_cited limit 5
Returned Publications: 5 (total = 124848001)
Time: 1.27s
[59]:
| | doi | times_cited |
---|---|---|
0 | 10.1016/s0021-9258(19)52451-6 | 262717 |
1 | 10.1038/227680a0 | 220122 |
2 | 10.1016/0003-2697(76)90527-3 | 200817 |
3 | 10.1103/physrevlett.77.3865 | 117222 |
4 | 10.1006/meth.2001.1262 | 111893 |
Recent citations per publication. Note: recent citations refers to the number of citations accrued in the last two-year period. A single value is stored per document and the year window rolls over in July.
[60]:
%%dsldf
search publications
return publications [doi + recent_citations]
sort by recent_citations limit 5
Returned Publications: 5 (total = 124848001)
Time: 1.21s
[60]:
| | doi | recent_citations |
---|---|---|
0 | 10.1109/cvpr.2016.90 | 37726 |
1 | 10.3322/caac.21492 | 35929 |
2 | 10.1103/physrevlett.77.3865 | 30347 |
3 | 10.1006/meth.2001.1262 | 28571 |
4 | 10.1016/s0140-6736(20)30183-5 | 27347 |
When a facet is being returned, the indicator
used in the sort
phrase must either be count
(the default, such that sort by count
is unnecessary), or one of the indicators specified in the aggregate
phrase, i.e. one whose values are being computed in the faceting operation.
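The rule above can be checked before a query is sent. The sketch below uses a hypothetical helper, facet_return, that refuses a sort indicator unless it is count or one of the requested aggregates; it builds a string only and is not part of Dimcli.

```python
def facet_return(facet, aggregates=(), sort_by="count", limit=5):
    """Build a facet `return` phrase, validating the sort indicator first."""
    allowed = {"count", *aggregates}
    if sort_by not in allowed:
        raise ValueError(f"cannot sort by {sort_by!r}; expected one of {sorted(allowed)}")
    phrase = f"return {facet}"
    if aggregates:
        phrase += " aggregate " + ", ".join(aggregates)
    if sort_by != "count":  # count is the default, so the sort phrase is omitted
        phrase += f" sort by {sort_by}"
    return f"{phrase} limit {limit}"


phrase = facet_return("research_orgs", ["altmetric_median", "rcr_avg"], sort_by="rcr_avg")
print(phrase)
```

Calling facet_return("research_orgs", sort_by="rcr_avg") without listing rcr_avg among the aggregates raises a ValueError, mirroring the constraint stated in the text.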
[61]:
%%dsldf
search publications
for "nanomaterials"
return research_orgs
aggregate altmetric_median, rcr_avg sort by rcr_avg limit 5
Returned Research_orgs: 5
Time: 3.38s
[61]:
| | acronym | altmetric_median | city_name | count | country_name | id | latitude | linkout | longitude | name | rcr_avg | state_name | types |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NIDCD | 85.0 | Bethesda | 1 | United States | grid.214431.1 | 39.105713 | [http://www.nidcd.nih.gov/Pages/default.aspx] | -77.186860 | National Institute on Deafness and Other Commu... | 245.190002 | Maryland | [Government] |
1 | NaN | 85.0 | Providence | 1 | United States | grid.241223.4 | 41.810280 | [http://www.womenandinfants.org/] | -71.410880 | Women & Infants Hospital of Rhode Island | 245.190002 | Rhode Island | [Healthcare] |
2 | SGPGIMS | 85.0 | Lucknow | 1 | India | grid.263138.d | 26.742609 | [http://www.sgpgi.ac.in/] | 80.946144 | Sanjay Gandhi Post Graduate Institute of Medic... | 245.190002 | NaN | [Facility] |
3 | NaN | 85.0 | Charlottesville | 1 | United States | grid.412587.d | 38.032307 | [http://www.uvahealth.com/] | -78.498610 | University of Virginia Health System | 245.190002 | Virginia | [Healthcare] |
4 | NaN | 85.0 | São Paulo | 1 | Brazil | grid.413320.7 | -23.565285 | [http://www.accamargo.org.br/] | -46.635864 | AC Camargo Hospital | 245.190002 | NaN | [Healthcare] |
5.7 Unnesting results¶
Multi-value entity and JSON fields, such as researchers, authors, research_orgs or any of the category_* fields, may be unnested into top-level objects.
This operation makes it easier to process these objects further, e.g. to count them.
This functionality transforms all of the returned multi-value data into top-level keys, such as researchers.id, researchers.first_name, researchers.last_name, while copying the other, non-unnested fields of the publication, such as id or title, onto each of them. The number of returned results is therefore multiplied by the number of researchers and categories each original publication has, so it will likely exceed the overall query limit, because the limit applies to the source objects, not the unnested ones. If multiple fields are unnested, a Cartesian product of all unnested fields is returned.
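The transformation can be imitated locally to see what it does to a single record. This is a sketch of the idea, not of how the API implements it: the unnest function is hypothetical, and the sample publication is a trimmed-down record with two researchers.

```python
def unnest(record, field):
    """Return one flat row per item of record[field], copying the other keys."""
    rest = {k: v for k, v in record.items() if k != field}
    rows = []
    for item in record.get(field, []):
        row = dict(rest)
        # Prefix the nested keys, e.g. researchers.id, researchers.last_name.
        row.update({f"{field}.{k}": v for k, v in item.items()})
        rows.append(row)
    return rows


pub = {
    "id": "pub.1144045080",
    "year": 2022,
    "researchers": [
        {"id": "ur.016274403511.19", "first_name": "Tsuyoshi", "last_name": "Hatori"},
        {"id": "ur.015100044631.08", "first_name": "Netra Prakash", "last_name": "Bhandary"},
    ],
}

rows = unnest(pub, "researchers")
for row in rows:
    print(row)
```

One publication becomes two rows, each repeating id and year, which is why the number of returned objects can exceed the limit given in the query.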
[62]:
%%dsldf
search publications for "Japan AND Buddhism"
where researchers is not empty
return publications[id+year+title+unnest(researchers)] limit 10
Returned objects: 23 (total publications= 45209)
Time: 1.38s
[62]:
| | id | researchers.first_name | researchers.id | researchers.last_name | researchers.orcid_id | researchers.research_orgs | title | year |
---|---|---|---|---|---|---|---|---|
0 | pub.1142524267 | Christophe J | ur.011737013277.00 | Godlewski | [0000-0002-1391-1108] | [grid.11843.3f, grid.462209.b, grid.9156.b] | Family firms and the cost of borrowing: empiri... | 2022 |
1 | pub.1142437244 | Saibal | ur.014521275617.23 | Ghosh | NaN | [grid.465042.1, grid.507449.b] | Religiosity and bank performance: How strong i... | 2022 |
2 | pub.1144045080 | Tsuyoshi | ur.016274403511.19 | Hatori | NaN | [grid.255464.4, grid.258799.8, grid.32197.3e] | Posttraumatic stress disorder and its predicto... | 2022 |
3 | pub.1144045080 | Netra Prakash | ur.015100044631.08 | Bhandary | NaN | [grid.255464.4] | Posttraumatic stress disorder and its predicto... | 2022 |
4 | pub.1142405234 | Sibel | ur.016623460225.89 | Kuşdemir | NaN | NaN | A Critical Analysis of the Tidal Model of Ment... | 2022 |
5 | pub.1142405234 | Abe | ur.01164051372.26 | Oudshoorn | [0000-0003-0277-8724] | [grid.39381.30] | A Critical Analysis of the Tidal Model of Ment... | 2022 |
6 | pub.1142405234 | Jean Pierre | ur.011564037563.23 | Ndayisenga | [0000-0002-3508-975X] | [grid.39381.30, grid.10818.30] | A Critical Analysis of the Tidal Model of Ment... | 2022 |
7 | pub.1142222763 | Namgay Pem | ur.011002223475.74 | Dorji | NaN | [grid.1009.8, grid.449502.e] | Productivity improvement to sustain small-scal... | 2022 |
8 | pub.1142222763 | Pema | ur.016673751074.70 | Thinley | NaN | [grid.449502.e, grid.473381.a] | Productivity improvement to sustain small-scal... | 2022 |
9 | pub.1144292593 | Ranjeet | ur.010660116554.38 | John | [0000-0002-0150-8450] | [grid.430387.b, grid.17088.36, grid.65519.3e, ... | Sustainability challenges for the social-envir... | 2022 |
10 | pub.1144292593 | Pavel Ya | ur.014213100657.48 | Groisman | [0000-0001-6255-324X] | [grid.4886.2, grid.40803.3f, grid.426292.9, gr... | Sustainability challenges for the social-envir... | 2022 |
11 | pub.1144292593 | Ginger R H | ur.01323604570.79 | Allington | [0000-0003-0446-0576] | [grid.253615.6, grid.484514.8, grid.262962.b, ... | Sustainability challenges for the social-envir... | 2022 |
12 | pub.1144292593 | Kirsten M | ur.07665600562.61 | De Beurs | [0000-0002-9244-3292] | [grid.438526.e, grid.266900.b, grid.14003.36, ... | Sustainability challenges for the social-envir... | 2022 |
13 | pub.1144292593 | Arnon M | ur.014070213767.33 | Karnieli | [0000-0001-8065-9793] | [grid.7489.2, grid.410727.7, grid.41156.37] | Sustainability challenges for the social-envir... | 2022 |
14 | pub.1144292593 | G Garik | ur.012323502577.70 | Gutman | [0000-0002-2979-9675] | [grid.3532.7, grid.164295.d, grid.238252.c, gr... | Sustainability challenges for the social-envir... | 2022 |
15 | pub.1144292593 | Beyza | ur.014460477732.44 | Şat | NaN | NaN | Sustainability challenges for the social-envir... | 2022 |
16 | pub.1144292593 | Geoffrey M | ur.0751300123.34 | Henebry | [0000-0002-8999-2709] | [grid.419222.e, grid.430387.b, grid.17088.36, ... | Sustainability challenges for the social-envir... | 2022 |
17 | pub.1144292593 | Amarjargal | ur.07662063177.45 | Amartuvshin | NaN | [grid.444538.a] | Sustainability challenges for the social-envir... | 2022 |
18 | pub.1144292593 | Maira | ur.014277742317.84 | Kussainova | [0000-0002-9800-6093] | [grid.171588.2] | Sustainability challenges for the social-envir... | 2022 |
19 | pub.1143489614 | Yunqi | ur.07715015627.67 | Fan | NaN | [grid.258151.a] | Audit firm's Confucianism and stock price cras... | 2022 |
20 | pub.1140433062 | Lorraine K C | ur.016053415455.43 | Yeung | NaN | NaN | Which Way Down the Slippery Slope: Arkangel or... | 2022 |
21 | pub.1144342388 | Goran | ur.010065312255.23 | Zendelovski | NaN | NaN | The Ranking of the Western Balkan Countries Ac... | 2021 |
22 | pub.1144329854 | Rajan Binayek | ur.012335611513.76 | Pasa | NaN | [grid.80817.36] | Interface between Tourism and Rural Developmen... | 2021 |
[63]:
%%dsldf
search publications for "Japan AND Buddhism"
return publications[id+year+title+unnest(category_for)] limit 5
Returned objects: 12 (total publications= 139609)
Time: 1.00s
[63]:
| | category_for.id | category_for.name | id | title | year |
---|---|---|---|---|---|
0 | 3675 | 2103 Historical Studies | pub.1143990292 | Arrangement Plan of Inner Mongolia Buddhist Te... | 2022 |
1 | 2221 | 21 History and Archaeology | pub.1143990292 | Arrangement Plan of Inner Mongolia Buddhist Te... | 2022 |
2 | 3373 | 1506 Tourism | pub.1144778122 | East meets West: Spiritual tourism in Chinese ... | 2022 |
3 | 2215 | 15 Commerce, Management, Tourism and Services | pub.1144778122 | East meets West: Spiritual tourism in Chinese ... | 2022 |
4 | 2215 | 15 Commerce, Management, Tourism and Services | pub.1142524267 | Family firms and the cost of borrowing: empiri... | 2022 |
5 | 3335 | 1502 Banking, Finance and Investment | pub.1142524267 | Family firms and the cost of borrowing: empiri... | 2022 |
6 | 2217 | 17 Psychology and Cognitive Sciences | pub.1144241292 | Racist Love | 2022 |
7 | 3468 | 1701 Psychology | pub.1144241292 | Racist Love | 2022 |
8 | 3364 | 1505 Marketing | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 |
9 | 3373 | 1506 Tourism | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 |
10 | 3342 | 1503 Business and Management | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 |
11 | 2215 | 15 Commerce, Management, Tourism and Services | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 |
You can unnest as many fields as you want. However, the number of results will grow pretty quickly!
[64]:
%%dsldf
search publications for "Japan AND Buddhism"
return publications[id+year+title+unnest(category_for)+unnest(researchers)+unnest(research_orgs)] limit 5
Returned objects: 24 (total publications= 139609)
Time: 0.70s
[64]:
| | category_for.id | category_for.name | id | title | year | research_orgs.acronym | research_orgs.city_name | research_orgs.country_name | research_orgs.id | research_orgs.latitude | research_orgs.linkout | research_orgs.longitude | research_orgs.name | research_orgs.types | research_orgs.state_name | researchers.first_name | researchers.id | researchers.last_name | researchers.orcid_id | researchers.research_orgs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3675 | 2103 Historical Studies | pub.1143990292 | Arrangement Plan of Inner Mongolia Buddhist Te... | 2022 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 2221 | 21 History and Archaeology | pub.1143990292 | Arrangement Plan of Inner Mongolia Buddhist Te... | 2022 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 3373 | 1506 Tourism | pub.1144778122 | East meets West: Spiritual tourism in Chinese ... | 2022 | UdG | Girona | Spain | grid.5319.e | 41.985695 | [http://www.udg.edu/Not%C3%ADciesiagenda/tabid... | 2.827373 | University of Girona | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 2215 | 15 Commerce, Management, Tourism and Services | pub.1144778122 | East meets West: Spiritual tourism in Chinese ... | 2022 | UdG | Girona | Spain | grid.5319.e | 41.985695 | [http://www.udg.edu/Not%C3%ADciesiagenda/tabid... | 2.827373 | University of Girona | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 2215 | 15 Commerce, Management, Tourism and Services | pub.1142524267 | Family firms and the cost of borrowing: empiri... | 2022 | NaN | Strasbourg | France | grid.11843.3f | 48.580276 | [http://www.en.unistra.fr/index.php?id=21707] | 7.764444 | University of Strasbourg | [Education] | Alsace | Christophe J | ur.011737013277.00 | Godlewski | [0000-0002-1391-1108] | [grid.11843.3f, grid.462209.b, grid.9156.b] |
5 | 3335 | 1502 Banking, Finance and Investment | pub.1142524267 | Family firms and the cost of borrowing: empiri... | 2022 | NaN | Strasbourg | France | grid.11843.3f | 48.580276 | [http://www.en.unistra.fr/index.php?id=21707] | 7.764444 | University of Strasbourg | [Education] | Alsace | Christophe J | ur.011737013277.00 | Godlewski | [0000-0002-1391-1108] | [grid.11843.3f, grid.462209.b, grid.9156.b] |
6 | 2217 | 17 Psychology and Cognitive Sciences | pub.1144241292 | Racist Love | 2022 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
7 | 3468 | 1701 Psychology | pub.1144241292 | Racist Love | 2022 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
8 | 3364 | 1505 Marketing | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | MCU | Taoyuan District | Taiwan | grid.411804.8 | 25.086351 | [http://www1.mcu.edu.tw/Apps/SB/SB_Site.aspx?P... | 121.528080 | Ming Chuan University | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
9 | 3364 | 1505 Marketing | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | NaN | Brisbane | Australia | grid.1022.1 | -27.470610 | [http://www.griffith.edu.au/] | 153.022860 | Griffith University | [Education] | Queensland | NaN | NaN | NaN | NaN | NaN |
10 | 3364 | 1505 Marketing | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | FJU | Taipei | Taiwan | grid.256105.5 | 25.035807 | [http://www.fju.edu.tw/#&panel1-1] | 121.433170 | Fu Jen Catholic University | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
11 | 3364 | 1505 Marketing | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | NKFUST | Kaohsiung City | Taiwan | grid.412071.1 | 22.754444 | [http://www.nkfust.edu.tw/bin/home.php] | 120.333336 | National Kaohsiung First University of Science... | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
12 | 3373 | 1506 Tourism | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | MCU | Taoyuan District | Taiwan | grid.411804.8 | 25.086351 | [http://www1.mcu.edu.tw/Apps/SB/SB_Site.aspx?P... | 121.528080 | Ming Chuan University | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
13 | 3373 | 1506 Tourism | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | NaN | Brisbane | Australia | grid.1022.1 | -27.470610 | [http://www.griffith.edu.au/] | 153.022860 | Griffith University | [Education] | Queensland | NaN | NaN | NaN | NaN | NaN |
14 | 3373 | 1506 Tourism | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | FJU | Taipei | Taiwan | grid.256105.5 | 25.035807 | [http://www.fju.edu.tw/#&panel1-1] | 121.433170 | Fu Jen Catholic University | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
15 | 3373 | 1506 Tourism | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | NKFUST | Kaohsiung City | Taiwan | grid.412071.1 | 22.754444 | [http://www.nkfust.edu.tw/bin/home.php] | 120.333336 | National Kaohsiung First University of Science... | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
16 | 3342 | 1503 Business and Management | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | MCU | Taoyuan District | Taiwan | grid.411804.8 | 25.086351 | [http://www1.mcu.edu.tw/Apps/SB/SB_Site.aspx?P... | 121.528080 | Ming Chuan University | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
17 | 3342 | 1503 Business and Management | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | NaN | Brisbane | Australia | grid.1022.1 | -27.470610 | [http://www.griffith.edu.au/] | 153.022860 | Griffith University | [Education] | Queensland | NaN | NaN | NaN | NaN | NaN |
18 | 3342 | 1503 Business and Management | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | FJU | Taipei | Taiwan | grid.256105.5 | 25.035807 | [http://www.fju.edu.tw/#&panel1-1] | 121.433170 | Fu Jen Catholic University | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
19 | 3342 | 1503 Business and Management | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | NKFUST | Kaohsiung City | Taiwan | grid.412071.1 | 22.754444 | [http://www.nkfust.edu.tw/bin/home.php] | 120.333336 | National Kaohsiung First University of Science... | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
20 | 2215 | 15 Commerce, Management, Tourism and Services | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | MCU | Taoyuan District | Taiwan | grid.411804.8 | 25.086351 | [http://www1.mcu.edu.tw/Apps/SB/SB_Site.aspx?P... | 121.528080 | Ming Chuan University | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
21 | 2215 | 15 Commerce, Management, Tourism and Services | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | NaN | Brisbane | Australia | grid.1022.1 | -27.470610 | [http://www.griffith.edu.au/] | 153.022860 | Griffith University | [Education] | Queensland | NaN | NaN | NaN | NaN | NaN |
22 | 2215 | 15 Commerce, Management, Tourism and Services | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | FJU | Taipei | Taiwan | grid.256105.5 | 25.035807 | [http://www.fju.edu.tw/#&panel1-1] | 121.433170 | Fu Jen Catholic University | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
23 | 2215 | 15 Commerce, Management, Tourism and Services | pub.1143842506 | Comparison of localized and foreign restaurant... | 2022 | NKFUST | Kaohsiung City | Taiwan | grid.412071.1 | 22.754444 | [http://www.nkfust.edu.tw/bin/home.php] | 120.333336 | National Kaohsiung First University of Science... | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
6. Aggregations¶
In a return phrase requesting one or more facet results, aggregation operations to perform during faceting can be specified after the facet name(s) by using the keyword aggregate followed by a comma-separated list of one or more indicator names corresponding to the source being searched.
[65]:
%%dsldf
search publications
where year > 2010
return research_orgs
aggregate rcr_avg, altmetric_median limit 5
Returned Research_orgs: 5
Time: 19.81s
[65]:
 | altmetric_median | city_name | count | country_name | id | latitude | linkout | longitude | name | rcr_avg | state_name | types | acronym |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5.0 | Cambridge | 270683 | United States | grid.38142.3c | 42.377052 | [http://www.harvard.edu/] | -71.116650 | Harvard University | 2.493135 | Massachusetts | [Education] | NaN |
1 | 4.0 | Toronto | 177499 | Canada | grid.17063.33 | 43.661667 | [http://www.utoronto.ca/] | -79.395000 | University of Toronto | 2.051205 | Ontario | [Education] | NaN |
2 | 2.0 | Tokyo | 174026 | Japan | grid.26999.3d | 35.713333 | [http://www.u-tokyo.ac.jp/en/] | 139.762220 | University of Tokyo | 1.401742 | NaN | [Education] | UT |
3 | 2.0 | São Paulo | 169020 | Brazil | grid.11899.38 | -23.563051 | [http://www5.usp.br/en/] | -46.730103 | University of São Paulo | 1.357405 | NaN | [Education] | USP |
4 | 2.0 | Beijing | 159195 | China | grid.410726.6 | 39.909058 | [http://www.ucas.ac.cn/] | 116.250570 | University of Chinese Academy of Sciences | 1.678706 | NaN | [Education] | UCAS |
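The shape of a faceted query with aggregations, as used in the cell above, can be sketched as a small string-building helper. This is only an illustration: `build_facet_query` is a hypothetical helper, not part of Dimcli, and it does not check that the indicator names are valid for the chosen source.

```python
def build_facet_query(source, facet, indicators, where=None, limit=None):
    """Assemble a DSL facet query string (hypothetical helper, not part of Dimcli)."""
    if not indicators:
        raise ValueError("at least one indicator name is required")
    parts = [f"search {source}"]
    if where:
        parts.append(f"where {where}")
    # Indicators follow the facet name as a comma-separated list.
    clause = f"return {facet} aggregate {', '.join(indicators)}"
    if limit is not None:
        clause += f" limit {limit}"
    parts.append(clause)
    return " ".join(parts)


query = build_facet_query(
    "publications",
    "research_orgs",
    ["rcr_avg", "altmetric_median"],
    where="year > 2010",
    limit=5,
)
print(query)
```

The resulting string reproduces the query from the cell above on a single line, so it could be passed to `dsl.query(...)` in the same way as any other DSL statement.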
Which metrics/aggregations are available? See the data sources documentation for information about the available indicators.
Alternatively, we can use the ‘schema’ API (describe) to return this information programmatically:
[66]:
schema = dsl.query("describe schema")
sources = [x for x in schema['sources']]
# for each source name, extract metrics info
for s in sources:
    print("SOURCE:", s)
    for m in schema['sources'][s]['metrics']:
        print("--", schema['sources'][s]['metrics'][m]['name'], " => ", schema['sources'][s]['metrics'][m]['description'])
SOURCE: clinical_trials
-- count => Total count
SOURCE: datasets
-- count => Total count
SOURCE: grants
-- count => Total count
-- funding => Total funding amount, in USD.
SOURCE: organizations
-- count => Total count
SOURCE: patents
-- count => Total count
SOURCE: policy_documents
-- count => Total count
SOURCE: publications
-- altmetric_avg => Altmetric Attention Score mean
-- altmetric_median => Median Altmetric Attention Score
-- citations_avg => Arithmetic mean of citations
-- citations_median => Median of citations
-- citations_total => Aggregated number of citations
-- count => Total count
-- fcr_gavg => Geometric mean of `field_citation_ratio` field (note: This field cannot be used for sorting results).
-- rcr_avg => Arithmetic mean of `relative_citation_ratio` field.
-- recent_citations_total => For a given article, in a given year, the number of citations accrued in the last two year period. Single value stored per document, year window rolls over in July.
SOURCE: reports
-- count => Total count
SOURCE: researchers
-- count => Total count
SOURCE: source_titles
-- count => Total count
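The same schema information can be reshaped into a lookup table and used to check indicator names before a query is sent. The sketch below is a minimal illustration: `sample_schema` is a hand-written stand-in that copies two sources from the output above and assumes the same 'sources' and 'metrics' nesting as the real response, and both function names are hypothetical.

```python
def metrics_by_source(schema):
    """Map each source name to the sorted list of its metric names."""
    return {
        source: sorted(m["name"] for m in info["metrics"].values())
        for source, info in schema["sources"].items()
    }


def unknown_indicators(available, source, requested):
    """Return the requested indicator names that the source does not offer."""
    known = available.get(source, [])
    return [name for name in requested if name not in known]


# Hand-written stand-in mirroring part of the output above (assumption:
# the real "describe schema" response uses the same nesting).
sample_schema = {
    "sources": {
        "grants": {"metrics": {
            "count": {"name": "count", "description": "Total count"},
            "funding": {"name": "funding",
                        "description": "Total funding amount, in USD."},
        }},
        "patents": {"metrics": {
            "count": {"name": "count", "description": "Total count"},
        }},
    }
}

available = metrics_by_source(sample_schema)
print(available)
print(unknown_indicators(available, "patents", ["count", "funding"]))
```

With a live connection, the `schema` object from the cell above would take the place of `sample_schema`.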
NOTE: In addition to any specified aggregations, count is always computed and reported when facet results are requested.
[67]:
%%dsldf
search grants
for "5g network"
return funders
aggregate count, funding sort by funding limit 5
Returned Funders: 5
Time: 0.63s
[67]:
 | acronym | city_name | count | country_name | funding | id | latitude | linkout | longitude | name | types | state_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EC | Brussels | 233 | Belgium | 1.190915e+09 | grid.270680.b | 50.851650 | [http://ec.europa.eu/index_en.htm] | 4.363670 | European Commission | [Government] | NaN |
1 | NSF CISE | Arlington | 191 | United States | 1.078005e+08 | grid.457785.c | 38.880580 | [http://www.nsf.gov/dir/index.jsp?org=CISE] | -77.111000 | Directorate for Computer & Information Science... | [Government] | Virginia |
2 | EPSRC | Swindon | 88 | United Kingdom | 6.816998e+07 | grid.421091.f | 51.567093 | [https://www.epsrc.ac.uk/] | -1.784602 | Engineering and Physical Sciences Research Cou... | [Government] | England |
3 | NCRD | Warsaw | 8 | Poland | 5.036449e+07 | grid.55047.33 | 52.227455 | [http://www.ncbr.gov.pl/en/] | 21.007630 | National Centre for Research and Development | [Government] | NaN |
4 | ITC | Hong Kong | 53 | China | 4.686881e+07 | grid.453115.7 | 22.282640 | [http://www.itc.gov.hk/en/about/org.htm] | 114.166580 | Innovation and Technology Commission | [Government] | NaN |
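Because count is reported alongside the requested aggregations, a rough average funding per grant can be derived on the client side. The sketch below uses the count and funding values copied from the output above. It assumes that every counted grant contributes to the funding total, which may not hold if some grants have no recorded amount, and `mean_funding` is a hypothetical helper.

```python
# Rows copied from the output above: (acronym, count, funding in USD).
rows = [
    ("EC", 233, 1.190915e+09),
    ("NSF CISE", 191, 1.078005e+08),
    ("EPSRC", 88, 6.816998e+07),
    ("NCRD", 8, 5.036449e+07),
    ("ITC", 53, 4.686881e+07),
]


def mean_funding(count, funding):
    """Average funding per matching grant; None when there are no grants."""
    return funding / count if count else None


for acronym, count, funding in rows:
    print(f"{acronym:10s} {mean_funding(count, funding):>15,.0f} USD per grant")
```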
Aggregated total number of citations
[68]:
%%dsldf
search publications
for "ontologies"
return funders
aggregate citations_total
sort by citations_total limit 5
Returned Funders: 5
Time: 1.09s
[68]:
 | acronym | citations_total | city_name | count | country_name | id | latitude | linkout | longitude | name | state_name | types |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NIGMS | 1137448.0 | Bethesda | 16287 | United States | grid.280785.0 | 38.997833 | [http://www.nigms.nih.gov/Pages/default.aspx] | -77.09938 | National Institute of General Medical Sciences | Maryland | [Facility] |
1 | NCI | 1124818.0 | Bethesda | 15855 | United States | grid.48336.3a | 39.004326 | [http://www.cancer.gov/] | -77.10119 | National Cancer Institute | Maryland | [Government] |
2 | EC | 812606.0 | Brussels | 23498 | Belgium | grid.270680.b | 50.851650 | [http://ec.europa.eu/index_en.htm] | 4.36367 | European Commission | NaN | [Government] |
3 | NHGRI | 778811.0 | Bethesda | 5653 | United States | grid.280128.1 | 38.996967 | [https://www.genome.gov/] | -77.09693 | National Human Genome Research Institute | Maryland | [Facility] |
4 | NSFC | 728741.0 | Beijing | 47927 | China | grid.419696.5 | 40.005177 | [http://www.nsfc.gov.cn/publish/portal1/] | 116.33983 | National Natural Science Foundation of China | NaN | [Government] |
Arithmetic mean number of citations
[69]:
%%dsldf
search publications
return funders
aggregate citations_avg
sort by citations_avg limit 5
Returned Funders: 5
Time: 2.75s
[69]:
 | acronym | citations_avg | city_name | count | country_name | id | latitude | linkout | longitude | name | types | state_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NIAA | 436.0 | London | 1 | United Kingdom | grid.453470.1 | 51.519370 | [http://www.niaa.org.uk/] | -0.119685 | National Institute of Academic Anaesthesia | [Facility] | NaN |
1 | NaN | 401.0 | Torrance | 1 | United States | grid.467593.a | 33.851273 | [http://www.toyota.com/usa/] | -118.316730 | Toyota Motor Corporation (United States) | [Company] | California |
2 | AHF | 365.0 | Los Angeles | 2 | United States | grid.427827.c | 34.098557 | [http://www.aidshealth.org/#/] | -118.325600 | AIDS Healthcare Foundation | [Nonprofit] | California |
3 | MDS | 332.0 | Milwaukee | 1 | United States | grid.469679.3 | 43.040794 | [http://www.movementdisorders.org/MDS.htm] | -87.904630 | International Parkinson and Movement Disorder ... | [Other] | Wisconsin |
4 | JRC | 297.5 | Brussels | 2 | Belgium | grid.489339.c | 50.850403 | [https://ec.europa.eu/info/departments/joint-r... | 4.347922 | Directorate-General Joint Research Centre | [Government] | NaN |
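In the output above, every funder at the top of the ranking has a count of only 1 or 2, so the averages rest on very few publications. One possible way to handle this is to filter on count after retrieving the results. The sketch below applies such a filter to the five rows shown above; `with_min_count` is a hypothetical helper, and a real analysis would need a larger result set than these five rows.

```python
# (funder, citations_avg, count) copied from the output above.
rows = [
    ("NIAA", 436.0, 1),
    ("Toyota Motor Corporation (United States)", 401.0, 1),
    ("AHF", 365.0, 2),
    ("MDS", 332.0, 1),
    ("JRC", 297.5, 2),
]


def with_min_count(rows, min_count):
    """Keep only the rows whose publication count is at least min_count."""
    return [row for row in rows if row[2] >= min_count]


filtered = with_min_count(rows, min_count=2)
for funder, citations_avg, count in filtered:
    print(funder, citations_avg, count)
```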
Geometric mean of FCR
[70]:
%%dsldf
search publications
return funders
aggregate fcr_gavg limit 5
Returned Funders: 5
Time: 4.59s
[70]:
 | acronym | city_name | count | country_name | fcr_gavg | id | latitude | linkout | longitude | name | types | state_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NSFC | Beijing | 2619510 | China | 2.487496 | grid.419696.5 | 40.005177 | [http://www.nsfc.gov.cn/publish/portal1/] | 116.339830 | National Natural Science Foundation of China | [Government] | NaN |
1 | EC | Brussels | 893414 | Belgium | 3.290802 | grid.270680.b | 50.851650 | [http://ec.europa.eu/index_en.htm] | 4.363670 | European Commission | [Government] | NaN |
2 | MOST | Beijing | 806641 | China | 2.695834 | grid.424020.0 | 39.827835 | [http://www.most.gov.cn/eng/] | 116.316284 | Ministry of Science and Technology of the Peop... | [Government] | NaN |
3 | JSPS | Tokyo | 712701 | Japan | 2.298782 | grid.54432.34 | 35.687160 | [http://www.jsps.go.jp/] | 139.740390 | Japan Society for the Promotion of Science | [Nonprofit] | NaN |
4 | NCI | Bethesda | 623054 | United States | 4.814527 | grid.48336.3a | 39.004326 | [http://www.cancer.gov/] | -77.101190 | National Cancer Institute | [Government] | Maryland |
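For readers less familiar with geometric means, the sketch below contrasts a geometric mean with an arithmetic mean on a handful of made-up ratio values. It illustrates the general formula only: the input values are hypothetical, and how the API treats publications whose field citation ratio is zero or missing is not covered here.

```python
import math
import statistics


def geometric_mean(values):
    """exp(mean(log(v))); defined only for strictly positive inputs."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs one or more positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical ratio values, with one large outlier.
fcr_values = [0.5, 1.0, 2.0, 40.0]

print("arithmetic mean:", statistics.mean(fcr_values))
print("geometric mean: ", geometric_mean(fcr_values))
```

The single large value pulls the arithmetic mean far above the other three values, while the geometric mean stays much closer to them, which is the usual motivation for using it with skewed, ratio-like indicators.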
Median Altmetric Attention Score
[71]:
%%dsldf
search publications
return funders aggregate altmetric_median
sort by altmetric_median limit 5
Returned Funders: 5
Time: 8.40s
[71]:
 | acronym | altmetric_median | city_name | count | country_name | id | latitude | linkout | longitude | name | state_name | types |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NPT | 749.0 | Jenkintown | 1 | United States | grid.479428.4 | 40.090862 | [https://www.nptrust.org/] | -75.136150 | National Philanthropic Trust | Pennsylvania | [Nonprofit] |
1 | NaN | 679.0 | Tel Aviv | 1 | Israel | grid.481313.c | 32.089615 | [http://www.rcf.org.il/index.php?lang=en] | 34.778706 | Rothschild Caesarea Foundation | NaN | [Nonprofit] |
2 | APSF | 666.0 | Indianapolis | 1 | United States | grid.478218.0 | 39.651688 | [http://www.apsf.org/] | -86.157870 | Anesthesia Patient Safety Foundation | Indiana | [Nonprofit] |
3 | TOS | 649.0 | Silver Spring | 1 | United States | grid.430827.a | 38.998497 | [http://www.obesity.org/] | -77.029990 | Obesity Society | Maryland | [Nonprofit] |
4 | NaN | 564.0 | St Louis | 2 | United States | grid.453832.b | 38.635190 | [http://www.longerlife.org/] | -90.262726 | Longer Life Foundation | Missouri | [Nonprofit] |
6.1 Complex aggregations¶
The return phrase may be followed by a function expression that returns additional calculations, such as per-year funding or citation statistics. These functions may take their own arguments and are calculated using the source data specified in the search part of the query.
At the time of writing, two functions are available: citations_per_year for publications and funding_per_year for grants.
Publications citations_per_year¶
Publication citations is the number of times that publications have been cited by other publications in the database. This function returns the number of citations received in each year.
[72]:
%%dsldf
search publications for "brexit"
return citations_per_year(2010, 2020)
Returned Citations_per_year: 11
Time: 1.07s
[72]:
 | citations_per_year |
---|---|
2010 | 13.0 |
2011 | 14.0 |
2012 | 17.0 |
2013 | 31.0 |
2014 | 43.0 |
2015 | 170.0 |
2016 | 1007.0 |
2017 | 6321.0 |
2018 | 18699.0 |
2019 | 35480.0 |
2020 | 62346.0 |
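The per-year values can be post-processed like any other series. As a minimal sketch, the snippet below turns the yearly counts from the output above into a running total; the dictionary is typed in by hand from that output rather than read from a live response.

```python
from itertools import accumulate

# Yearly citation counts copied from the output above.
citations_per_year = {
    2010: 13.0, 2011: 14.0, 2012: 17.0, 2013: 31.0, 2014: 43.0,
    2015: 170.0, 2016: 1007.0, 2017: 6321.0, 2018: 18699.0,
    2019: 35480.0, 2020: 62346.0,
}

years = sorted(citations_per_year)
# Running total of citations received up to and including each year.
cumulative = dict(zip(years, accumulate(citations_per_year[y] for y in years)))

for year in years:
    print(year, citations_per_year[year], cumulative[year])
```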
Grants funding_per_year¶
Returns grant funding per year in the given currency, from the specified start year to the specified end year (inclusive).
Supported currencies are: CAD, USD, JPY, GBP, CHF, CNY, EUR, NZD, AUD
[73]:
%%dsldf
search grants for "brexit"
return funding_per_year(2010, 2020, "USD")
Returned Funding_per_year: 11
Time: 0.57s
[73]:
 | funding_per_year |
---|---|
2010 | 0.0 |
2011 | 0.0 |
2012 | 0.0 |
2013 | 4412.0 |
2014 | 10020.0 |
2015 | 313173.0 |
2016 | 788787.0 |
2017 | 6194077.0 |
2018 | 15698532.0 |
2019 | 34474721.0 |
2020 | 42305367.0 |
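Since the function accepts only the currencies listed above, it can help to validate the arguments before building the query string. The helper below is a hypothetical sketch, not part of Dimcli. It checks the currency against that list and the order of the two years, and it does not escape quotes inside the search term.

```python
SUPPORTED_CURRENCIES = {"CAD", "USD", "JPY", "GBP", "CHF", "CNY", "EUR", "NZD", "AUD"}


def funding_per_year_query(search_term, start_year, end_year, currency):
    """Build a funding_per_year query string (hypothetical helper).

    Note: quotes inside search_term are not escaped in this sketch.
    """
    if currency not in SUPPORTED_CURRENCIES:
        raise ValueError(f"unsupported currency: {currency!r}")
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    return (
        f'search grants for "{search_term}" '
        f'return funding_per_year({start_year}, {end_year}, "{currency}")'
    )


query = funding_per_year_query("brexit", 2010, 2020, "USD")
print(query)
```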
Note
The Dimensions Analytics API lets you carry out sophisticated research data analytics tasks like the ones described on this website. See also the associated GitHub repository for examples, the source code of these tutorials and much more.