Expert Identification with the Dimensions API - An Introduction
This notebook shows how to use the expert identification workflow available via the Dimensions Analytics API.
[1]:
import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))
==
CHANGELOG
This notebook was last run on Jan 25, 2022
==
Prerequisites
This notebook assumes you have installed the Dimcli library and are familiar with the ‘Getting Started’ tutorial.
[2]:
!pip install dimcli --quiet
import dimcli
from dimcli.utils import *
import json
import sys
import pandas as pd
print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
    # On Google Colab, prompt for the API key interactively
    import getpass
    KEY = getpass.getpass(prompt='API Key: ')
    dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
    # Locally, an empty key makes Dimcli look up credentials in the dsl.ini config file
    KEY = ""
    dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()
Searching config file credentials for 'https://app.dimensions.ai' endpoint..
==
Logging in..
Dimcli - Dimensions API Client (v0.9.6)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.0
Method: dsl.ini file
At a glance
At its simplest, an expert search query looks like this:
[3]:
%%dsl
identify experts from concepts "malaria OR \"effective malaria vaccine\" OR \"effective prevention\""
using publications
where year >= 2015
return experts[basics]
[3]:
<dimcli.DslDataset object #4415239408. Dict keys: '_copyright', '_stats', '_version', 'experts'>
The query takes a list of concepts defining the expertise you’re looking for, plus other parameters defining the pool of publications to be used, and it returns a list of researchers sorted by relevance.
[4]:
pd.DataFrame(dsl_last_results['experts'])
[4]:
| | docs_found | first_name | id | last_name | research_orgs | score | orcid_id |
---|---|---|---|---|---|---|---|
0 | 5 | Martha | ur.01162445502.98 | Sedegah | [grid.4437.4, grid.94365.3d, grid.411439.a, gr... | 124.891458 | NaN |
1 | 5 | James G | ur.01225135650.70 | Beeson | [grid.1002.3, grid.10223.32, grid.33058.3d, gr... | 112.559316 | [0000-0002-1018-7898] |
2 | 4 | Danielle I | ur.01323510115.98 | Stanisic | [grid.1022.1, grid.1049.c, grid.1042.7, grid.1... | 109.252085 | [0000-0003-3908-7468] |
3 | 5 | Kazutoyo | ur.01253714727.65 | Miura | [grid.94365.3d, grid.429651.d, grid.265107.7, ... | 101.304842 | [0000-0003-4455-2432] |
4 | 3 | Michael Francis | ur.0752141120.95 | Good | [grid.1008.9, grid.1043.6, grid.415913.b, grid... | 90.852515 | NaN |
5 | 4 | Jack S | ur.01354757704.29 | Richards | [grid.1056.2, grid.1623.6, grid.416153.4, grid... | 80.522901 | [0000-0001-5786-6989] |
6 | 3 | Michael R | ur.01165702423.17 | Hollingdale | [grid.507680.c, grid.418352.9, grid.265436.0, ... | 80.158875 | NaN |
7 | 3 | Eileen D | ur.0703623237.41 | Villasante | [grid.415913.b] | 80.158875 | NaN |
8 | 4 | Carole A | ur.01153247161.33 | Long | [grid.419681.3, grid.94365.3d, grid.4991.5, gr... | 79.714262 | [0000-0002-3835-5443] |
9 | 3 | Harini D | ur.01066177176.10 | Ganeshan | [grid.415913.b, grid.201075.1] | 77.814179 | NaN |
10 | 3 | Maria N | ur.01211504563.47 | Belmonte | [grid.415913.b, grid.201075.1] | 77.814179 | [0000-0002-4103-6611] |
11 | 3 | Bjoern | ur.07447331037.97 | Peters | [grid.5170.3, grid.266100.3, grid.185006.a, gr... | 77.814179 | [0000-0002-8457-6693] |
12 | 3 | Stefan H I | ur.01214724604.55 | Kappe | [grid.34477.33, grid.412623.0, grid.7849.2, gr... | 76.991466 | NaN |
13 | 3 | Simon J | ur.0751102271.80 | Draper | [grid.4991.5, grid.10253.35, grid.425090.a, gr... | 76.402222 | [0000-0002-9415-1357] |
14 | 3 | Robert W | ur.01136367303.27 | Sauerwein | [grid.475691.8, grid.420155.7, grid.461088.3, ... | 73.216753 | NaN |
15 | 4 | Kenneth D | ur.07564342517.54 | Stuart | [grid.280051.e, grid.418352.9, grid.5252.0, gr... | 69.617699 | NaN |
16 | 3 | Arnel D | ur.01104675207.03 | Belmonte | [grid.415913.b, grid.201075.1] | 68.449397 | NaN |
17 | 2 | Adrian Vivian Sinton | ur.010475203247.10 | Hill | [grid.412587.d, grid.8191.1, grid.418159.0, gr... | 64.351523 | [0000-0003-0900-9629] |
18 | 2 | Ashley Michael | ur.0667763776.52 | Vaughan | [grid.418227.a, grid.53964.3d, grid.34477.33, ... | 61.861446 | [0000-0001-5815-756X] |
19 | 3 | Takafumi | ur.01246255474.14 | Tsuboi | [grid.31501.36, grid.174567.6, grid.267625.2, ... | 59.119142 | [0000-0002-7415-1325] |
Often, though, we start from a piece of text and want to find experts relevant to that text (as opposed to starting from concepts).
In such a case, the expert identification workflow consists of two steps:
1. Concept extraction from text
2. Expert identification using concepts
In the first step, the user extracts concepts from an abstract. The user can review and modify the list of extracted concepts and then feed it into the actual expert identification query. In the following sections we go through these steps in detail.
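The two steps can be chained in a few lines. The sketch below is a minimal illustration, not Dimcli's own implementation: it assumes a hypothetical `run_query` callable that takes a DSL string and returns a dict-like result (in the notebook, something like `lambda q: dsl.query(q)`), and a hypothetical `escape_dsl` helper standing in for Dimcli's `dsl_escape`. The `review` hook is where the concept list can be inspected and edited before step 2.

```python
from typing import Callable, Dict, List


def escape_dsl(text: str) -> str:
    """Hypothetical stand-in for dimcli's dsl_escape: escape backslashes, then double quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def find_experts_for_text(
    text: str,
    run_query: Callable[[str], Dict],
    review: Callable[[List[str]], List[str]] = lambda concepts: concepts,
    top_n: int = 15,
) -> List[Dict]:
    # Step 1: extract concepts from the free text (newlines flattened to spaces)
    flat = " ".join(text.split())
    extracted = run_query(f'extract_concepts("{escape_dsl(flat)}")')["extracted_concepts"]

    # Optional manual review: the caller may drop, add or reorder concepts
    concepts = review(list(extracted[:top_n]))

    # Step 2: OR the concepts so that experts matching any of them are returned
    phrase = " OR ".join(f'"{c}"' for c in concepts)
    query = (
        "identify experts\n"
        f'from concepts "{escape_dsl(phrase)}"\n'
        "using publications\n"
        "return experts[basics]"
    )
    return run_query(query)["experts"]
```

Injecting `run_query` keeps the two-step logic separate from the API connection, so the same function can be exercised with a fake query runner that returns canned results.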
Step 1: Concept Extraction
What are concepts?
Concepts are noun phrases automatically extracted from a document's abstract. The rest of the Dimensions database is used to weight their importance and relevance within the document's field of study (see also the official documentation: searching using concepts).
For instance, the phrases "machine learning" and "neural network" will be considered highly relevant in a computer science paper, while "project" and "study" will receive low relevance scores because they are generic phrases.
Extracting concepts with the DSL
Extracting concepts is implemented using the extract_concepts DSL function. This is the syntax:
extract_concepts("publication abstract")
This query will return a list of extracted concepts, ordered by weight, in descending order. For example:
[5]:
abstract = """We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
centimeters per volt-second can be induced by applying gate voltage.
"""
abstract = abstract.replace("\n", " ")
res = dsl.query(f"""extract_concepts("{abstract}")""")
CONCEPTS = res['extracted_concepts']
pd.DataFrame(CONCEPTS)
[5]:
| | 0 |
---|---|
0 | films |
1 | ambipolar electric field effect |
2 | two-dimensional semimetal |
3 | electric field effects |
4 | room temperature mobility |
5 | conductance band |
6 | field effects |
7 | graphitic films |
8 | centimeters |
9 | gate voltage |
10 | semimetals |
11 | electrons |
12 | atoms |
13 | holes |
14 | square centimeter |
15 | metallic |
16 | ambient conditions |
17 | band |
18 | valence |
19 | voltage |
20 | mobility |
21 | high quality |
22 | overlap |
23 | effect |
24 | conditions |
25 | concentration |
26 | quality |
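The cell above interpolates the abstract directly into the query. That works here, but it would break if the text contained a double quote, since the quote would end the DSL string early. A minimal sketch of a safer builder follows; `escape_dsl` is a hypothetical stand-in for Dimcli's `dsl_escape` (already imported via `from dimcli.utils import *`), and the sketch assumes `extract_concepts` accepts backslash-escaped quotes, as the `from concepts` strings in the examples below do.

```python
def escape_dsl(text: str) -> str:
    """Hypothetical stand-in for dimcli's dsl_escape: escape backslashes, then double quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_extract_concepts_query(text: str) -> str:
    # Collapse newlines and repeated whitespace into single spaces
    flat = " ".join(text.split())
    # Escape the text so embedded quotes cannot close the DSL string literal early
    return f'extract_concepts("{escape_dsl(flat)}")'


query = build_extract_concepts_query('We describe "monocrystalline"\ngraphitic films.')
print(query)  # extract_concepts("We describe \"monocrystalline\" graphitic films.")
```

In the notebook, the resulting string can be passed straight to `dsl.query(...)`.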
Step 2: Expert Identification
The concepts extracted in step 1 can be used in identify experts queries, for example:
identify experts from concepts "+malaria OR \"effective malaria vaccine\" OR \"effective prevention\""
using publications
where research_org_countries is not empty
and year >= 2013
return experts[basics]
limit 20 skip 0
annotate organizational, coauthorship overlap
with ["ur.016204724721.35", "ur.012127355561.32"]
Returned experts are ordered by their relevance.
A few important things to remember:
Sources. Expert identification can use either publications or grants (when not specified, publications are used).
Default connector is AND. When multiple concepts are provided, they are automatically combined into an AND query. To match any of the concepts, add OR connectors explicitly.
Where conditions. It is possible to specify where-filters, but they are not required. The fields available for filtering are exactly the same as in standard search expressions.
Pagination. Similarly, the paging phrase is optional. By default, the top 20 experts are returned; using limit/skip it is possible to retrieve up to a maximum of 200.
Overlap annotations. Annotating results with organizational and/or coauthorship overlap produces an additional JSON object for each identified expert. This object has two parts:
The organizational overlap is a boolean value that is true if the expert and the researchers from the query have the same current research organization.
The coauthorship overlap is the number of documents the expert has coauthored with any of the researchers provided in the query in the last three years.
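The rules above can be encoded in a small query builder. This is a hypothetical helper written for illustration, not part of Dimcli; it reuses the clause order of the example query above, and it assumes that the 200-expert ceiling applies to skip and limit combined. `escape_dsl` again stands in for Dimcli's `dsl_escape`.

```python
import json
from typing import List, Optional


def escape_dsl(text: str) -> str:
    """Hypothetical stand-in for dimcli's dsl_escape: escape backslashes, then double quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_identify_experts_query(
    concepts: str,
    source: str = "publications",
    where: Optional[str] = None,
    limit: int = 20,
    skip: int = 0,
    overlap_with: Optional[List[str]] = None,
) -> str:
    """Assemble an 'identify experts' query following the rules listed above."""
    if source not in ("publications", "grants"):
        raise ValueError("source must be 'publications' or 'grants'")
    # Assumption: the 200-expert ceiling applies to skip + limit combined
    if limit < 1 or skip < 0 or skip + limit > 200:
        raise ValueError("limit/skip must stay within the first 200 experts")
    lines = [
        "identify experts",
        f'from concepts "{escape_dsl(concepts)}"',
        f"using {source}",
    ]
    if where:  # where-filters are optional
        lines.append(f"where {where}")
    lines += ["return experts[basics]", f"limit {limit} skip {skip}"]
    if overlap_with:  # researcher IDs to check for organizational/coauthorship overlap
        lines += [
            "annotate organizational, coauthorship overlap",
            f"with {json.dumps(overlap_with)}",
        ]
    return "\n".join(lines)


print(build_identify_experts_query(
    '+malaria OR "effective malaria vaccine"',
    where="year >= 2013",
    overlap_with=["ur.016204724721.35"],
))
```

`json.dumps` is used for the researcher list because it produces the double-quoted, comma-separated form shown in the example query.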
Example 1. Basic query using concepts
[6]:
# take the top 15 concepts
some_concepts = " ".join(['"%s"' % x for x in CONCEPTS[:15]])
q = f"""
identify experts
from concepts "{dsl_escape(some_concepts)}"
return experts
"""
print("Query:\n======", q)
dsl.query(q).as_dataframe()
Query:
======
identify experts
from concepts "\"films\" \"ambipolar electric field effect\" \"two-dimensional semimetal\" \"electric field effects\" \"room temperature mobility\" \"conductance band\" \"field effects\" \"graphitic films\" \"centimeters\" \"gate voltage\" \"semimetals\" \"electrons\" \"atoms\" \"holes\" \"square centimeter\""
return experts
[6]:
| | docs_found | first_name | id | last_name | research_orgs | score | orcid_id |
---|---|---|---|---|---|---|---|
0 | 1 | Anatoly A | ur.011033016243.08 | Firsov | [grid.4886.2, grid.424048.e, grid.425037.7, gr... | 269.41174 | NaN |
1 | 1 | Da | ur.01146544531.57 | Jiang | [grid.5379.8] | 269.41174 | NaN |
2 | 1 | Sergey V | ur.011535264111.51 | Dubonos | [grid.5254.6, grid.510709.a, grid.5379.8, grid... | 269.41174 | NaN |
3 | 1 | Konstantin Sergeevich | ur.01207120103.29 | Novoselov | [grid.5335.0, grid.423905.9, grid.425037.7, gr... | 269.41174 | [0000-0003-4972-5371] |
4 | 1 | Yuanbo | ur.0657076451.24 | Zhang | [grid.8547.e, grid.30389.31, grid.184769.5, gr... | 269.41174 | [0000-0003-1290-7980] |
5 | 1 | Andre Konstantin | ur.0721730631.45 | Geim | [grid.418975.6, grid.9026.d, grid.12527.33, gr... | 269.41174 | [0000-0003-2861-8331] |
6 | 1 | Sergey V | ur.07423561367.62 | Morozov | [grid.5379.8, grid.9026.d, grid.470117.4, grid... | 269.41174 | [0000-0003-3075-7787] |
7 | 1 | Irina V | ur.0767105504.29 | Grigorieva | [grid.418975.6, grid.500282.d, grid.418751.e, ... | 269.41174 | [0000-0001-5991-7778] |
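Example 2. Query with OR connectors
Note: this example tries to return all expert fields using the syntax experts[all]. As the error output below shows, 'all' is not a valid fieldset for experts; only the fieldsets basics and extras, or individual fields, are available.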
[7]:
some_concepts = " OR ".join(['"%s"' % x for x in CONCEPTS[:15]])
q = f"""
identify experts
from concepts "{dsl_escape(some_concepts)}"
return experts[all]
"""
print("Query:\n======", q)
dsl.query(q).as_dataframe()
Query:
======
identify experts
from concepts "\"films\" OR \"ambipolar electric field effect\" OR \"two-dimensional semimetal\" OR \"electric field effects\" OR \"room temperature mobility\" OR \"conductance band\" OR \"field effects\" OR \"graphitic films\" OR \"centimeters\" OR \"gate voltage\" OR \"semimetals\" OR \"electrons\" OR \"atoms\" OR \"holes\" OR \"square centimeter\""
return experts[all]
1 QueryError found
Semantic errors found:
Field / Fieldset 'all' is not present in Source 'researchers'. Available fields: current_research_org,dimensions_url,first_grant_year,first_name,first_publication_year,id,last_grant_year,last_name,last_publication_year,nih_ppid,obsolete,orcid_id,redirect,research_orgs,total_grants,total_publications and available fieldsets: basics,extras
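A possible fix is to request the two fieldsets listed in the error message instead of 'all'. This sketch assumes the DSL's `fieldset+fieldset` syntax (as in `publications[basics+extras]`) also applies to experts; it has not been run against the API here. `escape_dsl` stands in for Dimcli's `dsl_escape`, and only three concepts are used to keep the example short.

```python
def escape_dsl(text: str) -> str:
    """Hypothetical stand-in for dimcli's dsl_escape: escape backslashes, then double quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


concepts = ["films", "ambipolar electric field effect", "two-dimensional semimetal"]
# OR-join the quoted concepts, as in Example 2
some_concepts = " OR ".join(f'"{c}"' for c in concepts)
# Request the 'basics' and 'extras' fieldsets instead of the invalid 'all'
q = f"""
identify experts
from concepts "{escape_dsl(some_concepts)}"
return experts[basics+extras]
"""
print(q)
```

In the notebook, the resulting query would be run with `dsl.query(q).as_dataframe()` as before.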
Example 3. Query with where filters
[8]:
some_concepts = " ".join(['"%s"' % x for x in CONCEPTS[:10]])
q = f"""identify experts
from concepts "{dsl_escape(some_concepts)}"
using publications
where research_org_countries is not empty
and year >= 2000
and times_cited > 100
return experts
"""
print("Query:\n======", q)
dsl.query(q).as_dataframe()
Query:
====== identify experts
from concepts "\"films\" \"ambipolar electric field effect\" \"two-dimensional semimetal\" \"electric field effects\" \"room temperature mobility\" \"conductance band\" \"field effects\" \"graphitic films\" \"centimeters\" \"gate voltage\""
using publications
where research_org_countries is not empty
and year >= 2000
and times_cited > 100
return experts
[8]:
| | docs_found | first_name | id | last_name | research_orgs | score | orcid_id |
---|---|---|---|---|---|---|---|
0 | 1 | Anatoly A | ur.011033016243.08 | Firsov | [grid.4886.2, grid.424048.e, grid.425037.7, gr... | 204.01543 | NaN |
1 | 1 | Da | ur.01146544531.57 | Jiang | [grid.5379.8] | 204.01543 | NaN |
2 | 1 | Sergey V | ur.011535264111.51 | Dubonos | [grid.5254.6, grid.510709.a, grid.5379.8, grid... | 204.01543 | NaN |
3 | 1 | Konstantin Sergeevich | ur.01207120103.29 | Novoselov | [grid.5335.0, grid.423905.9, grid.425037.7, gr... | 204.01543 | [0000-0003-4972-5371] |
4 | 1 | Yuanbo | ur.0657076451.24 | Zhang | [grid.8547.e, grid.30389.31, grid.184769.5, gr... | 204.01543 | [0000-0003-1290-7980] |
5 | 1 | Andre Konstantin | ur.0721730631.45 | Geim | [grid.418975.6, grid.9026.d, grid.12527.33, gr... | 204.01543 | [0000-0003-2861-8331] |
6 | 1 | Sergey V | ur.07423561367.62 | Morozov | [grid.5379.8, grid.9026.d, grid.470117.4, grid... | 204.01543 | [0000-0003-3075-7787] |
7 | 1 | Irina V | ur.0767105504.29 | Grigorieva | [grid.418975.6, grid.500282.d, grid.418751.e, ... | 204.01543 | [0000-0001-5991-7778] |
Example 4. Adding Overlap Annotations (e.g. for conflict-of-interest checks)
[9]:
overlap_researchers = ["ur.011535264111.51", "ur.011033016243.08", "ur.01207120103.29"]
q = f"""
identify experts
from concepts "{dsl_escape(some_concepts)}"
using publications
where research_org_countries is not empty
and year >= 2000
return experts
annotate coauthorship, organizational overlap
with {json.dumps(overlap_researchers)}
"""
print("Query:\n======", q)
dsl.query(q).as_dataframe()
Query:
======
identify experts
from concepts "\"films\" \"ambipolar electric field effect\" \"two-dimensional semimetal\" \"electric field effects\" \"room temperature mobility\" \"conductance band\" \"field effects\" \"graphitic films\" \"centimeters\" \"gate voltage\""
using publications
where research_org_countries is not empty
and year >= 2000
return experts
annotate coauthorship, organizational overlap
with ["ur.011535264111.51", "ur.011033016243.08", "ur.01207120103.29"]
[9]:
| | docs_found | first_name | id | last_name | research_orgs | score | overlap.coauthorship | overlap.organizational | orcid_id |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | Anatoly A | ur.011033016243.08 | Firsov | [grid.4886.2, grid.424048.e, grid.425037.7, gr... | 204.01543 | 0 | True | NaN |
1 | 1 | Da | ur.01146544531.57 | Jiang | [grid.5379.8] | 204.01543 | 0 | False | NaN |
2 | 1 | Sergey V | ur.011535264111.51 | Dubonos | [grid.5254.6, grid.510709.a, grid.5379.8, grid... | 204.01543 | 0 | True | NaN |
3 | 1 | Konstantin Sergeevich | ur.01207120103.29 | Novoselov | [grid.5335.0, grid.423905.9, grid.425037.7, gr... | 204.01543 | 175 | True | [0000-0003-4972-5371] |
4 | 1 | Yuanbo | ur.0657076451.24 | Zhang | [grid.8547.e, grid.30389.31, grid.184769.5, gr... | 204.01543 | 1 | False | [0000-0003-1290-7980] |
5 | 1 | Andre Konstantin | ur.0721730631.45 | Geim | [grid.418975.6, grid.9026.d, grid.12527.33, gr... | 204.01543 | 26 | False | [0000-0003-2861-8331] |
6 | 1 | Sergey V | ur.07423561367.62 | Morozov | [grid.5379.8, grid.9026.d, grid.470117.4, grid... | 204.01543 | 7 | False | [0000-0003-3075-7787] |
7 | 1 | Irina V | ur.0767105504.29 | Grigorieva | [grid.418975.6, grid.500282.d, grid.418751.e, ... | 204.01543 | 8 | False | [0000-0001-5991-7778] |
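Overlap annotations are most useful as a filter, for example to drop candidate reviewers who may have a conflict of interest. A minimal pandas sketch using the values shown in the output above; treating any coauthorship count above zero, or any organizational overlap, as a conflict is an illustrative policy, not one prescribed by the API.

```python
import pandas as pd

# Values copied from the Example 4 output above (only the columns needed here)
experts = pd.DataFrame({
    "id": ["ur.011033016243.08", "ur.01146544531.57", "ur.011535264111.51", "ur.01207120103.29",
           "ur.0657076451.24", "ur.0721730631.45", "ur.07423561367.62", "ur.0767105504.29"],
    "last_name": ["Firsov", "Jiang", "Dubonos", "Novoselov", "Zhang", "Geim", "Morozov", "Grigorieva"],
    "overlap.coauthorship": [0, 0, 0, 175, 1, 26, 7, 8],
    "overlap.organizational": [True, False, True, True, False, False, False, False],
})

# Keep only experts with no recent co-authored documents and no shared current organization
no_conflict = experts[
    (experts["overlap.coauthorship"] == 0) & ~experts["overlap.organizational"]
]
print(no_conflict[["id", "last_name"]])
```

With a live connection, `experts` would simply be the DataFrame returned by `dsl.query(q).as_dataframe()` in the cell above, which uses the same column names.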
Example 5. Query with MUST/NOT Operators
By default, the string containing a list of concepts is interpreted as a sequence of AND clauses. That is, the query tries to match the highest number of concepts without any preference.
It is possible to specify MUST/NOT rules for concepts by passing them in the string with the + and - operators.
Note: remember that concept phrases (i.e. concepts composed of more than one word) need to be wrapped in quotes, and the quotes need to be escaped with a backslash (\). In the cells below, dsl_escape takes care of the escaping.
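When the MUST and NOT lists come from code (for example after reviewing extracted concepts), the concepts string can be generated rather than typed. A minimal sketch with a hypothetical `compose_concepts` helper and the same concepts as the cell below, grouped by operator; it assumes, as is usual for boolean query syntax, that the order of the terms does not change the result. `escape_dsl` stands in for Dimcli's `dsl_escape`.

```python
from typing import Iterable


def escape_dsl(text: str) -> str:
    """Hypothetical stand-in for dimcli's dsl_escape: escape backslashes, then double quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def compose_concepts(
    must: Iterable[str] = (),
    other: Iterable[str] = (),
    must_not: Iterable[str] = (),
) -> str:
    """Build a concepts string with + (MUST) and - (NOT) operators; every concept is quoted."""
    parts = [f'+"{c}"' for c in must]        # + : the concept must match
    parts += [f'"{c}"' for c in other]       # no operator: default behaviour described above
    parts += [f'-"{c}"' for c in must_not]   # - : the concept must not match
    return " ".join(parts)


concepts = compose_concepts(
    must=["ambipolar electric field effect", "films"],
    other=["electric field effects"],
    must_not=["graphitic films"],
)
print(f'from concepts "{escape_dsl(concepts)}"')
```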
[10]:
concepts = """
+"ambipolar electric field effect"
-"graphitic films"
+"films"
"electric field effects"
"""
q = f"""
identify experts
from concepts "{dsl_escape(concepts)}"
using publications
return experts
"""
print("Query:\n======", q)
dsl.query(q).as_dataframe()
Query:
======
identify experts
from concepts "
+\"ambipolar electric field effect\"
-\"graphitic films\"
+\"films\"
\"electric field effects\"
"
using publications
return experts
[10]:
| | docs_found | first_name | id | last_name | orcid_id | research_orgs | score |
---|---|---|---|---|---|---|---|
0 | 1 | Luc | ur.01005576245.93 | Henrard | [0000-0002-2564-1221] | [grid.5284.b, grid.121334.6, grid.6520.1] | 51.435350 |
1 | 1 | Sylvain | ur.01251242035.86 | Latil | NaN | [grid.14095.39, grid.462531.7, grid.457336.0, ... | 51.435350 |
2 | 1 | Paul | ur.01000623240.81 | Syers | NaN | [grid.164295.d] | 44.719536 |
3 | 1 | Nicholas Patrick | ur.01046736440.46 | Butch | [0000-0002-6083-8388] | [grid.8547.e, grid.507868.4, grid.94225.38, gr... | 44.719536 |
4 | 1 | John-Pierre | ur.01060352233.12 | Paglione | NaN | [grid.8547.e, grid.507868.4, grid.440050.5, gr... | 44.719536 |
5 | 1 | Michael Sears | ur.01200656557.13 | Fuhrer | [0000-0001-6183-2773] | [grid.184769.5, grid.1002.3, grid.499241.3, gr... | 44.719536 |
6 | 1 | Dohun | ur.01205352017.54 | Kim | [0000-0001-9687-2089] | [grid.14003.36, grid.35541.36, grid.15444.30, ... | 44.719536 |
7 | 1 | Victor V | ur.01025667341.62 | Sysoev | [0000-0002-0372-1802] | [grid.446088.6, grid.263856.c, grid.78837.33, ... | 38.569305 |
8 | 1 | Mikhail A | ur.01245543252.06 | Shekhirev | [0000-0002-8381-1276] | [grid.14476.30, grid.24434.35, grid.166341.7] | 38.569305 |
9 | 1 | Alexey | ur.01276657166.76 | Lipatov | [0000-0001-5043-1616] | [grid.14476.30, grid.426324.5, grid.10420.37, ... | 38.569305 |
10 | 1 | Andrey V | ur.013212454037.49 | Lashkov | [0000-0001-6794-8523] | [grid.78837.33] | 38.569305 |
11 | 1 | Angel | ur.014430675711.42 | Torres | NaN | [grid.24434.35] | 38.569305 |
12 | 1 | Nataliia S | ur.016560200577.43 | Vorobeva | NaN | [grid.24434.35] | 38.569305 |
13 | 1 | Alexander S | ur.0646414360.09 | Sinitskii | [0000-0002-8688-3451] | [grid.24434.35, grid.170430.1, grid.1957.a, gr... | 38.569305 |
Example 6. MUST together with AND/OR
[11]:
concepts = """
(+"ambipolar electric field effect" -"graphitic films") OR
(+"films" -"electric field effects")
"""
q = f"""
identify experts
from concepts "{dsl_escape(concepts)}"
using publications
return experts
"""
print("Query:\n======", q)
dsl.query(q).as_dataframe()
Query:
======
identify experts
from concepts "
(+\"ambipolar electric field effect\" -\"graphitic films\") OR
(+\"films\" -\"electric field effects\")
"
using publications
return experts
[11]:
| | docs_found | first_name | id | last_name | orcid_id | research_orgs | score |
---|---|---|---|---|---|---|---|
0 | 3 | Pablo | ur.01034030721.03 | Jarillo-Herrero | [0000-0001-8217-8213] | [grid.159791.2, grid.5338.d, grid.116068.8, gr... | 78.260747 |
1 | 3 | Young Sang | ur.01342755473.89 | Lee | NaN | [grid.69566.3a, grid.94225.38, grid.507868.4, ... | 78.260747 |
2 | 3 | Lan | ur.014670440227.86 | Wang | [0000-0001-7124-2718] | [grid.418788.a, grid.1007.6, grid.17635.36, gr... | 75.645182 |
3 | 3 | Shun-Qing | ur.0624630056.98 | Shen | [0000-0002-1954-5882] | [grid.8547.e, grid.450298.2, grid.464262.0, gr... | 75.645182 |
4 | 3 | Alexander S | ur.0646414360.09 | Sinitskii | [0000-0002-8688-3451] | [grid.24434.35, grid.170430.1, grid.1957.a, gr... | 69.488090 |
5 | 2 | Peng | ur.01150036175.42 | Ren | NaN | [grid.59025.3b] | 51.162295 |
6 | 2 | Azat | ur.056250446.77 | Sulaev | NaN | [grid.59025.3b] | 51.162295 |
7 | 2 | Bin | ur.0756673070.05 | Xia | NaN | [grid.59025.3b] | 51.162295 |
8 | 2 | James Mitchell | ur.01275626274.52 | Tour | [0000-0002-8479-9328] | [grid.264756.4, grid.21940.3e, grid.254567.7, ... | 49.416673 |
9 | 2 | Christian F | ur.01010600302.93 | Kisielowski | NaN | [grid.184769.5, grid.8385.6, grid.469490.6, gr... | 47.051859 |
10 | 2 | Andrey A | ur.01022425321.95 | Turchanin | [0000-0003-2388-1042] | [grid.7491.b, grid.4764.1, grid.35043.31, grid... | 47.051859 |
11 | 2 | Joachim | ur.01161437031.05 | Mayer | [0000-0003-3292-5342] | [grid.5719.a, grid.419534.e, grid.4372.2, grid... | 47.051859 |
12 | 2 | Konstantin B | ur.01163755245.41 | Efetov | [0000-0003-2245-1366] | [grid.21729.3f, grid.423485.c, grid.411233.6, ... | 47.051859 |
13 | 2 | Armin | ur.01172120354.34 | Gölzhäuser | [0000-0002-0838-9028] | [grid.1957.a, grid.419547.a, grid.414703.5, gr... | 47.051859 |
14 | 2 | Thomas | ur.0704114136.03 | Weimann | NaN | [grid.7491.b, grid.10392.39, grid.4764.1, grid... | 47.051859 |
15 | 2 | Mikhail V | ur.0770321167.94 | Fistul | NaN | [grid.169077.e, grid.5330.5, grid.4886.2, grid... | 47.051859 |
16 | 2 | Chinthamani Nagesa Ramachandra | ur.01301431054.98 | Rao | [0000-0003-4088-0615] | [grid.8155.9, grid.8954.0, grid.417965.8, grid... | 45.842103 |
17 | 2 | Kota Surya | ur.0634451611.05 | Subrahmanyam | NaN | [grid.16753.36, grid.472491.d, grid.419636.f, ... | 45.842103 |
18 | 2 | H S S Ramakrishna | ur.0726377173.88 | Matte | [0000-0001-8279-8447] | [grid.7468.d, grid.16753.36, grid.472491.d, gr... | 45.842103 |
19 | 2 | Ashok K | ur.01150320016.45 | Mulchandani | [0000-0002-2831-4154] | [grid.30389.31, grid.215654.1, grid.24805.3b, ... | 33.572757 |
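Example 6's concept string can also be generated from a list of (must, must_not) groups. The helpers `group_clause` and `or_groups` are hypothetical, shown only to make the structure explicit; the output matches the string used above apart from line breaks, and it would still need `dsl_escape` before being placed in a query.

```python
from typing import List, Sequence, Tuple


def group_clause(must: Sequence[str], must_not: Sequence[str]) -> str:
    """One parenthesised group: +"..." terms (MUST) followed by -"..." terms (NOT)."""
    terms = [f'+"{c}"' for c in must] + [f'-"{c}"' for c in must_not]
    return "(" + " ".join(terms) + ")"


def or_groups(groups: List[Tuple[Sequence[str], Sequence[str]]]) -> str:
    """Join several groups with OR, as in Example 6."""
    return " OR ".join(group_clause(must, must_not) for must, must_not in groups)


concepts = or_groups([
    (["ambipolar electric field effect"], ["graphitic films"]),
    (["films"], ["electric field effects"]),
])
print(concepts)
```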
Example 7. Wildcard searches
[12]:
concepts = """temperat* "ray diffraction" -magnet* """
q = f"""
identify experts
from concepts "{dsl_escape(concepts)}"
using publications
return experts
"""
print("Query:\n======", q)
dsl.query(q).as_dataframe()
Query:
======
identify experts
from concepts "temperat* \"ray diffraction\" -magnet* "
using publications
return experts
[12]:
| | docs_found | first_name | id | last_name | research_orgs | score | orcid_id |
---|---|---|---|---|---|---|---|
0 | 4 | Akinori | ur.07620725665.51 | Katsui | [grid.26999.3d, grid.69566.3a, grid.265061.6, ... | 45.133475 | NaN |
1 | 3 | Andrey V | ur.010274015357.59 | Khoroshilov | [grid.435216.7, grid.431939.5] | 34.575968 | [0000-0002-0678-1421] |
2 | 3 | Konstantin S | ur.014606545157.85 | Gavrichev | [grid.435216.7] | 34.575968 | [0000-0001-5304-3555] |
3 | 3 | Paul | ur.014146743075.39 | Hagenmuller | [grid.4795.f, grid.463879.7, grid.411840.8, gr... | 34.065056 | NaN |
4 | 3 | Yi-Tai | ur.01261545713.97 | Qian | [grid.503014.3, grid.27255.37, grid.12527.33, ... | 33.943935 | NaN |
5 | 3 | Jean Pierre | ur.012446305716.07 | Chaminade | [grid.4444.0, grid.461891.3, grid.5292.c, grid... | 33.930549 | NaN |
6 | 2 | Tatyana V | ur.011457114721.52 | Dyachkova | [grid.426536.0, grid.465372.1, grid.446087.9] | 23.329706 | [0000-0001-6204-797X] |
7 | 2 | Sergey A | ur.015627070115.78 | Gromilov | [grid.4886.2, grid.4605.7, grid.415877.8, grid... | 23.329706 | NaN |
8 | 2 | Elena V | ur.01264404625.74 | Boldyreva | [grid.4605.7, grid.424048.e, grid.4708.b, grid... | 23.184193 | [0000-0002-1401-2438] |
9 | 2 | Alexander P | ur.015443160631.46 | Tyutyunnik | [grid.426536.0, grid.4886.2, grid.10548.38, gr... | 23.063437 | [0000-0003-1360-0913] |
10 | 2 | Steeve | ur.010407434762.78 | Rousselot | [grid.14848.31, grid.418084.1] | 22.955062 | NaN |
11 | 2 | David | ur.014245141470.20 | Aymé-Perrot | [grid.424348.d] | 22.955062 | NaN |
12 | 2 | Mickael | ur.016103556371.52 | Dollé | [grid.419552.e, grid.463728.c, grid.184769.5, ... | 22.955062 | [0000-0002-8887-6730] |
13 | 2 | Marc | ur.016131736155.35 | Bertrand | [grid.14848.31] | 22.955062 | NaN |
14 | 2 | Mitsuru | ur.013216052361.31 | Itoh | [grid.208504.b, grid.62167.34, grid.267827.e, ... | 22.942325 | [0000-0001-6457-9152] |
15 | 2 | Takemasa | ur.010313134742.30 | Fujino | [grid.263518.b] | 22.932349 | NaN |
16 | 2 | Morinobu | ur.01117474603.87 | Endo | [grid.5268.9, grid.417799.5, grid.136304.3, gr... | 22.932349 | NaN |
17 | 2 | Mildred S | ur.01226407050.99 | Dresselhaus | [grid.300943.d, grid.250008.f, grid.5335.0, gr... | 22.932349 | NaN |
18 | 2 | Chan | ur.014164432033.81 | Kim | [grid.263518.b] | 22.932349 | NaN |
19 | 2 | Guang-Xiang | ur.012216352601.27 | Liu | [grid.411412.3, grid.467196.b, grid.136593.b, ... | 22.894837 | [0000-0002-4742-6194] |
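To build intuition for what a trailing * does, the snippet below mimics prefix wildcards locally with Python's `fnmatch`, matching word by word. This is only an approximation made for illustration: how Dimensions tokenizes and matches concepts server-side (stemming, hyphen handling, phrase matching) is not specified here and may differ. The candidate strings are made up for the example.

```python
from fnmatch import fnmatchcase


def matches_wildcard(concept: str, pattern: str) -> bool:
    """True if any word of the concept matches the wildcard pattern (case-insensitive)."""
    return any(fnmatchcase(word, pattern.lower()) for word in concept.lower().split())


# Hypothetical concept strings, made up purely for illustration
candidates = ["temperature", "low temperature", "temperate climate",
              "magnetic field", "magnetism", "x-ray diffraction"]

included = [c for c in candidates if matches_wildcard(c, "temperat*")]
excluded = [c for c in candidates if matches_wildcard(c, "magnet*")]
print(included)  # ['temperature', 'low temperature', 'temperate climate']
print(excluded)  # ['magnetic field', 'magnetism']
```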
Additional resources: shortcut functions included in Dimcli
Dimcli includes a number of ‘shortcut’ Python functions that make it easier to work with the expert identification API.
[13]:
from dimcli.functions import extract_concepts, identify_experts, build_reviewers_matrix
extract_concepts
A Python wrapper for the DSL function extract_concepts (see source).
Extracts concepts from any text. The input text is processed, and the extracted concepts are returned as an array of strings ordered by their relevance.
[14]:
%%extract_concepts
We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
centimeters per volt-second can be induced by applying gate voltage.
[14]:
| | concept | relevance |
---|---|---|
0 | square centimeter | 0.681 |
1 | films | 0.669 |
2 | ambipolar electric field effect | 0.653 |
3 | two-dimensional semimetal | 0.646 |
4 | electric field effects | 0.628 |
5 | room temperature mobility | 0.621 |
6 | conductance band | 0.601 |
7 | graphitic films | 0.596 |
8 | field effects | 0.596 |
9 | centimeters | 0.587 |
10 | gate voltage | 0.587 |
11 | semimetals | 0.582 |
12 | electrons | 0.576 |
13 | atoms | 0.549 |
14 | holes | 0.542 |
15 | ambient conditions | 0.500 |
16 | band | 0.499 |
17 | valence | 0.471 |
18 | voltage | 0.464 |
19 | mobility | 0.406 |
20 | high quality | 0.390 |
21 | overlap | 0.367 |
22 | effect | 0.323 |
23 | conditions | 0.293 |
24 | concentration | 0.264 |
25 | quality | 0.214 |
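A common follow-up is to keep only the most relevant concepts before running the expert search. A minimal pandas sketch using the first ten rows shown above; the 0.6 cut-off is arbitrary and only for illustration.

```python
import pandas as pd

# First ten rows of the extract_concepts output above
concepts = pd.DataFrame({
    "concept": ["square centimeter", "films", "ambipolar electric field effect",
                "two-dimensional semimetal", "electric field effects", "room temperature mobility",
                "conductance band", "graphitic films", "field effects", "centimeters"],
    "relevance": [0.681, 0.669, 0.653, 0.646, 0.628, 0.621, 0.601, 0.596, 0.596, 0.587],
})

# Keep concepts at or above an arbitrary relevance cut-off
selected = concepts.loc[concepts["relevance"] >= 0.6, "concept"].tolist()
# OR-join them so that experts matching any selected concept are returned
concept_string = " OR ".join(f'"{c}"' for c in selected)
print(selected)
print(concept_string)
```

The resulting string would still be passed through `dsl_escape` before going into an identify experts query, as in the examples above.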
identify_experts
A Python wrapper for the full expert identification workflow (see source).
This wrapper provides a simpler version of the expert identification API. It is meant to be a convenient alternative for basic queries. For more options, it is advisable to use the API directly.
[15]:
%%identify_experts
We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
centimeters per volt-second can be induced by applying gate voltage.
[15]:
| | docs_found | first_name | first_publication_year | id | last_name | orcid_id | score | total_grants | total_publications | dimensions_url |
---|---|---|---|---|---|---|---|---|---|---|
0 | 17 | Daichi | 2000 | ur.01203703171.12 | Chiba | [0000-0002-6631-5131] | 720.802273 | 14 | 226 | https://app.dimensions.ai/discover/publication... |
1 | 12 | Ze Don | 1983 | ur.01055006635.53 | Kvon | NaN | 564.498955 | 16 | 367 | https://app.dimensions.ai/discover/publication... |
2 | 14 | Nobuhiro | 1976 | ur.011513332561.53 | Ohta | NaN | 547.722462 | 23 | 229 | https://app.dimensions.ai/discover/publication... |
3 | 12 | Tomohiro | 2008 | ur.01311211105.43 | Koyama | [0000-0003-4796-1776] | 497.447215 | 3 | 111 | https://app.dimensions.ai/discover/publication... |
4 | 10 | Teruo | 1993 | ur.012735754655.38 | Ono | NaN | 407.495159 | 29 | 486 | https://app.dimensions.ai/discover/publication... |
5 | 7 | Pablo | 1999 | ur.01034030721.03 | Jarillo-Herrero | [0000-0001-8217-8213] | 402.322615 | 7 | 276 | https://app.dimensions.ai/discover/publication... |
6 | 9 | Kenji | 1987 | ur.010575643400.34 | Watanabe | [0000-0003-3701-8119] | 360.808606 | 13 | 2694 | https://app.dimensions.ai/discover/publication... |
7 | 9 | Takashi | 1989 | ur.0765715521.02 | Taniguchi | NaN | 360.808606 | 24 | 2874 | https://app.dimensions.ai/discover/publication... |
8 | 8 | Eugene | 1989 | ur.0740560235.48 | Olshanetsky | [0000-0001-7027-9084] | 357.604337 | 0 | 98 | https://app.dimensions.ai/discover/publication... |
9 | 8 | Takahiro | 2002 | ur.014407221755.12 | Moriyama | [0000-0001-7071-0823] | 313.140915 | 10 | 181 | https://app.dimensions.ai/discover/publication... |
10 | 6 | Nikolai N | 2005 | ur.012411463367.07 | Mikhailov | NaN | 289.726385 | 1 | 227 | https://app.dimensions.ai/discover/publication... |
11 | 6 | Sergey A | 1992 | ur.014356370677.99 | Dvoretsky | NaN | 273.726251 | 0 | 272 | https://app.dimensions.ai/discover/publication... |
12 | 7 | Fuyuki | 2016 | ur.013563236015.42 | Ando | NaN | 273.314873 | 0 | 18 | https://app.dimensions.ai/discover/publication... |
13 | 7 | Kamlesh | 2008 | ur.01166776577.55 | Awasthi | [0000-0001-7852-059X] | 269.313042 | 0 | 48 | https://app.dimensions.ai/discover/publication... |
14 | 6 | Masashi | 2012 | ur.016465573056.75 | Kawaguchi | [0000-0001-5907-9137] | 244.958344 | 5 | 37 | https://app.dimensions.ai/discover/publication... |
15 | 6 | Kihiro T | 2013 | ur.016551140015.09 | Yamada | NaN | 239.089080 | 0 | 19 | https://app.dimensions.ai/discover/publication... |
16 | 6 | Haruka | 2013 | ur.010535542231.81 | Kakizakai | NaN | 232.707798 | 0 | 12 | https://app.dimensions.ai/discover/publication... |
17 | 6 | Myeongkyu | 1997 | ur.014732046553.91 | Lee | NaN | 226.825885 | 0 | 125 | https://app.dimensions.ai/discover/publication... |
18 | 3 | Young Sang | 1995 | ur.01342755473.89 | Lee | NaN | 220.422560 | 0 | 156 | https://app.dimensions.ai/discover/publication... |
19 | 4 | Alexander V | 1977 | ur.0632644662.66 | Chaplik | NaN | 211.387046 | 7 | 171 | https://app.dimensions.ai/discover/publication... |
Build a reviewers matrix
Generates a matrix of candidate reviewers for abstracts, using the expert identification workflow (see source).
If the input abstracts include identifiers, those are used in the resulting matrix. Alternatively, passing a simple list of strings results in a matrix whose identifiers are auto-generated from the order of the abstracts (the first one is 1, and so on).
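The identifier rule can be stated as a small function. This is a hypothetical helper written to make the rule concrete, not Dimcli's code; in particular, whether auto-generated ids are integers or strings in the real matrix is an assumption (integers here).

```python
from typing import Dict, List, Union

Abstract = Union[str, Dict[str, str]]


def normalize_abstracts(abstracts: List[Abstract]) -> List[Dict[str, Union[int, str]]]:
    """Hypothetical helper: attach ids to abstracts following the rule described above."""
    normalized = []
    for position, item in enumerate(abstracts, start=1):
        if isinstance(item, str):
            # Plain string: the id is its 1-based position in the input list
            normalized.append({"id": position, "text": item})
        else:
            # Dict input: keep the caller-supplied identifier
            normalized.append({"id": item["id"], "text": item["text"]})
    return normalized


print(normalize_abstracts(["First abstract", "Second abstract"]))
print(normalize_abstracts([{"id": "A1", "text": "First abstract"}]))
```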
[16]:
abstracts = [
{
'id' : 'A1',
'text' : """We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
centimeters per volt-second can be induced by applying gate voltage."""
},
{
'id' : "A2",
'text' : """The physicochemical properties of a molecule-metal interface, in principle, can play a significant role in tuning the electronic properties
of organic devices. In this report, we demonstrate an electrode engineering approach in a robust, reproducible molecular memristor that
enables a colossal tunability in both switching voltage (from 130 mV to 4 V i.e. >2500% variation) and current (by ~6 orders of magnitude).
This provides a spectrum of device design parameters that can be “dialed-in” to create fast, scalable and ultralow energy organic
memristors optimal for applications spanning digital memory, logic circuits and brain-inspired computing."""
}
]
[17]:
candidates = ["ur.01146544531.57", "ur.011535264111.51", "ur.0767105504.29", "ur.011513332561.53", "ur.01055006635.53"]
[18]:
build_reviewers_matrix(abstracts, candidates, verbose=False)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.95s/it]
[18]:
| | researcher | A1 | A2 |
---|---|---|---|
0 | ur.01146544531.57 | 0.000000 | 0.000000 |
1 | ur.011535264111.51 | 500.057833 | 237.479195 |
2 | ur.0767105504.29 | 860.072228 | 924.316053 |
3 | ur.011513332561.53 | 3235.742721 | 1140.205152 |
4 | ur.01055006635.53 | 2518.152591 | 1183.936190 |
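Once the matrix is available, picking reviewers is a standard pandas operation. A minimal sketch using the scores shown above; in the notebook, the DataFrame returned by `build_reviewers_matrix` would be used instead, assuming it has the same researcher/A1/A2 layout as the output shown.

```python
import pandas as pd

# Scores copied from the reviewers matrix above
matrix = pd.DataFrame({
    "researcher": ["ur.01146544531.57", "ur.011535264111.51", "ur.0767105504.29",
                   "ur.011513332561.53", "ur.01055006635.53"],
    "A1": [0.000000, 500.057833, 860.072228, 3235.742721, 2518.152591],
    "A2": [0.000000, 237.479195, 924.316053, 1140.205152, 1183.936190],
}).set_index("researcher")

# Highest-scoring candidate for each abstract
best = matrix.idxmax()
# Top two candidates per abstract, e.g. when each abstract needs two reviewers
top_two = {abstract: matrix[abstract].nlargest(2).index.tolist() for abstract in matrix.columns}
print(best)
print(top_two)
```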
Note
The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.