
Expert Identification with the Dimensions API - An Introduction

This notebook shows how to use the expert identification workflow available via the Dimensions Analytics API.

[1]:
import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))
==
CHANGELOG
This notebook was last run on Jan 25, 2022
==

Prerequisites

This notebook assumes you have installed the Dimcli library and are familiar with the ‘Getting Started’ tutorial.

[2]:
!pip install dimcli --quiet

import dimcli
from dimcli.utils import *

import json
import sys
import pandas as pd

print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  KEY = getpass.getpass(prompt='API Key: ')
  dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
  KEY = ""
  dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()
Searching config file credentials for 'https://app.dimensions.ai' endpoint..
==
Logging in..
Dimcli - Dimensions API Client (v0.9.6)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.0
Method: dsl.ini file

At a glance

At its simplest, an expert search query looks like this:

[3]:
%%dsl

identify experts from concepts "malaria OR \"effective malaria vaccine\" OR \"effective prevention\""
      using publications
      where year >= 2015
return experts[basics]
[3]:
<dimcli.DslDataset object #4415239408. Dict keys: '_copyright', '_stats', '_version', 'experts'>

The query takes a list of concepts defining the expertise you’re looking for, plus other parameters defining the pool of publications to be used, and it returns a list of researchers sorted by relevance.

[4]:
pd.DataFrame(dsl_last_results['experts'])
[4]:
docs_found first_name id last_name research_orgs score orcid_id
0 5 Martha ur.01162445502.98 Sedegah [grid.4437.4, grid.94365.3d, grid.411439.a, gr... 124.891458 NaN
1 5 James G ur.01225135650.70 Beeson [grid.1002.3, grid.10223.32, grid.33058.3d, gr... 112.559316 [0000-0002-1018-7898]
2 4 Danielle I ur.01323510115.98 Stanisic [grid.1022.1, grid.1049.c, grid.1042.7, grid.1... 109.252085 [0000-0003-3908-7468]
3 5 Kazutoyo ur.01253714727.65 Miura [grid.94365.3d, grid.429651.d, grid.265107.7, ... 101.304842 [0000-0003-4455-2432]
4 3 Michael Francis ur.0752141120.95 Good [grid.1008.9, grid.1043.6, grid.415913.b, grid... 90.852515 NaN
5 4 Jack S ur.01354757704.29 Richards [grid.1056.2, grid.1623.6, grid.416153.4, grid... 80.522901 [0000-0001-5786-6989]
6 3 Michael R ur.01165702423.17 Hollingdale [grid.507680.c, grid.418352.9, grid.265436.0, ... 80.158875 NaN
7 3 Eileen D ur.0703623237.41 Villasante [grid.415913.b] 80.158875 NaN
8 4 Carole A ur.01153247161.33 Long [grid.419681.3, grid.94365.3d, grid.4991.5, gr... 79.714262 [0000-0002-3835-5443]
9 3 Harini D ur.01066177176.10 Ganeshan [grid.415913.b, grid.201075.1] 77.814179 NaN
10 3 Maria N ur.01211504563.47 Belmonte [grid.415913.b, grid.201075.1] 77.814179 [0000-0002-4103-6611]
11 3 Bjoern ur.07447331037.97 Peters [grid.5170.3, grid.266100.3, grid.185006.a, gr... 77.814179 [0000-0002-8457-6693]
12 3 Stefan H I ur.01214724604.55 Kappe [grid.34477.33, grid.412623.0, grid.7849.2, gr... 76.991466 NaN
13 3 Simon J ur.0751102271.80 Draper [grid.4991.5, grid.10253.35, grid.425090.a, gr... 76.402222 [0000-0002-9415-1357]
14 3 Robert W ur.01136367303.27 Sauerwein [grid.475691.8, grid.420155.7, grid.461088.3, ... 73.216753 NaN
15 4 Kenneth D ur.07564342517.54 Stuart [grid.280051.e, grid.418352.9, grid.5252.0, gr... 69.617699 NaN
16 3 Arnel D ur.01104675207.03 Belmonte [grid.415913.b, grid.201075.1] 68.449397 NaN
17 2 Adrian Vivian Sinton ur.010475203247.10 Hill [grid.412587.d, grid.8191.1, grid.418159.0, gr... 64.351523 [0000-0003-0900-9629]
18 2 Ashley Michael ur.0667763776.52 Vaughan [grid.418227.a, grid.53964.3d, grid.34477.33, ... 61.861446 [0000-0001-5815-756X]
19 3 Takafumi ur.01246255474.14 Tsuboi [grid.31501.36, grid.174567.6, grid.267625.2, ... 59.119142 [0000-0002-7415-1325]

Often though, we start from some text and want to find experts relevant to that text (as opposed to starting from concepts).

The expert identification workflow, in such a case, consists of two steps:

  1. Concept extraction from text

  2. Expert identification using concepts

In the first step, the user extracts concepts from an abstract. The user can then review and modify the list of extracted concepts before feeding it into the expert identification step. In the following sections we go through these steps in detail.
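The two steps can be chained in plain Python. The following is a minimal sketch, not Dimcli's implementation: experts_from_text, _quote and the run_query callable are hypothetical names. run_query stands for any function that sends a DSL string and returns a response with dictionary-style access, for example lambda q: dsl.query(q), since the results in this notebook are read as res['extracted_concepts'] and ['experts']. The escaping assumes the DSL uses backslash escapes, as the printed queries later in this notebook suggest.

```python
from typing import Callable, Dict, List

# Hypothetical type: sends a DSL query string, returns the JSON response
RunQuery = Callable[[str], Dict]


def _quote(s: str) -> str:
    """Escape backslashes and double quotes so s fits inside a DSL string."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def experts_from_text(
    text: str,
    run_query: RunQuery,
    review: Callable[[List[str]], List[str]] = lambda concepts: concepts[:15],
) -> List[Dict]:
    # Step 1: concept extraction (newlines collapsed, as in the notebook)
    flat = " ".join(text.split())
    response = run_query(f'extract_concepts("{_quote(flat)}")')
    extracted = response["extracted_concepts"]

    # Review step: by default keep the 15 top-ranked concepts; pass your
    # own function to add, drop or reorder concepts by hand
    selected = review(extracted)

    # Step 2: expert identification, matching ANY of the selected concepts
    expression = " OR ".join(f'"{c}"' for c in selected)
    query = (
        f'identify experts from concepts "{_quote(expression)}" '
        "using publications return experts"
    )
    return run_query(query)["experts"]
```

In the notebook this could be called as experts_from_text(abstract, lambda q: dsl.query(q)). Passing the review step as a function keeps the human check between the two steps explicit.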

Step 1: Concept Extraction

What are concepts?

Concepts are noun phrases automatically extracted from a document's abstract. The rest of the Dimensions database is used to weight their importance and relevance within the document's field of study (see also the official documentation: searching using concepts).

For instance, the phrases ‘machine learning’ and ‘neural network’ will be considered very relevant in a computer science paper, while ‘project’ and ‘study’ will get low relevance scores because they are generic phrases.

Extracting concepts with the DSL

Extracting concepts is implemented using the extract_concepts DSL function. This is the syntax:

extract_concepts("publication abstract")

This query will return a list of extracted concepts, ordered by weight, in descending order. For example:

[5]:
abstract = """We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
centimeters per volt-second can be induced by applying gate voltage.
"""

abstract = abstract.replace("\n", " ")

res = dsl.query(f"""extract_concepts("{abstract}")""")

CONCEPTS = res['extracted_concepts']

pd.DataFrame(CONCEPTS)
[5]:
0
0 films
1 ambipolar electric field effect
2 two-dimensional semimetal
3 electric field effects
4 room temperature mobility
5 conductance band
6 field effects
7 graphitic films
8 centimeters
9 gate voltage
10 semimetals
11 electrons
12 atoms
13 holes
14 square centimeter
15 metallic
16 ambient conditions
17 band
18 valence
19 voltage
20 mobility
21 high quality
22 overlap
23 effect
24 conditions
25 concentration
26 quality
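The cell above replaces newlines before interpolating the abstract into the query. A double quote inside the text would also end the DSL string early, so arbitrary text needs escaping too. A minimal sketch (the helper name extract_concepts_query is hypothetical, and the escaping rules assume the DSL uses backslash escapes, as the printed queries later in this notebook suggest):

```python
def extract_concepts_query(text: str) -> str:
    """Build an extract_concepts(...) DSL query for arbitrary text (sketch)."""
    # Collapse newlines and repeated whitespace into single spaces
    flat = " ".join(text.split())
    # Escape backslashes first, then double quotes, so the text cannot
    # terminate the DSL string literal early
    escaped = flat.replace("\\", "\\\\").replace('"', '\\"')
    return f'extract_concepts("{escaped}")'


print(extract_concepts_query('We describe "graphitic"\nfilms.'))
```

With this helper the cell above would become res = dsl.query(extract_concepts_query(abstract)).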

Step 2: Expert Identification

Concepts extracted in step one can be used in identify experts queries, for example:

identify experts from concepts "+malaria OR \"effective malaria vaccine\" OR \"effective prevention\""
      using publications
      where research_org_countries is not empty
          and year >= 2013
return experts[basics]
      limit 20 skip 0
      annotate organizational, coauthorship overlap
          with ["ur.016204724721.35", "ur.012127355561.32"]

Returned experts are ordered by their relevance.

A few important things to remember:

  1. Sources. Expert identification can use either publications or grants (when not specified, publications are used).

  2. Default connector is AND. When multiple concepts are provided, these are transformed automatically into an AND query. To match any of the concepts, one should then explicitly add OR connectors.

  3. Where conditions. It is possible to specify where-filters but that’s not required. Fields available for filtering are exactly the same as the ones in standard search expressions.

  4. Pagination. Similarly, the paging phrase is optional. By default, the top 20 experts are returned; using limit/skip, up to a maximum of 200 experts can be retrieved.

  5. Overlap annotations. Annotating results with organizational and/or coauthorship overlap adds an extra JSON object to each identified expert. This object has two parts:

    • Organizational overlap is a boolean value that is true if the expert and the researchers from the query have the same current research organization.

    • Coauthorship overlap is the number of documents the expert has coauthored with any of the researchers provided in the query in the last three years.
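The rules above can be captured in a small query builder. This is a hypothetical helper, not part of Dimcli. It assumes concepts contain no double quotes, and it reads the 200-expert maximum as applying to skip + limit, which is our interpretation of the pagination note rather than a documented rule.

```python
from typing import List, Optional, Sequence

# Our reading of the pagination note: at most 200 experts in total
MAX_EXPERTS = 200


def build_identify_experts_query(
    concepts: Sequence[str],
    connector: str = "AND",
    source: str = "publications",
    where: Optional[str] = None,
    limit: int = 20,
    skip: int = 0,
    overlap_with: Optional[List[str]] = None,
) -> str:
    """Assemble an identify experts query (hypothetical helper, sketch only)."""
    if connector not in ("AND", "OR"):
        raise ValueError("connector must be 'AND' or 'OR'")
    if source not in ("publications", "grants"):
        raise ValueError("source must be 'publications' or 'grants'")
    if limit < 1 or skip < 0 or skip + limit > MAX_EXPERTS:
        raise ValueError(f"need limit >= 1, skip >= 0, skip + limit <= {MAX_EXPERTS}")

    # AND is the default, so whitespace is enough; OR must be written out.
    # Concepts are wrapped in escaped quotes (assumes no quotes inside them).
    separator = " " if connector == "AND" else " OR "
    expression = separator.join(f'\\"{c}\\"' for c in concepts)

    lines = [f'identify experts from concepts "{expression}"', f"using {source}"]
    if where:  # where-filters are optional
        lines.append(f"where {where}")
    lines.append("return experts[basics]")
    lines.append(f"limit {limit} skip {skip}")
    if overlap_with:  # overlap annotations are optional too
        ids = ", ".join(f'"{i}"' for i in overlap_with)
        lines.append(f"annotate organizational, coauthorship overlap with [{ids}]")
    return "\n".join(lines)


print(build_identify_experts_query(
    ["malaria", "effective malaria vaccine"],
    connector="OR",
    where="year >= 2013",
    overlap_with=["ur.016204724721.35", "ur.012127355561.32"],
))
```

The printed string follows the shape of the example query above and could be passed to dsl.query as is.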

Example 1. Basic query using concepts

[6]:
# take the top 15 concepts
some_concepts = " ".join(['"%s"' % x for x in CONCEPTS[:15]])

q = f"""
        identify experts
            from concepts "{dsl_escape(some_concepts)}"
        return experts
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()

Query:
======
        identify experts
            from concepts "\"films\" \"ambipolar electric field effect\" \"two-dimensional semimetal\" \"electric field effects\" \"room temperature mobility\" \"conductance band\" \"field effects\" \"graphitic films\" \"centimeters\" \"gate voltage\" \"semimetals\" \"electrons\" \"atoms\" \"holes\" \"square centimeter\""
        return experts

[6]:
docs_found first_name id last_name research_orgs score orcid_id
0 1 Anatoly A ur.011033016243.08 Firsov [grid.4886.2, grid.424048.e, grid.425037.7, gr... 269.41174 NaN
1 1 Da ur.01146544531.57 Jiang [grid.5379.8] 269.41174 NaN
2 1 Sergey V ur.011535264111.51 Dubonos [grid.5254.6, grid.510709.a, grid.5379.8, grid... 269.41174 NaN
3 1 Konstantin Sergeevich ur.01207120103.29 Novoselov [grid.5335.0, grid.423905.9, grid.425037.7, gr... 269.41174 [0000-0003-4972-5371]
4 1 Yuanbo ur.0657076451.24 Zhang [grid.8547.e, grid.30389.31, grid.184769.5, gr... 269.41174 [0000-0003-1290-7980]
5 1 Andre Konstantin ur.0721730631.45 Geim [grid.418975.6, grid.9026.d, grid.12527.33, gr... 269.41174 [0000-0003-2861-8331]
6 1 Sergey V ur.07423561367.62 Morozov [grid.5379.8, grid.9026.d, grid.470117.4, grid... 269.41174 [0000-0003-3075-7787]
7 1 Irina V ur.0767105504.29 Grigorieva [grid.418975.6, grid.500282.d, grid.418751.e, ... 269.41174 [0000-0001-5991-7778]

Example 2. Query with OR connectors

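Note: this time we try to return all expert fields using the syntax experts[all]. As the output below shows, the API version used for this run does not accept all for experts: the error message lists the fields and fieldsets (basics, extras) that can be requested instead.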

[7]:
some_concepts = " OR ".join(['"%s"' % x for x in CONCEPTS[:15]])

q = f"""
        identify experts
            from concepts "{dsl_escape(some_concepts)}"
        return experts[all]
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()

Query:
======
        identify experts
            from concepts "\"films\" OR \"ambipolar electric field effect\" OR \"two-dimensional semimetal\" OR \"electric field effects\" OR \"room temperature mobility\" OR \"conductance band\" OR \"field effects\" OR \"graphitic films\" OR \"centimeters\" OR \"gate voltage\" OR \"semimetals\" OR \"electrons\" OR \"atoms\" OR \"holes\" OR \"square centimeter\""
        return experts[all]

1 QueryError found
Semantic errors found:
        Field / Fieldset 'all' is not present in Source 'researchers'. Available fields: current_research_org,dimensions_url,first_grant_year,first_name,first_publication_year,id,last_grant_year,last_name,last_publication_year,nih_ppid,obsolete,orcid_id,redirect,research_orgs,total_grants,total_publications and available fieldsets: basics,extras
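One way to request more than the basic fields is to list them explicitly. The sketch below (hypothetical helper experts_return_clause) validates names against the fields and fieldsets reported in the error message above and joins them with +. It assumes that the + syntax used to combine fields in standard DSL searches is also accepted after experts; check the current API documentation before relying on it.

```python
# Fields and fieldsets listed in the error message above (DSL v2.0 at the
# time this notebook was run; the set may change in later versions)
EXPERT_FIELDS = {
    "current_research_org", "dimensions_url", "first_grant_year", "first_name",
    "first_publication_year", "id", "last_grant_year", "last_name",
    "last_publication_year", "nih_ppid", "obsolete", "orcid_id", "redirect",
    "research_orgs", "total_grants", "total_publications",
}
EXPERT_FIELDSETS = {"basics", "extras"}


def experts_return_clause(*names: str) -> str:
    """Build a 'return experts[...]' clause from known fields/fieldsets."""
    if not names:
        raise ValueError("at least one field or fieldset is required")
    allowed = EXPERT_FIELDS | EXPERT_FIELDSETS
    unknown = [n for n in names if n not in allowed]
    if unknown:
        raise ValueError(f"not available for experts: {unknown}")
    # Assumption: '+' combines fields/fieldsets, as in standard DSL searches
    return "return experts[" + "+".join(names) + "]"


print(experts_return_clause("basics", "extras"))
```

The resulting clause would replace return experts[all] in the query above.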

Example 3. Query with where filters

[8]:
some_concepts = " ".join(['"%s"' % x for x in CONCEPTS[:10]])

q = f"""identify experts
            from concepts "{dsl_escape(some_concepts)}"
            using publications
            where research_org_countries is not empty
              and year >= 2000
              and times_cited > 100
        return experts
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()
Query:
====== identify experts
            from concepts "\"films\" \"ambipolar electric field effect\" \"two-dimensional semimetal\" \"electric field effects\" \"room temperature mobility\" \"conductance band\" \"field effects\" \"graphitic films\" \"centimeters\" \"gate voltage\""
            using publications
            where research_org_countries is not empty
              and year >= 2000
              and times_cited > 100
        return experts

[8]:
docs_found first_name id last_name research_orgs score orcid_id
0 1 Anatoly A ur.011033016243.08 Firsov [grid.4886.2, grid.424048.e, grid.425037.7, gr... 204.01543 NaN
1 1 Da ur.01146544531.57 Jiang [grid.5379.8] 204.01543 NaN
2 1 Sergey V ur.011535264111.51 Dubonos [grid.5254.6, grid.510709.a, grid.5379.8, grid... 204.01543 NaN
3 1 Konstantin Sergeevich ur.01207120103.29 Novoselov [grid.5335.0, grid.423905.9, grid.425037.7, gr... 204.01543 [0000-0003-4972-5371]
4 1 Yuanbo ur.0657076451.24 Zhang [grid.8547.e, grid.30389.31, grid.184769.5, gr... 204.01543 [0000-0003-1290-7980]
5 1 Andre Konstantin ur.0721730631.45 Geim [grid.418975.6, grid.9026.d, grid.12527.33, gr... 204.01543 [0000-0003-2861-8331]
6 1 Sergey V ur.07423561367.62 Morozov [grid.5379.8, grid.9026.d, grid.470117.4, grid... 204.01543 [0000-0003-3075-7787]
7 1 Irina V ur.0767105504.29 Grigorieva [grid.418975.6, grid.500282.d, grid.418751.e, ... 204.01543 [0000-0001-5991-7778]

Example 4. Adding Overlap Annotations (e.g. for conflict-of-interest checks)

[9]:
overlap_researchers = ["ur.011535264111.51", "ur.011033016243.08", "ur.01207120103.29"]

q = f"""
        identify experts
            from concepts "{dsl_escape(some_concepts)}"
            using publications
            where research_org_countries is not empty
              and year >= 2000
        return experts
            annotate coauthorship, organizational overlap
            with {json.dumps(overlap_researchers)}
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()

Query:
======
        identify experts
            from concepts "\"films\" \"ambipolar electric field effect\" \"two-dimensional semimetal\" \"electric field effects\" \"room temperature mobility\" \"conductance band\" \"field effects\" \"graphitic films\" \"centimeters\" \"gate voltage\""
            using publications
            where research_org_countries is not empty
              and year >= 2000
        return experts
            annotate coauthorship, organizational overlap
            with ["ur.011535264111.51", "ur.011033016243.08", "ur.01207120103.29"]

[9]:
docs_found first_name id last_name research_orgs score overlap.coauthorship overlap.organizational orcid_id
0 1 Anatoly A ur.011033016243.08 Firsov [grid.4886.2, grid.424048.e, grid.425037.7, gr... 204.01543 0 True NaN
1 1 Da ur.01146544531.57 Jiang [grid.5379.8] 204.01543 0 False NaN
2 1 Sergey V ur.011535264111.51 Dubonos [grid.5254.6, grid.510709.a, grid.5379.8, grid... 204.01543 0 True NaN
3 1 Konstantin Sergeevich ur.01207120103.29 Novoselov [grid.5335.0, grid.423905.9, grid.425037.7, gr... 204.01543 175 True [0000-0003-4972-5371]
4 1 Yuanbo ur.0657076451.24 Zhang [grid.8547.e, grid.30389.31, grid.184769.5, gr... 204.01543 1 False [0000-0003-1290-7980]
5 1 Andre Konstantin ur.0721730631.45 Geim [grid.418975.6, grid.9026.d, grid.12527.33, gr... 204.01543 26 False [0000-0003-2861-8331]
6 1 Sergey V ur.07423561367.62 Morozov [grid.5379.8, grid.9026.d, grid.470117.4, grid... 204.01543 7 False [0000-0003-3075-7787]
7 1 Irina V ur.0767105504.29 Grigorieva [grid.418975.6, grid.500282.d, grid.418751.e, ... 204.01543 8 False [0000-0001-5991-7778]
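For conflict-of-interest screening, the two overlap values can be turned into a simple filter. A minimal sketch with a hypothetical helper split_by_overlap: it assumes each expert record carries a nested overlap object with organizational and coauthorship keys, inferred from the overlap.* column names in the table above, and it treats any nonzero value as a potential conflict, which is a policy choice rather than an API rule. The sample records copy the values shown above.

```python
from typing import Dict, List, Tuple


def split_by_overlap(experts: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split experts into (no overlap, potential conflict) lists."""
    clear, flagged = [], []
    for expert in experts:
        overlap = expert.get("overlap", {})  # missing annotation: no overlap
        same_org = bool(overlap.get("organizational", False))
        coauthored = overlap.get("coauthorship", 0) > 0
        (flagged if same_org or coauthored else clear).append(expert)
    return clear, flagged


experts = [
    {"last_name": "Firsov", "overlap": {"coauthorship": 0, "organizational": True}},
    {"last_name": "Jiang", "overlap": {"coauthorship": 0, "organizational": False}},
    {"last_name": "Novoselov", "overlap": {"coauthorship": 175, "organizational": True}},
    {"last_name": "Geim", "overlap": {"coauthorship": 26, "organizational": False}},
]
clear, flagged = split_by_overlap(experts)
print([e["last_name"] for e in clear], [e["last_name"] for e in flagged])
```

Against the live API the input would be the raw records, for example dsl.query(q)["experts"], rather than the flattened DataFrame.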

Example 5. Query with MUST/NOT Operators

By default, the string containing a list of concepts is interpreted as a sequence of AND clauses. That is, the query tries to match as many of the concepts as possible, without giving preference to any of them.

It is possible to add MUST/NOT rules to the concepts string by prefixing concepts with the + (must match) and - (must not match) operators.

Note: concept phrases (i.e. concepts composed of more than one word) need to be wrapped in quotes, and the quotes need to be escaped with a backslash (\). In the examples below, the dsl_escape function takes care of this escaping, as the printed queries show.

[10]:
concepts = """
    +"ambipolar electric field effect"
    -"graphitic films"
    +"films"
    "electric field effects"
    """

q = f"""
identify experts
    from concepts "{dsl_escape(concepts)}"
    using publications
return experts
"""

print("Query:\n======", q)

dsl.query(q).as_dataframe()


Query:
======
identify experts
    from concepts "
    +\"ambipolar electric field effect\"
    -\"graphitic films\"
    +\"films\"
    \"electric field effects\"
    "
    using publications
return experts

[10]:
docs_found first_name id last_name orcid_id research_orgs score
0 1 Luc ur.01005576245.93 Henrard [0000-0002-2564-1221] [grid.5284.b, grid.121334.6, grid.6520.1] 51.435350
1 1 Sylvain ur.01251242035.86 Latil NaN [grid.14095.39, grid.462531.7, grid.457336.0, ... 51.435350
2 1 Paul ur.01000623240.81 Syers NaN [grid.164295.d] 44.719536
3 1 Nicholas Patrick ur.01046736440.46 Butch [0000-0002-6083-8388] [grid.8547.e, grid.507868.4, grid.94225.38, gr... 44.719536
4 1 John-Pierre ur.01060352233.12 Paglione NaN [grid.8547.e, grid.507868.4, grid.440050.5, gr... 44.719536
5 1 Michael Sears ur.01200656557.13 Fuhrer [0000-0001-6183-2773] [grid.184769.5, grid.1002.3, grid.499241.3, gr... 44.719536
6 1 Dohun ur.01205352017.54 Kim [0000-0001-9687-2089] [grid.14003.36, grid.35541.36, grid.15444.30, ... 44.719536
7 1 Victor V ur.01025667341.62 Sysoev [0000-0002-0372-1802] [grid.446088.6, grid.263856.c, grid.78837.33, ... 38.569305
8 1 Mikhail A ur.01245543252.06 Shekhirev [0000-0002-8381-1276] [grid.14476.30, grid.24434.35, grid.166341.7] 38.569305
9 1 Alexey ur.01276657166.76 Lipatov [0000-0001-5043-1616] [grid.14476.30, grid.426324.5, grid.10420.37, ... 38.569305
10 1 Andrey V ur.013212454037.49 Lashkov [0000-0001-6794-8523] [grid.78837.33] 38.569305
11 1 Angel ur.014430675711.42 Torres NaN [grid.24434.35] 38.569305
12 1 Nataliia S ur.016560200577.43 Vorobeva NaN [grid.24434.35] 38.569305
13 1 Alexander S ur.0646414360.09 Sinitskii [0000-0002-8688-3451] [grid.24434.35, grid.170430.1, grid.1957.a, gr... 38.569305
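The concepts string of Example 5 can also be generated from plain Python lists. A minimal sketch (the function concepts_expression is hypothetical): it quotes only multi-word phrases and leaves single words bare, assuming bare single words behave like quoted ones, as the unquoted wildcard terms in Example 7 suggest. The result still needs to go through dsl_escape before being placed in the query.

```python
from typing import Iterable


def concepts_expression(
    must: Iterable[str] = (),
    must_not: Iterable[str] = (),
    optional: Iterable[str] = (),
) -> str:
    """Build a concepts string using + (MUST) and - (NOT) prefixes."""

    def quote(concept: str) -> str:
        # Multi-word phrases must be wrapped in quotes; single words
        # (including wildcard terms such as magnet*) are left bare
        return f'"{concept}"' if " " in concept else concept

    parts = [f"+{quote(c)}" for c in must]
    parts += [f"-{quote(c)}" for c in must_not]
    parts += [quote(c) for c in optional]
    return " ".join(parts)


expr = concepts_expression(
    must=["ambipolar electric field effect", "films"],
    must_not=["graphitic films"],
    optional=["electric field effects"],
)
print(expr)
```

The query would then be built as in Example 5, for instance f'identify experts from concepts "{dsl_escape(expr)}" using publications return experts'.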

Example 6. MUST together with AND/OR

[11]:
concepts = """
    (+"ambipolar electric field effect" -"graphitic films") OR
    (+"films" -"electric field effects")
    """

q = f"""
identify experts
    from concepts "{dsl_escape(concepts)}"
    using publications
return experts
"""

print("Query:\n======", q)

dsl.query(q).as_dataframe()


Query:
======
identify experts
    from concepts "
    (+\"ambipolar electric field effect\" -\"graphitic films\") OR
    (+\"films\" -\"electric field effects\")
    "
    using publications
return experts

[11]:
docs_found first_name id last_name orcid_id research_orgs score
0 3 Pablo ur.01034030721.03 Jarillo-Herrero [0000-0001-8217-8213] [grid.159791.2, grid.5338.d, grid.116068.8, gr... 78.260747
1 3 Young Sang ur.01342755473.89 Lee NaN [grid.69566.3a, grid.94225.38, grid.507868.4, ... 78.260747
2 3 Lan ur.014670440227.86 Wang [0000-0001-7124-2718] [grid.418788.a, grid.1007.6, grid.17635.36, gr... 75.645182
3 3 Shun-Qing ur.0624630056.98 Shen [0000-0002-1954-5882] [grid.8547.e, grid.450298.2, grid.464262.0, gr... 75.645182
4 3 Alexander S ur.0646414360.09 Sinitskii [0000-0002-8688-3451] [grid.24434.35, grid.170430.1, grid.1957.a, gr... 69.488090
5 2 Peng ur.01150036175.42 Ren NaN [grid.59025.3b] 51.162295
6 2 Azat ur.056250446.77 Sulaev NaN [grid.59025.3b] 51.162295
7 2 Bin ur.0756673070.05 Xia NaN [grid.59025.3b] 51.162295
8 2 James Mitchell ur.01275626274.52 Tour [0000-0002-8479-9328] [grid.264756.4, grid.21940.3e, grid.254567.7, ... 49.416673
9 2 Christian F ur.01010600302.93 Kisielowski NaN [grid.184769.5, grid.8385.6, grid.469490.6, gr... 47.051859
10 2 Andrey A ur.01022425321.95 Turchanin [0000-0003-2388-1042] [grid.7491.b, grid.4764.1, grid.35043.31, grid... 47.051859
11 2 Joachim ur.01161437031.05 Mayer [0000-0003-3292-5342] [grid.5719.a, grid.419534.e, grid.4372.2, grid... 47.051859
12 2 Konstantin B ur.01163755245.41 Efetov [0000-0003-2245-1366] [grid.21729.3f, grid.423485.c, grid.411233.6, ... 47.051859
13 2 Armin ur.01172120354.34 Gölzhäuser [0000-0002-0838-9028] [grid.1957.a, grid.419547.a, grid.414703.5, gr... 47.051859
14 2 Thomas ur.0704114136.03 Weimann NaN [grid.7491.b, grid.10392.39, grid.4764.1, grid... 47.051859
15 2 Mikhail V ur.0770321167.94 Fistul NaN [grid.169077.e, grid.5330.5, grid.4886.2, grid... 47.051859
16 2 Chinthamani Nagesa Ramachandra ur.01301431054.98 Rao [0000-0003-4088-0615] [grid.8155.9, grid.8954.0, grid.417965.8, grid... 45.842103
17 2 Kota Surya ur.0634451611.05 Subrahmanyam NaN [grid.16753.36, grid.472491.d, grid.419636.f, ... 45.842103
18 2 H S S Ramakrishna ur.0726377173.88 Matte [0000-0001-8279-8447] [grid.7468.d, grid.16753.36, grid.472491.d, gr... 45.842103
19 2 Ashok K ur.01150320016.45 Mulchandani [0000-0002-2831-4154] [grid.30389.31, grid.215654.1, grid.24805.3b, ... 33.572757

Example 7. Wildcard searches

[12]:
concepts = """temperat* "ray diffraction" -magnet* """

q = f"""
identify experts
    from concepts "{dsl_escape(concepts)}"
    using publications
return experts
"""

print("Query:\n======", q)

dsl.query(q).as_dataframe()
Query:
======
identify experts
    from concepts "temperat* \"ray diffraction\" -magnet* "
    using publications
return experts

[12]:
docs_found first_name id last_name research_orgs score orcid_id
0 4 Akinori ur.07620725665.51 Katsui [grid.26999.3d, grid.69566.3a, grid.265061.6, ... 45.133475 NaN
1 3 Andrey V ur.010274015357.59 Khoroshilov [grid.435216.7, grid.431939.5] 34.575968 [0000-0002-0678-1421]
2 3 Konstantin S ur.014606545157.85 Gavrichev [grid.435216.7] 34.575968 [0000-0001-5304-3555]
3 3 Paul ur.014146743075.39 Hagenmuller [grid.4795.f, grid.463879.7, grid.411840.8, gr... 34.065056 NaN
4 3 Yi-Tai ur.01261545713.97 Qian [grid.503014.3, grid.27255.37, grid.12527.33, ... 33.943935 NaN
5 3 Jean Pierre ur.012446305716.07 Chaminade [grid.4444.0, grid.461891.3, grid.5292.c, grid... 33.930549 NaN
6 2 Tatyana V ur.011457114721.52 Dyachkova [grid.426536.0, grid.465372.1, grid.446087.9] 23.329706 [0000-0001-6204-797X]
7 2 Sergey A ur.015627070115.78 Gromilov [grid.4886.2, grid.4605.7, grid.415877.8, grid... 23.329706 NaN
8 2 Elena V ur.01264404625.74 Boldyreva [grid.4605.7, grid.424048.e, grid.4708.b, grid... 23.184193 [0000-0002-1401-2438]
9 2 Alexander P ur.015443160631.46 Tyutyunnik [grid.426536.0, grid.4886.2, grid.10548.38, gr... 23.063437 [0000-0003-1360-0913]
10 2 Steeve ur.010407434762.78 Rousselot [grid.14848.31, grid.418084.1] 22.955062 NaN
11 2 David ur.014245141470.20 Aymé-Perrot [grid.424348.d] 22.955062 NaN
12 2 Mickael ur.016103556371.52 Dollé [grid.419552.e, grid.463728.c, grid.184769.5, ... 22.955062 [0000-0002-8887-6730]
13 2 Marc ur.016131736155.35 Bertrand [grid.14848.31] 22.955062 NaN
14 2 Mitsuru ur.013216052361.31 Itoh [grid.208504.b, grid.62167.34, grid.267827.e, ... 22.942325 [0000-0001-6457-9152]
15 2 Takemasa ur.010313134742.30 Fujino [grid.263518.b] 22.932349 NaN
16 2 Morinobu ur.01117474603.87 Endo [grid.5268.9, grid.417799.5, grid.136304.3, gr... 22.932349 NaN
17 2 Mildred S ur.01226407050.99 Dresselhaus [grid.300943.d, grid.250008.f, grid.5335.0, gr... 22.932349 NaN
18 2 Chan ur.014164432033.81 Kim [grid.263518.b] 22.932349 NaN
19 2 Guang-Xiang ur.012216352601.27 Liu [grid.411412.3, grid.467196.b, grid.136593.b, ... 22.894837 [0000-0002-4742-6194]

Additional resources: shortcut functions included in Dimcli

Dimcli includes a number of ‘shortcut’ Python functions that make it easier to work with the expert identification API.

[13]:
from dimcli.functions import extract_concepts, identify_experts, build_reviewers_matrix

extract_concepts

A Python wrapper for the DSL function extract_concepts (see source).

Extracts concepts from any text. The input text is processed, and the extracted concepts are returned as an array of strings ordered by relevance.

[14]:
%%extract_concepts

We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
 metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
 valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
 holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
 centimeters per volt-second can be induced by applying gate voltage.
[14]:
concept relevance
0 square centimeter 0.681
1 films 0.669
2 ambipolar electric field effect 0.653
3 two-dimensional semimetal 0.646
4 electric field effects 0.628
5 room temperature mobility 0.621
6 conductance band 0.601
7 graphitic films 0.596
8 field effects 0.596
9 centimeters 0.587
10 gate voltage 0.587
11 semimetals 0.582
12 electrons 0.576
13 atoms 0.549
14 holes 0.542
15 ambient conditions 0.500
16 band 0.499
17 valence 0.471
18 voltage 0.464
19 mobility 0.406
20 high quality 0.390
21 overlap 0.367
22 effect 0.323
23 conditions 0.293
24 concentration 0.264
25 quality 0.214
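Since each concept comes with a relevance score, a simple way to prepare input for expert identification is to keep only the concepts above a cut-off. A minimal sketch using a few (concept, relevance) pairs copied from the table above; the strong_concepts helper and the 0.5 threshold are arbitrary illustrations, not recommended values. The top-ranked concept here, square centimeter, is a unit rather than a research topic, so a manual review step remains useful.

```python
from typing import List, Optional, Sequence, Tuple


def strong_concepts(
    scored: Sequence[Tuple[str, float]],
    min_relevance: float = 0.5,
    max_n: Optional[int] = None,
) -> List[str]:
    """Keep concepts whose relevance is at least min_relevance, best first."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    kept = [concept for concept, relevance in ranked if relevance >= min_relevance]
    return kept if max_n is None else kept[:max_n]


# A few (concept, relevance) pairs copied from the output above
sample = [
    ("square centimeter", 0.681),
    ("films", 0.669),
    ("ambipolar electric field effect", 0.653),
    ("ambient conditions", 0.500),
    ("band", 0.499),
    ("quality", 0.214),
]
print(strong_concepts(sample))
print(strong_concepts(sample, max_n=2))
```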

identify_experts

A Python wrapper for the full expert identification workflow (see source).

This wrapper provides a simpler version of the expert identification API. It is meant as a convenient alternative for basic queries; for more options, use the API directly.

[15]:
%%identify_experts

We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
 metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
 valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
 holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
 centimeters per volt-second can be induced by applying gate voltage.
[15]:
docs_found first_name first_publication_year id last_name orcid_id score total_grants total_publications dimensions_url
0 17 Daichi 2000 ur.01203703171.12 Chiba [0000-0002-6631-5131] 720.802273 14 226 https://app.dimensions.ai/discover/publication...
1 12 Ze Don 1983 ur.01055006635.53 Kvon NaN 564.498955 16 367 https://app.dimensions.ai/discover/publication...
2 14 Nobuhiro 1976 ur.011513332561.53 Ohta NaN 547.722462 23 229 https://app.dimensions.ai/discover/publication...
3 12 Tomohiro 2008 ur.01311211105.43 Koyama [0000-0003-4796-1776] 497.447215 3 111 https://app.dimensions.ai/discover/publication...
4 10 Teruo 1993 ur.012735754655.38 Ono NaN 407.495159 29 486 https://app.dimensions.ai/discover/publication...
5 7 Pablo 1999 ur.01034030721.03 Jarillo-Herrero [0000-0001-8217-8213] 402.322615 7 276 https://app.dimensions.ai/discover/publication...
6 9 Kenji 1987 ur.010575643400.34 Watanabe [0000-0003-3701-8119] 360.808606 13 2694 https://app.dimensions.ai/discover/publication...
7 9 Takashi 1989 ur.0765715521.02 Taniguchi NaN 360.808606 24 2874 https://app.dimensions.ai/discover/publication...
8 8 Eugene 1989 ur.0740560235.48 Olshanetsky [0000-0001-7027-9084] 357.604337 0 98 https://app.dimensions.ai/discover/publication...
9 8 Takahiro 2002 ur.014407221755.12 Moriyama [0000-0001-7071-0823] 313.140915 10 181 https://app.dimensions.ai/discover/publication...
10 6 Nikolai N 2005 ur.012411463367.07 Mikhailov NaN 289.726385 1 227 https://app.dimensions.ai/discover/publication...
11 6 Sergey A 1992 ur.014356370677.99 Dvoretsky NaN 273.726251 0 272 https://app.dimensions.ai/discover/publication...
12 7 Fuyuki 2016 ur.013563236015.42 Ando NaN 273.314873 0 18 https://app.dimensions.ai/discover/publication...
13 7 Kamlesh 2008 ur.01166776577.55 Awasthi [0000-0001-7852-059X] 269.313042 0 48 https://app.dimensions.ai/discover/publication...
14 6 Masashi 2012 ur.016465573056.75 Kawaguchi [0000-0001-5907-9137] 244.958344 5 37 https://app.dimensions.ai/discover/publication...
15 6 Kihiro T 2013 ur.016551140015.09 Yamada NaN 239.089080 0 19 https://app.dimensions.ai/discover/publication...
16 6 Haruka 2013 ur.010535542231.81 Kakizakai NaN 232.707798 0 12 https://app.dimensions.ai/discover/publication...
17 6 Myeongkyu 1997 ur.014732046553.91 Lee NaN 226.825885 0 125 https://app.dimensions.ai/discover/publication...
18 3 Young Sang 1995 ur.01342755473.89 Lee NaN 220.422560 0 156 https://app.dimensions.ai/discover/publication...
19 4 Alexander V 1977 ur.0632644662.66 Chaplik NaN 211.387046 7 171 https://app.dimensions.ai/discover/publication...

Build a reviewers matrix

Generates a matrix of candidate reviewers for abstracts, using the expert identification workflow (see source).

If the input abstracts include identifiers, those are used in the resulting matrix. If the input is a plain list of strings instead, identifiers are auto-generated from the order of the abstracts (the first one is 1, and so on).
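The input handling described above can be sketched as follows. This illustrates the described behaviour and is not Dimcli's source: the normalize_abstracts name is hypothetical, and whether the real function generates integer or string identifiers is an assumption (integers are used here).

```python
from typing import Dict, List, Union


def normalize_abstracts(abstracts: List[Union[str, Dict[str, str]]]) -> List[Dict]:
    """Give every abstract an id: keep explicit ids, number plain strings."""
    normalized = []
    for position, item in enumerate(abstracts, start=1):
        if isinstance(item, str):
            # Plain string: id auto-generated from its position (first is 1)
            normalized.append({"id": position, "text": item})
        else:
            # Dict input: reuse the caller's identifier
            normalized.append({"id": item["id"], "text": item["text"]})
    return normalized


print(normalize_abstracts(["first abstract", "second abstract"]))
print(normalize_abstracts([{"id": "A1", "text": "graphitic films"}]))
```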

[16]:
abstracts = [
     {
     'id' : 'A1',
     'text' : """We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
 metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
 valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
 holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
 centimeters per volt-second can be induced by applying gate voltage."""
     },
     {
     'id' : "A2",
     'text' : """The physicochemical properties of a molecule-metal interface, in principle, can play a significant role in tuning the electronic properties
 of organic devices. In this report, we demonstrate an electrode engineering approach in a robust, reproducible molecular memristor that
 enables a colossal tunability in both switching voltage (from 130 mV to 4 V i.e. >2500% variation) and current (by ~6 orders of magnitude).
 This provides a spectrum of device design parameters that can be “dialed-in” to create fast, scalable and ultralow energy organic
 memristors optimal for applications spanning digital memory, logic circuits and brain-inspired computing."""
     }
 ]
[17]:
candidates = ["ur.01146544531.57", "ur.011535264111.51", "ur.0767105504.29", "ur.011513332561.53", "ur.01055006635.53"]
[18]:
build_reviewers_matrix(abstracts, candidates, verbose=False)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.95s/it]
[18]:
researcher A1 A2
0 ur.01146544531.57 0.000000 0.000000
1 ur.011535264111.51 500.057833 237.479195
2 ur.0767105504.29 860.072228 924.316053
3 ur.011513332561.53 3235.742721 1140.205152
4 ur.01055006635.53 2518.152591 1183.936190
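To turn the matrix into an assignment, one option is to pick the highest-scoring candidate for each abstract. This assumes, as with the expert scores earlier, that a higher value means a stronger match. The values below are copied from the output above, and best_reviewers is a hypothetical helper.

```python
from typing import Dict


def best_reviewers(matrix: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Return, for each abstract id, the researcher with the highest score."""
    abstract_ids = next(iter(matrix.values())).keys()
    return {
        abstract: max(matrix, key=lambda researcher: matrix[researcher][abstract])
        for abstract in abstract_ids
    }


# researcher -> {abstract id -> score}, copied from the matrix above
matrix = {
    "ur.01146544531.57": {"A1": 0.0, "A2": 0.0},
    "ur.011535264111.51": {"A1": 500.057833, "A2": 237.479195},
    "ur.0767105504.29": {"A1": 860.072228, "A2": 924.316053},
    "ur.011513332561.53": {"A1": 3235.742721, "A2": 1140.205152},
    "ur.01055006635.53": {"A1": 2518.152591, "A2": 1183.936190},
}
print(best_reviewers(matrix))
```

If build_reviewers_matrix returns a pandas DataFrame, as the output suggests, df.set_index("researcher").idxmax() should give the same per-abstract result.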


Note

The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.
