
Expert Identification with the Dimensions API - An Introduction

This notebook shows how to use the expert identification workflow available via the Dimensions Analytics API.

Prerequisites

This notebook assumes you have installed the Dimcli library and are familiar with the Getting Started tutorial.

[1]:
!pip install dimcli --quiet

import dimcli
from dimcli.utils import *

import json
import sys
import pandas as pd

print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  KEY = getpass.getpass(prompt='API Key: ')
  dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
  KEY = ""
  dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()
==
Logging in..
Dimcli - Dimensions API Client (v0.9)
Connected to: https://app.dimensions.ai - DSL v1.30
Method: dsl.ini file

At a glance

At its simplest, an expert search query looks like this:

[6]:
%%dsl

identify experts from concepts "malaria OR \"effective malaria vaccine\" OR \"effective prevention\""
      using publications
      where year >= 2015
return experts[basics]
[6]:
<dimcli.DslDataset object #4405912048. Dict keys: '_stats', '_version', '_copyright', 'experts'>

The query takes a list of concepts defining the expertise you’re looking for, plus other parameters defining the pool of publications to be used, and it returns a list of researchers sorted by relevance.

[8]:
pd.DataFrame(dsl_last_results['experts'])
[8]:
id score orcid_id first_name last_name research_orgs docs_found
0 ur.01332073522.49 4.307605 [0000-0002-3396-1700] Nicholas John White [grid.417815.e, grid.22072.35, grid.5335.0, gr... 7
1 ur.01303637137.59 3.853188 [0000-0001-8300-9593] Miriam K Laufer [grid.8271.c, grid.10595.38, grid.420069.9, gr... 6
2 ur.01314633455.19 3.788729 NaN Ritabrata Kundu [grid.414710.7] 6
3 ur.01355076624.38 3.788729 NaN Jaydeep Choudhury Choudhury [grid.414710.7] 6
4 ur.01333507624.36 3.360286 NaN Christopher Vine Plowe [grid.15653.34, grid.4305.2, grid.94365.3d, gr... 3
5 ur.07764267264.89 3.211528 [0000-0002-7951-0745] Francois Henri Nosten [grid.11586.3b, grid.4367.6, grid.412433.3, gr... 4
6 ur.01323510115.98 3.214206 NaN Danielle I Stanisic [grid.1008.9, grid.1042.7, grid.1049.c, grid.4... 3
7 ur.0752141120.95 3.214206 NaN Michael Francis Good [grid.1048.d, grid.417993.1, grid.1003.2, grid... 3
8 ur.015476113652.05 3.180696 [0000-0001-5725-9118] Brian Mellor Greenwood [grid.8348.7, grid.10025.36, grid.415375.1, gr... 5
9 ur.016122312437.59 3.172880 NaN Ogobara K Doumbo [grid.8191.1, grid.8982.b, grid.10548.38, grid... 3
10 ur.01225135650.70 2.805810 [0000-0002-1018-7898] James G Beeson [grid.1013.3, grid.1056.2, grid.1042.7, grid.1... 3
11 ur.01162445502.98 2.653315 NaN Martha Sedegah [grid.428999.7, grid.201075.1, grid.290496.0, ... 3
12 ur.01165702423.17 2.653315 NaN Michael R Hollingdale [grid.417587.8, grid.265436.0, grid.8991.9, gr... 3
13 ur.0703623237.41 2.653315 NaN Eileen D Villasante [grid.415913.b] 3
14 ur.01240215027.61 2.616639 NaN Rose M Mcgready [grid.1005.4, grid.462844.8, grid.4991.5, grid... 3
15 ur.01204711510.82 2.598776 [0000-0001-9773-2192] Alfonso Javier Rodriguez-Morales [grid.419226.a, grid.441853.f, grid.8171.f, gr... 4
16 ur.0667763776.52 2.515726 NaN Ashley Michael Vaughan [grid.28046.38, grid.413019.e, grid.53964.3d, ... 2
17 ur.01022543462.48 2.427615 [0000-0002-0607-6941] Paul Garner [grid.417153.5, grid.7445.2, grid.48004.38, gr... 4
18 ur.01270143765.64 2.422990 [0000-0003-4566-4030] Joel Tarning [grid.8761.8, grid.501272.3, grid.413674.3, gr... 4
19 ur.0751102271.80 2.368244 [0000-0002-9415-1357] Simon J Draper [grid.425090.a, grid.4991.5, grid.10253.35, gr... 3

Often though, we start from some text and want to find experts relevant to that text (as opposed to starting from concepts).

The expert identification workflow, in such a case, consists of two steps:

  1. Concepts extraction from text

  2. Expert identification using concepts

In the first step, the user extracts concepts from an abstract. The user can then review and modify the list of extracted concepts before feeding it into the actual expert identification query. In the following sections we will go through these steps in detail.
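
The review step between extraction and identification can be as simple as filtering a Python list. The sketch below is illustrative only: `review_concepts` is a hypothetical helper, not part of Dimcli, and the sample list is a handful of concepts taken from the extraction output shown later in this notebook.

```python
def review_concepts(extracted, drop=(), add=()):
    """Return the extracted concepts minus `drop`, plus any extra ones in `add`.

    Hypothetical helper (not part of Dimcli). Matching is case-insensitive
    and the original ordering of the extracted concepts is preserved.
    """
    dropped = {c.lower() for c in drop}
    kept = [c for c in extracted if c.lower() not in dropped]
    seen = {c.lower() for c in kept}
    for c in add:
        if c.lower() not in seen:
            kept.append(c)
            seen.add(c.lower())
    return kept


# A few concepts copied from the extraction output in this notebook
extracted = [
    "ambipolar electric field effect",
    "two-dimensional semimetal",
    "square centimeter",
    "films",
    "atoms",
]

# Drop generic or unit-like phrases, add one the reviewer considers important
reviewed = review_concepts(
    extracted,
    drop=["square centimeter", "atoms"],
    add=["gate voltage"],
)
print(reviewed)
```

The reviewed list can then be joined into a concepts string, as done in the examples of Step 2.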

Step 1: Concept Extraction

What are concepts?

Concepts are noun phrases automatically extracted from a document’s abstract. The rest of the Dimensions database is used to weight their importance and relevance within the document’s field of study (see also the official documentation: searching using concepts).

For instance, the phrases machine learning and neural network will be considered very relevant in a computer science paper, while project and study will receive low relevance scores, as they are generic phrases.

Extracting concepts with the DSL

Extracting concepts is implemented using the extract_concepts DSL function. This is the syntax:

extract_concepts("publication abstract")

This query will return a list of extracted concepts, ordered by weight, in descending order. For example:

[2]:
abstract = """We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
centimeters per volt-second can be induced by applying gate voltage.
"""

abstract = abstract.replace("\n", " ")

res = dsl.query(f"""extract_concepts("{abstract}")""")

CONCEPTS = res['extracted_concepts']

pd.DataFrame(CONCEPTS)
[2]:
0
0 ambipolar electric field effect
1 two-dimensional semimetal
2 room-temperature mobility
3 electric field effects
4 field effects
5 graphitic films
6 gate voltage
7 conductance band
8 square centimeter
9 films
10 electrons
11 semimetals
12 ambient conditions
13 atoms
14 holes
15 centimeters
16 metallic
17 voltage
18 band
19 high quality
20 valence
21 mobility
22 overlap
23 effect
24 conditions
25 concentration
26 quality
27 monocrystalline graphitic films
28 tiny overlap
29 strong ambipolar electric field effect
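
The abstract used above contains no double quotes, so it can be dropped straight into the query string. Text that does contain quotes would break the query. The following is a minimal sketch of a helper that flattens whitespace and escapes the text before building the query. `build_extract_concepts_query` is a hypothetical name, and the escaping rule (a backslash before double quotes and backslashes) is an assumption based on how quotes are escaped elsewhere in this notebook; in a live session, `dsl_escape` from `dimcli.utils` is probably the better choice.

```python
import re


def build_extract_concepts_query(text: str) -> str:
    """Build an extract_concepts(...) query string from arbitrary text.

    Hypothetical helper, not part of Dimcli. Assumes that backslashes and
    double quotes inside the DSL string must be escaped with a backslash.
    """
    flat = re.sub(r"\s+", " ", text).strip()  # collapse newlines and runs of spaces
    escaped = flat.replace("\\", "\\\\").replace('"', '\\"')
    return f'extract_concepts("{escaped}")'


query = build_extract_concepts_query('A so-called "memristor"\n  device')
print(query)

# With a logged-in session this could then be run as:
#   res = dsl.query(query)
#   concepts = res['extracted_concepts']
```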

Step 2: Expert Identification

The concepts extracted in step one can be used in identify experts queries, for example:

identify experts from concepts "+malaria OR \"effective malaria vaccine\" OR \"effective prevention\""
      using publications
      where research_org_countries is not empty
          and year >= 2013
return experts[basics]
      limit 20 skip 0
      annotate organizational, coauthorship overlap
          with ["ur.016204724721.35", "ur.012127355561.32"]

Returned experts are ordered by their relevance.

A few important things to remember:

  1. Sources. Expert identification can use either publications or grants (when not specified, publications are used).

  2. Default connector is AND. When multiple concepts are provided, these are transformed automatically into an AND query. To match any of the concepts, one should then explicitly add OR connectors.

  3. Where conditions. It is possible to specify where-filters but that’s not required. Fields available for filtering are exactly the same as the ones in standard search expressions.

  4. Pagination. Similarly, the paging phrase is optional. By default, the top 20 experts are returned; using limit/skip it is possible to retrieve up to a maximum of 200.

  5. Overlap annotations. Annotating results with organizational and/or coauthorship overlap will produce another JSON object for each identified expert. This object has two parts.

    • The organizational overlap is a boolean value that is true if the expert and the researchers from the query have the same current research organization.

    • The coauthorship overlap is the number of documents the expert has coauthored with any of the researchers provided in the query in the last three years.
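
The optional parts listed above can be assembled programmatically. The sketch below only builds the query text and mirrors the clause order of the example query in this section; `build_identify_experts_query` and its parameter names are hypothetical and not part of Dimcli, and the concepts string is assumed to be already escaped.

```python
import json


def build_identify_experts_query(concepts, source="publications", where=(),
                                 fields="basics", limit=20, skip=0,
                                 overlap_with=()):
    """Assemble an 'identify experts' query string (hypothetical helper).

    `concepts` must already be escaped for use inside a DSL string.
    `where` is a sequence of filter expressions that get joined with 'and'.
    """
    lines = [
        f'identify experts from concepts "{concepts}"',
        f"      using {source}",
    ]
    if where:
        lines.append("      where " + " and ".join(where))
    lines.append(f"return experts[{fields}]")
    lines.append(f"      limit {limit} skip {skip}")
    if overlap_with:
        lines.append("      annotate organizational, coauthorship overlap")
        lines.append("          with " + json.dumps(list(overlap_with)))
    return "\n".join(lines)


q = build_identify_experts_query(
    concepts='+malaria OR \\"effective malaria vaccine\\"',
    where=["research_org_countries is not empty", "year >= 2013"],
    overlap_with=["ur.016204724721.35", "ur.012127355561.32"],
)
print(q)
```

With a logged-in session, the resulting string could be passed to `dsl.query(q)` as in the examples that follow.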

Example 1. Basic query using concepts

[3]:
# take the top 15 concepts
some_concepts = " ".join(['"%s"' % x for x in CONCEPTS[:15]])

q = f"""
        identify experts
            from concepts "{dsl_escape(some_concepts)}"
        return experts
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()

Query:
======
        identify experts
            from concepts "\"ambipolar electric field effect\" \"two-dimensional semimetal\" \"room-temperature mobility\" \"electric field effects\" \"field effects\" \"graphitic films\" \"gate voltage\" \"conductance band\" \"square centimeter\" \"films\" \"electrons\" \"semimetals\" \"ambient conditions\" \"atoms\" \"holes\""
        return experts

[3]:
id score research_orgs last_name first_name docs_found orcid_id
0 ur.011033016243.08 7.87576 [grid.4886.2, grid.424048.e, grid.425037.7, gr... Firsov Anatoly A 1 NaN
1 ur.01146544531.57 7.87576 [grid.5379.8] Jiang Da 1 NaN
2 ur.011535264111.51 7.87576 [grid.4886.2, grid.5379.8, grid.5254.6, grid.5... Dubonos Sergey V 1 NaN
3 ur.01207120103.29 7.87576 [grid.5379.8, grid.425037.7, grid.116068.8, gr... Novoselov Konstantin Sergeevich 1 [0000-0003-4972-5371]
4 ur.0657076451.24 7.87576 [grid.8547.e, grid.5386.8, grid.184769.5, grid... Zhang Yuanbo 1 NaN
5 ur.0721730631.45 7.87576 [grid.7340.0, grid.5254.6, grid.418975.6, grid... Geim Andre Konstantin 1 [0000-0003-2861-8331]
6 ur.07423561367.62 7.87576 [grid.4886.2, grid.425081.a, grid.28171.3d, gr... Morozov Sergey V 1 [0000-0003-3075-7787]
7 ur.0767105504.29 7.87576 [grid.4886.2, grid.7340.0, grid.5337.2, grid.4... Grigorieva Irina V 1 [0000-0001-5991-7778]

Example 2. Query with OR connectors

Note: this time we return all expert fields by using the syntax experts[all].

[4]:
some_concepts = " OR ".join(['"%s"' % x for x in CONCEPTS[:15]])

q = f"""
        identify experts
            from concepts "{dsl_escape(some_concepts)}"
        return experts[all]
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()

Query:
======
        identify experts
            from concepts "\"ambipolar electric field effect\" OR \"two-dimensional semimetal\" OR \"room-temperature mobility\" OR \"electric field effects\" OR \"field effects\" OR \"graphitic films\" OR \"gate voltage\" OR \"conductance band\" OR \"square centimeter\" OR \"films\" OR \"electrons\" OR \"semimetals\" OR \"ambient conditions\" OR \"atoms\" OR \"holes\""
        return experts[all]

[4]:
id score research_orgs orcid_id total_grants last_grant_year obsolete last_name total_publications first_publication_year last_publication_year current_research_org first_name first_grant_year docs_found
0 ur.01207120103.29 8.406035 [grid.5379.8, grid.425037.7, grid.116068.8, gr... [0000-0003-4972-5371] 11 2023.0 0 Novoselov 590 1997 2020 grid.5379.8 Konstantin Sergeevich 2006.0 3
1 ur.0721730631.45 8.406035 [grid.7340.0, grid.5254.6, grid.418975.6, grid... [0000-0003-2861-8331] 10 2024.0 0 Geim 582 1991 2020 grid.5379.8 Andre Konstantin 2006.0 3
2 ur.07423561367.62 8.406035 [grid.4886.2, grid.425081.a, grid.28171.3d, gr... [0000-0003-3075-7787] 6 2021.0 0 Morozov 269 1990 2020 grid.425081.a Sergey V 2013.0 3
3 ur.0657076451.24 8.355439 [grid.8547.e, grid.5386.8, grid.184769.5, grid... NaN 0 NaN 0 Zhang 74 2004 2019 grid.8547.e Yuanbo NaN 2
4 ur.011033016243.08 8.081146 [grid.4886.2, grid.424048.e, grid.425037.7, gr... NaN 0 NaN 0 Firsov 25 2003 2018 grid.424048.e Anatoly A NaN 2
5 ur.01146544531.57 8.081146 [grid.5379.8] NaN 0 NaN 0 Jiang 11 2004 2008 grid.5379.8 Da NaN 2
6 ur.011535264111.51 7.875760 [grid.4886.2, grid.5379.8, grid.5254.6, grid.5... NaN 0 NaN 0 Dubonos 81 1990 2009 grid.425037.7 Sergey V NaN 1
7 ur.0767105504.29 7.875760 [grid.4886.2, grid.7340.0, grid.5337.2, grid.4... [0000-0001-5991-7778] 4 2021.0 0 Grigorieva 158 1989 2020 grid.5379.8 Irina V 2007.0 1
8 ur.011513332561.53 5.697777 [grid.39158.36, grid.260539.b, grid.69566.3a] NaN 23 2011.0 0 Ohta 208 1976 2020 grid.260539.b Nobuhiro 1987.0 21
9 ur.01055006635.53 3.264948 [grid.450314.7, grid.4605.7, grid.7727.5, grid... NaN 12 2011.0 0 Kvon 326 1983 2020 grid.4605.7 Ze Don 1993.0 8
10 ur.013312524031.58 3.022535 [grid.443127.7, grid.420030.5, grid.419396.0, ... NaN 28 2003.0 0 Yamazaki 258 1966 2013 grid.39158.36 Iwao 1984.0 11
11 ur.01203703171.12 2.642304 [grid.26999.3d, grid.69566.3a, grid.472717.0, ... [0000-0002-6631-5131] 11 2022.0 0 Chiba 208 2000 2020 grid.136593.b Daichi 2009.0 9
12 ur.0740560235.48 2.564912 [grid.4886.2, grid.15276.37, grid.11899.38, gr... NaN 0 NaN 0 Olshanetsky 91 1989 2020 grid.450314.7 Eugene NaN 6
13 ur.01034030721.03 2.516857 [grid.5338.d, grid.116068.8, grid.21941.3f, gr... [0000-0001-8217-8213] 7 2023.0 0 Jarillo-Herrero 233 2000 2020 grid.116068.8 Pablo 2009.0 5
14 ur.01340766601.27 1.758987 [grid.469490.6, grid.432790.b, grid.431860.8, ... NaN 6 2009.0 0 Venkatesan 777 1975 2020 grid.4280.e Thirumalai Venky 1984.0 6
15 ur.011775522057.45 1.718926 [grid.448924.7, grid.450314.7, grid.4605.7, gr... NaN 0 NaN 0 Mikhailov 222 1995 2020 grid.450314.7 Nikolay N NaN 5
16 ur.01024676171.26 1.649508 [grid.5801.c, grid.481554.9, grid.410387.9, gr... NaN 0 NaN 0 Bednorz 129 1976 2017 grid.410387.9 Johannes Georg NaN 4
17 ur.01267137567.67 1.649508 [grid.410387.9, grid.7307.3, grid.10392.39, gr... [0000-0001-6331-2640] 7 2020.0 0 Mannhart 358 1986 2020 grid.419552.e Jochen D 1990.0 4
18 ur.07376375471.82 1.637159 [grid.4886.2, grid.4605.7, grid.7727.5, grid.4... NaN 2 2019.0 0 Kozlov 52 2007 2020 grid.450314.7 Dmitriy A 2014.0 4
19 ur.07410612715.77 1.631930 [grid.24434.35, grid.39158.36, grid.20515.33, ... [0000-0002-9982-141X] 15 2008.0 0 Nishimura 173 1987 2020 grid.20515.33 Yoshinobu 1989.0 5

Example 3. Query with where filters

[5]:
some_concepts = " ".join(['"%s"' % x for x in CONCEPTS[:10]])

q = f"""identify experts
            from concepts "{dsl_escape(some_concepts)}"
            using publications
            where research_org_countries is not empty
              and year >= 2000
              and times_cited > 100
        return experts
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()
Query:
====== identify experts
            from concepts "\"ambipolar electric field effect\" \"two-dimensional semimetal\" \"room-temperature mobility\" \"electric field effects\" \"field effects\" \"graphitic films\" \"gate voltage\" \"conductance band\" \"square centimeter\" \"films\""
            using publications
            where research_org_countries is not empty
              and year >= 2000
              and times_cited > 100
        return experts

[5]:
id score first_name research_orgs last_name docs_found orcid_id
0 ur.011033016243.08 7.383875 Anatoly A [grid.4886.2, grid.424048.e, grid.425037.7, gr... Firsov 1 NaN
1 ur.01146544531.57 7.383875 Da [grid.5379.8] Jiang 1 NaN
2 ur.011535264111.51 7.383875 Sergey V [grid.4886.2, grid.5379.8, grid.5254.6, grid.5... Dubonos 1 NaN
3 ur.01207120103.29 7.383875 Konstantin Sergeevich [grid.5379.8, grid.425037.7, grid.116068.8, gr... Novoselov 1 [0000-0003-4972-5371]
4 ur.0657076451.24 7.383875 Yuanbo [grid.8547.e, grid.5386.8, grid.184769.5, grid... Zhang 1 NaN
5 ur.0721730631.45 7.383875 Andre Konstantin [grid.7340.0, grid.5254.6, grid.418975.6, grid... Geim 1 [0000-0003-2861-8331]
6 ur.07423561367.62 7.383875 Sergey V [grid.4886.2, grid.425081.a, grid.28171.3d, gr... Morozov 1 [0000-0003-3075-7787]
7 ur.0767105504.29 7.383875 Irina V [grid.4886.2, grid.7340.0, grid.5337.2, grid.4... Grigorieva 1 [0000-0001-5991-7778]

Example 4. Adding Overlap Annotations (e.g. for conflict-of-interest checks)

[6]:
overlap_researchers = ["ur.011535264111.51", "ur.011033016243.08", "ur.01207120103.29"]

q = f"""
        identify experts
            from concepts "{dsl_escape(some_concepts)}"
            using publications
            where research_org_countries is not empty
              and year >= 2000
        return experts
            annotate coauthorship, organizational overlap
            with {json.dumps(overlap_researchers)}
        """

print("Query:\n======", q)

dsl.query(q).as_dataframe()

Query:
======
        identify experts
            from concepts "\"ambipolar electric field effect\" \"two-dimensional semimetal\" \"room-temperature mobility\" \"electric field effects\" \"field effects\" \"graphitic films\" \"gate voltage\" \"conductance band\" \"square centimeter\" \"films\""
            using publications
            where research_org_countries is not empty
              and year >= 2000
        return experts
            annotate coauthorship, organizational overlap
            with ["ur.011535264111.51", "ur.011033016243.08", "ur.01207120103.29"]

[6]:
id score first_name research_orgs last_name docs_found overlap.coauthorship overlap.organizational orcid_id
0 ur.011033016243.08 7.382478 Anatoly A [grid.4886.2, grid.424048.e, grid.425037.7, gr... Firsov 1 3 True NaN
1 ur.01146544531.57 7.382478 Da [grid.5379.8] Jiang 1 0 True NaN
2 ur.011535264111.51 7.382478 Sergey V [grid.4886.2, grid.5379.8, grid.5254.6, grid.5... Dubonos 1 0 True NaN
3 ur.01207120103.29 7.382478 Konstantin Sergeevich [grid.5379.8, grid.425037.7, grid.116068.8, gr... Novoselov 1 153 True [0000-0003-4972-5371]
4 ur.0657076451.24 7.382478 Yuanbo [grid.8547.e, grid.5386.8, grid.184769.5, grid... Zhang 1 0 False NaN
5 ur.0721730631.45 7.382478 Andre Konstantin [grid.7340.0, grid.5254.6, grid.418975.6, grid... Geim 1 38 True [0000-0003-2861-8331]
6 ur.07423561367.62 7.382478 Sergey V [grid.4886.2, grid.425081.a, grid.28171.3d, gr... Morozov 1 6 False [0000-0003-3075-7787]
7 ur.0767105504.29 7.382478 Irina V [grid.4886.2, grid.7340.0, grid.5337.2, grid.4... Grigorieva 1 6 True [0000-0001-5991-7778]
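
For a conflict-of-interest check, the overlap columns can be used to screen out experts. The following sketch works on plain dictionaries whose values are copied from the table above; the function name `without_conflicts` and the zero-tolerance threshold are illustrative assumptions, not a rule defined by the API.

```python
# Rows copied from the output above (only the columns needed for the check)
rows = [
    {"id": "ur.011033016243.08", "overlap.coauthorship": 3,   "overlap.organizational": True},
    {"id": "ur.01146544531.57",  "overlap.coauthorship": 0,   "overlap.organizational": True},
    {"id": "ur.011535264111.51", "overlap.coauthorship": 0,   "overlap.organizational": True},
    {"id": "ur.01207120103.29",  "overlap.coauthorship": 153, "overlap.organizational": True},
    {"id": "ur.0657076451.24",   "overlap.coauthorship": 0,   "overlap.organizational": False},
    {"id": "ur.0721730631.45",   "overlap.coauthorship": 38,  "overlap.organizational": True},
    {"id": "ur.07423561367.62",  "overlap.coauthorship": 6,   "overlap.organizational": False},
    {"id": "ur.0767105504.29",   "overlap.coauthorship": 6,   "overlap.organizational": True},
]


def without_conflicts(rows, max_coauthored=0):
    """Return ids of experts with no organizational overlap and at most
    `max_coauthored` coauthored documents (hypothetical screening rule)."""
    return [
        r["id"]
        for r in rows
        if r["overlap.coauthorship"] <= max_coauthored
        and not r["overlap.organizational"]
    ]


eligible = without_conflicts(rows)
print(eligible)
```

In this sample only one expert passes both checks. The same filter could be expressed on the DataFrame directly with a boolean mask on the two overlap columns.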

Example 5. Query with MUST/NOT Operators

By default, the string containing a list of concepts is interpreted as a sequence of AND clauses. That is, the query tries to match the highest number of concepts without any preference.

It is possible to specify MUST/NOT rules with concepts by passing them via a string and using the + and - operators.

Note: please remember that concept phrases (i.e. concepts composed of more than one word) need to be wrapped in quotes, and the quotes need to be escaped with a \.

[7]:
concepts = """
    +"ambipolar electric field effect"
    -"graphitic films"
    +"films"
    "electric field effects"
    """

q = f"""
identify experts
    from concepts "{dsl_escape(concepts)}"
    using publications
return experts
"""

print("Query:\n======", q)

dsl.query(q).as_dataframe()


Query:
======
identify experts
    from concepts "
    +\"ambipolar electric field effect\"
    -\"graphitic films\"
    +\"films\"
    \"electric field effects\"
    "
    using publications
return experts

[7]:
id score research_orgs last_name first_name docs_found orcid_id
0 ur.01005576245.93 3.480071 [grid.6520.1, grid.121334.6] Henrard Luc 1 NaN
1 ur.01251242035.86 3.480071 [grid.6520.1, grid.121334.6, grid.12082.39, gr... Latil Sylvain 1 NaN
2 ur.01000623240.81 2.589067 [grid.164295.d] Syers Paul 1 NaN
3 ur.01046736440.46 2.589067 [grid.266100.3, grid.410443.6, grid.250008.f, ... Butch Nicholas Patrick 1 NaN
4 ur.01060352233.12 2.589067 [grid.266100.3, grid.440050.5, grid.410443.6, ... Paglione John-Pierre 1 NaN
5 ur.01200656557.13 2.589067 [grid.47840.3f, grid.499241.3, grid.184769.5, ... Fuhrer Michael Sears 1 [0000-0001-6183-2773]
6 ur.01205352017.54 2.589067 [grid.31501.36, grid.164295.d, grid.35541.36, ... Kim Dohun 1 [0000-0001-9687-2089]
7 ur.01025667341.62 2.061342 [grid.263856.c, grid.78837.33, grid.35043.31, ... Sysoev Victor V 1 [0000-0002-0372-1802]
8 ur.01245543252.06 2.061342 [grid.14476.30, grid.24434.35] Shekhirev Mikhail A 1 [0000-0002-8381-1276]
9 ur.01276657166.76 2.061342 [grid.426324.5, grid.10420.37, grid.24434.35, ... Lipatov Alexey 1 [0000-0001-5043-1616]
10 ur.013212454037.49 2.061342 [grid.78837.33] Lashkov Andrey V 1 [0000-0001-6794-8523]
11 ur.016560200577.43 2.061342 [grid.24434.35] Vorobeva Nataliia S 1 NaN
12 ur.0646414360.09 2.061342 [grid.35043.31, grid.24434.35, grid.170430.1, ... Sinitskii Alexander S 1 [0000-0002-8688-3451]
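
The MUST/NOT string used above can also be generated from Python lists. This is a minimal sketch under stated assumptions: `build_concept_string` and `escape_quotes` are hypothetical helpers (in a live session `dsl_escape` from `dimcli.utils` plays the role of the latter), and only multi-word phrases are wrapped in quotes, following the note above.

```python
def _quote(concept):
    """Wrap multi-word phrases in double quotes; leave single words as-is."""
    return f'"{concept}"' if " " in concept else concept


def build_concept_string(must=(), must_not=(), should=()):
    """Combine concepts into one string using the + and - operators."""
    parts = ["+" + _quote(c) for c in must]
    parts += ["-" + _quote(c) for c in must_not]
    parts += [_quote(c) for c in should]
    return " ".join(parts)


def escape_quotes(text):
    """Escape double quotes with a backslash (stand-in for dsl_escape)."""
    return text.replace('"', '\\"')


concept_string = build_concept_string(
    must=["ambipolar electric field effect", "films"],
    must_not=["graphitic films"],
    should=["electric field effects"],
)
print(concept_string)
print(escape_quotes(concept_string))
```

The escaped string can then be placed between the outer quotes of the `from concepts "..."` clause.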

Example 6. MUST together with AND/OR

[8]:
concepts = """
    (+"ambipolar electric field effect" -"graphitic films") OR
    (+"films" -"electric field effects")
    """

q = f"""
identify experts
    from concepts "{dsl_escape(concepts)}"
    using publications
return experts
"""

print("Query:\n======", q)

dsl.query(q).as_dataframe()


Query:
======
identify experts
    from concepts "
    (+\"ambipolar electric field effect\" -\"graphitic films\") OR
    (+\"films\" -\"electric field effects\")
    "
    using publications
return experts

[8]:
id score first_name research_orgs last_name docs_found orcid_id
0 ur.014516430466.88 10.359314 Ledford C [grid.411377.7] Carter 18 NaN
1 ur.01034030721.03 3.576317 Pablo [grid.5338.d, grid.116068.8, grid.21941.3f, gr... Jarillo-Herrero 3 [0000-0001-8217-8213]
2 ur.010122277451.23 3.499347 Alberta NaN Meyer 4 NaN
3 ur.012760700525.87 3.499347 Esther NaN Aschemeyer 4 NaN
4 ur.011313310557.79 2.884011 Erwin Randolph [grid.26009.3d] Parson 5 NaN
5 ur.011224625507.86 2.798074 W [grid.461804.f] Feneberg 1 NaN
6 ur.015134442047.63 2.798074 Manfred A [grid.16463.36] Hellberg 1 [0000-0003-0785-8125]
7 ur.01150036175.42 2.434489 Peng [grid.59025.3b] Ren 2 NaN
8 ur.013275477227.26 2.434489 Lan [grid.17635.36, grid.59025.3b, grid.451303.0, ... Wang 2 [0000-0001-7124-2718]
9 ur.056250446.77 2.434489 Azat [grid.59025.3b] Sulaev 2 NaN
10 ur.0624630056.98 2.434489 Shun-Qing [grid.8547.e, grid.450298.2, grid.194645.b, gr... Shen 2 NaN
11 ur.0756673070.05 2.434489 Bin [grid.59025.3b] Xia 2 NaN
12 ur.01003543541.78 2.370124 Shlomo [grid.12136.37, grid.13992.30, grid.133342.4, ... Efrima 3 NaN
13 ur.015146652071.00 2.370124 D [grid.13992.30, grid.7489.2] Yogev 3 NaN
14 ur.012476642650.51 2.251025 B Ruby [grid.205975.c] Rich 3 NaN
15 ur.01022425321.95 2.067556 Andrey A [grid.35043.31, grid.7491.b, grid.4764.1, grid... Turchanin 2 [0000-0003-2388-1042]
16 ur.01161437031.05 2.067556 Joachim [grid.5719.a, grid.419534.e, grid.4372.2, grid... Mayer 2 [0000-0003-3292-5342]
17 ur.01163755245.41 2.067556 Konstantin B [grid.4886.2, grid.457334.2, grid.411233.6, gr... Efetov 2 [0000-0003-2245-1366]
18 ur.01172120354.34 2.067556 Armin [grid.7491.b, grid.7700.0, grid.414703.5, grid... Gölzhäuser 2 [0000-0002-0838-9028]
19 ur.0704114136.03 2.067556 Thomas [grid.10392.39, grid.4764.1, grid.7491.b] Weimann 2 NaN

Example 7. Wildcard searches

[9]:
concepts = """temperat* "ray diffraction" -magnet* """

q = f"""
identify experts
    from concepts "{dsl_escape(concepts)}"
    using publications
return experts
"""

print("Query:\n======", q)

dsl.query(q).as_dataframe()
Query:
======
identify experts
    from concepts "temperat* \"ray diffraction\" -magnet* "
    using publications
return experts

[9]:
id score research_orgs last_name first_name docs_found orcid_id
0 ur.010752560241.92 9.023557 [grid.494717.8, grid.411717.5, grid.5399.6, gr... Buscail Henri 4 NaN
1 ur.016151106345.71 8.850567 [grid.461616.2] Kolarik Vladislav 4 NaN
2 ur.012006337013.67 8.245036 [grid.461616.2] Engel Walter 4 NaN
3 ur.01264404625.74 8.127706 [grid.425759.8, grid.415877.8, grid.465435.5, ... Boldyreva Elena V 4 [0000-0002-1401-2438]
4 ur.01356350415.50 8.127706 [grid.415877.8, grid.4605.7, grid.418421.a, gr... Zakharov Boris A 4 [0000-0002-3520-632X]
5 ur.07650346631.13 7.050485 [grid.4444.0, grid.462844.8, grid.424133.3, gr... Itié Jean-Paul 3 NaN
6 ur.011274203435.25 6.858861 [grid.27736.37, grid.418094.0] Kocharyan Vahan 3 NaN
7 ur.015270341551.59 6.810949 [grid.494717.8, grid.5399.6] Caudron Eric 3 NaN
8 ur.012153454351.77 6.641345 [grid.461616.2, grid.466709.a] Juez-Lorenzo Maria Del Mar 3 NaN
9 ur.011235502761.97 6.217678 NaN Triviño F 3 NaN
10 ur.012267521017.15 6.217678 NaN Vázquez T 3 NaN
11 ur.012630443761.25 6.217678 NaN Ruiz De Gauna A 3 NaN
12 ur.012352473245.95 6.180346 [grid.47894.36, grid.168010.e, grid.299175.1, ... Qadri Syed B 3 NaN
13 ur.013352205311.74 6.088813 [grid.412761.7] Ustinova I S 3 NaN
14 ur.0654202176.07 6.088813 [grid.4886.2, grid.465372.1] Kadyrova Nadezda I 3 NaN
15 ur.012077537127.36 6.035814 [grid.461616.2, grid.4561.6, grid.4886.2] Eisenreich Norbert 3 NaN
16 ur.01015306115.93 5.794733 [grid.418421.a, grid.4605.7, grid.435414.3] Losev Evgeniy A 3 [0000-0003-1743-4166]
17 ur.014146743075.39 5.561709 [grid.4795.f, grid.463879.7, grid.411840.8, gr... Hagenmuller Paul 3 NaN
18 ur.015400602443.87 5.309069 [grid.423902.e, grid.435347.2] Guseinov G G 3 NaN
19 ur.012651704451.05 5.108303 [grid.32197.3e, grid.136593.b] Oguni Masaharu 2 NaN

Additional resources: shortcut functions included in Dimcli

Dimcli includes a number of ‘shortcut’ Python functions that make it easier to work with the expert identification API.

[5]:
from dimcli.functions import extract_concepts, identify_experts, build_reviewers_matrix

extract_concepts

A Python wrapper for the DSL function extract_concepts (see source).

Extracts concepts from any text. The input text is processed and the extracted concepts are returned ordered by their relevance.

[49]:
%%extract_concepts

We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
 metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
 valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
 holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
 centimeters per volt-second can be induced by applying gate voltage.
[49]:
concept relevance
0 ambipolar electric field effect 0.299
1 two-dimensional semimetal 0.293
2 room-temperature mobility 0.285
3 electric field effects 0.279
4 square centimeter 0.262
5 graphitic films 0.254
6 field effects 0.254
7 gate voltage 0.253
8 conductance band 0.234
9 films 0.213
10 electrons 0.204
11 ambient conditions 0.201
12 semimetals 0.201
13 atoms 0.195
14 holes 0.190
15 centimeters 0.187
16 voltage 0.167
17 band 0.164
18 high quality 0.163
19 valence 0.159
20 mobility 0.150
21 overlap 0.126
22 effect 0.114
23 conditions 0.104
24 concentration 0.088
25 quality 0.074
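
Because this output includes a relevance score, generic phrases can be dropped before building an expert query. The sketch below uses a few (concept, relevance) pairs copied from the table above; the 0.25 cut-off and the helper name `concepts_above` are arbitrary choices made for illustration.

```python
# A sample of (concept, relevance) pairs copied from the output above
scored = [
    ("ambipolar electric field effect", 0.299),
    ("two-dimensional semimetal", 0.293),
    ("gate voltage", 0.253),
    ("films", 0.213),
    ("quality", 0.074),
]


def concepts_above(scored, threshold):
    """Keep concepts whose relevance is at least `threshold` (hypothetical helper)."""
    return [concept for concept, relevance in scored if relevance >= threshold]


selected = concepts_above(scored, threshold=0.25)

# Join with OR so that matching any of the selected concepts is enough
or_string = " OR ".join(f'"{c}"' for c in selected)
print(or_string)
```

The resulting string still needs its quotes escaped before being embedded in an identify experts query.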

identify_experts

A Python wrapper for the full expert identification workflow (see source).

This wrapper provides a simpler version of the expert identification API. It is meant to be a convenient alternative for basic queries. For more options, it is advised to use the API directly.

[50]:
%%identify_experts

We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
 metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
 valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
 holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
 centimeters per volt-second can be induced by applying gate voltage.
[50]:
id score last_publication_year total_publications last_grant_year current_research_org total_grants research_orgs first_grant_year orcid_id first_publication_year last_name first_name docs_found dimensions_url
0 ur.01203703171.12 4.221575 2021 217 2023.0 grid.136593.b 12 [grid.136593.b, grid.258799.8, grid.472717.0, ... 2009.0 [0000-0002-6631-5131] 2000 Chiba Daichi 17 https://app.dimensions.ai/discover/publication...
1 ur.01311211105.43 2.924170 2021 107 2020.0 grid.136593.b 3 [grid.257022.0, grid.258799.8, grid.136593.b, ... 2013.0 [0000-0003-4796-1776] 2008 Koyama Tomohiro 13 https://app.dimensions.ai/discover/publication...
2 ur.01055006635.53 2.882491 2021 340 2017.0 grid.4605.7 14 [grid.415877.8, grid.4886.2, grid.4605.7, grid... 1993.0 NaN 1983 Kvon Ze Don 11 https://app.dimensions.ai/discover/publication...
3 ur.01034030721.03 2.861368 2021 248 2023.0 grid.116068.8 7 [grid.21729.3f, grid.116068.8, grid.38142.3c, ... 2009.0 [0000-0001-8217-8213] 2000 Jarillo-Herrero Pablo 7 https://app.dimensions.ai/discover/publication...
4 ur.011513332561.53 2.268820 2020 213 2011.0 grid.260539.b 23 [grid.39158.36, grid.69566.3a, grid.417929.0, ... 1987.0 NaN 1976 Ohta Nobuhiro 10 https://app.dimensions.ai/discover/publication...
5 ur.012735754655.38 2.144676 2021 465 2023.0 grid.258799.8 26 [grid.507644.4, grid.136593.b, grid.26091.3c, ... 1995.0 NaN 1993 Ono Teruo 9 https://app.dimensions.ai/discover/publication...
6 ur.0740560235.48 2.100961 2020 90 NaN grid.450314.7 0 [grid.15276.37, grid.4886.2, grid.11899.38, gr... NaN NaN 1989 Olshanetsky Eugene 8 https://app.dimensions.ai/discover/publication...
7 ur.014407221755.12 1.879059 2021 171 2022.0 grid.258799.8 8 [grid.258799.8, grid.260026.0, grid.5386.8, gr... 2011.0 [0000-0001-7071-0823] 2002 Moriyama Takahiro 8 https://app.dimensions.ai/discover/publication...
8 ur.013563236015.42 1.640245 2020 15 NaN grid.258799.8 0 [grid.258799.8] NaN NaN 2016 Ando Fuyuki 7 https://app.dimensions.ai/discover/publication...
9 ur.0765715521.02 1.569727 2021 2307 2023.0 grid.21941.3f 24 [grid.26999.3d, grid.213917.f, grid.89336.37, ... 2000.0 NaN 1989 Taniguchi Takashi 8 https://app.dimensions.ai/discover/publication...
10 ur.014356370677.99 1.506538 2021 255 NaN grid.77602.34 0 [grid.4886.2, grid.450314.7, grid.423485.c, gr... NaN NaN 1992 Dvoretsky Sergey A 6 https://app.dimensions.ai/discover/publication...
11 ur.016551140015.09 1.506452 2020 19 NaN grid.258799.8 0 [grid.32197.3e, grid.258799.8] NaN NaN 2013 Yamada Kihiro T 6 https://app.dimensions.ai/discover/publication...
12 ur.0632644662.66 1.483708 2020 162 2021.0 grid.450314.7 7 [grid.4886.2, grid.265880.1, grid.450314.7, gr... 1996.0 NaN 1977 Chaplik Alexander V 4 https://app.dimensions.ai/discover/publication...
13 ur.010535542231.81 1.444431 2018 12 NaN grid.258799.8 0 [grid.258799.8] NaN NaN 2013 Kakizakai Haruka 6 https://app.dimensions.ai/discover/publication...
14 ur.014040516167.51 1.331730 2020 11 NaN grid.258799.8 0 [grid.258799.8] NaN NaN 2015 Mizuno Hayato 5 https://app.dimensions.ai/discover/publication...
15 ur.01056154103.66 1.327606 2021 190 2022.0 grid.417751.1 14 [grid.450308.a, grid.8591.5, grid.474689.0, gr... 2004.0 [0000-0003-1514-8879] 1999 Ono Shimpei 4 https://app.dimensions.ai/discover/publication...
16 ur.011775522057.45 1.284237 2021 245 2019.0 grid.450314.7 2 [grid.415877.8, grid.423485.c, grid.4886.2, gr... 2018.0 NaN 1995 Mikhailov Nikolay N 4 https://app.dimensions.ai/discover/publication...
17 ur.07376375471.82 1.205354 2021 53 2019.0 grid.450314.7 2 [grid.7727.5, grid.450314.7, grid.4886.2, grid... 2014.0 NaN 2007 Kozlov Dmitriy A 3 https://app.dimensions.ai/discover/publication...
18 ur.012411463367.07 1.178384 2021 217 NaN grid.450314.7 0 [grid.415877.8, grid.450314.7, grid.4886.2, gr... NaN NaN 2005 Mikhailov Nikolai N 5 https://app.dimensions.ai/discover/publication...
19 ur.014001623651.15 1.158604 2020 62 NaN grid.450314.7 0 [grid.4886.2, grid.450314.7, grid.423485.c, gr... NaN NaN 2005 Dvoretsky Sergei A 4 https://app.dimensions.ai/discover/publication...

Build a reviewers matrix

Generates a matrix of candidate reviewers for abstracts, using the expert identification workflow (see source).

If the input abstracts include identifiers, then those are used in the resulting matrix. Alternatively, a simple list of strings as input will result in a matrix where the identifiers are auto-generated from the order of the abstracts (the first one is 1, etc.).

[6]:
abstracts = [
     {
     'id' : 'A1',
     'text' : """We describe monocrystalline graphitic films, which are a few atoms thick but are nonetheless stable under ambient conditions,
 metallic, and of remarkably high quality. The films are found to be a two-dimensional semimetal with a tiny overlap between
 valence and conductance bands, and they exhibit a strong ambipolar electric field effect such that electrons and
 holes in concentrations up to 10 per square centimeter and with room-temperature mobilities of approximately 10,000 square
 centimeters per volt-second can be induced by applying gate voltage."""
     },
     {
     'id' : "A2",
     'text' : """The physicochemical properties of a molecule-metal interface, in principle, can play a significant role in tuning the electronic properties
 of organic devices. In this report, we demonstrate an electrode engineering approach in a robust, reproducible molecular memristor that
 enables a colossal tunability in both switching voltage (from 130 mV to 4 V i.e. >2500% variation) and current (by ~6 orders of magnitude).
 This provides a spectrum of device design parameters that can be “dialed-in” to create fast, scalable and ultralow energy organic
 memristors optimal for applications spanning digital memory, logic circuits and brain-inspired computing."""
     }
 ]
[7]:
candidates = ["ur.01146544531.57", "ur.011535264111.51", "ur.0767105504.29", "ur.011513332561.53", "ur.01055006635.53"]
[8]:
build_reviewers_matrix(abstracts, candidates, verbose=False)
100%|██████████| 2/2 [00:07<00:00,  3.57s/it]
[8]:
researcher A1 A2
0 ur.01146544531.57 8.166581 0.000000
1 ur.011535264111.51 8.183493 0.000000
2 ur.0767105504.29 8.586408 2.560822
3 ur.011513332561.53 12.946742 1.620928
4 ur.01055006635.53 6.877411 1.838120
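
One way to read the matrix is to pick, for each abstract, the candidates with the highest scores. The sketch below does this on plain dictionaries with values copied from the output above; `top_reviewers` is a hypothetical helper, and treating a score of zero as "not a match" is an assumption.

```python
# Matrix rows copied from the output above
matrix = [
    {"researcher": "ur.01146544531.57",  "A1": 8.166581,  "A2": 0.000000},
    {"researcher": "ur.011535264111.51", "A1": 8.183493,  "A2": 0.000000},
    {"researcher": "ur.0767105504.29",   "A1": 8.586408,  "A2": 2.560822},
    {"researcher": "ur.011513332561.53", "A1": 12.946742, "A2": 1.620928},
    {"researcher": "ur.01055006635.53",  "A1": 6.877411,  "A2": 1.838120},
]


def top_reviewers(matrix, abstract_id, k=2):
    """Return the k researchers with the highest non-zero score for an abstract."""
    scored = [
        (row[abstract_id], row["researcher"])
        for row in matrix
        if row[abstract_id] > 0
    ]
    scored.sort(reverse=True)  # highest score first
    return [researcher for _, researcher in scored[:k]]


for abstract_id in ("A1", "A2"):
    print(abstract_id, top_reviewers(matrix, abstract_id))
```

A real assignment would usually also balance the workload across reviewers, which this sketch does not attempt.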

Note

The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.
