The Organizations API: Features Overview
This tutorial provides an overview of the Organizations data source available via the Dimensions Analytics API.
The topics covered in this notebook are:
How to align your affiliation data with Dimensions using the API disambiguation service
How to retrieve organizations metadata using the search fields available
How to use the schema API to obtain some statistics about the Organizations data available
[1]:
import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))
==
CHANGELOG
This notebook was last run on Sep 10, 2025
==
Prerequisites
This notebook assumes you have installed the Dimcli library and are familiar with the ‘Getting Started’ tutorial.
[2]:
!pip install dimcli tqdm plotly -U --quiet
import dimcli
from dimcli.utils import *
import json, sys, time
import pandas as pd
from tqdm.notebook import tqdm as pbar
import plotly.express as px # plotly>=4.8.1
if 'google.colab' not in sys.modules:
    # make JS dependencies local (needed by HTML exports)
    from plotly.offline import init_notebook_mode
    init_notebook_mode(connected=True)
#
print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
    import getpass
    KEY = getpass.getpass(prompt='API Key: ')
    dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
    KEY = ""
    dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()
Searching config file credentials for 'https://app.dimensions.ai' endpoint..
==
Logging in..
Dimcli - Dimensions API Client (v1.4)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.12
Method: dsl.ini file
1. Matching affiliation data to Dimensions Organization IDs using extract_affiliations
The API function extract_affiliations (docs) can enrich private datasets that contain non-disambiguated organization data with Dimensions IDs, so that you can then take advantage of the wealth of linked data available in Dimensions.
For example, let’s assume our dataset has four columns (affiliation name, city, state and country), any of which can be empty:
[3]:
affiliations = [
['University of Nebraska–Lincoln', 'Lincoln', 'Nebraska', 'United States'],
['Tarbiat Modares University', 'Tehran', '', 'Iran'],
['Harvard University', 'Cambridge', 'Massachusetts', 'United States'],
['China Academy of Chinese Medical Sciences', 'Beijing', '', 'China'],
['Liaoning University', 'Shenyang', '', 'China'],
['Liaoning Normal University', 'Dalian', '', 'China'],
['P.G. Department of Zoology and Research Centre, Shri Shiv Chhatrapati College of Arts, Commerce and Science, Junnar 410502, Pune, India.', '', '', ''],
['Sungkyunkwan University', 'Seoul', '', 'South Korea'],
['Centre for Materials for Electronics Technology', 'Pune', '', 'India'],
['Institut Necker-Enfants Malades (INEM), INSERM U1151-CNRS UMR8253, Université de Paris, Faculté de Médecine, 156 rue de Vaugirard, 75730 Paris Cedex 15, France', '', '', '']
]
We want to look up Dimensions Organization identifiers for those affiliations using the structured affiliation matching.
[4]:
for d in pbar(affiliations):
    res = dsl.query(f"""extract_affiliations(name="{d[0]}", city="{d[1]}", state="{d[2]}", country="{d[3]}")""")
    time.sleep(0.5)
    print(res.json)
{'results': [{'geo': {'cities': [{'geonames_id': 5072006, 'name': 'Lincoln'}], 'countries': [{'code': 'US', 'geonames_id': 6252001, 'name': 'United States'}], 'states': [{'code': 'US-NE', 'geonames_id': 5073708, 'name': 'Nebraska'}]}, 'input': {'city': 'Lincoln', 'country': 'United States', 'name': 'University of Nebraska–Lincoln', 'state': 'Nebraska'}, 'institutes': [{'institute': {'city': 'Lincoln', 'country': 'United States', 'id': 'grid.24434.35', 'name': 'University of Nebraska–Lincoln', 'state': 'Nebraska'}, 'metadata': {'requires_manual_review': False}}]}]}
{'results': [{'geo': {'cities': [{'geonames_id': 112931, 'name': 'Tehran'}], 'countries': [{'code': 'IR', 'geonames_id': 130758, 'name': 'Iran'}], 'states': [{'code': None, 'geonames_id': 110791, 'name': 'Tehran'}]}, 'input': {'city': 'Tehran', 'country': 'Iran', 'name': 'Tarbiat Modares University', 'state': ''}, 'institutes': [{'institute': {'city': 'Tehran', 'country': 'Iran', 'id': 'grid.412266.5', 'name': 'Tarbiat Modares University', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}
{'results': [{'geo': {'cities': [{'geonames_id': 4931972, 'name': 'Cambridge'}], 'countries': [{'code': 'US', 'geonames_id': 6252001, 'name': 'United States'}], 'states': [{'code': 'US-MA', 'geonames_id': 6254926, 'name': 'Massachusetts'}]}, 'input': {'city': 'Cambridge', 'country': 'United States', 'name': 'Harvard University', 'state': 'Massachusetts'}, 'institutes': [{'institute': {'city': 'Cambridge', 'country': 'United States', 'id': 'grid.38142.3c', 'name': 'Harvard University', 'state': 'Massachusetts'}, 'metadata': {'requires_manual_review': False}}]}]}
{'results': [{'geo': {'cities': [{'geonames_id': 1816670, 'name': 'Beijing'}], 'countries': [{'code': 'CN', 'geonames_id': 1814991, 'name': 'China'}], 'states': [{'code': None, 'geonames_id': 2038349, 'name': 'Beijing'}]}, 'input': {'city': 'Beijing', 'country': 'China', 'name': 'China Academy of Chinese Medical Sciences', 'state': ''}, 'institutes': [{'institute': {'city': 'Beijing', 'country': 'China', 'id': 'grid.410318.f', 'name': 'China Academy of Chinese Medical Sciences', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}
{'results': [{'geo': {'cities': [{'geonames_id': 2034937, 'name': 'Shenyang'}], 'countries': [{'code': 'CN', 'geonames_id': 1814991, 'name': 'China'}], 'states': [{'code': None, 'geonames_id': 2036115, 'name': 'Liaoning'}]}, 'input': {'city': 'Shenyang', 'country': 'China', 'name': 'Liaoning University', 'state': ''}, 'institutes': [{'institute': {'city': 'Shenyang', 'country': 'China', 'id': 'grid.411356.4', 'name': 'Liaoning University', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}
{'results': [{'geo': {'cities': [{'geonames_id': 1814087, 'name': 'Dalian'}], 'countries': [{'code': 'CN', 'geonames_id': 1814991, 'name': 'China'}], 'states': [{'code': None, 'geonames_id': 2036115, 'name': 'Liaoning'}]}, 'input': {'city': 'Dalian', 'country': 'China', 'name': 'Liaoning Normal University', 'state': ''}, 'institutes': [{'institute': {'city': 'Dalian', 'country': 'China', 'id': 'grid.440818.1', 'name': 'Liaoning Normal University', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}
{'results': [{'geo': {'cities': [], 'countries': [], 'states': []}, 'input': {'city': '', 'country': '', 'name': 'P.G. Department of Zoology and Research Centre, Shri Shiv Chhatrapati College of Arts, Commerce and Science, Junnar 410502, Pune, India.', 'state': ''}, 'institutes': []}]}
{'results': [{'geo': {'cities': [{'geonames_id': 1835848, 'name': 'Seoul'}], 'countries': [{'code': 'KR', 'geonames_id': 1835841, 'name': 'South Korea'}], 'states': [{'code': None, 'geonames_id': 1835847, 'name': 'Seoul'}]}, 'input': {'city': 'Seoul', 'country': 'South Korea', 'name': 'Sungkyunkwan University', 'state': ''}, 'institutes': [{'institute': {'city': 'Seoul', 'country': 'South Korea', 'id': 'grid.264381.a', 'name': 'Sungkyunkwan University', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}
{'results': [{'geo': {'cities': [{'geonames_id': 1259229, 'name': 'Pune'}], 'countries': [{'code': 'IN', 'geonames_id': 1269750, 'name': 'India'}], 'states': [{'code': None, 'geonames_id': 1264418, 'name': 'Maharashtra'}]}, 'input': {'city': 'Pune', 'country': 'India', 'name': 'Centre for Materials for Electronics Technology', 'state': ''}, 'institutes': [{'institute': {'city': 'Pune', 'country': 'India', 'id': 'grid.494569.3', 'name': 'Centre for Materials for Electronics Technology', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}
{'results': [{'geo': {'cities': [{'geonames_id': 2988507, 'name': 'Paris'}], 'countries': [{'code': 'FR', 'geonames_id': 3017382, 'name': 'France'}], 'states': [{'code': None, 'geonames_id': 3012874, 'name': 'Ile-de-France'}]}, 'input': {'city': '', 'country': '', 'name': 'Institut Necker-Enfants Malades (INEM), INSERM U1151-CNRS UMR8253, Université de Paris, Faculté de Médecine, 156 rue de Vaugirard, 75730 Paris Cedex 15, France', 'state': ''}, 'institutes': [{'institute': {'city': 'Paris', 'country': 'France', 'id': 'grid.508487.6', 'name': 'Université Paris Cité', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}
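The responses printed above are deeply nested, which makes them hard to scan when many affiliations are matched. The following is a minimal sketch of how they could be flattened into one row per match. `flatten_structured` is a hypothetical helper, not part of Dimcli, and it relies only on the keys visible in the output above (`results`, `input`, `institutes`, `metadata`). The sample data is a trimmed copy of two of the responses shown above, with the `geo` block omitted.

```python
def flatten_structured(response):
    """Turn one structured extract_affiliations response into flat rows.

    Hypothetical helper: it only assumes the keys seen in the printed output.
    """
    rows = []
    for result in response.get("results", []):
        input_name = result.get("input", {}).get("name")
        institutes = result.get("institutes", [])
        if not institutes:
            # No match: keep the input so it can be reviewed by hand.
            rows.append({"input_name": input_name, "org_id": None,
                         "org_name": None, "requires_manual_review": None})
            continue
        for item in institutes:
            org = item.get("institute", {})
            meta = item.get("metadata", {})
            rows.append({
                "input_name": input_name,
                "org_id": org.get("id"),
                "org_name": org.get("name"),
                "requires_manual_review": meta.get("requires_manual_review"),
            })
    return rows


# Trimmed copies of two responses printed above ('geo' omitted).
matched = {"results": [{
    "input": {"city": "Cambridge", "country": "United States",
              "name": "Harvard University", "state": "Massachusetts"},
    "institutes": [{
        "institute": {"city": "Cambridge", "country": "United States",
                      "id": "grid.38142.3c", "name": "Harvard University",
                      "state": "Massachusetts"},
        "metadata": {"requires_manual_review": False},
    }],
}]}
unmatched = {"results": [{
    "input": {"city": "", "country": "", "state": "",
              "name": "P.G. Department of Zoology and Research Centre, "
                      "Shri Shiv Chhatrapati College of Arts, Commerce and "
                      "Science, Junnar 410502, Pune, India."},
    "institutes": [],
}]}

rows = flatten_structured(matched) + flatten_structured(unmatched)
for row in rows:
    print(row["org_id"], "|", row["input_name"])
```

In the notebook, the same function could be called on `res.json` inside the loop, and the collected rows passed to `pd.DataFrame` for review.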
If we combine the affiliation data into a single long string, we can perform the same kind of operation using the unstructured affiliation matching.
[5]:
# merge the four columns into a single free-text affiliation string
for d in pbar(affiliations):
    merged = f"{d[0]} {d[1]} {d[2]} {d[3]}"
    res = dsl.query(f"""extract_affiliations(affiliation="{merged}")""")
    time.sleep(0.5)
    print(res.json)
{'results': [{'input': {'affiliation': 'University of Nebraska–Lincoln Lincoln Nebraska United States'}, 'matches': [{'affiliation_part': 'University of Nebraska–Lincoln Lincoln Nebraska United States', 'geo': {'cities': [{'geonames_id': 5072006, 'name': 'Lincoln'}], 'countries': [{'code': 'US', 'geonames_id': 6252001, 'name': 'United States'}], 'states': [{'code': 'US-NE', 'geonames_id': 5073708, 'name': 'Nebraska'}]}, 'institutes': [{'institute': {'city': 'Lincoln', 'country': 'United States', 'id': 'grid.24434.35', 'name': 'University of Nebraska–Lincoln', 'state': 'Nebraska'}, 'metadata': {'requires_manual_review': False}}]}]}]}
{'results': [{'input': {'affiliation': 'Tarbiat Modares University Tehran Iran'}, 'matches': [{'affiliation_part': 'Tarbiat Modares University Tehran Iran', 'geo': {'cities': [{'geonames_id': 112931, 'name': 'Tehran'}], 'countries': [{'code': 'IR', 'geonames_id': 130758, 'name': 'Iran'}], 'states': [{'code': None, 'geonames_id': 110791, 'name': 'Tehran'}]}, 'institutes': [{'institute': {'city': 'Tehran', 'country': 'Iran', 'id': 'grid.412266.5', 'name': 'Tarbiat Modares University', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}]}
{'results': [{'input': {'affiliation': 'Harvard University Cambridge Massachusetts United States'}, 'matches': [{'affiliation_part': 'Harvard University Cambridge Massachusetts United States', 'geo': {'cities': [{'geonames_id': 4931972, 'name': 'Cambridge'}], 'countries': [{'code': 'US', 'geonames_id': 6252001, 'name': 'United States'}], 'states': [{'code': 'US-MA', 'geonames_id': 6254926, 'name': 'Massachusetts'}]}, 'institutes': [{'institute': {'city': 'Cambridge', 'country': 'United States', 'id': 'grid.38142.3c', 'name': 'Harvard University', 'state': 'Massachusetts'}, 'metadata': {'requires_manual_review': False}}]}]}]}
{'results': [{'input': {'affiliation': 'China Academy of Chinese Medical Sciences Beijing China'}, 'matches': [{'affiliation_part': 'China Academy of Chinese Medical Sciences Beijing China', 'geo': {'cities': [{'geonames_id': 1816670, 'name': 'Beijing'}], 'countries': [{'code': 'CN', 'geonames_id': 1814991, 'name': 'China'}], 'states': [{'code': None, 'geonames_id': 2038349, 'name': 'Beijing'}]}, 'institutes': [{'institute': {'city': 'Beijing', 'country': 'China', 'id': 'grid.410318.f', 'name': 'China Academy of Chinese Medical Sciences', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}]}
{'results': [{'input': {'affiliation': 'Liaoning University Shenyang China'}, 'matches': [{'affiliation_part': 'Liaoning University Shenyang China', 'geo': {'cities': [{'geonames_id': 2034937, 'name': 'Shenyang'}], 'countries': [{'code': 'CN', 'geonames_id': 1814991, 'name': 'China'}], 'states': [{'code': None, 'geonames_id': 2036115, 'name': 'Liaoning'}]}, 'institutes': [{'institute': {'city': 'Shenyang', 'country': 'China', 'id': 'grid.411356.4', 'name': 'Liaoning University', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}]}
{'results': [{'input': {'affiliation': 'Liaoning Normal University Dalian China'}, 'matches': [{'affiliation_part': 'Liaoning Normal University Dalian China', 'geo': {'cities': [{'geonames_id': 1814087, 'name': 'Dalian'}], 'countries': [{'code': 'CN', 'geonames_id': 1814991, 'name': 'China'}], 'states': [{'code': None, 'geonames_id': 2036115, 'name': 'Liaoning'}]}, 'institutes': [{'institute': {'city': 'Dalian', 'country': 'China', 'id': 'grid.440818.1', 'name': 'Liaoning Normal University', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}]}
{'results': [{'input': {'affiliation': 'P.G. Department of Zoology and Research Centre, Shri Shiv Chhatrapati College of Arts, Commerce and Science, Junnar 410502, Pune, India. '}, 'matches': [{'affiliation_part': 'P.G. Department of Zoology and Research Centre, Shri Shiv Chhatrapati College of Arts, Commerce and Science, Junnar 410502, Pune, India', 'geo': {'cities': [{'geonames_id': 1259229, 'name': 'Pune'}], 'countries': [{'code': 'IN', 'geonames_id': 1269750, 'name': 'India'}], 'states': [{'code': None, 'geonames_id': 1264418, 'name': 'Maharashtra'}]}, 'institutes': []}]}]}
{'results': [{'input': {'affiliation': 'Sungkyunkwan University Seoul South Korea'}, 'matches': [{'affiliation_part': 'Sungkyunkwan University Seoul South Korea', 'geo': {'cities': [{'geonames_id': 1835848, 'name': 'Seoul'}], 'countries': [{'code': 'KR', 'geonames_id': 1835841, 'name': 'South Korea'}], 'states': [{'code': None, 'geonames_id': 1835847, 'name': 'Seoul'}]}, 'institutes': [{'institute': {'city': 'Seoul', 'country': 'South Korea', 'id': 'grid.264381.a', 'name': 'Sungkyunkwan University', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}]}
{'results': [{'input': {'affiliation': 'Centre for Materials for Electronics Technology Pune India'}, 'matches': [{'affiliation_part': 'Centre for Materials for Electronics Technology Pune India', 'geo': {'cities': [{'geonames_id': 1259229, 'name': 'Pune'}], 'countries': [{'code': 'IN', 'geonames_id': 1269750, 'name': 'India'}], 'states': [{'code': None, 'geonames_id': 1264418, 'name': 'Maharashtra'}]}, 'institutes': [{'institute': {'city': 'Pune', 'country': 'India', 'id': 'grid.494569.3', 'name': 'Centre for Materials for Electronics Technology', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}]}
{'results': [{'input': {'affiliation': 'Institut Necker-Enfants Malades (INEM), INSERM U1151-CNRS UMR8253, Université de Paris, Faculté de Médecine, 156 rue de Vaugirard, 75730 Paris Cedex 15, France '}, 'matches': [{'affiliation_part': 'Institut Necker-Enfants Malades (INEM), INSERM U1151-CNRS UMR8253, Université de Paris, Faculté de Médecine, 156 rue de Vaugirard, 75730 Paris Cedex 15, France', 'geo': {'cities': [{'geonames_id': 2988507, 'name': 'Paris'}], 'countries': [{'code': 'FR', 'geonames_id': 3017382, 'name': 'France'}], 'states': [{'code': None, 'geonames_id': 3012874, 'name': 'Ile-de-France'}]}, 'institutes': [{'institute': {'city': 'Paris', 'country': 'France', 'id': 'grid.508487.6', 'name': 'Université Paris Cité', 'state': None}, 'metadata': {'requires_manual_review': False}}]}]}]}
NOTE: the above commands also support bulk querying, which can reduce the number of API queries needed. Check out the docs for more info.
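Two details of the unstructured example are worth handling before running it on real data: empty columns leave stray spaces in the merged string (visible in the `input` values above), and a double quote inside an affiliation would end the DSL string literal early. The sketch below addresses both. The helper names are hypothetical, and the escaping rule (backslash-escaping backslashes and double quotes) is an assumption; Dimcli also provides a `dsl_escape` utility in `dimcli.utils`, whose documentation should be checked before relying on hand-rolled escaping.

```python
def merge_affiliation(parts):
    """Join the non-empty columns into one free-text affiliation string."""
    return " ".join(p.strip() for p in parts if p and p.strip())


def build_unstructured_query(parts):
    """Build an extract_affiliations query string for one affiliation row.

    Assumption: escaping backslashes and double quotes with a backslash
    is enough to keep the value inside the DSL string literal.
    """
    merged = merge_affiliation(parts)
    escaped = merged.replace("\\", "\\\\").replace('"', '\\"')
    return f'extract_affiliations(affiliation="{escaped}")'


row = ['Tarbiat Modares University', 'Tehran', '', 'Iran']
print(merge_affiliation(row))
print(build_unstructured_query(row))
# An invented affiliation name containing quotes, for illustration only.
print(build_unstructured_query(['Institute "Example" Lab', '', '', '']))
```

The resulting string can be passed to `dsl.query` exactly as in the cell above.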
2. Searching the API for organizations
This can be done using full text search and/or fielded search.
Full-text search
[6]:
%%dsldf
search organizations
for "new york"
return organizations limit 10
Returned Organizations: 10 (total = 352)
Time: 5.56s
[6]:
| | id | name | country_code | country_name | types | city_name | state_name |
|---|---|---|---|---|---|---|---|
| 0 | grid.798367.4 | Bank of New York | US | United States | [Company] | NaN | NaN |
| 1 | grid.798343.2 | Research Foundation of University of New York | US | United States | [Education] | NaN | NaN |
| 2 | grid.797561.b | New York Hospital-Cornell Medical Center | US | United States | [Healthcare] | New York | New York |
| 3 | grid.796770.8 | Research Foundation of City University of New ... | US | United States | [Other] | NaN | NaN |
| 4 | grid.796173.d | Bank of New York Mellon Trust Co NA | US | United States | [Company] | NaN | NaN |
| 5 | grid.795276.8 | New York University Medical Center | US | United States | [Education] | New York | New York |
| 6 | grid.794869.d | International General Electric Company of New ... | US | United States | [Other] | NaN | NaN |
| 7 | grid.782261.8 | New York Digital Investment Group LLC | US | United States | [Other] | NaN | NaN |
| 8 | grid.778414.9 | China CITIC Bank International Ltd New York Br... | US | United States | [Government] | NaN | NaN |
| 9 | grid.777726.4 | Morgan Guaranty Trust Company of New York | US | United States | [Company] | NaN | NaN |
[7]:
%%dsldf
search organizations
for "new york AND community"
return organizations limit 10
Returned Organizations: 9 (total = 9)
Time: 0.62s
[7]:
| | id | name | country_code | country_name | types | acronym | city_name | latitude | linkout | longitude | state_name |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | grid.757191.c | New York Community Bank | US | United States | [Company] | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | grid.507861.d | Mohawk Valley Community College | US | United States | [Education] | MVCC | Utica | 43.076850 | [https://www.mvcc.edu/] | -75.220120 | New York |
| 2 | grid.490742.c | Health Foundation for Western & Central New York | US | United States | [Nonprofit] | NaN | Buffalo | 42.874810 | [https://hfwcny.org/] | -78.849690 | New York |
| 3 | grid.480917.3 | New York Community Trust | US | United States | [Nonprofit] | NaN | New York | 40.758870 | [http://www.nycommunitytrust.org/] | -73.968185 | New York |
| 4 | grid.478715.8 | Central New York Community Foundation | US | United States | [Nonprofit] | CNYCF | Syracuse | 43.056038 | [https://www.cnycf.org/] | -76.148210 | New York |
| 5 | grid.475804.a | Community Service Society of New York | US | United States | [Other] | CSS | New York | 40.749622 | [http://www.cssny.org/] | -73.974620 | New York |
| 6 | grid.475783.a | Long Term Care Community Coalition | US | United States | [Other] | LTCCC | New York | 40.751163 | [http://www.ltccc.org/] | -73.992470 | New York |
| 7 | grid.429257.f | Korean Community Services of Metropolitan New ... | US | United States | [Nonprofit] | KCS | New York | 40.770954 | [https://www.kcsny.org/] | -73.786670 | New York |
| 8 | funder.196228 | Community Health Foundation of Western and Cen... | NaN | United States | NaN | Community Health Foundation of Western and Centra | NaN | NaN | NaN | NaN | NaN |
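Fielded search
We can look up an organization using its ID. Note that in the run recorded below the query is rejected, because the organizations source has no fieldset called all; the error message lists the available fields and the two available fieldsets (basics and nuts), one of which should be requested instead.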
[8]:
%%dsldf
search organizations
where id="grid.468887.d"
return organizations[all]
Returned Errors: 1
Time: 5.84s
Query Error
Semantic errors found:
Field / Fieldset 'all' is not present in Source 'organizations'. Available fields: acronym,city_name,cnrs_ids,country_code,country_name,dimensions_url,established,external_ids_fundref,hesa_ids,id,isni_ids,latitude,linkout,longitude,name,nuts_level1_code,nuts_level1_name,nuts_level2_code,nuts_level2_name,nuts_level3_code,nuts_level3_name,organization_child_ids,organization_parent_ids,organization_related_ids,orgref_ids,redirect,ror_ids,score,state_name,status,types,ucas_ids,ukprn_ids,wikidata_ids,wikipedia_url and available fieldsets: basics,nuts
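The error message above lists what can be requested in the `return` clause. As a minimal sketch, the lookup query could be assembled in Python and checked against that list before it is sent. The function name `org_lookup_query` is hypothetical; the field and fieldset names are copied from the error message, and joining several of them with `+` follows the DSL convention for combining fields and fieldsets, which should be confirmed against the official DSL documentation.

```python
# Names copied from the error message printed above.
AVAILABLE_FIELDSETS = {"basics", "nuts"}
AVAILABLE_FIELDS = set(
    "acronym,city_name,cnrs_ids,country_code,country_name,dimensions_url,"
    "established,external_ids_fundref,hesa_ids,id,isni_ids,latitude,linkout,"
    "longitude,name,nuts_level1_code,nuts_level1_name,nuts_level2_code,"
    "nuts_level2_name,nuts_level3_code,nuts_level3_name,"
    "organization_child_ids,organization_parent_ids,organization_related_ids,"
    "orgref_ids,redirect,ror_ids,score,state_name,status,types,ucas_ids,"
    "ukprn_ids,wikidata_ids,wikipedia_url".split(",")
)


def org_lookup_query(org_id, returned="basics"):
    """Build a lookup-by-ID query, rejecting unknown return fields early."""
    for name in returned.split("+"):
        if name not in AVAILABLE_FIELDSETS and name not in AVAILABLE_FIELDS:
            raise ValueError(f"'{name}' is not a field or fieldset of organizations")
    return f'search organizations where id="{org_id}" return organizations[{returned}]'


print(org_lookup_query("grid.468887.d"))
print(org_lookup_query("grid.468887.d", "id+name+ror_ids"))
```

Checking locally is only a convenience: the list is a snapshot from one API version, so the API's own error message remains the authoritative source.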
[9]:
%%dsldf
search organizations
for "new york"
where types in ["Education"]
return organizations limit 10
Returned Organizations: 10 (total = 93)
Time: 0.64s
[9]:
| | id | name | country_code | country_name | types | city_name | state_name | latitude | linkout | longitude | acronym |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | grid.798343.2 | Research Foundation of University of New York | US | United States | [Education] | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | grid.795276.8 | New York University Medical Center | US | United States | [Education] | New York | New York | NaN | NaN | NaN | NaN |
| 2 | grid.512545.2 | State University of New York, Korea | KR | South Korea | [Education] | Incheon | NaN | 37.376694 | [http://www.sunykorea.ac.kr/] | 126.667170 | NaN |
| 3 | grid.511090.c | Craig Newmark Graduate School of Journalism at... | US | United States | [Education] | New York | New York | 40.755230 | [https://www.journalism.cuny.edu/] | -73.988830 | NaN |
| 4 | grid.510787.c | Center for Migration Studies of New York | US | United States | [Education] | New York | New York | 40.761470 | [https://cmsny.org/] | -73.965450 | CMS |
| 5 | grid.507867.b | New York State College of Ceramics | US | United States | [Education] | Alfred | New York | 42.253372 | [https://www.alfred.edu/academics/colleges-sch... | -77.787575 | NaN |
| 6 | grid.507863.f | New York State School of Industrial and Labor ... | US | United States | [Education] | Ithaca | New York | 42.439213 | [https://www.ilr.cornell.edu/] | -76.493380 | ILR |
| 7 | grid.507861.d | Mohawk Valley Community College | US | United States | [Education] | Utica | New York | 43.076850 | [https://www.mvcc.edu/] | -75.220120 | MVCC |
| 8 | grid.507860.c | New York State College of Agriculture and Life... | US | United States | [Education] | Ithaca | New York | 42.448290 | [https://cals.cornell.edu/#] | -76.479390 | CALS |
| 9 | grid.507859.6 | New York State College of Veterinary Medicine ... | US | United States | [Education] | Ithaca | New York | 42.447483 | [https://www.vet.cornell.edu/] | -76.464905 | NaN |
[10]:
%%dsldf
search organizations
for "new york"
where types in ["Education"]
and country_name != "United States"
return organizations limit 10
Returned Organizations: 9 (total = 9)
Time: 5.97s
[10]:
| | id | name | city_name | country_code | country_name | latitude | linkout | longitude | types | acronym | state_name |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | grid.512545.2 | State University of New York, Korea | Incheon | KR | South Korea | 37.376694 | [http://www.sunykorea.ac.kr/] | 126.667170 | [Education] | NaN | NaN |
| 1 | grid.479986.d | New York University Paris | Paris | FR | France | 48.869614 | [http://www.nyu.edu/paris.html] | 2.346863 | [Education] | NaN | NaN |
| 2 | grid.473731.5 | New York University Florence | Florence | IT | Italy | 43.795910 | [http://www.nyu.edu/florence.html] | 11.265850 | [Education] | NYU | NaN |
| 3 | grid.473728.d | New York Institute of Technology | Vancouver | CA | Canada | 49.284374 | [http://nyit.edu/vancouver] | -123.116480 | [Education] | NYIT | British Columbia |
| 4 | grid.449989.1 | University of New York in Prague | Prague | CZ | Czechia | 50.074043 | [https://www.unyp.cz/] | 14.433994 | [Education] | UNYP | NaN |
| 5 | grid.449457.f | New York University Shanghai | Shanghai | CN | China | 31.225506 | [https://shanghai.nyu.edu/] | 121.533510 | [Education] | NaN | NaN |
| 6 | grid.444973.9 | University of New York Tirana | Tirana | AL | Albania | 41.311060 | [http://unyt.edu.al/] | 19.801466 | [Education] | UNYT | NaN |
| 7 | grid.440573.1 | New York University Abu Dhabi | Abu Dhabi | AE | United Arab Emirates | 24.485000 | [https://nyuad.nyu.edu/] | 54.353000 | [Education] | NaN | NaN |
| 8 | grid.410685.e | SUNY Korea | Seoul | KR | South Korea | 37.377018 | [http://www.sunykorea.ac.kr/] | 126.666770 | [Education] | NaN | NaN |
Returning facets
[11]:
%%dsldf
search organizations
for "new york"
return country_name
Returned Country_name: 11
Time: 0.50s
[11]:
| | id | count |
|---|---|---|
| 0 | United States | 341 |
| 1 | South Korea | 2 |
| 2 | Albania | 1 |
| 3 | Canada | 1 |
| 4 | China | 1 |
| 5 | Czechia | 1 |
| 6 | France | 1 |
| 7 | Italy | 1 |
| 8 | Panama | 1 |
| 9 | United Arab Emirates | 1 |
| 10 | United Kingdom | 1 |
[12]:
%%dsldf
search organizations
for "new york"
where country_name = "United States"
return types
Returned Types: 8
Time: 5.47s
[12]:
| | id | count |
|---|---|---|
| 0 | Education | 84 |
| 1 | Nonprofit | 75 |
| 2 | Company | 57 |
| 3 | Government | 46 |
| 4 | Other | 34 |
| 5 | Healthcare | 28 |
| 6 | Archive | 9 |
| 7 | Facility | 7 |
Returning organizations facets from publications
Organization data is used throughout Dimensions.
For example, one can run a publications search and return organizations as a facet. This makes it possible to use organization metadata, such as latitude and longitude, to quickly build a geographical visualization.
[13]:
q = """
search publications for "coronavirus OR covid-19"
where year > 2019
return research_orgs[basics] limit 50
"""
df = dslquery(q).as_dataframe()
df.head(5)
Returned Research_orgs: 50
Time: 1.16s
[13]:
| | id | name | city_name | count | country_code | country_name | latitude | linkout | longitude | state_name | types | acronym |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | grid.38142.3c | Harvard University | Cambridge | 33545 | US | United States | 42.377052 | [http://www.harvard.edu/] | -71.116650 | Massachusetts | [Education] | NaN |
| 1 | grid.17063.33 | University of Toronto | Toronto | 21731 | CA | Canada | 43.661667 | [http://www.utoronto.ca/] | -79.395000 | Ontario | [Education] | NaN |
| 2 | grid.21107.35 | Johns Hopkins University | Baltimore | 19419 | US | United States | 39.328888 | [https://www.jhu.edu/] | -76.620280 | Maryland | [Education] | JHU |
| 3 | grid.4991.5 | University of Oxford | Oxford | 19345 | GB | United Kingdom | 51.753437 | [http://www.ox.ac.uk/] | -1.254010 | Oxfordshire | [Education] | NaN |
| 4 | grid.83440.3b | University College London | London | 19047 | GB | United Kingdom | 51.524470 | [http://www.ucl.ac.uk/] | -0.133982 | NaN | [Education] | UCL |
[14]:
fig = px.scatter_geo(df,
lat="latitude", lon="longitude",
color="country_name",
size="count",
projection="natural earth",
hover_name="name",
hover_data=['city_name', 'id', 'types']
)
fig.show()
3. A closer look at the organizations data statistics
The Dimensions Search Language programmatically exposes metadata such as the supported sources and entities, along with their fields, facets, fieldsets, metrics and search fields.
[15]:
%dsldocs organizations
[15]:
| | sources | field | type | description | is_filter | is_entity | is_facet |
|---|---|---|---|---|---|---|---|
| 0 | organizations | acronym | string | GRID acronym of the organization. E.g., "UT" f... | True | False | False |
| 1 | organizations | city_name | string | GRID name of the organization country. E.g., "... | True | False | True |
| 2 | organizations | cnrs_ids | string | CNRS IDs for this organization | True | False | False |
| 3 | organizations | country_code | string | Country of the organisation, identified using ... | True | False | True |
| 4 | organizations | country_name | string | GRID name of the organization country. E.g., "... | True | False | True |
| 5 | organizations | dimensions_url | string | Link pointing to the Dimensions web application | False | False | False |
| 6 | organizations | established | integer | Year when the organization was estabilished | True | False | False |
| 7 | organizations | external_ids_fundref | string | Fundref IDs for this organization | True | False | False |
| 8 | organizations | hesa_ids | string | HESA IDs for this organization | True | False | False |
| 9 | organizations | id | string | GRID ID of the organization. E.g., "grid.26999... | True | False | False |
| 10 | organizations | isni_ids | string | ISNI IDs for this organization | True | False | False |
| 11 | organizations | latitude | float | None | False | False | False |
| 12 | organizations | linkout | string | None | False | False | False |
| 13 | organizations | longitude | float | None | False | False | False |
| 14 | organizations | name | string | GRID name of the organization. E.g., "Universi... | True | False | False |
| 15 | organizations | nuts_level1_code | string | Level 1 code for this organization, based on `... | True | False | True |
| 16 | organizations | nuts_level1_name | string | Level 1 name for this organization, based on `... | True | False | True |
| 17 | organizations | nuts_level2_code | string | Level 2 code for this organization, based on `... | True | False | True |
| 18 | organizations | nuts_level2_name | string | Level 2 name for this organization, based on `... | True | False | True |
| 19 | organizations | nuts_level3_code | string | Level 3 code for this organization, based on `... | True | False | True |
| 20 | organizations | nuts_level3_name | string | Level 3 name for this organization, based on `... | True | False | True |
| 21 | organizations | organization_child_ids | string | Child organization IDs | True | False | False |
| 22 | organizations | organization_parent_ids | string | Parent organization IDs | True | False | False |
| 23 | organizations | organization_related_ids | string | Related organization IDs | True | False | False |
| 24 | organizations | orgref_ids | string | OrgRef IDs for this organization | True | False | False |
| 25 | organizations | redirect | string | GRID ID of an organization this one was redire... | True | False | False |
| 26 | organizations | ror_ids | string | ROR IDs for this organization | True | False | False |
| 27 | organizations | score | float | For full-text queries, the relevance score is ... | True | False | False |
| 28 | organizations | state_name | string | GRID name of the organization country. E.g., "... | True | False | True |
| 29 | organizations | status | string | Status of an organization. May be be one of:\n... | True | False | True |
| 30 | organizations | types | string | Type of an organization. Available types inclu... | True | False | True |
| 31 | organizations | ucas_ids | string | UCAS IDs for this organization | True | False | False |
| 32 | organizations | ukprn_ids | string | UKPRN IDs for this organization | True | False | False |
| 33 | organizations | wikidata_ids | string | WikiData IDs for this organization | True | False | False |
| 34 | organizations | wikipedia_url | string | Wikipedia URL | False | False | False |
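We can use the field information above to draw up some quick statistics about the organizations source.
To do this, we use the operator "is not empty" to automatically generate one query per field, of the form "search organizations where field_name is not empty return organizations[id] limit 1", and then use the total count reported in the JSON response (exposed by Dimcli as count_total) for our statistics. Fields that do not support emptiness filters (id and score in the run below) return a query error and appear as NaN in the resulting table.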
[16]:
FIELDS_DATA = dsl_last_results
# one query with `is not empty` for field-filters
q_template = """search organizations where {} is not empty return organizations[id] limit 1"""
# seed results with total number of orgs
totorgs = dsl.query("""search organizations return organizations[id] limit 1""", verbose=False).count_total
stats = [
{'filter_by': 'All Organizations (no filter)', 'results' : totorgs}
]
for index, row in pbar(list(FIELDS_DATA.iterrows())):
    # print("\n===", row['field'])
    q = q_template.format(row['field'])
    res = dsl.query(q, verbose=False)
    time.sleep(0.5)
    stats.append({'filter_by': row['field'], 'results' : res.count_total})
# save to a dataframe
df = pd.DataFrame(stats)
df.sort_values("results", inplace=True, ascending=False)
df
Query Error
Semantic errors found:
Field id does not support emptiness filters.
Query Error
Semantic errors found:
Field score does not support emptiness filters.
[16]:
| | filter_by | results |
|---|---|---|
| 0 | All Organizations (no filter) | 406172.0 |
| 6 | dimensions_url | 406172.0 |
| 30 | status | 406172.0 |
| 15 | name | 406172.0 |
| 31 | types | 406087.0 |
| 5 | country_name | 406058.0 |
| 4 | country_code | 406012.0 |
| 2 | city_name | 166300.0 |
| 14 | longitude | 131622.0 |
| 12 | latitude | 131622.0 |
| 23 | organization_parent_ids | 119442.0 |
| 13 | linkout | 119290.0 |
| 27 | ror_ids | 94038.0 |
| 7 | established | 91959.0 |
| 29 | state_name | 66188.0 |
| 16 | nuts_level1_code | 51585.0 |
| 18 | nuts_level2_code | 51585.0 |
| 17 | nuts_level1_name | 51585.0 |
| 21 | nuts_level3_name | 51585.0 |
| 19 | nuts_level2_name | 51585.0 |
| 20 | nuts_level3_code | 51585.0 |
| 34 | wikidata_ids | 51499.0 |
| 11 | isni_ids | 49885.0 |
| 1 | acronym | 45701.0 |
| 35 | wikipedia_url | 33440.0 |
| 22 | organization_child_ids | 21945.0 |
| 25 | orgref_ids | 14577.0 |
| 8 | external_ids_fundref | 9406.0 |
| 26 | redirect | 5669.0 |
| 24 | organization_related_ids | 4751.0 |
| 3 | cnrs_ids | 920.0 |
| 33 | ukprn_ids | 172.0 |
| 9 | hesa_ids | 171.0 |
| 32 | ucas_ids | 152.0 |
| 10 | id | NaN |
| 28 | score | NaN |
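Raw counts are easier to compare once they are expressed as a share of all organizations. The following is a small sketch of that calculation using a few rows copied from the table above; `coverage_percent` is a hypothetical helper, and fields whose query failed (NaN in the table) are reported as unknown rather than as zero.

```python
import math

# A few rows copied from the table above; NaN marks a failed query.
total_orgs = 406172
counts = {
    "city_name": 166300.0,
    "ror_ids": 94038.0,
    "ucas_ids": 152.0,
    "id": float("nan"),
}


def coverage_percent(count, total):
    """Return the percentage of records with a value, or None if unknown."""
    if count is None or math.isnan(count) or total <= 0:
        return None
    return round(100.0 * count / total, 1)


coverage = {field: coverage_percent(n, total_orgs) for field, n in counts.items()}
for field, pct in coverage.items():
    print(field, "n/a" if pct is None else f"{pct}%")
```

In the notebook, the same idea could be applied to the whole dataframe, for example by adding a column computed as `100 * df["results"] / totorgs`.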
Let’s visualize the data with plotly
[17]:
px.bar(df, x="filter_by", y="results",
title="Fields distribution for GRID data")
Where to find out more
Please have a look at the official documentation for more information on the organizations data source.
Note
The Dimensions Analytics API allows you to carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.