
API COOKBOOKS

Getting Started

  • Verifying Your API Connection
    • Technical prerequisites
    • Connecting to the API using a key
    • Legacy authentication method: username & password
    • Troubleshooting
  • The Dimcli Python library: Installation and Querying
    • Installation
    • Authentication
      • More secure method: storing a private credentials file
    • Querying
      • Simple Querying
      • Iterative querying (loops)
    • Command line interface
      • Running the CLI
  • The Dimcli Python library: Working with Pandas Dataframes
    • Prerequisites
    • 1. General method to transform JSON query results into a dataframe
    • 2. Dataframe Methods for ‘Publications’ queries
      • Extracting authors: as_dataframe_authors
      • Extracting Affiliations: as_dataframe_authors_affiliations
    • 3. Dataframe Methods for ‘Grants’ queries
      • Extracting Funders: as_dataframe_funders
      • Extracting investigators: as_dataframe_investigators
    • 4. Dataframe Methods for ‘Concepts’ queries
      • Extracting Concepts: as_dataframe_concepts
    • Conclusions
  • The Dimcli Python library: Magic Commands
    • Prerequisites
    • Dimcli ‘magic’ commands
      • Tip: Accessing data returned by magic queries
    • 1. Simple queries with %dsl or %%dsl
    • 2. Loop queries with %dslloop or %%dslloop
    • 3. Returning dataframes: %dsldf and %%dsldf
    • 4. Looped dataframe queries: %dslloopdf and %%dslloopdf
    • 5. Getting API schema documentation with %dsldocs
  • Exploring The Dimensions Search Language (DSL) - Quick Intro
    • Prerequisites
      • What the query statistics refer to
      • Working with fields
    • Control the fields you return
    • Make a mistake, and the DSL will tell you what fields you could have used
      • Full text search
    • A simple author search
    • ..or search for a researcher by a specific id
      • Sources VS Facets
    • You can paginate through source results up to 50000 rows
    • You can return max 1000 facet rows
    • Just make a mistake, and you will get the complete list of available facets
  • Exploring The Dimensions Search Language (DSL) - Deep Dive
    • What is the Dimensions Search Language?
    • Prerequisites
    • Sections Index
    • 1. Basic query structure
      • search source
      • return result (source or facet)
    • 2. Full-text Searching
      • 2.1 in [search index]
      • 2.2 for "search term"
      • 2.3 Boolean Operators
      • 2.4 Wildcard Searches
      • 2.5 Proximity Searches
    • 3. Field Searching
      • 3.1 where
      • 3.2 in
      • 3.3 count - filter function
      • 3.4 Filter Operators
      • 3.5 Partial string matching with ~
      • 3.6 Emptiness filters: is empty
    • 4. Searching for Researchers
      • 4.1 Exact name searches
      • 4.2 Fuzzy Searches
      • 4.3 Using the disambiguated Researchers database
    • 5. Returning results
      • 5.1 Returning Multiple Sources
      • 5.2 Returning Specific Fields
      • 5.3 Returning Facets
      • 5.4 What the query statistics refer to - sources VS facets
      • 5.5 Paginating Results
      • 5.6 Sorting Results
      • 5.7 Unnesting results
    • 6. Aggregations
      • 6.1 Complex aggregations
        • Publications citations_per_year
        • Grants funding_per_year
  • Working with lists in the Dimensions API
    • Prerequisites
    • 1. How do we use lists in the Dimensions API?
    • 2. What are all the things that we can make lists of in the Dimensions API?
      • What are the internal Entities that we might put in a list?
      • What about lists of ids?
      • What are the external entities that we can put in a list?
    • 3. Making a list from the results of a query
      • Putting both parts of this example together
      • Doing something useful: Get all the publications that cite my publications
    • 5. How Long can lists get?
      • This won’t work
      • This will
      • What if I need a very long list?
    • 6. What if I want to get the researchers associated with the publications that cite my institution?
    • 7. What if I want to get all the researchers associated with the publications that cite my institution?
    • 8. ..and if we want details about our researchers, we can put our list of researchers into the researcher API
    • 9. Patents example (patents -> publications)
    • 10. Clinical Trials (clinical trials -> publications)
    • 11. Grants (publications -> grants)
    • Conclusions
  • Working with concepts in the Dimensions API
    • Prerequisites
    • 1. Background: What are concepts?
      • 1.1 From concepts to dataframes: Dimcli’s as_dataframe_concepts method
      • 1.2 Extracting concepts from any text
    • 2. Data acquisition: retrieving publications and all their associated concepts
      • 2.1 Processing concept data
    • 3. Exploring our dataset: basic statistics about Publications / Concepts
      • 3.1 Documents With concepts VS Without
      • 3.2 Yearly breakdown of Documents With concepts VS Without
      • 3.3 Concepts frequency
      • 3.4 Distribution of Concepts Frequency
      • 3.5 Yearly breakdown: unique VS repeated concepts
    • 4. Isolating ‘interesting’ concepts using frequency and score_avg
      • 4.1 The problem: frequent concepts are not that interesting!
      • 4.2 Solution 1: prefiltering by score_avg and sorting by frequency
      • 4.3 Solution 2: prefiltering by frequency and sorting by score_avg
    • 5. Analyses By Year
      • 5.1 Adding year-based metrics to the concepts dataframe
      • 5.2 Charting the variation: multi-year visualization
    • 6. Conclusion
      • The main takeaways
      • What next

Publications

  • General Publication Statistics about a Research Organization
    • Prerequisites
    • Choose a Research Organization
    • Publications output by year
    • Publications most cited in last 2 years
    • Publications most cited - all time
    • Publications most cited : which research areas?
    • Publications most cited : which journals?
    • Top Funders (by aggregated funding amount)
    • Top funders split by country of the funder
    • Correlation between Number of Publications and Funding
  • Citation Analysis: an Introduction
    • 1. Prerequisites
    • Method A: getting citations for one publication at a time
      • Comments about this method
    • Method B: Getting citations for multiple publications via a single query
    • Creating a second-level citations network
    • Building a Simple Dataviz
    • Final considerations
      • Querying for more than 1000 results
      • Querying for more than 50K results
      • Dealing with highly cited publications
      • Pre-checking citations counts
  • Citation Analysis: Journals Citing a Research Organization
    • 1. Prerequisites
    • 2. Choose a Research Organization
    • 3. Building a Publications Baseset
    • 4. Extracting Publications Citing the Baseset
    • 5. Journal Analysis
      • Number of Unique journals
      • Most frequent journals
      • Top 100 journals chart
      • Top 20 journals by year chart
  • Citation Analysis: Journals Cited by a Research Organization
    • Prerequisites
    • 1. Choosing a Research Organization
      • 1.1 Selecting a Field of Research ID
    • 2. Getting the IDs of the outgoing citations
      • 2.1 Removing duplicates and counting most frequent citations
    • 3. Enriching the citations IDs with other publication metadata
      • 3.1 Adding the citations counts
    • 4. Journal Analysis
      • 4.1 Number of Unique journals
      • 4.2 Most frequent journals
      • 4.3 Top 50 journals chart, by publisher
      • 4.4 Top 20 journals by year of the cited publication
    • Conclusions
  • Extracting Authors order from Publications data
    • Prerequisites
    • 1. Extracting a dataset from Dimensions
    • 2. Combining the results
    • Where to go from here
  • Journal Profiling Part 1: Getting the Data
    • Prerequisites
    • Selecting a Journal and Extracting All Publications Metadata
    • Basic stats about authors
    • A quick look at authors without a Dimensions Researcher ID
      • Any common patterns?
      • Creating an export for manual curation
  • Journal Profiling Part 2: Impact Metrics
    • Prerequisites
    • Measuring the Impact of Researchers within a Journal
      • Load the publications and authors data previously saved
      • Isolate the Researchers data (= authors with an ID)
      • Enrich the data with Impact Statistics
    • Couple of Dataviz
  • Journal Profiling Part 3: Funding
    • Prerequisites
      • Load previously saved researchers data
    • Adding another impact measure: funding
      • We’ll have to do it in two steps
    • Next: full data for step 1
    • Next: full data for step 2
    • Finally: let’s merge the new data into the ‘researchers-impact’ table
    • Couple of Dataviz
  • Journal Profiling Part 4: Institutions
    • Prerequisites
    • Institutions Contributing to a Journal
      • Load previously saved affiliations data
    • Basic stats about affiliations
    • Enriching the unique affiliations (GRIDs list) with pubs count and authors count
    • Couple of Dataviz
  • Journal Profiling Part 5: Competing Journals Analysis
    • Prerequisites
    • Competing Journals
      • First let’s reload the data obtained in previous steps
      • What the query looks like
    • Extracting all publications/journals information
    • Visualizations
  • Topic Modeling Analysis for a Set of Publications: Basic Workflow
    • Prerequisites
    • 1. Creating a Dataset: e.g. Publications from a Research Organization
    • 2. Concepts: exploratory analysis
    • 3. Building a word-cloud visualization
    • 4. Trend analysis - ‘emerging’ new topics each year
    • 5. Trend Analysis: topics growth
    • 6. Trend analysis: growth distribution based on selected years
    • 7. Conclusion
  • Building a concepts co-occurrence network
    • Prerequisites
    • Step 1: Creating a dataset
    • Step 2: Building a concepts co-occurrence network
    • Step 3: Visualizing the network
    • Conclusions
  • Rejected Article tracker
    • Prerequisites
    • 1. Get an example data set
    • 2. Define the search template
    • 3. Iteratively Query the Dimensions API for the retracted articles
    • 4. Join together the input and output data
    • 5. Add Matching Score
    • 6. Conclusion

Grants

  • Enriching Grants part 1: Matching your grants records to Dimensions
    • A sample grants list
    • Prerequisites
    • Loading the sample grants data
    • Matching grants data
      • A) Matching grants when we have a grant number
      • B) What if we don’t have a grant number?
    • Back to our grants list
      • Enriching the original list
  • Enriching Grants part 2: Adding Publications Information from Dimensions
    • Prerequisites
    • Reusing the sample grants data from part-1
    • Extracting linked Publications data from Dimensions
      • Building a looped extraction
      • Final step: grouping publications by grant
    • Data Exploration
      • Publications per grant by year and funding amount
      • Publications per grant by country
      • Correlation of number of publications to grant length
      • Publications by grant funder
      • Publications by grant funder vs funding amount
    • Conclusion
  • Enriching Grants part 3: adding related Patents and Clinical Trials data
    • Prerequisites
    • Reusing the enriched grants data from part-2
    • Extracting linked Patents data
    • Extracting linked Clinical Trials data
    • Data Exploration
      • How many linked objects overall?
      • Patents and Clinical Trials by Year
      • Patents and Clinical Trials by Grant Funder
      • Exploring Correlations between dimensions
    • Conclusion
  • Identifying emerging topics in grants using ‘concepts’
    • Prerequisites
    • 1. Creating a Dataset: e.g. Grants from a Funder
    • 2. A first look at the concepts we extracted
    • 3. Building a word-cloud visualization
    • 4. Trend analysis - ‘emerging’ new topics each year
    • 5. Segmenting results using the fields of research categories
    • 6. Conclusion
  • Getting all grants received by a list of researchers
    • Prerequisites
    • 1. Starting point: a list of researchers
    • 2. What Grants have been received by these researchers?
    • 3. Data Exploration
      • Grants by year
      • Grants by funders
      • Funders by funding amount

Datasets

  • The Datasets API: Features Overview
    • Prerequisites
    • 1. Sample Dataset Queries
      • Searching datasets by keyword
      • Returning associated grants and publication data
      • Searching using fielded search
      • Extracting related funding via grants
      • Aggregating results using facets
        • Top funders
        • Top research organizations
        • Top contributors
    • 2. A closer look at Datasets statistics
      • Counting records per field
      • Creating a bar chart
      • Counting the yearly distribution of field/records data
      • Creating a line chart
    • Where to find out more

Clinical Trials

  • Clinical Trials by Volume of Publications
    • Prerequisites
    • Query for Clinical Trials
    • Counting publications per clinical trial

Patents

  • The Patents API: Features Overview
    • Prerequisites
    • 1. Sample Patents Queries
      • Searching patents by keyword
      • Searching using fielded search
        • Searching using dates
      • Extracting cited publications via publication_ids
      • Extracting cited patents via reference_ids
      • Aggregating results using facets
        • Top assignees
        • Top researchers
        • Top FOR categories
    • 2. A closer look at Patents statistics
      • Counting records per field
      • Creating a bar chart
      • Counting the yearly distribution of field/records data
      • Creating a line chart
    • Where to find out more
  • Measuring the Innovation Impact of an Organization using Patents Citations
    • Prerequisites
    • 1. Choosing a GRID Research Organization
    • 2. Extracting Publications Data
      • Quick look at publications statistics
      • What are the main subject areas?
    • 3. Extracting Patents linked to Publications
      • Enriching publications with patents citations metrics
    • 4. Patents Data Analysis
      • How many patents per year?
      • Who is filing the patents?
      • What are the publications most frequently referenced in patents?
      • What are the main subject areas of referenced publications?
      • Is there a correlation between publication citations and patents citations?
    • Where to go from here
  • Patent publication references, for an entire patent family
    • Prerequisites
    • 1. Search for the patent ID and return the family ID.
    • 2. Use the family ID to search for all related patents and return the publications IDs they reference
    • 3. Enriching the publication IDs with additional metadata
    • 4. Combine the publication metadata with the patent citations information
      • 4.1 Optional: exporting the data to google sheets
    • Conclusions

Policy Documents

  • Measuring the policy impact of an organization using policy document citations
    • Prerequisites
    • 1. Choosing a research organization’s GRID identifier
    • 2. Extracting publications data
    • 3. Extracting policy documents linked to publications
      • Enriching publications with policy document citation metrics
    • 4. Policy documents data analysis
      • How many policy documents per year?
      • Who is publishing the policy documents?
      • What are the main subject areas of the publications most frequently referenced in policy documents?
      • Is there a correlation between publication citations and policy document citations?
    • Where to go from here

Researchers

  • Calculating the H-index of a researcher
    • Background
      • Prerequisites
      • Selecting a researcher
      • The H-Index function
        • Getting citations data from Dimensions
        • Wrapping things up
      • Where to find out more
  • Extracting researchers based on affiliations and publications history
    • Prerequisites
      • Select an organization ID
    • Background: understanding the data model
    • Methodology: two options available
    • Approach 1. From publications to researchers
    • Approach 2. From researchers to publications
    • Conclusions
  • Expert Identification with the Dimensions API - An Introduction
    • Prerequisites
    • At a glance
    • Step 1: Concept Extraction
      • What are concepts?
      • Extracting concepts with the DSL
    • Step 2: Expert Identification
      • Example 1. Basic query using concepts
      • Example 2. Query with OR connectors
      • Example 3. Query with where filters
      • Example 4. Adding Overlap Annotations (eg for conflict of interests checks)
      • Example 5. Query with MUST/NOT Operators
      • Example 6. MUST together with AND/OR
      • Example 7. Wildcard searches
    • Additional resources: shortcut functions included in Dimcli
      • extract_concepts
      • identify_experts
      • Build a reviewers matrix
  • Identification of reviewers for funders: Globally and among Panels
    • Introduction: Use Cases For Reviewer Identification
      • Kinds of identification
    • Prerequisites
    • 1. Loading and preprocessing text data
      • 1.1 Loading from File (placeholder code)
      • 1.2 Loading Sample Data
      • 1.3 Preprocess data and extract concepts
    • 2. The Two Use Cases - Recap
      • 2.1 Global identification use case
      • 2.2 Panel identification use case
    • 3. Global identification
      • 3.1 Advanced Cases: Working around the limitations of global search
        • 3.1.1 Searching based on organizational affiliation
        • 3.1.2 Searching based on previous funding history
    • 4. Panel Identification
    • Conclusions

Organizations

  • The Organizations API: Features Overview
    • Prerequisites
    • 1. Matching affiliation data to GRID IDs using extract_affiliations
    • 2. Searching GRID organizations
      • Full-text search
      • Fielded search
      • Returning facets
      • Returning organizations facets from publications
    • 3. A closer look at the organizations data statistics
      • Let’s visualize the data with plotly
    • Where to find out more
  • Identifying the Industry Collaborators of an Academic Institution
    • Prerequisites
    • 1. Selecting an academic institution
    • 2. Extracting publications from industry collaborations
    • 3. Analyses
      • 3.1 Count of Publications per year from Industry Collaborations
      • 3.2 Citations from Industry Collaboration
      • 3.3 Top Industry Collaborators
      • 3.4 Countries of Industry Collaborators
      • 3.5 Putting Countries and Collaborators together
  • Building an Organizations Collaboration Network Diagram
    • Prerequisites
    • 1. Choose an Organization and a keyword (topic)
    • 2. Building a one-degree network of collaborating institutions
    • 3. Building a network of any size
    • 4. Visualizing the network
    • 5. Addendum: showing only ‘Government’ collaborators
    • Conclusions
  • Collaboration Patterns By Year (International, Domestic, Internal)
    • Prerequisites
    • 1. Lookup the University that you are interested in
    • 2. Publications output by year
    • 3. International publications
    • 4. Domestic
    • 5. Internal
    • 6. Joining up All metrics together
    • 7. How does this compare to Australia?
    • 8. How does this compare to a different Institution (University of Toronto)?
    • Want to learn more?
  • Mapping GRID IDs to Organization Data
    • Prerequisites
    • 1. Importing Organization Data
      • 1.1 Manually Generate Organization Data
      • 1.2 Load Organization Data from Local Machine
    • 2. Utilizing Dimensions API to Extract GRID IDs
    • 3. Save the GRID ID Dataset we created
    • Conclusions
  • Using Dimensions organization groups with the API
    • Prerequisites
    • 1. Downloading groups data from Dimensions
    • 2. Using group data with the API
      • How many grants from the NSF?
  • Benchmarking organizations with the Dimensions API
    • Prerequisites
    • 1. Quick benchmarking using the API
    • 2. Calculating more complex ‘Quality’ Benchmarking indicators: Number of articles in the top X percent of their research category
      • Step 1. Retrieve the total volume of publications (focusing on Fields of Research)
        • Step 1.2. … Need to filter for level 2 codes
      • Step 2. Calculate 1% of the total number of records by category. This will be used to retrieve the 1% boundary record.
      • Step 3. Use the cutoff value to get the indicator value for the 1% boundary
        • We can only filter on integers in the DSL, so we will round up the values
      • Step 4. Now get the number of publications by organisation, filtered by category that have a field_citation_ratio > the boundary score
      • Step 5. Rank the results
        • We should probably control for Volume though…
      • Step 6. Get the total paper counts for each organisation
      • Step 7. Calculate the percentage of local papers in the top 1% of global publications (in 2018)
        • Now the results are going to look a little strange…
        • Need to control for size…
      • Final step. Show me the institutions that I should be most interested in (Five above)

Miscellaneous

  • Generating a report that monitors research topics across Dimensions sources
    • Prerequisites
    • 1. Creating a query builder
      • Query logic
    • 2. Plotting the results
    • 3. Conclusions
  • Enrich text with Field of Research (FoR) codes
    • A sample set of publications
    • Prerequisites
    • 1. Loading the sample text
    • 2. FoR Classification
    • 3. Number of FoR categories per document
    • 4. Top FoR categories by document count
    • Conclusions


