Works analysis and plot with concepts 1

Import the library

[1]:
# Import the full library
from openalex_analysis.plot import InstitutionsPlot, WorksPlot

# If you only need the analysis methods, you can import them without the plot ones with:
from openalex_analysis.analysis import InstitutionsAnalysis, WorksAnalysis

Basic case

Works of a concept

In this example, we will analyse the works of the concept Sustainability and the references they cite.

Get the works

[2]:
concept_sustainability_id = 'C66204764'

wplt = WorksPlot(concept_sustainability_id)

The works array

[3]:
wplt.entities_df.head(3)
[3]:
id doi title display_name publication_year publication_date ids language primary_location type ... referenced_works_count referenced_works related_works cited_by_api_url counts_by_year updated_date created_date abstract institution_assertions is_authors_truncated
0 https://openalex.org/W2198847224 https://doi.org/10.1016/s1352-0237(01)00307-0 Human Development Report Human Development Report 2001 2001-05-01 {'doi': 'https://doi.org/10.1016/s1352-0237(01... en {'is_accepted': False, 'is_oa': False, 'is_pub... article ... 0 [] [https://openalex.org/W4392167019, https://ope... https://api.openalex.org/works?filter=cites:W2... [{'cited_by_count': 14, 'year': 2024}, {'cited... 2024-09-06T14:44:04.425232 2016-06-24 In 2013, UN-Habitat released the State of The ... None None
1 https://openalex.org/W1999167944 https://doi.org/10.1126/science.1259855 Planetary boundaries: Guiding human developmen... Planetary boundaries: Guiding human developmen... 2015 2015-02-13 {'doi': 'https://doi.org/10.1126/science.12598... en {'is_accepted': True, 'is_oa': True, 'is_publi... article ... 163 [https://openalex.org/W1007704209, https://ope... [https://openalex.org/W4235755527, https://ope... https://api.openalex.org/works?filter=cites:W1... [{'cited_by_count': 891, 'year': 2024}, {'cite... 2024-09-06T06:00:48.170112 2016-06-24 Crossing the boundaries in global sustainabili... None None
2 https://openalex.org/W2126975094 None Climate change 2007 : impacts, adaptation and ... Climate change 2007 : impacts, adaptation and ... 2007 2007-01-01 {'doi': None, 'mag': '2126975094', 'openalex':... en {'is_accepted': False, 'is_oa': False, 'is_pub... book ... 1 [https://openalex.org/W1905429483] [https://openalex.org/W617039848, https://open... https://api.openalex.org/works?filter=cites:W2... [{'cited_by_count': 44, 'year': 2024}, {'cited... 2024-09-14T10:43:11.583624 2016-06-24 Foreword Preface Introduction Summary for poli... [] None

3 rows × 51 columns

Compute the most used references

[4]:
wplt.create_element_used_count_array('reference')

The reference count array

[5]:
wplt.element_count_df.head(3)
[5]:
C66204764 Sustainability
element
https://openalex.org/W4285719527 1495
https://openalex.org/W49479346 421
https://openalex.org/W2026816730 296
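Conceptually, this count is a tally of how often each work appears in the reference lists of the dataset. The sketch below reproduces that idea with the standard library on made-up work IDs. It is only an illustration, not the library's implementation, and `works` and `count_references` are hypothetical names.

```python
from collections import Counter

# Hypothetical miniature dataset: each work lists the works it references,
# like the referenced_works column of the works array (IDs are made up).
works = [
    {"id": "W1", "referenced_works": ["W10", "W11"]},
    {"id": "W2", "referenced_works": ["W10"]},
    {"id": "W3", "referenced_works": ["W10", "W12", "W11"]},
]


def count_references(works):
    """Count how many works use each reference, most used first."""
    counter = Counter()
    for work in works:
        # set() guards against a reference listed twice in the same work
        counter.update(set(work["referenced_works"]))
    return counter.most_common()


reference_counts = count_references(works)
print(reference_counts)
```

Each pair holds a reference ID and the number of works that use it, which is the same shape as the rows of the reference count array.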

Advanced cases

Compare the works of a concept and of 2 institutions per year

Analysis

In this example, we will compare the works of a concept (Sustainability) with the works of two institutions (SRC, the Stockholm Resilience Centre, and UTT, the University of Technology of Troyes) year by year.
The analysis focuses on the concepts used by the works, but it also works with the references used, as in the previous example.
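The cell that follows restricts each institution to its sustainability works with a nested filter dictionary. As a rough illustration of how such a dictionary corresponds to the OpenAlex `filter=` query syntax (dotted attribute paths, `attribute:value` terms, commas for a logical AND), the sketch below flattens it into a filter string. `flatten_filter` is a hypothetical helper written for this page, not part of openalex_analysis, and the library may build its queries differently.

```python
def flatten_filter(filters, prefix=""):
    """Flatten a nested filter dict into 'key.subkey:value' terms."""
    terms = []
    for key, value in filters.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested attributes, extending the dotted path
            terms.extend(flatten_filter(value, path))
        else:
            terms.append(f"{path}:{value}")
    return terms


# Same shape as the extra filter used in this example
sustainability_concept_filter = {"concepts": {"id": "C66204764"}}

# Several terms would be joined with commas (logical AND)
filter_string = ",".join(flatten_filter(sustainability_concept_filter))
print(filter_string)
```

The resulting string has the form expected after `filter=` in a request to the OpenAlex works endpoint.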
[6]:
concept_sustainability_id = 'C66204764'
institution_src_id = 'I138595864'
institution_utt_id = 'I140494188'

# count per year from 2015 to 2023
count_years = list(range(2015, 2024))

# The filter needs to have the format from the OpenAlex API
sustainability_concept_filter = {"concepts": {"id": concept_sustainability_id}}

# Create a list of dictionaries, each one describing an institution
# The dictionary keys can be any parameter of the WorksConceptsAnalysis constructor
# Here, we add an extra filter to keep only each institution's works about sustainability
entities_to_compare = [
    {'entity_from_id': institution_src_id, 'extra_filters': sustainability_concept_filter,},
    {'entity_from_id': institution_utt_id, 'extra_filters': sustainability_concept_filter,}
]

# We create an instance with the concept of sustainability. In the analysis, the main entity is
# the entity passed to the constructor if there is one, otherwise the first entity in the list
# passed to the create_element_used_count_array() function
wplt = WorksPlot(concept_sustainability_id)

wplt.create_element_used_count_array('concept', entities_to_compare, count_years = count_years)

# We add the statistics and sort the array by the most used (sum over all entities).
# The array can be sorted again later with sort_count_array()
wplt.add_statistics_to_element_count_array(sort_by = 'sum_all_entities')

# Uncomment to export the array to a CSV file:
# wplt.element_count_df.to_csv("array.csv")
wplt.element_count_df.head(20)
[6]:
C66204764 Sustainability I138595864 Stockholm Resilience Centre I140494188 Université de Technologie de Troyes sum_all_entities average_all_entities proportion_used_by_main_entity sum_all_entities_rank proportion_used_by_main_entity_rank h_used_all_l_use_main
element year
https://openalex.org/C18903297 2015 441 21 3 465 155.0 0.948387 0.999085 0.848525 0.847748
2016 495 31 0 526 175.333333 0.941065 0.999263 0.853373 0.852744
2017 531 29 1 561 187.0 0.946524 0.999339 0.850048 0.849487
2018 579 45 1 625 208.333333 0.9264 0.999479 0.861684 0.861235
2019 661 60 2 723 241.0 0.914246 0.999670 0.869095 0.868808
2020 791 57 2 850 283.333333 0.930588 0.999746 0.858706 0.858488
2021 874 68 9 951 317.0 0.919033 0.999822 0.865494 0.865340
2022 895 59 2 956 318.666667 0.936192 0.999898 0.856144 0.856056
2023 954 58 7 1019 339.666667 0.936212 0.999975 0.855936 0.855914
https://openalex.org/C66204764 2015 441 21 3 465 155.0 0.948387 0.999085 0.848525 0.847748
2016 495 31 0 526 175.333333 0.941065 0.999263 0.853373 0.852744
2017 531 29 1 561 187.0 0.946524 0.999339 0.850048 0.849487
2018 579 45 1 625 208.333333 0.9264 0.999479 0.861684 0.861235
2019 661 60 2 723 241.0 0.914246 0.999670 0.869095 0.868808
2020 791 57 2 850 283.333333 0.930588 0.999746 0.858706 0.858488
2021 874 68 9 951 317.0 0.919033 0.999822 0.865494 0.865340
2022 895 59 2 956 318.666667 0.936192 0.999898 0.856144 0.856056
2023 954 58 7 1019 339.666667 0.936212 0.999975 0.855936 0.855914
https://openalex.org/C86803240 2015 441 21 3 465 155.0 0.948387 0.999085 0.848525 0.847748
2016 495 31 0 526 175.333333 0.941065 0.999263 0.853373 0.852744

Plot

Because only the 10k most cited articles are kept in each dataset, the articles selected for sustainability represent only about 2% of all sustainability articles (~500k in total). As recent articles are usually less cited than older ones, the selection contains fewer articles for recent years.
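To make the effect of this cut-off concrete, the sketch below first checks the 2% figure, then keeps the most cited works of a small made-up dataset and counts what remains per year. The citation numbers and the value of `top_n` are invented for illustration; only the 10k and ~500k figures come from the text.

```python
from collections import Counter

# Figures quoted in the text: 10k works kept out of roughly 500k
kept, total = 10_000, 500_000
share_kept = kept / total
print(f"{share_kept:.0%}")

# Made-up (publication_year, cited_by_count) pairs: older works have had
# more time to accumulate citations.
works = [(2015, 900), (2016, 700), (2017, 650), (2019, 300),
         (2021, 120), (2022, 40), (2023, 10), (2023, 5)]

top_n = 4  # stands in for the 10k cut-off
most_cited = sorted(works, key=lambda w: w[1], reverse=True)[:top_n]

# Number of retained works per publication year
works_per_year = Counter(year for year, _ in most_cited)
print(sorted(works_per_year.items()))
```

In this toy dataset, none of the works published after 2019 survive the cut-off, which is the bias to keep in mind when reading the yearly plots.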

By default, the plot shows the usage of the first concept in the dataframe.

[7]:
# To save the figure, append .write_image("default_yearly_plot.pdf", width=1000)
wplt.get_figure_time_series_element_used_by_entities()

We can also plot the usage of the concept "Social sustainability" ('https://openalex.org/C52407799'), both summed over all entities and for SRC alone.

[8]:
# To save the figure, append .write_image("sum_yearly_plot.pdf", width=1000)
wplt.get_figure_time_series_element_used_by_entities(
    element='https://openalex.org/C52407799',
    y_datas=['sum_all_entities', 'I138595864 Stockholm Resilience Centre'],
)