Works analysis and plot with concepts 1
Import the library
[1]:
# Import the full library
from openalex_analysis.plot import InstitutionsPlot, WorksPlot
# If you only need the analysis methods, you can import them without the plot ones with:
from openalex_analysis.analysis import InstitutionsAnalysis, WorksAnalysis
Basic case
Works of a concept
In this example, we will analyse the works tagged with the Sustainability concept and the references they cite.
Get the works
[2]:
concept_sustainability_id = 'C66204764'
wplt = WorksPlot(concept_sustainability_id)
The works array
[3]:
wplt.entities_df.head(3)
[3]:
 | id | doi | title | display_name | publication_year | publication_date | ids | language | primary_location | type | ... | referenced_works_count | referenced_works | related_works | cited_by_api_url | counts_by_year | updated_date | created_date | abstract | institution_assertions | is_authors_truncated |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | https://openalex.org/W2198847224 | https://doi.org/10.1016/s1352-0237(01)00307-0 | Human Development Report | Human Development Report | 2001 | 2001-05-01 | {'doi': 'https://doi.org/10.1016/s1352-0237(01... | en | {'is_accepted': False, 'is_oa': False, 'is_pub... | article | ... | 0 | [] | [https://openalex.org/W4392167019, https://ope... | https://api.openalex.org/works?filter=cites:W2... | [{'cited_by_count': 14, 'year': 2024}, {'cited... | 2024-09-06T14:44:04.425232 | 2016-06-24 | In 2013, UN-Habitat released the State of The ... | None | None |
1 | https://openalex.org/W1999167944 | https://doi.org/10.1126/science.1259855 | Planetary boundaries: Guiding human developmen... | Planetary boundaries: Guiding human developmen... | 2015 | 2015-02-13 | {'doi': 'https://doi.org/10.1126/science.12598... | en | {'is_accepted': True, 'is_oa': True, 'is_publi... | article | ... | 163 | [https://openalex.org/W1007704209, https://ope... | [https://openalex.org/W4235755527, https://ope... | https://api.openalex.org/works?filter=cites:W1... | [{'cited_by_count': 891, 'year': 2024}, {'cite... | 2024-09-06T06:00:48.170112 | 2016-06-24 | Crossing the boundaries in global sustainabili... | None | None |
2 | https://openalex.org/W2126975094 | None | Climate change 2007 : impacts, adaptation and ... | Climate change 2007 : impacts, adaptation and ... | 2007 | 2007-01-01 | {'doi': None, 'mag': '2126975094', 'openalex':... | en | {'is_accepted': False, 'is_oa': False, 'is_pub... | book | ... | 1 | [https://openalex.org/W1905429483] | [https://openalex.org/W617039848, https://open... | https://api.openalex.org/works?filter=cites:W2... | [{'cited_by_count': 44, 'year': 2024}, {'cited... | 2024-09-14T10:43:11.583624 | 2016-06-24 | Foreword Preface Introduction Summary for poli... | [] | None |
3 rows × 51 columns
Compute the most used references
[4]:
wplt.create_element_used_count_array('reference')
The reference count array
[5]:
wplt.element_count_df.head(3)
[5]:
element | C66204764 Sustainability |
---|---|
https://openalex.org/W4285719527 | 1495 |
https://openalex.org/W49479346 | 421 |
https://openalex.org/W2026816730 | 296 |
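To make the idea of a "most used references" count concrete, here is a minimal sketch in plain Python. It is not the library's implementation: `count_references` is a name introduced here for illustration, and the work and reference IDs are made-up placeholders. It simply tallies how many works cite each reference.

```python
from collections import Counter

def count_references(works):
    """Count how many works cite each reference (hypothetical helper)."""
    counts = Counter()
    for work in works:
        # set() guards against a reference listed twice in the same work
        counts.update(set(work["referenced_works"]))
    return counts

# Toy data with placeholder IDs, not real OpenAlex records
works = [
    {"id": "W_a", "referenced_works": ["W_x", "W_y"]},
    {"id": "W_b", "referenced_works": ["W_x"]},
    {"id": "W_c", "referenced_works": ["W_x", "W_z", "W_y"]},
]

reference_counts = count_references(works)
# Print the references from most used to least used
for reference, n in reference_counts.most_common():
    print(reference, n)
```

Sorting by count, as `most_common()` does here, gives the same kind of ranking as the array shown above, with the most used reference first.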
Advanced cases
Compare the works of a concept and of 2 institutions per year
Analysis
In this example, we will compare, year by year, the works of a concept (Sustainability) with the works of 2 institutions (SRC - Stockholm Resilience Centre and UTT - University of Technology of Troyes).
The analysis focuses on the concepts used by the works, but it also works with the references used, as in the previous example.
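A year-by-year comparison of this kind boils down to counting, for each entity, how many works use each concept in each year. The sketch below illustrates that idea in plain Python under stated assumptions: `count_concepts_per_year`, the entity names, and the concept IDs are hypothetical, and this is not how the library is implemented.

```python
from collections import defaultdict

def count_concepts_per_year(works, years):
    """Return {(concept, year): number of works}, keeping only the given years."""
    counts = defaultdict(int)
    for work in works:
        year = work["publication_year"]
        if year not in years:
            continue  # works outside the requested years are ignored
        for concept in set(work["concepts"]):
            counts[(concept, year)] += 1
    return dict(counts)

# range() excludes its end value, so this covers 2015 to 2023
count_years = list(range(2015, 2024))

# Toy data with placeholder names and IDs, not real OpenAlex records
entities = {
    "entity_1": [
        {"publication_year": 2015, "concepts": ["C_a", "C_b"]},
        {"publication_year": 2015, "concepts": ["C_a"]},
        {"publication_year": 2014, "concepts": ["C_a"]},  # outside count_years
    ],
    "entity_2": [
        {"publication_year": 2016, "concepts": ["C_b"]},
    ],
}

# One count table per entity, ready to be compared side by side
per_entity_counts = {
    name: count_concepts_per_year(works, count_years)
    for name, works in entities.items()
}
print(per_entity_counts)
```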
[6]:
concept_sustainability_id = 'C66204764'
institution_src_id = 'I138595864'
institution_utt_id = 'I140494188'
# count per year from 2015 to 2023
count_years = list(range(2015, 2024))
# The filter needs to have the format from the OpenAlex API
sustainability_concept_filter = {"concepts": {"id": concept_sustainability_id}}
# Create a list of dictionaries, each dictionary representing an institution
# The dictionary keys can be any parameter of the WorksAnalysis constructor
# In our example, we add an extra filter to get only the works about sustainability of each institution
entities_to_compare = [
{'entity_from_id': institution_src_id, 'extra_filters': sustainability_concept_filter,},
{'entity_from_id': institution_utt_id, 'extra_filters': sustainability_concept_filter,}
]
# We create an instance for the Sustainability concept. In the analysis, the main entity is
# the entity given to the constructor, if any; otherwise it is the first entity in the list
# given to the create_element_used_count_array() function
wplt = WorksPlot(concept_sustainability_id)
wplt.create_element_used_count_array('concept', entities_to_compare, count_years = count_years)
# We sort the rows of the statistics array by total usage across all entities. They can be
# sorted again later with sort_count_array()
wplt.add_statistics_to_element_count_array(sort_by = 'sum_all_entities')
#wplt.element_count_df.to_csv("array.csv")
wplt.element_count_df.head(20)
[6]:
element | year | C66204764 Sustainability | I138595864 Stockholm Resilience Centre | I140494188 Université de Technologie de Troyes | sum_all_entities | average_all_entities | proportion_used_by_main_entity | sum_all_entities_rank | proportion_used_by_main_entity_rank | h_used_all_l_use_main |
---|---|---|---|---|---|---|---|---|---|---|
https://openalex.org/C18903297 | 2015 | 441 | 21 | 3 | 465 | 155.0 | 0.948387 | 0.999085 | 0.848525 | 0.847748 |
 | 2016 | 495 | 31 | 0 | 526 | 175.333333 | 0.941065 | 0.999263 | 0.853373 | 0.852744 |
 | 2017 | 531 | 29 | 1 | 561 | 187.0 | 0.946524 | 0.999339 | 0.850048 | 0.849487 |
 | 2018 | 579 | 45 | 1 | 625 | 208.333333 | 0.9264 | 0.999479 | 0.861684 | 0.861235 |
 | 2019 | 661 | 60 | 2 | 723 | 241.0 | 0.914246 | 0.999670 | 0.869095 | 0.868808 |
 | 2020 | 791 | 57 | 2 | 850 | 283.333333 | 0.930588 | 0.999746 | 0.858706 | 0.858488 |
 | 2021 | 874 | 68 | 9 | 951 | 317.0 | 0.919033 | 0.999822 | 0.865494 | 0.865340 |
 | 2022 | 895 | 59 | 2 | 956 | 318.666667 | 0.936192 | 0.999898 | 0.856144 | 0.856056 |
 | 2023 | 954 | 58 | 7 | 1019 | 339.666667 | 0.936212 | 0.999975 | 0.855936 | 0.855914 |
https://openalex.org/C66204764 | 2015 | 441 | 21 | 3 | 465 | 155.0 | 0.948387 | 0.999085 | 0.848525 | 0.847748 |
 | 2016 | 495 | 31 | 0 | 526 | 175.333333 | 0.941065 | 0.999263 | 0.853373 | 0.852744 |
 | 2017 | 531 | 29 | 1 | 561 | 187.0 | 0.946524 | 0.999339 | 0.850048 | 0.849487 |
 | 2018 | 579 | 45 | 1 | 625 | 208.333333 | 0.9264 | 0.999479 | 0.861684 | 0.861235 |
 | 2019 | 661 | 60 | 2 | 723 | 241.0 | 0.914246 | 0.999670 | 0.869095 | 0.868808 |
 | 2020 | 791 | 57 | 2 | 850 | 283.333333 | 0.930588 | 0.999746 | 0.858706 | 0.858488 |
 | 2021 | 874 | 68 | 9 | 951 | 317.0 | 0.919033 | 0.999822 | 0.865494 | 0.865340 |
 | 2022 | 895 | 59 | 2 | 956 | 318.666667 | 0.936192 | 0.999898 | 0.856144 | 0.856056 |
 | 2023 | 954 | 58 | 7 | 1019 | 339.666667 | 0.936212 | 0.999975 | 0.855936 | 0.855914 |
https://openalex.org/C86803240 | 2015 | 441 | 21 | 3 | 465 | 155.0 | 0.948387 | 0.999085 | 0.848525 | 0.847748 |
 | 2016 | 495 | 31 | 0 | 526 | 175.333333 | 0.941065 | 0.999263 | 0.853373 | 0.852744 |
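The statistics columns can be cross-checked by hand. The sketch below reproduces the 2015 row for https://openalex.org/C18903297 from its three raw counts. The relationships used here are inferred from the displayed values, not taken from the library's source, so treat them as an assumption; how the two rank columns themselves are computed is not covered.

```python
# Raw counts copied from the 2015 row of https://openalex.org/C18903297
counts = {
    "C66204764 Sustainability": 441,
    "I138595864 Stockholm Resilience Centre": 21,
    "I140494188 Université de Technologie de Troyes": 3,
}
main_entity = "C66204764 Sustainability"

# Assumed relationships, inferred from the values displayed in the table
sum_all_entities = sum(counts.values())
average_all_entities = sum_all_entities / len(counts)
proportion_used_by_main_entity = counts[main_entity] / sum_all_entities

# Rank values copied from the same row (already rounded to 6 decimals)
sum_all_entities_rank = 0.999085
proportion_used_by_main_entity_rank = 0.848525
rank_product = sum_all_entities_rank * proportion_used_by_main_entity_rank

print(sum_all_entities)                          # 465
print(average_all_entities)                      # 155.0
print(round(proportion_used_by_main_entity, 6))  # 0.948387
print(rank_product)
```

The product of the two rank columns lands within rounding error of the displayed h_used_all_l_use_main value (0.847748), and the same holds for the 2016 row, which suggests that this column combines the two ranks multiplicatively.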
Plot
As we keep only the 10k most cited articles in each dataset, the selected articles for Sustainability cover only about 2% of the ~500k available. Because recent articles are usually less cited than older ones, fewer articles from recent years are included.
The default plot shows the usage of the first concept in the dataframe.
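The coverage figure follows directly from the two numbers quoted in the text. A quick check, treating the ~500k total as an approximation:

```python
kept_works = 10_000    # most cited works kept per dataset (from the text)
total_works = 500_000  # approximate number of Sustainability works (from the text)

share = kept_works / total_works
print(f"{share:.0%}")  # 2%
```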
[7]:
# To save the figure, append: .write_image("default_yearly_plot.pdf", width = 1000)
wplt.get_figure_time_series_element_used_by_entities()
We can also plot, for the concept "Social sustainability" ('https://openalex.org/C52407799'), the sum of usage by all entities together with the usage by SRC:
[8]:
# To save the figure, append: .write_image("sum_yearly_plot.pdf", width = 1000)
wplt.get_figure_time_series_element_used_by_entities(
    element = 'https://openalex.org/C52407799',
    y_datas = ['sum_all_entities', 'I138595864 Stockholm Resilience Centre']
)
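The cells in this notebook pass short IDs such as 'C66204764' to the constructor, while the `element` argument and the index of the count array use full URLs such as 'https://openalex.org/C52407799'. The two helpers below are a hypothetical convenience for switching between both forms; they are not part of openalex_analysis.

```python
OPENALEX_PREFIX = "https://openalex.org/"

def to_openalex_url(short_id_or_url):
    """Return the full OpenAlex URL for a short ID (hypothetical helper)."""
    if short_id_or_url.startswith(OPENALEX_PREFIX):
        return short_id_or_url  # already a full URL
    return OPENALEX_PREFIX + short_id_or_url

def to_short_id(short_id_or_url):
    """Return the short ID, i.e. the last path segment of the URL (hypothetical helper)."""
    return short_id_or_url.rsplit("/", 1)[-1]

print(to_openalex_url("C52407799"))
print(to_short_id("https://openalex.org/C52407799"))
```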