Works analysis and plot with concepts 2
Basic case: count concept occurrences in the dataset
The dataset consists of the works tagged with the concept “Sustainability”, limited to the 10k most cited ones (default library setting).
For these 10k works, we count the number of times each concept appears. We could do the same with the references cited by these works (replace ‘concept’ with ‘reference’).
[1]:
from openalex_analysis.plot import WorksPlot, config
# limit to 10000 the number of entities (works in this example) per dataset
config.n_max_entities = 10000
# config.log_level = "INFO"
concept_sustainability = 'C66204764'
wplt = WorksPlot(concept_sustainability)
wplt.create_element_used_count_array('concept') # you can also count the 'reference'
wplt.add_statistics_to_element_count_array()
wplt.element_count_df.head(10)
[1]:
| element | C66204764 Sustainability | sum_all_entities | average_all_entities | proportion_used_by_main_entity | sum_all_entities_rank | proportion_used_by_main_entity_rank | h_used_all_l_use_main |
|---|---|---|---|---|---|---|---|
| https://openalex.org/C18903297 | 10000 | 10000 | 10000.0 | 1.0 | 0.999817 | 0.500091 | 0.500000 |
| https://openalex.org/C66204764 | 10000 | 10000 | 10000.0 | 1.0 | 0.999817 | 0.500091 | 0.500000 |
| https://openalex.org/C86803240 | 10000 | 10000 | 10000.0 | 1.0 | 0.999817 | 0.500091 | 0.500000 |
| https://openalex.org/C144133560 | 6523 | 6523 | 6523.0 | 1.0 | 0.999452 | 0.500091 | 0.499817 |
| https://openalex.org/C162324750 | 6241 | 6241 | 6241.0 | 1.0 | 0.999269 | 0.500091 | 0.499726 |
| https://openalex.org/C127413603 | 4418 | 4418 | 4418.0 | 1.0 | 0.999086 | 0.500091 | 0.499634 |
| https://openalex.org/C41008148 | 4341 | 4341 | 4341.0 | 1.0 | 0.998903 | 0.500091 | 0.499543 |
| https://openalex.org/C17744445 | 4211 | 4211 | 4211.0 | 1.0 | 0.998720 | 0.500091 | 0.499451 |
| https://openalex.org/C199539241 | 3691 | 3691 | 3691.0 | 1.0 | 0.998537 | 0.500091 | 0.499360 |
| https://openalex.org/C39432304 | 2836 | 2836 | 2836.0 | 1.0 | 0.998355 | 0.500091 | 0.499269 |
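The `*_rank` columns are percentile-style ranks between 0 and 1, where the most used concepts sit closest to 1.0. As a rough, hypothetical sketch of how such a rank can be derived with pandas (this is an illustration, not the library's actual implementation):

```python
import pandas as pd

# Toy concept counts, mimicking the sum_all_entities column above
counts = pd.Series(
    {"C18903297": 10000, "C144133560": 6523, "C127413603": 4418},
    name="sum_all_entities",
)

# Percentile rank: the most used concept gets the value closest to 1.0
rank = counts.rank(pct=True)
print(rank)
```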
Advanced case: compare entities
Compute the statistics for institutions
In this example, we will count the number of times each concept is used by each institution's works, per year.
[2]:
from openalex_analysis.plot import WorksPlot
sustainability_concept_filter = {"concepts": {"id": concept_sustainability}}
count_years = list(range(2004, 2024))
institutions = {
'I138595864': "Stockholm Resilience Centre",
'I140494188': "University of Technology of Troyes",
'I163151358': "Cyprus University of Technology",
'I107257983': "Darmstadt University of Applied Sciences",
'I201787326': "Riga Technical University",
'I4210144925': "Technological University Dublin",
'I31151848': "Technical University of Sofia",
'I3123212020': "Universidad Politécnica de Cartagena",
'I158333966': "Universitatea Tehnică din Cluj-Napoca",
}
entities_ref_to_count = [None] * len(institutions)
for i, institution_id in enumerate(institutions.keys()):
    entities_ref_to_count[i] = {'entity_from_id': institution_id,
                                'extra_filters': sustainability_concept_filter}
wplt = WorksPlot()
wplt.create_element_used_count_array('concept', entities_ref_to_count, count_years=count_years)
wplt.add_statistics_to_element_count_array(sort_by='sum_all_entities')
wplt.element_count_df
[2]:
| element | year | I138595864 Stockholm Resilience Centre | I140494188 Université de Technologie de Troyes | I163151358 Cyprus University of Technology | I107257983 Darmstadt University of Applied Sciences | I201787326 Riga Technical University | I4210144925 Technological University Dublin | I31151848 Technical University of Sofia | I3123212020 Universidad Politécnica de Cartagena | I158333966 Technical University of Cluj-Napoca | sum_all_entities | average_all_entities | proportion_used_by_main_entity | sum_all_entities_rank | proportion_used_by_main_entity_rank | h_used_all_l_use_main |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| https://openalex.org/C86803240 | 2004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | &lt;NA&gt; | 0.0 | &lt;NA&gt; | 0.422286 | NaN | NaN |
| | 2005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | &lt;NA&gt; | 0.0 | &lt;NA&gt; | 0.422286 | NaN | NaN |
| | 2006 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0.222222 | 0.0 | 0.949926 | 0.748684 | 0.711195 |
| | 2007 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0.222222 | 0.5 | 0.949926 | 0.390608 | 0.371048 |
| | 2008 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.111111 | 1.0 | 0.891609 | 0.148302 | 0.132227 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| https://openalex.org/C9682599 | 2019 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | &lt;NA&gt; | 0.0 | &lt;NA&gt; | 0.422286 | NaN | NaN |
| | 2020 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | &lt;NA&gt; | 0.0 | &lt;NA&gt; | 0.422286 | NaN | NaN |
| | 2021 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | &lt;NA&gt; | 0.0 | &lt;NA&gt; | 0.422286 | NaN | NaN |
| | 2022 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | &lt;NA&gt; | 0.0 | &lt;NA&gt; | 0.422286 | NaN | NaN |
| | 2023 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0.111111 | 0.0 | 0.891609 | 0.748684 | 0.667534 |
40340 rows × 15 columns
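The resulting `element_count_df` carries an (element, year) MultiIndex with one count column per institution. A minimal sketch of the same shape built by hand (toy data; the institution IDs are taken from the dict above, the counts are made up):

```python
import pandas as pd

# One concept observed over two years, for two institutions
index = pd.MultiIndex.from_product(
    [["https://openalex.org/C86803240"], [2006, 2007]],
    names=["element", "year"],
)
df = pd.DataFrame(
    {"I138595864": [0, 1], "I140494188": [1, 0]},
    index=index,
)

# Aggregate over the institution columns, like sum_all_entities
df["sum_all_entities"] = df.sum(axis=1)
print(df)
```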
Plot the default figure
Here, the most used concept will be plotted. We can see how many times each institution's works used the concept per year.
[3]:
wplt.get_figure_time_series_element_used_by_entities()
Plot the yearly sum of the usage of Planetary boundaries
We can choose to plot a specific concept and/or to display only the sum over all the entities (here, the institutions).
[4]:
wplt.get_figure_time_series_element_used_by_entities(element='https://openalex.org/C32334204', y_datas=['sum_all_entities'])
Get the size of the dataframe in RAM
Check how much RAM is used by this analysis.
[5]:
import humanize
humanize.naturalsize(wplt.element_count_df.memory_usage(deep=True).sum())
[5]:
'5.7 MB'
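`memory_usage(deep=True)` can also be inspected per column, which helps spot which columns dominate. A toy example, independent of `wplt`:

```python
import pandas as pd

df = pd.DataFrame({"count": [1, 2, 3], "name": ["a", "b", "c"]})

# deep=True also measures the Python strings stored in object columns
per_column = df.memory_usage(deep=True)
print(per_column)
print("total:", per_column.sum())
```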
Save the dataframe to a CSV file
We can save the dataframe with the statistics by uncommenting the following line:
[6]:
# wplt.element_count_df.to_csv("dataframe.csv")
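When the dataframe carries an (element, year) MultiIndex, reloading the CSV needs `index_col` to restore it. A round-trip sketch on a toy frame (the real `element_count_df` is assumed to have the same index names):

```python
import io
import pandas as pd

# Toy frame with the same (element, year) MultiIndex shape
df = pd.DataFrame(
    {"sum_all_entities": [2, 1]},
    index=pd.MultiIndex.from_tuples(
        [("https://openalex.org/C86803240", 2006),
         ("https://openalex.org/C86803240", 2007)],
        names=["element", "year"],
    ),
)

buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# Restore the (element, year) MultiIndex when reading back
df_reloaded = pd.read_csv(buf, index_col=["element", "year"])
```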