Works analysis and plot with concepts 2

Basic case: count concept presence in the dataset

The dataset consists of the works tagged with the concept “Sustainability”, limited to the 10k most cited ones (the default library setting).

For these 10k works, we will count the number of times each concept appears. We could do the same with the references cited by these works (replace 'concept' with 'reference').

[1]:
from openalex_analysis.plot import WorksPlot, config

# limit the number of entities (works in this example) per dataset to 10000
config.n_max_entities = 10000
# config.log_level = "INFO"

concept_sustainability = 'C66204764'

wplt = WorksPlot(concept_sustainability)
wplt.create_element_used_count_array('concept') # you can also count the 'reference'
wplt.add_statistics_to_element_count_array()


wplt.element_count_df.head(10)
[1]:
element | C66204764 Sustainability | sum_all_entities | average_all_entities | proportion_used_by_main_entity | sum_all_entities_rank | proportion_used_by_main_entity_rank | h_used_all_l_use_main
https://openalex.org/C18903297 | 10000 | 10000 | 10000.0 | 1.0 | 0.999817 | 0.500091 | 0.500000
https://openalex.org/C66204764 | 10000 | 10000 | 10000.0 | 1.0 | 0.999817 | 0.500091 | 0.500000
https://openalex.org/C86803240 | 10000 | 10000 | 10000.0 | 1.0 | 0.999817 | 0.500091 | 0.500000
https://openalex.org/C144133560 | 6523 | 6523 | 6523.0 | 1.0 | 0.999452 | 0.500091 | 0.499817
https://openalex.org/C162324750 | 6241 | 6241 | 6241.0 | 1.0 | 0.999269 | 0.500091 | 0.499726
https://openalex.org/C127413603 | 4418 | 4418 | 4418.0 | 1.0 | 0.999086 | 0.500091 | 0.499634
https://openalex.org/C41008148 | 4341 | 4341 | 4341.0 | 1.0 | 0.998903 | 0.500091 | 0.499543
https://openalex.org/C17744445 | 4211 | 4211 | 4211.0 | 1.0 | 0.998720 | 0.500091 | 0.499451
https://openalex.org/C199539241 | 3691 | 3691 | 3691.0 | 1.0 | 0.998537 | 0.500091 | 0.499360
https://openalex.org/C39432304 | 2836 | 2836 | 2836.0 | 1.0 | 0.998355 | 0.500091 | 0.499269
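
To make the counting explicit, here is a minimal sketch in plain Python of what counting concept presence means. The toy `works` list, its field names and the helper `count_concepts` are assumptions for illustration, not the library's internal data model; each concept is counted at most once per work.

```python
from collections import Counter

# Toy dataset (made up for illustration): each work lists the IDs of its concepts.
works = [
    {"id": "W1", "concepts": ["C66204764", "C18903297", "C144133560"]},
    {"id": "W2", "concepts": ["C66204764", "C18903297"]},
    {"id": "W3", "concepts": ["C66204764", "C144133560", "C144133560"]},
]


def count_concepts(works):
    """Count in how many works each concept appears."""
    counts = Counter()
    for work in works:
        # set() ensures a concept repeated within one work is counted once
        counts.update(set(work["concepts"]))
    return counts


concept_counts = count_concepts(works)
for concept_id, n in concept_counts.most_common():
    print(concept_id, n)
```

Every work in the dataset carries the concept used to build it, which is why C66204764 reaches the full count of 10000 in the output above.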

Advanced case: compare entities

Compute the statistics for institutions

In this example, we will count the number of times each concept is used in the institutions' works per year.

[2]:
from openalex_analysis.plot import WorksPlot

sustainability_concept_filter = {"concepts": {"id": concept_sustainability}}

count_years = list(range(2004, 2024))

institutions = {
    'I138595864': "Stockholm Resilience Centre",
    'I140494188': "University of Technology of Troyes",
    'I163151358': "Cyprus University of Technology",
    'I107257983': "Darmstadt University of Applied Sciences",
    'I201787326': "Riga Technical University",
    'I4210144925': "Technological University Dublin",
    'I31151848': "Technical University of Sofia",
    'I3123212020': "Universidad Politécnica de Cartagena",
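    'I158333966': "Universitatea Tehnică din Cluj-Napoca",
    # NOTE: this key repeats 'I158333966' from the line above, so Python keeps only
    # this last value and the dict holds nine institutions, not ten. Replace the key
    # with the OpenAlex ID of the University of Cassino to include it.
    'I158333966': "Università degli studi di Cassino e del Lazio Meridionale",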
}

# one entry per institution: its ID plus the filter restricting works to the concept
entities_ref_to_count = [
    {'entity_from_id': institution_id, 'extra_filters': sustainability_concept_filter}
    for institution_id in institutions
]


wplt = WorksPlot()
wplt.create_element_used_count_array('concept', entities_ref_to_count, count_years=count_years)

wplt.add_statistics_to_element_count_array(sort_by='sum_all_entities')

wplt.element_count_df
[2]:
element | year | I138595864 Stockholm Resilience Centre | I140494188 Université de Technologie de Troyes | I163151358 Cyprus University of Technology | I107257983 Darmstadt University of Applied Sciences | I201787326 Riga Technical University | I4210144925 Technological University Dublin | I31151848 Technical University of Sofia | I3123212020 Universidad Politécnica de Cartagena | I158333966 Technical University of Cluj-Napoca | sum_all_entities | average_all_entities | proportion_used_by_main_entity | sum_all_entities_rank | proportion_used_by_main_entity_rank | h_used_all_l_use_main
https://openalex.org/C86803240 | 2004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <NA> | 0.0 | <NA> | 0.422286 | NaN | NaN
https://openalex.org/C86803240 | 2005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <NA> | 0.0 | <NA> | 0.422286 | NaN | NaN
https://openalex.org/C86803240 | 2006 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0.222222 | 0.0 | 0.949926 | 0.748684 | 0.711195
https://openalex.org/C86803240 | 2007 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0.222222 | 0.5 | 0.949926 | 0.390608 | 0.371048
https://openalex.org/C86803240 | 2008 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.111111 | 1.0 | 0.891609 | 0.148302 | 0.132227
...
https://openalex.org/C9682599 | 2019 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <NA> | 0.0 | <NA> | 0.422286 | NaN | NaN
https://openalex.org/C9682599 | 2020 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <NA> | 0.0 | <NA> | 0.422286 | NaN | NaN
https://openalex.org/C9682599 | 2021 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <NA> | 0.0 | <NA> | 0.422286 | NaN | NaN
https://openalex.org/C9682599 | 2022 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <NA> | 0.0 | <NA> | 0.422286 | NaN | NaN
https://openalex.org/C9682599 | 2023 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0.111111 | 0.0 | 0.891609 | 0.748684 | 0.667534

40340 rows × 15 columns
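
The first three statistics columns can be checked by hand from the rows shown above. The sketch below recomputes them for the 2006 to 2008 rows of C86803240. It assumes, based on the displayed values rather than on the library's source code, that the main entity is the first column, that the average is the sum divided by the number of entities, and that the proportion is the main entity's count divided by the sum. The function name `row_statistics` is hypothetical.

```python
def row_statistics(counts):
    """Return (sum, average, proportion used by the first entity) for one row."""
    total = sum(counts)
    average = total / len(counts)
    # Avoid dividing by zero when no entity used the element that year
    proportion_main = counts[0] / total if total else None
    return total, average, proportion_main


# Per-institution counts copied from the output above (C86803240)
rows = {
    2006: [0, 1, 0, 0, 0, 0, 0, 1, 0],
    2007: [1, 0, 0, 0, 0, 0, 1, 0, 0],
    2008: [1, 0, 0, 0, 0, 0, 0, 0, 0],
}

stats = {year: row_statistics(counts) for year, counts in rows.items()}
for year, (total, average, proportion_main) in stats.items():
    print(year, total, round(average, 6), proportion_main)
```

The printed values agree with the `sum_all_entities`, `average_all_entities` and `proportion_used_by_main_entity` columns for those three rows. The rank columns and `h_used_all_l_use_main` are not reproduced here.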

Plot the default figure

Here, the most used concept will be plotted. We can see how many times each institution's works used the concept per year.

[3]:
wplt.get_figure_time_series_element_used_by_entities()

Plot the yearly sum of the usage of Planetary boundaries

We can choose to plot a specific concept and/or to only display the sum for all the entities (here the institutions).

[4]:
wplt.get_figure_time_series_element_used_by_entities(element='https://openalex.org/C32334204', y_datas=['sum_all_entities'])
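
Conceptually, this figure plots one value per year for the chosen element. Below is a minimal sketch of that selection in plain Python, using a made-up table keyed by `(element, year)`. The counts and the helper name `yearly_sum` are illustrative assumptions and do not come from the dataset.

```python
# Made-up per-entity counts keyed by (element, year); not real data
table = {
    ("https://openalex.org/C32334204", 2021): [2, 0, 1],
    ("https://openalex.org/C32334204", 2022): [3, 1, 0],
    ("https://openalex.org/C86803240", 2021): [5, 4, 2],
}


def yearly_sum(table, element):
    """Return {year: sum over all entities} for a single element."""
    return {
        year: sum(counts)
        for (elem, year), counts in sorted(table.items())
        if elem == element
    }


series = yearly_sum(table, "https://openalex.org/C32334204")
print(series)
```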

Get the size of the dataframe in RAM

Check how much RAM the dataframe produced by this analysis uses.

[5]:
import humanize

humanize.naturalsize(wplt.element_count_df.memory_usage(deep=True).sum())
[5]:
'5.7 MB'
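
If `humanize` is not installed, a small helper can produce a similar readable size. This is a sketch using decimal (1000-based) units; the function name `readable_size` is hypothetical and its formatting is not guaranteed to match `humanize.naturalsize` in every case.

```python
def readable_size(n_bytes):
    """Format a byte count with decimal (1000-based) units."""
    units = ["B", "kB", "MB", "GB", "TB"]
    size = float(n_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit available
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


print(readable_size(5_700_000))
```

It can be applied to the same value as above, for example `readable_size(wplt.element_count_df.memory_usage(deep=True).sum())`.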

Save the dataframe to a CSV file

We can save the dataframe with the statistics by uncommenting the following line:

[6]:
# wplt.element_count_df.to_csv("dataframe.csv")