Works analysis and plot with concepts 2

Basic case: count concepts presence in the dataset

The dataset consists of the works of the concept “Sustainability” and it contains only the 10k most cited ones (default library setting).

For these 10k works, we will count the number of times each concept appears. The same can be done with the references cited by these works (replace ‘concept’ with ‘reference’).

[1]:

from openalex_analysis.plot import WorksPlot, config

# limit the number of entities (works in this example) per dataset to 10000
config.n_max_entities = 10000
# config.log_level = "INFO"

concept_sustainability = 'C66204764'

wplt = WorksPlot(concept_sustainability)
wplt.create_element_used_count_array('concept') # you can also count the 'reference'
wplt.add_statistics_to_element_count_array()


wplt.element_count_df.head(10)

[1]:

	C66204764 Sustainability	sum_all_entities	average_all_entities	proportion_used_by_main_entity	sum_all_entities_rank	proportion_used_by_main_entity_rank	h_used_all_l_use_main
element
https://openalex.org/C18903297	10000	10000	10000.0	1.0	0.999817	0.500091	0.500000
https://openalex.org/C66204764	10000	10000	10000.0	1.0	0.999817	0.500091	0.500000
https://openalex.org/C86803240	10000	10000	10000.0	1.0	0.999817	0.500091	0.500000
https://openalex.org/C144133560	6523	6523	6523.0	1.0	0.999452	0.500091	0.499817
https://openalex.org/C162324750	6241	6241	6241.0	1.0	0.999269	0.500091	0.499726
https://openalex.org/C127413603	4418	4418	4418.0	1.0	0.999086	0.500091	0.499634
https://openalex.org/C41008148	4341	4341	4341.0	1.0	0.998903	0.500091	0.499543
https://openalex.org/C17744445	4211	4211	4211.0	1.0	0.998720	0.500091	0.499451
https://openalex.org/C199539241	3691	3691	3691.0	1.0	0.998537	0.500091	0.499360
https://openalex.org/C39432304	2836	2836	2836.0	1.0	0.998355	0.500091	0.499269
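
To make the idea of the count concrete, here is a minimal sketch on toy data, independent of the library. It assumes that a concept is counted at most once per work, which is consistent with the main concept reaching exactly 10000 above but is not confirmed by the source. The concept IDs are made up for the illustration.

```python
from collections import Counter

# Toy dataset: each work is represented by the list of concept IDs attached to it.
# These IDs are hypothetical, not real OpenAlex identifiers.
works = [
    ["C1", "C2", "C3"],
    ["C1", "C3"],
    ["C1", "C2", "C2"],  # a concept repeated within one work
]

concept_counts = Counter()
for concepts in works:
    # Assumption: a concept counts at most once per work, hence the set()
    concept_counts.update(set(concepts))

# List the concepts from the most used to the least used
for concept, n_works in concept_counts.most_common():
    print(concept, n_works)
```

With 10k works, the concept that defines the dataset is necessarily present in every work, which is why the top rows of the table reach the dataset size.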

Advanced case: compare entities

Compute the statistics for institutions

In this example, we will count the number of times each concept is used in each institution's works, per year.

[2]:

from openalex_analysis.plot import WorksPlot

sustainability_concept_filter = {"concepts": {"id": concept_sustainability}}

count_years = list(range(2004, 2024))

institutions = {
    'I138595864': "Stockholm Resilience Centre",
    'I140494188': "University of Technology of Troyes",
    'I163151358': "Cyprus University of Technology",
    'I107257983': "Darmstadt University of Applied Sciences",
    'I201787326': "Riga Technical University",
    'I4210144925': "Technological University Dublin",
    'I31151848': "Technical University of Sofia",
    'I3123212020': "Universidad Politécnica de Cartagena",
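    'I158333966': "Universitatea Tehnică din Cluj-Napoca",
    # Disabled: this entry reuses the key 'I158333966' and would overwrite the line above.
    # Replace the key with the institution's own OpenAlex ID before re-enabling it.
    # 'I158333966': "Università degli studi di Cassino e del Lazio Meridionale",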
}

# one entry per institution, each restricted to the works of the concept
entities_ref_to_count = [
    {'entity_from_id': institution_id, 'extra_filters': sustainability_concept_filter}
    for institution_id in institutions
]


wplt = WorksPlot()
wplt.create_element_used_count_array('concept', entities_ref_to_count, count_years = count_years)

wplt.add_statistics_to_element_count_array(sort_by = 'sum_all_entities')

wplt.element_count_df

[2]:

		I138595864 Stockholm Resilience Centre	I140494188 Université de Technologie de Troyes	I163151358 Cyprus University of Technology	I107257983 Darmstadt University of Applied Sciences	I201787326 Riga Technical University	I4210144925 Technological University Dublin	I31151848 Technical University of Sofia	I3123212020 Universidad Politécnica de Cartagena	I158333966 Technical University of Cluj-Napoca	sum_all_entities	average_all_entities	proportion_used_by_main_entity	sum_all_entities_rank	proportion_used_by_main_entity_rank	h_used_all_l_use_main
element	year
https://openalex.org/C86803240	2004	0	0	0	0	0	0	0	0	0	<NA>	0.0	<NA>	0.422286	NaN	NaN
	2005	0	0	0	0	0	0	0	0	0	<NA>	0.0	<NA>	0.422286	NaN	NaN
	2006	0	1	0	0	0	0	0	1	0	2	0.222222	0.0	0.949926	0.748684	0.711195
	2007	1	0	0	0	0	0	1	0	0	2	0.222222	0.5	0.949926	0.390608	0.371048
	2008	1	0	0	0	0	0	0	0	0	1	0.111111	1.0	0.891609	0.148302	0.132227
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
https://openalex.org/C9682599	2019	0	0	0	0	0	0	0	0	0	<NA>	0.0	<NA>	0.422286	NaN	NaN
	2020	0	0	0	0	0	0	0	0	0	<NA>	0.0	<NA>	0.422286	NaN	NaN
	2021	0	0	0	0	0	0	0	0	0	<NA>	0.0	<NA>	0.422286	NaN	NaN
	2022	0	0	0	0	0	0	0	0	0	<NA>	0.0	<NA>	0.422286	NaN	NaN
	2023	0	0	0	0	1	0	0	0	0	1	0.111111	0.0	0.891609	0.748684	0.667534

40340 rows × 15 columns
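
The table above has one row per (element, year) pair, including the years in which a concept was not used at all, and one column per institution. The following sketch builds a table with the same layout from toy records using plain pandas. It is not the library's implementation: the record format and the entity and element names are assumptions made for the illustration, and the sketch keeps 0 where the library output shows <NA>.

```python
import pandas as pd

# Toy records: one row per use of an element by an entity in a given year.
# Entity and element names are hypothetical.
records = pd.DataFrame(
    {
        "entity": ["inst_A", "inst_A", "inst_B", "inst_B", "inst_B"],
        "element": ["C1", "C1", "C1", "C2", "C1"],
        "year": [2021, 2022, 2021, 2022, 2021],
    }
)
count_years = [2020, 2021, 2022]

# Count the uses per (element, year) row, with one column per entity
counts = (
    records.groupby(["element", "year", "entity"])
    .size()
    .unstack("entity", fill_value=0)
)

# Add the missing (element, year) pairs so every element has a row for every year
full_index = pd.MultiIndex.from_product(
    [sorted(records["element"].unique()), count_years],
    names=["element", "year"],
)
counts = counts.reindex(full_index, fill_value=0)

# Total usage across all the entities
counts["sum_all_entities"] = counts.sum(axis=1)
print(counts)
```

Filling every year for every element makes the table larger, but it keeps the time series of each element aligned on the same years.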

Plot the default figure

Here, the most used concept is plotted. The figure shows how many times each institution's works used the concept per year.

[3]:

wplt.get_figure_time_series_element_used_by_entities()

Plot the yearly sum of the usage of the concept “Planetary boundaries”

We can choose to plot a specific concept and/or to only display the sum for all the entities (here the institutions).

[4]:

wplt.get_figure_time_series_element_used_by_entities(element = 'https://openalex.org/C32334204', y_datas = ['sum_all_entities'])
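
The two arguments correspond to a row selection (one element) and a column selection (one series) in a dataframe indexed by (element, year). How the library performs this selection internally is not shown here; the sketch below only illustrates the equivalent pandas operation on a toy dataframe with hypothetical element and entity names.

```python
import pandas as pd

# Toy dataframe indexed by (element, year), mimicking the layout of element_count_df
index = pd.MultiIndex.from_product(
    [["C1", "C2"], [2021, 2022, 2023]], names=["element", "year"]
)
df = pd.DataFrame(
    {"inst_A": [1, 0, 2, 0, 1, 0], "inst_B": [0, 3, 1, 0, 0, 4]},
    index=index,
)
df["sum_all_entities"] = df[["inst_A", "inst_B"]].sum(axis=1)

# Keep the rows of one element, then one column: the result is indexed by year only
yearly_sum = df.xs("C1", level="element")["sum_all_entities"]
print(yearly_sum)
```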

Get the size of the dataframe in RAM

Check how much RAM the dataframe produced by this analysis uses.

[5]:

import humanize

humanize.naturalsize(wplt.element_count_df.memory_usage(deep=True).sum())

[5]:

'5.7 MB'
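
The `deep=True` argument matters when the dataframe holds Python objects such as strings. As a small sketch on toy data (the column names are made up), the shallow measure only counts the references stored in an object column, while the deep measure also counts the strings they point to:

```python
import pandas as pd

# The 'element' column is forced to object dtype so that it stores Python strings
ids = ["https://openalex.org/C%d" % i for i in range(1000)]
df = pd.DataFrame(
    {
        "element": pd.Series(ids, dtype=object),
        "count": range(1000),
    }
)

shallow = df.memory_usage().sum()        # object column counted as references only
deep = df.memory_usage(deep=True).sum()  # also counts the strings themselves

print(shallow, deep)
```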

Save the dataframe to a CSV file

We can save the dataframe with the statistics by uncommenting the following line:

[6]:

# wplt.element_count_df.to_csv("dataframe.csv")
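
When the saved dataframe has a two-level index such as (element, year), the index has to be rebuilt when the file is read back. The sketch below shows one way to do that with pandas on a toy dataframe. It writes to an in-memory buffer rather than to "dataframe.csv" so that it stays self-contained, and the element names are hypothetical.

```python
import io

import pandas as pd

# Toy dataframe indexed by (element, year)
index = pd.MultiIndex.from_product(
    [["C1", "C2"], [2022, 2023]], names=["element", "year"]
)
df = pd.DataFrame({"sum_all_entities": [1, 0, 2, 3]}, index=index)

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# index_col=[0, 1] rebuilds the two-level index from the first two columns
reloaded = pd.read_csv(buffer, index_col=[0, 1])
print(reloaded)
```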