Setup and settings

This section shows how to import and configure the library.

Import the library

The library is composed of three submodules:

- openalex_analysis.data: Manages the cache and the downloads from the OpenAlex API.
- openalex_analysis.analysis: Includes openalex_analysis.data and provides methods to run analyses on the data.
- openalex_analysis.plot: Includes openalex_analysis.analysis and provides methods to create plots.

For example, if you only need the library to manage the downloads from OpenAlex, you can import only openalex_analysis.data as you won’t need the other methods located in openalex_analysis.analysis or openalex_analysis.plot.

If you are unsure which submodule you need, import openalex_analysis.plot to have all the methods available.

[1]:
# If you want to work with works, you can import the library as follows:
from openalex_analysis.plot import WorksPlot

WorksPlot()
[1]:
<openalex_analysis.plot.entities_plot.WorksPlot at 0x7080b958be60>
[2]:
# If you want to work with works and only need the methods to manage the downloads from OpenAlex, you can import the library as follows:
from openalex_analysis.data import WorksData

WorksData()
[2]:
<openalex_analysis.data.entities_data.WorksData at 0x7080e4310a10>

Configure the library

The following example sets the email address, which lets you use the polite pool from OpenAlex.

[3]:
from openalex_analysis.plot import config, WorksPlot

config.email = "email@example.com"

WorksPlot()
[3]:
<openalex_analysis.plot.entities_plot.WorksPlot at 0x7080b8001490>

Default configuration

These are the default parameters. You can change them after importing the library, as in the example above with the email.

[4]:
# These functions are needed to build the default value of 'project_data_folder_path':
from os.path import join, expanduser

config.email = None
config.api_key = None
config.openalex_url = "https://api.openalex.org"
config.http_retry_times = 3
config.disable_tqdm_loading_bar = False
config.n_max_entities = 10000
config.project_data_folder_path = join(expanduser("~"), "openalex-analysis", "data")
config.parquet_compression = "brotli"
config.max_storage_percent = 95
config.max_storage_files = 10000
config.max_storage_size = 5e9
config.min_storage_files = 1000
config.min_storage_size = 5e8
config.cache_max_age = 365
config.log_level = 'WARNING'

Use a configuration file

To avoid having to set the configuration of the library each time you import it, you can use a configuration file.

When the library is imported, if a configuration file exists at ~/openalex-analysis/openalex-analysis-conf.toml, it is automatically loaded.
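
To check whether a file is present at that default location before relying on the automatic loading, a minimal sketch using only the standard library could look like the following. The helper name `default_config_path` is hypothetical and not part of openalex_analysis; only the path itself comes from the text above.

```python
from pathlib import Path


def default_config_path() -> Path:
    # Hypothetical helper (not part of openalex_analysis): builds the path
    # given above as the automatically loaded configuration file.
    return Path.home() / "openalex-analysis" / "openalex-analysis-conf.toml"


path = default_config_path()
# Report whether a regular file exists at the default location.
print(f"{path} exists: {path.is_file()}")
```

If the file is missing, the library presumably falls back to the default parameters listed above, unless you set them in code.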

Alternatively, if the configuration file is stored elsewhere, you can load it explicitly:

[5]:
from openalex_analysis.analysis import load_config_from_file

load_config_from_file("my-openalex-analysis-conf.toml")

Example of a configuration file:

[6]:
n_max_entities = 10000
log_level = 'WARNING'