loaders.notebook

load_notebook_prebronze

def load_notebook_prebronze(table_config: TableConfig,
config_manager: ConfigManager)

Executes a pre-processing notebook defined in the configuration before loading the bronze layer. The function checks for the existence of a notebook, handles parameters, and executes it with a specified timeout. If an error is encountered during execution, an exception is raised.

Arguments:

  • table_config TableConfig - Configuration object containing information about the source table, notebook path, filter criteria, parameters, and other properties related to the pre-processing step.
  • config_manager ConfigManager - Configuration manager object used to provide global configurations such as the KeyVault reference.

Returns:

  • dict - A dictionary containing the initial file type, notebook path, and the output of the executed notebook (if available). If no notebook is defined, it returns a default dictionary indicating that no notebook execution occurred.

Raises:

  • Exception - Raised when the notebook execution encounters an error and its exit value indicates failure.
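
The control flow described above (check for a notebook, run it with a timeout, raise on a failing exit value) can be sketched as follows. This is an illustrative sketch only, not the library's implementation: `SketchTableConfig`, its field names, the `run_notebook` callable, the keys of the returned dictionary, and the convention that a failing exit value starts with "ERROR" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class SketchTableConfig:
    """Hypothetical stand-in for TableConfig; the field names are assumptions."""
    prebronze_notebook: Optional[str] = None
    notebook_timeout: int = 600
    notebook_parameters: dict = field(default_factory=dict)


def run_prebronze_sketch(
    table_config: SketchTableConfig,
    run_notebook: Callable[[str, int, dict], Any],
) -> dict:
    """Mimic the documented flow: check, run with a timeout, raise on failure."""
    # Default dictionary returned when no notebook is configured (keys are assumed).
    result = {
        "filetype": "notebook",
        "notebook": table_config.prebronze_notebook,
        "output": None,
    }
    if not table_config.prebronze_notebook:
        return result

    exit_value = run_notebook(
        table_config.prebronze_notebook,
        table_config.notebook_timeout,
        table_config.notebook_parameters,
    )
    # Assumption: a failed run is signalled by an exit value starting with "ERROR".
    if isinstance(exit_value, str) and exit_value.upper().startswith("ERROR"):
        raise Exception(
            f"Notebook {table_config.prebronze_notebook} failed: {exit_value}"
        )
    result["output"] = exit_value
    return result
```

Passing the runner in as a callable keeps the sketch independent of any particular notebook platform; in a real pipeline the platform's own notebook-run utility would take its place.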

load_notebook_postbronze

def load_notebook_postbronze(table_config: TableConfig,
config_manager: ConfigManager)

Runs a post-bronze processing notebook if configured in the provided table configuration.

This function executes a notebook after the bronze data layer has been loaded. It checks the configuration for a post-bronze notebook, prepares the necessary parameters, and executes the notebook. If no notebook is defined, it logs that fact and returns default metadata. If the notebook execution fails, the error is logged and raised as an exception.

Arguments:

  • table_config TableConfig - The table configuration containing settings for the notebook execution including the notebook name, parameters, and timeout values.
  • config_manager ConfigManager - A configuration manager that provides access to additional configuration details, such as the key vault information.

Returns:

  • dict - A dictionary containing metadata about the notebook execution, including the output or a message indicating no notebook was defined.

Raises:

  • Exception - If the notebook execution finishes with an error status.

load_notebook_presilver

def load_notebook_presilver(table_config: TableConfig,
config_manager: ConfigManager)

Runs a pre-silver layer notebook based on the provided configuration and parameters. The function initializes and runs the specified notebook (if defined) before data is loaded into the silver layer, using properties from the given table_config and additional parameters from the config_manager. If no notebook is defined, a warning is issued; if the notebook execution returns an error status, an exception is raised.

Arguments:

  • table_config TableConfig - The configuration for the table, containing notebook-specific settings, silver table information, and parameters for execution.
  • config_manager ConfigManager - The configuration manager responsible for accessing additional configuration settings like key vault resources.

Returns:

  • dict - A dictionary containing the initial file type, notebook status information, and the output of the executed notebook (if available).

Raises:

  • Exception - If the notebook execution returns an error status.

load_notebook_postsilver

def load_notebook_postsilver(table_config: TableConfig,
config_manager: ConfigManager)

Executes a specified notebook after the silver layer data processing step. It retrieves notebook configuration and executes it if provided. Additionally, parameters for notebook execution are prepared based on the table configuration and configuration manager.

Arguments:

  • table_config TableConfig - Configuration containing details about the silver layer and the notebook to execute post-processing.
  • config_manager ConfigManager - Manager containing additional system-wide configurations such as key vault information.

Returns:

  • dict - A dictionary containing metadata about the notebook execution, including its filetype, name, and the output (if any).

Raises:

  • Exception - If the notebook execution fails and returns an error message.
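
The parameter preparation mentioned above, which combines table-level settings with system-wide settings such as the key vault, might look roughly like the sketch below. The function name, the `key_vault` parameter key, and the rule that a table-level value takes precedence over the global one are assumptions made for illustration; the library's actual merging logic is not documented here.

```python
from typing import Optional


def build_notebook_params_sketch(
    table_params: dict, key_vault_name: Optional[str]
) -> dict:
    """Merge table-level notebook parameters with a global key vault name."""
    # Copy so the caller's configuration dictionary is not mutated.
    params = dict(table_params)
    if key_vault_name is not None:
        # Assumption: a value already set at table level wins over the global one.
        params.setdefault("key_vault", key_vault_name)
    return params
```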

load_notebook_bronze

def load_notebook_bronze(object_info: ObjectInfo, table_config: TableConfig,
config_manager: ConfigManager) -> None

Executes a specific notebook at the Bronze data processing layer within a data pipeline. The function handles logging, validates the presence of the notebook configuration, and ensures the notebook is executed with the defined timeout period. If the notebook path is not configured, it logs a warning and terminates execution without raising an exception.

Arguments:

  • object_info ObjectInfo - Contains metadata related to the object being processed, including its file path.
  • table_config TableConfig - Holds the configuration details and parameters for table processing, including the notebook specific to the processing layer and timeout values.
  • config_manager ConfigManager - Provides configuration management, such as retrieving and validating settings for the pipeline.

Returns:

  • None - The function does not return a value but performs operations such as notebook execution and logging.
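
Unlike the other loaders, the bronze variant returns nothing and treats a missing notebook as a logged warning rather than an error. A minimal sketch of that behavior, assuming a hypothetical `run_notebook` callable, a hypothetical `file_path` parameter name, and Python's standard `logging` module in place of whatever logger the library actually uses:

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("notebook_sketch")


def run_bronze_sketch(
    file_path: str,
    notebook_path: Optional[str],
    timeout_seconds: int,
    run_notebook: Callable[[str, int, dict], object],
) -> None:
    """Run the bronze notebook if one is configured; otherwise warn and return."""
    if not notebook_path:
        # Missing configuration is not an error: log a warning and stop.
        logger.warning("No bronze notebook configured for %s; skipping.", file_path)
        return None
    # Assumption: the object's file path is handed to the notebook as a parameter.
    run_notebook(notebook_path, timeout_seconds, {"file_path": file_path})
    return None
```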

load_notebook_silver

def load_notebook_silver(table_config: TableConfig,
config_manager: ConfigManager)

Loads and executes a notebook for the silver processing layer based on the given table configuration and configuration manager. This function retrieves the notebook name and other relevant parameters from the table configuration and runs the notebook using a specified timeout. If the notebook is not defined, it logs a warning and returns an initial data structure without execution.

Arguments:

  • table_config TableConfig - An object containing the configuration for the table, including the notebook name and timeout settings for the silver processing layer.
  • config_manager ConfigManager - Configuration manager object for managing and retrieving additional global or contextual configurations.

Returns:

  • dict - A dictionary containing metadata about the executed or non-executed notebook, such as file type, file path, notebook details, and execution output.