easyfabric.load_data_silver

run

def run(tablefile: str, config_manager: ConfigManager = None)

Runs the silver load for a single table: it loads, processes, and merges the table's data into the silver lakehouse layer, using the settings held by the ConfigManager.

The function sets up logging, checks that the layer and the table are both active, and runs the pre-silver and post-silver workflows when they are defined. The silver load itself runs either through the corresponding notebook or through direct data operations.

Arguments:

  • tablefile str - Path to the YAML file containing table configuration details.
  • config_manager ConfigManager - An instance of ConfigManager, pre-initialized with application configuration, connection, and lakehouse details.

Returns:

  • Optional[str] - An error message containing the name of the failed table and the exception details, or None if the operation finishes successfully.

Raises:

  • Exception - If ConfigManager is not initialized before invoking this function.
  • Exception - If the bronze lakehouse configuration is not found.
  • Exception - If the stop_at_error setting is enabled and an exception occurs during processing.
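
Because run reports a failure as a returned message (and None on success), a caller that processes several table files can gather the messages and inspect them at the end. The following is a minimal sketch of that pattern, not part of easyfabric: collect_failures, fake_run, and the file paths are hypothetical names, and fake_run only stands in for run so that the example executes without a lakehouse.

```python
from typing import Callable, Iterable, List, Optional


def collect_failures(
    tablefiles: Iterable[str],
    run_fn: Callable[[str], Optional[str]],
) -> List[str]:
    """Call run_fn for every table file and gather the error messages."""
    failures: List[str] = []
    for tablefile in tablefiles:
        message = run_fn(tablefile)
        if message is not None:  # None signals a successful load
            failures.append(message)
    return failures


def fake_run(tablefile: str) -> Optional[str]:
    """Hypothetical stand-in for run(); fails for one hard-coded file."""
    if tablefile == "tables/orders.yaml":
        return "orders: simulated failure"
    return None


failures = collect_failures(
    ["tables/customers.yaml", "tables/orders.yaml"], fake_run
)
print(failures)
```

With the real function, run_fn could be something like `lambda f: run(f, config_manager)`, assuming the signature documented above. Note that this only collects messages when stop_at_error is disabled; with it enabled, the first failure raises instead.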

dataframeloader

def dataframeloader(data_frame: DataFrame,
                    table_config: TableConfig,
                    load_config: LoadConfig,
                    config_manager: ConfigManager = None)

Loads a DataFrame into a silver-layer table in the lakehouse.

This function writes the data in the given DataFrame to the table described by table_config. It uses the supplied configuration objects to resolve connections, apply runtime settings, and log progress. It also validates the required configuration and raises an exception when a detail is missing or invalid.

Arguments:

  • data_frame DataFrame - The input PySpark DataFrame containing data to load.
  • table_config TableConfig - Configuration object specifying table details and related connection configurations.
  • load_config LoadConfig - Configuration object holding load-specific settings, including the layer and runtime options.
  • config_manager ConfigManager - Centralized configuration management object used to retrieve connection settings and maintain runtime parameters.

Returns:

  • str - A string describing the outcome of the load operation: either a success indication or an error message.

Raises:

  • Exception - If load_config is None, if ConfigManager is not properly initialized, or if no silver lakehouse configuration is found.
  • Exception - If the load operation fails and config_manager.stop_at_error is set to True.
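
The Returns and Raises entries above describe two failure modes: configuration problems always raise, while a failure during the load itself either raises or comes back as a message, depending on stop_at_error. The sketch below illustrates that contract under stated assumptions. It is not the easyfabric implementation: StubConfigManager, guarded_load, and the message wording are hypothetical, and only the stop_at_error flag is taken from the documentation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StubConfigManager:
    """Hypothetical stand-in exposing only the stop_at_error flag."""
    stop_at_error: bool = False


def guarded_load(
    load_step: Callable[[], None],
    table_name: str,
    config_manager: Optional[StubConfigManager],
) -> str:
    """Run load_step and report the outcome as a string."""
    # Configuration problems always raise, regardless of stop_at_error.
    if config_manager is None:
        raise Exception("ConfigManager is not initialized")
    try:
        load_step()
    except Exception as exc:
        if config_manager.stop_at_error:
            raise  # fail fast: propagate the original exception
        # Otherwise report the failure and let the caller continue.
        return f"Load failed for table {table_name}: {exc}"
    return f"Load succeeded for table {table_name}"


def failing_step() -> None:
    """Simulate a load that fails while writing."""
    raise ValueError("simulated write failure")


print(guarded_load(lambda: None, "customers", StubConfigManager()))
print(guarded_load(failing_step, "orders", StubConfigManager()))
```

Returning a message instead of raising lets a batch of tables keep loading after one failure, while stop_at_error=True suits runs where a single failure should halt everything.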