Skip to main content

easyfabric.loaders.parquet

logging

time

reduce

DeltaTable

DataFrame

coalesce

col

datediff

expr

lag

lead

lit

regexp_replace

when

year

DoubleType

FloatType

IntegerType

StringType

StructType

Window

ConfigManager

ObjectInfo

TableConfig

check_for_duplicate_keys_dataframe

get_spark

spark_operation_with_retries

load_parquet_bronze

def load_parquet_bronze(object_info: ObjectInfo, table_config: TableConfig,
config_manager: ConfigManager) -> DataFrame

Load Parquet file into Bronze layer

Arguments:

  • file_path - Path to the Parquet file
  • table_config - Table configuration object
  • conn_config - Connection configuration object
  • config_manager - Configuration manager object
  • destinationpath - Destination path for the bronze layer

load_parquet_history

def load_parquet_history(abfs_file: str, table_config: TableConfig,
config_manager: ConfigManager) -> None

Load Parquet file into History layer (used for transfering old data to the history layer directly)

Arguments:

  • file_path - Path to the Parquet file
  • table_config - Table configuration object
  • conn_config - Connection configuration object
  • config_manager - Configuration manager object
  • destinationpath - Destination path for the bronze layer

validate_parquet_schema

def validate_parquet_schema(parquet_file_path: str,
expected_schema: StructType) -> bool

Validate that the Parquet file matches the expected schema

Arguments:

  • parquet_file_path - Path to the Parquet file
  • expected_schema - Expected schema as StructType

Returns:

  • bool - True if schema matches, False otherwise