easyfabric.maintenance

spark

optimize_by_folder

def optimize_by_folder(table_folder: str,
                       config_manager: ConfigManager = None,
                       except_folders: list[str] = None,
                       except_files: list[str] = None,
                       layers: list[str] = None,
                       max_workers: int = 10,
                       skip_missing_tables: bool = True) -> list[str]

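Optimizes the tables described by the YAML metadata in table_folder across the specified lakehouse layers (Bronze, Silver, Gold). Each table is checked for existence before it is optimized, and optimizations run concurrently, limited by max_workers.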

Arguments:

  • table_folder str - Path to the folder containing YAML table metadata.
  • config_manager ConfigManager - Config manager instance. Defaults to None.
  • except_folders list[str] - Folders to exclude. Defaults to None.
  • except_files list[str] - Files to exclude. Defaults to None.
  • layers list[str] - Layers to optimize, e.g. ["Bronze", "Silver", "Gold"]. Defaults to None.
  • max_workers int - Maximum number of concurrent optimizations. Defaults to 10.
  • skip_missing_tables bool - If True, skip tables that do not exist and log a warning. If False, raise an exception when a table is missing. Defaults to True.

Returns:

  • list[str] - List of successfully optimized table full names.

Raises:

  • ValueError - If layers or other parameters are invalid.
  • Exception - If a table is missing and skip_missing_tables=False.
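
The behavior described above (layer validation, existence checks, concurrent execution, and a list of successes as the result) can be illustrated with a minimal sketch. This is not easyfabric's implementation: optimize_tables, table_exists, optimize_one, and VALID_LAYERS are hypothetical names introduced for illustration, and the real function reads its table list from YAML metadata and runs against Spark rather than taking callables.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)

# Assumption: the valid layer names are the three shown in the docs.
VALID_LAYERS = {"Bronze", "Silver", "Gold"}


def optimize_tables(tables, table_exists, optimize_one,
                    layers=None, max_workers=10, skip_missing_tables=True):
    """Hypothetical stand-in for the documented behavior.

    tables: iterable of (layer, full_table_name) pairs.
    table_exists, optimize_one: caller-supplied callables (assumptions).
    """
    layers = list(layers) if layers is not None else sorted(VALID_LAYERS)
    invalid = [layer for layer in layers if layer not in VALID_LAYERS]
    if invalid or max_workers < 1:
        raise ValueError(f"Invalid layers {invalid} or max_workers={max_workers}")

    # Existence check happens before any work is submitted.
    targets = []
    for layer, name in tables:
        if layer not in layers:
            continue
        if not table_exists(name):
            if skip_missing_tables:
                logger.warning("Skipping missing table %s", name)
                continue
            # The docs only promise "Exception"; LookupError is one subclass.
            raise LookupError(f"Table not found: {name}")
        targets.append(name)

    # Run optimizations concurrently and collect the ones that succeed.
    optimized = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(optimize_one, name): name for name in targets}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                optimized.append(name)
            except Exception:
                logger.exception("Optimization failed for %s", name)
    return optimized
```

Because as_completed yields futures in completion order, the returned list is not guaranteed to follow the input order. With easyfabric installed, a call to the documented function would look like optimize_by_folder("path/to/tables", layers=["Silver", "Gold"], max_workers=4), where the folder path is a placeholder.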