easyfabric.maintenance
optimize_by_folder
def optimize_by_folder(table_folder: str,
                       config_manager: ConfigManager = None,
                       except_folders: list[str] = None,
                       except_files: list[str] = None,
                       layers: list[str] = None,
                       max_workers: int = 10,
                       skip_missing_tables: bool = True) -> list[str]
Optimizes the tables described by the YAML metadata in table_folder across the specified lakehouse layers (Bronze, Silver, Gold). It checks that each table exists before optimizing it and runs the optimizations concurrently.
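The existence checks and concurrent execution described above could look roughly like the following minimal sketch. It assumes the table names have already been resolved from the YAML metadata. Instead of whatever easyfabric calls internally, it takes two hypothetical callables, table_exists and optimize_table. The helper name optimize_concurrently is also illustrative. This is not the library's implementation.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

logger = logging.getLogger(__name__)


def optimize_concurrently(
    table_names: list[str],
    table_exists: Callable[[str], bool],    # hypothetical existence check
    optimize_table: Callable[[str], None],  # hypothetical per-table optimize call
    max_workers: int = 10,
    skip_missing_tables: bool = True,
) -> list[str]:
    """Sketch: check existence first, then optimize existing tables in parallel."""
    existing: list[str] = []
    for name in table_names:
        if table_exists(name):
            existing.append(name)
        elif skip_missing_tables:
            logger.warning("Table %s does not exist; skipping", name)
        else:
            # The docs only say "Exception"; RuntimeError is one such subclass.
            raise RuntimeError(f"Table {name} does not exist")

    optimized: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its table name so results can be attributed.
        futures = {pool.submit(optimize_table, name): name for name in existing}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()  # re-raises any exception from the worker thread
                optimized.append(name)
            except Exception:
                logger.exception("Optimizing %s failed", name)
    return optimized
```

In a PySpark 3.3+ session, table_exists could be spark.catalog.tableExists. For Delta tables, optimize_table could run spark.sql(f"OPTIMIZE {name}"). The documentation does not say whether easyfabric does either. Failed optimizations are logged and left out of the result, so only successfully optimized tables are returned. The order of the returned list follows completion order, not input order.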
Arguments:
- table_folder (str) - Path to the folder containing the YAML table metadata.
- config_manager (ConfigManager) - Config manager instance.
- except_folders (list[str]) - Folders to exclude.
- except_files (list[str]) - Files to exclude.
- layers (list[str]) - Layers to optimize, for example ["Bronze", "Silver", "Gold"].
- max_workers (int) - Maximum number of concurrent optimizations.
- skip_missing_tables (bool) - If True, skip tables that don't exist and log a warning. If False, raise an exception when a table is missing.
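The arguments above do not spell out how table_folder, except_folders and except_files interact. The following minimal sketch shows one possible discovery step. It assumes that metadata files end in .yml or .yaml, that except_folders matches directory names anywhere below table_folder, and that except_files matches bare file names. find_table_yaml_files is a hypothetical helper and is not part of easyfabric's documented API.

```python
from pathlib import Path
from typing import Optional


def find_table_yaml_files(
    table_folder: str,
    except_folders: Optional[list[str]] = None,
    except_files: Optional[list[str]] = None,
) -> list[Path]:
    """Sketch: collect YAML metadata files under table_folder, skipping exclusions."""
    excluded_dirs = set(except_folders or [])
    excluded_files = set(except_files or [])
    root = Path(table_folder)
    found: list[Path] = []
    for path in sorted(root.rglob("*")):
        # Assumption: only .yml/.yaml files hold table metadata.
        if path.suffix.lower() not in {".yml", ".yaml"} or not path.is_file():
            continue
        # Skip the file if any directory between root and the file is excluded.
        if excluded_dirs.intersection(path.relative_to(root).parent.parts):
            continue
        if path.name in excluded_files:
            continue
        found.append(path)
    return found
```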
Returns:
- list[str] - Full names of the tables that were optimized successfully.
Raises:
- ValueError - Invalid layers or parameters.
- Exception - A table is missing and skip_missing_tables=False.
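The ValueError entry could correspond to an up-front check like the following minimal sketch. It makes three assumptions, none of which the documentation confirms. First, the valid layer names are exactly the three in the layers example. Second, those names are case-sensitive. Third, layers=None means all layers. validate_layers is a hypothetical name.

```python
from typing import Optional

# Assumption: taken from the layers example, not from a documented constant.
VALID_LAYERS = ("Bronze", "Silver", "Gold")


def validate_layers(layers: Optional[list[str]], max_workers: int) -> list[str]:
    """Sketch: reject invalid parameters with ValueError before any work starts."""
    if max_workers < 1:
        raise ValueError(f"max_workers must be >= 1, got {max_workers}")
    if layers is None:
        # Assumption: None means "all layers"; the real default is undocumented.
        return list(VALID_LAYERS)
    if not layers:
        raise ValueError("layers must not be empty")
    for layer in layers:
        if layer not in VALID_LAYERS:
            raise ValueError(f"Invalid layer {layer!r}; expected one of {VALID_LAYERS}")
    return list(layers)
```

ThreadPoolExecutor already raises ValueError when max_workers is 0 or negative. An explicit check mainly gives a clearer message and fails before any table is touched.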