easyfabric.spark_extensions

get_first_row_per_group

def get_first_row_per_group(df: DataFrame,
                            partition_col: Union[str, list[str]],
                            order_col: Union[str, list[str]]) -> DataFrame

Returns a DataFrame containing only the first row of each group. Groups are defined by partition_col, and rows within each group are ordered by order_col.

Arguments:

  • df DataFrame - The input DataFrame.
  • partition_col Union[str, list[str]] - Column(s) to partition by.
  • order_col Union[str, list[str]] - Column(s) to order by.

Returns:

  • DataFrame - A DataFrame containing the first row for each partition group.
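To make the semantics concrete, here is a plain-Python model (no Spark required) of the behaviour described above, operating on a list of dicts instead of a DataFrame. It is a sketch under stated assumptions, not the easyfabric implementation: it assumes ascending order on order_col, and the names first_row_per_group and the sample rows are hypothetical. In Spark, this kind of deduplication is commonly written with a window function (partition by the group columns, order within each partition, keep the rows where row_number() equals 1), but whether easyfabric does exactly that is not confirmed here.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Union

Row = dict[str, Any]


def _as_list(cols: Union[str, list[str]]) -> list[str]:
    # Accept a single column name or a list of names, mirroring the documented signature.
    return [cols] if isinstance(cols, str) else list(cols)


def first_row_per_group(rows: list[Row],
                        partition_col: Union[str, list[str]],
                        order_col: Union[str, list[str]]) -> list[Row]:
    """Hypothetical model: keep the first row of each partition group."""
    part_key = itemgetter(*_as_list(partition_col))
    order_key = itemgetter(*_as_list(order_col))
    # Sort by partition key, then by order key (ascending order is an assumption).
    ordered = sorted(rows, key=lambda r: (part_key(r), order_key(r)))
    # groupby needs rows with equal keys to be contiguous, which the sort guarantees.
    return [next(group) for _, group in groupby(ordered, key=part_key)]


# Hypothetical sample data.
rows = [
    {"customer": "a", "ts": 3, "amount": 30},
    {"customer": "a", "ts": 1, "amount": 10},
    {"customer": "b", "ts": 2, "amount": 20},
]
print(first_row_per_group(rows, "customer", "ts"))
# [{'customer': 'a', 'ts': 1, 'amount': 10}, {'customer': 'b', 'ts': 2, 'amount': 20}]
```

Passing a single string or a list of column names gives the same grouping, matching the Union[str, list[str]] parameter types.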

get_silver_snapshot

def get_silver_snapshot(table_name_or_df: Union[str, DataFrame],
                        include_deleted_rows: bool = True) -> DataFrame

Returns the latest row for each SYSTEMPRIMARYKEY from a silver history table, where "latest" is determined by ABS(SYSTEMSTATETIMESTAMP).

Arguments:

  • table_name_or_df Union[str, DataFrame] - The full name of the table to load (e.g., "Silver.his.MyTable") or an already loaded history DataFrame.
  • include_deleted_rows bool, optional - If False, filters out deleted rows, i.e. rows where SYSTEMSTATETIMESTAMP < 0. Defaults to True.

Returns:

  • DataFrame - A deduplicated snapshot DataFrame with the latest row per SYSTEMPRIMARYKEY.
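The snapshot logic can be modelled in plain Python over a list of dicts, which helps show why the absolute value of SYSTEMSTATETIMESTAMP matters: a deletion is recorded with a negative timestamp, so comparing raw values would rank it below older versions. This is a hedged sketch, not the easyfabric implementation. It assumes that "latest" means the largest ABS(SYSTEMSTATETIMESTAMP), and that when include_deleted_rows is False the filter is applied after picking the latest row, so a key whose latest version is a deletion drops out entirely rather than falling back to an older version. The function name silver_snapshot and the sample history are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def silver_snapshot(history: list[Row], include_deleted_rows: bool = True) -> list[Row]:
    """Hypothetical model: latest row per SYSTEMPRIMARYKEY by ABS(SYSTEMSTATETIMESTAMP)."""
    latest: dict[Any, Row] = {}
    for row in history:
        key = row["SYSTEMPRIMARYKEY"]
        # Negative timestamps mark deletions, so compare on the absolute value.
        ts = abs(row["SYSTEMSTATETIMESTAMP"])
        current = latest.get(key)
        # Strictly greater: on a tie, the first row seen is kept.
        if current is None or ts > abs(current["SYSTEMSTATETIMESTAMP"]):
            latest[key] = row
    snapshot = list(latest.values())
    if not include_deleted_rows:
        # Assumption: filter the latest versions, so deleted keys disappear completely.
        snapshot = [r for r in snapshot if r["SYSTEMSTATETIMESTAMP"] >= 0]
    return snapshot


# Hypothetical history: key 2 was deleted at time 300.
history = [
    {"SYSTEMPRIMARYKEY": 1, "SYSTEMSTATETIMESTAMP": 100, "name": "old"},
    {"SYSTEMPRIMARYKEY": 1, "SYSTEMSTATETIMESTAMP": 200, "name": "new"},
    {"SYSTEMPRIMARYKEY": 2, "SYSTEMSTATETIMESTAMP": 150, "name": "x"},
    {"SYSTEMPRIMARYKEY": 2, "SYSTEMSTATETIMESTAMP": -300, "name": "x"},
]
print(silver_snapshot(history))
print(silver_snapshot(history, include_deleted_rows=False))
```

With the default include_deleted_rows=True, key 2 is returned with its deletion row (timestamp -300); with False, only key 1 remains.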