easyfabric.spark_extensions
get_first_row_per_group
def get_first_row_per_group(df: DataFrame,
                            partition_col: Union[str, list[str]],
                            order_col: Union[str, list[str]]) -> DataFrame
Returns a DataFrame with only the first row per group based on partition and order columns.
Arguments:
df (DataFrame) - The input DataFrame.
partition_col (Union[str, List[str]]) - Column(s) to partition by.
order_col (Union[str, List[str]]) - Column(s) to order by.
Returns:
DataFrame - A DataFrame containing the first row for each partition group.
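The selection rule described above can be modeled in plain Python, without Spark. This is only a sketch of the semantics, not the library's implementation: the helper name first_row_per_group and the sample data are hypothetical, and ascending order is an assumption, since the description does not state the sort direction.

```python
def first_row_per_group(rows, partition_col, order_col):
    """Plain-Python model: keep the row with the smallest order key per group."""
    # Accept a single column name or a list of names, like the documented signature.
    part_cols = [partition_col] if isinstance(partition_col, str) else list(partition_col)
    order_cols = [order_col] if isinstance(order_col, str) else list(order_col)

    best = {}  # group key -> (order key, row)
    for row in rows:
        group = tuple(row[c] for c in part_cols)
        key = tuple(row[c] for c in order_cols)
        # Assumption: ascending order, so the smallest order key wins.
        if group not in best or key < best[group][0]:
            best[group] = (key, row)
    return [row for _, row in best.values()]


# Hypothetical sample data: two orders for customer "a", one for customer "b".
orders = [
    {"customer": "a", "order_date": "2024-03-01", "amount": 10},
    {"customer": "a", "order_date": "2024-01-15", "amount": 25},
    {"customer": "b", "order_date": "2024-02-20", "amount": 40},
]
first_orders = first_row_per_group(orders, "customer", "order_date")
print(first_orders)
```

Under these assumptions, each customer keeps only the row with the earliest order_date. On a real Spark DataFrame the same result is typically obtained with a window function over the partition columns.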
get_silver_snapshot
def get_silver_snapshot(table_name_or_df: Union[str, DataFrame],
include_deleted_rows: bool = True) -> DataFrame
Returns the latest rows from a silver history table based on SYSTEMPRIMARYKEY and ABS(SYSTEMSTATETIMESTAMP).
Arguments:
table_name_or_df (Union[str, DataFrame]) - The full name of the table to load (e.g., "Silver.his.MyTable") or an already loaded history DataFrame.
include_deleted_rows (bool, optional) - If False, filters out rows where SYSTEMSTATETIMESTAMP < 0. Defaults to True.
Returns:
DataFrame - Deduplicated snapshot DataFrame.
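The snapshot rule can likewise be modeled in plain Python to make the role of ABS(SYSTEMSTATETIMESTAMP) concrete. This is a sketch of the semantics only, not the library's implementation: the helper name silver_snapshot and the sample rows are hypothetical. It assumes that a negative SYSTEMSTATETIMESTAMP marks a deleted row, and that the include_deleted_rows filter is applied after the latest row per key has been chosen; the description above does not state the ordering of those two steps.

```python
def silver_snapshot(history_rows, include_deleted_rows=True):
    """Plain-Python model: latest row per SYSTEMPRIMARYKEY by ABS(SYSTEMSTATETIMESTAMP)."""
    latest = {}  # primary key -> row with the largest absolute timestamp seen so far
    for row in history_rows:
        pk = row["SYSTEMPRIMARYKEY"]
        ts = abs(row["SYSTEMSTATETIMESTAMP"])
        if pk not in latest or ts > abs(latest[pk]["SYSTEMSTATETIMESTAMP"]):
            latest[pk] = row

    snapshot = list(latest.values())
    if not include_deleted_rows:
        # Assumption: the filter runs after deduplication, so a key whose
        # latest state is negative disappears from the snapshot entirely.
        snapshot = [r for r in snapshot if r["SYSTEMSTATETIMESTAMP"] >= 0]
    return snapshot


# Hypothetical history: key 1 was updated once, key 2 was later deleted.
history = [
    {"SYSTEMPRIMARYKEY": 1, "SYSTEMSTATETIMESTAMP": 100, "name": "old"},
    {"SYSTEMPRIMARYKEY": 1, "SYSTEMSTATETIMESTAMP": 200, "name": "new"},
    {"SYSTEMPRIMARYKEY": 2, "SYSTEMSTATETIMESTAMP": 150, "name": "kept"},
    {"SYSTEMPRIMARYKEY": 2, "SYSTEMSTATETIMESTAMP": -300, "name": "kept"},
]
with_deleted = silver_snapshot(history)
without_deleted = silver_snapshot(history, include_deleted_rows=False)
print(with_deleted)
print(without_deleted)
```

Taking the absolute value lets a negative timestamp compete on recency with the positive ones, so the row for key 2 with timestamp -300 supersedes the earlier row with timestamp 150.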