Read latest Silver snapshot
The get_silver_snapshot function retrieves the current state (snapshot) of a dataset from its Silver history table. It deduplicates the history records so that each primary key is represented by a single row: its most recent version.
Overview
In the EasyFabric architecture, Silver history tables (Silver.his.*) contain a full audit trail of changes. Each time a record is modified, a new row is appended with a SYSTEMSTATETIMESTAMP.
get_silver_snapshot automates the process of finding the "latest" version of every record, giving you a clean table of current data without having to write complex window functions manually.
How it Works
The function applies the following logic:
- Partitioning: Groups data by SYSTEMPRIMARYKEY.
- Ordering: Sorts records within each group by ABS(SYSTEMSTATETIMESTAMP) in descending order (highest absolute value first).
- Selection: Selects the first row for each group (the most recent change).
- Deletion Handling: If the latest record has a negative SYSTEMSTATETIMESTAMP, it indicates the record was deleted. Depending on the include_deleted_rows parameter, these records are either included or filtered out.
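The four steps above can be illustrated without Spark on a plain list of dictionaries. This is a minimal sketch of the described logic, not EasyFabric's actual implementation: the helper name latest_snapshot, the sample rows, and the use of integer timestamps are assumptions made for illustration.

```python
from typing import Any


def latest_snapshot(
    rows: list[dict[str, Any]], include_deleted_rows: bool = True
) -> list[dict[str, Any]]:
    """Hypothetical helper: keep one row per SYSTEMPRIMARYKEY."""
    latest: dict[Any, dict[str, Any]] = {}
    for row in rows:
        key = row["SYSTEMPRIMARYKEY"]  # partitioning
        current = latest.get(key)
        # Ordering + selection: keep the row with the highest absolute timestamp.
        if current is None or abs(row["SYSTEMSTATETIMESTAMP"]) > abs(
            current["SYSTEMSTATETIMESTAMP"]
        ):
            latest[key] = row
    snapshot = list(latest.values())
    if not include_deleted_rows:
        # Deletion handling: a negative timestamp marks a deleted record.
        snapshot = [r for r in snapshot if r["SYSTEMSTATETIMESTAMP"] >= 0]
    return snapshot


# Sample history (assumed data): key A was updated, key B was deleted.
history = [
    {"SYSTEMPRIMARYKEY": "A", "SYSTEMSTATETIMESTAMP": 100, "value": "first"},
    {"SYSTEMPRIMARYKEY": "A", "SYSTEMSTATETIMESTAMP": 200, "value": "second"},
    {"SYSTEMPRIMARYKEY": "B", "SYSTEMSTATETIMESTAMP": 150, "value": "created"},
    {"SYSTEMPRIMARYKEY": "B", "SYSTEMSTATETIMESTAMP": -300, "value": "deleted"},
]

all_rows = latest_snapshot(history)
active_rows = latest_snapshot(history, include_deleted_rows=False)
```

Here all_rows holds the latest row for both A and B, while active_rows holds only A, because the latest record for B carries a negative timestamp.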
Usage
As a Package Function
You can call the function directly by passing the full table name.
import easyfabric as ef
# Get the latest snapshot including deletions
df = ef.get_silver_snapshot("Silver.his.afs_medewerkerverzuimverloop")
# Get latest snapshot, filtering out deleted rows
df_active = ef.get_silver_snapshot("Silver.his.afs_medewerkerverzuimverloop", include_deleted_rows=False)
As a DataFrame Extension
The function is also monkey-patched onto the PySpark DataFrame class, allowing you to use it as a method if you already have the history loaded.
df_history = spark.table("Silver.his.my_table")
df_snapshot = df_history.get_silver_snapshot(include_deleted_rows=False)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| table_name | str | Required | The full name of the Silver history table (e.g., "Silver.his.MyTable"). |
| include_deleted_rows | bool | True | If False, records whose latest version in history has a negative SYSTEMSTATETIMESTAMP will be excluded from the resulting DataFrame. |
Important Note
This function requires the SYSTEMPRIMARYKEY and SYSTEMSTATETIMESTAMP columns to be present in the source table. It is intended for Silver history tables in the EasyFabric framework.
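If you want to verify the column requirement before calling the function, a small pre-check along these lines may help. This is a sketch only: missing_system_columns is a hypothetical helper, not part of EasyFabric, and the case-insensitive comparison assumes Spark's default, case-insensitive column resolution.

```python
from typing import Iterable

# Columns the snapshot logic relies on, as named in this document.
REQUIRED_COLUMNS = ("SYSTEMPRIMARYKEY", "SYSTEMSTATETIMESTAMP")


def missing_system_columns(columns: Iterable[str]) -> list[str]:
    """Hypothetical helper: return the required columns absent from `columns`."""
    # Compare case-insensitively (assumption: default Spark column resolution).
    present = {name.upper() for name in columns}
    return [name for name in REQUIRED_COLUMNS if name not in present]


complete = missing_system_columns(["SYSTEMPRIMARYKEY", "SYSTEMSTATETIMESTAMP", "naam"])
incomplete = missing_system_columns(["systemprimarykey", "naam"])
```

With a PySpark DataFrame, the column names can be passed as df_history.columns; an empty result means both system columns are present.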