History Tables in Fabric Lakehouse
Overview
The History Tables functionality provides an insert-only approach to tracking historical changes in data within the Fabric Lakehouse bronze layer. This documentation explains the implementation, structure, and behavior of history tables.
Structure
Location
- History tables are stored in the
his
schema within the bronze lakehouse - Each history table corresponds to a table in the bronze layer that has history tracking enabled
Table Structure
History tables maintain the same structure as their corresponding bronze tables, with one additional column:
SYSTEMSTATETIMESTAMP
(bigint)- Format: YYYYMMDDHHMMSS (e.g., 20250104123025)
- Negative values indicate deleted records
- Used to track the exact moment of each change
How It Works
Timestamp Generation
- New/Updated Records: Positive timestamp (e.g., 20250104123025)
- Deleted Records: Negative timestamp (e.g., -20250104123025)
Process Flow
- Bronze table is loaded with new data
- History step runs if
keephistory = true
for the table - System compares bronze data with history table using primary keys
- Current state is determined by finding the latest timestamp for each primary key
- Latest timestamp is determined by ordering absolute values
- Negative timestamps indicate deleted records
Update Scenarios
New Records
- Record exists in Bronze but not in History
- Added to History with current positive timestamp
Updated Records
- Record exists in both Bronze and History
- System compares all columns for differences
- If differences found, new record added to History with current positive timestamp
Deleted Records
- Record exists in History but not in Bronze
- If latest timestamp is positive, new record added with negative timestamp
- Indicates record was deleted at that timestamp
Reactivated Records
- Record previously existed, was deleted, and is now being added again
- Treated as a completely new record
- Previous history (including deletion) remains intact
- New positive timestamp marks the start of the new lifecycle
Example Timelines
Basic Update and Delete
Consider a record with ID=1:
SYSTEMSTATETIMESTAMP | ID | Name | Status
20250101090000 | 1 | John | Active
20250102103000 | 1 | John D. | Active
-20250103140000 | 1 | John D. | Active
This timeline shows:
- Record created on Jan 1, 2025
- Name updated on Jan 2, 2025
- Record deleted on Jan 3, 2025
Delete and Reactivation
Consider a record with ID=1 that is deleted and later reactivated:
SYSTEMSTATETIMESTAMP | ID | Name | Status | Department
20250101090000 | 1 | John | Active | Sales
20250102103000 | 1 | John D. | Active | Sales
-20250103140000 | 1 | John D. | Active | Sales
20250401090000 | 1 | John D. | Active | Marketing
This timeline shows:
- Record created on Jan 1, 2025
- Name updated on Jan 2, 2025
- Record deleted on Jan 3, 2025
- Record reactivated on Apr 1, 2025 with a new department
- Treated as a new record
- Previous history maintained
- New positive timestamp indicates start of new lifecycle
Benefits
-
Complete Audit Trail
- Every change is preserved
- No data is ever overwritten
- Deletion tracking maintained
- Reactivation history clearly visible
-
Simple Implementation
- Single timestamp column approach
- Insert-only operations
- No complex temporal tables required
-
Easy Querying
- Current state can be determined by latest timestamp
- Historical states accessible through timestamp filtering
- Deletion status clear through timestamp sign
Best Practices
-
Primary Keys
- Ensure stable primary keys
- Required for accurate history tracking
-
Performance
- Regular maintenance of history tables
- Consider partitioning for large datasets
-
Querying
- Always order by absolute timestamp value for accurate current state
- Use timestamp ranges for point-in-time analysis
Common Queries
Get Current State
SELECT t.*
FROM his.table t
INNER JOIN (
SELECT primary_key, MAX(ABS(SYSTEMSTATETIMESTAMP)) as max_ts
FROM his.table
GROUP BY primary_key
) latest ON t.primary_key = latest.primary_key
AND ABS(t.SYSTEMSTATETIMESTAMP) = latest.max_ts
Get State at Specific Time
SELECT t.*
FROM his.table t
INNER JOIN (
SELECT primary_key, MAX(ABS(SYSTEMSTATETIMESTAMP)) as max_ts
FROM his.table
WHERE SYSTEMSTATETIMESTAMP <= 20250101235959
GROUP BY primary_key
) point_in_time ON t.primary_key = point_in_time.primary_key
AND ABS(t.SYSTEMSTATETIMESTAMP) = point_in_time.max_ts
This page is generated by AI