Skip to main content

History Tables in Fabric Lakehouse

Overview

The History Tables functionality provides an insert-only approach to tracking historical changes in data within the Fabric Lakehouse bronze layer. This documentation explains the implementation, structure, and behavior of history tables.

Structure

Location

  • History tables are stored in the his schema within the bronze lakehouse
  • Each history table corresponds to a table in the bronze layer that has history tracking enabled

Table Structure

History tables maintain the same structure as their corresponding bronze tables, with one additional column:

  • SYSTEMSTATETIMESTAMP (bigint)
    • Format: YYYYMMDDHHMMSS (e.g., 20250104123025)
    • Negative values indicate deleted records
    • Used to track the exact moment of each change

How It Works

Timestamp Generation

  • New/Updated Records: Positive timestamp (e.g., 20250104123025)
  • Deleted Records: Negative timestamp (e.g., -20250104123025)

Process Flow

  1. Bronze table is loaded with new data
  2. History step runs if keephistory = true for the table
  3. System compares bronze data with history table using primary keys
  4. Current state is determined by finding the latest timestamp for each primary key
    • Latest timestamp is determined by ordering absolute values
    • Negative timestamps indicate deleted records

Update Scenarios

New Records

  • Record exists in Bronze but not in History
  • Added to History with current positive timestamp

Updated Records

  • Record exists in both Bronze and History
  • System compares all columns for differences
  • If differences found, new record added to History with current positive timestamp

Deleted Records

  • Record exists in History but not in Bronze
  • If latest timestamp is positive, new record added with negative timestamp
  • Indicates record was deleted at that timestamp

Reactivated Records

  • Record previously existed, was deleted, and is now being added again
  • Treated as a completely new record
  • Previous history (including deletion) remains intact
  • New positive timestamp marks the start of the new lifecycle

Example Timelines

Basic Update and Delete

Consider a record with ID=1:

SYSTEMSTATETIMESTAMP | ID | Name    | Status
20250101090000 | 1 | John | Active
20250102103000 | 1 | John D. | Active
-20250103140000 | 1 | John D. | Active

This timeline shows:

  1. Record created on Jan 1, 2025
  2. Name updated on Jan 2, 2025
  3. Record deleted on Jan 3, 2025

Delete and Reactivation

Consider a record with ID=1 that is deleted and later reactivated:

SYSTEMSTATETIMESTAMP | ID | Name    | Status  | Department
20250101090000 | 1 | John | Active | Sales
20250102103000 | 1 | John D. | Active | Sales
-20250103140000 | 1 | John D. | Active | Sales
20250401090000 | 1 | John D. | Active | Marketing

This timeline shows:

  1. Record created on Jan 1, 2025
  2. Name updated on Jan 2, 2025
  3. Record deleted on Jan 3, 2025
  4. Record reactivated on Apr 1, 2025 with a new department
    • Treated as a new record
    • Previous history maintained
    • New positive timestamp indicates start of new lifecycle

Benefits

  1. Complete Audit Trail

    • Every change is preserved
    • No data is ever overwritten
    • Deletion tracking maintained
    • Reactivation history clearly visible
  2. Simple Implementation

    • Single timestamp column approach
    • Insert-only operations
    • No complex temporal tables required
  3. Easy Querying

    • Current state can be determined by latest timestamp
    • Historical states accessible through timestamp filtering
    • Deletion status clear through timestamp sign

Best Practices

  1. Primary Keys

    • Ensure stable primary keys
    • Required for accurate history tracking
  2. Performance

    • Regular maintenance of history tables
    • Consider partitioning for large datasets
  3. Querying

    • Always order by absolute timestamp value for accurate current state
    • Use timestamp ranges for point-in-time analysis

Common Queries

Get Current State

SELECT t.*
FROM his.table t
INNER JOIN (
SELECT primary_key, MAX(ABS(SYSTEMSTATETIMESTAMP)) as max_ts
FROM his.table
GROUP BY primary_key
) latest ON t.primary_key = latest.primary_key
AND ABS(t.SYSTEMSTATETIMESTAMP) = latest.max_ts

Get State at Specific Time

SELECT t.*
FROM his.table t
INNER JOIN (
SELECT primary_key, MAX(ABS(SYSTEMSTATETIMESTAMP)) as max_ts
FROM his.table
WHERE SYSTEMSTATETIMESTAMP <= 20250101235959
GROUP BY primary_key
) point_in_time ON t.primary_key = point_in_time.primary_key
AND ABS(t.SYSTEMSTATETIMESTAMP) = point_in_time.max_ts

This page is generated by AI