Loading data - hands-on

This manual explains how to load data in Fabric using various notebooks for different loading methods. The following default loading notebooks are available:

  • DAG \ DAG_Gold
  • DAG \ DAG_Multiloader
  • Loaders \ Load_bronze
  • Loaders \ Load_silver
  • Loaders \ Copy_Data_Lakehouse

Some temporary notebooks are also available:

  • Loaders \ Load_history
  • Loaders \ Object_maintenance

Each notebook includes a description in its Markdown cells. This manual focuses on the workflow and how these notebooks can be used in daily operations.

Known limitations

Keep in mind that the default options for Fabric are relatively new and sometimes limited. For example, not all options normally available for a DAG (Directed Acyclic Graph) are supported yet. As a result, in some cases, we use a simpler approach, even though a full DAG can handle more advanced tasks. We use RunMultiple to run the DAG configurations.
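To make the RunMultiple remark concrete, here is a minimal sketch of the DAG dictionary that notebookutils.notebook.runMultiple accepts. The helper name build_dag, the object_file argument, the example file names, and the timeout and concurrency values are assumptions for illustration; they are not necessarily what the default notebooks build.

```python
def build_dag(notebook_path, object_files, timeout_per_cell=600, concurrency=4):
    """Build a DAG dictionary in the shape used by notebookutils.notebook.runMultiple."""
    activities = []
    for object_file in object_files:
        activities.append({
            "name": f"{notebook_path}_{object_file}",   # activity names must be unique
            "path": notebook_path,                      # notebook to run
            "timeoutPerCellInSeconds": timeout_per_cell,
            "args": {"object_file": object_file},       # hypothetical parameter name
            "dependencies": [],                         # no dependencies: tasks may run in parallel
        })
    return {
        "activities": activities,
        "timeoutInSeconds": 43200,    # assumed timeout for the whole DAG
        "concurrency": concurrency,   # assumed maximum number of parallel notebooks
    }


dag = build_dag("Load_bronze", ["Customer.yaml", "Orders.yaml"])
# Inside a Fabric notebook the DAG would then be started with:
# notebookutils.notebook.runMultiple(dag)
```

Because every activity has an empty dependencies list, this is the "simpler approach" mentioned above: a flat set of tasks rather than a graph with interdependent steps.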

When you open a notebook and change its contents, the changes are saved to the workspace automatically. If no new deployment happens in between, those changes affect the next scheduled load. It is therefore better to call the notebook from a personal notebook via a run statement. Examples are provided at the end of this manual.

DAG Multiloader

This is the most important notebook and can load the complete Fabric environment for a workspace (e.g., DEV, TST, PRD). In PRD, this notebook should be scheduled to load the workspace with fresh data.

The Multiloader notebook has parameters to configure which layers (Bronze, Silver, Gold) to load. You can also use the folder_path parameter to filter objects. The filter only applies to the Bronze and Silver layers, because Gold does not use the Objects files.

Bronze

For each load order, a DAG is created and run. The load starts with the lowest load order and continues until all load orders for Bronze are complete. The tasks for Bronze call the Load_bronze notebook. You can optionally set the skip_notebookprebronze parameter to skip the pre-processing steps.
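The load-order behaviour described above can be sketched as a grouping step: objects are batched per load order and the batches are processed from lowest to highest. The field names name and load_order and the example objects are hypothetical; the real Objects files may use a different structure.

```python
from collections import defaultdict


def group_by_load_order(objects):
    """Return (load_order, [object names]) batches, sorted by ascending load order."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj["load_order"]].append(obj["name"])
    return [(order, groups[order]) for order in sorted(groups)]


# Hypothetical object definitions
objects = [
    {"name": "Orders", "load_order": 2},
    {"name": "Customer", "load_order": 1},
    {"name": "Product", "load_order": 1},
]

batches = group_by_load_order(objects)
# batches == [(1, ["Customer", "Product"]), (2, ["Orders"])]
# One DAG would be built and run per batch, in this order.
```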

Silver

The Silver part works similarly to Bronze. Each task calls the Load_silver notebook. This step only runs if the Bronze step succeeds, or if Bronze is skipped.
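The rule for when Silver runs can be written as a small decision function. This is only a sketch of the logic as described; the function name and the way the Bronze result is passed in are assumptions, not the notebook's actual code.

```python
def should_run_silver(layers, bronze_succeeded):
    """Decide whether the Silver step runs.

    layers: list of requested layers, e.g. ["Bronze", "Silver"].
    bronze_succeeded: result of the Bronze step; ignored when Bronze is skipped.
    """
    if "Silver" not in layers:
        return False          # Silver was not requested
    if "Bronze" not in layers:
        return True           # Bronze is skipped, so Silver runs directly
    return bool(bronze_succeeded)
```

For example, should_run_silver(["Silver", "Gold"], None) returns True because Bronze is skipped, while should_run_silver(["Bronze", "Silver"], False) returns False.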

Gold

Gold starts the DAG_Gold notebook, providing the model.yaml configuration. If there are multiple models, you can clone the Gold box and configure it with the desired model file.

How to run a notebook without touching the original

Run Notebook as a user

Open your own personal notebook in the User folder and use the script below to call the DAG_Multiloader.

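import json

# Configuration
folder_path = "Files/Objects"             # folder or single object file to load (Bronze and Silver only)
skip_notebookprebronze = False            # True skips the pre-processing steps for Bronze
model_file = "Files/Model/DM/model.yaml"  # model configuration used by the Gold step
layers = ["Bronze", "Silver", "Gold"]     # layers to load

# Each value is passed to the notebook as a JSON-encoded string
params = {
    "folder_path": json.dumps(folder_path),
    "skip_notebookprebronze": json.dumps(skip_notebookprebronze),
    "model_file": json.dumps(model_file),
    "layers": json.dumps(layers),
}

# Arguments: notebook name, timeout in seconds, parameters
notebookutils.notebook.run("DAG_Multiloader", 83200, params)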

Common Scenarios

Use the table below to adjust the parameters for your specific needs:

| Scenario | folder_path | layers | skip_notebookprebronze |
|---|---|---|---|
| Full Load (Default) | "Files/Objects" | ["Bronze", "Silver", "Gold"] | False |
| Bronze Only | "Files/Objects" | ["Bronze"] | False |
| Specific Object | "Files/Objects/MyTable.yaml" | ["Bronze", "Silver"] | False |
| One Source Folder | "Files/Objects/MySource" | ["Bronze"] | True |
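If you switch between these scenarios often, a small helper in your personal notebook can build the parameter dictionary for you. The function build_params is a hypothetical convenience wrapper, not part of the delivered notebooks; its defaults mirror the Full Load scenario.

```python
import json


def build_params(folder_path="Files/Objects",
                 layers=("Bronze", "Silver", "Gold"),
                 skip_notebookprebronze=False,
                 model_file="Files/Model/DM/model.yaml"):
    """Return the JSON-encoded parameters for DAG_Multiloader."""
    return {
        "folder_path": json.dumps(folder_path),
        "skip_notebookprebronze": json.dumps(skip_notebookprebronze),
        "model_file": json.dumps(model_file),
        "layers": json.dumps(list(layers)),
    }


# "One Source Folder" scenario from the table above
params = build_params(
    folder_path="Files/Objects/MySource",
    layers=["Bronze"],
    skip_notebookprebronze=True,
)
# Inside a Fabric notebook:
# notebookutils.notebook.run("DAG_Multiloader", 83200, params)
```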