
Loading data

Loading data happens at multiple places in Easy Fabric. In general, data moves from the source to bronze, then to silver, and ends up in gold.

Source

Loading data from a source can take many forms, so no single option handles every situation. Fabric supports all kinds of data sources and can also create shortcuts to sources, which helps reduce data movement. From a data warehouse perspective, we commonly want to track data as closely as possible; that is why a data warehouse holds historized data, meaning it can recall what the data looked like when it was loaded into the data warehouse. This is explained further under Bronze.

Easy Fabric supports the following sources:

  • Azure Blob storage
  • Custom notebook connector
  • Fabric Files section
  • Fabric Shortcuts

Easy Fabric supports the following source types:

  • CSV
  • XML
  • JSON
  • Parquet
  • Notebook*
  • With a notebook it is possible to handle custom cases. Read more about this in the advanced topic about custom extensions.
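A custom notebook connector could look something like the minimal pandas sketch below: any notebook that can produce a tabular result can act as a source. The `extract` name and its contract are illustrative assumptions, not Easy Fabric's actual interface.

```python
import pandas as pd

def extract() -> pd.DataFrame:
    """Hypothetical custom connector: a notebook that produces a tabular
    result, e.g. by calling an internal API or decoding a proprietary
    file format, can feed Easy Fabric like any other source."""
    # Stand-in for the custom extraction logic.
    return pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

frame = extract()
print(frame.shape)
# (2, 2)
```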

Bronze

Once data is retrieved from the source, it is processed into the bronze layer according to the object configuration. At the deployment stage, the bronze tables that will be loaded are created. If keep history is set on an object, there will be two tables: one for the rows delivered by the source and one for the history. In this step Easy Fabric has to do a lot of work to make sure the data is copied correctly into the tables. The following things are done:

  • Renaming the source columns to their destination names
  • Ordering the columns in the exact positions the bronze table requires
  • Filling columns that are not delivered with blanks (optional)
  • Keeping the data in its purest form possible; no checks are done at this point yet
  • Processing the delivered data into the history as well
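The renaming, blank-filling and reordering steps above can be sketched as follows; the column names and mapping are hypothetical, not Easy Fabric's actual configuration:

```python
import pandas as pd

# Hypothetical mapping from source column names to bronze column names.
COLUMN_MAP = {"cust_id": "CustomerId", "cust_name": "CustomerName"}
# The exact column order the (hypothetical) bronze table requires.
BRONZE_COLUMNS = ["CustomerId", "CustomerName", "Region"]

def prepare_for_bronze(df: pd.DataFrame) -> pd.DataFrame:
    """Rename source columns, fill undelivered columns with blanks,
    and put the columns in the bronze table's order."""
    df = df.rename(columns=COLUMN_MAP)
    for col in BRONZE_COLUMNS:
        if col not in df.columns:
            df[col] = None  # optional blank-fill for undelivered columns
    return df[BRONZE_COLUMNS]

source = pd.DataFrame({"cust_name": ["Ada"], "cust_id": [1]})
print(prepare_for_bronze(source).columns.tolist())
# ['CustomerId', 'CustomerName', 'Region']
```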

When processing the history, each delivered row is compared to the existing rows in the history table. Depending on whether the row is new, updated or deleted, a record is written to the table accordingly.
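A minimal sketch of that comparison, assuming each delivery maps a key to its attributes; the key and attribute shapes are illustrative, since Easy Fabric's comparison keys are configured per object:

```python
def classify_changes(current: dict, incoming: dict) -> dict:
    """Compare an incoming delivery (key -> attributes) with the current
    history state and report which keys are new, updated or deleted."""
    new = {k for k in incoming if k not in current}
    updated = {k for k in incoming if k in current and incoming[k] != current[k]}
    deleted = {k for k in current if k not in incoming}
    return {"new": new, "updated": updated, "deleted": deleted}

history = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
delivery = {1: {"name": "Ada"}, 3: {"name": "Cleo"}}
print(classify_changes(history, delivery))
# {'new': {3}, 'updated': set(), 'deleted': {2}}
```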

Now the data is available in Bronze. Let's move on to Silver.

Silver

The silver tables are filled with the most recent data from the history tables in Bronze. For convenience the SYSTEMTIMESTAMP is copied as well, as are the last known deleted records. The latter can easily be skipped by adding the filter SYSTEMTIMESTAMP > 0. When data is loaded into silver, it can be validated with some common checks as well as custom checks. In general these come down to checks like:

  • Mandatory field/NOT NULL
  • Uniqueness (for fields that are not the primary key)
  • Range (value should be between this and that)
  • RegEx (value should have the following format)
  • Not all of these checks are implemented yet!
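A rough sketch of how such checks could be applied to a row is shown below. The rule shapes and field names are illustrative assumptions, not Easy Fabric's actual configuration (uniqueness is omitted, since it needs the whole table rather than one row):

```python
import re

def validate(row: dict, rules: dict) -> list:
    """Apply mandatory, range and regex checks to one row and
    return the list of violations."""
    errors = []
    for field, rule in rules.items():
        value = row.get(field)
        if rule.get("mandatory") and value is None:
            errors.append(f"{field}: missing mandatory value")
            continue
        if value is None:
            continue  # optional field not delivered
        lo, hi = rule.get("range", (None, None))
        if lo is not None and not (lo <= value <= hi):
            errors.append(f"{field}: {value} outside [{lo}, {hi}]")
        pattern = rule.get("regex")
        if pattern and not re.fullmatch(pattern, str(value)):
            errors.append(f"{field}: '{value}' does not match {pattern}")
    return errors

rules = {"Age": {"mandatory": True, "range": (0, 130)},
         "Zip": {"regex": r"\d{4}[A-Z]{2}"}}
print(validate({"Age": 200, "Zip": "1234AB"}, rules))
# ['Age: 200 outside [0, 130]']
```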

Gold

The gold layer is based on the model supplied by the model.yaml file. It is possible to have multiple models in one gold layer*. The gold layer tables have to be filled with data that is modelled in a way that suits the business needs. That is why some transformation of the silver data is necessary before loading it into the gold layer.

  • At this point, only one model is supported.
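As an illustration of a silver-to-gold transformation, the pandas sketch below reshapes row-level silver data into a business-facing fact table. The table and column names are hypothetical; the actual shape is driven by the model.yaml file:

```python
import pandas as pd

# Hypothetical silver tables; names and columns are illustrative.
customers = pd.DataFrame({"CustomerId": [1, 2], "Region": ["EU", "US"]})
orders = pd.DataFrame({"CustomerId": [1, 1, 2], "Amount": [10.0, 5.0, 7.5]})

# A typical silver-to-gold step: join and aggregate row-level data into
# a fact table, here total revenue per customer and region.
fact_revenue = (
    orders.merge(customers, on="CustomerId")
          .groupby(["CustomerId", "Region"], as_index=False)["Amount"]
          .sum()
          .rename(columns={"Amount": "TotalRevenue"})
)
print(fact_revenue)
```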