Skip to main content

Object config (table)

An object is the definition of a table that will be created inside fabric, based on the settings in the yaml file. Some settings can be supplied on the connection as well. In general, the lowest will be used. So if a setting is set on connection and object, the setting of the object will be used for that particular object.

Object

NameDescription
ConnectionName of the connection to use for this object
SourceTableName to be used to get the source. Depends on the connection what will be done.
DataplatformObjectnameName to be used in fabric for this object
BronzeFolderFolder in the Bronze lakehouse to save the incoming data
KeepHistoryKeep history for this object. Requires at least one primary key column (default=true)

Special object configuration options

NameDescription
HistorySettingsHistory settings to use when loading history table in Bronze and Silver.
SourceFilterOptional filter that can be used by the connection. Depends on the connection
SourceOrderOptional setting to order the files before loading into landing. If files contain a certain format like file2024-01.csv, file2024-02.csv you could order on that for example with a regex like: ^(\d4-\d2).csv$ It will start with 01 and than 02 and so on.
BronzeFolderFolder in the Bronze lakehouse to save the incoming data
BronzekeyViolationActionWhen a primary key check is violated, it will halt by default (raise). It's also possible to allow duplicates and keep one record by using keepone.
BronzeLoadViolationActionStop loading this object when a source file load fails or use 'continue' to keep going. (Default = stop)
BronzeLoadSkipSkip the bronze load completely
LayersFor which layers this object should be generated (Bronze, Silver)
DateFormatDateformat for this table, to convert incoming dates to datetime in the delta table.
BronzeMaxExpectedDurationIf loading bronze from source takes longer than expected, set a value in seconds of the max expected duration for the load. Default = 1800
SilverMaxExpectedDurationIf loading of silver takes longer than expected, set a value in seconds of the max expected duration for the load. Default = 1800
PreBronzeNotebookNotebook config setting to run a notebook before loading via the regular flow
PreSilverNotebookNotebook config setting to run a notebook before loading via the regular flow
MidBronzeNotebookNotebook config setting to run a notebook betweeen loading the landing table and the final historized table via the regular flow
MidSilverNotebookNotebook config setting to run a notebook between loading the historized silver table and the actual table via the regular flow
PostBronzeNotebookNotebook config setting to run a notebook after loading via the regular flow
PostSilverNotebookNotebook config setting to run a notebook after loading via the regular flow
BronzeNotebookNotebook config to override the default loading flow with a notebook for bronze
SilverNotebookNotebook config to override the default loading flow with a notebook for silver
BronzeLoadOrderThe order to use when loading this object for bronze
SilverLoadOrderThe order to use when loading this object for silver

Some settings can be set on the connection or on the object. See here those options

Column

NameDescription
SourceColumnName
DataplatformnameTo override the sourcecolumn name in the dataplatform. SourceColumn is still necessary. Is used to acquire the data from the source. Optional, when not supplied, sourcecolumn is used
SourceDataTypeSourcedatatype
SourceExpressionDefine an spark sql expression to fill this column, starting at the bronze layer
SilverExpressionDefine an spark sql expression to fill this column in the silver layer
SilverDataTypeType for the silver column.
DateFormatTo override the default date format or the dateformat set on the table. This should be the dateformat that is delivered by the source.
IsPrimaryKeyDefine the primary key column (default=false)
IsNullableMay this column contain nulls (not implemented as check on fabric yet)
IsSilverOnlyColumn only available in Silver (default=false)
IsZOrderColumn used in Z-Order for the table (default=SYSTEMSTATETIMESTAMP)
IsActiveColumn is used

Notebook configuration

NameDescription
NotebookName of the notebook to run
TimeoutThe time the notebook may run before cancel it
Param001Optional parameter value send to the notebook
Param002Optional parameter value send to the notebook
Param003Optional parameter value send to the notebook

Next to the optional parameters, these default parameters are send to the notebook:

ParameterValues
notebook_typePrebronze, Presilver, Postsilver, Presilver, Midbronze, Midsilver
sourcetableThe sourcetable value from the yaml file
sourcefilterThe sourcefilter value from the yaml file
bronzefolderThe bronzefolder value from the yaml file
keyvaultThe keyvault from the config file
object_yaml_filemeta_yamlfile path on the lakehouse

HistorySettings

NameDescription
SourceFilterQueryFilter the source before adding the rows to the landing table in bronze
BronzeDeleteFilterQueryFilter the data that should be checked for deletes.
BronzeSkipDeleteSkip the delete part for the Bronze layer completely
SilverFilterQueryFilter data that should go into Silver

Basic: Example of table with one primary key field and one column

  Table:
Connection: exa-example
SourceTable: example.csv
DataPlatformObjectname: example
KeepHistory: true
Columns:
- SourceColumn: ExampleID
SourceDataType: int
IsPrimaryKey: true
- SourceColumn: ExampleText
SourceDataType: varchar(250)

Basic: Example of table with one primary key field and one column with the source in the Fabric Files section of the Bronze layer

Imagine a third party that has dropped files into the Fabric Bronze layer in the files section in a certain folder. Those files need to be processed into the bronze and silver layer, based on the settings in the yaml file(s). In this case an parquet file is dropped into the bronze files section in the bronze folder exa/exampledata, the parquet file will always be overwritten and the name of the file is exampledata.parquet.

  Table:
Connection: exa-fabricfiles
SourceTable: exampledata.parquet
BronzeFolder: exa/exampledata
DataPlatformObjectname: example
KeepHistory: true
Columns:
- SourceColumn: ExampleID
SourceDataType: int
IsPrimaryKey: true
- SourceColumn: ExampleText
SourceDataType: varchar(250)

The connection file:

    ConnectionName: exa-fabricfiles
ConnectionPrefix: fbf
ConnectionType: fabricfiles
BronzeFolder: Example
FileType: parquet
FileExtension: parquet

When EasyFabric is run for the bronze layer for this object, it will first look for the settings in the object config. In this case it will find that the connection that should be used is exa-fabricfiles, so also those settings will be loaded. Since the BronzeFolder is defined in both, the BronzeFolder from the object is used, in this example "exa/exampledata". If all files for different objects are in the same folder, it is possible to define the folder just once, on the connection.

The object yaml configuration above, will result in 3 items. 2 tables in the bronze layer and 1 table in the silver layer. In bronze a table will be created for loading the source into a delta table. The second table in the bronze layer, is the history table. This table has an extra column called: SYSTEMSTATETIMESTAMP. Read more about SYSTEMSTATETIMESTAMP

  • SourceColumn: This is the name of the column in the source data. This defines how the data is stored and processed within the model. For a comprehensive list of data types, visit all data types.

Silver additional columns

In Bronze, all values are stored as STRING. When loading data from Bronze to Silver, the data will be converted to the data type defined in SourceDataType. This is the data type the source has specified for this attribute. The load mechanism will automatically add a conversion from STRING to the SourceDataType.

Table:
Connection: exa-fabricfiles
SourceTable: exampledata.parquet
BronzeFolder: exa/exampledata
DataPlatformObjectname: example
KeepHistory: true
Columns:
- SourceColumn: ExampleID
SourceDataType: int
IsPrimaryKey: true
- SourceColumn: ExampleText
SourceDataType: varchar(250)
- SourceColumn: ExampleNumber
SourceDataType: int

Basic: Example of an explicit conversion between Bronze and Silver on a column

In Bronze, all values are stored as STRING. When loading data from Bronze to Silver, the data will be converted to the data type defined in SourceDataType. In some cases it is not possible to convert a value with a simple cast. In that case, a SilverExpression can be supplied. In this example, a CASE expression is used, but any expression that is valid Spark SQL will work here. In this case, both the original value and the converted value will be present in the Silver table. An extra column is added with a postfix (default = _convert).

Table:
Connection: exa-fabricfiles
SourceTable: exampledata.parquet
BronzeFolder: exa/exampledata
DataPlatformObjectname: example
KeepHistory: true
Columns:
- SourceColumn: ExampleID
SourceDataType: int
IsPrimaryKey: true
- SourceColumn: ExampleText
SourceDataType: varchar(250)
- SourceColumn: ExampleNumber
SourceDataType: int
SilverExpression: CASE WHEN ExampleNumber = 'W20' THEN 20 ELSE CAST(ExampleNumber AS INT) END
TIP: Test expression first

First test the SilverExpression by running it in a Fabric notebook on the exact same table. This saves a lot of hassle when working with very specific expressions.

Enjoy...

SilverExpression should be used sparingly. Enjoy it… but in moderation.

Basic: Example of a column that is created only in Silver with IsSilverOnly

Sometimes it is difficult to connect a source column directly to an expression. Therefore, it is possible to add a column specifically for Silver. In this example, the source delivers a month and year value in separate columns. It is useful for further processing to have them combined as a date value. A date value on the first day of the month will be created. The name of the column, that's only for Silver and its data type can be set via SourceColumn, IsSilverOnly and SilverDataType. SourceDataType should not be set, because it will not exists in Bronze.

Table:
Connection: exa-fabricfiles
SourceTable: exampledata.parquet
BronzeFolder: exa/exampledata
DataPlatformObjectname: example
KeepHistory: true
Columns:
- SourceColumn: ExampleID
SourceDataType: int
IsPrimaryKey: true
- SourceColumn: Year
SourceDataType: int
- SourceColumn: Month
SourceDataType: int
- SourceColumn: Date
IsSilverOnly: true
SilverDataType: datetime
SilverExpression: date_trunc('month', to_date(CONCAT(`Year`, RIGHT(CONCAT('0', `Month`), 2)), 'yyyyMM'))

Show object configuration in Fabric Notebook

To print the object configuration to the screen in Fabric, you can use a Notebook with the following code.


from easyfabric import load_meta_data
from pprint import pprint

table_file = "Files/Objects/MyConnection/MyTable.yaml"
table_config =load_meta_data.get_object_by_file(table_file)
pprint(table_config)