Skip to main content

easyfabric.fabric.spark_utils

PySpark utilities and helper functions for EasyFabric.

DeltaTable

DataFrame

SparkSession

col

to_date

to_timestamp

DateType

TimestampType

get_spark

def get_spark() -> SparkSession

__new__

def __new__(cls)

get_df_size

def get_df_size(df) -> float

Used to get the size of a dataframe in bytes or very large number if stats unavailable.

Arguments:

df:

  • Returns - sizeInBytes or very large number

current_timestamp

def current_timestamp()

merge_with_conversions

def merge_with_conversions(dt_source: DataFrame,
dest_delta: DeltaTable,
date_format="M/d/yyyy h:mm:ss a")

union_with_schema_alignment

def union_with_schema_alignment(df1: DataFrame, df2: DataFrame) -> DataFrame

Union two dataframes, aligning schema types and warning about incompatible columns. Only columns with matching types (after possible casting) will be included in the union.

Arguments:

  • df1 - First dataframe
  • df2 - Second dataframe

Returns:

Unified dataframe with compatible columns