easyfabric.fabric.spark_utils
PySpark utilities and helper functions for EasyFabric.
DeltaTable
DataFrame
SparkSession
col
to_date
to_timestamp
DateType
TimestampType
get_spark
def get_spark() -> SparkSession
__new__
def __new__(cls)
get_df_size
def get_df_size(df) -> float
Used to get the size of a dataframe in bytes or very large number if stats unavailable.
Arguments:
df:
Returns- sizeInBytes or very large number
current_timestamp
def current_timestamp()
merge_with_conversions
def merge_with_conversions(dt_source: DataFrame,
dest_delta: DeltaTable,
date_format="M/d/yyyy h:mm:ss a")
union_with_schema_alignment
def union_with_schema_alignment(df1: DataFrame, df2: DataFrame) -> DataFrame
Union two dataframes, aligning schema types and warning about incompatible columns. Only columns with matching types (after possible casting) will be included in the union.
Arguments:
df1- First dataframedf2- Second dataframe
Returns:
Unified dataframe with compatible columns