easyfabric.fabric.fabric_utils

Imports: logging, datetime, timezone, notebookutils, and the PySpark SQL types BinaryType, BooleanType, ByteType, DataType, DateType, DecimalType, DoubleType, FloatType, IntegerType, LongType, ShortType, StringType, and TimestampType.

get_current_datetime

def get_current_datetime()

Gets the current date and time in the format 'YYYYMMDDHHMM'.

This function retrieves the current date and time from the system and formats it as a string in the 'YYYYMMDDHHMM' format.

Returns:

  • str - The current date and time formatted as 'YYYYMMDDHHMM'.
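A minimal sketch of what this function likely does, using the datetime and timezone imports listed above. Whether the actual implementation uses UTC or local time is not documented, so UTC is an assumption here:

```python
from datetime import datetime, timezone

def get_current_datetime() -> str:
    """Return the current date and time as a 'YYYYMMDDHHMM' string."""
    # %Y%m%d%H%M yields e.g. '202401311530' for 2024-01-31 15:30
    return datetime.now(timezone.utc).strftime("%Y%m%d%H%M")
```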

convert_to_abfss

def convert_to_abfss(relative_path: str, abfs_path: str) -> str

Convert a relative file path to its full Azure ABFSS path.

This function takes a relative path containing directory and file information and converts it into a complete ABFSS (Azure Blob File System, the secure `abfss://` scheme) path. The result is the given ABFSS prefix with the portion of the relative path after the fifth forward slash appended, preserving the structure required for Azure-backed file systems. The caller must ensure the input path contains at least five forward slashes followed by a "Files" directory.

Arguments:

  • relative_path - The relative path containing directory and file names.
  • abfs_path - The ABFSS prefix to prepend to the converted path.

Returns:

  • str - The full ABFSS path constructed from the given inputs.
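Based on the description above, a sketch of the conversion. The exact prefix handling is an assumption; the function appears to keep everything from the sixth path segment onward:

```python
def convert_to_abfss(relative_path: str, abfs_path: str) -> str:
    """Prepend the ABFSS prefix to the part of relative_path after its fifth slash."""
    parts = relative_path.split("/")
    # With a leading slash and no doubled slashes, slash i precedes parts[i],
    # so "at least five slashes" means at least six elements after splitting.
    if len(parts) <= 5:
        raise ValueError("relative_path must contain at least five forward slashes")
    suffix = "/".join(parts[5:])  # everything after the fifth '/'
    return f"{abfs_path.rstrip('/')}/{suffix}"
```

For example, with the hypothetical inputs `"/a/b/c/d/Files/data.csv"` and prefix `"abfss://ws@host/lh"`, the result is `"abfss://ws@host/lh/Files/data.csv"`.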

get_lakehouse_path

def get_lakehouse_path(lakehouse: str)

Gets the full path for a specified lakehouse in the workspace.

This function retrieves the lakehouse ID using the specified lakehouse name within the current workspace context, constructs the full path to the lakehouse storage, and returns it. If the lakehouse is not found, it raises a ValueError.

Arguments:

  • lakehouse (str) - The name of the lakehouse to retrieve the path for.

Returns:

  • str - The full storage path of the lakehouse.

Raises:

  • ValueError - If the specified lakehouse is not found.
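The lookup-and-raise pattern described above can be sketched as follows. The `_list_lakehouses` helper and the returned path shape are hypothetical stand-ins for illustration; in Fabric the actual lookup would go through `notebookutils`:

```python
def _list_lakehouses() -> dict:
    """Hypothetical stand-in for listing lakehouses in the current workspace."""
    return {"sales_lh": "abfss://workspace@onelake.dfs.fabric.microsoft.com/1111-2222"}

def get_lakehouse_path(lakehouse: str) -> str:
    """Resolve a lakehouse name to its storage path, raising if it is not found."""
    known = _list_lakehouses()
    if lakehouse not in known:
        raise ValueError(f"Lakehouse '{lakehouse}' not found in the current workspace")
    return known[lakehouse]
```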

get_lakehouse_onelake_path

def get_lakehouse_onelake_path(lakehouse: str)

Constructs the OneLake path for a given lakehouse within a specified workspace.

This function retrieves the workspace ID and the lakehouse ID, constructs the URL path to access the defined lakehouse, and returns it. It ensures the given lakehouse exists, otherwise raises an appropriate error.

Arguments:

  • lakehouse (str) - The name of the lakehouse for which the OneLake path is generated.

Returns:

  • str - The OneLake path for the specified lakehouse.

Raises:

  • ValueError - If the lakehouse is not found or its ID is empty.
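OneLake item URLs follow the pattern `abfss://<workspace-id>@onelake.dfs.fabric.microsoft.com/<lakehouse-id>`. A plausible shape of the URL construction, with the empty-ID check the docstring mentions (the helper name and its inputs are assumptions, since resolving the IDs requires the Fabric runtime):

```python
def build_onelake_path(workspace_id: str, lakehouse_id: str) -> str:
    """Construct a OneLake ABFSS URL from workspace and lakehouse IDs."""
    if not lakehouse_id:
        raise ValueError("Lakehouse ID is empty")
    return f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}"
```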

select_keys_from_list

def select_keys_from_list(data, keys)

Project a list of dicts onto only the given keys.

Arguments:

  • data - The list of dicts to project.
  • keys - The list of keys to keep.

Returns:

  • list - A list of dicts containing only the selected keys.
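The description translates directly into a dict comprehension. Whether missing keys are silently skipped or raise a KeyError is not documented, so skipping them is an assumption in this sketch:

```python
def select_keys_from_list(data, keys):
    """Return a copy of each dict in data containing only the requested keys."""
    # Keys absent from a given dict are skipped rather than raising (assumption)
    return [{k: d[k] for k in keys if k in d} for d in data]
```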

spark_type_to_sql

def spark_type_to_sql(t) -> str
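This function has no docstring; judging by its name and signature, it maps a Spark data type to a SQL type string. A sketch of such a mapping, operating on the type's simple-string name (the helper name, input form, and mapping table are all assumptions, since the actual implementation is not shown):

```python
def spark_type_to_sql_name(simple_name: str) -> str:
    """Map a Spark simpleString type name (e.g. 'int', 'decimal(10,2)') to a SQL type string."""
    mapping = {
        "string": "STRING", "int": "INT", "bigint": "BIGINT",
        "smallint": "SMALLINT", "tinyint": "TINYINT", "boolean": "BOOLEAN",
        "float": "FLOAT", "double": "DOUBLE", "date": "DATE",
        "timestamp": "TIMESTAMP", "binary": "BINARY",
    }
    if simple_name.startswith("decimal"):
        return simple_name.upper()  # keep precision/scale, e.g. DECIMAL(10,2)
    return mapping.get(simple_name, simple_name.upper())
```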

parse_datatype

def parse_datatype(datatype: str) -> DataType

Parses a given datatype string and maps it to the corresponding Spark SQL data type.

This function takes a datatype string as input and returns the appropriate Spark SQL data type instance based on a predefined mapping. If the datatype specifies a decimal type, the function handles precision and scale according to the specified values, ensuring they meet the constraints for Spark decimal types.

Arguments:

  • datatype (str) - The datatype string to be parsed. Common values include "string", "int", "integer", "boolean", "float", "decimal", "double", "long", "date", "timestamp", etc. When specifying "decimal", the format should be "decimal(precision,scale)".

Returns:

  • DataType - An instance of the appropriate Spark SQL DataType, such as StringType, IntegerType, BooleanType, DecimalType, TimestampType, or others.

Raises:

  • ValueError - If the decimal datatype specifies invalid precision or scale values, such as:
    • Precision greater than 38
    • Scale greater than 38
    • Precision less than 0
    • Scale less than 0
    • Precision less than scale

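The decimal validation rules listed above can be sketched as follows. The mapping to actual PySpark `DataType` instances is omitted so the example stays self-contained; the real function would return e.g. `DecimalType(precision, scale)` from the values this helper extracts:

```python
import re

def parse_decimal_spec(datatype: str) -> tuple:
    """Validate a 'decimal(precision,scale)' string per the constraints above."""
    m = re.fullmatch(r"decimal\((\d+)\s*,\s*(\d+)\)", datatype.strip())
    if not m:
        raise ValueError(f"Invalid decimal specification: {datatype!r}")
    precision, scale = int(m.group(1)), int(m.group(2))
    # Negative values cannot match \d+, so the 'less than 0' rules are
    # enforced by the pattern itself; the remaining checks are explicit.
    if precision > 38 or scale > 38 or precision < scale:
        raise ValueError(f"Invalid precision/scale: ({precision},{scale})")
    return precision, scale
```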
parse_datatype_skipdate

def parse_datatype_skipdate(datatype: str)

Parses a given datatype string and maps it to the corresponding Spark SQL data type, except that date handling is skipped: date values are kept as strings.

This function takes a datatype string as input and returns the appropriate Spark SQL data type instance based on a predefined mapping. If the datatype specifies a decimal type, the function handles precision and scale according to the specified values, ensuring they meet the constraints for Spark decimal types.

Arguments:

  • datatype (str) - The datatype string to be parsed. Common values include "string", "int", "integer", "boolean", "float", "decimal", "double", "long", "date", "timestamp", etc. When specifying "decimal", the format should be "decimal(precision,scale)".

Returns:

  • DataType - An instance of the appropriate Spark SQL DataType, such as StringType, IntegerType, BooleanType, DecimalType, or others; date values are returned as StringType.

Raises:

  • ValueError - If the decimal datatype specifies invalid precision or scale values, such as:
    • Precision greater than 38
    • Scale greater than 38
    • Precision less than 0
    • Scale less than 0
    • Precision less than scale