#ingestion

rich-rocket-77152

06/13/2022, 10:25 AM
@millions-waiter-49836 Hello. I am currently using Glue profiling (thank you for your contribution). Today I added statistics to my Glue table with update-column-statistics-for-table (https://docs.aws.amazon.com/cli/latest/reference/glue/update-column-statistics-for-table.html). I confirmed the statistics with get-column-statistics-for-table (https://docs.aws.amazon.com/cli/latest/reference/glue/get-column-statistics-for-table.html) and then ingested the Glue table, but nothing shows up in the Stats tab.

I looked at your code. It reads statistics through get_table (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_table) and get_partitions (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_partition). When I called the get_table API via boto3, there was no Parameters field in the Table field, and the same was true for get_partitions.

Why didn't you use get_column_statistics_for_partition (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_column_statistics_for_partition) and get_column_statistics_for_table (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_column_statistics_for_table)? Is what I checked wrong? Please let me know if there is a right way. I'll attach a screenshot of the Stats tab from my DataHub and a screenshot of the get-column-statistics-for-table result.
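To check which of the two storage locations a table's statistics actually live in, a minimal diagnostic sketch like the one below can help. It assumes a boto3 Glue client is passed in; the helper name `stat_locations` is hypothetical. It reads the table-level `Parameters` and the per-column `Parameters` through `get_table`, and the separately stored column statistics through `get_column_statistics_for_table`. Note that table `Parameters` can also hold non-statistics metadata, so a non-empty map does not by itself mean statistics are present.

```python
from typing import Any, Dict, List


def stat_locations(glue_client: Any, database: str, table: str,
                   columns: List[str]) -> Dict[str, Dict[str, Any]]:
    """Collect what a Glue table stores in each place statistics can live.

    glue_client is assumed to be a boto3 Glue client (boto3.client("glue"));
    any object with the same two methods works.
    """
    # GetTable: the Parameters maps on the table and on each column.
    table_desc = glue_client.get_table(DatabaseName=database, Name=table)["Table"]
    sd_columns = table_desc.get("StorageDescriptor", {}).get("Columns", [])

    # GetColumnStatisticsForTable: statistics written by
    # UpdateColumnStatisticsForTable, stored apart from the table definition.
    stats_resp = glue_client.get_column_statistics_for_table(
        DatabaseName=database, TableName=table, ColumnNames=columns
    )

    return {
        "table_parameters": table_desc.get("Parameters", {}),
        "column_parameters": {
            col["Name"]: col.get("Parameters", {})
            for col in sd_columns
            if col["Name"] in columns
        },
        "column_statistics": {
            stat["ColumnName"]: stat["StatisticsData"]
            for stat in stats_resp.get("ColumnStatisticsList", [])
        },
    }
```

With a real client this would be called as `stat_locations(boto3.client("glue"), "my_db", "my_table", ["id"])`, where the database, table, and column names are placeholders. In the situation described above, one would expect `column_statistics` to be populated while the `Parameters` maps carry no statistics.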

millions-waiter-49836

08/18/2022, 2:28 PM
Hey! Thanks for raising this question. Storing statistics in table parameters is better than storing them as column statistics in a few ways:
• Table parameters support an unlimited number of statistic types.
• Table parameters support table-level statistics, such as row count.
• AWS provides an official boilerplate Spark application that calculates the statistics and uploads them as table parameters.
Kindly check this AWS documentation for a thorough example: https://aws.amazon.com/blogs/big-data/build-an-automatic-data-profiling-and-reporting-solution-with-amazon-emr-aws-glue-and-amazon-quicksight/
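For tables not profiled by that Spark application, a minimal sketch of writing statistics into parameters with boto3 might look like this. Because `update_table` replaces the whole table definition, the sketch copies the current definition from `get_table`, keeps only fields that `TableInput` accepts, and merges the statistics in as string values (Glue parameters are a string-to-string map). The helper names and the key names in the usage line (`row_count`, `null_count`) are hypothetical: they must match whatever key names your DataHub Glue profiling configuration expects. Storing column-level values in each column's `Parameters` is likewise an assumption here; the AWS blog post above shows the reference layout.

```python
import copy
from typing import Any, Dict, Mapping

# Fields of a GetTable "Table" that UpdateTable's TableInput also accepts.
# Read-only fields such as DatabaseName, CreateTime, or CatalogId are dropped.
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
)


def build_table_input(table: Mapping[str, Any],
                      table_stats: Mapping[str, Any],
                      column_stats: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Return a TableInput with statistics merged into the Parameters maps."""
    # Deep-copy so the caller's table description is never mutated.
    table_input = {k: copy.deepcopy(table[k]) for k in TABLE_INPUT_KEYS if k in table}

    # Table-level statistics, e.g. row count; values must be strings.
    params = dict(table_input.get("Parameters", {}))
    params.update({k: str(v) for k, v in table_stats.items()})
    table_input["Parameters"] = params

    # Column-level statistics go into each matching column's Parameters.
    for column in table_input.get("StorageDescriptor", {}).get("Columns", []):
        extra = column_stats.get(column["Name"])
        if extra:
            merged = dict(column.get("Parameters", {}))
            merged.update({k: str(v) for k, v in extra.items()})
            column["Parameters"] = merged
    return table_input


def write_table_stats(glue_client: Any, database: str, table_name: str,
                      table_stats: Mapping[str, Any],
                      column_stats: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Read the table, merge in statistics, and write it back via UpdateTable."""
    table = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    table_input = build_table_input(table, table_stats, column_stats)
    glue_client.update_table(DatabaseName=database, TableInput=table_input)
    return table_input
```

A call such as `write_table_stats(boto3.client("glue"), "my_db", "my_table", {"row_count": 1000}, {"id": {"null_count": 0}})` would merge those placeholder keys into the existing parameters without touching unrelated ones.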