mammoth-bear-12532
01/04/2022, 7:25 PM
modern-monitor-81461
01/05/2022, 7:07 PM
native_data_type from Avro. At least that's what I see (I have attached a screenshot with a Time type and a tooltip of Timestampz, which is the Iceberg native_data_type). But there is a field that is mapped to Time and the tooltip shows Date. I was expecting to see the type as Date and not Time.
Here is the Iceberg metadata for that field:
}, {
  "id" : 227,
  "name" : "date",
  "required" : false,
  "type" : "date"
}, {
As you can see, its type is date and it will be mapped to DateType in Python. In my IcebergSource, I create the following Avro schema:
elif isinstance(type, IcebergTypes.DateType):
    dateType: IcebergTypes.DateType = type
    return {
        "type": "int",
        "logicalType": "date",
        "native_data_type": repr(dateType),
        "_nullable": True,
    }
where repr(dateType) is
def __repr__(self):
    return "date"
Is it because a logical Avro type of date is mapped to a Time type in the UI, or is there something broken on my side?
I don't know if all of this makes sense without demo-ing it! Sorry if it's confusing.
modern-monitor-81461
01/05/2022, 7:43 PM
iceberg platform. I looked at data_platforms.json as well as your demo instance and saw a hive and an AWS S3 platform. I'm confused by the S3 one... Does it exist for organizations simply storing files? What about orgs like mine that store Iceberg tables in an Azure Storage Account? In my mind, S3 is equivalent to Azure Storage accounts, so which one should I use then? Iceberg seems like the logical choice, but I'm curious to know more about platforms.
modern-monitor-81461
01/18/2022, 10:33 PM
Distinct Count and Distinct % since it is not possible to compute those by only using the manifest metrics (there is a set of metrics for each data file, so a distinct value in file A and a distinct value in file B do not mean that we have 2 distinct values in the table... it could be the same value, so the distinct count for the table would be 1). Min, Max, Null Count and Null % are reliable though. Is it a problem if the Iceberg source profiling does not provide a full picture?
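To make the per-file limitation concrete, here is a minimal sketch of rolling file-level metrics up to table level. The `FileColumnMetrics` and `TableColumnProfile` classes are hypothetical stand-ins, not classes from Iceberg or DataHub, and the sketch assumes the per-file bounds are exact integer values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FileColumnMetrics:
    """Hypothetical per-data-file metrics for a single column."""
    row_count: int
    null_count: int
    lower_bound: Optional[int]
    upper_bound: Optional[int]


@dataclass
class TableColumnProfile:
    """Hypothetical table-level profile built only from file metrics."""
    row_count: int
    null_count: int
    null_fraction: Optional[float]
    min_value: Optional[int]
    max_value: Optional[int]


def aggregate(files: Iterable[FileColumnMetrics]) -> TableColumnProfile:
    files = list(files)
    rows = sum(f.row_count for f in files)
    nulls = sum(f.null_count for f in files)
    lows = [f.lower_bound for f in files if f.lower_bound is not None]
    highs = [f.upper_bound for f in files if f.upper_bound is not None]
    return TableColumnProfile(
        row_count=rows,
        null_count=nulls,  # null counts add up across files
        null_fraction=(nulls / rows) if rows else None,
        min_value=min(lows) if lows else None,  # min of the per-file minimums
        max_value=max(highs) if highs else None,  # max of the per-file maximums
    )


# File A holds the values {7, 7, null}; file B holds {7}.
file_a = FileColumnMetrics(row_count=3, null_count=1, lower_bound=7, upper_bound=7)
file_b = FileColumnMetrics(row_count=1, null_count=0, lower_bound=7, upper_bound=7)
profile = aggregate([file_a, file_b])
```

Each file has exactly one distinct non-null value, so adding per-file distinct counts would give 2 while the table really has 1. That is why this sketch leaves distinct count out of the profile entirely.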
My code would greatly benefit from a review since I don't think I leveraged all the tooling from DataHub ingestion. What would you recommend? That I try to polish it as much as I think I can and then ask for a review, or do this sooner in case I need to do a big refactoring? I don't want to waste your time too much, but I don't want to waste mine either! 😉
modern-monitor-81461
01/28/2022, 2:19 AM
modern-monitor-81461
01/29/2022, 3:15 PM
Folder_1.Iceberg_Table_2
But they would have two different Azure URLs (abfss://{container_name}@{account_name}.dfs.core.windows.net/{folder}):
• abfss://Container_X@Datalake_A.dfs.core.windows.net/Folder_1/Iceberg_Table_1
• abfss://Container_X@Datalake_B.dfs.core.windows.net/Folder_1/Iceberg_Table_1
My question is how should the Iceberg source deal with this? How does it compare to AWS S3? How would it look for someone using a local filesystem?
modern-monitor-81461
03/01/2022, 4:45 AM
little-megabyte-1074
03/09/2022, 7:25 PM
helpful-optician-78938
03/15/2022, 11:40 PM
type=self._converter._get_column_type(
    actual_schema.type,
    (
        getattr(actual_schema, "logical_type", None)
        or actual_schema.props.get("logicalType")
    ),
),
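The snippet above passes the Avro logical type alongside the primitive type. A minimal sketch of why that matters follows, assuming a lookup that prefers the logical type when one is present. The two mapping tables and `resolve_column_type` are hypothetical and use plain strings for the UI types; they are not DataHub's converter.

```python
from typing import Any, Dict

# Hypothetical lookup tables, for illustration only.
LOGICAL_TYPE_TO_UI = {
    "date": "Date",
    "time-micros": "Time",
    "timestamp-micros": "Time",
}
PRIMITIVE_TO_UI = {
    "int": "Number",
    "long": "Number",
    "string": "String",
    "boolean": "Boolean",
}


def resolve_column_type(avro_type: Dict[str, Any]) -> str:
    """Prefer the logical type; fall back to the primitive Avro type."""
    logical = avro_type.get("logicalType")
    if logical in LOGICAL_TYPE_TO_UI:
        return LOGICAL_TYPE_TO_UI[logical]
    return PRIMITIVE_TO_UI.get(avro_type["type"], "Unknown")


# The schema produced for an Iceberg date field earlier in the thread.
date_field = {
    "type": "int",
    "logicalType": "date",
    "native_data_type": "date",
    "_nullable": True,
}
```

With this ordering, `resolve_column_type(date_field)` yields `"Date"`, while a bare `{"type": "int"}` still resolves to `"Number"`.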
modern-monitor-81461
03/15/2022, 11:46 PM
modern-monitor-81461
03/16/2022, 5:02 PM
red-lizard-30438
04/26/2022, 5:38 AM
big-carpet-38439
05/02/2022, 3:51 PM
modern-monitor-81461
12/19/2022, 11:43 AM
ERROR: Cannot install acryl-datahub[dev]==0.0.0.dev0 and pyiceberg==0.2.0 because these package versions have conflicting dependencies.
The conflict is caused by:
    acryl-datahub[dev] 0.0.0.dev0 depends on pydantic>=1.5.1
    acryl-datahub[dev] 0.0.0.dev0 depends on pydantic<1.10 and >=1.9.0; extra == "dev"
    acryl-datahub[dev] 0.0.0.dev0 depends on pydantic>=1.5.1; extra == "dev"
    pyiceberg 0.2.0 depends on pydantic==1.10.2
pyiceberg requires pydantic 1.10.2, but DataHub seems to have a type issue with 1.10+ according to this comment. What is this about? Is it something we can fix?
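For reference, the conflict in the pip output can be reproduced with a small check. This is a simplified sketch that handles only plain numeric versions and the three operators seen in the log; it is not pip's resolver and does not implement full PEP 440 semantics. The helper names are hypothetical.

```python
import operator
from typing import List, Tuple

# Only the operators that appear in the pip error above.
OPERATORS = {">=": operator.ge, "==": operator.eq, "<": operator.lt}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "1.10" into (1, 10, 0) so versions compare as tuples."""
    parts = [int(piece) for piece in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, constraints: List[str]) -> bool:
    candidate = parse_version(version)
    for constraint in constraints:
        for symbol, compare in OPERATORS.items():
            if constraint.startswith(symbol):
                bound = parse_version(constraint[len(symbol):])
                if not compare(candidate, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint: {constraint}")
    return True


# Constraints copied from the pip output.
DATAHUB_DEV = [">=1.5.1", ">=1.9.0", "<1.10"]
PYICEBERG = ["==1.10.2"]

candidates = ["1.9.0", "1.9.2", "1.10.2"]
compatible = [v for v in candidates if satisfies(v, DATAHUB_DEV + PYICEBERG)]
```

`compatible` ends up empty: 1.10.2 fails the `<1.10` bound from the `dev` extra, and anything below 1.10 fails the `==1.10.2` pin from pyiceberg.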
@gray-shoe-75895
wide-optician-47025
03/21/2023, 4:04 PM
wide-optician-47025
03/21/2023, 4:05 PM
wide-optician-47025
03/21/2023, 4:05 PM
numerous-byte-87938
04/13/2023, 9:34 PM