modern-monitor-81461
01/28/2022, 2:19 AM
modern-monitor-81461
01/28/2022, 2:25 AM
modern-monitor-81461
01/29/2022, 3:15 PM
Folder_1.Iceberg_Table_2
But they would have two different Azure URLs (abfss://{container_name}@{account_name}.dfs.core.windows.net/{folder}):
• abfss://Container_X@Datalake_A.dfs.core.windows.net/Folder_1/Iceberg_Table_1
• abfss://Container_X@Datalake_B.dfs.core.windows.net/Folder_1/Iceberg_Table_1
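The two URLs above can be pulled apart to show exactly where they differ. This is only a minimal sketch using the standard library: `AbfssLocation` and `parse_abfss` are hypothetical names for illustration, not part of DataHub or pyiceberg, and the sketch assumes the URL shape abfss://{container_name}@{account_name}.dfs.core.windows.net/{folder}.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class AbfssLocation(NamedTuple):
    """Hypothetical container for the parts of an abfss:// URL."""

    container: str
    account: str
    path: str


def parse_abfss(url: str) -> AbfssLocation:
    """Split abfss://{container}@{account}.dfs.core.windows.net/{folder}."""
    parts = urlsplit(url)
    if parts.scheme != "abfss":
        raise ValueError(f"not an abfss URL: {url}")
    # netloc keeps its original case, unlike parts.hostname
    container, sep, host = parts.netloc.partition("@")
    if not sep:
        raise ValueError(f"missing container name in: {url}")
    account = host.split(".", 1)[0]
    return AbfssLocation(container, account, parts.path.lstrip("/"))


table_a = parse_abfss(
    "abfss://Container_X@Datalake_A.dfs.core.windows.net/Folder_1/Iceberg_Table_1"
)
table_b = parse_abfss(
    "abfss://Container_X@Datalake_B.dfs.core.windows.net/Folder_1/Iceberg_Table_1"
)

# Same container and folder path, so a name built from the path alone collides
same_path = (table_a.container, table_a.path) == (table_b.container, table_b.path)
# Only the storage account tells the two tables apart
different_account = table_a.account != table_b.account
```

Under these assumptions, any dataset name derived from the folder path alone would be identical for both tables, so the account name (or something equivalent to it) has to be carried somewhere to keep them distinct.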
My question is how should the Iceberg source deal with this? How does it compare to AWS S3? How would it look for someone using a local filesystem?
little-megabyte-1074
modern-monitor-81461
03/01/2022, 4:45 AM
little-megabyte-1074
modern-monitor-81461
03/15/2022, 1:20 PM
DecimalType that I am trying to map to a NumberTypeClass. I think this mapping makes sense and it's what I can see in schema_util.py. This map is relying on the Avro logical_type property. The logical map is being used here.
-- The Problem --
What I see when my test runs is that actual_schema is of type avro.schema.BytesDecimalSchema (class definition here). It is not setting a logicalType property with set_prop(), so when schema_util tries to use it, the returned value is None and the decimal key mapping is never used. If I change the code in schema_util to use the logical_type Python property, everything works.
I don't know if I explained it well with all my code references 😅, but here is a simple test case to reproduce the problem:
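import json

from datahub.ingestion.extractor import schema_util
from datahub.metadata.com.linkedin.pegasus2avro.schema import NumberTypeClass, SchemaField

def test_avro():
    # Avro schema generated by the Iceberg source for an Iceberg DecimalType
    avro_schema = {'type': 'record', 'name': '__struct_', 'fields': [{'name': 'name', 'type': {'type': 'bytes', 'logicalType': 'decimal', 'precision': 3, 'scale': 2, 'native_data_type': 'decimal(3, 2)', '_nullable': True}}]}
    newfields = schema_util.avro_schema_to_mce_fields(
        json.dumps(avro_schema), default_nullable=True
    )
    assert len(newfields) == 1
    schema_field: SchemaField = newfields[0]
    # Expected NumberTypeClass; this assertion currently fails with BytesTypeClass
    assert isinstance(schema_field.type.type, NumberTypeClass)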
In this code, avro_schema is what my Iceberg source is generating for an Iceberg DecimalType. I expect to get a NumberTypeClass as a DataHub type, but I get a BytesTypeClass.
helpful-optician-78938
03/15/2022, 9:34 PM
helpful-optician-78938
03/15/2022, 11:40 PM
type=self._converter._get_column_type(
    actual_schema.type,
    (
        getattr(actual_schema, "logical_type", None)
        or actual_schema.props.get("logicalType")
    ),
),
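The fallback in the snippet above can be tried in isolation. This is a minimal sketch, not DataHub's actual code: `FakeSchema` is a hypothetical stand-in for an Avro schema object (for example one that, like avro.schema.BytesDecimalSchema as described earlier, exposes a logical_type attribute without setting a logicalType prop), and `resolve_logical_type` is an illustrative helper name.

```python
from typing import Any, Dict, Optional


class FakeSchema:
    """Hypothetical stand-in for an Avro schema object (not the avro library)."""

    type = "bytes"

    def __init__(
        self,
        props: Optional[Dict[str, Any]] = None,
        logical_type: Optional[str] = None,
    ) -> None:
        self.props: Dict[str, Any] = dict(props or {})
        if logical_type is not None:
            # Mimics schema classes that only expose the Python attribute
            self.logical_type = logical_type


def resolve_logical_type(schema: Any) -> Optional[str]:
    """Prefer the Python attribute, then fall back to the 'logicalType' prop."""
    return getattr(schema, "logical_type", None) or schema.props.get("logicalType")


attr_only = FakeSchema(logical_type="decimal")
props_only = FakeSchema(props={"logicalType": "date"})
neither = FakeSchema()

resolved = [
    resolve_logical_type(attr_only),
    resolve_logical_type(props_only),
    resolve_logical_type(neither),
]
```

Checking the attribute first and the prop second means a schema that carries the information in either place resolves to a logical type, while a plain bytes schema still resolves to None.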
modern-monitor-81461
03/15/2022, 11:46 PM
modern-monitor-81461
03/16/2022, 5:02 PM
red-lizard-30438
04/26/2022, 5:38 AM
big-carpet-38439
05/02/2022, 3:51 PM
modern-monitor-81461
12/19/2022, 11:43 AM
ERROR: Cannot install acryl-datahub[dev]==0.0.0.dev0 and pyiceberg==0.2.0 because these package versions have conflicting dependencies.
The conflict is caused by:
    acryl-datahub[dev] 0.0.0.dev0 depends on pydantic>=1.5.1
    acryl-datahub[dev] 0.0.0.dev0 depends on pydantic<1.10 and >=1.9.0; extra == "dev"
    acryl-datahub[dev] 0.0.0.dev0 depends on pydantic>=1.5.1; extra == "dev"
    pyiceberg 0.2.0 depends on pydantic==1.10.2
pyiceberg requires pydantic 1.10.2, but DataHub seems to have a type issue with 1.10+ according to this comment. What is this about? Is it something we can fix?
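To see why the resolver gives up, the pins from the error output can be compared by hand. This is a simplified sketch for plain numeric versions only, not how pip evaluates specifiers; `parse_version` and `satisfies_dev_extra` are hypothetical helper names.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain numeric version such as '1.10.2' (no pre-release tags)."""
    parts = [int(p) for p in version.split(".")]
    # Pad so that '1.10' compares like '1.10.0'
    while len(parts) < 3:
        parts.append(0)
    major, minor, patch = parts[:3]
    return (major, minor, patch)


def satisfies_dev_extra(version: str) -> bool:
    """Check the dev-extra pin from the error: pydantic>=1.9.0 and <1.10."""
    return parse_version("1.9.0") <= parse_version(version) < parse_version("1.10")


# pyiceberg 0.2.0 pins pydantic==1.10.2, which falls outside the dev-extra range
pyiceberg_pin_ok = satisfies_dev_extra("1.10.2")
# A 1.9.x release would satisfy the dev extra, but not pyiceberg's exact pin
older_release_ok = satisfies_dev_extra("1.9.2")
```

No single pydantic version can be both exactly 1.10.2 and below 1.10, so one of the two pins has to change before both packages can be installed together.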
@gray-shoe-75895
wide-optician-47025
03/21/2023, 4:04 PM
wide-optician-47025
03/21/2023, 4:05 PM
wide-optician-47025
03/21/2023, 4:05 PM
numerous-byte-87938
04/13/2023, 9:34 PM
dazzling-london-20492
06/10/2023, 2:00 AM
dazzling-london-20492
06/10/2023, 2:00 AM
modern-monitor-81461
07/04/2023, 7:04 PM
lively-appointment-50242
09/29/2023, 8:20 AM
source:
  type: "iceberg"
  config:
    env: PROD
    catalog:
      name: my_iceberg_catalog
      type: rest
      # Catalog configuration follows pyiceberg's documentation (https://py.iceberg.apache.org/configuration)
      config:
        uri: http://localhost:8181
        s3.access-key-id: admin
        s3.secret-access-key: password
        s3.region: us-east-1
        warehouse: s3a://warehouse/wh/
        s3.endpoint: http://localhost:9000
    platform_instance: my_iceberg_catalog
    table_pattern:
      allow:
        - marketing.*
    profiling:
      enabled: true
Can somebody provide an Iceberg config example for ADLS? Thanks in advance.
bulky-shoe-65107
10/16/2023, 12:39 AM
victorious-car-1170
11/22/2023, 12:37 PM
victorious-car-1170
11/27/2023, 8:54 AM
acoustic-hospital-48865
11/29/2023, 10:46 AM
able-pilot-25899
11/30/2023, 8:11 AM
victorious-car-1170
12/01/2023, 9:42 AM
full-alligator-99452
12/18/2023, 1:57 PM
full-alligator-99452
12/21/2023, 12:30 PM