Pierre Kerschgens
11/03/2022, 1:40 PM
CREATE TABLE public.task_events (
    id serial4 NOT NULL,
    task_id int4 NULL,
    event_name varchar(20) NULL,
    ts timestamp NOT NULL,
    CONSTRAINT task_events_pkey PRIMARY KEY (id)
);
CREATE INDEX ix_task_events_task_id ON public.task_events USING btree (task_id);
CREATE INDEX ix_task_events_ts ON public.task_events USING btree (ts);
When I fetch the JSON schema from the Airbyte API, I get these types (they look good to me):
print(json_schema['properties']['id'])
{'type': 'number', 'airbyte_type': 'integer'}
print(json_schema['properties']['ts'])
{'type': 'string', 'format': 'date-time', 'airbyte_type': 'timestamp_without_timezone'}
print(json_schema['properties']['task_id'])
{'type': 'number', 'airbyte_type': 'integer'}
print(json_schema['properties']['event_name'])
{'type': 'string'}
When I then inspect the .parquet files that Airbyte wrote to S3 in AWS Glue, the columns have these types:
id double
ts struct
task_id double
event_name string
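A rough way to make this comparison explicit is to derive the column type one would expect from each JSON-schema property and diff it against what Glue reports. The sketch below is only an illustration: `column_type` and its mapping are hypothetical and are not how the Airbyte S3 destination actually converts types. It assumes `airbyte_type` should take precedence over the plain JSON `type`.

```python
# Hypothetical mapping from a JSON-schema property to an expected column type.
# This is an assumption for illustration, not Airbyte's real conversion logic.
def column_type(prop: dict) -> str:
    airbyte_type = prop.get("airbyte_type")
    if airbyte_type == "integer":
        return "bigint"
    if airbyte_type == "timestamp_without_timezone":
        return "timestamp"
    json_type = prop.get("type")
    if json_type == "number":
        # Without an airbyte_type hint, a JSON "number" can only be a double.
        return "double"
    if json_type == "string":
        return "string"
    return "unknown"

# Properties as returned by the Airbyte API (copied from the message above).
schema_properties = {
    "id": {"type": "number", "airbyte_type": "integer"},
    "ts": {"type": "string", "format": "date-time",
           "airbyte_type": "timestamp_without_timezone"},
    "task_id": {"type": "number", "airbyte_type": "integer"},
    "event_name": {"type": "string"},
}

# Column types as shown by AWS Glue for the written .parquet files.
glue_types = {"id": "double", "ts": "struct", "task_id": "double", "event_name": "string"}

expected = {name: column_type(prop) for name, prop in schema_properties.items()}
mismatches = [name for name in expected if expected[name] != glue_types[name]]
print(mismatches)
```

With this (assumed) mapping, `id`, `ts` and `task_id` are flagged, which matches a converter that looked only at `'type': 'number'` and ignored `airbyte_type`.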
So I guess the S3/Parquet destination converted id and task_id to double instead of int.
This leads to IDs like "4.2108168E7" instead of "42108168".
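As a stopgap on the reading side, the double values can be cast back to integers, since a double represents every integer up to 2**53 exactly. This is only a sketch of a workaround, not a fix for the destination itself; `restore_id` is a hypothetical helper name.

```python
# Largest magnitude up to which every integer is exactly representable as a double.
MAX_EXACT_INT = 2**53

def restore_id(value: float) -> int:
    """Convert an ID that was stored as a double back to an int, refusing lossy cases."""
    if not value.is_integer():
        raise ValueError(f"not a whole number: {value!r}")
    if abs(value) > MAX_EXACT_INT:
        raise ValueError(f"too large to be an exact integer: {value!r}")
    return int(value)

# The ID from the message above, as it appears after being written as a double.
restored = restore_id(float("4.2108168E7"))
print(restored)
```

This works for IDs from a serial4/int4 column, which are far below 2**53, but it would not be safe for arbitrary 64-bit IDs.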
I hope someone can help with this.
Thanks in advance!
Floyd Berndsen
11/03/2022, 2:55 PM
user
11/03/2022, 3:21 PM
Floyd Berndsen
11/03/2022, 3:23 PM
Pierre Kerschgens
11/03/2022, 3:54 PM
user
11/03/2022, 4:44 PM
Pierre Kerschgens
11/03/2022, 5:08 PM
user
11/04/2022, 2:17 PM
Julien Ruey
10/03/2023, 3:04 PM