Kyle Rosenstein
09/27/2022, 8:22 PM
{
  "_id": "<airbyte-id>",
  "active": true,
  "identifier": [
    {
      "system": "<some-link>",
      "value": "<some-value>"
    }
  ],
  "managingOrganization": {
    "identifier": {
      "system": "<some-link>",
      "value": "<some-value>"
    }
  },
  "meta": {
    "lastUpdated": "<some-date>",
    "source": "<some-link>"
  }
}
The same document in Snowflake:
{
  "_id": "<airbyte-id>",
  "active": true,
  "identifier": "",
  "managingOrganization": {
    "identifier": ""
  },
  "meta": {
    "lastUpdated": "<some-date>",
    "source": "<some-link>"
  }
}
When checking the export logs, I see:
WARN i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$5):339 - Schema validation errors found for stream <stream-name>. Error messages: [$.identifier is of an incorrect type. Expected it to be object, $.managingOrganization.identifier is of an incorrect type. Expected it to be object]
For some reason, regardless of the data type, any field called “identifier” is wiped and replaced with an empty string during export. I double-checked by duplicating these documents in the DocDB collection and renaming the “identifier” fields to something like “org_identifier”; with that change, the data was preserved just fine. This makes me think it is not a data type issue, but rather that the word “identifier” is either throwing Airbyte off or is some DocumentDB reserved keyword. Please let me know if you have seen anything like this, have any ideas, or know of ways to cast these fields during the connection, so that I can either fix this for all existing records or get a better sense of whether this issue is coming from Airbyte or DocDB.
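To make the comparison concrete, here is a minimal sketch of how the two versions of a document could be diffed to list the fields that were blanked out. It is plain Python and does not use any Airbyte or DocumentDB API; the function name `wiped_paths` and the variables `source_doc` and `snowflake_doc` are hypothetical, and the documents are the placeholder examples from above.

```python
def wiped_paths(source, exported, prefix="$"):
    """Return JSON paths that hold data in `source` but are "" in `exported`."""
    wiped = []
    for key, src_value in source.items():
        path = f"{prefix}.{key}"
        dst_value = exported.get(key)
        if dst_value == "" and src_value != "":
            # The field had a value at the source but arrived as an empty string.
            wiped.append(path)
        elif isinstance(src_value, dict) and isinstance(dst_value, dict):
            # Both sides are objects, so compare their nested fields.
            wiped.extend(wiped_paths(src_value, dst_value, path))
    return wiped


# Placeholder documents copied from the examples above (not real data).
source_doc = {
    "_id": "<airbyte-id>",
    "active": True,
    "identifier": [{"system": "<some-link>", "value": "<some-value>"}],
    "managingOrganization": {
        "identifier": {"system": "<some-link>", "value": "<some-value>"}
    },
    "meta": {"lastUpdated": "<some-date>", "source": "<some-link>"},
}
snowflake_doc = {
    "_id": "<airbyte-id>",
    "active": True,
    "identifier": "",
    "managingOrganization": {"identifier": ""},
    "meta": {"lastUpdated": "<some-date>", "source": "<some-link>"},
}

print(wiped_paths(source_doc, snowflake_doc))
# → ['$.identifier', '$.managingOrganization.identifier']
```

For these two documents, the reported paths are the same two that appear in the schema validation warning.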