# ask-community-for-troubleshooting
k
Hi everyone, I was wondering if anyone had ideas around this very obscure error I’m facing. I’m currently trying to export data from DocumentDB (using MongoDB source) to Snowflake. I have the connection working properly and data is being exported accordingly. However, it seems that the JSON/BSON field “identifier” within the documents is having its values replaced with empty strings. One example: A document I see in DocumentDB:
{
  "_id": "<airbyte-id>",
  "active": true,
  "identifier": [
    {
      "system": "<some-link>",
      "value": "<some-value>"
    }
  ],
  "managingOrganization": {
    "identifier": {
      "system": "<some-link>",
      "value": "<some-value>"
    }
  },
  "meta": {
    "lastUpdated": "<some-date>",
    "source": "<some-link>"
  }
}
The same document in Snowflake:
{
  "_id": "<airbyte-id>",
  "active": true,
  "identifier": "",
  "managingOrganization": {
    "identifier": ""
  },
  "meta": {
    "lastUpdated": "<some-date>",
    "source": "<some-link>"
  }
}
When checking the export logs, I see:
WARN i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$5):339 - Schema validation errors found for stream <stream-name>. Error messages: [$.identifier is of an incorrect type. Expected it to be object, $.managingOrganization.identifier is of an incorrect type. Expected it to be object]
For some reason, regardless of the data type, if the field is called “identifier” it is wiped and replaced with an empty string during export. I was able to double-check by duplicating these documents in the DocDB collection but renaming the “identifier” fields to something like “org_identifier”, and when I did that, the data was preserved just fine. This makes me think it is not a data type issue but rather that the word “identifier” is either throwing Airbyte off or is some DocumentDB reserved keyword. Please let me know if you have seen anything like this, have any ideas, or know of ways to cast these fields during the connection, so that I can either fix this for all existing records or get a better sense of whether this issue is coming from Airbyte or DocDB.
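For anyone trying to reproduce this, here is a minimal Python sketch of one way to list exactly which paths change type between the source document and the exported one. It is only an illustration, not part of Airbyte or DocumentDB: the helper names `json_type` and `type_mismatches` are hypothetical, and the two documents are the redacted examples from this message.

```python
def json_type(value):
    """Map a Python value to its JSON type name."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def type_mismatches(source, dest, path="$"):
    """Yield (path, source_type, dest_type) wherever the two documents disagree on type."""
    src_t, dst_t = json_type(source), json_type(dest)
    if src_t != dst_t:
        yield (path, src_t, dst_t)
        return
    if src_t == "object":
        # Only compare keys present on both sides.
        for key in source:
            if key in dest:
                yield from type_mismatches(source[key], dest[key], f"{path}.{key}")
    elif src_t == "array":
        for i, (s, d) in enumerate(zip(source, dest)):
            yield from type_mismatches(s, d, f"{path}[{i}]")


# Redacted example documents from this thread.
source_doc = {
    "_id": "<airbyte-id>",
    "active": True,
    "identifier": [{"system": "<some-link>", "value": "<some-value>"}],
    "managingOrganization": {
        "identifier": {"system": "<some-link>", "value": "<some-value>"}
    },
    "meta": {"lastUpdated": "<some-date>", "source": "<some-link>"},
}
snowflake_doc = {
    "_id": "<airbyte-id>",
    "active": True,
    "identifier": "",
    "managingOrganization": {"identifier": ""},
    "meta": {"lastUpdated": "<some-date>", "source": "<some-link>"},
}

mismatches = list(type_mismatches(source_doc, snowflake_doc))
for path, src_t, dst_t in mismatches:
    print(f"{path}: {src_t} in source, {dst_t} in destination")
```

On the two example documents this reports `$.identifier` (array in the source, string in the destination) and `$.managingOrganization.identifier` (object in the source, string in the destination), which are the same two paths named in the schema validation warning.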
u
@[DEPRECATED] Marcos Marx turned this message into Zendesk ticket 2455 to ensure timely resolution!
k
Note that when I tried to replicate this issue using a standalone MongoDB source, I was unable to do so: the data was retained just fine. There could be a mistranslation or issue with DocumentDB’s emulation of MongoDB, and perhaps this is not an issue with Airbyte, but I’ve been having difficulty ruling out either.
u
What version of Airbyte and MongoDB are you using?
k
@Marcos Marx (Airbyte) I’m using a local version of Airbyte, testing this stuff in dev. We forked the repo in early August (I believe off this tag: v0.39.41-alpha). Our DocumentDB database uses MongoDB 4.2 so when I tried to replicate the issue using my own standalone MongoDB database, I tried once with 6.0 (latest) and now with 4.2 so they match
u
But what version of MongoDB connector are you using?
k
A dev version of the MongoDB connector. AWS requires DocumentDB connections to use its own certificate authorities, so a coworker of mine created a fork in which she added the ability to input a custom CA and add it to the trust store. From there we uncovered this issue.
Let me know if that helps or if you were looking for different information
u
I saw you opened the issue https://github.com/airbytehq/airbyte/issues/17397. Closing the discussion here to continue on GitHub.
k
Thank you!