Airbyte #ask-community-for-troubleshooting

Daniel Mateus Pires (Earnest Research)

05/27/2021, 10:24 AM

hey folks, I'm using Airbyte for the first time, trying to ingest from a REST API using the HTTP source. However, the API returns a JSON array, and I'm getting

```
2021-05-27 10:22:09 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - data
2021-05-27 10:22:09 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - value is not a valid dict (type=type_error.dict)
2021-05-27 10:22:10 INFO (/tmp/workspace/2/0) DefaultReplicationWorker(run):134 - Source thread complete.
2021-05-27 10:22:10 INFO (/tmp/workspace/2/0) DefaultReplicationWorker(run):135 - Waiting for destination thread to join.
2021-05-27 10:22:11 INFO (/tmp/workspace/2/0) DefaultReplicationWorker(run):137 - Destination thread complete.
2021-05-27 10:22:11 ERROR (/tmp/workspace/2/0) DefaultReplicationWorker(run):141 - Sync worker failed.
```

in the logs, on version `0.24.1-alpha`.

Daniel Mateus Pires (Earnest Research)

05/27/2021, 11:31 AM

I looked into the Airbyte code and added a unit test, which actually passed for a REST API returning a `list(json)`. Then I realized the API I'm trying to pull from returns just a list of lists of strings!

```
[["a", "b", "c"], ["val_a", "val_b", "val_c"], ["val_a2", "val_b2", "val_c2"]]
```

hence the issue
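The log line `value is not a valid dict (type=type_error.dict)` suggests each item of the array is validated as a JSON object. A minimal sketch, assuming that is the check that fails, contrasts the shape the unit test covered with the one this API returns. `describe_shape` is a hypothetical helper, not Airbyte code.

```python
import json
from typing import Any


def describe_shape(body: str) -> str:
    """Classify a JSON response body by the shape of its items (hypothetical helper)."""
    data: Any = json.loads(body)
    if isinstance(data, dict):
        return "single object"
    if isinstance(data, list):
        # Note: an empty array satisfies the first check and counts as "list of objects".
        if all(isinstance(item, dict) for item in data):
            return "list of objects"
        if all(isinstance(item, list) for item in data):
            return "list of lists"
    return "other"


# The shape the unit test covered: every item is a JSON object (a dict in Python).
print(describe_shape('[{"a": "val_a"}, {"a": "val_a2"}]'))  # → list of objects
# The shape this API actually returns: every item is a list of strings.
print(describe_shape('[["a", "b", "c"], ["val_a", "val_b", "val_c"]]'))  # → list of lists
```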


Daniel Mateus Pires (Earnest Research)

05/27/2021, 12:05 PM

As a follow up, I added an issue: https://github.com/airbytehq/airbyte/issues/3662 and a PR: https://github.com/airbytehq/airbyte/pull/3663

[DEPRECATED] Marcos Marx

05/27/2021, 12:07 PM

Daniel, for CSV files you can use the File source if you don't need to send headers or a body in the request. I read the HTTP connector doc and we have a lot to improve there, btw. Thanks for opening the issues 😃

Daniel Mateus Pires (Earnest Research)

05/27/2021, 12:14 PM

Thanks for reaching out 🙂 Maybe I'm not understanding correctly, or I didn't explain my issue well: the data still comes from a REST API; it's just the response that is atypical (in the issue I pasted the link I'm looking at and the response structure). The response isn't an actual CSV; it's still valid JSON, but I'm guessing they represent it internally as a CSV because of the structure of the response, which is:

```
[
  ["column_a", "column_b", "column_c"],
  ["value_a1", "value_b1", "value_c1"],
  ["value_a2", "value_b2", "value_c2"]
]
```
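Because the first row is a header and every later row holds values in the same column order, the payload maps one-to-one onto CSV rows. A minimal sketch using Python's standard `csv` module makes that concrete; `rows_to_csv` is a hypothetical name used only for illustration.

```python
import csv
import io
from typing import List


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize a header-row-plus-value-rows array as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # the default dialect ends each row with "\r\n"
    writer.writerows(rows)
    return buffer.getvalue()


response = [
    ["column_a", "column_b", "column_c"],
    ["value_a1", "value_b1", "value_c1"],
    ["value_a2", "value_b2", "value_c2"],
]
print(rows_to_csv(response), end="")
```

The body on the wire is still JSON, though, so a tool expecting a CSV file would not read it directly.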

[DEPRECATED] Marcos Marx

05/27/2021, 12:26 PM

whyyyy

[DEPRECATED] Marcos Marx

05/27/2021, 12:28 PM

Daniel, I think it's possible to extend the HTTP connector, but IMHO this format is non-standard... This API has a lot of endpoints that other people could use. WDYT about building a connector specifically for getting information from census.gov? There's a Python lib for it: https://pypi.org/project/census/0.5/

[DEPRECATED] Marcos Marx

05/27/2021, 12:29 PM

using the CDK should be an easy job 😎

Daniel Mateus Pires (Earnest Research)

05/27/2021, 1:07 PM

That sounds good, Marcos 🙂 I'll add some information to the issue/PR I opened and turn it into a "Census-specific connector" request rather than an extension of the HTTP Request connector.


Eugene Kulak

05/27/2021, 8:55 PM

Another option could be to just override `parse_response` and produce regular dicts:

```
[{"column_a": "value_a1", "column_b": "value_b1", "column_c": "value_c1"},
{"column_a": "value_a2", "column_b": "value_b2", "column_c": "value_c2"}]
```
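A minimal sketch of that conversion in plain Python, assuming the first inner list is always the header row. `rows_to_records` and `parse_body` are hypothetical names; in a CDK-based connector, logic like this would sit inside the stream's `parse_response` override, which is not shown here.

```python
import json
from typing import Dict, Iterator, List


def rows_to_records(rows: List[List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one dict per value row, keyed by the header row (hypothetical helper)."""
    if not rows:
        return  # an empty response yields no records
    header, *value_rows = rows
    for values in value_rows:
        if len(values) != len(header):
            raise ValueError(f"row has {len(values)} values, header has {len(header)}")
        yield dict(zip(header, values))


def parse_body(body: str) -> Iterator[Dict[str, str]]:
    """Decode a raw JSON response body and convert it to records (hypothetical helper)."""
    yield from rows_to_records(json.loads(body))


body = (
    '[["column_a", "column_b", "column_c"], '
    '["value_a1", "value_b1", "value_c1"], '
    '["value_a2", "value_b2", "value_c2"]]'
)
for record in parse_body(body):
    print(record)
```

Raising on a length mismatch is a deliberate choice here: `zip` on its own would silently drop extra values or leave columns out.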

