# ask-community-for-troubleshooting
d
hey folks, I'm using Airbyte for the first time, trying to ingest from a REST API using the HTTP source. However, the API returns a JSON array, and I'm getting
2021-05-27 10:22:09 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - data
2021-05-27 10:22:09 ERROR (/tmp/workspace/2/0) LineGobbler(voidCall):69 - value is not a valid dict (type=type_error.dict)
2021-05-27 10:22:10 INFO (/tmp/workspace/2/0) DefaultReplicationWorker(run):134 - Source thread complete.
2021-05-27 10:22:10 INFO (/tmp/workspace/2/0) DefaultReplicationWorker(run):135 - Waiting for destination thread to join.
2021-05-27 10:22:11 INFO (/tmp/workspace/2/0) DefaultReplicationWorker(run):137 - Destination thread complete.
2021-05-27 10:22:11 ERROR (/tmp/workspace/2/0) DefaultReplicationWorker(run):141 - Sync worker failed.
in the logs on version 0.24.1-alpha
I looked into the Airbyte code and added a unit test that was actually passing for a REST API returning a list(json), then I realized the API I'm trying to pull from returns just a list of lists of strings!
[["a", "b", "c"], ["val_a", "val_b", "val_c"], ["val_a2", "val_b2", "val_c2"]]
hence the issue
👍 1
u
Daniel, for CSV files you can use the File source if you don't need to send headers/body in the request. I read the HTTP connector doc and we have a lot to improve there btw, thanks for opening the issues 😃
d
thanks for reaching out 🙂 so maybe I'm not understanding correctly or I didn't explain my issue well: the data still comes from a REST API, it's just the response that is atypical. (In the issue I pasted the link I'm looking at, and the response structure.) The response isn't an actual CSV, it's still valid JSON, but I'm guessing they represent it internally as a CSV because of the structure of the response, which is:
[
  ["column_a", "column_b", "column_c"],
  ["value_a1", "value_b1", "value_c1"],
  ["value_a2", "value_b2", "value_c2"]
]
u
whyyyy
Daniel, I think it's possible to extend the HTTP connector, but imho this format is non-standard... this API has a lot of endpoints that other people could use, wdyt about building a connector specific to getting information from census.gov? they have a python lib https://pypi.org/project/census/0.5/
using the CDK should be an easy job 😎
d
That sounds good Marcos 🙂 I'll add some information into the issue/PR I opened and turn it into a "Census specific connector request" rather than an extension to the HTTP Request connector
👍 1
e
another option could be to just override parse_response and produce regular dicts
[{"column_a": "value_a1", "column_b": "value_b1", "column_c": "value_c1"},
{"column_a": "value_a2", "column_b": "value_b2", "column_c": "value_c2"}]
👍 1
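For anyone landing here later, the reshaping suggested above can be sketched in plain Python. This is only an illustration, assuming the first inner list is always the header row and every following list is one record; the helper name `rows_to_records` is made up for this example and is not part of Airbyte.

```python
from typing import Any, Dict, Iterable, List


def rows_to_records(payload: List[List[Any]]) -> Iterable[Dict[str, Any]]:
    """Turn [[header...], [row...], ...] into one dict per data row.

    Hypothetical helper: assumes the first inner list holds the column names.
    """
    if not payload:
        return  # empty response, nothing to emit
    header, *rows = payload
    for row in rows:
        # Pair each column name with the value in the same position.
        yield dict(zip(header, row))


# Sample payload copied from the response structure in this thread.
payload = [
    ["column_a", "column_b", "column_c"],
    ["value_a1", "value_b1", "value_c1"],
    ["value_a2", "value_b2", "value_c2"],
]
records = list(rows_to_records(payload))
```

In a CDK-based connector, something like this could presumably be called on `response.json()` from inside the stream's overridden `parse_response`, so that each emitted record is a dict. Note that `zip` silently drops extra values if a row is longer than the header, so a real connector may want to validate row lengths.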