# advice-data-ingestion

    Ashish Kumar Mohanty

    08/02/2022, 6:07 PM
    Is there any way we can track when incremental sync for the Redshift source will be released? https://docs.airbyte.com/integrations/sources/redshift/

    Harvey Marshall

    08/03/2022, 12:08 PM
    Hey all, I was wondering if anyone has successfully pulled data from an Azure Event Hub using the Kafka connector. I have set it up and it says it's connected, but it won't read any messages.

    Marcos Marx (Airbyte)

    08/03/2022, 12:38 PM
    Hello 👋 I'm sending this message to help you identify if this channel is the best place to post your question. Airbyte has a few channels for open discussion about data topics (architecture, ingestion, quality, etc.). In these channels you may ask general questions related to the particular topic. If you're having a problem deploying or running a connection in Airbyte, this is not the right channel. We recommend you open a Discourse topic where our support team will help you troubleshoot your issue.

    Edgar Valdez

    08/08/2022, 1:33 AM
    Hi there! I'm trying out Airbyte for the first time. I'm testing the Freshdesk connector, and even though the extraction is successful I can't see all the records coming up in the tickets table. In the settings I'm not specifying a start date, so it should be ingesting all the data. I can't see any error in the logs, so I'm assuming everything is OK, but when comparing with a Freshdesk report I'm about 600 records short (out of ~1300). I don't think it's an API rate limit issue. Any ideas? TIA
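
    For what it's worth, a minimal sketch (not Airbyte code; the Freshdesk domain and API key below are placeholders) for counting tickets straight from the Freshdesk API. One thing worth checking: Freshdesk's list-tickets endpoint only returns recently created/updated tickets unless updated_since is passed, which is a common cause of a gap like this.

    # Rough comparison against the Freshdesk API itself (placeholder domain/API key).
    # Without `updated_since`, Freshdesk only returns tickets from the last 30 days.
    import requests

    DOMAIN = "yourcompany"      # placeholder
    API_KEY = "your_api_key"    # placeholder

    def count_tickets(updated_since="2010-01-01T00:00:00Z"):
        total, page = 0, 1
        while True:
            resp = requests.get(
                f"https://{DOMAIN}.freshdesk.com/api/v2/tickets",
                params={"updated_since": updated_since, "per_page": 100, "page": page},
                auth=(API_KEY, "X"),
            )
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                return total
            total += len(batch)
            page += 1

    print(count_tickets())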

    Dipti Bijpuria

    08/08/2022, 6:41 PM
    Hello everyone. We have a requirement to fetch multiple reports from Workday using a SOAP connector. I was able to build a prototype that fetches one report from Workday, using the static files mentioned below to build the request and process the response:
    1. A static XML file is used to make the request.
    2. The XML response received in step 1 is then processed with a static XSL to cherry-pick the desired fields and flatten the response.
    3. The file produced in step 2 is validated against a static JSON schema.
    Although my prototype works fine, as a next step I need to modify it so it can fetch other reports as well using the same connector. As a first pass I am planning to pass the XML (to build the request payload), the XSL, and the JSON schema as parameters in the UI. I am aware that this approach is not sustainable, since business requirements change frequently and we might need to fetch more columns; the ideal solution would be to persist the entire response from Workday.
    1. While I am working on the parameterization solution (passing the XML, XSL, and JSON schema via the UI), it would be great if anyone could share their experience with the limitations of this approach or any problems they encountered.
    2. Also, my ultimate goal is to be able to generate the XSL (to flatten the response) and the JSON schema dynamically for each report. Can anyone help me with how to generate a dynamic JSON schema for a SOAP connector?
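
    On the dynamic JSON schema question, a minimal sketch (not Workday- or Airbyte-specific; the field names are made up) that infers a flat JSON schema from one sample flattened record at runtime, instead of maintaining a static schema file per report:

    # Derive a simplified JSON schema dynamically from a sample flattened record.
    from typing import Any, Dict

    def infer_type(value: Any) -> Dict[str, Any]:
        """Map a Python value to a (very simplified) JSON-schema type."""
        if isinstance(value, bool):
            return {"type": "boolean"}
        if isinstance(value, int):
            return {"type": "integer"}
        if isinstance(value, float):
            return {"type": "number"}
        if value is None:
            return {"type": ["string", "null"]}
        return {"type": "string"}

    def build_schema(sample_record: Dict[str, Any]) -> Dict[str, Any]:
        """Build a flat JSON schema from a record produced by the XSL flattening step."""
        return {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            "properties": {key: infer_type(val) for key, val in sample_record.items()},
        }

    # Example: a record as it might look after flattening a Workday SOAP response
    sample = {"Employee_ID": "1001", "Hire_Date": "2020-01-15", "FTE": 1.0, "Active": True}
    print(build_schema(sample))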

    Arber X

    08/08/2022, 11:12 PM
    Hi all, I'm having a simple issue connecting to a MongoDB Atlas instance from my self-hosted Airbyte instance. I've followed all the guides I could find and believe I have a proper setup, but I still get:
    Error: Command failed with error 13 (Unauthorized): 'not authorized on db to execute command....

    Arber X

    08/08/2022, 11:12 PM
    Anyone have any insights into proper permissioning/the issue?
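
    A small sketch for checking this outside Airbyte (cluster, user, database, and collection names are placeholders): error 13 usually means the database user is missing a read/readAnyDatabase role or is authenticating against the wrong authSource, so if this fails the same way, the fix is on the Atlas user rather than in Airbyte.

    # Quick permission check with pymongo (placeholders throughout); if this also fails
    # with "not authorized", adjust the Atlas user's roles, not the Airbyte connector.
    from pymongo import MongoClient

    uri = (
        "mongodb+srv://airbyte_user:<password>@cluster0.example.mongodb.net/"
        "?authSource=admin"          # Atlas users usually authenticate against admin
    )
    client = MongoClient(uri)
    db = client["my_database"]       # placeholder database name

    print(db.list_collection_names())          # needs at least the read role on this db
    print(db["my_collection"].find_one())      # placeholder collection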

    gunu

    08/08/2022, 11:15 PM
    Attempting to sync a source: MySQL Aurora database using CDC. It's a writer & reader cluster. Whilst I've been able to set the log-bin parameter to ON for the writer instance, it does not seem to carry over to the reader instance, and it is a read-only parameter on the reader instance. If the reader instance has log-bin set to OFF, then I cannot use it as a CDC source. Any suggestions?
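
    A hedged note on this one: with Aurora MySQL, binary logging is normally enabled by setting binlog_format = ROW in the DB cluster parameter group rather than via an instance-level log-bin setting (which won't propagate to readers), and CDC generally has to read from the writer endpoint anyway. A rough boto3 sketch with a placeholder parameter group name:

    # Sketch: set binlog_format = ROW on the *cluster* parameter group (placeholder name),
    # then reboot the instances for the static parameter change to take effect.
    import boto3

    rds = boto3.client("rds", region_name="us-east-1")   # placeholder region

    rds.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName="my-aurora-cluster-params",   # placeholder
        Parameters=[
            {
                "ParameterName": "binlog_format",
                "ParameterValue": "ROW",
                "ApplyMethod": "pending-reboot",
            }
        ],
    )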

    Mike Passey

    08/09/2022, 11:49 AM
    Hi! I've been struggling to use Stripe as a source in incremental mode. I get the last/first record duplicated each time in my output. Each time it runs, it seems to store the unix timestamp (created) field of the most recent record in the Connection State, e.g.:
    Connection State
    {
      "refunds": {
        "created": 1660045125
      }
    }
    Then the next time it runs, it picks up the most recent record from the previous run again. For this example I get the same record with created=1660045125 in my output at the end of the first run and at the start of the next incremental run, so I end up with duplicates of the last/first record each time the incremental run works. E.g. it's doing something like where created >= X rather than where created > X. Has anyone else seen this behaviour? I have to use full refresh as a workaround for now, which is of course much slower. Any help would be great! Is it a bug?
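
    For context (this is the general incremental pattern, not necessarily the Stripe connector's exact code): cursors are often compared inclusively on purpose, because several records can share the same created timestamp and a strict comparison could drop some of them; the usual remedy is a deduped sync mode or a primary-key dedupe downstream. A tiny illustration:

    # Illustrative only: why an inclusive cursor re-emits the boundary record,
    # and how a primary-key dedupe removes the duplicate downstream.
    run_1 = [{"id": "re_1", "created": 1660045000}, {"id": "re_2", "created": 1660045125}]
    run_2 = [{"id": "re_2", "created": 1660045125}, {"id": "re_3", "created": 1660045200}]

    deduped = {}
    for record in run_1 + run_2:        # later runs overwrite earlier copies of the same id
        deduped[record["id"]] = record

    print(sorted(deduped.values(), key=lambda r: r["created"]))
    # -> re_1, re_2, re_3 with no duplicate of re_2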

    Lior Shkiller

    08/09/2022, 2:28 PM
    Hi! We are trying to use the Slack source to fetch specific Slack channels into BigQuery, but we are having trouble with it. It seems like some of the messages that exist in threads are not being synced. We looked at the Airbyte code and at Slack's API and it seems correct: you indeed call conversations.replies with the timestamp of the Slack thread. It seems to be done here. The thing is, when we call the API directly it seems to give the correct results, so we are not sure what's wrong with Airbyte's code exactly. Did anyone else see this weird behavior of missing messages while trying to integrate with Slack?
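
    A small sketch (token, channel id, and thread timestamp are placeholders) for pulling one thread directly with the official slack_sdk, to compare its message count against what lands in BigQuery for the same thread:

    # Compare a single thread's replies fetched directly from Slack against the synced rows.
    from slack_sdk import WebClient

    client = WebClient(token="xoxb-...")          # placeholder bot token

    def fetch_thread(channel_id: str, thread_ts: str):
        messages, cursor = [], None
        while True:
            resp = client.conversations_replies(channel=channel_id, ts=thread_ts, cursor=cursor)
            messages.extend(resp["messages"])
            cursor = resp.data.get("response_metadata", {}).get("next_cursor")
            if not cursor:
                return messages

    thread = fetch_thread("C0123456789", "1660045125.000200")   # placeholder ids
    print(len(thread), "messages in thread (including the parent)")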

    Rocky Appiah

    08/09/2022, 3:36 PM
    Is it possible for Airbyte to connect to a secondary Mongo instance, or does it need to connect to the primary?
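
    Hedged aside: with a replica-set connection string, reads can usually be steered to a secondary via readPreference; a quick pymongo check (placeholder hosts and names) confirms whether that works before trying it in the connector:

    # Placeholder hosts/replica set; readPreference=secondaryPreferred sends reads to a
    # secondary when one is available.
    from pymongo import MongoClient

    uri = (
        "mongodb://mongo-0.example:27017,mongo-1.example:27017/"
        "?replicaSet=rs0&readPreference=secondaryPreferred"
    )
    client = MongoClient(uri)
    print(client.admin.command("ping"))
    print(client["my_db"]["my_collection"].find_one())   # placeholder names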

    Rocky Appiah

    08/09/2022, 5:05 PM
    Could not connect with provided configuration. Error: Command failed with error 59: 'no such cmd: listCollections' on server 10.0.4.230:27017. The full response is {"ok": 0.0, "errmsg": "no such cmd: listCollections", "code": 59, "bad cmd": {"listCollections": 1, "authorizedCollections": true, "nameOnly": true}}
    Running a really old version of mongo, 2.6.x

    Tobias Troelsen

    08/10/2022, 8:31 AM
    File (alpha connector) issues | I seem to get rows in the log, but nothing is output to the table. Any idea what is going wrong with my File connector (.csv file through a URL)? See log output in thread. Thanks.

    Gautam

    08/10/2022, 11:40 PM
    Hi! Airbyte noob here.
    • Deployment: Kubernetes on Azure [AKS]
    • Airbyte Version: 0.39.42-alpha
    I'm trying to implement a Postgres to BigQuery connection/ingestion. I started with a fairly simple table schema with < 10 fields and 24 rows, but I'm not able to achieve the desired result; the logs indicate an error [screenshot below].

    Gautam

    08/10/2022, 11:41 PM
    Running the Airbyte sync without basic normalization, the data gets written to the destination as a single column containing a JSON blob with all of the data.

    Gautam

    08/10/2022, 11:41 PM
    2022-08-10 23:11:45 normalization >   File "/usr/local/lib/python3.9/site-packages/normalization/transform_catalog/transform.py", line 46, in parse
    2022-08-10 23:11:45 normalization >     profiles_yml = read_profiles_yml(parsed_args.profile_config_dir)
    2022-08-10 23:11:45 normalization >   File "/usr/local/lib/python3.9/site-packages/normalization/transform_catalog/transform.py", line 75, in read_profiles_yml
    2022-08-10 23:11:45 normalization >     with open(os.path.join(profile_dir, "profiles.yml"), "r") as file:
    2022-08-10 23:11:45 normalization > FileNotFoundError: [Errno 2] No such file or directory: '/config/profiles.yml'
    There is also this error text in the logs when normalization runs. Is that a missing config when setting up on an AKS cluster? Any advice on the approach that should be adopted?

    Eli Sigal

    08/11/2022, 9:22 AM
    Hi. I'm new to Airbyte and I would like to add new logic to the Google Ads source for a custom query using GAQL, fetching data without using the date segment, for example:
    SELECT geo_target_constant.canonical_name, geo_target_constant.country_code, geo_target_constant.id, geo_target_constant.name, geo_target_constant.parent_geo_target, geo_target_constant.resource_name, geo_target_constant.status, geo_target_constant.target_type FROM geo_target_constant
    What is the best approach to do so? Is there a place I can look for an example of how to add data to an existing source component, since we already know how to create new ones? Thank you.
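
    A hedged sketch for validating that segment-free GAQL outside Airbyte with the official google-ads Python client (the credentials file path and customer id are placeholders), before wiring it into the source:

    # Placeholder google-ads.yaml credentials file and customer id; runs the segment-free
    # GAQL query above and prints a couple of fields per row.
    from google.ads.googleads.client import GoogleAdsClient

    client = GoogleAdsClient.load_from_storage("google-ads.yaml")
    ga_service = client.get_service("GoogleAdsService")

    query = """
        SELECT geo_target_constant.id, geo_target_constant.name,
               geo_target_constant.country_code, geo_target_constant.status
        FROM geo_target_constant
    """

    for row in ga_service.search(customer_id="1234567890", query=query):
        print(row.geo_target_constant.id, row.geo_target_constant.name)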

    Madiha Khalid

    08/11/2022, 12:15 PM
    Hi team, I need advice regarding HubSpot source data extraction of nearly ~7M records. I posted my question here; can someone please guide me? https://discuss.airbyte.io/t/source-hubspot-contact-list-membership-contacts-extraction-performance-optimization/2219 Many thanks!

    Shamil Siddique

    08/11/2022, 1:56 PM
    Hi team, a quick question: I've been trying to find out whether pagination is available for connections. All I managed to find is this issue and its parent issue. Is pagination available for connections? If so, please point me to any related resource.

    Daniel Rothamel

    08/11/2022, 6:43 PM
    Quick question-- this is our first use of Airbyte, and we're using the Close.com connector, which is in Alpha. Ingestion into Snowflake seems to be at about 500 rows per minute for raw data. Is this because it is an Alpha connector, or is this expected performance?

    Vitalie CALMÎC

    08/12/2022, 7:40 AM
    Hello team, I am developing a custom source connector for the Amazon Ads API, which will be a long-running job. I am almost at the finish line, but I have an issue. After the job has been running for quite a long time, I sometimes receive 401 errors from Amazon, even though the job runs for more than an hour and the token is refreshed accordingly. I tried to hard-refresh the token in the should_retry function with tokens: Tuple[str, int] = self.authenticator.refresh_access_token(), but it fails with the following error:
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 294, in _send
        if self.should_retry(response):
      File "/airbyte/integration_code/source_amazon_ads/streams/common.py", line 292, in should_retry
        tokens: Tuple[str, int] = self.authenticator.refresh_access_token()
    AttributeError: 'NoAuth' object has no attribute 'refresh_access_token'
    I do not understand why the authenticator property is of type NoAuth, because when I initialize the stream in the source I pass the OAuth authenticator. Could someone help or point me in the right direction?
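
    A hedged guess at the cause, with an illustrative sketch (class, endpoint, and field names are made up, not the actual connector code): in CDK versions from around this time, a requests-native Oauth2Authenticator passed to HttpStream is attached to the underlying session rather than exposed via the authenticator property, which then falls back to NoAuth; keeping your own reference to the authenticator avoids the AttributeError.

    # Illustrative sketch only: hold onto the OAuth authenticator yourself so
    # should_retry can force a token refresh, instead of relying on self.authenticator.
    from typing import Any, Iterable, Mapping, Optional

    import requests
    from airbyte_cdk.sources.streams.http import HttpStream
    from airbyte_cdk.sources.streams.http.requests_native_auth import Oauth2Authenticator

    class AmazonAdsSketchStream(HttpStream):
        url_base = "https://advertising-api.amazon.com/"   # assumed base URL
        primary_key = "id"

        def __init__(self, authenticator: Oauth2Authenticator, **kwargs):
            super().__init__(authenticator=authenticator, **kwargs)
            self._oauth = authenticator                    # our own reference, never NoAuth

        def path(self, **kwargs) -> str:
            return "v2/profiles"                           # assumed endpoint

        def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
            return None

        def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
            yield from response.json()

        def should_retry(self, response: requests.Response) -> bool:
            if response.status_code == 401:
                # Force a refresh and retry; assumes the authenticator exposes an
                # access_token setter, as recent CDK Oauth2Authenticators do.
                token, _expires_in = self._oauth.refresh_access_token()
                self._oauth.access_token = token
                return True
            return super().should_retry(response)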

    Alex Banks

    08/12/2022, 8:34 AM
    Hello! Looking for a pattern or advice on ingesting data from the same source, but with different inputs, into the same destination. I.e., I have two different QuickBooks connections that I'd like to write into the same Postgres table, but obviously the data would need to be distinguishable (e.g., by including some information defined on the source and not in the data itself). Is there an Airbyte-approved/recommended way to do this? I don't really want to have two different destination tables, because over time the number of QuickBooks sources will grow to be greater than 2.

    Vadym Samsonovych

    08/12/2022, 10:34 AM
    Hi colleagues, I'm having a problem syncing my data in Airbyte. I have an S3 bucket in AWS and a Postgres database, and I connected to both without any problems, but the data sync fails with the error "Error origin: Replication, Message: Something went wrong during replication". I pointed the source at my folder on S3 to get one CSV file for a test; the file is not empty. I used these patterns to specify the path to it: /test/** and /test/*.csv. Could anyone advise something?
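
    A quick way to narrow this down (bucket name is a placeholder): list the keys under the prefix and test them against the patterns locally. Note that fnmatch only approximates the connector's glob rules, and that S3 keys don't start with a leading slash, so /test/*.csv may need to be test/*.csv.

    # Check which keys under the prefix the patterns actually match (placeholder bucket).
    import fnmatch

    import boto3

    BUCKET = "my-bucket"          # placeholder
    PATTERNS = ["test/**", "test/*.csv"]

    s3 = boto3.client("s3")
    keys = [
        obj["Key"]
        for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix="test/").get("Contents", [])
    ]

    for pattern in PATTERNS:
        matches = [k for k in keys if fnmatch.fnmatch(k, pattern)]
        print(pattern, "->", matches)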

    Rocky Appiah

    08/12/2022, 12:38 PM
    What’s the ETA on allowing new source tables in a PG instance to be added without having to refresh the entire destination? Combined with the throughput on large tables, this starts to become a problem 😞

    Juan Cruz Grave

    08/12/2022, 2:33 PM
    Hi. I'm trying to set up a replication from Postgres to BigQuery. The Postgres instance sits behind a PgBouncer pooler, and when I try to run the replication, the sync fails with "Prepared statement already exists" and returns zero rows for every table in the schema. Is there any way to disable prepared statements in the Airbyte config? I don't see a way to do that.

    look R

    08/12/2022, 3:10 PM
    Hello all. I'm trying out Airbyte and I wanted to try fetching zipped data over HTTPS. Example link here: https://data.bikedataproject.org/counts/network-counts.geojson.zip. The file behind the zip is a GeoJSON file, and the fact that it is zipped does not help either. Is there a way of simply sending this kind of file in its original form, 1:1 and without transformations, to S3 or another destination? Thank you!
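
    If a plain 1:1 copy is really all that's needed, a small script outside Airbyte may be the simplest route; a hedged sketch (bucket and key are placeholders) that streams the zip to S3 unchanged:

    # Download the zip unchanged and upload it byte-for-byte to S3 (placeholder bucket/key).
    import boto3
    import requests

    URL = "https://data.bikedataproject.org/counts/network-counts.geojson.zip"

    response = requests.get(URL, stream=True)
    response.raise_for_status()

    s3 = boto3.client("s3")
    s3.upload_fileobj(
        response.raw,                              # raw stream, no unzipping or transformation
        Bucket="my-destination-bucket",            # placeholder
        Key="raw/network-counts.geojson.zip",      # placeholder
    )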

    Saul Burgos

    08/12/2022, 7:18 PM
    According to the official documentation, the SFTP connection doesn't support incremental append. So... does anyone have a workaround for this? Something custom? I am trying to use Airbyte with Dagster. If someone can guide me, that would be great.

    Vadym Samsonovych

    08/13/2022, 3:24 PM
    Can anyone help me with that?

    Rocky Appiah

    08/13/2022, 6:14 PM
    Looks like when reading from a Postgres source, it ingests 1,000 rows at a time by default. How do you increase this? I have a job which takes hours to complete 😞

    Lucas Wiley

    08/14/2022, 9:20 PM
    Hmm, not sure if something changed for Postgres connectors, but over the weekend my entire schema replication list was changed to only bring in a few tables and to stop replicating and reset every other table...