dandpz
11/15/2022, 10:34 AM
"segment": "query", but I cannot find it in the downloaded data. Maybe a new report stream with this kind of parameter should be added? Thanks in advance 🙂

komal azram
11/15/2022, 10:38 AM

navod perera
11/15/2022, 12:21 PM

Berzan Yildiz
11/15/2022, 12:28 PM
AssertionError: Mismatched number of tables 190 vs 6 being resolved
for my custom source connector to postgres. This occurs during normalization. I am sure my schema is fine. What does this error mean?

thomas trividic
11/15/2022, 1:15 PM

thomas trividic
11/15/2022, 1:15 PM

thomas trividic
11/15/2022, 1:15 PM

Dave Tomkinson
11/15/2022, 1:20 PM
A total of 13674 record(s) of data from stream AirbyteStreamNameNamespacePair{name='events_196', namespace='analytics_raw'} were invalid and were ignored.
My sync is a raw sync with no normalisation, postgres (RDS) to Redshift Serverless (using destination-redshift 0.3.51), going direct (not via S3).
How do I figure out why those rows are invalid, as all rows are required? (This was a test copy of 10M rows from a 1.7B row db.)
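One plausible cause worth checking (an assumption, not confirmed by these logs): destinations that store the raw record in a VARCHAR column, such as Redshift where VARCHAR tops out at 65535 bytes, cannot hold records whose serialized JSON exceeds that size, and such records get skipped. A minimal Python sketch for spotting oversized rows in source data, using hypothetical sample records:

```python
import json

# Hypothetical sample rows; in practice, stream rows exported from the source table.
rows = [
    {"id": 1, "payload": "short"},
    {"id": 2, "payload": "x" * 70000},  # serialized form exceeds 65535 bytes
]

REDSHIFT_VARCHAR_MAX = 65535  # maximum size in bytes of a Redshift VARCHAR column


def oversized(row: dict, limit: int = REDSHIFT_VARCHAR_MAX) -> bool:
    """Return True if the JSON-serialized row cannot fit in one VARCHAR cell."""
    return len(json.dumps(row, default=str).encode("utf-8")) > limit


too_big = [row["id"] for row in rows if oversized(row)]
print(too_big)  # → [2]
```

Running a scan like this over the 10M-row test copy would show whether the 13674 dropped records correlate with record size.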
Why does the UI say it's committed 10,000,000 records when it hasn't?

Savio Lucena
11/15/2022, 2:24 PM
JOB_MAIN_ container in a worker?

Rytis Zolubas
11/15/2022, 2:41 PM

Paulo Singaretti
11/15/2022, 4:02 PM
ERROR i.a.i.b.AirbyteExceptionHandler(uncaughtException):26 - Something went wrong in the connector. See the logs for more details.
2022-11-15 15:59:03 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):78 - java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
Kinesis:
ERROR i.a.i.b.AirbyteExceptionHandler(uncaughtException):26 - Something went wrong in the connector. See the logs for more details.
2022-11-15 15:59:41 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):78 - java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
Do you guys have any idea what I'm doing wrong? I guess it's something in Kafka, since it's the same error.

Gergely Lendvai
11/15/2022, 4:04 PM
Hubspot -> S3 connector with the following configs; we’d like to understand why it takes so long to run a sync and whether it can be sped up in any way.
For the deployment we are using the helm chart with the following resource settings for jobs (this is not reflected in the destination definition, which is weird; however, the source-* and destination-* pods are using these limits):
global:
  jobs:
    resources:
      requests:
        cpu: "200m"
        memory: "4Gi"
      limits:
        cpu: "200m"
        memory: "4Gi"
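One thing that stands out in the settings above (an observation, not an official recommendation): cpu is capped at 200m, a fifth of a core, for both request and limit, which can easily make CPU the bottleneck for the source and destination pods. A hedged sketch of the same helm values with more CPU headroom; the numbers are purely illustrative and should be tuned to the cluster:

```yaml
global:
  jobs:
    resources:
      requests:
        cpu: "500m"     # illustrative value, not a recommendation
        memory: "2Gi"
      limits:
        cpu: "2"        # allow job pods to burst to two cores
        memory: "4Gi"
```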
Airbyte version: 0.40.17
Source definition:
{
  "sourceDefinitionId": "36c891d9-4bd9-43ac-bad2-10e12756272c",
  "name": "HubSpot",
  "dockerRepository": "airbyte/source-hubspot",
  "dockerImageTag": "0.2.3",
  "documentationUrl": "https://docs.airbyte.io/integrations/sources/hubspot",
  "protocolVersion": "0.2.0",
  "releaseStage": "generally_available"
}
Destination definition:
{
  "destinationDefinitionId": "4816b78f-1489-44c1-9060-4b19d5fa9362",
  "name": "S3",
  "dockerRepository": "airbyte/destination-s3",
  "dockerImageTag": "0.3.17",
  "documentationUrl": "https://docs.airbyte.com/integrations/destinations/s3",
  "protocolVersion": "0.2.0",
  "releaseStage": "generally_available",
  "resourceRequirements": {
    "jobSpecific": [
      {
        "jobType": "sync",
        "resourceRequirements": {
          "memory_request": "1Gi",
          "memory_limit": "1Gi"
        }
      }
    ]
  }
}
Source:
{
  "sourceDefinitionId": "36c891d9-4bd9-43ac-bad2-10e12756272c",
  "sourceId": "457f3db8-6ce1-41be-9ecb-7ef9a724c88b",
  "workspaceId": "2b94a777-1e5e-4381-af9f-21582ecce5c7",
  "connectionConfiguration": {
    "start_date": "2022-11-15T12:00:00Z",
    "credentials": {
      "access_token": "**********",
      "credentials_title": "Private App Credentials"
    }
  },
  "name": "hubspot_test",
  "sourceName": "HubSpot"
}
Destination:
{
  "destinationDefinitionId": "4816b78f-1489-44c1-9060-4b19d5fa9362",
  "destinationId": "033a010b-ff8e-4eb3-9ee6-6505a6c42d00",
  "workspaceId": "2b94a777-1e5e-4381-af9f-21582ecce5c7",
  "connectionConfiguration": {
    "format": {
      "compression": {
        "compression_type": "No Compression"
      },
      "format_type": "JSONL"
    },
    "s3_endpoint": "",
    "access_key_id": "**********",
    "s3_bucket_name": "****",
    "s3_bucket_path": "****",
    "s3_bucket_region": "****",
    "secret_access_key": "**********"
  },
  "name": "hubspot_s3",
  "destinationName": "S3"
}
Do you know what can cause pulling only ~3 MB of data to take ~3 hours? Also, do you have any recommendations on how to handle this? Many thanks 🙏

Kaan Murzoğlu
11/15/2022, 4:43 PM
accounts
{
  "_id" : ObjectId("xxxx"),
  "clientId" : "xxxxx",
  "areaCode" : "xx",
  "gsm" : "xx",
  "status" : "approved",
  "createdAt" : ISODate("2022-05-26T15:35:44.113+0000"),
  "updatedAt" : ISODate("2022-06-27T10:22:07.959+0000"),
  "document" : {
    "drivingLicence" : "approved",
    "video" : "approved"
  }
}
Felipe Cosse
11/15/2022, 6:13 PM
MYSQL (AWS Aurora) to S3 (AWS).
When a table has a field with the TIME type, an error occurs when reading the PARQUET file.
Here’s the error:
Unable to create Parquet converter for data type "timestamp" whose Parquet type is optional int64 member0 (TIME(MICROS,true))
The field is mapped as a Struct, and a dictionary is created with the timestamp and the string that would be the timezone.
{
  "expire_timeofday": {
    "member0": "timestamp",
    "member1": "string"
  }
}
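If the column type can't be changed at the source, the raw value is still recoverable after the fact (a workaround sketch, not Airbyte functionality): Parquet's TIME(MICROS) logical type stores an int64 count of microseconds since midnight, so a member0 value read as a plain integer can be decoded into a time string by hand:

```python
import datetime


def time_micros_to_str(micros: int) -> str:
    """Decode a Parquet TIME(MICROS) value (microseconds since midnight)
    into an HH:MM:SS[.ffffff] string."""
    seconds, us = divmod(micros, 1_000_000)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return str(datetime.time(hours, minutes, secs, us))


# 13:45:30 expressed as microseconds since midnight
value = (13 * 3600 + 45 * 60 + 30) * 1_000_000
print(time_micros_to_str(value))  # → 13:45:30
```

The same arithmetic can be applied in a post-processing step (e.g. a Spark UDF) when the reader refuses to build a converter for the column.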
I tried to convert the field to String, but there is an error in the conversion.
Wouldn’t it be possible to select the type of field to be saved in the Destination?

Alexander Govgel
11/15/2022, 6:32 PM

Jeff De Los Reyes
11/15/2022, 6:36 PM

Manish Tomar
11/15/2022, 7:36 PM

Jonathan Cachat PhD (JC)
11/15/2022, 8:55 PM

Jonathan Cachat PhD (JC)
11/15/2022, 9:18 PM

Rahul Borse
11/15/2022, 10:56 PM

Abdi Darmawan
11/16/2022, 2:06 AM
How can I make orchestrator-norm-job-xxx run on a specific nodepool? I already set JOB_KUBE_NODE_SELECTORS: pool-env=production-airbyte in the Kubernetes configmap, but only the orchestrator-norm-job-xxx pods are still running on a random nodepool.

Benen Cahill
11/16/2022, 3:16 AM

Mukul Gopinath
11/16/2022, 7:26 AM
Warning FailedScheduling 35s default-scheduler 0/1 nodes are available: 1 node(s) had volume node affinity conflict.
It gets fixed when I resize the airbyte-volume-configs persistent volume: initially from 500Mi to 2Gi, and later I pulled it up to 20Gi too, but I'm still facing this issue. Is there a suggested volume size that needs to be configured? Or is there a way to reclaim the volume if this is temporary?
https://discuss.airbyte.io/t/eks-pods-running-into-pending-state-due-to-pv/3211

Berzan Yildiz
11/16/2022, 7:36 AM

Gergely Imreh
11/16/2022, 8:55 AM

Rahul Borse
11/16/2022, 9:09 AM

Vikas Goswami
11/16/2022, 10:17 AM

Monika Bednarz
11/16/2022, 10:19 AM

Karan
11/16/2022, 1:20 PM

Leonardo de Almeida
11/16/2022, 2:09 PM