# ask-community-for-troubleshooting

    Rodrigo Parra

    10/15/2021, 4:42 PM
    Hi there! Not sure if this is the right channel, but I have a “getting started with Airbyte Cloud” type of question, so here it goes: I am considering moving our self-hosted Airbyte connections to Airbyte Cloud. Should it be as simple as exporting/importing configuration, or are there any caveats? Also, would the connections require a full refresh, or should incremental appends just continue working out of the box?

    Blake Enyart

    10/15/2021, 5:24 PM
    Along the lines of Airbyte Cloud, is there a recommended channel for fielding some of those questions? I’d like to consider it particularly for private custom connectors, but wanted to check some details around managing them, HIPAA/PCI compliance, etc.

    Tedmund Ho

    10/17/2021, 1:11 PM
    Hey all! I have a particular use case for my application which I'm not too sure whether Airbyte would be suitable for. Basically, I'm planning to allow users to import raw data from Google Sheets into an arbitrary table created in a Postgres database. The table created would then allow them to perform arbitrary queries through my application. Does Airbyte make sense for this use case? I have some doubts because I'm not too sure how to manage the creation of so many "sources" (one for each spreadsheet, I think) within the backend and whether it would be quickly overloaded as well.
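One way to manage many per-user sources from a backend is Airbyte's local configuration API (POST /api/v1/sources/create). A minimal sketch that only builds the request, where the base URL, the definition ID, and the connectionConfiguration keys are assumptions — check the connector's actual spec before relying on them:

```python
import json
import urllib.request

AIRBYTE_URL = "http://localhost:8000/api/v1"  # assumption: default local deployment

def build_create_source_request(workspace_id, source_definition_id, name, spreadsheet_id):
    """Build (but do not send) a sources/create request for a Google Sheets source.

    The connectionConfiguration keys below are illustrative assumptions; fetch
    the connector's spec from a running instance to confirm the real ones.
    """
    payload = {
        "workspaceId": workspace_id,
        "sourceDefinitionId": source_definition_id,
        "name": name,
        "connectionConfiguration": {"spreadsheet_id": spreadsheet_id},
    }
    return urllib.request.Request(
        f"{AIRBYTE_URL}/sources/create",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# One source per user spreadsheet, created from the backend:
req = build_create_source_request("ws-123", "def-456", "sheet-for-user-1", "1AbC-example")
# urllib.request.urlopen(req) would submit it against a running instance.
```

Whether hundreds of such sources overload the instance is a separate capacity question, but creating them programmatically is just repeated calls like this.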

    Prateek Gupta

    10/18/2021, 12:43 PM
    Hey all, is there an option in the Airbyte UI to hit APIs repeatedly to get and transfer data? My company has a number of services deployed, and we are checking whether we can use Airbyte to get data by hitting their APIs.

    Blake Enyart

    10/18/2021, 7:16 PM
    Is there an easy way to remove a custom connector from the list in Airbyte? I want to export the configuration to an EC2 instance, but I don't want to include the custom connector in the process, which would require us to either set up a Docker Hub account or figure out the connection process to an ECR image, both of which we are hoping to avoid.

    Justin Sharp

    10/19/2021, 3:12 AM
    Is Logical Replication for Postgres real-time? Or is it near-real-time due to some set interval? If the latter, how small can that interval be? For context: we need CDC on our monolith Postgres DB, so that we can stream table changes to Kafka topics to power downstream microservices. Debezium is being investigated. Would be wicked if Airbyte Cloud could be used instead.

    Mert Karabulut

    10/19/2021, 8:33 AM
    Hi everyone, I've got a question: has anyone tried to set up a source for a local JSON file? I am trying, but Airbyte doesn't seem to see my local file path.
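With the Docker deployment, the containers can't see arbitrary host paths; the documented convention for the local file connectors is that /tmp/airbyte_local on the host is mounted into the containers as /local. A sketch (the file name is a placeholder; if your setup differs, check the volume mounts in the compose file):

```shell
# Put the file under the mounted directory on the host...
mkdir -p /tmp/airbyte_local
echo '[{"id": 1, "name": "test"}]' > /tmp/airbyte_local/my_data.json
# ...then reference it in the Airbyte UI as /local/my_data.json
```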

    Andrew Groh

    10/19/2021, 2:27 PM
    What is the best way to trigger notifications once a sync is complete? What I really want to do is write a message to a Kafka topic with information about what was synced (I will have multiple sources and connections, all syncing to S3). I see the notification webhook in the docs, but that seems to be mainly for Slack? I am happy to write an endpoint that takes the notification webhook call and writes to Kafka, assuming enough info is passed in the webhook call (like what was synced). I prefer to use the Airbyte scheduler to trigger syncs, not Airflow.
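Since the notification webhook is just an HTTP POST, a small relay endpoint can reshape whatever Airbyte sends into a Kafka record. A sketch of the framework-free transform step — the input field names (connection_id, status, synced_at) are assumptions; log one real webhook call from your Airbyte version to see what is actually included:

```python
import json

def webhook_to_kafka_record(webhook_body: dict) -> tuple:
    """Reshape an Airbyte notification payload into a (key, value) Kafka record.

    The input field names here are illustrative assumptions, not Airbyte's
    documented payload; inspect a real webhook call to confirm them.
    """
    key = webhook_body.get("connection_id", "unknown").encode()
    value = json.dumps({
        "connection_id": webhook_body.get("connection_id"),
        "status": webhook_body.get("status"),
        "synced_at": webhook_body.get("synced_at"),
    }).encode()
    return key, value

# A Flask/FastAPI endpoint would call this and hand the record to a Kafka
# producer; keeping the transform pure makes it easy to test in isolation.
key, value = webhook_to_kafka_record(
    {"connection_id": "c1", "status": "succeeded", "synced_at": "2021-10-19T14:00:00Z"}
)
```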

    Atif Imam

    10/19/2021, 3:24 PM
    Hey, everyone. I am trying to set up Airbyte and Airflow on my local system (Linux). I have very little experience with Airflow and zero deployment experience. If anyone has made Airflow and Airbyte work together on a local machine, please help me out.
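The usual local pattern is to run both via docker-compose and trigger syncs from Airflow with the Airbyte provider's operator. A DAG sketch, assuming apache-airflow-providers-airbyte is installed and an Airflow connection named airbyte_conn points at the Airbyte server (both names are assumptions to set up yourself):

```
from airflow import DAG
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator
import pendulum

with DAG(
    dag_id="trigger_airbyte_sync",
    start_date=pendulum.datetime(2021, 10, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # connection_id is the Airbyte connection UUID, visible in the UI's URL
    sync = AirbyteTriggerSyncOperator(
        task_id="airbyte_sync",
        airbyte_conn_id="airbyte_conn",
        connection_id="<your-airbyte-connection-uuid>",
        asynchronous=False,
    )
```

This only runs inside a working Airflow environment, so treat it as a starting point rather than a drop-in file.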

    Jeferson Machado Santos

    10/19/2021, 3:25 PM
    Is this your first time deploying Airbyte: Yes
    OS Version / Instance: e2-standard-2
    Memory / Disk: 32Gb
    Deployment: Kubernetes
    Airbyte Version: 0.30.19-alpha
    Source name/version: NA
    Destination name/version: NA
    Description: What is the effect of increasing the replicas of the workers in workers.yaml? Does it have an impact on performance when moving data? Although there are more worker pods with more replicas, every time a data-moving job starts, new pods for the source and destination are created, so it is not clear whether more worker pods impact the performance of the operations.

    SJ

    10/20/2021, 4:08 PM
    Hello, I would like to check: what information is stored in Airbyte?

    Alasdair Brown

    10/20/2021, 4:43 PM
    A couple of architectural questions:
    1. There is no persistence of messages passed between a Source and a Destination? A Worker reads from stdout and puts to stdin, and this is all ephemeral?
    2. A Worker is also a container that lives independently of a source/destination container?
       a. If so, is a Worker tightly coupled with a Source/Dest pair? I.e. create a connection, it gets its own worker?
       b. How many workers are there? Is there a set number in the cluster, one per source/dest...?
       c. If a worker fails, what is the impact on the related Source/Dest?
    3. If a Destination fails, what is the impact on its Source?
       a. What happens to in-flight data? Do we assume the data is persistent and we can re-consume?
       b. If a destination commits part of its messages to the downstream application but fails, what is its ability to resume? I assume there isn't any?
    4. Can a source/destination scale independently? I.e. where the down/upstream application would support parallelism, e.g. Kafka, could a Source scale itself to be many containers working as a consumer group?
       a. If so, is the Worker able to scale with it so it can consume from many Sources in parallel?

    Dave Lindley

    10/20/2021, 5:18 PM
    Is SSL/TLS supported in the OSS version? Our security team is asking, and I couldn't find any info.

    Alibek Tokayev

    10/20/2021, 7:17 PM
    Hi everyone, thanks to the Airbyte community for developing such a great product. We like it, but we have one concern: we would prefer all the configuration to be stored as code in a version control system, so we can perform proper reviews and a CI/CD process. I couldn't find an answer to this question. The only option that I see would be to use the API, but it would make for a somewhat clumsy implementation. Do you have any suggestions?
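One API-based option is to script the same call the UI's "Export Configuration" feature makes and commit the resulting archive to version control. A request-building sketch — the endpoint path is what the web app appeared to use around v0.30 and is an assumption to verify against the API docs for your version:

```python
import urllib.request

AIRBYTE_URL = "http://localhost:8000/api/v1"  # assumption: default local deployment

def build_export_request():
    """Build (but do not send) a configuration-export request.

    The /deployment/export path is an assumption based on the UI's export
    feature; confirm it in your version's API reference before using it in CI.
    """
    return urllib.request.Request(f"{AIRBYTE_URL}/deployment/export", method="POST")

req = build_export_request()
# urllib.request.urlopen(req) would return an archive you can commit and
# later restore on another instance via the matching import endpoint.
```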

    Yuhui Shi

    10/21/2021, 10:20 PM
    Hi all, I have a general question about using Airbyte + dbt and what the best practice is.
    1. I have a stream that syncs product information in FULL_REFRESH mode from the source and APPENDs the latest records to a destination table, so the destination table contains snapshots of the product info at different timestamps.
    2. A downstream dbt transformation is set up to always fetch the latest snapshot of the product info, clean it, and load it into another table that only keeps the latest snapshot.
    Right now, I am using a macro in dbt that queries the product destination table for the max value of _airbyte_emitted_at, so all downstream cleaning operations can use it as a filter to only get the latest snapshot records. I am wondering whether I can pass _airbyte_emitted_at as a variable from Airbyte to dbt via the CLI, to tell the transformation which snapshot to operate on. If this is possible, it would make it friendlier to rerun "backfill" jobs from dbt (since the snapshot is passed in as an argument).
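Airbyte's custom-transformation settings accept raw dbt CLI arguments, so one option is passing the snapshot timestamp as a dbt var instead of computing it in a macro. A sketch, where the var name, timestamp, and table names are assumptions:

```
# "Entrypoint arguments for dbt cli" in the custom transformation settings:
run --vars '{snapshot_ts: "2021-10-21 00:00:00"}'
```

and in the model:

```sql
select *
from {{ source('airbyte', 'products') }}
where _airbyte_emitted_at = '{{ var("snapshot_ts") }}'
```

A backfill rerun then only needs a different snapshot_ts value on the command line.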

    Prateek Gupta

    10/24/2021, 12:26 AM
    Hey, this might be a basic question: if I compose down my Docker deployment, update Airbyte, and compose up again, do I have to reset all the tables again? Thanks

    Sarabjot Singh Sudan

    10/24/2021, 5:56 PM
    Hi everyone, I'm getting the error message below when trying to authenticate the Salesforce connector. I am using Docker installed on Windows.
    ERROR i.a.s.RequestLogger(filter):93 - {workspace_app_root=/tmp/workspace/server/logs} - REQ 172.18.0.4 POST 404 /api/v1/source_oauths/get_consent_url - {"workspaceId":"eb801f67-0f72-4f6d-8dde-e1a4420cab1e","sourceDefinitionId":"b117307c-14b6-41aa-9422-947e34922962","redirectUrl":"http://localhost:8000/auth_flow"}
    airbyte-webapp | 172.18.0.1 - - [24/Oct/2021:17:53:41 +0000] "POST /api/v1/source_oauths/get_consent_url HTTP/1.1" 404 11309 "http://localhost:8000/connections/new-connection" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36" "-"

    Sarabjot Singh Sudan

    10/24/2021, 6:00 PM
    Also, is it possible to use a username, password, and security token instead of OAuth?

    Amit Gelber

    10/25/2021, 10:02 AM
    Hi everyone! A few questions :)
    Is it possible to manage a 3TB full sync with Airbyte (MSSQL -> BigQuery)? Will it handle the load?
    How can I determine how many workers I need for my setup?
    How does it split the load? By table, or does each worker grab a part of a table?
    What is the retry policy? Can we retry failed parts of a table?
    Has the k8s deployment been tested in production?

    Lewis Cunningham

    10/25/2021, 1:32 PM
    Simple question on editing an existing connection: how? I am clicking on things and going in circles. I click on Connections and see my connections. I click on one of those connections and see the history and the reset/sync screen. I click on Sources and see my sources. I click on a source and see the source description (with a launch button). I click on the destination and go to the destination. I want to edit this source, not run it, and not see the destination. I click on the gear and I can see what objects I want to ingest, but I want to change the actual connection info (like user/password). This is a Salesforce connection, if that makes any difference. How do I do that?

    SJ

    10/25/2021, 3:35 PM
    Hi, I am exploring data ingestion from multiple data sources using Airbyte. I would like to know whether it is possible to have runtime processing. I would like to deploy Airbyte in AWS and then use it for reading and updating data from multiple data sources. For example, let's say the system would read data from Snowflake and keep it in memory, do some transformations in a Lambda, and then update it back in Snowflake. Can we do this kind of temporary processing with Airbyte, without storing the data? If yes, what is the process? Any help and input is appreciated!

    Noel Gomez

    10/25/2021, 10:47 PM
    When loading a file from S3 to Snowflake, is it possible to prevent Airbyte from inferring the table schema? E.g. I want everything to load as raw strings and I will cast with dbt, but I see some columns are coming in as integers. I set the connection up with Raw data (no normalization).
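The S3 source has an optional user-provided schema field that overrides inference for the columns you list. A sketch of forcing everything to string (the column names are placeholders, and the field's exact label varies by connector version, so check the source's settings form):

```json
{"id": "string", "amount": "string", "created_at": "string"}
```

Pasting this JSON into the source's schema option should make those columns land as strings, leaving the casting to dbt as intended.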

    Fausto

    10/26/2021, 10:49 PM
    Hi! I'm trying a simple custom dbt transformation where I insert only one field instead of n into a table. In the log I found: "Could not find adapter type sqlserver!". I tried running dbt alone and it works. The basic normalization, without any custom transformation, also works. What could it be? Thanks!
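That error usually means the dbt image running the transformation doesn't have the SQL Server adapter installed, even if your local dbt does. One workaround is pointing the custom transformation at your own image that adds it; a Dockerfile sketch, where the base image and version tags are assumptions to match your setup:

```
# Custom dbt image that includes the SQL Server adapter
FROM fishtownanalytics/dbt:0.19.1
RUN pip install dbt-sqlserver
```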

    Zane Selvans

    10/27/2021, 4:02 AM
    I work with a bunch of public energy system data, much of which is published by government agencies in Excel spreadsheets with multiple tabs, whose filenames, headers and organizational structure changes from year to year. It's kind of a bespoke mess to get them all integrated into unified, well normalized database tables. I'm wondering if anyone knows of any public projects using Airbyte to manage this kind of data source effectively, that we might be able to learn from? Where does one put all of the logic required to get this kind of semi-structured data to the point where it makes sense in a database? Right now we use Python and pandas to do it all, and then load it into SQLite locally, but we want to do a better job of separating out the "Transform" stuff (calculating derived values, creating denormalized views, entity matching between datasets) and only doing that after there's a well-defined structure to start with. Right now it's kind of mixed in with the work of getting all the different years of data into a coherent whole.
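One pattern that helps separate the "make it coherent" step from the real transforms, regardless of tooling, is keeping a per-year column map as data rather than scattering renames through code. A minimal stdlib sketch (all names are made up for illustration):

```python
# Map each publication year's raw headers onto one canonical schema, so the
# year-to-year drift lives in one data structure instead of transform code.
COLUMN_MAPS = {
    2019: {"Plant Name": "plant_name", "Net Gen (MWh)": "net_generation_mwh"},
    2020: {"plant": "plant_name", "net_generation_mwh": "net_generation_mwh"},
}

def canonicalize(row: dict, year: int) -> dict:
    """Rename one raw row's keys to canonical names for the given year."""
    mapping = COLUMN_MAPS[year]
    # Drop columns with no canonical home rather than guessing at them.
    return {mapping[k]: v for k, v in row.items() if k in mapping}

clean = canonicalize({"Plant Name": "Comanche", "Net Gen (MWh)": 123.4}, 2019)
```

Once every year is forced through the same canonical schema, the downstream "Transform" work starts from a well-defined structure.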

    Sunil Nair

    10/27/2021, 5:32 AM
    Hi experts, I am planning to use Airbyte to replicate (Postgres to Postgres) our ERP database (150 GB), but selected tables only. Is Airbyte the right choice? Also, I have installed Airbyte and did some extraction, having configured CDC, Full Refresh | Overwrite, basic normalization, etc. But what I see is that any row updated at the source is duplicated at the destination. Help me understand what I misconfigured. Thank you.

    dasol kim

    10/28/2021, 1:17 AM
    Hi all, I just called workspaces/create, and the source and destination created in the existing workspace disappeared. Is there any way to recover or undo this? Please help.

    Mohammed Al-Hamdan

    10/28/2021, 5:57 PM
    Hello there, I have an Airbyte instance running locally and I want to migrate to our K8s cluster without losing track of the data I have synced on the local machine. How can I do that?

    Brian Olsen

    10/28/2021, 9:17 PM
    Hey all, running through the EC2 version with a slight variation to use ARM (I'm cheap 😉)
    Attaching to airbyte-db, airbyte-scheduler, airbyte-server, airbyte-temporal, airbyte-webapp, airbyte-worker, init
    init | standard_init_linux.go:228: exec user process caused: exec format error
    init exited with code 1
    airbyte-temporal | standard_init_linux.go:228: exec user process caused: exec format error
    airbyte-db | standard_init_linux.go:228: exec user process caused: exec format error
    airbyte-server | standard_init_linux.go:228: exec user process caused: exec format error
    airbyte-scheduler | standard_init_linux.go:228: exec user process caused: exec format error
    airbyte-worker | standard_init_linux.go:228: exec user process caused: exec format error
    airbyte-webapp | standard_init_linux.go:228: exec user process caused: exec format error
    Is ARM not supported by Airbyte?

    David Beck

    10/29/2021, 10:48 AM
    Is Airbyte a good choice for implementing streaming ingestion? I'm thinking about receiving events from webhooks or using change data capture.

    Blake Enyart

    10/29/2021, 9:53 PM
    So I just set up an Airbyte EC2 instance with 100GiB of storage on the root volume. With that, I'm concerned about log storage over time. Does Airbyte currently manage how long logs are stored in /tmp, or do these accrue over time and need to be cleaned out occasionally for long-running EC2 instances?
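Whatever retention Airbyte's own version applies, a scheduled sweep of old files is a cheap safety net on a long-running instance. A stdlib sketch, where the directory path and 30-day cutoff are assumptions to adjust:

```python
import time
from pathlib import Path

def remove_old_files(root: str, max_age_days: int = 30) -> list:
    """Delete regular files under root older than max_age_days; return what was removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(str(path))
    return removed

# e.g. run daily from cron against the workspace/log directory:
# remove_old_files("/tmp/workspace", max_age_days=30)
```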