# advice-data-ingestion
  • wp
    09/12/2022, 8:32 PM
    Did you ever figure this out? I'm seeing the same thing from BigQuery denormalized.
  • Cristian Ivanoff
    09/13/2022, 1:46 PM
    Hi, I'm testing a connector to IBM DB2 and would like to set a parameter on the connector: -Ddb2.jcc.charsetDecoderEncoder=3. Is this possible?
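    If the connector version in use exposes a field for extra JDBC URL parameters, a JVM `-D` flag may not be needed: the DB2 JCC driver also accepts properties appended to the JDBC URL after the database name. A minimal sketch of the URL shape, with placeholder host, port, and database:
    ```python
    # Sketch: a DB2 JDBC URL with the driver property set inline, assuming the
    # connector accepts extra JDBC URL parameters. All values are placeholders.
    host, port, database = "db2.example.com", 50000, "SAMPLE"

    # JCC syntax: properties follow the database name after a colon,
    # each terminated by a semicolon.
    jdbc_url = f"jdbc:db2://{host}:{port}/{database}:charsetDecoderEncoder=3;"
    print(jdbc_url)  # jdbc:db2://db2.example.com:50000/SAMPLE:charsetDecoderEncoder=3;
    ```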
  • Samikshan Bairagya
    09/13/2022, 2:50 PM
    Hi everyone. We are using the Slack connector to sync data from one Slack channel. However, it took ~2 hours (2h 12m 37s) to complete the data sync. The Slack source was configured with `Join all channels` set to `true` and one channel name in the `Channel name filter` list. From the logs we could see that it took around 1 hour for the `Syncing stream: channel_messages` step to complete:
    ...
    2022-09-13 10:03:40 source > Syncing stream: channel_messages
    2022-09-13 11:04:52 destination > 2022-09-13 11:04:52 INFO i.a.i.d.r.SerializedBufferingStrategy(lambda$addRecord$0):55 - Starting a new buffer for stream channel_members (current state: 0 bytes in 0 buffers)
    2022-09-13 11:04:52 destination > 2022-09-13 11:04:52 INFO i.a.i.d.r.SerializedBufferingStrategy(lambda$addRecord$0):55 - Starting a new buffer for stream channel_messages (current state: 0 bytes in 1 buffers)
    2022-09-13 11:05:06 source > Read 66 records from channel_messages stream
    2022-09-13 11:05:06 source > Finished syncing channel_messages
    2022-09-13 11:05:06 source > SourceSlack runtimes:
    ...
    After this it took a further hour for the `Syncing stream: threads` step to error out (`503 Service Unavailable`), before retrying and completing the sync in a total of ~1h 10m. You can see a portion of the logs here:
    ...
    2022-09-13 11:05:06 source > Finished syncing channels
    2022-09-13 11:05:06 source > SourceSlack runtimes:
    Syncing stream channel_members 0:00:00.564468
    Syncing stream channel_messages 1:01:25.944632
    Syncing stream channels 0:00:00.049431
    2022-09-13 11:05:06 source > Syncing stream: threads
    2022-09-13 11:05:06 source > Syncing replies {'channel': <channel_id>, 'oldest': 341884800.0, 'latest': 341971200.0}
    2022-09-13 11:05:06 source > Syncing replies {'channel': <channel_id>, 'oldest': 341971200.0, 'latest': 342057600.0}
    2022-09-13 11:05:07 source > Syncing replies {'channel': <channel_id>, 'oldest': 342057600.0, 'latest': 342144000.0}
    ...
    <logs snipped>
    ...
    2022-09-13 12:05:41 source > Syncing replies {'channel': <channel_id>, 'oldest': 1480291200.0, 'latest': 1480377600.0}
    2022-09-13 12:05:41 source > Syncing replies {'channel': <channel_id>, 'oldest': 1480377600.0, 'latest': 1480464000.0}
    2022-09-13 12:05:42 source > Syncing replies {'channel': <channel_id>, 'oldest': 1480464000.0, 'latest': 1480550400.0}
    2022-09-13 12:05:44 source > Retry-after header not found. Using default backoff value
    2022-09-13 12:05:44 source > Backing off _send(...) for 0.0s (airbyte_cdk.sources.streams.http.exceptions.UserDefinedBackoffException: Request URL: https://slack.com/api/conversations.history?limit=100&channel=<channel_id>&oldest=1480464000.0&latest=1480550400.0, Response Code: 503, Response Text: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>503 Service Unavailable</title>
    </head><body>
    <h1>Service Unavailable</h1>
    <p>The server is temporarily unable to service your
    request due to maintenance downtime or capacity
    problems. Please try again later.</p>
    </body></html>)
    2022-09-13 12:05:50 source > Retrying. Sleeping for 5 seconds
    2022-09-13 12:05:50 source > Syncing replies {'channel': <channel_id>, 'oldest': 1480550400.0, 'latest': 1480636800.0}
    2022-09-13 12:05:50 source > Syncing replies {'channel': <channel_id>, 'oldest': 1480636800.0, 'latest': 1480723200.0}
    ...
    <snipped>
    ...
    2022-09-13 12:15:56 source > Syncing replies {'channel': <channel_id>, 'oldest': 1662940800.0, 'latest': 1663027200.0}
    2022-09-13 12:15:56 source > Syncing replies {'channel': <channel_id>, 'oldest': 1663027200.0, 'latest': 1663113600.0}
    2022-09-13 12:15:56 source > Read 71 records from threads stream
    2022-09-13 12:15:56 source > Finished syncing threads
    2022-09-13 12:15:56 source > SourceSlack runtimes:
    Syncing stream channel_members 0:00:00.564468
    Syncing stream channel_messages 1:01:25.944632
    Syncing stream channels 0:00:00.049431
    Syncing stream threads 1:10:50.295063
    It would be great if anyone could help us figure out why a sync of a single Slack channel is taking so much time. Thanks!
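    The `Syncing replies` lines above suggest where the time goes: the source walks the channel history in fixed one-day windows (each `oldest`/`latest` pair is exactly 86400 seconds apart), making at least one API call per window from the configured start date, regardless of how few messages exist. A rough sketch of the arithmetic, using the window bounds visible in the logs:
    ```python
    # Sketch: estimating request volume for the threads stream, assuming one
    # API call per one-day window (as the oldest/latest values above imply).
    oldest = 341884800.0     # first window start seen in the logs
    latest = 1663113600.0    # last window end seen in the logs

    windows = int((latest - oldest) // 86400)
    elapsed_min = 70 + 50 / 60                            # threads runtime: 1:10:50
    print(f"{windows} one-day windows")                   # -> 15292
    print(f"{windows / elapsed_min:.0f} windows/minute")  # -> ~216, i.e. ~3.6 req/s
    # The hour is dominated by per-window HTTP round-trips, not data volume
    # (only 71 thread records were read). Moving the source's start date
    # forward shrinks the window count proportionally.
    ```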
  • Jon Watts
    09/13/2022, 3:50 PM
    Hello, I have a connection set up that is working using deduped + history; however, it appears that the final item in our CSV file has all the data nulled out except for a single column, which is correct. Interestingly, all the information for this record is correct in _ab_additional_properties. The record exists in all the CSV files we have loaded. The issue persists after resetting data and syncing. Hopefully you have some thoughts on this issue. Thanks
  • Olga Braginskaya
    09/13/2022, 5:11 PM
    Hi. I have an issue with the CockroachDB connector. I know that it's in alpha and not supported, but someone might have thought about this already. We have self-hosted Airbyte 0.40.4 (same on 0.39) with an "airbyte/source-cockroachdb" connection, version 0.1.18. We have a CockroachDB table and it worked fine until it reached 8 million rows. Now this specific connector fails with
    2022-09-13 15:17:53 source > Terminating due to java.lang.OutOfMemoryError: Java heap space
    I tried the same job with resourceRequirements set and it fails with the same error. Our other CockroachDB connectors with other tables work fine. Defaults: cpu_request: "50m", cpu_limit: "2", memory_request: "50Mi", memory_limit: "8Gi"; with resourceRequirements: cpu_request: "50m", cpu_limit: "2", memory_request: "400Mi", memory_limit: "8Gi".
  • Disi Koa
    09/13/2022, 6:56 PM
    Hi, I am using the Monday connector to try to ETL all our Monday data. I am running into 401 errors, which I believe is because my user is not subscribed to all the boards. If anyone has experience with the Monday connector: how do I create a user that has access to everything, or generate an API key that has access to all the data? Thanks.
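    One way to see what a given token can actually read before wiring it into the connector, assuming the Monday GraphQL v2 endpoint; the token value is a placeholder:
    ```python
    import requests

    # Sketch: list the boards a Monday API token can see. Boards missing from
    # this response are the ones the sync will fail to read.
    resp = requests.post(
        "https://api.monday.com/v2",
        json={"query": "{ boards (limit: 50) { id name } }"},
        headers={"Authorization": "YOUR_MONDAY_API_TOKEN"},  # placeholder
    )
    resp.raise_for_status()
    print(resp.json())
    ```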
  • Pranit
    09/14/2022, 6:02 AM
    I have Airbyte on my Azure VM and it was working well with few connections, but with multiple connections we are seeing slowness in the UI. What steps need to be taken to optimise it and make it faster? Please suggest.
  • Slackbot
    09/14/2022, 12:24 PM
    This message was deleted.
  • Denis
    09/14/2022, 4:12 PM
    Hello! Just a stupid question... can Airbyte corrupt data in the source by any chance? I am not familiar with the tool, and so far I am not comfortable letting it run while I am not in front of my screen.
  • Henri Blancke
    09/14/2022, 9:26 PM
    👋 I'm learning more about Airbyte destination development and I'm struggling to find out how the destination would know a user initiated a "data reset". What is the best way for the destination to know that it should initiate a data reset? Thank you for your help!
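    For context, a sketch of how this tended to surface at the time: a reset runs as a normal sync from an empty source, with the affected streams configured to overwrite, so a destination that honors `DestinationSyncMode.overwrite` handles resets without any explicit signal. Assuming the `airbyte_cdk` Python models:
    ```python
    from airbyte_cdk.models import ConfiguredAirbyteCatalog, DestinationSyncMode

    # Sketch: a destination never receives an explicit "reset" flag; a reset
    # job is a sync with no records whose streams are set to overwrite, so
    # honoring the per-stream sync mode covers the reset case.
    def streams_to_truncate(catalog: ConfiguredAirbyteCatalog) -> list[str]:
        return [
            s.stream.name
            for s in catalog.streams
            if s.destination_sync_mode == DestinationSyncMode.overwrite
        ]
    ```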
  • Nishant Soni
    09/14/2022, 10:50 PM
    Hey, I'm using the CDC replication mode for the source, with sync mode `Incremental -> Append`, but I can't find a way to define a cursor field. `Airbyte Version`: v0.40.0-alpha, `source`: pgSQL v12, `destination`: pgSQL v12. Logs:
    2022-09-14 21:57:19 source > 2022-09-14 21:57:19 INFO i.a.i.s.r.s.CursorManager(createCursorInfoForStream):151 - Found matching cursor in state. Stream: AirbyteStreamNameNamespacePair{name='test_1_default_pk', namespace='test_cdc'}. Cursor Field: null Value: null
    On the web UI I can't find an option to update it either, or it's locked. Can anyone help with this?
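    This is expected behavior for CDC as far as the protocol goes: the source manages its own cursor (the replication log position), so the stream is advertised with `source_defined_cursor` and the UI locks the field. A sketch of how that shows up in the discovered catalog, assuming the `airbyte_cdk` models:
    ```python
    from airbyte_cdk.models import AirbyteCatalog

    # Sketch: CDC streams carry source_defined_cursor=True in discover output,
    # which is why the cursor field cannot be chosen in the UI.
    def locked_cursor_streams(catalog: AirbyteCatalog) -> list[str]:
        return [s.name for s in catalog.streams if s.source_defined_cursor]
    ```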
  • Lorenz Eckhard
    09/15/2022, 6:15 AM
    Hello, I have a question regarding the HubSpot connector. I connected HubSpot a few days ago and everything seems to work fine. The only problem is that I don't get the data from custom-made ticket properties in our data warehouse (Postgres). The custom-made ticket properties appear as columns in the ticket table in our data warehouse, but they don't have any values in them. I already tried different sync modes and syncing the data raw and normalized. Could it be that it's not possible to sync custom-made ticket properties, or could anybody help me with this problem? Kind regards
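    One way to rule out the API side, assuming the CRM v3 objects endpoint; the property name and token are placeholders. If values come back here but land empty in Postgres, the problem is in the connector or normalization rather than in HubSpot:
    ```python
    import requests

    # Sketch: check whether HubSpot returns values for a custom ticket property.
    resp = requests.get(
        "https://api.hubapi.com/crm/v3/objects/tickets",
        params={"properties": "subject,my_custom_property", "limit": 10},
        headers={"Authorization": "Bearer YOUR_ACCESS_TOKEN"},  # placeholder
    )
    resp.raise_for_status()
    for ticket in resp.json().get("results", []):
        print(ticket["id"], ticket["properties"].get("my_custom_property"))
    ```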
  • Eli Sigal
    09/15/2022, 8:35 AM
    Hi. When it comes to dbt it really depends on the data and the logic of the SQL query, or more precisely the data itself. You can build your own model to run for normalization in dbt, but even so we are talking about a minute less at most for a high amount of data. You can build a model that extracts only the fields you need. Adding more threads also won't help, since threads mostly pay off for ref() functions in dbt, or in some cases for a high number of sources feeding one model. Simply put, it is what it is: we are taking JSON data and extracting it, and the extraction is linear. That, or get a faster CPU.
  • Olivier AGUDO PEREZ
    09/15/2022, 9:04 AM
    Hey, I'm trying to replicate a MySQL db into BigQuery; however, I end up with an error: `io.debezium.DebeziumException: Unexpected error while connecting to MySQL and looking at BINLOG_FORMAT mode:` It seems the error appears when Debezium performs the request `SHOW GLOBAL VARIABLES LIKE 'binlog_format'` against the MySQL database. I tried to run the request with the same user and there was no problem; the value `ROW` is returned. I have Airbyte 0.40.6 and MySQL connector version 0.6.12 (also tested with 0.6.11).
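    For anyone hitting the same thing, a minimal sketch reproducing Debezium's pre-flight checks outside Airbyte, assuming `pymysql` and placeholder credentials; besides `binlog_format`, Debezium also needs `binlog_row_image` and replication grants:
    ```python
    import pymysql

    # Sketch: run the same checks Debezium performs, with the connector's user.
    conn = pymysql.connect(host="mysql.example.com", user="airbyte",
                           password="***", database="mydb")  # placeholders
    with conn.cursor() as cur:
        cur.execute("SHOW GLOBAL VARIABLES LIKE 'binlog_format'")
        print(cur.fetchone())       # expect ('binlog_format', 'ROW')
        cur.execute("SHOW GLOBAL VARIABLES LIKE 'binlog_row_image'")
        print(cur.fetchone())       # expect ('binlog_row_image', 'FULL')
        cur.execute("SHOW GRANTS")  # needs REPLICATION SLAVE, REPLICATION CLIENT
        print(cur.fetchall())
    conn.close()
    ```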
  • Opeyemi Daniel
    09/15/2022, 11:43 AM
    👋 Hello, team!
  • Opeyemi Daniel
    09/15/2022, 11:43 AM
    I tried to create a connection between MongoDB and a Postgres database and I got this error:
  • Opeyemi Daniel
    09/15/2022, 11:44 AM
    2022-09-15 11:28:14 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - com.mongodb.MongoCommandException: Command failed with error 40353 (Location40353): 'Invalid $project :: caused by :: FieldPath must not end with a '.'.' on server. The full response is {"ok": 0.0, "errmsg": "Invalid $project :: caused by :: FieldPath must not end with a '.'.", "code": 40353, "codeName": "Location40353"}
  • Opeyemi Daniel
    09/15/2022, 11:44 AM
    What could be the issue here?
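    Error 40353 usually means some document has a field name that literally ends in a dot, which breaks the `$project` stage the connector builds during schema discovery. A sketch to hunt for such keys, assuming `pymongo` and placeholder connection details:
    ```python
    from pymongo import MongoClient

    # Sketch: find documents whose top-level keys end in "." -- these break
    # the $project stage. Extend the scan to nested keys if needed.
    client = MongoClient("mongodb://user:***@mongo.example.com:27017")
    coll = client["mydb"]["mycollection"]  # placeholders

    for doc in coll.find({}, limit=1000):
        bad = [key for key in doc if key.endswith(".")]
        if bad:
            print(doc["_id"], bad)  # rename or drop these fields, then retry
    ```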
  • Rocky Appiah
    09/15/2022, 6:05 PM
    Running Airbyte 0.40.3. Source: Postgres 1.0.5 connector. Destination: Snowflake 0.4.34 connector. Running:
    >select sent, _airbyte_emitted_at from TABLE_SCD where uuid = 'myid' order by 2;
    +-------------------------+-------------------------------+
    | SENT                    | _AIRBYTE_EMITTED_AT           |
    |-------------------------+-------------------------------|
    | 2022-09-05 22:30:19.000 | 2022-09-15 15:40:14.684 +0000 |
    | NULL                    | 2022-09-15 16:56:56.121 +0000 |
    +-------------------------+-------------------------------+
    Why don't I see null in the `sent` column?
    >select sent, _airbyte_emitted_at from TABLE where uuid = 'myid' order by 2;
    >select sent, _airbyte_emitted_at from TABLE where uuid = 'myid' order by 2;
    +-------------------------+-------------------------------+
    | SENT                    | _AIRBYTE_EMITTED_AT           |
    |-------------------------+-------------------------------|
    | 2022-09-05 22:30:19.000 | 2022-09-15 15:40:14.684 +0000 |
    +-------------------------+-------------------------------+
  • Rocky Appiah
    09/15/2022, 6:23 PM
    Happy to send logs, seems like a fairly large bug
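    A useful next check on the question above, assuming normalization's usual SCD columns (`_airbyte_active_row`, `_airbyte_start_at`, `_airbyte_end_at`): the final table is rebuilt from rows marked active, so inspecting which SCD row is active shows whether dedup picked the wrong version. A sketch with the Snowflake Python connector, credentials as placeholders:
    ```python
    import snowflake.connector

    # Sketch: see which SCD row normalization considers active for this key.
    conn = snowflake.connector.connect(account="my_account", user="me",
                                       password="***", database="MYDB",
                                       schema="PUBLIC")  # placeholders
    cur = conn.cursor()
    cur.execute("""
        select sent, _airbyte_start_at, _airbyte_end_at, _airbyte_active_row
        from TABLE_SCD where uuid = 'myid' order by _airbyte_emitted_at
    """)
    for row in cur.fetchall():
        print(row)  # the NULL-sent row should be the active one if it is newest
    conn.close()
    ```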
  • Albinas Plėšnys
    09/15/2022, 6:40 PM
    Hi, could anyone advise me on what happens if you pause a sync mid-way? Does it flush the buffer to a table and update state to where it stopped; does it drop everything; or does it halt, saving the sync data until you resume? I'm asking because I'm running a pretty heavy historical sync. After several days it basically stopped progressing (the logs just stay silent). I'd be happy to flush the current temp data, update state, and continue further; I don't know how to achieve that, though.
  • Saurabh Gunjal
    09/16/2022, 5:41 AM
    Hello, can someone help me?
  • Saurabh Gunjal
    09/16/2022, 5:42 AM
    I want to upload the DigitalOcean certificate (.crt) for PostgreSQL, but I can't seem to find an option for it in the SSL settings.
  • Saurabh Gunjal
    09/16/2022, 5:42 AM
    so how should I do it?
  • Saurabh Gunjal
    09/16/2022, 5:42 AM
    @Karen (Airbyte)
  • Saurabh Gunjal
    09/16/2022, 5:42 AM
    @airbyte-cloud-source-slack
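    If the Postgres source version in use exposes SSL modes, the CA certificate is pasted into the `ssl_mode` block of the source config rather than uploaded as a file. A sketch of the relevant shape, assuming a `verify-ca` mode; field names can differ across connector versions, and all values are placeholders:
    ```python
    # Sketch: Postgres source settings with a DigitalOcean CA certificate,
    # assuming a connector version that exposes ssl_mode with verify-ca.
    postgres_source_config = {
        "host": "db-postgresql.example.db.ondigitalocean.com",
        "port": 25060,
        "database": "defaultdb",
        "username": "doadmin",
        "password": "***",
        "ssl_mode": {
            "mode": "verify-ca",
            # paste the full contents of DigitalOcean's ca-certificate.crt here
            "ca_certificate": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
        },
    }
    ```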
  • Manish Gaurav
    09/16/2022, 8:47 AM
    Hi, I want to implement a CDC pipeline using Airbyte as the ingestion connector for MySQL, Mongo, and MSSQL (all on AWS). I saw in the Airbyte documentation that CDC has some limitations. Is there any specific channel where I can connect and understand how we can overcome these limitations?
  • Pranit
    09/16/2022, 11:39 AM
    That might be the minimum frequency available on that connector; configure it externally using another service to increase the frequency.
  • Pranit
    09/16/2022, 11:39 AM
    Hello, I did the Docker installation and Airbyte download as before, but it gives the error below while running docker-compose up:
    ERROR: Invalid interpolation format for "environment" option in service "worker": "CONFIG_DATABASE_PASSWORD=${CONFIG_DATABASE_PASSWORD:-}"
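    That error is what older docker-compose releases print when they hit the `${VAR:-default}` interpolation syntax they do not support; Airbyte's compose file relies on it, and compose also reads the defaults from the `.env` file shipped alongside it. A sketch checking both, assuming it runs from the Airbyte checkout directory:
    ```python
    import pathlib
    import subprocess

    # Sketch: very old packaged docker-compose builds (e.g. 1.8.x from apt)
    # fail on ${VAR:-} defaults with exactly this error. Check the version:
    out = subprocess.run(["docker-compose", "--version"],
                         capture_output=True, text=True)
    print(out.stdout.strip())  # if this reports an ancient version, upgrade it

    # And confirm .env sits next to docker-compose.yml in the run directory:
    print(pathlib.Path(".env").exists(), pathlib.Path("docker-compose.yml").exists())
    ```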
  • Yash Ghelani
    09/16/2022, 1:08 PM
    Hi guys, I am trying to transfer data from Oracle (source) to Postgres (destination) in incremental append mode, but it does not show the option for it.