# help-connector-development
  • Yokesh RS

    06/15/2023, 10:06 AM
    Hi all, how can I get the specification for a particular definition ID of a source or destination, in order to create them?
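    I'm guessing it's something like this against the Config API (a hedged sketch; the endpoint and field names are my assumption from the API docs, and the IDs are placeholders):
    Copy code
    import requests

    # Fetch the connection specification for one source definition via the
    # Airbyte Config API; destinations use destination_definition_specifications/get.
    resp = requests.post(
        "http://localhost:8000/api/v1/source_definition_specifications/get",
        json={
            "sourceDefinitionId": "<definition-uuid>",  # placeholder
            "workspaceId": "<workspace-uuid>",          # placeholder
        },
    )
    resp.raise_for_status()
    print(resp.json()["connectionSpecification"])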
  • Patrick Elsen

    06/15/2023, 12:23 PM
    Hey team, we are planning to use Airbyte for our data ingestion pipeline and are currently evaluating whether it is a good fit. To make it useful for our customers, we need to add a "Field33" destination. The plan is to use it to ingest data into our Ontology-powered graph database. Two quick questions:
    • I'm trying to figure out how best to build our destination connector. Since all of our internal tooling is written in Rust, and we like small Docker containers, we would like to write a custom destination in Rust+Docker and create a PR to add it to Airbyte. Is it allowed to use Rust for this? Is there some policy around it? I'm on the hook if I build it and it gets rejected, so this would be useful to know 😛
    • I saw that you have a connector builder. Am I right in my understanding that this builder is only for custom sources, and not custom destinations? If building a connector is not an option, I could add some routes to our backend to support an existing protocol. Cheers!
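    From my reading, connectors are language-agnostic Docker images that exchange newline-delimited JSON over stdin/stdout, so a destination would consume messages shaped roughly like these (my understanding of the Airbyte protocol; the values are illustrative):
    Copy code
    {"type": "RECORD", "record": {"stream": "users", "data": {"id": 1}, "emitted_at": 1686830400000}}
    {"type": "STATE", "state": {"data": {"cursor": "2023-06-15"}}}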
  • Patrick Elsen

    06/15/2023, 12:49 PM
    Am I correct to assume that it is not currently possible to add a private connector, and that the only way to hook one up is via a PR to the airbyte repo?
  • Andy Smith

    06/15/2023, 1:00 PM
    Hi, we are building a custom connector that runs once a day and pulls in yesterday's data from an API. We are writing to a single Postgres table, so
    Full Refresh - Append
    would seem appropriate. Sometimes, however, data for a given date (e.g. 7 days ago) is updated by the remote server and we need to re-ingest that date, which would yield records with duplicate primary keys in the destination table. The only other sync method that seems close is
    Incremental - Dedupe with History
    , but this requires a cursor, and because the re-ingested date could be well before the current cursor value, it is not clear to me whether the dedupe would work; it looks like dedupe only applies to records at or after the last cursor value. How can we achieve this behaviour? What we really need is for the dedupe to apply only to records with the given date (i.e. the newly ingested date).
  • Octavia Squidington III

    06/15/2023, 1:45 PM
    🔥 Office Hours starts in 15 minutes 🔥 Topic and schedule posted in #C045VK5AF54. At 16:00 CEST / 10am EDT, click here to join us on Zoom.
  • Tom Anderson

    06/15/2023, 2:51 PM
    👋 I'm trying to figure out the record selector to get all of the "values" records (in the response below, 2 records). I've tried the following, but they either return nothing or empty lists:
    • values
    • values,*
    • *,values
    • ,values,
    Here is my response example:
    Copy code
    [
      {
        "headers": [
          "trackingId",
          "timeStamp",
          "userId",
          "email",
          "userRole",
          "emailDomain",
          "userGroups",
          "dashboardTitle",
          "dashboardId",
          "dashboardPath",
          "action",
          "loadTime",
          "category",
          "str1",
          "str2",
          "int1",
          "int2"
        ],
        "values": [
          [
            "8abeffd5-af24-4d6f-b929-ac56d7319220",
            "2022-06-28T20:40:17",
            "obfuscated",
            "obfuscated",
            "Sys. Admin",
            "obfuscated",
            "Admins;",
            "N\\A",
            "N\\A",
            "N\\A",
            "page.navigate.analytics",
            "N\\A",
            "General",
            "N\\A",
            "N\\A",
            "N\\A",
            "N\\A"
          ],
          [
            "8abeffd5-af24-4d6f-b929-ac56d7319220",
            "2022-06-28T20:40:17",
            "obfuscated",
            "obfuscated",
            "Sys. Admin",
            "obfuscated",
            "Admins;",
            "N\\A",
            "N\\A",
            "N\\A",
            "page.navigate.analytics",
            "N\\A",
            "General",
            "N\\A",
            "N\\A",
            "N\\A",
            "N\\A"
          ]
        ]
      }
    ]
    CDK Version
    0.44.5
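    What I'm after is the equivalent of this extraction, descending into each top-level object and then into values (a plain-Python sketch; response_json stands for the parsed response):
    Copy code
    # Flatten one level: each top-level object holds "headers" plus "values"
    # (a list of rows), and every inner row should become one record.
    records = [
        dict(zip(obj["headers"], row))  # pair each row with the header names
        for obj in response_json
        for row in obj["values"]
    ]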
  • David Anderson

    06/15/2023, 2:55 PM
    If I'm building a stream in the CDK UI with two substreams used to build a resource path, how do I denote which substream is which using the
    {{ stream_partition.id }}
    notation? I want to end up with something like:
    /path/stream_partition1.id/path/stream_partition2.id
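    Would something like this work: two partition routers, each with its own partition_field, referenced by name in the path? (A hedged manifest sketch; the parent stream names and the fields first_id/second_id are invented for illustration.)
    Copy code
    partition_router:
      - type: SubstreamPartitionRouter
        parent_stream_configs:
          - type: ParentStreamConfig
            stream:
              $ref: "#/definitions/parent_one_stream"
            parent_key: "id"
            partition_field: "first_id"
      - type: SubstreamPartitionRouter
        parent_stream_configs:
          - type: ParentStreamConfig
            stream:
              $ref: "#/definitions/parent_two_stream"
            parent_key: "id"
            partition_field: "second_id"
    # path: "/path/{{ stream_partition.first_id }}/path/{{ stream_partition.second_id }}"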
  • Aazam Thakur

    06/15/2023, 10:21 PM
    For my Python connector, I have this class. I want to create a new class
    ListMembers
    which gets the
    list_id
    from the Lists class to use in the URL
    lists/{list_id}/members
    Copy code
    class Lists(IncrementalMailChimpStream):
        cursor_field = "date_created"
        data_field = "lists"
    
        def path(self, **kwargs) -> str:
            return "lists"
  • Hoang Ho

    06/16/2023, 8:42 AM
    Hello, our team is currently working on an application that uses Airbyte for seamless data transfer from MongoDB to PostgreSQL. However, we've run into an issue with DateTime data: during the transfer, Airbyte automatically converts it into text format, which poses a problem for us. Is there a possible way to configure the data mapping or otherwise resolve this issue? We eagerly await your response and appreciate your assistance. Thank you kindly.
  • Janis Karimovs

    06/16/2023, 12:14 PM
    Hello everyone, I'm working on a custom source connector for Podio, using BigQuery as the destination. I can successfully get the data from most of the Podio apps I'm testing into BigQuery, but at least one stream is failing with some kind of "pickling" (serialization?) error. As far as I can tell, the relevant error info from the sync logs is here:
    Copy code
    2023-06-16 11:35:10 normalization > Unhandled error while executing model.airbyte_utils.App_Name_2_0
    Pickling client objects is explicitly not supported.
    Clients have non-trivial state that is local and unpickleable.
    From the research I've done, the issue seems to lie within the BigQuery connector, but I'm confused about why the other streams (Podio apps), which seem more or less the same as the failing one, work just fine. Has anyone else encountered something similar? Any info on this would be highly appreciated... thanks 🙏
  • Anthony Smart

    06/16/2023, 12:51 PM
    I also have a question in relation to the Snowflake destination connector. Are there plans to add Azure Blob Storage as an external staging area? The only alternative is to use internal staging, but this requires files to be stored locally before they can be uploaded to the internal stage and copied into a Snowflake table. This will of course be less performant than an external stage, which can be read directly into Snowflake via COPY INTO. The Web UI also states that the internal stage option is recommended for performance and scalability, yet this is clearly the less optimal approach compared to an external stage. Any feedback on this would be appreciated. https://docs.airbyte.com/integrations/destinations/snowflake/
  • Octavia Squidington III

    06/16/2023, 7:45 PM
    🔥 Community Office Hours starts in 15 minutes 🔥 At 1pm PDT click here to join us on Zoom!
  • Aazam Thakur

    06/16/2023, 11:32 PM
    Can someone explain this error to me?
    GetMemberInfo(authenticator=authenticator)
    TypeError: Can't instantiate abstract class GetMemberInfo with abstract method data_field
    "failure_type": "system_error"
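    Python raises this when a class still has an unimplemented abstract member, so GetMemberInfo presumably needs to define data_field before it can be instantiated (a minimal sketch; the base class and the value "members" are placeholders):
    Copy code
    class GetMemberInfo(IncrementalMailChimpStream):
        # Defining the abstract attribute makes the class concrete
        data_field = "members"  # placeholder value for illustration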
  • Cody Scott

    06/17/2023, 12:19 AM
    Hi! 👋 A question about incremental sync and substreams. I have an API that I want to request data from incrementally. The parent stream outputs the IDs to request, and I would like to incrementally request the data for each child. There are around 1000 known parent IDs. Basically, can I apply a state to each substream and have each one store its own state (so 1000 internal states)? The alternative is to accept that there will be duplication, request smaller slices (hourly, for example), and dedupe early in the transform layers. The output data is ID + datetime as the state. If a run fails, I need the next run to fetch the data again, so worst case I duplicate. The ideal case is that each stream runs in its own little world, happily keeping its own state, but I also don't want 1000+ tables created if possible... Thanks!
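    The single-stream shape I'm imagining keys the state by parent ID so each of the ~1000 partitions checkpoints independently (a hedged sketch; parent_id and updated_at are assumed field names, and the surrounding HttpStream members are elided):
    Copy code
    from typing import Any, Mapping, MutableMapping

    class ChildStream:  # stands in for the real HttpStream subclass
        def get_updated_state(
            self,
            current_stream_state: MutableMapping[str, Any],
            latest_record: Mapping[str, Any],
        ) -> MutableMapping[str, Any]:
            # One cursor per parent ID, all inside a single stream/table
            pid = str(latest_record["parent_id"])
            prev = current_stream_state.get(pid, "")
            current_stream_state[pid] = max(prev, latest_record["updated_at"])
            return current_stream_state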
  • Biondi Septian S

    06/17/2023, 1:46 PM
    Hello Airbyte team, please take a look at this serious issue: https://github.com/airbytehq/airbyte/issues/24097
  • Biondi Septian S

    06/17/2023, 1:46 PM
    This is a serious timeout issue in the Typesense destination connector.
  • Biondi Septian S

    06/17/2023, 1:46 PM
    I think the solution is to merge the pull request below: https://github.com/airbytehq/airbyte/pull/18806/commits
  • Chính Bùi Quang

    06/19/2023, 6:41 AM
    Hi Airbyte team, I am setting up the Builder. The response returns records with the following structure:
    Copy code
    [
      {
        "item": "deal",
        "id": 123450,
        "data": {"id": 123450, "update_time": "2023-06-01 012008", "user_id": 18488038, "person_id": 514665, "org_id": 200989}
      },
      {
        "item": "deal",
        "id": 54981,
        "data": {"id": 54981, "update_time": "2023-06-01 052008", "user_id": 18488038, "person_id": 514665, "org_id": 200989}
      }
    ]
    Now I want to take the maximum value of update_time from the returned records and assign it to the Cursor Field in Incremental Sync. What should I do?
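    My current idea: point the record selector at the nested data objects so update_time becomes a top-level field, then use it as the cursor, since incremental sync tracks the highest cursor value it has seen (a hedged sketch; component names are per my reading of the declarative docs, and start_date is a placeholder):
    Copy code
    record_selector:
      type: RecordSelector
      extractor:
        type: DpathExtractor
        field_path: ["*", "data"]  # emit each "data" object as a record
    incremental_sync:
      type: DatetimeBasedCursor
      cursor_field: "update_time"
      datetime_format: "%Y-%m-%d %H%M%S"  # matches "2023-06-01 012008"
      start_datetime: "{{ config['start_date'] }}"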
  • Chidambara Ganapathy

    06/19/2023, 9:05 AM
    Hi Airbyte Team, is the AWS Cost Explorer source connector available out of the box? Thanks
  • Luke Whittaker

    06/19/2023, 10:37 AM
    I'm building out a connector using the no-code builder. Is it possible to handle UNIX timestamps instead of the standard
    %Y-%m-%d....
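    A hedged sketch of what I mean, assuming the declarative cursor accepts %s for epoch seconds (the cursor field and config key are invented for illustration):
    Copy code
    incremental_sync:
      type: DatetimeBasedCursor
      cursor_field: "updated_at"  # assumed field name
      datetime_format: "%s"       # UNIX epoch seconds instead of %Y-%m-%d
      start_datetime: "{{ config['start_timestamp'] }}"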
  • laila ribke

    06/19/2023, 11:32 AM
    Hi, has someone built a source connector for Optimizely and can share the image?
  • Slackbot

    06/19/2023, 5:47 PM
    This message was deleted.
  • Octavia Squidington III

    06/19/2023, 7:45 PM
    🔥 Community Office Hours starts in 15 minutes 🔥 Topic and schedule posted in #C045VK5AF54. At 1pm PDT, click here to join us on Zoom!
  • Thomas van Latum

    06/19/2023, 7:53 PM
    Is there a guide to setting up a development environment for building a Java connector?
  • Mahesh Thirunavukarasu

    06/19/2023, 8:08 PM
    Hi, can we declare multiple requesters in the Airbyte low-code CDK? If so, does it take a list-like structure, or can we declare them under different names and refer to them?
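    What I have in mind is something like this: several named requesters under definitions, each referenced by $ref (a hedged sketch; names and URLs invented for illustration):
    Copy code
    definitions:
      query_requester:
        type: HttpRequester
        url_base: "https://api.example.com/v3"
      cdc_requester:
        type: HttpRequester
        url_base: "https://api.example.com/v3/cdc"
    streams:
      - type: DeclarativeStream
        retriever:
          type: SimpleRetriever
          requester:
            $ref: "#/definitions/cdc_requester"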
  • Mahesh Thirunavukarasu

    06/19/2023, 9:00 PM
    How do I integrate a CDC REST API? I am trying to implement it in the current QuickBooks connector, which is in alpha. Since all the other QuickBooks objects follow a query pattern, I have to declare a separate requester to capture CDC and load the raw tables. Unfortunately, I always get 0 records in the loaded CDC table. Also, please suggest a way to monitor API requests and responses in self-deployed Airbyte using Docker.
  • Micky

    06/19/2023, 9:22 PM
    Hi, if I drop the replication slot, recreate it, and reconfigure Airbyte to use the new slot for CDC, does that mean it will do a full refresh (initial sync)?
  • Luis Peña

    06/20/2023, 12:28 AM
    Hello, I'm currently trying my hand at the low-code CDK, but I'm having some issues understanding two topics:
    1. How to implement payloads/bodies on the request of each stream. Is there any additional information, or are there examples besides the one at https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/request-options#request-options-1?
    2. How to implement nested streams. So far I gather it is done with "SubstreamPartitionRouter", but I'm having some issues with how to implement it. Is there any source code I could look at as an example?
    I really appreciate any help on the topic.
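    For concreteness, here is the shape I'm aiming at for both points (a hedged sketch assuming the request_body_json option and the SubstreamPartitionRouter component; all names and URLs invented for illustration):
    Copy code
    streams:
      - type: DeclarativeStream
        name: "orders"
        retriever:
          type: SimpleRetriever
          requester:
            type: HttpRequester
            url_base: "https://api.example.com"
            path: "/customers/{{ stream_partition.customer_id }}/orders"
            http_method: "POST"
            request_body_json:  # (1) JSON payload sent with each request
              page_size: 100
          partition_router:     # (2) one slice per parent record
            type: SubstreamPartitionRouter
            parent_stream_configs:
              - type: ParentStreamConfig
                stream:
                  $ref: "#/definitions/customers_stream"
                parent_key: "id"
                partition_field: "customer_id"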
  • Quang Dang Vu Nhat

    06/20/2023, 3:00 AM
    Hello, I am currently developing a custom connector that includes some streams with fields that have multiple data types. I have seen this pattern in other connectors, such as Gitlab's
    merge_request_commits
    schema:
    Copy code
    "approvals_before_merge": {
          "type": ["null", "boolean", "string", "object"]
    },
    or Pipedrive's
    product_fields
    schema:
    Copy code
    "options": {
          "type": ["null", "array"],
          "items": {
            "type": "object",
            "properties": {
              "id": {
                "type": ["null", "boolean", "integer"]
              },
              "label": {
                "type": ["null", "string"]
              }
            }
         }
    }
    I don't see how they handle multiple data types in their code, apart from these declarations. Can anyone help me with this 🙇