# ask-community-for-troubleshooting
d
Hello folks, I have a connector from MongoDB to AWS S3. Now, what I want to do is: when docs are being moved to S3, I have to hit an external endpoint to send metadata to it. I am exploring solutions that are achievable from Airbyte itself.
k
Airbyte does not natively support sending metadata to an external endpoint as part of the sync process from MongoDB to AWS S3. However, Airbyte does offer a webhook notification feature that can send notifications (including metadata about syncs) to any generic webhook service when a sync completes—either successfully or with failure. The webhook payload includes detailed metadata about the sync, such as workspace, connection, source, destination, job details, and error information if applicable. Here’s an example of the payload for a successful sync:
```json
{
    "data": {
        "workspace": {
            "id": "b510e39b-e9e2-4833-9a3a-963e51d35fb4",
            "name": "Workspace1",
            "url": "https://link/to/ws"
        },
        "connection": {
            "id": "64d901a1-2520-4d91-93c8-9df438668ff0",
            "name": "Connection",
            "url": "https://link/to/connection"
        },
        "source": {
            "id": "c0655b08-1511-4e72-b7da-24c5d54de532",
            "name": "Source",
            "url": "https://link/to/source"
        },
        "destination": {
            "id": "5621c38f-8048-4abb-85ca-b34ff8d9a298",
            "name": "Destination",
            "url": "https://link/to/destination"
        },
        "jobId": 9988,
        "startedAt": "2024-01-01T00:00:00Z",
        "finishedAt": "2024-01-01T01:00:00Z",
        "bytesEmitted": 1000,
        "bytesCommitted": 1000,
        "recordsEmitted": 89,
        "recordsCommitted": 89,
        "bytesEmittedFormatted": "1000 B",
        "bytesCommittedFormatted": "90 B",
        "success": true,
        "durationInSeconds": 3600,
        "durationFormatted": "1 hours 0 min"
    }
}
```
You can configure Airbyte to send this webhook to your external endpoint and then process the metadata as needed on your side. This is the closest built-in solution Airbyte provides for your use case. For more details on setting up webhook notifications, see the documentation: [Set up webhook notifications](https://docs.airbyte.com/platform/next/cloud/managing-airbyte-cloud/manage-airbyte-cloud-notifications#set-up-webhook-notifications).

If you need to send more granular or custom metadata (e.g., per document rather than per sync), you would need to implement a custom solution outside of Airbyte, such as a post-processing step triggered by the webhook or by monitoring the S3 bucket for new files.

In summary:
- Airbyte can send sync-level metadata to an external endpoint via webhooks.
- There is no built-in feature to send per-document metadata to an external endpoint during the sync process itself.

If you need more fine-grained integration, you will need to build a custom workflow around Airbyte's existing capabilities.
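If the webhook approach is enough for your use case, the receiving side can be very small. Below is a minimal Kotlin/JVM sketch (not an Airbyte component, just an illustration) that accepts the notification payload shown above and relays it to a hypothetical external metadata service; `METADATA_ENDPOINT` and the `/airbyte-webhook` path are assumptions you would replace with your own values. It uses only JDK built-ins.

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical external service that should receive the sync metadata.
const val METADATA_ENDPOINT = "https://metadata.example.internal/airbyte-sync"

fun main() {
    val client = HttpClient.newHttpClient()
    val server = HttpServer.create(InetSocketAddress(8080), 0)

    // Point Airbyte's webhook notification URL at http://<this-host>:8080/airbyte-webhook.
    server.createContext("/airbyte-webhook") { exchange ->
        // Read the notification payload Airbyte POSTs after each sync.
        val payload = exchange.requestBody.readBytes().toString(Charsets.UTF_8)

        // Relay the sync-level metadata (workspace, connection, jobId, record counts, ...)
        // to the external endpoint. Real code would parse the JSON and reshape it
        // into whatever schema that endpoint expects.
        val request = HttpRequest.newBuilder(URI.create(METADATA_ENDPOINT))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build()
        val response = client.send(request, HttpResponse.BodyHandlers.ofString())
        println("Forwarded sync metadata; endpoint answered HTTP ${response.statusCode()}")

        // Acknowledge the webhook so Airbyte sees the notification as delivered.
        exchange.sendResponseHeaders(204, -1)
        exchange.close()
    }

    server.start()
    println("Listening for Airbyte webhook notifications on port 8080")
}
```

Note that this only gives you sync-level metadata; anything per-document still needs the custom workflow described above (for example, having this service list the new objects in the S3 bucket after a successful sync).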
d
I am going ahead with the approach where I fork the S3 connector and update it with the code that I need. For this I am using Builder > Fork an existing connector, but when I search for S3, it's not present. @kapa.ai
k
The reason you cannot find the S3 connector when using the **Connector Builder**'s "Fork an existing connector" feature is that the Connector Builder currently only supports creating and editing source connectors, not destination connectors like S3. This limitation is explicitly stated in the documentation:

> The Connector Builder is only for creating source connectors. You can't currently use the Connector Builder to create destination connectors. ([Connector Builder Overview](https://docs.airbyte.com/platform/connector-development/connector-builder-ui/overview))

Since S3 is a destination connector, it will not appear in the list of connectors you can fork or edit in the Builder UI.

**What can you do instead?** To customize the S3 destination connector, you will need to fork the Airbyte repository and modify the connector code directly in your local development environment. The typical workflow is:

1. Fork the Airbyte repository on GitHub.
2. Clone your fork locally.
3. Make your changes to the S3 destination connector code.
4. Build and test your custom connector locally.
5. Update your Airbyte instance to use your custom connector image.

You can find detailed steps for developing and testing connectors locally in the documentation: [Developing Locally: Connector Contributions](https://docs.airbyte.com/platform/contributing-to-airbyte/developing-locally#connector-contributions). If you need further guidance on forking and customizing destination connectors, let me know!
d
@kapa.ai share the repo of s3 connector.
k
The S3 destination connector is part of the main Airbyte repository on GitHub. You can find it in the Airbyte monorepo, which contains all official connectors, including the S3 destination. The latest release notes confirm the S3 destination is available and actively maintained as part of this repository ([airbyte v0.64.0 release notes](https://docs.airbyte.com/release_notes/aug_2024)). You can access the Airbyte repository here: [airbytehq/airbyte](https://github.com/airbytehq/airbyte). Within the repository, the S3 destination connector is typically located at:
```
airbyte/airbyte-integrations/connectors/destination-s3
```
For more details on contributing or modifying connectors, refer to the official documentation: [Contributing to Airbyte](https://docs.airbyte.com/platform/contributing-to-airbyte) and [Developing Locally: Connector Contributions](https://docs.airbyte.com/platform/contributing-to-airbyte/developing-locally#connector-contributions).
d
@kapa.ai As per your suggested steps below:
```
1. Fork the Airbyte repository on GitHub.
2. Clone your fork locally.
3. Make your changes to the S3 destination connector code.
4. Build and test your custom connector locally.
5. Update your Airbyte instance to use your custom connector image.
```
When I checked the code in `destination-s3`, I found that the actual upload happens in the airbyte-cdk, and that is where we need to make the change. The `destination-s3` connector handles the configuration part.
k
You're correct in your observation. The S3 destination connector in Airbyte is built on top of the Airbyte CDK (Connector Development Kit), and much of the core functionality, including the actual upload logic, lives in the CDK rather than in the connector-specific code.

According to the documentation, the Airbyte CDK is actually a combination of several CDKs and tools:

1. Python CDK - a developer kit that includes the foundation for low-code and no-code connectors, as well as several other Python-based implementations.
2. File CDK - a CDK for building file-based source connectors, built on the Python CDK.
3. Bulk Load CDK - a set of libraries and resources for building destinations using the Kotlin language (used for high-throughput connectors).

([Airbyte Connector Development Kits (CDKs)](https://docs.airbyte.com/platform/connector-development/local-connector-development#airbyte-connector-development-kits-cdks))

The S3 destination connector has recently been migrated to the Bulk Load CDK, a Kotlin-based CDK for high-throughput destinations, as noted in the changelog: "Migrate to Bulk Load CDK: cost reduction, perf increase, bug fix for filename clashes" ([S3 Changelog](https://docs.airbyte.com/integrations/destinations/s3#changelog)).

**What this means for your customization:** if you need to modify the actual upload logic, you'll need to:

1. Fork the Airbyte repository (for the connector) and potentially the CDK code where the upload logic resides.
2. Work with Kotlin rather than Python, since the S3 destination uses the Kotlin-based Bulk Load CDK.
3. Make your changes in the CDK code where the upload happens.
4. Build and test both the modified CDK and the connector that depends on it.

The Python CDK is maintained separately at [airbytehq/airbyte-python-cdk](https://github.com/airbytehq/airbyte-python-cdk) ([Python CDK documentation](https://docs.airbyte.com/platform/connector-development/cdk-python)), but for the S3 destination using the Bulk Load CDK, you'll be working with the Kotlin-based CDK code inside the main Airbyte repository.

This is a more involved customization than just modifying the connector configuration, as you'll be working with the underlying CDK framework itself.
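To make that concrete, here is a rough Kotlin sketch of the kind of per-object hook you would end up wiring into the modified upload path. It only illustrates the idea of "POST each uploaded object's metadata to an external endpoint"; the `MetadataNotifier` class, its `notifyUploaded` method, and the place it gets called from are all hypothetical and do not correspond to actual Bulk Load CDK classes or extension points, which you would need to locate in the CDK source yourself.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

/**
 * Hypothetical helper: after the connector finishes writing an object to S3,
 * call this to push that object's metadata to an external endpoint.
 * The names here are illustrative, not Bulk Load CDK APIs.
 */
class MetadataNotifier(private val endpoint: URI) {
    private val client = HttpClient.newHttpClient()

    fun notifyUploaded(bucket: String, key: String, sizeBytes: Long) {
        // Build a small JSON document describing the uploaded object.
        val body = """{"bucket":"$bucket","key":"$key","sizeBytes":$sizeBytes}"""
        val request = HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build()
        val response = client.send(request, HttpResponse.BodyHandlers.ofString())
        check(response.statusCode() in 200..299) {
            "Metadata endpoint rejected $key: HTTP ${response.statusCode()}"
        }
    }
}
```

In practice you would construct one notifier from the connector's configuration (the external endpoint URL) and invoke it wherever the CDK finalizes each uploaded file, then rebuild and publish the connector image as described above.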