# ask-community-for-troubleshooting
p
Hi, I’m new to Airbyte and am going through the documentation to find out how to provision a custom code pipeline (e.g. something org-specific like web scraping) in a container that is executed by Airbyte. I’m looking at the section on building custom connectors, but I have doubts that I’m reading the right sections. Any suggestions on Airbyte best practices for how to do that?
The plan is to extract data from web pages through web scraping and ingest the extracted data into our Snowflake data warehouse. So the sources are websites, the extract process would be our containerised web scraping scripts, and the destination would be Snowflake.
s
Hey, welcome to the community @Peter! :octavia-wave: Yeah, the docs can be misleading for this use case. Airbyte is built for moving data via API calls rather than doing the actual data collection itself. Typically the web scraping would happen outside of Airbyte, the scraped data would be stored somewhere (a database, for example), and then an Airbyte source connector can pull from that store's API and sync the data into the destination. So once your web scraper has an API to pull from, you can build a source connector and sync it to Snowflake 🙂
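For anyone landing here later, here's a minimal sketch of what that source connector could look like with the Python Airbyte CDK. The `scraper.internal` URL, the `pages` endpoint, and the `ScrapedPages` stream are all hypothetical stand-ins for wherever your scraper publishes its data:

```python
# Minimal sketch only -- the URL, endpoint, and field names below are
# hypothetical; adapt them to wherever your scraper stores its output.
from typing import Any, Iterable, List, Mapping, Optional, Tuple

import requests
from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream


class ScrapedPages(HttpStream):
    """Reads scraped records from the (hypothetical) internal scraper API."""

    url_base = "https://scraper.internal/api/"  # assumption: your storage layer's API
    primary_key = "id"

    def path(self, **kwargs) -> str:
        return "pages"  # assumption: endpoint returning a JSON array of records

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        return None  # no pagination in this sketch

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        yield from response.json()

    def get_json_schema(self) -> Mapping[str, Any]:
        # Permissive schema so the sketch runs without a schemas/ folder.
        return {"type": "object", "properties": {}}


class SourceScraper(AbstractSource):
    def check_connection(self, logger, config) -> Tuple[bool, Any]:
        try:
            requests.get(f"{ScrapedPages.url_base}pages", timeout=10).raise_for_status()
            return True, None
        except Exception as err:
            return False, str(err)

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        return [ScrapedPages()]
```

Note that the Snowflake side needs no custom code at all: you'd pair this source with Airbyte's existing Snowflake destination connector in the connection setup.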
p
Thank you so much for explaining. You saved me many hours of experiment.
s
No prob! Let us know if you have any other questions, either here or, for a more in-depth inquiry, on our community Discourse forum. Good luck with your development!