I wrote a blog about an ongoing project in which I'm using Airbyte (#releases)



Robert Stolz

09/07/2021, 7:22 PM

I wrote a blog about an ongoing project in which I'm using Airbyte to ingest open-source community data. Planning to follow up in a few days with a guided tutorial. https://preset.io/blog/building-an-open-source-ingestion-layer-with-airbyte

👏 5

👏🏽 1


Michel

09/07/2021, 8:09 PM

This is great! Thanks @Robert Stolz for sharing


gunu

09/08/2021, 2:06 PM

Thanks for sharing, @Robert Stolz.
1. Do you do any additional transformations for your insights, or is it mostly just graphs on the reviews, issues, commits tables, etc.?
2. I find the basic normalisation a little too bloated. With our dbt project framework it ends up producing 3 duplicate tables: AIRBYTE_RAW, the normalised table (as source), and the dbt table (staged or final). As such, I've opted for no normalisation and parse the JSON in the dbt project. I guess this is just the ETL vs. ELT argument, but with duplicate data storage implications. I was curious about your thoughts either way.


s

09/09/2021, 5:44 AM

Thanks for the mention and feedback, Rob! Your feedback on normalization is spot on, and one of my focuses in the coming weeks will be formalizing the schema change process so you don't run into the issues you've mentioned going forward.


Robert Stolz

09/10/2021, 2:03 AM

@gunu The idea is to make an open source community data platform that you can implement your own analytic frameworks of choice on top of, so I'm doing some basic arrangement of the data (and implementing a framework or two of my own interest). I made the same choice re: #2. My preference has been to build my own transformations on the raw tables just to make schema evolution something that is more under my own control. @s Excited to see what is to come in the schema evolution department.

👍 1
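
A minimal sketch of the approach discussed above (skipping basic normalisation and building your own transformations on the raw tables), written in Python rather than dbt for brevity. It assumes each raw row carries its JSON payload in an `_airbyte_data` column alongside an `_airbyte_emitted_at` timestamp. The sample rows, the `flatten` and `parse_raw_rows` helpers, and the column selection are all hypothetical and not taken from the thread or the blog post.

```python
import json
from typing import Any, Dict, Iterable, List


def flatten(obj: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts: {"user": {"login": "x"}} -> {"user_login": "x"}."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def parse_raw_rows(rows: Iterable[Dict[str, Any]], wanted: List[str]) -> List[Dict[str, Any]]:
    """Keep only the columns we chose, so new source fields cannot change our schema."""
    parsed = []
    for row in rows:
        payload = row["_airbyte_data"]  # assumed raw JSON column
        if isinstance(payload, str):
            payload = json.loads(payload)
        flat = flatten(payload)
        # Missing fields become None instead of raising an error.
        record = {col: flat.get(col) for col in wanted}
        record["_airbyte_emitted_at"] = row["_airbyte_emitted_at"]
        parsed.append(record)
    return parsed


# Hypothetical raw rows, shaped like a raw table of GitHub issues.
raw_rows = [
    {
        "_airbyte_ab_id": "a1",
        "_airbyte_emitted_at": "2021-09-07T19:22:00Z",
        "_airbyte_data": '{"id": 1, "title": "Fix typo", "user": {"login": "octocat"}}',
    },
    {
        "_airbyte_ab_id": "a2",
        "_airbyte_emitted_at": "2021-09-08T14:06:00Z",
        # A field ("labels") that we did not ask for is simply ignored.
        "_airbyte_data": '{"id": 2, "title": "Add docs", "user": {"login": "hubot"}, "labels": ["docs"]}',
    },
]

issues = parse_raw_rows(raw_rows, wanted=["id", "title", "user_login"])
for issue in issues:
    print(issue)
```

Selecting an explicit column list is the design choice that matters here: the output schema only changes when you edit `wanted`, which is one way to keep schema evolution under your own control while storing a single raw copy of the data.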
