Airbyte #good-reads

Zapier

03/18/2025, 6:18 AM

📚 Just published a new blog post: Creating Data Pipeline with dbt & DuckDB Using Airbyte

Learn to build efficient data pipelines using Airbyte, dbt, and DuckDB. A comprehensive guide for data engineers with practical implementation steps.

Read the complete article here

Vivek Dubey

03/18/2025, 10:26 AM

📘 The Data Product Testing Strategy: Handbook - Goals & Components of the Data Product Testing Strategy, Testing Integration in the Data Product Lifecycle, Testing Facilitation Strategies & Technologies, and more!
📃 In this article, community authors take you through the importance of a metrics-first approach to developing data products. It outlines a strategy that begins with identifying business opportunities and use cases, followed by defining a metric model that establishes clear relationships between metrics. This approach ensures that data products are aligned with business goals and can be effectively validated and iterated upon.
📧 Read the complete article here: https://moderndata101.substack.com/p/the-data-product-testing-strategy

Zapier

03/18/2025, 2:10 PM

📚 Just published a new blog post: Orchestrate data ingestion and transformation pipelines with Dagster

Learn how to ingest and transform GitHub and Slack data into Postgres. Dagster can orchestrate data ingestion pipelines with Airbyte, SQL-based transformations with dbt, and any kind of Python transformation.

Read the complete article here

Zapier

03/25/2025, 4:53 AM

📚 Just published a new blog post: Building ETL Pipeline with Python, Docker, & Airbyte

Learn how to build robust ETL pipelines using Python, Docker, and Airbyte. A guide for data engineers covering setup, implementation, & best practices.

Read the complete article here

Vivek Dubey

03/26/2025, 6:26 AM

🛠️ How the Ontology Pipeline Powers Semantic Knowledge Systems: The Need for a Structured Approach, Elements of the Ontology Pipeline, the Pipeline as a Framework for Developing Knowledge Management Systems, and More!
📃 Ontologies and knowledge systems don’t have to be a black box. Structured right, they’re the secret sauce behind *AI, search, and intelligent automation*. Jessica Talisman’s latest guide breaks it all down: how the Ontology Pipeline transforms scattered data into machine-readable, scalable knowledge.
What’s inside?
• Why taxonomies, ontologies & vocabularies are key to AI-ready data
• How the Ontology Pipeline eliminates ambiguity & boosts accuracy
• The library science principles that power modern AI systems
And more! If you’re building RAG pipelines, training LLMs, or structuring enterprise knowledge, this is a must-read.
💌 Read it here: https://moderndata101.substack.com/p/the-ontology-pipeline

Vivek Dubey

04/07/2025, 7:26 AM

⚔️ From Data Tyranny to Data Democracy: How Risk-Based Governance Frameworks and Data Product Owners can transform Data Tyranny into agile, scalable Data Democratization.
📃 In this article, community author Francesco digs into a shift every data leader should be thinking about:
• From rigid, centralized control ("data tyranny")
• To agile, risk-aware data democratization
But this shift only works if we change how we think about data:
• Treat it like a long-term product, not just an asset.
• Push ownership left, closer to where data is created.
• Adopt risk-based governance that scales with context (not all data needs the same level of scrutiny).
If your governance model slows down delivery or bottlenecks innovation, it’s time to rethink it.
📧 Read the complete article here: https://moderndata101.substack.com/p/from-data-tyranny-to-data-democratization

MohOdejimi

04/12/2025, 5:51 PM

Hi @[DEPRECATED] Marcos Marx, I am a software engineer with experience in writing for tech blogs such as Baeldung and OpenReplay. Could you please guide me through the process of contributing as an author for the Airbyte blog?

Zapier

04/17/2025, 12:58 PM

📚 Just published a new blog post: A step-by-step guide to setting up and configuring Airbyte and Airflow to work together

Learn how to create an Airflow DAG (directed acyclic graph) that triggers Airbyte synchronizations.

Read the complete article here

Zapier

04/17/2025, 12:58 PM

📚 Just published a new blog post: Build a connector to extract data from the Webflow API

Learn how to create a custom Airbyte source connector. This tutorial shows you how to use Airbyte’s Python connector development kit (CDK) to create a source connector that extracts data from the Webflow API. You will learn about authentication, requesting data, and paginating through responses, as well as how to dynamically create streams and automatically extract schemas.

Read the complete article here

Zapier

04/17/2025, 1:02 PM

📚 Just published a new blog post: MySQL CDC: Build an ELT pipeline from MySQL Database

Easily set up MySQL CDC using Airbyte, harnessing the power of a robust tool like Debezium to construct a near real-time ELT pipeline.

Read the complete article here

Zapier

04/17/2025, 1:04 PM

📚 Just published a new blog post: Version control Airbyte configurations with Octavia CLI

Use Octavia CLI to import, edit, and apply Airbyte application configurations.

Read the complete article here

Vivek Dubey

05/29/2025, 5:44 AM

🏗️ *The Role of the Data Architect in AI Enablement:* Pivoting on right-sized foundations, delivered through focused vertical slices, and driven by impact, not just technology.
📃 In his latest article for Modern Data 101, community author Colin Hardie cuts through the noise to reveal the pragmatic path to AI success. He explores how Data Architects, operating at the foundational layers of the AI Hierarchy of Needs, ensure that AI initiatives are built on solid ground.
*What you’ll discover inside:*
• Why the "Data Architect" title is so misunderstood, and what it truly entails in the AI era.
• How the DIKW Pyramid and the AI Hierarchy of Needs framework guide effective AI enablement.
• The "Opera Cake Approach": delivering value through focused vertical slices, not endless foundational builds.
• 10 practical ways Data Architects enable successful AI implementations, from readiness evaluation to outcome measurement.
➡️ *Read the full article here:* https://moderndata101.substack.com/p/the-role-of-the-data-architect

Vivek Dubey

06/17/2025, 5:24 AM

💡*The Reflexive Supply Chain Stack: Sensing, Thinking, Acting*
Remember the toilet paper crisis? The garage door shortage? COVID didn't just expose cracks in our supply chains; it revealed how deeply interconnected and fragile they truly are. We didn't run out of things; we ran out of the ability to respond when the world shifted.
Our community author Sagar Paul unpacks the anatomy of this collapse and lays out a groundbreaking vision for true supply chain resilience. He cuts through the noise, explaining the infamous "bullwhip effect" and revealing where decisions die in modern operations.
What You'll Discover in This Essential Read:
• The critical lessons from the "Great Disruption"
• Why "data-driven decisions" must move beyond dashboards
• The "Thousand Brains" theory of AI
• The "Inflexion Point" where insights fail to become action
• How to build the "Action Layer"
• The future of supply chain autonomy, with accountability
Why This Matters
Supply chain resilience isn't just about recovery; it's about real-time adaptability. Sagar's insights provide the blueprint for building the "new muscle" your organization needs to navigate volatility, prevent costly delays, and ensure your operations don't just survive, but thrive amid constant change.
➡️ *Read the full article here:* https://open.substack.com/pub/moderndata101/p/the-reflexive-supply-chain-stack?utm_campaign=post&utm_medium=web

Hugo Lu

07/10/2025, 8:04 AM

https://tinyurl.com/4mwve2yk

Christopher Bergh

07/24/2025, 1:24 PM

https://datakitchen.io/fitt-data-architecture/

👀 1

Kumari Surya Remanan

08/04/2025, 9:21 AM

Hi everyone — I just submitted a GitHub Issue proposing a new blog topic: "Leveraging EV Infrastructure for Real-Time Data Synchronization Between Charging Stations and AI Models Using Airbyte" Here’s the link: https://github.com/airbytehq/airbyte/issues/64485 Would love to get feedback and see if it aligns with the community content plan!

👍 1

Bala

09/09/2025, 11:28 AM

Hello team, I submitted an article for review through the form about a month ago. Could you please let me know how long the review process generally takes?

👍 1

سیدحماد احمد

09/18/2025, 6:27 AM

I have also submitted an article (What's New in dbt 1.10) for review through the form but haven't heard back. Could you please confirm if you are reviewing the submission?

Chiara

10/23/2025, 1:12 PM

Hi everyone! I am organizing OSA Con 2025, and it's just around the corner. Lots of great speakers this year: Amazon, Snowflake, Preset, Percona, TiDB, Apple, Nutanix, Altinity, and more!
🗓️ Nov 4-5
📍ONLINE
Register here: https://osacon.io/

Vivek Dubey

10/30/2025, 4:31 AM

💠 dbt Coalesce 2025: What 14,000 Practitioners Learned
📃 Beyond tools and trends, Coalesce 2025 revealed the blueprint for data systems that can think, learn, and earn trust.
✉️ Read the complete article here: https://metadataweekly.substack.com/p/dbt-coalesce-2025-what-14000-practitioners-learned

Young

10/30/2025, 2:29 PM

🎊🎊🎊 Join us on Nov 10 in San Francisco for the next Data for AI Meetup!
🎟️ RSVP: https://luma.com/p7m6mxki?locale=en
The Agentic AI era is here, and data stacks need to catch up. Analytics used to be entirely customer-facing; now it's agent-facing, too. Join the event to see how multi-modal catalogs and real-time lakehouses can help you build data infrastructure that's ready for AI and agents.
Hear from top engineers at Uber, Pinterest, Datastrato, and VeloDB (https://www.velodb.io/):
- Lessons from Uber: Re-architecting metadata systems to manage 200B+ entries.
- From hours to seconds: How Pinterest solved its 130-petabyte data partition listing challenge.
- Designing AI-native, agent-driven data operations with Gravitino.
- Building a real-time lakehouse on AWS with Glue, S3 Tables, and Apache Doris for an AI-ready data foundation.
🍺 Drinks, food, and great conversations guaranteed 💬
Nov. 10, 17:30 - 20:30 PT @ AWS Builder Loft, San Francisco
🎟️ RSVP link at the top
Grateful to our partners at AWS and Datastrato for supporting the event 🙏

Vivek Dubey

11/04/2025, 11:46 AM

👀 The Semantic Gap: Why Your AI Still Can’t Read The Room
Brilliant piece by Vince Dacanay, Head of Data at Prodege, LLC, on Metadata Weekly:
> “Your AI can process a decade of data in seconds, but it still misses the point. It doesn’t catch the hesitation before an answer, or the unspoken politics behind a ‘yes.’”
Vince calls this semantic density: the human layer of meaning that no AI can yet read. It’s the culture, context, and intuition that sit between words and understanding.
He makes a sharp point: the best AI systems aren’t built to “understand everything.” They’re built with constraints, because the semantic gap isn’t a flaw; it’s where humans still matter most.
It shows up when:
→ Questions carry unspoken stakes.
→ Experience compresses into two words.
→ Context fills in what data can’t see.
How do you close this gap? Read the article for the complete details here: https://metadataweekly.substack.com/p/the-semantic-gap-why-your-ai-still-cant-read-the-room

Young

11/04/2025, 2:32 PM

🎟️ *Apache Doris* (https://doris.apache.org/) Summit 2025 kicks off in just 7 hours!
Join us for a full day of sessions featuring the latest innovations, user stories, and ecosystem insights around real-time analytics and search in the AI era. Check out the full agenda and register here 👇 https://lnkd.in/gECm7n4E

Vivek Dubey

11/06/2025, 1:10 PM

👑 The AI Era Runs on Context: "In the internet era, content was king. In the AI era, context is sovereign." At Re:govern 2025, 20+ of the world’s most AI-forward data teams (from Workday, Mastercard, CME Group, Dropbox, GitLab, and more) dropped real talk on what’s working, and what’s not, in the age of AI + governance.

One takeaway stood above all: Context is king.

AI readiness starts with context readiness, and the best teams are building it before the crisis hits. Missed the live action? Catch every session + recap here → https://atlan.com/regovern/?utm_medium=outreach&utm_source=slack&utm_campaign=regovern_2025

Young

11/06/2025, 1:44 PM

Webinar: Query Billions of JSON Rows and 10K+ Subcolumns in Seconds
👉 Register: https://lnkd.in/dPmtxRMf
JSON is everywhere (logs, metrics, e-commerce, IoT), but most systems still struggle to query it efficiently at scale. Join our webinar to see how VARIANT in Apache Doris delivers fast, schema-flexible JSON analytics:
1️⃣ Understand the landscape: How Elasticsearch, Snowflake, ClickHouse, and Iceberg handle JSON, and where Apache Doris stands out.
2️⃣ See how Apache Doris does it: Sparse columns, subcolumn compaction, and schema templates for performance and flexibility.
3️⃣ Watch the demo: Query 1 billion rows of JSON data in seconds on an AWS deployment.
No tricks. Just high-performance JSON analytics that scale.
📅 Nov. 20, 4:00 p.m. PT | 7:00 p.m. ET

Vivek Dubey

11/19/2025, 6:59 AM

💭 What if your AI Analyst understood your business as well as your best human analyst? Yes, it's the DREAM... but it takes a ton of Context Engineering.
That’s the question Shubham Bhargav explores in the latest Metadata Weekly edition, and IMO, if there's only one article you can read this week, read this one!
We’ve seen time and again that it’s not the models holding AI back, but the context gap: all the meaning that lives in people’s heads, not in systems, such as the definitions, judgment calls, and patterns of reasoning that make decisions make sense. We have been rolling up our sleeves getting these AI agents into production.
Shubham breaks down what it actually takes to close the context engineering gap. He dives into how data teams can build a “context supply chain,” layer semantics, and continuously refine meaning through human–AI feedback loops.
Read the complete article here: https://metadataweekly.substack.com/p/context-engineering-for-ai-analysts

Young

11/20/2025, 11:01 PM

Hey all, we're Apache Doris (https://doris.apache.org/), and our JSON analytics and VARIANT webinar starts in an hour.
👉 Join us live: https://us06web.zoom.us/j/89475839940?pwd=2edKgJFO8QDEOnE55hMc4ByDIDIC1F.1
If you work with logs, events, IoT data, or any large-scale JSON workload, this session will give you a practical breakdown of how different systems handle JSON. We’ll walk through:
1. How semi-structured analytics evolved: from TEXT and JSON to VARIANT
2. How major systems approach JSON: Apache Doris, Elasticsearch, Snowflake, ClickHouse, and Iceberg
3. How the Apache Doris VARIANT type works: sparse columns, subcolumn vertical compaction, and schema templates
4. Live demo: querying 1B rows of JSON in seconds on AWS using Apache Doris

Vivek Dubey

11/26/2025, 7:20 AM

💎 Data can look “healthy”… and your model can still drift, hallucinate, or amplify bias. That’s the observability gap Mahdi Karabiben unpacks in his latest Metadata Weekly article.
We’ve spent years solving data observability for dashboards. But in the AI era, a clean pipeline doesn’t guarantee a safe decision, because the real risk now lives at the intersection of data, model, and agent behavior.
Mahdi breaks down why unified Data + AI Observability is quickly becoming essential for trustworthy AI systems, covering:
➡️ Why good data can still create bad AI
➡️ Why alerts need to be built for agents, not humans
➡️ How lineage becomes the control plane
➡️ Why the future is about decision trust, not just data trust
If you’re aiming to build trustworthy AI systems, this one is worth the read.
Read the full article on Metadata Weekly: https://metadataweekly.substack.com/p/data-trust-to-decision-trust-the

Vivek Dubey

12/01/2025, 6:57 AM

Customer data is the highest-risk, highest-impact fuel for AI, and the easiest to get wrong. One incorrect field or outdated attribute can ripple through personalization, scoring, and support workflows within seconds.
In the latest Metadata Weekly edition, Michele Nieberding digs into what it really takes to build AI agents you can trust with customer data. She breaks down the two foundations teams can’t ignore:
➡️ Governance: ensuring data is accurate, compliant, and purpose-aligned
➡️ Context: giving AI a deep semantic understanding of what the data means
Her article covers everything from lineage as a non-negotiable to purpose-aware access, data minimization, and how to build the intelligence layer AI actually needs. If you’re building customer-data AI or experimenting with agents, you’ll want to read this one.
✉️ Read the full article on Metadata Weekly: https://metadataweekly.substack.com/p/building-ai-agents-you-can-trust

Young

12/01/2025, 1:57 PM

When evaluating Apache Doris (https://doris.apache.org/), Elasticsearch, or ClickHouse for observability, you're really deciding how to handle massive volumes of fast-moving, constantly evolving data. Four questions teams should ask:
1️⃣ How much will it cost to store all this data?
→ Elasticsearch: storage-heavy indexes, the most expensive.
→ ClickHouse: good compression, but may require more tuning.
→ Apache Doris: very high compression, 50–80% cheaper than Elasticsearch. It also offers storage-compute separation, with hot data on cloud disks and cold data in object storage.
2️⃣ Can it ingest data in real time?
→ Elasticsearch: slows under high-throughput ingest.
→ ClickHouse: strong ingest.
→ Apache Doris: 10 GB/s real-time ingest, handling PB-scale observability data daily.
3️⃣ Can it search text fast and run complex analytics?
→ Elasticsearch: made its name in full-text search, but slower at analytics.
→ ClickHouse: good at analytics; text search is still experimental.
→ Apache Doris: great at both analytics and full-text search. It combines inverted indexes with a columnar engine for 3–10x faster full-text search than ClickHouse and 6–21x better aggregation performance than Elasticsearch.
4️⃣ Will the schema break as logs evolve?
→ Elasticsearch: uses dynamic mapping, but type conflicts are painful.
→ ClickHouse: schema changes require planning.
→ Apache Doris: provides a flexible schema with the VARIANT data type, supporting field type changes as data evolves, plus large-scale JSON analytics.
🔗 See the demo on OpenTelemetry + Apache Doris + Grafana: https://lnkd.in/geS-WNty