# good-reads

    Vijay Anand S

    01/07/2025, 2:21 PM
    Hello everyone, we're conducting a POC using the Airbyte Cloud trial version. I'm trying to set up a connection between MS SQL Server (source) and Azure Data Lake Storage (ADLS) Blob (target). The goal is to split files before they are written to the Blob by setting the Azure Blob Storage file spill size to the desired number of MB. We also set the Azure Blob Storage output buffer size to match the spill size, so that each file would be written to the Blob only once. Despite this, Airbyte first creates the file in the Blob as 0 bytes, and the file keeps growing as the sync progresses until it completes. This is a problem because we have an Event Grid subscription that triggers a downstream task as soon as the file is created. We're looking for a way to make Airbyte write each file completely at the moment it is created in the Blob, instead of creating it empty and then appending rows to it. Note: we can't use an intermediate Blob storage before pushing the data to the target Blob (which is linked to the Event Grid). Is there any way to solve this?

    Zapier

    01/08/2025, 2:48 AM
    📚 Just published a new blog post: Deploy a Self-service Business Intelligence Project With Whaly & Airbyte | Airbyte
    Learn how to move your data to a data warehouse with Airbyte, model it, and build a self-service layer with Whaly's BI platform.
    Read the complete article here

    Zapier

    01/08/2025, 2:48 AM
    📚 Just published a new blog post: Build a connector to extract data from the Webflow API | Airbyte
    Learn how to create a custom Airbyte source connector: this tutorial shows you how to use Airbyte's Python connector development kit (CDK) to create a source connector that extracts data from the Webflow API. You will learn about authentication, requesting data, and paginating through responses, as well as how to dynamically create streams and how to automatically extract schemas.
    Read the complete article here

    Zapier

    01/14/2025, 6:52 AM
    📚 Just published a new blog post: Building a Social Media Sentiment Analyzer Using Airbyte and Twitter API | Airbyte
    Learn to build a social media sentiment analyzer using Airbyte and Twitter API. Simplify data integration and analyze trends effectively.
    Read the complete article here

    Zapier

    01/14/2025, 8:10 AM
    📚 Just published a new blog post: Financial Market Monitoring with Airbyte and Polygon.io Integration | Airbyte
    Discover financial market monitoring using Airbyte and Polygon.io integration. Streamline data for actionable insights.
    Read the complete article here

    Zapier

    01/14/2025, 8:52 AM
    📚 Just published a new blog post: Healthcare Data Integration: FHIR API Connector with Airbyte's AI Assistant | Airbyte
    Streamline healthcare data integration with Airbyte's AI Assistant and FHIR API connector. Simplify workflows and improve insights.
    Read the complete article here

    Zapier

    01/14/2025, 10:44 AM
    📚 Just published a new blog post: Creating a GitHub Documentation Chatbot Using PyAirbyte and pgvector | Airbyte
    Learn how to build a GitHub documentation chatbot with PyAirbyte and pgvector for seamless data retrieval and enhanced user experience.
    Read the complete article here

    Vivek Dubey

    01/15/2025, 6:55 AM
    ๐Ÿ–ฅ๏ธ Evolving Data Models: Backbone of Rich User Experiences (UX) for Data Citizens - Hegel's Framework to Distill Value of Data Models underneath ANY and ALL User Interfaces, Good Traits of UX/UI in Data, and Addressing Fundamental User Emotions ๐Ÿ“ƒ In this article, you'll explore how the relationship between users and products shapes user experience (UX), particularly for data users. Weโ€™ll dive into the role of evolving data models as key enablers, examine UX evolution through Hegelโ€™s stages of consciousness, and uncover the traits of exceptional UX in data platforms. By the end, youโ€™ll gain fresh insights into designing better experiences for data-driven tools. ๐Ÿ’Œ Read the complete article here: https://moderndata101.substack.com/p/evolving-data-models-backbone-of

    Sergio Ramos

    01/17/2025, 4:04 PM
    Hi guys! Just finished reading *Fundamentals of Data Engineering* and wrote up a review in case anyone is interested! https://medium.com/@sergioramos3.sr/self-taught-reviews-fundamentals-of-data-engineering-by-joe-reis-and-matt-housley-36b66ec9cb23

    Zapier

    01/20/2025, 5:07 AM
    📚 Just published a new blog post: Explore Airbyte's full refresh data synchronization | Airbyte
    Step-by-step instructions that help you understand how Airbyte's full refresh overwrite and full refresh append synchronization modes work behind the scenes.
    Read the complete article here

    Zapier

    01/20/2025, 5:25 AM
    📚 Just published a new blog post: Incremental data synchronization between Postgres databases | Airbyte
    Learn how Airbyte's incremental synchronization replication modes work.
    Read the complete article here

    Zapier

    01/20/2025, 6:40 AM
    📚 Just published a new blog post: Automating Customer Support Analytics: Zendesk + Airbyte + OpenAI Integration | Airbyte
    Automate customer support analytics with Zendesk, Airbyte, and OpenAI integration. Unlock insights and enhance support efficiency.
    Read the complete article here

    Zapier

    01/20/2025, 7:37 AM
    📚 Just published a new blog post: Building a Knowledge Management System with PyAirbyte and Vector Databases | Airbyte
    Discover how to build efficient knowledge management systems using PyAirbyte and vector databases for streamlined data access.
    Read the complete article here

    Vivek Dubey

    01/20/2025, 9:10 AM
    🤖 How AI Agents & Data Products Work Together to Support Cross-Domain Queries & Decisions for Businesses: The Two Primary Gaps in AI's Business Enablement Capabilities and the Solution Framework Addressing Both Data and AI Stack Essentials
    📃 The article explores how AI agents and data products collaborate to address cross-domain queries and decisions, highlighting gaps in LLMs like context isolation and task execution. Readers will learn about multi-agent workflows, reinforcement learning with RAG, semantic layers, knowledge graphs, and data governance to enable accurate, scalable AI-driven business insights.
    💌 Read the complete article here: https://moderndata101.substack.com/p/how-ai-agents-and-data-products-work

    Vivek Dubey

    01/27/2025, 7:24 AM
    🚅 Speed-to-Value Funnel: Data Products + Platform and Where to Close the Gaps - Fundamental approach to building products, platforms, and user-relevant data interfaces
    📃 In this article, community author Travis Thompson discusses the Speed-to-Value Funnel for turning data into actionable insights fast. Learn how to define success metrics, prioritize impactful data strategies, build deployable data product MVPs, and boost adoption. Explore frameworks, metric modeling, and evolving data products to align speed, precision, and user-focused development for maximum ROI.
    ✉️ Read the complete article here: https://moderndata101.substack.com/p/speed-to-value-funnel-data-products

    Zapier

    02/11/2025, 5:22 AM
    📚 Just published a new blog post: How to Create an LLM Application with ChromaDB & Airbyte | Airbyte
    Learn how to build a robust Large Language Model application using ChromaDB for vector storage and Airbyte for data integration, simplifying your AI development workflow.
    Read the complete article here

    Vivek Dubey

    03/03/2025, 7:01 AM
    ๐Ÿ—๏ธ Building Supply Chains From Within: Strategic Data Products (Part 1/2): Building Strong & Dynamic Supply Chain Models with a Strategic Approach to Data Products, Data Platforms, and AI Agents. > ๐Ÿ“ƒ In this article by community authors Arielle Rolland and Alexandre Gontcharov talks about - > โ€ข The Evolution of the Supply Chain > โ€ข Why Your Supply Chain is NOT linear > โ€ข Building a Strong Data Foundation Means Moving Beyond Process Mining > โ€ข How to Embrace the Future of GenAI ๐Ÿ“ง Read the complete article here: https://moderndata101.substack.com/p/strategic-data-products-building-supply-chains-from-within

    Vivek Dubey

    03/12/2025, 9:41 AM
    ๐Ÿ—๏ธ ๐“๐ก๐ž ๐‚๐ฎ๐ซ๐ซ๐ž๐ง๐ญ ๐ƒ๐š๐ญ๐š ๐’๐ญ๐š๐œ๐ค ๐ข๐ฌ ๐“๐จ๐จ ๐‚๐จ๐ฆ๐ฉ๐ฅ๐ž๐ฑ: 70% ๐ƒ๐š๐ญ๐š ๐‹๐ž๐š๐๐ž๐ซ๐ฌ & ๐๐ซ๐š๐œ๐ญ๐ข๐ญ๐ข๐จ๐ง๐ž๐ซ๐ฌ ๐€๐ ๐ซ๐ž๐ž The modern data stack promised speed, scalability, and flexibilityโ€”but in reality, it has led to fragmentation, operational overhead, and rising costs. Every new tool claims to simplify, yet the stack only grows more complex, harder to manage, and increasingly expensive. A quick search on data stack complexity will surface common issues: tool sprawl, governance gaps, and interoperability struggles. But these are just symptoms. Whatโ€™s the root cause? And what does it mean for the future of data infrastructure? ๐Ÿ’  Why has the data stack become so complicated? ๐Ÿ’  What are the real pain points practitioners face daily? ๐Ÿ’  Is simplification even possible at this stage? In this latest Modern Data 101 article, our community authorsโ€”Saurabh Gupta, Animesh Kumar, and Matt Lampeโ€”cut through the noise to break down the core challenges of todayโ€™s data stack and explore whether the future lies in evolution or revolution. ๐–๐ก๐š๐ญ ๐ฒ๐จ๐ฎโ€™๐ฅ๐ฅ ๐ฅ๐ž๐š๐ซ๐ง ๐ข๐ง ๐ญ๐ก๐ข๐ฌ ๐๐ž๐ญ๐š๐ข๐ฅ๐ž๐ ๐ฌ๐ญ๐ฎ๐๐ฒ? ๐‡๐ž๐ซ๐žโ€™๐ฌ ๐š ๐ญ๐š๐ฌ๐ญ๐ž - ๐Ÿ’  Why thereโ€™s a disconnect between promise and reality in data tooling ๐Ÿ’  Why 70% of data leaders see complexity as a bottleneck ๐Ÿ’  The stack-wide impact & outcome of modular architectures ๐Ÿ’  Is consolidation or abstraction the right way forward? And much more! ๐Ÿ“ฉ ๐‘๐ž๐š๐ ๐ญ๐ก๐ž ๐œ๐จ๐ฆ๐ฉ๐ฅ๐ž๐ญ๐ž ๐š๐ซ๐ญ๐ข๐œ๐ฅ๐ž ๐ก๐ž๐ซ๐ž: https://moderndata101.substack.com/p/the-current-data-stack-is-too-complex

    Zapier

    03/18/2025, 6:18 AM
    📚 Just published a new blog post: Creating Data Pipeline with dbt & DuckDB Using Airbyte | Airbyte
    Learn to build efficient data pipelines using Airbyte, dbt, and DuckDB. A comprehensive guide for data engineers with practical implementation steps.
    Read the complete article here

    Vivek Dubey

    03/18/2025, 10:26 AM
    📘 The Data Product Testing Strategy: Handbook - Goals & Components of the Data Product Testing Strategy, Testing Integration in the Data Product Lifecycle, Testing Facilitation Strategies & Technologies, and more!
    📃 In this article, community authors take you through the importance of a metrics-first approach in developing data products. It outlines a strategy that begins with identifying business opportunities and use cases, followed by defining a metric model to establish clear relationships between metrics. This approach ensures that data products are aligned with business goals and can be effectively validated and iterated upon.
    📧 Read the complete article here: https://moderndata101.substack.com/p/the-data-product-testing-strategy

    Zapier

    03/18/2025, 2:10 PM
    📚 Just published a new blog post: Orchestrate data ingestion and transformation pipelines with Dagster | Airbyte
    Learn how to ingest and transform GitHub and Slack data into Postgres. Dagster can orchestrate data ingestion pipelines with Airbyte, SQL-based transformations with dbt, and any kind of Python transformation.
    Read the complete article here

    Zapier

    03/25/2025, 4:53 AM
    📚 Just published a new blog post: Building ETL Pipeline with Python, Docker, & Airbyte | Airbyte
    Learn how to build robust ETL pipelines using Python, Docker, and Airbyte. A guide for data engineers covering setup, implementation, & best practices.
    Read the complete article here

    Vivek Dubey

    03/26/2025, 6:26 AM
    ๐Ÿ› ๏ธ How the Ontology Pipeline Powers Semantic Knowledge Systems: The Need for a Structured Approach, Elements of the Ontology Pipeline, the Pipeline as a Framework for Developing Knowledge Management Systems, and More! ๐Ÿ“ƒ Ontologies and knowledge systems donโ€™t have to be a black box. Theyโ€™re the secret sauce behind *AI, search, and intelligent automation*โ€”if structured right. Jessica Talismanโ€™s latest guide breaks it all down: How the Ontology Pipeline transforms scattered data into machine-readable, scalable knowledge. Whatโ€™s inside? โ€ข Why taxonomies, ontologies & vocabularies are key to AI-ready data โ€ข How the Ontology Pipeline eliminates ambiguity & boosts accuracy โ€ข The library science principles that power modern AI systems And more! If youโ€™re building RAG pipelines, training LLMs, or structuring enterprise knowledge, this is a must-read. ๐Ÿ’Œ Read it here: https://moderndata101.substack.com/p/the-ontology-pipeline

    Vivek Dubey

    04/07/2025, 7:26 AM
    โš”๏ธ From Data Tyranny to Data Democracy: How Risk-Based Governance Frameworks and Data Product Owners can transform Data Tyranny into agile, scalable Data Democratization. ๐Ÿ“ƒ In this article, community author Francesco dig into a shift every data leader should be thinking about: โ€ข From rigid, centralized control ("data tyranny") โ€ข To agile, risk-aware data democratization. But this shift only works if we change how we think about data: โ€ข Treat it like a long-term product, not just an asset. โ€ข Push ownership leftโ€”closer to where data is created. โ€ข Adopt risk-based governance that scales with context (not all data needs the same level of scrutiny). If your governance model slows down delivery or bottlenecks innovation, itโ€™s time to rethink it. ๐Ÿ“ง Read the complete article here: https://moderndata101.substack.com/p/from-data-tyranny-to-data-democratization

    MohOdejimi

    04/12/2025, 5:51 PM
    Hi @[DEPRECATED] Marcos Marx, I am a software engineer with experience in writing for tech blogs such as Baeldung and OpenReplay. Could you please guide me through the process of contributing as an author for the Airbyte blog?

    Zapier

    04/17/2025, 12:58 PM
    📚 Just published a new blog post: A step-by-step guide to setting up and configuring Airbyte and Airflow to work together | Airbyte
    Learn how to create an Airflow DAG (directed acyclic graph) that triggers Airbyte synchronizations.
    Read the complete article here

    Zapier

    04/17/2025, 12:58 PM
    📚 Just published a new blog post: Build a connector to extract data from the Webflow API | Airbyte
    Learn how to create a custom Airbyte source connector: this tutorial shows you how to use Airbyte's Python connector development kit (CDK) to create a source connector that extracts data from the Webflow API. You will learn about authentication, requesting data, and paginating through responses, as well as how to dynamically create streams and how to automatically extract schemas.
    Read the complete article here

    Zapier

    04/17/2025, 1:02 PM
    📚 Just published a new blog post: MySQL CDC: Build an ELT pipeline from MySQL Database | Airbyte
    Easily set up MySQL CDC using Airbyte, harnessing the power of a robust tool like Debezium to construct a near real-time ELT pipeline.
    Read the complete article here

    Zapier

    04/17/2025, 1:04 PM
    📚 Just published a new blog post: Version control Airbyte configurations with Octavia CLI | Airbyte
    Use Octavia CLI to import, edit, and apply Airbyte application configurations.
    Read the complete article here

    Vivek Dubey

    05/06/2025, 6:06 AM
    โ›“๏ธ Data Lineage is Strategy: Beyond Observability and Debugging - Gaps in passive lineage, how and why Data Products change and uplift lineage, and notes on stepping up to the AI-native era. ๐Ÿ“ƒ Letโ€™s be honest, most data teams only turn to lineage when things break. Itโ€™s reactive. Fragmented. Siloed. But in modern, modular data stacks (especially AI-native ones), thatโ€™s not enough. Thatโ€™s where Strategic Lineage comes in. Itโ€™s not about watching what happened. Itโ€™s about designing for what should happen starting day one. In this article, community author makes the case for treating lineage as a product capability, not an ops tool. He argues that true lineage must be intentionally designed into how data products are built, governed, and consumed. โœ‰๏ธ Read the complete article here: https://moderndata101.substack.com/p/data-lineage-is-strategy-beyond-observability