# ask-ai
a
Hello, team! Very quick question: does development with a low-code connector mean that data will be loaded only into the airbyte_internal table and not into the destination schema, and that I should somehow add a dbt model so that it spreads the raw data into separate columns?
k
Hello! Based on the information available, it seems that the concept of `airbyte_internal` tables is related to how Airbyte handles raw data before it is processed or transformed. Here are some key points:

1. Raw Data Handling:
   - For various destinations like Oracle, MS SQL Server, and ClickHouse, Airbyte writes raw data into tables within the `airbyte_internal` schema or database. This is a common pattern where raw data is initially stored in a designated internal schema before any transformations are applied.
   - For example, in Oracle, the raw tables live in the `airbyte_internal` schema, and you can change this by supplying a value for `Raw Table Schema Name` ([source](https://docs.airbyte.com/integrations/destinations/oracle-migrations#databaseschema-and-the-internal-schema)).
2. dbt and Normalization:
   - Airbyte supports normalization through dbt (data build tool). This means that after the raw data is loaded into the `airbyte_internal` tables, you can use dbt models to transform and spread the raw data into separate columns in the final destination schema (see the sketch after this answer).
   - According to a support agent on the Airbyte forum, Airbyte normalization works only for destinations that support dbt ([source](https://discuss.airbyte.io/t/4005)).
3. Example of Raw Data Schema:
   - For Redshift, each stream is output into its own raw table containing columns like `_airbyte_raw_id`, `_airbyte_extracted_at`, `_airbyte_loaded_at`, and `_airbyte_data` ([source](https://docs.airbyte.com/integrations/destinations/redshift#output-schema)).
   - Similarly, for Firebolt, the raw table contains columns like `_airbyte_ab_id`, `_airbyte_emitted_at`, and `_airbyte_data` ([source](https://docs.airbyte.com/integrations/destinations/firebolt#output-schema)).

In summary, when using a low-code connector, the data will initially be loaded into the `airbyte_internal` table. To spread the raw data into separate columns in the destination schema, you should indeed use dbt models for normalization. For more detailed guidance on adding normalization to a connector, you can refer to the [Airbyte documentation on dbt transformations](https://docs.airbyte.com/understanding-airbyte/operations#dbt-transformations).
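To make the dbt step concrete, here is a minimal sketch of a dbt model that unpacks the raw `_airbyte_data` JSON into typed columns. It assumes a Postgres destination where `_airbyte_data` is stored as `jsonb`, a hypothetical raw table `airbyte_internal.my_stream_raw`, and hypothetical payload fields `id`, `name`, and `updated_at`; adjust every name to your own stream.

```sql
-- models/my_stream.sql  (hypothetical dbt model; all names are placeholders)
-- Unpacks the JSON payload Airbyte wrote to the raw table into typed columns.
-- Assumes a Postgres destination where _airbyte_data is stored as jsonb;
-- if your destination stores it as text, cast it with ::jsonb first.
select
    _airbyte_raw_id,
    _airbyte_extracted_at,
    (_airbyte_data ->> 'id')::bigint               as id,
    _airbyte_data ->> 'name'                       as name,
    (_airbyte_data ->> 'updated_at')::timestamptz  as updated_at
from {{ source('airbyte_internal', 'my_stream_raw') }}
```

Registering `airbyte_internal` as a dbt source in your `sources.yml` and running `dbt run` would then materialize this as a regular table or view in the destination schema.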
a
@kapa.ai Is normalization set by default in later versions of Airbyte?
k
Normalization in Airbyte is not set by default in later versions. In fact, Basic Normalization is being deprecated in favor of the new Typing and Deduping feature as part of Destinations V2. This means that normalization is only available for relational database and warehouse destinations that have not yet migrated to Destinations V2, and it will eventually be fully deprecated. For more details, you can refer to the [Basic Normalization documentation](https://docs.airbyte.com/using-airbyte/core-concepts/basic-normalization). Here are some key points from the documentation:

- Basic Normalization is packaged in a Docker image, `airbyte/normalization`, and is tied to specific Airbyte versions. It is not configurable independently the way connectors (sources & destinations) are ([source](https://docs.airbyte.com/using-airbyte/core-concepts/basic-normalization#airbyte-integrationbasesbase-normalization)).
- Basic Normalization is being removed in favor of Typing and Deduping as part of Destinations V2 ([source](https://docs.airbyte.com/using-airbyte/core-concepts/basic-normalization)).
- Normalization is only available for relational database & warehouse destinations that have not yet migrated to Destinations V2, and will eventually be fully deprecated ([source](https://docs.airbyte.com/using-airbyte/core-concepts#basic-normalization)).

If you have any specific needs or issues with normalization, it might be worth looking into the new Typing and Deduping feature or using a more fully featured transformation tool such as dbt.
a
@kapa.ai Does Destinations V2 include the Postgres destination?
k
Yes, Destinations V2 includes the Postgres destination. The migration to Destinations V2 for Postgres brings several improvements, such as one-to-one table mapping, better error handling, and incremental data delivery. Here are some key points about the Postgres destination in Destinations V2:

- One-to-one mapping: Data from one stream (endpoint or table) will now create one table in the destination.
- Improved error handling: Typing errors will no longer fail your sync and are now easily visible in a new `_airbyte_meta` column.
- Incremental data delivery: Data will be incrementally delivered in final tables for all initial snapshots.

For more details, you can refer to the [Postgres Migration Guide](https://docs.airbyte.com/integrations/destinations/postgres-migrations#upgrading-to-200) and the [February 2024 Release Notes](https://docs.airbyte.com/release_notes/february_2024#-highlights). Additionally, the Postgres destination in Destinations V2 is now generally available (GA) as of version 2.0.0, with a breaking change deadline of May 31, 2024 ([source](https://github.com/airbytehq/airbyte/discussions/34423)).
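As an illustration of the error-handling point, here is a minimal sketch of how you might surface typing problems from the `_airbyte_meta` column in a V2 Postgres final table. The table name `public.users` is a placeholder, and the `errors` key reflects the shape described for early Destinations V2 releases (newer versions may record this under a different key), so treat the predicate as an assumption to verify against your own data.

```sql
-- Hypothetical final table public.users; adjust to your own stream.
-- In Destinations V2, rows whose values could not be typed carry the
-- problem in _airbyte_meta instead of failing the whole sync.
-- The 'errors' key matches early V2 docs; verify it for your version.
SELECT
    _airbyte_raw_id,
    _airbyte_extracted_at,
    _airbyte_meta
FROM public.users
WHERE jsonb_array_length(COALESCE(_airbyte_meta -> 'errors', '[]'::jsonb)) > 0
ORDER BY _airbyte_extracted_at DESC;
```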
a
@kapa.ai I still have the problem that the destination table has null values; data only writes to airbyte_internal. Can you also explain how Airbyte understands incremental sync if it doesn't write data to the final destination tables, so the cursor value is empty?
k
If you are experiencing issues where the destination table has null values and data only writes to `airbyte_internal`, it could be related to how the incremental sync is configured and how the cursor values are managed.

### Understanding Incremental Sync and Cursor Values

Incremental syncs in Airbyte are designed to pull only the data that has changed since the previous sync. This is achieved using a cursor value, which acts as a pointer to the most recent record that has been replicated. Here's a brief overview of how Airbyte handles incremental syncs:

1. Cursor Field Determination:
   - If `source_defined_cursor` in `AirbyteStream` is true, the source determines the cursor field internally and it cannot be overridden.
   - If `cursor_field` in `ConfiguredAirbyteStream` is set, the source uses that field as the cursor.
   - If `default_cursor_field` in `AirbyteStream` is set, the source uses that field as the cursor.
   - If none of these are set, it is considered an invalid configuration ([source](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#logic-for-resolving-the-cursor-field)).
2. Incremental Sync Process:
   - During an incremental sync, Airbyte includes the cursor value as part of the query on the source system to fetch only new or updated records. A sketch of that predicate follows below.
   - The cursor value comes from Airbyte's saved state, not from reading the final destination tables, and is used in subsequent syncs to ensure only records newer than the last cursor value are fetched, which is why an empty final table does not break the cursor ([source](https://airbyte.com/tutorials/incremental-data-synchronization#step-2)).
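To make the cursor mechanics concrete, here is a minimal sketch of the kind of query an incremental source effectively issues, assuming a hypothetical source table `orders` with an `updated_at` cursor field. The cursor value is supplied from the state checkpointed at the end of the previous sync.

```sql
-- Hypothetical source table "orders" with cursor field "updated_at".
-- :last_cursor_value comes from the state Airbyte checkpointed after
-- the previous sync, not from the destination tables, so an empty
-- final table does not reset the cursor.
SELECT *
FROM orders
WHERE updated_at > :last_cursor_value  -- first sync: no predicate (full read)
ORDER BY updated_at;
```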
### Troubleshooting Null Values in Destination Tables

If data is only writing to `airbyte_internal` and not to the final destination tables, here are some steps to troubleshoot:

1. Check Cursor Configuration:
   - Ensure that the cursor field is correctly configured in your source and destination settings. Verify that `source_defined_cursor`, `cursor_field`, or `default_cursor_field` are appropriately set.
2. Verify Data Transformation:
   - If the cursor field is nested, use an "Add Field" transformation to copy it to the top level and a "Remove Field" transformation to remove it from the object. This ensures the cursor field is correctly recognized ([source](https://docs.airbyte.com/connector-development/connector-builder-ui/incremental-sync)).
3. Review Sync Logs:
   - Check the sync logs for any errors or warnings that might indicate issues with data fetching or writing. Look for messages related to cursor values or data transformations.
4. Inspect `airbyte_internal` Tables:
   - Examine the data in the `airbyte_internal` tables to understand why it is not being written to the final destination tables. Look for any discrepancies or issues with the data format (a diagnostic sketch follows below).
5. Ensure Proper State Management:
   - Make sure that state messages are being correctly output by the connector after reading all records. These state messages are crucial for checkpointing and for determining the next set of records to read in subsequent syncs ([source](https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/incremental-syncs)).

### Additional Resources

For more detailed information on incremental syncs and cursor management, you can refer to the following documentation:

- [Incremental Sync - Append](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/incremental-append)
- [Incremental Sync - Append + Deduped](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/incremental-append-deduped)
- [Airbyte Protocol - Logic for Resolving the Cursor Field](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#logic-for-resolving-the-cursor-field)
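For the inspection step above, here is a minimal diagnostic sketch, assuming a Postgres destination and a hypothetical raw table `airbyte_internal.my_stream_raw`. It leans on the `_airbyte_loaded_at` column mentioned earlier in this thread: raw rows that typing and deduping has not yet moved into the final table have no load timestamp.

```sql
-- Hypothetical raw table airbyte_internal.my_stream_raw; adjust the name.
-- Rows with a NULL _airbyte_loaded_at were extracted but never typed
-- into the final table, which is what "data only in airbyte_internal"
-- usually looks like up close.
SELECT
    COUNT(*) FILTER (WHERE _airbyte_loaded_at IS NULL)     AS not_yet_loaded,
    COUNT(*) FILTER (WHERE _airbyte_loaded_at IS NOT NULL) AS loaded,
    MAX(_airbyte_extracted_at)                             AS latest_extraction
FROM airbyte_internal.my_stream_raw;
```

A large `not_yet_loaded` count alongside an empty or stale final table suggests the typing step is failing rather than the extraction, which narrows where to look in the sync logs.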