Aziza Alkuatova
06/04/2024, 11:00 AM

kapa.ai
06/04/2024, 11:00 AM
1. Raw Data in the `airbyte_internal` Schema:
- When using a low-code connector, the raw data is first loaded into the `airbyte_internal` schema or database. This is a common pattern where raw data is initially stored in a designated internal schema before any transformations are applied.
- For example, in Oracle, the raw tables live in the `airbyte_internal` schema, and you can change this by supplying a value for Raw Table Schema Name ([source](https://docs.airbyte.com/integrations/destinations/oracle-migrations#databaseschema-and-the-internal-schema)).

2. dbt and Normalization:
- Airbyte supports normalization through dbt (data build tool). This means that after the raw data is loaded into the `airbyte_internal` tables, you can use dbt models to transform and spread the raw data into separate columns in the final destination schema.
- According to a support agent on the Airbyte forum, Airbyte normalization works only for destinations that support dbt ([source](https://discuss.airbyte.io/t/4005)).

3. Example of Raw Data Schema:
- For Redshift, each stream is output into its own raw table containing columns like `_airbyte_raw_id`, `_airbyte_extracted_at`, `_airbyte_loaded_at`, and `_airbyte_data` ([source](https://docs.airbyte.com/integrations/destinations/redshift#output-schema)).
- Similarly, for Firebolt, the raw table contains columns like `_airbyte_ab_id`, `_airbyte_emitted_at`, and `_airbyte_data` ([source](https://docs.airbyte.com/integrations/destinations/firebolt#output-schema)).

In summary, when using a low-code connector, the data will initially be loaded into the `airbyte_internal` tables. To spread the raw data into separate columns in the destination schema, you should indeed use dbt models for normalization. For more detailed guidance on adding normalization to a connector, you can refer to the [Airbyte documentation on dbt transformations](https://docs.airbyte.com/understanding-airbyte/operations#dbt-transformations).
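For illustration, here is a minimal dbt model sketch that spreads `_airbyte_data` into typed columns. It assumes a Postgres-style destination, a raw table for a stream named `users` in the `airbyte_internal` schema, and example fields `id`, `email`, and `updated_at`; the table name, fields, and JSON operators are assumptions you would adapt to your warehouse:

```sql
-- models/stg_users.sql (hypothetical dbt staging model)
-- Spreads the raw JSON payload into separate, typed columns.
select
    _airbyte_raw_id,
    _airbyte_extracted_at,
    (_airbyte_data ->> 'id')::bigint              as id,
    _airbyte_data ->> 'email'                     as email,
    (_airbyte_data ->> 'updated_at')::timestamptz as updated_at
from {{ source('airbyte_internal', 'users') }}
```

You would also need a `sources.yml` entry declaring the `airbyte_internal` schema so that `{{ source() }}` resolves.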

Aziza Alkuatova
06/04/2024, 11:42 AM

kapa.ai
06/04/2024, 11:43 AM
- Basic Normalization is implemented in `airbyte/normalization` and is tied to specific Airbyte versions; it is not configurable independently the way connectors (sources & destinations) are ([source](https://docs.airbyte.com/using-airbyte/core-concepts/basic-normalization#airbyte-integrationbasesbase-normalization)).
- Basic Normalization is being removed in favor of Typing and Deduping as part of Destinations V2 ([source](https://docs.airbyte.com/using-airbyte/core-concepts/basic-normalization)).
- Normalization is only available for relational database & warehouse destinations that have not yet migrated to Destinations V2, and it will eventually be fully deprecated ([source](https://docs.airbyte.com/using-airbyte/core-concepts#basic-normalization)).

If you have any specific needs or issues with normalization, it might be worth looking into the new Typing and Deduping feature or using a more featureful tool for transformations, such as dbt.

Aziza Alkuatova
06/04/2024, 11:48 AM

kapa.ai
06/04/2024, 11:48 AM
- Per-record metadata: final tables now include an `_airbyte_meta` column.
- Incremental data delivery: data will be incrementally delivered into final tables for all initial snapshots.

For more details, you can refer to the [Postgres Migration Guide](https://docs.airbyte.com/integrations/destinations/postgres-migrations#upgrading-to-200) and the [February 2024 Release Notes](https://docs.airbyte.com/release_notes/february_2024#-highlights). Additionally, the Postgres destination in Destinations V2 is now generally available (GA) as of version 2.0.0, with a breaking change deadline of May 31, 2024 ([source](https://github.com/airbytehq/airbyte/discussions/34423)).
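If you want to verify the new metadata after upgrading, a quick check along these lines can surface records that were flagged during typing and deduping. This is only a sketch: it assumes a final table named `public.users` and that `_airbyte_meta` is stored as JSONB with an `errors` array, so confirm the exact column shape against your connector version:

```sql
-- Hypothetical check: list records flagged during typing/deduping.
select
    _airbyte_raw_id,
    _airbyte_extracted_at,
    _airbyte_meta
from public.users
where jsonb_array_length(_airbyte_meta -> 'errors') > 0
limit 50;
```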

Aziza Alkuatova
06/05/2024, 5:41 AM

kapa.ai
06/05/2024, 5:41 AM
If your data is only being written to the `airbyte_internal` tables, it could be related to how the incremental sync is configured and how the cursor values are managed.
### Understanding Incremental Sync and Cursor Values
Incremental syncs in Airbyte are designed to pull only the data that has changed since the previous sync. This is achieved using a cursor value, which acts as a pointer to the most recent record that has been replicated. Here’s a brief overview of how Airbyte handles incremental syncs:
1. Cursor Field Determination:
- If `source_defined_cursor` in `AirbyteStream` is true, the source determines the cursor field internally and it cannot be overridden.
- If `cursor_field` in `ConfiguredAirbyteStream` is set, the source uses that field as the cursor.
- If `default_cursor_field` in `AirbyteStream` is set, the source uses that field as the cursor.
- If none of these are set, it is considered an invalid configuration ([source](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#logic-for-resolving-the-cursor-field)).
2. Incremental Sync Process:
- During an incremental sync, Airbyte includes the cursor value as part of the query on the source system to fetch only new or updated records.
- The cursor value is stored and used in subsequent syncs to ensure only records newer than the last cursor value are fetched ([source](https://airbyte.com/tutorials/incremental-data-synchronization#step-2)).
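Conceptually, the source-side query for an incremental sync looks something like the sketch below; the table name `users` and cursor column `updated_at` are only examples:

```sql
-- Sketch of the query a database source issues on an incremental sync.
-- :last_cursor_value is the cursor saved in state from the previous sync.
select *
from users
where updated_at > :last_cursor_value
order by updated_at asc;
```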
### Troubleshooting Null Values in Destination Tables
If data is only being written to `airbyte_internal` and not to the final destination tables, here are some steps to troubleshoot:
1. Check Cursor Configuration:
- Ensure that the cursor field is correctly configured in your source and destination settings. Verify that `source_defined_cursor`, `cursor_field`, or `default_cursor_field` is appropriately set.
2. Verify Data Transformation:
- If the cursor field is nested, use an "Add Field" transformation to copy it to the top level and a "Remove Field" transformation to remove it from the object. This ensures the cursor field is correctly recognized ([source](https://docs.airbyte.com/connector-development/connector-builder-ui/incremental-sync)).
3. Review Sync Logs:
- Check the sync logs for any errors or warnings that might indicate issues with data fetching or writing. Look for messages related to cursor values or data transformations.
4. Inspect `airbyte_internal` Tables:
- Examine the data in the `airbyte_internal` tables to understand why it is not being written to the final destination tables. Look for any discrepancies or issues with the data format (see the query sketch at the end of this message).
5. Ensure Proper State Management:
- Make sure that the state messages are being correctly output by the connector after reading all records. These state messages are crucial for checkpointing and determining the next set of records to read in subsequent syncs ([source](https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/incremental-syncs)).

### Additional Resources
For more detailed information on incremental syncs and cursor management, you can refer to the following documentation:
- [Incremental Sync - Append](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/incremental-append)
- [Incremental Sync - Append + Deduped](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/incremental-append-deduped)
- [Airbyte Protocol - Logic for Resolving the Cursor Field](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#logic-for-resolving-the-cursor-field)
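As a concrete starting point for step 4 above, a query along these lines can show raw records that have not yet been promoted to the final table. The raw table name here is only an example (recent Destinations V2 versions use a `{namespace}_raw__stream_{stream}` naming pattern inside `airbyte_internal`; check your own schema), and it assumes `_airbyte_loaded_at` is populated once a record has been written to the final table:

```sql
-- Hypothetical inspection of a raw table for a stream named "users".
-- Rows where _airbyte_loaded_at is NULL have not yet been typed/deduped
-- into the final destination table and are worth examining.
select
    _airbyte_raw_id,
    _airbyte_extracted_at,
    _airbyte_data
from airbyte_internal.public_raw__stream_users
where _airbyte_loaded_at is null
limit 50;
```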