# ask-ai
@kapa.ai I need help creating a StateDelegatingStream declarative stream

A message from kapa.ai
To create a `StateDelegatingStream` declarative stream in Airbyte, you need to define a stream that can switch between a full refresh and an incremental sync mode based on the presence or absence of state. This is useful when you want to optimize data extraction by using incremental syncs when possible, but fall back to full refresh if no state is available. Here's how you can define a `StateDelegatingStream` in your connector's YAML configuration:

```yaml
streams:
  - type: StateDelegatingStream
    name: my_state_delegating_stream
    full_refresh_stream:
      # Define your full refresh DeclarativeStream here
      type: DeclarativeStream
      name: my_full_refresh_stream
      retriever:
        # retriever configuration for full refresh
    incremental_stream:
      # Define your incremental DeclarativeStream here
      type: DeclarativeStream
      name: my_incremental_stream
      retriever:
        # retriever configuration for incremental sync
      incremental_sync:
        # incremental sync configuration (e.g., DatetimeBasedCursor)
```
Key properties:
- `name`: The name of your state delegating stream.
- `full_refresh_stream`: A `DeclarativeStream` definition used when no state is present.
- `incremental_stream`: A `DeclarativeStream` definition used when state is present (i.e., for incremental syncs).

Note:
- This component is marked as experimental, so use it with caution.
- You can also use the `$parameters` property to pass parameters to child components if needed.

For more details, see the official documentation for the [StateDelegatingStream component](https://docs.airbyte.com/platform/connector-development/config-based/understanding-the-yaml-file/reference#/definitions/StateDelegatingStream) and its usage in declarative connectors. If you need a more concrete example or have a specific use case, please provide more details!
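To make the skeleton above more concrete, here is a hedged sketch with hypothetical names filled in (`api.example.com`, the `orders` endpoint, and the `updated_at` field are illustrative, not part of any real connector; the component is experimental, so verify the shape against the current schema reference):

```yaml
streams:
  - type: StateDelegatingStream
    name: orders
    full_refresh_stream:
      type: DeclarativeStream
      name: orders
      retriever:
        type: SimpleRetriever
        requester:
          type: HttpRequester
          url_base: "https://api.example.com"  # hypothetical API
          path: "/orders"
        record_selector:
          type: RecordSelector
          extractor:
            type: DpathExtractor
            field_path: ["orders"]
    incremental_stream:
      type: DeclarativeStream
      name: orders
      retriever:
        type: SimpleRetriever
        requester:
          type: HttpRequester
          url_base: "https://api.example.com"  # hypothetical API
          path: "/orders"
        record_selector:
          type: RecordSelector
          extractor:
            type: DpathExtractor
            field_path: ["orders"]
      incremental_sync:
        type: DatetimeBasedCursor
        cursor_field: updated_at  # hypothetical cursor field
        datetime_format: "%Y-%m-%dT%H:%M:%SZ"
        start_datetime: "{{ config['start_date'] }}"
```

Both child streams share the same retriever shape; only the incremental one carries an `incremental_sync` block, which is what the delegation switches on when state is present.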
@kapa.ai I keep getting this build error:
```text
resolve: process "python /airbyte/integration_code/main.py spec" did not complete successfully: exit code: 1

Stdout:
{"type":"TRACE","trace":{"type":"ERROR","emitted_at":1753129267672,"error":{"message":"Error starting the sync. This could be due to an invalid configuration or catalog. Please contact Support for assistance. Error: Validation against json schema defined in declarative_component_schema.yaml schema failed","stack_trace":"Traceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/manifest_declarative_source.py\", line 394, in _validate_source\n    validate(self._source_config, self._declarative_component_schema)\n  File \"/usr/local/lib/python3.11/site-packages/jsonschema/validators.py\", line 1121, in validate\n    raise error\njsonschema.exceptions.ValidationError: 'StateDelegatingStream' is not one of ['DeclarativeStream']\n\nFailed validating 'enum' in schema[0]['properties']['type']:\n    {'enum': ['DeclarativeStream'], 'type': 'string'}\n\nOn instance['type']:\n    'StateDelegatingStream'\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File \"/airbyte/integration_code/source_declarative_manifest/run.py\", line 95, in _get_local_yaml_source\n    return SourceLocalYaml(config=config, catalog=catalog, state=state)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/airbyte/integration_code/source_declarative_manifest/run.py\", line 72, in __init__\n    super().__init__(\n  File \"/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/yaml_declarative_source.py\", line 34, in __init__\n    super().__init__(\n  File \"/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/concurrent_declarative_source.py\", line 93, in __init__\n    super().__init__(\n  File \"/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/manifest_declarative_source.py\", line 138, in __init__\n    self._validate_source()\n  File \"/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/manifest_declarative_source.py\", line 396, in _validate_source\n    raise ValidationError(\njsonschema.exceptions.ValidationError: Validation against json schema defined in declarative_component_schema.yaml schema failed\n"}}}

Stderr:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/manifest_declarative_source.py", line 394, in _validate_source
    validate(self._source_config, self._declarative_component_schema)
  File "/usr/local/lib/python3.11/site-packages/jsonschema/validators.py", line 1121, in validate
    raise error
jsonschema.exceptions.ValidationError: 'StateDelegatingStream' is not one of ['DeclarativeStream']

Failed validating 'enum' in schema[0]['properties']['type']:
    {'enum': ['DeclarativeStream'], 'type': 'string'}

On instance['type']:
    'StateDelegatingStream'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/airbyte/integration_code/main.py", line 4, in <module>
    run()
  File "/airbyte/integration_code/source_declarative_manifest/run.py", line 236, in run
    handle_command(args)
  File "/airbyte/integration_code/source_declarative_manifest/run.py", line 87, in handle_command
    handle_local_manifest_command(args)
  File "/airbyte/integration_code/source_declarative_manifest/run.py", line 118, in handle_local_manifest_command
    source = _get_local_yaml_source(args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/airbyte/integration_code/source_declarative_manifest/run.py", line 114, in _get_local_yaml_source
    raise error
  File "/airbyte/integration_code/source_declarative_manifest/run.py", line 95, in _get_local_yaml_source
    return SourceLocalYaml(config=config, catalog=catalog, state=state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/airbyte/integration_code/source_declarative_manifest/run.py", line 72, in __init__
    super().__init__(
  File "/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/yaml_declarative_source.py", line 34, in __init__
    super().__init__(
  File "/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/concurrent_declarative_source.py", line 93, in __init__
    super().__init__(
  File "/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/manifest_declarative_source.py", line 138, in __init__
    self._validate_source()
  File "/usr/local/lib/python3.11/site-packages/airbyte_cdk/sources/declarative/manifest_declarative_source.py", line 396, in _validate_source
    raise ValidationError(
jsonschema.exceptions.ValidationError: Validation against json schema defined in declarative_component_schema.yaml schema failed
```
A message from kapa.ai
Your error message indicates that the value `'StateDelegatingStream'` is not accepted for the `type` property in your stream definition. The schema expects only `'DeclarativeStream'` as a valid value for `type`:
```text
'StateDelegatingStream' is not one of ['DeclarativeStream']

Failed validating 'enum' in schema[0]['properties']['type']:
    {'enum': ['DeclarativeStream'], 'type': 'string'}

On instance['type']:
    'StateDelegatingStream'
```
According to the Airbyte documentation, the `StateDelegatingStream` is a valid component, but it is not meant to be used as the top-level stream type in your YAML. Instead, your top-level stream should always have `type: DeclarativeStream`, and you can use `StateDelegatingStream` as a component within the stream definition, such as in the `retriever` or other nested properties. The schema for streams only allows `DeclarativeStream` as the type at the top level, not `StateDelegatingStream` [Understanding the YAML file](https://docs.airbyte.com/platform/connector-development/config-based/understanding-the-yaml-file/yaml-overview).

How to fix:
- Change your stream definition so that the top-level `type` is `DeclarativeStream`.
- Use `StateDelegatingStream` as a component within the stream, for example as the `retriever`.

Example:
```yaml
streams:
  - type: DeclarativeStream
    name: my_stream
    retriever:
      type: StateDelegatingStream
      name: my_state_delegating_stream
      full_refresh_stream:
        # your full refresh DeclarativeStream config
      incremental_stream:
        # your incremental DeclarativeStream config
```
This structure matches the schema and should resolve your validation error. If you need more details on the schema, you can review the [declarative_component_schema.yaml](https://github.com/airbytehq/airbyte-python-cdk/blob/main/airbyte_cdk/sources/declarative/declarative_component_schema.yaml) as referenced in the documentation [Component Schema Reference](https://docs.airbyte.com/platform/connector-development/config-based/advanced-topics/component-schema-reference). If you continue to have issues, please share your YAML snippet for more targeted help.
@kapa.ai How can I do a non-datetime cursor for incremental sync?
A message from kapa.ai
To use a non-datetime cursor (such as an incrementing integer or string) for incremental sync in Airbyte, you need to select a field that is monotonically increasing (e.g., an auto-incrementing ID or a string that increases lexicographically) and configure it as the cursor field for your stream. For example, both the BigQuery and Snowflake source connectors support using numeric or string fields as cursor fields for incremental syncs.

The requirements are:
- The cursor field should be monotonically increasing (e.g., an integer ID, a timestamp, or a string that increases over time).
- The field should not contain null values.
- For best performance, the field should be indexed or aligned with partitioning/clustering if possible.

Supported non-datetime cursor types include:
- Numeric types (e.g., INT64, INTEGER, BIGINT, etc.)
- String types (e.g., STRING, VARCHAR, etc.)

Example for BigQuery:
- INT64 (auto-incrementing IDs) and STRING are supported as cursor fields, though numeric/date types are preferred for performance reasons [BigQuery Supported Sync Modes](https://docs.airbyte.com/integrations/sources/bigquery#supported-sync-modes).

Example for Snowflake:
- Numeric types (e.g., INTEGER, BIGINT, DECIMAL) and string types (e.g., VARCHAR) are supported as cursor fields [Snowflake Incremental Sync](https://docs.airbyte.com/integrations/sources/snowflake#incremental-sync).

How to configure:
- In the Airbyte UI, when setting up your connection, select "Incremental" sync mode and choose your desired non-datetime field as the cursor field from the dropdown.
- In a declarative YAML connector, set the `cursor_field` property to the path of your chosen field.

Note: The field must be present in every record, and for best results, it should be unique or at least not have many duplicate values. If you need a YAML example or more details for a specific connector, let me know!
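As a rough declarative sketch of the YAML side, an integer cursor might look like the following (everything here is illustrative: the endpoint, the `id` field, and the `since_id` query parameter are hypothetical, and `{{ stream_state['id'] }}` assumes the interpolation context described in the low-code docs):

```yaml
my_stream:
  type: DeclarativeStream
  name: users
  stream_cursor_field: id  # integer cursor, not a datetime
  retriever:
    type: SimpleRetriever
    requester:
      type: HttpRequester
      url_base: "https://api.example.com"  # hypothetical API
      path: "/users"
      http_method: GET
      request_parameters:
        # resume from the last checkpointed id (hypothetical API parameter)
        since_id: "{{ stream_state['id'] }}"
    record_selector:
      type: RecordSelector
      extractor:
        type: DpathExtractor
        field_path: ["results"]
```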
Can you show me the YAML reference please @kapa.ai
A message from kapa.ai
Certainly! For non-datetime cursors in incremental syncs using Airbyte's declarative YAML, you do not use the `DatetimeBasedCursor` component. Instead, you configure the `stream_cursor_field` property in your stream definition to point to your chosen field (such as an integer or string), and Airbyte will use that field for incremental syncs. Here's a reference snippet from the YAML schema for streams:
```yaml
DeclarativeStream:
  type: DeclarativeStream
  name: my_stream
  retriever:
    # retriever configuration here
  stream_cursor_field: my_id_field  # This can be a string or an array of strings for nested fields
```
- `stream_cursor_field`: The field of the records being read that will be used during checkpointing. This can be a string (for a top-level field) or an array of strings (for nested fields).

Note:
- For non-datetime cursors, you do not specify an `incremental_sync` block with `DatetimeBasedCursor`.
- The incremental sync logic will use the value of `stream_cursor_field` to track progress.

You can find this in the YAML reference documentation for the declarative low-code framework [Understanding the YAML file](https://docs.airbyte.com/platform/connector-development/config-based/understanding-the-yaml-file/yaml-overview):

> The stream object is represented in the YAML file as:
> ```yaml
> DeclarativeStream:
>   ...
>   stream_cursor_field:
>     definition: The field of the records being read that will be used during checkpointing
>     anyOf:
>       - type: string
>       - type: array
>         items:
>           - type: string
> ```
If you need to see a full example or have a specific use case, let me know!
@kapa.ai does this support incremental syncs though
A message from kapa.ai
Yes, Airbyte supports incremental syncs using non-datetime cursor fields, such as integers or strings, in some connectors. For example, the BigQuery source connector allows you to use any monotonically increasing field, including `INT64` (integer) or `STRING`, as the cursor field for incremental syncs. The connector tracks the maximum cursor value after each sync and resumes from there in subsequent syncs. The only requirements are that the cursor field must be monotonically increasing and non-null for all records [BigQuery Supported Sync Modes](https://docs.airbyte.com/integrations/sources/bigquery#supported-sync-modes).

However, in the context of Airbyte's declarative YAML low-code framework, the official documentation and YAML reference primarily describe incremental syncs using the `DatetimeBasedCursor` component, which is specifically for datetime fields [Understanding the YAML file](https://docs.airbyte.com/platform/connector-development/config-based/understanding-the-yaml-file/yaml-overview). There is no explicit documentation in the provided sources about configuring incremental syncs with non-datetime cursor fields (like integers or strings) using the declarative YAML approach.

Summary:
- Yes, incremental syncs with non-datetime cursors are supported in some connectors (like BigQuery) via the UI and connector code.
- The declarative YAML framework's documented incremental sync support is focused on datetime-based cursors. There is no explicit YAML reference for non-datetime incremental cursors in the provided documentation.

If you need to implement incremental syncs with a non-datetime cursor in a declarative YAML connector, the knowledge sources do not provide enough information to confirm this is currently supported.
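If the built-in cursors don't cover your case, the declarative framework's general escape hatch is a custom component implemented in the connector's own Python package. A hedged sketch of what that configuration might look like (the `class_name` path and class are hypothetical, and you would supply the cursor logic yourself; check the current component schema for the exact custom-component type names before relying on this):

```yaml
incremental_sync:
  type: CustomIncrementalSync
  class_name: "source_my_connector.components.IntegerCursor"  # hypothetical class
```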
@kapa.ai what is a DeclarativeCursor and when to use it?
A message from kapa.ai
I'm sorry, but the knowledge sources provided do not contain any information about a component or concept called DeclarativeCursor. The documentation covers concepts such as `DatetimeBasedCursor` for datetime-based incremental syncs, but does not mention a `DeclarativeCursor` or provide details about its usage or when to use it. If you meant another cursor type (such as `DatetimeBasedCursor` or using a cursor field for incremental syncs), please clarify, and I can provide information based on the available documentation.