<@U04SE163WC8> how to define streams for unstructu...
# ask-ai
k
@kapa.ai how do I define streams for unstructured data? For example, in Couchbase we have buckets, which contain scopes, which contain collections, which contain documents. Documents inside the same collection can each have a different schema.
u
In Airbyte, a stream is a group of related records, which can correspond to a table in a relational database, a resource or API endpoint for a REST API, or the records from a directory containing many files in a filesystem [source](https://docs.airbyte.com/using-airbyte/core-concepts#stream).

For unstructured data like the documents in a Couchbase collection, you can define the schema of each stream using JSONSchema. The simplest way to do this is to describe the schema of your streams using one `.json` file per stream. You can also dynamically generate the schema of your stream in code, or you can combine both approaches: start with a `.json` file and dynamically add properties to it. The schema of a stream is the return value of `Stream.get_json_schema` [source](https://docs.airbyte.com/connector-development/cdk-python/schemas). If you'd rather define your schema in code, override `Stream.get_json_schema` in your stream class to return a `dict` describing the schema using JSONSchema [source](https://docs.airbyte.com/connector-development/cdk-python/schemas#dynamic-schemas).

For example, if you are using a data source that wraps an API with multiple different resources, each route would correspond to a stream. The JSON object returned by each route would be described in the `json_schema` field [source](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#overview).

In the case of Couchbase, you could potentially treat each collection as a separate stream. The schema for each stream can be dynamically generated based on the documents within the collection. This approach is similar to how the Webflow connector works, where it dynamically figures out what collections are available and what their schema is, and dynamically creates a unique stream for each one [source](https://docs.airbyte.com/integrations/sources/webflow#webflow).

Please note that the stream and field names can be any UTF-8 string. Destinations are responsible for cleaning these names to make them valid table and column names in their respective data stores [source](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#overview).
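
As a rough illustration of generating a schema dynamically from documents with differing shapes, here is a minimal sketch in plain Python. It assumes you have already fetched a sample of documents from one collection as dictionaries. The helper names `json_type`, `infer_schema`, and `stream_name` are hypothetical and are not part of the Airbyte CDK or the Couchbase SDK. Only top-level fields are inspected, and the `bucket.scope.collection` naming scheme is just one possible convention.

```python
from typing import Any, Dict, Iterable, Set


def json_type(value: Any) -> str:
    """Map a Python value to a JSONSchema type name."""
    if value is None:
        return "null"
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "string"  # assumption: fall back to string for anything else


def infer_schema(documents: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a JSONSchema dict covering every top-level field in the sample."""
    seen: Dict[str, Set[str]] = {}
    for doc in documents:
        for key, value in doc.items():
            seen.setdefault(key, set()).add(json_type(value))

    properties = {}
    for key, types in seen.items():
        # Every field is marked nullable, since any document may omit it.
        properties[key] = {"type": ["null"] + sorted(types - {"null"})}

    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "additionalProperties": True,  # tolerate fields missing from the sample
        "properties": properties,
    }


def stream_name(bucket: str, scope: str, collection: str) -> str:
    """Hypothetical naming convention: one stream per collection."""
    return f"{bucket}.{scope}.{collection}"


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "a"},
        {"id": "x2", "tags": ["t"]},  # same collection, different shape
    ]
    print(stream_name("travel", "inventory", "hotel"))
    print(infer_schema(sample))
```

In a Python CDK stream, a dictionary like the one returned by `infer_schema` is the kind of value an overridden `Stream.get_json_schema` could return. Because the schema is only as complete as the sample, keeping `additionalProperties` permissive is one way to avoid rejecting documents with fields that were not seen during discovery.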