Henrik Nilsson
09/24/2024, 6:02 AM
user

09/24/2024, 6:02 AM
1. Set up the destination. Set up the Redshift destination connector in Airbyte [source](https://docs.airbyte.com/integrations/destinations/redshift#step-2-set-up-the-destination-connector-in-airbyte).
2. Configure a Postgres source. Create a Postgres source called "Postgres-source" [source](https://airbyte.com/tutorials/full-data-synchronization#step-5).
3. Understand the output schema. Each stream will be output into its own raw table in Redshift. Each table will contain 4 columns: `_airbyte_raw_id`, `_airbyte_extracted_at`, `_airbyte_loaded_at`, and `_airbyte_data`. The `_airbyte_data` column is a JSON blob representing the event data; its column type in Redshift is `SUPER` [source](https://docs.airbyte.com/integrations/destinations/redshift#output-schema).
4. Understand the data type map. Airbyte maps its data types to Redshift data types. For example, the Airbyte types `OBJECT` and `ARRAY` are mapped to the Redshift type `SUPER` [source](https://docs.airbyte.com/integrations/destinations/redshift#data-type-map).
5. Run the sync. After setting up the source and destination, you can run the sync. Airbyte will first upload the data to an S3 bucket and then issue a COPY command to Redshift [source](https://docs.airbyte.com/integrations/destinations/redshift#redshift).

Please note that there are some limitations when dealing with the SUPER data type in Redshift. For example, an individual field within a SUPER value cannot exceed 65k characters [source](https://github.com/airbytehq/airbyte/issues/28541). If you encounter issues with large data, you might need to consider normalizing the `_airbyte_data` field from JSON back to a columnar format outside of Redshift before attempting to load it into Redshift [source](https://github.com/airbytehq/airbyte/issues/28541).
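If you do need to pre-flatten the raw records yourself, a minimal sketch of that normalization step could look like the following, assuming the events are available locally as newline-delimited JSON and that pandas and pyarrow are installed (the file names here are hypothetical):

```python
# Minimal sketch (hypothetical file names): flatten nested JSON event data
# into a columnar file before loading, so no single value has to fit inside
# one Redshift SUPER field.
import json

import pandas as pd

# Read newline-delimited JSON records, one event per line.
with open("raw_events.jsonl", "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

# json_normalize expands nested objects into flat, dot-separated columns.
flat = pd.json_normalize(records)

# Write a columnar file that can then be staged in S3 and COPYed into Redshift.
flat.to_parquet("events_flat.parquet", index=False)
```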

Henrik Nilsson
09/24/2024, 6:15 AM
user

09/24/2024, 6:16 AM
You can define your stream's schema either by providing a `.json` file per stream, or by dynamically generating the schema of your stream in code. The schema of a stream is the return value of `Stream.get_json_schema`. For example, if you want to define your schema in code, you can override `Stream.get_json_schema` in your stream class to return a `dict` describing the schema using JSONSchema. Here's an example:

```python
def get_json_schema(self):
    schema = super().get_json_schema()
    schema['dynamically_determined_property'] = "property"
    return schema
```
[Source](https://docs.airbyte.com/connector-development/cdk-python/full-refresh-stream#defining-the-streams-schema)
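For a fuller picture, a minimal sketch of a stream that defines its whole schema in code (rather than shipping a `schemas/<stream_name>.json` file) might look like this; the `OrdersStream` class, its fields, and the empty `read_records` body are hypothetical and only illustrate where `get_json_schema` fits:

```python
# Minimal sketch (hypothetical stream and fields): defining the whole schema
# in code by overriding Stream.get_json_schema, instead of shipping a
# schemas/<stream_name>.json file with the connector.
from typing import Any, Iterable, List, Mapping, Optional

from airbyte_cdk.models import SyncMode
from airbyte_cdk.sources.streams import Stream


class OrdersStream(Stream):
    primary_key = "id"

    def get_json_schema(self) -> Mapping[str, Any]:
        # Return a JSONSchema dict describing the records this stream emits.
        return {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "status": {"type": ["null", "string"]},
                "updated_at": {"type": ["null", "string"], "format": "date-time"},
            },
        }

    def read_records(
        self,
        sync_mode: SyncMode,
        cursor_field: Optional[List[str]] = None,
        stream_slice: Optional[Mapping[str, Any]] = None,
        stream_state: Optional[Mapping[str, Any]] = None,
    ) -> Iterable[Mapping[str, Any]]:
        # Record fetching is out of scope here; only the schema definition matters.
        yield from []
```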
If you're setting up a destination like Postgres, you can specify the target schema during the setup process. For example, when setting up a Postgres destination, you would specify the `schema` parameter as shown below:

```python
postgres_destination = PostgresDestination(
    name="postgres",
    host="localhost",
    port=5432,
    database="postgres",
    schema="public",
    username="postgres",
    password=POSTGRES_PASSWORD,
    ssl_mode=PostgresDestination.Disable(),
)
```
[Source](https://airbyte.com/tutorials/configure-airbyte-with-python-dagster)

Please note that schema names are case sensitive and multiple schemas may be used at one time. If no schemas are set explicitly, all existing schemas will be synced. [Source](https://docs.airbyte.com/integrations/destinations/postgres#naming-conventions-part-2)

In some cases, you might encounter permission issues when trying to write to a target schema. If you receive an error stating `Current role does not have permissions on the target schema`, make sure that the destination `SCHEMA` is one that the role you've provided has permissions on. [Source](https://docs.airbyte.com/integrations/destinations/snowflake#troubleshooting)