Hans Lellelid
07/29/2022, 12:26 PM
SELECT create_distributed_table('github_events', 'repo_id');
(That's just an example, but the table needs to be marked as distributed to the worker nodes.) This needs to happen before data is inserted. Doing it afterwards requires that the master Citus node have enough space to hold the whole table before it is sharded to the workers, and distributing existing data performs poorly on large tables (Citus recommends distributing the table before adding data). Perhaps this is something I need to take up with the dbt folks, as I don't think there's an easy mechanism to customize how dbt creates the tables before data is inserted (?). Or does Airbyte have some hooks that I could use to have the raw tables also be distributed? Ultimately, maybe Airbyte is just not the right tool for this job.
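To make the ordering concrete, here is a rough sketch of the sequence I have in mind. The column definitions and the sample row are made up for illustration; only the table name and the distribution column come from the example above.

```sql
-- 1. Create the table while it is still empty.
--    Hypothetical columns; the real schema would come from the Airbyte/dbt models.
CREATE TABLE github_events (
    event_id bigint,
    repo_id  bigint,
    payload  jsonb
);

-- 2. Mark the empty table as distributed, sharded on repo_id.
SELECT create_distributed_table('github_events', 'repo_id');

-- 3. Only now load data, so rows are routed to the worker shards as they arrive.
INSERT INTO github_events (event_id, repo_id, payload)
VALUES (1, 42, '{}');
```

The open question is where step 2 could be hooked in when the tool, rather than me, issues steps 1 and 3.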