# advice-data-transformation
Hi there! I'm new here, but joined to ask a question I couldn't find anyone talking about on the interwebs. I'm interested in moving a large (multiple-TB) database from one Postgres Citus instance to another. The way this needs to work is that the table first has to be created in the new Citus server, and then I need to call something like:
```sql
SELECT create_distributed_table('github_events', 'repo_id');
```
(That's just an example, but the table needs to be marked as distributed to the worker nodes.) This has to happen before data is inserted: distributing afterwards requires the coordinator (master) Citus node to have enough space to hold the entire table before it is sharded out to the workers, and distributing an already-populated large table performs poorly (the Citus docs recommend distributing a table before loading data into it). Perhaps this is something I need to take up with the dbt folks, as I don't think there's an easy mechanism to customize how dbt creates tables before data is inserted. Or does Airbyte have some hooks I could use to have the raw tables created as distributed? Ultimately, maybe Airbyte is just not the right tool for this job.
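For concreteness, here's a minimal sketch of the ordering I'm describing; the column list and the COPY source path are just placeholders:

```sql
-- The table must exist and be distributed BEFORE any rows are loaded,
-- so incoming rows are routed straight to the worker shards.
CREATE TABLE github_events (
    event_id bigint,
    repo_id  bigint,   -- distribution column
    payload  jsonb
);

-- Mark the (still empty) table as distributed across the workers.
SELECT create_distributed_table('github_events', 'repo_id');

-- Only now load the data (placeholder path, just for illustration).
COPY github_events FROM '/path/to/github_events.csv' WITH (FORMAT csv);
```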