# ask-community-for-troubleshooting
Hi! I have some questions regarding S3 staging with a Snowflake destination and Postgres source. The documentation on part size states:
> Affects the size limit of an individual Redshift table. Optional. Increase this if syncing tables larger than 100GB. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default table limit of 100GB. Note, a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care.
• I see that the default value is '5' in my local build, though the documentation says '10MB'. Is this just a typo, or am I missing something?
• I want to try syncing a ~1TB table using this method, which would mean that I should increase the part size to ~100MB, correct?
• Which container does the memory requirement rule of thumb apply to? I am working in Kubernetes and want to get my resource limits right.
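As a rough sanity check on the numbers in the quoted documentation, here is a minimal sketch of the arithmetic: part size × 10,000 parts gives the table-size ceiling, and part size × 10 gives the memory rule of thumb. Only the constants quoted above are used, so treat the output as an estimate rather than measured connector behaviour.

```python
# Minimal sketch of the part-size arithmetic from the quoted docs.
S3_MAX_PARTS_PER_FILE = 10_000   # S3 multipart-upload limit quoted in the docs
MEMORY_RULE_OF_THUMB = 10        # docs: memory requirement ~= part size * 10

def max_table_size_gb(part_size_mb: int) -> float:
    """Largest table that can be staged with a given part size."""
    return part_size_mb * S3_MAX_PARTS_PER_FILE / 1024  # MB -> GB

def memory_estimate_mb(part_size_mb: int) -> int:
    """Rough per-sync memory need, per the rule of thumb."""
    return part_size_mb * MEMORY_RULE_OF_THUMB

for part_size_mb in (5, 10, 100):
    print(f"part size {part_size_mb:>3} MB -> "
          f"max table ~{max_table_size_gb(part_size_mb):.0f} GB, "
          f"memory ~{memory_estimate_mb(part_size_mb)} MB")

# part size   5 MB -> max table ~49 GB,  memory ~50 MB
# part size  10 MB -> max table ~98 GB,  memory ~100 MB  (the "100GB" limit in the docs)
# part size 100 MB -> max table ~977 GB, memory ~1000 MB (covers the ~1TB table)
```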
Hi @Andrew Morrison, are you using Redshift or Snowflake as a destination? I think you pasted the Redshift documentation but mentioned you want to use Snowflake.
I checked the doc and there is indeed a typo.
I am using Snowflake as a destination. I just copied the documentation from https://docs.airbyte.com/integrations/destinations/snowflake#aws-s3. I was wondering why it mentions Redshift at all.
I opened an issue to fix the doc. So you're right: 5MB is the default value. You're also right about trying a 100MB part size. The data transfer is handled by sync pods that are created dynamically; you can tweak the `JOB_MAIN_CONTAINER_MEMORY_REQUEST` and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` environment variables to change the memory requirements.
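For the Kubernetes side, here is a small sketch of how one might derive values for those two variables from the chosen part size. The variable names come from the reply above and the ×10 rule of thumb from the docs; the `Mi` quantity format and the 2× headroom on the limit are assumptions, not Airbyte defaults.

```python
# Hypothetical helper: turn a chosen part size into suggested values for the
# JOB_MAIN_CONTAINER_MEMORY_* environment variables mentioned above.
# Assumptions: Kubernetes-style "Mi" quantities and a 2x headroom on the limit.

def suggested_job_memory(part_size_mb: int, headroom: float = 2.0) -> dict:
    request_mb = part_size_mb * 10          # rule of thumb from the docs: part size * 10
    limit_mb = int(request_mb * headroom)   # assumed safety margin over the request
    return {
        "JOB_MAIN_CONTAINER_MEMORY_REQUEST": f"{request_mb}Mi",
        "JOB_MAIN_CONTAINER_MEMORY_LIMIT": f"{limit_mb}Mi",
    }

for name, value in suggested_job_memory(part_size_mb=100).items():
    print(f"{name}={value}")

# JOB_MAIN_CONTAINER_MEMORY_REQUEST=1000Mi
# JOB_MAIN_CONTAINER_MEMORY_LIMIT=2000Mi
```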
Awesome, thank you for the information!