Just wanted to see if you guys made any updates to the Snowflake connector. If I remember correctly, it was accepting customer-managed external S3 buckets rather than Snowflake internal stages, which are easier and more secure. Also, data should be sent in 100MB-250MB chunks so Snowflake can use the full MPP power of larger warehouses to ingest data in parallel, much faster.
100-250MB (compressed) per file is recommended, but you can go smaller depending on the total data size. If the total size is 100MB, you could split it into smaller files: 10 x 10MB or 5 x 20MB. The ingestion process uses one core per file; each node has 8 cores, and each cluster can have 1-128 nodes depending on its size. This means each node can ingest 8 files at a time. I would use logic that splits the total amount of data into as many smaller files as possible, with each file between 10MB and 250MB. This ensures all the cores in the cluster have something to process without sitting idle.
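The sizing logic above could be sketched roughly like this (a heuristic, assuming the 8-cores-per-node figure and the 10-250MB bounds stated above; the function name is mine):

```python
MIN_FILE = 10 * 1024 * 1024    # 10 MB lower bound from the guidance above
MAX_FILE = 250 * 1024 * 1024   # 250 MB upper bound

def plan_file_split(total_bytes, nodes, cores_per_node=8):
    """Pick a file count that keeps every core busy while keeping
    each file between 10 MB and 250 MB (compressed)."""
    total_cores = nodes * cores_per_node
    # Start by aiming for one file per core, so nothing sits idle.
    n_files = max(1, total_cores)
    # Shrink the count if that would make files smaller than 10 MB.
    while n_files > 1 and total_bytes / n_files < MIN_FILE:
        n_files -= 1
    # Grow the count if files would still exceed 250 MB.
    while total_bytes / n_files > MAX_FILE:
        n_files += 1
    return n_files, total_bytes // n_files
```

For the 100MB example on a single-node (8-core) warehouse this lands on 8 files of ~12.5MB each, in the same spirit as the 10 x 10MB / 5 x 20MB splits above.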
09/27/2021, 9:18 PM
As for where the files are exported for ingestion, there should be 2 options, with the following sub-selections.
1. External Stage (meaning the customer has blob storage of their own that they manage), where your code would do the job of uploading files to the appropriate cloud storage locations.
a. AWS (credentials + S3 path)
b. Azure (credentials + blob path)
c. GCP Storage (credentials + blob path)
2. Internal Stage (Snowflake-managed storage, defined by a name, where access is granted via Snowflake RBAC to the user ID making the connection to Snowflake). You would use JDBC, ODBC, or another Snowflake driver to connect to Snowflake and issue a PUT SQL command that encrypts & uploads the file(s) automatically. You do not have to code the upload process in your connector; it is handled by the ODBC/JDBC driver.
a. Option 1, use an existing Internal Stage: In this case, you would only define the schema + the name of the internal stage, along with optional subfoldering to use.
b. Option 2, use a temp internal stage: The user would specify a schema to use. You can use SQL CREATE STAGE to create an upload location, SQL PUT to upload local files to the stage, & COPY to ingest them. Once the data is ingested, you can delete the stage, which gets rid of all the files in that stage. (Most ELT tools use this option.)
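For the external-stage path (option 1), once your code has uploaded the files to the customer's bucket, ingestion is a COPY INTO from the S3 location. A sketch of the statement the connector would issue (table, path, and format name are placeholders; in practice a storage integration is preferred over inline credentials):

```python
def copy_from_s3(table, s3_path, key_id, secret_key, file_format="my_csv_format"):
    """Build the COPY INTO statement Snowflake runs to ingest files
    already uploaded to a customer-managed S3 bucket."""
    return (
        f"COPY INTO {table}\n"
        f"FROM '{s3_path}'\n"
        f"CREDENTIALS = (AWS_KEY_ID='{key_id}' AWS_SECRET_KEY='{secret_key}')\n"
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}');"
    )
```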
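The option 2b flow (CREATE STAGE, PUT, COPY, then drop the stage) could look roughly like this through a Snowflake driver cursor; the stage name, schema, and PARALLEL value are illustrative:

```python
def temp_stage_ingest(cursor, schema, table, local_pattern):
    """Sketch of option 2b: create a temporary stage, PUT the local
    files into it, COPY them into the target table, then drop the
    stage, which also removes the uploaded files."""
    stage = f"{schema}.tmp_load_stage"  # hypothetical stage name
    for sql in (
        f"CREATE TEMPORARY STAGE {stage}",
        # PUT compresses, encrypts, and uploads the files; the driver
        # handles the transfer, so the connector needs no upload code.
        f"PUT file://{local_pattern} @{stage} PARALLEL=8",
        f"COPY INTO {schema}.{table} FROM @{stage}",
        f"DROP STAGE {stage}",  # deletes the staged files as well
    ):
        cursor.execute(sql)
```

With a real connection this would be called with a cursor from the Snowflake Python/JDBC/ODBC driver; a temporary stage is also cleaned up automatically when the session ends.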