# feedback-and-requests
Hey guys, is there any way to output multiple files with an AWS S3 destination? I notice the docs say: "Currently, each data sync will only create one file per stream. In the future, the output file can be partitioned by size. Each partition is identifiable by the partition ID, which is always 0 for now." My use case is syncing a massive Azure Table Storage table (over 1 billion rows) to S3, and a single file will not be efficient to work with once it lands in S3. I'd also like to see the output files come into S3 as the sync runs, so I can verify the data is coming through correctly. Right now I am just seeing the following in the logs:
...
2022-03-01 01:07:45 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):300 - Records read: 14455000
2022-03-01 01:07:46 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):300 - Records read: 14456000
2022-03-01 01:07:47 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):300 - Records read: 14457000
2022-03-01 01:07:47 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):300 - Records read: 14458000
...
If multiple output files are not possible, is there at least a way to see the staged data that has been processed so far? My desired output format is Parquet with SNAPPY compression.
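For what it's worth, here is the kind of spot check I'd like to be able to run while the sync is in flight. It's a rough sketch using boto3 and pyarrow; the bucket name, prefix, and the assumption that files land incrementally are my own placeholders, not confirmed Airbyte behavior:

import io

import boto3
import pyarrow.parquet as pq

# Placeholders: substitute the bucket/path configured on the S3 destination.
BUCKET = "my-sync-bucket"
PREFIX = "airbyte/azure_table_stream/"

s3 = boto3.client("s3")

def list_synced_objects():
    """Return every object the sync has written under the prefix so far."""
    paginator = s3.get_paginator("list_objects_v2")
    objects = []
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        objects.extend(page.get("Contents", []))
    return objects

objs = sorted(list_synced_objects(), key=lambda o: o["LastModified"])
for obj in objs:
    print(f"{obj['LastModified']}  {obj['Size']:>14,}  {obj['Key']}")

# Spot-check the newest file: print its Parquet schema and a few rows.
# (Fine for modest partition files; a single 1B-row file would be far too
# big to pull down like this, which is exactly why I want partitioned output.)
if objs:
    body = s3.get_object(Bucket=BUCKET, Key=objs[-1]["Key"])["Body"].read()
    table = pq.read_table(io.BytesIO(body))
    print(table.schema)
    print(table.slice(0, 5).to_pydict())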
I have the same use case, but am using Azure Blob Storage instead of S3.
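The same spot check would presumably look like this against Blob Storage with the azure-storage-blob SDK (the connection string, container, and prefix below are placeholders for my own settings):

from azure.storage.blob import ContainerClient

# Placeholders: substitute the container/path configured on the destination.
CONN_STR = "<your-storage-account-connection-string>"
CONTAINER = "my-sync-container"
PREFIX = "airbyte/azure_table_stream/"

client = ContainerClient.from_connection_string(CONN_STR, CONTAINER)

# List whatever the sync has written under the prefix so far.
for blob in client.list_blobs(name_starts_with=PREFIX):
    print(f"{blob.last_modified}  {blob.size:>14,}  {blob.name}")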