Ameya Bapat

02/03/2022, 9:50 AM
Hi, we have the following requirements. I have listed them per the domain they might relate to.

• S3 source:
1. Support CSV along with JSON.

• S3 destination:
1. Output files can be partitioned based on a configured file size (e.g. 10 MB each) instead of dumping one huge file (~GB) in every sync.
2. If one of the CSV values has nested data in it, the output CSV creates multiple rows to represent a single record. This disturbs our CSV consumers' processing and the row/record counts.

• Snowflake source:
- There should be a way to filter out some columns from syncing, as it is sometimes not advisable to sync all columns to the destination; some columns could contain irrelevant or sensitive information.

• Connection:
- It should take a first-sync start time for the connection.
- Along with frequency, it should also support day/time schedules (e.g. every Monday at 3pm, every day at 1pm) or cron strings.

• Sync jobs:
- An observer/subscriber callback model to inform external systems about the completion of a job. The callback could include all the job details (e.g. success/failure, data synced, record count). Currently an external system has to make periodic jobs API calls to get a job's status; since we don't know how long a job could take, this forces periodic polling from external systems.
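On the S3-destination point about nested values producing multiple rows: a sketch of the expected behavior, using Python's standard `csv` module. Per RFC 4180, a field containing embedded newlines should be quoted so that it remains one logical record even though it spans multiple physical lines; a compliant reader then parses it back as a single row. The sample values here are hypothetical.

```python
import csv
import io

# Hypothetical row: the second field holds nested, multi-line data.
row = ["id-1", '{"street": "5th Ave",\n"city": "NYC"}', "active"]

# QUOTE_MINIMAL quotes any field containing a delimiter, quote char,
# or newline, as RFC 4180 requires.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerow(row)

# Reading it back yields one record, despite the embedded newline.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
assert len(parsed) == 1
assert parsed[0][1] == row[1]  # nested value round-trips intact
```

A destination that writes nested values without this quoting is what splits one record across several rows and breaks downstream record counts.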
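On the sync-jobs point, a minimal sketch of the proposed callback: when a job finishes, the platform POSTs a job summary to a subscriber-registered URL, so the external system no longer has to poll the jobs API. The payload fields and function names here are assumptions for illustration, not an existing API.

```python
import json
import urllib.request

def build_callback_payload(job: dict) -> dict:
    """Assemble the job summary the callback would deliver.

    Field names are hypothetical; they mirror the details mentioned
    above (success/failure, data synced, record count).
    """
    return {
        "job_id": job["id"],
        "status": job["status"],                    # e.g. "succeeded" / "failed"
        "records_synced": job["records_synced"],
        "bytes_synced": job["bytes_synced"],
    }

def notify_job_completion(callback_url: str, job: dict) -> None:
    """POST the job summary to the subscriber's callback URL."""
    req = urllib.request.Request(
        callback_url,
        data=json.dumps(build_callback_payload(job)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()
```

With a push model like this, the external system reacts once per job instead of issuing status calls on a timer for a job of unknown duration.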
Hey @Ameya Bapat, thank you for the feedback! Most of your suggestions match existing enhancement issues we have in our repo. Feel free to upvote the related issues you find and open new ones for your specific findings.