Using the official Airbyte Salesforce connector, I am syncing data to a Kafka destination, but I am running into storage issues on the destination side. I am syncing only five streams (Account, Contact, Case, CaseFeed, EmailMessage), yet the number and size of the messages are high enough that the initial sync consumes more than 100 GB of storage, primarily because of the large records in the EmailMessage and CaseFeed objects.
I would like to know what options are available for syncing data when there is insufficient storage on the destination side. Ideally, I would like to adjust the amount of data sent to the destination per sync.
Currently, the Airbyte Salesforce connector only supports a start date. What if we added support for both a start and an end date? That way, we could bound the amount of data synced to the destination per run. Additionally, if there is a way to select or drop fields from specific streams, that would be very helpful.
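To make the start/end-date idea concrete, here is a rough sketch of the semantics I have in mind. Nothing below is part of the connector today; `sync_windows` is a hypothetical helper showing how bounded date windows would let each run cover only a slice of history (today this would have to be approximated by manually advancing the connector's start date between runs):

```python
from datetime import datetime, timedelta

def sync_windows(start, end, days=7):
    """Split [start, end) into fixed-size date windows (hypothetical helper).

    Each window would become one sync run bounded by a start and end
    date, so the destination only has to hold one window at a time.
    """
    windows = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        windows.append((cur, nxt))
        cur = nxt
    return windows

# Example: one month of history split into 10-day runs.
wins = sync_windows(datetime(2023, 1, 1), datetime(2023, 2, 1), days=10)
for w_start, w_end in wins:
    print(w_start.date(), "->", w_end.date())
```

Between runs, the consumer could scan the window's messages for PII and truncate the topics before the next window is synced, keeping peak storage bounded by a single window rather than the full history.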
If you have any other suggestions that could help in this case, please let me know.
Context: The problem I am trying to solve is to fetch data from Salesforce, scan it for PII, and then discard the Kafka messages after processing. I am planning to use ephemeral Kafka topics, but the challenge is the storage requirement for the initial sync.
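One partial mitigation I am considering on the Kafka side, since the topics are ephemeral and messages are discarded after the PII scan anyway: capping the topics with standard per-topic retention configs so old segments are deleted while the sync is still running. A sketch of the topic settings (the exact sizes here are illustrative, not recommendations):

```
# Per-topic configs on the ephemeral topics (standard Kafka settings)
retention.bytes=2147483648    # cap each partition at roughly 2 GB
retention.ms=3600000          # expire records after 1 hour
segment.bytes=268435456       # smaller segments so deletion triggers sooner
cleanup.policy=delete
```

Since Kafka deletes data at segment granularity, keeping `segment.bytes` small matters; otherwise the broker may hold a large active segment well past the retention limits. This only helps if the consumer keeps up with the sync, of course, which is why bounding each sync run would still be the better fix.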
Thank you.