# ask-community-for-troubleshooting
t
Hi everyone :) My name is Tom Griffin - I’m experimenting with Airbyte to build out a vaccination warehouse for Albany County in New York. One of the data sources is the state’s immunization registry. Each day they cut a CSV of the prior day’s vaccination events. We pull that file, ingest it, etc. We ended up writing a small Python script that syncs their SFTP directory with a local directory on our end (based on filenames). We went this route because there were days when they dropped more than one file and we didn’t want to risk losing anything. I experimented with the SFTP connector and was able to download specific files I defined as part of the source configuration. I was unable to match all files in the directory with a pattern like /directory/filename2021*.csv. What would be the preferred method to mimic what I am doing now in terms of the directory synchronization and then only processing the new files (with new defined as those that I didn’t have before the job ran)? For example, is there a way that I could trigger a script to run beforehand that would stage the data locally? Any ideas would be appreciated :)
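For reference, the filename-based "what is new" check described above could look roughly like the sketch below. This is not Tom's actual script, only a minimal illustration under assumptions: the function name `find_new_files` is hypothetical, the remote file names are assumed to come from whatever SFTP client is in use (passed in as a plain list), and the default pattern simply reuses the example from the message.

```python
import fnmatch
from pathlib import Path


def find_new_files(remote_names, local_dir, pattern="filename2021*.csv"):
    """Return remote names matching `pattern` that are not yet in `local_dir`.

    `remote_names` is assumed to be a list of file names obtained from an
    SFTP directory listing. "New" means: no file with the same name exists
    locally before the job runs.
    """
    # Names of the files already staged locally.
    local_names = {p.name for p in Path(local_dir).iterdir() if p.is_file()}
    # Keep only remote names that match the shell-style wildcard pattern.
    matching = fnmatch.filter(remote_names, pattern)
    # Anything matching remotely but missing locally still has to be pulled.
    return sorted(name for name in matching if name not in local_names)
```

Comparing by name only keeps the check cheap, but it would not notice a remote file that was re-issued under the same name with different content.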
d
Hey Tom! That's a really cool use case!
We don't yet support pre-hook triggers or file regex on a directory.
I think the easiest way to do so now is to run an Airflow workflow. The workflow would consist of a couple of steps:
1. Run the before script to sync directories
2. For each new file, create a new source and connection with the destination warehouse
3. Trigger a sync
4. Clean up by deleting the just-created source/connection (optional; we could leave this around, but it would make the UI messy)
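The steps above can be sketched as plain Python orchestration. This is only an outline under assumptions: `run_workflow` and the four callables passed into it are hypothetical placeholders, not Airbyte or Airflow APIs. In a real setup each callable would wrap the sync script or the relevant API calls, and the function itself could be invoked from an Airflow task.

```python
def run_workflow(sync_directories, create_connection, trigger_sync,
                 delete_connection=None):
    """Run the four steps once and return the names of the files synced.

    All arguments are hypothetical callables supplied by the caller:
      sync_directories()       -> list of new file names (step 1)
      create_connection(name)  -> handle for a new source/connection (step 2)
      trigger_sync(handle)     -> runs the sync for that connection (step 3)
      delete_connection(handle) -> optional cleanup (step 4)
    """
    processed = []
    # Step 1: stage files locally and learn which ones are new.
    new_files = sync_directories()
    for name in new_files:
        # Step 2: one source/connection per new file.
        handle = create_connection(name)
        try:
            # Step 3: load the file into the destination warehouse.
            trigger_sync(handle)
            processed.append(name)
        finally:
            # Step 4 (optional): remove the temporary source/connection,
            # even if the sync raised an error.
            if delete_connection is not None:
                delete_connection(handle)
    return processed
```

Passing `delete_connection=None` corresponds to leaving the sources and connections in place, at the cost of a more cluttered UI.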
m
We actually have an issue to track this particular use case: https://github.com/airbytehq/airbyte/issues/2622
It mentions S3, but we want to support any kind of file-based system.
t
It’s pretty remarkable that a question like this at 2 o’clock in the morning on the East Coast gets the attention of the CEO of a company that just raised $26m.