Seth Saperstein

12/25/2021, 5:21 AM
Hey everyone! Wasn’t entirely sure where to post this as it’s related to configuration and integration. I’m looking to use dbt for normalization into Redshift. I’d love for all downstream models of my raw data to be run anytime the Airbyte job runs. This is possible with
dbt run --select <model>+
however, to get the raw model into my dbt project, I don’t love the suggestion of hopping into the Airbyte container, determining the normalization directory, grabbing the generated dbt model, syncing that back to a dbt repo, and then integrating that repo into the Airbyte job configuration. Has anyone found a better way of exporting dbt models? I’m also planning on running on dbt Cloud, and this configuration means that I cannot “deploy” models via dbt Cloud when the source dataset changes. Alternatively, I could trigger the dbt Cloud API, but that isn’t possible directly via Airbyte, which means I would then have to use Airflow to schedule the Airbyte job and then kick off the dbt Cloud job. That means Airflow code must be written for new data sources, and I’m looking to keep data integration and normalization self-service to speed up development time for new datasets. If anyone has suggestions, I’m all ears.
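For reference, the “trigger the dbt Cloud API” option mentioned above generally comes down to a single authenticated POST against dbt Cloud’s job-run endpoint. A minimal sketch, assuming hypothetical ACCOUNT_ID, JOB_ID, and DBT_CLOUD_TOKEN values; the v2 endpoint path is from dbt Cloud’s Administrative API, but verify it against your own account:
# Hypothetical sketch: kick off a dbt Cloud job after an Airbyte sync completes.
# ACCOUNT_ID, JOB_ID, and DBT_CLOUD_TOKEN are placeholders you would supply yourself.
curl -X POST \
  -H "Authorization: Token ${DBT_CLOUD_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"cause": "Triggered after Airbyte sync"}' \
  "https://cloud.getdbt.com/api/v2/accounts/${ACCOUNT_ID}/jobs/${JOB_ID}/run/"
This same call is what an Airflow task (or any post-sync hook) would make, which is why a built-in post-replication hook in Airbyte would remove the need for a separate orchestrator here.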
Current idea is a crontab to sync the models from Airbyte to S3 and let our data modelers bring them into their dbt project from there. The benefit is that self-service is maintained; the drawback is that I still lose the observability and alerting functionality of dbt Cloud’s deployment feature.
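A rough sketch of what that crontab-based sync could look like, assuming the generated models are first copied out of the Airbyte server container; the container name, workspace path, and bucket are placeholders, not values confirmed in this thread:
#!/bin/bash
# Hypothetical export_airbyte_models.sh: copy Airbyte's generated dbt models out of
# the server container and sync them to S3 for data modelers to pick up.
# Container name, workspace path, and bucket below are placeholders.
set -euo pipefail
docker cp airbyte-server:/tmp/workspace/normalize/models /tmp/airbyte_dbt_models
aws s3 sync /tmp/airbyte_dbt_models s3://my-team-bucket/airbyte/dbt-models/
# Example crontab entry to run it hourly:
# 0 * * * * /usr/local/bin/export_airbyte_models.sh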

Zawar khan

12/27/2021, 2:24 PM
Hi @Seth Saperstein, this is an interesting discussion. @Chris (deprecated profile), do we have a suggestion for this use case? We have an issue that requests a post-replication hook to trigger a dbt workflow. Could it be a solution for you, @Seth Saperstein? If it is, I’d encourage you to share your need and continue the discussion on GitHub.
Yes! That is exactly what I was looking for! Glad to see I’m not the only one with this issue. I’ll continue the discussion on GitHub! Thank you!!
This does not cover, however, exporting the generated dbt models. Any thoughts there? Is there a way to get the dbt models generated by Airbyte without having to check the logs and then exec into the container, @[DEPRECATED] Augustin Lafanechere?
@Seth Saperstein we are thinking of adding a feature to download dbt models from the UI. Would it make your life easier?
Far easier. It makes for a much smoother development lifecycle with Airbyte and dbt. I’m imagining a lifecycle such as the following that is entirely self-service for our non-engineers:
New connection:
• Configure the source
• Configure the destination (most likely already configured as your warehouse)
• Configure the connection
• Run the connection (default normalization enabled)
• Grab the generated dbt model (the difficult bit)
• Go to dbt Cloud, paste in the staging model
• Build, test, and modify downstream models in dbt Cloud
• Kick off your model deployment in dbt Cloud
Scheduled updates to the connection:
• Turn the connection into a scheduled update
• Select a custom dbt transformation (as configured above; see the sketch after this list)
• Have the custom dbt transformation trigger all downstream models with the command
dbt run --select <model>+
• Optional: Have the dbt command trigger via a hook into your dbt Cloud deployment rather than the GitHub repo (so that dbt run history is in a unified view and dbt Cloud alerting benefits are realized)
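For reference, a sketch of how the custom dbt transformation step above might be configured on the Airbyte connection; the field names, Docker image tag, and model name are assumptions for illustration, not values confirmed in this thread:
# Hypothetical "Custom Transformation" operation on the Airbyte connection.
# All values below are placeholders / assumptions.
#   Transformation name:   Run downstream models
#   Docker image:          fishtownanalytics/dbt:0.19.1
#   Git repository URL:    https://github.com/your-org/your-dbt-project.git
#   Git branch:            main
#   Entrypoint arguments for the dbt CLI:
run --select stg_my_source+
With something like this in place, every scheduled Airbyte sync would be followed by a run of the staging model and everything downstream of it, without anyone touching Airflow.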
Is there any temporary solution with this in mind, or should we have our TDSs (technical data stewards), who are doing the dbt/Airbyte configuration, reach out to our engineers to exec into the Airbyte container and grab the generated dbt model ad hoc?
@Seth Saperstein, I created a GitHub issue for this feature. Could you share the lifecycle you have in mind there? I do not currently have a solution other than the one described in our Custom transformation tutorial, which is based on
docker cp
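In practice that amounts to copying the generated models out of the Airbyte workspace into your own dbt project. A minimal sketch; the container name and workspace path are placeholders that depend on your Airbyte version and the specific sync job (the job and attempt IDs come from the sync logs):
# Hypothetical ad hoc export of the generated normalization models.
# Fill in <job_id> and <attempt> from the sync's logs; the path is a placeholder.
docker cp airbyte-server:/tmp/workspace/<job_id>/<attempt>/normalize/models ./my_dbt_project/models/airbyte_generated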