Is there an issue or discussion around refactoring...
# feedback-and-requests
g
Is there an issue or discussion around refactoring `incremental-dedupe` and the ability to avoid having three variations of the same table, e.g. `_AIRBYTE_RAW_MYTABLE`, `MYTABLE_SCD`, `MYTABLE`, or at least avoid the `_SCD` table? this issue is a separate discussion on refactoring the full-table deduping
u
Would it be an acceptable solution for you to export the generated normalization project and re-add it as a custom dbt transformation where SCD tables are not materialized?
j
There’s this issue to discuss this though: https://github.com/airbytehq/airbyte/issues/3487
u
> Would it be an acceptable solution for you to export the generated normalization project and re-add it as a custom dbt transformation where SCD tables are not materialized?
not really, given the number of tables we have. this also doesn’t fare well in the interest of self-serve, i.e. allowing other internal stakeholders to add their own tables to be migrated. it would require an additional step of adding your own normalization, simply to dematerialize a table. that seems like a lot of unnecessary work and a blocker to enabling other internal stakeholders to leverage airbyte
u
You should add a comment describing your use case to the 3487 issue then!
j
Otherwise, I haven’t tried this yet, but with the work on the PR you pointed out, https://github.com/airbytehq/airbyte/pull/7162 (hopefully it will be merged this week), it’ll be possible to play with the unique_key parameter, see https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models#defining-a-uniqueness-constraint-optional
If we push to implement https://github.com/airbytehq/airbyte/issues/7163 it would open more doors for customisations… For example, you’d be able to tweak and implement a dedupe without history using only an additional advanced value in the normalization configs, without full custom dbt transformations. (Of course, implementing a dedicated dedup without history is still valuable and easier to select in the UI.)
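To make the unique_key idea concrete, here is a minimal sketch of what a "dedupe without history" incremental model could look like in dbt. It is not what normalization generates: the upstream model name `mytable_stg`, the key column `id`, and the `name` column are all hypothetical, and it assumes the raw records carry an `_airbyte_emitted_at` timestamp.

```sql
-- Hypothetical model file: models/mytable.sql
-- Assumption: mytable_stg is an upstream staging model with columns
-- id, name and _airbyte_emitted_at. None of these names come from
-- the generated normalization project.
{{ config(
    materialized='incremental',
    unique_key='id'
) }}

with new_rows as (

    select id, name, _airbyte_emitted_at
    from {{ ref('mytable_stg') }}

    {% if is_incremental() %}
    -- On incremental runs, only read rows newer than what is already loaded.
    where _airbyte_emitted_at > (select max(_airbyte_emitted_at) from {{ this }})
    {% endif %}

),

ranked as (

    -- Keep only the latest version of each key within the new batch.
    select
        id,
        name,
        _airbyte_emitted_at,
        row_number() over (
            partition by id
            order by _airbyte_emitted_at desc
        ) as row_rank
    from new_rows

)

select id, name, _airbyte_emitted_at
from ranked
where row_rank = 1
```

With unique_key set, dbt updates or replaces existing rows that share the same `id` instead of appending them, so the final table holds one row per key and no separate `_SCD` table is needed for this case.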
u
> this would require an additional step to add your own normalization, simply to dematerialize a table. seems like a lot of unnecessary work and a blocker in enabling other internal stakeholders to leverage airbyte
what if you could more easily change the materialization of scd tables (replace “table” by “view” or “cte”) on one line in the dbt_project.yml file (which you could configure on the normalization step), instead of doing a full export of the dbt project and import as a custom transformation? would that be acceptable too?
u
i totally missed those PRs and issues. they look great! and being able to add a `dbt_profile.yml` in the UI is a nice approach
> if you could more easily change the materialization of scd tables (replace “table” by “view” or “cte”) on one line in the dbt_project.yml file (that you could configure on the normalization step) instead of a full export dbt project / import custom transformation.
why even go this far? instead, allow a user to provide environment variables that you’ve configured in your standard dbt project, e.g. for the SCD.sql:
{{ config(materialized=env_var('DEFAULT_SCD_MATERIALIZATION', 'table')) }}
that way the user need not modify the actual dbt project at all
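A slightly more defensive variant of the same idea, as a rough sketch only: it reads the env var once, rejects values that are not known materializations, and then passes the value to config. The variable name `DEFAULT_SCD_MATERIALIZATION` is the one suggested above; the allowed list and the upstream model name `mytable_stg` are assumptions. Note that the dbt name for a "cte" materialization is `ephemeral`.

```sql
-- Hypothetical header for an SCD model; not part of the generated project.
{% set scd_materialization = env_var('DEFAULT_SCD_MATERIALIZATION', 'table') %}
{% set allowed_materializations = ['table', 'view', 'ephemeral'] %}

{% if scd_materialization not in allowed_materializations %}
    {{ exceptions.raise_compiler_error(
        "Unsupported DEFAULT_SCD_MATERIALIZATION: " ~ scd_materialization
    ) }}
{% endif %}

{{ config(materialized=scd_materialization) }}

-- Assumption: mytable_stg stands in for whatever the SCD model selects from.
select *
from {{ ref('mytable_stg') }}
```

If the variable is unset, the model falls back to `table`, which matches the current behaviour, so existing users would see no change.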
u
some inspirational reading (they provide an option to pass env vars in the UI): https://docs.getdbt.com/docs/dbt-cloud/using-dbt-cloud/cloud-environment-variables