trying to understand CDC - getting a final table r...
# ask-community-for-troubleshooting
g
trying to understand CDC - getting a final table row with mostly NULL values. reviewing the `_scd` table i can see several rows with changes, but not sure why they're not persisting to the final deduped row: see example in photo, the final row will have OTHER = NULL
u
you should see only the first row in your final table, can you share the final table too?
g
the final table would be the active row. but `other` should be `hello`, not NULL
actual data from `_scd` table. final table is just that first row i.e. all NULLs
am i missing something or is that not expected behaviour?
@Subodh (Airbyte) thoughts?
bump. can anyone familiar with CDC help with this (i think somewhat trivial) question? i have one record changed multiple times, but only the last record is being persisted in the final table, and the columns that weren't changed along the way are ending up as NULL. See example in image, where the column `type` changes each time but `other` stays the same, and so it results in NULL. if this can't be answered, then it appears to be a clear major bug with CDC?
s
@Chris (deprecated profile) can you help us understand how `_scd` tables are created?
c
• `_scd` tables are adding new columns to the raw tables to group rows by primary key and sort them by cursor field, they don't touch other columns
• final tables are keeping only the last record from `_scd` tables
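A minimal Python sketch of the behaviour described above, not Airbyte's actual normalization code. The column names `id` and `updated_at` and the helpers `build_scd` and `final_table` are hypothetical; only `_airbyte_active_row` comes from this thread. Rows are grouped by primary key, sorted by cursor, copied untouched, and only the newest row per key is flagged as active.

```python
from itertools import groupby
from operator import itemgetter


def build_scd(raw_rows, primary_key, cursor_field):
    """Group rows by primary key, sort by cursor, flag the newest row per key."""
    scd_rows = []
    ordered = sorted(raw_rows, key=itemgetter(primary_key, cursor_field))
    for _, group in groupby(ordered, key=itemgetter(primary_key)):
        group = list(group)
        for position, row in enumerate(group):
            # Data columns are copied as-is; only a bookkeeping column is added.
            is_latest = position == len(group) - 1
            scd_rows.append({**row, "_airbyte_active_row": is_latest})
    return scd_rows


def final_table(scd_rows):
    """Keep only the active (latest) row for each primary key."""
    return [
        {k: v for k, v in row.items() if k != "_airbyte_active_row"}
        for row in scd_rows
        if row["_airbyte_active_row"]
    ]


# Hypothetical sample data: record 1 changed once, record 2 never changed.
raw = [
    {"id": 1, "updated_at": 1, "type": "a", "other": "hello"},
    {"id": 1, "updated_at": 2, "type": "b", "other": "hello"},
    {"id": 2, "updated_at": 1, "type": "x", "other": "world"},
]
scd = build_scd(raw, "id", "updated_at")
final = final_table(scd)
```

Nothing in this sketch merges values across rows, so whatever the latest record contains is what lands in the final table.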
g
final tables are keeping only the last record from `_scd` tables.
i am seeing that behaviour, which doesn't seem to make sense, i.e. i'm seeing all the modified records for an instance, and the columns that aren't modified for a given instance are NULL, which means the final row will only have values for columns that were modified in the last change.
c
if the `other` column wasn't modified, then it should report the last value it had, otherwise it's considered as being modified to NULL
`_scd` / dedup mode doesn't treat rows differently if they are being produced by a CDC or non-CDC source
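To illustrate the point above, here is a small sketch (the data and the `latest_row` helper are hypothetical, not Airbyte code) comparing change records that carry every column with change records that leave unmodified columns empty. Since dedup just picks the latest row as emitted, an empty `other` in the last record is indistinguishable from an update to NULL.

```python
def latest_row(rows, cursor_field):
    """Return the row with the highest cursor value, exactly as emitted."""
    return max(rows, key=lambda row: row[cursor_field])


# Case 1: every change record carries all columns, modified or not.
full_records = [
    {"id": 1, "cursor": 1, "type": "a", "other": "hello"},
    {"id": 1, "cursor": 2, "type": "b", "other": "hello"},
]

# Case 2: the second change record only carries the modified column.
partial_records = [
    {"id": 1, "cursor": 1, "type": "a", "other": "hello"},
    {"id": 1, "cursor": 2, "type": "b", "other": None},
]

full_result = latest_row(full_records, "cursor")
partial_result = latest_row(partial_records, "cursor")

print(full_result["other"])     # value carried through by the source
print(partial_result["other"])  # missing value is kept as NULL
```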
g
so regarding CDC adding records for changes. each record has to include all columns, even ones that aren’t modified
if the `other` column wasn't modified, then it should report the last value it had, otherwise it's considered as being modified to NULL
i’m not seeing this. this is helpful to narrow down the issue
@Subodh (Airbyte) i imagine it's either the CDC implementation or my binlogs are just completely wrong. but i'm concerned the recent change on `ts_ms` could be making CDC malfunction when identifying column changes
s
Can you share the record from the actual table and see if all the columns have the right value?
c
To read more about what `_scd` tables are trying to do (type 4): https://en.wikipedia.org/wiki/Slowly_changing_dimension
s
@Chris (deprecated profile) is the final table in the destination created from the `_scd` table?
c
yes
the final table will do something similar to
select * from table_scd where _airbyte_active_row = True
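For anyone who wants to try that filter locally, a rough sketch using Python's built-in `sqlite3` with an in-memory table. The table layout and sample rows are made up for illustration, and the flag is stored as 0/1 because SQLite has no native boolean type; the real destination schema will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_scd ("
    "id INTEGER, type TEXT, other TEXT, _airbyte_active_row INTEGER)"
)
# Two versions of the same record; only the newest one is flagged active.
conn.executemany(
    "INSERT INTO table_scd VALUES (?, ?, ?, ?)",
    [(1, "a", "hello", 0), (1, "b", "hello", 1)],
)
# Same idea as the query above, with the boolean written as 1.
rows = conn.execute(
    "SELECT id, type, other FROM table_scd WHERE _airbyte_active_row = 1"
).fetchall()
conn.close()
```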
g
ah kk i pinpointed the issue. it's not related to CDC or SCD, it's related to this issue https://airbytehq.slack.com/archives/C01MFR03D5W/p1629883145126500