Something we just ran into trying to help Kelvin c...
# contributing-to-airbyte
j
Something we just ran into trying to help Kelvin create a Github dashboard from normalized outputs
u
The timestamps are strings not timestamps in the target postgres db
u
Copy code
root@ca0be5786d74:/tmp/workspace/17/0/normalize/models/generated# cat stargazers.sql
with
stargazers_node as (
  select
    _airbyte_emitted_at,
    {{ dbt_utils.current_timestamp_in_utc()  }} as _airbyte_normalized_at,
    cast({{ json_extract_scalar('_airbyte_data', ['user_id'])  }} as {{ dbt_utils.type_float()  }}) as user_id,
    cast({{ json_extract_scalar('_airbyte_data', ['starred_at'])  }} as {{ dbt_utils.type_string()  }}) as starred_at,
    cast({{ json_extract_scalar('_airbyte_data', ['_sdc_repository'])  }} as {{ dbt_utils.type_string()  }}) as _sdc_repository
  from {{ source('public', '_airbyte_raw_stargazers')  }}
),
stargazers_with_id as (
  select
    *,
    {{ dbt_utils.surrogate_key(['user_id',
        'starred_at',
        '_sdc_repository'])  }} as _airbyte_stargazers_hashid
    from stargazers_node
)
u
i think that's expected.
u
(if not stellar)
u
Copy code
root@ca0be5786d74:/tmp/workspace/17/0# cat catalog.json
{"streams":[{"stream":{"name":"stargazers","json_schema":{"type":"object","properties":{"user":{"type":"object"},"user_id":{"type":"number"},"starred_at":{"type":"string"},"_sdc_repository":{"type":"string"}}},"supported_sync_modes":["incremental"],"source_defined_cursor":false,"default_cursor_field":[]},"sync_mode":"incremental","cursor_field":["user"]}]}
u
i believe all times, dates, etc get treated as string
u
it basically makes it unusable in analytics tools like metabase
u
since we don't have any other appropriate primitve.
u
presumably looker/etc also
u
is it string at the github-singer level or in normalization?
u
that i don't know.
u
actually with the catalog I posted above I guess it has to be in github-singer
u
lol. well if they didn't we probably would 😬
u
so we don’t support format: date-time?
u
That kind of ruins all time-based analysis
u
Like it requires DBT parsing
u
actually a lot of our schemas do include date-time
u
I don’t see any reference to it in the normalization code though
u
So I guess there are two tickets I need to create here
u
1. Patch the catalog for the github source
u
2. support date-time formatting in normalization?
u
feels like #2 is pretty critical priority
u
1 a bit less so, but it is blocking Kelvin’s work
u
we should do 2. it's just matter of when.
u
i think our bet was that if you care you'll use DBT to parse it.
u
and that's why we could hold off in the short term.
u
hmm
u
could be it's just time to do it or that that bet was wrong.
u
i guess all i'm saying is i'm not surprised. i think it was a thoughtfully made decision. but i also agree we do need to figure out when to do it.
u
i have no regrets.
u
I think it depends on target audiences
u
as long as you ignore that schedules are stored separately from connections 🙄 , and you know, some other stuff.
u
This seems like a deal breaker for Analysts/BI
u
Probably isn’t a deal breaker for DBT users or for people not doing time-based analysis (which is a decent %)
u
u
u