#contributing-to-airbyte
j

Jared Rhizor (Airbyte)

01/21/2021, 12:08 AM
Something we just ran into while trying to help Kelvin create a GitHub dashboard from normalized outputs
u

user

01/21/2021, 12:08 AM
The timestamps are strings, not timestamps, in the target Postgres db
u

user

01/21/2021, 12:10 AM
root@ca0be5786d74:/tmp/workspace/17/0/normalize/models/generated# cat stargazers.sql
with
stargazers_node as (
  select
    _airbyte_emitted_at,
    {{ dbt_utils.current_timestamp_in_utc()  }} as _airbyte_normalized_at,
    cast({{ json_extract_scalar('_airbyte_data', ['user_id'])  }} as {{ dbt_utils.type_float()  }}) as user_id,
    cast({{ json_extract_scalar('_airbyte_data', ['starred_at'])  }} as {{ dbt_utils.type_string()  }}) as starred_at,
    cast({{ json_extract_scalar('_airbyte_data', ['_sdc_repository'])  }} as {{ dbt_utils.type_string()  }}) as _sdc_repository
  from {{ source('public', '_airbyte_raw_stargazers')  }}
),
stargazers_with_id as (
  select
    *,
    {{ dbt_utils.surrogate_key(['user_id',
        'starred_at',
        '_sdc_repository'])  }} as _airbyte_stargazers_hashid
    from stargazers_node
)
u

user

01/21/2021, 12:10 AM
i think that's expected.
u

user

01/21/2021, 12:10 AM
(if not stellar)
u

user

01/21/2021, 12:10 AM
root@ca0be5786d74:/tmp/workspace/17/0# cat catalog.json
{"streams":[{"stream":{"name":"stargazers","json_schema":{"type":"object","properties":{"user":{"type":"object"},"user_id":{"type":"number"},"starred_at":{"type":"string"},"_sdc_repository":{"type":"string"}}},"supported_sync_modes":["incremental"],"source_defined_cursor":false,"default_cursor_field":[]},"sync_mode":"incremental","cursor_field":["user"]}]}
u

user

01/21/2021, 12:10 AM
i believe all times, dates, etc get treated as string
u

user

01/21/2021, 12:11 AM
it basically makes it unusable in analytics tools like metabase
u

user

01/21/2021, 12:11 AM
since we don't have any other appropriate primitive.
u

user

01/21/2021, 12:11 AM
presumably looker/etc also
u

user

01/21/2021, 12:11 AM
is it string at the github-singer level or in normalization?
u

user

01/21/2021, 12:11 AM
that i don't know.
u

user

01/21/2021, 12:12 AM
actually with the catalog I posted above I guess it has to be in github-singer
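For reference, a minimal Python sketch of the check being described here: it loads the catalog posted above (trimmed to the stream name and its properties) and lists the string properties that carry no `format` hint. The helper name `string_fields_without_format` is hypothetical and not part of Airbyte.

```python
import json

# Catalog as posted above, trimmed to the fields this check needs.
CATALOG = json.loads("""
{"streams":[{"stream":{"name":"stargazers","json_schema":{"type":"object",
"properties":{"user":{"type":"object"},"user_id":{"type":"number"},
"starred_at":{"type":"string"},"_sdc_repository":{"type":"string"}}}}}]}
""")


def string_fields_without_format(catalog):
    """Hypothetical helper: stream name -> string properties with no "format"."""
    result = {}
    for entry in catalog["streams"]:
        stream = entry["stream"]
        props = stream["json_schema"].get("properties", {})
        result[stream["name"]] = [
            name
            for name, spec in props.items()
            if spec.get("type") == "string" and "format" not in spec
        ]
    return result


print(string_fields_without_format(CATALOG))
```

`starred_at` shows up in that list next to `_sdc_repository`, so nothing downstream of the source can tell it apart from an ordinary string.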
u

user

01/21/2021, 12:13 AM
lol. well if they didn't we probably would 😬
u

user

01/21/2021, 12:13 AM
so we don’t support format: date-time?
u

user

01/21/2021, 12:13 AM
That kind of ruins all time-based analysis
u

user

01/21/2021, 12:13 AM
Like it requires DBT parsing
u

user

01/21/2021, 12:14 AM
actually a lot of our schemas do include date-time
u

user

01/21/2021, 12:15 AM
I don’t see any reference to it in the normalization code though
u

user

01/21/2021, 12:16 AM
So I guess there are two tickets I need to create here
u

user

01/21/2021, 12:16 AM
1. Patch the catalog for the github source
u

user

01/21/2021, 12:16 AM
2. support date-time formatting in normalization?
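A rough Python sketch of what item 2 could look like: picking a column type from a JSON schema property and honoring `format: date-time`. The function name `sql_type_for` and the type names it returns are illustrative assumptions, not the actual normalization code.

```python
def sql_type_for(spec):
    """Hypothetical mapping from a JSON schema property to a column type."""
    json_type = spec.get("type")
    # "type" may be a list such as ["null", "string"]; use the first non-null entry.
    if isinstance(json_type, list):
        non_null = [t for t in json_type if t != "null"]
        json_type = non_null[0] if non_null else None

    if json_type == "string":
        # The case discussed in this thread: a string flagged as a timestamp.
        if spec.get("format") == "date-time":
            return "timestamp with time zone"
        return "string"
    if json_type == "number":
        return "float"
    if json_type == "integer":
        return "bigint"
    if json_type == "boolean":
        return "boolean"
    # Objects, arrays and unknown types stay as JSON in this sketch.
    return "json"


# With the catalog as it is today, starred_at stays a string.
print(sql_type_for({"type": "string"}))
# With item 1 applied (catalog patched), it would become a timestamp.
print(sql_type_for({"type": "string", "format": "date-time"}))
```

Under these assumptions both tickets are needed for the stargazers case: the patched catalog supplies the `format`, and normalization has to act on it.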
u

user

01/21/2021, 12:17 AM
feels like #2 is pretty critical priority
u

user

01/21/2021, 12:17 AM
1 a bit less so, but it is blocking Kelvin’s work
u

user

01/21/2021, 12:18 AM
we should do 2. it's just a matter of when.
u

user

01/21/2021, 12:18 AM
i think our bet was that if you care you'll use DBT to parse it.
u

user

01/21/2021, 12:18 AM
and that's why we could hold off in the short term.
u

user

01/21/2021, 12:19 AM
hmm
u

user

01/21/2021, 12:19 AM
could be it's just time to do it or that that bet was wrong.
u

user

01/21/2021, 12:19 AM
i guess all i'm saying is i'm not surprised. i think it was a thoughtfully made decision. but i also agree we do need to figure out when to do it.
u

user

01/21/2021, 12:19 AM
i have no regrets.
u

user

01/21/2021, 12:20 AM
I think it depends on target audiences
u

user

01/21/2021, 12:20 AM
as long as you ignore that schedules are stored separately from connections 🙄 , and you know, some other stuff.
u

user

01/21/2021, 12:20 AM
This seems like a deal breaker for Analysts/BI
u

user

01/21/2021, 12:21 AM
Probably isn’t a deal breaker for DBT users or for people not doing time-based analysis (which is a decent %)