# secoda-feature-requests
l
@elegant-house-93198 worth a conversation I think! I'd love to get Airbyte on the list of integrations 🙂
👍 2
e
Hey @polite-microphone-78573 👋 for the Airbyte integration would you want to see metadata about a connection and the jobs that were run for that connection?
p
Hey. I’m mostly interested in adding another node to the Data Lineage graph. I would really like to be able to connect a dbt Source to its origin within Airbyte.
Each stream making up an Airbyte Source would populate as a resource within Secoda's Catalog - maybe as a 'table', maybe as some unique resource type. At the moment, Airbyte only syncs to Data Warehouses, so each Airbyte Connection produces one or more tables. In most cases, Secoda will have already catalogued those resulting Destination tables, either via the dbt integration or the raw database integration.
Regardless of how you extract the metadata from Airbyte (via the config archive export, Airbyte's internal Postgres DB, or the Airbyte API), the table and schema information should be sufficient to match them up with dbt resources.
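Just to illustrate the kind of match I mean (the connection, schema, and table names below are made up, and the Airbyte export format is paraphrased from memory):
```yaml
# Made-up example: one Airbyte connection and the dbt source it lands in.
# Airbyte side (via config export / internal Postgres / API):
airbyte_connection:
  source: hubspot                 # made-up connector name
  destination_schema: raw_hubspot
  streams:
    - contacts                    # lands as raw_hubspot.contacts
    - deals                       # lands as raw_hubspot.deals

# dbt side (sources.yaml that Secoda has already catalogued):
sources:
  - name: hubspot
    schema: raw_hubspot
    loader: airbyte
    tables:
      - name: contacts
      - name: deals

# Matching on (schema, table) - e.g. (raw_hubspot, contacts) - is enough
# to link the Airbyte stream to the existing dbt source node in lineage.
```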
That's not to say that job monitoring wouldn't be a great feature. I can see something along the lines of the high-level activity snapshotted on dbt's 'Source Freshness' page being very helpful.
Last thought for the weekend - I'm sure you guys have considered adding support for many EL tools (Fivetran, Singer/Stitch, etc.), not just Airbyte. Within the 'modern' data stack, most of the info generated by EL tools is siloed and self-contained. In my experience, it only shows up in two places downstream:
1. The columns in the source tables themselves. For Airbyte, those are `_airbyte_ab_id`, `_airbyte_emitted_at`, and `_airbyte_normalized_at`.
   a. The other platforms have analogous columns.
2. In the dbt `sources.yaml` file, under the `loader` property (manually configured).
It would be relatively straightforward (but not a small feature) to create a 'profiler' for source tables: parse table schemas, recognize the naming convention of each supported platform's columns (i.e. 1. above), infer the Loader platform type, and then create the corresponding Resources in the Secoda Catalog. That way you'd get to skip building an actual integration for each EL tool, and instead just have a .yaml file depicting the structure of each Loader that the profiler supports. I'd bet there are tough corner cases, and perhaps it becomes too difficult for some Loaders (I'm only familiar with the aforementioned three), but I thought I'd pass along my brainstorm.
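To make that concrete, here's a rough sketch of what that per-Loader .yaml could look like - the Airbyte columns are the ones listed above, while the Fivetran/Stitch marker columns and the `match` key are just my guesses to show the shape:
```yaml
# Sketch of the per-Loader 'structure' file the profiler would read.
# Airbyte columns are the ones above; the Fivetran/Stitch marker columns
# are from memory, so treat them as placeholders to verify.
loaders:
  airbyte:
    marker_columns:
      - _airbyte_ab_id
      - _airbyte_emitted_at
      - _airbyte_normalized_at
    match: all        # require every marker column before tagging the table
  fivetran:
    marker_columns:
      - _fivetran_synced
      - _fivetran_deleted
    match: any
  stitch:
    marker_columns:
      - _sdc_batched_at
      - _sdc_received_at
      - _sdc_sequence
    match: any

# The profiler scans each source table's schema, checks these signatures,
# infers the Loader, and creates the corresponding resource in the Catalog.
```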
b
Plus one on Airbyte connector!
🙌 1
p
Inspired by Lightdash's new features this week - all of which can be utilized by adding a few lines of `meta` configuration to your existing dbt `schema.yml` files - it seems to me like that approach could be the lowest-effort MVP for new Secoda integrations. In the case of Airbyte, add a `meta.secoda` config block in a `source.yml` that describes the preceding step/node's Airbyte Source and Airbyte Destination. That block would translate to two new static nodes in Secoda's resource DAG. No need to grab icons or external info, or configure another integration with metadata extraction and the works; these nodes need not be 'explorable'.
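Roughly what I'm picturing - every key under `meta.secoda` is invented here just to show the shape, not an existing Secoda config schema:
```yaml
# Hypothetical source.yml snippet - the keys under meta.secoda are made up
# to illustrate the idea.
sources:
  - name: hubspot
    schema: raw_hubspot
    loader: airbyte
    meta:
      secoda:
        upstream:
          platform: airbyte
          airbyte_source: HubSpot        # becomes one static node
          airbyte_destination: Snowflake # becomes the other static node
    tables:
      - name: contacts
      - name: deals

# Secoda would render those two static nodes ahead of this dbt source in
# the lineage DAG - no separate Airbyte integration or icon fetching needed.
```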
e
Hey @polite-microphone-78573, thanks for the additional context. Do you have a link to those new features that Lightdash released?
e
Thanks @polite-microphone-78573