brave-forest-5974
11/05/2021, 8:22 AM
brave-forest-5974
11/05/2021, 8:28 AM
brave-forest-5974
11/05/2021, 8:30 AM
brave-forest-5974
11/05/2021, 8:33 AM
brave-forest-5974
11/05/2021, 8:37 AM
brave-forest-5974
11/05/2021, 8:41 AM
brave-forest-5974
11/05/2021, 8:42 AM
mammoth-bear-12532
MetadataAuditEvent_v4 and MetadataChangeLog_Versioned_v1 and ETL them to your lake or data warehouse.
3. Lineage: We currently cap the lineage visualization at 100 upstreams and downstreams for a single node, so the UI shouldn't suffer too much. We plan to add filtering and search within the lineage graph, in both the UI and the API, so that you can navigate really dense graphs.
4. GraphQL API security: We are targeting a release of access token support in the next couple of months. cc @big-carpet-38439, who can add more context here.
5. Upgrades: For primary data, we recommend using a managed service (like AWS RDS or similar) and relying on its backup-restore functionality. For restoring indexes, we have a helpful guide here (https://datahubproject.io/docs/how/restore-indices/). In our managed offering, we follow a similar approach for our customer deployments.
6. Scaling: The DataHub services are stateless, so in theory they can scale horizontally without limit. The bottlenecks we are currently aware of are in the metadata service when batch-ingesting large volumes of metadata (e.g. ingesting 2K+ Looker entities takes about 15 seconds with parallel REST calls; there is quite a bit of low-hanging fruit here). For up to a million entities with an average metadata footprint of 1MB per entity, we have seen customers comfortably served by the large instance sizes of the specific technologies (RDS, Elastic, Kafka), with a minimum of 3 hosts for the distributed systems. UI usage is typically not where you will see bottlenecks; performance only becomes a concern once you start using DataHub for programmatic use cases. At that point the workload matters a lot (e.g. whether the queries are predominantly primary-key lookups or search queries). [All primary-key reads go to MySQL / Cassandra; only graph and search queries hit Elastic.]
big-carpet-38439
11/08/2021, 6:03 PM
brave-forest-5974
11/09/2021, 1:34 PM