:question: Question(s) of the Week: When running D...
# random
l
Question(s) of the Week: When running DataHub, where are you starting to hit performance issues? How have you tuned DataHub to address performance issues and what was the outcome? @orange-night-91387 and I are excited for your feedback -- leave your comments in the 🧵 and we’ll pick a random swag winner next Tuesday, July 5th yay
g
I started with DataHub by doing a POV on an instance with the minimum recommended resources. When I ran more than one ingestion in parallel, the user experience in the UI was drastically affected. In docker-compose, one setting that reduced but did not solve the problem was increasing the cpu_shares of GMS and Elasticsearch. The real solution came when I migrated the deployment to K8s with 3 GMS replicas and 3 Elasticsearch replicas. I'm working on enabling a LoadBalancer so that GMS can scale enough to support both search and ingestion traffic at peak times of parallel ingestion. Another possibility: dedicate some GMS replicas to ingestion and others to serving the frontend.
thanks bear 1
l
I could be misinterpreting, but I was quite surprised at how long it takes to publish MCPs/MCEs via a Kafka sink. As an example, we have a daily ingestion run that uses Kafka as a source and takes ~15 minutes to finish publishing ~2400 records to Kafka as a sink.
...
 'workunits_produced': 2121}
Sink (datahub-kafka) report:
{'downstream_end_time': None,
 'downstream_start_time': None,
 'downstream_total_latency_in_seconds': None,
 'failures': [],
 'records_written': 2359,
 'warnings': []}
I know from experience that querying for the list of Kafka topics is relatively quick (you can test it using console-sink), which makes me think that we are possibly publishing these MCPs individually rather than in micro-batches, which could hurt performance. The main concern is that this is for a single recipe, so as we get into the 10s of recipes, ingestion runs could take hours.
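For a rough sense of scale, here is a back-of-the-envelope sketch of the numbers above. It uses the 2359 records from the sink report and treats the ~15 minute duration as exact. The projection assumes recipes run one after another, each with a similar record count and the same throughput; none of that is confirmed, and the helper names are only illustrative.

```python
# Figures from the sink report above; the 15-minute duration is approximate.
records_written = 2359
run_seconds = 15 * 60


def throughput(records: int, seconds: float) -> float:
    """Average number of records published per second."""
    return records / seconds


def projected_hours(recipes: int, records_per_recipe: int, rate: float) -> float:
    """Hours needed to run `recipes` sequentially at a constant rate.

    Assumption: every recipe has a similar size and sees the same throughput.
    """
    return recipes * records_per_recipe / rate / 3600


rate = throughput(records_written, run_seconds)
print(f"{rate:.2f} records/s, about {1000 / rate:.0f} ms per record")

for recipes in (1, 10, 40):
    hours = projected_hours(recipes, records_written, rate)
    print(f"{recipes} recipes: {hours:.2f} h")
```

At a few hundred milliseconds per record, the time looks more like a per-message round trip than batched publishing, though only profiling the sink would confirm that.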
thanks bear 1
teamwork 1
Sample github actions output
l
Swag time, swag time!! @lemon-hydrogen-83671 I’ll DM ya!
l
50/50 king 🎉
actual lol 2
l
And @gentle-camera-33498 won last week 😂
g
hahaha congrats @lemon-hydrogen-83671! I'm not here only for the swag, @little-megabyte-1074. I'll try to respond to every question (if I can)!! I want to help the community.
teamwork 2