# getting-started
s
Hi @mammoth-bear-12532 @big-carpet-38439. I am trying to add query usage for AWS Athena in DataHub. AWS Athena does not provide a built-in way to get query history, but we store it ourselves in a Postgres table using their API. I am going to read through the files in https://github.com/linkedin/datahub/tree/master/metadata-ingestion/src/datahub/ingestion/source/usage to understand how usage ingestion works, and then add a source that reads query history from a table and ingests it. Are there any limitations in the storage model, or any gotchas regarding query usage, that I should be aware of?
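For reference while reading that directory: the usage sources there appear to roll query events up per table per time bucket rather than storing each query individually. Below is a minimal sketch of that kind of aggregation in plain Python, assuming daily UTC buckets, timezone-aware timestamps, and history rows that have already been resolved to the tables they touch (e.g. by a SQL parser). `QueryEvent`, `UsageBucket`, and `aggregate` are hypothetical names for illustration, not DataHub's API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class QueryEvent:
    """One row of query history (hypothetical shape, not DataHub's)."""
    user: str
    query_text: str
    start_time: datetime  # must be timezone-aware
    tables: Tuple[str, ...]  # tables the query touches, e.g. from a SQL parser


@dataclass
class UsageBucket:
    """Aggregated usage of one table within one day."""
    query_count: int = 0
    users: Counter = field(default_factory=Counter)
    queries: Counter = field(default_factory=Counter)

    def top_queries(self, n: int = 10) -> List[str]:
        """Most frequent query texts in this bucket, most common first."""
        return [text for text, _ in self.queries.most_common(n)]


def day_bucket(ts: datetime) -> datetime:
    """Truncate an aware timestamp to midnight UTC of the same day."""
    return ts.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)


def aggregate(events: Iterable[QueryEvent]) -> Dict[Tuple[datetime, str], UsageBucket]:
    """Group events by (day, table), counting queries, users, and query texts."""
    buckets: Dict[Tuple[datetime, str], UsageBucket] = defaultdict(UsageBucket)
    for ev in events:
        day = day_bucket(ev.start_time)
        for table in ev.tables:
            bucket = buckets[(day, table)]
            bucket.query_count += 1
            bucket.users[ev.user] += 1
            bucket.queries[ev.query_text] += 1
    return dict(buckets)


# Made-up sample events for illustration.
events = [
    QueryEvent("alice", "SELECT * FROM db.orders",
               datetime(2021, 9, 1, 10, tzinfo=timezone.utc), ("db.orders",)),
    QueryEvent("bob", "SELECT * FROM db.orders",
               datetime(2021, 9, 1, 18, tzinfo=timezone.utc), ("db.orders",)),
    QueryEvent("alice", "SELECT o.id FROM db.orders o JOIN db.customers c ON o.cid = c.id",
               datetime(2021, 9, 2, 9, tzinfo=timezone.utc), ("db.orders", "db.customers")),
]
result = aggregate(events)
```

Keeping a `Counter` of query texts per bucket is what makes a "top queries" list possible later; if the text were dropped at this stage, only counts would remain.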
Hi @mammoth-bear-12532, here is an initial draft of what I have: https://github.com/linkedin/datahub/pull/3269/files. I tested it locally and there are still SQL parsing issues, but it already gets a lot of useful query metrics for AWS Athena into DataHub. I'd especially like feedback on the config file: does it look good? Any feedback on the code in the draft would also be appreciated.
Something odd is happening that I don't understand yet. With this code I can see column-level usage and monthly queries, but I am not seeing the queries themselves. Am I missing some metadata I should be sending?
m
@helpful-optician-78938 or @witty-state-99511 should be able to help you here.
w
Hey @square-activity-64562, it seems like you’re providing the `query_text` as well, so can you verify that `query_text` contains the SQL queries you expect?
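One quick way to do that check before anything is sent to DataHub is to scan the rows returned by the usage query for missing or blank text. A minimal sketch, assuming the rows come back as dict-like mappings with a `query_text` key; `find_missing_query_text` is a hypothetical helper, not part of DataHub:

```python
from typing import Iterable, List, Mapping, Optional


def find_missing_query_text(
    rows: Iterable[Mapping[str, Optional[str]]], key: str = "query_text"
) -> List[int]:
    """Return indices of rows whose query text is absent, None, or only whitespace."""
    missing: List[int] = []
    for i, row in enumerate(rows):
        text = row.get(key)
        if text is None or not text.strip():
            missing.append(i)
    return missing


# Made-up sample rows covering each failure mode.
rows = [
    {"query_text": "SELECT * FROM orders"},
    {"query_text": ""},
    {"query_text": None},
    {"query_text": "   "},
    {},  # key missing entirely
]
print(find_missing_query_text(rows))  # → [1, 2, 3, 4]
```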
s
Yes, they do have the text. It seems this happens only in some cases; for others, some queries do show up. I'll dig in tomorrow to find out more.
w
Awesome! Also, I’m curious why you’re passing the usage query through the YAML file.
s
@witty-state-99511 Two reasons:
• This table is specific to us. In a separate process we call the AWS Athena API, process the responses, and populate this table.
• If the SQL lives in the YAML, the source is not specific to AWS Athena. I can also get data from other systems that store query logs in a database, e.g. Superset has a `query` table. So once I'm done, I should be able to get our Postgres/MySQL query usage from Superset. It won't be complete, because those databases are also queried from other places, but having at least the Superset queries for those databases is better than having nothing.
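The config-driven approach described above can be sketched as running whatever SQL the YAML provides over a DB-API connection and mapping the result columns by name. This minimal sketch uses Python's built-in `sqlite3` as a stand-in for Postgres/MySQL, and a plain dict stands in for the parsed YAML. The `query` config key, the required column aliases (`user_name`, `query_text`, `start_time`), and the `query_log` demo table are all hypothetical; Superset's real `query` table schema would need to be checked before writing the actual SQL.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Iterator, Mapping

# Column aliases the configured SQL must produce (hypothetical convention).
REQUIRED_COLUMNS = ("user_name", "query_text", "start_time")


@dataclass(frozen=True)
class QueryLogRow:
    user_name: str
    query_text: str
    start_time: str  # left as text here; a real source would parse it to a datetime


def read_query_log(conn: Any, config: Mapping[str, str]) -> Iterator[QueryLogRow]:
    """Run the SQL from config["query"] on any DB-API connection and yield rows.

    Columns are matched by name, so the same code works against any query-log
    table as long as the SQL aliases its output to REQUIRED_COLUMNS.
    """
    cur = conn.cursor()
    try:
        cur.execute(config["query"])
        names = [d[0] for d in cur.description]
        missing = [c for c in REQUIRED_COLUMNS if c not in names]
        if missing:
            raise ValueError(f"usage query is missing columns: {missing}")
        for values in cur.fetchall():
            row = dict(zip(names, values))
            yield QueryLogRow(row["user_name"], row["query_text"], row["start_time"])
    finally:
        cur.close()


# Demo against an in-memory SQLite table with made-up column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_log (who TEXT, statement TEXT, started TEXT)")
conn.execute(
    "INSERT INTO query_log VALUES ('alice', 'SELECT * FROM orders', '2021-09-01T10:00:00Z')"
)
config = {  # stands in for the parsed YAML
    "query": "SELECT who AS user_name, statement AS query_text, started AS start_time "
    "FROM query_log"
}
rows = list(read_query_log(conn, config))
```

Requiring the SQL to alias its columns to fixed names keeps the source itself database-agnostic: pointing it at a different query-log table only changes the YAML, not the code.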