Hi everyone. I'm building on AWS Managed Flink, ho...
# troubleshooting
f
Hi everyone. I'm building on AWS Managed Flink; however, to take advantage of temporal joins and time travel I need to set up a Hive catalog. AWS uses Glue somehow when using the Studio notebooks, but these only support up to Flink version 1.15. We switched to 1.19 without notebooks so we could create GitHub projects with CI/CD, but we cannot set up Hive since I lack the .jar dependency that allows me to connect to the Glue metastore. I've seen this related to another product (EMR): https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html And found this: https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/tree/branch-3.4.0 But I'm not sure how to proceed, so I'd prefer to ask around in case someone has a better idea. By the way, we are using Python and SQL. Thank you all for your time.
j
Why do you need to use a hive catalog?
f
I'm capturing CDC from our Postgres DB and storing the changes in upsert-kafka fashion. I need to use temporal joins and time travel to manage the state growth of some regular joins we have in our project. I ran into this requirement: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/catalogs/#interface-in-catalog-for-supporting-time-travel
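For anyone following along, here is a rough sketch of the kind of event-time temporal join I mean, written as Flink SQL inside Python strings. The table names, columns, topics, and broker address are all made up for illustration; only the overall shape (an upsert-kafka table with a primary key and watermark as the versioned side, joined with `FOR SYSTEM_TIME AS OF`) follows the Flink docs. I have not run this against a real cluster.

```python
# Sketch only: every table, column, topic, and broker name is hypothetical.

# Append-only probe side (for example, an orders stream).
ORDERS_DDL = """
CREATE TABLE orders (
  order_id BIGINT,
  customer_id BIGINT,
  order_time TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'localhost:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
)
"""

# Versioned side: CDC changes stored in upsert-kafka fashion.
# The primary key plus the event-time watermark make it usable in a temporal join.
CUSTOMERS_DDL = """
CREATE TABLE customers (
  customer_id BIGINT,
  tier STRING,
  updated_at TIMESTAMP(3),
  WATERMARK FOR updated_at AS updated_at - INTERVAL '5' SECOND,
  PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'cdc.customers',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
)
"""

# Each order is joined with the customer row that was valid at order_time.
TEMPORAL_JOIN = """
SELECT o.order_id, o.order_time, c.tier
FROM orders AS o
JOIN customers FOR SYSTEM_TIME AS OF o.order_time AS c
  ON o.customer_id = c.customer_id
"""


def build_statements():
    """Return the statements in the order they should be executed."""
    return [ORDERS_DDL, CUSTOMERS_DDL, TEMPORAL_JOIN]


def run(t_env):
    """Execute the sketch on an existing PyFlink TableEnvironment."""
    t_env.execute_sql(ORDERS_DDL)
    t_env.execute_sql(CUSTOMERS_DDL)
    return t_env.execute_sql(TEMPORAL_JOIN)
```

With PyFlink installed and the Kafka connector jar available, `run(TableEnvironment.create(EnvironmentSettings.in_streaming_mode()))` would be the way to try it.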
j
Got it, thank you. And are you trying to do this with MSF or MSF Studio? I think you would have better luck with MSF, but I'm not sure there is a way to define your own catalog.
f
First we tried Studio but switched to code for these reasons:
• you can only deploy one insert job per notebook
• version 1.15 lacks some SQL functions we need
• version 1.15 lacks time travel
• we needed version control
👍 2
j
checking on this…
🖖 1
Hive isn't supported on MSF today.
f
what about EMR?
j
Yes, with EMR you can configure it how you like.
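As a starting point for that testing, a minimal sketch of registering a Hive catalog from Python might look like the following. It assumes the Flink Hive connector jar is on the classpath and that the `hive-site.xml` in the given directory already points the metastore at Glue, as described on the EMR page linked earlier in the thread. The catalog name and the config directory are placeholders, not values confirmed for any particular cluster.

```python
def create_hive_catalog_sql(name: str, hive_conf_dir: str) -> str:
    """Build a Flink SQL CREATE CATALOG statement for a Hive catalog.

    Both arguments are placeholders chosen by the caller; hive_conf_dir
    must contain the hive-site.xml that the catalog should use.
    """
    return (
        f"CREATE CATALOG {name} WITH (\n"
        f"  'type' = 'hive',\n"
        f"  'hive-conf-dir' = '{hive_conf_dir}'\n"
        f")"
    )


def register_catalog(t_env, name: str, hive_conf_dir: str) -> None:
    """Create the catalog on an existing TableEnvironment and switch to it."""
    t_env.execute_sql(create_hive_catalog_sql(name, hive_conf_dir))
    t_env.execute_sql(f"USE CATALOG {name}")
```

For example, `register_catalog(t_env, "glue_hive", "/etc/hive/conf")`, where both the name and the path are assumptions to adjust for the actual cluster.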
f
thanks a lot, I'll do some testing