# ingestion
c
Hi everyone, has someone successfully dealt with `struct_type` columns? The Hive connector documented here https://datahubproject.io/docs/metadata-ingestion#hive-hive returns only a string column for the top level, but not the nested columns below. To illustrate: I get columnA, but not columnA.subColumnB or columnA.subColumnC.subsubColumnD. I think `sqlalchemy` does not read the column as structured, but pyhive does. Has anyone had a similar use case in the past who can point me in the right direction? Thanks a lot!
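To make this concrete, here is a minimal sketch of what I see (the connection string and table name are made up):
```python
# Sketch of the symptom: sqlalchemy's Hive dialect reports a struct
# column as a plain string, so nested fields never show up.
from sqlalchemy import create_engine, inspect

# assumes a reachable HiveServer2 at this (hypothetical) address
engine = create_engine("hive://localhost:10000/default")
inspector = inspect(engine)

for column in inspector.get_columns("my_table"):  # hypothetical table
    # a struct<...> column comes back with type String here, so
    # columnA.subColumnB and deeper levels are invisible
    print(column["name"], column["type"])
```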
g
Seems like this is a long-standing limitation of the pyhive library that we use: https://github.com/dropbox/PyHive/issues/121#issuecomment-321133036. The underlying issue is that it maps the struct type (and array/map) to string: https://github.com/dropbox/PyHive/blob/master/pyhive/sqlalchemy_hive.py#L138
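For reference, the mapping looks roughly like this (paraphrased from sqlalchemy_hive.py, not a verbatim copy):
```python
# Paraphrased sketch of pyhive's sqlalchemy type mapping: complex Hive
# types all collapse to String, which is why nested fields disappear.
from sqlalchemy import types

_type_map = {
    "int": types.Integer,
    "bigint": types.BigInteger,
    "string": types.String,
    # complex types are flattened to plain strings:
    "array": types.String,
    "map": types.String,
    "struct": types.String,
}
```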
d
We have something similar with ingestion from Avro: we get only the root fields with struct type in the UI, but in the raw schema we can see the whole struct
c
Hello @gray-shoe-75895, thanks for your reply. Yes, I saw that issue on GitHub as well. This is quite a limitation for us, as one very important table has a struct type that we would like to resolve into columns in datahub. We could use the mysql connector instead, but I guess that would not change anything?
g
Yep, using the mysql connector instead likely won’t help here.
l
We will find a workaround for this
c
bump
m
Hi @colossal-furniture-76714, we have found a way to parse DDL structs and still ingest the data through sqlalchemy/pyhive. I will keep this thread updated on it.
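The rough idea (a sketch, not our production code) is to take the type string Hive returns, e.g. from `DESCRIBE my_table`, and recursively expand `struct<...>` entries into dotted column paths:
```python
# Sketch: flatten a Hive type string such as
# "struct<subColumnB:string,subColumnC:struct<subsubColumnD:int>>"
# into dotted column paths with their leaf types.

def _split_top_level(s: str) -> list[str]:
    """Split on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts

def flatten_struct(name: str, hive_type: str) -> list[tuple[str, str]]:
    """Recursively expand struct<...> fields into (path, type) pairs."""
    hive_type = hive_type.strip()
    if hive_type.startswith("struct<") and hive_type.endswith(">"):
        inner = hive_type[len("struct<"):-1]
        columns = []
        for field in _split_top_level(inner):
            field_name, _, field_type = field.partition(":")
            columns.extend(
                flatten_struct(f"{name}.{field_name.strip()}", field_type)
            )
        return columns
    return [(name, hive_type)]

# Example from this thread:
ddl = "struct<subColumnB:string,subColumnC:struct<subsubColumnD:int>>"
for path, typ in flatten_struct("columnA", ddl):
    print(path, typ)
# columnA.subColumnB string
# columnA.subColumnC.subsubColumnD int
```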
Also, thank you for sharing the details of the workaround you suggested 🙂
c
Ok, thanks, that sounds great. Do you use pyhive to connect to Spark?
There is another problem with that connection, as pyhive does not speak the Spark SQL dialect properly yet.
I would be happy if you shared your results here.