# ingestion
c
Hi everyone, has someone successfully dealt with `struct_type` columns? The Hive connector documented here https://datahubproject.io/docs/metadata-ingestion#hive-hive returns only a string column for the top level, but not the nested columns below. To illustrate: I get columnA, but not columnA.subColumnB or columnA.subColumnC.subsubColumnD. I think `sqlalchemy` does not read the column as structured, but pyhive does. Has anyone had a similar use case in the past who can point me in the right direction? Thanks a lot!
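To make this concrete, here is a minimal sketch of what I see (the connection string and table name are made up):
```python
# Sketch of the symptom: sqlalchemy's Hive dialect reports a struct
# column as a plain string, so nested fields never show up.
from sqlalchemy import create_engine, inspect

# assumes a reachable HiveServer2 at this (hypothetical) address
engine = create_engine("hive://localhost:10000/default")
inspector = inspect(engine)

for column in inspector.get_columns("my_table"):  # hypothetical table
    # a struct<...> column comes back with type String here, so
    # columnA.subColumnB and deeper levels are invisible
    print(column["name"], column["type"])
```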
g
Seems like this is a long-standing limitation of the pyhive library that we use: https://github.com/dropbox/PyHive/issues/121#issuecomment-321133036. The underlying issue is that it maps the struct type (and array/map) to string: https://github.com/dropbox/PyHive/blob/master/pyhive/sqlalchemy_hive.py#L138
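For reference, the mapping looks roughly like this (paraphrased from sqlalchemy_hive.py, not a verbatim copy):
```python
# Paraphrased sketch of pyhive's sqlalchemy type mapping: complex Hive
# types all collapse to String, which is why nested fields disappear.
from sqlalchemy import types

_type_map = {
    "int": types.Integer,
    "bigint": types.BigInteger,
    "string": types.String,
    # complex types are flattened to plain strings:
    "array": types.String,
    "map": types.String,
    "struct": types.String,
}
```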
d
We have something similar with ingestion from Avro: we get only the root fields with struct type in the UI, but in the raw schema we can see the whole struct
c
Hello @gray-shoe-75895, thanks for your reply. Yes, I saw that issue on GitHub as well. This is quite a limitation for us, as one very important table has a struct type that we would like to resolve into columns in datahub. We could use the mysql connector instead, but I guess that would not change anything?
g
Yep, using the mysql connector instead likely won’t help here.
l
We will find a workaround for this
c
bump
m
Hi @colossal-furniture-76714, we have found a way to parse DDL structs and still ingest the data through sqlalchemy/pyhive. I will keep this thread updated on it.
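The rough idea (a sketch, not our production code) is to take the type string Hive returns, e.g. from `DESCRIBE my_table`, and recursively expand `struct<...>` entries into dotted column paths:
```python
# Sketch: flatten a Hive type string such as
# "struct<subColumnB:string,subColumnC:struct<subsubColumnD:int>>"
# into dotted column paths with their leaf types.

def _split_top_level(s: str) -> list[str]:
    """Split on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts

def flatten_struct(name: str, hive_type: str) -> list[tuple[str, str]]:
    """Recursively expand struct<...> fields into (path, type) pairs."""
    hive_type = hive_type.strip()
    if hive_type.startswith("struct<") and hive_type.endswith(">"):
        inner = hive_type[len("struct<"):-1]
        columns = []
        for field in _split_top_level(inner):
            field_name, _, field_type = field.partition(":")
            columns.extend(
                flatten_struct(f"{name}.{field_name.strip()}", field_type)
            )
        return columns
    return [(name, hive_type)]

# Example from this thread:
ddl = "struct<subColumnB:string,subColumnC:struct<subsubColumnD:int>>"
for path, typ in flatten_struct("columnA", ddl):
    print(path, typ)
# columnA.subColumnB string
# columnA.subColumnC.subsubColumnD int
```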
Also, thank you for sharing the details of the workaround you suggested 🙂
c
Ok, thanks, that sounds great. Do you use pyhive to connect to Spark?
There is another problem with that connection, as pyhive does not speak the Spark SQL dialect properly yet.
I would be happy if you shared your results here.