Hi, I have some files in S3 buckets that I want to...
# advice-data-governance
s
Hi, I have some files in S3 buckets that I want to govern. They are PDF files and .gz archives containing documentation. Is that supported by DataHub? I don't see which entity type those files would fit into: https://datahubproject.io/docs/graphql/enums#entitytype And I don't know how to ingest that information.
a
Hi, I don't believe we support ingesting PDFs. Is there any way to convert them to CSV or another format?
m
What sort of metadata do you want to get out of a PDF file? As for gzipped CSV/Parquet, I think it should work: DataHub uses Spark to extract metadata from the S3 source, and Spark can read .gz files.
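For a rough idea of what "metadata from a gzipped CSV" could look like, here is a minimal standard-library sketch. It is not DataHub's or Spark's code: the helper names `infer_type` and `describe_gzipped_csv`, the sample data, and the coarse int/float/string typing are all assumptions made for illustration. It only shows that a .gz CSV can be read transparently and its column names and approximate types recovered.

```python
import csv
import gzip
import io


def infer_type(values):
    """Guess a coarse column type from sample string values (hypothetical helper)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try the narrowest type first, then fall back.
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def describe_gzipped_csv(raw_bytes, sample_rows=100):
    """Return [(column_name, inferred_type), ...] for gzip-compressed CSV bytes."""
    # gzip.open decompresses on the fly, so the CSV reader sees plain text.
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # Only sample the first rows; assumed to be enough for a type guess.
        rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


# Made-up sample standing in for an object downloaded from S3.
sample = gzip.compress(b"id,name,score\n1,alice,9.5\n2,bob,7\n")
schema = describe_gzipped_csv(sample)
print(schema)
```

In a real setup the bytes would come from the S3 object rather than an in-memory sample, and the actual schema inference would be whatever the ingestion source does; this just makes the idea concrete.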