Hi, I have some files in S3 buckets that I want to govern. They are PDF files and .gz files containing documentation. Is that supported by DataHub? Looking at https://datahubproject.io/docs/graphql/enums#entitytype I don't see which entity type those files would fit, and I don't know how to ingest that information.
astonishing-answer-96712
04/03/2023, 9:17 PM
Hi, I don’t believe we support ingesting PDFs. Is there any way to convert them to CSV or another format?
modern-artist-55754
04/14/2023, 12:29 PM
What sort of metadata do you want to get out of a PDF file? As for gzipped CSV/Parquet, I think it should work: DataHub uses Spark to extract metadata from the S3 source, and Spark can read .gz files.
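
To illustrate why a gzipped CSV is still schema-friendly while a PDF is not, here is a rough sketch using only the Python standard library. It is not how DataHub or Spark does it internally; the function name `infer_columns_from_gz_csv` and the sample data are made up for the example, and it assumes the file is UTF-8 with a header row.

```python
import csv
import gzip
import io
from typing import List


def infer_columns_from_gz_csv(raw_bytes: bytes) -> List[str]:
    """Return the header row of a gzip-compressed CSV (hypothetical helper).

    Assumes UTF-8 text with column names on the first line.
    """
    # gzip.open accepts a file-like object; "rt" decompresses to text.
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        # An empty file has no header, so fall back to an empty list.
        return next(reader, [])


# Made-up sample standing in for an object downloaded from S3.
sample = gzip.compress(b"id,name\n1,alice\n")
columns = infer_columns_from_gz_csv(sample)
print(columns)
```

The compression is only a wrapper, so the column names are recoverable once it is removed. A PDF has no comparable tabular structure to read a schema from, which is why the question of what metadata you actually want from it matters.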