# troubleshoot
g
Hello, we are trying to use the Data Lake source. It works, but it ingests the data without partitions, and instead of one dataset we got a thousand. Is there some configuration for this? For example, instead of two datasets:
```
dim_geo_location_processed/version=20220312T000000/dim_geo_location_csv
dim_geo_location_processed/version=20220313T000000/dim_geo_location_csv
```
we expect to have one:
```
dim_geo_location_processed/dim_geo_location_csv
```
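(For illustration only, not tied to any connector: a minimal Python sketch of the grouping being asked for, collapsing the `version=...` partition segment so all versioned paths map to one logical dataset.)
```python
# Hypothetical illustration: drop "key=value" partition segments so every
# versioned path resolves to the same logical dataset name.
PATHS = [
    "dim_geo_location_processed/version=20220312T000000/dim_geo_location_csv",
    "dim_geo_location_processed/version=20220313T000000/dim_geo_location_csv",
]

def logical_dataset(path: str) -> str:
    # Keep only path segments that are not "key=value" partition folders.
    return "/".join(p for p in path.split("/") if "=" not in p)

print({logical_dataset(p) for p in PATHS})
# {'dim_geo_location_processed/dim_geo_location_csv'}
```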
i
Hello Oleksandr, the Data Lake source not detecting partitions is a known limitation. We are working on fixing it. In the meantime, I would suggest using a data catalog connector like Hive or Glue to get this information, if available.
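A minimal sketch of that workaround, assuming the DataHub Python SDK's programmatic Pipeline API with a Glue source; the AWS region and DataHub server URL below are placeholder assumptions, not values from this thread.
```python
# Hedged sketch: ingest table/partition metadata from AWS Glue instead of the
# Data Lake source. Region and server URL are placeholder assumptions.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "glue",
            "config": {
                "aws_region": "us-east-1",  # placeholder region
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder GMS URL
        },
    }
)
pipeline.run()
pipeline.raise_from_status()  # surface any ingestion errors
```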
g
Cool, thanks. We have Glue and wanted to have S3 in addition.
i
Do you have objects in S3 that are not crawled by Glue? Purely out of curiosity, could you say why?
g
We are working on a PoC for our client and just want to show the whole functionality for our use case.