# troubleshoot
g
Hello, we are trying to use the Data Lake source. It works, but it ingests the data without partitions, and instead of one dataset we got a thousand. Is there some configuration for this? For example, instead of two datasets:
```
dim_geo_location_processed/version=20220312T000000/dim_geo_location_csv
dim_geo_location_processed/version=20220313T000000/dim_geo_location_csv
```
we expect to have one:
```
dim_geo_location_processed/dim_geo_location_csv
```
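(For illustration only, not tied to any connector: a minimal Python sketch of the grouping being asked for, collapsing the `version=...` partition segment so all versioned paths map to one logical dataset.)
```python
# Hypothetical illustration: drop "key=value" partition segments so every
# versioned path resolves to the same logical dataset name.
PATHS = [
    "dim_geo_location_processed/version=20220312T000000/dim_geo_location_csv",
    "dim_geo_location_processed/version=20220313T000000/dim_geo_location_csv",
]

def logical_dataset(path: str) -> str:
    # Keep only path segments that are not "key=value" partition folders.
    return "/".join(p for p in path.split("/") if "=" not in p)

print({logical_dataset(p) for p in PATHS})
# {'dim_geo_location_processed/dim_geo_location_csv'}
```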
i
Hello Oleksandr, the Data Lake source not detecting partitions is a known limitation. We are working on fixing it. In the meantime, I would suggest using a data catalog connector like Hive or Glue to get this information, if available.
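A minimal sketch of that workaround, assuming the DataHub Python SDK's programmatic Pipeline API with a Glue source; the AWS region and DataHub server URL below are placeholder assumptions, not values from this thread.
```python
# Hedged sketch: ingest table/partition metadata from AWS Glue instead of the
# Data Lake source. Region and server URL are placeholder assumptions.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "glue",
            "config": {
                "aws_region": "us-east-1",  # placeholder region
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder GMS URL
        },
    }
)
pipeline.run()
pipeline.raise_from_status()  # surface any ingestion errors
```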
g
Cool, thanks. We have Glue and wanted to have S3 in addition.
i
Do you have objects in S3 that are not crawled by Glue? Purely out of curiosity, could you say why?
g
We are working on a PoC for our client and just want to show the whole functionality for our use case.