rich-battery-25772
09/09/2022, 11:05 AM
pub struct DeltaTableLoadOptions {
    // ...
    /// Indicates whether DeltaTable should track files.
    /// This defaults to `true`.
    ///
    /// Some append-only applications might have no need of tracking any files.
    /// Hence, DeltaTable will be loaded with significant memory reduction.
    pub require_files: bool,
}
The main problem is that the flag can't be set from the Python deltalake library (the library would need to be changed to expose it).
Another question is how we could calculate the number of files in an alternative way if file tracking is turned off.
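One possible way to count files without the tracked file list, as a rough sketch only: replay the `add` and `remove` actions from the JSON commit files under `_delta_log`. The function name `count_active_files` is made up for illustration. It assumes the table is on a local filesystem and that every commit since version 0 is still available as JSON (it does not read checkpoint parquet files), so it is not a drop-in replacement for what the library does.

```python
import json
from pathlib import Path


def count_active_files(table_path: str) -> int:
    """Count live data files by replaying add/remove actions from _delta_log.

    Hypothetical helper, not part of deltalake. Assumes a local table whose
    JSON commits go back to version 0 (checkpoints are ignored).
    """
    log_dir = Path(table_path) / "_delta_log"
    active = set()
    # Commit file names are zero-padded versions, so a lexical sort
    # is also version order.
    for commit in sorted(log_dir.glob("*.json")):
        with commit.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)  # one action per line
                if "add" in action:
                    active.add(action["add"]["path"])
                elif "remove" in action:
                    active.discard(action["remove"]["path"])
    return len(active)
```

This keeps only the set of paths in memory, but it still reads the whole log, so for tables with long histories or checkpoints a proper fix in the library is probably the better route.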
• Datahub’s code (where it uses the DeltaTable class):
https://github.com/datahub-project/datahub/blob/083ab9bc0e7b9d8ba293afcf9fae4ffb71c4f86c/metadata-ingestion/src/datahub/ingestion/source/delta_lake/delta_lake_utils.py#L24
• Deltalake’s Python library:
- DeltaTable class: https://github.com/delta-io/delta-rs/blob/45a0404287287ead94005740dad90b67922e0ec9/python/deltalake/table.py#L72
- RawDeltaTable class: https://github.com/delta-io/delta-rs/blob/45a0404287287ead94005740dad90b67922e0ec9/python/src/lib.rs#L78
• Deltalake’s Rust library:
- DeltaTableBuilder struct (require_files is in its options: DeltaTableLoadOptions field): https://github.com/delta-io/delta-rs/blob/45a0404287287ead94005740dad90b67922e0ec9/rust/src/builder.rs#L116
helpful-optician-78938
09/09/2022, 5:34 PM