# general
a
is `concat` lazy? Like
```python
from functools import reduce

df = reduce(
    lambda df1, df2: df1.concat(df2),
    [
        df_provider[i].get_daft_df()
        for i in range(num_dfs)
    ],
)
```
Is there any way to lazily combine a set of dfs, in any order?
I'm okay with providing the WARC paths in a list (I think that should work for laziness), but having this pattern in my codebase would be nice too.
j
Yeah, it's lazy, but I wonder if there are any performance implications to doing it this way.
In general I'd recommend passing in a list of paths.
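e.g. something like this (rough sketch, hypothetical bucket/paths; for WARC files you'd use the corresponding reader instead of `read_parquet`):
```python
import daft

num_dfs = 8  # placeholder: however many shards you have

# Hand all the paths to a single read call so Daft plans one lazy scan
# instead of chaining many concats.
paths = [f"s3://my-bucket/crawl/part-{i}.parquet" for i in range(num_dfs)]
df = daft.read_parquet(paths)

# Still lazy: nothing is read until you collect/show/write.
```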
👍 1
a
ah i see, yeah i was OOMing but i had a large `num_dfs`
m
In general Daft likes to keep everything in a single dataframe, unless you are joining one dataframe with another (for example, you're reading from two different sources and then joining on some key). So as much as you can, it's best to store your data in such a way that a single daft.read_XYZ call gets everything into one dataframe. If you have disparate sources of data that you want to write to the same location so you can do this, I'd suggest using something like the parquet writer and setting it to append mode instead of overwrite. Then you can use that location as the source for your other dataframe.
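Rough sketch of that write-then-read pattern (assuming `write_parquet` accepts `write_mode="append"`; the paths and source dataframes below are placeholders):
```python
import daft

# Placeholder: however you currently build the individual dataframes.
source_dfs = [
    daft.read_csv("s3://my-bucket/source_a/*.csv"),
    daft.read_csv("s3://my-bucket/source_b/*.csv"),
]

# Append each source into one shared location...
for src in source_dfs:
    src.write_parquet("s3://my-bucket/combined/", write_mode="append")

# ...then read it all back later as a single lazy dataframe.
df = daft.read_parquet("s3://my-bucket/combined/")
```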