# general
a
is `concat` lazy? Like
```python
from functools import reduce

df = reduce(
    lambda df1, df2: df1.concat(df2),
    [
        df_provider[i].get_daft_df()
        for i in range(num_dfs)
    ],
)
```
Is there any way to lazily combine a set of dfs, in any order?
I'm okay with providing the WARC paths in a list (I think that should work for laziness), but having this pattern in my codebase would be nice too.
j
Yeah, it's lazy, but I wonder if there are any performance implications to doing it this way.
In general I'd recommend passing in a list of paths.
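e.g. something like this (rough sketch, hypothetical bucket/paths; for WARC files you'd use the corresponding reader instead of `read_parquet`):
```python
import daft

num_dfs = 8  # placeholder: however many shards you have

# Hand all the paths to a single read call so Daft plans one lazy scan
# instead of chaining many concats.
paths = [f"s3://my-bucket/crawl/part-{i}.parquet" for i in range(num_dfs)]
df = daft.read_parquet(paths)

# Still lazy: nothing is read until you collect/show/write.
```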
👍 1
a
ah i see, yeah i was OOMing but i had a large `num_dfs`
m
In general Daft likes to keep everything in a single dataframe, unless you are joining one dataframe with another (for example, you're reading from two different sources and then joining on some key). So as much as you can, it's best to store your data in such a way that a single daft.read_XYZ call gets everything into one dataframe. If you have disparate sources of data that you want to write to the same location so you can do this, I'd suggest using something like the parquet writer and setting it to append mode instead of overwrite. Then you can use that location as the source for your other dataframe.
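Rough sketch of that write-then-read pattern (assuming `write_parquet` accepts `write_mode="append"`; the paths and source dataframes below are placeholders):
```python
import daft

# Placeholder: however you currently build the individual dataframes.
source_dfs = [
    daft.read_csv("s3://my-bucket/source_a/*.csv"),
    daft.read_csv("s3://my-bucket/source_b/*.csv"),
]

# Append each source into one shared location...
for src in source_dfs:
    src.write_parquet("s3://my-bucket/combined/", write_mode="append")

# ...then read it all back later as a single lazy dataframe.
df = daft.read_parquet("s3://my-bucket/combined/")
```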