Garrett Weaver
07/25/2025, 4:44 AM
select prior to writing to parquet, it is not necessarily respected, such that the order when reading back is different. Is this expected?
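(For context, a minimal sketch of the round trip being asked about, assuming an explicit sort before the write; the column and path names are made up:)

import daft

df = daft.from_pydict({"id": [3, 1, 2], "val": ["c", "a", "b"]})
df = df.sort("id")                       # order established before the write
df.write_parquet("sorted_out/")          # may be written as multiple files

# Reading the directory back does not guarantee the same row order, since
# files/row groups can be scanned in any order; re-sort after reading if
# the order matters.
df2 = daft.read_parquet("sorted_out/")
df2.show()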
Garrett Weaver
07/28/2025, 5:33 PM
daft.exceptions.DaftCoreException: Not Yet Implemented: Window functions are currently only supported on the native runner.
A small test with the new engine on seems to work, but I want to make sure there are not any caveats.
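(A small sketch of selecting the native runner before running such a query; this assumes daft.context.set_runner_native(), or the DAFT_RUNNER=native environment variable, as available in recent Daft releases:)

import daft

# Assumption: recent Daft exposes the native runner explicitly; otherwise
# set DAFT_RUNNER=native in the environment before importing daft.
daft.context.set_runner_native()

df = daft.from_pydict({"grp": [1, 1, 2], "x": [10, 20, 30]})
# Window expressions should then execute on the native runner instead of
# raising the "Not Yet Implemented" error seen on other runners.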
Everett Kleven
07/28/2025, 9:41 PM
Yufan
07/29/2025, 7:30 AM
AggregateFnV2 interface to define an efficient aggregation UDF
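(Illustrative only: the actual AggregateFnV2 signature may differ. The general shape of an efficient aggregation UDF is a small per-partition state, a merge step across partitions, and a single finalize, as in this hypothetical mean aggregation:)

from dataclasses import dataclass

@dataclass
class MeanState:
    total: float = 0.0
    count: int = 0

def accumulate(state: MeanState, values: list) -> MeanState:
    # runs per batch/partition
    return MeanState(state.total + sum(values), state.count + len(values))

def merge(a: MeanState, b: MeanState) -> MeanState:
    # combines partial states from different partitions
    return MeanState(a.total + b.total, a.count + b.count)

def finalize(state: MeanState) -> float:
    return state.total / state.count if state.count else float("nan")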
Amir Shukayev
07/31/2025, 5:10 PM
Piqi Chen
07/31/2025, 11:59 PM
Garrett Weaver
08/01/2025, 4:53 PM
Giridhar Pathak
08/06/2025, 9:43 PM
Sky Yin
08/09/2025, 3:54 PM
Kesav Kolla
08/14/2025, 5:10 AM
Michele Tasca
08/24/2025, 4:24 PM
Are there “first” and “last” aggregation strategies for window functions? Are there plans to support them?
I commented in this git issue, but also asking here in case I missed something.
(Btw, I’m evaluating different frameworks for a new project of ours, and it’s amazing how many things “just work” in Daft. Too bad, no first or last is a deal breaker for us.)
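(A possible interim workaround, assuming Daft's grouped agg() with agg_list() and the .list.get() expression behave as in recent releases; column names are made up, and this yields one row per group rather than a true per-row window value:)

import daft
from daft import col

df = daft.from_pydict({
    "user": ["a", "a", "b", "b"],
    "ts":   [1, 2, 1, 3],
    "val":  [10, 20, 30, 40],
})

# Sort, collect each group's values into a list, then take element 0 as "first".
# Caveat: list order after groupby is assumed to follow the pre-sort, which may
# not be guaranteed. "last" can be emulated the same way with sort("ts", desc=True).
firsts = (
    df.sort("ts")
      .groupby("user")
      .agg(col("val").agg_list().alias("vals"))
      .with_column("first_val", col("vals").list.get(0))
)
firsts.show()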
can cai
08/26/2025, 10:10 AM
Garrett Weaver
08/27/2025, 5:54 AM
Kesav Kolla
08/27/2025, 11:26 AM
Garrett Weaver
08/27/2025, 6:18 PM
daft.func vs daft.udf? I would guess that if the underlying Python code is not taking advantage of any vectorization, but is maybe just a list comprehension [my_func(x) for x in some_series], then just use daft.func?
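(A rough sketch of that contrast, assuming the row-wise @daft.func decorator and the batch-oriented @daft.udf decorator from recent Daft releases; the functions themselves are made up:)

import daft
from daft import DataType

# Row-at-a-time: plain Python logic with nothing to vectorize.
@daft.func
def shout(s: str) -> str:
    return s.upper() + "!"

# Batch-at-a-time: receives a whole Series, worth it when you can vectorize
# (NumPy/Arrow ops) or amortize expensive setup across the batch.
@daft.udf(return_dtype=DataType.string())
def shout_batch(series):
    return [x.upper() + "!" for x in series.to_pylist()]

df = daft.from_pydict({"word": ["hi", "hello"]})
df.with_column("a", shout(df["word"])).with_column("b", shout_batch(df["word"])).show()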
Garrett Weaver
08/28/2025, 4:21 PM
VOID 001
08/29/2025, 3:55 AM
import daft

df = daft.from_pydict({
    "json": [
        '{"a": 1, "b": 2}',
        '{"a": 3, "b": 4}',
    ],
})
df = daft.sql("SELECT json.* FROM df")
df.collect()
Amir Shukayev
08/29/2025, 4:01 AM
Is concat lazy? Like:
from functools import reduce

df = reduce(
    lambda df1, df2: df1.concat(df2),
    [
        df_provider[i].get_daft_df()
        for i in range(num_dfs)
    ],
)

Is there any way to lazily combine a set of dfs, in any order?
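(For what it's worth, concat only builds a logical plan in Daft, so the reduce above stays lazy until collect()/show(); a small check, assuming DataFrame.explain() as in current releases and stand-in inputs:)

import daft
from functools import reduce

dfs = [daft.from_pydict({"x": [i]}) for i in range(3)]  # stand-ins for get_daft_df()
combined = reduce(lambda a, b: a.concat(b), dfs)

combined.explain()   # prints the unexecuted logical plan (a chain of concats)
combined.collect()   # execution happens only here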
Sky Yin
08/29/2025, 10:31 PM
Garrett Weaver
09/04/2025, 8:41 PM
get_next_partition is running there.
Desmond Cheong
09/04/2025, 11:58 PM
VOID 001
09/05/2025, 5:56 AM
Peer Schendel
09/07/2025, 9:10 AM
import os
from datetime import datetime

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-03-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

# Upload a file with a purpose of "batch"
file = client.files.create(
    file=open("test.jsonl", "rb"),
    purpose="batch",
    # Optional: a number between 1209600 and 2592000 seconds (14-30 days)
    extra_body={"expires_after": {"seconds": 1209600, "anchor": "created_at"}},
)

print(file.model_dump_json(indent=2))
print(f"File expiration: {datetime.fromtimestamp(file.expires_at) if file.expires_at is not None else 'Not set'}")
file_id = file.id
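(If useful, the usual next step is creating the batch job against the uploaded file; a sketch assuming the standard Batch API surface of the openai client, with the endpoint value to be checked against the Azure OpenAI batch docs:)

batch = client.batches.create(
    input_file_id=file_id,
    endpoint="/chat/completions",   # assumption: verify the exact value for Azure
    completion_window="24h",
)
print(batch.id, batch.status)

# Later: poll and fetch results once completed.
# batch = client.batches.retrieve(batch.id)
# if batch.status == "completed":
#     output = client.files.content(batch.output_file_id)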
Edmondo Porcu
09/07/2025, 4:17 PM
ChanChan Mao
09/08/2025, 5:29 PM
ChanChan Mao
09/09/2025, 6:23 PM
Kyle
09/11/2025, 5:04 AM
Edmondo Porcu
09/12/2025, 6:36 PM
Rakesh Jain
09/12/2025, 10:15 PM
Kyle
09/15/2025, 6:22 AM