Tan Phan
08/07/2025, 4:07 PM

Cory Grinstead
08/08/2025, 5:29 PM

Cory Grinstead
08/08/2025, 6:45 PM

Tan Phan
08/08/2025, 7:12 PM
collect() works fine, but when I do write_parquet the error message is "channel closed" (from Rust, I guess). When there is projection pushdown, it works fine.

Coury Ditch
08/12/2025, 3:55 PM

amit singh
08/12/2025, 6:25 PM

่ๆๅณฐ
08/12/2025, 11:45 PM

Cory Grinstead
08/13/2025, 9:59 PM

่ๆๅณฐ
08/14/2025, 3:53 AM

Kevin Wang
08/14/2025, 9:54 PM

Robert Howell
08/14/2025, 11:26 PM
df = daft.read_video_frames(
    path="https://www.youtube.com/watch?v=jNQXAC9IVRw",
    image_height=480,
    image_width=640,
    is_key_frame=True,
)
Navneeth Krishnan
08/20/2025, 1:36 PM

Cory Grinstead
08/20/2025, 7:50 PM
daft.File functionality is ready for review! https://github.com/Eventual-Inc/Daft/pull/5002

Robert Howell
08/20/2025, 11:09 PM
import daft
from daft.functions.ai import embed_text

daft.set_provider("openai")  # <- can configure api_key here

df = daft.from_pydict({"text": ["hello, world!"]})
df = df.with_column("embedding", embed_text(df["text"]))
df.show()

╭───────────────┬──────────────────────────╮
│ text          ┆ embedding                │
│ ---           ┆ ---                      │
│ Utf8          ┆ Embedding[Float32; 1536] │
╞═══════════════╪══════════════════════════╡
│ hello, world! ┆ <Embedding>              │
╰───────────────┴──────────────────────────╯
(Showing first 1 of 1 rows)

Scoped context with an attached session:

import daft

sess = load_session()  # your session init logic
with daft.use_context(sess):
    ...  # everything within this block resolves via the session

https://github.com/Eventual-Inc/Daft/pull/4997

Kevin Wang
08/21/2025, 12:49 AM

Cory Grinstead
08/22/2025, 4:19 PM

Desmond Cheong
08/26/2025, 8:42 PM
requirements.txt + pyproject.toml files and into a unified pyproject.toml! https://github.com/Eventual-Inc/Daft/pull/4849

Zhiping Wu
08/27/2025, 3:58 AM

Everett Kleven
08/28/2025, 5:01 PM

Everett Kleven
08/31/2025, 9:15 PM
/usr/local/lib/python3.12/dist-packages/daft/dashboard/__init__.py:91: UserWarning: Failed to broadcast metrics over http://127.0.0.1:3238/api/queries: HTTP Error 400: Bad Request
  warnings.warn(f"Failed to broadcast metrics over {url}: {e}")
can cai
09/02/2025, 11:36 AM
class LanceDBScanOperator(ScanOperator, SupportsPushdownFilters):
    ...

    def can_absorb_filter(self) -> bool:
        return isinstance(self, SupportsPushdownFilters)

    def can_absorb_limit(self) -> bool:
        return False

    def can_absorb_select(self) -> bool:
        return False
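To illustrate the capability-check pattern in the snippet above, here is a self-contained sketch of how a planner might consult these `can_absorb_*` hooks. The `ScanOperator`/`SupportsPushdownFilters` classes below are simplified stand-ins, and the `push_filters` signature (returning the filters that could *not* be absorbed) is an assumption for illustration, not Daft's actual interface:

```python
from abc import ABC, abstractmethod

class ScanOperator(ABC):
    # Stand-in base class: by default, a scan cannot absorb filters.
    def can_absorb_filter(self) -> bool:
        return False

class SupportsPushdownFilters(ABC):
    @abstractmethod
    def push_filters(self, filters: list[str]) -> list[str]:
        """Absorb filters into the scan; return the ones NOT pushed down."""

class LanceScan(ScanOperator, SupportsPushdownFilters):
    def __init__(self) -> None:
        self.pushed: list[str] = []

    def can_absorb_filter(self) -> bool:
        # Same idiom as the snippet above: opt in via the marker interface.
        return isinstance(self, SupportsPushdownFilters)

    def push_filters(self, filters: list[str]) -> list[str]:
        self.pushed = list(filters)  # pretend the source handles all of them
        return []

def plan_scan(op: ScanOperator, filters: list[str]) -> list[str]:
    # The planner only pushes filters into operators that opt in;
    # anything returned here must be applied by the engine after the scan.
    if op.can_absorb_filter() and isinstance(op, SupportsPushdownFilters):
        return op.push_filters(filters)
    return filters

op = LanceScan()
remaining = plan_scan(op, ["x > 1"])  # all filters absorbed by the scan
```

The point of the `isinstance` check is that pushdown support is advertised structurally: adding the marker interface to an operator automatically flips `can_absorb_filter` to `True`.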
can cai
09/03/2025, 5:27 PM
def stream_plan(
    self,
    plan: DistributedPhysicalPlan,
    partition_sets: dict[str, PartitionSet[ray.ObjectRef]],
) -> Iterator[RayMaterializedResult]:
    plan_id = plan.id()
    t = time.time()
    sz = len(cp.dumps(plan))
    print("plan pickle bytes:", sz, "secs:", time.time() - t)
    ray.get(self.runner.run_plan.remote(plan, partition_sets))
    while True:
        materialized_result = ray.get(self.runner.get_next_partition.remote(plan_id))
        if materialized_result is None:
            break
        yield materialized_result

During the test, I found that the size of the DistributedPhysicalPlan reached 13 GB, which is very unreasonable. Moreover, the serialization process took a very long time, nearly 10 minutes.
Is there any way to easily analyze what is too big in the DistributedPhysicalPlan object?

plan pickle bytes: 13298686802 secs: 596.3822300434113
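One generic way to attack the "what is too big" question is to pickle each attribute of the object separately and sort by size, which points at the field that dominates the serialized footprint. A minimal sketch under that assumption (note: `DistributedPhysicalPlan` is Rust-backed and may not expose a Python `__dict__`, so this is a general-purpose technique illustrated on a toy class, not a confirmed recipe for that object):

```python
import pickle

def pickled_size_by_attr(obj) -> dict[str, int]:
    """Pickle each attribute on its own and report sizes, largest first."""
    sizes = {}
    for name, value in vars(obj).items():
        try:
            sizes[name] = len(pickle.dumps(value))
        except Exception:
            sizes[name] = -1  # not picklable in isolation
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))

# Toy example where one attribute clearly dominates, e.g. large
# literal data accidentally inlined into a plan.
class Plan:
    def __init__(self):
        self.name = "scan"
        self.literals = b"\x00" * 1_000_000

sizes = pickled_size_by_attr(Plan())
for attr, n in sizes.items():
    print(f"{attr}: {n} bytes")
```

If the plan object does not expose attributes directly, an alternative is `pickletools.dis` on a (truncated) dump, or bisecting the query to see which operator inflates the size.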
Robert Howell
09/03/2025, 8:13 PM
daft.ai module refactor into its own PR. https://github.com/Eventual-Inc/Daft/pull/5125

Robert Howell
09/03/2025, 8:21 PM

Srinivas Lade
09/03/2025, 8:54 PM

Patrick Kane
09/08/2025, 1:07 AM

Kevin Wang
09/08/2025, 7:00 PM

Everett Kleven
09/09/2025, 5:11 PM

Navneeth Krishnan
09/10/2025, 5:02 PM

Everett Kleven
09/10/2025, 9:04 PM