# daft-dev
  • Tan Phan

    08/07/2025, 4:07 PM
    I'm using Daft on Ray to read data via a custom DataSource. I can see that the right number of tasks is generated, but the tasks are not distributed the way I expected. My setup is just a local Ray cluster (1 node). I expected each task to be assigned to its own process, but it looks like all of the tasks are scheduled onto one process. Each of my tasks reads a range of data from an HDF5 file, which blocks reading from multiple threads, so the current distribution is not very useful for my use case. I see that this function is responsible for starting the workers: https://github.com/Eventual-Inc/Daft/blob/main/daft/runners/flotilla.py#L146-L175. When I change it to start one worker per CPU per node, the tasks are distributed across all the processes (workers) and my code speeds up a lot. So my question is: is there any way for the user to configure the worker pool?
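
    For reference, a minimal sketch of the workaround described above, assuming a plain Ray actor pool sized by CPU count; RangeReader and its h5py usage are illustrative stand-ins, not Daft's flotilla API:
    import h5py  # assumption: h5py is how the HDF5 range reads happen
    import ray

    ray.init(ignore_reinit_error=True)

    # one worker per CPU rather than one per node: each actor is its own
    # process, so single-threaded HDF5 reads no longer queue behind each other
    num_cpus = int(ray.cluster_resources().get("CPU", 1))

    @ray.remote(num_cpus=1)
    class RangeReader:
        def read(self, path: str, start: int, stop: int) -> int:
            with h5py.File(path, "r") as f:
                return len(f["data"][start:stop])

    readers = [RangeReader.remote() for _ in range(num_cpus)]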
  • Cory Grinstead

    08/08/2025, 5:29 PM
    PR for improving the daft.func documentation and adding a new section to the quickstart! https://github.com/Eventual-Inc/Daft/pull/4942
    👀 1
  • Cory Grinstead

    08/08/2025, 6:45 PM
    also a tiny one-line PR fixing a bad link in our docs https://github.com/Eventual-Inc/Daft/pull/4941
  • Tan Phan

    08/08/2025, 7:12 PM
    @Robert Howell regarding custom DataSources: does a DataSource's schema need to be the same as its DataSourceTask's schema? If so, what do you advise when a task's schema is different because we absorb filter and projection pushdowns to limit the columns read? That is my use case at the moment. Everything about the dataframe like `collect()` works fine until I do `write_parquet`; the error message is `channel closed` (from Rust, I guess). When there is projection pushdown, it works fine.
  • Coury Ditch

    08/12/2025, 3:55 PM
    @Robert Howell Thanks for taking a look at this issue. I'm curious why broadcasting a dataframe during a cross join is not supported. If I'm cross joining a small df to a large df, I would think broadcasting would be the ideal strategy. Perhaps Daft is already automating such an optimization under the hood?
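
    For reference, the shape of the query being discussed; a minimal sketch assuming Daft's how="cross" join spelling:
    import daft

    small = daft.from_pydict({"rate": [0.1, 0.2]})
    large = daft.from_pydict({"amount": list(range(1_000_000))})

    # every row of `small` paired with every row of `large`; broadcasting the
    # small side would avoid shuffling the large one
    out = large.join(small, how="cross")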
  • amit singh

    08/12/2025, 6:25 PM
    Hi @Robert Howell, I updated the PR https://github.com/Eventual-Inc/Daft/pull/4951 for issue 4876 (support Iceberg partitioning) as per the comments. Please note that, per the earlier comment about supporting partition_field as an argument in the Daft Catalog, I had to update all subclasses where create_table is present, plus one test file (test_catalog - MockCatalog). Please take a look when you get a chance. Thanks!
    ✅ 2
    🎉 1
  • 蒋晓峰

    08/12/2025, 11:45 PM
    The design document for the Flight Shuffle integration with Celeborn is here: https://docs.google.com/document/d/1Yg8P4GDqrYlZePfaltFSaYAqvcPgZXOrAC4d4OI8Qi4/edit?usp=sharing. Integrating Flight Shuffle with Celeborn would make the distributed push-based shuffle more efficient and elastic. PTAL.
    👍 1
  • Cory Grinstead

    08/13/2025, 9:59 PM
    @Kevin Wang here's a very small PR that's somewhat related to the lit/dtype work you've been doing: https://github.com/Eventual-Inc/Daft/pull/4973
  • 蒋晓峰

    08/14/2025, 3:53 AM
    Apache Paimon is a lake format for building a realtime lakehouse architecture with Flink and Spark, covering both streaming and batch operations. Daft could support the Paimon lake format the same way it supports Iceberg: https://github.com/Eventual-Inc/Daft/issues/4976.
  • Kevin Wang

    08/14/2025, 9:54 PM
    @Cory Grinstead @Srinivas Lade @ChanChan Mao a substantial cleanup + docs of our Daft-to-Python type conversion code, PTAL! https://github.com/Eventual-Inc/Daft/pull/4972
    ✅ 1
  • Robert Howell

    08/14/2025, 11:26 PM
    @Desmond Cheong here's a PR to read video frames into a DataFrame using a custom source. It supports reading directly from YouTube, and you can specify a list of video URLs. https://github.com/Eventual-Inc/Daft/pull/4979
    df = daft.read_video_frames(
        path="<https://www.youtube.com/watch?v=jNQXAC9IVRw>",
        image_height=480,
        image_width=640,
        is_key_frame=True,
    )
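
    The PR description above also mentions passing a list of video URLs; presumably that looks something like this (the second URL is only an illustrative placeholder):
    df = daft.read_video_frames(
        path=[
            "https://www.youtube.com/watch?v=jNQXAC9IVRw",
            "https://www.youtube.com/watch?v=9bZkp7q19f0",  # placeholder URL
        ],
        image_height=480,
        image_width=640,
    )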
    🙌 3
    daft party 1
  • Navneeth Krishnan

    08/20/2025, 1:36 PM
    Hey guys, just wondering if I can work with Daft at the query-planning level, or in some other way make Daft "load aware", since I'm working with terabytes of data. My question is: is Daft already designed to be load aware? It is unrealistic to imagine that I'll always be querying TBs of data; some queries are smaller, some churn more. My point is that I want partitioning to happen effectively in the Ray cluster. Maybe some queries need partitioning spread across 2 nodes, some need 3, and some work best on the native runner. How can I control this?
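
    A minimal sketch of the per-query knobs Daft already exposes for this, assuming the standard daft.context and DataFrame.repartition APIs:
    import daft
    from daft import context

    # pick the runner per process: Ray for big queries, native for small ones
    context.set_runner_ray(address="auto")  # or context.set_runner_native()

    df = daft.read_parquet("s3://bucket/big/*.parquet")
    # explicitly control the partition count for downstream stages
    df = df.repartition(64)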
  • Cory Grinstead

    08/20/2025, 7:50 PM
    PR for additional `daft.File` functionality is ready for review! https://github.com/Eventual-Inc/Daft/pull/5002
  • Robert Howell

    08/20/2025, 11:09 PM
    PR which adds an OpenAI provider and support for embed_text. There are a couple of things in here:
    • Adds an OpenAI provider implementation along with provider-specific options.
    • Adds an OpenAI TextEmbedder with dynamic batching support.
    • Adds additional session methods for the providers, e.g. set_provider, get_provider.
    • Adds support for attaching custom providers with proper resolution.
    • Makes set_provider initialize with defaults, e.g. daft.set_provider("openai").
    Examples: embedding text with OpenAI
    import daft
    from daft.functions.ai import embed_text
    daft.set_provider("openai")  # <- can configure api_key here
    df = daft.from_pydict({"text": ["hello, world!"]})
    df = df.with_column("embedding", embed_text(df["text"]))
    df.show()
    ╭───────────────┬──────────────────────────╮
    │ text          ┆ embedding                │
    │ ---           ┆ ---                      │
    │ Utf8          ┆ Embedding[Float32; 1536] │
    ╞═══════════════╪══════════════════════════╡
    │ hello, world! ┆ <Embedding>              │
    ╰───────────────┴──────────────────────────╯
    
    (Showing first 1 of 1 rows)
    Scoped context with an attached session
    import daft

    sess = load_session()  # your session init logic

    with daft.use_context(sess):
        # everything within this block resolves via the session
        ...
    https://github.com/Eventual-Inc/Daft/pull/4997
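
    A short sketch of the session methods mentioned above; set_provider and get_provider come from the PR description, and the exact signatures are assumptions:
    import daft

    daft.set_provider("openai")     # initialize the provider with defaults
    provider = daft.get_provider()  # resolve the currently attached provider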
  • Kevin Wang

    08/21/2025, 12:49 AM
    Hugging Face write functionality ready for review! @Cory Grinstead @Desmond Cheong @Colin Ho https://github.com/Eventual-Inc/Daft/pull/5015
    🙌 1
  • Cory Grinstead

    08/22/2025, 4:19 PM
    Breaking PR to change how we handle tuples internally! cc @Kevin Wang @Robert Howell https://github.com/Eventual-Inc/Daft/pull/5030
  • Desmond Cheong

    08/26/2025, 8:42 PM
    🚀🚀 Huge shoutout to @Xin Xianyin for the herculean effort of moving us off scattered `requirements.txt` + `pyproject.toml` files and into a unified `pyproject.toml`! https://github.com/Eventual-Inc/Daft/pull/4849
    🙌 10
    🔥 16
    ✅ 7
  • Zhiping Wu

    08/27/2025, 3:58 AM
    Proposing a discussion about adding a Video data type to Daft; feel free to share your thoughts, thanks! https://github.com/Eventual-Inc/Daft/discussions/5054
    🙌 2
  • Everett Kleven

    08/28/2025, 5:01 PM
    Yo @Cory Grinstead, are there any concurrency controls for row-wise UDFs?
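
    For context, a sketch of the concurrency knob that exists for class-based (actor pool) UDFs; whether the same control applies to row-wise functions is exactly the question here, so treat the signature as an assumption:
    import daft
    from daft import DataType

    # `concurrency` bounds how many instances of the UDF run in parallel
    @daft.udf(return_dtype=DataType.string(), concurrency=4)
    class Slugify:
        def __call__(self, col: daft.Series):
            return [s.lower().replace(" ", "-") for s in col.to_pylist()]

    df = daft.from_pydict({"title": ["Hello World", "Daft UDFs"]})
    df = df.with_column("slug", Slugify(df["title"]))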
  • Everett Kleven

    08/31/2025, 9:15 PM
    Has anyone else been seeing this dashboard warning come up? Not sure if it's just my environment or something actually broke.
    /usr/local/lib/python3.12/dist-packages/daft/dashboard/__init__.py:91: UserWarning: Failed to broadcast metrics over http://127.0.0.1:3238/api/queries: HTTP Error 400: Bad Request
      warnings.warn(f"Failed to broadcast metrics over {url}: {e}")
  • can cai

    09/02/2025, 11:36 AM
    Doesn't read_lance() support limit pushdown?
    class LanceDBScanOperator(ScanOperator, SupportsPushdownFilters):
        ...
        def can_absorb_filter(self) -> bool:
            return isinstance(self, SupportsPushdownFilters)
    
        def can_absorb_limit(self) -> bool:
            return False
    
        def can_absorb_select(self) -> bool:
            return False
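
    Since can_absorb_limit() above returns False, a limit is applied after the scan rather than inside it; a minimal sketch of the query shape in question (dataset path is illustrative):
    import daft

    # the LIMIT runs over the scan's output partitions instead of being
    # pushed down into the LanceDB scan itself
    df = daft.read_lance("path/to/dataset.lance").limit(10)
    df.show()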
  • can cai

    09/03/2025, 5:27 PM
    # flotilla.py
    def stream_plan(
        self,
        plan: DistributedPhysicalPlan,
        partition_sets: dict[str, PartitionSet[ray.ObjectRef]],
    ) -> Iterator[RayMaterializedResult]:
        plan_id = plan.id()

        # timing instrumentation I added; cp is cloudpickle
        t = time.time()
        sz = len(cp.dumps(plan))
        print("plan pickle bytes:", sz, "secs:", time.time() - t)

        ray.get(self.runner.run_plan.remote(plan, partition_sets))
        while True:
            materialized_result = ray.get(self.runner.get_next_partition.remote(plan_id))
            if materialized_result is None:
                break
            yield materialized_result
    During testing, I found that the serialized size of the DistributedPhysicalPlan reached 13 GB, which is very unreasonable. Moreover, serialization took nearly 10 minutes. Is there any way to easily analyze what is too big in the DistributedPhysicalPlan object?
    plan pickle bytes: 13298686802 secs: 596.3822300434113
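
    One generic way to bisect a huge pickle, assuming the plan object exposes Python attributes via vars(); for a Rust/pyo3-backed object this may not apply, in which case serializing sub-plans piecewise is the fallback:
    import cloudpickle as cp

    def size_by_field(obj):
        # pickle each attribute separately to see which one dominates
        sizes = {}
        for name, value in vars(obj).items():
            try:
                sizes[name] = len(cp.dumps(value))
            except Exception as exc:
                sizes[name] = repr(exc)
        return sorted(sizes.items(), key=lambda kv: kv[1] if isinstance(kv[1], int) else -1, reverse=True)

    for name, size in size_by_field(plan):
        print(name, size)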
  • Robert Howell

    09/03/2025, 8:13 PM
    @Desmond Cheong I split out the daft.ai module refactor into its own PR. https://github.com/Eventual-Inc/Daft/pull/5125
    daft bro 1
  • Robert Howell

    09/03/2025, 8:21 PM
    @Desmond Cheong my morning distraction... this fixes YouTube video reading and adds an example of reading from multiple URLs. https://github.com/Eventual-Inc/Daft/pull/5126
    โค๏ธ 3
  • Srinivas Lade

    09/03/2025, 8:54 PM
    In case anyone here is interested, I opened a PR to add an AGENTS.md: https://github.com/Eventual-Inc/Daft/pull/5124. Should hopefully make working with agents a little nicer.
    โค๏ธ 2
  • Patrick Kane

    09/08/2025, 1:07 AM
    Am I misusing Daft by trying to process full rows and return different rows? I'm used to the Spark workflow where I can pass a row into a UDF and yield a different Row object. The actual use case is to convert an Arrow record batch to a list of protobufs (using the 'protarrow' package), return a different protobuf after computing, and finally reverse the process to get back to a new record batch. I have this working with the native runner using iter_partitions and iter_batches, but I don't believe I can take advantage of the Ray cluster this way. Appreciate any advice!
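
    One way to get batch-at-a-time row processing that still parallelizes on Ray: a batch UDF that rebuilds an Arrow RecordBatch from its input columns, transforms it row-wise, and returns a column. A sketch assuming ordinary daft.udf and pyarrow; the protarrow round-trip would slot in at the comment:
    import daft
    import pyarrow as pa
    from daft import DataType

    @daft.udf(return_dtype=DataType.int64())
    def process_rows(a: daft.Series, b: daft.Series):
        # rebuild an Arrow batch from the input columns...
        batch = pa.RecordBatch.from_arrays(
            [a.to_arrow(), b.to_arrow()], names=["a", "b"]
        )
        # ...convert to protobufs via protarrow, compute, and convert back...
        return [row["a"] + row["b"] for row in batch.to_pylist()]

    df = daft.from_pydict({"a": [1, 2], "b": [10, 20]})
    df = df.with_column("c", process_rows(df["a"], df["b"]))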
    👍 1
  • Kevin Wang

    09/08/2025, 7:00 PM
    @Cory Grinstead @R. C. Howell first batch of Daft functions migrated, PTAL! It's a big PR, but we've already discussed and agreed on the plans for the ones in this PR, so I'm hoping the code is straightforward to review. I figured it would be more efficient to batch all of them, but let me know if you'd prefer I split the PR up. https://github.com/Eventual-Inc/Daft/pull/5086
  • Everett Kleven

    09/09/2025, 5:11 PM
    Hey team, when are we thinking we will snap the next release? I'd like to take advantage of @Srinivas Lade's base64 encoder before I PR the structured outputs example. If it's later in the week I'll just PR now.
  • Navneeth Krishnan

    09/10/2025, 5:02 PM
    Are Daft async UDFs a work in progress, or are they already released? I couldn't find any examples in the documentation, so I'm curious how to implement them.
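
    If async UDFs follow the existing daft.func pattern, they might look like the sketch below; the async support itself is the open question, so treat this as an assumption rather than documented API:
    import asyncio

    import daft

    @daft.func
    async def url_len(url: str) -> int:
        await asyncio.sleep(0.01)  # stand-in for an async call, e.g. HTTP
        return len(url)

    df = daft.from_pydict({"url": ["https://www.getdaft.io"]})
    df = df.with_column("n", url_len(df["url"]))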
  • Everett Kleven

    09/10/2025, 9:04 PM
    Is it just me or has the functions docs section been moved? @Kevin Wang @Desmond Cheong