jay
08/15/2024, 11:09 PM
jay
08/15/2024, 11:16 PM
/invite_all
jay
08/15/2024, 11:47 PM
Sammy Sidhu
08/16/2024, 10:37 PM
Cory Grinstead
08/20/2024, 7:41 PM
Kevin Wang
08/28/2024, 8:37 PM
Sammy Sidhu
08/30/2024, 11:15 PM
jay
09/10/2024, 6:16 PM
jay
09/10/2024, 6:17 PM
DaftError::TypeError Cannot perform comparison on types: Date, Utf8
Perhaps this is a SQL-level optimization we’d need to make @Cory Grinstead?
David Blum
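The error above reports a comparison between a Date operand and a Utf8 operand, presumably a date column compared against a string literal. As a rough plain-Python analogy (not Daft's planner or API), the sketch below shows the same kind of mismatch and one possible fix: parsing the string as a date before comparing. `compare_date_ge` is a hypothetical helper introduced only for illustration.

```python
from datetime import date


def compare_date_ge(column_value: date, literal) -> bool:
    """Hypothetical helper: coerce an ISO-8601 string literal to a date, then compare."""
    if isinstance(literal, str):
        literal = date.fromisoformat(literal)
    return column_value >= literal


shipped = date(2024, 9, 10)

# Comparing a date with a raw string fails, much like Date vs. Utf8.
try:
    shipped >= "2024-09-01"
    mismatch_raised = False
except TypeError:
    mismatch_raised = True

# After coercing the literal, the comparison is well defined.
coerced_result = compare_date_ge(shipped, "2024-09-01")
```

Whether such a coercion belongs in the SQL layer or in the expression layer is the open question raised in the message; the sketch only illustrates the idea.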
09/12/2024, 6:08 PM
Colin Ho
09/13/2024, 7:13 PM
jay
09/19/2024, 11:00 PM
Sammy Sidhu
09/23/2024, 4:50 AM
jay
09/24/2024, 9:49 PM
jay
09/26/2024, 8:43 AM
Cory Grinstead
09/26/2024, 10:20 PM
Cory Grinstead
10/07/2024, 9:49 PM
Interval datatype. This'll allow for relative date comparisons
jay
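To illustrate what a relative date comparison means, here is a minimal plain-Python sketch that uses the standard library's `timedelta` as a stand-in for an interval value. It is not Daft's Interval API; `within_last` and its parameters are hypothetical names chosen for this example.

```python
from datetime import date, timedelta


def within_last(days: int, value: date, today: date) -> bool:
    """Return True if `value` falls on or after `today - days`."""
    cutoff = today - timedelta(days=days)  # interval arithmetic on a date
    return value >= cutoff


today = date(2024, 10, 7)
recent = within_last(7, date(2024, 10, 3), today)   # 4 days before today
stale = within_last(7, date(2024, 9, 1), today)     # more than 7 days before today
```

Passing `today` explicitly keeps the comparison deterministic, which is convenient when testing.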
10/27/2024, 3:14 AM
jay
10/27/2024, 3:17 AM
jay
12/11/2024, 4:28 AM
Colin Ho
12/14/2024, 1:20 AM
Sandeep
12/17/2024, 8:38 PM
Kevin Wang
01/30/2025, 10:29 PM
Robert Howell
03/11/2025, 10:05 PM
Hongbo Miao
03/16/2025, 9:20 PM
Everett Kleven
04/01/2025, 11:46 PM
Everett Kleven
04/28/2025, 10:58 PM
Srihari Thyagarajan
05/12/2025, 6:29 AM
Robert Howell
06/02/2025, 11:22 PM
import daft
from daft import DataType as dt
from daft import col

# here's our raw sample data, which is just a JSON dump from a sensor
df = daft.from_pydict(
    {
        "sample": [
            '{ "x": 1 }',  # missing y, we'll insert 0 in its place
            '{ "x": 1, "y": 1 }',  # ok
            '"HELLO, WORLD!"',  # you're not supposed to be here..
            '{ "x": 3, "y": 3 }',  # ok
            '{ "x": 4, "y": 4 }',  # ok
            '{ "x": false }',  # wrong data type..
        ]
    }
)

# select all objects, using 0 as the default for missing keys
filter = """
(. | objects?) | { x: .x // 0, y: .y // 0 }
"""

# our point type is an x/y pair.
point_t = dt.struct({"x": dt.int64(), "y": dt.int64()})

# extract each sample point with jq, then deserialize into our type.
points = df.select(col("sample").jq(filter).try_deserialize("json", point_t).alias("point")).drop_null()

# now find the point furthest from the origin; comparing squared distances avoids the sqrt.
p = col("point")
furthest_point = (
    points.with_column("distance", p["x"] * p["x"] + p["y"] * p["y"])
    .sort("distance", desc=True)
    .limit(1)
    .select(p)
    .to_pydict()["point"][0]
)
assert furthest_point == {"x": 4, "y": 4}
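For readers without jq at hand, the filter's behavior on each sample can be approximated in plain Python. This is only a sketch of the jq semantics (`objects?` drops non-objects, `//` falls back when the left side is null or false), not how Daft evaluates the expression; `jq_objects_with_defaults` and `try_point` are hypothetical helpers, and modeling an empty jq result as `None` is an assumption.

```python
import json


def jq_objects_with_defaults(raw):
    """Approximate `(. | objects?) | {x: .x // 0, y: .y // 0}` for one JSON string."""
    value = json.loads(raw)
    if not isinstance(value, dict):
        return None  # `objects?` emits nothing for non-objects (modeled as None)

    def alt(v):
        # jq's `//` uses the fallback when the left side is null or false
        return 0 if v is None or v is False else v

    return {"x": alt(value.get("x")), "y": alt(value.get("y"))}


def try_point(candidate):
    """Keep the candidate only if both fields are plain integers, else None."""
    if candidate is None:
        return None
    if all(type(candidate[k]) is int for k in ("x", "y")):
        return candidate
    return None


samples = [
    '{ "x": 1 }',
    '{ "x": 1, "y": 1 }',
    '"HELLO, WORLD!"',
    '{ "x": 3, "y": 3 }',
    '{ "x": 4, "y": 4 }',
    '{ "x": false }',
]

# Extract, validate, and drop the rows that produced nothing.
candidates = (try_point(jq_objects_with_defaults(s)) for s in samples)
points = [pt for pt in candidates if pt is not None]

# Squared distance is enough for ranking, so no sqrt is needed.
furthest = max(points, key=lambda pt: pt["x"] ** 2 + pt["y"] ** 2)
```

One detail worth noticing: because `//` treats `false` like a missing value, the `{ "x": false }` sample becomes `{x: 0, y: 0}` in this approximation rather than being rejected, while the bare string is dropped by `objects?`.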
Links
• https://gist.github.com/rchowell/6d03fca6a44be2d8ef71a8d837acc4fa#file-test_jq-py
• https://github.com/Eventual-Inc/Daft/pull/4470
ChanChan Mao
06/06/2025, 6:16 PM