https://www.getdaft.io logo
Join Slack
Powered by
# daft-dev
  • k

    Kevin Wang

    09/08/2025, 7:00 PM
    @Cory Grinstead @R. C. Howell first batch of Daft functions migrated, PTAL! It's a big PR but we've already discussed and agreed upon the plans for the ones in this PR so I'm hoping the code is straightforward to review. I figured it would be more efficient to just batch all of them but let me know if you prefer if I split the PR up https://github.com/Eventual-Inc/Daft/pull/5086
    c
    • 2
    • 2
  • e

    Everett Kleven

    09/09/2025, 5:11 PM
    Hey team, when are we thinking we will snap the next release? I'd like to take advantage of @Srinivas Lade’s base64 encoder before I PR the structured outputs example. If its later in the week I'll just PR now.
    k
    • 2
    • 1
  • n

    Navneeth Krishnan

    09/10/2025, 5:02 PM
    Is DAFT Async UDFs a work in progress or is it already released? Couldn’t find any examples in the documentation so curious to know how to implement them.
    c
    • 2
    • 5
  • e

    Everett Kleven

    09/10/2025, 9:04 PM
    Is it just me or has the functions docs section been moved? @Kevin Wang @Desmond Cheong
    k
    n
    • 3
    • 15
  • n

    Nish Shukla

    09/11/2025, 9:00 PM
    Hey Team, any updates on this https://github.com/Eventual-Inc/Daft/issues/1954. We are tying to read from databricks delta tables and looks like deletion vectors are still not supported. Havent seen an update on the issue since May so not sure where you guys are on moving to delta-kernal-rs path. cc: @Robert Howell
    r
    j
    • 3
    • 4
  • c

    can cai

    09/16/2025, 6:18 AM
    Hi @Colin Ho, Does daft support previewing audio and video now? Is there a clear positioning and capability to support it at present?
    👀 1
    c
    c
    • 3
    • 6
  • a

    Andrew Kursar

    09/16/2025, 8:30 PM
    Hello! daft has an upper bound on pyiceberg https://github.com/Eventual-Inc/Daft/blob/v0.6.1/pyproject.toml#L43 but the latest 0.10 release of pyiceberg has some nice features. Could the upper bound be removed to exclude 0.9.1 specifically if the bug is fixed? I'm not sure if 0.10 still has the same bugs identified in 0.9.1.
    k
    • 2
    • 3
  • e

    Eric Maynard

    09/17/2025, 5:02 AM
    Hey, is there a doc/guide I should follow to use the integration tests? After following
    CONTRIBUTING.md
    I'm able to actually run tests with e.g.:
    Copy code
    DAFT_RUNNER=native make test EXTRA_ARGS="-m integration -v tests/integration/iceberg/test_iceberg_writes.py"
    However, this fails as it looks like the test expects some other task to have run first and have started a catalog:
    Copy code
    exc = HTTPError('404 Client Error: Not Found for url: <http://localhost:8181/v1/config>'), error_handler = {}
    r
    • 2
    • 2
  • e

    Elgreco

    09/17/2025, 1:02 PM
    Hi all, I ve been trying to get daft working with docling, unfortunately running into issues when setting UDF concurrency, at some point the container gets killed with exit code 135, no logs at all are emitted, it just stops. The memory limit and cpu limit is also never hit, since I set it to run on a huge node on purpose? Any thoughts
    c
    • 2
    • 8
  • d

    Desmond Cheong

    09/20/2025, 6:32 PM
    RFC for adding Datasets to Daft, starting with a Common Crawl dataset 🕸️! https://github.com/Eventual-Inc/Daft/discussions/5248
    🙌 5
    a
    • 2
    • 3
  • n

    Navneeth Krishnan

    09/22/2025, 2:55 PM
    Copy code
    "content": [
                            {"type": "text", "text": lab_prompt},
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/png;base64,{image_data}"
                                }
                            }
                        ]
    Hey guys! At the moment llm_generate function does not support prompt message content like the one above…. I tried, but get type error and other issues. Is this assumption correct or am I doing something wrong? Also, in the llm_generate function implementation, while the expected value type of prompt content is “str” (which is not always the case especially when working with vision models as shown above) …I noticed that prompt content is passed as it is… but then I still face type issues. Can I go around this limitation somehow or do I need to write my own udf for this?
    k
    e
    • 3
    • 4
  • c

    Coury Ditch

    09/22/2025, 7:53 PM
    Is support for sort-merge joins still on the roadmap? Curious if this is a "within the next year" or "within the next month", or somewhere in between level priority. Thanks!
    c
    • 2
    • 7
  • c

    ChanChan Mao

    09/23/2025, 12:33 AM
    Hey everyone! Just posted this in #C041NA2RBFD but wanted to share it in this channel too. We're bringing back Daft Contributor Sync series where we'll highlight work in the open source, cover latest releases and features, and shout out our contributors! This month's contributor sync will be This Thursday September 25 at 4pm PT. We'll be talking about major improvements that we've shipped in the last few months, like Model APIs, UDF improvements, integrations with Turbopuffer, Clickhouse, and Lance, and our new
    daft.File
    datatype. Following that, @Colin Ho will dive into his work on Flotilla, our distributed engine, and showcase some exciting benchmark results 👀 We'll leave plenty of time at the end for questions and discussions. Add to your calendar and we'll see you then! 👋
    ❤️ 1
  • g

    Garrett Weaver

    09/24/2025, 11:11 PM
    trying to put up a small PR, but seeing a mypy error on code I have not changed
    Copy code
    daft/__init__.py:125: error: Name "range" already defined (by an import)  [no-redef]
    c
    • 2
    • 2
  • e

    Elgreco

    09/25/2025, 12:37 PM
    I just figured out after some time that Docling is not thread safe, so I wonder how you guys got docling to work in threaded mode
    c
    • 2
    • 7
  • g

    Garrett Weaver

    09/25/2025, 5:03 PM
    I think this fix for Decimal in
    pyiceberg
    will make it possible to officially bump
    pyiceberg
    to latest, but we would need them to cut a release, maybe y'all can convince them to put out a patch release sooner 🙏
    ❤️ 2
    d
    • 2
    • 2
  • s

    Sen Lin

    09/27/2025, 12:18 AM
    Hi Team, I am using daft to process a large batch of images for embedding. It just fails with error below. After some debugging I found that it failed for this image

    http://images.cocodataset.org/test2017/000000522914.jpg▾

    , perhaps because is it grayscale instead of rgb. • Does daft support grayscale image embedding? • Is there a better way to figure out which row failed out of 1 million? I used binary search. It would be nice to have a better way.
    Copy code
    TypeError: Cannot handle this data type: (1, 1, 1), |u1
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/Users/senlin/Work/smoosense/smoosense-py/tests/do.py", line 33, in <module>
        run()
      File "/Users/senlin/Work/smoosense/smoosense-py/tests/do.py", line 20, in run
        df = df.to_pandas()
      File "/Users/senlin/Work/smoosense/smoosense-py/.venv/lib/python3.9/site-packages/daft/api_annotations.py", line 38, in _wrap
        return func(*args, **kwargs)
      File "/Users/senlin/Work/smoosense/smoosense-py/.venv/lib/python3.9/site-packages/daft/dataframe/dataframe.py", line 4221, in to_pandas
        self.collect()
      File "/Users/senlin/Work/smoosense/smoosense-py/.venv/lib/python3.9/site-packages/daft/api_annotations.py", line 38, in _wrap
        return func(*args, **kwargs)
      File "/Users/senlin/Work/smoosense/smoosense-py/.venv/lib/python3.9/site-packages/daft/dataframe/dataframe.py", line 4014, in collect
        self._materialize_results()
      File "/Users/senlin/Work/smoosense/smoosense-py/.venv/lib/python3.9/site-packages/daft/dataframe/dataframe.py", line 3976, in _materialize_results
        self._result_cache = get_or_create_runner().run(self._builder)
      File "/Users/senlin/Work/smoosense/smoosense-py/.venv/lib/python3.9/site-packages/daft/runners/native_runner.py", line 66, in run
        results = list(self.run_iter(builder))
      File "/Users/senlin/Work/smoosense/smoosense-py/.venv/lib/python3.9/site-packages/daft/runners/native_runner.py", line 99, in run_iter
        yield from results_gen
      File "/Users/senlin/Work/smoosense/smoosense-py/.venv/lib/python3.9/site-packages/daft/execution/native_executor.py", line 42, in <genexpr>
        return (
      File "/Users/senlin/Work/smoosense/smoosense-py/.venv/lib/python3.9/site-packages/daft/execution/udf.py", line 139, in eval_input
        raise UDFException(response[1]) from base_exc
    daft.errors.UDFException: User-defined function `<daft.ai._expressions._ImageEmbedderExpression object at 0x116c4d190>` failed when executing on inputs:
      - __TruncateRootUDF_0-12-0__ (Image[MIXED], length=1)
    c
    d
    s
    • 4
    • 11
  • n

    Navneeth Krishnan

    09/28/2025, 10:56 AM
    Hey guys, I’m trying to achieve streaming execution for pdf processing. My use case: 1 pdf that has N pages Each page needs to be run through OCR model (inference endpoint) and entity extraction model (inference endpoint ). Pages have no connection to each other whatsoever. So if page 2 ocr completes before page 1, page 2 entity extraction should start automatically. So I’m looking for a streaming implementation. Does anyone have an already implemented solution? Would help if someone can give me some direction on how to achieve this with DAFT. I am aware that there is a document processing example in the docs but not sure if it is a streaming execution implementation.
    c
    • 2
    • 9
  • g

    Garrett Weaver

    09/28/2025, 3:02 PM
    For day_of_week function, I was thinking a simple addition is to support iso (1-7). Was going to try to add, should this be a parameter in current function or separate function?
    d
    • 2
    • 1
  • m

    Matthew Powers

    10/02/2025, 2:24 PM
    We just released a geospatial engine that's written in Rust called SedonaDB. I was chatting with some team members about geometry/geography types in Daft, but I can't seem to find those messages anymore. In any case, let me know if you're still thinking about adding geo types to Daft. It would be awesome if we could make them interoperable with SedonaDB! I'm guessing y'all don't wanna build a full fledged geo engine so giving users an amazing Daft + SedonaDB experience may be the best of both worlds.
    🔥 5
    r
    d
    • 3
    • 3
  • e

    Everett Kleven

    10/02/2025, 8:01 PM
    Is there a canonical way of type hinting what DataType an expression input for a function? The Expression arguments for the generate_text function require specific inputs and I'm not sure how to concisely present that in the preview.
    👀 1
    k
    c
    • 3
    • 5
  • e

    Everett Kleven

    10/07/2025, 6:08 PM
    Is there a standard way of specifying io_config for functions? (see thread)
    r
    • 2
    • 3
  • p

    Phillip Chiu

    10/14/2025, 10:52 PM
    Hello! New joiner here with a question about building daft I am working on packaging daft on conda-forge. The latest version of daft available on conda-forge is an ancient 0.4.6. This process is currently stalled for the following chain of reasons: • conda-forge requires packages to be built from source • building daft from source requires
    bun
    • conda-forge requires all build dependencies to installed from conda-forge •
    bun
    is not yet available on conda-forge (and building
    bun
    is harder than building daft, apparently) If I understand correctly,
    bun
    is only used for generating some assets for the
    daft-dashboard
    component. My question, to the daft developers - is there an easy way to build daft without the
    daft-dashboard
    feature, to relieve the dependency on
    bun
    and get daft released on conda-forge? Or are there any other solutions out there? (As an example, could the outputs of whatever
    bun
    generates possibly be included in the source release of daft on Github?) Thanks for the patience!
    👀 1
    s
    • 2
    • 7
  • c

    can cai

    10/17/2025, 5:58 AM
    Hi, I encountered a slow execution issue when using read_parquet + where + limit. Has anyone else encountered this problem please? https://github.com/Eventual-Inc/Daft/issues/5406
    j
    c
    z
    • 4
    • 8
  • v

    VOID 001

    10/21/2025, 1:02 PM
    Hi, I am looking at the issue: https://github.com/Eventual-Inc/Daft/issues/4179 I want to know what is the exact requirement for this issue? Is it for fixing the "TABLE" only keyword? or fixing all the FROM KEYWORD case? Looks like if we can distinguish the "FROM" keyword generated by the tokenizer and look for next non-whitespace token that matches the keyword "table". We shall be able to prevent such a case, but I am not sure if this is any corner case that can bypass this check.
    m
    • 2
    • 7
  • v

    VOID 001

    10/24/2025, 1:15 AM
    I found that currently daft-dashboard only listen on v4-all address. But not listening on v6 address. And it by default listen to 0.0.0.0 which might have a potential security risk for some deployment environment. Are we going to change the daft-cli to be able to support a --addr flag so that we can listen on any address we specified?
    s
    • 2
    • 5
  • e

    Everett Kleven

    10/24/2025, 6:01 PM
    Can we remove all of the Qwen 3 models from our sentence transformers unit tests? I'm not sure what we are getting out of downloading 8B, 4B, or even 0.6 B models in unit tests.
    d
    r
    • 3
    • 18
  • m

    Malcolm Greaves

    10/28/2025, 3:49 AM
    👋 Hello Daft community! daftIf anyone is itching to hack on Daft, here are a slew of great first issues to dive into! typingcat We're happy to help contributors with design reviews and discussions on Slack S and Github Issues G Add support for Series[start:end] (4771) Summary: Request to add
    __getitem__
    slicing on Series so
    series[start:end]
    works like in other dataframe libraries What needs to be done: Implement Series slicing (including negative indices/step handling) in the Python API and execution layer; add tests and documentation. Expression.var Summary: Add
    var()
    to the Expression API (ideally with a
    ddof
    parameter) to improve Narwhals support. What needs to be done: Implement
    Expression.var(ddof=…)
    with the appropriate aggregation kernel, cover numeric types, and add docs and unit tests. Expression.pow Summary: Add exponentiation support, e.g.,
    daft.col("a").pow(2)
    or
    daft.col("a") ** 2
    , for Narwhals compatibility. What needs to be done: Implement the power operation (scalar and column exponents), ensure type promotion/null-handling, and add examples/tests and docs. Expression.product Summary: Introduce a product aggregation (e.g.,
    daft.col("a").product()
    ) to multiply values across rows/groups; also related to Narwhals. What needs to be done: Add the product aggregation kernel, handle overflow/NaN behavior, add tests across numeric dtypes, and document the API. `ddof` argument to `stddev` Summary: Add a
    ddof
    parameter to
    stddev
    for NumPy/pandas-style semantics (useful in window/`over` contexts) What needs to be done: Update the aggregation to use
    N - ddof
    where appropriate, verify behavior vs. reference libraries, and add tests and docs. Support for `SHOW TABLES LIKE ...`for MemoryCatalog Summary: MemoryCatalog currently doesn’t support
    SHOW TABLES LIKE pattern
    . What needs to be done: Implement LIKE-pattern filtering (including wildcards/escaping) in the MemoryCatalog, add parser logic, tests, and documentation. Hash rows of dataframe Summary: Add a simple way to compute a stable hash per row (e.g.,
    df.with_column("hash", daft.functions.hash("*"))
    ) for de-duplication/fingerprinting. What needs to be done: Expose a row-hashing API backed by existing kernels; ensure consistent behavior across types/nulls, and add tests and docs. sql: bad error message if trying to read from a table called 'table' Summary: Reading from a DataFrame named
    table
    via
    daft.sql
    raises a misleading parse error; the message should explain that
    table
    is not a valid identifier. What needs to be done: Improve error handling and messaging around reserved identifiers; add a regression test for
    table
    as a name. `df.show` max_width does not work without manually setting the `format` option as well Summary:
    show(max_width=…)
    has no effect unless
    format="fancy"
    is also set; users expect width to be respected by default What needs to be done: Fix display-width handling in the preview logic so
    max_width
    works without `format`; add unit tests and update docs/help text. Missing docstring items for Expression page of API Docs Summary: Docs maintenance task: a number of Expression functions lack proper docstring sections (parameters/returns/examples). What needs to be done: Fill in missing docstrings across the listed functions, build the docs to verify rendering, and submit the updates.
    ❤️ 8
  • v

    VOID 001

    11/02/2025, 4:05 PM
    Looks like the latest version of the
    main
    branch will fail when building the daft-dashboard?
    Copy code
    ./src/app/queries/page.tsx
      Error evaluating Node.js code
      ResolveMessage: Cannot find module './node_modules/babel-plugin-react-compiler' from '/home/projects/daft/src/daft-dashboard/frontend/node_modules/next/dist/compiled/babel/bundle.js'
    
      Make sure that all the Babel plugins and presets you are using
      are defined as dependencies or devDependencies in your package.json
    Caused by:                                                                                                                                                                                                                                  [50/1641]
      process didn't exit successfully: `/home/projects/daft/target/debug/build/daft-dashboard-65c660a79cd99970/build-script-build` (exit status: 1)
      --- stdout
      cargo:rustc-env=DASHBOARD_ASSETS_DIR=/home/projects/daft/target/debug/build/daft-dashboard-18cbc4a68d544994/out
      cargo:rerun-if-changed=frontend/src/
      cargo:rerun-if-changed=frontend/bun.lockb
      cargo:rerun-if-changed=build.rs
      bun install v1.3.1 (89fa0f34)
    
      Checked 428 installs across 490 packages (no changes) [13.00ms]
         ▲ Next.js 16.0.1 (Turbopack)
    
         Creating an optimized production build ...
      cargo:warning=Failed to build frontend assets
    
      --- stderr
      Saved lockfile
      $ next build --no-mangling
       ⚠ Mangling is disabled. Note: This may affect performance and should only be used for debugging purposes.
    
      > Build error occurred
      Error: Turbopack build failed with 3 errors:
      ./src/app/layout.tsx
      Error evaluating Node.js code
      ResolveMessage: Cannot find module './node_modules/babel-plugin-react-compiler' from '/data04/projects/daft/src/daft-dashboard/frontend/node_modules/next/dist/compiled/babel/bundle.js'
    
      Make sure that all the Babel plugins and presets you are using
      are defined as dependencies or devDependencies in your package.json
      file. It's possible that the missing plugin is loaded by a preset
      you are using that forgot to add the plugin to its dependencies: you
      can workaround this problem by explicitly adding the missing package
      to your top-level package.json.
    
    
      Import traces:
        Client Component Browser:
          ./src/app/layout.tsx [Client Component Browser]
          ./src/app/layout.tsx [Server Component]
    
        Client Component SSR:
          ./src/app/layout.tsx [Client Component SSR]
          ./src/app/layout.tsx [Server Component]
    
    
      ./src/app/queries/page.tsx
      Error evaluating Node.js code
      ResolveMessage: Cannot find module './node_modules/babel-plugin-react-compiler' from '/home/projects/daft/src/daft-dashboard/frontend/node_modules/next/dist/compiled/babel/bundle.js'
    e
    • 2
    • 3
  • e

    Everett Kleven

    11/04/2025, 7:26 PM
    I am actively tracking a return dtype issue for
    embed_text
    and
    embed_image
    as of 0.6.8.
    Copy code
    daft.exceptions.DaftCoreException: DaftError::External task 256 panicked with message "not implemented: Daft casting from Struct[data: List[Float32], shape: List[UInt64]] to Float32 not implemented
    If you are experiencing a similar errors, please add your comments to https://github.com/Eventual-Inc/Daft/issues/5494 .
    c
    • 2
    • 1