# general
  • r

    Ryan

    08/31/2023, 6:17 PM
    If you haven’t heard already, there are big changes afoot for Kaskada. Here’s an overview of what’s changing and why:
  • r

    Ryan

    08/31/2023, 6:17 PM
    https://kaskada.io/2023/08/25/introducing-the-new-kaskada.html
  • r

    Ryan

    08/31/2023, 6:17 PM
    Feel free to comment on HN too if you’re into that: https://news.ycombinator.com/item?id=37340796
  • r

    Ryan

    09/01/2023, 2:35 PM
    Welcome @Aleksejs Avstreihs!
  • r

    Ryan

    09/12/2023, 5:01 PM
    Hey @Sriharsha Yayi welcome to the Kaskada community
  • r

    Ryan

    09/13/2023, 11:36 AM
    Some interesting work from Google using a variant of transformers for time-series forecasting: https://blog.research.google/2023/09/tsmixer-all-mlp-architecture-for-time.html?m=1
  • r

    Ryan

    09/14/2023, 2:12 PM
    An interesting article highlighting the overlap between AI and IoT. This seems like a very interesting and underdeveloped area: https://www.visualcapitalist.com/sp/artificial-intelligence-of-things/
  • r

    Ryan

    09/26/2023, 8:21 PM
    Great talk by Mick Wever at the OSS EU conference about Kaskada, RAG, and real-time generative AI:

    https://www.youtube.com/watch?v=FKvqTaUjeq0&t=2772s

  • c

    Cara Dianne

    09/28/2023, 2:57 AM
    great talk in sf
    👍 1
  • r

    Ryan

    09/29/2023, 4:42 PM
    We have a bunch of new community members - thank you all for joining, and welcome! (cc @Andrew Heim @Abdu M @Ishnoor Singh @Hex @Jayson Arthur McCauliff @Doug Skinner @Levi Adissi @Artem Sorokin @Kiara Gutiérrez)
    👋 5
  • e

    Eric

    10/02/2023, 3:54 PM
    Hey all, I wrote a blog post on my learnings from Fine-Tuning LLMs for the BeepGPT project. Check it out here: https://epinzur.github.io/posts/001-openai-fine-tuning-1/post.html
    👍 1
  • n

    Nikolay Vizovitin

    10/12/2023, 12:02 PM
    Hello! I'm new here, so I'm not sure if this is the right place to ask questions. I'm evaluating Kaskada for use in my project, but unfortunately the available documentation is quite lacking at the moment, and many things remain unclear even after reading it and looking through the available examples.
    1. Can somebody help me understand the memory model of Kaskada? What determines how long events are stored (in memory? somewhere else?), and how can I make sure a large number of events doesn't overwhelm the system?
    2. How should application restarts be handled, e.g. prefetching some amount of events from an external source? How do I determine how much?
    3. Is it OK to store complex data in events (e.g. whole Slack message dictionaries)?
    4. How performant is Kaskada, and are there any existing benchmarks?
    5. Can Kaskada pull events from an external system, or is only push supported (which seems to be the case)?
    I would also appreciate it if you could point me to the relevant and up-to-date pieces of code for these questions. I'm planning to use Kaskada from Python.
  • r

    Ryan

    10/26/2023, 4:45 PM
    Lots of AI meetups coming up if you’re in NYC, SF or Palo Alto! Starting tonight in NYC there’s an AICamp meetup: https://www.aicamp.ai/event/eventdetails/W2023102615
  • r

    Ryan

    10/26/2023, 4:45 PM
    Then Saturday AIify is doing a full-day AI event in San Francisco: https://aiify.io/events/231028sf/
  • r

    Ryan

    10/26/2023, 4:46 PM
    Followed up by a VC-focused meetup in Palo Alto next Wednesday: https://www.meetup.com/silicon-valley-gen-ai-community/events/296940963/
  • r

    Ryan

    11/02/2023, 6:35 PM
    The guys at AIify did a great job of recording, editing and releasing the presentation videos - if you’d like to see the presentation we gave, it’s available on YouTube here:

    https://www.youtube.com/watch?v=jkONaensj-s

    What do you think?
  • p

    Petter Egesund

    01/13/2024, 7:42 PM
    A noob question 🙂 I guess the records of the complete timestream are not kept in memory - only the aggregations? Otherwise it would cost a lot of memory?
  • b

    Ben Chambers

    01/13/2024, 7:44 PM
    Yep. Each input timeline can be made from the contents of multiple files. You can also compute timelines from multiple other timelines, which merges the points
  • b

    Ben Chambers

    01/13/2024, 7:44 PM
    And yep. Aggregations are the only thing that needs to be kept in memory.
  • p

    Petter Egesund

    01/13/2024, 7:45 PM
    Yes, just found the first one myself 🙂
  • j

    Jordan Frazier

    01/13/2024, 7:46 PM
    If you get into more complicated queries, the shift operations may also buffer inputs in memory until they are emitted at some defined condition
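To illustrate the buffering Jordan mentions, here is a minimal plain-Python sketch (not Kaskada code; the helper name `shift_events_by` and the data are made up for illustration). Shifting an event forward in time means it cannot be emitted until the input has advanced past its new, later timestamp, so shifted values sit in a buffer until then:

```python
# Illustrative only, not Kaskada's API: why a "shift" operation has to buffer.
import heapq

def shift_events_by(events, delay):
    """events: iterable of (time, value) in time order.
    Yields (time + delay, value), releasing each buffered event once the
    input has advanced past its shifted time."""
    buffer = []  # min-heap of (shifted_time, value) waiting to be emitted
    for time, value in events:
        heapq.heappush(buffer, (time + delay, value))
        while buffer and buffer[0][0] <= time:
            yield heapq.heappop(buffer)
    while buffer:  # flush whatever is still buffered at end of input
        yield heapq.heappop(buffer)

print(list(shift_events_by([(1, "a"), (2, "b"), (10, "c")], delay=5)))
# [(6, 'a'), (7, 'b'), (15, 'c')]
```

In this sketch the buffer only holds events whose shifted time has not yet been reached, which is the kind of extra memory cost described above.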
  • p

    Petter Egesund

    01/13/2024, 7:47 PM
    yes, but how would you deal with an example like this: if purchases.count() > 10 then print purchases.sum()
  • p

    Petter Egesund

    01/13/2024, 7:47 PM
    for evaluating the last expression you need the whole dataset?
  • b

    Ben Chambers

    01/13/2024, 7:48 PM
    The output is at each point in time — so you get a timeline that contains the sum (up to that point in time) as long as there have been more than 10 points. So you only need two aggregations — the count so far and the sum so far — to compute that
  • p

    Petter Egesund

    01/13/2024, 7:49 PM
    So if I understand you right, both the count and sum are updated every time a new record appears, so you do not need the whole dataset at once?
  • j

    Jordan Frazier

    01/13/2024, 7:51 PM
    That’s correct
  • p

    Petter Egesund

    01/13/2024, 7:51 PM
    Sounds like magic. But how do you know that you should compute the sum, as this line only kicks in after 10 purchases?
  • j

    Jordan Frazier

    01/13/2024, 7:54 PM
    @Ben Chambers can correct me if I’m wrong, but the latest version should have the ability to call ‘.explain()’ which should display the compute plan, showing how the operations receive input and produce output. If you check that, you’ll see that the input (purchases) flows into both the count() and sum() aggregations for all records. However, because of the “if”, the plan will include a condition node that only emits the sum() when the count() > 10.
    👍 1
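To make the thread above concrete, here is a minimal plain-Python sketch (not the Kaskada API; the function name and data are invented for illustration) of why `if purchases.count() > 10 then print purchases.sum()` only ever needs two running aggregates, never the full event history:

```python
# Illustrative only, not Kaskada code. It shows why the query above needs
# just two running aggregates (count and sum) rather than every past event.

def running_sum_when_count_exceeds(events, threshold=10):
    """Yield (timestamp, sum_so_far) for every event after more than
    `threshold` events have been observed."""
    count = 0     # aggregate 1: number of purchases seen so far
    total = 0.0   # aggregate 2: sum of purchase amounts seen so far
    for timestamp, amount in events:
        count += 1
        total += amount
        if count > threshold:         # the "if" gates the output...
            yield timestamp, total    # ...but both aggregates update on every event


# A small made-up stream of (timestamp, purchase_amount) pairs.
purchases = [(t, 10.0) for t in range(1, 15)]
for point in running_sum_when_count_exceeds(purchases):
    print(point)   # first output appears at the 11th purchase: (11, 110.0)
```

Running it prints points only from the 11th purchase onward: the condition gates the output timeline while both aggregates keep updating on every record, matching the compute-plan description above.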
  • a

    Anannya Mishra

    10/24/2024, 9:22 PM
    Hi folks. Are you an ML or data leader tired of stale data and slow iteration cycles? TurboML is about to change the game, and we want you to be part of it. We at TurboML are launching an exclusive early access program for our Real-Time Machine Learning Platform. We're inviting ML and data professionals to help shape the future of MLOps [only a select few folks will be accepted]:
    • Leveraging real-time data
    • Live data experimentation
    • Continuous updates to the ML models
    • Compare multiple ML models on the freshest data
    • Real-time feature engineering along with a feature store
    Why join?
    • Be among the first to experience cutting-edge real-time MLOps technology
    • Receive dedicated support from our founding team (ex-Google, AWS)
    • Enhance your skills in utilizing streaming and batch data across various stages of the ML lifecycle
    • Gain exclusive insights into our roadmap and feature development
    • Be part of a community of pioneering beta testers, with networking and mentorship opportunities that may lead to potential industry collaborations.
    Check out the recent talk given by our CTO on how TurboML's platform overcomes the challenges posed by real-time data, enabling fresher features, faster models, and more.

    https://www.youtube.com/watch?v=wEU9LvCnnY4

    Just sign up via the link below or you are always welcome to DM me for any queries! https://2ly.link/20W46
  • a

    Anannya Mishra

    12/25/2024, 8:14 AM
    https://x.com/siddharthb_/status/1871805715368554710