# general
  • r

    Ryan

    08/31/2023, 6:17 PM
    If you haven’t heard already, there are big changes afoot for Kaskada. Here’s an overview of what’s changing and why:
  • r

    Ryan

    08/31/2023, 6:17 PM
    https://kaskada.io/2023/08/25/introducing-the-new-kaskada.html
  • r

    Ryan

    08/31/2023, 6:17 PM
    Feel free to comment on HN too if you’re into that: https://news.ycombinator.com/item?id=37340796
  • r

    Ryan

    09/01/2023, 2:35 PM
    Welcome @Aleksejs Avstreihs!
  • r

    Ryan

    09/12/2023, 5:01 PM
    Hey @Sriharsha Yayi welcome to the Kaskada community
  • r

    Ryan

    09/13/2023, 11:36 AM
    Some interesting work from Google using a variant of transformers for time-series forecasting: https://blog.research.google/2023/09/tsmixer-all-mlp-architecture-for-time.html?m=1
  • r

    Ryan

    09/14/2023, 2:12 PM
    An interesting article highlighting the overlap between AI and IoT. This seems like a very interesting and underdeveloped area: https://www.visualcapitalist.com/sp/artificial-intelligence-of-things/
  • r

    Ryan

    09/26/2023, 8:21 PM
    Great talk by Mick Wever at the OSS EU conference about Kaskada, RAG, and real-time generative AI:

    https://www.youtube.com/watch?v=FKvqTaUjeq0&t=2772s

  • c

    Cara Dianne

    09/28/2023, 2:57 AM
    great talk in sf
    👍 1
  • r

    Ryan

    09/29/2023, 4:42 PM
    We have a bunch of new community members - thank you all for joining, and welcome! (cc @Andrew Heim @Abdu M @Ishnoor Singh @Hex @Jayson Arthur McCauliff @Doug Skinner @Levi Adissi @Artem Sorokin @Kiara Gutiérrez)
    👋 5
  • e

    Eric

    10/02/2023, 3:54 PM
    Hey all, I wrote a blog post on my learnings from Fine-Tuning LLMs for the BeepGPT project. Check it out here: https://epinzur.github.io/posts/001-openai-fine-tuning-1/post.html
    👍 1
  • n

    Nikolay Vizovitin

    10/12/2023, 12:02 PM
    Hello! I'm new here, so I'm not sure if this is the right place to ask questions. I'm evaluating Kaskada for use in my project, but unfortunately the available documentation is quite lacking at the moment, and many things remain unclear even after reading it and looking through the available examples.
    1. Can somebody help me understand the memory model of Kaskada? What determines how long events are stored (in memory? somewhere else?), and how can I make sure a large number of events doesn't overwhelm the system?
    2. How should application restarts be handled, e.g. prefetching some amount of events from an external source? How do I determine how much?
    3. Is it OK to store complex data in events (e.g. whole Slack message dictionaries)?
    4. How performant is Kaskada, and are there any existing benchmarks?
    5. Can Kaskada pull events from an external system, or is only push supported (which seems to be the case)?
    I would also appreciate it if you could point me to the relevant and up-to-date pieces of code for these questions. I'm planning to use Kaskada from Python.
  • r

    Ryan

    10/26/2023, 4:45 PM
    Lots of AI meetups coming up if you’re in NYC, SF or Palo Alto! Starting tonight in NYC there’s an AICamp meetup: https://www.aicamp.ai/event/eventdetails/W2023102615
  • r

    Ryan

    10/26/2023, 4:45 PM
    Then Saturday AIify is doing a full-day AI event in San Francisco: https://aiify.io/events/231028sf/
  • r

    Ryan

    10/26/2023, 4:46 PM
    Followed up by a VC-focused meetup in Palo Alto next Wednesday: https://www.meetup.com/silicon-valley-gen-ai-community/events/296940963/
  • r

    Ryan

    11/02/2023, 6:35 PM
    The guys at AIify did a great job of recording, editing and releasing the presentation videos - if you’d like to see the presentation we gave, it’s available on YouTube here:

    https://www.youtube.com/watch?v=jkONaensj-s

    What do you think?
  • p

    Petter Egesund

    01/13/2024, 7:42 PM
    A noob question 🙂 I guess the records of the complete timestream are not kept in memory - only the aggregations? Otherwise it would cost a lot of memory?
  • b

    Ben Chambers

    01/13/2024, 7:44 PM
    Yep. Each input timeline can be made from the contents of multiple files. You can also compute timelines from multiple other timelines, which merges the points
  • b

    Ben Chambers

    01/13/2024, 7:44 PM
    And yep. Aggregations are the only thing that needs to be kept in memory.
  • p

    Petter Egesund

    01/13/2024, 7:45 PM
    Yes, just found the first one myself 🙂
  • j

    Jordan Frazier

    01/13/2024, 7:46 PM
    If you get into more complicated queries, the shift operations may also buffer inputs in memory until they are emitted at some defined condition
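To illustrate the buffering Jordan mentions, here is a minimal plain-Python sketch (not Kaskada code; the helper name `shift_events_by` and the data are made up for illustration). Shifting an event forward in time means it cannot be emitted until the input has advanced past its new, later timestamp, so shifted values sit in a buffer until then:

```python
# Illustrative only, not Kaskada's API: why a "shift" operation has to buffer.
import heapq

def shift_events_by(events, delay):
    """events: iterable of (time, value) in time order.
    Yields (time + delay, value), releasing each buffered event once the
    input has advanced past its shifted time."""
    buffer = []  # min-heap of (shifted_time, value) waiting to be emitted
    for time, value in events:
        heapq.heappush(buffer, (time + delay, value))
        while buffer and buffer[0][0] <= time:
            yield heapq.heappop(buffer)
    while buffer:  # flush whatever is still buffered at end of input
        yield heapq.heappop(buffer)

print(list(shift_events_by([(1, "a"), (2, "b"), (10, "c")], delay=5)))
# [(6, 'a'), (7, 'b'), (15, 'c')]
```

In this sketch the buffer only holds events whose shifted time has not yet been reached, which is the kind of extra memory cost described above.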
  • p

    Petter Egesund

    01/13/2024, 7:47 PM
    yes, but how would you deal with an example like this: if purchases.count() > 10 then print purchases.sum()
  • p

    Petter Egesund

    01/13/2024, 7:47 PM
    for evaluating the last expression you need the whole dataset?
  • b

    Ben Chambers

    01/13/2024, 7:48 PM
    The output is at each point in time — so you get a timeline that contains the sum (up to that point in time) as long as there have been more than 10 points. So you only need two aggregations — the count so far and the sum so far — to compute that
  • p

    Petter Egesund

    01/13/2024, 7:49 PM
    So if I understand you right, both the count and sum are updated every time a new record appears, so you do not need the whole dataset at once?
  • j

    Jordan Frazier

    01/13/2024, 7:51 PM
    That’s correct
  • p

    Petter Egesund

    01/13/2024, 7:51 PM
    Sounds like magic. But how do you know that you should compute the sum, as this line only kicks in after 10 purchases?
  • j

    Jordan Frazier

    01/13/2024, 7:54 PM
    @Ben Chambers can correct me if I’m wrong, but the latest version should have the ability to call ‘.explain()’ which should display the compute plan, showing how the operations receive input and produce output. If you check that, you’ll see that the input (purchases) flows into both the count() and sum() aggregations for all records. However, because of the “if”, the plan will include a condition node that only emits the sum() when the count() > 10.
    👍 1
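To make the thread above concrete, here is a minimal plain-Python sketch (not the Kaskada API; the function name and data are invented for illustration) of why `if purchases.count() > 10 then print purchases.sum()` only ever needs two running aggregates, never the full event history:

```python
# Illustrative only, not Kaskada code. It shows why the query above needs
# just two running aggregates (count and sum) rather than every past event.

def running_sum_when_count_exceeds(events, threshold=10):
    """Yield (timestamp, sum_so_far) for every event after more than
    `threshold` events have been observed."""
    count = 0     # aggregate 1: number of purchases seen so far
    total = 0.0   # aggregate 2: sum of purchase amounts seen so far
    for timestamp, amount in events:
        count += 1
        total += amount
        if count > threshold:         # the "if" gates the output...
            yield timestamp, total    # ...but both aggregates update on every event


# A small made-up stream of (timestamp, purchase_amount) pairs.
purchases = [(t, 10.0) for t in range(1, 15)]
for point in running_sum_when_count_exceeds(purchases):
    print(point)   # first output appears at the 11th purchase: (11, 110.0)
```

Running it prints points only from the 11th purchase onward: the condition gates the output timeline while both aggregates keep updating on every record, matching the compute-plan description above.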
  • a

    Anannya Mishra

    10/24/2024, 9:22 PM
    Hi folks. Are you an ML or data leader tired of stale data and slow iteration cycles? TurboML is about to change the game, and we want you to be part of it. We at TurboML are launching an exclusive early access program for our Real-Time Machine Learning Platform. We're inviting ML and data professionals to help shape the future of MLOps [only a select few folks will be accepted]:
    • Leveraging real-time data
    • Live data experimentation
    • Continuous updates to the ML models
    • Compare multiple ML models on the freshest data
    • Real-time feature engineering along with a feature store
    Why join?
    • Be among the first to experience cutting-edge real-time MLOps technology
    • Receive dedicated support from our founding team (ex-Google, AWS)
    • Enhance your skills in utilizing streaming and batch data across various stages of the ML lifecycle
    • Gain exclusive insights into our roadmap and feature development
    • Be part of a community of pioneering beta testers, with networking and mentorship opportunities that may lead to potential industry collaborations.
    Check out the recent talk given by our CTO on how TurboML's platform overcomes the challenges posed by real-time data, enabling fresher features, faster models, and more.

    https://www.youtube.com/watch?v=wEU9LvCnnY4

    Just sign up via the link below or you are always welcome to DM me for any queries! https://2ly.link/20W46
  • a

    Anannya Mishra

    12/25/2024, 8:14 AM
    https://x.com/siddharthb_/status/1871805715368554710