# general
n
Hey all! Looking for people who use LLMs in production - I want to learn about how you do monitoring! (as we've been frustrated with all existing tools 😓)
👀 5
g
Great Q. We're just starting, so looking to hear from others. So far, we've been using OpenAI and monitoring it like a normal API (usage, latency, error rates). But I suspect there's a lot we should be doing and don't know about yet.
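For reference, a minimal sketch of the "monitor it like a normal API" approach: wrap each call to record latency, token usage, and errors. The model name and the `record_metric` helper below are placeholders for whatever metrics backend you already use.

```python
import time
from openai import OpenAI  # assumes openai>=1.0

client = OpenAI()

def record_metric(name, value):
    # Placeholder: swap in your real metrics client (StatsD, Prometheus, CloudWatch, ...).
    print(f"{name}={value}")

def monitored_chat(messages, model="gpt-4o-mini"):
    """Call the chat endpoint and record latency, token usage, and errors."""
    start = time.monotonic()
    try:
        resp = client.chat.completions.create(model=model, messages=messages)
        record_metric("llm.latency_seconds", time.monotonic() - start)
        record_metric("llm.prompt_tokens", resp.usage.prompt_tokens)
        record_metric("llm.completion_tokens", resp.usage.completion_tokens)
        return resp.choices[0].message.content
    except Exception:
        record_metric("llm.errors", 1)
        raise
```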
r
@Nir Gazit could you share what you've tried and what hasn't worked so far?
n
We're missing a way to measure the quality of the results and to check for regressions. We ended up building our own internal custom tool. @Gwen Shapira @Ram would love to chat if you're using LLMs in prod 🙂
e
Seems like fertile ground for a new product, especially since there's nothing out there yet!
n
There are plenty of tools trying to do o11y for LLMs, but I think there’s an interesting unexplored edge there
g
@Nir Gazit do share? I was under the impression that only a human can evaluate the quality of an LLM response...
n
@Gwen Shapira yes, but evaluating a regression in the quality can potentially be done by machines in a quantifiable manner
❤️ 1
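One way to make "quantifiable regression checks" concrete, as a sketch: score fresh answers against a golden set of approved answers (here via embedding similarity, though an LLM judge or a task-specific metric works too) and flag a regression when the aggregate score drops below a recorded baseline. The golden set, model names, and thresholds below are illustrative assumptions, not a prescribed setup.

```python
import math
from openai import OpenAI  # assumes openai>=1.0

client = OpenAI()

# Hypothetical golden set: prompts paired with previously approved answers.
GOLDEN_SET = [
    {"prompt": "Summarize our refund policy in one sentence.",
     "reference": "Customers can request a full refund within 30 days of purchase."},
]

def embed(text):
    resp = client.embeddings.create(model="text-embedding-3-small", input=[text])
    return resp.data[0].embedding

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def regression_score(generate):
    """Average similarity between freshly generated answers and approved references."""
    scores = []
    for case in GOLDEN_SET:
        answer = generate(case["prompt"])
        scores.append(cosine(embed(answer), embed(case["reference"])))
    return sum(scores) / len(scores)

# Fail the check if quality drops noticeably below the last recorded baseline.
BASELINE, TOLERANCE = 0.90, 0.05

def check_regression(generate):
    return regression_score(generate) >= BASELINE - TOLERANCE
```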
g
I'm all 👂
a
even if it does require human auditing, there's probably still useful automation to build: a test harness environment + AWS Mechanical Turk, with a message bus or dashboard for the resulting verdicts
I would also be curious to hear how others are doing this
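A rough sketch of that harness-plus-Turk idea: run a fixed prompt set through the model and push each (prompt, response) pair onto a queue that a human-review workflow (Mechanical Turk or an internal one) drains, with verdicts landing on a dashboard. The queue URL and the `generate` callable are assumptions.

```python
import json
import boto3  # assumes AWS credentials are already configured

sqs = boto3.client("sqs")
REVIEW_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/llm-review"  # placeholder

def enqueue_for_review(prompt, response, metadata=None):
    """Push a (prompt, response) pair onto a review queue.

    A separate worker could turn each message into a Mechanical Turk HIT
    (or an internal review task) and write the verdicts to a dashboard.
    """
    sqs.send_message(
        QueueUrl=REVIEW_QUEUE_URL,
        MessageBody=json.dumps({
            "prompt": prompt,
            "response": response,
            "metadata": metadata or {},
        }),
    )

def run_harness(test_prompts, generate):
    """Run every test prompt through the model and queue the outputs for audit."""
    for prompt in test_prompts:
        enqueue_for_review(prompt, generate(prompt))
```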
g
woah, I didn't think about turk-driven-development 🙂
r
Is one strategy for ‘mechanically’ evaluating response quality to evaluate question rephrasing?
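If "evaluate question rephrasing" means checking that paraphrased versions of a question get consistent answers, a minimal sketch might look like the following; the `generate` callable and the string-similarity proxy are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

def consistency_score(question, paraphrases, generate):
    """Answers to paraphrased questions should agree; low pairwise
    similarity flags fragile behavior.

    SequenceMatcher is only a crude stand-in; embedding similarity or an
    LLM judge would measure semantic agreement better.
    """
    answers = [generate(q) for q in [question, *paraphrases]]
    pairs = list(combinations(answers, 2))
    if not pairs:  # need at least one paraphrase to compare
        return 1.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)
```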
s
so far it's been done manually and is very tricky. I started a project to automate some parts of testing, since evaluating manually was a big headache for me personally - https://github.com/sundi133/llm-datacraft
would love any thoughts, and to collaborate if you're interested - I have some ideas for automating QA generation more intelligently and for ranking an LLM endpoint