# ask-ai
**User:** @millions-hair-24956 I'm trying to test Krisp AI and find the right settings level for VAD. Krisp filters the audio out, so I don't have the raw audio. What is the best practice for testing this?
**AI:** It seems likely that testing Krisp with VAD settings involves a few considerations. Since Krisp filters out noise, you might not have access to the raw audio, which can affect VAD performance. My understanding is that you should ensure Krisp is enabled only after establishing a connection, as it requires authentication to be effective, according to LiveKit Slack. For testing, you might want to compare VAD performance with and without Krisp enabled to understand its impact. Adjusting VAD parameters such as `activation_threshold`, `min_speech_duration`, and `min_silence_duration` can help fine-tune the sensitivity to speech, as suggested in LiveKit Docs. Would you like more detailed guidance on adjusting these parameters?

Sources: LiveKit Docs, LiveKit Slack
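If it helps, here is a minimal sketch of how those parameters are set when loading the Silero VAD plugin in Python (assuming `livekit-agents` with `livekit-plugins-silero` installed; the values shown are illustrative starting points, not tuned recommendations):

```python
# A minimal sketch, assuming livekit-agents + livekit-plugins-silero.
# The values below are illustrative starting points, not recommendations.
from livekit.plugins import silero

vad = silero.VAD.load(
    activation_threshold=0.5,    # probability above which audio counts as speech
    min_speech_duration=0.05,    # seconds of speech required to open a segment
    min_silence_duration=0.55,   # seconds of silence required to close a segment
)
```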
**User:** @millions-hair-24956 And do people just tweak settings and test the quality via qualitative assessment? How can I automate testing this, and are there best practices around testing these things in general with LiveKit?
**AI:** It seems likely that many users initially rely on qualitative assessment to tweak settings for audio quality, especially when using noise cancellation and VAD. For a more automated approach, you can simulate different audio environments and use scripts to test various VAD settings. According to LiveKit Docs, LiveKit provides tools for load testing and simulating real-world conditions, which can be adapted for testing audio quality.

To automate testing, consider combining pre-recorded audio samples at varying noise levels with a script that adjusts VAD parameters systematically, then analyze the results by checking the accuracy of speech detection and transcription. This approach lets you quantify the impact of different settings without relying solely on subjective listening tests. Would you like more detailed steps on setting up such an automated testing environment?

Sources: LiveKit Docs
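To make "adjust VAD parameters systematically" concrete, here is a rough sketch of a parameter sweep. Hedged assumptions: `livekit-agents` with the Silero plugin, a 16 kHz mono WAV test clip, and this plugin version's stream API; `test_clip.wav` and the grid values are placeholders you would replace with your own fixtures:

```python
# A rough sketch of an automated VAD parameter sweep, assuming livekit-agents
# and livekit-plugins-silero. test_clip.wav (16 kHz mono) and the grid values
# are placeholders; exact stream APIs may differ between plugin versions.
import asyncio
import itertools
import wave

import numpy as np
from livekit import rtc
from livekit.agents import vad as agent_vad
from livekit.plugins import silero


async def count_speech_segments(vad, wav_path: str) -> int:
    """Run a WAV file through a VAD instance and count detected speech segments."""
    with wave.open(wav_path, "rb") as f:
        sample_rate = f.getframerate()
        num_channels = f.getnchannels()
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

    stream = vad.stream()
    samples_per_frame = sample_rate // 100  # push 10 ms frames
    step = samples_per_frame * num_channels
    for start in range(0, len(pcm) - step + 1, step):
        stream.push_frame(rtc.AudioFrame(
            data=pcm[start:start + step].tobytes(),
            sample_rate=sample_rate,
            num_channels=num_channels,
            samples_per_channel=samples_per_frame,
        ))
    stream.end_input()

    segments = 0
    async for event in stream:
        if event.type == agent_vad.VADEventType.END_OF_SPEECH:
            segments += 1
    return segments


async def main():
    grid = itertools.product([0.4, 0.5, 0.6], [0.3, 0.55, 0.8])
    for threshold, min_silence in grid:
        vad = silero.VAD.load(activation_threshold=threshold,
                              min_silence_duration=min_silence)
        n = await count_speech_segments(vad, "test_clip.wav")
        print(f"threshold={threshold}, min_silence={min_silence}s -> {n} segments")


asyncio.run(main())
```

Comparing the detected segment counts (or their timestamps) against a hand-labeled ground truth for each clip turns the qualitative question into a number you can track across settings.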
**User:** @millions-hair-24956 And how do I 'load the audio' into tests? Is it basically that I record an MP4 and can 'fake' a person talking into my computer using some library?
**AI:** To load audio into tests, you can simulate a participant by programmatically publishing pre-recorded audio files into a LiveKit room. The LiveKit SDKs allow you to publish audio tracks from files; according to LiveKit Slack, you can use the Python SDK to publish a WAV file by creating an audio track and pushing audio frames to it. Here's a basic approach:

1. **Prepare audio files:** Record or obtain audio files in a supported format like WAV or MP4.
2. **Publish an audio track:** Use the LiveKit SDK to create a local audio track and publish it to a room. This involves reading the audio file, converting it to the appropriate format, and sending it as audio frames.
3. **Simulate user interaction:** This setup mimics a user speaking into the system, allowing you to test VAD and other audio processing features.

Would you like more detailed guidance on implementing this with a specific SDK?

Sources: LiveKit Slack
**User:** @millions-hair-24956 Yes, I'd like more info on the code to load this in and what an example test might look like.
**AI:** To simulate audio input in a LiveKit room using pre-recorded audio files, you can use the LiveKit Python SDK to publish audio tracks. Here's a step-by-step guide with a code example:

1. **Prepare your audio file:** Ensure your audio file is in a format like WAV, which is easy to process.
2. **Set up your environment:** Install the necessary Python packages, including `livekit` and audio processing libraries like `numpy`; the `wave` module ships with Python's standard library.
3. **Publish the audio track:** Use the LiveKit SDK to create an audio track and publish it to a room. Here's a basic example:

```python
import asyncio
import wave
import numpy as np
from livekit import rtc

async def publish_audio(file_path, room_url, token):
    # Open the audio file
    with wave.open(file_path, 'rb') as wav_file:
        sample_rate = wav_file.getframerate()
        num_channels = wav_file.getnchannels()
        frames = wav_file.readframes(wav_file.getnframes())
        pcm_data = np.frombuffer(frames, dtype=np.int16)

    # Create an audio source and track
    source = rtc.AudioSource(sample_rate, num_channels)
    track = rtc.LocalAudioTrack.create_audio_track("audio", source)

    # Connect to the room
    room = rtc.Room()
    await room.connect(room_url, token)

    # Publish the audio track
    await room.local_participant.publish_track(track)

    # Send audio frames
    # Send the audio in 20 ms frames; step through the interleaved PCM in
    # chunks of samples_per_frame * num_channels samples so stereo files work
    frame_duration_ms = 20
    samples_per_frame = int(sample_rate * frame_duration_ms / 1000)
    step = samples_per_frame * num_channels
    for start in range(0, len(pcm_data), step):
        frame_data = pcm_data[start:start + step].tobytes()
        samples_per_channel = len(frame_data) // (num_channels * 2)
        if samples_per_channel == 0:
            continue
        audio_frame = rtc.AudioFrame(
            data=frame_data,
            sample_rate=sample_rate,
            num_channels=num_channels,
            samples_per_channel=samples_per_channel
        )
        await source.capture_frame(audio_frame)
        await asyncio.sleep(frame_duration_ms / 1000)

    # Keep the connection alive; when the task is cancelled (e.g. via Ctrl+C),
    # asyncio raises CancelledError at the await point, not KeyboardInterrupt
    try:
        while True:
            await asyncio.sleep(1)
    except asyncio.CancelledError:
        await room.disconnect()

# Example usage
asyncio.run(publish_audio("path_to_your_audio.wav", "wss://your_livekit_server", "your_access_token"))
```

This script reads a WAV file, creates an audio track, and publishes it to a LiveKit room. You can use this setup to automate testing by simulating different audio inputs and observing how your VAD settings respond. Would you like more information on any specific part of this process?

Sources:
- https://livekit-users.slack.com/archives/C07FVFGAUKX/p1733152587.089089
- https://docs.livekit.io/python/livekit/plugins/silero/index.html#livekit.plugins.silero.VAD
**User:** @millions-hair-24956 And what is best practice for testing suites? I can do this to test specific functionalities, but how do I test, for instance, how well Krisp may or may not be working? A pre-recorded clip won't do that, I assume.
**AI:** It seems likely that testing the effectiveness of Krisp, or any noise cancellation feature, involves more than just pre-recorded clips. My understanding is that you should combine real-time testing with automated scripts to evaluate performance under various conditions. According to LiveKit Slack, Krisp is effective for background noise cancellation but may not handle background speakers well. Best practices for testing:

1. **Real-time testing:** Conduct live tests in environments with varying noise levels to assess Krisp's performance in real-world scenarios.
2. **Automated testing:** Use scripts to simulate different noise conditions, for example by mixing background noise into test clips or playing noise through speakers while running your application (see the sketch below). This helps you measure the impact of Krisp on audio quality and VAD performance.
3. **Quantitative metrics:** Collect metrics such as signal-to-noise ratio (SNR) and transcription accuracy to objectively evaluate Krisp's effectiveness.
4. **User feedback:** Gather feedback from users in different environments to understand their experience with noise cancellation.

Would you like more detailed guidance on setting up these tests?

Sources: LiveKit Slack
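For item 2, one self-contained way to build controlled noise conditions is to mix recorded noise into clean speech at a target SNR with `numpy`. The file names below are placeholders, and both inputs are assumed to be mono 16-bit WAVs at the same sample rate; each generated clip can then be published with the `publish_audio()` script above, with Krisp enabled and disabled:

```python
# A self-contained sketch: mix noise into clean speech at a target SNR.
# Assumes mono 16-bit WAV inputs at the same sample rate; file names are
# placeholders for your own test fixtures.
import wave
import numpy as np


def mix_at_snr(clean_path: str, noise_path: str, out_path: str, snr_db: float) -> None:
    def read_wav(path):
        with wave.open(path, "rb") as f:
            rate = f.getframerate()
            data = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
        return rate, data.astype(np.float64)

    rate, clean = read_wav(clean_path)
    noise_rate, noise = read_wav(noise_path)
    assert rate == noise_rate, "resample inputs to a common rate first"

    # Loop the noise to cover the full length of the speech clip
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    # Scale the noise so that 10*log10(P_signal / P_noise) == snr_db
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    noise *= np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))

    mixed = np.clip(clean + noise, -32768, 32767).astype(np.int16)
    with wave.open(out_path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(mixed.tobytes())


# Placeholder file names: generate one clip per noise level, then run each
# through the pipeline with Krisp on and off and compare the results
for snr in (0, 10, 20):
    mix_at_snr("clean_speech.wav", "cafe_noise.wav", f"speech_snr{snr}.wav", snr)
```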
**User:** @millions-hair-24956 Oh, so I can just measure the SNR, or metrics like that, of the audio that comes into LiveKit?
**AI:** It seems likely that measuring the signal-to-noise ratio (SNR) of audio coming into LiveKit can provide insight into the effectiveness of noise cancellation features like Krisp. However, LiveKit itself does not provide built-in tools for measuring SNR directly; you would need to implement the measurement with external audio processing libraries such as `numpy` and `scipy` in Python. Here's a basic approach:

1. **Capture audio data:** Use the LiveKit SDK to capture audio frames from the incoming audio track.
2. **Calculate SNR:** Use audio processing techniques to separate the signal from the noise, then compare the power of the signal to the power of the noise.
3. **Analyze results:** Compare SNR values before and after applying noise cancellation to evaluate its effectiveness.

Would you like more detailed guidance on implementing SNR measurement in your setup?

Sources:
- https://docs.livekit.io/client-sdk-flutter/livekit_client/RemoteAudioTrack/getReceiverStats.html
- https://docs.livekit.io/client-sdk-js/interfaces/AudioReceiverStats.html

Note: I wasn't able to find highly relevant data sources, but the links above are potentially relevant.
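As a rough starting point for step 2, here is a small, self-contained estimator over raw int16 PCM. It treats the quietest frames as the noise floor and the loudest as speech, so it is a heuristic proxy rather than a calibrated SNR measurement:

```python
# A crude, numpy-only SNR estimate: loudest decile of 20 ms frames vs. the
# quietest decile. A heuristic proxy, not a calibrated measurement.
import numpy as np


def estimate_snr_db(pcm: np.ndarray, sample_rate: int, frame_ms: int = 20) -> float:
    samples = int(sample_rate * frame_ms / 1000)
    n_frames = len(pcm) // samples
    frames = pcm[: n_frames * samples].astype(np.float64).reshape(n_frames, samples)
    power = np.sort(np.mean(frames ** 2, axis=1))
    power = power[power > 0]
    if len(power) < 10:
        return float("nan")  # clip too short for a meaningful estimate
    k = max(1, len(power) // 10)
    noise_p = np.mean(power[:k])    # quietest decile ~ noise floor
    signal_p = np.mean(power[-k:])  # loudest decile ~ speech
    return 10 * np.log10(signal_p / noise_p)
```

Running this on the same clip before and after noise cancellation gives a comparable, if coarse, number for how much the noise floor dropped.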
**User:** @millions-hair-24956 What's the best way to 'send the audio out for evaluation'?
**AI:** To send audio out for evaluation, you can follow these steps:

1. **Capture audio:** Use the LiveKit SDK to capture audio from a participant's track by subscribing to the track and accessing the audio frames.
2. **Process audio:** Convert the audio frames into a format suitable for analysis, such as WAV or PCM, using libraries like Python's built-in `wave` module.
3. **Send for evaluation:** Once the audio is in the desired format, send it to an external service or tool for evaluation: a machine learning model for speech analysis, a third-party API for transcription, or a custom script for quality assessment.
4. **Analyze results:** Collect and analyze the results to assess audio quality or other metrics of interest.

Would you like more detailed guidance on any specific part of this process?

Sources: LiveKit Slack
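To make step 1 concrete, here is a minimal capture sketch using the Python SDK's `rtc.AudioStream`. This is an assumption-laden sketch: exact frame field names can vary by SDK version, and the room connection and subscription wiring are omitted for brevity:

```python
# A minimal sketch, assuming the livekit Python SDK: dump a subscribed remote
# audio track to a 16-bit WAV file for offline evaluation.
import wave
from livekit import rtc


async def record_track_to_wav(track: rtc.RemoteAudioTrack, out_path: str,
                              max_frames: int = 1000):
    stream = rtc.AudioStream(track)
    wav = None
    frames_written = 0
    async for event in stream:
        frame = event.frame
        if wav is None:
            # Lazily open the file once the track's format is known
            wav = wave.open(out_path, "wb")
            wav.setnchannels(frame.num_channels)
            wav.setsampwidth(2)  # frames are 16-bit PCM
            wav.setframerate(frame.sample_rate)
        wav.writeframes(bytes(frame.data))
        frames_written += 1
        if frames_written >= max_frames:
            break
    if wav is not None:
        wav.close()
    await stream.aclose()
```

You would typically invoke this from a `room.on("track_subscribed")` handler, then feed the resulting WAV to your transcription service or the SNR tooling discussed above.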