# github-notifications
  • g

    GitHub

    05/05/2025, 11:02 PM
    #1764 [server][da-vinci] Bumped up rocksdb-9.10.0
    Pull request opened by gaojieliu
    ## Problem Statement
    We are observing high space amplification in one of the heavy use cases, and we would like to evaluate whether the latest BlobDB-related optimizations can reduce it.
    ## Solution
    Picked up the additional BlobDB tuning available since RocksDB 9.7.0, which changed the semantics of the BlobDB configuration option `blob_garbage_collection_force_threshold` to define a threshold for the overall garbage ratio of all blob files currently eligible for garbage collection (according to `blob_garbage_collection_age_cutoff`). This can provide better control over space amplification at the cost of slightly higher write amplification.
    ### Code changes
    • Added new code behind a config. If so list the config names and their default values in the PR description.
    • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.
    ### Concurrency-Specific Checks
    Both reviewer and PR author to verify
    • Code has no race conditions or thread safety issues.
    • Proper synchronization mechanisms (e.g., `synchronized`, `RWLock`) are used where needed.
    • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
    • Verified thread-safe collections are used (e.g., `ConcurrentHashMap`, `CopyOnWriteArrayList`).
    • Validated proper exception handling in multi-threaded code to avoid silent thread termination.
    ## How was this PR tested?
    • New unit tests added.
    • New integration tests added.
    • Modified or extended existing tests.
    • Verified backward compatibility (if applicable).
    ## Does this PR introduce any user-facing or breaking changes?
    • No. You can skip the rest of this section.
    • Yes. Clearly explain the behavior change and its impact.
    linkedin/venice
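The following is a simplified Java model of the 9.7.0+ `blob_garbage_collection_force_threshold` semantics described in #1764. It is a sketch, not RocksDB's implementation, and it makes several assumptions:
- Blob files are listed oldest first.
- The eligible set is the oldest `blob_garbage_collection_age_cutoff` fraction of files.
- The pooled garbage ratio of that whole set is compared against the threshold.

The class and method names are hypothetical. The model does not cover how RocksDB actually schedules the resulting compactions.

```java
import java.util.List;

/** Hypothetical model of the "overall garbage ratio of eligible blob files" rule. */
final class BlobGcThresholdModel {
  /** One blob file: total bytes and how many of them are garbage. */
  static final class BlobFile {
    final long totalBytes;
    final long garbageBytes;

    BlobFile(long totalBytes, long garbageBytes) {
      this.totalBytes = totalBytes;
      this.garbageBytes = garbageBytes;
    }
  }

  /**
   * Assumes files are ordered oldest first. The oldest (ageCutoff * N) files are eligible,
   * and forced GC is warranted when the garbage ratio pooled across ALL eligible files
   * is at or above forceThreshold.
   */
  static boolean shouldForceGc(List<BlobFile> oldestFirst, double ageCutoff, double forceThreshold) {
    int eligible = (int) (oldestFirst.size() * ageCutoff);
    long total = 0;
    long garbage = 0;
    for (int i = 0; i < eligible; i++) {
      total += oldestFirst.get(i).totalBytes;
      garbage += oldestFirst.get(i).garbageBytes;
    }
    // No eligible bytes means nothing to collect.
    return total > 0 && (double) garbage / total >= forceThreshold;
  }
}
```

Take four files with 60%, 20%, 0% and 0% garbage and an age cutoff of 0.5. The two eligible files pool to 80/200 = 0.4. A threshold of 0.5 therefore does not force GC in this model, even though the oldest file alone is 60% garbage.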
  • g

    GitHub

    05/05/2025, 11:49 PM
    new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by gaojieliu
    <https://github.com/linkedin/venice/commit/8d5e5739b42bbab39514e77790c1e8f3667a0e51|8d5e5739>
    - [server][da-vinci] Bumped up rocksdb-9.10.0 (#1764) linkedin/venice
  • g

    GitHub

    05/06/2025, 1:04 AM
    #1765 [avro][router][server][controller][client][da-vinci][vpj] Removed SSL handshake offloading feature
    Pull request opened by huangminchn
    ## Problem Statement
    1. Test results show that the Netty SSL handshake offloading feature doesn't work with OpenSSL.
    2. Separately, users observed a casting exception in the Avro serializer, but no stack trace was logged.
    ## Solution
    1. Removed the SSL handshake offloading feature and all its related configs.
    2. Log the full stack trace when the Avro serializer fails to serialize.
    linkedin/venice
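The second fix in #1765 is about not losing the stack trace of a serializer failure. Below is a minimal sketch of the pattern: pass the exception object itself to the logger, and keep it as the cause when rethrowing. It uses `java.util.logging` as a stand-in for whatever logging framework the code base actually uses. The `serialize` step is hypothetical and simply fails with a `ClassCastException` for non-text input.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

final class SerializerFailureLogging {
  private static final Logger LOGGER = Logger.getLogger(SerializerFailureLogging.class.getName());

  // Hypothetical serializer step: throws ClassCastException when the value is not text.
  static byte[] serialize(Object value) {
    CharSequence text = (CharSequence) value;
    return text.toString().getBytes(StandardCharsets.UTF_8);
  }

  static byte[] serializeAndLogFailure(Object value) {
    try {
      return serialize(value);
    } catch (ClassCastException e) {
      // Passing the Throwable (not just e.getMessage()) lets the handler print the full stack trace.
      LOGGER.log(Level.SEVERE, "Failed to serialize value of type " + value.getClass().getName(), e);
      // Rethrow with the original exception as the cause so callers keep the trace as well.
      throw new IllegalStateException("Serialization failed", e);
    }
  }

  static String stackTraceOf(Throwable t) {
    StringWriter out = new StringWriter();
    t.printStackTrace(new PrintWriter(out));
    return out.toString();
  }
}
```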
  • g

    GitHub

    05/06/2025, 5:00 AM
    #1766 [server][dvc] Add duplicate key count metric for rocksDB
    Pull request opened by majisourav99
    ## Problem Statement
    The version topic (VT) accumulates duplicate entries when the same keys are written or deleted repeatedly. Kafka log compaction de-duplicates them, which keeps the VT size under control.
    ## Solution
    Without log compaction, we need to find stores with many duplicate writes, which can be repushed to trim the VT size. This PR adds a metric that reports the duplicate key count from RocksDB statistics.
    linkedin/venice
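Here is a rough illustration of what a duplicate key count measures: the number of writes or deletes that hit a key already written earlier. Each such hit leaves an older entry that log compaction (or a repush) would discard.

This sketch counts duplicates over an in-memory list of keys. The PR reads the count from RocksDB statistics instead, and this chat log does not say which statistic it uses. All names here are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateKeyCounter {
  /** Number of writes/deletes whose key was already seen earlier in the stream. */
  static long countDuplicates(List<String> keysInWriteOrder) {
    Set<String> seen = new HashSet<>();
    long duplicates = 0;
    for (String key : keysInWriteOrder) {
      if (!seen.add(key)) { // add() returns false when the key was already present
        duplicates++;
      }
    }
    return duplicates;
  }

  /** Fraction of entries that are redundant; a high value suggests a repush would shrink the VT. */
  static double duplicateRatio(List<String> keysInWriteOrder) {
    if (keysInWriteOrder.isEmpty()) {
      return 0.0;
    }
    return (double) countDuplicates(keysInWriteOrder) / keysInWriteOrder.size();
  }
}
```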
  • g

    GitHub

    05/06/2025, 9:25 PM
    #1767 [server][controller][vpj][dvc][cc] Pass PubSubPositionTypeRegistry to PubSub consumers
    Pull request opened by sushantmane
    ## Pass PubSubPositionTypeRegistry to PubSub consumers
    Introduced PubSubConsumerAdapterContext to encapsulate all required configurations and runtime dependencies for creating PubSub consumers via factory APIs. This includes passing components like PubSubPositionTypeRegistry and PubSubTopicRepository to downstream consumers in a clean, unified way. Updated all call sites to use this new context object when invoking PubSubConsumerAdapterFactory#create, replacing individual parameter passing.
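The context-object pattern described in #1767 can be sketched as an immutable object with a builder, passed as the single argument to a factory. Everything here is hypothetical: `PositionTypeRegistry`, `TopicRepository`, `ConsumerContext` and `ConsumerFactory` are stand-ins, not the real `PubSubConsumerAdapterContext` or `PubSubConsumerAdapterFactory` APIs.

```java
import java.util.Objects;
import java.util.Properties;

// Hypothetical stand-ins for the dependencies named in the PR.
final class PositionTypeRegistry {}

final class TopicRepository {}

/** One immutable object carrying everything a consumer factory needs. */
final class ConsumerContext {
  final String consumerName;
  final Properties properties;
  final PositionTypeRegistry positionTypeRegistry;
  final TopicRepository topicRepository;

  private ConsumerContext(Builder b) {
    // Fail fast if a required dependency was not supplied.
    this.consumerName = Objects.requireNonNull(b.consumerName, "consumerName");
    this.properties = Objects.requireNonNull(b.properties, "properties");
    this.positionTypeRegistry = Objects.requireNonNull(b.positionTypeRegistry, "positionTypeRegistry");
    this.topicRepository = Objects.requireNonNull(b.topicRepository, "topicRepository");
  }

  static final class Builder {
    private String consumerName;
    private Properties properties;
    private PositionTypeRegistry positionTypeRegistry;
    private TopicRepository topicRepository;

    Builder setConsumerName(String v) { consumerName = v; return this; }
    Builder setProperties(Properties v) { properties = v; return this; }
    Builder setPositionTypeRegistry(PositionTypeRegistry v) { positionTypeRegistry = v; return this; }
    Builder setTopicRepository(TopicRepository v) { topicRepository = v; return this; }
    ConsumerContext build() { return new ConsumerContext(this); }
  }
}

/** Factories take the context instead of a growing list of positional parameters. */
interface ConsumerFactory<C> {
  C create(ConsumerContext context);
}
```

The design benefit is that a new dependency later means one more builder field, rather than a change to every factory signature and call site.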
    ## AI Generated Summary of Changes
    This pull request introduces significant changes to the PubSub integration within the Da Vinci client, focusing on simplifying configuration management and improving the internal handling of PubSub components. The changes include replacing the `PubSubPositionDeserializer` with a more comprehensive PubSub component initialization process, refactoring constructors across multiple classes, and updating the `VeniceChangelogConsumerClientFactory` to streamline PubSub consumer creation.
    ### PubSub Configuration and Initialization Enhancements:
    • Introduced a new method `initializePubSubInternals` in `ChangelogClientConfig` to centralize the initialization of PubSub-related components (e.g., `PubSubPositionDeserializer`, `PubSubConsumerAdapterFactory`) based on consumer properties. This replaces manual configuration and ensures consistency.
    • Removed the `setPubSubPositionDeserializer` method and its associated getter from `ChangelogClientConfig`, as the deserializer is now initialized internally.
    ### Constructor Refactoring:
    • Updated constructors for `InternalLocalBootstrappingVeniceChangelogConsumer`, `LocalBootstrappingVeniceChangelogConsumer`, and `VeniceAfterImageConsumerImpl` to remove the `PubSubPositionDeserializer` parameter. These classes now rely on the internally managed PubSub components in `ChangelogClientConfig`.
    ### Streamlined PubSub Consumer Creation:
    • Refactored `VeniceChangelogConsumerClientFactory` to replace the `getConsumer` method with `getPubSubConsumer`, which uses the new PubSub initialization logic in `ChangelogClientConfig`. This ensures that all PubSub components are derived from a single source of truth.
    ### Code Cleanup:
    • Removed unused imports and redundant fields related to the old `PubSubPositionDeserializer` approach across multiple files.
    These changes enhance maintainability and reduce the risk of configuration inconsistencies by consolidating PubSub-related logic into a single initialization method.
    linkedin/venice
  • g

    GitHub

    05/06/2025, 10:14 PM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by gaojieliu
    <https://github.com/linkedin/venice/commit/40e70431d1de277d23bea91b5f707e1900ced2c4|40e70431>
    - Revert "[server][da-vinci] Bumped up rocksdb-9.10.0 (#1764)" (#1768) linkedin/venice
  • g

    GitHub

    05/07/2025, 1:04 AM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by misyel
    <https://github.com/linkedin/venice/commit/8c48f8b7e9ea79749dbe98fa88943e5d315f0cec|8c48f8b7>
    - [controller] Fix store migration after a target region push with deferred swap (#1760) linkedin/venice
  • g

    GitHub

    05/07/2025, 5:53 PM
    #1769 [test][controller] enable rt versioning in prod and tests
    Pull request opened by arjun4084346
    ## Problem Statement
    Use RT versioning in all test runs. More on this feature is in #1555.
    Fixed a bug in RealTimeTopicSwitcher in finding and using the RT topic name correctly. Also fixed a bug in parseStoreFromRealTimeTopic in finding the store from a separate RT topic.
    linkedin/venice
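For the `parseStoreFromRealTimeTopic` fix in #1769, here is a minimal sketch of parsing a store name from either a regular or a separate real-time topic. The suffixes `_rt` and `_rt_sep` are assumptions for illustration. The sketch also ignores whatever naming RT versioning introduces, so check the actual constants before relying on it. All names are hypothetical.

```java
import java.util.Optional;

final class RtTopicParsing {
  // Assumed suffixes for illustration only; the real constants live in the Venice codebase.
  static final String RT_SUFFIX = "_rt";
  static final String SEPARATE_RT_SUFFIX = "_rt_sep";

  /** Returns the store name for an RT or separate-RT topic name, or empty for anything else. */
  static Optional<String> parseStoreFromRealTimeTopic(String topic) {
    // Handling only RT_SUFFIX would silently miss separate RT topics such as "myStore_rt_sep".
    for (String suffix : new String[] {SEPARATE_RT_SUFFIX, RT_SUFFIX}) {
      if (topic.endsWith(suffix) && topic.length() > suffix.length()) {
        return Optional.of(topic.substring(0, topic.length() - suffix.length()));
      }
    }
    return Optional.empty();
  }
}
```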
  • g

    GitHub

    05/07/2025, 8:08 PM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by misyel
    <https://github.com/linkedin/venice/commit/3ad768c59f0ab8999910eafd851b75a70ea71c41|3ad768c5>
    - [da-vinci] Add delayed ingestion in dvc for target region push with deferred swap (#1510) linkedin/venice
  • g

    GitHub

    05/07/2025, 8:08 PM
    #1510 [da-vinci] Add delayed ingestion in dvc for target region push
    Pull request opened by misyel
    ## Summary
    Previously, we added deferred swap for target region push in #1375 and #1421. For stores that have DVC clients using this feature, we would like to delay ingestion in non-target regions. This PR delays ingestion in DVC for non-target regions until the version has been swapped to the new version. The total time for a version to be fully available on DVC will be target region push time + wait time (default 1hr) + DVC ingestion time. This feature is off by default for DVC clients unless `DEFERRED_VERSION_SWAP_SERVICE_WITH_DVC_CHECK_ENABLED` is set to false in the parent controller.
    ## How was this PR tested?
    Added an integration test.
    ## Does this PR introduce any user-facing changes?
    • No. You can skip the rest of this section.
    • Yes. Make sure to explain your proposed changes and call out the behavior change.
    linkedin/venice
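Here is a sketch of the two rules in #1510, under stated assumptions. First, a non-target-region DVC client starts ingesting a new version only once the current version has been swapped to it. Second, the end-to-end availability time is the sum given in the PR. The method names and the integer version comparison are hypothetical.

```java
import java.time.Duration;

final class DvcDeferredIngestion {
  /** Target regions ingest immediately; non-target regions wait until the swap has happened. */
  static boolean shouldStartIngestion(boolean isTargetRegion, int currentVersion, int newVersion) {
    return isTargetRegion || currentVersion >= newVersion;
  }

  /** Target region push time + wait time + DVC ingestion time, as stated in the PR. */
  static Duration timeToFullAvailability(Duration targetRegionPush, Duration waitTime, Duration dvcIngestion) {
    return targetRegionPush.plus(waitTime).plus(dvcIngestion);
  }
}
```

For example, a 20-minute target region push, the default 1-hour wait and 15 minutes of DVC ingestion add up to 95 minutes before the version is fully available on DVC.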
  • g

    GitHub

    05/07/2025, 9:31 PM
    #1770 [controller] Account for manual roll forward in DeferredVersionSwapService
    Pull request opened by misyel
    ## Problem Statement
    After a target region push with deferred swap completes in the target regions, users can manually roll forward in Nuage. Currently, when this happens, we do not mark the parent version status as ONLINE, so the DeferredVersionSwapService keeps checking that version to see whether it can roll forward. This happens because we use the parent version status to coordinate when to start and stop checking a version for roll forward, so after a manual roll forward we need to mark the parent version status as ONLINE.
    ## Solution
    In the check that finds the non-target regions eligible to roll forward, add a check for non-target regions that have already rolled forward. If all non-target regions have already rolled forward, mark the parent version status as ONLINE so the DeferredVersionSwapService stops checking this version for roll forward.
    linkedin/venice
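Below is a minimal sketch of the #1770 check, assuming the controller can see each region's current version as an integer. The non-target regions that have not yet reached the new version are the remaining roll-forward candidates. An empty set means the parent version can be marked ONLINE. The class, the methods and the region names in the usage are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RollForwardCheck {
  /** Non-target regions not yet serving newVersion, i.e. still candidates for roll forward. */
  static Set<String> regionsStillToRollForward(
      Map<String, Integer> currentVersionByRegion, Set<String> nonTargetRegions, int newVersion) {
    Set<String> remaining = new TreeSet<>();
    for (String region : nonTargetRegions) {
      Integer current = currentVersionByRegion.get(region);
      if (current == null || current < newVersion) {
        remaining.add(region);
      }
    }
    return remaining;
  }

  /** If every non-target region already rolled forward (e.g. manually), mark the parent ONLINE. */
  static boolean shouldMarkParentOnline(
      Map<String, Integer> currentVersionByRegion, Set<String> nonTargetRegions, int newVersion) {
    return regionsStillToRollForward(currentVersionByRegion, nonTargetRegions, newVersion).isEmpty();
  }
}
```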
  • g

    GitHub

    05/07/2025, 11:17 PM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by m-nagarajan
    <https://github.com/linkedin/venice/commit/8ab13739398db470f73c5c40e252485b2a91bfb9|8ab13739>
    - [router] Add MetricEntityStateGeneric for flexible dimension handling for OTel (#1667) linkedin/venice
  • g

    GitHub

    05/07/2025, 11:17 PM
    #1667 [router] Add MetricEntityStateGeneric for flexible dimension handling for OTel
    Pull request opened by m-nagarajan
    ## Problem Statement
    Existing subclasses of `MetricEntityState` target specific cases with a specific number of dimensions, which should be enums; they are optimized for performance and GC by reusing `Attributes` rather than recreating them every time. But for cases like controllers, where metric emission is infrequent and nowhere near tens of thousands of QPS, it is better to have a generic, non-cached subclass than to create new subclasses for every combination of dimensions.
    ## Solution
    1. Introduce `MetricEntityStateGeneric`, which takes `baseDimensionsMap` during init and all remaining dimensions as a `Map` during the `record()` call, where it validates that all required dimensions are present. This can be reused for most of the metrics in controllers, and it also helps code maintainability by reducing the number of subclasses needed.
    2. Unlike the other subclasses, which are type-safe with most checks done at compile time, this class performs some checks at runtime and can therefore fail at runtime. Failures are logged with a redundant log filter and captured in an internal metric: `venice.internal.metric_record_failure`.
    3. This can't be used for the case with 0 dynamic dimensions; it throws an error during initialization. `MetricEntityStateBase` should be used instead.
    linkedin/venice
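To illustrate the trade-off in #1667 (runtime validation instead of compile-time enums), here is a hypothetical, heavily simplified analogue of a generic metric state. It is not `MetricEntityStateGeneric`. The simplifications are:
- The OTel instrument is replaced by a `BiConsumer`.
- The failure metric is replaced by an `AtomicLong`.
- Logging is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

/** Hypothetical, simplified analogue of a generic, non-cached metric state. */
final class GenericMetricState {
  private final Map<String, String> baseDimensions;
  private final Set<String> requiredDynamicDimensions;
  private final BiConsumer<Double, Map<String, String>> sink; // stand-in for the real OTel instrument
  private final AtomicLong recordFailures = new AtomicLong(); // stand-in for the internal failure metric

  GenericMetricState(Map<String, String> baseDimensions, Set<String> requiredDynamicDimensions,
      BiConsumer<Double, Map<String, String>> sink) {
    // Mirrors point 3 of the PR: zero dynamic dimensions is rejected at initialization.
    if (requiredDynamicDimensions.isEmpty()) {
      throw new IllegalArgumentException("No dynamic dimensions; use a non-generic metric state instead");
    }
    this.baseDimensions = Map.copyOf(baseDimensions);
    this.requiredDynamicDimensions = Set.copyOf(requiredDynamicDimensions);
    this.sink = sink;
  }

  void record(double value, Map<String, String> dynamicDimensions) {
    // Runtime validation replaces the compile-time guarantees of enum-typed subclasses.
    for (String name : requiredDynamicDimensions) {
      if (dynamicDimensions.get(name) == null) {
        recordFailures.incrementAndGet(); // count the failure instead of throwing
        return;
      }
    }
    // Non-cached: a fresh attribute map is built on every call.
    Map<String, String> attributes = new HashMap<>(baseDimensions);
    attributes.putAll(dynamicDimensions);
    sink.accept(value, attributes);
  }

  long recordFailures() {
    return recordFailures.get();
  }
}
```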
  • g

    GitHub

    05/08/2025, 12:00 AM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by xunyin8
    <https://github.com/linkedin/venice/commit/169d1031530b3fc0892481b4345f3081e55af12d|169d1031>
    - [server] Improve ReadQuotaEnforcementHandler init() behavior and visibility (#1757) linkedin/venice
  • g

    GitHub

    05/08/2025, 12:49 AM
    #1771 [server][controller][vpj][dvc][cc] Pass PubSubPositionTypeRegistry to admin clients
    Pull request opened by sushantmane
    ## Pass PubSubPositionTypeRegistry to admin clients
    Stacked PR. Needs to go after PR #1767.
    Introduced PubSubAdminAdapterContext to encapsulate all required configurations and runtime dependencies for creating PubSub admin clients via factory APIs. This includes passing components like PubSubPositionTypeRegistry and PubSubTopicRepository to downstream consumers in a clean, unified way. Updated all call sites to use this new context object when invoking PubSubAdminAdapterFactory#create, replacing individual parameter passing.
    linkedin/venice
  • g

    GitHub

    05/08/2025, 5:37 AM
    #1772 [fast-client][router] Disable auto retry in httpclient5 to avoid delay upon 429/503 response
    Pull request opened by gaojieliu
    ## Problem Statement
    Today, when the Fast Client hits 429/503 responses, latency goes beyond 1s.
    ## Solution
    HttpClient5 has an auto-retry feature; see `org.apache.hc.client5.http.impl.DefaultHttpRequestRetryStrategy` for details. This feature is enabled by default. When it is enabled, a 429/503 response triggers a delay of 1s (by default) followed by a retry, which means the users of this library, Fast Client and Router, observe latency above 1s, which is unexpected.
    This PR disables this feature to avoid the above behavior. Whether to retry upon these error responses is decided by the user instead of the transport layer.
    linkedin/venice
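Here is a back-of-the-envelope model of why 429/503 responses pushed latency past 1s. It assumes the `DefaultHttpRequestRetryStrategy` defaults of one retry after a 1s interval, and no `Retry-After` header in the response. The class and method are hypothetical; this is arithmetic, not HttpClient5 code.

```java
final class AutoRetryLatencyModel {
  /**
   * Rough client-observed latency for a request that keeps getting 429/503:
   * every attempt costs one round trip, and every retry first waits retryIntervalMillis.
   */
  static long observedLatencyMillis(long roundTripMillis, int maxRetries, long retryIntervalMillis) {
    return roundTripMillis * (maxRetries + 1L) + maxRetries * retryIntervalMillis;
  }
}
```

With a 5ms round trip, one automatic retry gives about 1010ms. With retries disabled, the caller sees the 5ms failure and decides on its own whether to retry.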
  • g

    GitHub

    05/08/2025, 7:27 AM
    #1773 [server][controller][dvc][cc] Lazily instantiate PubSubPositionFactory on demand
    Pull request opened by sushantmane
    ## Lazily instantiate PubSubPositionFactory on demand
    Previously, all PubSubPositionFactory implementations were eagerly instantiated at startup based on configuration. This caused runtime failures when new factory class names were added in config before the corresponding code was deployed. This change defers factory instantiation to first use via computeIfAbsent on a ConcurrentHashMap, avoiding unnecessary class loading and improving robustness in mixed-version rollouts.
    ## AI Generated Summary of Changes
    This pull request refactors the `PubSubPositionTypeRegistry` class to improve memory efficiency and error handling by switching to a lazy initialization approach for `PubSubPositionFactory` instances. It also updates the test cases to align with the new lazy-loading behavior.
    ### Refactoring for Lazy Initialization:
    • Replaced the eager initialization of `typeIdToFactoryMap` with a lazy-loading approach using `VeniceConcurrentHashMap` and `computeIfAbsent`. This ensures that `PubSubPositionFactory` instances are only created when accessed. (`internal/venice-common/src/main/java/com/linkedin/venice/pubsub/PubSubPositionTypeRegistry.java`)
    • Removed the `instantiateFactories` method, as factory instances are now created on demand instead of being preloaded. (`internal/venice-common/src/main/java/com/linkedin/venice/pubsub/PubSubPositionTypeRegistry.java`, L265-L286)
    ### Constructor and Dependency Updates:
    • Updated the constructor of `PubSubPositionTypeRegistry` to stop preloading factory instances into `typeIdToFactoryMap`, aligning with the lazy-loading design. (`internal/venice-common/src/main/java/com/linkedin/venice/pubsub/PubSubPositionTypeRegistry.java`, L123)
    ### Test Case Adjustments:
    • Modified the test case `testRegistryRejectsUnknownClassName` to validate the lazy initialization behavior by triggering factory creation at runtime instead of during registry construction. (`internal/venice-common/src/test/java/com/linkedin/venice/pubsub/PubSubPositionTypeRegistryTest.java`, L88-R92)
    linkedin/venice
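Here is a minimal sketch of the lazy-instantiation approach in #1773. It uses a plain `ConcurrentHashMap` (the PR uses `VeniceConcurrentHashMap`) and reflection to create a factory from its configured class name on first use. `LazyFactoryRegistry` is hypothetical. The point is that an unknown class name no longer fails at construction, only when that type is actually requested.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LazyFactoryRegistry {
  // From config; may name classes whose code is not deployed yet.
  private final Map<Integer, String> typeIdToClassName;
  private final ConcurrentHashMap<Integer, Object> typeIdToFactory = new ConcurrentHashMap<>();

  LazyFactoryRegistry(Map<Integer, String> typeIdToClassName) {
    // No instantiation here, so unknown class names cannot fail construction.
    this.typeIdToClassName = Map.copyOf(typeIdToClassName);
  }

  Object getFactory(int typeId) {
    String className = typeIdToClassName.get(typeId);
    if (className == null) {
      throw new IllegalArgumentException("Unknown type id: " + typeId);
    }
    // computeIfAbsent applies the loader at most once per key, even under concurrent access.
    return typeIdToFactory.computeIfAbsent(typeId, id -> instantiate(className));
  }

  int instantiatedCount() {
    return typeIdToFactory.size();
  }

  private static Object instantiate(String className) {
    try {
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate factory " + className, e);
    }
  }
}
```

Because `computeIfAbsent` records no mapping when the loader throws, a failed instantiation is not cached, and a later call (for example, after the class is deployed) can try again.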
  • g

    GitHub

    05/08/2025, 7:47 AM
    #1774 [server][dvc] Add support to use heartbeat for ready-to-serve check
    Pull request opened by sixpluszero
    ## [server][dvc] Add support to use heartbeat for ready-to-serve check
    This PR adds a new mode that uses heartbeats to measure ready-to-serve status for any given replica on the node. When the config is enabled, the server/DVC instance uses heartbeat lag instead of offset lag to determine whether a partition is caught up and ready to serve. This PR also deprecates hybrid store time lag usage, as it is not used anywhere and heartbeat lag is a better replacement for it. There has also been some offline discussion about refactoring all of the offset management; that won't happen in this PR, as the change would be too big, and will happen gradually in future related PRs.
    ## Solution
    This PR can be seen as the first part of the effort to use heartbeats for measurement in the ingestion path. Offset-based measurement is used in several paths today; the plan for each, plus one follow-up cleanup, is:
    1. Ready-to-serve check: this PR.
    2. Fast restart relaxation: future PR with heartbeat checkpoint.
    3. Fast current version complete relaxation: no good solution so far; decided to keep it untouched.
    4. Blob transfer trigger criteria: future PR with heartbeat checkpoint.
    5. Remove aggregate mode: future PR with simplified ready-to-serve check usage.
    ## Does this PR introduce any user-facing or breaking changes?
    Users will no longer have to specify offset lag or time lag, which has been confusing for them and is not accurate without proper consideration of the write rate.
    linkedin/venice
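Here is a rough sketch contrasting the two ready-to-serve checks in #1774. It assumes heartbeat lag is measured as the time since the producer timestamp of the latest heartbeat consumed for the partition. The method names and thresholds are hypothetical.

```java
final class ReadyToServeCheck {
  /** Heartbeat mode: caught up if the latest consumed heartbeat is recent enough. */
  static boolean readyByHeartbeat(long lastConsumedHeartbeatTimestampMs, long nowMs, long maxHeartbeatLagMs) {
    return nowMs - lastConsumedHeartbeatTimestampMs <= maxHeartbeatLagMs;
  }

  /** Offset mode, for comparison: caught up if few enough messages remain to consume. */
  static boolean readyByOffset(long latestOffset, long consumedOffset, long maxOffsetLag) {
    return latestOffset - consumedOffset <= maxOffsetLag;
  }
}
```

An offset lag of 100 can mean milliseconds on a busy store and hours on a quiet one. That makes an offset threshold hard to choose without considering the write rate, whereas a time-based heartbeat lag does not have that problem.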
  • g

    GitHub

    05/08/2025, 3:53 PM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by m-nagarajan
    <https://github.com/linkedin/venice/commit/0d7d335e93164b0ad6a98d769ddcf0610b089ad1|0d7d335e>
    - [tc][fc][dvc] Add client availability/latency metrics in Otel (#1689) linkedin/venice
  • g

    GitHub

    05/08/2025, 5:26 PM
    #1473 [duckdb] Created a CLI tool to interact with DuckDBDaVinciRecordTransformer
    Pull request opened by kvargha
    ## [duckdb] Created a CLI tool to interact with DuckDBDaVinciRecordTransformer
    This PR creates a CLI tool to interact with DuckDBDaVinciRecordTransformer, similar to the AdminTool.
    ## How was this PR tested?
    I tested this locally against an existing store, but generalized the variables for the purpose of releasing this to OSS.
    linkedin/venice
  • g

    GitHub

    05/08/2025, 6:21 PM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by gaojieliu
    <https://github.com/linkedin/venice/commit/fd3706552eb8467cf42c4ba7770bbde5b158c842|fd370655>
    - [fast-client][router] Disable auto retry in httpclient5 to avoid delay upon 429/503 response (#1772) linkedin/venice
  • g

    GitHub

    05/08/2025, 6:31 PM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by sushantmane
    <https://github.com/linkedin/venice/commit/d272ae59915402765c58e6e2a6ce5e98be7bf1d7|d272ae59>
    - [server][controller][dvc][cc] Lazily instantiate PubSubPositionFactory on demand (#1773) linkedin/venice
  • g

    GitHub

    05/08/2025, 7:14 PM
    #1775 [controller] Add metric for stalled version swap Pull request opened by misyel ## Problem Statement For monitoring deferred version swaps, we lack a metric that shows when a version swap has not happened and is stalled; today the only way to tell is to monitor a store manually. ## Solution Add a new metric to track stalled version swaps. A version swap for a store is considered stalled if more than 110% of the store's wait time has passed since the push completion time. When this happens, we emit a count, and we decrement that count once the version swap happens for the store. ### Code changes • Added new code behind a config. If so list the config names and their default values in the PR description. • Introduced new log lines. • Confirmed if logs need to be rate limited to avoid excessive logging. ### Concurrency-Specific Checks Both reviewer and PR author to verify • Code has no race conditions or thread safety issues. • Proper synchronization mechanisms (e.g., `synchronized`, `RWLock`) are used where needed. • No blocking calls inside critical sections that could lead to deadlocks or performance degradation. • Verified thread-safe collections are used (e.g., `ConcurrentHashMap`, `CopyOnWriteArrayList`). • Validated proper exception handling in multi-threaded code to avoid silent thread termination. ## How was this PR tested? • New unit tests added. • New integration tests added. • Modified or extended existing tests. • Verified backward compatibility (if applicable). ## Does this PR introduce any user-facing or breaking changes? • No. You can skip the rest of this section. • Yes. Clearly explain the behavior change and its impact. linkedin/venice
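A minimal sketch of the stall rule in #1775 may help. It reads the rule as "time elapsed since push completion exceeds 1.1 × the store's wait time". It also assumes both times are in milliseconds and that the metric behaves like a gauge: it goes up once per stalled store and goes back down when that store's swap happens. The class and method names are hypothetical and are not the controller's actual API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker for a stalled-version-swap gauge (not Venice's implementation).
final class StalledVersionSwapTracker {
    private final Set<String> stalledStores = ConcurrentHashMap.newKeySet();

    // Integer form of (now - pushCompletion) > 1.1 * waitTime, avoiding floating-point rounding.
    static boolean isStalled(long pushCompletionTimeMs, long waitTimeMs, long nowMs) {
        return (nowMs - pushCompletionTimeMs) * 10 > waitTimeMs * 11;
    }

    /** Periodic check; returns true only when this call newly counts the store as stalled. */
    boolean onCheck(String store, long pushCompletionTimeMs, long waitTimeMs, long nowMs) {
        return isStalled(pushCompletionTimeMs, waitTimeMs, nowMs) && stalledStores.add(store);
    }

    /** Invoked when the store's version swap finally happens; decrements the count. */
    void onVersionSwap(String store) {
        stalledStores.remove(store);
    }

    /** Value a gauge-style metric would report. */
    int stalledCount() {
        return stalledStores.size();
    }
}
```

The design choice here is to use a set rather than a bare counter. This keeps repeated checks on the same store from inflating the count. It also makes the decrement on version swap idempotent.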
  • g

    GitHub

    05/08/2025, 7:45 PM
    #1776 [server] Rename DIV Class Pull request opened by KaiSernLim ## Summary Dropped the `Kafka` prefix from `KafkaDataIntegrityValidator`. ## Does this PR introduce any user-facing or breaking changes? • No. You can skip the rest of this section. linkedin/venice
  • g

    GitHub

    05/08/2025, 8:25 PM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by KaiSernLim
    <https://github.com/linkedin/venice/commit/7b7df46726d0fb347d91a239845409d4bc70c6d2|7b7df467>
    - [server] Rename DIV Class (#1776) linkedin/venice
  • g

    GitHub

    05/08/2025, 8:46 PM
    1 new commit pushed to
    <https://github.com/linkedin/venice/tree/main|main>
    by kvargha
    <https://github.com/linkedin/venice/commit/60b6466dba9f84997b938086f876245a49c1e13e|60b6466d>
    - [dvc][server][doc] Add exponential backoff for zk update retries + doc cleanup (#1734) linkedin/venice
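This commit adds exponential backoff for ZooKeeper update retries, but the diff is not part of this log. The following is therefore a generic sketch of the technique, not the code from #1734, and it uses no ZooKeeper API. `BackoffRetry`, `retryWithBackoff`, and their parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper illustrating exponential backoff (not the code from #1734).
final class BackoffRetry {
    // Doubles the delay, capped at maxDelayMs.
    static long nextDelay(long currentDelayMs, long maxDelayMs) {
        return Math.min(currentDelayMs * 2, maxDelayMs);
    }

    /**
     * Calls attempt until it returns true or maxAttempts calls have been made,
     * sleeping between failures with an exponentially growing, capped delay.
     * Returns true on success, false if all attempts failed or the thread was interrupted.
     */
    static boolean retryWithBackoff(BooleanSupplier attempt, int maxAttempts, long initialDelayMs, long maxDelayMs) {
        long delayMs = initialDelayMs;
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i == maxAttempts) {
                break; // no sleep after the final failed attempt
            }
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
            delayMs = nextDelay(delayMs, maxDelayMs);
        }
        return false;
    }
}
```

With `initialDelayMs = 100` and `maxDelayMs = 1600`, the waits between attempts are 100, 200, 400, 800, and 1600 ms, and stay at 1600 ms after that. Production retry loops often add random jitter to these delays so that many clients do not retry in lockstep.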
  • g

    GitHub

    05/09/2025, 12:26 AM
    #1777 [server]Upgrade zstd-jni to 1.5.6-8 and Adjust Test Logic Pull request opened by namithanivead ## Problem Statement Upgraded the Zstd JNI library to `com.github.luben:zstd-jni:1.5.6-8`, which includes a fix that protects all native code execution with a shared lock for improved thread safety. ## Solution 1. Dependency Upgrade • Upgraded Zstd JNI to version 1.5.6-8. 2. Exception Handling in Tests • Updated exception assertions in TestZstdLibrary.java: • Previous: e.getMessage().equals("Src size is incorrect") • Current: e.getMessage().equals("nb of samples too low") • All affected unit tests have been updated accordingly and now pass. 3. Test Fix for Sample Count Validation • One test triggered a ZstdException because it provided only 10 samples, which fails the minimum-sample check: the library rejects sampleSizes.length <= 10, so at least 11 samples are required. With only 10 samples, the test hit the exception "nb of samples too low". ### Code changes • Added new code behind a config. If so list the config names and their default values in the PR description. • Introduced new log lines. • Confirmed if logs need to be rate limited to avoid excessive logging. ### Concurrency-Specific Checks Both reviewer and PR author to verify • Code has no race conditions or thread safety issues. • Proper synchronization mechanisms (e.g., `synchronized`, `RWLock`) are used where needed. • No blocking calls inside critical sections that could lead to deadlocks or performance degradation. • Verified thread-safe collections are used (e.g., `ConcurrentHashMap`, `CopyOnWriteArrayList`). • Validated proper exception handling in multi-threaded code to avoid silent thread termination. ## How was this PR tested? • New unit tests added. • New integration tests added. • Modified or extended existing tests. • Verified backward compatibility (if applicable). ## Does this PR introduce any user-facing or breaking changes? • No. You can skip the rest of this section. • Yes. Clearly explain the behavior change and its impact. linkedin/venice
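The following hypothetical Java guard makes the off-by-one in the sample-count fix concrete. It mirrors the check as the PR describes it: reject `sampleSizes.length <= 10`. The class name and the use of `IllegalArgumentException` are assumptions. In the real library, the check raises a `ZstdException` from zstd-jni's native code.

```java
// Hypothetical Java mirror of the minimum-sample check described in #1777.
final class DictionaryTrainingPrecondition {
    // Per the PR, sampleSizes.length <= 10 is rejected, so 11 is the smallest accepted count.
    static final int MIN_SAMPLES = 11;

    static void checkSampleCount(int[] sampleSizes) {
        if (sampleSizes.length < MIN_SAMPLES) {
            // Stand-in for the native ZstdException; the message matches the updated test assertion.
            throw new IllegalArgumentException("nb of samples too low");
        }
    }
}
```

A test that builds its training input in a loop should therefore generate at least `MIN_SAMPLES` samples. A test that wants the failure path should generate 10 or fewer.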
  • g

    GitHub

    05/09/2025, 12:28 AM
    #1778 [draft] Rt2 rt disabled NOT FOR REVIEW Pull request opened by arjun4084346 ## Problem Statement ## Solution ### Code changes • Added new code behind a config. If so list the config names and their default values in the PR description. • Introduced new log lines. • Confirmed if logs need to be rate limited to avoid excessive logging. ### Concurrency-Specific Checks Both reviewer and PR author to verify • Code has no race conditions or thread safety issues. • Proper synchronization mechanisms (e.g., `synchronized`, `RWLock`) are used where needed. • No blocking calls inside critical sections that could lead to deadlocks or performance degradation. • Verified thread-safe collections are used (e.g., `ConcurrentHashMap`, `CopyOnWriteArrayList`). • Validated proper exception handling in multi-threaded code to avoid silent thread termination. ## How was this PR tested? • New unit tests added. • New integration tests added. • Modified or extended existing tests. • Verified backward compatibility (if applicable). ## Does this PR introduce any user-facing or breaking changes? • No. You can skip the rest of this section. • Yes. Clearly explain the behavior change and its impact. linkedin/venice
  • g

    GitHub

    05/09/2025, 12:42 AM
    #1779 [WIP] Avoid constructing D2Client for server and router. Pull request opened by haoxu07 ## Problem Statement ## Solution ### Code changes • Added new code behind a config. If so list the config names and their default values in the PR description. • Introduced new log lines. • Confirmed if logs need to be rate limited to avoid excessive logging. ### Concurrency-Specific Checks Both reviewer and PR author to verify • Code has no race conditions or thread safety issues. • Proper synchronization mechanisms (e.g., `synchronized`, `RWLock`) are used where needed. • No blocking calls inside critical sections that could lead to deadlocks or performance degradation. • Verified thread-safe collections are used (e.g., `ConcurrentHashMap`, `CopyOnWriteArrayList`). • Validated proper exception handling in multi-threaded code to avoid silent thread termination. ## How was this PR tested? • New unit tests added. • New integration tests added. • Modified or extended existing tests. • Verified backward compatibility (if applicable). ## Does this PR introduce any user-facing or breaking changes? • No. You can skip the rest of this section. • Yes. Clearly explain the behavior change and its impact. linkedin/venice
  • g

    GitHub

    05/09/2025, 3:13 AM
    #1780 [controller] provide option to enable/disable RT versioning based on cluster Pull request opened by arjun4084346 ## Problem Statement Right now we read isRealTimeTopicVersioningEnabled from the common configs. This forces us to set the same value for all clusters and prevents a cluster-by-cluster rollout of RT versioning. ## Solution We now read this config from the cluster-specific config object. To avoid inconsistency, we also prohibit migrating a store between two clusters whose RT versioning settings differ (enabled in one, disabled in the other). ### Code changes • Added new code behind a config. If so list the config names and their default values in the PR description. • Introduced new log lines. • Confirmed if logs need to be rate limited to avoid excessive logging. ### Concurrency-Specific Checks Both reviewer and PR author to verify • Code has no race conditions or thread safety issues. • Proper synchronization mechanisms (e.g., `synchronized`, `RWLock`) are used where needed. • No blocking calls inside critical sections that could lead to deadlocks or performance degradation. • Verified thread-safe collections are used (e.g., `ConcurrentHashMap`, `CopyOnWriteArrayList`). • Validated proper exception handling in multi-threaded code to avoid silent thread termination. ## How was this PR tested? • New unit tests added. • New integration tests added. • Modified or extended existing tests. • Verified backward compatibility (if applicable). ## Does this PR introduce any user-facing or breaking changes? • No. You can skip the rest of this section. • Yes. Clearly explain the behavior change and its impact. linkedin/venice
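A rough sketch of the two behaviors #1780 describes: a per-cluster lookup of the RT versioning flag, and a guard that rejects store migration when the source and destination clusters disagree. The sketch assumes the per-cluster values can be modeled as a simple map from cluster name to boolean. `RtVersioningPolicy` and `checkStoreMigrationAllowed` are hypothetical names, not Venice's controller API. Only the `isRealTimeTopicVersioningEnabled` name comes from the PR.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-cluster RT versioning policy (not Venice's implementation).
final class RtVersioningPolicy {
    private final Map<String, Boolean> enabledByCluster;

    RtVersioningPolicy(Map<String, Boolean> enabledByCluster) {
        this.enabledByCluster = Collections.unmodifiableMap(new HashMap<>(enabledByCluster));
    }

    // Looks the flag up per cluster instead of from a single common config value.
    boolean isRealTimeTopicVersioningEnabled(String cluster) {
        Boolean enabled = enabledByCluster.get(cluster);
        if (enabled == null) {
            throw new IllegalArgumentException("Unknown cluster: " + cluster);
        }
        return enabled;
    }

    // Rejects migration between clusters whose RT versioning settings differ.
    void checkStoreMigrationAllowed(String store, String srcCluster, String destCluster) {
        boolean src = isRealTimeTopicVersioningEnabled(srcCluster);
        boolean dest = isRealTimeTopicVersioningEnabled(destCluster);
        if (src != dest) {
            throw new IllegalStateException("Cannot migrate store " + store + " from " + srcCluster
                + " (RT versioning " + (src ? "enabled" : "disabled") + ") to " + destCluster
                + " (RT versioning " + (dest ? "enabled" : "disabled") + ")");
        }
    }
}
```

Failing fast at migration time keeps a store from landing in a cluster that would interpret its real-time topics differently. That is the inconsistency the PR aims to avoid.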