# general
  • zaryab

    10/09/2025, 1:02 PM
    Subject: Bookie crashes with -25 error when enabling StreamStorageLifecycleComponent (Pulsar 4.1.1)

    Hi everyone 👋 — I'm setting up a small Pulsar 4.1.1 cluster and running into an issue when enabling stream storage.

    Cluster Setup

    I have 3 VMs:

    | Host | CPUs | RAM | Storage | Roles |
    | --- | --- | --- | --- | --- |
    | 10.0.1.74 | 8 | 16GB | 500GB SSD | ZooKeeper + Bookie + Broker |
    | 10.0.1.75 | 8 | 16GB | 400GB SSD | ZooKeeper + Bookie + Broker |
    | 10.0.1.91 | 12 | 64GB | 100GB SSD | ZooKeeper |

    Cluster Init Command

    ```bash
    bin/pulsar initialize-cluster-metadata \
      --cluster dev-cluster-1 \
      --metadata-store zk:10.0.1.74:2181,10.0.1.75:2181,10.0.1.91:2181 \
      --configuration-metadata-store zk:10.0.1.74:2181,10.0.1.75:2181,10.0.1.91:2181 \
      --web-service-url http://10.0.1.74:8080,10.0.1.75:8080 \
      --web-service-url-tls https://10.0.1.74:8443,10.0.1.75:8443 \
      --broker-service-url pulsar://10.0.1.74:6650,10.0.1.75:6650 \
      --broker-service-url-tls pulsar+ssl://10.0.1.74:6651,10.0.1.75:6651
    ```

    Bookie Config (relevant parts)

    ```properties
    storageserver.grpc.port=4181
    dlog.bkcEnsembleSize=2
    dlog.bkcWriteQuorumSize=2
    dlog.bkcAckQuorumSize=1
    storage.range.store.dirs=data/bookkeeper/ranges
    storage.serve.readonly.tables=false
    storage.cluster.controller.schedule.interval.ms=30000
    ```

    Issue

    When I run Pulsar in stateless mode, ZooKeeper, BookKeeper, and the brokers all start fine. But when I enable:

    ```properties
    extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent
    ```

    both bookies crash shortly after startup with a BKTransmitException and error -25. Excerpt from the logs:

    ```
    Caused by: org.apache.distributedlog.exceptions.BKTransmitException: Failed to open ledger handle for log segment ... : -25
    ```

    What I've tried
    • Verified the ZooKeeper quorum and BookKeeper ledger directories
    • Cleaned /data/bookkeeper and restarted the cluster
    • Ensured the ensemble/write/ack quorum configs match the cluster size

    Questions
    • Has anyone successfully enabled the stream storage component (StreamStorageLifecycleComponent) on Pulsar 4.1.1?
    • What does the -25 (BKTransmitException) usually indicate in this context — ZK metadata corruption, a missing ledger, or a config mismatch?
    • Any guidance or example configurations for Pulsar 4.x stream storage clusters would be greatly appreciated.
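    A quick way to isolate whether the -25 comes from the dlog ensemble/quorum settings versus ledger metadata is BookKeeper's built-in smoke test, which writes a throwaway ledger with an explicit ensemble/write/ack configuration. A minimal sketch, assuming both bookies are registered and healthy (flag names from memory — run `bin/bookkeeper shell simpletest -h` to confirm for your version):

    ```bash
    # Write a small test ledger across the 2-bookie cluster using the same
    # quorum settings as the dlog config above; a clean run rules out basic
    # ensemble/quorum misconfiguration as the cause of the -25.
    bin/bookkeeper shell simpletest \
      -ensemble 2 \
      -writeQuorum 2 \
      -ackQuorum 1 \
      -numEntries 100
    ```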
  • Nicolas Belliard

    10/15/2025, 2:18 PM
    Hi Team 👋 I'm investigating an issue related to the Pulsar broker configuration parameter `delayedDeliveryTrackerFactoryClassName`. We initially used `InMemoryDelayedDeliveryTracker` (because we were on Pulsar 2.7), which caused acknowledged delayed messages to be reprocessed after a broker restart, likely because its state is stored only in memory. Given our high message volume (millions), this behavior is problematic. A screenshot is available showing the lag escalation following a broker restart. We're generating delayed messages out of sequence, resulting in gaps within the acknowledged message stream. This causes non-contiguous ranges of messages to be marked as deleted or eligible for deletion. In our screenshot, the value of `nonContiguousDeletedMessagesRanges` is 16833. To mitigate this, after upgrading Pulsar to 4.0.4 we updated the broker config to use `org.apache.pulsar.broker.delayed.BucketDelayedDeliveryTrackerFactory`, which should persist delayed-delivery metadata to disk via BookKeeper ledger buckets. However, after switching to the bucket-based tracker, we're still seeing the same behavior post-restart. A few observations and questions:
    • I checked the `pulsar_delayed_message_index_loaded` metric and noticed that messages are still being loaded into memory, while `pulsar_delayed_message_index_bucket_total` remains at zero. Is this expected? Shouldn't the bucket tracker be persisting to and loading from disk?
    • Are there additional broker settings required to fully enable bucket-based delayed delivery tracking? For example:
    ◦ Do we need to explicitly configure `delayedDeliveryTrackerBucketSize` or `delayedDeliveryMaxNumBuckets`?
    ◦ Is there any dependency on topic-level settings or namespace policies that could override the broker-level tracker configuration?
    ◦ Could other settings interfere with delayed message persistence?
    Any insights or guidance would be greatly appreciated. Thanks for your help!
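    A minimal broker.conf sketch for the bucket-based tracker, with parameter names as they appear in 4.x broker.conf (verify against your version). One detail that may explain `pulsar_delayed_message_index_bucket_total` staying at zero, if I read PIP-195 correctly: bucket snapshots are only cut once a subscription accumulates more delayed indexes than the per-bucket minimum, so smaller backlogs stay purely in memory.

    ```properties
    # Switch from the in-memory tracker to the bucket-based one (broker restart required).
    delayedDeliveryTrackerFactoryClassName=org.apache.pulsar.broker.delayed.BucketDelayedDeliveryTrackerFactory
    # A bucket snapshot is only created once this many delayed-message indexes
    # accumulate; below the threshold the index stays in memory.
    delayedDeliveryMinIndexCountPerBucket=50000
    # Upper bound on the number of bucket snapshots kept per subscription.
    delayedDeliveryMaxNumBuckets=50
    ```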
  • benjamin99

    10/17/2025, 3:01 AM
    Hi team: I REALLY DO NEED HELP RIGHT NOW. One of my bookie nodes in the production cluster is stuck in CrashLoopBackOff status; after checking the logs, I found that it's because the node cannot acquire its lock from the metadata server (we use Oxia in our cluster). I tried restarting the Oxia cluster, since I saw error messages saying that Oxia cannot append the WAL to the stream (I faced a similar situation last month, and the restart-Oxia trick worked for me then). Unfortunately, the bookie node still cannot get its own lock after restarting Oxia. I REALLY NEED SOMEONE WHO CAN HELP ME OUT WITH THIS ISSUE 😢
  • Jonatan Bien

    10/21/2025, 7:14 PM
    We're seeing Oxia, which we use as our metadata storage system, crash repeatedly after 20 minutes of use with WAL corruption, as described here: https://github.com/oxia-db/oxia/issues/772. Has anyone experienced something similar and can share workarounds?
  • Margaret Figura

    10/22/2025, 4:00 PM
    Hi all... I'm debugging an issue where we're using Pulsar with non-persistent topics, and in this case we have only 1 consumer. The data rate is low -- ~5000 msgs/sec, 1 KB each -- but we see a small number of drops. Normally we run at much larger scale and can handle 50 consumers at 3 million msgs/sec with 0 drops, so it's odd that we see drops here while CPU usage on every component is quite low. The Pulsar consumer receive queue is set to 50k and never gets low. I've also taken our software out of the picture and replaced our consumers with the sample Java consumer code from the docs, which just throws away each message (no `println()` or other per-message work). Again, CPU usage is under 10% for all components, but I see the same small drops. I started debugging and found Pulsar is dropping because the Netty connection's `.isWritable()` returns false, which causes Pulsar to drop immediately. Per the Netty javadoc, this "Returns true if and only if the I/O thread will perform the requested write operation immediately", i.e., there is room available in Netty's ChannelOutboundBuffer. I found that if I increase Netty's low/high water marks, the drops go away, but that isn't possible without a code change to the Pulsar broker. I'm looking for any suggestions on different configurations I can try. Thanks!!
    👀 1
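    For anyone following along, this is the Netty mechanism being described: writability flips to false once the channel's outbound buffer crosses the high water mark, and a broker that refuses to buffer non-persistent messages has to drop them. A generic illustrative sketch (not Pulsar broker code):

    ```java
    import io.netty.channel.Channel;
    import io.netty.channel.WriteBufferWaterMark;

    final class WritabilityGate {
        // Raising the marks gives bursty consumers more slack before
        // isWritable() flips to false (values are arbitrary examples).
        static void configure(Channel ch) {
            ch.config().setWriteBufferWaterMark(
                    new WriteBufferWaterMark(1 << 20 /* low: 1 MiB */,
                                             4 << 20 /* high: 4 MiB */));
        }

        static void send(Channel ch, Object msg) {
            if (ch.isWritable()) {
                ch.writeAndFlush(msg);
            } else {
                // With non-persistent topics the broker drops here instead of
                // queueing — the behavior observed above.
            }
        }
    }
    ```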
  • Vaibhav Swarnkar

    10/25/2025, 7:26 PM
    We are using Apache Spark for batch processing and are now planning to extend the same setup to stream processing. I was checking the StreamNative Pulsar connector and was wondering: does it support a Continuous trigger or real-time trigger like Databricks? The idea is to achieve sub-millisecond end-to-end latency. Is this at all possible with Pulsar <> Spark?
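    For reference, this is what a continuous-mode query looks like in Spark Structured Streaming. Whether the pulsar-spark source implements continuous processing (rather than micro-batch only) is exactly the open question, so treat the `format("pulsar")` options below as assumptions taken from the connector docs:

    ```java
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.Trigger;

    public class PulsarContinuousSketch {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("pulsar-continuous").getOrCreate();

            Dataset<Row> in = spark.readStream()
                    .format("pulsar")                                 // pulsar-spark connector
                    .option("service.url", "pulsar://localhost:6650") // assumed option names
                    .option("admin.url", "http://localhost:8080")
                    .option("topics", "in-topic")
                    .load();

            in.writeStream()
                    .format("console")
                    // Continuous trigger = per-record processing with ~1s checkpoints,
                    // but only if both source and sink support continuous mode.
                    .trigger(Trigger.Continuous("1 second"))
                    .start()
                    .awaitTermination();
        }
    }
    ```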
  • Kiryl Valkovich

    10/26/2025, 7:42 PM
    📣 Dekaf UI is now open-source software licensed under Apache 2.0: https://github.com/visortelle/dekaf Please try it out, raise bugs and feature requests, and ask questions on GitHub. Thank you!
    🎉 7
    🦜 2
    pulsarlogo 3
    🚀 2
  • Andrew

    10/29/2025, 5:11 AM
    Hi all, is there a good guide for migrating from zookeeper to oxia? My searches are coming up empty. Thanks 😄
  • David K

    10/29/2025, 12:47 PM
    Unfortunately, there isn’t really an easy way to do that. The main issue is that the data is stored in very different formats, so it can’t just be copied.
  • Jack Pham

    10/29/2025, 5:41 PM
    Hi all, I have a question about this issue: https://github.com/apache/pulsar-client-go/issues/1297. We don't specify consumer names, and I understand that Pulsar will create a unique name in that case. However, if the consumer name is unique, shouldn't the producer name (`..<subscription>-<consumerName>-DLQ`) be unique as well, since it embeds the consumer name? Will the consumer stop consuming messages if this happens? We are using Pulsar client 4.0.0, where the producer name is constructed as:

    ```java
    .producerName(String.format("%s-%s-%s-DLQ", this.topicName, this.subscription, this.consumerName))
    ```
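    One way to sidestep name collisions is to set the consumer name explicitly, so the derived DLQ producer name is unique by construction rather than relying on the generated one. A sketch against the Java client API (`client` is an existing PulsarClient; the UUID suffix is just an illustrative choice):

    ```java
    import java.util.UUID;
    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.DeadLetterPolicy;

    Consumer<byte[]> consumer = client.newConsumer()
            .topic("persistent://tenant/ns/my-topic")
            .subscriptionName("my-sub")
            // Unique per consumer instance, so the internal DLQ producer name
            // (<topic>-<subscription>-<consumerName>-DLQ) is unique too.
            .consumerName("worker-" + UUID.randomUUID())
            .deadLetterPolicy(DeadLetterPolicy.builder()
                    .maxRedeliverCount(3)
                    .build())
            .subscribe();
    ```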
  • Romain

    10/29/2025, 7:23 PM
    Hi everyone. We're using Pulsar with strict schema governance - each namespace has `schemaValidationEnforced=true` and `isAllowAutoUpdateSchema=false` (changed only under an approved process), so only admins can push schemas. Here's the issue: when a consumer is configured with a `DeadLetterPolicy` and a message fails too many times (or is negatively acknowledged repeatedly), the client publishes the message to a dead-letter topic (default name `<topic>-<subscription>-DLQ`) after the redelivery threshold. That topic doesn't necessarily exist ahead of time, so when it's first used it may trigger topic creation and/or schema registration. Because our namespace forbids auto schema updates and enforces schemas, this can fail - the consumer isn't authorized to register the schema for the DLQ topic. To work around this, we're creating a separate namespace (e.g., `<namespace>-dlq`) where:
    • `isAllowAutoUpdateSchema=true`
    • `schemaValidationEnforced=false`
    so consumers can safely publish DLQ messages without schema conflicts. Is this the recommended approach? Is there a cleaner way to allow DLQ schema creation while keeping production namespaces locked down? Any official guidance or community best practices would be really appreciated 🙏 Thanks!
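    An alternative that keeps the production namespace fully locked down is to pre-provision each DLQ topic and its schema as an admin, so the consumer's implicit creation/registration becomes a no-op. A sketch (topic name and schema file are placeholders following the default DLQ naming):

    ```bash
    # Create the DLQ topic ahead of time, using the name the client will derive.
    pulsar-admin topics create "persistent://tenant/ns/my-topic-my-sub-DLQ"

    # Upload the approved schema so the consumer never needs registration rights.
    pulsar-admin schemas upload "persistent://tenant/ns/my-topic-my-sub-DLQ" \
      --filename dlq-schema.json
    ```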
  • Francesco Animali

    10/30/2025, 8:52 AM
    Hello everybody, I have a reproducible test case demonstrating that Pulsar 2-way geo-replication doesn't support exclusive-access producers. If this is a limitation by design, I'm not sure what benefit it brings; if it's not by design, I believe it should be resolved to strengthen and improve the Pulsar geo-replication feature. I have opened issue 24914 for this. I'd appreciate it if someone could take a look and suggest whether it makes sense or not.
  • Chaitanya Gudipati

    11/05/2025, 3:48 PM
    Hi Folks, I was exploring Apache Pulsar Functions. From the documentation, Apache BookKeeper seems to serve as the state storage interface. A couple of questions on state storage for Pulsar Functions:
    1. Is there an upper limit on the storage allocated for Pulsar Function state?
    2. Is there a tiered-storage paradigm for function state storage, similar to event stream storage? TIA.
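    For context, this is the state API a function sees; the state itself lives in BookKeeper's table service (the same stream storage discussed earlier in this channel). A minimal Java sketch:

    ```java
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;

    public class WordCountFunction implements Function<String, Void> {
        @Override
        public Void process(String input, Context context) {
            // Counters and raw byte state are persisted through the function
            // state store backed by BookKeeper.
            context.incrCounter(input, 1);
            context.putState("last-seen-" + input,
                    ByteBuffer.wrap(Long.toString(System.currentTimeMillis())
                            .getBytes(StandardCharsets.UTF_8)));
            return null;
        }
    }
    ```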
  • Jack Pham

    11/05/2025, 11:11 PM
    I'm facing a problem where the consumer's internal DLQ producer can't connect due to a conflict with another producer of the same name. This issue was fixed in version 4.1.0, but that requires Java 17, which is not feasible for us at the moment (we're still on Java 8). I want to implement a short-term workaround that detects the issue and recreates the consumer with a different name, which, in theory, should resolve the conflict. The Pulsar client implementation, however, seems to hide the entire DLQ handling, with exceptions not thrown or propagated back to external code. Is there a way to achieve what I need here?
  • Tomek Zmijowski

    11/06/2025, 9:46 PM
    Hey! I'm evaluating migration options for moving our Pulsar stacks from EC2 to Kubernetes environments, where the requirement is to minimize service downtime. So far so good: I've learned a lot about geo-replication, which I could use, but I'm wondering what the story is behind PIP-188: https://github.com/apache/pulsar/issues/16551. AFAIK the work has been delivered, but I can't find instructions on how to try this solution. The list of features is impressive:

    ```
    Publish ordering guarantee
    Consumer ordering guarantee
    Incoming replicator ordering guarantee
    Outgoing replicator ordering guarantee with the topic unavailability tradeoff
    Auto resource creation (tenant, namespace, partitioned-topic, subscriptions) in a green cluster
    Auto topic deletion after migration successfully completed for a topic
    Enable migration at cluster level or per namespace level
    Stats to show topic's migration state
    ```

    But due to the missing configuration steps, it's hard to test this feature. Can someone explain how to get started with it?
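    For what it's worth, from memory of the PIP-188 change set the entry points are a broker setting on the old (blue) cluster plus a pulsar-admin command that marks the cluster as migrated and points it at the new (green) cluster. A hedged sketch — command and flag names should be verified against `pulsar-admin clusters -h` on a 4.x install, since the docs really are sparse here:

    ```bash
    # broker.conf on the blue cluster: how often brokers check migration state
    # (a value of 0 disables the check). Name as I recall it from PIP-188.
    # clusterMigrationCheckDurationSeconds=30

    # Mark the blue cluster migrated and hand clients the green cluster's URLs.
    pulsar-admin clusters update-cluster-migration blue-cluster \
      --migrated \
      --broker-url pulsar://green-broker.example.com:6650 \
      --service-url http://green-broker.example.com:8080
    ```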
  • Ujjain Bana

    11/10/2025, 1:57 PM
    <URGENT> Hi, there are many .log files under the /data/bookie1/ledgers/current directory, and they occupy a large amount of space. How can I clean them up? As a temporary fix, can I manually delete these log files?
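    A word of caution for anyone finding this later: those .log files are BookKeeper entry logs, and a single entry log interleaves entries from many ledgers, so deleting them manually risks losing live data. They are reclaimed by the bookie's garbage collector and compactor, tuned in bookkeeper.conf; a sketch of the knobs to check first (values here are the usual defaults, shown as examples rather than recommendations):

    ```properties
    # How often the GC thread scans for fully-deletable entry logs (ms).
    gcWaitTime=900000
    # Minor compaction: rewrite entry logs with < 20% live data, hourly.
    minorCompactionThreshold=0.2
    minorCompactionInterval=3600
    # Major compaction: rewrite entry logs with < 80% live data, daily.
    majorCompactionThreshold=0.8
    majorCompactionInterval=86400
    ```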
  • bhasvij

    11/11/2025, 2:29 PM
    On the Pulsar Flink connector side, is there currently no support from the Pulsar project?
  • Nithin Subbaraj

    11/12/2025, 10:47 AM
    Hi Team, on our Pulsar BookKeeper servers, there are old .log files under /data/bookkeeper/ledgers/current, some more than 2 years old. We set the broker retention for acknowledged messages to 60 minutes, but the old log files are still not getting deleted, and the ledgers directory is the biggest consumer of disk space. I've checked https://apache-pulsar.slack.com/archives/C5Z4T36F7/p176278302931373
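    If the old entry logs genuinely contain only deleted ledgers, one thing to check is whether garbage collection is actually running. The bookie exposes HTTP admin endpoints for this when `httpServerEnabled=true` in bookkeeper.conf; paths as I remember them from the BookKeeper HTTP admin API (verify for your version):

    ```bash
    # Trigger a garbage-collection pass on one bookie (default HTTP port 8000).
    curl -X PUT http://bookie-host:8000/api/v1/bookie/gc

    # Inspect GC/compaction status and when it last completed.
    curl http://bookie-host:8000/api/v1/bookie/gc_details
    ```

    Note that BookKeeper reclaims space per entry log, not per message: an entry log is only deleted (or compacted) once enough of the ledgers inside it are deleted, so broker-side retention alone doesn't immediately translate into freed disk.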
  • Lari Hotari

    11/17/2025, 8:58 AM
    📣 [ANNOUNCE] Apache Pulsar 3.0.15, 4.0.8 and 4.1.2 released 📣 For Pulsar release details and downloads, visit: https://pulsar.apache.org/download Release Notes are at: • 3.0.15: https://pulsar.apache.org/release-notes/versioned/pulsar-3.0.15/ (previous LTS release, support until May 2026) • 4.0.8: https://pulsar.apache.org/release-notes/versioned/pulsar-4.0.8/ (Current LTS release) • 4.1.2: https://pulsar.apache.org/release-notes/versioned/pulsar-4.1.2/ (Latest release) Please check the release notes for more details.
    🔥 3
    🎉 3
  • Alexandre Burgoni

    11/17/2025, 9:27 AM
    Hi everyone, has anyone experienced `504 Gateway Timeout` errors from Pulsar clients in a production cluster? We are seeing proxies time out from time to time on multiple clusters, with an HTTP `504` and the exception message `SSL BAD PACKET LENGTH`. It looks like an issue in the proxy-to-broker connection pool, but we can't prove it yet. We're running `4.1.0`. For now we have to reboot the proxies to fix the issue.
  • Alexander Brown

    11/17/2025, 7:03 PM
    What's the technical reason for having the journal and ledgers on the same NVMe drive versus on two separate drives, one for the journal and one for the ledgers?
  • David K

    11/17/2025, 7:45 PM
    There are several reasons why you should have the journal and ledger disks on separate physical volumes, including performance. But the primary reason is that they serve two different purposes. The journal disk is used for short-term storage of messages before they are indexed and written to the ledger disk. The journal provides the durability guarantee: in the event of a failure, the bookie can recover by replaying the messages from the journal disk. However, if the journal disk fails, Pulsar will continue to operate. So separating them also eliminates a single point of failure in the storage layer.
    ✅ 1
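    For reference, the split is just two settings in bookkeeper.conf; a sketch assuming two separate mounts (paths are examples):

    ```properties
    # Journal: sequential, fsync-heavy writes; ideally a dedicated low-latency device.
    journalDirectories=/mnt/nvme-journal/bookkeeper/journal
    # Ledgers: indexed long-term storage; serves reads during catch-up and recovery.
    ledgerDirectories=/mnt/nvme-ledgers/bookkeeper/ledgers
    ```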
  • Ben Hirschberg

    11/17/2025, 11:13 PM
    Hi all 👋 I have a question about per-key scheduling behavior in Pulsar. I need strict ordering and exclusivity per `sensor_id`, but I don't want long-lived key-to-consumer stickiness. Instead, I'm trying to achieve this logic:
    If no consumer is currently processing `sensor_id = X`, then the next message for that sensor should be assigned to the next available consumer (round-robin or least-loaded), all while preserving ordering and ensuring no two consumers ever process the same key concurrently.
    `KeyShared` ensures ordering and exclusivity, but it uses stable key-range hashing, so a key stays with one consumer until that consumer dies. Is there any Pulsar pattern, config, or upcoming feature that supports dynamic per-message key assignment instead of sticky key-range ownership? Or is this fundamentally outside Pulsar's delivery semantics? Thanks! 🙏
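    For readers unfamiliar with the behavior described here, this is the standard Key_Shared setup; the hash-range policy is what pins a key to one consumer until group membership changes. A Java client sketch (`client` is an existing PulsarClient):

    ```java
    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.KeySharedPolicy;
    import org.apache.pulsar.client.api.SubscriptionType;

    Consumer<byte[]> consumer = client.newConsumer()
            .topic("persistent://tenant/ns/sensors")
            .subscriptionName("sensor-processors")
            .subscriptionType(SubscriptionType.Key_Shared)
            // Each key hashes into a range owned by exactly one consumer:
            // per-key ordering and exclusivity, but also the long-lived
            // stickiness described above.
            .keySharedPolicy(KeySharedPolicy.autoSplitHashRange())
            .subscribe();
    ```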
  • Ben Hirschberg

    11/18/2025, 6:12 AM
    We already use a shared subscription with the `Key_Shared` option, since we do want messages to be processed in order per key (this is something our design requires).
    👍 1
  • Sahin Sarkar

    11/20/2025, 6:23 AM
    Hi guys, how's everyone doing?
  • Sahin Sarkar

    11/20/2025, 7:24 AM
    I have a scenario: microservices A and B (both multi-pod deployments). A does some computation and updates its database, then needs to let all the pods of B know about the updates, which are some basic configs. How can this be done so that the system is scalable? I also don't want service B to use many resources, because it is already resource-constrained. I have checked the following approaches, which are allowed in my company:
    1. Through Kafka: every pod of service B gets its own consumer group, and they all subscribe to the same topic into which A pushes its updates.
    2. Through Pulsar: similar to Kafka, and I've also read that Pulsar is more suitable for these kinds of fan-out scenarios. I have some idea of how, but I don't know the details.
    3. Through ZooKeeper: preferred by my seniors, who have experience using it. They claim the ZK approach would use the least resources and would suit service B.
    Which approach should I use given the constraints? And if Pulsar, how exactly is its subscription model better than Kafka's?
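    To make option 2 concrete: fan-out to every pod in Pulsar just means each pod subscribes with its own subscription name, since every subscription receives a full copy of the stream — there is no per-pod consumer-group/offset bookkeeping to manage as in Kafka. A Java sketch (the POD_NAME environment variable is an assumed way to derive a unique name):

    ```java
    import org.apache.pulsar.client.api.*;

    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://broker.example.com:6650")
            .build();

    // One subscription per pod of B => every pod sees every config update.
    Consumer<String> consumer = client.newConsumer(Schema.STRING)
            .topic("persistent://tenant/ns/config-updates")
            .subscriptionName("b-" + System.getenv("POD_NAME")) // unique per pod
            .subscriptionType(SubscriptionType.Exclusive)
            .subscriptionInitialPosition(SubscriptionInitialPosition.Latest)
            .subscribe();
    ```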
  • Francesco Animali

    11/20/2025, 2:28 PM
    Hey Pulsarers! Is there any chance that a fix for this issue gets merged? https://github.com/apache/pulsar/issues/24914
  • Lari Hotari

    11/21/2025, 7:56 AM
    We've just released Apache Pulsar Helm Chart 4.4.0 🎉 The official source release, as well as the binary Helm Chart release, are available at https://www.apache.org/dyn/closer.lua/pulsar/helm-chart/4.4.0/?action=download The helm chart index at https://pulsar.apache.org/charts/ has been updated and the release is also available directly via helm. The main highlights of this release are the upgrade of the default Pulsar version to 4.0.8 and the Helm chart's integration with Dekaf UI. Dekaf is a web-based UI for Apache Pulsar, licensed under Apache 2.0 (GitHub: https://github.com/visortelle/dekaf). Thanks to @Kiryl Valkovich for this great contribution to the Apache Pulsar community! Release Notes: https://github.com/apache/pulsar-helm-chart/releases/tag/pulsar-4.4.0 Docs: https://github.com/apache/pulsar-helm-chart#readme and https://pulsar.apache.org/docs/helm-overview ArtifactHub: https://artifacthub.io/packages/helm/apache/pulsar/4.4.0 Thanks to all the contributors who made this possible.
    🤩 2
    thankyou 3
    🎉 1
  • DANIEL STRAUGHAN

    11/21/2025, 7:14 PM
    Hello, I am trying to update the bearer token that a function is using in the Kubernetes runtime via the REST API. I am able to use the CLI

    ```bash
    bin/pulsar-admin functions update --tenant <TENANT> --namespace <NS> --name example-test-function --update-auth-data
    ```

    to do this. Is there a way to do the same with the Functions REST API?
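    From memory, the CLI maps onto the v3 functions endpoint: a multipart PUT where an `updateOptions` form field carries the update-auth-data flag alongside the function config. A hedged sketch — the form field and JSON key names should be double-checked against the pulsar-admin client source for your version, as this corner of the REST API is thinly documented:

    ```bash
    curl -X PUT \
      "http://<BROKER>:8080/admin/v3/functions/<TENANT>/<NS>/example-test-function" \
      -H "Authorization: Bearer ${NEW_TOKEN}" \
      -F 'functionConfig={"tenant":"<TENANT>","namespace":"<NS>","name":"example-test-function"};type=application/json' \
      -F 'updateOptions={"updateAuthData":true};type=application/json'
    ```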
  • Jack Pham

    11/22/2025, 1:15 AM
    After updating the client from 4.0.0 to 4.0.7 (to pick up the fix for the DLQ producer name conflict), we got this exception:

    ```
    org.apache.pulsar.client.api.PulsarClientException$FeatureNotSupportedException: The feature of getting partitions without auto-creation is not supported by the broker. Please upgrade the broker to version that supports PIP-344 to resolve this issue.
    ```

    Looking at the code, I see something like `useFallbackForNonPIP344Brokers`; it seems that from 4.0.7 on, the client no longer supports falling back? Which version between 4.0.0 and 4.0.7 has the DLQ producer-name-conflict fix but still supports the fallback for non-PIP-344 brokers?