pulsar #general

sijieg

08/18/2025, 8:48 PM

🚀 Join us at Data Streaming Summit SF 2025! We’re bringing the global data streaming community together Sep 29–30 at the Grand Hyatt at SFO, with talks from OpenAI, LinkedIn, Netflix, Uber, Google, Databricks and deep-dive tracks on Pulsar, Kafka, Flink, Iceberg, AI + streaming, and more.
💡 Special for the Pulsar community: Use code PULSAR50 for 50% off registration.
👉 Register: https://www.eventbrite.com/e/data-streaming-summit-san-francisco-2025-tickets-1432401484399?aff=oddtdtcreator
📅 Full schedule: https://datastreaming-summit.org/event/data-streaming-sf-2025/schedule
What to expect
• Sep 29, Training & Workshop Day: Hands-on data streaming training + advanced Streaming Lakehouse workshop (with AWS).
• Sep 30, Main Summit: Inspiring keynotes + 4 tracks: Deep Dives, Use Cases, AI + Stream Processing, Streaming Lakehouse.
• Talks from top companies and community sessions featuring Pulsar, Flink, Iceberg and other data streaming technologies.
Would love to see you at the Summit! 🎉

samriddhi

08/19/2025, 9:19 PM

Question: Best practices for schema-agnostic Pulsar Functions?
I want to write a generic Pulsar Function that can work with any input/output schema without hardcoding schema types.
Current approach:
• ✅ Input: Using AUTO_CONSUME - works great for reading any schema
• ❌ Output: Need exact schema match, but AUTO_PRODUCE doesn't exist
The challenge: To avoid static schemas, I need to get schema info from Pulsar Admin at runtime, but Pulsar Functions don't have access to PulsarAdmin (getting errors when trying).
Questions:
1. What's the recommended pattern for schema-agnostic functions?
2. How do I discover output topic schema at runtime without Admin access?
3. Any alternatives to runtime schema discovery for generic functions?
Goal: One function that works with multiple topic pairs having different schemas (Avro→Avro, JSON→JSON, Avro→JSON, etc.) without recompiling.
Anyone solved this or know the best practices?

samriddhi

08/19/2025, 9:19 PM

Please note that we have schema enforcement enabled.

Dan Rossi

08/20/2025, 7:44 PM

Question: If I'm doing event sourcing, without a snapshot, how might I be able to load events for a given aggregate Id, to build aggregate state? I see that reader has the ability to use HashKeys, but as the db size grows, hashes will collide with other keys and load those objects back as well, which will slow performance. Is there a better way to do this that anyone is doing? I also see there's an option to copy events to a secondary storage. However, this could cause race conditions, because the events might not be there right away. Also it forces me to store twice as much data. Has anyone figured out the best way to handle this just using pulsar?
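
To make the collision concern concrete, here is a minimal sketch of rebuilding one aggregate from events that were selected by key-hash slot. It assumes a 0..65535 hash range and uses CRC32 purely as a stand-in hash (Pulsar's own hash function is different); the event shape `(key, amount)` and all function names are hypothetical. The point it illustrates is that a hash-range read narrows the scan but still needs an exact-key filter, so the scan cost grows with every key that shares the slot.

```python
import zlib

NUM_SLOTS = 65536  # assumed size of the key-hash range (0..65535)


def slot_for(key, num_slots=NUM_SLOTS):
    """Map a key to a hash slot. CRC32 is a stand-in, not Pulsar's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_slots


def rebuild_state(events, aggregate_id, num_slots=NUM_SLOTS):
    """Fold (key, amount) events into the state of a single aggregate."""
    target = slot_for(aggregate_id, num_slots)
    state = {"balance": 0, "events_applied": 0, "events_scanned": 0}
    for key, amount in events:
        if slot_for(key, num_slots) != target:
            continue  # a hash-range read would not deliver these at all
        state["events_scanned"] += 1
        if key != aggregate_id:
            continue  # colliding key: delivered, but not our aggregate
        state["balance"] += amount
        state["events_applied"] += 1
    return state
```

The gap between `events_scanned` and `events_applied` is the overhead the question is about.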

Samuel

08/21/2025, 11:22 AM

We're currently using a single topic to publish messages intended for multiple customers. Since these are push-based messages, we want to avoid overloading customer servers by implementing rate limiting. Our idea is to introduce a message router that directs messages into customer-specific topics. This way, we can apply dispatch rate limiting on each individual topic, effectively controlling the push rate for each customer. The consumption rate from Apache Pulsar would then directly map to our push rate, ensuring we stay within safe limits. We expect to create approximately 30 customer-specific topics and are considering using a multi-topic consumer to handle them. Here are our key questions:
1. What limitations should we be aware of when using a multi-topic consumer in Pulsar?
2. How scalable is this approach? Is a multi-topic consumer suitable for handling around 30 topics?
3. What happens if we scale to 100 topics or more? Does the multi-topic consumer model still hold up, or are there recommended alternatives at that scale?
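
A rough sketch of the routing step described above, with no Pulsar client involved: it only shows how messages from the shared topic could be grouped by destination topic. The tenant `public`, the namespace `push`, the `customer-<id>` naming scheme, and both function names are assumptions made for illustration.

```python
from collections import defaultdict


def topic_for(customer_id, tenant="public", namespace="push"):
    """Build a per-customer topic name (naming scheme is hypothetical)."""
    return f"persistent://{tenant}/{namespace}/customer-{customer_id}"


def route(messages):
    """Group (customer_id, payload) pairs by their destination topic."""
    routed = defaultdict(list)
    for customer_id, payload in messages:
        routed[topic_for(customer_id)].append(payload)
    return dict(routed)
```

In a real router each group would be published to its topic, and the per-topic dispatch rate limit would then cap how fast each customer is pushed to.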

Thomas MacKenzie

08/26/2025, 10:26 PM

Are there any configs that, when set on the brokers, do not show up in conf/broker.conf, by any chance? I've been applying various changes to the broker configuration with no issues, but I'm trying to set managedLedgerForceRecovery and it does not seem to work. It's not present or applied (when the container starts, the application logs each config field being applied). https://github.com/apache/pulsar/blob/master/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java#L2355-L2360
Looking at the config with cat conf/broker.conf | grep managedLedgerForceRecovery, I get no results either. I understand it's a dynamic configuration field, but others are too, and I can see them in that file set to the right value, so I'm wondering if I'm missing something?
Pulsar 4.0.6
Thanks for your help

✅ 1

Jack LaPlante

08/27/2025, 6:34 PM

Does anyone know how I can cut a new release of terraform-provider-pulsar? I have merged a PR into it that I would like to use. I also have this PR to update the docs which needs a review. cc: @Lari Hotari @Rui Fu

Thomas MacKenzie

08/28/2025, 3:24 AM

What would be the most graceful way to handle managed ledger exceptions? We recently had an outage (in a non-production environment) and this error showed up:

server error: PersistenceError: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger error code: -10

This error prevented the applications from publishing messages and from creating producers. For 2h I could see the ledger count a bit off, at 0 (not sure what happened, but I believe the bookies were up during that time).
Some context: We believe the bookies were restarted (we use AWS spot instances in this env, so maybe an ungraceful shutdown).
I have 2 main questions:
• What would be the best course of action when this happens? (I'm curious about manual intervention, although that is not a reactive option for a system running 24/7.)
• I know there are 2 broker fields available, managedLedgerForceRecovery and autoSkipNonRecoverableData.
◦ Could one of them help? (Do they serve the same purpose?) It seems like autoSkipNonRecoverableData should be avoided, as it is part of the legacy codebase.
◦ Are they both destructive (permanent data loss)? I opened a PR to add managedLedgerForceRecovery to the broker conf; thanks for the info about the risks it involves, Lari.
◦ Is one better than the other?
Thank you for your help

Gaurav Ashok

09/04/2025, 11:12 AM

Hi. I wanted to get some advice on taming the ZK outstanding requests / broker latencies that we are seeing every 15 mins. They are caused by ModularLoadManagerImpl::writeBundleDataOnZooKeeper().
Pulsar: 3.0.12
Configs:
ZK: MaxOustandingRequests = 1000
Pulsar:
loadBalancerReportUpdateMaxIntervalMinutes=15
loadBalancerResourceQuotaUpdateIntervalMinutes=15
metadataStoreBatchingEnabled=true
metadataStoreBatchingMaxDelayMillis=5
metadataStoreBatchingMaxOperations=1000
metadataStoreBatchingMaxSizeKb=128
The ZK outstanding requests hit 1000 for about a minute every 15 mins due to the LoadResourceQuotaUpdaterTask job on the leader. During this time, message production faces latencies of up to 20 sec. There are ~100 brokers and bookies and > 20K bundles.
We are trying to evaluate what we can do to fix this in the short term.
1. Evaluating whether increasing MaxOustandingRequests further can have a favourable impact.
2. Evaluating the approach of "throttling" the ZK writes in the method writeBundleDataOnZooKeeper(). Currently it enqueues all ZK writes for all bundles, likely causing this issue. We were thinking of doing the writes on a new dedicated thread, slowly, over a long time (5m-10m). The job runs every 15 mins anyway, so if this write load can be spread across that duration, maybe the pressure on ZK can be avoided. Are there gotchas that we need to keep in mind when trying this approach?
3. If we remove the ZK writes in ModularLoadManagerImpl::writeBundleDataOnZooKeeper() entirely, what are the repercussions? Who is using the /LB/bundle-data & /LB/broker-time-avg data?
Later versions of Pulsar have a strategy of choosing only the topK bundles, so we will likely explore that later. But in the short term, what changes can we explore to fix this?
-- Edit
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.OverloadShedder
loadBalancerLoadPlacementStrategy=org.apache.pulsar.broker.loadbalance.impl.LeastResourceUsageWithWeight
loadBalancerReportUpdateThresholdPercentage=10
loadBalancerReportUpdateMinIntervalMillis=60000
loadBalancerReportUpdateMaxIntervalMinutes=15
loadBalancerHostUsageCheckIntervalMinutes=1
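
For the "spread the writes over 5m-10m" idea in option 2, the arithmetic can be sketched as below. This is only a back-of-the-envelope pacing plan, not a patch to ModularLoadManagerImpl; the function names and the batch size of 100 are assumptions chosen for illustration.

```python
def pacing_plan(num_writes, window_seconds, batch_size=100):
    """Work out how to spread num_writes evenly over window_seconds."""
    if num_writes <= 0 or window_seconds <= 0 or batch_size <= 0:
        raise ValueError("all arguments must be positive")
    num_batches = -(-num_writes // batch_size)  # ceiling division
    return {
        "writes_per_second": num_writes / window_seconds,
        "num_batches": num_batches,
        "seconds_between_batches": window_seconds / num_batches,
    }


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With roughly 20K bundles and a 5-minute window, `pacing_plan(20000, 300)` works out to about 67 writes per second, issued as 200 batches of 100 spaced 1.5 seconds apart, instead of 20K writes enqueued at once.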

Cong Zhao

09/08/2025, 11:56 AM

📣 [ANNOUNCE] Apache Pulsar 4.1.0 released 📣 The Apache Pulsar team is proud to announce Apache Pulsar version 4.1.0. For Pulsar release details and downloads, visit: https://pulsar.apache.org/download Release Notes are at: https://pulsar.apache.org/release-notes/versioned/pulsar-4.1.0

🎉 7

Jiji K

09/09/2025, 6:41 AM

Hello folks! Hope you are all doing great. I'd like to know one thing: has anyone tried IBM’s Semeru Java runtime on a Pulsar cluster?

Artur Jablonski

09/16/2025, 1:43 PM

Hello! I am trying to understand the necessary conditions for achieving effectively-once semantics when getting data into a Pulsar topic from the "external world". I captured my understanding in this GitHub discussion: https://github.com/apache/pulsar/discussions/24605 I'd like to know whether my understanding is correct. What's the best way to get some feedback on this? Greets

Amanda

09/16/2025, 3:12 PM

Good morning! I have a question regarding geo-replication: I have a two-cluster geo-replication setup deployed via Kubernetes. When running a producer on cluster A for the first time (using auto topic creation; I did not create the topic manually), my producer runs successfully and sends the message via the cluster A service URL. Before starting the consumer on cluster B on that same topic, I wanted to check the topic stats on cluster B, but it says the topic doesn't exist. Once I start the consumer on the cluster B service URL, it receives the messages successfully. But I thought topics were supposed to be replicated across clusters? Is this an error in my geo-replication setup, or is this expected behavior? Would enabling replicated subscriptions help with this issue?

Abhilash Mandaliya

09/17/2025, 11:30 AM

Hi Guys. I recently created one issue. Would appreciate any help on the same: https://github.com/apache/pulsar/issues/24755 Thank you 🙏

Fabri

09/18/2025, 4:22 AM

Hello, I'd like to know if there is a plan to upgrade the current Flink Pulsar connector so that it supports newer Flink versions; the official connector is compatible only with Flink 1.18. The problem is that the official GitHub repo looks abandoned: no one is checking the PRs, and a fix was merged many months ago but there is no new build that includes it. Why?

Vaibhav Swarnkar

09/19/2025, 10:10 AM

Has Debezium been moved to the latest version in Pulsar? We are using Oracle, and the current version of Debezium always fails on DDL changes because of a parsing error, so it rarely works. Does anybody have a solution to this?

Abhilash Mandaliya

09/19/2025, 11:33 AM

Hi. Why does a Pulsar Sink crash if it is created with the EFFECTIVELY_ONCE guarantee and the sink calls record.fail()?

09/19/2025, 6:53 PM

👋 happy friday! I filed an issue yesterday, seems like there might be a memory leak with pulsar key-based batching in the pulsar go client producer. Posting in #C5Z4T36F7 to raise awareness, thank you! 🙇 https://github.com/apache/pulsar-client-go/issues/1423 🤔

👀 1

benjamin99

09/20/2025, 9:14 AM

Hi, I am setting up a Pulsar cluster that uses Oxia as the metadata store, and I am facing the following issue in my Oxia cluster:

{
  "level": "warn",
  "time": "2025-09-20T07:01:17.961738475Z",
  "component": "public-rpc-server",
  "error": {
    "error": "oxia: failed to append to wal: 535046 can not immediately follow 459362: oxia: invalid next offset in wal",
    "kind": "*errors.withStack",
    "stack": null
  },
  "namespace": "bookkeeper",
  "peer": "10.194.131.14:52392",
  "shard": 6,
  "message": "Write stream has been closed by error"
}

I searched for this on the Oxia GitHub page, but found no related issues or discussions. Has anyone faced similar issues before, or does anyone have a clue about how to resolve it?

Gergely Fábián

09/20/2025, 4:23 PM

With a BookKeeper replicaCount of 4, what would be the most natural Ensemble-Qw-Qa setting if I want to tolerate one of the replicas going down, even for a considerable amount of time (node outage, etc.)? I believe E should be less than replicaCount (to tolerate one replica going down), while it is best to have E=Qw (to avoid issues with lack of consistency). So I guess 3-3-3 or 3-3-2 would be best. What configurations would be recommended?
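
The two candidate settings can be compared with a small, simplified model. It assumes only the usual BookKeeper rules that E >= Qw >= Qa, that a new ensemble needs E available bookies, and that an acknowledged entry is guaranteed to be on Qa bookies; it ignores auto-recovery, rack awareness, and latency effects. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    ensemble: int      # E
    write_quorum: int  # Qw
    ack_quorum: int    # Qa

    def validate(self):
        if not (self.ensemble >= self.write_quorum >= self.ack_quorum >= 1):
            raise ValueError("expected E >= Qw >= Qa >= 1")


def can_form_ensemble(total_bookies, failed, cfg):
    """True if enough bookies remain to form an ensemble of size E."""
    cfg.validate()
    return total_bookies - failed >= cfg.ensemble


def losses_survivable_for_acked_entries(cfg):
    """Worst case: an acked entry is on Qa bookies, so Qa - 1 may be lost."""
    cfg.validate()
    return cfg.ack_quorum - 1
```

Under this model both 3-3-2 and 3-3-3 keep accepting writes with one of four bookies down; 3-3-3 guarantees one more copy of each acknowledged entry, at the cost of every write waiting for all three bookies to acknowledge.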

bhasvij

09/23/2025, 4:37 PM

I am seeing the pulsar-perf producer report a rate of 50K msg/sec, whereas the Grafana dashboard shows 12K. I have 4 partitions; is it, by any chance, showing the message rate of a single partition?
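
As a quick sanity check of that hypothesis, assuming the producer spreads messages evenly across partitions (the function name is hypothetical):

```python
def per_partition_rate(total_rate, partitions):
    """Expected per-partition rate under an even spread of messages."""
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    return total_rate / partitions


rate = per_partition_rate(50_000, 4)  # 12500.0 msg/sec
```

12.5K msg/sec per partition is close to the 12K shown on the dashboard, which is consistent with the panel displaying a single partition rather than the sum over all four.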

Lari Hotari

09/27/2025, 1:09 PM

📣 [ANNOUNCE] Apache Pulsar 3.0.14, 3.3.9, 4.0.7 and 4.1.1 released 📣
For Pulsar release details and downloads, visit: https://pulsar.apache.org/download
Release Notes are at:
• 3.0.14: https://pulsar.apache.org/release-notes/versioned/pulsar-3.0.14/ (previous LTS release, support until May 2026)
• 3.3.9: https://pulsar.apache.org/release-notes/versioned/pulsar-3.3.9/ (support has already ended in December 2024)
• 4.0.7: https://pulsar.apache.org/release-notes/versioned/pulsar-4.0.7/ (Current LTS release)
• 4.1.1: https://pulsar.apache.org/release-notes/versioned/pulsar-4.1.1/ (Latest release)
Please check the release notes for more details.

🎉 4

Artur Jablonski

09/30/2025, 6:24 AM

Hi, it seems to me that an answer to my question could be quite useful to a lot of people who are new to Pulsar and are trying to design around it under the constraint of effectively-once delivery semantics. Is there anything I can do to get some attention for it from the Pulsar community?

ck_xnet

10/01/2025, 1:01 PM

Hi, I'm running a Pulsar 4.0.2 cluster (3 brokers, 3 bookies, 3 ZooKeeper nodes) with partitioned topics and I have two problems:
1. The filesystem usage on the bookies doesn't seem to go down (bk1: 100GB, bk2: 400GB, bk3: 100GB). I already set a retention policy on my main namespace (2 weeks, 10GB) and the metrics in Grafana report the correct topic sizes (100GB storage size and ~30GB backlog size). I understand the usage on bk1 and bk3, but not the 400GB on bk2...
2. The bookie service on bk2 stops frequently with the error "io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4194304 byte(s) of direct memory (used: 2147483648, max: 2147483648)". I can't find the setting that controls this memory limit. I already increased the JVM allocations in pulsar_env.sh, but those don't seem to correlate.
Thanks!
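
The numbers in that error message can be decoded with a small helper. This is only a sketch for reading the log line; the function name and the returned field names are made up for illustration.

```python
import re

# Matches the wording of Netty's OutOfDirectMemoryError message quoted above.
PATTERN = re.compile(
    r"failed to allocate (\d+) byte\(s\) of direct memory "
    r"\(used: (\d+), max: (\d+)\)"
)


def parse_direct_memory_error(line):
    """Extract request size, usage, and limit from the error text."""
    match = PATTERN.search(line)
    if match is None:
        return None
    requested, used, maximum = (int(group) for group in match.groups())
    return {
        "requested_mib": requested / 2**20,
        "used_gib": used / 2**30,
        "max_gib": maximum / 2**30,
        "headroom_bytes": maximum - used,
    }
```

For the message above this gives a 4 MiB request against a 2 GiB limit that is already fully used. On the JVM that limit is normally governed by the -XX:MaxDirectMemorySize flag, which is separate from the heap settings; which environment file sets it for the bookie process in this deployment cannot be determined from the message.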

Lari Hotari

10/02/2025, 4:05 AM

We've just released Apache Pulsar Helm Chart 4.3.0 🎉 The official source release, as well as the binary Helm Chart release, are available at https://www.apache.org/dyn/closer.lua/pulsar/helm-chart/4.3.0/?action=download The helm chart index at https://pulsar.apache.org/charts/ has been updated and the release is also available directly via helm. Release Notes: https://github.com/apache/pulsar-helm-chart/releases/tag/pulsar-4.3.0 Docs: https://github.com/apache/pulsar-helm-chart#readme and https://pulsar.apache.org/docs/helm-overview ArtifactHub: https://artifacthub.io/packages/helm/apache/pulsar/4.3.0 Thanks to all the contributors who made this possible.

🎉 4

Fabri

10/03/2025, 8:38 PM

Hello, I want to use Pulsar with .NET, but the DotPulsar library is too basic and lacks many features. I saw this client matrix: https://pulsar.apache.org/client-feature-matrix/#consumer There is a column called .Net (C#,F#,VB); is this a different library?

Praveen Gopalan

10/06/2025, 6:37 AM

Hi Team, We are currently operating a clustered environment consisting of two instances, running Apache Pulsar version 4.1.1. During load testing with "brokerDeduplicationEnabled=true", we have encountered issues related to message loss. For your reference and to support further analysis, I have attached the broker configuration file. The MessageId corresponding to an unpublished message is: { "partition": -1, "ledgerId": -9223372036854800000.0, "batchIndex": -1, "entryId": -9223372036854800000.0 } DotPulsar version 4.3.2 is being used as the client library. We would appreciate your assistance in investigating this issue and welcome any insights or recommendations you may have.

zaryab

10/09/2025, 1:02 PM

Subject: Bookie crashes with -25 error when enabling StreamStorageLifecycleComponent (Pulsar 4.1.1)
Hi everyone 👋, I’m setting up a small Pulsar 4.1.1 cluster and running into an issue when enabling stream storage.
Cluster Setup
I have 3 VMs:
Host       CPUs  RAM   Storage    Roles
10.0.1.74  8     16GB  500GB SSD  ZooKeeper + Bookie + Broker
10.0.1.75  8     16GB  400GB SSD  ZooKeeper + Bookie + Broker
10.0.1.91  12    64GB  100GB SSD  ZooKeeper
Cluster Init Command
bin/pulsar initialize-cluster-metadata \
  --cluster dev-cluster-1 \
  --metadata-store zk:10.0.1.74:2181,10.0.1.75:2181,10.0.1.91:2181 \
  --configuration-metadata-store zk:10.0.1.74:2181,10.0.1.75:2181,10.0.1.91:2181 \
  --web-service-url http://10.0.1.74:8080,10.0.1.75:8080 \
  --web-service-url-tls https://10.0.1.74:8443,10.0.1.75:8443 \
  --broker-service-url pulsar://10.0.1.74:6650,10.0.1.75:6650 \
  --broker-service-url-tls pulsar+ssl://10.0.1.74:6651,10.0.1.75:6651
Bookie Config (relevant parts)
storageserver.grpc.port=4181
dlog.bkcEnsembleSize=2
dlog.bkcWriteQuorumSize=2
dlog.bkcAckQuorumSize=1
storage.range.store.dirs=data/bookkeeper/ranges
storage.serve.readonly.tables=false
storage.cluster.controller.schedule.interval.ms=30000
Issue
When I run Pulsar in stateless mode, ZooKeeper, BookKeeper, and Brokers all start fine. But when I enable:
extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent
both Bookies crash shortly after startup with a BKTransmitException and error -25.
Excerpt from logs
Caused by: org.apache.distributedlog.exceptions.BKTransmitException: Failed to open ledger handle for log segment ... : -25
What I’ve tried
• Verified ZooKeeper quorum and BookKeeper ledger directories
• Cleaned /data/bookkeeper and restarted the cluster
• Ensured the ensemble/write/ack quorum configs match cluster size
Question
Has anyone successfully enabled the stream storage component (StreamStorageLifecycleComponent) on Pulsar 4.1.1? What does the -25 (BKTransmitException) usually indicate in this context: ZK metadata corruption, a missing ledger, or a config mismatch? Any guidance or example configurations for Pulsar 4.x stream storage clusters would be greatly appreciated.

Nicolas Belliard

10/15/2025, 2:18 PM

Hi Team 👋 I'm investigating an issue related to the Pulsar broker configuration parameter delayedDeliveryTrackerFactoryClassName. We initially used InMemoryDelayedDeliveryTracker (because we were using version 2.7 of Pulsar), which caused acknowledged delayed messages to be reprocessed after a broker restart, likely because its state is stored only in memory. Given our high message volume (millions), this behavior is problematic. A screenshot is available showing the lag escalation following a broker restart.
We're generating delayed messages out of sequence, resulting in gaps within the acknowledged message stream. This causes non-contiguous ranges of messages to be marked as deleted or eligible for deletion. In our screenshot, the value of nonContiguousDeletedMessagesRanges is 16833.
To mitigate this, after upgrading Pulsar to version 4.0.4, we updated the broker config to use org.apache.pulsar.broker.delayed.BucketDelayedDeliveryTrackerFactory, which should persist delayed delivery metadata to disk via BookKeeper ledger buckets. However, after switching to the bucket-based tracker, we're still seeing the same behavior post-restart.
A few observations and questions:
• I checked the pulsar_delayed_message_index_loaded metric and noticed that messages are still being loaded into memory, while pulsar_delayed_message_index_bucket_total remains at zero. Is this expected? Shouldn’t the bucket tracker be persisting and loading from disk?
• Are there additional broker settings required to fully enable bucket-based delayed delivery tracking? For example:
◦ Do we need to explicitly configure delayedDeliveryTrackerBucketSize or delayedDeliveryMaxNumBuckets?
◦ Is there any dependency on topic-level settings or namespace policies that could override the broker-level tracker configuration?
◦ Could other settings interfere with delayed message persistence?
Any insights or guidance would be greatly appreciated. Thanks for your help!

benjamin99

10/17/2025, 3:01 AM

Hi team: I DO REALLY NEED THE HELP RIGHT NOW. One of the bookie nodes in my production cluster is stuck in the CrashLoopBackOff status; after checking the logs, I found out that it is because the node cannot acquire its lock from the metadata server (we use Oxia in our cluster). I have tried restarting the Oxia cluster, since I saw some error messages saying that Oxia cannot append to the WAL (I was facing a similar situation last month, and the Oxia restart trick did work for me). Unfortunately, the bookie node still cannot get its own lock after restarting Oxia. I REALLY NEED SOMEONE THAT CAN HELP ME OUT WITH THE ISSUE 😢