pulsar #general

Soumya Ghosh

01/16/2025, 8:49 PM

We are observing an intermittent issue where consumer does not receive messages but there is a non-zero backlog in the subscription. Setup summary: Pulsar 4.0.1, 3 brokers, 5 bookies, 3 ZKs, Produce and Consume through Pulsar Java client (3.3.3) When we noticed this issue first, we couldn’t figure out what was and then after 2-3 minutes the data in backlog was delivered. But during that we did run CLI to consume messages from Earliest and received no messages. This time when issue was observed again, we again ran CLI from the broker node itself to consume messages from earliest. We observed that no messages were delivered for 10 minutes and then suddenly all backlog messages were delivered to original consumer and CLI consumer. The backlog in Java consumer was 30. The backlog in CLI consumer was 2697 as we were configured the subscription to consume from earliest message. Topics details Partition - 1 Dispatch throttling - 1 MB/s Publish throttling - 1 MB/s Retention - 1 day Message TTL - 1 day Pulsar topic stats, noted that blockedConsumerOnUnackedMsgs and blockedSubscriptionOnUnackedMsgs was false when backlog messages were not delivered.

Copy code

{
  "msgRateIn": 0,
  "msgThroughputIn": 0,
  "msgRateOut": 0,
  "msgThroughputOut": 0,
  "bytesInCounter": 270413,
  "msgInCounter": 2697,
  "systemTopicBytesInCounter": 0,
  "bytesOutCounter": 358717,
  "msgOutCounter": 3577,
  "bytesOutInternalCounter": 0,
  "averageMsgSize": 0,
  "msgChunkPublished": false,
  "storageSize": 270413,
  "backlogSize": 270413,
  "backlogQuotaLimitSize": -1,
  "backlogQuotaLimitTime": -1,
  "oldestBacklogMessageAgeSeconds": 29328,
  "oldestBacklogMessageSubscriptionName": "java-sdk",
  "publishRateLimitedTimes": 0,
  "earliestMsgPublishTimeInBacklogs": 0,
  "offloadedStorageSize": 268368,
  "lastOffloadLedgerId": 0,
  "lastOffloadSuccessTimeStamp": 0,
  "lastOffloadFailureTimeStamp": 0,
  "ongoingTxnCount": 0,
  "abortedTxnCount": 0,
  "committedTxnCount": 0,
  "publishers": [],
  "waitingPublishers": 0,
  "subscriptions": {
    "console_sub": {
      "msgRateOut": 0,
      "msgThroughputOut": 0,
      "bytesOutCounter": 0,
      "msgOutCounter": 0,
      "msgRateRedeliver": 0,
      "messageAckRate": 0,
      "chunkedMessageRate": 0,
      "msgBacklog": 2697,
      "backlogSize": 270413,
      "earliestMsgPublishTimeInBacklog": 0,
      "msgBacklogNoDelayed": 2697,
      "blockedSubscriptionOnUnackedMsgs": false,
      "msgDelayed": 0,
      "msgInReplay": 0,
      "unackedMessages": 0,
      "type": "Failover",
      "msgRateExpired": 0,
      "totalMsgExpired": 0,
      "lastExpireTimestamp": 0,
      "lastConsumedFlowTimestamp": 0,
      "lastConsumedTimestamp": 0,
      "lastAckedTimestamp": 0,
      "lastMarkDeleteAdvancedTimestamp": 0,
      "consumers": [
        {
          "msgRateOut": 0,
          "msgThroughputOut": 0,
          "bytesOutCounter": 0,
          "msgOutCounter": 0,
          "msgRateRedeliver": 0,
          "messageAckRate": 0,
          "chunkedMessageRate": 0,
          "availablePermits": 1000,
          "unackedMessages": 0,
          "avgMessagesPerEntry": 0,
          "blockedConsumerOnUnackedMsgs": false,
          "drainingHashesCount": 0,
          "drainingHashesClearedTotal": 0,
          "drainingHashesUnackedMessages": 0,
          "lastAckedTimestamp": 0,
          "lastConsumedTimestamp": 0,
          "lastConsumedFlowTimestamp": 0,
          "lastAckedTime": "1970-01-01T00:00:00Z",
          "lastConsumedTime": "1970-01-01T00:00:00Z"
        }
      ],
      "isDurable": true,
      "isReplicated": false,
      "allowOutOfOrderDelivery": false,
      "consumersAfterMarkDeletePosition": {},
      "drainingHashesCount": 0,
      "drainingHashesClearedTotal": 0,
      "drainingHashesUnackedMessages": 0,
      "nonContiguousDeletedMessagesRanges": 0,
      "nonContiguousDeletedMessagesRangesSerializedSize": 0,
      "delayedMessageIndexSizeInBytes": 0,
      "subscriptionProperties": {},
      "filterProcessedMsgCount": 0,
      "filterAcceptedMsgCount": 0,
      "filterRejectedMsgCount": 0,
      "filterRescheduledMsgCount": 0,
      "durable": true,
      "replicated": false
    },
    "java-sdk": {
      "msgRateOut": 0,
      "msgThroughputOut": 0,
      "bytesOutCounter": 358717,
      "msgOutCounter": 3577,
      "msgRateRedeliver": 0,
      "messageAckRate": 0,
      "chunkedMessageRate": 0,
      "msgBacklog": 30,
      "backlogSize": 3069,
      "earliestMsgPublishTimeInBacklog": 0,
      "msgBacklogNoDelayed": 30,
      "blockedSubscriptionOnUnackedMsgs": false,
      "msgDelayed": 0,
      "msgInReplay": 0,
      "unackedMessages": 0,
      "type": "Failover",
      "msgRateExpired": 0,
      "totalMsgExpired": 0,
      "lastExpireTimestamp": 0,
      "lastConsumedFlowTimestamp": 0,
      "lastConsumedTimestamp": 0,
      "lastAckedTimestamp": 0,
      "lastMarkDeleteAdvancedTimestamp": 0,
      "consumers": [
        {
          "msgRateOut": 0,
          "msgThroughputOut": 0,
          "bytesOutCounter": 0,
          "msgOutCounter": 0,
          "msgRateRedeliver": 0,
          "messageAckRate": 0,
          "chunkedMessageRate": 0,
          "availablePermits": 1000,
          "unackedMessages": 0,
          "avgMessagesPerEntry": 0,
          "blockedConsumerOnUnackedMsgs": false,
          "drainingHashesCount": 0,
          "drainingHashesClearedTotal": 0,
          "drainingHashesUnackedMessages": 0,
          "lastAckedTimestamp": 0,
          "lastConsumedTimestamp": 0,
          "lastConsumedFlowTimestamp": 0,
          "lastAckedTime": "1970-01-01T00:00:00Z",
          "lastConsumedTime": "1970-01-01T00:00:00Z"
        }
      ],
      "isDurable": true,
      "isReplicated": false,
      "allowOutOfOrderDelivery": false,
      "consumersAfterMarkDeletePosition": {},
      "drainingHashesCount": 0,
      "drainingHashesClearedTotal": 0,
      "drainingHashesUnackedMessages": 0,
      "nonContiguousDeletedMessagesRanges": 0,
      "nonContiguousDeletedMessagesRangesSerializedSize": 0,
      "delayedMessageIndexSizeInBytes": 0,
      "subscriptionProperties": {},
      "filterProcessedMsgCount": 0,
      "filterAcceptedMsgCount": 0,
      "filterRejectedMsgCount": 0,
      "filterRescheduledMsgCount": 0,
      "durable": true,
      "replicated": false
    }
  },
  "replication": {},
  "nonContiguousDeletedMessagesRanges": 0,
  "nonContiguousDeletedMessagesRangesSerializedSize": 0,
  "delayedMessageIndexSizeInBytes": 0,
  "compaction": {
    "lastCompactionRemovedEventCount": 0,
    "lastCompactionSucceedTimestamp": 0,
    "lastCompactionFailedTimestamp": 0,
    "lastCompactionDurationTimeInMills": 0
  },
  "metadata": {
    "partitions": 1,
    "deleted": false
  },
  "partitions": {}
}

Attached screenshots, backlog of 2.7k is captured when CLI consumer is started and it recovers after 10 minutes. The metrics for dispatch rate and dispatch throughput spikes at 12:13 when messages are delivered to CLI and a Java application.

David K

01/16/2025, 9:06 PM

Copy code

"lastAckedTime" : "1970-01-01T00:00:00Z",
 "lastConsumedTime" : "1970-01-01T00:00:00Z"

David K

01/16/2025, 9:06 PM

Seems a bit off.

Sandesh Shingare

01/17/2025, 1:18 PM

Hello, What is the default value of

maxPendingMessages

in pulsar Java producer client? The doc says 0, meaning the check will be disabled by default. We have set

blockIfQueueFull

to true in our producer without explicitly setting

maxPendingMessages

config. In this case at what queue size the producer send calls will be blocked? I see 166 being shown as

maxPendingMessages

in the producer stats when I run my application. Can someone please help understand this? Pulsar client version: 4.0.1

Jerome B

01/17/2025, 11:24 PM

Hi, is it possible that using backoff on a topic with subscription with a lot of message not consumed (I mean about 1 million), will overload pulsar ?

bmtrivedi

01/19/2025, 3:22 AM

Hi. Can zombie writes happen in a partitioned topic? If yes, how is it resolved? Consider a scenario that a single publisher is publishing to a partitioned topic. The publisher sends a publish request which times out (but that request is lingering in the Pulsar ecosystem, either in broker or bookkeeper) and then publisher retries the request which is successfully written. After some successful publishes, the first timed out lingering request went ahead and wrote the events in the partition, and it will then break the order of events.

Lari Hotari

01/20/2025, 3:41 PM

📯 [ANNOUNCE] Apache Pulsar 3.0.9 released The Apache Pulsar team is proud to announce Apache Pulsar version 3.0.9. Pulsar is a highly scalable, low latency messaging platform running on commodity hardware. It provides simple pub-sub semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. For Pulsar release details and downloads, visit: https://pulsar.apache.org/download Release Notes are at: https://pulsar.apache.org/release-notes/versioned/pulsar-3.0.9/ We would like to thank the contributors that made the release possible. Regards, The Pulsar Team

🎉 1

Lari Hotari

01/20/2025, 3:42 PM

📯 [ANNOUNCE] Apache Pulsar 3.3.4 released The Apache Pulsar team is proud to announce Apache Pulsar version 3.3.4. Pulsar is a highly scalable, low latency messaging platform running on commodity hardware. It provides simple pub-sub semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. For Pulsar release details and downloads, visit: https://pulsar.apache.org/download Release Notes are at: https://pulsar.apache.org/release-notes/versioned/pulsar-3.3.4/ We would like to thank the contributors that made the release possible. Regards, The Pulsar Team

🎉 1

Lari Hotari

01/20/2025, 3:43 PM

🎆 [ANNOUNCE] Apache Pulsar 4.0.2 released pulsarlogo The Apache Pulsar team is proud to announce Apache Pulsar version 4.0.2. Pulsar is a highly scalable, low latency messaging platform running on commodity hardware. It provides simple pub-sub semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. For Pulsar release details and downloads, visit: https://pulsar.apache.org/download Release Notes are at: https://pulsar.apache.org/release-notes/versioned/pulsar-4.0.2/ We would like to thank the contributors that made the release possible. Regards, The Pulsar Team

🎉 3

ojoxdan

01/21/2025, 3:52 PM

Good day everyone, I had previoulsy set the dispatch rate of a topic at 150 msg/s then later reset to unlimted with the value

-1

, after the last configuration I was expecting the rate to increase but was stuck at 150, Do I need to clear or restart something in in my k8 cluster ? @David K @Yu Wei Sung @Lari Hotari

Alexander Brown

01/22/2025, 5:26 PM

This might be a simple answer, but we have pulsar helm chart 3.8.0 and running with oxia in k8s. 3 brokers/3 bookies, and at 400k msg/s latencies are 10ms, but at ~600k to 1000k latency spikes to high number like 25s. Bookies are running nvme, and I'm wondering if we just need to add more bookies or if there are some easy places to increase msg throughput?

Preet Patel

01/23/2025, 9:25 AM

Hi, I m trying to upgrade pulsar-3.0.2 to 4.0.1 inside kubernetes cluster, Do we have any utility / function available through which we can port the data during upgradation. like, I want tenants, namespaces, topics and data inside topics to be persisted throughout the upgrade and do not want any kind of data loss.

Olivier NOUGUIER

01/24/2025, 2:04 PM

👋 Hi everyone! I like to reproduce my zio-pravega connector for Pulsar, should be easy at 90%. But I cannot find a way to share an (aka open a running) pulsar transaction, is it possible ?

Lari Hotari

01/27/2025, 8:30 AM

For users running Pulsar Docker containers on Apple M4 on MacOS Sequoia 15.2, there's a JVM bug that causes the JVM to crash and it results in an error message "Pulsar requires Java 17 or later". The workaround is to pass _JAVA_OPTIONS=-XX:UseSVE=0 environment variable. (this JVM setting is only supported on arm64 platforms) example of using

-e "_JAVA_OPTIONS=-XX:UseSVE=0"

to pass the environment option on the docker command line:

Copy code

docker run --rm -it -p 6650:6650 -p 8080:8080 -e "_JAVA_OPTIONS=-XX:UseSVE=0" apachepulsar/pulsar:4.0.2 bin/pulsar standalone -nfw -nss

The problem will be fixed in Pulsar 4.0.3 docker images which will include the latest Corretto OpenJDK release which contains the fix. More details in https://github.com/apache/pulsar/issues/23891#issuecomment-2615087017 .

👍 2

Shubh Vanvat

01/27/2025, 2:14 PM

Hi, i am trying to setup dekaf in my kubernetes setup is there any helm chart available to streamline the process? any step by step guide to get started will be helpful

Soumya Ghosh

01/29/2025, 1:50 PM

We are observing an intermittent issue where Pulsar producer client is unable to produce message to a partitioned topic, it is failing with

org.apache.pulsar.client.api.PulsarClientException$TimeoutException

after 30 seconds. Setup summary: Setup summary: Pulsar 4.0.1, 3 brokers, 5 bookies, 3 ZKs, Produce and Consume through Pulsar Java client (4.0.1) We also upgrade Pulsar cluster and client version to 4.0.2 but still observing the same issue.

Copy code

// Pulsar client
this.pulsarClient =
          PulsarClient.builder()
              .serviceUrl(connectionUrl)
              .memoryLimit(0, SizeUnit.MEGA_BYTES)
              .build();

// Producer configurations
ProducerBuilder<T> builder =
        pulsarClient
            .newProducer(schema)
            .producerName(producerName)
            .topic(topicName)
            .compressionType(CompressionType.LZ4)
            .enableBatching(true)
            .batchingMaxPublishDelay(30, TimeUnit.MILLISECONDS)
            .blockIfQueueFull(true);

Pulsar brokers, bookies and client instances are running in the same AWS VPC, so chances of network outage are quite low. Brokers are configured to use

advertisedAddress

advertisedListeners

is not configured In the client application, the Pulsar client and Producer object is initialized once and then messages are produced in sporadic intervals. There is no other producer or consumer running on the cluster and there is more than sufficient resources available to cater to produce request. Following are attached screenshots of client application and broker node which owns this topic. Time frame - 10:30 UTC We observed that after message failed to produce in 30 seconds, Pulsar producer clients were closed due to keep-alive timeout expiry and then recreated. At the same time we see in brokers logs

Connection reset by peer

Any thoughts on what could be going wrong here?

Daniel Kaminski

01/29/2025, 3:54 PM

Hi, we observe a strange behaviour in our pulsar cluster which I would like to clarify if this is intended. We ran in a HA setup which means when I create a topology this will be replicated in the second cluster as well. So one of our customers created a namespace with the property schemaValidationEnforced: true but this seems to cause some issues for the system topics.

[pulsar-io-6-4] WARN  org.apache.pulsar.broker.service.AbstractReplicator - [<persistent://tenant/namespace/__change_events> | pulsar-prod-ha1-->pulsar-prod-ha2] Failed to create remote producer (org.apache.pulsar.client.api.PulsarClientException$IncompatibleSchemaException: {"errorMsg":"org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException: Producers cannot connect or send message without a schema to topics with a schema caused by org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException: Producers cannot connect or send message without a schema to topics with a schema","reqId":3006917652050389956, "remote":"our.pulsar.cluster/<ip-address>:6651", "local":"/<ip-address>:34314"}), retrying in 57.934 s

2025-01-29T14:41:26,689+0000 [pulsar-io-6-4] ERROR org.apache.pulsar.client.impl.ProducerImpl - [<persistent://tenant/namespace/__change_events>] [pulsar.repl.pulsar-prod-ha1-->pulsar-prod-ha2] Failed to create producer: {"errorMsg":"org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException: Producers cannot connect or send message without a schema to topics with a schema caused by org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException: Producers cannot connect or send message without a schema to topics with a schema","reqId":3006917652050389956, "remote":"our.pulsar.cluster/<ip-address>:6651", "local":"/<ip-address>"}

025-01-29T14:41:26,689+0000 [pulsar-io-6-4] WARN  org.apache.pulsar.client.impl.ClientCnx - [id: 0x755f1150, L:/100.70.32.147:34314 - R:rour.pulsar.cluster/<ip-address>:6651] Received error from server: org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException: Producers cannot connect or send message without a schema to topics with a schema caused by org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException: Producers cannot connect or send message without a schema to topics with a schema

I turned off schemaValidationEnforced off on namespace level and i can confirm that the error is gone and I do not see it for any system topic for that specific namespace. Can you confirm to me that this is intended? I guess the schemaValidationEnforced: true is inherited to the topic but I would be curious how this is solved when it comes to the system topics? We ran currently the pulsar LTS version 3.0.8.1.

Lari Hotari

01/30/2025, 8:45 AM

Pulsar 4.0.2 release contains a regression related to

seek

by timestamp. Please check the updated release notes for the workaround. Details also in this comment: https://github.com/apache/pulsar/issues/23910#issuecomment-2623862571

Conor

01/31/2025, 10:42 AM

Hello, If someone can spare a moment, could you take a look at https://github.com/apache/pulsar/issues/23890? Our dev environment has been down all week, and I'm worried the same issue could occur in a production cluster

Amy Krishnamohan

01/31/2025, 9:04 PM

Hello! Today, StreamNative published a blogpost that displays quite a remarkable benchmark result. This technology is based on Oxia (metadata coordinator that replace zookeeper) and leaderless architecture in Ursa engine. To learn more, please check out this blogpost. https://streamnative.io/blog/how-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour

👍 1

🎆 2

Soumya Ghosh

02/03/2025, 10:19 AM

Hello all, if someone can assist with this - https://github.com/apache/pulsar/issues/23920 This is an extension of slack discussion thread

hulusi

02/03/2025, 12:15 PM

hello everyone how can I detect my system helath check some time bookie is shuting down... can someone help me please ASAP!

Suvrajit Manna

02/03/2025, 2:31 PM

Hi Everyone, Is there any way to mimic

ZeroQueueConsumerImpl

like behaviour in case of Partitioned Topic? I was going through this issue: https://github.com/apache/pulsar-client-python/issues/43 Is any of the workaround mentioned, available for Pulsar Java Client ?

Slackbot

02/04/2025, 3:11 PM

message has been deleted

Girish Sharma

02/05/2025, 7:35 AM

I am trying to follow the bundle allocation flow but not getting any head or tales based on just the info/debug logs. Suppose we create a namespace and a topic in that namespace. Assumption is that the namespace is part of a ns-isolation policy and there are specific primary brokers that namespace can be hosted on. As per code, when a lookup happens, it should eventually go in ModularLoadManager::selectBroker but in the logs, I am not seeing the relevant logs.. Are there other places where this decision is made (considering the isolation regex in mind)? Adding debugger is difficult because the decision can happen on any of the broker and we want to replicate it on a multi broker setup with different isolation groups

Shaun

02/06/2025, 12:59 AM

Should I expect to see consistent behavior between

persistent

non-persistent

topics when using a subscription type of

failover

? I am seeing that the broker never sends notification on active and inactive status changes to

non-persistent

topics like it does for

persistent

topics. Is this expected?

Conor

02/06/2025, 10:04 AM

If someone can spare a moment, could you take a look at https://github.com/apache/pulsar/issues/23890?

I haven't found a solution for this, so I am going to go ahead and delete the ledgers mentioned in the error messages from the Bookies manually. If anyone knows a more correct solution please let me know

Thomas MacKenzie

02/06/2025, 5:39 PM

I'm looking at region / zone awareness, is there any other way to set the placement policies than using the

pulsar-admin

client? I've been looking in the configuration in the helm chart and directly in the broker and bookkeeper instances confs but did not find anything. Thanks https://pulsar.apache.org/docs/4.0.x/administration-isolation-bookie/

Sreenath Sasidharan

02/06/2025, 8:43 PM

Hello, I am trying to set a service account name(

serviceAccountName

) allowing access to our s3 buckets to pulsar functions worker in Kubernetes. Based on what I read from the documentation: https://pulsar.apache.org/docs/4.0.x/functions-runtime-kubernetes/ and source code, I dont see a way to do it for python via

customRuntimeOptions

. Is that correct or have I missed something?

Soumya Ghosh

02/11/2025, 9:33 AM

Hello, I am getting an error in initializing cluster metadata. Setup - 3 ZK, 5 bookies, 3 broker, 1 proxy node. Command

bin/pulsar initialize-cluster-metadata --cluster <cluster_name> --metadata-store zk:zk_1,zk_2,zk_3:2181 --configuration-metadata-store zk:zk_1,zk_2,zk_3:2181             --web-service-url http://<broker_r53>:8080  --broker-service-url pulsar://<broker_r53>:6650

Output :

Killed

Are there any flags to get verbose error?