Fabri
06/07/2025, 12:11 AM
Sahaj Kodia
06/09/2025, 4:58 PM
Alexander Preuß
06/10/2025, 4:56 PM
Rémi Collignon-Ducret
06/11/2025, 4:07 PM
Fabri
06/14/2025, 8:47 AM
Sahaj Kodia
06/17/2025, 9:59 AM
Shrey Kothari
06/19/2025, 9:15 AM
hulusi
06/20/2025, 11:44 AM
Eric Satterwhite
06/23/2025, 12:19 PM
Eric Satterwhite
06/23/2025, 12:28 PM
Daniel Kaminski
06/24/2025, 7:46 AM
java.lang.IllegalStateException: Field 'message' is not set
at org.apache.pulsar.common.api.proto.CommandLookupTopicResponse.getMessage(CommandLookupTopicResponse.java:220)
at org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:629)
at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:154)
The issue persisted until the function pods were manually restarted, by which time their own customers had already been impacted.
• ClientCnx.handleLookupResponse --> ClientCnx represents the client's connection to a broker. It was processing the response to a CommandLookupTopic request. A topic lookup is the mechanism a client uses to ask the Pulsar cluster, "Which broker is currently serving this topic?".
• CommandLookupTopicResponse.getMessage --> The client received a response from the broker. The code then attempted to get an error message from this response.
• IllegalStateException: Field 'message' is not set --> This exception means the client code expected the message field to be populated in the response it received, but it wasn't. In the Pulsar protocol, this message field is optional. The broker sent a response indicating failure or redirection but didn't include the optional descriptive text.
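For illustration, a minimal sketch of the kind of defensive check that would avoid this exception, assuming the generated proto class exposes the usual hasMessage() accessor alongside getMessage():
import org.apache.pulsar.common.api.proto.CommandLookupTopicResponse;

// Read the optional 'message' field only if it is set, instead of calling
// getMessage() unconditionally (hasMessage() is assumed here).
final class LookupErrorText {
    static String errorTextOf(CommandLookupTopicResponse response) {
        return response.hasMessage()
                ? response.getMessage()
                : "<broker sent no error message>";
    }
}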
Our Hypothesis:
Our theory is that the function's internal client entered a "stuck" state during the broker rolling upgrade and failed to recover.
What happened during the update:
1. A broker pod is terminated
2. All client connections to that broker are severed.
3. The Pulsar Function's internal client automatically tries to reconnect.
4. As part of the reconnection, it performs a topic lookup.
5. The ownership of the topic may be in the process of being transferred from the old broker to a new one. If the client's lookup request hits a broker during this transient state, the broker might issue a Redirect or Failed lookup response.
6. In some edge cases, this response is sent without the optional message field, triggering the bug in the client's lookup-response handling shown in the stack trace above.
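For reference, here is a rough sketch of what a topic lookup asks the cluster, using the Java admin client rather than the function's internal client (topic name and admin URL are placeholders):
import org.apache.pulsar.client.admin.PulsarAdmin;

public class TopicLookupExample {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // placeholder admin endpoint
                .build()) {
            // Ask the cluster which broker currently owns the topic (step 4 above).
            // During a rolling upgrade the answer can change between calls, or the
            // request can land on a broker that redirects or fails the lookup.
            String owner = admin.lookups().lookupTopic("persistent://public/default/my-topic");
            System.out.println("topic currently served by: " + owner);
        }
    }
}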
We also discovered a version mismatch between the cluster and the client libraries:
• *Client Libraries:* pulsar-client and pulsar-functions-api are both version 3.2.1.
• *Cluster Version:* 3.0.11.
My questions now are:
Is it a requirement for the client library versions to strictly match the Pulsar cluster version? Could this version skew be a potential cause for this kind of failure during a rolling upgrade?
Are you aware of any existing GitHub issues related to this specific IllegalStateException in this context? We were unable to find a direct match in our own research.
Thanks in advance!
Ali Ahmed
06/28/2025, 5:21 AM
benjamin99
06/30/2025, 9:36 AM
Fatih
07/01/2025, 11:43 AM
loadBalancerEnabled=true
loadBalancerReportUpdateThresholdPercentage=10
loadBalancerReportUpdateMinIntervalMillis=5000
loadBalancerReportUpdateMaxIntervalMinutes=15
loadBalancerHostUsageCheckIntervalMinutes=1
loadBalancerSheddingEnabled=true
loadBalancerSheddingIntervalMinutes=1
loadBalancerSheddingGracePeriodMinutes=30
loadBalancerBrokerMaxTopics=50000
loadBalancerBrokerOverloadedThresholdPercentage=85
loadBalancerResourceQuotaUpdateIntervalMinutes=15
loadBalancerAutoBundleSplitEnabled=true
loadBalancerAutoUnloadSplitBundlesEnabled=true
loadBalancerNamespaceBundleMaxTopics=1000
loadBalancerNamespaceBundleMaxSessions=1000
loadBalancerNamespaceBundleMaxMsgRate=30000
loadBalancerNamespaceBundleMaxBandwidthMbytes=100
loadBalancerNamespaceMaximumBundles=128
loadBalancerOverrideBrokerNicSpeedGbps=
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder
loadBalancerLoadPlacementStrategy=org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate
loadBalancerBrokerThresholdShedderPercentage=10
loadBalancerAverageResourceUsageDifferenceThresholdPercentage=10
loadBalancerMsgRateDifferenceShedderThreshold=50
loadBalancerMsgThroughputMultiplierDifferenceShedderThreshold=4
loadBalancerHistoryResourcePercentage=0.9
loadBalancerBandwithInResourceWeight=1.0
loadBalancerBandwithOutResourceWeight=1.0
loadBalancerCPUResourceWeight=1.0
loadBalancerDirectMemoryResourceWeight=1.0
loadBalancerBundleUnloadMinThroughputThreshold=10
loadBalancerAvgShedderLowThreshold=15
loadBalancerAvgShedderHighThreshold=40
loadBalancerAvgShedderHitCountLowThreshold=8
loadBalancerAvgShedderHitCountHighThreshold=2
loadBalancerDebugModeEnabled=false
loadBalancerBrokerLoadTargetStd=0.25
loadBalancerSheddingConditionHitCountThreshold=3
loadBalancerTransferEnabled=true
loadBalancerMaxNumberOfBrokerSheddingPerCycle=3
loadBalancerBrokerLoadDataTTLInSeconds=1800
loadBalancerMaxNumberOfBundlesInBundleLoadReport=10
loadBalancerSplitIntervalMinutes=1
loadBalancerMaxNumberOfBundlesToSplitPerCycle=10
loadBalancerNamespaceBundleSplitConditionHitCountThreshold=3
loadBalancerServiceUnitStateTombstoneDelayTimeInSeconds=3600
loadBalancerMemoryResourceWeight=1.0
Ali Ahmed
07/02/2025, 6:03 AM
Amanda
07/02/2025, 6:48 PM
serviceUrl for the local cluster.
I enabled geo-replication at the namespace level, not at the topic level.
Does anyone know how to solve this issue? Using v4.0.1
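For context, a minimal sketch (cluster and namespace names are assumptions) of what the namespace-level setup looks like with the Java admin client, including a check of the serviceUrl registered for the local cluster:
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.ClusterData;

public class GeoReplicationCheck {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // placeholder admin endpoint
                .build()) {
            // Inspect how the local cluster is registered; a wrong serviceUrl here
            // is one common source of replication trouble.
            ClusterData local = admin.clusters().getCluster("cluster-a");
            System.out.println("registered serviceUrl: " + local.getServiceUrl());

            // Enable geo-replication for the namespace across both clusters.
            admin.namespaces().setNamespaceReplicationClusters(
                    "public/default", Set.of("cluster-a", "cluster-b"));
        }
    }
}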
idoh
07/03/2025, 7:40 AM
Michał Cukierman
07/04/2025, 8:34 AM
sindhushree
07/07/2025, 1:42 PM
kailevy
07/09/2025, 12:29 AM
Jeroen van der Wal
07/09/2025, 1:55 PM
Nikolas Petrou
07/13/2025, 6:28 AM
tenant: public
namespace: default
name: jdbc_sink_pulsar_to_mysql_temp
archive: connectors/pulsar-io-jdbc-sqlite-4.0.0.nar
inputs:
- persistent://public/default/temp_schema
configs:
jdbcUrl: "jdbc:mysql://mysql:3306/mqtt_db"
userName: "user1"
password: "1234567890"
tableName: "pulsar_to_db_temp"
insertMode: INSERT
key: "message_id"
nonKey: "temperature,timestamp,pulsar_timestamp"
and I mount the connector under the connectors folder like so - ./pulsar-mysql/pulsar-io-jdbc-sqlite-4.0.0.nar:/pulsar/connectors/pulsar-io-jdbc-sqlite-4.0.0.nar
but I get this error:
ERROR org.apache.pulsar.functions.instance.JavaInstanceRunnable - Sink open produced uncaught exception:
java.sql.SQLException: No suitable driver found for jdbc:mysql://mysql:3306/mqtt_db
at java.sql.DriverManager.getConnection(Unknown Source) ~[java.sql:?]
at java.sql.DriverManager.getConnection(Unknown Source) ~[java.sql:?]
at org.apache.pulsar.io.jdbc.JdbcAbstractSink.open(JdbcAbstractSink.java:97) ~[pulsar-io-jdbc-core-4.0.0.jar:?]
at org.apache.pulsar.functions.instance.JavaInstanceRunnable.setupOutput(JavaInstanceRunnable.java:1080) ~[?:?]
at org.apache.pulsar.functions.instance.JavaInstanceRunnable.setup(JavaInstanceRunnable.java:263) ~[?:?]
at org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:313) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Am I using the wrong connector? Or am I missing a configuration?
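For what it's worth, the error itself is standard JDBC behavior; a small standalone reproduction (not Pulsar-specific) of that failure mode:
import java.sql.DriverManager;
import java.sql.SQLException;

public class DriverCheck {
    public static void main(String[] args) {
        try {
            // DriverManager picks a driver by the jdbc: URL prefix among the drivers
            // registered on the classpath; with no MySQL-capable driver present,
            // a jdbc:mysql:// URL has nothing to match.
            DriverManager.getConnection("jdbc:mysql://mysql:3306/mqtt_db", "user1", "1234567890");
        } catch (SQLException e) {
            // With no matching driver on the classpath this prints:
            // No suitable driver found for jdbc:mysql://mysql:3306/mqtt_db
            System.out.println(e.getMessage());
        }
    }
}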
Tyler Tafoya
07/16/2025, 6:32 PM
/pulsar/data/bookkeeper/ledger0/current
I see .log files days older than my retention period.
When researching this issue, I came across [PCK](https://docs.streamnative.io/private-cloud/v1/tools/pck/pck-overview) which seems to confirm this issue, as it aims to mitigate the problem. This does not look to be publicly available though - are there any alternative solutions?
Thanks in advance!
sagar
07/17/2025, 10:59 AM
tcolak
07/26/2025, 2:33 PM
Thomas MacKenzie
07/29/2025, 4:00 AM
We set 1 for the msg-dispatch-rate and 315600000 for the dispatch-rate-period, which essentially lowers the message throughput as much as possible (I'm aware we can't actually pause message throughput with the dispatch rate).
But I noticed that the dispatch rate values returned by the admin client, on the application side, are empty when the brokers restart, as if they were not set. The consequence is that it unpauses the consumers, only to re-pause them after the brokers are done restarting. I just wanted to know if this is expected behavior? I was assuming that this type of data, like topic properties, was some sort of distributed data (from ZooKeeper) and was not impacted by broker restarts.
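For reference, a minimal sketch of that approach with the Java admin client (topic name and admin URL are placeholders; topicPolicies() is assumed to be the topic-level policy API in use):
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.DispatchRate;

public class DispatchRatePause {
    public static void main(String[] args) throws Exception {
        String topic = "persistent://public/default/my-topic";
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // placeholder admin endpoint
                .build()) {

            // "Pause" consumption: allow 1 message per ~10-year rate period.
            DispatchRate nearZero = DispatchRate.builder()
                    .dispatchThrottlingRateInMsg(1)
                    .ratePeriodInSecond(315600000)
                    .build();
            admin.topicPolicies().setDispatchRate(topic, nearZero);

            // This read-back is the call that appears to come back empty
            // while the brokers are restarting.
            DispatchRate current = admin.topicPolicies().getDispatchRate(topic);
            System.out.println("dispatch rate policy: " + current);
        }
    }
}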
kazeem
07/30/2025, 11:08 PM
Lari Hotari
07/31/2025, 3:03 PM
Shresht Jain
08/02/2025, 6:43 PM
Jeffrey Tan
08/04/2025, 8:36 AM