MQTT client doesn't realize that it lost its conne...
# help
t
What happens if the ESP doesn't lose its wifi connection but the wifi router is no longer connected to the internet? When we unplug our router from the internet, the ESP is still happy about its connection and calls to mqtt.publish don't give any errors. After about 10 seconds and several calls to publish, it seems to hang in the publish call, and when we reconnect the router, the ESP unhangs and resumes normally after a similar period. It's like the mqtt client doesn't realize that it is not online, which is supported by the fact that when it finally comes back online it doesn't need to call mqtt.connect to be able to publish normally. (We are using version 2.0.4 of the mqtt client.)
f
I did a bit of work to improve the stability of the MQTT package on Friday. I will continue with it tomorrow (Monday), and will have a look at your scenario.
t
Thanks 🙂
f
Fyi: I'm feeling under the weather, so I won't make any progress on this today.
t
Hi Florian. Any progress on this?
f
I'm going to work on this today.
I just looked into this.
TCP connections have a really long delay before they consider a connection dead. I tried on my Linux machine and an ESP32 and they both took 15-30 minutes to realize that something is wrong. I googled a bit to find out whether one can reduce that delay, but didn't yet find anything for the ESP. If there is, we can try to expose it, so that users can set it.
Independently, I just implemented (but haven't tested) a max_inflight option. For QoS=1 packets, the client would block a publish if there were too many outstanding packets. Together with a with_timeout on the publish this could yield a faster detection of dropped connections.
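The max_inflight idea can be illustrated with a minimal sketch. It is written in Python rather than Toit and is not the mqtt package's implementation; the class `InflightLimiter`, its methods, and the `timeout_s` parameter are hypothetical names chosen for illustration. Each QoS=1 publish takes a slot, each acknowledgment frees one, and a publish that cannot get a slot in time raises instead of blocking forever.

```python
import threading


class InflightLimiter:
    """Blocks publishers once too many QoS=1 packets are unacknowledged.

    Hypothetical illustration only; not the Toit mqtt package API.
    """

    def __init__(self, max_inflight):
        # One semaphore slot per packet that may be in flight at once.
        self._slots = threading.Semaphore(max_inflight)

    def publish(self, send, timeout_s):
        # Wait for a free slot; give up after timeout_s seconds.
        if not self._slots.acquire(timeout=timeout_s):
            raise TimeoutError("too many unacknowledged packets")
        send()

    def on_ack(self):
        # Called when the broker acknowledges a packet; frees one slot.
        self._slots.release()
```

With a dead connection no acknowledgments arrive, so the slots fill up and the next publish times out after `timeout_s` instead of waiting for TCP to notice.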
t
That could explain it. What about QoS=0? (which is what we are using)
f
QoS=0 would just send into the void. So there wouldn't be any difference to the current behavior. The window size on the ESP32 is relatively small, so you can probably still detect it quite early with a with_timeout (at least I think it's the window size that makes the call block at some point).
t
Hi Florian
with_timeout solved the issue, but we could probably benefit from the changes you made to the mqtt package. Do you plan to make a new package release with the changes? /Tommy
f
Yes. Thanks for reminding me.
t
No problem 😄
f
Released
t
Thanks 🙂
r
It looks like the new mqtt package requires 2.0.0-alpha.41, do you remember what exactly is required from it? 🙂
f
Mostly minor things.
Like an implementation of monitor.Signal and similar.
I could "backport" it and release another version that still runs on v1.
Which version do you have?
r
We are using 2.0.0-alpha.35 currently, but we do need to upgrade in the near future anyway, so we are just discussing atm 🙂
f
I'm pretty sure it should just work if there aren't any warnings during compilation.
Maybe 35 is recent enough.
I just picked the one I was testing with.
r
I am getting
The SDK constraint defined in the package.lock file is not satisfied: v2.0.0-alpha.35 < ^2.0.0-alpha.41
when I try to build
Or when it compiles*
f
I uploaded a branch with alpha.35 as requirement. Let's see if the build goes green.
If it does, then I can just release a new version with lowered constraints.
r
I'm not sure how to use a package from a specific branch 🤔
f
Looks clean.
r
Oh okay, you were trying it
f
Just fyi: if you want to experiment with a modified version of a package, the easiest is generally to check the package out, and then to install it with
jag pkg install --local --name mqtt ../mqtt
where mqtt is the name under which it would be used in your project, and ../mqtt is the path to the checked-out version.
Since I just uploaded a branch with lower constraints that worked, I will request a review and then release a new version for the package manager soon after.
so no need to do this on your side
r
Okay thank you very much! 🙂
f
v2.1.1 should be live.
Please let me know if you see weird things.
r
I will start testing now 🙂
I noticed our fix with with_timeout around publish does not work anymore with v2.1.1: it never times out and the ESP never recovers, not even when the internet is back 🤔 I haven't looked further into this yet, but I will today.
f
Thanks for testing. Interesting. If you have the time to look into it a bit more, that would be great. Let me know when you want me to have a look.
r
I will have to push it to tomorrow, but here's some info: I am blocking the mqtt port 1883 to disconnect the esp from mqtt, and when I unblock it again, it reports this:
2022-11-30 11:48:39,808 DEBUG: Connected to broker
2022-11-30 11:48:39,814 DEBUG: Attempting to (re)connect
2022-11-30 11:48:39,825 Heap report @ out of memory:
2022-11-30 11:48:39,838   ┌───────────┬─────────┬───────────────────────┐
2022-11-30 11:48:39,843   │   Bytes   │  Count  │  Type                 │
2022-11-30 11:48:39,855   ├───────────┼─────────┼───────────────────────┤
2022-11-30 11:48:39,860   │    5256   │      4  │  external byte array  │
2022-11-30 11:48:39,866   │  102400   │     19  │  toit                 │
2022-11-30 11:48:39,871   │    8952   │     37  │  lwip                 │
2022-11-30 11:48:39,876   │    7992   │    723  │  heap overhead        │
2022-11-30 11:48:39,881   │    7864   │     51  │  event source         │
2022-11-30 11:48:39,886   │   32872   │    313  │  thread/other         │
2022-11-30 11:48:39,891   │   22304   │     21  │  thread/spawn         │
2022-11-30 11:48:39,896   │   23280   │    159  │  untagged             │
2022-11-30 11:48:39,901   │   33864   │     81  │  wifi                 │
2022-11-30 11:48:39,914   └───────────┴─────────┴───────────────────────┘
2022-11-30 11:48:39,921   Total: 244784 bytes in 685 allocations (82%), largest free 52k, total free 54k
2022-11-30 11:48:39,924 DEBUG: Connection established
2022-11-30 11:48:39,926 INFO: disconnect from server
2022-11-30 11:48:39,929 DEBUG: Attempting to (re)connect
2022-11-30 11:48:39,937 DEBUG: Attempting to (re)connect
It keeps giving the "Heap report @ out of memory", so I think it tries to reconnect multiple times with the same client, and multiple clients with the same name will disconnect each other.
f
The new client is probably more aggressive in trying to reconnect.
If you don't close it explicitly, it will continue to do so.
The old one gave up after some tries.
So if you create a new client each time the publish doesn't work but don't close the old one, you might end up with many clients trying to run in parallel.
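To make the point about closing the old client concrete, here is a small sketch in Python, not Toit. `FakeClient` and `publish_or_replace` are hypothetical stand-ins invented for this example, and the publish is made to fail unconditionally to simulate a timeout. The only idea shown is the ordering: close the old client first, then create the replacement.

```python
class FakeClient:
    """Hypothetical stand-in for an MQTT client; counts live instances."""

    live = 0

    def __init__(self):
        FakeClient.live += 1
        self.closed = False

    def publish(self, topic, payload):
        # Simulates a publish whose surrounding timeout has fired.
        raise TimeoutError("publish timed out")

    def close(self):
        # Idempotent: closing twice only decrements the counter once.
        if not self.closed:
            self.closed = True
            FakeClient.live -= 1


def publish_or_replace(client, topic, payload, make_client=FakeClient):
    """Publishes; on timeout, closes the old client before creating a new one."""
    try:
        client.publish(topic, payload)
        return client
    except TimeoutError:
        # Without this close, the old client would keep trying to reconnect
        # in parallel with its replacement.
        client.close()
        return make_client()
```

However many publishes fail in a row, only one client is live at a time, so two clients with the same name never compete for the broker.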
r
Yup, the old one completely disconnects after some tries
We don't run publish if we are disconnected from mqtt.
But I think the problem lies with the catch around publish; it does not seem to work anymore. It didn't catch the with_timeout, but I didn't have time to look further into this. So if we can't detect a disconnect with catch, we may be calling publish while disconnected and creating multiple clients.
f
There are a few things here:
- The client uses a reconnection strategy. That one is configurable, but is now (by default) the "tenacious" strategy, which will never give up. It just tries over and over again, incrementing the delay between attempts by 1 second.
- As long as the reconnection strategy is trying, a publish call doesn't fail. However, the publish is blocked at that point.
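The difference between the two strategies can be sketched as follows. This is Python, not Toit, and not the package's code: `tenacious_delays`, `bounded_delays`, and `reconnect` are hypothetical names, and the starting delay of 1 second and the fixed delay of the old strategy are assumptions, since only the 1-second increment is stated above.

```python
def tenacious_delays(step_s=1):
    """Yields delays forever, growing by step_s per attempt (start is assumed)."""
    delay = 0
    while True:
        delay += step_s
        yield delay


def bounded_delays(max_attempts, delay_s=1):
    """Yields a fixed delay a limited number of times, then gives up."""
    for _ in range(max_attempts):
        yield delay_s


def reconnect(try_connect, delays, sleep):
    """Tries to connect, sleeping between failed attempts.

    Returns True on success, or False once the delay sequence is exhausted.
    """
    for delay in delays:
        if try_connect():
            return True
        sleep(delay)
    return False
```

With `tenacious_delays()` the loop only ends on success, which matches a publish that blocks instead of failing. With `bounded_delays(n)` the function eventually returns False, which is the point where the caller sees an error and can close the client.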
I don't know why you are seeing the out-of-memory. If it's related to the mqtt client, then that's clearly something to look into.
How did you catch exceptions from the publish? Was it with a with_timeout or was it a different one?
Note that you can go back to the old reconnection strategy by passing it into the constructor or start function. (Don't remember which one.)
r
We are now using with_timeout to catch, but I think we will go back to the old strategy for connecting. I will look into the new strategy later, but for now we want the client to close.
f
In that context: the new client also has an option to set the maximum number of inflight packets.
That is, packets that haven't been acked yet.
This makes the publish block if there are too many inflight messages.
If you want to, we can schedule a VC tomorrow, and discuss what you need, and how best to achieve it.
r
Yeah, the inflight packets option is what we are excited about 🙂 but I don't know when I will have time to look at it. A VC next week would be great I think, I will see tomorrow if that's okay
f
Sure. Just ping me.
r
Great thanks! 👍
We would like to have a chat on Monday @floitsch 🙂 So around 9:00-10:00 on Monday ?
f
Would it be possible to move it to a bit later?
r
Sure, when do you have time?
f
Anything after 10:00 works. Preferred is 11:00+
r
How about 13:00?
f
Perfect