Jaguar wifi closes (ESP32S3)
# help
m
I am having an issue where a device that was flashed with Jaguar and just rebooted closes the WiFi connection after about 40 seconds and does not re-establish it. Here is a log from the console (with more WiFi info than normal):
[wifi] DEBUG: connecting
I (6206) wifi:mode : sta (f4:12:fa:d4:9a:20)
I (6206) wifi:enable tsf
I (7416) wifi:new:<11,2>, old:<1,0>, ap:<255,255>, sta:<11,2>, prof:1
I (8096) wifi:state: init -> auth (b0)
I (8096) wifi:state: auth -> assoc (0)
I (8156) wifi:state: assoc -> run (10)
W (8166) wifi:<ba-add>idx:0 (ifx:0, 70:a7:41:ba:7e:46), tid:5, ssn:1, winSize:64
I (8166) wifi:connected with Spottune, aid = 1, channel 11, 40D, bssid = 70:a7:41:ba:7e:46
I (8166) wifi:security: WPA2-PSK, phy: bgn, rssi: -40
I (8166) wifi:pm start, type: 1

I (8166) wifi:set rx beacon pti, rx_bcn_pti: 0, bcn_timeout: 0, mt_pti: 25000, mt_time: 10000
[wifi] DEBUG: connected
I (8176) wifi:BcnInt:102400, DTIM:1
W (8246) wifi:<ba-add>idx:1 (ifx:0, 70:a7:41:ba:7e:46), tid:6, ssn:1, winSize:64
I (9246) esp_netif_handlers: sta ip: 192.168.1.152, mask: 255.255.255.0, gw: 192.168.1.1
[wifi] INFO: network address dynamically assigned through dhcp {ip: 192.168.1.152}
[wifi] INFO: dns server address dynamically assigned through dhcp {ip: [192.168.1.1]}
[jaguar] INFO: running Jaguar device 'stex_jig' (id: 'dc6b35c1-f193-45bb-a7d7-7926b54d3fbc') on 'http://192.168.1.152:9000'
I (38866) wifi:state: run -> init (6c0)
I (38866) wifi:pm stop, total sleep time: 16065060 us / 30696067 us

W (38866) wifi:<ba-del>idx
W (38866) wifi:<ba-del>idx
I (38866) wifi:new:<11,0>, old:<11,2>, ap:<255,255>, sta:<11,2>, prof:1
[wifi] DEBUG: closing
I (38866) wifi:flush txq
I (38866) wifi:stop sw txq
I (38866) wifi:lmac stop hw txq
I (38866) wifi:Deinit lldesc rx mblock:10
After this, the device needs to be rebooted before it starts Jaguar again.
r
Is this with the latest version of Toit? I did see my device with alpha 63 close its WiFi and never try to reconnect.
m
Yes, this is very current toit/jaguar.
k
I am looking into this. I hope to have an update (and a fix) early next week.
What is the latest version where you didn't experience this problem? I assume you've been running on 47 for a while, @Rikke.
I'm thinking that this might be a problem exposed by https://github.com/toitlang/toit/commit/214ab6d5dfd4ae13138dc9adb3cfd12f44c85a40.
The close dance is rather complicated, so my first step will be to make this more robust. It feels like we forget to notify Jaguar that the network is closed and I'm thinking that it might happen because we have concurrent modifications of some state in the wifi service. Are you running other code that uses the network on top of Jaguar when you encounter this?
https://github.com/toitlang/toit/blob/master/system/extensions/esp32/wifi.toit#L161 <-- this is where we start notifying other processes that the network is down, but we wait for them to call us back (close), which is a little bit shaky. They do this asynchronously, so we may be manipulating the resources map while we're notifying, which is clearly bad. Also, I think I'll close the resource eagerly and then notify.
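To illustrate the hazard described above, here is a minimal sketch in Python rather than Toit. It is an analogy under stated assumptions, not the actual wifi service code: the `Service` and `Client` classes and all their methods are hypothetical. It contrasts notifying clients while iterating the live resources map with detaching the resources first and then notifying from a snapshot. Python happens to raise an error when the map changes mid-iteration; in the real system the symptom was lost notifications instead.

```python
class Client:
    """Hypothetical client that reacts to a 'network closed' notification."""

    def __init__(self, name, service):
        self.name = name
        self.service = service
        self.notified = False

    def on_closed(self):
        self.notified = True
        # The client calls back into the service to release its resource.
        self.service.release(self)


class Service:
    """Hypothetical service that tracks one resource per client."""

    def __init__(self):
        self.resources = {}

    def open(self, name):
        client = Client(name, self)
        self.resources[name] = client
        return client

    def release(self, client):
        self.resources.pop(client.name, None)

    def close_fragile(self):
        # Iterates the live map while the callbacks remove entries from it.
        for client in self.resources.values():
            client.on_closed()

    def close_eager(self):
        # Detach all resources first, then notify from a private snapshot.
        clients = list(self.resources.values())
        self.resources.clear()
        for client in clients:
            client.on_closed()


fragile = Service()
first = fragile.open("jaguar")
second = fragile.open("app")
try:
    fragile.close_fragile()
except RuntimeError:
    # The map changed during iteration; 'second' never heard about the close.
    pass

eager = Service()
clients = [eager.open("jaguar"), eager.open("app")]
eager.close_eager()
```

In the fragile variant only the first client is notified before the iteration breaks. In the eager variant every client is notified, and the callbacks into `release` become harmless no-ops because the map is already empty.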
But the first step is to try to find a good repro.
I think I have a repro.
r
47, yes, and I'm not sure if it is exactly the same problem. I was unable to reproduce it on a second device, but the first device ended up closing WiFi and never recovering, about 5 times in a row.
k
I have a repro that shows that we're losing close notifications, so Jaguar is never informed.
Easy to fix. Should be ready for testing Monday morning.
Got a fix ready. Will test it a bit more.
If I am right, it's not a Jaguar issue, but a general resource notification problem.
m
no
k
Okay. In that case, I think there are two problems. One simple one in Jaguar. When I started using Task.group I got it wrong.
(February 6)
Before my change, we would use the UDP broadcasting to discover that the network was closed and then take down the http server.
After my change, we only take it down and restart if the broadcasting throws.
Woops.
(we just need to use Task.group --required=1)
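As a rough illustration of the intended behavior, here is a sketch in Python's asyncio, not Toit. It assumes that `--required=1` means the group is done as soon as one task completes and that the remaining tasks are then canceled. The function names `broadcast_identity`, `serve_http`, and `run_group` are hypothetical stand-ins and do not come from the Jaguar source.

```python
import asyncio


async def broadcast_identity(network_closed):
    # Stand-in for the UDP broadcaster: returns once the network is gone.
    await network_closed.wait()


async def serve_http():
    # Stand-in for the HTTP server: runs until it is canceled.
    await asyncio.Event().wait()


async def run_group(required, network_closed):
    pending = {
        asyncio.create_task(broadcast_identity(network_closed)),
        asyncio.create_task(serve_http()),
    }
    done = set()
    # Wait until 'required' tasks have completed, then cancel the rest.
    while len(done) < required and pending:
        finished, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        done |= finished
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def main():
    network_closed = asyncio.Event()
    # Simulate the network closing shortly after startup.
    asyncio.get_running_loop().call_later(0.01, network_closed.set)
    return await run_group(required=1, network_closed=network_closed)


completed, canceled = asyncio.run(main())
```

With one required task, the broadcaster finishing normally is enough to take down the server, so an outer loop could then restart both. If both tasks were required, the group would keep waiting on the server and the device would stay offline unless the broadcaster threw.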
The old code (before starting to use Task.group) was correct: https://github.com/toitlang/jaguar/commit/0170761a87c43ad025ddbec0e77b36ff66d9f31f.
There is still a bug in resource notifications, but it requires more than one net.open call.
Both fixes have landed in SDK v2.0.0-alpha.64 and Jaguar v1.9.10.