OTA with FirmwareWriter freezes esp32 on alpha63
# help
r
Hello. I've updated from alpha 47 to alpha 63, but I have problems with OTA now. It looks like the ESP32 completely freezes after roughly 350-400 writes with the `FirmwareWriter`. The OTA process is spawned, but everything freezes when writing with `FirmwareWriter`. I will provide more details soon.
This is the output when core 0 panics:
info: 2023-03-10T09:30:57Z: Starting Ota
 | -- Allocated memory 2884 | System free memory 40112 | Allocated in object heap 6048 -- |
ota starts now
Writes: 50
Allocated memory 5416 | System free memory 20936 | Allocated in object heap 45416
Writes: 100
Allocated memory 5416 | System free memory 25296 | Allocated in object heap 79528
Writes: 150
Allocated memory 5416 | System free memory 23504 | Allocated in object heap 113640
Writes: 200
Allocated memory 5416 | System free memory 29248 | Allocated in object heap 147564
Guru Meditation Error: Core  0 panic'ed (Cache disabled but cached memory region accessed). 

Core  0 register dump:
PC      : 0x40196e5c  PS      : 0x00060034  A0      : 0x8008663e  A1      : 0x3ffb1280  
A2      : 0x3ffddd30  A3      : 0x3ffddc14  A4      : 0x00000010  A5      : 0x3ffb12c8  
A6      : 0x40196e5c  A7      : 0x3ffdde24  A8      : 0x800921e5  A9      : 0x3ffb1240  
A10     : 0x3ffddd30  A11     : 0x00000010  A12     : 0x3ffb4a90  A13     : 0x0000cdcd  
A14     : 0x00060023  A15     : 0x00060021  SAR     : 0x00000018  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x4008b559  LEND    : 0x4008b561  LCOUNT  : 0x00000027
m
What else is running when you do the firmware update? This error typically happens when a low-level interrupt handler tries to access the flash while the flash is being used by the firmware updater. We saw it once when the ringbuffer was not in IRAM and the UART was running.
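To illustrate the failure mode described above: while the firmware updater writes to flash, the flash cache is disabled, so any interrupt that can still fire must execute from IRAM and touch only data in internal RAM. The following is a minimal host-compilable sketch of that pattern, not the Toit or ESP-IDF ringbuffer code. `IRAM_ATTR` is the real ESP-IDF attribute from `esp_attr.h`; every other name (`rx_ring_t`, `rx_ring_push`, `uart_rx_isr`, `rx_ring_pop`, `RX_RING_SIZE`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* On ESP-IDF, IRAM_ATTR comes from "esp_attr.h". The empty fallback only
   lets this sketch compile on a host machine. */
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif

#define RX_RING_SIZE 64u /* hypothetical capacity; must be a power of two */

typedef struct {
  volatile uint32_t head; /* advanced by the ISR */
  volatile uint32_t tail; /* advanced by the reader task */
  uint8_t data[RX_RING_SIZE];
} rx_ring_t;

/* By default, zero-initialized globals are placed in internal RAM, so the
   ISR can touch this buffer while the flash cache is disabled. */
static rx_ring_t rx_ring;

/* Runs in interrupt context, so it is placed in IRAM. */
static bool IRAM_ATTR rx_ring_push(rx_ring_t *ring, uint8_t byte) {
  uint32_t next = (ring->head + 1u) & (RX_RING_SIZE - 1u);
  if (next == ring->tail) return false; /* full: drop the byte */
  ring->data[ring->head] = byte;
  ring->head = next;
  return true;
}

/* Hypothetical ISR entry point: one received byte per call. */
bool IRAM_ATTR uart_rx_isr(uint8_t byte) {
  return rx_ring_push(&rx_ring, byte);
}

/* Runs in task context, so it may stay in flash. */
bool rx_ring_pop(uint8_t *out) {
  assert(out != NULL);
  if (rx_ring.tail == rx_ring.head) return false; /* empty */
  *out = rx_ring.data[rx_ring.tail];
  rx_ring.tail = (rx_ring.tail + 1u) & (RX_RING_SIZE - 1u);
  return true;
}
```

Constant tables and string literals normally end up in flash as well, so an ISR that reads them would additionally need them placed in internal RAM with `DRAM_ATTR`.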
r
We are reading from the UART, and we write some of the messages received to a file. We have quite a lot of traffic here, so we use UART up to a few times per second
I can try and disable the reading, to see if it helps
m
It probably will help.
r
Okay that worked. 🤔
This problem didn't affect us before.
I will go through the sdkconfig.
Looks like `CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH=y` has been activated. It was disabled before.
m
Yes, that will do it. I did think we had the ringbuffer in IRAM. @erikcorry didn't we patch esp-idf to keep the ringbuffer in IRAM?
r
Yeah, if I disable it, everything works again with UART interrupts enabled.
We should be able to patch it 🙂
m
It should not really compile on ESP32 with `CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH=n`, as it will run out of IRAM. Did you change anything else?
r
No, I only changed `CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH`.
m
And you are running on a plain ESP32 (not an S3 or something)?
r
Yes, the old ESP32-WROOM32D
We do have BT disabled
m
Ah! That will do it.
BT uses 26 KB of IRAM, so disabling it leaves plenty of space free for FreeRTOS (14 KB).
r
Yes, we had the same problem in the past, and that's when we disabled BT.
It looks like the problem has returned with alpha 64 and the new UART driver. I have `# CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH is not set` and `# CONFIG_BT_ENABLED is not set`. The same thing happens, where it either freezes or panics. I will look through the UART code now.
Disabling the UART while writing lets the OTA go through.
If I use alpha 63 with the same two sdkconfig changes, `# CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH is not set` and `# CONFIG_BT_ENABLED is not set`, it works. The UART is active and the same code is used for both versions (except for the Toit version).
There's actually one more difference. We have removed the high-level interrupt code from version 63, because we don't have the resources when we use a high-speed UART. We do not have to remove anything from the 64 code, and our LEDs are not complaining about insufficient high-level interrupt resources.
k
Sorry that you're getting bitten by all these issues all at once.
r
I wish I was better at finding the root cause, but I hope these posts help with tracking down sneaky bugs 🙂
m
Hey @Rikke I will take an extra look at the new UART code. We most likely forgot an IRAM flag in some interrupt handler code.
r
Yeah, I looked at the new code, but I didn't have time to look at it all.
m
I found something. A few methods that were called from the ISR were not marked as IRAM. I am creating a PR for it now.
I must admit that I never got around to testing that scenario. We use it in our app as well, so I would have hit it eventually. Thanks for helping bring it to light.
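The kind of bug described here is easy to miss because marking only the interrupt handler is not enough: every function it reaches must be in IRAM too. The sketch below shows the idea under stated assumptions and is not the code from the PR. `IRAM_ATTR` is the real ESP-IDF attribute; `uart_stats_t`, `saturating_add` and `uart_isr_record` are hypothetical names made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef IRAM_ATTR
#define IRAM_ATTR /* host fallback; ESP-IDF defines this in "esp_attr.h" */
#endif

/* Hypothetical per-port counters updated from the interrupt handler. */
typedef struct {
  uint32_t rx_bytes;
  uint32_t overruns;
} uart_stats_t;

/* Helper reached from the ISR. Without IRAM_ATTR the linker may place it
   in flash, and calling it while the cache is disabled would fault even
   though the ISR itself is in IRAM. */
static uint32_t IRAM_ATTR saturating_add(uint32_t value, uint32_t amount) {
  uint32_t sum = value + amount;
  return sum < value ? UINT32_MAX : sum; /* clamp on wrap-around */
}

/* Hypothetical ISR body: it and everything it calls carry IRAM_ATTR. */
void IRAM_ATTR uart_isr_record(uart_stats_t *stats, uint32_t received,
                               bool overrun) {
  stats->rx_bytes = saturating_add(stats->rx_bytes, received);
  if (overrun) stats->overruns = saturating_add(stats->overruns, 1u);
}
```

A missing attribute on a helper like this only shows up when an interrupt fires during a flash write, which matches the symptom of OTA failing only while the UART is active.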
r
Yup, that did it. I was able to OTA normally with 65 😉