Stack, Heap, and Thread Crash Hunting in mbed OS

ARM mbed OS derives its thread management capabilities from ARM’s RTX real-time operating system.

When a thread crashes due to a stack overflow or a HardFault, it can be unclear which code is causing the issue. It is also unclear how the threads are initially created: ARM mbed OS defines their stack sizes via a handful of preprocessor #defines and linker-provided symbols, none of which are well documented.

Here are some notes on the things I’ve had to figure out by reading the source.

The Crash

The first thing that happened was a crash. One particular piece of code would run, and then RTX would throw the error:

RTX error code: 0x00000001, task ID: 0x200094FC

Digging through the codebase leads to RTX_Conf_CM.c:

Callers of os_error look like this:

If we look at where OS_ERR_STK_OVF is invoked, it matches the rt_stk_check function:

And the value of OS_ERR_STK_OVF is:

Now, it was clear that one of the running threads was causing a stack overflow.
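To make the mechanism concrete, here is a host-side model of a stack check of this kind. It is a sketch under assumptions, not the RTX source: the struct `demo_tcb`, the functions `demo_stack_init` and `demo_stk_check`, and the guard value `DEMO_MAGIC_WORD` are hypothetical names. The only value taken from the text is the error code 1 for OS_ERR_STK_OVF. The assumed scheme is that the kernel writes a known word at the lowest address of each stack and reports an overflow when the saved stack pointer has dropped below the stack base or the guard word has been overwritten.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ERR_NONE    0u
#define DEMO_ERR_STK_OVF 1u          /* same value as OS_ERR_STK_OVF above */
#define DEMO_MAGIC_WORD  0xA5A5A5A5u /* arbitrary guard pattern for this sketch */

/* Hypothetical, simplified task control block. */
struct demo_tcb {
    uint32_t *stack;     /* lowest address of the stack (it grows downward) */
    size_t    words;     /* stack size in 32-bit words */
    uintptr_t saved_sp;  /* stack pointer saved at the last context switch */
};

/* Prepare an empty stack: guard word at the bottom, SP at the top. */
void demo_stack_init(struct demo_tcb *t, uint32_t *mem, size_t words)
{
    t->stack = mem;
    t->words = words;
    t->saved_sp = (uintptr_t)(mem + words);
    mem[0] = DEMO_MAGIC_WORD;
}

/* Return DEMO_ERR_STK_OVF if the stack has overflowed, else DEMO_ERR_NONE. */
uint32_t demo_stk_check(const struct demo_tcb *t)
{
    if (t->saved_sp < (uintptr_t)t->stack)   /* SP ran past the bottom */
        return DEMO_ERR_STK_OVF;
    if (t->stack[0] != DEMO_MAGIC_WORD)      /* guard word was clobbered */
        return DEMO_ERR_STK_OVF;
    return DEMO_ERR_NONE;
}
```

A check like this only runs at specific points (for example on a context switch), so an overflow is reported some time after it happens, and the task ID in the error is the thread that was running when the check fired.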

Threads

But which piece of code corresponded to the thread?

The task ID is provided by the os_error crash output. To figure out which code was running at the time, you need to enumerate the threads in the system and list their function entry points.

I added the following functions:

Now, when I enter main(), the first thing I see is:

threadId: 0x200094fc
threadId: 0x200094bc, entry: 0x23161, osThreadInfoState:     0, osThreadInfoStackSize: 800, osThreadInfoStackMax:  0
threadId: 0x200094fc, entry: 0x22541, osThreadInfoState:     0, osThreadInfoStackSize: 2048, osThreadInfoStackMax:  0
threadId: 0x200096d8, entry: 0x22589, osThreadInfoState:     0, osThreadInfoStackSize: 512, osThreadInfoStackMax:  0

With the thread entry point address, I can search the ELF file generated by the ARM gcc compiler for corresponding code:

arm-none-eabi-nm.exe mbed5.elf | less

00023160 T osTimerThread
00022540 T pre_main
00022588 T os_idle_demon

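The addresses are slightly different (off by one), but it’s clear that these are the thread entry functions we want. The difference is the Thumb bit: on Cortex-M, a function pointer has bit 0 set to mark Thumb code, while nm prints the address with that bit cleared.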
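The lookup from task ID to function name can be done mechanically. The sketch below runs on a host and uses only the values from the listings above; the struct and function names (`thread_rec`, `symbol`, `symbol_for_entry`, `symbol_for_task`) are made up for illustration. It clears bit 0 of the entry address (the Thumb bit on Cortex-M) before comparing it with the addresses reported by nm.

```c
#include <stddef.h>
#include <stdint.h>

struct thread_rec { uint32_t id; uint32_t entry; };
struct symbol     { uint32_t addr; const char *name; };

/* Thread IDs and entry points copied from the thread listing above. */
static const struct thread_rec threads[] = {
    { 0x200094bcu, 0x23161u },
    { 0x200094fcu, 0x22541u },
    { 0x200096d8u, 0x22589u },
};

/* Symbols copied from the nm output above. */
static const struct symbol symbols[] = {
    { 0x00023160u, "osTimerThread" },
    { 0x00022540u, "pre_main" },
    { 0x00022588u, "os_idle_demon" },
};

/* Map a thread entry address to a symbol name, or NULL if unknown. */
const char *symbol_for_entry(uint32_t entry)
{
    uint32_t addr = entry & ~(uint32_t)1;  /* drop the Thumb bit */
    for (size_t i = 0; i < sizeof symbols / sizeof symbols[0]; i++) {
        if (symbols[i].addr == addr)
            return symbols[i].name;
    }
    return NULL;
}

/* Map a task ID from the crash output to a symbol name, or NULL. */
const char *symbol_for_task(uint32_t task_id)
{
    for (size_t i = 0; i < sizeof threads / sizeof threads[0]; i++) {
        if (threads[i].id == task_id)
            return symbol_for_entry(threads[i].entry);
    }
    return NULL;
}
```

For the crash above, `symbol_for_task(0x200094FC)` resolves to `pre_main`, the entry function of the thread with the 2048-byte stack.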

Stack Regions

So how do these functions allocate their stack regions?

For the idle thread:

For the timer thread:

For the main thread:

For the interrupt service routines + OS scheduler:

Stack Region Defines

The above code defines the regions according to the mbed Memory Model.

+-------------------+   Last Address of RAM
| Scheduler Stack   |
+-------------------+
|                   |   RAM
|                   |
|         ^         |
|         |         |
|    Heap Cont..    |
+-------------------+
| app thread n      |
|-------------------|
| app thread 2      |
|-------------------|
| app thread 1      |
|-------------------|
|         ^         |
|         |         |
|       Heap        |
+-------------------+
| ZI                |
+-------------------+
| ZI: OS drv stack  |
+-------------------+
| ZI: app thread 3  |
+-------------------+
| ZI: Idle Stack    |
+-------------------+
| ZI: Timer Stack   |
+-------------------+
| ZI: Main Stack    |
+-------------------+
| RW                |  
+===================+   First Address of RAM
|                   |
|                   |   Flash

The preprocessor defines controlling these regions are as follows:

Thread          Define              Default   Total Bytes
idle            OS_IDLESTKSIZE      128       x4 = 512
timer           OS_TIMERSTKSZ       200       x4 = 800
main            WORDS_STACK_SIZE    512       x4 = 2048
isr/scheduler   ISR_STACK_SIZE                2048
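The x4 factor comes from the stacks being counted in 32-bit words. As a quick host-side check of the byte totals, the sketch below declares arrays with the word counts from the table and lets `sizeof` do the arithmetic. The array names and `words_to_bytes` are hypothetical; only the three word counts come from the text. Note that 200 words comes to 800 bytes, which matches the osThreadInfoStackSize reported for the timer thread earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Word counts from the table above. */
#define OS_IDLESTKSIZE   128
#define OS_TIMERSTKSZ    200
#define WORDS_STACK_SIZE 512

/* Hypothetical stack arrays, one 32-bit word per element. */
uint32_t demo_idle_stack[OS_IDLESTKSIZE];    /* 512 bytes  */
uint32_t demo_timer_stack[OS_TIMERSTKSZ];    /* 800 bytes  */
uint32_t demo_main_stack[WORDS_STACK_SIZE];  /* 2048 bytes */

/* Convert a stack size in words to bytes. */
size_t words_to_bytes(size_t words)
{
    return words * sizeof(uint32_t);
}
```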

Oddly, OS_MAINSTKSIZE doesn’t appear to be used properly: the size of thread_stack_main[] is instead derived transitively via DEFAULT_STACK_SIZE → WORDS_STACK_SIZE → TOOLCHAIN_GCC.

Printing Runtime Memory Map on Cortex-M / RTX

Which prints:

threadId: 0x200094bc, stack_start: 0x20008ed8,          stack_end: 0x200091f8, size: 800
threadId: 0x200094fc, stack_start: 0x200066c8,          stack_end: 0x20006ec8, size: 2048
threadId: 0x200096d8, stack_start: 0x20009228,          stack_end: 0x20009228, size: 0
                  mbed_heap_start: 0x2000977c,      mbed_heap_end: 0x2000f800, size: 24708
             mbed_stack_isr_start: 0x2000f800, mbed_stack_isr_end: 0x20010000, size: 2048
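Each size in this listing is simply end minus start, and the heap ends exactly where the ISR stack begins. A small host-side sketch makes those relationships checkable; the struct `mem_region` and the helper names are invented for illustration, and the addresses are copied from the output above.

```c
#include <stdint.h>

/* A half-open address range [start, end). */
struct mem_region {
    const char *name;
    uint32_t    start;
    uint32_t    end;
};

/* Addresses copied from the printed memory map. */
const struct mem_region demo_main = { "main thread stack", 0x200066c8u, 0x20006ec8u };
const struct mem_region demo_heap = { "heap",              0x2000977cu, 0x2000f800u };
const struct mem_region demo_isr  = { "isr stack",         0x2000f800u, 0x20010000u };

/* Size of a region in bytes. */
uint32_t region_size(const struct mem_region *r)
{
    return r->end - r->start;
}

/* Non-zero if region a ends exactly where region b begins. */
int regions_adjacent(const struct mem_region *a, const struct mem_region *b)
{
    return a->end == b->start;
}

/* Non-zero if the two regions share at least one byte. */
int regions_overlap(const struct mem_region *a, const struct mem_region *b)
{
    return a->start < b->end && b->start < a->end;
}
```

Running the same overlap check across every pair of regions is a cheap way to catch a mis-sized stack or heap after changing one of the defines.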

The Solution

Increasing the main thread stack from 2KB to 4KB solved the stack overflow.

The hardest part was finding the correct compiler #define to change.

At runtime, the thread statistics printer shows exactly how much memory over 2KB is used (when built with -DMBED_HEAP_STATS_ENABLED=1 -DMBED_STACK_STATS_ENABLED=1).
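A maximum-usage figure like osThreadInfoStackMax is commonly obtained with a fill pattern: the stack is painted with a known value before the thread starts, and later the untouched words are counted from the low end. The sketch below models that on a host. Whether mbed's statistics code works exactly this way is an assumption here, and `DEMO_FILL`, `demo_stack_paint`, and `demo_stack_max_used` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_FILL 0xCCCCCCCCu  /* fill pattern chosen for this sketch */

/* Paint the whole stack before the thread first runs. */
void demo_stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++)
        stack[i] = DEMO_FILL;
}

/* Stacks grow downward, so words never written sit at the low end.
 * Count them from the bottom and return the bytes that were ever used. */
size_t demo_stack_max_used(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == DEMO_FILL)
        untouched++;
    return (words - untouched) * sizeof(uint32_t);
}
```

The result is a lower bound: a thread that happens to store the fill value, or that skips over words without writing them, will appear to use less than it really did. A stack reported as completely used should be treated as a probable overflow.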

A Better Solution

Instead of hacking the cmsis_os.h header (which isn’t really portable and would be an eternal out-of-tree patch), the better solution is to run the code in a new thread with a stack larger than 2KB.

Under mbed + RTX, you do this:

3 thoughts on “Stack, Heap, and Thread Crash Hunting in mbed OS”

  1. “The addresses are slightly different (off by one, not sure why)” It’s because of the ARM Thumb instruction set: Thumb code is distinguished by a set least significant bit in call target addresses (e.g. a thread entry point or a vector).
