understanding thread stack sizes and how alpine is different

From time to time, somebody reports a bug to some project about their program crashing on Alpine.  Usually, one of two things happens: the developer doesn’t care and doesn’t fix the issue, because it works under GNU/Linux, or the developer fixes their program to behave correctly only for the Alpine case, and it remains silently broken on other platforms.

The Default Thread Stack Size

In general, it is my opinion that if your program is crashing on Alpine, it is because your program depends on behavior that is not guaranteed to exist, which means your program is not actually portable.  When it comes to this kind of dependency, the typical issue has to do with the thread stack size limit.

You might be wondering: what is a thread stack, anyway?  The answer, of course, is quite simple: each thread has its own stack memory, because it’s not really feasible for multiple threads to share the same stack.  On most platforms, that memory is much smaller than the main thread’s stack, though programmers are not necessarily aware of the difference.

Here is a table of common x86_64 platforms and their default stack sizes for the main thread (process) and child threads:

OS                          Process Stack Size   Thread Stack Size
Darwin (macOS, iOS, etc.)   8 MiB                512 KiB
FreeBSD                     8 MiB                2 MiB
OpenBSD (before 4.6)        8 MiB                64 KiB
OpenBSD (4.6 and later)     8 MiB                512 KiB
Windows                     1 MiB                1 MiB
Alpine 3.10 and older       8 MiB                80 KiB
Alpine 3.11 and newer       8 MiB                128 KiB
GNU/Linux                   8 MiB                8 MiB

The OpenBSD (before 4.6) and GNU/Linux entries are worth singling out, because they represent the smallest and largest default thread stack sizes in the table.

Because the Linux kernel supports memory overcommit, GNU/Linux systems use 8 MiB by default, which becomes a problem when code developed against GNU/Linux is run on other systems.  As most threads only need a small amount of stack memory, other platforms use smaller limits: older OpenBSD releases use only 64 KiB, and Alpine uses at most 128 KiB by default.  Code which assumes a full 8 MiB is available to each thread crashes on those platforms.
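If you want to check the default on the platform in front of you, a small helper along these lines can report it.  This is only a sketch: default_thread_stack_size is a name I made up for illustration, and it assumes that the platform's pthread_attr_init fills a fresh attribute object with the real default, which POSIX leaves up to the implementation.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: returns the stack size recorded in a freshly
 * initialized attribute object, or 0 if the query fails. */
size_t default_thread_stack_size(void) {
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;

    /* Read back whatever default the implementation stored in attr. */
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;

    pthread_attr_destroy(&attr);
    return size;
}
```

Printing the result from main with the %zu format specifier gives a figure you can compare against the table above.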

If you find yourself debugging a weird crash that doesn’t make sense, and your application is multi-threaded, there is a good chance that you’re exhausting the thread stack limit.

What can I do about it?

To fix the issue, you will need to either change the way your program is written, or change the way it is compiled.  There are a few options you can take to fix the problem, depending on how much time you’re willing to spend.  In most cases, these sorts of crashes are caused by attempting to manipulate a large variable which is stored on the stack.  Generally, moving the variable off the stack is the best way to fix the issue, but there are alternative options.

Moving the variable off the stack

Let’s say that the code has a large array that is stored on the stack, which causes the stack exhaustion issue.  In this case, the easiest solution is to move it off the stack.  There are two main approaches you can use to do this: thread-local storage and heap storage.  Thread-local storage is a way to reserve additional memory for thread variables.  Think of it like static, but with a separate copy for each thread.  Heap storage is what you’re working with when you use malloc and free.

To illustrate, we will adjust this code to use both kinds of storage:

#include <string.h>

void some_function(void) {
    /* roughly 488 KiB on the stack: more than a default thread stack on Alpine */
    char scratchpad[500000];

    memset(scratchpad, 'A', sizeof scratchpad);
}

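Thread-local variables are declared with the thread_local specifier.  In C11 it is a macro provided by threads.h, so you must include that header in order to use it.  A thread-local variable declared inside a function must also be declared static: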

#include <string.h>
#include <threads.h>

void some_function(void) {
    /* block-scope thread-local objects must also be static (or extern) */
    static thread_local char scratchpad[500000];

    memset(scratchpad, 'A', sizeof scratchpad);
}
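As a rough illustration of the "static, but one copy per thread" behaviour, the sketch below has two threads fill the same thread-local array with different bytes and then checks that the first thread's copy is untouched.  It uses the C11 _Thread_local keyword directly, which needs no header, and it assumes a C11 compiler with pthreads available.  The names worker and scratchpads_are_independent are made up for this example.

```c
#include <pthread.h>
#include <string.h>

/* One copy of this array exists per thread. */
static _Thread_local char scratchpad[500000];

static void *worker(void *arg) {
    (void) arg;
    /* Only touches the worker thread's own copy. */
    memset(scratchpad, 'B', sizeof scratchpad);
    return NULL;
}

/* Hypothetical check: returns 1 if the calling thread's copy survives the
 * worker's writes, 0 if it was overwritten, -1 if a pthread call failed. */
int scratchpads_are_independent(void) {
    pthread_t thread;

    memset(scratchpad, 'A', sizeof scratchpad);

    if (pthread_create(&thread, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_join(thread, NULL) != 0)
        return -1;

    return scratchpad[0] == 'A' && scratchpad[sizeof scratchpad - 1] == 'A';
}
```

The worker is created with default attributes, so its stack stays small.  The array lives in thread-local storage rather than on the stack.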

You can also use the heap.  The most portable example would be the obvious one:

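#include <stdlib.h>
#include <string.h>

const size_t scratchpad_size = 500000;

void some_function(void) {
    char *scratchpad = calloc(1, scratchpad_size);

    /* allocation can fail, so bail out rather than write through NULL */
    if (scratchpad == NULL)
        return;

    memset(scratchpad, 'A', scratchpad_size);

    free(scratchpad);
}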

However, if you don’t mind sacrificing portability outside gcc and clang, you can use the cleanup attribute:

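#include <stdlib.h>
#include <string.h>

/* a cleanup handler receives a pointer to the variable, not its value,
   so free cannot be used directly */
static inline void free_ptr(void *p) {
    free(*(void **) p);
}

#define autofree __attribute__((cleanup(free_ptr)))

const size_t scratchpad_size = 500000;

void some_function(void) {
    autofree char *scratchpad = calloc(1, scratchpad_size);

    if (scratchpad == NULL)
        return;

    memset(scratchpad, 'A', scratchpad_size);
}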

This is probably the best way to fix code like this if you’re not targeting compilers like the Microsoft one.

Adjusting the thread stack size at runtime

pthread_create takes an optional pthread_attr_t pointer as the second parameter.  This can be used to set an alternate stack size for the thread at runtime:

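#include <pthread.h>

void some_function(void);

pthread_t worker_thread;

/* thread entry points must take and return void * */
static void *worker_main(void *arg) {
    (void) arg;
    some_function();
    return NULL;
}

void launch_worker(void) {
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    /* 1 MiB; some platforms reject sizes that are not a multiple of the page size */
    pthread_attr_setstacksize(&attr, 1048576);

    pthread_create(&worker_thread, &attr, worker_main, NULL);
    pthread_attr_destroy(&attr);
}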

Because the stack size is set in the attributes passed to pthread_create, the child thread gets a larger stack.
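For a version that also checks the return values, something like the following sketch may be a useful starting point.  The names WORKER_STACK_SIZE, worker and run_worker_with_big_stack are hypothetical, and the 1 MiB figure is simply an assumption chosen to be comfortably larger than the 500000-byte array and a multiple of common page sizes.

```c
#include <pthread.h>
#include <string.h>

#define WORKER_STACK_SIZE (1024 * 1024)  /* 1 MiB, assumed large enough */

static void *worker(void *arg) {
    int *ok = arg;
    char scratchpad[500000];  /* fits because the stack was enlarged */

    memset(scratchpad, 'A', sizeof scratchpad);
    *ok = scratchpad[0] == 'A' && scratchpad[sizeof scratchpad - 1] == 'A';
    return NULL;
}

/* Returns 0 on success, the error code of the failing pthread call,
 * or -1 if the worker's own check failed. */
int run_worker_with_big_stack(void) {
    pthread_attr_t attr;
    pthread_t thread;
    int ok = 0;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE);
    if (err == 0)
        err = pthread_create(&thread, &attr, worker, &ok);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;

    err = pthread_join(thread, NULL);
    if (err != 0)
        return err;

    return ok ? 0 : -1;
}
```

Checking the result of pthread_attr_setstacksize matters here, because a rejected size would otherwise silently leave the thread with the small default stack.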

On modern Alpine systems, since 2018, it is also possible to set the default thread stack size at link time.  This can be done by adding a linker flag to LDFLAGS, like -Wl,-z,stack-size=1024768.

You can also use tools like chelf or muslstack to patch pre-built binaries to use a larger stack, but this shouldn’t be done inside Alpine packaging, for example.

Hopefully, this article is helpful for those looking to learn how to solve the stack size issue.