understanding thread stack sizes and how alpine is different

From time to time, somebody reports a bug to some project about their program crashing on Alpine.  Usually, one of two things happens: the developer doesn’t care and doesn’t fix the issue, because it works under GNU/Linux, or the developer fixes their program to behave correctly only for the Alpine case, and it remains silently broken on other platforms.

The Default Thread Stack Size

In my opinion, if your program crashes on Alpine, it is usually because it depends on behavior that is not actually guaranteed to exist, which means your program is not really portable.  With this kind of dependency, the typical culprit is the thread stack size limit.

You might be wondering: what is a thread stack, anyway?  The answer is quite simple.  Each thread has its own stack memory, because it's not really feasible for multiple threads to share the same stack.  On most platforms, that per-thread stack is much smaller than the main thread's stack, and programmers are not always aware of that discontinuity.

Here is a table of common x86_64 platforms and their default stack sizes for the main thread (process) and child threads:

OS                          Process stack size   Thread stack size
Darwin (macOS, iOS, etc.)   8 MiB                512 KiB
FreeBSD                     8 MiB                2 MiB
OpenBSD (before 4.6)        8 MiB                64 KiB
OpenBSD (4.6 and later)     8 MiB                512 KiB
Windows                     1 MiB                1 MiB
Alpine 3.10 and older       8 MiB                80 KiB
Alpine 3.11 and newer       8 MiB                128 KiB
GNU/Linux                   8 MiB                8 MiB

The OpenBSD (before 4.6) and GNU/Linux rows are the ones to watch, because they represent the smallest and largest default thread stack sizes in the table.

Because the Linux kernel overcommits memory, reserving a large stack for every thread is cheap, so glibc-based GNU/Linux systems default the thread stack size to the process stack limit (RLIMIT_STACK), which is usually 8 MiB.  This leads to a potential problem when running code developed against GNU/Linux on other systems.  As most threads only need a small amount of stack memory, other platforms use smaller limits, such as older OpenBSD releases using only 64 KiB and Alpine using at most 128 KiB by default.  Code which assumes a full 8 MiB is available to each thread can therefore crash on those platforms.
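To see which of these defaults applies on a given machine, you can ask the system directly.  The sketch below assumes a POSIX system; the helper names default_thread_stack_size and main_stack_limit are hypothetical.  It relies on POSIX saying that pthread_attr_init fills in the implementation's default for every attribute, so reading back the stack size from a fresh attribute object should report the default thread stack size, and on getrlimit(RLIMIT_STACK) for the limit behind `ulimit -s`.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Stack size a new thread gets when pthread_create() is called with
   default attributes, or 0 if it cannot be determined. */
size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

/* Soft RLIMIT_STACK in bytes (the value behind `ulimit -s`), which bounds
   the main thread's stack; 0 if it is unlimited or cannot be read. */
size_t main_stack_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (size_t)rl.rlim_cur;
}
```

Printing both values divided by 1024 from main should, on a glibc system with the usual `ulimit -s` of 8192, show the same 8 MiB figure twice, while on Alpine the thread figure should be far smaller, in line with the table above.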

If you find yourself debugging a weird crash that doesn't make sense, and your application is multi-threaded, exhausting a thread's stack is a likely cause.

What can I do about it?

To fix the issue, you will need to either change the way your program is written, or change the way it is compiled.  There are a few options you can take, depending on how much time you're willing to spend.  In most cases, these sorts of crashes are caused by manipulating a large variable which is stored on the stack.  Generally, moving the variable off the stack is the best fix, but there are alternatives.

Moving the variable off the stack

Let's say that the code has a large array stored on the stack, which causes the stack exhaustion.  In this case, the easiest solution is to move it off the stack.  There are two main approaches you can use to do this: thread-local storage and heap storage.  Thread-local storage reserves memory for variables on a per-thread basis; think of it like static storage, but with a separate copy for each thread.  Heap storage is what you're working with when you use malloc and free.

To illustrate, we will adjust this code to use each kind of storage in turn:

#include <string.h>

void some_function(void) {
    /* ~488 KiB on the stack: more than musl's default thread stack */
    char scratchpad[500000];

    memset(scratchpad, 'A', sizeof scratchpad);
}

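Thread-local variables are declared with thread_local.  In C11 it is a macro provided by threads.h, so you must include that header to use it (C23 makes thread_local a keyword).  Inside a function, it must be combined with static: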

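#include <string.h>
#include <threads.h>

void some_function(void) {
    /* One buffer per thread, reused across calls, so the function is
       no longer reentrant within a single thread. */
    static thread_local char scratchpad[500000];

    memset(scratchpad, 'A', sizeof scratchpad);
}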
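To make the "separate copy per thread" idea concrete, here is a small sketch in which two threads write different bytes into a thread-local buffer and then check that they never see each other's writes.  It assumes a C11 toolchain that ships threads.h (glibc 2.28 or later, or musl); the names fill_and_check and tls_buffers_are_private are hypothetical, and the buffer is declared at file scope only so the thread function can reach it.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>
#include <threads.h>

/* Hypothetical example: one 500000-byte scratchpad per thread, kept in
   thread-local storage rather than on the thread's stack. */
static thread_local char scratchpad[500000];

/* Thread body: fill this thread's copy with the byte pointed to by arg,
   yield so the other thread gets a chance to run, then check that the
   copy still holds only that byte. Returns arg if so, NULL if not. */
static void *fill_and_check(void *arg)
{
    const char fill = *(const char *)arg;

    memset(scratchpad, fill, sizeof scratchpad);
    sched_yield();
    for (size_t i = 0; i < sizeof scratchpad; i++)
        if (scratchpad[i] != fill)
            return NULL;
    return arg;
}

/* Starts two threads with different fill bytes and waits for both.
   Returns 1 if each thread saw only its own writes, 0 otherwise. */
int tls_buffers_are_private(void)
{
    static const char fills[2] = { 'A', 'B' };
    pthread_t threads[2];
    void *results[2] = { NULL, NULL };
    int started = 0;

    while (started < 2 &&
           pthread_create(&threads[started], NULL, fill_and_check,
                          (void *)&fills[started]) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], &results[i]);
    return started == 2 && results[0] != NULL && results[1] != NULL;
}
```

Both threads run with the default stack size, yet the 500000-byte buffers cause no trouble, because the array no longer lives on the stack.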

You can also use the heap.  The most portable example would be the obvious one:

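#include <stdlib.h>
#include <string.h>

const size_t scratchpad_size = 500000;

void some_function(void) {
    char *scratchpad = calloc(1, scratchpad_size);

    if (scratchpad == NULL)
        return;  /* allocation failed: report the error as appropriate */

    memset(scratchpad, 'A', scratchpad_size);

    free(scratchpad);
}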

However, if you don't mind giving up portability beyond GCC and Clang, you can use the cleanup attribute, which calls a function automatically when the variable goes out of scope:

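#include <stdlib.h>
#include <string.h>

/* cleanup() passes a pointer to the variable, not its value,
   so dereference it before calling free(). */
static inline void autofree_cleanup(void *p) {
    free(*(void **)p);
}

#define autofree __attribute__((cleanup(autofree_cleanup)))

const size_t scratchpad_size = 500000;

void some_function(void) {
    autofree char *scratchpad = calloc(1, scratchpad_size);

    if (scratchpad == NULL)
        return;

    memset(scratchpad, 'A', scratchpad_size);
}   /* scratchpad is freed here automatically */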

This is probably the best way to fix code like this if you don't need to support compilers such as Microsoft's.
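The main attraction of the cleanup attribute is that the handler runs on every path out of the scope, including early returns, which is where manual free calls tend to get forgotten.  The sketch below makes that visible with a counting handler; counting_free, autofree_counted, scratch_work and the cleanups_run counter are all hypothetical names used only for this demonstration, and it assumes GCC or Clang.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter, only here to make the cleanup calls observable. */
static size_t cleanups_run = 0;

/* Cleanup handler: the compiler passes a pointer to the annotated variable,
   so dereference it to get the heap pointer before freeing. */
static void counting_free(void *p)
{
    free(*(void **)p);
    cleanups_run++;
}

#define autofree_counted __attribute__((cleanup(counting_free)))

/* Returns -1 if asked to bail out early (or if malloc fails), 0 otherwise.
   The buffer is released on every path out of the function. */
int scratch_work(int bail_out_early)
{
    autofree_counted char *scratchpad = malloc(500000);

    if (scratchpad == NULL || bail_out_early)
        return -1;
    memset(scratchpad, 'A', 500000);
    return 0;
}
```

Calling scratch_work(1) and then scratch_work(0) increases cleanups_run by two: the handler fires for the early return just as it does for the normal one.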

Adjusting the thread stack size at runtime

pthread_create takes an optional pthread_attr_t pointer as the second parameter.  This can be used to set an alternate stack size for the thread at runtime:

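#include <pthread.h>

pthread_t worker_thread;

/* Thread entry point: pthread_create requires the signature void *(void *). */
void *some_function(void *arg);

int launch_worker(void) {
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);

    if (err != 0)
        return err;
    err = pthread_attr_setstacksize(&attr, 1048576);  /* 1 MiB */
    if (err == 0)
        err = pthread_create(&worker_thread, &attr, some_function, NULL);
    pthread_attr_destroy(&attr);
    return err;
}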

By setting the stack size in the attributes passed to pthread_create, the child thread gets a larger stack than the platform default.
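The launch code above starts the thread but never waits for it.  A fuller sketch of the same technique creates a worker with an explicit 1 MiB stack, joins it, and checks that a ~488 KiB stack array (too large for musl's default thread stack) was used without trouble.  The names stack_hungry_worker, run_with_big_stack and WORKER_STACK_SIZE are hypothetical, and 1 MiB is an assumed value chosen to leave headroom.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Assumed size for this example: 1 MiB, comfortably above the ~488 KiB
   the worker below keeps on its stack. */
#define WORKER_STACK_SIZE ((size_t)1 << 20)

/* Thread body with a large stack array, like the some_function examples.
   It stores the number of bytes it filled into *arg. */
static void *stack_hungry_worker(void *arg)
{
    char scratchpad[500000];
    size_t filled = 0;

    memset(scratchpad, 'A', sizeof scratchpad);
    for (size_t i = 0; i < sizeof scratchpad; i++)
        filled += (scratchpad[i] == 'A');
    *(size_t *)arg = filled;
    return NULL;
}

/* Creates the worker with an explicit stack size, waits for it, and
   returns 0 on success or the first error code from the pthread calls. */
int run_with_big_stack(size_t *filled)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err = pthread_attr_init(&attr);

    if (err != 0)
        return err;
    err = pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE);
    if (err == 0)
        err = pthread_create(&tid, &attr, stack_hungry_worker, filled);
    /* Destroying the attributes does not affect a thread already created. */
    pthread_attr_destroy(&attr);
    if (err == 0)
        err = pthread_join(tid, NULL);
    return err;
}
```

Passing the result back through the argument pointer avoids casting integers to and from void *.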

Adjusting the stack size at link time

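On modern Alpine systems, since 2018, it is possible to set the default thread stack size at link time.  This can be done by passing a linker flag through LDFLAGS, such as -Wl,-z,stack-size=1048576 (1 MiB).  The linker records the size in the binary's PT_GNU_STACK program header, and musl uses it as the default stack size for new threads.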
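To confirm that a build actually picked up the flag, a program can read its own PT_GNU_STACK header.  This is a Linux/ELF-only sketch using dl_iterate_phdr from link.h (available in both glibc and musl), which reports the main program first; read_gnu_stack and requested_stack_size are hypothetical names, and a result of 0 simply means no size was requested at link time.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* dl_iterate_phdr callback. The first object reported is the main program,
   so inspect only that one: record the PT_GNU_STACK p_memsz, then return
   non-zero to stop the iteration. */
static int read_gnu_stack(struct dl_phdr_info *info, size_t size, void *data)
{
    size_t *memsz = data;

    (void)size;
    for (unsigned i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_STACK) {
            *memsz = (size_t)info->dlpi_phdr[i].p_memsz;
            break;
        }
    }
    return 1;
}

/* Stack size requested at link time (e.g. via -Wl,-z,stack-size=...),
   or 0 if the executable does not request one. */
size_t requested_stack_size(void)
{
    size_t memsz = 0;

    dl_iterate_phdr(read_gnu_stack, &memsz);
    return memsz;
}
```

Built without the flag this should return 0; built with -Wl,-z,stack-size=1048576 it should return 1048576.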

You can also use tools like chelf or muslstack to patch pre-built binaries to use a larger stack, but this shouldn’t be done inside Alpine packaging, for example.

Hopefully, this article is helpful for those looking to learn how to solve the stack size issue.

9 thoughts on “understanding thread stack sizes and how alpine is different”

  1. Hi, great article.

    You were saying that “use tools like chelf or muslstack to patch pre-built binaries to use a larger stack, but this shouldn’t be done inside Alpine packaging”. Can you explain the reason why we shouldn’t do that in Alpine packaging?

    Thanks in advance!

  2. Heyo!

    Great writeup! You say that sometimes a patch specifically for Alpine can cause failures back in GNU/Linux, but I’m having trouble seeing why? Can you give a general example of a patch that would resolve this error in Alpine and break execution in GNU/Linux please.

    Thanks!

    1. No, I said that the GNU/Linux situation is equally broken, but appears to be fine. Because glibc uses RLIMIT_STACK for the thread stack size, setting RLIMIT_STACK to a lower setting as the system administrator will result in the same crashing behavior. You can verify that with ulimit -s.

On a personal note, I don't think replacing stack allocation with an alloc/free pair is a great idea. You see, it's an extra failure point (who checks *alloc success and does meaningful error handling?). Then, stack allocations are fully automatic, while alloc/free are pretty much a notorious way to shoot yourself in the foot. So this advice leads to more memory usage errors, more memleaks, more vulns and so on. On the other hand, you can't make these kinds of mistakes with stack variables.

    And on a side note, I wonder why Alpine can't just use a larger stack size and be more or less on par with other systems. It makes coding C programs more complicated and error prone than it should be. While it's debatable whether one should store 8MB on the stack, at least 512K or so doesn't look odd these days.

    And I guess it is up to the maintainers of a distro to dodge the quirks of their distro in the end. Adding custom linker flags is up to them. Maybe Alpine could even consider exporting (default?) LDFLAGS like that? Pushing your quirks onto someone else's head just wouldn't do.

    Yes, there are some things that "aren't portable". However, attempts to write a Perfectly Portable Program inevitably end up as some flying spaghetti monster that is scary both inside and outside, or, as another option, some virtually useless trivial program where one stands at least some chance of getting it right.
