Decode segfault errors in dmesg

You are writing a C program. The time has come to run it, and you are pretty confident that it will work the first time.

$ ./foo
Segmentation fault

The machine harshly reminds you that you were over-confident. But before rushing to recompile your program with debugging symbols or sprinkling printf() calls here and there, have a look at the Linux kernel log:

$ dmesg
foo[1234]: segfault at 2a ip 0000000000400511 sp 00007fffe00a3260 error 4 in foo[400000+1000]

This line of dmesg output contains several hints:

  • foo is the executable name
  • 1234 is the process ID
  • 2a is the faulting address in hexadecimal
  • the value after ip is the instruction pointer
  • the value after sp is the stack pointer
  • error 4 is an error code, also in hexadecimal
  • the string at the end is the name of the virtual memory area (VMA) that contains the instruction pointer, followed by its base address and size in brackets

The error code is a combination of several error bits defined in fault.c in the Linux kernel:

/*
 * Page fault error code bits:
 *
 *   bit 0 ==    0: no page found       1: protection fault
 *   bit 1 ==    0: read access         1: write access
 *   bit 2 ==    0: kernel-mode access  1: user-mode access
 *   bit 3 ==                           1: use of reserved bit detected
 *   bit 4 ==                           1: fault was an instruction fetch
 */
enum x86_pf_error_code {
    PF_PROT         =               1 << 0,
    PF_WRITE        =               1 << 1,
    PF_USER         =               1 << 2,
    PF_RSVD         =               1 << 3,
    PF_INSTR        =               1 << 4,
};

Since you are executing a user-mode program, PF_USER is set and the error code is at least 4. If the invalid memory access is a write, then PF_WRITE is set. Thus:

  • if the error code is 4, then the faulty memory access is a read from userland
  • if the error code is 6, then the faulty memory access is a write from userland

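Moreover, the faulting memory address in dmesg can help you identify the bug. For instance, if the memory address is 0, the root cause is probably a NULL pointer dereference. A small non-zero address, such as 2a above, often points to an access at an offset from a NULL pointer, for example a structure member or an array element.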

Because the VMA is the one that contains the instruction pointer, its name tells you which binary or library the faulting code lives in:

#include <stdlib.h>

int main(void)
{
        free((void *) 42);
        return 0;
}

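When executed, the program above triggers a segfault and the VMA name is that of the libc, which suggests that a libc function was called with an invalid pointer. The faulting address is 22 rather than 2a, most likely because glibc's free() reads the chunk header stored just before the pointer it is given.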

bar[1234]: segfault at 22 ip 7fb171207824 sp 7fff839b57d8 error 4 in libc-2.19.so[7fb17118b000+19f000]

The fault handler is architecture-dependent, so you will not see the same messages in dmesg on architectures other than x86. For instance, on ARM no message is displayed unless the Linux kernel has been built with CONFIG_DEBUG_USER.
