Introduction
Wild
Wild is a new open source linker, written by David Lattimore. It overdelivers on its promises to be wildly fast. I’m obsessed with toolchain speed, so naturally, I gravitated to Wild. At first, I was cautiously optimistic, but found that it just worked as a drop-in replacement for LLD and Mold. Integrating Wild into my development workflow has made a noticeable difference to my iteration speed. For example, Wild can link Clang in 120ms, which is incredibly fast. That’s less than the time it takes a packet to leave my laptop in Johannesburg and arrive at my development machine in Dublin.
Breaking out of the Linux monoculture
A few months ago, Colin Percival wrote up a blog post detailing his recent work on FreeBSD. As I read this blog post, it dawned on me that I have spent almost all of my career immersed in a Linux monoculture, and that this was probably to my detriment. At the same time, I’ve been meaning to learn how to use DTrace for a while. For the last few weeks, I’ve been running FreeBSD and Illumos (OmniOS), and it’s been an incredibly fun experience. While writing Rust on Illumos, I noticed my builds getting snagged on something slow, right at the end, again and again. I quickly confirmed my suspicions that it was the linker, and set about replacing it with Wild.
Running Wild on Illumos
Like Linux and the BSDs, binaries on Illumos are linked according to the System V ABI.
It therefore seemed reasonable to simply try Wild and see what happened.
I checked out Wild (at commit f2f9776) and ran its test suite:
test result: FAILED. 10 passed; 67 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.25s
Of the 77 integration tests, only 10 passed. These tests are small, self-contained programs that Wild links. Each integration test consists of four steps, modeled in the sketch after this list:
- Compile the source code.
- Link with Wild and GNU ld (and possibly Mold and Gold if present).
- Run the resulting executable.
- Compare the output of Wild against the other linkers.
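For concreteness, the pipeline boils down to something like this toy model (a sketch only: the commands, file names, and the run helper are illustrative, and the real harness in Wild’s test suite does considerably more):

// test_pipeline.cpp - a toy model of one integration test, not Wild's harness.
// Commands and file names are illustrative placeholders.
#include <cstdlib>
#include <string>

// Run a shell command and return true if it exited successfully.
static bool run(const std::string &cmd) {
    return std::system(cmd.c_str()) == 0;
}

int main() {
    // 1. Compile the test program.
    if (!run("cc -c trivial.c -o trivial.o")) return 1;
    // 2. Link it with Wild and with a reference linker (GNU ld here).
    if (!run("wild trivial.o -o trivial.wild")) return 1;
    if (!run("ld trivial.o -o trivial.ld")) return 1;
    // 3. Run both resulting executables, capturing their output.
    if (!run("./trivial.wild > wild.out")) return 1;
    if (!run("./trivial.ld > ld.out")) return 1;
    // 4. The outputs must match for the test to pass.
    return run("diff wild.out ld.out") ? 0 : 1;
}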
I decided to try to identify why these tests were failing and to fix them, one by one.
On analyzing the first test case, I found that the program Wild linked received a SIGKILL as soon as I tried to run it.
That is, it was failing at the third step:
❯ /home/omnios/wild/wild/tests/build/trivial.c-default-host.wild
zsh: killed /home/omnios/wild/wild/tests/build/trivial.c-default-host.wild
When the exact same program was linked by GNU ld, it ran just fine.
Wild’s test suite comes with an invaluable tool - linker-diff - for comparing the output of Wild to other linkers.
I ran the linker-diff tool and found dozens of differences between the binaries, but nothing stood out as clearly wrong in the binary Wild produced.
At this point, my instinct was to trace this program in GDB - surely the program was doing something illegal, and I could single-step until I found it?
After all, it was a trivial program of just a few hundred instructions.
I ran it under GDB and placed a breakpoint on the entry point, which was the symbol _start:
(gdb) b _start
Breakpoint 1 at 0x401374
(gdb) r
Starting program: /home/omnios/wild/wild/tests/build/trivial.c-default-host.wild
During startup program terminated with signal SIGKILL, Killed.
(gdb)
I double checked that _start was indeed the entry point as designated in the ELF file.
SIGKILL?
What gives?
Enter DTrace
At this point, I began trying to identify what was sending my program the SIGKILL by tracing it with DTrace.
I knew that the system had tens of thousands of DTrace probes.
I grepped the probe list for anything signal-related, and filtered out the function boundary tracing (fbt) probes:
❯ sudo dtrace -l | grep -i signal | grep -iv fbt
314 syscall lwp_cond_signal entry
315 syscall lwp_cond_signal return
702 proc genunix sigtimedwait signal-clear
705 proc genunix sigtoproc signal-send
706 proc genunix sigtoproc signal-discard
707 proc genunix psig signal-handle
2203 sdt ip cc_cong_signal cwnd-cc-cong-signal
This felt a little bit like cheating. It couldn’t be that easy to hook into whatever is sending a signal, right? Wrong! It is that easy. In fact, the probe even comes with documentation:
❯ sudo dtrace -lv -n signal-send
ID PROVIDER MODULE FUNCTION NAME
705 proc genunix sigtoproc signal-send
Probe Description Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: Unknown
Argument Attributes
Identifier Names: Evolving
Data Semantics: Evolving
Dependency Class: ISA
Argument Types
args[0]: lwpsinfo_t *
args[1]: struct psinfo *
args[2]: int
I confirmed in the online documentation that the third argument was the signal number. I then wrote a one-liner to print a kernel stack trace whenever this probe fired, and re-ran my program in another tmux pane:
❯ sudo dtrace -n 'proc:::signal-send /args[2] == 9/ { stack(); }'
dtrace: description 'proc:::signal-send ' matched 1 probe
CPU ID FUNCTION:NAME
4 705 sigtoproc:signal-send
genunix`psignal+0x34
elfexec`elfexec+0x480
genunix`gexec+0x667
genunix`exec_common+0x73b
genunix`exece+0x58
unix`sys_syscall+0x1a8
The elfexec+0x480 frame in the call stack is clearly in the ELF loader.
I now had the exact address of the code that was rejecting the binary Wild produced.
At this point I asked Claude for advice, and it suggested using mdb to disassemble the code at this kernel address, which I duly ran:
❯ TERM=xterm sudo mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci zfs ip hook neti sockfs arp usba stmf stmf_sbd mm lofs sata sd random cpc ufs logindmux ptm klmmod nfs ]
> elfexec+0x480::dis
elfexec+0x450: xorl %r13d,%r13d
elfexec+0x453: xorl %ebx,%ebx
elfexec+0x455: call +0x7b39286 <uprintf>
elfexec+0x45a: movl -0xf0(%rbp),%edi
elfexec+0x460: cmpl $-0x1,%edi <0xffffffff>
elfexec+0x463: jne +0x97 <elfexec+0x500>
elfexec+0x469: movq 0xffffffffffffff00(%rbp),%rdi
elfexec+0x470: movl $0x9,%esi
elfexec+0x475: movl $0x8,%r15d
elfexec+0x47b: call +0x7b73ea0 <psignal>
elfexec+0x480: testl %r13d,%r13d
elfexec+0x483: je +0x11 <elfexec+0x496>
elfexec+0x485: movq -0x98(%rbp),%rdi
elfexec+0x48c: movl $0x38,%esi
elfexec+0x491: call +0x7afc35a <kmem_free>
elfexec+0x496: movq -0xe0(%rbp),%rdi
elfexec+0x49d: testq %rdi,%rdi
elfexec+0x4a0: je +0x9 <elfexec+0x4ab>
elfexec+0x4a2: movq -0x80(%rbp),%rsi
elfexec+0x4a6: call +0x7afc345 <kmem_free>
elfexec+0x4ab: testq %rbx,%rbx
Unfortunately, the OmniOS kernel that I was using did not have debug symbols built in, and I could not find an easy way to obtain them. Had I had debug symbols, the next few steps I took would have been unnecessary.
Anyhow, I opened up the source code at the branch used to stamp out this kernel (usefully, the exact commit is obtainable from uname -a: omnios-r151054-6ad70ba62c).
I then navigated to the definition of elfexec and searched for callsites of psignal.
Luckily, there was only one:
bad:
if (fd != -1) /* did we open the a.out yet */
(void) execclose(fd);
psignal(p, SIGKILL);
Now, all I had to do was find the branch that jumped here with a goto bad.
Searching for goto bad, I was disheartened to find 24 results, but I very quickly noticed most of them were gated by this if statement:
if (intphdr != NULL) {
	/* ... */
	/* Many of the gotos live here. */
	/* ... */
}
Tracing the data flow into this intphdr pointer, I found that it was passed to the function mapelfexec:
static int
mapelfexec(
vnode_t *vp,
Ehdr *ehdr,
uint_t nphdrs,
caddr_t phdrbase,
Phdr **uphdr,
Phdr **intphdr,
Phdr **stphdr,
Phdr **dtphdr,
Phdr *dataphdrp,
caddr_t *bssbase,
caddr_t *brkbase,
intptr_t *voffset,
uintptr_t *minaddrp,
size_t len,
size_t *execsz,
size_t *brksize)
and within this function, the intphdr out-parameter was set inside this branch:
case PT_INTERP:
/*
* The ELF specification is unequivocal about the
* PT_INTERP program header with respect to any PT_LOAD
* program header: "If it is present, it must precede
* any loadable segment entry." Linux, however, makes
* no attempt to enforce this -- which has allowed some
* binary editing tools to get away with generating
* invalid ELF binaries in the respect that PT_INTERP
* occurs after the first PT_LOAD program header. This
* is unfortunate (and of course, disappointing) but
* it's no worse than that: there is no reason that we
* can't process the PT_INTERP entry (if present) after
* one or more PT_LOAD entries. We therefore
* deliberately do not check ptload here and always
* store dyphdr to be the PT_INTERP program header.
*/
*intphdr = phdr;
break;
How interesting! I’d found a warning about dodgy ELF files being allowed on Linux and not Illumos. I was getting closer. I then checked the binary Wild produced, to see what program headers it actually had:
❯ readelf -l /home/omnios/wild/wild/tests/build/trivial.c-default-host.wild
Elf file type is EXEC (Executable file)
Entry point 0x401370
There are 4 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000000e0 0x00000000000000e0 R 0x8
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000000370 0x0000000000000370 R 0x1000
LOAD 0x0000000000000370 0x0000000000401370 0x0000000000401370
0x0000000000000031 0x0000000000000031 R E 0x1000
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
Section to Segment mapping:
Segment Sections...
00
01 .eh_frame
02 .text
03
Well, that was disappointing: the particulars of the block comment advising caution were actually irrelevant.
The branch that sets intphdr could not possibly have been taken, because this binary doesn’t have a PT_INTERP program header.
Using this information, I now knew that the if (intphdr != NULL) branch also could not have been taken.
That left only three branches:
if (error != 0)
goto bad;
if (uphdr != NULL && intphdr == NULL)
goto bad;
if (dtrphdr != NULL && dtrace_safe_phdr(dtrphdr, args, voffset) != 0) {
uprintf("%s: Bad DTrace phdr in %s\n", exec_file, exec_file);
goto bad;
}
The first and third branches were trivial to rule out with the exact same data-flow analysis I had done for intphdr.
Again, I would have preferred to have debug symbols, and to have used mdb, but I did not wind up needing them.
Nonetheless, I was astounded.
It took me only 15 minutes to find the line of code in the kernel that rejected the binary produced by Wild.
Concretely, Illumos disallowed the loading of a binary that had a PHDR program header without a corresponding PT_INTERP program header.
Note that the nomenclature can be quite confusing - PHDR self-referentially refers to the program header table itself, i.e. the byte range within the ELF file that contains the program headers.
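To make the rejected shape concrete, here is a minimal sketch of the same acceptance check done from user space. It assumes the Elf64 definitions from elf.h as found on Linux (Illumos keeps equivalent definitions in sys/elf.h), handles only 64-bit ELF files, and trims error handling:

// phdr_check.cpp - a minimal sketch of the program-header check that
// elfexec() enforces; 64-bit ELF only, minimal error handling.
#include <elf.h>
#include <cstdio>
#include <vector>

int main(int argc, char **argv) {
    if (argc != 2) {
        std::fprintf(stderr, "usage: %s <elf-file>\n", argv[0]);
        return 2;
    }
    std::FILE *f = std::fopen(argv[1], "rb");
    if (!f) {
        std::perror("fopen");
        return 2;
    }

    Elf64_Ehdr ehdr{};
    if (std::fread(&ehdr, sizeof ehdr, 1, f) != 1) return 2;

    std::vector<Elf64_Phdr> phdrs(ehdr.e_phnum);
    std::fseek(f, static_cast<long>(ehdr.e_phoff), SEEK_SET);
    if (std::fread(phdrs.data(), sizeof(Elf64_Phdr), phdrs.size(), f) != phdrs.size())
        return 2;
    std::fclose(f);

    bool has_phdr = false, has_interp = false;
    for (const Elf64_Phdr &ph : phdrs) {
        has_phdr |= (ph.p_type == PT_PHDR);
        has_interp |= (ph.p_type == PT_INTERP);
    }

    // The equivalent of `if (uphdr != NULL && intphdr == NULL) goto bad;`
    // in elfexec(): a PHDR entry with no PT_INTERP alongside it.
    if (has_phdr && !has_interp) {
        std::printf("elfexec() would reject this binary (SIGKILL)\n");
        return 1;
    }
    std::printf("program headers pass this check\n");
    return 0;
}

Run against the failing test binary above, this would report the rejection: the readelf output shows a PHDR entry but no INTERP.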
I duly filed an issue explaining my findings, and then followed up with a very janky workaround PR.
This PR ignored the -C flag expected by the system linker, and manually set the expected dynamic linker on Illumos.
We were now passing 27 tests:
test result: FAILED. 27 passed; 51 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.49s
Linking Rust programs with Wild
The simplest way to link Rust programs with Wild is to instruct Cargo to delegate linking to the Clang driver, and to pass the Clang driver the name of a linker:
[target.x86_64-unknown-illumos]
linker = "clang"
rustflags = [
"-C", "link-arg=--ld-path=/home/omnios/.cargo/bin/wild",
]
Unfortunately, this simply did not work.
First, the link was painfully slow. Then I checked for the calling card Wild leaves in the .comment section, and it was nowhere to be found:
❯ readelf -p.comment target/debug/rg
String dump of section '.comment':
[ 1] rustc version 1.90.0 (1159e78c4 2025-09-14)
[ 2d] GCC: (OmniOS 151054/14.2.0-il-1) 14.2.0
[ 55] @(#)illumos May 2025
I then tried the older -fuse-ld argument, to no avail:
= note: clang: error: invalid linker name in argument '-fuse-ld=wild'
After confirming Wild was indeed on the $PATH, I passed in its absolute path:
[target.x86_64-unknown-illumos]
linker = "clang"
rustflags = [
"-C", "link-arg=-fuse-ld=/home/omnios/.cargo/bin/wild",
]
Success!
❯ readelf -p.comment target/debug/rg
String dump of section '.comment':
[ 0] GCC: (OmniOS 151054/14.2.0-il-1) 14.2.0
[ 29] @(#)illumos May 2025
[ 3f] rustc version 1.90.0 (1159e78c4 2025-09-14)
[ 6b] Linker: Wild version 0.6.0
Despite the apparent success, this left me with more questions than answers:
- Why is --ld-path silently ignored by Clang?
- Why does -fuse-ld work in such a surprising way?
  - This argument doesn’t even show up in the --help for the Clang driver.
  - It accepts ld as a non-absolute path, but nothing else I tried.
- Why does the driver pass -C to the provided linker regardless of what it is?
  - It passes -C to GNU ld, which of course explodes with the error /usr/gnu/bin/ld: unrecognized option '-C'.
Clang driver
At this point, I opened the source code for the Clang driver on GitHub. I immediately discovered that Solaris and its derivatives such as Illumos have their own driver, with specific logic:
// Accept 'bfd' and 'gld' as aliases for the GNU linker.
if (UseLinker == "bfd" || UseLinker == "gld")
// FIXME: Could also use /usr/bin/gld here.
return "/usr/gnu/bin/ld";
I checked out and built the driver from source, and ran a few experiments with it to determine answers to my questions.
- The --ld-path argument was ignored because it had simply never been implemented in the Solaris driver.
- The -fuse-ld argument is undocumented, presumably because it is “deprecated”.
- The driver is bimodal: it has one mode for running the Solaris Link Editor and one for running GNU ld.
- The -fuse-ld argument accepts absolute paths to linkers, but drives the linker as if it is the Solaris Link Editor, no matter what.
The Solaris driver code was full of FIXME and TODO comments, so I resolved to fix them.
I changed the driver logic to accept --ld-path as an argument, and to drive that linker in a way that is compatible with GNU ld - which all of the alternative linkers aim to be.
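In spirit, the change amounts to the decision below. This is a toy model in plain C++ - Dialect, LinkerChoice, and selectLinker are made-up names, not Clang’s API - but it captures the behavior: --ld-path picks the linker and switches the driver into a GNU-ld-compatible mode.

// linker_choice.cpp - a toy model of the decision the patched driver makes.
// Not Clang's actual code: all names here are illustrative.
#include <iostream>
#include <optional>
#include <string>

enum class Dialect { SolarisLinkEditor, GnuCompatible };

struct LinkerChoice {
    std::string path;
    Dialect dialect;
};

// ldPath models the value of --ld-path, if the user passed one.
static LinkerChoice selectLinker(const std::optional<std::string> &ldPath) {
    if (ldPath) {
        // Alternative linkers (Wild, Mold, LLD) all aim to be GNU ld
        // compatible, so drive anything named by --ld-path with
        // GNU-style arguments.
        return {*ldPath, Dialect::GnuCompatible};
    }
    // Default: the Solaris Link Editor, driven with its own argument
    // dialect (e.g. the -C flag that GNU ld rejects).
    return {"/usr/bin/ld", Dialect::SolarisLinkEditor};
}

int main() {
    LinkerChoice choice = selectLinker(std::string("/home/omnios/.cargo/bin/wild"));
    std::cout << choice.path << " driven in "
              << (choice.dialect == Dialect::GnuCompatible ? "GNU-compatible"
                                                           : "Solaris")
              << " mode\n";
}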
I changed the Cargo config to use the modified driver:
[target.x86_64-unknown-illumos]
linker = "/home/omnios/llvm-project/build/bin/clang"
rustflags = [
"-C", "link-arg=--ld-path=/home/omnios/.cargo/bin/wild",
]
Wild then errored out as follows:
wild: error: -m elf_x86_64_sol2 is not yet supported
This was good news.
It meant that the modified driver was indeed driving Wild as if it were GNU ld!
I raised a small PR in Wild to accept elf_x86_64_sol2 as a valid value for the so-called “emulation” argument and to set the dynamic linker to /lib/amd64/ld.so.1 if this value is provided.
This also let us remove the janky workaround from earlier.
I confirmed that the produced ripgrep binary worked as expected.
Finally, Wild was working properly on Illumos!
I raised PR #163000 in Clang to change the driver to work sensibly. Hopefully this PR will get merged, so that using Wild on Illumos will be as seamless as it is on Linux.