The missing rootwait

Rootwait is a Linux kernel command line parameter that makes the kernel wait (indefinitely) for the root device to show up. This is useful for devices that are detected asynchronously, such as USB or MMC media. This post, however, is not really about rootwait. Instead, it is about something that every programmer has encountered and will encounter again.

Probably every programmer runs into situations where a small, innocent change ends up making the whole system crash and burn. On one occasion I was modifying the kernel configuration and noticed some options that were unnecessary. So, I decided to optimize a little (I know I shouldn’t make many changes at once, but I did anyway). I compiled the kernel successfully, updated the kernel binary and booted the device. The boot ended in a kernel panic, and the device went into a boot loop because of the watchdog timeout.

I was baffled. So, I reverted the changes and everything started working again. At this point I thought that the culprit had to be one of the removed or changed kernel configurations. I removed the configurations one by one, and sure enough the problem came back at some point. So, that configuration option had to be the one causing the issue.

The problem was that this option was not related to the root device in any way. It just didn’t make sense. So, I tried different combinations, and mysteriously some of them seemed to work correctly and some didn’t. Yet, I just couldn’t figure out what was actually breaking the boot.

This is probably another situation every programmer encounters. Your mind is just blank and you start to question yourself and everything. You even start to make wild guesses. Maybe the compiler is broken? You start to panic a little inside. You should know what is going on. It is your job to understand these things.

After a while you probably get an ‘oh’ moment that just makes everything clear. For me it was when I noticed an eMMC chip initialization message printed just after the root mount kernel panic. It made sense. The removed configurations had probably changed the initialization timings, which triggered the issue. I added rootwait to the kernel command line, and sure enough everything worked just fine.
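As a rough sketch of what that fix amounts to, the small Python helper below checks whether a command line string already contains the rootwait token and appends it if it is missing. The example command line (console, root device and so on) is hypothetical and not taken from my device, and the helper names are made up for illustration.

```python
def has_rootwait(cmdline: str) -> bool:
    """Return True if the bare `rootwait` token is on the command line."""
    # Kernel parameters are separated by whitespace, so compare whole tokens
    # instead of doing a substring search.
    return "rootwait" in cmdline.split()


def add_rootwait(cmdline: str) -> str:
    """Append `rootwait` unless it is already present."""
    if has_rootwait(cmdline):
        return cmdline
    return f"{cmdline.strip()} rootwait"


# Hypothetical command line, for illustration only.
example = "console=ttyS0,115200 root=/dev/mmcblk0p2 rw"
print(add_rootwait(example))
```

On a running system the active command line can be read from /proc/cmdline. Where it is actually set depends on the bootloader and the board, so the helper only shows the string manipulation, not how to deploy the change.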

So the real problem was a missing rootwait. Apart from the boot timing, it was completely unrelated to the changes that triggered it. The device had only worked by chance: the additional features took long enough to initialize that the eMMC was ready before the kernel tried to mount the root filesystem.
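The race can be illustrated with a toy model. This is only a simplified sketch of the timing argument, not how the kernel implements anything, and every number in it is invented.

```python
def boot_outcome(device_ready_ms: int, mount_attempt_ms: int, rootwait: bool) -> str:
    """Toy model of the race between root device detection and the root mount."""
    if device_ready_ms <= mount_attempt_ms:
        # The device showed up before the kernel tried to mount it.
        return "mounted"
    if rootwait:
        # With rootwait the kernel keeps waiting until the device appears.
        return "mounted after waiting"
    # Without rootwait there is nothing to mount yet.
    return "kernel panic"


EMMC_READY_MS = 900  # hypothetical time at which the eMMC is detected

# Extra features delay the mount attempt, so the boot works by chance.
print(boot_outcome(EMMC_READY_MS, mount_attempt_ms=1200, rootwait=False))
# With the features removed, the mount attempt comes too early.
print(boot_outcome(EMMC_READY_MS, mount_attempt_ms=700, rootwait=False))
# With rootwait the timing no longer matters.
print(boot_outcome(EMMC_READY_MS, mount_attempt_ms=700, rootwait=True))
```

The point of the model is that only the last case is robust. The first one works, but only for as long as nothing changes the boot timing.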
