As you’ll note, there was very little blog output ’round these parts two days ago (read: I didn’t post a single durn thing). This was entirely due to a very, very long 16-hour day which featured a large-scale LAN outage. The root cause was a bit of a heat problem we had about a week ago, which caused various and sundry hard drives to misbehave in highly passive-aggressive fashions, meaning reboots all around were required to shake out the cobwebs. Only, when we rebooted one of our biggest servers, it didn’t come back.
The root disk, which we have mirrored via Sun’s Volume Manager (used to be Solstice DiskSuite), was reporting that both halves of the root mirror “Need[ed] Maintenance”. Additionally, we were experiencing an error that read thusly:
Error: svc:/system/filesystem/root:default failed to mount /usr (see 'svcs -x' for details)
[ system/filesystem/root:default failed fatally (see 'svcs -x' for details) ] Requesting System Maintenance Mode
Console login service(s) cannot run
Here’s the funny thing: we don’t break the /usr partition out separately. All core/root directories are mounted in the root partition, thus this sort of message was confusing, to say the least.
We called Sun for support, as the metasync d0 recommended by Sun’s output did diddly squat and attempts to boot from either side of the root mirror only ended in failure. I sat on the phone with Sun engineers for the better part of five hours, desperately searching for an answer. Sunsolve/Docs.Sun/internal Sun engineering documentation revealed nothing for either me or the valiant support staff. Finally, while on hold yet again, in frustration I Googled the error we were receiving and came across this posting in which another user’s system was exhibiting extremely similar symptoms. Turns out that they were missing a newline at the end of /etc/vfstab and thus, by remounting the root partition as rw (instead of the ro that is the default for maintenance mode) and issuing an echo >> /etc/vfstab, they were able to get the system back up and booting. Beyond desperate, I emulated the behavior and, lo and behold, the system booted. Several sets of metadevices needed syncing, but the root came up cleanly and we were back in business.
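For anyone who would rather check before blindly appending, here is a rough sketch of the same fix as a small shell function. The name ensure_trailing_newline is made up for illustration, and it assumes a tail that accepts the POSIX -c option (older Solaris tail may want a different spelling); it is not what Sun or the original poster ran, just one way to do the echo >> step only when it is actually needed.

```shell
# Hypothetical helper (not from the original posting): append a newline
# to a file only if its last byte is not already a newline.
ensure_trailing_newline() {
    file="$1"

    # An empty or missing file has no final line to terminate.
    [ -s "$file" ] || return 0

    # Command substitution strips a trailing newline, so the result is
    # empty exactly when the file's last byte is a newline.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
}

# Assumed usage from maintenance mode, once root has been remounted
# read-write (on Solaris that is likely: mount -o remount,rw /):
#   ensure_trailing_newline /etc/vfstab
```

Running it twice is harmless, since the second call sees the newline and leaves the file alone.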
Needless to say, my Sun case engineer swore that he was going to document the case so that the next unfortunate soul who simply neglects to end a system-critical file with a stinkin’ newline can be quickly and efficiently told what to do.