Bravo, Sun. Bra. VO.


As you’ll note, there was very little blog output ’round these parts two days ago (read: I didn’t post a single durn thing). This was entirely due to a very, very long 16-hour day featuring a large-scale LAN outage. The root cause was a heat problem we had about a week ago that caused various and sundry hard drives to misbehave in highly passive-aggressive fashion, meaning reboots all around were required to shake out the cobwebs. Only, when we rebooted one of our biggest servers, it didn’t come back.
The root disk, which we have mirrored via Solaris Volume Manager (formerly Solstice DiskSuite), was reporting that both halves of the root mirror “Need[ed] Maintenance”. Additionally, we were seeing an error that read thusly:

Error: svc:/system/filesystem/root:default failed to mount /usr (see 'svcs -x' for details)
[ system/filesystem/root:default failed fatally (see 'svcs -x' for details) ]
Requesting System Maintenance Mode
Console login service(s) cannot run

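When an SMF service drops into maintenance mode like this, the two most useful starting points are svcs -x itself and the service’s own log. A quick sketch; note that the log filename below just follows SMF’s usual FMRI-to-filename convention under /var/svc/log, so verify the exact name on your own box:

# ask SMF to explain the broken service in detail
svcs -xv svc:/system/filesystem/root:default
# read the per-service log for the actual mount errors
more /var/svc/log/system-filesystem-root:default.log
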
Here’s the funny thing: we don’t break /usr out as a separate partition. All the core system directories live on the root partition, so this sort of message was confusing, to say the least.
We called Sun for support, as the metasync d0 recommended by Sun’s own output did diddly squat and attempts to boot from either side of the root mirror only ended in failure. I sat on the phone with Sun engineers for the better part of 5 hours, desperately searching for an answer. SunSolve, docs.sun.com, and internal Sun engineering documentation revealed nothing for either me or the valiant support staff.
Finally, while on hold yet again, I Googled the error in frustration and came across this posting, in which another user’s system was exhibiting extremely similar symptoms. It turns out they were missing a newline at the end of /etc/vfstab, and by remounting the root partition rw (instead of the ro that is the default in maintenance mode) and issuing an echo >> /etc/vfstab, they were able to get the system back up and booting. Beyond desperate, I tried the same thing and, lo and behold, the system booted. Several sets of metadevices needed syncing, but root came up cleanly and we were back in business.
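For posterity, here is roughly what the fix boiled down to, pieced together from the steps above. Treat it as a sketch, not a canned procedure: the d0 metadevice name is just the one from this incident, the remount syntax is the common form, and you should eyeball /etc/vfstab yourself before (and after) blindly echoing into it.

# the maintenance-mode shell leaves / mounted read-only, so make it writable first
mount -o remount,rw /
# append the missing trailing newline to vfstab
echo >> /etc/vfstab
# reboot, then resync whichever mirrors still report "Needs Maintenance"
reboot
metastat | grep -i maint
metasync d0
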
Needless to say, my Sun case engineer swore he was going to document the case so that the next unfortunate soul who simply neglects to end a system-critical file with a stinkin’ newline can be quickly and efficiently told what to do.
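For the curious, here’s a hedged sketch of what a minimal SVM-mirrored-root vfstab tends to look like on a Solaris 10 box like ours. The d0/d1 metadevice names are illustrative assumptions; the part that matters is the seven-column layout (device to mount, device to fsck, mount point, FS type, fsck pass, mount at boot, mount options) and, of course, that the last line ends with a newline.

#device          device            mount    FS     fsck  mount    mount
#to mount        to fsck           point    type   pass  at boot  options
fd               -                 /dev/fd  fd     -     no       -
/proc            -                 /proc    proc   -     no       -
/dev/md/dsk/d1   -                 -        swap   -     no       -
/dev/md/dsk/d0   /dev/md/rdsk/d0   /        ufs    1     no       -
swap             -                 /tmp     tmpfs  -     yes      -
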

4 Comments

BTW, one can tell (X)Emacs, as I do, to always end such files with a newline automatically. It is the default behavior of XEmacs to warn (in the minibuffer) on save if there is no newline.

@Aron Rubin
(X)Emacs are certainly nice choices when coding, but for sysadmin tasks they’re usually overkill. When ssh’d into a box, you simply need a lightweight editor (joe, pico, nano, or my favorite, vim) that is guaranteed to be there and doesn’t have relatively obscene system requirements (no LISP necessary!).
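And if you’d rather not trust the editor (or your eyes) at 2 a.m., a quick check from the shell does the trick. A small sketch, assuming a tail that understands -c (the XPG4 one on Solaris does):

# append a newline only if the file's last byte isn't already one
f=/etc/vfstab
[ -n "$(tail -c 1 "$f")" ] && echo >> "$f"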

Seriously, Sun (uh, Oracle now via inheritance) are the biggest clowns. I just ran into this myself, and after 4 hours of my own time plus another 3 hours going over it with Sun engineers, Sun recommended updating the boot archive (which I knew wouldn’t work) and, if that didn’t help, reinstalling 127127-11 (what!?!). The engineer said, “Looks like SVM is corrupt. If those things don’t work you are going to have to re-install the OS.” Unbelievable. After chopping my vfstab down to the nitty gritty, I got a successful boot. At about that same time, and unfortunately not sooner, I found your blog post. Wish I had found it last night… (*grrrr*sigh*) Why SMF can’t do us the courtesy of checking for a newline at the end of vfstab is beyond me. Lazy bastards!