Fun In The Sun (Microsystems Server)

Or: RedHat Enterprise Linux’s `ypbind` Is Functionally Brain-Dead

WARNING/WARNUNG/ADVERTENCIA/AVERTISSEMENT: Geeky rant follows. If you don’t give a hoot about UNIX and/or Linux, you may just want to give this post a pass. -ed.
First, a little background: like many shops with a core infrastructure consisting of UNIX/*NIX servers of varying ages and configurations, we have run our network directory services using the venerable NIS directory technology provided by Sun Microsystems and implemented on nearly every single POSIX-compliant operating system on the planet. It is fast, well-understood, well-tested and generally easy to use (if set up properly). Our UNIX systems and desktops hum merrily along 99.9% of the time, blissfully confident in NIS’s ability to keep them happy and informed of the goings-on on the network. Our network is architected so that our primary (“master”) NIS server is supplemented by a lower-powered backup NIS “slave” server so that, in the event of a failure on our main server, the “slave” can take over and keep our NIS clients happy.
However, our secondary server has been having heartaches recently – apparently a patch from Sun that is supposed to prevent users from being able to overload the NIS server and cause it to

[…]prevent the ypserv(1M) NIS server process from answering NIS name service requests. A Denial of Service (DoS) may occur as clients currently bound to the NIS server may experience hangs or slow performance. Users may no longer be able to log in on affected NIS clients.

…is actually causing the server to die on its own. That’s right: we traded a potential DoS, instigated by users, for one that apparently triggers itself.
Now, this doesn’t cause an issue for Solaris clients; their NIS client software is intelligent enough to detect whether an NIS server process is running on a certain server and fail over to an alternate if said NIS server ever dies. RedHat’s (and perhaps other Linuxes’ – I don’t know because I haven’t tested other distros) NIS client isn’t this intelligent. Apparently, RH’s NIS setup uses `ping` to determine whether a server is still alive, which means that an NIS server process could die and, as long as the server hardware stayed active, Linux clients would continue to try to bind to a non-functional server, thus triggering a DoS on multiple systems. RH’s NIS client also uses `ping` to determine which NIS server to bind to; it functionally ignores the order set by DHCP servers and/or /etc/yp.conf and binds to whichever server provides the lowest latency.
All of this would be immaterial, but for one critical point: our primary server is connected to our network via a fiber optic gigabit link, while our secondary server runs on a gigabit copper link. Copper networking equipment tends to have lower latencies than its fiber equivalents, which means that, you guessed it, our Linux clients were all persistently binding to the “slave” NIS server, regardless of its actual ability to serve up directory information. Thus, when the NIS processes would die on the “slave”, all of our stupid RedHat boxes would freeze, waiting for directory service from a non-functional box whose only claim to fame at the time was a functioning NIC.
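For the code-minded, here is a minimal Python sketch of the difference between the two binding strategies described above. It is a toy model, not RedHat's or Sun's actual `ypbind` logic: the host names, latencies and the `ypserv_alive` table are all made up for illustration, and a real service probe would have to ask the server's NIS process directly rather than consult a dictionary.

```python
from typing import Callable, Dict, Iterable, Optional


def pick_by_latency(latency_ms: Dict[str, float]) -> str:
    """Bind to whichever host answers ping fastest (the behavior the post complains about)."""
    return min(latency_ms, key=latency_ms.get)


def pick_by_service(preferred: Iterable[str],
                    is_serving: Callable[[str], bool]) -> Optional[str]:
    """Walk the configured order and bind to the first host whose NIS service answers."""
    for host in preferred:
        if is_serving(host):
            return host
    return None


# Hypothetical state: the slave's NIC is up (and fast), but its ypserv has died.
latency_ms = {"nis-master": 0.40, "nis-slave": 0.25}
ypserv_alive = {"nis-master": True, "nis-slave": False}

by_latency = pick_by_latency(latency_ms)
by_service = pick_by_service(["nis-master", "nis-slave"],
                             lambda host: ypserv_alive[host])
print(by_latency, by_service)
```

The latency-only picker happily lands on the dead slave, while the service-aware picker respects the configured order and skips it.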
Needless to say, we backed that patch out and, of course, everything’s happy again in Linux Land. Hooray for cascading failures!

Friday Link Dump

Okay, time to toss out all the nifty links that I stumbled over in the past week but never really made it to “full post” status. Enjoy.
Vids

  1. Kinetic destruction visited upon an old Toyota, rubber band-style.
  2. Robin Williams guest-starred on “Whose Line Is It Anyway?”. Witness the hilarity.
  3. Do not, under any circumstances, take these guys on in Beirut/Beer Pong.
  4. You’ve got to hand it to the Japanese people – a prank show involving sauna ejector seats on a ski hill would get sued into oblivion here in the U.S.

Pix

  1. This week’s Something Awful Photoshop Phriday – Computers in Movies – resulted in some hilarious entries, in particular “Memento” and “The Color #9900FF;”. I laughed so hard that I shed a few tears, but then again, I’m a huge geek, so YMMV.
  2. The Top 10 Places to Find Free Images for Your Blog from About.com (I wonder, have they looked at acquiring aboot.ca for all their readers from Canadia?)
  3. Literal translations of old sayings – a Fark photoshop “new classic”.

Tunes

  1. The Mac and Linux versions of Songbird have been released. Play music in your browser.
  2. Re: Your Brains. An ode to workplace zombies with a distinctly They Might Be Giants flair to it. Heh. (World of Warcraft machinima video here, for those who are interested.)
  3. Birdy Nam Nam is a quartet of DJs from France who construct their music (almost) entirely using turntables. Their performance of their song “Abbesses” was enough to win them a global DJing contest. Wickedly good stuff – their entire album is worth a listen if you can snag a copy.

Why I Love The Internet, Part 308,456

…It makes me laugh. An interesting discussion popped up over at Slashdot regarding the lack of female applicants to, and therefore the lack of women being sponsored by, the GNOME project’s “Summer of Code” (sponsored by Google). The conversation revolved around the general lack of women in tech fields and spawned the following comment:

There’re no women on the internet! Everyone knows that! It’s the place where men are men, women are men, and children are fbi agents.

Now that’s comedy. Heh.

A Virtual Cornucopia Of Cool Software

Google has been on a “pro-Doug” tear recently as far as I can see, releasing first Picasa, then Google Earth for Linux, along with the cool-in-concept Google Browser Sync plugin for Firefox. The Google Sync extension only ranks cool in concept because, well, in order for it to work to its capacity, you have to store all your bookmarks, history, cookies, tabs and, most importantly, passwords on Google’s servers. The data is encrypted prior to being sent to Google, but the only secret involved is a PIN, meaning that Google has access to both the algorithm used to encrypt the data and the encrypted data itself. The PINs, they can guess. The “Oh wow!” factor is probably mitigated by how much one trusts Google to not be evil with personal data.
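To put a number on “they can guess”, here is a small Python sketch of brute-forcing a PIN-derived key. Everything in it is an assumption made for illustration: I don’t know how long the real PIN is or how Browser Sync actually derives its key, so the four-digit PIN, the PBKDF2 derivation, the `SALT` value and the integrity tag are all hypothetical stand-ins. The point is only that whoever holds the data and knows the algorithm has at most 10,000 candidates to try.

```python
import hashlib
import hmac
from typing import Optional

SALT = b"public-salt"   # hypothetical; assumed to be known to the server
ITERATIONS = 100        # kept tiny so the sketch runs quickly


def derive_key(pin: str) -> bytes:
    """Stretch a PIN into a key (an assumed scheme, not Google's)."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), SALT, ITERATIONS)


def tag(key: bytes, blob: bytes) -> bytes:
    """Integrity tag that lets anyone check whether a candidate key is right."""
    return hmac.new(key, blob, hashlib.sha256).digest()


def guess_pin(blob: bytes, stored_tag: bytes) -> Optional[str]:
    """Try every four-digit PIN until one reproduces the stored tag."""
    for n in range(10_000):
        pin = f"{n:04d}"
        if hmac.compare_digest(tag(derive_key(pin), blob), stored_tag):
            return pin
    return None


blob = b"bookmarks, history, passwords"
stored_tag = tag(derive_key("4821"), blob)   # what the server would be holding
recovered = guess_pin(blob, stored_tag)
print(recovered)
```

Cranking up the iteration count slows each guess down, but with a keyspace this small it only buys a constant factor.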
The Picasa port was accomplished using Winelib, meaning that it’s not a true native port, but I’ll take what I can get in terms of being able to run the best image management software out there. The Google Earth port is apparently native code, as it’s based on Qt. Now, we just need a SketchUp port for Linux and a Picasa port for Macs and the awesomeness will be complete.
*grin*

Network Gremlins

I don’t know if it’s a universal I.T. thing or not, but at my place of employment we sysadmins have taken to blaming any freak accident/unexplainable computer phenomenon/Series of Unfortunate Events on “gremlins”. A person couldn’t log on five minutes ago and all of a sudden, they can? Gremlins. USB sticks now mounting when, previously, they weren’t? Gremlins. You get the picture.
Well, last Friday and today have been some of the most gremlin-filled days in recent memory, bar none. We’ve all tried to be sanguine about the whole affair and just shrug our shoulders and mutter “Gremlins!”, but that only takes one so far. Perhaps we bought a cursed Cisco box with a time-delayed Curse Activation Feature without knowing it.
I came in to work on Friday to discover that no one could receive any mail, a condition that was causing no little consternation amongst the throngs shackled to their cubes. After a careful bit of investigation, the team lead and I determined What Apparently Went Wrong:

  1. We back up all of our DNS, DHCP and NIS server maps using CVS in order to keep ourselves from getting into a bad state with no easy way to back out damaging configuration changes. Somehow, our master DNS configuration file was partially overwritten so that any reference to a shared key (I’ll get to that) was removed.
  2. Our DNS tables are generated (mostly) on-the-fly by our DHCP server, which relieves us of a great deal of administrative burden. However, one can’t just have DHCP servers overwriting our DNS maps willy-nilly, a condition which we avoid by requiring access to a shared key that both DHCP and DNS can trust, thus allowing clients that are authorized in our NIS setup to request an IP from the DHCP server and have one assigned as well as have the DNS server updated.
  3. The DHCP server must be restarted/reloaded in order to read new ethers addresses from the NIS tables, which we accomplish thrice-hourly with a simple cronjob.
  4. Since the reference to the shared key was overwritten, the DHCP server was no longer able to force DNS updates, meaning that individual hosts began dropping from the DNS radar like flies.
  5. At around the same time that DNS began to fail, our primary mail server had a minor NIS hiccup that caused it to fail over to our secondary NIS server.
  6. All email addresses are fed through the NIS aliases map in order to tell the mail server who the intended recipient[s] are.
  7. Our secondary NIS server had recently been replaced with a newer, beefier box that was receiving all NIS map updates from the master server except aliases for causes not quite clear at this time, although much finger pointing was aimed in the direction of a faulty Makefile.
  8. Our mail server, unable to determine where to deliver mail, threw up its hands, spewed a whole bunch of “aliases: no such map” messages into the syslogs and contentedly queued up mail for the better part of a morning.
  9. All of which translated into: no mail for anyone until we figured this out.
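In hindsight, the first domino (the shared-key references vanishing from the DNS config) is the kind of thing a pre-commit sanity check could catch. Here is a rough Python sketch of one. The key name `dhcp-update` and the BIND-flavored config fragments are invented for illustration and are not our real configuration; the check simply counts non-comment lines that mention `key <name>` and would refuse a file that mentions it zero times.

```python
import re


def key_references(config_text: str, key_name: str) -> int:
    """Count non-comment lines that mention `key <name>` (quoted or not)."""
    pattern = re.compile(r'\bkey\s+"?' + re.escape(key_name) + r'"?(?![\w-])')
    count = 0
    for line in config_text.splitlines():
        # Crude comment stripping: drop anything after // or #.
        code = line.split("//", 1)[0].split("#", 1)[0]
        if pattern.search(code):
            count += 1
    return count


# Hypothetical "before" and "after" versions of a DNS config file.
good = '''
key "dhcp-update" { algorithm hmac-md5; secret "..."; };
zone "example.com" { type master; allow-update { key dhcp-update; }; };
'''
bad = '''
zone "example.com" { type master; allow-update { none; }; };
'''

print(key_references(good, "dhcp-update"), key_references(bad, "dhcp-update"))
```

A wrapper script could run this before every commit and bail out loudly if the count ever drops to zero.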

This email fun was followed by a raging wave of thunderstorms that swept through the area, knocking out building power first (our compute center is UPS’d and generator-backed, so no worries there) and then knocking a transformer and some Verizon telecom equipment offline. That effectively nuked our external link and a sizable portion of our surrounding area, meaning no web access to end the day. Some incorrectly-configured Macs sitting on admins’ desks gave us heartburn for a goodly portion of the day as well. Wheee!
While Friday was fun, I came in to work today fully expecting an easy day, as HGCDs (High Gremlin Count Days) are normally few and far between. However, ’twas not to be. I arrived to find my voicemail blinking and my boss standing in my office saying “Our web is down”. After running this statement through my Management-to-IT filters, I realized he was saying that no one could get to any external websites. The team lead and I poked around a bit before realizing that there is a bug in the newest version of RedHat Enterprise (the version our web proxy just happens to run) that causes the specified default route to be ignored on machines with multiple NICs, such as proxy servers. This bug was triggered when our proxy, sensing a Disturbance In The DNS Force on Friday, had run dhclient and thus begun ignoring the default route, leaving our poor proxy with no idea how to get to the content that people were requesting of it. We manually added the default route and things once again moved to Status: Hunky Dory. Problem solved, at least for now.
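Since “the default route quietly went missing” is not something I’d like to discover via my boss again, here is a small Python sketch of a check for it. It parses text in the layout of Linux’s /proc/net/route (a header line, then one route per line with hex-encoded, little-endian addresses) and reports any default routes. The sample table and its addresses are made up, and this is just one way to do it, not what we actually run.

```python
import socket
import struct
from typing import List, Tuple


def default_routes(route_table: str) -> List[Tuple[str, str]]:
    """Return (interface, gateway) pairs for every default route in the text."""
    routes = []
    for line in route_table.strip().splitlines()[1:]:   # skip the header line
        fields = line.split()
        iface, dest, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        # A default route has an all-zero destination and an all-zero mask.
        if dest == "00000000" and mask == "00000000":
            packed = struct.pack("<L", int(gateway, 16))
            routes.append((iface, socket.inet_ntoa(packed)))
    return routes


# Hypothetical two-NIC box, in /proc/net/route layout.
SAMPLE = """\
Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
eth0 00000000 0101A8C0 0003 0 0 0 00000000 0 0 0
eth0 0001A8C0 00000000 0001 0 0 0 00FFFFFF 0 0 0
eth1 0000000A 00000000 0001 0 0 0 00FFFFFF 0 0 0
"""

found = default_routes(SAMPLE)
print(found or "no default route!")
```

On a live box you would feed it the contents of /proc/net/route and have a cron job complain whenever the list comes back empty.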
As for me, I’m avoiding ladders, black cats and mirrors for the rest of the week, just to be safe.

Site News And Updates

I’ve made a few changes around here lately and thought it might be worth pointing out the (fairly subtle) modifications.
First up is the Rolling Archives visible at the bottom of the main blog page, originally conceived of by Michael Heilemann and then nicely WordPress plugin-ized by Zeo, meaning that it was dead easy to drop them into my K2 theme setup.
Secondly, check the very bottom of the sidebar to the right side of this page – I’ve added a Server Load button that gives you a fairly good idea as to just how (over)loaded the server that runs this site is at any given time. It’s using the Linux-standard /proc/loadavg, which is a small deviation from the standard UNIX load system. Basically, on a single-CPU box, any sustained load over 1.0 means that processes are queuing up for the CPU. (Right now, it’s reading between 2 and 3 fairly consistently, with spikes to well over 8 during the day. I’ve contacted my hosting provider and they have acknowledged the issue but have declined to give me an ETA for a fix. For all my hosted blogs, this affects you too, so if your sites seem slow, that’s most likely the cause). It might not end up being very useful for my visitors, but it definitely gives me warm fuzzies.
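For the curious, reading the load numbers is about as simple as it gets. Here is a minimal Python sketch, which is not the plugin’s actual code: it parses a line in /proc/loadavg format (the first three fields are the 1-, 5- and 15-minute averages) and divides by the CPU count, since a load of 2.5 means something very different on one CPU than on eight. The sample line and the CPU count are made up.

```python
import os
from typing import Optional, Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def per_cpu(load: float, cpus: Optional[int] = None) -> float:
    """Normalize a load average by CPU count (defaults to this machine's)."""
    cpus = cpus or os.cpu_count() or 1
    return load / cpus


# Hypothetical contents of /proc/loadavg on a busy two-CPU server.
sample = "2.50 2.75 3.10 4/312 20817\n"
one_min, five_min, fifteen_min = parse_loadavg(sample)
print(per_cpu(one_min, cpus=2))
```

Anything that stays above 1.0 after normalizing is a sign that work is piling up faster than the box can get to it.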
Third and last of all, Feedburner seems to have added the ability to add “social bookmarking” entries to one’s feeds, so I dropped in my del.icio.us username and now those of you reading this site via my RSS feed should begin getting daily digests of the bookmarks I posted to del.icio.us on any given day.
Drop a comment in the comments section of this post and let me know what you think, if’n y’all don’t mind…

Bill Gates: Apple’s Best Pitchman

Heh. Looks like Microsoft’s continuing troubles with security patches may be bearing fruit for Apple and Linux distributors (in the satire market, at least).
Scrappleface: Microsoft Extends Apple Sales Promotion to Jan. 10.

(2006-01-05) — Microsoft Corp. today announced that it would extend, until January 10, its program to promote the sales of rival Apple computers, as well as its drive to double the daily downloads of Linux operating systems.
The previously-secret promotional program, dubbed Operation Why Not Switch?, uses delayed-release of a Windows security patch to prompt customers to take a look at alternatives to Microsoft products.

Again I say: heh. Go and read the whole thing.

This Just Gets Better And Better

Ahhhh, Sony BMG, how we love to hate thee! Not only have you been caught installing unauthorized rootkits on your customers’ PCs, but it’s been found that you’re illegally using LGPL’d software to do so.
Tsk tsk. What’s the Japanese word for schadenfreude, anyways?
Anti-Competitive Update #1: Via Fark, we find out that Sony BMG is also leading a price-fixing cartel intent on driving online music prices up. For shame.
Anti-Competitive Update #2: Update #1 corrected to note that this is in regard to all online sales, at least in the consumer electronics realm, thus it is Sony in its entirety and not simply its BMG music division that is involved in this scheme. In the U.K. As far as I can tell.

Mmmmmm, Alpo!

Microsoft trusts its own technology, as any user of MS products will readily attest to. The compilers they release to the general public are feature-filled and rarely compromised by bugs of staggering size and scope. They are also unequivocal in their disdain for Linux, thus it comes as no surprise that the new wireless LAN they have implemented at their Redmond, WA campus runs on Microsoft technology.
Wait, what? It runs Linux, you say? Well I’ll be a monkey’s uncle!
Heh.
(Via Dougal Campbell.)