Normally not very, but today was... interesting. It started yesterday, when this production server stopped allowing any logins. Not via ssh, not via console/IPMI. But it kept doing its job, so we put fixing it on the back burner until downtime was more convenient.
Well, more convenient was today. As soon as I got the go-ahead from the field crew I started shutting down the entire production cluster, preparing to boot into single-user mode, expecting this to be PAM faillock or something else trivial, caused by the operators.
Well, it was caused by the operators, but it was far from trivial. During boot: "Failed to chroot, /bin/sh: no such file or directory"....
...Shit.
At first I suspected a drive or filesystem failure, but booting into an emergency shell revealed that everything I expected to find for a healthy boot was there. Everything except ld-linux. Where it should have been, there was a broken symlink to /usr/lib64.
...the fuck?
All troubleshooting was done over a 256 kbps VSAT link, so anything I needed to type was excruciatingly slow. It took me a few hours, but I managed to transplant in the necessary libraries from dracut so that I could at least bring up a network stack. No wonder no logins worked earlier, with lib64 missing. Once I got the network up and running I could transplant in lib64 from an almost identical server in the same cluster. Chroot worked again, so it looked promising.
I issued the reboot command and crossed my fingers. It booted! Then came the time to bring up the production cluster, and that's when I noticed that someone with a GUI and fat fingers had at some point managed to accidentally move the lib64 folder to /media/lib64.
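
For anyone who ends up in the same hole: the check I was doing by hand in the emergency shell can be sketched roughly like this. It is only a sketch under assumptions that are not from the story above: an x86-64 glibc system (so the loader is named `ld-linux-x86-64.so.2`), a `lib64` at the top of the root that is either a real directory or a single-level symlink, and a hypothetical helper name `check_loader`. Absolute symlink targets are re-rooted by hand, because from a rescue environment the real root is mounted somewhere else.

```python
import os

# Assumption: x86-64 glibc. Other architectures use a different loader name.
LOADER = "ld-linux-x86-64.so.2"


def check_loader(root="/"):
    """Return a list of problems with lib64 and the dynamic loader under root."""
    problems = []
    lib64 = os.path.join(root, "lib64")
    real = lib64

    if os.path.islink(lib64):
        target = os.readlink(lib64)
        # Re-root absolute targets so the check also works on a root
        # filesystem mounted somewhere other than "/".
        # Only one level of symlink is followed here.
        if os.path.isabs(target):
            real = os.path.join(root, target.lstrip("/"))
        else:
            real = os.path.join(root, target)
        if not os.path.isdir(real):
            problems.append(f"{lib64} is a dangling symlink to {target}")
            return problems
    elif not os.path.isdir(lib64):
        problems.append(f"{lib64} is missing")
        return problems

    if not os.path.exists(os.path.join(real, LOADER)):
        problems.append(f"{LOADER} not found in {real}")
    return problems
```

From a dracut emergency shell the real root is normally mounted at `/sysroot`, so something like `check_loader("/sysroot")` would be the call, provided a Python interpreter is available in whatever rescue environment you boot. An empty list means lib64 and the loader are at least where they should be.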