this post was submitted on 22 Aug 2025

Casual Conversation



[–] neidu3@sh.itjust.works 1 points 6 days ago* (last edited 6 days ago)

Normally not very, but today was... interesting. It started yesterday, when this production server stopped allowing any logins. Not via ssh, not via console/IPMI. But it kept doing its job, so we put fixing it on the back burner until downtime was more convenient.

Well, more convenient was today. As soon as I got the go-ahead from the field crew, I started shutting down the entire production cluster and prepared to boot into single-user mode, expecting this to be pam_faillock or something else trivial caused by the operators.

Well, it was caused by the operators, but it was far from trivial. During boot: "Failed to chroot, /bin/sh: no such file or directory"....

...Shit.

At first I suspected a drive or filesystem failure, but booting into an emergency shell revealed that everything I looked for to confirm a healthy boot was there. Everything except ld-linux. Instead, there was a broken symlink to /usr/lib64.
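For anyone wanting to check for this kind of damage from a working rescue environment, here is a minimal Python sketch, not what I actually ran. It assumes Python is available there, and the list of candidate paths is just a guess at what matters on a merged-/usr layout. A symlink counts as broken when the link itself exists but its target does not.

```python
import os


def find_broken_symlinks(paths):
    """Return the paths that are symlinks whose target does not exist."""
    broken = []
    for path in paths:
        # islink() inspects the link itself; exists() follows it to the target.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(path)
    return broken


if __name__ == "__main__":
    # Hypothetical list of paths to check; adjust for the system at hand.
    candidates = ["/lib64", "/lib", "/bin", "/sbin"]
    for path in find_broken_symlinks(candidates):
        print(f"{path} -> {os.readlink(path)} (target missing)")
```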

...the fuck?

All troubleshooting was done over a 256 kbps VSAT link, so anything I needed to type was excruciatingly slow. It took me a few hours, but I managed to transplant in the necessary libraries from dracut so that I could at least bring up a network stack. No wonder no logins worked earlier with lib64 missing. Once I got the network up and running, I could transplant in lib64 from an almost identical server in the same cluster. Chroot worked again, so it looked promising.

I issued the reboot command and crossed my fingers. It booted! Then came the time to bring up the production cluster, and that's when I noticed that someone with a GUI and fat fingers had at some point accidentally moved the lib64 folder to /media/lib64.
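In hindsight, a search for a directory that had wandered off would have found it sooner. A rough sketch under the same assumptions (Python on hand, run against a mounted root); the function name and the skip list are made up for illustration:

```python
import os


def find_dirs_named(root, name, skip=("proc", "sys", "dev", "run")):
    """Yield real (non-symlink) directories called `name` under `root`."""
    for dirpath, dirnames, _ in os.walk(root):
        # Prune virtual filesystems at the top level, in place, so os.walk skips them.
        if dirpath == root:
            dirnames[:] = [d for d in dirnames if d not in skip]
        for d in dirnames:
            full = os.path.join(dirpath, d)
            # os.walk also lists symlinks that point at directories; ignore those.
            if d == name and not os.path.islink(full):
                yield full


if __name__ == "__main__":
    for hit in find_dirs_named("/", "lib64"):
        print(hit)
```

Any hit outside the usual places (such as /usr/lib64) is worth a second look.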