this post was submitted on 10 Apr 2025
180 points (98.4% liked)

Home

Lemmy.zip instance discussion.

For all things relating to Lemmy.zip.

Main instance rules apply, with the additional rules below:

founded 2 years ago

Hi All,

As some of you may have realised, the planned upgrade sort of crashed everything, and we had our longest period of downtime since the site began.

This is partly because I had to go to sleep (thanks to a newborn and a job).

The good news is that the backup process worked! We've restored to seconds before the upgrade took the site offline.

The bad news is that federation is likely to be.. wonky.. for a little while. The site may also go up and down while I undo some of the fixes I tried.

Ultimately the issue came down to the upgrade failing (I am not sure why; I will be digging into this now that the priority is no longer getting the site up) and then the containers not talking to each other, so the UI wouldn't talk to lemmy, and lemmy wouldn't talk to the database.

I rebuilt the containers, restored the backup, restarted everything, and it's all come back up (admittedly not perfect right now).

Importantly, I want to issue an apology. This isn't what I want for Lemmy.zip, and it should've been handled way better by me. I'm always learning, but this took way longer than it should've, and while I take some solace in the fact that the backup process worked and has been proven to work in production, the delay in getting the site back up is entirely my fault and frankly unacceptable.

I'll be working to document this outage, the steps it took to get it back up, and some form of repeatable plan so a repair can be replicated in the future if I'm not available.

In terms of upgrading to 0.19.11 - I will have to try again soon as it's got some security fixes we desperately need to implement.

Thanks

Demigodrick

[–] tigeruppercut@lemmy.zip 57 points 4 months ago (1 children)

"Importantly, I want to issue an apology"

Way I see it, family and mental health always come before internet randos. Thanks for working hard for everyone.

[–] Demigodrick@lemmy.zip 11 points 4 months ago

Lots of Internet randos have been very nice and supportive, so I feel a debt to the community to make this place the best it can be.

But thank you ❤️

[–] Demigodrick@lemmy.zip 46 points 4 months ago (1 children)

I will try and reply to each comment - but you've all been really kind and that means so much ❤️

If you're interested, this graph will show you how far behind we are. We should eventually catch up, but things will likely be very delayed for up to 12 hours.

The status page did not work as expected, and I'll try to link a few more places where I post updates. If you haven't yet, definitely join the Matrix space and you'll get minute-by-minute panic updates 🫠

[–] shortwavesurfer@lemmy.zip 7 points 4 months ago (1 children)

That graph is really kind of neat, but it seems to only be synchronizing with a single instance at a time from what I can tell. I saw the world line has dropped significantly, but the other lines don't look like they've fallen yet.

[–] Demigodrick@lemmy.zip 12 points 4 months ago (1 children)

Yes, the lemmy.world admins kindly manually reset the timer for their instance so it started updating straight away!

If an instance goes down, other instances gradually back off retrying activities, so as not to waste effort sending them to dead instances.

You can use this tool to see this info. It links lemmy.world but you can search for any instance, and then look up lemmy.zip either under failed or lagging instances. You'll see on the far right the "next send try" time and date. Looks like a lot will try again around 9pm (although I'm not entirely sure on the timezone there) - so over the next few hours instances will send another try, see that lemmy.zip is back up, and then start federation with us again :)
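For anyone curious how that back-off might look, here is a minimal sketch. It is not Lemmy's actual code: the 2-second base, the doubling per failure, and the one-day cap are all assumptions picked for illustration, and `retry_delay` and `next_send_try` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

BASE_DELAY_S = 2.0          # assumed base delay in seconds (not taken from Lemmy)
MAX_DELAY_S = 24 * 60 * 60  # assumed cap of one day between attempts
MAX_EXPONENT = 32           # keeps the power small enough to compute safely


def retry_delay(fail_count: int) -> timedelta:
    """Wait time after `fail_count` consecutive failed sends (doubles each time)."""
    if fail_count <= 0:
        return timedelta(0)
    exponent = min(fail_count, MAX_EXPONENT) - 1
    seconds = min(BASE_DELAY_S * 2 ** exponent, MAX_DELAY_S)
    return timedelta(seconds=seconds)


def next_send_try(last_attempt: datetime, fail_count: int) -> datetime:
    """Earliest time the sending instance would try the dead instance again."""
    return last_attempt + retry_delay(fail_count)


# Example: after 10 failures in a row the sender waits 1024 seconds.
last = datetime(2025, 4, 10, 12, 0, tzinfo=timezone.utc)
print(next_send_try(last, 10))
```

Under these assumptions, an instance that has been unreachable for hours ends up with retries spaced up to a day apart, which is why catching up can lag well behind the site coming back unless the remote admin resets the timer.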

[–] Omgboom@lemmy.zip 39 points 4 months ago* (last edited 4 months ago) (3 children)

I've been there. But it is my honor to bestow upon you this award to commemorate the accomplishment

[–] locuester@lemmy.zip 19 points 4 months ago (1 children)

Ah yes. I still wear my 25-year-old “deleted a prod database” badge with honor

[–] swizzlestick@lemmy.zip 17 points 4 months ago

It's a bittersweet honour to have. My personal fail was being too cocky updating a 'handful' of product descriptions.

(15398 row(s) affected)

[–] Debs@lemmy.zip 28 points 4 months ago (1 children)

Thanks for all your hard work. We missed .zip while it was gone.

[–] Blablablabum@lemmy.zip 27 points 4 months ago* (last edited 4 months ago) (4 children)

Thanks for the update.

I was a bit worried for your mental health as the hours of downtime continued :)

Awesome that the backup restore procedure worked that well.

One thing I have been wondering is why status.lemmy.zip stayed all green during all of this.

[–] BrikoX@lemmy.zip 8 points 4 months ago (2 children)

Because it was technically working; it's just that "UI wouldn't talk to lemmy, and lemmy wouldn't talk to the database". So the services were running, but not communicating with each other.
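A rough sketch of a status check that looks at each hop rather than at whether a process is running, in case it helps. Everything here is an assumption for illustration: `http_ok`, `check_chain`, and `overall_status` are made-up helper names, the URLs in the comment are placeholders, and I am assuming the API endpoint shown needs a database round trip to answer. It is not how status.lemmy.zip is actually configured.

```python
import urllib.request
from typing import Callable, Dict

Probe = Callable[[], bool]


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200


def check_chain(probes: Dict[str, Probe]) -> Dict[str, str]:
    """Run every probe; a probe that returns False or raises counts as down."""
    results = {}
    for name, probe in probes.items():
        try:
            results[name] = "up" if probe() else "down"
        except Exception:
            results[name] = "down"
    return results


def overall_status(results: Dict[str, str]) -> str:
    """Report green only when every hop in the chain answered."""
    all_up = bool(results) and all(state == "up" for state in results.values())
    return "operational" if all_up else "degraded"


# Example wiring (placeholder URLs, assumed endpoint):
# probes = {
#     "ui": lambda: http_ok("https://lemmy.zip/"),
#     "api+db": lambda: http_ok("https://lemmy.zip/api/v3/site"),
# }
# print(overall_status(check_chain(probes)))
```

The point of the design is that one unreachable hop turns the whole page "degraded", even if every container reports itself as running.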

[–] possiblylinux127@lemmy.zip 5 points 4 months ago (1 children)

I was a little worried that he got arrested by the UK POLICE due to the online privacy act

[–] Demigodrick@lemmy.zip 6 points 4 months ago

😆 I heard a siren the other day and thought "oh shit, they've found me!"

[–] 0x0@lemmy.zip 17 points 4 months ago (1 children)

"entirely my fault and frankly unacceptable."

You're providing a service out of your own time, pocket and energy; you don't owe anyone.
It's the other way around, we owe you.

So thank you.
Learn from your mistakes and carry on. 👍

[–] LiveLM@lemmy.zip 16 points 4 months ago* (last edited 4 months ago) (1 children)

Dude, you're being wayyyy too harsh on yourself!
You run this awesome instance for free while caring for a newborn, you don't owe anybody nothing.
Forget the delay, forget apologies and "unacceptable". Real life comes before social media, don't beat yourself up for the outage.

People who can't stand downtime should practice personal redundancy by creating backup accounts on other instances ;)

[–] Blaze@lemmy.zip 16 points 4 months ago

Thank you for this post. Don't be so harsh on yourself, everyone can make a mistake!

Good to see Lemmy.zip back up!

[–] rumba@lemmy.zip 15 points 4 months ago (2 children)

Dude, keeping this running with a job and a newborn? You're headed for sainthood.

If you don't have one, you could start an out-of-band chat during updates, just in case you need some eyes on things or just some moral support. I'm sure we have at least a few subject matter experts around if you can stand us :)

[–] BrikoX@lemmy.zip 11 points 4 months ago
[–] GeekFTW@lemmy.zip 13 points 4 months ago (1 children)

Things gotta fuck up sometimes, tis how we figure shit out and learn things! You got this.

[–] possiblylinux127@lemmy.zip 13 points 4 months ago (1 children)

No need to apologize as you have been doing a stellar job. Your family needs to always take priority no matter what. I don't care if it is down for a week as your health and kid are far more important.

One thing I will say is that I think Lemmy.zip could really benefit from an external way of communicating announcements. It doesn't need to be complicated, and you could reuse your existing Mastodon account to post updates when things go wrong. It could also allow users to give advice on how to fix issues.

[–] Montagge@lemmy.zip 13 points 4 months ago* (last edited 4 months ago) (1 children)

No worries! Make sure you're getting enough rest!

[–] FryHyde@lemmy.zip 12 points 4 months ago (1 children)

As a new parent myself, I'm stunned you managed to find the time to restore it at all. Good on ya, fella!

[–] win95@lemmy.zip 11 points 4 months ago (2 children)

Thanks for the update! I figured it must have had something to do with the baby and the busy life + the update not working as expected, so I was patient.

After 12 hours or so I did go on Mastodon to look for an update (just an 'everything crashed, working on the backup' kind of message), so if this ever happens again that might be an idea?

Thanks for working so hard on getting everything back up and don't forget to rest!

[–] BrikoX@lemmy.zip 8 points 4 months ago (7 children)
[–] Marty_TF@lemmy.zip 11 points 4 months ago (1 children)

thanks for the effort and also explanation

[–] TacoEvent@lemmy.zip 11 points 4 months ago (1 children)

I appreciate the transparency and frankly couldn't ask for more. Shit happens and this is a one-person operation. Thanks for all your effort!

[–] catrass@lemmy.zip 11 points 4 months ago (1 children)

Appreciate the honesty and transparency. Thank you for your hard work maintaining the site, and hopefully you're able to restore everything to a fully working state!

[–] stoy@lemmy.zip 11 points 4 months ago (1 children)

Thank you for all your hard work. As an IT guy I know the feeling when production doesn't work as it should, and the feeling of relief when the backups are actually being restored and working.

Take care and make sure to take a break if you need to, we'll still be here.

[–] shortwavesurfer@lemmy.zip 10 points 4 months ago (1 children)

Hey man, you've got absolutely nothing to worry about. The fact that you run this service for us at all is quite frankly amazing, and we thank you for it. As another commenter said below, I'd rather have a day's worth of downtime than be on big corporate social media and have everything fixed quicker, because I know that I'm not the product here. When it did not come back, I checked the status page and it said it was working. So I just figured something broke and decided I'd wait until it came back.

Actually... disregard everything I said above. I'm so fucking mad right now. I could bite holes in bricks. I mean, how dare you notice that there's a problem and not get it fixed absolutely immediately. /s

[–] MrTolkinghoen@lemmy.zip 10 points 4 months ago (1 children)

Don't worry. Newborn is a trump card, but even without it you literally are a volunteer.

Anyone complaining about you volunteering your time, especially with a newborn, is not a parent, and is ignoring that you're doing this for free. Thanks for your time and effort, and I'm just happy it's back up :).

Social media that can go down for a day or two is way better than a shit hole of advertising and manipulation that is Facebook, reddit, and all the rest.

[–] other_cat@lemmy.zip 10 points 4 months ago (1 children)

Thank you for the transparent announcement, and don't sweat it!

[–] mrodri89@lemmy.zip 10 points 4 months ago (1 children)

We appreciate you very much! Take all the rest you need. 🫡

[–] nailingjello@lemmy.zip 10 points 4 months ago (1 children)

Hard agree with all of the other comments here. No apologies needed, you do a great job of keeping this instance going and the transparency is appreciated.

I temporarily switched over to an alt account and was back browsing Lemmy after figuring out .zip was offline, absolutely no big deal.

[–] TowardsTheFuture@lemmy.zip 10 points 4 months ago (1 children)

Hey, you’ve done a ton for all of us and I can’t thank you enough for the work and dedication. Don’t be too hard on yourself; your child and well-being are both important, and it’s fine. I’d rather have some downtime than lose you as admin. Pretty sure most on the server would agree.

[–] EchoCranium@lemmy.zip 9 points 4 months ago (1 children)

Don't fret about it, things happen. You run a great service for us. A job and a new family already add up to being more than two full-time obligations. Managing Zip along with all that is a lot. Thanks for doing it.

[–] Grimm@lemmy.zip 8 points 4 months ago (1 children)

I think you handled it very well. Not sure how it could’ve been handled better tbh. I figured something didn’t go as planned and I didn’t have any problems waiting for you to find a solution. No apologies needed.

[–] swizzlestick@lemmy.zip 8 points 4 months ago (1 children)

Trial by fire. At least it was interesting(!)

Praise be to the backup strategy 🙂

[–] Jg1@lemmy.zip 8 points 4 months ago (1 children)

I didn't realize it was a single person keeping it all running! Tech sometimes goes wonky, good job getting it back online!

[–] PurpleGameBoy@lemmy.zip 8 points 4 months ago (1 children)

Thanks for your hard work! Remember your mental health always has priority though. Cheers mate.

[–] MoreFPSmorebetter@lemmy.zip 8 points 4 months ago (1 children)

How dare you interrupt my ability to look at memes and see the same news article posted in 17 places at once!

Jokes aside I appreciate the work y'all do to keep this sorta thing running without any pay or thanks for the most part.

I am grateful.

[–] Million@lemmy.zip 8 points 4 months ago
[–] Jerry@feddit.online 7 points 4 months ago

Been there, done that, with my Friendica instance. 2 days of downtime while rebuilding a corrupted database, while people are tapping their feet waiting for all to return. I'm with you in spirit, my friend.

Thanks for all your hard work keeping the dream alive! And for keeping good backups

[–] Blaze@lemmy.dbzer0.com 6 points 4 months ago (1 children)

Thank you for this post. Don't be too harsh on yourself, everyone can make a mistake!

Glad to have lemmy.zip up and running again!

[–] TranquilTurbulence@lemmy.zip 6 points 4 months ago* (last edited 4 months ago) (1 children)

No worries. You’re doing a great job even though things are hard from time to time.

Thanks for your efforts. ❤️

[–] 1Fuji2Taka3Nasubi@lemmy.zip 6 points 4 months ago* (last edited 4 months ago) (3 children)

Thanks for the hard work! Glad the server is back online.

A suggestion: post a message on status.lemmy.zip when there is maintenance. That was where I thought to check when I found that the main site was not working, though this time it was reporting the site as fine while it was unavailable.

Oh, and congratulations on the newborn!

[–] Outdated4134@lemmy.zip 6 points 4 months ago

You're good bro. Sort of assumed something went wrong with the upgrade.

[–] FrostyTrichs@crazypeople.online 5 points 4 months ago

Unfortunately these things can and do happen. I'm glad you were able to get things functional with a restoration. Best of luck troubleshooting and repairing the leftover gremlins.

Thanks for all you do to support Lemmy.

[–] Eyck_of_denesle@lemmy.zip 5 points 4 months ago

Congratulations on the baby. We should thank you for making us go touch grass.
