RoundSparrow

[–] RoundSparrow@lemmy.ml 10 points 2 years ago (25 children)

I've largely given up on pull requests... for the sake of sanity. But I waded back in...

I made a pull request today... and I very strategically chose to do it with a minimal set of features so that it would just go through... and I got lectured that JOIN is never a concern and that filtering based on the core function of the site (presenting fresh meat to readers) was a bad use of the database. I've never seen hazing on a project like this. Memcached and Redis should be discussed every day as "why are we not doing what every website does?", but mum's the word.

[–] RoundSparrow@lemmy.ml 0 points 2 years ago* (last edited 2 years ago)

Community stuff can work well if done right. For example you don’t see Debian repositories constantly crashing.

I don't follow your comment, are you suggesting I said something negative about open source project communities? I was talking about the Lemmy social media communities who actually comment and fund the 64-core server upgrades without asking why the site crashes with only 57K users.... the people who comment and post on Lemmy.... not the "open source" programmer community, but the social media community of Lemmy.

[–] RoundSparrow@lemmy.ml 2 points 2 years ago* (last edited 2 years ago)

If anyone bothered to actually look at the SQL SELECT that Lemmy uses to list posts every time you hit refresh, it would be blindingly obvious how convoluted it is. Yet the community does not talk about the programming issues, and instead keeps raising money for 64-core hardware upgrades without recognizing just how tiny Lemmy's database really is and how 57K users is not a large number at all!

your original one, friend. I wouldn’t have argued this point if you had started here.

I mentioned "ORM" right in my first comment.

SELECT 
   "post"."id" AS post_id, "post"."name" AS post_title,
   -- "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."thumbnail_url",
   -- "post"."ap_id", "post"."local", "post"."embed_video_url", "post"."language_id", "post"."featured_community", "post"."featured_local",
     "person"."id" AS p_id, "person"."name",
     -- "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated",
     -- "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin",
     -- "person"."bot_account", "person"."ban_expires",
     "person"."instance_id" AS p_inst,
   "community"."id" AS c_id, "community"."name" AS community_name,
   -- "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted",
   -- "community"."nsfw", "community"."actor_id", "community"."local", "community"."private_key", "community"."public_key", "community"."last_refreshed_at", "community"."icon", "community"."banner",
   -- "community"."followers_url", "community"."inbox_url", "community"."shared_inbox_url", "community"."hidden", "community"."posting_restricted_to_mods",
   "community"."instance_id" AS c_inst,
   -- "community"."moderators_url", "community"."featured_url",
     ("community_person_ban"."id" IS NOT NULL) AS ban,
   -- "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published",
   -- "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local",
   --"post_aggregates"."hot_rank", "post_aggregates"."hot_rank_active", "post_aggregates"."community_id", "post_aggregates"."creator_id", "post_aggregates"."controversy_rank",
   --  "community_follower"."pending",
   ("post_saved"."id" IS NOT NULL) AS save,
   ("post_read"."id" IS NOT NULL) AS read,
   ("person_block"."id" IS NOT NULL) as block,
   "post_like"."score",
   coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") AS unread

FROM (
   ((((((((((
   (
	   (
	   "post_aggregates" 
	   INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")
	   )
   INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")
   )
   LEFT OUTER JOIN "community_person_ban"
       ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
   )
   INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
   )
   LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3)))
   LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
   )
WHERE (((((((
  ((("community"."deleted" = false) AND ("post"."deleted" = false)) AND ("community"."removed" = false))
  AND ("post"."removed" = false)) AND ("post_aggregates"."creator_id" = 3)) AND ("post"."nsfw" = false))
  AND ("community"."nsfw" = false)) AND ("local_user_language"."language_id" IS NOT NULL))
  AND ("community_block"."person_id" IS NULL))
  AND ("person_block"."person_id" IS NULL))
ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 10
OFFSET 0
;
[–] RoundSparrow@lemmy.ml 11 points 2 years ago* (last edited 2 years ago) (3 children)

the people who run Lemmy don’t have the money to support a fleet of failover servers that take over when the main server goes offline.

That has nothing to do with the issue I'm talking about. Any server with this amount of data in it would fail. It doesn't matter if you had 100 servers on standby.

The Rust logic for database access and the PostgreSQL logic in Lemmy are unoptimized, and there is a serious lack of Diesel programming skills. The site_aggregates table had a mistake where 1500 rows were updated for every single new comment and post - and it only got noticed when lemmy.ca was crashing so hard that they made a complete copy of the data and studied what was going on.

Throwing hardware at it, as you describe, has been the other thing... massive numbers of CPU cores. What's needed is to learn what Reddit did before 2010 with PostgreSQL... as Reddit also used PostgreSQL (and its code from that era is open source).

That’s basically the only reason you don’t see lots of downtime from major corporations: investment in redundancy,

Downtime caused by avoiding Redis or Memcached caching at all costs isn't common to see in major corporations. But Lemmy avoids caching any data from PostgreSQL at all costs, and has for several years. May 17, 2010: "Lesson 5: Memcache;"

As I said in my very first comment, server crashing as a way to scale is a very interesting approach.

EDIT: Freudian slip, "memecached" instead of Memcached

[–] RoundSparrow@lemmy.ml 4 points 2 years ago (7 children)

It’s not

It's really odd how many people around here think the server crashes are perfectly normal and are glad to see newcomers driven away.

[–] RoundSparrow@lemmy.ml 2 points 2 years ago

Only way to solve this (imho) is to reinstall Lemmy BUT use another subdomain.

I would agree that this is worth considering as an approach to avoid clashing identities and getting into custom SQL or Rust programming. But there isn't even really a procedure in place to decommission the old Lemmy entity... so it's another damned-if-you-do, damned-if-you-don't situation in the 0.18.4 era.

I'm a little surprised that the federation private key/public key signing doesn't get upset about all new keys appearing on the same domain name. I've tried to get details of exactly how a server joins the Lemmy network and gets discovered over on !lemmyfederation@lemmy.ml but haven't gotten any actual discussion of the details.

What do you think? Will this work?

I've seen people nuke their database and start over from empty several times while having problems setting up Nginx and Docker... or whatever part.

I'm glancing at the list of SEQUENCE in Lemmy....

CREATE SEQUENCE public.admin_purge_comment_id_seq
CREATE SEQUENCE public.admin_purge_community_id_seq
CREATE SEQUENCE public.admin_purge_person_id_seq
CREATE SEQUENCE public.admin_purge_post_id_seq
CREATE SEQUENCE public.captcha_answer_id_seq
CREATE SEQUENCE public.comment_aggregates_id_seq
CREATE SEQUENCE public.comment_id_seq
CREATE SEQUENCE public.comment_like_id_seq
CREATE SEQUENCE public.comment_reply_id_seq
CREATE SEQUENCE public.comment_report_id_seq
CREATE SEQUENCE public.comment_saved_id_seq
CREATE SEQUENCE public.community_aggregates_id_seq
CREATE SEQUENCE public.community_block_id_seq
CREATE SEQUENCE public.community_follower_id_seq
CREATE SEQUENCE public.community_id_seq
CREATE SEQUENCE public.community_language_id_seq
CREATE SEQUENCE public.community_moderator_id_seq
CREATE SEQUENCE public.community_person_ban_id_seq
CREATE SEQUENCE public.custom_emoji_id_seq
CREATE SEQUENCE public.custom_emoji_keyword_id_seq
CREATE SEQUENCE public.email_verification_id_seq
CREATE SEQUENCE public.federation_allowlist_id_seq
CREATE SEQUENCE public.federation_blocklist_id_seq
CREATE SEQUENCE public.instance_id_seq
CREATE SEQUENCE public.language_id_seq
CREATE SEQUENCE public.local_site_id_seq
CREATE SEQUENCE public.local_site_rate_limit_id_seq
CREATE SEQUENCE public.local_user_id_seq
CREATE SEQUENCE public.local_user_language_id_seq
CREATE SEQUENCE public.mod_add_community_id_seq
CREATE SEQUENCE public.mod_add_id_seq
CREATE SEQUENCE public.mod_ban_from_community_id_seq
CREATE SEQUENCE public.mod_ban_id_seq
CREATE SEQUENCE public.mod_hide_community_id_seq
CREATE SEQUENCE public.mod_lock_post_id_seq
CREATE SEQUENCE public.mod_remove_comment_id_seq
CREATE SEQUENCE public.mod_remove_community_id_seq
CREATE SEQUENCE public.mod_remove_post_id_seq
CREATE SEQUENCE public.mod_sticky_post_id_seq
CREATE SEQUENCE public.mod_transfer_community_id_seq
CREATE SEQUENCE public.password_reset_request_id_seq
CREATE SEQUENCE public.person_aggregates_id_seq
CREATE SEQUENCE public.person_ban_id_seq
CREATE SEQUENCE public.person_block_id_seq
CREATE SEQUENCE public.person_follower_id_seq
CREATE SEQUENCE public.person_id_seq
CREATE SEQUENCE public.person_mention_id_seq
CREATE SEQUENCE public.person_post_aggregates_id_seq
CREATE SEQUENCE public.post_aggregates_id_seq
CREATE SEQUENCE public.post_id_seq
CREATE SEQUENCE public.post_like_id_seq
CREATE SEQUENCE public.post_read_id_seq
CREATE SEQUENCE public.post_report_id_seq
CREATE SEQUENCE public.post_saved_id_seq
CREATE SEQUENCE public.private_message_id_seq
CREATE SEQUENCE public.private_message_report_id_seq
CREATE SEQUENCE public.received_activity_id_seq
CREATE SEQUENCE public.registration_application_id_seq
CREATE SEQUENCE public.secret_id_seq
CREATE SEQUENCE public.sent_activity_id_seq
CREATE SEQUENCE public.site_aggregates_id_seq
CREATE SEQUENCE public.site_id_seq
CREATE SEQUENCE public.site_language_id_seq
CREATE SEQUENCE public.tagline_id_seq
CREATE SEQUENCE utils.deps_saved_ddl_id_seq

[–] RoundSparrow@lemmy.ml 13 points 2 years ago* (last edited 2 years ago) (37 children)

Let the servers keep crashing, tell everyone to add new instances to help with performance - which puts 1500 rows into database tables that used to have 50 rows and invokes massive federation 1-vote-1-HTTPS overhead... causing more crashing... all the while ignoring the SQL design of machine-generated ORM statements and the counting logic hidden in background triggers.

... keep users off your server as a method of scaling by crashing. It's one of the more interesting experiences I've had this year! And I spent all of February and March with the release of GPT-4... which was also interesting!

[–] RoundSparrow@lemmy.ml 17 points 2 years ago (2 children)

That feature you linked to is to flair users.... there is a different issue to flair posts: https://github.com/LemmyNet/lemmy/issues/317

[–] RoundSparrow@lemmy.ml 3 points 2 years ago (2 children)

It is complicated. It's surely a damned-if-you-do, damned-if-you-don't situation. It doesn't sound like you had all that much in terms of local users, communities, posts, and comments - so at least that's in your favor.

[–] RoundSparrow@lemmy.ml 2 points 2 years ago

I haven't looked around at alternatives.

Lemmy has a lot of front-end app development going on and I think that's one of the big strengths. The API can be bloated with a lot of duplicate data in JSON responses but it is usable.

[–] RoundSparrow@lemmy.ml 2 points 2 years ago* (last edited 2 years ago) (6 children)

Lemmy is pretty immature as code to actually run in production. It may be well over 4 years old, but the whole thing has very little in the way of information that a server operator can look at to check the health and problems under the covers. It also doesn't deal with unrecognized data very well, and it hides a lot of errors in a log where the messages are often not much of a hint as to what is going on.

Lemmy surely is unique, as I almost never see the people using it actually criticize the code for quality assurance and testing. More often than not, I see people cheering and defending it. Having looked through this experience and the code, it is run more like an art project or a music band than anything with a serious focus on data integrity or performance.

[–] RoundSparrow@lemmy.ml 2 points 2 years ago (4 children)

If it were me right now with Lemmy 0.18.4, I'd take the server offline, do a PostgreSQL dump file - keep a copy - then hand-edit the sequence numbers in the dump file and do a restore.

You probably only had a few users, so I would set the user sequence to 100. The person id can be higher because of federation - jump ahead to 10000 maybe. Set post and comment ahead to 10000... and set community ahead to 10000 too, because that gets federated.

The PostgreSQL sequence numbers should only get used on newly created objects from here forward.
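As a sketch of what I mean (sequence names are from the schema listing earlier in this thread; the target values are judgment calls, not exact figures):

```sql
-- Jump the sequences ahead so newly created rows can't collide with
-- ids the federation has already seen. Values here are illustrative.
ALTER SEQUENCE public.local_user_id_seq RESTART WITH 100;
ALTER SEQUENCE public.person_id_seq RESTART WITH 10000;
ALTER SEQUENCE public.post_id_seq RESTART WITH 10000;
ALTER SEQUENCE public.comment_id_seq RESTART WITH 10000;
ALTER SEQUENCE public.community_id_seq RESTART WITH 10000;
```

In a plain-text dump the same effect comes from hand-editing the setval() calls before the restore.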

 

I haven't had this much trouble since Ada coding in 1986 when I was age 16.

Can someone please help me get the procedural syntax right?

https://github.com/LemmyNet/lemmy/pull/3805

 

This is the first post or comment in Lemmy history to say log_min_duration_statement ... ;)

It is possible to instruct PostgreSQL to log any query that takes over a certain amount of time; 2.5 seconds is what I think would be a useful starting point, minimizing the amount of logging activity to only the queries causing the most serious issues.

"Possibly the most generally useful log setting for troubleshooting performance, especially on a production server. Records only long-running queries for analysis; since these are often your "problem" queries, these are the most useful ones to know about. Used for pg_fouine." - https://postgresqlco.nf/doc/en/param/log_min_duration_statement/
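A sketch of turning that on at the 2.5-second threshold (these are standard PostgreSQL settings, nothing Lemmy-specific; it can also go in postgresql.conf):

```sql
-- Log only statements that run longer than 2.5 seconds.
ALTER SYSTEM SET log_min_duration_statement = '2500ms';
SELECT pg_reload_conf();
```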

I think it would really help if we could get lemmy.world or some other big site to turn on this logging and share it so we can try to better reproduce the performance overloads on development/testing systems. Thank you.

 

Someone was asking about it in support, so I was curious to try it. The Rust code has HideCommunity, I even tried to bypass the JavaScript client and do a direct API call.

Does anyone see the function call that sends a HideCommunity object? Or where it might be disabled in the Rust code? I know the flag goes all the way into the database and shows up in SELECT statements.

Thank you

 

"502 bad gateway"

 

For lemmy server testing and performance baseline measurement, I think it would be cool to have every API call exercised.

Anyone willing to create and share some JavaScript client code? Normally these are run with Jest via NodeJS, but we can make integrating into Jest an extra step. I'm just thinking someone doing front-end app work could do a well-organized hit on every API surface.

You can skip the creation of a user if you want; that code is already in the testing.

Probably ideal to organize moderator vs non-moderator.

Something like: edit profile with every option one at a time, create a community, edit it, create posts, edit, delete, undelete, reply, etc. Imagine you were doing interactive tests of a major upgrade and wanted to hit every feature and button.

Right now most of that testing is done in independent scripts, such as a user test: https://github.com/LemmyNet/lemmy/blob/main/api_tests/src/user.spec.ts

And you can see it only tests that editing a profile works, not that actual features change their behavior - although I've started to add that for back-end behaviors, like showing read/unread posts in the list. Front-end devs are the ones who know what end-users do and the fringe cases to look out for. Thank you.

 

I thought some people were out there in June creating stress-testing scripts, but I haven't seen anything materialize or show results in recent weeks.

I think it would be useful to have an API client that establishes a baseline performance number, to be run before each new release of Lemmy to at least ensure there is no performance regression.

The biggest problem I have had since day 1 is not being able to reproduce the data that lemmy.ml has inside. There is a lot of older content stored that does not get replicated, etc.

The site_aggregates UPDATE statement lacking a WHERE clause and hitting 1500 rows (the number of known Lemmy instances) instead of 1 row is exactly the kind of data-centered problem that has slipped through the cracks. It was generating a ton of extra PostgreSQL I/O for every new comment and post from a local user.
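The shape of that bug, as a sketch (the exact trigger SQL and column names in Lemmy differ; this is illustrative only):

```sql
-- Before (sketch): no WHERE clause, so every known instance's row
-- (~1500 of them) got rewritten on each new local comment.
UPDATE site_aggregates SET comments = comments + 1;

-- After (sketch): scoped to the single row for the local site.
UPDATE site_aggregates SET comments = comments + 1
WHERE site_id = (SELECT site_id FROM local_site LIMIT 1);
```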

The difficult things to take on:

  1. Simulating 200 instances instead of the 5 that the current API testing code does. First, just to have 200 rows in many of the instance-specific tables so that local = false API calls are better exercised. Probably about 25 of those instances should have a large number of remote subscribers to communities.

  2. Async federation testing. The API testing in Lemmy right now does immediate delivery with the API call, so you don't get to find out about the tricky cases of servers being unreachable.

  3. Bulk loading of data. On one hand it is good to exercise the API by inserting posts and comments one at a time, but maybe loading data directly into the PostgreSQL backend would speed up development and testing?

  4. The impact of scheduled jobs, such as updates to certain aggregate data and post ranking for sorting. We may want to add a special API feature for testing code to trigger these on demand, to stress test that concurrency with PostgreSQL isn't running into overloads.

  5. Historically, there have been changes to the PostgreSQL table layout and indexes (schema) with new versions of Lemmy, which can take significant time to execute on a production server with existing data. Server operators need some kind of expectation of how long an upgrade can take to modify their data.

  6. Searching on communities, posts, and comments with significant amounts of data in PostgreSQL. Scanning the content of large numbers of posts and comments can be done by users at any time.

  7. Non-Lemmy federated content in the database. Possible performance and code-behavior issues arising from Mastodon and other non-Lemmy interactions.

I don't think it would be a big deal if the test takes 30 minutes or even longer to run.
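For item 3, bulk loading could start as simple as something like this (the creator/community ids and the abbreviated column list are assumptions for illustration; the real post table has many more columns):

```sql
-- Generate 100K synthetic posts directly in PostgreSQL, bypassing the API.
INSERT INTO post (name, creator_id, community_id)
SELECT 'stress test post ' || i, 3, 1
FROM generate_series(1, 100000) AS i;
```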

And I'll go out and say it: is a large Lemmy server willing to offer a copy of their database for performance troubleshooting and testing? Lemmy.ca cloned their database last Sunday, which led to the discovery of the site_aggregates UPDATE-without-WHERE problem. Maybe we can create a procedure for removing private messages and get a dump once a month from a big server to analyze possible causes of PostgreSQL overloads? This may be a faster path than building up from scratch with new testing logic.
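A possible starting point for such a scrub, run on the clone before the dump is shared (table and column names are the ones that appear in this thread; a real procedure would need to cover more than this):

```sql
-- Strip private content and key material from the cloned database
-- before handing the dump to anyone outside the instance.
TRUNCATE private_message, private_message_report;
UPDATE person SET private_key = NULL;
DELETE FROM secret;
```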

 

Reference, June 4, 2023: https://github.com/LemmyNet/lemmy/issues/2910

Questions:

  1. Lemmy.ml was crashing every 10 minutes of every single day since May 25. Do you dispute this claim? Is there server log evidence?

  2. June 4: issue 2910 was on GitHub, and the same developers who run lemmy.ml put a tag on the issue that day.

  3. Did the tag say it was a server-crashing topic?

  4. Were you aware of the end-of-June Reddit API change?

  5. When was the issue fixed - what date, and by whom?

  6. Did you ask the PostgreSQL communities on Lemmy for help? In which posts?

  7. Is lemmy.ml a developer-run instance? Is this your idea of a "supported" example of Lemmy's Rust lemmy_server code?

Please explain. Like PostgreSQL EXPLAIN?

Was this what was given to you as Issue 2910 on June 4, 2023?

 

IIRC, it was lemmy.ca's full copy of live data that was used (the copy made on a development system, not the live server - if I'm following). Shared Saturday, July 22 on GitHub was this procedure:

...

Notable is the AUTO_EXPLAIN SQL activation statements:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = true;
SET auto_explain.log_nested_statements = true;

This technique would be of great use for developers making changes and studying PostgreSQL activity. Thank you!

 

Right now querying posts has logic like this:

WHERE (((((((((("community"."removed" = $9) AND ("community"."deleted" = $10)) AND ("post"."removed" = $11)) AND ("post"."deleted" = $12)) AND (("community"."hidden" = $13)

Note that a community can be hidden or deleted, separate fields. And it also has logic to see if the creator of the post is banned in the community:

LEFT OUTER JOIN "community_person_ban" ON (("post"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post"."creator_id"))

And there is both a deleted boolean (end-user delete) and a removed boolean (moderator removal) on a post.

Much of this also applies to comments, which are owned by the post, which is in turn owned by the community.

 

Is .moderators supposed to be on the GetCommunity() result? I can't seem to find it in the lemmy_server api_tests context. All I'm getting is languages and community_view.

EDIT: Wait, so there is both a "CommunityResponse" and a "GetCommunityResponse" object? What call do I use to get a GetCommunityResponse object?

https://github.com/LemmyNet/lemmy-js-client/blob/2aa12c04a312ae4ae235f6d97c86a61f58c67494/src/types/GetCommunityResponse.ts#L7

 

In the Communities create-community/edit-community code there is a SiteLanguage::read with no site_id; should that call to read have site_id = 1?

For reference, on my production instance the site_language table has 198460 rows and the site table has 1503 rows - an average of 132 languages per site. counts: https://lemmyadmin.bulletintree.com/query/pgcounts?output=table
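For anyone wanting to check their own instance, a rough query for that fan-out (assuming site_language keys on site_id and language_id):

```sql
-- Rows per site in site_language; integer division is fine for a rough average.
SELECT count(*) AS total_rows,
       count(DISTINCT site_id) AS sites,
       count(*) / count(DISTINCT site_id) AS avg_languages_per_site
FROM site_language;
```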

 

A general description of the proposed change and reasoning behind it is on GitHub: https://github.com/LemmyNet/lemmy/issues/3697

Linear execution of these massive changes to votes/comments/posts, with concurrency awareness. It also adds a layer of social awareness: the impact on a community when a bunch of content is black-holed.

An entire site federation delete / dead server would also fall under this umbrella of mass data change, with potential for new content ownership, etc.
