this post was submitted on 18 Jul 2025
3 points (100.0% liked)

Lemmy Federate


Updates and questions about lemmy-federate.com

founded 5 months ago

I fed the output of this request:

https://lemmy-federate.com/api/community.find?input=%7B%22skip%22%3A0%2C%22take%22%3A10%7D

into json_pp | grep name, and got:

"name" : "science_memes",
               "name" : "al_gore",
               "name" : "applied_paranoia",
               "name" : "windowmanagers",
               "name" : "hihihi",
               "name" : "media_reviews",
               "name" : "petits_animaux",
               "name" : "twnw",
               "name" : "niagaraonthelake",
               "name" : "niagarafalls",

That’s it. There are no more names. Inspecting the dataset seems to show a lot of communities, but only their numbers. Is there a separate table that maps community numbers to names?
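
For reference, here’s the whole pipeline spelled out as one command:

curl -s 'https://lemmy-federate.com/api/community.find?input=%7B%22skip%22%3A0%2C%22take%22%3A10%7D' | json_pp | grep name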

(previous discussion for reference)

[–] iso@lemy.lol 4 points 2 weeks ago (4 children)

You need to change the input parameters (skip, take) to paginate through the pages and merge all the communities you've fetched. You'll probably need to use something like Python or NodeJS.

For example, right now you're using skip=0 and take=10, which returns the first 10 communities found in the DB.
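
For example, the next request after yours would just bump skip by the previous take. A sketch, assuming skip counts records rather than pages:

https://lemmy-federate.com/api/community.find?input=%7B%22skip%22%3A10%2C%22take%22%3A10%7D

(that's input={"skip":10,"take":10}, URL-encoded)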

[–] activistPnk@slrpnk.net 1 points 1 week ago* (last edited 1 week ago) (3 children)

Thanks! That clears some things up.

Using bash, I came up with this code:

for pg in {0..999}
do
    # fetch one batch of instances, saving the raw response to its own file
    # ($pg is passed as the skip value in the input object)
    curl -s --get -o lemmy-federate_nodes_pg"$pg".json \
         --data-urlencode 'input={"search":"","skip":'"$pg"',"take":50,"enabledOnly":false}' \
         'https://lemmy-federate.com/api/instance.find'

    # stop once a request comes back with an empty instances array
    qty_of_items_returned=$(jq '.result.data.instances | length' < lemmy-federate_nodes_pg"$pg".json)
    [[ "$qty_of_items_returned" -gt 0 ]] || break
done

When it reached page 80, the array came back with zero length, so the loop terminated at that point. It started off returning 50 nodes per fetch, but near the end the amount fetched gradually dropped off. I was expecting 50 per fetch until the last page. I wonder if that’s a throttling feature? Anyway, it’s not a problem. It worked.
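
To merge the per-page files into one array afterwards, something like this should do it (assuming every file has the .result.data.instances shape the loop reads):

jq -s '[ .[].result.data.instances[] ]' lemmy-federate_nodes_pg*.json > lemmy-federate_nodes_all.json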

[–] iso@lemy.lol 2 points 1 week ago (1 children)

That’s interesting 🤨 It shouldn’t reduce the records until the last page.

[–] activistPnk@slrpnk.net 2 points 1 week ago* (last edited 1 week ago) (1 children)

The communities-with-relationships data is a much bigger dataset. I tried grabbing 100 records per fetch in a loop with no sleep or throttle. Page 523 had 100 records and page 524 was an empty file. I restarted with skip at 523 and got to page 531. It died again, this time leaving a file that ended in the middle of a JSON field.
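
A quick way to spot files like that after the fact: jq exits non-zero when a file isn’t valid JSON, so something like this lists any truncated pages (adjust the glob to whatever the files are named):

for f in lemmy-federate_*.json; do jq empty "$f" >/dev/null 2>&1 || echo "truncated or invalid: $f"; done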

Any suggestions? I wonder if I should put a 1 or 2 second delay between pages so the server is not overloaded.

(update) wow, this is bulkier than I expected: 966 MB. Hope that didn’t cause any problems. I guess I won’t do that full fetch again. I don’t suppose there’s an API parameter to select records with updatedAt newer than a specified date?

(update 2) is skip the number of pages, or of records? I treated it as pages, but it’s starting to look like it’s the number of records, which would mean I grabbed a lot of dupes. Sorry (if that’s the case)!

(update 3) Shit.. looks like skip is the number of records, which makes sense. Sorry for the waste! I’ll fix my script.
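
Roughly, the fix is to advance skip by take after each request instead of by a page counter, plus a short sleep so the server isn’t hammered. A sketch against the instance endpoint (same idea for the community fetch):

skip=0
take=50
while :
do
    # fetch the next batch, starting right after the records already fetched
    curl -s --get -o lemmy-federate_nodes_skip"$skip".json \
         --data-urlencode 'input={"search":"","skip":'"$skip"',"take":'"$take"',"enabledOnly":false}' \
         'https://lemmy-federate.com/api/instance.find'

    # stop once a request comes back empty
    qty=$(jq '.result.data.instances | length' < lemmy-federate_nodes_skip"$skip".json)
    [[ "$qty" -gt 0 ]] || break

    skip=$((skip + take))
    sleep 2
done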

[–] iso@lemy.lol 2 points 1 week ago

Good to hear that was the problem!
