this post was submitted on 04 Jul 2023
187 points (100.0% liked)

196

18162 readers
156 users here now

Be sure to follow the rule before you head out.


Rule: You must post before you leave.



NSFW: NSFW content is permitted but it must be tagged and have content warnings. Anything that doesn't adhere to this will be removed. Content warnings should be added like: [penis], [explicit description of sex]. Non-sexualized breasts of any gender are not considered inappropriate and therefore do not need to be blurred/tagged.

If you have any questions, feel free to contact us on our matrix channel or email.

top 6 comments
[–] R00bot@lemmy.blahaj.zone 16 points 2 years ago

i tried to get access to facebook's api to mess around (as a student) but they declined my request. i ended up making a bot that ran in a headless browser, wasting far more of facebook's resources, and i used it to create shitposts that updated themselves with their own number of reactions lmao.

[–] b3nsn0w@pricefield.org 12 points 2 years ago

fun fact: on the r-site, you can still append .json to the end of any path (before the query params) to get the formatted data
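a rough sketch of the url rewrite described above, in python using only the standard library. the helper name to_json_url and the example.com url are made up for illustration, and whether a given site actually answers on the rewritten path is an assumption you'd have to check yourself:

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url: str) -> str:
    """Append .json to the path of a URL, leaving the query string untouched.

    Hypothetical helper: it only rewrites the URL and does not fetch anything.
    """
    parts = urlsplit(url)
    # Drop a trailing slash so we get /a/b.json rather than /a/b/.json
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    # Reassemble with the original scheme, host, query params and fragment
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_json_url("https://example.com/r/foo/comments/abc/?limit=5"))
```

the point is just that the suffix goes on the path, not on the end of the whole url, so the query params still work.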

fun fact 2: on the same site you get a similar json if you grab the script tag that has id="data" (trivial with jsdom if you run nodejs), eval it in a sandbox (node's built-in vm module), and read the ___r property off the global object you passed in ($.___r)
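a python take on the same idea, as a sketch only. note this swaps the technique: instead of eval-ing the script in a sandbox it assumes the script body is a single assignment of a plain JSON literal (something like window.___r = {...};) and parses the right-hand side with json.loads. that assumption, the sample html, and the function names are all mine, not confirmed by anything above. if the script does anything more than assign a literal, you'd need the real sandbox approach:

```python
import json
from html.parser import HTMLParser


class DataScriptFinder(HTMLParser):
    """Collects the text inside <script id="data">...</script>."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.script_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "data":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.script_text += data


def extract_embedded_json(html: str):
    """Return the JSON assigned inside the id="data" script, or None if absent."""
    finder = DataScriptFinder()
    finder.feed(html)
    finder.close()
    if not finder.script_text:
        return None
    # Assumed shape: "<something> = <json literal>;" so keep what follows the first "="
    _, _, rhs = finder.script_text.partition("=")
    return json.loads(rhs.strip().rstrip(";"))


if __name__ == "__main__":
    sample = '<html><script id="data">window.___r = {"posts": {"count": 2}};</script></html>'
    print(extract_embedded_json(sample))
```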

fun fact 3: also on the same site, if you use the old interface, the markup is full of data-* attributes intended for css, jsdom goes brrr
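for the data attribute route, a minimal python sketch with the standard library's html.parser standing in for jsdom. the sample markup and the attribute names in it (data-author, data-score) are invented for illustration, real pages will use whatever names the site chose:

```python
from html.parser import HTMLParser


class DataAttrCollector(HTMLParser):
    """Records the data-* attributes of every element that has any."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # Keep only data-* attributes and strip the "data-" prefix from the key
        data = {name[5:]: value for name, value in attrs if name.startswith("data-")}
        if data:
            self.records.append(data)


def collect_data_attrs(html: str):
    collector = DataAttrCollector()
    collector.feed(html)
    collector.close()
    return collector.records


if __name__ == "__main__":
    sample = '<div class="post" data-author="alice" data-score="42"><a href="/x">title</a></div>'
    print(collect_data_attrs(sample))
```

values come back as strings, so anything numeric needs converting afterwards.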

fun fact 4: even if they stopped all of this you could use a headless browser and grab the data in flight from the api calls (virgin dom scrubber vs chad api capturer)

i don't know much about the t-site and can't check right now because you can't even access it the normal way, lol

[–] SubWoofer@catgirl.pub 6 points 2 years ago

Scraping, my beloved... using up more of a company's server resources makes me drool

[–] Shit@sh.itjust.works 4 points 2 years ago

This cracked me up. Especially the 10 minute delay and rate limiting making it better to just scrape.

[–] Jackolantern@lemmy.world 2 points 2 years ago (1 children)

Can someone ELI5 this for me? What’s scraping and how does it work? For example, in the context of twitter with their current limitations, will scraping still work?

[–] 1rre@discuss.tchncs.de 9 points 2 years ago

Scraping is fetching a webpage as if you were a normal user visiting that page in firefox/chrome, then extracting the bits you want from it. If Twitter makes you sign in to view tweets (which I guess it will now?) then scraping won't help much. Otherwise it probably will, though it may take a fair bit of trickery to get working.
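To make the two steps concrete, here is a minimal Python sketch using only the standard library: fetch_html asks for the page with a browser-like User-Agent header, and extract_title pulls one bit (the page title) out of the HTML. The function names, the User-Agent string and the sample HTML are placeholders I made up, and a real site may need cookies, a login or JavaScript rendering that this does not handle:

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleExtractor(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url: str) -> str:
    """Step 1: request the page roughly the way a browser would."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (example scraper)"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_title(html: str) -> str:
    """Step 2: pull the bit you want out of the returned HTML."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


if __name__ == "__main__":
    # Uses a literal string so the example runs without touching the network
    page = "<html><head><title> Example page </title></head><body>hi</body></html>"
    print(extract_title(page))
```

In practice you would call extract_title(fetch_html(url)), and the "trickery" part is everything the site does to make step 1 fail.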