[content warning] scraping media by reverse engineering api. (zerobytes.monster)

submitted 1 year ago by bOt@zerobytes.monster to c/datahoarder@zerobytes.monster

The original post: /r/datahoarder by /u/Unlucky-Court-8792 on 2025-01-27 17:59:36.

I wanted to scrape Twitter for an NLP project, but the official API's limits and restrictions sent me down the rabbit hole of website scraping. While learning scraping techniques, I realized that Twitter is one of the hardest platforms to scrape, while most other websites can be scraped relatively easily.

As part of my exploration, I scraped some NSFW media (because, why not?) from Twitter and Instagram. Additionally, I added a section for Instagram Explore and 4chan, scraping and displaying content in real time.

Summary (TL;DR):

Scraped media from Twitter, Instagram, Instagram Explore, and 4chan.
How:
- For Twitter and Instagram, I used WFDownloader.
- For Instagram Explore and 4chan, I reverse-engineered their APIs using apiparrot.
PS: these were all scraped at random, not handpicked, so some of it might be garbage.
Site: skinsandbones.streamlit.app
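
For anyone curious what the 4chan part can look like in code, here is a minimal sketch. It is not the code behind the site above; it assumes 4chan's publicly documented read-only JSON API (catalog at a.4cdn.org, media at i.4cdn.org), and the function names, the User-Agent string, and the sample data are made up for illustration. The endpoints that apiparrot actually captured for this project are not given in the post.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoints, taken from 4chan's public read-only JSON API.
CATALOG_URL = "https://a.4cdn.org/{board}/catalog.json"
MEDIA_URL = "https://i.4cdn.org/{board}/{tim}{ext}"


def media_urls(catalog, board):
    """Build media URLs for every thread-opening post that has an attachment.

    `catalog` is the parsed catalog JSON: a list of pages, each holding a
    "threads" list. Posts with a file carry "tim" (server-side file id)
    and "ext" (file extension, including the dot).
    """
    urls = []
    for page in catalog:
        for thread in page.get("threads", []):
            if "tim" in thread and "ext" in thread:
                urls.append(
                    MEDIA_URL.format(board=board, tim=thread["tim"], ext=thread["ext"])
                )
    return urls


def fetch_catalog(board, timeout=10.0):
    """Download and parse the catalog for one board (hypothetical helper)."""
    req = Request(
        CATALOG_URL.format(board=board),
        headers={"User-Agent": "media-sketch/0.1"},  # illustrative value
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline sample shaped like a catalog response; the values are invented.
    sample = [
        {
            "page": 1,
            "threads": [
                {"no": 101, "tim": 1700000000000, "ext": ".jpg"},
                {"no": 102},  # text-only thread, no attachment
            ],
        }
    ]
    print(media_urls(sample, "g"))
```

To run it against a live board, pass the result of `fetch_catalog("g")` to `media_urls` instead of the sample, and keep the request rate low. Separating the parsing from the fetching keeps the URL-building logic easy to check without touching the network.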