this post was submitted on 28 Jan 2025
 
The original post: /r/datahoarder by /u/Unlucky-Court-8792 on 2025-01-27 17:59:36.

I wanted to scrape Twitter for an NLP project, but the restrictions on its official API sent me down the rabbit hole of website scraping instead. While learning scraping techniques, I realized that Twitter is one of the hardest platforms to scrape, while most other websites can be scraped relatively easily.

As part of my exploration, I scraped some NSFW media (because, why not?) from Twitter and Instagram. Additionally, I added a section for Instagram Explore and 4chan, scraping and displaying content in real time.

Summary (TL;DR):

  • Scraped media from Twitter, Instagram, Instagram Explore, and 4chan.
  • How:
    • For Twitter and Instagram, I used WFDownloader.
    • For Instagram Explore and 4chan, I reverse-engineered their APIs using apiparrot.
  • PS: everything was scraped at random, not handpicked, so it might contain some garbage.
  • Site: skinsandbones.streamlit.app
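
For anyone curious what the 4chan side can look like, here is a rough sketch. It is not the code behind the site above, and it does not use apiparrot; it assumes 4chan's public read-only JSON API (a board catalog at a.4cdn.org, media files at i.4cdn.org). The function names, the User-Agent string, and the sample payload are made up for illustration.

```python
import json
from urllib.request import Request, urlopen

# Endpoint shapes assumed from 4chan's public read-only JSON API.
CATALOG_URL = "https://a.4cdn.org/{board}/catalog.json"
MEDIA_URL = "https://i.4cdn.org/{board}/{tim}{ext}"


def media_urls(catalog, board):
    """Build media URLs from an already-parsed catalog payload.

    The catalog is a list of pages; each page holds a "threads" list.
    A thread with an attachment carries "tim" (server-side file name)
    and "ext" (file extension, including the dot).
    """
    urls = []
    for page in catalog:
        for thread in page.get("threads", []):
            # Skip entries that have no attachment fields.
            if "tim" in thread and "ext" in thread:
                urls.append(
                    MEDIA_URL.format(board=board, tim=thread["tim"], ext=thread["ext"])
                )
    return urls


def fetch_catalog(board, timeout=10.0):
    """Download and parse a board catalog (hypothetical helper, needs network)."""
    request = Request(
        CATALOG_URL.format(board=board),
        headers={"User-Agent": "catalog-sketch/0.1"},  # placeholder value
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hand-written sample in the catalog's shape, so this runs offline.
    sample = [
        {
            "page": 1,
            "threads": [
                {"no": 1, "tim": 1700000000000, "ext": ".jpg"},
                {"no": 2},
            ],
        }
    ]
    print(media_urls(sample, "g"))
```

To hit the live API, swap the sample for `fetch_catalog("g")` and keep the request rate low.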