I wanted to scrape Twitter for an NLP project, but the restrictions on its official API pushed me down the rabbit hole of website scraping. While learning scraping techniques, I realized that Twitter is one of the hardest platforms to scrape, while most other websites can be scraped relatively easily.
As part of my exploration, I scraped some NSFW media (because, why not?) from Twitter and Instagram. I also added sections for Instagram Explore and 4chan that scrape and display content in real time.
Summary (TL;DR):
- Scraped media from Twitter, Instagram, Instagram Explore, and 4chan.
- How:
  - For Twitter and Instagram, I used WFDownloader.
  - For Instagram Explore and 4chan, I reverse-engineered their APIs using apiparrot.
- PS: everything is scraped at random and not handpicked, so it might contain some garbage.
- Site: skinsandbones.streamlit.app
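
For anyone curious what the 4chan part can look like in code, here is a minimal Python sketch. It is not the code behind the site (that came out of apiparrot). It assumes 4chan's public read-only JSON endpoints as I understand them: a board catalog at `a.4cdn.org/{board}/catalog.json` and media files at `i.4cdn.org/{board}/{tim}{ext}`. The function names `media_urls` and `fetch_catalog` and the User-Agent string are made up for illustration.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint shapes for 4chan's read-only JSON API (not taken from the post).
CATALOG_URL = "https://a.4cdn.org/{board}/catalog.json"
MEDIA_URL = "https://i.4cdn.org/{board}/{tim}{ext}"


def media_urls(board, catalog):
    """Build media URLs from a catalog payload (a list of pages, each with 'threads')."""
    urls = []
    for page in catalog:
        for thread in page.get("threads", []):
            # Only threads whose opening post has an attachment carry 'tim' and 'ext'.
            if "tim" in thread and "ext" in thread:
                urls.append(
                    MEDIA_URL.format(board=board, tim=thread["tim"], ext=thread["ext"])
                )
    return urls


def fetch_catalog(board, timeout=10.0):
    """Download and decode the catalog for one board (hypothetical helper)."""
    request = Request(
        CATALOG_URL.format(board=board),
        headers={"User-Agent": "media-sketch/0.1"},  # placeholder value
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `media_urls("g", fetch_catalog("g"))` would return one URL per thread that has an attachment. Keeping the parsing separate from the download makes it easy to try on a saved JSON file without hitting the site repeatedly.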