It's A Digital Disease!

This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

founded 2 years ago
3801
 
 
The original post: /r/datahoarder by /u/cdrknives on 2025-02-06 01:35:49.

Mods: Can we make this sticky? A dumping spot for URLs, for those of us with storage to spare? NOAA, CDC, whatever else is a collective of knowledge that has been going offline lately? I have space… I want to help 🤷‍♂️

3802
 
 
The original post: /r/datahoarder by /u/iLOLZU on 2025-02-06 00:10:54.

If the Library of Congress is a government entity (it is) it could probably get scrubbed. We should probably do something about that. Looking at the Internet Archive statistics, it's 57.6TB, that's quite large. There also doesn't seem to be an easy way of mass downloading from the Library of Congress' site. Am I just paranoid, or is this a valid concern?

3803
 
 
The original post: /r/datahoarder by /u/gravedigger_irl on 2025-02-05 23:22:51.

I've seen a few people asking whether there's a good tool to download subreddits that still works with the current API, and after a bit of searching I found this. I'm not an expert with computers, but it worked for a test of a few posts and wasn't too tricky to set up, so maybe this will be helpful to others as well:

https://github.com/josephrcox/easy-reddit-downloader/

3804
 
 
The original post: /r/datahoarder by /u/JustSomeTimmmmmy on 2025-02-05 22:51:39.

Hi,

Looking for some help. I have a 3 x 10TB ZFS pool that is basically full. I don't have room in the case to add another drive. Swapping out all 3 drives for slightly larger ones is not cheap, and also really annoying because I need to back up that data somewhere first before migrating it 😒

Been looking at adding some storage externally from the computer. Don't really need / want a NAS - the current computer has other storage and runs a bunch of things locally on the network.

Was looking at something like the ICY BOX IB-RD3640SU3. The computer runs Ubuntu and the ICY BOX is listed as compatible only with Windows and macOS, but reading the old doco it looks like that is just for some software I most likely wouldn't want to use. My preference would be to use it with eSATA, but USB is of course an option.

Does anyone have experience with one of these running with Linux?

Other recommendations for something that can do RAID5-like storage externally? (I don't really want to run a ZFS pool entirely off individual external drives.)

Should I get something like a lower-powered Synology / QNAP and just mount a share on the server over the LAN?

Thanks.

3805
 
 
The original post: /r/datahoarder by /u/MidwestPancakes on 2025-02-05 22:43:22.

I know there are many, many subs which focus on security, privacy, and servers, but I'm more interested to know what this community thinks about encrypting our hoarded data. Is it worth the extra cycles on the CPU? Are we worried about getting caught with data that was once "public" but is likely soon to be considered "dangerous"?

I can answer these questions for myself, but as I'm sure there are lots of new people finding this sub, and I'm interested in learning or having my opinions changed as well, would you care to share your thoughts on your current configurations?

3806
 
 
The original post: /r/datahoarder by /u/stayonthecloud on 2025-02-05 22:29:00.

I spent a bunch of time as a private citizen with no SQL skills and no GitHub just saving stuff from FEMA related to equal opportunity, civil rights, and equity. I know that at any moment it will be next in the crosshairs and I wanted to know if there have already been efforts to archive FEMA’s site and data from folks who can comprehensively and accurately save everything. Thank you and you all are my heroes

3807
 
 
The original post: /r/datahoarder by /u/mglyptostroboides on 2025-02-05 22:04:41.

So you've taken up the task of copying and protecting all of the data that the oligarchy has deemed objectionable. Commendable. Don't quit doing that.

Now what?

Information is useless unless it's shared. You might as well have hard drives full of random 1s and 0s generated by an RNG if you're not communicating that data. Information isn't really information unless it's communicated.

Alright, but anyone with a brain cell or two knows what's next. The next phase is outright censorship, and not just of government information assets, but broad censorship. They don't need a way to justify it. Even with the First Amendment, they'll make some idiotic American exceptionalism argument, mirroring the way other authoritarian regimes will say "Wellllll, free speech works for those other countries, but... things are different here. We're better!" and the dipshits who voted us into this mess will uncritically lap it up like the good little ass-kissers they are. America!

And the signs are already here. The bill being proposed in response to DeepSeek R1 wants to make it illegal and punishable by a million dollar fine and up to 20 years in prison for just owning a DeepSeek model. You can tell me the sky is falling. Shit, maybe I am panicking a little. But I'm not taking my chances. These psychopaths have foolishly put all their cards on the table and are starting to show what they're capable of, so the time is well past for giving them the benefit of the doubt. My point is: broad censorship of any kind of data that threatens the hegemony is a very real possibility.

So the time to develop robust, offline systems of mass information exchange is now. I don't mean we need to start planning to do it in the near future. I mean we need to start doing it right the fuck now.

Let me draw a parallel with my experience from one of my other hobbies (besides data hoarding lol), amateur radio. The amateur radio community attracts a lot of "prepper" types who are mostly interested in "emcomm". I could explain the problems with a lot of these guys (though I definitely agree with them to a large degree...), but that is neither here nor there. A very common theme among people who get into amateur radio for emergency communication is the expectation that they can get licensed, buy a cheap Baofeng radio and then never use it until a future emergency happens. I've had to explain many times that if they do this without practicing the necessary skills, learning some basic radio and antenna theory, and learning how to communicate effectively on the air, they're going to be fucked when the actual emergency happens because they'll have no clue how to actually use the gear they own.

Or to put it another way: An emergency is the worst time to be learning the skills you need in an emergency.

The same applies here.

It is of utmost importance that you start forming decentralized, offline networks of mass information exchange and distribution immediately.

This can start very small. Buy a few refurbed 8TB HDDs, fill them up with whatever information you feel might be deemed contraband in the near future, trade them with a buddy who you can trust will make a few copies of them and pass them on. Maybe set up an agreement with your buddies that they have to make a specified amount of copies of the data. Or set up a trading agreement. Just whatever you do, don't use the internet to exchange this information because it can blow your cover and it can be censored.

Learn about opsec. Use dead drops to preserve your anonymity. Learn how to encrypt your data for plausible deniability. Use paper-and-pencil encryption methods to obscure your communications. And generally, don't be an idiot.

Start practicing these methods and start networking in meatspace with other people who have already begun such efforts, or are interested in joining yours. That last part is important. This is no time to reject allies. No time for ideological purity tests. If someone is sincerely interested in countering censorship, no matter their own opinions or motivations, they are an asset to the cause.

However you choose to organize it, what matters is that you start practicing systems of information distribution that are robust to censorship right now. Before it's needed. Because it might be needed very soon.

3808
 
 
The original post: /r/datahoarder by /u/Blakethekitty on 2025-02-05 21:06:36.
3809
 
 
The original post: /r/datahoarder by /u/DonnerDinnerParty on 2025-02-05 19:27:58.

Here's an AppleScript that will walk a user through creating a scheduled task that will, during certain hours and days, write/overwrite a junk file once a minute.

It's in AppleScript so you could run it without having to compile it as an application.

How to Use this

1. Open Script Editor (Applications > Utilities > Script Editor).

2. Paste the code below.

3. Click File > Save, choose Format: Application, and name it Install RAID Keeper.app.

4. Run the installer.

5. Choose your RAID folder and enter the active hours when prompted.

6. The script is installed and will automatically keep the RAID awake.

-- Prompt user for RAID location
set raidPath to POSIX path of (choose folder with prompt "Select your RAID drive or folder:")

-- Prompt user for active hours
display dialog "Enter the START hour for keeping the RAID awake (24-hour format)" default answer "9"
set startHour to text returned of result

display dialog "Enter the END hour for keeping the RAID awake (24-hour format)" default answer "17"
set endHour to text returned of result

set scriptContent to "#!/bin/bash

RAID_PATH=\"" & raidPath & "\"

while true; do
    DAY=$(date +%u)  # 1=Monday, 7=Sunday
    HOUR=$((10#$(date +%H))) # 24-hour format, forced to base 10 so 08 and 09 are not parsed as octal

    # Check if the current time is within the active range
    if [[ \"$DAY\" -ge 3 && \"$DAY\" -le 7 ]] && [[ \"$HOUR\" -ge " & startHour & " && \"$HOUR\" -lt " & endHour & " ]]; then
        dd if=/dev/urandom of=\"$RAID_PATH/.keepalive\" bs=1k count=1 2>/dev/null
        sleep 60
    else
        sleep 300
    fi
done
"

-- Define script installation path
set scriptPath to POSIX path of (path to library folder from user domain) & "Scripts/keep_raid_awake.sh"

-- Save the script to ~/Library/Scripts/
do shell script "mkdir -p ~/Library/Scripts/ && echo " & quoted form of scriptContent & " > " & quoted form of scriptPath

-- Set executable permissions
do shell script "chmod +x " & quoted form of scriptPath

-- Add to crontab for automatic startup
do shell script "crontab -l | { cat; echo '@reboot nohup " & scriptPath & " & disown'; } | crontab -"

-- Notify user
display dialog "Installation complete! The script is set up to keep your RAID awake from " & startHour & ":00 to " & endHour & ":00 (Wed-Sun). It will run on startup. If you want to start it now, restart your Mac or run the script manually." buttons {"OK"} default button "OK"

-- END

3810
 
 
The original post: /r/datahoarder by /u/SinaloaFilmBuff on 2025-02-05 19:23:48.

I'm currently having issues getting data from a parity setup, created in Windows 11 Storage Spaces, on Windows 10... It got me wondering: what is a modern software solution (non-server) for running four 1TB disks in a RAID 5 setup? I've come across SnapRAID, but most threads I've seen are more than 5 years old.

VM Solution

I tried passing all the disks through to the default Win11 dev environment provided in Hyper-V to see if I could extract the information that way (for some reason Storage Spaces from Win 10 and 11 are not compatible), but as you can see I'm getting an error for the Storage Spaces even though the disks are properly connected (an assumption, since the disks are being read in Disk Management and diskpart)... So at this point I just want to move away from Microsoft's Storage Spaces.

3811
 
 
The original post: /r/datahoarder by /u/pixel8tryx on 2025-02-05 19:20:49.

I wake to all this utter horseshite about NOAA, the CDC, etc. and I'm terrified we're going to lose all public access to scientific data. If they think big business could make a profit on it, it looks like they could lock it up. They just have to declare it "harmful to US prosperity". US corporate prosperity, that is. Hell, I almost took a job at NOAA eons ago, so this hurts. I'm seeing fear even over on r/bioinformatics ("PubMed, NCBI, NIH and the new US administration"), with a shoutout to you guys. Even the pros are worried.

I'm a 60+ geek with multiple poorly understood chronic health conditions, so I rely on a lot of this data. I'm already worried about losing ACA health insurance (or having it become unaffordable). And worse, I'm female. Unmarried and childfree, I'm probably the most hated and most useless of females to our new gov't leaders. I'm certainly unhireable in today's youth-loving, brogrammer culture. Losing access to scientific data would be the last straw. We cannot allow this to happen.

The only positive things I've seen so far this morning have been in this sub. Hats off to you guys and gals or whatever you want to be! Maybe I'll identify as an axolotl today in protest of our gov't trying to tell anyone what they can and cannot do with their own bodies. We're getting closer and closer to turning this country into a scary, extreme, right-wing, fundamentalist Christian state. I don't want us to regress back to the dark ages. Thanks for fighting the good fight. I've only got 70 TB here so I don't really qualify as a hoarder (yet), but I'm behind you all the way.

3812
 
 
The original post: /r/datahoarder by /u/StatisticianLive2307 on 2025-02-05 18:11:32.
3813
 
 
The original post: /r/datahoarder by /u/gendercalculus on 2025-02-05 18:11:24.

The subreddit ban was a wake-up call, but this extends beyond that: external/standalone websites like wikis and image sharing, Facebook groups, other social media sites, etc. Plus subs for specific topics and surgeries that weren't caught in the ban.

Trans healthcare is desperately under-researched, and unfortunately a lot of valuable information is sourced from people sharing their first-hand experiences. These groups are also a resource for finding information on specific surgeons: sharing experiences, results, timelines, and who to avoid. If this is lost, it will set us back decades; if this is saved, it will save lives.

I'm not sure if I should name specific sites and subs or if that would put targets on their backs. I'm also not sure how to balance the sensitive nature of this data with needing to archive it-- these are photos that could get people outed, and different spaces are public/private to varying degrees.

I'm new to this and just figuring out how to get started contributing, but I hope raising the alarm and creating a place for discussion helps.

3814
 
 
The original post: /r/datahoarder by /u/ixikei on 2025-02-05 18:09:09.

This happened to someone else before me, and I've tried multiple times today with the same result.

https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html

and on https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-geodatabase-file.html

Using both the web interface and the FTP archive on the pages linked above results in a "forbidden, you don't have permission to access this resource".

3815
 
 
The original post: /r/datahoarder by /u/starmen999 on 2025-02-05 17:51:15.
3816
 
 
The original post: /r/datahoarder by /u/KatieTSO on 2025-02-05 17:49:15.

I want to create an archive of specific subreddits I believe may be banned soon. Is there a good way to do this?

3817
 
 
The original post: /r/datahoarder by /u/Tilandaka on 2025-02-05 17:38:49.

I am looking for advice regarding compressing and encrypting files on a Linux system. At the moment I'm using 7-Zip (the 7z package) with the following parameters:

sudo 7z a -t7z -m0=lzma2 -mx=9 -mfb=64 -md=32m -ms=on -mhe=on -p"passw0rd" "file.7z" "folder/"

This will compress and encrypt a folder named folder into a file called file.7z. It will be password protected with the password passw0rd. Is this sufficient? I want something that's secure when I make data backups. I understand it's only as secure as my password too, but are there better ways of doing this? I also want something that compresses well.

The command above will also hide file names in the archive unless the file has been unlocked using the specified password. It uses LZMA2 and an "ultra" level of compression.
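For anyone who wants to script this, here is a minimal Python sketch of the same invocation. It rests on two assumptions that are not in the post: that the `7z` binary is on the PATH, and that a bare `-p` (no password attached) makes 7z prompt for the password interactively, which is how the builds I'm aware of behave. That keeps the password out of the shell history. The function names `build_7z_command` and `create_archive` are made up for illustration, and `sudo` is dropped on the assumption that your own user can read the folder.

```python
import subprocess


def build_7z_command(archive, source):
    """Return the argument list for the 7z call from the post.

    Same switches as the original command, except that -p carries no
    password, so (assumption) 7z asks for it on the terminal instead.
    """
    return [
        "7z", "a", "-t7z",
        "-m0=lzma2",   # LZMA2 compression method
        "-mx=9",       # "ultra" compression level
        "-mfb=64",     # fast bytes
        "-md=32m",     # dictionary size
        "-ms=on",      # solid archive
        "-mhe=on",     # encrypt the header, hiding file names
        "-p",          # bare -p: prompt for the password (assumption)
        archive,
        source,
    ]


def create_archive(archive, source):
    """Run 7z and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_7z_command(archive, source), check=True)
```

Calling `create_archive("file.7z", "folder/")` should then behave like the command above, minus the password on the command line.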

If you have any suggestions, please provide full commands here so that everyone here can learn. Most tools have a lot of different parameters that must be set. Simply saying "use xyz..." leaves room for more questions if you do not provide real examples.

Thanks in advance!

3818
 
 
The original post: /r/datahoarder by /u/icysandstone on 2025-02-05 16:18:56.

Really trying to understand the use cases and the usefulness here. It sounds like a fun project that could be really beneficial.

Beyond BLOBs, databases could still be a good idea -- storing references to the files on disk (i.e. the path).
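To make the "references to files on disk" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table layout (path, size, SHA-256) and all function names are assumptions chosen for illustration; the files themselves stay on disk and the database only indexes them.

```python
import hashlib
import os
import sqlite3

# Hypothetical schema: one row per file, keyed by its path on disk.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path   TEXT PRIMARY KEY,
    size   INTEGER NOT NULL,
    sha256 TEXT NOT NULL
)
"""


def open_index(db_path=":memory:"):
    """Open (or create) the index database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never sit fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_file(conn, path, size, sha256):
    """Insert or refresh the row for one file."""
    conn.execute(
        "INSERT OR REPLACE INTO files (path, size, sha256) VALUES (?, ?, ?)",
        (path, size, sha256),
    )


def index_tree(conn, root):
    """Walk a directory tree and record every file found under it."""
    for folder, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            record_file(conn, full, os.path.getsize(full), sha256_of(full))
    conn.commit()


def find_duplicates(conn):
    """Return (sha256, count) pairs for content stored more than once."""
    rows = conn.execute(
        "SELECT sha256, COUNT(*) FROM files GROUP BY sha256 HAVING COUNT(*) > 1"
    )
    return rows.fetchall()
```

One use case this covers: after `index_tree(conn, "/mnt/archive")` (a made-up path), `find_duplicates(conn)` lists content that exists more than once, and re-hashing later and comparing against the stored value is a cheap bit-rot check.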

Would greatly appreciate any thoughts or anecdotes on this.

3819
 
 
The original post: /r/datahoarder by /u/PrestigiousEvent7933 on 2025-02-05 16:01:22.

Just saw that NOAA was recently invaded by that one guy from South Africa and his little buddies. Wondering if we have that on back up already or if we need to get going on that? Seems like the kind of place that might have a lot of data at stake.

3820
 
 
The original post: /r/datahoarder by /u/mark1x12110 on 2025-02-05 15:40:53.

I have a folder with tens of thousands of video files (>1TB) in different formats (mkv, mp4, etc.). Are you aware of any software that can scan the folder and present the total runtime? I do not need any other metadata, only how long it would take to play them all.

I tried:

  • Windows file properties: Works, but I have to filter manually and it is not easily scriptable

  • MediaInfo: Seems to crash when I load the folder

  • I read somewhere that FFprobe may be able to help but my initial attempt was not successful. I'll have to re-iterate and try again
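A rough sketch of the FFprobe route, in Python so it runs the same on Windows. It assumes `ffprobe` (shipped with FFmpeg) is on the PATH; the suffix list and function names are made up for illustration. Files whose duration cannot be read count as zero instead of aborting the scan.

```python
import subprocess
from pathlib import Path

# Assumption: extend this set to match the formats in your folder.
VIDEO_SUFFIXES = {".mkv", ".mp4", ".avi", ".mov", ".webm"}


def probe_seconds(path):
    """Ask ffprobe for the container duration of one file, in seconds."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            str(path),
        ],
        capture_output=True,
        text=True,
    )
    try:
        return float(result.stdout.strip())
    except ValueError:
        # Empty or "N/A" output: unreadable file or unknown duration.
        return 0.0


def total_seconds(root):
    """Sum the durations of all video files below root, recursively."""
    return sum(
        probe_seconds(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    )


def format_hms(seconds):
    """Render a number of seconds as H:MM:SS, dropping the fraction."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

Something like `print(format_hms(total_seconds(r"D:\Videos")))` (path made up) would then print the combined runtime. It launches one ffprobe process per file, so tens of thousands of files will take a while.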

3821
 
 
The original post: /r/datahoarder by /u/ddcrx on 2025-02-05 14:54:45.

Seeing the list of banned sites today got me thinking.

We’ve obviously become an enemy of this administration by hoarding US Govt data that’s been taken down. What if we get banned too?

Do we have a backup site ready? Lemmy?

If not, mods, we should create one ASAP.

3822
 
 
The original post: /r/datahoarder by /u/TheAngrySkipper on 2025-02-05 14:54:08.

I perhaps made a mistake today believing that the storage I had accumulated would be sufficient. It appears that I need to purchase a new NAS specifically for backing up certain data repositories. I may be old school: my previous NAS was a DNS-321. Is there something similar in terms of ease of setup, but perhaps a little more reliable for airflow, with room for at least 4 desktop-sized SATA HDDs?

My ideal price range would be <$100 used or <$200 new. Thanks for any suggestions, and if this is disallowed based on a rule I'm overlooking, if you could point me in the right direction before deleting it I would greatly appreciate it.

3823
 
 
The original post: /r/datahoarder by /u/Fatalis22 on 2025-02-05 14:39:41.

What used to be the biggest Spanish videogames forum is going to close next month and I would like to make a dump of all the content (at least the public part). Could someone explain to me how to do it?

I tried to use HTTrack but it gives me an error:

MIRROR ERROR

HTTrack has detected that the current mirror is empty.

The forum is: https://www.foro3djuegos.com/

Thank you!

3824
 
 
The original post: /r/datahoarder by /u/TheBBP on 2025-02-05 13:57:00.

There's been a massive purge of many NSFW or Drug related subreddits today.

This post is for any subreddit purge related discussion, other posts will be removed.

This is a good reminder that nothing is permanent, and that anything that isn't stored within your own control can easily be removed.

Keeping your own backups/archives is a good way to preserve the things you want to keep.

Edit:

Supposedly this was a "bug", reddit admin comment here: - /r/ModSupport/comments/1ii67mt/communities_are_banned_again_for_being_unmoderated/mb3fewv/

Several subs are still banned though.

3825
 
 
The original post: /r/datahoarder by /u/Skylleur on 2025-02-05 11:08:10.

Dear hoarders, I require your assistance, as many subreddits are getting banned, and last night at around 4am a very important subreddit for the transgender community was banned. r/Transgender_surgeries is gone for the time being. I am not in any position to make backups myself and would love help from other people to make them, as some very sensitive subreddits still remain. Please reach out to me, I need your help.

To me, this justifies a rule 8 violation: transgender resources are extremely important and very likely to be destroyed under the current American administration.
