FaceDeer

joined 2 years ago
[–] FaceDeer@fedia.io 3 points 2 months ago (3 children)

Is it just because the word "SUV" is somehow tainted with evil? Because as I understand it, the main beef people have with SUVs is that they're big and get poor gas mileage. The "small" and "electric" parts of that description counter both of those problems.

[–] FaceDeer@fedia.io -2 points 2 months ago (6 children)

Bandwidth can't, though.

Bandwidth is incredibly cheap. The problem these sites are having is not that they're running into bandwidth limits, it's that generating each page requires server-side processing. That's why Wikipedia's solution works - they offer all the "raw" data in a single big archive, which takes just as much bandwidth to download but far fewer server resources to serve (because there's essentially no per-request processing - it's just a static blob of data).
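To make the cost difference concrete, here's a minimal sketch (the page store, content, and sizes are all made up for illustration - this isn't Wikipedia's actual pipeline). Rendering each page on demand means doing work per request; building one compressed dump means doing the work once and serving a static file afterwards:

```python
import gzip
import io

# Hypothetical page store: in a real wiki, each page view triggers template
# rendering, database queries, etc. Here render_page() stands in for that
# per-request work that scrapers hammer when they crawl page by page.
PAGES = {f"page_{i}": f"content of page {i}" for i in range(1000)}

def render_page(name: str) -> str:
    # The expensive path: invoked once per request.
    return f"<html><body>{PAGES[name]}</body></html>"

def build_dump() -> bytes:
    # The cheap path: serialize everything into one compressed blob, once.
    # Serving this afterwards is a plain static file transfer, no rendering.
    buf = io.StringIO()
    for name, content in sorted(PAGES.items()):
        buf.write(f"{name}\t{content}\n")
    return gzip.compress(buf.getvalue().encode())

# Scraping all pages = 1000 render_page() calls; downloading the dump = one
# request for a file that was generated a single time.
dump = build_dump()
print(len(PAGES), "pages packed into a", len(dump), "byte archive")
```

The bandwidth consumed is roughly the same either way; what the dump eliminates is the repeated server-side computation.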

Is it okay to hire a bunch of people to check out half a library's books, then rent them to people for money?

This analogy fails because, as I said, data can be duplicated easily. Making a copy of the data doesn't obstruct other people from also viewing the data provided you avoid the sorts of resource bottlenecks I described above.

Is your problem really about the accessibility of this data? Or is it that you just don't want those awful for-profit companies you hate to have access to it? I really get the impression that that's the real problem here - people hate AI companies, and so a solution that gives everyone what they want is unacceptable because the AI companies are included in "everyone."

[–] FaceDeer@fedia.io -1 points 2 months ago (7 children)

I don’t understand why the burden is on the victims here.

They put the website up. Load balancing, rate limiting, and such go with the turf. It's their responsibility to make the site easy to use and hard to break. Putting up an archive of the content that the scrapers want is an easy and straightforward thing to do to accomplish this goal.

I think what's really going on here is that your concern isn't about ensuring that the site is up, and it's certainly not about ensuring that the data it's providing is readily available. It's that there are these specific companies you don't like and you just want to forbid them from accessing otherwise freely accessible data.

[–] FaceDeer@fedia.io -2 points 2 months ago (9 children)

Unlike water, though, data can be duplicated easily.

[–] FaceDeer@fedia.io -2 points 2 months ago (9 children)

That suggestion is exactly the same as what I started with when I said "IMO the ideal solution would be the one Wikimedia uses, which is to make the information available in an easily-downloadable archive file." It just cuts out the Aaron Swartz-style external middleman, so it's easier and more efficient to create the downloadable data.

[–] FaceDeer@fedia.io -3 points 2 months ago (11 children)

If someone did an Aaron Swartz-style scrape, then published the data they scraped in a downloadable archive so that AI trainers could download it and use it, would you find that objectionable?

[–] FaceDeer@fedia.io -3 points 2 months ago (15 children)

so every single repository should have to spend their time, energy, and resources on accommodating a bunch of venture funded companies that want to get all of this shit for free without contributing to these repositories at all themselves?

Was Aaron Swartz wrong to scrape those repositories? He shouldn't have been accessing all those publicly-funded academic works? Making it easier for him to access that stuff would have been "capitulating to hackers?"

I think the problem here is that you don't actually believe that information should be free. You want to decide who and what gets to use that "publicly-funded academic work", and you have decided that some particular uses are allowable and others are not. Who made you that gatekeeper, though?

I think it's reasonable that information that's freely posted for public viewing should be freely viewable. As in anyone can view it. If they want to view all of it and that puts a load on the servers, but there's an alternate way of providing it that doesn't, what's wrong with doing that? It solves everyone's problems.

[–] FaceDeer@fedia.io 3 points 2 months ago

Well, the Germans wouldn't have because they got defeated long before the Manhattan Project produced a usable weapon. Their own attempt at it failed. Some suspect that Heisenberg actually did sabotage the German project, though it's also possible that he was just bad at it.

But the Soviet Union would have done it later on. Or any of a variety of other countries that probably shouldn't be the first or only countries to have nuclear weapons. Science is not unique to the discoverer; other people can independently discover the same things.

[–] FaceDeer@fedia.io 15 points 2 months ago (1 children)

And not only that, Israel has nuclear weapons.

[–] FaceDeer@fedia.io 2 points 2 months ago

Even more ironically, you could probably shorten that time even more by having an AI analyze the transcript for you.

I've found Firefox's Orbit extension to be quite handy whenever someone directs me to a 30-minute YouTube video as "proving" whatever point they're trying to argue. I can pop it open and ask it to tell me what the video says about that point in just a few seconds. I wouldn't use an AI summary as backing if I were doing surgery on someone, but for a random Internet argument it's fine.

[–] FaceDeer@fedia.io 1 point 2 months ago

Also the article's content doesn't say what the headline says.

[–] FaceDeer@fedia.io 1 point 2 months ago (2 children)

On the plus side, at least on the instance I'm on, I was automatically given a link to the thread where this same story was posted three months ago. Saves some effort.
