Blocking abusive webcrawlers General Articles

People often talk about how bad AI is for the environment, but they focus only on the operation of the LLMs themselves. They seem to ignore the much larger impact of what the AI scrapers are doing: not only do those take massive amounts of energy and bandwidth to run, but they’re impacting every single website operator on the planet by increasing their server requirements and bandwidth utilization as well. And this makes it everyone’s problem, since everyone ends up having to foot the bill. It’s asinine and disgusting.

At one point, fully 94% of all of my web traffic was coming from a single botnet like this. These bots do not respect robots.txt or the nofollow link rels that I put on my site to prevent robots from getting stuck in a trap of navigating every single tag combination on my site, and it’s ridiculous just how many resources — both mine and theirs — are constantly being wasted like this.
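For reference, the kind of directive these bots are ignoring looks like this — the /tag/ path here is a hypothetical stand-in for the combinatorial tag-navigation URLs described above, not the site's actual structure:

```
User-agent: *
Disallow: /tag/
```

A well-behaved crawler fetches /robots.txt first and skips anything matching a Disallow rule; these bots simply don't bother.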

I’ve been using the nginx ultimate bad bot blocker to subscribe to lists of known bots, but this one particular botnet (which operates on Alibaba’s subnets) has gotten ridiculous, and enough is enough.

So, I finally did something I should have done ages ago and set up UFW with some basic rules.
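As a minimal sketch of what such rules look like — the CIDR range below is a documentation placeholder, not the actual Alibaba subnet:

```shell
# Deny the abusive subnet before the general allow rules.
# 203.0.113.0/24 is a placeholder range (TEST-NET-3), not the real botnet.
sudo ufw insert 1 deny from 203.0.113.0/24 to any

# Keep ordinary web traffic reachable for everyone else.
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
```

Dropping the traffic at the firewall means the connections never reach nginx at all, which is much cheaper than matching user agents per-request.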

UPDATE: This article has started getting linked to from elsewhere (including Coyote’s excellent article about the problem), but I no longer use this approach for blocking crawlers, as it’s become completely ineffective thanks to the crawlers now behaving like a massive DDoS. These days I’m using a combination of gated access, sentience checks, and, unfortunately, CloudFlare. UPDATE TO THE UPDATE: no I’m not.

ANOTHER UPDATE: I’ve had to go back to using this technique selectively as some of the crawlers have managed to get around my other mitigations. If only these bot authors would spend 1% as much time on making their bots not be utterly broken as they do on trying to inflict themselves on everyone.

Read more…

Are you having COMPUTER PROBLEMS? fluffy rambles

As I mentioned on Mastodon and Bluesky, my gaming PC got infected by malware/ransomware, specifically Azov and Expiro. I’m not sure how my computer got infected, but this was the push I needed to switch it over to Linux, now that VRChat and SteamVR run pretty well on Linux anyway.

Read more…

Computer inventory fluffy rambles

Seeing this post made me think I should also list my computers and their purposes/vague specs. Because I’m that kind of a nerd.

What counts as a “computer” for this list is subjective. I’m not including pure video game consoles (I’d be here all day if I did) or phones (big same). I am including tablets, but only ones which feel computer-y, if that makes any sense (like, no Android tablets, and I’m leaving off an ancient iPad named fluffysaurus).

Sadly I don’t have any proper vintage computing stuff anymore (unless you count a couple of neat old PDAs, anyway).

November 4, 2024: Updated some things

November 29, 2024: Updated more things

Read more…

Mac Studio: a quick review fluffy rambles

I got my Mac Studio yesterday, to replace the Mac mini in my office (the mini now replacing the 13" MacBook Pro in my recording studio, the MacBook Pro replacing the frustrating Lenovo laptop in the living room), and I have all my stuff set up on it. I went with the 10-core M1 Max model (with the upgraded GPU) and 2TB of storage, sticking to the stock 32GB of RAM.

Read more…

New Mac mini fluffy rambles

Today I wasn’t expecting to get a lot of work done, due to my brain still feeling like it’s been through a juicer and also because my new Mac mini arrived. So I got to go through the drudgery of reinstalling everything while also seeing the news of the world exploding around me! Hooray!

Anyway, just some random setup notes.

Read more…

Base-10 File Sizes General Articles

So, Mac OS X 10.6 (Snow Leopard) will be using base-10 file sizes. There has been a lot of nerd outcry over this, but frankly, I think it’s about freaking time.

There is absolutely no reason to use base-2 file sizes. Yes, computers deal with things in terms of base 2, but nobody else does. When you look at a file that is 104768926 bytes big, you think, “oh, 105 megabytes,” not “100 megabytes.” As files get bigger and bigger, the disparity between MB and MiB gets worse and worse.
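The arithmetic for that example file, as a quick shell check:

```shell
# Same byte count expressed in base-10 (MB) and base-2 (MiB) units.
bytes=104768926
awk -v b="$bytes" 'BEGIN { printf "%.1f MB vs %.1f MiB\n", b/1e6, b/2^20 }'
# prints: 104.8 MB vs 99.9 MiB
```

The base-2 reading is what older systems would have labeled “100 MB,” which is exactly the mismatch with human intuition being complained about here.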

Read more…