Fuck AI LLM scrapers

Wellp, my whack-a-mole approach finally got to be too much to maintain. The last day or so my server has been absolutely inundated with traffic from thousands of IP blocks, all coming from China, and I got sick of trying to keep up with it myself.

I looked into setting up Anubis and preparing to just whitelist a lot of IndieWeb things, but it’s all just so very overwhelming and for now I’ve gone with Cloudflare, problematic as they are, because the amount of energy I can put into this shrinks every day and sometimes I just want things to stop sucking for a while.

All of my DNS has propagated but of course it’ll be a while before the bots decide to update their own DNS caches, so my server is still getting absolutely hammered, but hopefully things will subside, and in the meantime things are at least responsive.

I guess at some point I’ll have to figure out how to actually set up TLS with Cloudflare (since I’ve been using Let’s Encrypt wildcard certs but obviously those don’t work anymore when Cloudflare is handling my DNS) but that’s a problem for future me. Also I’ll definitely be on the lookout to make sure that Cloudflare is properly honoring my login cookies. It’d definitely be unfortunate if it got confused about logins, which is one of the more common failure modes with HTTP proxies.
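For the login cookie worry, the usual failure is a shared cache storing a page that was rendered for a signed-in visitor and then serving it to someone else. Here is a minimal sketch of the header logic that guards against that, assuming the origin decides per response. The function name `cache_headers`, the cookie name `session`, and the five-minute lifetime are all made up for illustration and are not taken from this site's actual setup. How any particular CDN treats these headers is something to check in its own docs.

```python
def cache_headers(cookies: dict, session_cookie: str = "session") -> dict:
    """Pick response headers so a shared cache never reuses a logged-in page.

    `cookies` is the request's cookie mapping; `session_cookie` is a
    hypothetical name for whatever cookie marks a signed-in visitor.
    """
    if session_cookie in cookies:
        # Signed-in response: "private" tells shared caches (proxies, CDNs)
        # not to keep it, and "no-store" asks every cache to skip it entirely.
        return {"Cache-Control": "private, no-store", "Vary": "Cookie"}
    # Anonymous response: fine for a proxy to keep for a short while.
    # The 300 second lifetime is an arbitrary illustrative value.
    return {"Cache-Control": "public, max-age=300", "Vary": "Cookie"}


if __name__ == "__main__":
    print(cache_headers({"session": "abc123"}))
    print(cache_headers({}))
```

The idea is just that anything rendered differently for a signed-in visitor gets marked as off-limits to shared caches, while anonymous pages can still be cached so the proxy keeps absorbing the bot traffic.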

I’m also super worried that this will interfere with IndieWeb stuff, because of course most of the anti-bot things assume that any traffic coming from data centers or from headless/scriptless user agents is abusive. Which is, y'know, 99.99% accurate, but that 0.01% is stuff I really care about (namely interop).

Anyway. I resent that this is the state of the Internet right now. It’s getting really difficult for me to find anything positive about AI when this is how the industry treats everyone.

Yes I’ve heard about iocaine

No I will not be running it

It does absolutely nothing to slow crawlers down (it’s not like they’re going to wait for a page to finish loading before they move on to the next one, crawlers are super optimized to just constantly grab as much bandwidth as possible in parallel), there’s already so much AI slop on the web that it’s not going to contribute meaningfully to model collapse, and all you’re doing by running it is wasting even more resources. Giving the LLM crawlers more content to slurp up just gives them more reasons to waste even more resources, and only continues the death spiral of making the Internet an even worse place.

This isn’t like interfering with scammer call centers through scambaiting or the like. Computers have no problem with having their time wasted.

And meanwhile it does nothing to actually solve the problem.

Some thoughts on comments

You might have noticed that I’ve made a slight change to the comments on this site: the comment threads are only visible to those who are signed in. This is a temporary experiment just to see if it cuts out the spam I’ve been getting and also if it increases the quality of what comments do come in.

I’ve been thinking about how I can go about improving comments in general, in ways which would also satisfy some of my other general long-term plans around Publ.

Read more…

Spammers are relentless and weird

Lately I’ve been getting a bunch of attempted spam comments on random blog entries. Okay, nothing unusual about that, right?

Well, it’s a little unusual in that I use isso, an obscure comment system that requires JavaScript to work, so at the very least there’s some sort of browser-based automation, if not outright sweatshop labor, happening.

But today I just got the weirdest fucking spam comment ever. Not weird because of the content (it was for a list of dental clinics in India, which I guess is pretty weird), but because of where it was posted:

On an entry that requires login.

Read more…

Goodbye, Twitter third-party login

So, a little while ago I did an extremely unscientific poll on login methods via Authl on this website. The results of that (measured by folks who accessed my site for any authenticated reason, not just folks visiting the login method poll):

  • 8 signed in via Fediverse (Mastodon/Pleroma/etc.)
  • 4 signed in via IndieAuth
  • 7 signed in via email

Not a single one signed in via Twitter.

Read more…

Banned in the UAE

The other day I discovered that my site is banned in the UAE on the basis of “pornography.” The national filter criteria are pretty fascinating, and now I am on a mission to get banned for as many categories as I can with a single blog post! So, here we go.

Read more…

The frustration of continued existence

My week off from work felt great. But I’m still having difficulty actually focusing at work. I have a bunch of paths of exploration to examine but none of them feel, y'know, right right now.

Meanwhile, my house continues to be a bit more work than I expected. On the plus side, I’ve successfully murdered my lawn and vastly improved my garden and started up my nice meadow. On the minus side, my heating bill is through the roof (literally) and I’ve been getting bids for finally improving the house insulation. So far I’ve had three bids which went thusly:

Read more…