On LLM-based programming

Lately there’s been a brewing culture war in the software community regarding the usage of LLM-based coding tools (ChatGPT, Claude, Gemini, etc.), and I’m seeing a lot of hardline discourse on both sides, but especially from the anti-AI side of things.

First of all, to be clear, I am not in favor of LLM-based programming. My personal opinion is that LLMs are catastrophic to the environment and to the Internet as a whole (especially due to the added burden of every server being overwhelmed by the relentless AI-focused crawlers), and to the intelligence of the humans who wield them.

However, in the current landscape, I have also found myself (begrudgingly) using them on occasion to a limited degree, and in the mind of some folks, this probably irrevocably taints my code. So I’d like to at least discuss my perspective on them.

Read more…

Preventing bot scraping on Publ and Flask

This morning I was once again thinking about how to put some proper antibot behavior onto my websites without relying on Cloudflare. There are plenty of fronting proxies, like Anubis and Go Away, which put a simple proof-of-work challenge in front of a website. This is pretty effective, but it adds an admin tax (and is often quite difficult to configure for servers that host multiple websites, such as mine), and false positives can have other bad effects, such as locking out feed readers and the like.

I started going down the path of integrating antibot behavior directly into Flask, using an @app.before_request hook that would do much of the same work as Anubis et al. But the various bots are really quite stupid, and the reason the proof-of-work challenge even works is that they aren't running any JavaScript at all. That made me think a better approach would be to simply look for a signed cookie, and if that cookie isn't there, serve an interstitial page that sets it via a form POST (with a JavaScript wrapper to automatically submit the form).
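The signed-cookie idea can be sketched roughly as follows. This is not the code from my site, just a minimal illustration of the shape of it; the cookie name, form field, and interstitial markup are all made up, and it uses itsdangerous (which Flask already depends on) for the cookie signature:

```python
from flask import Flask, request, make_response, render_template_string
from itsdangerous import Signer, BadSignature

app = Flask(__name__)
app.secret_key = "change-me"  # placeholder secret for illustration
signer = Signer(app.secret_key)

COOKIE = "not_a_bot"  # hypothetical cookie name

# Interstitial page: a form that POSTs back to the same path, with a
# JavaScript wrapper to auto-submit it (which headless scrapers that
# run no JavaScript will never do).
INTERSTITIAL = """
<form method="POST" action="{{ path }}">
  <input type="hidden" name="_human" value="1">
  <noscript><button type="submit">Continue</button></noscript>
</form>
<script>document.forms[0].submit();</script>
"""

@app.before_request
def require_human_cookie():
    # If a validly-signed cookie is present, let the request through.
    token = request.cookies.get(COOKIE, "")
    try:
        signer.unsign(token)
        return None
    except BadSignature:
        pass

    # A POST from the interstitial form: set the signed cookie and
    # redirect back to the originally-requested page.
    if request.method == "POST" and request.form.get("_human"):
        resp = make_response("", 303)
        resp.headers["Location"] = request.path
        resp.set_cookie(COOKIE, signer.sign("ok").decode())
        return resp

    # Otherwise, serve the interstitial instead of the real page.
    return make_response(
        render_template_string(INTERSTITIAL, path=request.path), 200)
```

Because the cookie is signed with the server secret, a bot can't just invent one; it has to actually submit the form, which the non-JavaScript crawlers won't do.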

But then I realized, Publ already provides this sort of humanity test: the login page!

UPDATE: This approach is a bit obsolete; here is a better approach that uses HTTP 429 responses (which also serve the purpose of signalling to crawlers that they are unwelcome). I also no longer recommend the g.is_bot approach to removing page elements, as Publ now has user.is_bot as a built-in function that works better with caching.

Read more…