Code | 2018/09 — 2025/07 (10)

A simple anti-AI measure for Flask

July 10, 2025 4:56 PM (8 months ago)

After figuring out a basic anti-bot measure for Publ, I decided to try building a simple experiment for Flask in general.

Here is an extremely simple implementation, which has worked amazingly well since I put it on The Flickr Random Image Generatr.

Preventing bot scraping on Publ and Flask

July 5, 2025 12:44 PM (8 months ago)

This morning I was once again thinking about how to put some proper anti-bot behavior onto my websites, without relying on Cloudflare. There are plenty of fronting proxies like Anubis and Go Away which put a simple proof-of-work task in front of a website. This is pretty effective, but it adds more of an admin tax (and is often quite difficult to configure for servers that host multiple websites, such as mine), and the false positives can have other bad effects, such as blocking feed readers and the like.

I started going down the path of how to integrate anti-bot stuff directly into Flask, using an @app.before_request rule that would do much of the same work as Anubis et al, but really the various bots are very stupid, and the reason the challenge even works is that they aren’t running any JavaScript at all. This made me think that a better approach would be to have it just look for a simple signed cookie, and if that cookie isn’t there, insert an interstitial page that sets it via form POST (with a JavaScript wrapper to automatically submit the form).
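As a rough illustration of that idea, here is a minimal sketch of a signed-cookie check with an auto-submitting interstitial. It is not the code from the post: the cookie name, the /_challenge endpoint, the one-week lifetime, and the use of itsdangerous for signing are all assumptions made for the example.

```python
from flask import Flask, redirect, request
from itsdangerous import BadSignature, URLSafeTimedSerializer
from markupsafe import escape

app = Flask(__name__)
app.secret_key = "change-me"  # assumption: use a real secret in practice

COOKIE_NAME = "human_check"     # hypothetical cookie name
CHALLENGE_PATH = "/_challenge"  # hypothetical endpoint that sets the cookie
MAX_AGE = 7 * 24 * 3600         # assumption: cookie is valid for one week

signer = URLSafeTimedSerializer(app.secret_key, salt="antibot")

# Interstitial page: JavaScript submits the form automatically; clients
# without JavaScript get a button they can press by hand.
INTERSTITIAL = """<!DOCTYPE html>
<form id="f" method="POST" action="{action}">
<input type="hidden" name="next" value="{next}">
<noscript><button type="submit">Continue</button></noscript>
</form>
<script>document.getElementById('f').submit();</script>"""


def has_valid_cookie():
    """Return True if the request carries an unexpired, correctly signed cookie."""
    token = request.cookies.get(COOKIE_NAME)
    if not token:
        return False
    try:
        signer.loads(token, max_age=MAX_AGE)
        return True
    except BadSignature:  # also covers expired signatures
        return False


@app.before_request
def require_cookie():
    # Let the challenge endpoint through, as well as anyone already verified.
    if request.path == CHALLENGE_PATH or has_valid_cookie():
        return None
    # Everyone else gets the interstitial instead of the requested page.
    return INTERSTITIAL.format(action=CHALLENGE_PATH,
                               next=escape(request.full_path))


@app.route(CHALLENGE_PATH, methods=["POST"])
def challenge():
    target = request.form.get("next", "/")
    # Only allow same-site relative targets, to avoid an open redirect.
    if not target.startswith("/") or target.startswith("//"):
        target = "/"
    response = redirect(target)
    response.set_cookie(COOKIE_NAME, signer.dumps("ok"), max_age=MAX_AGE,
                        httponly=True, samesite="Lax")
    return response


@app.route("/")
def index():
    return "Hello, human."
```

A check like this would also stop feed readers and anything else that does not keep cookies, so a real deployment would probably need to exempt feed URLs and similar endpoints.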

But then I realized, Publ already provides this sort of humanity test: the login page!

UPDATE: This approach is a bit obsolete; here is a better approach that uses HTTP 429 responses (which also serve the purpose of signalling to crawlers that they are unwelcome). I also no longer recommend the g.is_bot approach to removing page elements, as Publ now has user.is_bot as a built-in function that works better with caching.

Random verb selection on a MUCK

April 29, 2025 11:49 PM (10 months ago)

On SpinDizzy MUCK there are a bunch of “hug” verbs which are a bit whimsical and a bit nonsensical, and for reasons that are too silly to get into, I have been locked in an eternal battle with Austin in which I am constantly creating more.

A while back I ran into an issue with a few miscellaneous world scripts breaking around me, and it turned out to be that one of the global scripts, for reasons I’m still unclear on, attempts to parse every verb attached to a character object, and for other reasons I am also unclear on, it ends up attempting to push every name for the verb onto the stack.

macOS Dequarantine

November 8, 2024 3:47 PM (a year ago)

Tired of dealing with the annoying processes necessary to run an unsigned application on macOS?

Here’s a simple thing to make your life a lot easier: dequarantine.zip

Download this file, open it up, double-click the dequarantine.workflow file, and then install it as a Quick Action. Now if you want to let an unsigned application run, right-click (or ctrl-click) it, select “Quick Actions,” then “dequarantine.” And then, done.

Have fun.

Falsehoods programmers believe about email

March 25, 2022 4:21 PM (3 years ago)

In the spirit of falsehoods programmers believe about names and time, here are some falsehoods about email which are all too common.

Radix sort revisited

March 14, 2021 1:40 PM (4 years ago)

Around a year and a half ago I wrote an article on the perils of relying on big-O notation, and in it I focused on a comparison between comparison-based sorting (via std::sort) and radix sort, based on the common bucketing approach.

Recently I came across a video on radix sort which presents an alternate counting-based implementation at the end, and claims that the tradeoff point between radix and comparison sort comes much sooner. My intuition said that even counting-based radix sort would still be slower than a comparison sort for any meaningful input size, but it’s always good to test one’s intuitions.

So, hey, it turns out I was wrong about something. (But my greater point still stands.)
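For readers unfamiliar with the counting-based variant, here is a minimal sketch of a least-significant-digit radix sort that uses counting passes rather than buckets. It is written in Python purely for illustration and is not the implementation from the video or the article; the function name, the 8-bit digit size, and the restriction to non-negative integers are assumptions.

```python
def radix_sort(values, bits_per_pass=8):
    """Return a sorted copy of non-negative integers (LSD, counting-based)."""
    src = list(values)
    if not src:
        return src
    if min(src) < 0:
        raise ValueError("this sketch only handles non-negative integers")

    radix = 1 << bits_per_pass
    mask = radix - 1
    dst = [0] * len(src)
    largest = max(src)
    shift = 0

    # One pass per digit, starting from the least significant one.
    while (largest >> shift) > 0:
        # Count how many values have each digit at this position.
        counts = [0] * radix
        for v in src:
            counts[(v >> shift) & mask] += 1

        # Turn the counts into starting offsets (exclusive prefix sums).
        total = 0
        for digit in range(radix):
            counts[digit], total = total, total + counts[digit]

        # Place each value at its offset; scanning in order keeps it stable.
        for v in src:
            digit = (v >> shift) & mask
            dst[counts[digit]] = v
            counts[digit] += 1

        src, dst = dst, src
        shift += bits_per_pass

    return src
```

Each pass touches every element a fixed number of times and needs only one scratch array of the same length, instead of growing a separate container per bucket.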

The danger of big-O notation

October 3, 2019 2:42 PM (6 years ago)

A common pitfall I see programmers run into is putting way too much stock into Big O notation and using it as a rough analog for overall performance. It’s important to understand what the Big O represents, and what it doesn’t, before deciding to optimize an algorithm based purely on the runtime complexity.

How not to shuffle a list

December 10, 2018 8:30 PM (7 years ago)

A frequent thing that people want to do in making games or interactive applications is to shuffle a list. One common and intuitive approach that people take is to simply sort the list, but use a random number generator as the comparison operation. (For example, this is what’s recommended in Fuzzball’s MPI documentation, and it is a common answer that comes up on programming forums as well.)

This way is very, very wrong.
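To make the contrast concrete, here is a small Python sketch (the article's own examples may differ). `bad_shuffle` is the sort-with-a-random-comparator approach described above, and `fisher_yates_shuffle` is the standard Fisher-Yates algorithm, named here as one well-known correct alternative. The function names and the `tally` helper are assumptions made for the example.

```python
import random
from collections import Counter
from functools import cmp_to_key


def bad_shuffle(items, rng=random):
    """The approach described above: sort with a coin-flip comparator."""
    return sorted(items, key=cmp_to_key(lambda a, b: rng.choice((-1, 1))))


def fisher_yates_shuffle(items, rng=random):
    """Return a shuffled copy in which every permutation is equally likely."""
    result = list(items)
    # Walk backwards, swapping each slot with a random slot at or before it.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint includes both endpoints
        result[i], result[j] = result[j], result[i]
    return result


def tally(shuffle, items, trials=60000):
    """Count how often each ordering comes out of the given shuffle."""
    return Counter(tuple(shuffle(items)) for _ in range(trials))


if __name__ == "__main__":
    for name, fn in (("bad", bad_shuffle), ("fisher-yates", fisher_yates_shuffle)):
        print(name)
        for order, count in sorted(tally(fn, [1, 2, 3]).items()):
            print("  ", order, count)
```

Running the tally on a three-item list should show roughly even counts for Fisher-Yates and noticeably uneven ones for the comparator version, though the exact skew depends on the sort algorithm. In everyday Python code, the standard library's `random.shuffle` already does the job.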

Making a hash of data

October 23, 2018 10:07 AM (7 years ago)

When I was replacing peewee with PonyORM, I was evaluating a few options, including moving away from an ORM entirely and simply storing the metadata in indexed tables in memory. This would have also helped to solve a couple of minor annoying design issues (such as improper encapsulation of the actual content state into the application instance), but I ended up not doing this.

A big reason why is that there don’t actually seem to be any useful in-memory indexed table libraries for Python. Or many other languages.

Pushl

September 30, 2018 12:00 AM (7 years ago)

Pushl: A tool for generating WebMention, Pingback, and WebSub notifications from arbitrary websites regardless of their underlying publishing system.