A simple anti-AI measure for Flask

After figuring out a basic anti-bot measure for Publ, I decided to try building a simple experiment for Flask in general.

Here is an extremely simple implementation that has worked amazingly well, having implemented it on The Flickr Random Image Generatr.

First, here’s a new template, saved as gatekeep.html:

templates/gatekeep.html
<!DOCTYPE html>
<html><head><title>Sentience test</title>
<script>
window.addEventListener("load", () => {
    document.forms['proxy'].submit();
});
</script>
</head>
<body>
<h1>Sentience check</h1>

<form method="POST" id='proxy'>
    <input type="hidden" name="sid" value="{{sid}}">
    <input type="submit" value="I'm not a bot">
</form>
</body>
</html>

Next, in any given endpoint that you want to protect from bot traffic, you can do something like this:

import uuid

# The app requires a session key; leave this off if you already have one
app.secret_key = str(uuid.uuid4())

@app.route('/whatever/<id:str>'):
def whatever_endpoint(id):
    # This endpoint is expensive! Make sure there's a sesssion ID
    if not flask.session.get('sid'):
        return flask.render_template('gatekeep.html', sid=str(uuid.uuid4())), 401

    # ... continue with your expensive operations ...

@app.route('/whatever/<id:str>', methods=['POST'])
def whatever_gatekeep(id):
    # We've passed the sentience check
    if not flask.request.form.get('sid'):
        raise http_error.BadRequest("missing SID")

    flask.session['sid'] = flask.request.form['sid']
    return flask.redirect(flask.url_for('whatever_endpoint', id=id))

All this does is make it so that anything that causes expensive operations to happen requires that the user agent had clicked on the “give me a session ID” thing and stored the resulting cookie in its cookie jar. Javascript-enabled clients will just automatically click the button.

It could theoretically be made a bit more secure by having the session ID be the HMAC of something identifiable to the browser (e.g. the IP address or the like), and could also probably be set up as a decorator to the endpoint function.

But there’s really no reason to overthink this stuff. The crawlers are a lot stupider than people give them credit for.