fluffy rambles: Federated access control with Atom and WebSub

Federated access control with Atom and WebSub

November 29, 2018 1:14 PM

I’ve been ranting about ActivityPub vs. RSS/Atom a lot lately, and I think I’ve proven to myself (and maybe a few others) that for fully-public content feeds, Atom (combined with WebSub and WebMention) is superior to ActivityPub: it’s simpler to implement, works with many more hosting environments and configurations, generally scales better (and handles scaling failures better), and is modular, which allows for much easier migrations between hosting setups and so on.

But one thing ActivityPub supports which Atom does not is the notion of private content. The way it does support this is a bit hamfisted (in that ActivityPub publishers choose to only push content to endpoints which have a trusted user, and endpoints only forward that content over to the trusted users, albeit in a not-very-trustable way). It doesn’t inherently support the ability to backfill older content (or make it otherwise browseable) to someone who is granted friends-only access after-the-fact, though, and it has many scaling and security implications in how this works (since it requires push to be reliable and requires the recipient’s storage of said push notifications to also be reliable).

I’ve put a lot of thought into how to add friends-only stuff to Atom on and off over the years; my previous blog (which used Movable Type for publishing and phpBB for comments) actually had an ad-hoc implementation which worked sort of okay; people could authenticate with my site’s forum, and people in a trusted friend group would see private content. On the public feed, if their reader were logged into the forum (via cookie sharing etc.) it would see the private content in the feed, otherwise it would see placeholders saying “This is a friends-only entry, please visit the site to read it.” It worked okay but it was never great.

Anyway, I think I have finally come up with an auth approach that works with Atom and offers a… well, least-bad solution all around, which scales better and more reliably than ActivityPub while working with WebSub and existing/legacy feed readers.

Some approaches that don’t work

There are a few approaches which have been tried in the past, but none of them really work out that well. There are probably others, but these are the ones I’ve heard of.

Per-follower feeds

The idea here is that every follower gets their own unique feed URL, e.g. http://example.com/feed/2390480243/2394028028032. This feed only contains the entries that the user is authorized to read.

Pluses:

Is trivial to support in existing readers
CDN-friendly
Works with static publishing (with the caveat that item links need to be unguessable)

Minuses:

Cannot share feeds between multiple users on the same feed reader
If the feed URL leaks (e.g. via a reblog/boost, OPML export, etc.) the private entries are visible to everyone
WebSub support doesn’t scale very well – need to send one WebSub notification per trusted follower
The WebSub hub sees all the private content too
Discovery of the feed itself is difficult, since the reader needs to know their specific URL and subscribe via that, rather than using standard feed discovery mechanisms
Items won’t inherently be obviously-private without additional metadata, which existing readers won’t know about, meaning users might erroneously boost them
If a feed URL leaks, the reader and publisher need to synchronize the private URL migration

Authenticated feeds

Here, the feed itself shows only public content for someone who isn’t logged in, and shows the private content to people who are logged in. A number of auth approaches can be used, including IndieAuth or OAuth with a comment provider or OpenID (lol) or whatever.

Pluses:

Feed URL leakage doesn’t leak items

Minuses:

Not CDN-friendly
Doesn’t work with WebSub
Cannot share feeds between multiple users on the same feed reader (aside from multiple unauthorized users)
Need to add auth mechanisms (cookie jars/OAuth keys/etc) to feed readers (although some have provisions for this already, e.g. Feed on Feeds and Tiny Tiny RSS)
Huge burden on readers to apply the auth credentials to their subscriptions
Items won’t be inherently obviously-private; again, feed readers need to be given additional metadata that the readers need to know about in order to not leak private content
Doesn’t work with static publishing

Hybrid authenticated feeds

This is the approach I mentioned above the cut; it’s an authenticated feed, except it shows placeholders for private content for non-authenticated readers. Again, authentication can be done via IndieAuth or whatever. (It’s also similar to how private entries work on WordPress, where individual entries can be password-protected and trusted people can be given that password via side channels.)

Pluses:

Works with WebSub, sort of
CDN-friendly, sort of
Works for readers which don’t support auth token storage
Readers which do support auth token storage can get the full content

Minuses:

WebSub readers will by necessity only see the public placeholders
Leaks the presence and publication metadata (ID, publish time, permalink, maybe title depending on implementation) of private content to public readers
Still doesn’t do anything about the privacy of items that shouldn’t be shared/boosted
Gets really spammy for non-friends of people who post a lot of private stuff
Doesn’t work with static publishing

Favored approach: In-plain-sight encryption

My favored approach to providing private content on feeds is to have private items be encrypted.

Every reader has a public and private key-pair; the publisher knows their public key.

Every protected entry has a randomly-generated nonce key which is used to encrypt payload data with a symmetric cipher (e.g. Twofish or AES or whatever), and the private content (<title>, <content>, enclosure links, etc.) is stored in this encrypted payload. (<id> probably needs to remain public for various reasons, and things like <published>/<updated>/<link rel="alternate"> probably should as well.) The public payload can also include something like:

<title>Private content</title>
<content type="text/html">This is a private entry. Check the original site to
    see if you have access, or use a feed reader which supports the [insert
    clever name here] protocol.
</content>

This nonce key is then added to the item, encrypted using every trusted reader’s public key. (So, if there are 10 followers who are allowed to see the entry, there are 10 copies of the nonce key, each one encrypted with a different follower’s public key.) Of course the CMS can manage this in any number of ways (e.g. having one or more protected groups of friends who can see things, with specific per-user inclusions and exclusions).

When a reader gets an encrypted entry, it tries to decrypt each of the encrypted nonce keys with its private key, and then when it gets a valid nonce key it uses that to decrypt the payload.
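As a rough sketch of the wrap-and-try-each-key flow described above, here is a stdlib-only Python toy. Everything in it is an assumption for illustration: the function names, the dictionary layout, the use of an HMAC tag to recognize a "valid" nonce key, and above all the primitives. It uses textbook RSA over small fixed primes and a SHA-256 keystream, which are not secure; a real implementation would use a vetted crypto library with proper padding and an authenticated cipher.

```python
import hashlib
import hmac
import secrets

# TOY primitives, illustration only: textbook RSA over small fixed primes and
# a SHA-256 keystream. Do not use any of this to protect real content.

def toy_keypair(p, q, e=65537):
    """Return ((n, e), (n, d)) for textbook RSA; p and q must be prime."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def wrap_key(content_key, public):
    n, e = public
    return pow(int.from_bytes(content_key, "big"), e, n)

def unwrap_key(wrapped, private):
    n, d = private
    # Reduce to 128 bits so a wrong private key yields garbage, not an error.
    return (pow(wrapped, d, n) % (1 << 128)).to_bytes(16, "big")

def keystream_xor(key, data):
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

def seal(key, plaintext):
    ciphertext = keystream_xor(key, plaintext)
    return hmac.new(key, ciphertext, hashlib.sha256).digest() + ciphertext

def unseal(key, sealed):
    tag, ciphertext = sealed[:32], sealed[32:]
    expected = hmac.new(key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # wrong key (or tampered payload)
    return keystream_xor(key, ciphertext)

def encrypt_entry(plaintext, follower_public_keys):
    """Publisher side: one random per-entry key, wrapped once per follower."""
    content_key = secrets.token_bytes(16)
    return {
        "wrapped_keys": [wrap_key(content_key, pub) for pub in follower_public_keys],
        "payload": seal(content_key, plaintext),
    }

def decrypt_entry(entry, private):
    """Reader side: try every wrapped key until one opens the payload."""
    for wrapped in entry["wrapped_keys"]:
        plaintext = unseal(unwrap_key(wrapped, private), entry["payload"])
        if plaintext is not None:
            return plaintext
    return None

# Hypothetical followers; the primes are Mersenne primes picked for brevity.
alice_pub, alice_priv = toy_keypair(2**127 - 1, 2**89 - 1)
bob_pub, bob_priv = toy_keypair(2**107 - 1, 2**61 - 1)
_, mallory_priv = toy_keypair(2**521 - 1, 2**607 - 1)
entry = encrypt_entry(b"<title>Secret picnic</title>", [alice_pub, bob_pub])
```

The payload is encrypted once no matter how many followers there are; only the small wrapped-key list grows with the audience, which is where the per-item bandwidth cost comes from.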

The plus sides to this approach:

Supported fully by WebSub with no modifications (as long as the hub passes through elements it doesn’t know about, anyway)
Supports the sharing of feeds between multiple users of the same feed reader, also with no modifications
Protected entries are obviously protected; feed readers which support receiving the content will also know not to do a content boost (and doing a boost of the encrypted feed content or its URL doesn’t increase the number of people who can decrypt/read it)
Multiple users on a single feed reader still share the bandwidth and storage for the content, even if they can’t all see the same parts of it
Existing feed readers will work for public content, and just see placeholders for private content they can’t access
Incredibly fine-grained access control of recipients
Supports full-content backfilling via RFC 5005

Minus sides:

Need to add support for this stuff to feed readers (and ones which don’t add support will see a bunch of placeholder entries – but per the above, this is sort of an advantage because legacy readers can still get at the content without violating privacy)
More bandwidth used per item; someone with hundreds of trusted followers posting lots of short Twitter-like posts will generate a lot of traffic
Key-exchange is difficult at its core (but see below for a key-exchange protocol which might work!)
Like hybrid auth feeds, leaks the presence and basic metadata of private content to unauthorized readers (but not the content itself)
Also leaks the general overall size of the private items too

The key-exchange protocol

Every user has a profile page of some sort. This will be a document (probably HTML, possibly XML) which supports IndieAuth, i.e. it has various rel="me" links to third-party authentication providers (GitHub, Twitter, Mastodon, etc.) which IndieAuth endpoints know how to trust. This can be provided by the feed reader, or it can be someone’s own website, or whatever – it just needs to be something the feed reader can know about. For the sake of this explanation we’ll use http://example.com/user/alice.

The profile page also has a link with rel="publickey" which links to their public key. This can be hosted by whatever hosts the profile, or it can be a third-party link, or whatever – it doesn’t matter, just as long as it links to a public key in a supported format.

This establishes the ability for someone to provide their identity and a public key in an identifiable way.
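Discovering those links could look roughly like the following, using only Python’s standard library. The class name, the sample markup, and the key path are hypothetical; the only things taken from the text above are the rel="me" and rel="publickey" link relations.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ProfileLinkParser(HTMLParser):
    """Collect rel="me" links and the first rel="publickey" link from a profile."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.me_links = []
        self.public_key_url = None

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list, so split it before matching.
        rels = (attributes.get("rel") or "").lower().split()
        target = urljoin(self.base_url, href)
        if "me" in rels:
            self.me_links.append(target)
        if "publickey" in rels and self.public_key_url is None:
            self.public_key_url = target

def parse_profile(base_url, html_text):
    parser = ProfileLinkParser(base_url)
    parser.feed(html_text)
    return parser

# Hypothetical profile markup for http://example.com/user/alice
SAMPLE_PROFILE = """
<html><head>
<link rel="publickey" href="/user/alice/key.pem">
<link rel="me" href="https://github.com/alice">
</head><body>
<a rel="me" href="https://mastodon.example/@alice">Mastodon</a>
</body></html>
"""

profile = parse_profile("http://example.com/user/alice", SAMPLE_PROFILE)
```

Relative links are resolved against the profile URL, so the key can live next to the profile or on a third-party host.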

Next, the publisher’s social feed declares an authentication endpoint. For example, it can look something like:

<link rel="feedauth" href="https://blog.example.com/remote-auth-endpoint" />

When a user subscribes to the blog, their reader detects this link and forwards the user to that URL, with GET parameters like:

mode: auth
profile: The user’s profile URL
callback: The reader’s subscription callback (which should be an unguessable URL that uniquely maps to the user)
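On the reader side, building that redirect might look something like this. The helper names and the reader’s callback URL layout are made up for the example; only the three parameter names and the endpoint URL come from the text above.

```python
import secrets
from urllib.parse import urlencode, urlsplit, parse_qs

def new_callback_url(reader_base):
    """Mint an unguessable per-user callback URL (hypothetical URL layout)."""
    return f"{reader_base}/callback/{secrets.token_urlsafe(32)}"

def build_auth_url(endpoint, profile_url, callback_url, mode="auth"):
    """Append the mode/profile/callback GET parameters to the feedauth endpoint."""
    query = urlencode({"mode": mode, "profile": profile_url, "callback": callback_url})
    separator = "&" if urlsplit(endpoint).query else "?"
    return f"{endpoint}{separator}{query}"

callback = new_callback_url("https://reader.example")
auth_url = build_auth_url(
    "https://blog.example.com/remote-auth-endpoint",
    "http://example.com/user/alice",
    callback,
)
```

The reader would need to remember which user each callback URL belongs to, since the later validate request arrives at that URL with no other identification.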

The endpoint then authenticates the user via IndieAuth (or whatever), and on successful authentication, collects the public key from the profile and makes a call to the callback URL with the following POST parameters:

mode: validate
challenge: A randomly-generated string that has been encrypted using the public key
refresh_time: When to next refresh the lease, in seconds from now (see below)

The callback then responds with the challenge string decrypted via the private key. If this challenge is correctly-decrypted, the publisher marks the user and key as valid.
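Here is one way the challenge round-trip could be sketched, again with toy textbook RSA standing in for a real public-key scheme (not secure, illustration only). The class and function names, the use of raw bytes for the challenge, and the three-day default refresh time are all assumptions; a real exchange would also need to encode the challenge for transport, e.g. as base64.

```python
import hmac
import secrets

# TOY textbook RSA, illustration only; use a real crypto library in practice.
def toy_keypair(p, q, e=65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

class Publisher:
    def __init__(self):
        self.pending = {}   # profile URL -> plaintext challenge we expect back
        self.valid = set()  # profile URLs whose key passed the challenge

    def make_challenge(self, profile_url, public_key, refresh_time=3 * 24 * 3600):
        """Build the POST parameters for the reader's callback URL."""
        n, e = public_key
        secret = secrets.token_bytes(16)
        self.pending[profile_url] = secret
        return {
            "mode": "validate",
            "challenge": pow(int.from_bytes(secret, "big"), e, n),
            "refresh_time": refresh_time,
        }

    def check_response(self, profile_url, response):
        """Each challenge is single-use; compare in constant time."""
        expected = self.pending.pop(profile_url, None)
        ok = expected is not None and hmac.compare_digest(expected, response)
        if ok:
            self.valid.add(profile_url)
        return ok

def reader_answer(request, private_key):
    """Reader side: decrypt the challenge with the user's private key."""
    n, d = private_key
    return (pow(request["challenge"], d, n) % (1 << 128)).to_bytes(16, "big")

alice_pub, alice_priv = toy_keypair(2**127 - 1, 2**89 - 1)
publisher = Publisher()
request = publisher.make_challenge("http://example.com/user/alice", alice_pub)
accepted = publisher.check_response(
    "http://example.com/user/alice", reader_answer(request, alice_priv)
)
```

Popping the pending challenge before comparing means a captured response cannot be replayed later.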

At this point the blog author can be notified about the new user (with a link to their profile page) and then the author can decide which access groups to put the user in at their leisure.

This auth step also can happen at any time with respect to the subscription step; if someone bulk-imports a bunch of subscriptions from another feed reader, or if the rel="feedauth" link appears at a later time (e.g. it was later added to a website’s CMS or whatever), the user should get a notification that they can authenticate and then get forwarded along to the authentication endpoint at that time.

Better yet, the auth step could just be ignored until the reader receives an encrypted entry for the first time – that way, if there’s no private content, people have no need to auth themselves at all! (This would be a function of the reader, not the publisher, of course.)

Key refreshing

A key refresh ensures that all readers are active and their keys are current; every key has a lease time on it, and if the key hasn’t been refreshed by the end of the lease, it gets suspended by the publisher. When a key exchange happens, the reader also gets a refresh time, i.e. how long it should wait before it next refreshes. Once a key is suspended, the publisher stops encrypting the nonce key with it (saving bandwidth for everyone and computation time on the publishing side). This also gives publishers an indication of followers who have stopped following.

A refresh can be initiated by the feed reader with no input from the user. This is the same as the auth challenge-response, except with a mode of refresh instead:

mode: refresh
profile: The user’s profile URL
callback: The reader’s subscription callback (which should be an unguessable URL that uniquely maps to the user)

If the profile URL is known, the endpoint responds with an appropriate 2XX response code, and then gathers the public key from the profile. If the public key hasn’t changed, it sends another validate request back to the callback URL as above:

mode: validate
challenge: A randomly-generated string that has been encrypted using the public key
refresh_time: When to next refresh the lease, in seconds from now

The callback then responds with the challenge string decrypted via the private key. If this challenge is correctly-decrypted, the publisher marks the user and key as valid.

Error handling

If auth fails, the user will already know it by merit of failing to log in to the authentication endpoint.

If refresh fails (due to e.g. not being a known profile URL, or the key having already been suspended), the endpoint should return an appropriate 4XX response code. The feed reader should then indicate to the user they need to re-do the auth process.

If validate fails (due to e.g. the public key changing or the challenge failing), the key should immediately be suspended and the callback URL should be called again with a mode of validate_failed, and the feed reader should indicate to the user they need to re-do the auth process.

In this situation, it’s possible that a validate failure might be missed by the client (due to the client being offline or otherwise having transitory failures). In this case the client will get a 4XX error on the next refresh and have the user re-auth at the next lease expiration. (Meaning the user will miss any private updates until that next interval.) For this reason, the lease refresh time should probably be e.g. on the order of a few days, and the publisher’s actual lease time should be longer, e.g. 5x the expected refresh time.
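The publisher-side bookkeeping for this could be sketched as follows. The three-day refresh interval and the 5x multiplier are the example figures from the paragraph above; the class names and the specific 202/403 status codes are assumptions (the protocol as described only asks for "appropriate" 2XX and 4XX codes).

```python
from dataclasses import dataclass

REFRESH_SECONDS = 3 * 24 * 3600   # what the reader is told (refresh_time)
LEASE_MULTIPLIER = 5              # how much slack the publisher actually allows

@dataclass
class Lease:
    profile: str
    last_validated: float
    suspended: bool = False

    def expires_at(self):
        return self.last_validated + REFRESH_SECONDS * LEASE_MULTIPLIER

class LeaseTable:
    def __init__(self):
        self.leases = {}

    def validated(self, profile, now):
        """Record a passed challenge; returns the refresh_time to send back."""
        self.leases[profile] = Lease(profile, now)
        return REFRESH_SECONDS

    def sweep(self, now):
        """Suspend every key whose lease has run out."""
        for lease in self.leases.values():
            if now >= lease.expires_at():
                lease.suspended = True

    def refresh_status(self, profile, now):
        """HTTP status for a mode=refresh request (codes are assumptions)."""
        self.sweep(now)
        lease = self.leases.get(profile)
        if lease is None or lease.suspended:
            return 403  # unknown or suspended: the user has to re-auth
        return 202      # accepted: a validate challenge will follow

    def active_profiles(self, now):
        """Profiles whose keys should still receive wrapped nonce keys."""
        self.sweep(now)
        return sorted(p for p, lease in self.leases.items() if not lease.suspended)

table = LeaseTable()
table.validated("http://example.com/user/alice", now=0)
table.validated("http://example.com/user/bob", now=0)
table.validated("http://example.com/user/bob", now=4 * REFRESH_SECONDS)
```

With these numbers a reader can miss four refreshes in a row before its key is suspended, which gives transient failures plenty of room.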

Migrations

If a user needs to migrate to a new profile provider, there could be a mechanism for them to port their key-pair over to the new place and, ideally, the old provider has a forwarding link that tells the publisher to change the profile they look at. Otherwise, the user can simply re-subscribe using the new profile.

If a user needs to change feed readers (but can use the same profile provider), there is nothing special that needs to happen, as long as they can use the same profile and key-pair.

If a blog migrates and loses the authorized follower data, the next client refresh cycle ends up instigating a re-authorization.

Viewing on the publisher’s site

If someone wants to look at the entries on the original site, the site can provide an IndieAuth login mechanism, and if they log in using a known profile, they should be able to see all entries that are visible to that profile. (No need to actually use the encryption key in this case though.)

This has an advantage over the ActivityPub approach in that people can look at past content on the original site, rather than being limited only to the entries which were propagated to them via subscription (and retained by their reader). This also supports a world where a reader retains the metadata for old/archived entries but refers to the original site to view the entry itself (which saves space on the subscribers' side).

Static publishing

This approach can work with static publishing! The auth endpoint can be hosted at a third party, and a static site generator can pull the current access lists from that as an API, which would be pretty simple to express as a JSON document, e.g.:

{
    "followers": {
        "http://beesbuzz.biz/": "PUBLIC KEY GOES HERE",
        "http://example.com/": "PUBLIC KEY GOES HERE",
        "https://queer.party/@fluffy": "PUBLIC KEY GOES HERE"
    },
    "lists": {
        "Friends": ["http://beesbuzz.biz/", "http://example.com/"],
        "Lovers": ["http://example.com/", "https://queer.party/@fluffy"]
    }
}

So in this way, Atom still provides the benefits of static publishing while also providing access control!
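A static site generator could turn that JSON document into a per-entry recipient list with something like the sketch below. The function name and the placeholder key strings are hypothetical; the document shape is the one shown above.

```python
import json

# Same shape as the example document above, with placeholder key strings.
ACCESS_JSON = """
{
    "followers": {
        "http://beesbuzz.biz/": "PUBLIC KEY A",
        "http://example.com/": "PUBLIC KEY B",
        "https://queer.party/@fluffy": "PUBLIC KEY C"
    },
    "lists": {
        "Friends": ["http://beesbuzz.biz/", "http://example.com/"],
        "Lovers": ["http://example.com/", "https://queer.party/@fluffy"]
    }
}
"""

def recipients_for(access, list_names):
    """Map each profile on any of the named lists to its public key.

    Profiles appearing on several lists are included once, and list members
    with no known public key are skipped.
    """
    followers = access["followers"]
    recipients = {}
    for name in list_names:
        for profile in access["lists"].get(name, []):
            if profile in followers:
                recipients.setdefault(profile, followers[profile])
    return recipients

access = json.loads(ACCESS_JSON)
friends_and_lovers = recipients_for(access, ["Friends", "Lovers"])
```

The generator would then wrap each private entry’s nonce key once per public key in the returned mapping.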

Note that this does have a compromise around security; private entries will need to keep their <link rel="alternate"> out of the public part of the entry (or the public link needs to go to a page that discusses how to get access), and the private permalink (which would then be declared in the encrypted payload) would have to go to an unguessable ugly URL instead. And then you have to worry about that unguessable URL being shared publicly. (But hey, “private” images on Mastodon, Patreon, Discord, and Facebook aren’t private either! They all rely on unguessable URLs too.)

Things to watch out for

On multi-user feed readers, while the feed itself can be shared between users, the decryption should be done on a per-user basis, ideally as close to display time as possible. Only content which is decryptable by the user should be presented to the user.

The public <link rel="alternate"> should be obscured in some way so that unauthorized users don’t see any private information; for example, it should not include a title slug (and should not forward to a URL with a title slug unless the viewer is authorized to see it).

Also this wouldn’t scale very well for feeds of short, Twitter-like content. It might be worth having items be able to refer to another item’s nonce, so for example:

<entry>
    <title>private content</title>
    <id>urn:XYZZY:1</id>
    <published>1970-01-01T00:00:01Z</published>
    <ns:nonce>
        [BIG PILE OF ENCRYPTED TEXT HERE - contains the nonce encrypted by every
        trusted follower!]
    </ns:nonce>
    <ns:encrypted-entry>
        [encrypted post data goes here]
    </ns:encrypted-entry>
</entry>

<entry>
    <title>private content</title>
    <id>urn:XYZZY:2</id>
    <published>1970-01-01T00:00:15Z</published>
    <ns:nonce ref="urn:XYZZY:1" />
    <ns:encrypted-entry>
        [encrypted post data goes here]
    </ns:encrypted-entry>
</entry>

So if a long thread of small updates comes out, the publishing side could simply share the nonce among all the items in the thread. (But care should be taken to not share a nonce with another item that does not exist in the same feed view.)
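On the reader side, resolving a shared nonce could be sketched like this. The namespace URI is invented for the example (the post leaves the ns prefix undefined), the element contents are placeholders, and the one-hop rule is just one possible policy for keeping references within the current feed view.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
NS = "https://example.com/ns/private-feed"  # hypothetical namespace URI

FEED = f"""<feed xmlns="{ATOM}" xmlns:ns="{NS}">
  <entry>
    <id>urn:XYZZY:1</id>
    <ns:nonce>WRAPPED-KEYS-FOR-THREAD</ns:nonce>
    <ns:encrypted-entry>first</ns:encrypted-entry>
  </entry>
  <entry>
    <id>urn:XYZZY:2</id>
    <ns:nonce ref="urn:XYZZY:1" />
    <ns:encrypted-entry>second</ns:encrypted-entry>
  </entry>
  <entry>
    <id>urn:XYZZY:3</id>
    <ns:nonce ref="urn:XYZZY:99" />
    <ns:encrypted-entry>third</ns:encrypted-entry>
  </entry>
</feed>"""

def resolve_nonces(feed_xml):
    """Map each entry id to its wrapped-nonce text, following ref attributes.

    A ref is only followed one hop, and only to an entry in the same feed
    document that carries its own nonce; anything else resolves to None.
    """
    root = ET.fromstring(feed_xml)
    nonce_by_id = {}
    for entry in root.findall(f"{{{ATOM}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM}}}id")
        nonce = entry.find(f"{{{NS}}}nonce")
        if entry_id is not None and nonce is not None:
            nonce_by_id[entry_id] = nonce

    resolved = {}
    for entry_id, nonce in nonce_by_id.items():
        ref = nonce.get("ref")
        if ref is None:
            resolved[entry_id] = (nonce.text or "").strip()
            continue
        target = nonce_by_id.get(ref)
        if target is not None and target.get("ref") is None:
            resolved[entry_id] = (target.text or "").strip()
        else:
            resolved[entry_id] = None  # dangling or chained reference
    return resolved

nonces = resolve_nonces(FEED)
```

An entry whose reference cannot be resolved would be treated like any other entry the user can’t decrypt, i.e. shown as a placeholder.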

Also if anyone’s access is revoked from an older entry, its nonce should probably be discarded. (Whoever did have access previously might have saved the plaintext though, and there’s nothing you can do about that.)

Oh, and getting this to interoperate with WebMention might be tricky. WebMention does have some provisions for private/protected content but I haven’t looked into how it works.

Next steps

After gathering feedback on this from the community, I’d be very interested in properly formalizing this as an RFC or a W3C proposal. What does everyone think?

After that I feel like the next step would be to start finally writing Subl with this stuff from the ground up, and maybe add at least the auth part (if not the encryption part) to Publ (although first I’d be implementing simple hybrid authed feeds like I had on Movable Type + phpBB).