Background

Some details on the architectural background. We start from the following observations about the Fediverse:

Messages are exchanged with signed POST requests based on ActivityPub
The data has a linked data flavor: messages and objects contain links where more information can be looked up
While the Fediverse has many good actors, there is plenty of content that I don't want to see, so I need a way to filter it out
While following people is a good way to obtain content, other means of content discovery would be interesting

This background document describes how to take the Fediverse as input, i.e. receive POST requests, and turn them into AMQP messages that can then be processed further.

Step 1

The basic layout is:

graph LR
    A((Fediverse)) ==>|POST| B[server]

A lot is hidden here: the requests are signed, so validating them requires looking up the sending actors.
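
As a rough sketch of what the server does with an incoming request, the following wraps the POST body and the headers needed for a later signature check into a single payload that could be published via AMQP. The function name build_amqp_payload, the routing key inbox.received, and the envelope fields are assumptions made for illustration, not bovine's actual format. Signature verification and the actual publishing are omitted.

```python
import json

# Headers a later step would need to verify the HTTP signature (assumed selection).
SIGNATURE_HEADERS = ("signature", "digest", "date", "host", "content-type")


def build_amqp_payload(path: str, headers: dict, body: bytes) -> tuple:
    """Wrap an incoming POST request into a (routing_key, message_body) pair."""
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {key.lower(): value for key, value in headers.items()}
    envelope = {
        "path": path,
        "headers": {k: lowered[k] for k in SIGNATURE_HEADERS if k in lowered},
        "activity": json.loads(body),  # raises ValueError on malformed JSON
    }
    # The routing key is hypothetical; any AMQP client could publish this pair.
    return "inbox.received", json.dumps(envelope).encode("utf-8")
```

Keeping the signature-related headers in the envelope means that the signature check, and the actor lookup it requires, can happen in a consumer instead of in the web server.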

Step 2: Normalization, Validation

Normalization: ActivityPub messages are based on JSON-LD, so the same content can be represented in many different ways. It is thus convenient to normalize the messages with an application-defined @context.

Validation: The message was signed, so its sender is known. If the author of the contained object does not match the actor, refetch the object. Check that ids are from the same domain, etc.

graph LR
    A((Fediverse)) ==>|POST| B[server]
    B -->|amqp| C[validation]
    C -->|valid| D[next step]
    C -->|invalid| E[refetch]
    E --> D

Note, one can implement refetching in a separate queue.
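
The validation checks of this step could look roughly as follows. This is a simplified sketch: the function name is_valid is hypothetical, and it assumes that the signing actor has already been determined from the HTTP signature. Activities such as Announce, where the object legitimately belongs to someone else, would need extra handling.

```python
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Return the host part of a URL, e.g. 'a.example'."""
    return urlparse(url).netloc


def is_valid(activity: dict, signing_actor: str) -> bool:
    """Return False if the activity should be refetched instead of trusted."""
    actor = activity.get("actor")
    # The actor of the activity must be the one who signed the request.
    if actor != signing_actor:
        return False
    # The activity id must live on the actor's domain.
    if domain(activity.get("id", "")) != domain(actor):
        return False
    obj = activity.get("object")
    if isinstance(obj, dict):
        # An embedded object must be authored by the actor ...
        if obj.get("attributedTo", actor) != actor:
            return False
        # ... and its id must be on the same domain.
        if domain(obj.get("id", "")) != domain(actor):
            return False
    return True
```

A consumer would pass messages for which is_valid returns True to the next step and send the others to the refetch queue.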

Branch off here

One probably wants to include something like

graph LR
    A((Fediverse)) ==>|POST| B[server]
    B -->|amqp| C[validation]
    C -->|valid| D[next step]
    C -->|invalid| D
    C --> |amqp| F[Key updates]

where, if the Activity is an Update to a profile, one checks whether the public keys changed. This type of branch off can also be used to collect statistics, as is done at https://jsonld.bovine.social/.
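
The key update check could be sketched as below. The function name changed_public_key and the known_keys mapping (actor id to the stored PEM) are assumptions for illustration; the publicKey and publicKeyPem properties are the ones commonly found on Fediverse actors.

```python
from typing import Optional

# Actor types defined by ActivityStreams.
ACTOR_TYPES = {"Application", "Group", "Organization", "Person", "Service"}


def changed_public_key(activity: dict, known_keys: dict) -> Optional[str]:
    """Return the new PEM if a profile Update changed the key, else None."""
    if activity.get("type") != "Update":
        return None
    obj = activity.get("object")
    # Only Updates that embed an actor profile are of interest here.
    if not isinstance(obj, dict) or obj.get("type") not in ACTOR_TYPES:
        return None
    public_key = obj.get("publicKey")
    if not isinstance(public_key, dict):
        return None
    pem = public_key.get("publicKeyPem")
    # No key given, or the key is the one already stored: nothing to do.
    if pem is None or pem == known_keys.get(obj.get("id")):
        return None
    return pem
```

Since this runs on a separate queue, a slow key store does not delay the main processing path.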

Step 3: Processing / Annotation (possible)

This is an optional step in which messages are filtered or additional information is added, e.g.

Replace images with proxied copy
Generate preview cards if none exist
Filter for spam, hate, CSAM, etc.
Filter for bad content, e.g. a reply where the post being replied to cannot be looked up
Add the author to the dataset
Add Content Warnings / Remove Content Warnings
Filter by language / Translate
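
One way to organize such steps is as a chain of small functions, each of which either annotates the message or drops it. The sketch below is an assumption about how this could be structured, not how bovine does it; the message shape (a dict with the keys activity and annotations) and all function names are hypothetical.

```python
from typing import Callable, Iterable, Optional

# A step returns the (possibly annotated) message, or None to drop it.
Step = Callable[[dict], Optional[dict]]


def annotate_languages(message: dict) -> Optional[dict]:
    """Record the languages listed in the object's contentMap, if any."""
    obj = message["activity"].get("object")
    content_map = obj.get("contentMap", {}) if isinstance(obj, dict) else {}
    message["annotations"]["languages"] = sorted(content_map)
    return message


def make_language_filter(allowed: set) -> Step:
    """Build a step that drops messages in none of the allowed languages."""

    def step(message: dict) -> Optional[dict]:
        languages = message["annotations"].get("languages", [])
        # Messages without language information are kept.
        if languages and not set(languages) & allowed:
            return None
        return message

    return step


def run_pipeline(message: dict, steps: Iterable[Step]) -> Optional[dict]:
    """Apply the steps in order and stop as soon as one drops the message."""
    for step in steps:
        result = step(message)
        if result is None:
            return None
        message = result
    return message
```

With this layout, a step such as proxying images or generating preview cards is one more function in the list, and each step could also run as its own AMQP consumer.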

Step 4: Fanout

Once processing is done, we are ready to fan out:

graph LR
    A((Fediverse)) ==>|POST| B[server]
    B -->|amqp| C[validation]
    C -->|valid| D[Processing]
    D --> E[Enqueue for devices]
    D --> F[Store in local archives]
    D --> G[Add to replies]

Note that these steps can be conditional on the processing results. For example, an article might not be shown on your watch but still be shown on your phone.
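
A conditional fan out of this kind could be sketched as a function that maps a processed message to the queues it should be sent to. The queue names and the function name fanout_targets are assumptions for illustration only.

```python
def fanout_targets(message: dict) -> list:
    """Return the names of the queues a processed message should go to."""
    obj = message["activity"].get("object")
    obj = obj if isinstance(obj, dict) else {}
    # Everything is stored in the local archive.
    targets = ["archive"]
    # Replies are additionally attached to the post they reply to.
    if obj.get("inReplyTo"):
        targets.append("replies")
    targets.append("device.phone")
    # Long-form articles are not sent to the watch.
    if obj.get("type") != "Article":
        targets.append("device.watch")
    return targets
```

Publishing the message once per returned queue name keeps the routing decision in a single place.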

Implementation

This is all partially implemented in bovine. Adding a message to a queue via amqp is done at

bovine_web.bovine_web

The broker is defined in

bovine_propan.broker

The broker defines how messages are processed.

Migrating bovine to do the fan out in Step 4 is a work in progress.