Operations

There are 3 core data operations on a stream:

Retention

Age-based retention can be configured on a stream, and S2 will automatically delete records that are older than the configured threshold.

Explicit trimming is supported with the trim command.

Data in JSON

The following pieces of record data are stored as bytes:

  • Header name
  • Header value
  • Body

A custom header S2-Format is used to indicate the desired encoding of these bytes when records are represented in JSON.

raw

S2-Format: raw, or omit the header.

Use when your record data is valid Unicode.

Zero overhead, human-readable.

Cannot handle binary data safely.

base64

S2-Format: base64

Use when you are working with arbitrary bytes.

Always safe.

33% overhead over the wire.

You can write using one format and read with another. When reading as raw, S2 interprets the stored bytes as UTF-8. This conversion is potentially lossy unless the data was also written as raw, or was written as base64-encoded valid UTF-8.
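The trade-off between the two formats can be sketched locally in Python. This is an illustration only, not S2's implementation: the helper name encode_for_json is hypothetical, and replacing invalid byte sequences with U+FFFD is an assumption about what "potentially lossy" looks like in practice.

```python
import base64


def encode_for_json(data: bytes, fmt: str = "raw") -> str:
    """Represent stored bytes as a JSON-safe string (hypothetical helper)."""
    if fmt == "base64":
        # Safe for arbitrary bytes; output is 4 characters per 3 input bytes.
        return base64.b64encode(data).decode("ascii")
    if fmt == "raw":
        # Assumption: invalid UTF-8 is replaced with U+FFFD, which loses data.
        return data.decode("utf-8", errors="replace")
    raise ValueError(f"unknown format: {fmt!r}")


text_bytes = "héllo".encode("utf-8")
binary_bytes = bytes([0xFF, 0xFE, 0x00, 0x01])

# Valid UTF-8 survives a raw read unchanged.
print(encode_for_json(text_bytes, "raw"))

# Arbitrary bytes survive a base64 read and can be decoded back exactly.
print(base64.b64decode(encode_for_json(binary_bytes, "base64")) == binary_bytes)

# Arbitrary bytes read as raw do not round-trip in this sketch.
print(encode_for_json(binary_bytes, "raw").encode("utf-8") == binary_bytes)
```

In this sketch, 300 bytes of input become 400 base64 characters, which is where the roughly 33% overhead comes from.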

Protobuf messages

Data plane endpoints to append and read records also support protobuf bodies. This helps avoid the base64 encoding tax compared to binary data in JSON messages.

To send and receive protobuf:

  • Set the Content-Type header to application/proto and send a protobuf-encoded payload.
  • Set the Accept header to application/proto to receive a protobuf-encoded response. The response will include the Content-Type: application/proto header if the server returns a protobuf.

Type definitions are available in git and Buf.

Sending the Accept: application/proto request header only guarantees a protobuf response on success (HTTP 200). Responses with any other status code always carry a JSON body.
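A client therefore needs to choose a decoder per response rather than per request. The following is a minimal sketch of that decision. The function names are hypothetical, and using application/json as the fallback Content-Type for requests is an assumption, not something stated above.

```python
PROTO = "application/proto"


def request_headers(send_proto: bool, receive_proto: bool) -> dict:
    """Build request headers for a data plane call (hypothetical helper)."""
    # Assumption: JSON request bodies are labeled application/json.
    headers = {"Content-Type": PROTO if send_proto else "application/json"}
    if receive_proto:
        headers["Accept"] = PROTO
    return headers


def body_kind(status: int, content_type: str) -> str:
    """Return "proto" or "json" to indicate how to decode a response body."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    # Only a success response labeled as protobuf is decoded as protobuf;
    # any other status code is expected to carry JSON.
    if status == 200 and media_type == PROTO:
        return "proto"
    return "json"
```

Checking the response's Content-Type as well as the status code keeps the client correct even if the server answers a successful request with JSON.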

Sessions

S2S (S2-Session) is a minimal protocol to encapsulate streaming append and read session semantics for the data plane. Until it is rolled out, sessions are supported via v1alpha gRPC and most SDKs.

Command records

Command records are an advanced feature to signal certain operations interpreted by the service. S2 SDKs make it easy to create supported command records.

Concretely, a command record is a record with:

  1. A sole header with an empty name. Empty header names are not allowed in any other context.
  2. The operation encoded in this header's value.
  3. The payload for the command in the body of the record.

Command records take up a sequence number on the stream and are returned by reads. If needed, they can be detected and filtered out with the check len(headers) == 1 && headers[0].name == b"".
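As a sketch, the same check can be written in Python. The record shape used here, a dict with a "headers" list of (name, value) byte pairs and a "body", is an assumption for illustration and not an SDK type.

```python
def is_command_record(record: dict) -> bool:
    """True if the record has exactly one header and its name is empty."""
    headers = record["headers"]
    return len(headers) == 1 and headers[0][0] == b""


def without_commands(records: list) -> list:
    """Drop command records from a batch returned by a read."""
    return [r for r in records if not is_command_record(r)]


batch = [
    {"headers": [(b"content-type", b"text/plain")], "body": b"hello"},
    {"headers": [(b"", b"fence")], "body": b"my-token"},  # command record
    {"headers": [], "body": b"no headers at all"},
]

data_only = without_commands(batch)
print(len(data_only))  # 2
```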

Operations that are currently supported:

  • fence with up to 36 UTF-8 bytes as payload to set a fencing token for the stream. An empty payload clears the token. Fencing is strongly consistent, and subsequent appends that specify a fencing token will be rejected if it does not match.

  • trim with exactly 8 big-endian bytes as payload representing the desired earliest sequence number for the stream, which we will call the trim point. The effective trim point resulting from the command is max(existing_trim_point, min(provided_trim_point, my_seq_num + 1)), where my_seq_num is the sequence number assigned to the command record itself. Trimming is eventually consistent, and trimmed records may remain visible for a brief period.
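The payload rules and the trim point formula can be sketched as follows. The function names are hypothetical, and treating the 8 bytes as an unsigned 64-bit integer is an assumption. In practice the SDKs and CLI build these records for you.

```python
import struct


def fence_payload(token: str) -> bytes:
    """Encode a fencing token; an empty string clears the token."""
    payload = token.encode("utf-8")
    if len(payload) > 36:
        raise ValueError("fencing token must be at most 36 UTF-8 bytes")
    return payload


def trim_payload(trim_point: int) -> bytes:
    """Encode the desired trim point as exactly 8 big-endian bytes."""
    # Assumption: the value is an unsigned 64-bit integer.
    return struct.pack(">Q", trim_point)


def effective_trim_point(existing: int, provided: int, my_seq_num: int) -> int:
    """Apply max(existing, min(provided, my_seq_num + 1))."""
    return max(existing, min(provided, my_seq_num + 1))


# A trim point can never move backwards...
print(effective_trim_point(existing=10, provided=5, my_seq_num=100))    # 10
# ...and can never pass the record right after the command itself.
print(effective_trim_point(existing=10, provided=500, my_seq_num=100))  # 101
```

The min clamps an over-large request to just past the command record, and the max keeps an earlier, more aggressive trim in force.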

The S2 CLI also supports fence and trim commands.