Operations
There are 3 core data operations on a stream:
Append records
POST /streams/{stream}/records
Read records
GET /streams/{stream}/records?seq_num=42&count=100
Check the tail
GET /streams/{stream}/records/tail
Retention
Age-based retention can be configured on a stream, and S2 will automatically delete records that are older than the configured threshold. Explicit trimming is supported with the trim command.
Data in JSON
The following pieces of record data are stored as bytes:
- Header name
- Header value
- Body
S2-Format is used to indicate the desired encoding of these bytes when records are represented in JSON.
raw
S2-Format: raw, or omit the header.
Use when your record data is valid Unicode. Zero overhead, human-readable. Cannot handle binary data safely.
base64
S2-Format: base64
Use when you are working with arbitrary bytes. Always safe. 33% overhead over the wire.
You can write using one format and read with another. When reading raw, S2 interprets the stored bytes as UTF-8. This conversion is potentially lossy unless the data was also written as raw, or as base64-encoded valid UTF-8.
Protobuf messages
Data plane endpoints to append and read records also support protobuf bodies. This avoids the base64 encoding tax that binary data incurs in JSON messages.
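To make the encoding trade-offs concrete, here is a minimal sketch in Python. It measures the base64 size overhead and models a lossy raw read. The helper names are hypothetical, and the use of the U+FFFD replacement character for invalid bytes is an assumption about how a lossy UTF-8 conversion typically behaves, not a statement about S2's exact output.

```python
import base64


def base64_overhead(payload: bytes) -> float:
    """Fractional size increase from base64-encoding the payload."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload) - 1


def read_as_raw(stored: bytes) -> str:
    """Assumed model of a lossy read: invalid UTF-8 becomes U+FFFD."""
    return stored.decode("utf-8", errors="replace")


# 3072 arbitrary bytes (a multiple of 3, so base64 adds no padding).
binary = bytes(range(256)) * 12
print(round(base64_overhead(binary), 3))  # 4096 / 3072 - 1, i.e. 0.333

# Valid UTF-8 survives a raw read unchanged.
text = "héllo".encode("utf-8")
assert read_as_raw(text).encode("utf-8") == text

# Arbitrary bytes do not survive a raw read.
assert read_as_raw(binary).encode("utf-8") != binary
```

Under these assumptions, data written as base64 and read as raw is only safe when the decoded bytes happen to be valid UTF-8.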
To send and receive protobuf:
- Set the Content-Type header to application/protobuf and send a protobuf-encoded payload.
- Set the Accept header to application/protobuf to receive a protobuf-encoded response. The response will include the Content-Type: application/protobuf header if the server returns a protobuf.
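The header rules in the list above can be sketched as two small helpers. This is an illustrative sketch only: the function names are hypothetical and no HTTP client is involved. The point is that a client should inspect the response's Content-Type rather than assume the body format from what it requested.

```python
PROTOBUF = "application/protobuf"


def protobuf_request_headers(has_body: bool) -> dict[str, str]:
    """Headers that ask for a protobuf response, and declare one if sending."""
    headers = {"Accept": PROTOBUF}
    if has_body:
        # Only requests that carry a payload declare its encoding.
        headers["Content-Type"] = PROTOBUF
    return headers


def response_is_protobuf(response_headers: dict[str, str]) -> bool:
    """Check the response Content-Type; header names are case-insensitive."""
    for name, value in response_headers.items():
        if name.lower() == "content-type":
            # Ignore any parameters after the media type.
            return value.split(";")[0].strip().lower() == PROTOBUF
    return False
```

A caller would decode the body as protobuf only when `response_is_protobuf` returns true, and fall back to JSON otherwise.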
Sending the Accept: application/protobuf request header only guarantees a protobuf response on success (HTTP 200). Other status codes are always accompanied by JSON bodies.
Sessions
S2S (S2-Session) is a minimal binary protocol to encapsulate streaming append and read session semantics over HTTP, at the same endpoints as “unary” calls.
It is meant for S2 SDKs, but documented for the adventurous.
S2S spec
Setup
- Content-Type: s2s/proto signals that a session is being requested.
- Accept-Encoding signals which compression algorithms are supported (the service supports zstd and gzip).
- Content-Encoding is not sent, as message-level compression is used.
- A 200 OK response establishes a session.
Message framing
All integers use big-endian byte order. Messages smaller than 1KiB should not be compressed.
- Length prefix (3 bytes): Total message length (flag + body)
- Flag byte (1 byte): [T][CC][RRRRR]
  - T (bit 7): Terminal flag (1 = stream ends after this message)
  - CC (bits 6-5): Compression (00 = none, 01 = zstd, 10 = gzip)
  - RRRRR (bits 4-0): Reserved
- Regular message is a Protobuf
- Terminal message contains a 2-byte status code, followed by JSON error information (corresponding to unary response behavior)
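The framing rules above can be sketched as a small encoder and decoder. This is a minimal illustration, not the SDK implementation. It assumes the compressed bytes replace the body after the flag byte, handles only gzip (zstd needs a third-party library), and the JSON shape used in the terminal-message example is hypothetical.

```python
import gzip
import json

COMPRESSION_NONE, COMPRESSION_ZSTD, COMPRESSION_GZIP = 0b00, 0b01, 0b10
MIN_COMPRESS_SIZE = 1024  # messages smaller than 1KiB stay uncompressed


def encode_frame(body: bytes, terminal: bool = False,
                 compression: int = COMPRESSION_NONE) -> bytes:
    """Build: 3-byte length prefix, 1 flag byte, then the body."""
    if len(body) < MIN_COMPRESS_SIZE:
        compression = COMPRESSION_NONE
    if compression == COMPRESSION_GZIP:
        body = gzip.compress(body)
    elif compression != COMPRESSION_NONE:
        raise NotImplementedError("zstd is not covered by this sketch")
    flag = (int(terminal) << 7) | (compression << 5)  # reserved bits stay 0
    length = 1 + len(body)  # the length covers flag + body
    return length.to_bytes(3, "big") + bytes([flag]) + body


def decode_frame(frame: bytes) -> tuple[bool, int, bytes]:
    """Return (terminal, compression, decompressed body) for one frame."""
    length = int.from_bytes(frame[:3], "big")
    flag = frame[3]
    body = frame[4:3 + length]
    terminal = bool(flag >> 7)
    compression = (flag >> 5) & 0b11
    if compression == COMPRESSION_GZIP:
        body = gzip.decompress(body)
    elif compression != COMPRESSION_NONE:
        raise NotImplementedError("zstd is not covered by this sketch")
    return terminal, compression, body


def parse_terminal(body: bytes) -> tuple[int, dict]:
    """Split a terminal body into its 2-byte status code and JSON error."""
    return int.from_bytes(body[:2], "big"), json.loads(body[2:])


# Example: a terminal frame carrying status 500 and a made-up error object.
error = (500).to_bytes(2, "big") + json.dumps({"message": "example"}).encode()
terminal, _, body = decode_frame(encode_frame(error, terminal=True))
assert terminal and parse_terminal(body) == (500, {"message": "example"})
```

A real reader would also need to buffer partial frames from the network before calling a decoder like this one.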
Data flow
Append sessions are a bi-directional stream of AppendInput messages from Client → Server, and AppendAck messages from Server → Client.
Read sessions are a uni-directional stream of ReadBatch messages from Server → Client. When waiting for new records, an empty batch is sent as a heartbeat at least every 15 seconds.
Read sessions are also supported over SSE, which has the benefit of being usable in the browser context.
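Because a read session delivers at least one message every 15 seconds, a client can treat a longer silence as a sign that the connection has stalled. The following is a minimal sketch of that check. The function name and the 5-second margin are assumptions for illustration, not values defined by S2.

```python
HEARTBEAT_INTERVAL_SECS = 15.0  # from the heartbeat guarantee above


def session_looks_stalled(last_message_at: float, now: float,
                          margin_secs: float = 5.0) -> bool:
    """True when no batch or heartbeat arrived within interval + margin.

    Both timestamps are expected to come from the same monotonic clock,
    for example time.monotonic().
    """
    return now - last_message_at > HEARTBEAT_INTERVAL_SECS + margin_secs
```

A client would typically update `last_message_at` on every ReadBatch, including empty ones, and reconnect when this check returns true.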
Command records
Command records are an advanced feature to signal certain operations interpreted by the service. S2 SDKs make it easy to create supported command records. Concretely, a command record is a record with:
- sole header that has an empty name (empty header names are not allowed in any other context)
- operation encoded in this header value
- payload for the command in the body of the record.
Command records take up a sequence number on the stream, and will be returned to reads. It is easy to test for and filter out commands if needed, with the logic len(headers) == 1 && headers[0].name == b"".
- fence with up to 36 UTF-8 bytes as payload to set a fencing token for the stream. An empty payload clears the token. Fencing is strongly consistent, and subsequent appends that specify a fencing token will be rejected if it does not match.
- trim with exactly 8 big-endian bytes as payload representing the desired earliest sequence number for the stream, which we will call the trim point. The effective trim point from the command is max(existing_trim_point, min(provided_trim_point, my_seq_num + 1)). Trimming is eventually consistent, and trimmed records may be visible for a brief period.
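The command record rules above can be sketched in a few lines of Python. The `Header` and `Record` types and all function names are hypothetical stand-ins rather than SDK types. The sketch also assumes the operation is encoded in the header value as the ASCII bytes of its name (`fence` or `trim`).

```python
from dataclasses import dataclass


@dataclass
class Header:
    name: bytes
    value: bytes


@dataclass
class Record:
    headers: list[Header]
    body: bytes


def fence_command(token: str) -> Record:
    """Fence command; an empty token clears the fencing token."""
    payload = token.encode("utf-8")
    if len(payload) > 36:
        raise ValueError("fencing token must be at most 36 UTF-8 bytes")
    return Record([Header(b"", b"fence")], payload)


def trim_command(trim_point: int) -> Record:
    """Trim command; the payload is exactly 8 big-endian bytes."""
    return Record([Header(b"", b"trim")], trim_point.to_bytes(8, "big"))


def is_command(record: Record) -> bool:
    """The filter described above: a sole header with an empty name."""
    return len(record.headers) == 1 and record.headers[0].name == b""


def effective_trim_point(existing_trim_point: int, provided_trim_point: int,
                         my_seq_num: int) -> int:
    """Clamp the requested trim point as described for the trim command."""
    return max(existing_trim_point, min(provided_trim_point, my_seq_num + 1))


# A trim command at sequence number 49 asking for 1000 is capped at 50.
assert effective_trim_point(10, 1000, 49) == 50
# A trim point below the existing one never moves it backwards.
assert effective_trim_point(10, 5, 49) == 10
```

The clamp means a trim command can remove at most everything up to and including itself, and can never lower an existing trim point.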
The S2 CLI also supports fence and trim commands.
