Data Plane
Records API for S2
Operations
There are 3 core data operations on a stream:
Append records
POST /streams/{stream}/records
Read records
GET /streams/{stream}/records?seq_num=42&count=100
Check the tail
GET /streams/{stream}/records/tail
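As a rough illustration of how the three endpoints relate, the sketch below builds their request paths in Python. The helper names are hypothetical, and percent-encoding the stream name is an assumption rather than documented behaviour; only the path shapes and the seq_num and count query parameters come from the endpoints listed above.

```python
from urllib.parse import quote, urlencode


def records_path(stream: str) -> str:
    """Path shared by append (POST) and read (GET) requests."""
    # Assumption: the stream name is percent-encoded as a single path segment.
    return f"/streams/{quote(stream, safe='')}/records"


def read_path(stream: str, seq_num: int, count: int) -> str:
    """Path and query string for reading `count` records starting at `seq_num`."""
    query = urlencode({"seq_num": seq_num, "count": count})
    return f"{records_path(stream)}?{query}"


def tail_path(stream: str) -> str:
    """Path for checking the tail of the stream."""
    return f"{records_path(stream)}/tail"
```

These paths would be appended to whatever base URL and authentication your deployment uses, neither of which is covered here.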
Retention
Age-based retention can be configured on a stream, and S2 will automatically delete records that are older than the configured threshold.
Explicit trimming is supported with the trim command.
Data in JSON
The following pieces of record data are stored as bytes:
- Header name
- Header value
- Body
A custom header, S2-Format, is used to indicate the desired encoding of these bytes when records are represented in JSON.
raw
S2-Format: raw, or omit the header.
Use when your record data is valid Unicode.
Zero overhead, human-readable.
Cannot handle binary data safely.
base64
S2-Format: base64
Use when you are working with arbitrary bytes.
Always safe.
33% overhead over the wire.
You can write using one format and read with another. When reading as raw, S2 interprets the stored bytes as UTF-8. This conversion is potentially lossy unless the data was also written as raw, or as base64-encoded valid UTF-8.
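The trade-off between the two formats can be illustrated locally with a minimal Python sketch. It does not call S2. The function names are hypothetical, and replacing invalid byte sequences with U+FFFD is an assumption about what a lossy conversion looks like, not a statement of how the service handles it.

```python
import base64


def encode_base64(data: bytes) -> str:
    """Represent arbitrary bytes as a JSON-safe ASCII string."""
    return base64.b64encode(data).decode("ascii")


def decode_base64(text: str) -> bytes:
    """Recover the exact original bytes from their base64 form."""
    return base64.b64decode(text)


def read_as_raw(stored: bytes) -> str:
    """Interpret stored bytes as UTF-8 text.

    Assumption: invalid sequences are replaced with U+FFFD, which is one
    way a conversion can become lossy.
    """
    return stored.decode("utf-8", errors="replace")


if __name__ == "__main__":
    text = "héllo".encode("utf-8")
    binary = bytes([0xFF, 0xFE, 0x00, 0x01])

    # Valid UTF-8 survives a raw read unchanged.
    print(read_as_raw(text) == "héllo")
    # Arbitrary bytes do not round-trip through a raw read...
    print(read_as_raw(binary).encode("utf-8") == binary)
    # ...but always round-trip through base64, at 4 output chars per 3 input bytes.
    print(decode_base64(encode_base64(binary)) == binary)
```

Every 3 input bytes become 4 base64 characters, which is where the 33% figure comes from.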
Protobuf messages
Data plane endpoints to append and read records also support protobuf bodies. This avoids the base64 encoding overhead that binary data incurs in JSON messages.
To send and receive protobuf:
- Set the Content-Type header to application/proto and send a protobuf-encoded payload.
- Set the Accept header to application/proto to receive a protobuf-encoded response. The response will include the Content-Type: application/proto header if the server returns a protobuf.
Type definitions are available in git and Buf.
Sending the Accept: application/proto request header only guarantees a protobuf response on success (HTTP 200). Other status codes are always accompanied by JSON bodies.
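Since a protobuf response is only guaranteed on success, a client would typically check the response's Content-Type before choosing a decoder. The following is a minimal sketch of that negotiation in Python. The function names are hypothetical, and the actual protobuf or JSON decoding is left out because it depends on the generated types.

```python
PROTO = "application/proto"


def request_headers(send_proto: bool, want_proto: bool) -> dict:
    """Build the headers that opt a request and/or response into protobuf."""
    headers = {}
    if send_proto:
        headers["Content-Type"] = PROTO  # request body is protobuf-encoded
    if want_proto:
        headers["Accept"] = PROTO  # ask for a protobuf-encoded response
    return headers


def is_protobuf_response(response_headers: dict) -> bool:
    """Decide how to decode a response body from its Content-Type header.

    Header names are matched case-insensitively, and any media type
    parameters after ';' are ignored.
    """
    lowered = {name.lower(): value for name, value in response_headers.items()}
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    return media_type == PROTO
```

With this check, an error response that arrives as JSON is routed to a JSON decoder even though the request asked for protobuf.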
Sessions
S2S (S2-Session) is a minimal protocol to encapsulate streaming append and read session semantics for the data plane. Until it is rolled out, sessions are supported via v1alpha gRPC and most SDKs.
Command records
Command records are an advanced feature to signal certain operations interpreted by the service. S2 SDKs make it easy to create supported command records.
Concretely, a command record is a record with:
- a sole header that has an empty name (empty header names are not allowed in any other context)
- the operation encoded in this header's value
- the payload for the command in the body of the record.
Command records take up a sequence number on the stream, and will be returned to reads. It is easy to test for and filter out commands if needed, with the logic len(headers) == 1 && headers[0].name == b"".
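The filter above can be sketched in Python as follows. The Header and Record classes are hypothetical stand-ins for whatever record type your SDK or decoder produces; only the predicate itself comes from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Header:
    name: bytes
    value: bytes


@dataclass
class Record:
    headers: List[Header] = field(default_factory=list)
    body: bytes = b""


def is_command_record(record: Record) -> bool:
    """True when the record has exactly one header and that header's name is empty."""
    return len(record.headers) == 1 and record.headers[0].name == b""


def without_commands(records: List[Record]) -> List[Record]:
    """Drop command records from a batch returned by a read."""
    return [r for r in records if not is_command_record(r)]
```

Because command records still occupy sequence numbers, filtering them out leaves gaps in the sequence numbers of the remaining records.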
Operations that are currently supported:
- fence with up to 36 UTF-8 bytes as payload to set a fencing token for the stream. An empty payload clears the token. Fencing is strongly consistent, and subsequent appends that specify a fencing token will be rejected if it does not match.
- trim with exactly 8 big-endian bytes as payload representing the desired earliest sequence number for the stream, which we will call the trim point. The effective trim point from the command is max(existing_trim_point, min(provided_trim_point, my_seq_num + 1)). Trimming is eventually consistent, and trimmed records may be visible for a brief period.
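The payload rules and the trim point formula can be sketched in Python as below. The function names are hypothetical, and treating the 8 bytes as an unsigned 64-bit integer is an assumption; the source only specifies 8 big-endian bytes.

```python
import struct

MAX_FENCING_TOKEN_BYTES = 36


def fence_payload(token: str) -> bytes:
    """Encode a fencing token; an empty string yields the payload that clears it."""
    payload = token.encode("utf-8")
    if len(payload) > MAX_FENCING_TOKEN_BYTES:
        raise ValueError("fencing token exceeds 36 UTF-8 bytes")
    return payload


def trim_payload(seq_num: int) -> bytes:
    """Encode the desired earliest sequence number as 8 big-endian bytes.

    Assumption: the value is an unsigned 64-bit integer.
    """
    return struct.pack(">Q", seq_num)


def effective_trim_point(existing: int, provided: int, my_seq_num: int) -> int:
    """Apply the formula: the trim point never moves backwards, and never
    moves past the record following the trim command itself."""
    return max(existing, min(provided, my_seq_num + 1))
```

For example, a trim command at sequence number 20 that requests trim point 50 is capped at 21, while the same command on a stream already trimmed to 30 leaves the trim point at 30.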
The S2 CLI also supports fence and trim commands.