The Stream
Operations
The gRPC API and SDKs offer both unary operations as well as streaming sessions. The upcoming REST API will support most operations aside from streaming appends.
The first usage of an authentication token on a connection may experience a latency overhead for verification, typically tens of milliseconds. Subsequent requests will not incur such an overhead.
Latency guidance noted below assumes a client in the same cloud region as the basin. Currently, S2 defaults basins to aws:us-east-1
— see endpoints.
CheckTail
Determine the tail of the stream, i.e. the next sequence number that will be assigned to a record.
This is a fast operation, expected to complete in single-digit milliseconds.
Read
and ReadSession
Read records from the stream, starting at any sequence number that has not been trimmed.
Optionally, limit by count or total bytes.
With a session, you are able to read in a streaming fashion. If a limit is not specified and the end of the stream is reached, the session goes into real-time tailing mode and will return records as they are appended to the stream.
Reading records written within the last 20 seconds is expected to take single-digit milliseconds. Otherwise, the time-to-first-byte can take up to 200 milliseconds.
Append
and AppendSession
Append a batch of records to a stream. Appends execute atomically — either all the records in a batch will become durable, or none. S2 returns the range of sequence numbers assigned.
With a session, you can pipeline batches with an ordering guarantee, and receive acknowledgements back in a corresponding order. If any batch fails, subsequent batches will not become durable.
Acknowledgment latency depends on the storage class for the stream. A Standard stream will acknowledge writes within 500 milliseconds, and an Express stream within 50 milliseconds.
Concurrency control
Appends support two mechanisms for concurrency control.
Match sequence
Specifying expected current state allows optimistic concurrency control.
You can provide the sequence number that you expect S2 to assign to the first record in a batch as the match_seq_num
.
If it does not match, the gRPC API will return an ABORTED
status code, signalling that there has been a concurrent write.
Fencing token
Fencing is a form of pessimistic concurrency control.
It is a cooperative mechanism, so an append that does not specify a fencing token will still be allowed.
When an append does include a fencing_token
, the gRPC API will return an ABORTED
status code if it does not match the current token set for the stream.
Setting the current fencing token itself requires appending to the stream, with the fence
command.
Retention
Age-based retention can be configured on a stream, and S2 will automatically delete records that are older than the configured threshold.
Explicit trimming is supported with the trim
command.
Key-based compaction inspired by Kafka’s semantics is on our roadmap. Instead of configuring an age threshold, you will be able to specify the name of a header whose value represents the key for compaction.
Command records
Command records are an advanced feature to append certain operations interpreted by the service. S2 SDKs make it easy to create supported command records.
Concretely –– a command record is a record with a sole header that has an empty name, an operation encoded in this header value, and a payload for the command in the body of the record. Empty header names are not allowed in any other context.
Command records take up a sequence number on the stream, and will be returned to reads. It is easy to test and filter out commands if needed, with the logic len(headers) == 1 && headers[0].name == b""
.
Operations that are currently supported:
-
fence
with an upto 16 byte payload to set a fencing token for the stream. An empty payload clears the token. Fencing is strongly consistent, and subsequent appends that specify a fencing token will be rejected if it does not match. -
trim
with exactly 8 big-endian bytes as payload representing the desired earliest sequence number for the stream –– we will call it the trim point. The effective trim point from the command is going to bemax(existing_trim_point, min(provided_trim_point, my_seq_num + 1))
. Trimming is eventually consistent, and trimmed records may be visible for a brief period.
The S2 CLI also supports fence
and trim
commands.