A Flink connector for S2 is available. Documentation and code can be found on GitHub.

The connector currently supports using S2 streams as both sources and sinks, and is accessible via Flink’s DataStream, Table, and SQL APIs.
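
For instance, an S2 stream can be exposed as a table in SQL. A minimal sketch; the connector identifier and option names below are assumptions for illustration, not the connector’s documented configuration (see the GitHub repo for the actual options):

CREATE TABLE user_actions (
  product_id BIGINT,
  action     STRING,
  query      STRING,
  ts         TIMESTAMP(3),
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND  -- event-time attribute
) WITH (
  'connector' = 's2',            -- assumed connector identifier
  's2.basin'  = 'my-basin',      -- assumed option name
  's2.stream' = 'user-actions',  -- assumed option name
  'format'    = 'json'
);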

S2 sources and sinks created through the Table/SQL API can optionally be configured with upsert-style semantics, similar to Flink’s Upsert Kafka connector.
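
As with the Upsert Kafka connector, declaring a PRIMARY KEY in the table DDL is the natural Flink SQL signal for upsert semantics. A hedged sketch along those lines, again with assumed option names; consult the repo for how the connector actually enables upsert mode:

CREATE TABLE product_features (
  product_id  BIGINT,
  top_queries ARRAY<STRING>,
  PRIMARY KEY (product_id) NOT ENFORCED  -- upsert key
) WITH (
  'connector' = 's2',        -- assumed connector identifier
  's2.basin'  = 'my-basin',  -- assumed option name
  's2.stream' = 'features',  -- assumed option name
  'format'    = 'json'
);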

Example

The connector repo contains an example job demonstrating a real-time feature pipeline for an e-commerce use case, which:

  • Listens for user actions being logged to a collection of S2 streams
  • Extracts complex events from patterns of user actions within a rolling window
  • Writes summary events to an S2 sink
  • Computes a streaming dynamic table containing a derived feature: the top query strings per product that lead to a purchase
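
The last step maps naturally onto Flink SQL’s Top-N pattern. A sketch under assumed names, with purchase_attributions standing in for a table that attributes each purchase to the search query preceding it (the real job’s schema is in the repo):

SELECT product_id, query
FROM (
  SELECT product_id, query,
    ROW_NUMBER() OVER (
      PARTITION BY product_id
      ORDER BY query_count DESC) AS row_num
  FROM (
    -- count how often each query led to a purchase of the product
    SELECT product_id, query, COUNT(*) AS query_count
    FROM purchase_attributions
    GROUP BY product_id, query
  ) AS counts
) AS ranked
WHERE row_num <= 5;  -- emits updates as the dynamic table changes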

An example per-product feature derived from user actions:

[
  112140,
  {
    "top_queries": [
      "mid-century modern",
      "contemporary living",
      "fashion",
      "home and living",
      "kitchen"
    ]
  }
]

The associated README on GitHub has details on getting the job up and running.