Click to copy

• Reviewed for ksqlDB 0.29

How to Union Streams in ksqlDB

ksqlDB doesn't directly support the SQL UNION operator to combine streams. Instead, you will need to create a new stream using the first query of the union, then insert the values into the same stream by querying any additional topics that need to be merged in.

Why Union Streams?

There are a few common cases where you may want to union together two or more streams in ksqlDB:

  • Merge streams with the same schema from different clusters or applications into a single unified stream for downstream processing.
  • Combine streams with related but different data for aggregation. For example, user activity events from a website and mobile app.
  • Append new data to an existing stream.

How to Emulate UNION

Since ksqlDB doesn't directly support UNION, you need to reproduce its behaviour by:

    Creating a new stream from the first data source you want to union
    Inserting additional streams into the same output stream

Step 1: Create Initial Stream

You need to start by creating a new stream from the first source query:

CREATE STREAM union_stream AS
  SELECT *
  FROM stream_1
  EMIT CHANGES;

This registers stream_1 as a ksqlDB stream if it isn't already, runs the query, and outputs into union_stream.

Step 2: Insert Additional Streams

Now you can insert additional streams into union_stream to union them:

INSERT INTO union_stream
  SELECT *
  FROM stream_2
  EMIT CHANGES;
INSERT INTO union_stream
  SELECT *
  FROM stream_3
  EMIT CHANGES;

The above inserts the latest data from stream_2 and stream_3 into the same union_stream output.

By inserting multiple streams into a common output stream, you can achieve the logical equivalent of a SQL UNION in ksqlDB.

Example Use Case

As an example, imagine you have separate user activity event streams from your website and mobile app. You want to union these together into a combined user_activity stream for reporting.

You could do:

CREATE STREAM user_activity AS
  SELECT *
  FROM website_activity
  EMIT CHANGES;
INSERT INTO user_activity
  SELECT *
  FROM mobile_activity
  EMIT CHANGES;

Now user_activity will contain events from both sources.

Key Considerations

When unioning streams in ksqlDB using INSERT INTO statements, keep the following in mind:

  • The schemas should match or be compatible. You may need to cast or alias fields.
  • Deduplication logic may be required if events are duplicated across streams.
Discover what readers are saying
topictale
Get easy to digest how-tos on ksqlDB
Sign up
Please read our Privacy Policy to understand how we protect and manage your data.
You may also like