Click to copy
How to Union Streams in ksqlDB
ksqlDB doesn't directly support the SQL UNION operator to combine streams. Instead, you will need to create a new stream using the first query of the union, then insert the values into the same stream by querying any additional topics that need to be merged in.
There are a few common cases where you may want to union together two or more streams in ksqlDB:
- Merge streams with the same schema from different clusters or applications into a single unified stream for downstream processing.
- Combine streams with related but different data for aggregation. For example, user activity events from a website and mobile app.
- Append new data to an existing stream.
Since ksqlDB doesn't directly support UNION, you need to reproduce its behaviour by:
- Creating a new stream from the first data source you want to union
- Inserting additional streams into the same output stream
You need to start by creating a new stream from the first source query:
CREATE STREAM union_stream AS
SELECT *
FROM stream_1
EMIT CHANGES;
This registers stream_1
as a ksqlDB stream if it isn't already, runs the query, and outputs into union_stream
.
Now you can insert additional streams into union_stream
to union them:
INSERT INTO union_stream
SELECT *
FROM stream_2
EMIT CHANGES;
INSERT INTO union_stream
SELECT *
FROM stream_3
EMIT CHANGES;
The above inserts the latest data from stream_2
and stream_3
into the same union_stream
output.
By inserting multiple streams into a common output stream, you can achieve the logical equivalent of a SQL UNION in ksqlDB.
As an example, imagine you have separate user activity event streams from your website and mobile app. You want to union these together into a combined user_activity
stream for reporting.
You could do:
CREATE STREAM user_activity AS
SELECT *
FROM website_activity
EMIT CHANGES;
INSERT INTO user_activity
SELECT *
FROM mobile_activity
EMIT CHANGES;
Now user_activity
will contain events from both sources.
When unioning streams in ksqlDB using INSERT INTO statements, keep the following in mind:
- The schemas should match or be compatible. You may need to cast or alias fields.
- Deduplication logic may be required if events are duplicated across streams.