KSQL is the streaming SQL engine for Apache Kafka®. It provides an easy-to-use yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language.
KSQL is built on Kafka Streams, a robust stream processing framework that is part of Apache Kafka.
KSQL is not ANSI SQL compliant; for now, there are no defined standards for streaming SQL languages.
KSQL servers, clients, queries, and applications run outside of Kafka brokers, in separate JVM instances, or in separate clusters entirely.
KSQL Engine
The KSQL engine executes KSQL statements and queries. You define your application logic by writing KSQL statements, and the engine builds and runs the application on available KSQL servers. Each KSQL server instance runs a KSQL engine. Under the hood, the engine parses your KSQL statements and builds corresponding Kafka Streams topologies.
REST Interface
The REST server interface enables communicating with the KSQL engine from the CLI, Confluent Control Center, or from any other REST client.
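As a sketch, any REST client can submit statements to the server's /ksql endpoint (port 8088 by default; the host and port here are illustrative):

```shell
# Submit a KSQL statement to a hypothetical local server.
curl -X POST http://localhost:8088/ksql \
     -H "Content-Type: application/vnd.ksql.v1+json" \
     -d '{"ksql": "SHOW STREAMS;"}'
```

The server responds with a JSON payload describing the result of each statement.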
Data Definition Language (DDL) Statements
Imperative verbs that define metadata on the KSQL server by adding, changing, or deleting streams and tables. Data Definition Language statements modify metadata only and don't operate on data. You can use these statements with declarative DML statements.
The DDL statements include:
- CREATE STREAM
- CREATE TABLE
- DROP STREAM
- DROP TABLE
Data Manipulation Language (DML) Statements
Declarative verbs that read and modify data in KSQL streams and tables. Data Manipulation Language statements modify data only and don't change metadata. The KSQL engine compiles DML statements into Kafka Streams applications, which run on a Kafka cluster like any other Kafka Streams application.
The DML statements include:
- SELECT
- INSERT INTO
- CREATE STREAM AS SELECT
- CREATE TABLE AS SELECT
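For example, a persistent query like the following (the stream name and field are hypothetical) is compiled by the engine into a Kafka Streams application that runs continuously:

```sql
-- Assumes an existing stream 'pageviews' with a VARCHAR field 'pageid'.
-- The engine turns this statement into a Kafka Streams topology that
-- writes matching records to a new output stream.
CREATE STREAM pageviews_home AS
  SELECT *
    FROM pageviews
   WHERE pageid = 'home';
```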
KSQL servers run in one of two deployment modes:

Interactive – data exploration and pipeline development. KSQL shares statements with servers in the cluster over the command topic. The command topic stores every KSQL statement, along with some metadata that ensures the statements are built compatibly across KSQL restarts and upgrades.
Headless – long-running production environments. The REST interface isn't available, so you assign workloads to KSQL servers by using a SQL file. The SQL file contains the KSQL statements and queries that define your application. KSQL stores metadata in an internal topic called the config topic.
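A headless server is started by pointing it at a SQL file of queries (the file paths here are illustrative):

```shell
# Start a KSQL server in headless mode. The server runs every
# statement in the given SQL file and disables the REST interface.
ksql-server-start ksql-server.properties --queries-file /path/to/queries.sql
```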
KSQL distributes the processing load for your applications across all KSQL Server instances in a cluster, and you can add more KSQL Server instances without restarting your applications.
Join KSQL engines to the same service pool by using the ksql.service.id property.
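A minimal server configuration might look like this (the service id and addresses are illustrative; engines that share the same ksql.service.id form one pool and divide the workload among themselves):

```properties
# ksql-server.properties
ksql.service.id=my_ksql_cluster_
bootstrap.servers=localhost:9092
listeners=http://0.0.0.0:8088
```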
|                   | KSQL                            | Kafka Streams                                             |
|-------------------|---------------------------------|-----------------------------------------------------------|
| You write:        | KSQL statements                 | JVM applications                                          |
| Graphical UI      | Yes, in Confluent Control Center| No                                                        |
| Data formats      | Avro, JSON, CSV                 | Any data format, including Avro, JSON, CSV, Protobuf, XML |
| REST API included | Yes                             | No, but you can implement your own                        |
| Runtime included  | Yes, the KSQL server            | Applications run as standard JVM processes                |
Usually, KSQL isn't a good fit for BI reports, ad-hoc querying, or queries with random access patterns, because it's a continuous query system on data streams.
In KSQL, a record is an immutable representation of an event in time. Each record carries a timestamp. Timestamps are used by time-dependent operations, like aggregations and joins.
Don't mix streams or tables that have different time semantics.
The meaning of a record's timestamp is set per topic by the message.timestamp.type configuration, which is either CreateTime (event time) or LogAppendTime (ingestion time).
By default, when KSQL imports a topic to create a stream, it uses the record's timestamp, but you can add the WITH(TIMESTAMP='some-field') clause to use a different field from the record's value as the timestamp.
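As a sketch, assuming a topic 'pageviews' whose value carries a BIGINT field 'viewtime' holding the event time:

```sql
-- Use the 'viewtime' field from the record's value as the
-- timestamp, instead of the Kafka record's own timestamp.
CREATE STREAM pageviews (
    viewtime BIGINT,
    userid VARCHAR,
    pageid VARCHAR
  ) WITH (
    KAFKA_TOPIC = 'pageviews',
    VALUE_FORMAT = 'JSON',
    TIMESTAMP = 'viewtime'
  );
```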
KSQL supports windowed JOIN queries by using the WITHIN clause.
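For example, assuming two hypothetical streams 'orders' and 'shipments' keyed by order id, a windowed join matches records whose timestamps fall within one hour of each other:

```sql
-- Join each order to its shipment, provided the two events
-- occurred within one hour of each other.
SELECT o.orderid, s.shipmentid
  FROM orders o
  INNER JOIN shipments s
    WITHIN 1 HOUR
    ON o.orderid = s.orderid;
```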
Statements like SHOW STREAMS and EXPLAIN run against the KSQL server that the client is connected to; they don't communicate directly with the Kafka brokers.