Kafka Connect has its own in-memory objects that include data types and schemas, and it uses pluggable converters so these records can be stored in any format.
Kafka allows encrypting data on the wire, as it is piped from sources to Kafka and from Kafka to sinks. It also supports authentication (via SASL) and authorization.
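As a rough sketch, wire encryption plus SASL authentication on a Kafka client (or Connect worker) might be configured with properties along these lines; the mechanism, paths, and credentials are placeholders, not recommendations:

```properties
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
# placeholder truststore used to verify the brokers' TLS certificates
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=changeit
# placeholder credentials for SASL/PLAIN authentication
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="client" password="client-secret";
```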
Kafka also provides an audit log to track access—unauthorized and authorized. With some extra coding, it is also possible to track where the events in each topic came from and who modified them, so you can provide the entire lineage for each record.
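The "extra coding" could, for example, be a producer interceptor that stamps every outgoing record with lineage headers. This is only one possible sketch, not a built-in feature, and the header names and values are invented:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Sketch: tag each record with where it came from so consumers can trace lineage.
public class LineageInterceptor implements ProducerInterceptor<String, String> {

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // invented header names; a real deployment would standardize these
        record.headers().add("origin-app", "billing-service".getBytes(StandardCharsets.UTF_8));
        record.headers().add("origin-host",
                System.getenv().getOrDefault("HOSTNAME", "unknown").getBytes(StandardCharsets.UTF_8));
        return record;
    }

    @Override public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }
    @Override public void close() { }
    @Override public void configure(Map<String, ?> configs) { }
}
```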
The more agile approach is to preserve as much of the raw data as possible and allow downstream apps to make their own decisions about data processing and aggregation.
Kafka Connect is a part of Apache Kafka and provides a scalable and reliable way to move data between Kafka and other datastores. There is no need to install it separately.
It provides APIs and a runtime to develop and run connector plugins.
Kafka Connect sink connectors are used to export data from Kafka to external systems such as databases.
Kafka Connect source connectors are used to import data from external systems into Kafka.
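For example, a file source connector (one of the connectors bundled with Apache Kafka) could be created by POSTing a JSON configuration like this to the Connect REST API; the connector name, file path, and topic are made-up values:

```json
{
  "name": "load-example-file",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "file": "/tmp/example.txt",
    "topic": "example-file-topic"
  }
}
```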
You cannot have more sink tasks (= consumers) than the number of partitions in the topics they consume.
Kafka Connect runs as a cluster of worker processes.
Kafka Connect also has a standalone mode, where a single process runs all the connectors and tasks.
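For reference, these are the scripts that ship with Kafka for starting workers in each mode (the properties file names are the defaults in the Kafka distribution):

```sh
# distributed mode: run one worker per machine, all sharing the same group.id
bin/connect-distributed.sh config/connect-distributed.properties

# standalone mode: a single process that takes the worker config plus connector config files
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
```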
If you look at the Connect worker log after deleting a connector, you should see all other connectors restarting their tasks. They restart in order to rebalance the remaining tasks between the workers and ensure equivalent workloads after a connector was removed.
bootstrap.servers: A list of Kafka brokers that Connect will work with
group.id: All workers with the same group ID are part of the same Connect cluster
key.converter: Sets the converter for the key. The default is JSON format using the JSONConverter. Converter-specific options are prefixed with the converter name, e.g. key.converter.schemas.enable=true/false (JSON messages can include a schema or be schema-less) and key.converter.schema.registry.url (for Avro).
value.converter: Sets the converter for the value. The default is JSON format using the JSONConverter. The same converter-specific options apply here, e.g. value.converter.schemas.enable=true/false and value.converter.schema.registry.url.
rest.host.name and rest.port: The interface and port for Kafka Connect's REST API. Connectors are typically configured and monitored through this REST API.
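Putting the options above together, a minimal sketch of a distributed worker properties file might look like this (broker addresses, cluster name, and topic names are illustrative):

```properties
bootstrap.servers=broker1:9092,broker2:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# JSON messages can carry an embedded schema or be schema-less
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# REST API used to configure and monitor connectors
rest.host.name=0.0.0.0
rest.port=8083
# internal topics used by a distributed Connect cluster
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
```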
Connector plugins implement the connector API, which includes two parts: connectors and tasks. The connector is responsible for:
Determining how many tasks will run for the connector - choosing the lower of the tasks.max configuration and the number of tasks the work can reasonably be split into
Deciding how to split the data-copying work between the tasks
Getting configurations for the tasks from the workers and passing them along
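As a sketch, a hypothetical source connector class might implement those responsibilities roughly like this (the class names and the "tables"/"task.id" config keys are invented; the task class it hands out is sketched in the next example):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;

// Sketch: splits a comma-separated list of tables between tasks.
public class ExampleSourceConnector extends SourceConnector {
    private Map<String, String> config;

    @Override
    public void start(Map<String, String> props) { this.config = props; }

    @Override
    public Class<? extends Task> taskClass() { return ExampleSourceTask.class; }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // maxTasks is derived from the connector's tasks.max setting;
        // never create more tasks than there are tables to copy
        String[] tables = config.getOrDefault("tables", "").split(",");
        int numTasks = Math.min(maxTasks, tables.length);
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            Map<String, String> taskConfig = new HashMap<>(config);
            taskConfig.put("task.id", String.valueOf(i)); // each task later picks its share of tables
            configs.add(taskConfig);
        }
        return configs;
    }

    @Override public void stop() { }
    @Override public ConfigDef config() { return new ConfigDef(); }
    @Override public String version() { return "0.1"; }
}
```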
When the source connector returns a list of records, which includes the source partition and offset for each record, the worker sends the records to Kafka brokers. If the brokers successfully acknowledge the records, the worker then stores the offsets of the records it sent to Kafka.
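A sketch of the task side: poll() returns SourceRecords that each carry a source partition and offset, which the worker persists once the brokers acknowledge the write (the names and the "file/position" partitioning scheme are illustrative):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Sketch: reads "lines" from some external source and hands them to the worker.
public class ExampleSourceTask extends SourceTask {
    private long lineNumber = 0;

    @Override
    public void start(Map<String, String> props) { /* open connection to the source system */ }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        String line = readNextLineSomehow();           // placeholder for real source I/O
        if (line == null) { Thread.sleep(100); return Collections.emptyList(); }

        Map<String, ?> sourcePartition = Collections.singletonMap("file", "example.txt");
        Map<String, ?> sourceOffset = Collections.singletonMap("position", ++lineNumber);

        // The worker writes this record to Kafka and, after the broker acks it,
        // stores the source offset so the task can resume from there after a restart.
        return Collections.singletonList(
            new SourceRecord(sourcePartition, sourceOffset, "example-topic",
                             Schema.STRING_SCHEMA, line));
    }

    private String readNextLineSomehow() { return null; } // stub for illustration

    @Override public void stop() { }
    @Override public String version() { return "0.1"; }
}
```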
Sink connectors have an opposite but similar workflow: the workers read Kafka records, which already have topic, partition, and offset identifiers, and then call the connector's put() method, which should store those records in the destination system. If the connector reports success, the workers commit the offsets they’ve given to the connector back to Kafka, using the usual consumer commit methods.
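And the sink side, as a sketch: put() receives records that already carry topic/partition/offset coordinates, and the framework commits those offsets only after the connector has handled the records (the destination "write" below is a placeholder):

```java
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Sketch: writes each record to some external system; if put() throws,
// the offsets for these records are not committed.
public class ExampleSinkTask extends SinkTask {
    @Override
    public void start(Map<String, String> props) { /* open connection to the destination */ }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            // topic/partition/offset come from the Kafka record itself
            writeToDestination(record.topic(), record.kafkaPartition(), record.kafkaOffset(), record.value());
        }
    }

    private void writeToDestination(String topic, Integer partition, long offset, Object value) {
        // placeholder for real destination I/O
    }

    @Override public void stop() { }
    @Override public String version() { return "0.1"; }
}
```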