Kafka多分区下如何避免数据重复?

单分区情况下可以通过enable.idempotence开启幂等

那多分区的情况下如何避免呢?

使用事务
https://kafka.apache.org/23/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("transactional.id", "my-transactional-id");
Producer<String, String> producer = new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());

producer.initTransactions();

try {
producer.beginTransaction();
for (int i = 0; i < 100; i++)
producer.send(new ProducerRecord<>("my-topic", Integer.toString(i), Integer.toString(i)));
producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
// We can't recover from these exceptions, so our only option is to close the producer and exit.
producer.close();
} catch (KafkaException e) {
// For all other exceptions, just abort the transaction and try again.
producer.abortTransaction();
}
producer.close();

提高消费端的处理性能避免触发Balance,比如可以用异步的方式来处理消息,缩短单个消息消费的市场。或者还可以调整消息处理的超时时间。还可以减少一次性从Broker上拉取数据的条数。
具体可以参考这个文章: Kafka如何保证消息不丢失、不重复? https://blog.csdn.net/weixin_43972154/article/details/124453309