I have a worker in our primary region (US-East) that computes statistics on traffic at our edge locations. I want to push that data from each edge region to our primary Kafka region.
Example edge regions are Poland, Australia, and US-West. I want to push all of these stats to US-East, but I don't want to incur additional latency during the writes from the edge regions to the primary.
One option is to run a separate Kafka cluster and worker in each edge region that acts as a relay, but that would require us to maintain individual clusters in every region and would add a lot more complexity to our deployments.
I've seen MirrorMaker, but I don't really want to mirror anything; I'm looking more for a relay system. If relaying isn't the intended way to do this, how can I aggregate all of our application metrics in the primary region to be computed and sorted?
Thank you for your time.
As far as I know, here are your options:
Option 1. is definitely the most standard solution to this problem, albeit a bit heavy-handed. I suspect there will be more tooling coming out of the Confluent / Kafka folks to support option 3. in the future.
Write the messages to a local logfile on disk, and write a small daemon which reads the logfile and pushes the events to the main Kafka cluster.
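A minimal sketch of such a relay daemon, assuming the kafka-python client; the logfile path, topic name, and broker address below are hypothetical:

```python
import time
from kafka import KafkaProducer  # assumes the kafka-python client is installed

LOGFILE = "/var/log/edge-stats/events.log"        # hypothetical logfile path
TOPIC = "edge-stats"                              # hypothetical topic name
PRIMARY_BROKERS = "kafka.us-east.internal:9092"   # hypothetical broker address

producer = KafkaProducer(
    bootstrap_servers=PRIMARY_BROKERS,
    acks=1,                  # don't wait for the full ISR across regions
    linger_ms=500,           # batch messages to amortize cross-region latency
    compression_type="gzip", # shrink payloads on the WAN link
)

def follow(path):
    """Yield new lines appended to the file, like `tail -f`."""
    with open(path, "r") as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line.rstrip("\n")

for event in follow(LOGFILE):
    # send() is asynchronous, so the local write path never blocks on US-East
    producer.send(TOPIC, event.encode("utf-8"))
```

The application's own writes stay local and fast; only this daemon pays the cross-region latency, and the producer's batching and compression soften it.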
To increase throughput and limit the effect of latency, you could also rotate the logfile every minute, rsync the rotated files to your main Kafka region with a minutely cronjob, and run the import daemon there.
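A sketch of the rotate-and-ship step, meant to be run from cron in the edge region; the paths, the rsync target, and the assumption that the writing application reopens the logfile after rotation are all hypothetical:

```python
import datetime
import os
import subprocess

LOGFILE = "/var/log/edge-stats/events.log"                       # hypothetical path
SHIP_DIR = "/var/log/edge-stats/outbox"                          # hypothetical staging dir
REMOTE = "importer@us-east.internal:/srv/edge-stats/incoming/"   # hypothetical rsync target

def rotate_and_ship():
    """Rename the current logfile to a timestamped file, then rsync the staging dir."""
    os.makedirs(SHIP_DIR, exist_ok=True)
    if os.path.exists(LOGFILE) and os.path.getsize(LOGFILE) > 0:
        # NOTE: assumes the producing application reopens the logfile after rotation
        stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S")
        os.rename(LOGFILE, os.path.join(SHIP_DIR, f"events-{stamp}.log"))
    # --remove-source-files deletes each file locally once it has been transferred
    subprocess.run(
        ["rsync", "-az", "--remove-source-files", f"{SHIP_DIR}/", REMOTE],
        check=True,
    )

if __name__ == "__main__":
    rotate_and_ship()  # invoke this minutely from cron
```

The import daemon on the US-East side can then be essentially the same tail-and-produce loop as above, reading the shipped files and writing to the local cluster, so the only cross-region traffic is the rsync transfer.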