处理大量查询并避免重复

My project involves concurrency and database management. Meaning that I have to be editing a database simultaneously between all threads. To be more specific I am reading a line from the database then inserting a line to mark that I grabbed that line. This could work with transactions but due to the fact I will be running this program on multiple machines, I will be having different database connections on each one. Is their a better way for me to accomplish my above task?

Applying Optimistic Concurrency using transactions and a version field/column (could be a time-stamp or a time-stamp plus an actual version number that just increases or other mechanism for version number) is a must here.

But since you are doing this on different machines, it's possible that a substantial amount of repetitive failed transactions occur.

To prevent this, you could use a queuing mechanism. A dispatcher program reads the non-processed records from database and dispatch them to workers - using a queue or a job dispatcher. Then each worker will take the id from that queue and process it in a transaction.

This way:

  1. if a transaction fails, dispatcher would queue it again
  2. if a worker goes down, other workers would continue (noticing the going down is a matter of monitoring)
  3. workers can easily scale-out and new workers can be added at any time (as long as your database is not your bottleneck)

A request/reply schema would do best in this case to prevent queue congestion. I've used NATS successfully (and happily) for similar cases. Of-course you could use another tool of your choice but remember that you have to take care of request/reply part. Just throwing things at queues does not solve all problems and the amount of queued work should be controlled!