I have a scenario where I have to import millions of records from multiple sources and save them in a database. When a user searches for anything in that data, they should get results in under 2-3 seconds.
For this, I designed an architecture where I use Golang to import data from multiple sources and push it into AWS SQS. I've created a Lambda function that triggers whenever SQS has data; this Lambda then pushes the data into AWS Elasticsearch. I've built a REST API through which I return results to the user.
I use cron to run this import every morning. My problem is that when a new batch of data arrives, I want to delete the existing data and replace all of it with the new data. I'm stuck on how to achieve this delete-and-replace step.
I thought of importing into a temporary index and then swapping it with the original index, but the problem is that I don't know when the import has finished and the switch can safely be made.
The concept you're after is an index alias. The basic workflow would be:

Import each batch into a fresh, date-stamped index, for example my-index-2019-09-16.

Once the import is done, point the alias to the new index (it's an atomic switch between the indices):
POST /_aliases
{
  "actions" : [
    { "remove" : { "index" : "my-index-2019-09-15", "alias" : "my-index" } },
    { "add" : { "index" : "my-index-2019-09-16", "alias" : "my-index" } }
  ]
}
Delete the old index.
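Concretely, you can first check which index the alias currently points to, and then drop yesterday's index (index names here follow the example above):

```
GET /_alias/my-index

DELETE /my-index-2019-09-15
```

Your REST API should always query the alias (`my-index`), never a dated index, so the swap is invisible to users.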
You will temporarily double your disk usage during the import, but otherwise this should work without any issues, and you only delete the old data once its replacement is fully in place.