Currently, I retrieve individual RSS feeds and store the data I need from them as JSON, in a format like this, for every source (about 100):
{
  "status": "ok",
  "source": "source-string",
  "sortBy": "top",
  "unixTimeStampLastUpdated": 1513555729,
  "articles": [{
    "author": null,
    "title": "Article Title",
    "description": "Short Description",
    "urlToImage": null,
    "publishedAt": 1513536447,
    "id": "2017-12_5a370775559fa"
  },
  ...and so on
I store a monthly JSON file for each source (about 100 sources) in that format.
From that, I generate pages based on each source's monthly JSON file. Each article listed has a unique ID that needs to point to something on my server; to do this, I keep an ENORMOUS monthly array of just the article IDs and a few of their attributes, like this:
{
  "2017-12_5a3701fb89c99": {
    "title": "Sample Article Title",
    "url": "https:\/\/www.example.com\/",
    "feed": "the-source",
    "origin": "2017-12"
  },
  "2017-12_5a3701fba9c9a": {
    "title": "Sample Article Title",
    "url": "https:\/\/www.example.com\/",
    "feed": "the-source-2",
    "origin": "2017-12"
  },
My Question:
What is the best way to retrieve articles, index them, display them, and act on their callbacks (by ID), while keeping everything lightning fast and organized?
I am not sure if a SQL Database will solve my problems, as I have not had to set one up yet and I think this could be simpler...
Is there a way I could do this with each article listed in only one JSON file, instead of it being referenced in a few places? Or would that be too slow?
Any input would be greatly appreciated!
Sounds like your data isn't terribly relational and you want:
Welcome to NoSQL land.
There are plenty of simple services that each accomplish one task or the other [e.g. Lucene or Solr for search], and plenty of consolidated services that accomplish both. If you're running this app in a public cloud somewhere, chances are they already have a service that does what you want [e.g. AWS DynamoDB, GCP Datastore]. Otherwise, you're probably going to want to look into something like Couchbase, Cassandra, or Elasticsearch.
I've tried to be as broad as possible, so as not to ignite a holy war, but your question itself really rides the line for "Too Broad" and "Primarily Opinion-based" to begin with.
Lastly, if all this is too daunting you can always cobble together loose approximations of NoSQL systems inside of an RDBMS. In fact, Postgres has some fairly nice tools for interacting with schemaless data.
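To make that last option a bit more concrete, here is a rough sketch of the "document store inside an RDBMS" idea. It uses Python's built-in sqlite3 module instead of Postgres purely so it runs without a server; the table name, column names, and helper functions are all made up for illustration, and the sample rows are the ones from your question. Each article lives in exactly one row keyed by its ID, the fields you filter on (feed, origin) are real indexed columns, and everything else stays as an opaque JSON string.

```python
import json
import sqlite3

# Assumption: an in-memory database for the demo; pass a file path to persist.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per article, keyed by the article ID.
conn.execute(
    """
    CREATE TABLE articles (
        id     TEXT PRIMARY KEY,  -- e.g. "2017-12_5a3701fb89c99"
        feed   TEXT NOT NULL,     -- source name
        origin TEXT NOT NULL,     -- month bucket, e.g. "2017-12"
        doc    TEXT NOT NULL      -- remaining attributes as a JSON string
    )
    """
)
# Index for "all articles of one source in one month" page generation.
conn.execute("CREATE INDEX idx_articles_feed_origin ON articles (feed, origin)")


def put_article(article_id, feed, origin, attrs):
    """Insert or overwrite a single article document."""
    conn.execute(
        "INSERT OR REPLACE INTO articles (id, feed, origin, doc) VALUES (?, ?, ?, ?)",
        (article_id, feed, origin, json.dumps(attrs)),
    )


def get_article(article_id):
    """Look up one article by ID; returns a dict, or None if unknown."""
    row = conn.execute(
        "SELECT feed, origin, doc FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        return None
    feed, origin, doc = row
    return {"id": article_id, "feed": feed, "origin": origin, **json.loads(doc)}


def list_feed_month(feed, origin):
    """Return the IDs of all articles for one source in one month."""
    rows = conn.execute(
        "SELECT id FROM articles WHERE feed = ? AND origin = ? ORDER BY id",
        (feed, origin),
    ).fetchall()
    return [r[0] for r in rows]


put_article(
    "2017-12_5a3701fb89c99", "the-source", "2017-12",
    {"title": "Sample Article Title", "url": "https://www.example.com/"},
)
put_article(
    "2017-12_5a3701fba9c9a", "the-source-2", "2017-12",
    {"title": "Sample Article Title", "url": "https://www.example.com/"},
)

print(get_article("2017-12_5a3701fb89c99"))
print(list_feed_month("the-source-2", "2017-12"))
```

The primary key handles the lookup-by-ID case, and the (feed, origin) index replaces the per-source monthly files, so each article is stored once rather than referenced from several JSON files. If you later move to Postgres, the doc column is the part you would likely turn into a native JSON column.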