I'm trying to retrieve a large amount of data (about 3M rows), and I have only two options to do that:
- Call an API and receive the 3M rows as JSON objects.
- Import a CSV file containing the 3M rows.
I haven't tested either solution yet, so I can't tell which one is faster.
If you want to retrieve simple data, such as lists or rows with a few columns, option #2 is the better one. Below is a set of advantages and disadvantages:
Pros
- Less bandwidth is needed, because JSON requires extra syntax characters to maintain its structure, while CSV only needs a separator character.
- Processing the data is faster, because CSV parsing only needs to split on the separator character, while JSON parsing has to interpret the full syntax (see the timing sketch after this list).
- Big data technologies such as Hadoop can parse the CSV format out of the box, while JSON requires a specific parsing step (for example, a JSON SerDe when using Hive).
Cons
- The data is less structured and harder for humans to read.
- You have to make sure the separator character never appears inside a data field, or that such fields are properly quoted (see the quoting example below).
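
As a rough way to check the processing-speed claim, here is a minimal timing sketch in Python. The 1,000,000-row count and the three-column id/name/score schema are made up for illustration, and the naive `split(",")` assumes the separator never appears inside a field:

```python
import json
import time

N = 1_000_000  # assumed row count for illustration

# Synthetic equivalents of the same three-column rows in both formats.
csv_text = "\n".join(f"{i},name{i},{i * 0.5}" for i in range(N))
json_text = "\n".join(
    json.dumps({"id": i, "name": f"name{i}", "score": i * 0.5}) for i in range(N)
)

# CSV: one split per line (assumes no separator inside any field).
start = time.perf_counter()
csv_rows = [line.split(",") for line in csv_text.split("\n")]
print(f"CSV split:  {time.perf_counter() - start:.2f}s")

# JSON: full syntax interpretation per line.
start = time.perf_counter()
json_rows = [json.loads(line) for line in json_text.split("\n")]
print(f"JSON parse: {time.perf_counter() - start:.2f}s")
```

On most machines the split version comes out ahead, but measure with your own schema and parser; a real CSV reader that handles quoting narrows the gap.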
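On the separator caveat: if you can't guarantee clean fields, a proper CSV reader handles embedded separators through quoting, so it's the naive split that breaks, not CSV itself. A small sketch using Python's standard csv module (the sample row is made up):

```python
import csv
import io

# Write a row whose middle field contains the separator character.
buf = io.StringIO()
csv.writer(buf).writerow(["1", "Doe, John", "has an embedded comma"])
line = buf.getvalue().strip()

print(line)                      # 1,"Doe, John",has an embedded comma
print(line.split(","))           # naive split: wrong, yields 4 fields
print(next(csv.reader([line])))  # csv reader: correct, yields 3 fields
```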
If the data contains complex values such as tuples, arrays, and nested structures, JSON is the better choice because:
- It keeps a clear, structured format.
- It doesn't have to repeat data in order to reference it, because a single key can hold multiple nested values (see the sketch below).
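
To illustrate that last point, here is a small sketch of the same one-to-many data in both formats; the customer/orders schema is a made-up example:

```python
import json

# CSV: the customer name has to repeat on every order row.
csv_rows = [
    "customer,order_id,amount",
    "Alice,1001,25.00",
    "Alice,1002,40.00",
    "Alice,1003,15.50",
]

# JSON: the customer appears once; a single key holds all the orders.
record = {
    "customer": "Alice",
    "orders": [
        {"order_id": 1001, "amount": 25.00},
        {"order_id": 1002, "amount": 40.00},
        {"order_id": 1003, "amount": 15.50},
    ],
}
print(json.dumps(record, indent=2))
```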