I have a PHP/JS site where the information is encrypted and put into the database. The encryption key for the information is randomly generated and then given back to the user after they submit a post through a form. The encryption key is not stored in my database at all. A separate, randomly generated ID is created and stored in the database, and is used to look up the item itself before deciphering it.
My question is: is it possible at all to look through the logs and find information that would reveal the key? I am trying to make it impossible to read any of the SQL data without either being the person who has the key (who can do whatever they want with it) or mounting a brute-force attack (unavoidable if someone gets my SQL database).
Just to re-iterate my steps:
Is there any possibility that someone with access to my server could read the information? I want it to be impossible even for me to read the messages myself. The information has to be decryptable; it can't be a one-way encoding.
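To make that concrete, here is roughly what my submission code does (simplified sketch; I'm using AES-256-GCM through PHP's OpenSSL extension as the example cipher, $db is an existing PDO connection, and the table and column names are placeholders):

    <?php
    // Sketch of the submission flow described above (names are placeholders).
    $plaintext = $_POST['message'];

    $key      = random_bytes(32);          // encryption key: returned to the user, never stored
    $lookupId = bin2hex(random_bytes(16)); // separate random ID: stored, used only for lookup
    $iv       = random_bytes(12);
    $tag      = '';

    $ciphertext = openssl_encrypt($plaintext, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);

    $stmt = $db->prepare('INSERT INTO messages (lookup_id, ciphertext, iv, tag) VALUES (?, ?, ?, ?)');
    $stmt->execute([$lookupId, $ciphertext, $iv, $tag]);

    // Only the user ever sees the key, e.g. as part of the link handed back to them:
    echo 'https://example.com/view?id=' . $lookupId . '&key=' . bin2hex($key);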
Let's have a little think.
If somebody breaks that, you do have a problem with security.
Spend the effort on fixing that. Disaster recovery is a waste of effort in this case; just get the base cases correct.
Let's start with some basic definitions:
Code Protecting data by translating it to another language, usually a private language. English translated to Spanish is encoded, but it's not very secure since many people understand Spanish.
Cipher Protecting data by scrambling it up using a key. A letter-substitution cipher first documented by Julius Caesar is an example of this. Modern techniques involve mathematical manipulation of binary data, for example using large prime numbers. The best techniques use asymmetric keys: the key that is used to encipher the data cannot decipher it; a different key is needed. This allows the public key to be published, and it is the basis of SSL browser communication.
Encryption Protecting data by encoding and/or enciphering it.
All of these terms are often used interchangeably, but they are different and the differences are sometimes important. What you are trying to do is protect the data with a cipher.
If the data is "in the clear", then if it is intercepted it is lost. If it is enciphered, then both the data and the key need to be intercepted. If it is enciphered and encoded, then the data, the key and the code all need to be intercepted.
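For example, with the asymmetric techniques mentioned above, intercepting the data and the published (public) key is still not enough; only the private key deciphers. A small sketch using PHP's OpenSSL extension (assuming it is installed):

    <?php
    // Generate an RSA key pair: the public half enciphers, only the private half deciphers.
    $pair = openssl_pkey_new(['private_key_bits' => 2048, 'private_key_type' => OPENSSL_KEYTYPE_RSA]);
    openssl_pkey_export($pair, $privatePem);             // keep this secret
    $publicPem = openssl_pkey_get_details($pair)['key']; // safe to publish

    openssl_public_encrypt('attack at dawn', $ciphertext, $publicPem, OPENSSL_PKCS1_OAEP_PADDING);
    openssl_private_decrypt($ciphertext, $plaintext, $privatePem, OPENSSL_PKCS1_OAEP_PADDING);

    echo $plaintext; // "attack at dawn"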
Where is your data vulnerable?
Do not overlook physical security; quite often the easiest way to steal data is to walk up to the server and copy the hard drive. Many companies (and, sadly, defence/security forces) spend millions on online data security and then put their data in a room with no lock. They also have access protocols that a 10-year-old child could circumvent.
You now have lovely encrypted data - how are you going to stop your program from serving it up in the clear to anyone who asks for it?
This brings us to identification, validation and authorisation. More definitions:
Identification A claim made by a person that they are so-and-so. This is usually handled in a computer program by a user name. In physical security applications it is by a person presenting themselves and saying "I am so-and-so"; this can be explicit, by a verbal statement or by presenting an identity document like a passport, or implicit, by a guard who knows you recognising you.
Validation This is the proof that a person is who they say they are. In a computer this is the role of the password; more accurately, it proves that they know the password of the person they claim to be, which is the big, massive, huge and insurmountable problem in the whole thing. In physical security it is done by comparing physical metrics (appearance, height etc.) as documented in a trusted document (like a passport) against the claim; you need to have protocols in place to ensure that you can trust the document. Incidentally, this is the main cause of problems with face-recognition technology used to identify bad guys: it uses a validation technique to try to identify someone. "This guy looks like Bad Guy #1"; guess what? So do a lot of people in a population of 7 billion.
Authorisation Once a person has been identified and validated, they are then given authorisation to do certain things and go to certain places. They may be given a temporary identification document for this; think of a visitor ID badge or a cookie. Depending on where they go, they may be required to re-identify and re-validate themselves; think of a bank's website: you identify and validate yourself to see your bank accounts, and you do it again to make transfers or payments.
By and large, this is the weakest part of any computer security system; it is hard for me to steal your data, but it is far easier for me to steal your identity and have the data given to me.
In your case this is probably not your concern; provided that you do the normal thing of allowing the user to set, change and reset their password in the normal commercial manner, you have probably done all you can.
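In PHP, the normal commercial manner amounts to storing only a one-way hash of the password and checking it on every visit; a minimal sketch (storage layer omitted):

    <?php
    // Validation sketch: store only a hash of the password, never the password itself.
    $hash = password_hash($_POST['password'], PASSWORD_DEFAULT);
    // ... save $hash against the user name (the identification) in your user table ...

    // Later, when someone claims to be that user:
    if (password_verify($_POST['password'], $hash)) {
        // validated: they know the password of the identity they claim
    } else {
        // rejected: the claimed identity was not validated
    }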
Remember, data security is a trade-off between security on the one hand and trust and usability on the other. Make things too hard (like high-complexity passwords for low-value data) and you compromise the whole system (because people are people and they write them down).
Like everything in computers – users are a problem!
Why are you protecting this data, and what are you willing to spend to do so?
This is a classic risk-management question. In effect, you need to consider the adverse consequences of losing this data, the risk of this happening with your present level of safeguards, and whether the reduction in risk that additional safeguards would bring is worth their cost.
Losing the data can mean any or all of:
This type of thinking is what leads to the classification of data in defence and government into Top Secret, Secret, Restricted and Unrestricted (Australian classifications). The human element intervenes again here: due to the nature of bureaucracy there is no incentive to give a document a low classification and plenty of disincentive, so documents are routinely over-classified. The result is that many documents with a Restricted classification have to be distributed to people who don't have the appropriate clearance, simply to make the damn thing work, and so that is what happens.
You can think of this as a hierarchy as well; my personal way of thinking about it is:
Irrespective of the level, you don't want any of this data lost or changed, but if it is, you need to know that this has happened. For the Nazis, having their Enigma cipher broken was bad; not knowing it had happened was catastrophic.
In the comments below, I have been asked to describe best practice. This is impossible without knowing the risk attached to the data (and the risk tolerance of the organisation). Spending too much on data security is as bad as spending too little.
I like your system of putting the decryption key in the URL, so that not even you will be able to access the data without information that is available only on the user's computer.
I still see a few gotchas in this.
URLs are often saved in web server logs. If you're logging to disk, and they get the disk, then they get the keys.
If the attacker has access to your database, he may have enough access to your system to secretly install software that logs the URLs. He could even do something as prosaic as turn logging back on.
The person visiting your site will at least have the URL bookmarked (otherwise it is useless to him), and it will likely appear in his browser history. Normally, bookmarks and history are not considered secure data. Thus, an attacker with access to a user's computer (either by sitting down at it directly or because the computer is compromised by malware) can access the data as well. If the payload is desirable enough, someone could create a virus or malware that specifically mines for your static authentication token, and could achieve a reasonable hit rate. The URLs could even be available to browser plugins, or to other applications acting under the seemingly reasonable guise of "import your bookmarks now".
So it seems to me that the best security is for the client not just to have the bookmark (which, while it is information, is not kept in anyone's head and so can be considered "something he has"), but also to have to present "something he knows". So encrypt with his password as well, and don't save the password. When he presents the URL, ask for the password, then decrypt with both (serially or in combination) and the data is secure.
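A sketch of the "in combination" option, assuming the libsodium functions bundled with PHP 7.2+: the actual encryption key is derived from both the URL secret and the password, so neither one alone can decrypt the data.

    <?php
    // Derive the real key from BOTH the URL secret ("something he has") and the
    // user's password ("something he knows"). Neither alone is enough to decrypt.
    $urlSecret = random_bytes(32);                              // goes into the URL, never stored
    $salt      = random_bytes(SODIUM_CRYPTO_PWHASH_SALTBYTES);  // stored alongside the record

    $key = sodium_crypto_pwhash(
        SODIUM_CRYPTO_SECRETBOX_KEYBYTES,
        $_POST['password'] . bin2hex($urlSecret),               // password and URL secret combined
        $salt,
        SODIUM_CRYPTO_PWHASH_OPSLIMIT_INTERACTIVE,
        SODIUM_CRYPTO_PWHASH_MEMLIMIT_INTERACTIVE
    );

    $nonce      = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $ciphertext = sodium_crypto_secretbox($_POST['message'], $nonce, $key);
    // Store $ciphertext, $nonce and $salt; hand $urlSecret back in the URL and
    // forget both it and the password.

Enciphering twice (serially) works too; deriving a single key from both inputs just keeps it to one decrypt step.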
Finally, I know that Google's two-factor authentication can be used by third parties (for example, I use it with Dropbox). This creates another "something you have" by requiring the person accessing the resource to have his cell phone; without it, he gets nothing. Yes, there is recourse if you lose your cell phone, but it usually involves another phone number, or a special Google-supplied one-time long password that has been printed out and stashed in one's wallet.
First and most importantly, you need a really good, watertight legal disclaimer.
Second, don’t store the user’s data at all.
Instead, when the user submits the data (over SSL), generate a hash of the session ID and your system's datetime. Store this hash in your table along with the datetime and get back the record ID. Encrypt the user's data with this hash, build a URL containing the record ID and the encrypted data, and send it back to the user (again over SSL). Security of this URL is now the user's problem, and you no longer have any record of what they sent (make sure it is not logged).
Routinely delete stale (4h? 24h?) records from the database.
When a retrieval request comes in (over SSL), look up the hash by the record ID; if it's not there, tell the user the URL is stale. If it is, decrypt the data they sent, send it back (over SSL) and delete the record from your database.
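Roughly, and only as a sketch (assuming PHP with OpenSSL and PDO, an active session, placeholder table and column names, and AES-256-CBC standing in for whatever cipher you choose), the two steps look like this:

    <?php
    // Submit (over SSL): nothing the user sent is stored; only a key and a timestamp are.
    $key = hash('sha256', session_id() . microtime(true));   // hash of session ID + datetime
    $db->prepare('INSERT INTO message_keys (key_hash, created_at) VALUES (?, NOW())')->execute([$key]);
    $recordId   = $db->lastInsertId();
    $iv         = random_bytes(16);
    $ciphertext = openssl_encrypt($_POST['data'], 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    $url = 'https://example.com/retrieve?id=' . $recordId . '&blob=' . bin2hex($iv . $ciphertext);
    // hand $url back over SSL and make sure it is not logged

    // Retrieve (over SSL): look up the key, decrypt what the user sent back, then delete.
    $stmt = $db->prepare('SELECT key_hash FROM message_keys WHERE id = ?');
    $stmt->execute([$_GET['id']]);
    $key = $stmt->fetchColumn();
    if ($key === false) {
        exit('This URL is stale.');
    }
    $blob      = hex2bin($_GET['blob']);
    $plaintext = openssl_decrypt(substr($blob, 16), 'aes-256-cbc', $key, OPENSSL_RAW_DATA, substr($blob, 0, 16));
    $db->prepare('DELETE FROM message_keys WHERE id = ?')->execute([$_GET['id']]);
    echo $plaintext;

    // Plus a routine clean-up job, e.g.:
    // DELETE FROM message_keys WHERE created_at < NOW() - INTERVAL 24 HOUR

The point of the design is that the key and the encrypted data never sit on the same machine: the key lives briefly in your database, the data lives only in the user's URL.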