I am developing a small library automation system and I need to determine whether a word is English or Turkish. An example scenario is like this:
A friend of mine suggested that I "connect to Google Translate and use it", which seems reasonable, but an algorithm that does not depend on an external service or database would suit me better. (I also check for language-specific characters such as ç, ş, İ for Turkish and w, x for English to decide.) So I am looking for an algorithm to do this job, perhaps based on letter frequencies or something similar. Is anything available in the literature? Thanks in advance. (I use PHP and MySQL, if that matters.)
As per comment.
Please check: Detect language from string in PHP
or:
http://wiki.apache.org/solr/LanguageDetection
Solr can report the detected language together with a probability (for example, that a sentence is 90% likely to be English and 10% likely to be Turkish).
If the sample you're testing is that small (a single word or phrase), then simple heuristics like letter frequency aren't going to be very useful: the English phrase "Jazz Quizzes", for instance, would probably fit the letter-frequency profile of many other languages more readily than that of English.
You might be able to use the frequency of bigraphs and trigraphs (2- and 3-letter combinations, also called bigrams and trigrams), as English and Turkish are sufficiently unrelated that some combinations occur in only one of them.
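As a rough illustration of the bigram/trigram idea, here is a minimal sketch in Python (the same logic should port to PHP). The sample texts, the `SAMPLES` and `PROFILES` names, and the `guess_language` function are all assumptions made up for this example. A usable profile would have to be built from a much larger body of text for each language, and simply counting matching n-grams is only one of several possible scoring schemes.

```python
from collections import Counter

# Tiny illustrative samples (assumption): real profiles need far more text.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog while "
               "the library keeps books and shelves",
    "turkish": "kütüphane kitapları raflarda durur ve öğrenciler "
               "her gün burada çalışır güzel bir yer",
}

def ngrams(text, n):
    """Return character n-grams of each word, padded with spaces as word boundaries."""
    grams = []
    for word in text.lower().split():
        padded = f" {word} "
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

def build_profile(text):
    """Count every bigram and trigram seen in the sample text."""
    return Counter(ngrams(text, 2) + ngrams(text, 3))

PROFILES = {lang: build_profile(text) for lang, text in SAMPLES.items()}

def guess_language(word):
    """Return the language whose profile contains the most n-grams of `word`,
    or None when there is no match at all or the languages tie."""
    grams = ngrams(word, 2) + ngrams(word, 3)
    scores = {
        lang: sum(1 for g in grams if g in profile)
        for lang, profile in PROFILES.items()
    }
    best = max(scores.values())
    winners = [lang for lang, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None

if __name__ == "__main__":
    for w in ["brother", "kitaplar", "shelves"]:
        print(w, "->", guess_language(w))
```

With samples this small, the result for any given word depends heavily on which words happen to be in the training text, which is exactly the weakness of statistical approaches on single-word input.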
More likely, however, you are going to have to use a database of actual words from the two languages. In that case, you are probably best off using a third-party API or database, rather than going to all the effort of building your own corpora, implementing the statistical algorithms, etc.
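For what it's worth, the lookup itself is simple once word lists exist. The sketch below (Python, easily ported to PHP with a MySQL table in place of the sets) checks a word against two word lists and falls back to the character check mentioned in the question. The word lists here are tiny and hypothetical, and `classify_word` is a name invented for this example. The fallback relies on q, w and x not being part of the Turkish alphabet, and on ç, ğ, ı, ö, ş, ü being rare in English, so it is a heuristic and not a guarantee.

```python
# Hypothetical, tiny word lists (assumption): in practice these would be
# loaded from full dictionaries or a database table.
ENGLISH_WORDS = {"book", "library", "author", "history", "science"}
TURKISH_WORDS = {"kitap", "kütüphane", "yazar", "tarih", "bilim"}

# Letters that strongly suggest one language over the other (heuristic).
TURKISH_ONLY_CHARS = set("çğıöşüÇĞİÖŞÜ")
ENGLISH_ONLY_CHARS = set("qwxQWX")

def classify_word(word):
    """Return 'english', 'turkish', 'both', or 'unknown' for a single word."""
    # Note: str.lower() does not apply Turkish casing rules (I -> ı, İ -> i).
    key = word.strip().lower()
    in_english = key in ENGLISH_WORDS
    in_turkish = key in TURKISH_WORDS
    if in_english and in_turkish:
        return "both"
    if in_english:
        return "english"
    if in_turkish:
        return "turkish"

    # Fallback: look for letters that occur in only one of the two alphabets.
    chars = set(word)
    has_turkish = bool(chars & TURKISH_ONLY_CHARS)
    has_english = bool(chars & ENGLISH_ONLY_CHARS)
    if has_turkish and not has_english:
        return "turkish"
    if has_english and not has_turkish:
        return "english"
    return "unknown"

if __name__ == "__main__":
    for w in ["Library", "kitap", "çiçek", "wax", "radyo"]:
        print(w, "->", classify_word(w))
```

Words that appear in neither list and contain no distinctive letters come back as "unknown", which is where a larger dictionary or an external service would have to take over.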