确定现有字符串的所有子字符串的最快方法

Let's say I have the string "Hey". I would like to determine all combinations of characters that exist in this string as fast as possible. The resulting algorithm should generate this:

H, e, y, He, ey, Hey

The algorithm should not produce the string "Hy" because it does not exist in the string as a substring.

There are O(n^2) of those substrings, of lengths [1,n], so any algorithm to generate all of them will be O(n^2) * O(n) = O(n^3):

(*) See Edit2 at the end - depending on the implementation of the string - the complexity can vary from O(n^2) to O(n^3)

Pseudo code:

result <- {} #result is a set if dupes should be terminated, otherwise - it is a multiset.
for i from 0 to s.length:
   for j from i+1 to s.length:
      result.add(s.substring(i,j))
return result

Note however, that you can do some "cheating", by creating an iterator and generate the substrings on the fly, it should look something like this [pseudo code]:

class MyIterator:
  String s
  int i,j
  MyIterator(String s):
     this.s = s
     i = 0
     j = 0
  next():
     j = j + 1
     if (j >= s.length):
     i = i + 1
     j = i + 1
     if (i >= s.length): 
         throw exception
     return s.substring(i,j)

Note that creating the iterator is O(1), and each iteration is O(n) - but to actually produce all the elements, you need O(n^2) steps, so complexity remains O(n^3) overall, but you decrease the latency of your application.

EDIT:
I editted complexity, claiming it is O(n^2) is wrong, the complexity is O(n^3) since you need to generate strings of variable lengths, some of them are long. At least half of the generated substrings will be of length n/2 - thus the total complexity is Theta(n^3)

EDIT2:
In some cases it can actually be O(n^2) - depending on the string implementation. In java for example - it uses a single char[], and only "plays" with the offset and length - so in java - creation is actually O(n^2), since creating a substring is O(1)
In C however - it is O(n^3), since every substring needs to be copied to a different char[].

Check for the implementation of n-grams in php.

In your example string: Hey

H, E, Y are unigrams

HE, EY are bigrams

HEY is a trigram