Get the longest common substring based on string similarity

I have a table with a column that includes names like:

  1. Home Improvement Guide
  2. Home Improvement Advice
  3. Home Improvement Costs
  4. Home Gardening Tips

I would like the result to be:

  1. Home Improvement
  2. Home Gardening Tips

Based on a search for the word 'Home'.

This can be accomplished in MySQL or PHP or a combination of the two. I have been pulling my hair out trying to figure this out, so any help in the right direction would be greatly appreciated. Thanks.

Edit / Problem kinda solved:

I think this problem can be solved much more easily by changing the logic a little. For anyone else with this problem, here is my solution.

  1. Get the sql results
  2. Find the first occurrence of the searched word, one string at a time, and get the next word in the string to the right of it.
  3. The results would include the searched word concatenated with the distinct adjoining word.
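A minimal sketch of those three steps, written in Python for illustration only (the project itself uses MySQL and PHP). The function name `group_by_next_word` is hypothetical, and the list of titles stands in for the SQL results. Matching is assumed to be case-insensitive and on whole words.

```python
def group_by_next_word(titles, search):
    """Return the search word joined with each distinct word that follows it."""
    results = []
    seen = set()
    needle = search.lower()
    for title in titles:
        words = title.split()
        lowered = [w.lower() for w in words]
        if needle not in lowered:
            continue  # this row does not contain the search word
        i = lowered.index(needle)          # first occurrence of the search word
        phrase = " ".join(words[i:i + 2])  # search word plus the next word, if any
        key = phrase.lower()
        if key not in seen:                # keep only distinct combinations
            seen.add(key)
            results.append(phrase)
    return results


# Stand-in for the rows returned by the SQL query
titles = [
    "Home Improvement Guide",
    "Home Improvement Advice",
    "Home Improvement Costs",
    "Home Gardening Tips",
]
print(group_by_next_word(titles, "Home"))
# prints ['Home Improvement', 'Home Gardening']
```

Note that this returns "Home Gardening" rather than "Home Gardening Tips", since only one adjoining word is kept.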

Not as good of a solution, but it works for my project. Thanks for the help everyone.

This is too long for a comment.

I don't think that Levenshtein distance does what you want. Consider:

Home Improvement
Home Improvement Advice on Kitchen Remodeling
Home Gardening

The first and third are closer by the Levenshtein measure than the first and second. And yet, I'm guessing that you want the first and second to be paired.
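The distances for the three example strings can be checked with a plain dynamic-programming edit distance. This is a sketch in Python for illustration; the helper name `levenshtein` is my own, not a library call.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


first = "Home Improvement"
second = "Home Improvement Advice on Kitchen Remodeling"
third = "Home Gardening"

d_first_second = levenshtein(first, second)  # 29: the whole suffix must be inserted
d_first_third = levenshtein(first, third)    # cannot exceed 16, the longer length
print(d_first_second, d_first_third)
```

The second string is the first plus a 29-character suffix, so that distance is exactly 29, while the first and third can never be further apart than the length of the longer one.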

I have an idea of the algorithm you want. Something like this:

  • Compare every returned string to every other string
  • Measure the length of the initial overlap
  • Find the maximum overlap over all the strings and pair those two
  • Repeat the process with the second largest overlap and so on
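Outside SQL, the steps above can be sketched fairly compactly. The following is a rough Python illustration under a few assumptions of mine: the overlap is measured in whole words rather than characters, an overlap must be at least `min_words` words long to count (so sharing only the search word does not group anything), and the names `common_word_prefix` and `group_by_initial_overlap` are hypothetical.

```python
from itertools import combinations


def common_word_prefix(a, b):
    """Return the list of leading words that a and b share."""
    prefix = []
    for wa, wb in zip(a.split(), b.split()):
        if wa != wb:
            break
        prefix.append(wa)
    return prefix


def group_by_initial_overlap(titles, min_words=2):
    # Compare every string to every other string and measure the initial overlap
    pairs = []
    for i, j in combinations(range(len(titles)), 2):
        prefix = common_word_prefix(titles[i], titles[j])
        if len(prefix) >= min_words:
            pairs.append((len(prefix), " ".join(prefix), i, j))
    pairs.sort(key=lambda p: -p[0])  # largest overlap first (stable sort)

    # Pair the strings greedily, longest overlap first
    label = {}
    for _, prefix, i, j in pairs:
        if i not in label and j not in label:
            label[i] = label[j] = prefix
        elif i in label and j not in label and label[i] == prefix:
            label[j] = prefix
        elif j in label and i not in label and label[j] == prefix:
            label[i] = prefix

    # Unpaired strings are returned unchanged
    results = []
    for idx, title in enumerate(titles):
        name = label.get(idx, title)
        if name not in results:
            results.append(name)
    return results


titles = [
    "Home Improvement Guide",
    "Home Improvement Advice",
    "Home Improvement Costs",
    "Home Gardening Tips",
]
print(group_by_initial_overlap(titles))
# prints ['Home Improvement', 'Home Gardening Tips']
```

This compares every pair, so the work grows quadratically with the number of rows returned. That is fine for a short result set but is part of why doing the same thing in SQL gets painful.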

Painful, but not impossible to implement in SQL. Maybe very painful.

What this suggests to me is that you are looking for a hierarchy among the products. My suggestion is to just include a category column and return the category. You may need to manually insert the categories into your data.