Is it possible to determine the doctype automatically?

I have an app that receives some HTML, and sometimes it doesn't contain a doctype; it just starts with the <html> tag.

I could return an error to the user and ask them to define a doctype, but I would rather go the extra mile (if possible) and determine an appropriate doctype by looking at the HTML code.

Is this possible? With JS? PHP?

The simplest option would probably be to validate the document multiple times, prepending a different Doctype each time.

You could then assume that whichever Doctype resulted in the fewest errors is the one to use.

The W3C Markup Validation Service has an API, and you can download and install a copy locally for better performance (and to avoid hammering a free service provided by a third party).
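
A minimal sketch of that selection loop, written in Python for brevity (the same idea ports directly to JS or PHP). The names `pick_doctype`, `CANDIDATE_DOCTYPES` and `count_errors` are made up for illustration. `count_errors` stands in for whatever call you make to your validator and is assumed to return the number of errors for a complete document; the `fake_count_errors` stub only shows the calling convention and is not a real validator.

```python
from typing import Callable, Sequence

# Candidate doctypes to try, ordered by preference (ties go to the earliest entry).
CANDIDATE_DOCTYPES = [
    "<!DOCTYPE html>",
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">',
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
]


def pick_doctype(
    html: str,
    count_errors: Callable[[str], int],
    candidates: Sequence[str] = CANDIDATE_DOCTYPES,
) -> str:
    """Return the candidate whose prepended document produces the fewest errors."""
    # min() returns the first minimal item, so list order breaks ties.
    return min(candidates, key=lambda doctype: count_errors(doctype + "\n" + html))


def fake_count_errors(document: str) -> int:
    # Hypothetical stand-in for a validator call: pretends XHTML 1.0 Strict fits best.
    return 0 if "XHTML 1.0 Strict" in document else 3


best = pick_doctype("<html><body><p>Hello</p></body></html>", fake_count_errors)
print(best)
```

Passing the validator in as a function keeps the loop independent of how validation is done, so a locally installed validator can be swapped in without touching the selection logic.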

You could check some basics of the code.

  1. Do empty tags end with /> or > (img, for example)?
  2. Are attributes used that are available only in Transitional doctypes (target, for example)?
  3. Does it contain tags that are defined only in HTML5?

With these three answers you can distinguish between:

  1. HTML and XHTML
  2. Transitional and Strict
  3. (X-)HTML4 or HTML5

I guess this is easier than using the API.
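
The three checks can be sketched roughly as follows, in Python for brevity (the regexes carry over to JS or PHP). This is a heuristic under stated assumptions, not a definitive detector: `guess_doctype` and the pattern names are made up for illustration, the tag and attribute lists are deliberately partial, and regex scanning can be fooled by markup inside comments or scripts. HTML5 also permits `/>` on void elements and the `target` attribute, so the HTML5 check runs first and the other two are only consulted when no HTML5-only tag is found.

```python
import re

# Check 1: void elements written with a trailing slash, e.g. <img ... /> or <br/>.
SELF_CLOSED_VOID = re.compile(r"<(?:img|br|hr|input|meta|link)\b[^>]*/>", re.IGNORECASE)

# Check 2: markup that only the Transitional DTDs allow (partial list).
TRANSITIONAL_ONLY = re.compile(
    r"<(?:font|center)\b|<a\b[^>]*\btarget\s*=|<body\b[^>]*\bbgcolor\s*=",
    re.IGNORECASE,
)

# Check 3: elements introduced in HTML5 (partial list).
HTML5_ONLY = re.compile(
    r"<(?:section|article|nav|header|footer|aside|main|video|audio|canvas|figure)\b",
    re.IGNORECASE,
)

# Keyed by (is_xhtml, is_transitional).
DOCTYPES = {
    (False, False): '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
    (False, True): '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">',
    (True, False): '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    (True, True): '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
}


def guess_doctype(html: str) -> str:
    """Guess a doctype from surface features of the markup."""
    if HTML5_ONLY.search(html):
        return "<!DOCTYPE html>"
    is_xhtml = bool(SELF_CLOSED_VOID.search(html))
    is_transitional = bool(TRANSITIONAL_ONLY.search(html))
    return DOCTYPES[(is_xhtml, is_transitional)]


print(guess_doctype('<html><body><img src="a.png" /></body></html>'))
```

A document that passes none of the checks falls through to HTML 4.01 Strict here; that default is an arbitrary choice for the sketch, and falling back to `<!DOCTYPE html>` would be just as defensible.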