
I need to save a plain-text version of the HTML content that comes from a textarea with a WYSIWYG editor. Right now I run the following function just before the entity is saved to the database:

public function preUpdate(PreUpdateEventArgs $event)
{
    if (($resource = $event->getEntity()) instanceof Resource) {
        $resource->setPlainContent($this->computePlainContent($resource));
    }
}

protected function computePlainContent(Resource $resource)
{
    // Strip the tags, decode HTML entities, then collapse each run of whitespace into one space.
    return preg_replace(
        '/\s+/',
        ' ',
        html_entity_decode(
            strip_tags($resource->getContent()),
            ENT_QUOTES | ENT_HTML401
        )
    );
}

The plain text will be used for searching among pages.
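To make the behaviour easier to try outside the entity listener, here is a minimal standalone sketch of the same pipeline. The function name `toPlainText` and the sample input are made up for illustration and are not part of my actual code; the calls and flags are the same as in `computePlainContent` above.

```php
<?php
// Hypothetical standalone equivalent of computePlainContent(), for experimenting.
function toPlainText(string $html): string
{
    // 1. Remove all tags. 2. Decode entities such as &amp; and &quot;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML401);

    // 3. Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    //    preg_replace() returns null on error, so fall back to the uncollapsed text.
    return preg_replace('/\s+/', ' ', $text) ?? $text;
}

// Made-up sample: two paragraphs separated by a newline, with an entity and a double space.
echo toPlainText("<p>Fish &amp; chips</p>\n<p>Tea,  please</p>"), PHP_EOL;
// prints "Fish & chips Tea, please"
```

The punctuation (the comma here) is left in place, which is what the second question below is about.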

Questions:

  • Is this good/safe**, assuming the editor will always produce valid HTML?
  • Would you remove punctuation marks, and if so, how?
  • Should I use ENT_HTML401 or ENT_XHTML with CKEditor (default configuration; I don't know the quality of its output)?

** By safe I mean safe to produce good output. Users of this system are trusted.