How does WordPress replace tricky characters from a PDF, such as ligatures and quotes?

I want to work directly with some HTML containing text copied from a PDF, bypassing whatever processing WordPress applies to content in its back-end editor. This is for ease of development, as using the WP editor for a very long page is impractical.

When I paste the text into the WP editor, it displays perfectly. But when I output the same text as plain HTML (hard-coded in a template in place of the_content()), the oddly encoded characters from the PDF, such as the joined "fi" ligature and several others, display incorrectly as diamond shapes and question marks.

The encoding is UTF-8 on both my editor (NetBeans) and the WordPress page.
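A browser shows the Unicode replacement character (U+FFFD) as a diamond with a question mark, typically when the bytes it receives are not valid in the declared encoding. So one thing worth ruling out, as an assumption rather than a diagnosis, is that the template file's actual bytes are not UTF-8 despite the editor setting (for example, text that passed through Windows-1252 somewhere along the way). The Python sketch below reads a file as raw bytes and reports the first offset where UTF-8 decoding fails; `find_invalid_utf8` and `check_file` are hypothetical helper names. In PHP, `mb_check_encoding($data, 'UTF-8')` gives a similar yes/no answer.

```python
def find_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded
        return err.start


def check_file(path: str) -> str:
    """Read a file as raw bytes (no decoding) and describe whether it is valid UTF-8."""
    with open(path, "rb") as fh:
        raw = fh.read()
    offset = find_invalid_utf8(raw)
    if offset is None:
        return f"{path}: valid UTF-8"
    # Show a few bytes from the failure point to help locate it in the editor.
    return f"{path}: invalid UTF-8 at byte {offset}: {raw[offset:offset + 4]!r}"
```

For example, `print(check_file("page-long.php"))` (hypothetical filename) would flag a left curly quote saved as the single Windows-1252 byte 0x93, which is not a valid UTF-8 start byte.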

Could someone please explain how WordPress "knows" to replace these characters, and how I can do the same in my own source code? Using PHP is an option, I guess.
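If the goal is to replace the ligatures and curly quotes with plain characters rather than to fix the encoding, one option (not necessarily what WordPress itself does) is Unicode compatibility normalization (NFKC), which splits ligatures such as U+FB01 into separate letters, followed by an explicit mapping for typographic quotes, which NFKC leaves unchanged. This is a Python sketch for illustration; `clean_pdf_text` and `QUOTE_MAP` are hypothetical names. NFKC also rewrites other characters (for example superscript digits and non-breaking spaces), so the output is worth checking on the real text.

```python
import unicodedata

# Typographic quotes have no compatibility decomposition, so NFKC keeps them;
# map them to ASCII explicitly.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def clean_pdf_text(text: str) -> str:
    """Replace PDF ligatures and curly quotes with plain ASCII equivalents."""
    # NFKC turns compatibility ligatures such as U+FB01 into "f" + "i"
    normalized = unicodedata.normalize("NFKC", text)
    return normalized.translate(QUOTE_MAP)
```

A PHP equivalent would call `Normalizer::normalize($text, Normalizer::FORM_KC)` from the intl extension, then `strtr()` with an array of quote replacements.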