I want to work directly with some HTML containing text copied from a PDF, bypassing whatever processing WordPress applies to content in its back-end editor. This is for ease of development, as using the WP editor for a very long page is impractical.
When I use the WP editor, the text displays perfectly, but when I use pure HTML (via a template, replacing the_content()), special characters from the PDF, such as the joined "fi" ligature and several others, display incorrectly as diamond shapes and question marks.
The encoding is set to UTF-8 in both my editor (NetBeans) and the WordPress page.
Could someone please explain how WordPress "knows" how to replace these characters, and how I can do the same in my source code? Using PHP is an option, I guess.
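To make the PHP option concrete, here is a minimal sketch of the kind of helper I have in mind. It rests on assumptions I have not confirmed: that the diamonds appear because the template file's bytes are not actually valid UTF-8, and that the real encoding is Windows-1252. The function name `render_pdf_text` and its `$fallback` parameter are hypothetical and not part of WordPress; only the `mb_*` calls are standard PHP (mbstring extension).

```php
<?php
// Hypothetical helper, not a WordPress API: normalises pasted PDF text
// and emits non-ASCII characters as numeric HTML entities.
function render_pdf_text(string $raw, string $fallback = 'Windows-1252'): string
{
    // Assumption: if the bytes are not valid UTF-8, the text was saved
    // in $fallback, so convert it to UTF-8 first.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        $raw = mb_convert_encoding($raw, 'UTF-8', $fallback);
    }

    // Encode every code point from U+0080 upwards as a decimal entity,
    // so the output is plain ASCII and no longer depends on the page charset.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($raw, $map, 'UTF-8');
}
```

In a template this could be called as `echo render_pdf_text(file_get_contents($path));`, where `$path` points at the HTML fragment. If the output still shows diamonds, the fallback encoding guess is probably wrong.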