pdfmark: some accented characters in generated PDF bookmark titles are not displayed correctly

I'm inserting bookmarks into an existing PDF and have a problem with the accented "c". Here is an example (the charset used in the example is UTF-8):

$name = "Ruční nářadí";

$name = chr(254).chr(255).iconv('UTF-8', 'UTF-16BE', str_replace(array('(',')','/'),array('\\(','\\)','\\/'),$name));

$fh = fopen('pdfmark.txt', 'w');
fputs($fh, "[/Title ({$name}) /Page 1 /OUT pdfmark\n");
fclose($fh);

$command = "gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf final.pdf pdfmark.txt; mv out.pdf final.pdf";
exec($command);

The problem is that the accented č appears in the bookmark of the final PDF as Ċ (an uppercase letter with a different accent). I tried the other accented characters used in my language (Czech), and apart from this one everything is OK.
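A plausible explanation, consistent with the symptom but not confirmed in this thread: č is U+010D, so its UTF-16BE bytes are 01 0D, and 0x0D is a carriage return. Inside a (...) literal string, PostScript (like PDF) reads an unescaped end-of-line marker (CR, LF or CR LF) as a single newline byte, 0x0A, so 01 0D becomes 01 0A, which is U+010A, Ċ. No other letter of the Czech alphabet has 0x0D in its UTF-16BE form, which would explain why only č is affected. The PHP sketch below only simulates that scanner rule; it does not call Ghostscript.

```php
<?php
// Simulates how a PostScript scanner reads the raw UTF-16BE bytes of "č"
// when they are placed inside a (...) literal string. Ghostscript is not called.

$utf16 = iconv('UTF-8', 'UTF-16BE', 'č');
echo bin2hex($utf16), "\n";               // 010d: the low byte is a carriage return

// Literal-string rule: an unescaped CR, LF or CR LF is read as a single LF (0x0A).
$scanned = str_replace(["\r\n", "\r"], "\n", $utf16);
echo bin2hex($scanned), "\n";             // 010a

// Decoding the altered bytes gives U+010A, the "Ċ" seen in the bookmark.
echo iconv('UTF-16BE', 'UTF-8', $scanned), "\n";
```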

Thanks for any clues to solve this issue.

EDIT (2013-02-01):

The Ghostscript version used is 9.06 (2012-08-08). I'm using Adobe Reader 11.0.1 to view the resulting PDF file.

I'm still thinking about it... Does the encoding have to be specified in the PDF in some way? The source PDF is out of my control and I know almost nothing about it. If so, is there any way to do that with GS or pdfmark? I thought that if the bookmarks are encoded as Unicode it really doesn't matter, but maybe I'm wrong.

EDIT (2013-02-05):

There seems to be a bug in Ghostscript's pdfwrite device or in Acrobat; there is more information in the Ghostscript bug tracker. I will post the solution here once it is resolved.

According to the post in the bug tracker, encoding the string differently works for me (downloading the newer 9.08 prerelease might also help):

$name = "Ruční nářadí";

// Inside a <...> hex string, ( ) / \ need no escaping; escaping them would add literal backslashes to the title
$name = 'FEFF'.strtoupper(bin2hex(iconv('UTF-8', 'UCS-2BE', $name)));

$fh = fopen('pdfmark.txt', 'w');
fputs($fh, "[/Title <{$name}> /Page 1 /OUT pdfmark\n");
fclose($fh);

$command = "gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf final.pdf pdfmark.txt; mv out.pdf final.pdf";
exec($command);

Note the hex encoding, and the different delimiters around the title: <...> (a hex string) instead of (...) (a literal string).
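The same hex-string approach can be wrapped in a small helper; pdfmark_title_hex() is a hypothetical name, not part of any library. Because a hex string carries only hex digits, parentheses, backslashes and control bytes need no escaping there, and escaping them first would add literal backslashes to the bookmark title. This sketch uses UTF-16BE rather than UCS-2BE: both give identical bytes for Czech text, and UTF-16BE also covers characters outside the Basic Multilingual Plane. The leading FE FF is the byte-order mark that identifies a PDF text string as UTF-16BE.

```php
<?php
// Hypothetical helper: build a pdfmark /Title value as a PostScript hex string.
function pdfmark_title_hex(string $utf8): string
{
    $utf16 = iconv('UTF-8', 'UTF-16BE', $utf8);
    if ($utf16 === false) {
        throw new InvalidArgumentException('Title is not valid UTF-8');
    }
    // FE FF marks the string as UTF-16BE; hex digits need no escaping.
    return '<FEFF' . strtoupper(bin2hex($utf16)) . '>';
}

$title = pdfmark_title_hex('Ruční nářadí');
echo "[/Title {$title} /Page 1 /OUT pdfmark\n";
// [/Title <FEFF00520075010D006E00ED0020006E00E101590061006400ED> /Page 1 /OUT pdfmark
```

The printed line can be written to pdfmark.txt and passed to gs exactly as in the question.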

The following code snippet illustrates what you need to do.

In PostScript, any character in a string can be written as \ddd, where ddd is its character code in octal (up to three digits). For example, \350 is octal 350, which is decimal 232 and hex E8.
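Such escapes can also be generated. This PHP sketch converts UTF-8 text to a single-byte encoding and writes every byte outside printable ASCII as an octal escape; ps_octal_literal() is a hypothetical name. It assumes the font's encoding vector places each character at its ISO-8859-2 code, which holds for č (232) and í (237) in the CEEncoding table of this answer.

```php
<?php
// Hypothetical helper: write a UTF-8 string as a PostScript literal string,
// using \ddd octal escapes for bytes outside printable ASCII.
// Assumption: the font's Encoding puts each character at its ISO-8859-2 code.
function ps_octal_literal(string $utf8): string
{
    $bytes = iconv('UTF-8', 'ISO-8859-2', $utf8);
    if ($bytes === false) {
        throw new InvalidArgumentException('Text cannot be represented in ISO-8859-2');
    }
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = $bytes[$i];
        $code = ord($b);
        if ($b === '(' || $b === ')' || $b === '\\') {
            $out .= '\\' . $b;                 // escape string delimiters and backslash
        } elseif ($code < 32 || $code > 126) {
            $out .= sprintf('\\%03o', $code);  // e.g. 232 becomes \350
        } else {
            $out .= $b;
        }
    }
    return '(' . $out . ')';
}

echo sprintf('\\%03o', 232), "\n";      // \350
echo ps_octal_literal('Ruční'), "\n";   // (Ru\350n\355)
```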

The characters you're looking for are Ccaron and ccaron. To access them, you need to define them in a font's encoding vector. The CEEncoding table stands for Adobe's Central European character set; from code 169 upward this version largely follows ISO 8859-2. PostScript implementations may already define a CEEncoding somewhere, but this example defines its own, and in the same way you can define any encoding you like. The PostScript Language Reference Manual, which is available on the web, gives details about the available characters.

This example prints "testing 1234" in the standard /Helvetica font, then defines a new font, /Helvetica-CE, which is /Helvetica re-encoded with CEEncoding. (\310\350) show prints Č and č, and the line that prints "Ruční" uses code \350, which CEEncoding maps to ccaron. Just for fun, I also mapped \001 to Ccaron, \002 to the Euro symbol and \003 to the trademark symbol, to show that any code can be assigned to any glyph, and printed them with (testing 4567\001\002\003) show. Not every font contains every glyph; a code whose glyph is missing from the font is drawn with the font's .notdef glyph, which is usually blank.

And it's just that easy ;)

/Helvetica findfont 46 scalefont setfont
100 75 moveto
(testing 1234) show
/CEEncoding [
/.notdef /Ccaron /Euro /trademark /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /space /exclam /quotedbl
/numbersign /dollar /percent /ampersand /quoteright
/parenleft /parenright /asterisk /plus /comma
/minus /period /slash /zero /one
/two /three /four /five /six
/seven /eight /nine /colon /semicolon
/less /equal /greater /question /at
/A /B /C /D /E
/F /G /H /I /J
/K /L /M /N /O
/P /Q /R /S /T
/U /V /W /X /Y
/Z /bracketleft /backslash /bracketright /asciicircum
/underscore /quoteleft /a /b /c
/d /e /f /g /h
/i /j /k /l /m
/n /o /p /q /r
/s /t /u /v /w
/x /y /z /braceleft /bar
/braceright /tilde /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/Sacute /.notdef /.notdef /Zacute /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /sacute /.notdef /.notdef /zacute
/space /.notdef /breve /Lslash /currency
/Aogonek /.notdef /dieresis /.notdef /Scaron
/Scedilla /Tcaron /Zacute /hyphen /Zcaron
/Zdotaccent /degree /aogonek /ogonek /lslash
/acute /lcaron /.notdef /caron /cedilla
/scaron /scedilla /tcaron /zacute /hungarumlaut  % code 185 (\271) is scaron in ISO 8859-2
/zcaron /zdotaccent /Racute /Aacute /Acircumflex
/Abreve /Adieresis /Lacute /Cacute /Ccedilla
/Ccaron /Eacute /Eogonek /Edieresis /Ecaron
/Iacute /Icircumflex /Dcaron /Eth /Nacute
/Ncaron /Oacute /Ocircumflex /Ohungarumlaut /Odieresis
/multiply /Rcaron /Uring /Uacute /Uhungarumlaut
/Udieresis /Yacute /Tcedilla /germandbls /racute
/aacute /acircumflex /abreve /adieresis /lacute
/cacute /ccedilla /ccaron /eacute /eogonek
/edieresis /ecaron /iacute /icircumflex /dcaron
/eth /nacute /ncaron /oacute /ocircumflex
/ohungarumlaut /odieresis /divide /rcaron /uring
/uacute /uhungarumlaut /udieresis /yacute /tcedilla
/dotaccent
] def

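% Copy every entry of Helvetica's font dictionary except FID,
% then replace the copy's Encoding with CEEncoding
/Helvetica findfont
dup length dict begin
{ 1 index /FID ne
{def}
{pop pop}
ifelse
} forall
/Encoding CEEncoding def
currentdict
end
% Register the copy as a new font named /Helvetica-CE
/Helvetica-CE exch definefont pop
/Helvetica-CE findfont 36 scalefont setfont
100 100 moveto
(\310\350) show                  % \310 = 200 = Ccaron, \350 = 232 = ccaron
100 150 moveto
(Ru\350n\355) show               % \355 = 237 = iacute; escaping it avoids depending on the file's encoding
100 200 moveto
(testing 4567\001\002\003) show  % codes 1-3 are mapped to Ccaron, Euro and trademark
showpage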

I would start by simplifying the string to a single offending character. Then look at the string in pdfmark.txt and see if it is correctly UTF-16BE encoded.
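For that check, a small PHP dump of each character's UTF-16BE bytes can make an offending character stand out. The helper dump_utf16be() is hypothetical; it marks bytes that have a special meaning inside a PostScript (...) literal string: LF (0x0A), CR (0x0D), the parentheses (0x28, 0x29) and the backslash (0x5C).

```php
<?php
// Hypothetical debugging helper: list each character of a UTF-8 string with its
// UTF-16BE bytes, marking bytes that are special inside a (...) literal string.
function dump_utf16be(string $utf8): array
{
    $special = [0x0A, 0x0D, 0x28, 0x29, 0x5C];
    $lines = [];
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $bytes = iconv('UTF-8', 'UTF-16BE', $char);
        $hex = strtoupper(implode(' ', str_split(bin2hex($bytes), 2)));
        $risky = array_intersect(array_map('ord', str_split($bytes)), $special);
        $lines[] = sprintf('%s  %s%s', $char, $hex, $risky ? '  <-- special in (...)' : '');
    }
    return $lines;
}

// Prints:
// R  00 52
// u  00 75
// č  01 0D  <-- special in (...)
// n  00 6E
// í  00 ED
echo implode("\n", dump_utf16be('Ruční')), "\n";
```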

Assuming it is, try running Ghostscript from the command line and see whether that works. If it doesn't, you'll be in a position to open a bug report at http://bugs.ghostscript.com; if you do, please supply the source file(s) and the command line.

You don't say which version of Ghostscript you are using, or what you are using to view the resulting PDF file. Both would be useful.
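When the same step is driven from PHP, as in the question, it helps to keep Ghostscript's exit status and messages: the original command chains mv with ;, so mv runs even when gs fails. A minimal sketch, assuming a Unix-like shell; build_gs_command() and add_bookmarks() are hypothetical names, and the gs options are exactly the ones from the question.

```php
<?php
// Hypothetical helpers; the Ghostscript options are the ones used in the question.
function build_gs_command(string $input, string $marks, string $output): string
{
    // escapeshellarg() quotes each file name for the shell; 2>&1 captures gs errors too.
    return sprintf(
        'gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=%s %s %s 2>&1',
        escapeshellarg($output),
        escapeshellarg($input),
        escapeshellarg($marks)
    );
}

// Runs gs and replaces $pdf only if gs exits with status 0.
function add_bookmarks(string $pdf, string $marks): bool
{
    $tmp = $pdf . '.tmp.pdf';
    exec(build_gs_command($pdf, $marks, $tmp), $output, $status);
    if ($status !== 0) {
        error_log("gs exited with status $status:\n" . implode("\n", $output));
        return false;
    }
    return rename($tmp, $pdf);
}
```

A call such as add_bookmarks('final.pdf', 'pdfmark.txt') would take the place of the exec() call in the question, and echo build_gs_command('final.pdf', 'pdfmark.txt', 'out.pdf') prints the exact command line to include in a bug report.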