Apostrophes和Colons in Pretty Links

I am looking to solve a problem dealing with "pretty links" or "permalinks" that list the title of books, such as:

http://www.example.com/title/The-Catcher-in-the-Rye/

When I deal with regular book titles, such that have simple words or spaces, there is no problem, since I can simply replace the space with a dash - and deal with finding the book title in the database by doing a reverse str_replace.

However, the problem arises when I have titles of books that have either apostrophes ' or colons : in them, or both as in this example:

Why Can't I Be You: A Novel

In my sql database, all single quotes are escaped, so that the entry in the database is as follows:

+-----+-------------------------------+
| BID | book_title                    |
+-----+-------------------------------+
|   1 | Why Can\'t I Be You: A Novel  |
+-----+-------------------------------+

When I list all book titles, I again unescape the string, so it lists simply as: Why Can't I Be You: A Novel

My <a> links show the unescaped title, with the pretty links created by replacing spaces with dashes and omitting the apostrophes and colons as follows:

<a href="http://www.example.com/title/why-cant-i-be-you-a-novel" title="Why Can't I Be You: A Novel">Why Can't I Be You: A Novel</a>

So, getting to my problem. I want to be able to list all the titles of the books formatted (unescaped) and have the "permalinks" / "pretty links" with hyphens work and return the proper title to the GET method.

In my .htaccess entry, I have the following RewriteRule:

RewriteRule ^title/(.*[^/])/?$ viewbook.php?booktitle=$1 [NC,L]

What this does is take the "pretty" link portion that follows title/ and sends it via GET to viewbook.php. So for example for the book The Catcher in the Rye, the following is sent via GET: The-Catcher-in-the-Rye

No problem there, since it is simple in php to resolve this issue:

$booktitle = $_GET['booktitle'];
$goodBookTitle = str_replace('-', ' ', $booktitle);

// or we can do it all at once

$booktitle = str_replace('-', ' ', $_GET['booktitle']);

// Send $booktitle to SQL query and find the book

This works fine when no apostrophes are found, however, this method does not help if the title has either apostrophes or colons since it will not be found in the database. I also do not want to use the WHERE book_title LIKE '%$booktitle%' since viewbook.php has to be the exact match.

I am looking for an elegant or simple solution which will enable me to solve this via a RewriteRule and not have to add extra tables to the database for say slug or permalink, and I don't want to have apostrophes in the url such as %27 for single quote. This is a large database on which data entry is done in a spreadsheet, exported to CSV and uploaded into the SQL database. There is no front end for individual entries to allow for such things as slug or equivalent.

I hope my explanation is clear.

First of all, the idea of storing escaped strings in database looks strange. MySQL is able to store strings of arbitrary characters and even can safely store binary sequences.

Now about mapping from real titles to pretty URLs and back. The idea to convert title to URL-friendly string and then back is not the common way to solve your problem, because it is very hard to make such conversion reversible. Usual way to solve this problem is to have separate column in the database that contains book title modified to be URL-friendly. Also values in this columns should be made unique. The table could look like this:

+-----+-----------------------------+----------------------------+
| BID | book_title                  | book_title_url             |
+-----+-----------------------------+----------------------------+
|   1 | Why Can't I Be You: A Novel | why-can-t-i-be-you-a-novel |
+-----+-----------------------------+----------------------------+

You should index your table by this columns and use it instead of book_title in SQL query inside your viewbook.php script like this:

SELECT * FROM books WHERE book_title_url='$booktitle'

Where $booktitle contains book title received via $_GET['booktitle'] and properly escaped to prevent SQL injections.

So your pretty URLs will look like http://www.example.com/title/why-can-t-i-be-you-a-novel and they will be rewritten by Apache to something like http://www.example.com/viewbook.php?booktitle=why-can-t-i-be-you-a-novel.

Again, this is common way how pretty URLs are usually implemented. hope it will work for you too.

For existing records you can populate book_title_url column by something like this:

UPDATE books SET book_title_url=REPLACE(REPLACE(REPLACE(book_title, " ", "-"), ":", "-"), "'", "-");

Don't forget having index by url or else it will work slow and please do escape variables coming from request unless you want SQL injection :)

All those problems could be overcome if you have an option to just embed integer id in the pretty urls, like that : http://www.whaaa.at/title/1/whatever-fancy-%34name%34-you-like and then lookup by that id

Anyway, book title is no primary key, `cause there can be several books with same title.

According to STD 66, both colons and apostrophes are valid in path segments:

segment    = *pchar

pchar      = unreserved / pct-encoded / sub-delims / ":" / "@"

sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
           / "*" / "+" / "," / ";" / "="

Therefore, in this case, you can just use them in your URIs unencoded:

<a href="http://www.example.com/title/why-can't-i-be-you:-a-novel"
   title="Why Can't I Be You: A Novel">Why Can't I Be You: A Novel</a>

This is how Wikipedia does it: e.g. http://en.wikipedia.org/wiki/Breakin'_2:_Electric_Boogaloo (alas StackOverflow is encoding those characters in creating the hyperlink).

Characters which genuinely can't be used must either be encoded in some fashion (the standardised approach is to use percent encoding, but you could do something application-specific if that is unacceptable to you for some reason), or else omitted (e.g. by looking up against a secondary column, such as described in @MikhailVladimirov's answer).