[^\"'> \t)]
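1. What does this regular expression mean?
2. \" escapes the double quote. Which other special characters in Python regular expressions need to be escaped with \ (leaving aside \number, \w, and so on)? The official documentation says “permitting you to match characters like "*", "?", and so forth”. I just want to know what "so forth" covers. A link to a reference would be even better.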
The special characters are:
'.'
(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
'^'
(Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
'$'
Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
'*'
Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s.
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not match just ‘a’.
'?'
Reference:
[url]http://docs.python.org/library/re.html[/url]
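The behaviour quoted above can be checked interactively. The sketch below is only an illustration: the variable names are made up, and the METACHARS string is my reading of the metacharacter list (. ^ $ * + ? { } [ ] \ | ( )), which is probably what "and so forth" refers to. re.escape is the standard-library helper that adds the backslashes for you.

```python
import re

# '.' skips newlines by default and matches them with DOTALL.
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", re.DOTALL)

# '$' with and without MULTILINE, using the example from the quoted docs.
end_default = re.search(r"foo.$", "foo1\nfoo2\n").group()
end_multiline = re.search(r"foo.$", "foo1\nfoo2\n", re.MULTILINE).group()

# '*' allows zero repetitions of 'b'; '+' requires at least one.
star_matches = re.fullmatch(r"ab*", "a") is not None
plus_matches = re.fullmatch(r"ab+", "a") is not None

# Assumed list of metacharacters; each one needs a backslash to match literally.
METACHARS = ".^$*+?{}[]\\|()"
all_escaped = all(re.escape(ch) == "\\" + ch for ch in METACHARS)

# An escaped pattern matches the literal text "*?".
literal_hit = re.search(re.escape("*?"), "what*?").group()

print(dot_default, dot_all, end_default, end_multiline)
print(star_matches, plus_matches, all_escaped, literal_hit)
```

When in doubt about whether a character is special, passing the literal text through re.escape avoids having to remember the list.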
Starts with " or ' or > or a tab (\t) or )
Escape Sequence  Meaning                                                      Notes
\newline         Ignored
\\               Backslash (\)
\'               Single quote (')
\"               Double quote (")
\a               ASCII Bell (BEL)
\b               ASCII Backspace (BS)
\f               ASCII Formfeed (FF)
\n               ASCII Linefeed (LF)
\N{name}         Character named name in the Unicode database (Unicode only)
\r               ASCII Carriage Return (CR)
\t               ASCII Horizontal Tab (TAB)
\uxxxx           Character with 16-bit hex value xxxx (Unicode only)          (1)
\Uxxxxxxxx       Character with 32-bit hex value xxxxxxxx (Unicode only)      (2)
\v               ASCII Vertical Tab (VT)
\ooo             Character with octal value ooo                               (3,5)
\xhh             Character with hex value hh                                  (4,5)
Reference:
[url]http://www.network-theory.co.uk/docs/pylang/Stringliterals.html[/url]
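The escape sequences listed above are handled by Python when it reads a string literal, before the re module ever sees the pattern. A small sketch of why that matters, and why patterns are usually written as raw strings (all variable names here are just for illustration):

```python
import re

# In a normal literal Python turns \t into a single tab character.
normal_tab = "\t"
# In a raw literal the backslash survives: two characters, '\' and 't'.
raw_tab = r"\t"

# The re module also understands \t, so both patterns match a tab.
both_match_tab = (re.match(normal_tab, "\t") is not None
                  and re.match(raw_tab, "\t") is not None)

# \b is where the two layers disagree: a normal literal produces a
# backspace character, while the raw string lets re see a word boundary.
backspace_pattern = "\bcat\b"
boundary_pattern = r"\bcat\b"
text = "a cat sat"
found_with_backspace = re.search(backspace_pattern, text)
found_with_boundary = re.search(boundary_pattern, text)

print(len(normal_tab), len(raw_tab), both_match_tab)
print(found_with_backspace, found_with_boundary.group())
```

So the string-literal table and the regex special sequences are two separate sets of rules that happen to share the backslash.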
'\A' Matches the start of the string.
'\b' Matches the empty string that forms the boundary at the beginning or end of a word.
'\B' Matches the empty string that is not the beginning or end of a word.
'\d' Matches any decimal digit.
'\D' Matches any character that is not a decimal digit.
'\s' Matches any whitespace character.
'\S' Matches any non-whitespace character.
'\w' Matches any alphanumeric character and the underscore.
'\W' Matches any non-alphanumeric character.
'\Z' Matches the end of the string.
Reference:
[url]http://python.about.com/od/regularexpressions/g/regex_spec_char.htm[/url]
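A quick way to get a feel for these special sequences is to run them against a short string. This is only a sketch with made-up sample data:

```python
import re

sample = "id_42 ok"

digits = re.findall(r"\d", sample)       # decimal digits only
words = re.findall(r"\w+", sample)       # letters, digits and underscore
spaces = re.findall(r"\s", sample)       # whitespace characters
non_digits = re.findall(r"\D", sample)   # everything that is not a digit

# \b only matches at a word boundary, so 'cat' inside 'concat' or 'cats' is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cats")

# \A and \Z anchor to the very start and very end of the string.
at_start = re.search(r"\Aid", sample) is not None
# Unlike '$', \Z does not match before a trailing newline.
z_before_newline = re.search(r"ok\Z", "ok\n")
dollar_before_newline = re.search(r"ok$", "ok\n")

print(digits, words, spaces, len(non_digits), whole_words)
print(at_start, z_before_newline, dollar_before_newline.group())
```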
The last list is too long, so I did not paste all of it.
() matches a complete group.
So parentheses should be escaped as well.
Not " or ' or > or a tab (\t) or )
The ^ symbol has two meanings to begin with.
I suggest trying each regular expression out for yourself.
Inside [], a leading ^ means negation.
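Following that suggestion, here is a small sketch that tries the expression from the question. Note that the class as posted also contains a literal space between > and \t, so a space is excluded too. The variable names are only for illustration.

```python
import re

# The pattern from the question, written as a raw triple-quoted string
# so that both kinds of quote can appear inside it unchanged.
pattern = re.compile(r"""[^\"'> \t)]""")

# None of the characters listed in the class can match.
rejected = ['"', "'", ">", " ", "\t", ")"]
none_rejected_match = all(pattern.fullmatch(ch) is None for ch in rejected)

# Any other single character matches.
accepted = pattern.findall("a>b c)")

# The two meanings of ^: an anchor outside brackets, negation inside them.
anchored = re.findall(r"^a", "aab")
negated = re.findall(r"[^a]", "aab")

print(none_rejected_match, accepted, anchored, negated)
```

The pattern matches exactly one character, so it says nothing about how a string starts; it only says which characters are not allowed at that position.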