I have a script that pulls information from an event webpage. The URL is: http://everguide.com.au/melbourne/event/2012-jul-14/colour/
This PHP script calls a Python script (it's part of a for loop):
${"tmp" . $i} = utf8_encode (exec("python myscrape.py ${"eu" . $i}"));
It passes a URL to the Python script, which looks like this:
# -*- coding: utf-8 -*-
import sys
URL = sys.argv[1]
#$URL = 'http://everguide.com.au/melbourne/event/2012-jul-14/colour/'
import urllib2
req = urllib2.Request(URL)
response = urllib2.urlopen(req)
html = response.read()
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html.decode('utf-8'))
soup.prettify()
import re
for node in soup.findAll(itemprop="name"):
    n = ''.join(node.findAll(text=True))
for node in soup.findAll(itemprop="url"):
    v = ''.join(node.findAll(text=True))
for node in soup.findAll("div", { "class" : "time" }):
    d = ''.join(node.findAll(text=True))
for node in soup.findAll("a", { "id" : "ctl00_holderBody_ctl00_lnkCat" }):
    c = ''.join(node.findAll(text=True))
vu = v
vu.encode('utf-8', 'xmlcharrefreplace')
re.escape(vu)
print n,"|", d,"|", vu,"|", c
This works really well, but PHP only gets the output up to the pipe before vu; it can't get past that.
UTF-8 encoding is set on all files, HTML and PHP.
When there is a special character in the v variable, it breaks and stops. If there are no special characters, it works perfectly.
Expected output is:
Colour | 14 July @ 7:30PM | 1000 £ Bend | Clubs & Parties
This output can be seen when running the script directly on the server (with the same python command), but when it is called from PHP I can't get the venue string back.
Please help
Rick
vu.encode
returns the encoded string. Since you're not assigning the result, it just gets thrown away. Have you tried:
vu = vu.encode('utf-8', 'xmlcharrefreplace')
You'll also need to drop the re.escape call, as it would mess up the encoded Unicode (and, like encode, its return value is currently being discarded anyway).
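
To illustrate the point about discarded return values, here is a minimal sketch. The venue string and the helper name to_utf8_bytes are made up for the example and are not part of the original script. It also shows the ASCII failure that is probably what cuts the output off: under Python 2, when stdout is a pipe rather than a terminal, print typically falls back to the ASCII codec for Unicode strings, which would explain why the same command works in a shell but not through exec.

```python
# -*- coding: utf-8 -*-
# Sketch only: `venue` and `to_utf8_bytes` are hypothetical names,
# not taken from the original script.

def to_utf8_bytes(text):
    """Return a new UTF-8 byte string; `text` itself is not modified."""
    return text.encode('utf-8', 'xmlcharrefreplace')

venue = u"1000 \u00a3 Bend"  # stands in for vu; \u00a3 is the pound sign

# Calling encode without assigning the result changes nothing.
venue.encode('utf-8', 'xmlcharrefreplace')
still_unicode = venue

# Encoding to ASCII fails on the pound sign.
try:
    venue.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Keeping the return value gives bytes that can be written to a pipe as-is.
encoded = to_utf8_bytes(venue)
```

In the original script this amounts to assigning the result back (vu = vu.encode(...)) before the print statement, so that print receives an already-encoded byte string instead of a Unicode object.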