run.py - Issue 29324625: Issue 2951 - Fix handling of URLs without a scheme

Unified Diff: run.py

Issue 29324625: Issue 2951 - Fix handling of URLs without a scheme (Closed)

Patch Set: Created Aug. 25, 2015, 8:25 p.m.

Index: run.py
===================================================================
--- a/run.py
+++ b/run.py
@@ -41,17 +41,18 @@ class CrawlerApp:
   request_body_size = int(environ.get('CONTENT_LENGTH', 0))
 except (ValueError):
   start_response('400 Bad Request', [])
   return ''
 data = json.loads(environ['wsgi.input'].read(request_body_size))
 self.urls.remove(data['url'])
-parsedurl = urlparse.urlparse(data['url'])
+fullurl = data['url'] if ':' in data['url'] else 'http://' + data['url']

Sebastian Noack 2015/08/26 13:11:52 Please don't repeat yourself: fullurl = data['u

Wladimir Palant 2015/10/09 10:27:34 Done.

+parsedurl = urlparse.urlparse(fullurl)
 urlhash = hashlib.new('md5', data['url']).hexdigest()
 timestamp = datetime.datetime.fromtimestamp(data['startTime'] / 1000.0).strftime('%Y-%m-%dT%H%M%S.%f')
 basename = "%s-%s-%s" % (parsedurl.hostname, timestamp, urlhash)
 datapath = os.path.join(self.parameters.outdir, basename + ".json")
 screenshotpath = os.path.join(self.parameters.outdir, basename + ".jpg")
 sourcepath = os.path.join(self.parameters.outdir, basename + ".xml")
 try:
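
For context on why the patch prepends a scheme: a URL without one is parsed as a bare path, so its hostname comes back empty and the generated basename would not contain the host. The following is a minimal sketch of that behavior, not the code from the patch. It assumes Python 3's urllib.parse (the patch itself uses the Python 2 urlparse module), and hostname_for is a hypothetical helper name introduced only for illustration.

```python
from urllib.parse import urlparse


def hostname_for(url):
    """Hypothetical helper mirroring the check added in this patch set."""
    # Same test as the patch: treat any URL without a ':' as scheme-less
    # and assume http:// for it.
    fullurl = url if ':' in url else 'http://' + url
    return urlparse(fullurl).hostname


# Without a scheme, the whole string ends up in the path component,
# so there is no hostname to put into the file name.
print(urlparse('example.com/page').hostname)

# With the fallback applied, the hostname is recovered.
print(hostname_for('example.com/page'))

# URLs that already carry a scheme pass through unchanged.
print(hostname_for('https://example.com/page'))
```

Note that the ':' test is a heuristic: an input such as example.com:8080/page contains a colon, gets no prefix, and should still parse without a hostname.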
