LEFT | RIGHT |
---|---|
1 abpcrawler | 1 abpcrawler |
2 ========== | 2 ========== |
3 | 3 |
4 Firefox extension that loads a range of websites and records which | 4 This tool loads a range of websites in Firefox and records which requests are |
Sebastian Noack
2015/04/27 14:55:50
Apparently it's not only a Firefox extension but al
Wladimir Palant
2015/05/07 00:04:59
Done.
| |
5 elements are filtered by [Adblock Plus](http://adblockplus.org). | 5 blocked by the [Adblock Plus extension](http://adblockplus.org). |
6 | 6 |
7 Requirements | 7 Requirements |
8 ------------ | 8 ------------ |
9 | 9 |
10 * [Python 2.x](https://www.python.org) | 10 * [Python 2.7](https://www.python.org) |
Sebastian Noack
2015/04/27 14:55:50
We actually require Python 2.7 specifically, as we
Wladimir Palant
2015/05/07 00:04:59
Done.
| |
11 * [The Jinja2 module](http://jinja.pocoo.org/docs) | 11 * [The Jinja2 module](http://jinja.pocoo.org/docs) |
12 * [mozrunner module](https://pypi.python.org/pypi/mozrunner) | 12 * [mozrunner module](https://pypi.python.org/pypi/mozrunner) |
Sebastian Noack
2015/04/27 14:55:50
I think you should add Firefox to that list as wel
Wladimir Palant
2015/05/07 00:04:59
Done.
| |
13 * [Firefox](https://www.mozilla.org/en-US/firefox/) | |
13 | 14 |
14 Running | 15 Running |
15 ------- | 16 ------- |
16 | 17 |
17 Execute the following: | 18 Execute the following: |
18 | 19 |
19 ./run.py -b /usr/bin/firefox urls.txt outputdir | 20 ./run.py -b /usr/bin/firefox urls.txt outputdir |
20 | 21 |
21 This will run the specified Firefox binary to crawl the URLs from `urls.txt` | 22 This will run the specified Firefox binary to crawl the URLs from `urls.txt` |
22 (one URL per line). The resulting data and screenshots will be written to the | 23 (one URL per line). The resulting data and screenshots will be written to the |
23 `outputdir` directory. Firefox will close automatically once all URLs have been | 24 `outputdir` directory. Firefox will close automatically once all URLs have been |
24 processed. | 25 processed. |
25 | 26 |
26 Optionally, you can provide the path to the Adblock Plus repository - Adblock | 27 The complete list of command line flags: |
saroyanm
2015/05/04 18:13:43
Maybe it makes sense to also add some small notes abou
Wladimir Palant
2015/05/07 00:04:59
Done.
| |
27 Plus will no longer be downloaded then. | 28 |
29 -h, --help show help message and exit | |
30 -b BINARY, --binary BINARY | |
31 path to the Firefox binary | |
32 -a ABPDIR, --abpdir ABPDIR | |
33 path to the Adblock Plus repository | |
34 -f url [url ...], --filters url [url ...] | |
35 filter lists to install in Adblock Plus. The arguments | |
36 can also have the format path=url, the data will be | |
37 read from the specified path then. | |
38 -t TIMEOUT, --timeout TIMEOUT | |
39 Load timeout (seconds) | |
40 -x MAXTABS, --maxtabs MAXTABS | |
41 Maximal number of tabs to open in parallel | |
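
For orientation, the flag list on the right-hand side could be produced by an `argparse` setup along the following lines. This is a minimal sketch, not the actual `run.py`: the positional names `urls` and `outputdir`, the `int` types for `-t` and `-x`, the absence of defaults, and the helper `split_filter_arg` are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed in the README diff."""
    parser = argparse.ArgumentParser(
        description="Crawl a list of URLs in Firefox with Adblock Plus")
    parser.add_argument("-b", "--binary",
                        help="path to the Firefox binary")
    parser.add_argument("-a", "--abpdir",
                        help="path to the Adblock Plus repository")
    parser.add_argument("-f", "--filters", metavar="url", nargs="+",
                        default=[],
                        help="filter lists to install in Adblock Plus, "
                             "optionally in the format path=url")
    # Assumption: both values are integers; the README only names them.
    parser.add_argument("-t", "--timeout", type=int,
                        help="Load timeout (seconds)")
    parser.add_argument("-x", "--maxtabs", type=int,
                        help="Maximal number of tabs to open in parallel")
    # Assumption: positional names are invented; the README only shows
    # "urls.txt outputdir" in the example command.
    parser.add_argument("urls", help="file with one URL per line")
    parser.add_argument("outputdir", help="directory for data and screenshots")
    return parser


def split_filter_arg(arg):
    """Split a -f argument into (path, url); path is None for a plain URL.

    Assumption: anything before the first "=" is a local path unless it
    already looks like a URL scheme. The real tool may decide differently.
    """
    head, sep, tail = arg.partition("=")
    if sep and "://" not in head:
        return head, tail
    return None, arg
```

With this sketch, `build_parser().parse_args(["-b", "/usr/bin/firefox", "urls.txt", "outputdir"])` corresponds to the example command in the "Running" section. Because `-f` accepts several values, it has to be followed by another flag (or come last) so that it does not swallow the positional arguments.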
28 | 42 |
29 License | 43 License |
saroyanm
2015/05/04 18:13:43
Is there a reason why we use MPL instead of GPL?
Wladimir Palant
2015/05/07 00:04:59
This extension was written before we switched to t
saroyanm
2015/05/07 13:19:00
Got it, thanks.
| |
30 ------- | 44 ------- |
31 | 45 |
32 This Source Code is subject to the terms of the Mozilla Public License | 46 This Source Code is subject to the terms of the Mozilla Public License |
33 version 2.0 (the "License"). You can obtain a copy of the License at | 47 version 2.0 (the "License"). You can obtain a copy of the License at |
34 http://mozilla.org/MPL/2.0/. | 48 http://mozilla.org/MPL/2.0/. |