README.md - Issue 29465720: Issue 4970 - Document the library API of python-abp

Side by Side Diff: README.md

Issue 29465720: Issue 4970 - Document the library API of python-abp (Closed)

Patch Set: Created June 14, 2017, 5:45 p.m.

OLD	NEW
1 # python-abp	1 # python-abp

2	2

3 This repository contains the script that is used for building Adblock Plus	3 This repository contains a library for working with Adblock Plus filter lists

4 filter lists from the form in which they are authored into the format suitable	4 and the script that is used for building Adblock Plus filter lists from the

5 for consumption by the adblocking software.	5 form in which they are authored into the format suitable for consumption by the

	6 adblocking software.

6	7

7 ## Installation	8 ## Installation

8	9

9 Prerequisites:	10 Prerequisites:

10	11

11 * Linux, Mac OS X or Windows (any modern Unix should work too),	12 * Linux, Mac OS X or Windows (any modern Unix should work too),

12 * Python (2.7 or 3.5),	13 * Python (2.7, 3.5 or 3.6),

13 * pip.	14 * pip.

14	15

15 To install:	16 To install:

16	17

17 $ pip install -U python-abp	18 $ pip install -U python-abp

18	19

19 ## Rendering of filter lists	20 ## Rendering of filter lists

20	21

21 The filter lists are originally authored in relatively smaller parts focused	22 The filter lists are originally authored in relatively smaller parts focused

22 on a particular type of filters, related to a specific topic or relevant	23 on a particular type of filters, related to a specific topic or relevant

23 for a particular geographical area.	24 for a particular geographical area.

24 We call these parts _filter list fragments_ (or just _fragments_)	25 We call these parts _filter list fragments_ (or just _fragments_)

25 to distinguish them from full filter lists that are	26 to distinguish them from full filter lists that are

26 consumed by the adblocking software such as Adblock Plus.	27 consumed by the adblocking software such as Adblock Plus.

27	28

28 Rendering is a process that combines filter list fragments into a filter list.	29 Rendering is a process that combines filter list fragments into a filter list.

29 It starts with one fragment that can include other ones and so forth.	30 It starts with one fragment that can include other ones and so forth.

30 The produced filter list is marked with a version, a timestamp and	31 The produced filter list is marked with a version, a timestamp and

31 a [checksum](https://adblockplus.org/filters#special-comments).	32 a [checksum][1].

32	33

33 Python-abp contains a script that can do this called `flrender`:	34 Python-abp contains a script that can do this called `flrender`:

34	35

35 $ flrender fragment.txt filterlist.txt	36 $ flrender fragment.txt filterlist.txt

36	37

37 This will take the top level fragment in `fragment.txt`, render it and save into	38 This will take the top level fragment in `fragment.txt`, render it and save into

38 `filterlist.txt`.	39 `filterlist.txt`.

39	40

40 Fragments might reference other fragments that should be included into them.	41 Fragments might reference other fragments that should be included into them.

41 The references come in two forms: http(s) includes and local includes:	42 The references come in two forms: http(s) includes and local includes:

42	43

43 %include http://www.server.org/dir/list.txt%	44 %include http://www.server.org/dir/list.txt%

44 %include easylist:easylist/easylist_general_block.txt	45 %include easylist:easylist/easylist_general_block.txt%

45	46

46 The first instruction contains a URL that will be fetched and inserted at the	47 The first instruction contains a URL that will be fetched and inserted at the

47 point of reference.	48 point of reference.

48 The second one contains a path inside the easylist repository.	49 The second one contains a path inside the easylist repository.

49 `flrender` needs to be able to find a copy of the repository on the local	50 `flrender` needs to be able to find a copy of the repository on the local

50 filesystem. We use the `-i` option to point it to the right directory:	51 filesystem. We use the `-i` option to point it to the right directory:

51	52

52 $ flrender -i easylist=/home/abc/easylist input.txt output.txt	53 $ flrender -i easylist=/home/abc/easylist input.txt output.txt

53	54

54 Now the second reference above will be resolved to	55 Now the second reference above will be resolved to

(...skipping 13 matching lines...)
68 If you don't know all the source names that are needed to render some list,	69 If you don't know all the source names that are needed to render some list,

69 just run `flrender` and it will report what it's missing:	70 just run `flrender` and it will report what it's missing:

70	71

71 $ flrender easylist.txt output/easylist.txt	72 $ flrender easylist.txt output/easylist.txt

72 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener	73 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener

73 al_block.txt' from 'easylist.txt'	74 al_block.txt' from 'easylist.txt'

74	75

75 You can clone the necessary repositories to a local directory and add `-i`	76 You can clone the necessary repositories to a local directory and add `-i`

76 options accordingly.	77 options accordingly.

77	78

	79 ## Library API

	80

	81 Python-abp can also be used as a library for parsing filter lists. For example

	82 to read a filter list (we use Python 3 syntax here but the API is the same):

	83

	84 from abp.filters import parse_filterlist

	85

	86 with open('filterlist.txt') as filterlist:

	87     for line in parse_filterlist(filterlist):

	88         print(line)

	89

	90 If `filterlist.txt` contains a filter list, the output will look similar to

	91 the following:

	92

	93 Header(version='Adblock Plus 2.0')

	94 Metadata(key='Title', value='Example List')

	95 EmptyLine()

	96 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value':

	97 'div#ad1'}, action='hide', options={'domains-include': ['abc.com',

	98 'cdf.com'], 'domains-none': True})

	99 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value':

	100 'abc.com/ad'}, action='block', options={'types-none': True,

	101 'types-include': ['image']})

	102 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value':

	103 'abc\\.com'}, action='allow', options={})

	104 ...

	105

	106 In general `parse_filterlist` takes an iterable of strings (such as a list or

	107 an open file) and returns an iterable of parsed filter list lines. Each line

	108 will have its `.type` attribute set to a string indicating its type. It will

	109 also have a `.to_string()` method that converts it to a unicode string in the

	110 filter list format (most of the time it's the same as the string from which the

	111 filter was parsed). Further attributes depend on the type of the line.

	112

	113 Note: `parse_filterlist` returns an iterator, not a list, and only consumes

	114 the input lines when its output is iterated over. This allows much more memory

	115 efficient handling of large filter lists, however there are two things to watch

	116 out for:

	117

	118 - When you're parsing filters from a file, you need to complete the iteration

	119 before you close the file.

	120 - Once you iterate over the output of `parse_filterlist` once, it will be

	121 consumed and you won't be able to iterate over it again.

	122

	123 If you find that any of these issues is bothering you, you probably want to

	124 convert the output of `parse_filterlist` to a list:

	125

	126 lines_list = list(parse_filterlist(filterlist))

	127

	128 This will load the whole file into memory but unless you're dealing with a

	129 gigantic filter list that should not be a problem.
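
The two pitfalls listed above are ordinary Python generator behaviour, so they can be reproduced without python-abp installed. The sketch below uses a hypothetical stand-in, `fake_parse_filterlist`, that only strips line endings lazily; it is not the library's parser, and the in-memory `io.StringIO` objects stand in for open files.

```python
import io


def fake_parse_filterlist(lines):
    """Hypothetical stand-in for parse_filterlist: lazily yields stripped lines."""
    for line in lines:
        yield line.rstrip('\n')


source = io.StringIO('[Adblock Plus 2.0]\n! Title: Example List\nabc.com/ad$image\n')

# A generator can only be consumed once.
parsed = fake_parse_filterlist(source)
first_pass = list(parsed)
second_pass = list(parsed)  # already exhausted, so this is empty

# Nothing is read until iteration starts, so closing the input before
# iterating makes the later iteration fail.
closed_source = io.StringIO('abc.com/ad$image\n')
lazy = fake_parse_filterlist(closed_source)
closed_source.close()
try:
    list(lazy)
    failed_after_close = False
except ValueError:  # raised for I/O on a closed file-like object
    failed_after_close = True
```

Wrapping the call in `list(...)` while the input is still open avoids both problems, at the cost of holding every parsed line in memory.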

	130

	131 ### Line types

	132

	133 As mentioned before, lines of different types have different attributes:

	134

	135 | type | attributes |

	136 |------------|------------------------------------------------------------------------|

	137 | header | `version` - plugin version string |

	138 | emptyline | no attributes |

	139 | comment | `text` - text of the comment |

	140 | metadata | `key` - name of the metadata field, `value` - value of the field |

	141 | include | `target` - url/path of the file to include |

	142 | invalid | `text` - full text of the line, `error` - error message |

	143 | filter | `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options |
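
Because every parsed line carries a `.type` string, code that consumes a filter list can branch on it before touching type-specific attributes. The following is a minimal sketch under assumptions: the namedtuples are hypothetical stand-ins built by hand (with only some of the attributes from the table above) so that the example runs without python-abp; real objects would come from `parse_filterlist`.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for parsed lines, not the library's own classes.
Header = namedtuple('Header', 'type version')
Metadata = namedtuple('Metadata', 'type key value')
EmptyLine = namedtuple('EmptyLine', 'type')
Filter = namedtuple('Filter', 'type text action')

lines = [
    Header('header', 'Adblock Plus 2.0'),
    Metadata('metadata', 'Title', 'Example List'),
    EmptyLine('emptyline'),
    Filter('filter', 'abc.com/ad$image', 'block'),
    Filter('filter', '@@/abc\\.com/', 'allow'),
]


def describe(line):
    """Return a short description, reading only attributes valid for the type."""
    if line.type == 'metadata':
        return '{}={}'.format(line.key, line.value)
    if line.type == 'filter':
        return '{}: {}'.format(line.action, line.text)
    return line.type


counts = Counter(line.type for line in lines)
descriptions = [describe(line) for line in lines]
```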

	144

	145 #### Filter attributes

	146

	147 Selector is a dictionary with two keys:

	148

	149 | key | meaning |

	150 |--------------|----------------------------------------------------|

	151 | type | 'css', 'abp-simple', 'url-pattern', 'url-regexp' |

	152 | value | the selector itself, the meaning is type-dependent |

	153

	154 Options is a dictionary with a variable set of keys. Only options that are

	155 actually present in the filter will be stored there. The list of possible options

	156 and their meanings can be found in [documentation on authoring the filter

	157 rules][2].

	158

	159 There are four classes of options that are handled differently:

	160

	161 - Type options (that make the rule apply or not apply to certain types of

	162   requests and resources):

	163     - `types-include`: List of additional types to which the rule applies.

	164     - `types-exclude`: List of types to which the rule doesn't apply.

	165     - `types-none`: If this is `True`, the filter only applies to the types

	166       in `types-include`. Otherwise all types except for `document`, `popup`,

	167       `elemhide`, `generichide` and `genericblock` are implicitly included.

	168 - Domain options (that make the rule apply or not apply to specific domains):

	169     - `domains-include`: List of domains to which the rule applies (it will also

	170       apply to any subdomains unless they are excluded).

	171     - `domains-exclude`: Excluded domains (their subdomains are also excluded

	172       unless specifically included).

	173     - `domains-none`: If this is `True`, all domains that are not mentioned by

	174       `domains-include` and `domains-exclude` are excluded. Otherwise they are

	175       included.

	176 - `sitekeys`: List of sitekeys that can be used to activate the rule.

	177 - Flags: `third-party`, `collapse`, `match-case`, etc. See [documentation][2]

	178   for more information on their meaning.
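
One way to read the type and domain options described above is as two small predicates over the `options` dictionary. This is a sketch of that reading, not code from python-abp: the helper names `type_applies` and `domain_applies` are hypothetical, and two rules are assumptions that the text does not state, namely that an excluded type wins over an included one and that the most specific (longest) listed domain decides when both lists match.

```python
# Types that the text says are not implicitly included.
NOT_IMPLICIT = {'document', 'popup', 'elemhide', 'generichide', 'genericblock'}


def type_applies(options, request_type):
    """Hypothetical helper: does the filter apply to this request type?"""
    if request_type in options.get('types-exclude', []):
        return False  # assumption: exclusion takes precedence
    if request_type in options.get('types-include', []):
        return True
    if options.get('types-none', False):
        return False  # only the explicitly included types apply
    return request_type not in NOT_IMPLICIT


def domain_applies(options, domain):
    """Hypothetical helper: does the filter apply on this domain?"""
    def best_match(listed):
        # Longest listed domain that equals `domain` or is a parent of it.
        hits = [d for d in listed if domain == d or domain.endswith('.' + d)]
        return max(hits, key=len, default=None)

    included = best_match(options.get('domains-include', []))
    excluded = best_match(options.get('domains-exclude', []))
    if included is None and excluded is None:
        # Unmentioned domains are excluded only when domains-none is True.
        return not options.get('domains-none', False)
    if excluded is None:
        return True
    if included is None:
        return False
    return len(included) > len(excluded)  # assumption: most specific wins


hide_options = {'domains-include': ['abc.com', 'cdf.com'], 'domains-none': True}
block_options = {'types-none': True, 'types-include': ['image']}
```

With the options from the example output earlier in this section, `domain_applies(hide_options, 'www.abc.com')` is true under these assumptions and `type_applies(block_options, 'script')` is false.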

	179

	180 ### Other functions

	181

	182 The `abp.filters` module also exports two lower-level functions for parsing

	183 individual lines of a filter list or individual filters. Not very surprisingly,

	184 they are called `parse_line` and `parse_filter` respectively. Both will return

	185 a parsed line object just like the items in the iterator returned by

	186 `parse_filterlist`. The difference between them is that `parse_line` tries to

	187 do line type detection and `parse_filter` will always try to interpret things

	188 as a filter. Both functions will throw a `ParseError` exception instead of

	189 returning a line with `type="invalid"`.
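
The practical difference between the two error styles is whether the caller needs a `try`/`except`. The sketch below illustrates that with stand-ins only: `ParseError` here is a locally defined class, `parse_line_stub` is a hypothetical toy parser with an arbitrary error rule, and the dictionaries merely imitate line objects. None of it is python-abp code.

```python
class ParseError(Exception):
    """Stand-in for the exception that the text says parse_line raises."""


def parse_line_stub(text):
    """Hypothetical toy parser: rejects include lines that lack a closing '%'."""
    if text.startswith('%include') and not text.endswith('%'):
        raise ParseError('unterminated include: ' + text)
    return {'type': 'line', 'text': text}


def parse_or_invalid(text, parse=parse_line_stub):
    """Convert a raised ParseError into an invalid-style record."""
    try:
        return parse(text)
    except ParseError as exc:
        return {'type': 'invalid', 'text': text, 'error': str(exc)}


good = parse_or_invalid('abc.com/ad$image')
bad = parse_or_invalid('%include easylist:easylist/easylist_general_block.txt')
```

A wrapper of this shape keeps a loop over many lines running when one line is malformed, while calling the parser directly lets the error surface immediately.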

	190

78 ## Testing	191 ## Testing

79	192

80 Unit tests for `python-abp` are located in the `/tests` directory.	193 Unit tests for `python-abp` are located in the `/tests` directory.

81 [Pytest](http://pytest.org/) is used for quickly running the tests	194 [Pytest][3] is used for quickly running the tests

82 during development.	195 during development.

83 [Tox](https://tox.readthedocs.org/) is used for testing in different	196 [Tox][4] is used for testing in different

84 environments (Python 2.7, Python 3.5 and PyPy) and code quality	197 environments (Python 2.7, 3.5, 3.6 and PyPy) and code quality

85 reporting.	198 reporting.

86	199

87 In order to execute the tests, first create and activate development	200 In order to execute the tests, first create and activate development

88 virtualenv:	201 virtualenv:

89	202

90 $ python setup.py devenv	203 $ python setup.py devenv

91 $ . devenv/bin/activate	204 $ . devenv/bin/activate

92	205

93 With the development virtualenv activated use pytest for a quick test run:	206 With the development virtualenv activated use pytest for a quick test run:

94	207

95 (devenv) $ py.test tests	208 (devenv) $ pytest tests

96	209

97 and tox for a comprehensive report:	210 and tox for a comprehensive report:

98	211

99 (devenv) $ tox	212 (devenv) $ tox

	213

	214

	215 [1]: https://adblockplus.org/filters#special-comments

	216 [2]: https://adblockplus.org/filters#options

	217 [3]: http://pytest.org/

	218 [4]: https://tox.readthedocs.org/
