README.md - Issue 29465720: Issue 4970 - Document the library API of python-abp

Delta Between Two Patch Sets: README.md

Issue 29465720: Issue 4970 - Document the library API of python-abp (Closed)

Left Patch Set: Created June 14, 2017, 5:45 p.m.

Right Patch Set: Rebase to match the new master and retouche the docstrings. Created Oct. 24, 2017, 4:06 p.m.

LEFT	RIGHT
1 # python-abp	1 # python-abp

2	2

3 This repository contains a library for working with Adblock Plus filter lists	3 This repository contains a library for working with Adblock Plus filter lists

4 and the script that is used for building Adblock Plus filter lists from the	4 and the script that is used for building Adblock Plus filter lists from the

5 form in which they are authored into the format suitable for consumption by the	5 form in which they are authored into the format suitable for consumption by the

6 adblocking software.	6 adblocking software.

7	7

8 ## Installation	8 ## Installation

9	9

10 Prerequisites:	10 Prerequisites:

11	11

12 * Linux, Mac OS X or Windows (any modern Unix should work too),	12 * Linux, Mac OS X or Windows (any modern Unix should work too),

13 * Python (2.7 or 3.5, 3.6),	13 * Python (2.7 or 3.5+),

14 * pip.	14 * pip.

15	15

16 To install:	16 To install:

17	17

18 $ pip install -U python-abp	18 $ pip install -U python-abp

19	19

20 ## Rendering of filter lists	20 ## Rendering of filter lists

21	21

22 The filter lists are originally authored in relatively smaller parts focused	22 The filter lists are originally authored in relatively smaller parts focused

23 on a particular type of filters, related to a specific topic or relevant	23 on a particular type of filters, related to a specific topic or relevant

(...skipping 23 matching lines...)
47 The first instruction contains a URL that will be fetched and inserted at the	47 The first instruction contains a URL that will be fetched and inserted at the

48 point of reference.	48 point of reference.

49 The second one contains a path inside easylist repository.	49 The second one contains a path inside easylist repository.

50 `flrender` needs to be able to find a copy of the repository on the local	50 `flrender` needs to be able to find a copy of the repository on the local

51 filesystem. We use `-i` option to point it to the right directory:	51 filesystem. We use `-i` option to point it to the right directory:

52	52

53 $ flrender -i easylist=/home/abc/easylist input.txt output.txt	53 $ flrender -i easylist=/home/abc/easylist input.txt output.txt

54	54

55 Now the second reference above will be resolved to	55 Now the second reference above will be resolved to

56 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will	56 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will

57 be read from this file.	57 be loaded from this file.

58	58

59 Directories that contain filter list fragments that are used during rendering	59 Directories that contain filter list fragments that are used during rendering

60 are called sources.	60 are called sources.

61 They are normally working copies of the repositories that contain filter list	61 They are normally working copies of the repositories that contain filter list

62 fragments.	62 fragments.

63 Each source is identified by a name: that's the part that comes before ":"	63 Each source is identified by a name: that's the part that comes before ":"

64 in the include instruction and it should be the same as what comes before "="	64 in the include instruction and it should be the same as what comes before "="

65 in the `-i` option.	65 in the `-i` option.

66	66

67 Commonly used sources have generally accepted names. For example the main	67 Commonly used sources have generally accepted names. For example the main

68 EasyList repository is referred to as `easylist`.	68 EasyList repository is referred to as `easylist`.

69 If you don't know all the source names that are needed to render some list,	69 If you don't know all the source names that are needed to render some list,

70 just run `flrender` and it will report what it's missing:	70 just run `flrender` and it will report what it's missing:

71	71

72 $ flrender easylist.txt output/easylist.txt	72 $ flrender easylist.txt output/easylist.txt

73 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener	73 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener

74 al_block.txt' from 'easylist.txt'	74 al_block.txt' from 'easylist.txt'

75	75

76 You can clone the necessary repositories to a local directory and add `-i`	76 You can clone the necessary repositories to a local directory and add `-i`

77 options accordingly.	77 options accordingly.
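
The source lookup described above can be illustrated with a small sketch. This is not `flrender`'s actual code: the helpers `parse_source_option` and `resolve_include` are hypothetical names made up for this example, includes that point to a URL are not handled, and POSIX-style paths are assumed.

```python
import posixpath


def parse_source_option(option):
    """Split a `-i NAME=PATH` value into (name, path). Hypothetical helper."""
    name, sep, path = option.partition('=')
    if not sep or not name or not path:
        raise ValueError('Expected NAME=PATH, got: {!r}'.format(option))
    return name, path


def resolve_include(target, sources):
    """Map an include target `NAME:RELATIVE/PATH` to a local file path.

    Hypothetical helper; URL includes are out of scope for this sketch.
    """
    name, sep, relative_path = target.partition(':')
    if not sep:
        raise ValueError('Expected NAME:PATH, got: {!r}'.format(target))
    if name not in sources:
        # Mirrors the wording of the error message shown in the README.
        raise LookupError(
            "Unknown source: '{}' when including '{}'".format(name, target))
    return posixpath.join(sources[name], relative_path)


# The part before "=" in the option must match the part before ":" in the
# include instruction.
sources = dict([parse_source_option('easylist=/home/abc/easylist')])
resolved = resolve_include(
    'easylist:easylist/easylist_general_block.txt', sources)
print(resolved)
```

With the values from the README this prints `/home/abc/easylist/easylist/easylist_general_block.txt`, the same path the text gives for the second reference.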

78	78

79 ## Library API	79 ## Library API

80	80

81 Python-abp can also be used as a library for parsing filter lists. For example	81 Python-abp can also be used as a library for parsing filter lists. For example

82 to read a filter list (we use Python 3 syntax here but the API is the same):	82 to read a filter list (we use Python 3 syntax here but the API is the same):

83	83

84 from abp.filter import parse_filterlist	84 from abp.filters import parse_filterlist

85	85

86 with open('filterlist.txt') as filterlist:	86 with open('filterlist.txt') as filterlist:

87     for line in parse_filterlist(filterlist):	87     for line in parse_filterlist(filterlist):

88         print(line)	88         print(line)

89	89

90 If `filterlist.txt` contains a filter list, the output will look similar to	90 If `filterlist.txt` contains a filter list:

91 the following:	91

	92 [Adblock Plus 2.0]

	93 ! Title: Example list

	94

	95 abc.com,cdf.com##div#ad1

	96 abc.com/ad$image

	97 @@/abc\.com/

	98 ...

	99

	100 the output will look something like:

92	101

93 Header(version='Adblock Plus 2.0')	102 Header(version='Adblock Plus 2.0')

94 Metadata(key='Title', value='Example List')	103 Metadata(key='Title', value='Example list')

95 EmptyLine()	104 EmptyLine()

96 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value':	105 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])])

97 'div#ad1'}, action='hide', options={'domains-include': ['abc.com',	106 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])

98 'cdf.com'], 'domains-none': True})	107 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])

99 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value':

100 'abc.com/ad'}, action='block', options={'types-none': True,

101 'types-include': ['image']})

102 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value':

103 'abc\\.com'}, action='allow', options={})

104 ...	108 ...

105	109

106 In general `parse_filterlist` takes an iterable of strings (such as a list or	110 `abp.filters` module also exports a lower-level function for parsing individual

107 an open file) and returns an iterable of parsed filter list lines. Each line	111 lines of a filter list: `parse_line`. It returns a parsed line object just like

108 will have its `.type` attribute set to a string indicating its type. It will	112 the items in the iterator returned by `parse_filterlist`.

109 also have a `.to_string()` method that converts it to a unicode string in the

110 filter list format (most of the time it's the same as the string from which the

111 filter was parsed). Further attributes depend on the type of the line.

112	113

113 Note: `parse_filterlist` returns an iterator, not a list, and only consumes	114 For further information on the library API use `help()` on `abp.filters` and

114 the input lines when its output is iterated over. This allows much more memory	115 its contents in interactive Python session, read the docstrings or look at the

115 efficient handling of large filter lists, however there are two things to watch	116 tests for some usage examples.

116 out for:

117

118 - When you're parsing filters from a file, you need to complete the iteration

119 before you close the file.

120 - Once you iterate over the output of `parse_filterlist` once, it will be

121 consumed and you won't be able to iterate over it again.

122

123 If you find that any of these issues is bothering you, you probably want to

124 convert the output of `parse_filterlist` to a list:

125

126 lines_list = list(parse_filterlist(filterlist))

127

128 This will load the whole file into memory but unless you're dealing with a

129 gigantic filter list that should not be a problem.
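
The two pitfalls from the left patch set can be reproduced without python-abp. In this sketch `parse_lines` is a hypothetical stand-in generator, not the library's `parse_filterlist`; it only assumes that the real function is lazy in the same way, as the note above states.

```python
import io


def parse_lines(lines):
    """Hypothetical stand-in for a lazy parser: one item per input line."""
    for line in lines:
        yield line.strip()


text = '[Adblock Plus 2.0]\n! Title: Example list\n\nabc.com/ad$image\n'

# Pitfall 1: the generator is created inside the `with` block but only
# consumed after the file has been closed.
with io.StringIO(text) as filterlist:
    lazy = parse_lines(filterlist)
try:
    list(lazy)
    closed_error = None
except ValueError as exc:  # I/O operation on closed file
    closed_error = exc

# Pitfall 2: a generator can be iterated over only once.
with io.StringIO(text) as filterlist:
    lazy = parse_lines(filterlist)
    first_pass = list(lazy)
    second_pass = list(lazy)  # already exhausted, so this is empty

# Workaround from the text: materialize the result while the file is open.
with io.StringIO(text) as filterlist:
    lines_list = list(parse_lines(filterlist))

print(closed_error is not None, len(first_pass), len(second_pass))
```

Converting to a list trades memory for convenience: `lines_list` stays usable after the file is closed and can be iterated over any number of times.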

130

131 ### Line types

132

133 As mentioned before, lines of different types have different attributes:

134

135 \| type \| attributes \|

136 \|------------\|------------------------------------------------------------------------\|

137 \| header \| `version` - plugin version string \|

138 \| emptyline \| no options \|

139 \| comment \| `text` - text of the comment \|

140 \| metadata \| `key` - name of the metadata field, `value` - value of the field \|

141 \| include \| `target` - url/path of the file to include \|

142 \| invalid \| `text` - full text of the line, error - error message \|

143 \| filter \| `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options \|

144

145 #### Filter attributes

146

147 Selector is a dictionary with two keys:

148

149 \| key \| meaning \|

150 \|--------------\|----------------------------------------------------\|

151 \| type \| 'css', 'abp-simple', 'url-pattern', 'url-regexp' \|

152 \| value \| the selector itself, the meaning is type-dependent \|

153

154 Options is a dictionary with a variable set of keys. Only options that are

155 actually present in the filter will be stored there. The list of possible options

156 and their meanings can be found in [documentation on authoring the filter

157 rules][2].

158

159 There are four classes of options that are handled differently:

160

161 - Type options (that make the rule apply or not apply to certain types of

162 requests and resources):

163 - `types-include`: List of additional types to which the rule applies.

164 - `types-exclude`: List of types to which the rule doesn't apply.

165 - `types-none`: If this is `True`, the filter only applies to the types

166 in `types-include`. Otherwise all types except for `document`, `popup`,

167 `elemhide`, `generichide` and `genericblock` are implicitly included.

168 - Domain options (that make the rule apply or not apply to specific domains):

169 - `domains-include`: List of domains to which the rule applies (it will also

170 apply to any subdomains unless they are excluded).

171 - `domains-exclude`: Excluded domains (their subdomains are also excluded

172 unless specifically included).

173 - `domains-none`: If this is `True`, all domains that are not mentioned by

174 `domains-include` and `domains-exclude` are excluded. Otherwise they are

175 included.

176 - `sitekeys`: List of sitekeys that can be used to activate the rule.

177 - Flags: `third-party`, `collapse`, `match-case`, etc. See [documentation][2]

178 for more information on their meaning.

179

180 ### Other functions

181

182 `abp.filters` module also exports two lower-level functions for parsing

183 individual lines of filter list or individual filters. Not very surprisingly

184 they are called `parse_line` and `parse_filter` respectively. Both will return

185 a parsed line object just like the items in the iterator returned by

186 `parse_filterlist`. The difference between them is that `parse_line` tries to

187 do line type detection and `parse_filter` will always try to interpret things

188 as a filter. Both functions will throw a `ParseError` exception instead of

189 returning a line with `type="invalid"`.

190	117

191 ## Testing	118 ## Testing

192	119

193 Unit tests for `python-abp` are located in the `/tests` directory.	120 Unit tests for `python-abp` are located in the `/tests` directory.

194 [Pytest][3] is used for quickly running the tests	121 [Pytest][3] is used for quickly running the tests

195 during development.	122 during development.

196 [Tox][4] is used for testing in different	123 [Tox][4] is used for testing in different

197 environments (Python 2.7, 3.5, 3.6 and PyPy) and code quality	124 environments (Python 2.7, Python 3.5+ and PyPy) and code quality

198 reporting.	125 reporting.

199	126

200 In order to execute the tests, first create and activate development	127 In order to execute the tests, first create and activate development

201 virtualenv:	128 virtualenv:

202	129

203 $ python setup.py devenv	130 $ python setup.py devenv

204 $ . devenv/bin/activate	131 $ . devenv/bin/activate

205	132

206 With the development virtualenv activated use pytest for a quick test run:	133 With the development virtualenv activated use pytest for a quick test run:

207	134

208 (devenv) $ pytest tests	135 (devenv) $ pytest tests

209	136

210 and tox for a comprehensive report:	137 and tox for a comprehensive report:

211	138

212 (devenv) $ tox	139 (devenv) $ tox

213	140

	141 ## Development

	142

	143 When adding new functionality, add tests for it (preferably first). Code

	144 coverage (as measured by `tox -e qa`) should not decrease and the tests

	145 should pass in all Tox environments.

	146

	147 All public functions, classes and methods should have docstrings compliant with

	148 [NumPy/SciPy documentation guide][5]. One exception is the constructors of

	149 classes that the user is not expected to instantiate (such as exceptions).

214	150

215 [1]: https://adblockplus.org/filters#special-comments	151 [1]: https://adblockplus.org/filters#special-comments

216 [2]: https://adblockplus.org/filters#options	152 [2]: https://adblockplus.org/filters#options

217 [3]: http://pytest.org/	153 [3]: http://pytest.org/

218 [4]: https://tox.readthedocs.org/	154 [4]: https://tox.readthedocs.org/

	155 [5]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
