README.md - Issue 29465720: Issue 4970 - Document the library API of python-abp

Unified Diff: README.md

Issue 29465720: Issue 4970 - Document the library API of python-abp (Closed)

Patch Set: Created June 14, 2017, 5:45 p.m.

Index: README.md

===================================================================

--- a/README.md

+++ b/README.md

@@ -1,20 +1,21 @@

# python-abp

-This repository contains the script that is used for building Adblock Plus

-filter lists from the form in which they are authored into the format suitable

-for consumption by the adblocking software.

+This repository contains a library for working with Adblock Plus filter lists

+and the script that is used for building Adblock Plus filter lists from the

+form in which they are authored into the format suitable for consumption by the

+adblocking software.

## Installation

Prerequisites:

* Linux, Mac OS X or Windows (any modern Unix should work too),

-* Python (2.7 or 3.5),

+* Python (2.7, 3.5 or 3.6),

* pip.

To install:

$ pip install -U python-abp

## Rendering of filter lists

@@ -23,30 +24,30 @@

for particular geographical area.

We call these parts _filter list fragments_ (or just _fragments_)

to distinguish them from full filter lists that are

consumed by the adblocking software such as Adblock Plus.

Rendering is a process that combines filter list fragments into a filter list.

It starts with one fragment that can include other ones and so forth.

The produced filter list is marked with a version, a timestamp and

-a [checksum](https://adblockplus.org/filters#special-comments).

+a [checksum][1].

Python-abp contains a script that can do this called `flrender`:

$ flrender fragment.txt filterlist.txt

This will take the top level fragment in `fragment.txt`, render it and save into

`filterlist.txt`.

Fragments might reference other fragments that should be included into them.

The references come in two forms: http(s) includes and local includes:

%include http://www.server.org/dir/list.txt%

- %include easylist:easylist/easylist_general_block.txt

+ %include easylist:easylist/easylist_general_block.txt%

The first instruction contains a URL that will be fetched and inserted at the

point of reference.

The second one contains a path inside easylist repository.

`flrender` needs to be able to find a copy of the repository on the local

filesystem. We use the `-i` option to point it to the right directory:

$ flrender -i easylist=/home/abc/easylist input.txt output.txt

@@ -70,30 +71,148 @@

$ flrender easylist.txt output/easylist.txt

Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener

al_block.txt' from 'easylist.txt'

You can clone the necessary repositories to a local directory and add `-i`

options accordingly.

+## Library API

+Python-abp can also be used as a library for parsing filter lists. For example,

+to read a filter list (we use Python 3 syntax here but the API is the same in Python 2):

+    from abp.filters import parse_filterlist

+    with open('filterlist.txt') as filterlist:

+        for line in parse_filterlist(filterlist):

+            print(line)

+If `filterlist.txt` contains a filter list, the output will look similar to

+the following:

+ Header(version='Adblock Plus 2.0')

+ Metadata(key='Title', value='Example List')

+ EmptyLine()

+ Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value':

+ 'div#ad1'}, action='hide', options={'domains-include': ['abc.com',

+ 'cdf.com'], 'domains-none': True})

+ Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value':

+ 'abc.com/ad'}, action='block', options={'types-none': True,

+ 'types-include': ['image']})

+ Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value':

+ 'abc\\.com'}, action='allow', options={})

+ ...

+In general `parse_filterlist` takes an iterable of strings (such as a list or

+an open file) and returns an iterable of parsed filter list lines. Each line

+will have its `.type` attribute set to a string indicating its type. It will

+also have a `.to_string()` method that converts it to a unicode string in the

+filter list format (most of the time it's the same as the string from which the

+filter was parsed). Further attributes depend on the type of the line.

+**Note:** `parse_filterlist` returns an iterator, not a list, and only consumes

+the input lines when its output is iterated over. This allows much more

+memory-efficient handling of large filter lists. However, there are two things

+to watch out for:

+- When you're parsing filters from a file, you need to complete the iteration

+ before you close the file.

+- Once you iterate over the output of `parse_filterlist`, it will be

+ consumed and you won't be able to iterate over it again.

+If either of these issues affects you, you probably want to

+convert the output of `parse_filterlist` to a list:

+ lines_list = list(parse_filterlist(filterlist))

+This will load the whole file into memory but unless you're dealing with a

+gigantic filter list that should not be a problem.
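
The single-pass behaviour described in the note above can be illustrated without python-abp installed. The following is a minimal sketch that uses a hypothetical stand-in generator, `parse_lines`, in place of `parse_filterlist`; it only mimics the laziness and does no real parsing.

```python
def parse_lines(lines):
    """Hypothetical stand-in for parse_filterlist: a lazy generator.

    It only strips whitespace; the real function returns parsed line objects.
    """
    for line in lines:
        yield line.strip()


source = [
    '[Adblock Plus 2.0]\n',
    '! Title: Example List\n',
    'abc.com/ad$image\n',
]

# A generator can be walked only once.
lazy = parse_lines(source)
first_pass = list(lazy)   # consumes every item
second_pass = list(lazy)  # the generator is exhausted, so nothing is left

# Converting to a list up front allows repeated iteration and len().
materialized = list(parse_lines(source))
line_count = len(materialized)
filters_only = [line for line in materialized if not line.startswith(('[', '!'))]
```

Here `second_pass` ends up empty, while `materialized` can be inspected as often as needed.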

+### Line types

+As mentioned before, lines of different types have different attributes:

+| type | attributes |

+|------------|------------------------------------------------------------------------|

+| header | `version` - plugin version string |

+| emptyline | no attributes |

+| comment | `text` - text of the comment |

+| metadata | `key` - name of the metadata field, `value` - value of the field |

+| include | `target` - url/path of the file to include |

+| invalid | `text` - full text of the line, `error` - error message |

+| filter | `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options |
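
As a rough illustration of how the table above might be used, the sketch below dispatches on the `.type` attribute. To stay self-contained it builds stand-in objects with `types.SimpleNamespace` rather than calling python-abp; the `summarize` helper and the sample values (including the error message) are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace

# Stand-in objects carrying the attributes listed in the table above.
# Real code would obtain these from parse_filterlist instead.
lines = [
    SimpleNamespace(type='header', version='Adblock Plus 2.0'),
    SimpleNamespace(type='metadata', key='Title', value='Example List'),
    SimpleNamespace(type='emptyline'),
    SimpleNamespace(type='comment', text='General blocking rules'),
    SimpleNamespace(type='filter', text='abc.com/ad$image',
                    selector={'type': 'url-pattern', 'value': 'abc.com/ad'},
                    action='block',
                    options={'types-none': True, 'types-include': ['image']}),
    SimpleNamespace(type='invalid', text='???',
                    error='hypothetical error message'),
]


def summarize(parsed_lines):
    """Count lines per type and collect metadata and filter texts."""
    parsed_lines = list(parsed_lines)  # allow several passes over an iterator
    counts = Counter(line.type for line in parsed_lines)
    metadata = {l.key: l.value for l in parsed_lines if l.type == 'metadata'}
    filters = [l.text for l in parsed_lines if l.type == 'filter']
    return counts, metadata, filters


counts, metadata, filters = summarize(lines)
```

Only attributes that the table lists for a given type are accessed, so the same function should also work on real parsed lines, assuming they expose those attributes as documented.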

+#### Filter attributes

+Selector is a dictionary with two keys:

+| key | meaning |

+|--------------|----------------------------------------------------|

+| type | 'css', 'abp-simple', 'url-pattern', 'url-regexp' |

+| value | the selector itself, the meaning is type-dependent |

+Options is a dictionary with a variable set of keys. Only options that are

+actually present in the filter will be stored there. The list of possible options

+and their meanings can be found in [documentation on authoring the filter

+rules][2].

+There are four classes of options that are handled differently:

+- Type options (that make the rule apply or not apply to certain types of

+ requests and resources):

+ - `types-include`: List of additional types to which the rule applies.

+ - `types-exclude`: List of types to which the rule doesn't apply.

+ - `types-none`: If this is `True`, the filter only applies to the types

+ in `types-include`. Otherwise all types except for `document`, `popup`,

+ `elemhide`, `generichide` and `genericblock` are implicitly included.

+- Domain options (that make the rule apply or not apply to specific domains):

+ - `domains-include`: List of domains to which the rule applies (it will also

+ apply to any subdomains unless they are excluded).

+ - `domains-exclude`: Excluded domains (their subdomains are also excluded

+ unless specifically included).

+ - `domains-none`: If this is `True`, all domains that are not mentioned by

+ `domains-include` and `domains-exclude` are excluded. Otherwise they are

+ included.

+- `sitekeys`: List of sitekeys that can be used to activate the rule.

+- Flags: `third-party`, `collapse`, `match-case`, etc. See [documentation][2]

+ for more information on their meaning.
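
The type and domain rules listed above can be read as two small predicates. This is a sketch of one possible interpretation, not code from python-abp: the helper names `applies_to_type` and `applies_to_domain` are hypothetical, and the "most specific domain wins" lookup is an assumption drawn from the wording about subdomains.

```python
# Types that are not implicitly included when `types-none` is absent.
NON_DEFAULT_TYPES = {'document', 'popup', 'elemhide',
                     'generichide', 'genericblock'}


def applies_to_type(options, content_type):
    """Return True if a filter with these options covers content_type."""
    if content_type in options.get('types-exclude', []):
        return False
    if content_type in options.get('types-include', []):
        return True
    if options.get('types-none', False):
        return False  # only the explicitly included types apply
    return content_type not in NON_DEFAULT_TYPES


def applies_to_domain(options, hostname):
    """Return True if a filter with these options is active on hostname."""
    include = set(options.get('domains-include', []))
    exclude = set(options.get('domains-exclude', []))
    labels = hostname.lower().split('.')
    # Check the hostname itself, then each parent domain in turn, so that
    # the most specific entry decides (assumed precedence).
    for i in range(len(labels)):
        candidate = '.'.join(labels[i:])
        if candidate in include:
            return True
        if candidate in exclude:
            return False
    # Domain not mentioned anywhere: fall back to `domains-none`.
    return not options.get('domains-none', False)


hide_options = {'domains-include': ['abc.com', 'cdf.com'], 'domains-none': True}
image_options = {'types-none': True, 'types-include': ['image']}

on_subdomain = applies_to_domain(hide_options, 'www.abc.com')
on_other_site = applies_to_domain(hide_options, 'example.org')
covers_image = applies_to_type(image_options, 'image')
covers_script = applies_to_type(image_options, 'script')
```

The two option dictionaries are copied from the sample output earlier in the README, so the results show how those example filters would behave under this reading of the rules.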

+### Other functions

+The `abp.filters` module also exports two lower-level functions for parsing

+individual lines of a filter list or individual filters: `parse_line` and

+`parse_filter` respectively. Both return a parsed line object just like the

+items in the iterator returned by `parse_filterlist`. The difference between

+them is that `parse_line` tries to detect the line type, whereas `parse_filter`

+always tries to interpret its input as a filter. Both functions raise a

+`ParseError` exception instead of returning a line with `type="invalid"`.

## Testing

Unit tests for `python-abp` are located in the `/tests` directory.

-[Pytest](http://pytest.org/) is used for quickly running the tests

+[Pytest][3] is used for quickly running the tests

during development.

-[Tox](https://tox.readthedocs.org/) is used for testing in different

-environments (Python 2.7, Python 3.5 and PyPy) and code quality

+[Tox][4] is used for testing in different

+environments (Python 2.7, 3.5, 3.6 and PyPy) and code quality

reporting.

In order to execute the tests, first create and activate development

virtualenv:

$ python setup.py devenv

$ . devenv/bin/activate

With the development virtualenv activated use pytest for a quick test run:

- (devenv) $ py.test tests

+ (devenv) $ pytest tests

and tox for a comprehensive report:

(devenv) $ tox

+ [1]: https://adblockplus.org/filters#special-comments

+ [2]: https://adblockplus.org/filters#options

+ [3]: http://pytest.org/

+ [4]: https://tox.readthedocs.org/
