Class HTMLParser
     object --+
              |
    _BaseParser --+
                  |
        _FeedParser --+
                      |
                     HTMLParser
HTMLParser(self, recover=True, no_network=True, remove_blank_text=False, compact=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
The HTML parser.
This parser allows reading HTML into a normal XML tree. By
default, it can read broken (non well-formed) HTML, depending on
the capabilities of libxml2. Use the 'recover' option to switch
this off.
Available boolean keyword arguments:
- recover - try hard to parse through broken HTML (default: True)
- no_network - prevent network access for related files (default: True)
- remove_blank_text - discard empty text nodes (default: False)
- remove_comments - discard comments (default: False)
- remove_pis - discard processing instructions (default: False)
- compact - save memory for short text content (default: True)
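As a rough illustration of the boolean options above, the following sketch parses deliberately broken markup with recovery left on and comments stripped. The sample markup and variable names are made up for the example, and the exact tree that libxml2 recovers may differ between versions.

```python
from lxml import etree

# Deliberately broken markup: two unclosed <p> tags and a comment.
broken_html = "<html><body><!-- note --><p>first<p>second</body></html>"

# recover=True is already the default; it is spelled out here for clarity.
# remove_comments=True drops comment nodes from the resulting tree.
parser = etree.HTMLParser(recover=True, remove_comments=True)
root = etree.fromstring(broken_html, parser)

paragraphs = [p.text for p in root.iter("p")]
comments = list(root.iter(etree.Comment))

print(root.tag)
print(paragraphs)
print(len(comments))
```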
Other keyword arguments:
- encoding - override the document encoding
- target - a parser target object that will receive the parse events
- schema - an XMLSchema to validate against
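The 'target' argument can be sketched as follows. The LinkCollector class and the sample markup are hypothetical and exist only for this example; the sketch assumes the usual lxml target methods (start, end, data, close), where the value returned by close() becomes the result of the parse call instead of a tree.

```python
from lxml import etree


class LinkCollector:
    """Hypothetical parser target that records href values of <a> tags."""

    def __init__(self):
        self.links = []

    def start(self, tag, attrib):
        # Called for every opening tag with its attributes.
        if tag == "a" and "href" in attrib:
            self.links.append(attrib["href"])

    def end(self, tag):
        pass  # closing tags are not needed here

    def data(self, data):
        pass  # text content is ignored

    def close(self):
        # The return value is handed back by the parse function.
        return self.links


html = '<html><body><a href="/a">A</a><a>no href</a><a href="/b">B</a></body></html>'
parser = etree.HTMLParser(target=LinkCollector())
links = etree.fromstring(html, parser)
print(links)
```

Note that this target keeps its state between parses, so a fresh target (and parser) per document is the simpler choice.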
Note that you should avoid sharing parsers between threads for performance
reasons.
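One way to follow the threading advice is to give each thread its own parser. The sketch below uses threading.local for that; get_parser, parse_title, and worker are illustrative helper names, not part of lxml.

```python
import threading

from lxml import etree

_local = threading.local()


def get_parser():
    """Return an HTMLParser owned by the calling thread, creating it on first use."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = etree.HTMLParser()
    return parser


def parse_title(html):
    root = etree.fromstring(html, get_parser())
    return root.findtext(".//title")


results = {}


def worker(name, html):
    results[name] = parse_title(html)


threads = [
    threading.Thread(
        target=worker,
        args=(f"t{i}", f"<html><head><title>doc {i}</title></head></html>"),
    )
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```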
__init__(self,
recover=True,
no_network=True,
remove_blank_text=False,
compact=True,
remove_comments=False,
remove_pis=False,
target=None,
encoding=None,
schema=None)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
__new__
a new object with type S, a subtype of T
Methods inherited from _FeedParser: close, feed
Methods inherited from _BaseParser: copy, makeelement, setElementClassLookup, set_element_class_lookup
Methods inherited from object: __delattr__, __getattribute__, __hash__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__
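The inherited feed and close methods allow incremental parsing. A minimal sketch, assuming the input arrives as byte chunks (the chunk boundaries here are arbitrary and chosen only for the example):

```python
from lxml import etree

# Chunks as they might arrive from a socket or a file read loop.
chunks = [
    b"<html><body><h1>Ti",
    b"tle</h1><p>bo",
    b"dy text</p></body></html>",
]

parser = etree.HTMLParser()
for chunk in chunks:
    parser.feed(chunk)

# close() finishes the parse and returns the root element.
root = parser.close()

heading = root.findtext(".//h1")
paragraph = root.findtext(".//p")
print(heading, paragraph)
```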
Properties inherited from _FeedParser: feed_error_log
Properties inherited from _BaseParser: error_log, resolvers, version
Properties inherited from object: __class__
__init__(self,
recover=True,
no_network=True,
remove_blank_text=False,
compact=True,
remove_comments=False,
remove_pis=False,
target=None,
encoding=None,
schema=None)
(Constructor)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
- Overrides: object.__init__
__new__
- Returns: a new object with type S, a subtype of T
- Overrides: object.__new__