Class HTMLPurifier_Lexer_DOMLex

Description

Parser that uses PHP 5's DOM extension (part of the core).

In PHP 5, the DOM XML extension was revamped into DOM and added to the core. It gives us a forgiving HTML parser, which we use to transform the HTML into a DOM and then into tokens. It is blazingly fast (for large documents, it performs twenty times faster than HTMLPurifier_Lexer_DirectLex), and is the default choice for PHP 5.
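
As a rough illustration of the "forgiving parser" idea, the sketch below feeds deliberately malformed markup to PHP's DOM extension and still gets a usable tree back. It uses only the standard DOMDocument and libxml functions; it is not HTML Purifier's own code, and the sample markup is made up for the example.

```php
<?php
// Sketch only: demonstrates libxml's tolerant HTML parsing, not the lexer itself.
$html = '<div><p>Unclosed <b>bold</div>';   // deliberately malformed input

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser repaired the markup, so the elements can be looked up as usual.
$p = $doc->getElementsByTagName('p')->item(0);
$b = $doc->getElementsByTagName('b')->item(0);
echo $b->textContent, "\n";
```

Once the tree exists, a lexer of this kind only has to walk it and emit tokens, which is what tokenizeDOM is documented as doing.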

Located in /lib/core/Parsers/htmlpurifier/HTMLPurifier.standalone.php (line 13564)

HTMLPurifier_Lexer
   |
   --HTMLPurifier_Lexer_DOMLex
Direct descendants
Class Description
HTMLPurifier_Lexer_PH5P Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.
Method Summary
HTMLPurifier_Lexer_DOMLex __construct ()
void callbackArmorCommentEntities ( $matches)
void callbackUndoCommentSubst ( $matches)
void muteErrorHandler ( $errno,  $errstr)
Tokens tokenizeDOM ($node $node, $tokens &$tokens, [$collect $collect = false])
void tokenizeHTML ( $html,  $config,  $context)
Associative transformAttrToAssoc ($attribute_list $node_map)
void wrapHTML ( $html,  $config,  $context)
Variables
Methods
Constructor __construct (line 13569)
  • access: public
HTMLPurifier_Lexer_DOMLex __construct ()

Redefinition of:
HTMLPurifier_Lexer::__construct()
callbackArmorCommentEntities (line 13723)

Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them.

  • access: public
void callbackArmorCommentEntities ( $matches)
  • $matches
callbackUndoCommentSubst (line 13715)

Callback function for undoing escaping of stray angled brackets in comments.

  • access: public
void callbackUndoCommentSubst ( $matches)
  • $matches
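
The two comment callbacks only make sense as a pair, so a minimal sketch of how they could cooperate may help. Everything here is an assumption made for illustration: the free functions armorCommentEntities, undoCommentSubst and escapeStrayLt, the COMMENT_RE pattern, and the rule used to detect a stray "<" are stand-ins, not the class's confirmed internals.

```php
<?php
// Hypothetical stand-ins for the two callbacks; names and regexes are assumptions.
const COMMENT_RE = '/<!--(.*?)(-->|\z)/is';

function armorCommentEntities(array $m): string {
    // Turn every "&" inside the comment into "&amp;" so that entities already
    // present in the comment survive the later undo step.
    return '<!--' . str_replace('&', '&amp;', $m[1]) . $m[2];
}

function undoCommentSubst(array $m): string {
    // Reverse both the armoring and the "<" escaping, but only inside comments.
    return '<!--' . strtr($m[1], ['&amp;' => '&', '&lt;' => '<']) . $m[2];
}

function escapeStrayLt(string $html): string {
    $html = preg_replace_callback(COMMENT_RE, 'armorCommentEntities', $html);
    // Escape any "<" that cannot start a tag (not followed by a letter, "!" or "/").
    do {
        $old = $html;
        $html = preg_replace('/<([^a-z!\/])/i', '&lt;$1', $html);
    } while ($html !== $old);
    return preg_replace_callback(COMMENT_RE, 'undoCommentSubst', $html);
}

$out = escapeStrayLt('1 < 2 <!-- a < b &lt; c -->');
echo $out, "\n"; // prints: 1 &lt; 2 <!-- a < b &lt; c -->
```

Without the armoring step, the "&lt;" that was already written inside the comment would be turned into "<" by the undo step, which is the clobbering the description warns about.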
muteErrorHandler (line 13709)

An error handler that mutes all errors

  • access: public
void muteErrorHandler ( $errno,  $errstr)
  • $errno
  • $errstr
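
A handler like this is typically installed just around a call that is expected to emit warnings, such as parsing untidy HTML. The sketch below is an assumption about usage, not a copy of the class: the free function muteErrorHandler mirrors the documented signature, and the sample markup is invented.

```php
<?php
// Hypothetical free-function version of the documented method.
function muteErrorHandler($errno, $errstr) {
    // Intentionally empty: the error counts as handled, so nothing is displayed.
}

$doc = new DOMDocument();
set_error_handler('muteErrorHandler');   // silence warnings for the parse only
$loaded = $doc->loadHTML('<div><bogus>text</div>'); // unknown tag would normally warn
restore_error_handler();                 // put the previous handler back
```

Restoring the previous handler straight after the parse keeps the muting from hiding unrelated errors elsewhere in the application.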
tokenizeDOM (line 13621)

Recursive function that tokenizes a node, putting it into an accumulator.

  • return: Tokens of node appended to previously passed tokens.
  • access: protected
Tokens tokenizeDOM ($node $node, $tokens &$tokens, [$collect $collect = false])
  • $node $node: DOMNode to be tokenized.
  • $tokens &$tokens: Array-list of already tokenized tokens.
  • $collect $collect: Says whether or not the start and close tokens are collected; set to false at first recursion because the node you're dealing with is the implicit DIV tag.
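
The recursion and the role of $collect can be sketched as follows. This is a simplified illustration under stated assumptions: tokens are plain arrays rather than HTML Purifier token objects, attributes are ignored, and the function name tokenizeNode is hypothetical.

```php
<?php
// Sketch only: tokens are plain arrays here, not HTML Purifier token objects.
function tokenizeNode(DOMNode $node, array &$tokens, bool $collect = false): void {
    if ($node->nodeType === XML_TEXT_NODE) {
        $tokens[] = ['text', $node->nodeValue];
        return;
    }
    if ($node->nodeType === XML_COMMENT_NODE) {
        $tokens[] = ['comment', $node->nodeValue];
        return;
    }
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        return; // other node types are ignored in this sketch
    }
    if (!$node->hasChildNodes()) {
        if ($collect) { $tokens[] = ['empty', $node->nodeName]; }
        return;
    }
    if ($collect) { $tokens[] = ['start', $node->nodeName]; }
    foreach ($node->childNodes as $child) {
        tokenizeNode($child, $tokens, true); // children are always collected
    }
    if ($collect) { $tokens[] = ['end', $node->nodeName]; }
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><p>Hi <br>there</p></div></body></html>');
$wrapper = $doc->getElementsByTagName('div')->item(0);

$tokens = [];
tokenizeNode($wrapper, $tokens); // $collect is false, so the wrapper DIV emits no tokens
```

Passing the accumulator by reference avoids merging arrays at every level of the recursion.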
tokenizeHTML (line 13575)
  • access: public
void tokenizeHTML ( $html,  $config,  $context)
  • $html
  • $config
  • $context

Redefinition of:
HTMLPurifier_Lexer::tokenizeHTML()
Lexes an HTML string into tokens.

Redefined in descendants as:
transformAttrToAssoc (line 13694)

Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.

  • return: Associative array of attributes.
  • access: protected
Associative transformAttrToAssoc ($attribute_list $node_map)
  • $attribute_list $node_map: DOMNamedNodeMap of DOMAttr objects.
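
A conversion of this kind might look like the sketch below. The free function attrMapToAssoc is a hypothetical stand-in for the protected method, and the sample link is invented; only standard DOM classes are used.

```php
<?php
// Sketch only: a free function standing in for the protected method.
function attrMapToAssoc(DOMNamedNodeMap $nodeMap): array {
    $assoc = [];
    foreach ($nodeMap as $attr) {
        // Each item is a DOMAttr; key the array by attribute name.
        $assoc[$attr->name] = $attr->value;
    }
    return $assoc;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><a href="http://example.com/" title="Example">x</a></body></html>');
$attrs = attrMapToAssoc($doc->getElementsByTagName('a')->item(0)->attributes);
```

An element without attributes has an empty map, so the function returns an empty array in that case.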
wrapHTML (line 13730)

Wraps an HTML fragment in the necessary HTML

  • access: protected
void wrapHTML ( $html,  $config,  $context)
  • $html
  • $config
  • $context
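
To show why a fragment needs wrapping before it reaches the DOM parser, here is a minimal sketch. The function wrapFragment and the exact wrapper markup are assumptions for illustration; the real method also receives $config and $context, which this sketch leaves out.

```php
<?php
// Hypothetical wrapper: builds a full document around a fragment. The markup is
// an assumption, chosen so the parser sees a declared charset and a single
// container element.
function wrapFragment(string $html): string {
    return '<html><head>'
         . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
         . '</head><body><div>' . $html . '</div></body></html>';
}

$doc = new DOMDocument();
$doc->loadHTML(wrapFragment('<p>Hello</p><p>World</p>'));

// The fragment's nodes are now the children of the single wrapper DIV.
$div = $doc->getElementsByTagName('div')->item(0);
$count = $div->childNodes->length;
```

Having one known container makes it straightforward to start tokenizing at that DIV and skip its own start and end tags.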

Inherited Methods

Inherited From HTMLPurifier_Lexer

HTMLPurifier_Lexer::__construct()
HTMLPurifier_Lexer::CDATACallback()
HTMLPurifier_Lexer::create()
HTMLPurifier_Lexer::escapeCDATA()
HTMLPurifier_Lexer::escapeCommentedCDATA()
HTMLPurifier_Lexer::extractBody()
HTMLPurifier_Lexer::normalize()
HTMLPurifier_Lexer::parseData()
HTMLPurifier_Lexer::removeIEConditional()
HTMLPurifier_Lexer::tokenizeHTML()

Documentation generated on Sun, 06 Mar 2011 00:24:10 -0500 by phpDocumentor 1.4.3