org.randomcoder.content
Class XHTMLReader

java.lang.Object
  extended by org.xml.sax.helpers.XMLFilterImpl
      extended by org.randomcoder.content.XHTMLReader
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler, XMLFilter, XMLReader

public class XHTMLReader
extends XMLFilterImpl

XMLFilter implementation which filters out dangerous and/or unwanted markup from the input XML.

This class implements a large subset of XHTML (minus dangerous, deprecated, or otherwise undesirable content). Tag and attribute names are canonicalized, non-semantic markup is converted to semantic markup, and disallowed elements (together with their children) and disallowed attributes are removed.
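
The description above covers three behaviours: canonicalizing names, converting non-semantic markup to semantic markup, and removing disallowed content. The following minimal sketch illustrates the first two using only the standard SAX `XMLFilterImpl` API. The class name `SemanticRenameFilter` and its `b` to `strong`, `i` to `em` table are hypothetical. This page does not document XHTMLReader's actual mapping or implementation. The `create()` helper shows one way to obtain a namespace-aware parent `XMLReader`, similar to the `parent` argument XHTMLReader's constructor takes.

```java
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical sketch: lower-cases element names and maps presentational tags to semantic ones. */
class SemanticRenameFilter extends XMLFilterImpl {
    // Assumed mapping for illustration only; not taken from XHTMLReader.
    private static final Map<String, String> SEMANTIC = Map.of("b", "strong", "i", "em");

    SemanticRenameFilter(XMLReader parent) {
        super(parent);
    }

    /** Usage sketch: wraps a namespace-aware JAXP reader with this filter. */
    static SemanticRenameFilter create() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so uri and localName are populated
        return new SemanticRenameFilter(factory.newSAXParser().getXMLReader());
    }

    /** Canonical form: lower case, then presentational names replaced by semantic ones. */
    private static String canonical(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return SEMANTIC.getOrDefault(lower, lower);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String name = canonical(localName.isEmpty() ? qName : localName);
        super.startElement(uri, name, name, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String name = canonical(localName.isEmpty() ? qName : localName);
        super.endElement(uri, name, name);
    }
}
```

Start and end events both pass through the same `canonical` helper, so the output stays balanced. For example, `<B>` and `</B>` both become `strong`.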

 Copyright (c) 2006, Craig Condit. All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
 
   * Redistributions of source code must retain the above copyright notice,
     this list of conditions and the following disclaimer.
   * Redistributions in binary form must reproduce the above copyright notice,
     this list of conditions and the following disclaimer in the documentation
     and/or other materials provided with the distribution.
     
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.
 


Constructor Summary
XHTMLReader(XMLReader parent, Set<String> allowedClasses, URL baseUrl)
          Creates a new XHTMLReader.
 
Method Summary
 void characters(char[] ch, int start, int length)
          Processes character data.
 void endDocument()
          Marks the end of the current document.
 void endElement(String uri, String localName, String qName)
          Marks the end of the current element.
 void endPrefixMapping(String prefix)
          Ends the current prefix mapping.
 void ignorableWhitespace(char[] ch, int start, int length)
          Processes ignorable whitespace.
 void notationDecl(String name, String publicId, String systemId)
          Processes a notation declaration.
 void processingInstruction(String target, String data)
          Handles processing instructions.
 InputSource resolveEntity(String publicId, String systemId)
          Resolves an entity.
 void skippedEntity(String name)
          Skips an entity.
 void startDocument()
          Marks the beginning of the current document.
 void startElement(String uri, String localName, String qName, Attributes atts)
          Marks the start of an element.
 void startPrefixMapping(String prefix, String uri)
          Starts mapping a prefix.
 void unparsedEntityDecl(String name, String publicId, String systemId, String notationName)
          Handles unparsed entity declarations.
 
Methods inherited from class org.xml.sax.helpers.XMLFilterImpl
error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, getProperty, parse, parse, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, setProperty, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XHTMLReader

public XHTMLReader(XMLReader parent,
                   Set<String> allowedClasses,
                   URL baseUrl)
Creates a new XHTMLReader.

Parameters:
parent - parent reader to wrap
allowedClasses - set of allowed CSS classes
baseUrl - base URL for links
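
This page does not say exactly how `allowedClasses` and `baseUrl` are applied. Suppose `allowedClasses` restricts the whitespace-separated tokens of a `class` attribute, and `baseUrl` resolves relative link targets. Under those assumptions, the two operations might look like the helpers below. `AttributeHelpers`, `filterClasses`, and `resolveLink` are hypothetical names. Only `java.net.URL(URL, String)` comes from the standard library.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Set;
import java.util.StringJoiner;

/** Hypothetical helpers illustrating how allowedClasses and baseUrl might be applied. */
final class AttributeHelpers {
    private AttributeHelpers() {
    }

    /** Keeps only the whitespace-separated class tokens that appear in allowedClasses. */
    static String filterClasses(String classAttr, Set<String> allowedClasses) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : classAttr.trim().split("\\s+")) {
            if (allowedClasses.contains(token)) {
                kept.add(token);
            }
        }
        return kept.toString(); // an empty result could mean "drop the attribute"
    }

    /** Resolves a possibly relative link against baseUrl; returns null if it is malformed. */
    static String resolveLink(URL baseUrl, String href) {
        try {
            return new URL(baseUrl, href).toExternalForm();
        } catch (MalformedURLException e) {
            return null;
        }
    }
}
```

Absolute links with their own scheme are returned unchanged by `new URL(context, spec)`. Only relative targets pick up the base URL.
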
Method Detail

startDocument

public void startDocument()
                   throws SAXException
Marks the beginning of the current document.

Specified by:
startDocument in interface ContentHandler
Overrides:
startDocument in class XMLFilterImpl
Throws:
SAXException - if an error occurs

endDocument

public void endDocument()
                 throws SAXException
Marks the end of the current document.

Specified by:
endDocument in interface ContentHandler
Overrides:
endDocument in class XMLFilterImpl
Throws:
SAXException - if an error occurs

startElement

public void startElement(String uri,
                         String localName,
                         String qName,
                         Attributes atts)
                  throws SAXException
Marks the start of an element.

Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class XMLFilterImpl
Parameters:
uri - URI of element
localName - local part
qName - qualified name
atts - list of attributes
Throws:
SAXException - if an error occurs
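
The class description says disallowed attributes are removed. One way to do that in `startElement` is to copy only the permitted attributes into a fresh `org.xml.sax.helpers.AttributesImpl` and forward that instead of the original `atts`. For illustration, the sketch assumes a fixed whitelist of `href`, `title`, and `class`. XHTMLReader's real per-element rules are not listed on this page, and `AttributeWhitelistFilter` is a hypothetical name.

```java
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that forwards only whitelisted attributes on each start tag. */
class AttributeWhitelistFilter extends XMLFilterImpl {
    // Assumed whitelist for illustration only.
    private static final Set<String> ALLOWED = Set.of("href", "title", "class");

    AttributeWhitelistFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            // Fall back to the qualified name when the parser is not namespace-aware.
            String name = atts.getLocalName(i).isEmpty() ? atts.getQName(i) : atts.getLocalName(i);
            if (ALLOWED.contains(name)) {
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i), atts.getQName(i),
                        atts.getType(i), atts.getValue(i));
            }
        }
        super.startElement(uri, localName, qName, kept); // downstream never sees dropped attributes
    }
}
```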

endElement

public void endElement(String uri,
                       String localName,
                       String qName)
                throws SAXException
Marks the end of the current element.

Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class XMLFilterImpl
Parameters:
uri - URI of the current element
localName - local part
qName - fully qualified name
Throws:
SAXException - if an error occurs
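
Disallowed elements are removed along with their children, so `endElement` has to know which end tags belong to suppressed elements. A depth counter is one common way to track this. The hypothetical filter below uses an assumed blacklist. While the counter is positive, it swallows start tags, end tags, and character data. It also resets the counter in `startDocument` so an instance can be reused. This is not taken from XHTMLReader's source.

```java
import java.util.Locale;
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that drops disallowed elements together with their nested content. */
class SubtreeDropFilter extends XMLFilterImpl {
    // Assumed blacklist for illustration; XHTMLReader's actual rules are not listed here.
    private static final Set<String> DISALLOWED = Set.of("script", "style", "iframe");

    private int skipDepth; // greater than zero while inside a disallowed subtree

    SubtreeDropFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startDocument() throws SAXException {
        skipDepth = 0; // reset so the filter can be reused for another document
        super.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String name = (localName.isEmpty() ? qName : localName).toLowerCase(Locale.ROOT);
        if (skipDepth > 0 || DISALLOWED.contains(name)) {
            skipDepth++; // track nesting so the matching end tag is recognised
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--; // end tag of a suppressed element: swallow it
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }
}
```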

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws SAXException
Processes character data.

Specified by:
characters in interface ContentHandler
Overrides:
characters in class XMLFilterImpl
Parameters:
ch - character buffer
start - starting offset in buffer
length - number of characters to process
Throws:
SAXException - if an error occurs

ignorableWhitespace

public void ignorableWhitespace(char[] ch,
                                int start,
                                int length)
                         throws SAXException
Processes ignorable whitespace.

Specified by:
ignorableWhitespace in interface ContentHandler
Overrides:
ignorableWhitespace in class XMLFilterImpl
Parameters:
ch - character buffer to read
start - starting offset in buffer
length - number of characters to read
Throws:
SAXException - if an error occurs

processingInstruction

public void processingInstruction(String target,
                                  String data)
                           throws SAXException
Handles processing instructions.

Specified by:
processingInstruction in interface ContentHandler
Overrides:
processingInstruction in class XMLFilterImpl
Parameters:
target - target of processing instruction
data - associated data
Throws:
SAXException - if an error occurs
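
This page does not state whether XHTMLReader forwards processing instructions. Discarding them is a reasonable choice for sanitized XHTML, but here it is only an assumption. The default `XMLFilterImpl` methods forward each event to the registered `ContentHandler`. An override that does not call `super` therefore stops the event. `DropProcessingInstructionsFilter` is a hypothetical name.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that discards processing instructions instead of forwarding them. */
class DropProcessingInstructionsFilter extends XMLFilterImpl {
    DropProcessingInstructionsFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        // Intentionally not calling super.processingInstruction(target, data):
        // the event never reaches the downstream ContentHandler.
    }
}
```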

unparsedEntityDecl

public void unparsedEntityDecl(String name,
                               String publicId,
                               String systemId,
                               String notationName)
                        throws SAXException
Handles unparsed entity declarations.

Specified by:
unparsedEntityDecl in interface DTDHandler
Overrides:
unparsedEntityDecl in class XMLFilterImpl
Parameters:
name - name of entity
publicId - public identifier
systemId - system identifier
notationName - notation name
Throws:
SAXException - if an error occurs

resolveEntity

public InputSource resolveEntity(String publicId,
                                 String systemId)
                          throws SAXException,
                                 IOException
Resolves an entity.

Specified by:
resolveEntity in interface EntityResolver
Overrides:
resolveEntity in class XMLFilterImpl
Parameters:
publicId - public identifier
systemId - system identifier
Returns:
InputSource pointing to the requested entity
Throws:
SAXException - if an error occurs
IOException - if an I/O error occurs while resolving the entity
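
Apart from returning an `InputSource` for the requested entity, this page does not describe how XHTMLReader resolves entities. Filters that process untrusted markup often use a hardening pattern: answer every request with an empty `InputSource`, so that external DTDs and entities are never fetched. This is a swapped-in technique, not necessarily XHTMLReader's behaviour. `XMLFilterImpl` registers itself as its parent's `EntityResolver` when `parse` is called, so overriding `resolveEntity` is enough. `NoExternalEntitiesFilter` and its `resolveCalls` counter are hypothetical.

```java
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter whose entity resolver never fetches external DTDs or entities. */
class NoExternalEntitiesFilter extends XMLFilterImpl {
    int resolveCalls; // counts how often the parser asked for an external entity

    NoExternalEntitiesFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        resolveCalls++;
        // An empty character stream stands in for the external resource, so nothing
        // is read from the network or the file system.
        return new InputSource(new StringReader(""));
    }
}
```

One trade-off: named entities that the real DTD would declare are no longer defined. Documents that rely on them may need another strategy, such as resolving to a bundled local copy of the DTD.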

skippedEntity

public void skippedEntity(String name)
                   throws SAXException
Skips an entity.

Specified by:
skippedEntity in interface ContentHandler
Overrides:
skippedEntity in class XMLFilterImpl
Parameters:
name - name of entity to skip
Throws:
SAXException - if an error occurs

notationDecl

public void notationDecl(String name,
                         String publicId,
                         String systemId)
                  throws SAXException
Processes a notation declaration.

Specified by:
notationDecl in interface DTDHandler
Overrides:
notationDecl in class XMLFilterImpl
Parameters:
name - notation name
publicId - public identifier
systemId - system identifier
Throws:
SAXException - if an error occurs

startPrefixMapping

public void startPrefixMapping(String prefix,
                               String uri)
                        throws SAXException
Starts mapping a prefix.

Specified by:
startPrefixMapping in interface ContentHandler
Overrides:
startPrefixMapping in class XMLFilterImpl
Parameters:
prefix - name of prefix to map
uri - URI of prefix to map
Throws:
SAXException - if an error occurs

endPrefixMapping

public void endPrefixMapping(String prefix)
                      throws SAXException
Ends the current prefix mapping.

Specified by:
endPrefixMapping in interface ContentHandler
Overrides:
endPrefixMapping in class XMLFilterImpl
Parameters:
prefix - the prefix that was being mapped
Throws:
SAXException - if an error occurs
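
The prefix-mapping callbacks come in pairs, but `endPrefixMapping` receives only the prefix. A filter that drops some mappings therefore has to remember which starts it forwarded. The sketch below assumes, without support from this page, that only mappings to the XHTML namespace `http://www.w3.org/1999/xhtml` should pass. The same prefix can be rebound in nested elements, so the sketch keeps a stack per prefix. `XhtmlPrefixFilter` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that forwards only prefix mappings bound to the XHTML namespace. */
class XhtmlPrefixFilter extends XMLFilterImpl {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // For each prefix, a stack recording whether each (possibly nested) start was forwarded.
    private final Map<String, Deque<Boolean>> forwarded = new HashMap<>();

    XhtmlPrefixFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startDocument() throws SAXException {
        forwarded.clear(); // discard leftovers from an earlier, possibly aborted, parse
        super.startDocument();
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        boolean keep = XHTML_NS.equals(uri);
        forwarded.computeIfAbsent(prefix, p -> new ArrayDeque<>()).push(keep);
        if (keep) {
            super.startPrefixMapping(prefix, uri);
        }
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        Deque<Boolean> stack = forwarded.get(prefix);
        boolean keep = stack != null && !stack.isEmpty() && stack.pop();
        if (keep) {
            super.endPrefixMapping(prefix); // only end mappings whose start was forwarded
        }
    }
}
```

A complete filter would also need to drop the elements and attributes that use the discarded prefixes. Otherwise a downstream serializer may see qualified names with no in-scope mapping.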


Copyright © 2006-2010 Craig Condit. All Rights Reserved.