
XML parser



The XMLSAXParser class simplifies the creation of an XML SAX parser. It works with the ResolverSAXHandler handler.

It allows to either:

The parser can be configured so that exceptions or XML warnings encountered during validation are not shown. A custom EntityResolver can be used to resolve entities during parsing[1].

ResolverSAXHandler

Main Article: ResolverSAXHandler

The XMLSAXParser class works with the ResolverSAXHandler handler.

Setting the locale

By default, the Locale of the parser is the default locale of the platform. However, the parser provides two methods to change its locale.

Note that if you want to be sure to use the English Locale regardless of the platform Locale, you should use Locale.forLanguageTag("") rather than Locale.forLanguageTag("en") to get the proper Locale, because there is no message*_en localized message file for the underlying JDK Xerces parser.
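
The difference between the two language tags can be checked with the JDK alone. The sketch below only illustrates what each call returns; the class name RootLocaleDemo is hypothetical, and how the parser then uses the Locale is not shown here.

```java
import java.util.Locale;

class RootLocaleDemo {
    /** Returns the locale obtained from the empty language tag. */
    static Locale rootLocale() {
        return Locale.forLanguageTag("");
    }

    public static void main(String[] args) {
        Locale root = rootLocale();
        Locale english = Locale.forLanguageTag("en");

        // The empty tag carries no language, so no "_en" style suffix is implied.
        System.out.println("empty tag is ROOT: " + root.equals(Locale.ROOT));
        System.out.println("empty tag language is empty: " + root.getLanguage().isEmpty());
        // The "en" tag carries an explicit language code.
        System.out.println("en tag language: " + english.getLanguage());
    }
}
```

A Locale with an empty language is the one that maps to message files without any language suffix, which is presumably why it selects the default English messages here.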

Allowing nestable URL connections

By default the parser does not parse XML files inside zip files (for example an XML file which is an entry in a zip file). However, this behavior can be changed.

Note that if you use the Apache version of the library, the parser will never parse XML files inside zip files, whichever setting you choose.

Prevent the internal creation of readers

By default this class and the XMLIncluder class will try to create readers through the Files.newBufferedReader(path, charset) method, to be sure that the correct encoding is used regardless of the platform encoding.

In some cases, for XML files which are entries in jar files (for example when using the JPackage Java system), this may cause problems when trying to access the entry using the jar provider, and accessing the entry may throw an exception.

In that case you can use the XMLSAXParser.setDefaultAllowInternalReaders(boolean) or the XMLSAXParser.setAllowInternalReaders(boolean) method to use a simpler InputStream from the URL.
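
The two ways of opening an XML source described above can be sketched with standard JDK and SAX classes. This is only an illustration of the distinction, not the library's internal code: the class InputSourceDemo and its two methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.xml.sax.InputSource;

class InputSourceDemo {
    /** Reader-based source: characters are decoded with the given charset. */
    static InputSource fromReader(Path path, Charset charset) throws IOException {
        Reader reader = Files.newBufferedReader(path, charset);
        return new InputSource(reader);
    }

    /** Stream-based source: the bytes are handed to the parser as they are. */
    static InputSource fromStream(URL url) throws IOException {
        InputStream stream = url.openStream();
        InputSource source = new InputSource(stream);
        // The system id lets the parser resolve relative references.
        source.setSystemId(url.toExternalForm());
        return source;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".xml");
        Files.write(file, "<root/>".getBytes(StandardCharsets.UTF_8));
        try {
            InputSource viaReader = fromReader(file, StandardCharsets.UTF_8);
            viaReader.getCharacterStream().close();
            InputSource viaStream = fromStream(file.toUri().toURL());
            viaStream.getByteStream().close();
            System.out.println("Both sources were opened");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The stream-based variant does not need a file system Path, which is why it can be the simpler choice when the XML file is only reachable through a URL.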

XML encoding parsing behavior

The new parsing behavior uses the encoding set by XMLSAXParser.setEncoding(String) to get a Reader with the correct Charset, not using the default platform Charset anymore.

Reverting to pre-1.2.7 parsing behavior

It is possible to revert to the pre-1.2.7 parsing behavior by using the XMLSAXParser.revertOldReaderBehavior(boolean) static method.

Note that before 1.2.7, the behavior to get a Reader to parse was just:
  Reader reader = new InputStreamReader(url.openStream());
As you can see, the default platform Charset was used (no Charset was specified to decode the file), which could lead to decoding problems in the XML files with special characters if the default platform Charset was not UTF-8.
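
The decoding problem can be reproduced with the JDK alone. The sketch below decodes the same UTF-8 bytes twice, once with the matching charset and once with ISO-8859-1 standing in for a non-UTF-8 platform default; the class CharsetDecodingDemo is a hypothetical name used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDecodingDemo {
    /** Decodes all bytes through a Reader that uses the given charset. */
    static String decode(byte[] bytes, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // "café", with the accented letter written as a Unicode escape.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String correct = decode(utf8Bytes, StandardCharsets.UTF_8);
        String garbled = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("decoded as UTF-8:      " + correct.length() + " characters");
        System.out.println("decoded as ISO-8859-1: " + garbled.length() + " characters");
    }
}
```

With the wrong charset, the two bytes that encode the accented letter are read as two separate characters, so the text is silently corrupted rather than rejected.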

Parser validation

The parser can be configured for validation:

Additional parser configuration

The parser can also be configured.

Note that some XML parsers will not honor the http://apache.org/xml/features/xinclude property. In that case, the included XML documents will not be taken into account by the parser even if XMLSAXParser.setXIncludeAware(boolean) is set to true. It is however still possible to take included XML documents into account by setting XMLSAXParser.setConcatenateIncludes(boolean) to true.

Bulk configuration of the parser

It is possible to configure several parsing options with one method call (and for several parsers) by calling the XMLSAXParser.setParserConfiguration(XMLParserConfiguration) method.

For example:
  XMLParserConfiguration config = new XMLParserConfiguration();
  config.isValidating = false;
  XMLSAXParser parser = new XMLSAXParser("My Parser");
  parser.setParserConfiguration(config);

Add additional parser factory features

It is possible to add any additional parser factory features by calling the XMLSAXParser.setFeature(String, boolean) method.

For example:
  XMLSAXParser parser = new XMLSAXParser("My Parser");
  parser.setFeature("http://javax.xml.XMLConstants/feature/secure-processing", false);
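
For comparison, the same feature can be set directly on a plain JAXP SAXParserFactory. This sketch uses only JDK classes and makes no claim about how XMLSAXParser forwards the feature internally; the class FactoryFeatureDemo is a hypothetical name. The feature URI used in the example above is the value of the XMLConstants.FEATURE_SECURE_PROCESSING constant.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

class FactoryFeatureDemo {
    /** Creates a JAXP factory with the secure-processing feature set as requested. */
    static SAXParserFactory factoryWithSecureProcessing(boolean enabled) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Same URI as "http://javax.xml.XMLConstants/feature/secure-processing".
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, enabled);
        return factory;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = factoryWithSecureProcessing(false);
        System.out.println("secure processing: "
            + factory.getFeature(XMLConstants.FEATURE_SECURE_PROCESSING));
    }
}
```

Disabling secure processing removes the parser's built-in processing limits, so it is usually only appropriate for trusted input.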

Add additional parser properties

It is possible to add any additional parser properties by calling the XMLSAXParser.setProperty(String, Object) method.

For example:
  XMLSAXParser parser = new XMLSAXParser("My Parser");
  parser.setProperty("namespace-prefixes", true);

Changing the underlying parser

By default this class will use the built-in Xerces parser used in the JRE. However, it is possible to use any JAXP parser factory by using the XMLSAXParser.setSAXParserFactoryImplementation(String) static method. For example:
   XMLSAXParser.setSAXParserFactoryImplementation("com.ctc.wstx.sax.WstxSAXParserFactory");
   XMLSAXParser parser = new XMLSAXParser("My Parser");

Parser inputs

The XMLSAXParser class accepts several types of inputs for the parsing:

Examples

Create a validating parser using a DTD

  XMLSAXParser parser = new XMLSAXParser("My Parser");
  parser.setValidating(true);
  parser.setHandlerDTD(dtd);
  parser.setHandler(handler);
  parser.parse(file);

Create a validating parser using a Schema

  XMLSAXParser parser = new XMLSAXParser("My Parser");
  parser.setValidating(true);
  parser.setSchema(schema);
  parser.setHandler(handler);
  parser.parse(file);

Create a validating parser and get the parsing exceptions

  XMLSAXParser parser = new XMLSAXParser("My Parser");
  parser.setValidating(true);
  parser.setSchema(schema);
  parser.setHandler(handler);
  parser.showExceptions(false);
  parser.parse(file);

  if (handler.hasParserExceptions()) {
    List<ResolverSAXHandler.ExceptionResult> results = handler.getExceptionResults();
  }
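
The idea of collecting validation problems instead of showing them can also be illustrated with plain JAXP. The sketch below is not the ResolverSAXHandler implementation: CollectingValidationDemo and CollectingHandler are hypothetical names, and the document is validated against an inline DTD so that the example stays self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class CollectingValidationDemo {
    /** Handler that records warnings and errors instead of printing them. */
    static class CollectingHandler extends DefaultHandler {
        final List<SAXParseException> problems = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            problems.add(e);
        }

        @Override
        public void error(SAXParseException e) {
            problems.add(e);
        }
    }

    /** Parses the XML text with DTD validation and returns the recorded problems. */
    static List<SAXParseException> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // enables DTD validation
        SAXParser parser = factory.newSAXParser();
        CollectingHandler handler = new CollectingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.problems;
    }

    static final String DTD = "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE root [\n"
        + "<!ELEMENT root (child)>\n"
        + "<!ELEMENT child (#PCDATA)>\n"
        + "]>\n";

    public static void main(String[] args) throws Exception {
        List<SAXParseException> ok = validate(DTD + "<root><child>text</child></root>");
        List<SAXParseException> bad = validate(DTD + "<root><other/></root>");
        System.out.println("valid document problems: " + ok.size());
        for (SAXParseException e : bad) {
            System.out.println("line " + e.getLineNumber() + ": " + e.getMessage());
        }
    }
}
```

Fatal errors such as malformed XML are still thrown by DefaultHandler in this sketch; only recoverable validation errors and warnings are collected.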

Notes

  1. ^ For example, the EntityListResolver in the same package.


Copyright 2006-2024 Herve Girod. All Rights Reserved. Documentation and source under the LGPL v2 and Apache 2.0 licences