Question: In what ways can XML data be parsed, and how does each one work?

Last time we looked at four ways to parse JSON, so this time we’ll look at four ways to parse XML.

Four ways to parse

  • DOM parsing
  • SAX parsing
  • JDOM parsing
  • DOM4J parsing

Worked examples

DOM parsing

DOM stands for Document Object Model. A DOM-based XML parser transforms an XML document into a collection of object models (usually called the DOM tree), and the application works with the data in the XML document by operating on this object model. XML is itself tree-shaped, so the DOM represents the document as a tree of nodes. At the top of the DOM tree sits the Document, which represents the whole XML document and contains exactly one root element.

Note: in the DOM, each run of text is also a node, called a text node.
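The note about text nodes is easy to check. The following is a minimal sketch, assuming a small hypothetical sample document passed in as a string; the class and method names (TextNodeDemo, parse, countChildren) are made up for illustration. It counts the children of the root element by node type.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class TextNodeDemo {

    // Parses an XML string into a DOM Document (checked exceptions are wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Counts the children of the root element that have the given node type.
    static int countChildren(Document doc, short nodeType) {
        NodeList children = doc.getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == nodeType) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample: one <name> element surrounded by line breaks and indentation.
        Document doc = parse("<person>\n  <name>lebyte</name>\n</person>");
        System.out.println("element children: " + countChildren(doc, Node.ELEMENT_NODE));
        System.out.println("text children:    " + countChildren(doc, Node.TEXT_NODE));
    }
}
```

With this sample the root has one element child and two text children: the whitespace before and after the name element is kept as text nodes, which is why code that walks getChildNodes() usually has to check the node type.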

Core operation interface

DOM parsing has four core interfaces:

Document: represents the entire XML document and is the root of the DOM tree. It provides access to and manipulation of the data in the document; every element in the XML file can be reached through the Document node.

Node: plays a central role in the DOM tree. Most of the core DOM interfaces, such as Document and Element, inherit from Node, and each Node instance represents one node in the DOM tree.

NodeList: represents an ordered collection of nodes, such as the children of a node. A NodeList is live: changes to the document are reflected directly in the collection.

NamedNodeMap: represents a one-to-one mapping between a group of nodes and their unique names. It is mainly used to represent attribute nodes.
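To make the four interfaces concrete, here is a minimal sketch that touches each of them. The sample XML and the names DomInterfacesDemo, attributes, and liveLengths are assumptions for illustration, not part of any library. It reads attributes through a NamedNodeMap and shows that a NodeList follows changes made to the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomInterfacesDemo {

    // Document: the entry point to the whole tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // NamedNodeMap: returns "name=value" for every attribute of the element.
    static List<String> attributes(Element element) {
        NamedNodeMap map = element.getAttributes();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < map.getLength(); i++) {
            Node attr = map.item(i); // attributes are nodes too
            out.add(attr.getNodeName() + "=" + attr.getNodeValue());
        }
        return out;
    }

    // NodeList: returns its length before and after one more element is appended.
    static int[] liveLengths(Document doc, String tag) {
        NodeList list = doc.getElementsByTagName(tag);
        int before = list.getLength();
        doc.getDocumentElement().appendChild(doc.createElement(tag));
        int after = list.getLength(); // same list object, queried again
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        Document doc = parse("<people><person id=\"1\"/></people>");
        Element person = (Element) doc.getElementsByTagName("person").item(0);
        System.out.println(attributes(person));
        int[] lengths = liveLengths(doc, "person");
        System.out.println(lengths[0] + " -> " + lengths[1]);
    }
}
```

The list is never fetched a second time in liveLengths, yet its length changes once the new element is appended; that is the practical meaning of a NodeList being live.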

DOM parsing process

An application that parses and reads XML with DOM follows these steps:

(1) Create a DocumentBuilderFactory:
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
(2) Create a DocumentBuilder:
    DocumentBuilder builder = factory.newDocumentBuilder();
(3) Create the Document:
    Document doc = builder.parse("file path to parse");
(4) Create a NodeList:
    NodeList nl = doc.getElementsByTagName("name of the node to read");
(5) Read the XML information
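The steps above can be sketched end to end as follows. This is a minimal sketch, not a definitive implementation: it assumes the XML arrives as a string rather than a file so that it runs without any external resource, and the class and method names (DomParseDemo, readTexts) and the sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomParseDemo {

    // Returns the text content of every element with the given tag name.
    static List<String> readTexts(String xml, String tagName) {
        try {
            // (1) Create the factory
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // (2) Create the builder
            DocumentBuilder builder = factory.newDocumentBuilder();
            // (3) Build the Document (from a string here, instead of a file path)
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // (4) Collect the nodes to read
            NodeList nl = doc.getElementsByTagName(tagName);
            // (5) Read the XML information
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nl.getLength(); i++) {
                texts.add(nl.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        String xml = "<people><person><name>lebyte</name><age>10</age></person></people>";
        System.out.println(readTexts(xml, "name"));
        System.out.println(readTexts(xml, "age"));
    }
}
```

To read from disk instead, the same builder also accepts a java.io.File in parse(...).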

SAX parsing

SAX (Simple API for XML) parses an XML file step by step, in document order. SAX has no official standards body: it does not belong to any standards organization, company, or individual, and is simply a technique that anyone is free to use.

Unlike DOM, SAX accesses the document sequentially, which makes it a fast way to read XML data. As the SAX parser runs, it fires a series of events: when the scan reaches the start or end of the document, or the start or end of an element, the corresponding handler method is called, and these methods do their work until the whole document has been scanned.

To use SAX parsing, you must first build a SAX parser:

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    // 1. Create a parser factory
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // 2. Create the parser
    SAXParser parser = factory.newSAXParser();
    // 3. Parse the file; MySaxHandler is the SAX handler and extends DefaultHandler
    String path = new File("resource/demo01.xml").getAbsolutePath();
    parser.parse(path, new MySaxHandler());
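The handler class itself is not shown above. As an illustration of what such a handler can look like, here is a minimal sketch: RecordingHandler is a hypothetical stand-in for MySaxHandler that simply records each callback, and the XML is passed as a string (an assumption, so that the example runs without demo01.xml).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventsDemo {

    // Hypothetical handler: records every callback so the event order becomes visible.
    static final class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text arrives as a slice of a shared buffer; whitespace-only runs are skipped here.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }
    }

    // Parses the XML string and returns the recorded events in the order they fired.
    static List<String> events(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            RecordingHandler handler = new RecordingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        events("<person><name>lebyte</name></person>").forEach(System.out::println);
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is why SAX suits large files; the trade-off is that the handler has to track any context it needs by itself.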

JDOM parsing

DOM and SAX are the two standard ways of working with XML (DOM is a W3C recommendation, while SAX is a de facto standard), and from a development point of view each has its own trade-off: DOM lets you modify the document but is not suited to reading large files, while SAX can read large files but cannot modify them. JDOM sets out to combine the two, roughly JDOM = DOM's modifiability + SAX's ability to read large files. JDOM is a free open source component that can be downloaded directly from www.jdom.org.

Common JDOM classes for working with XML:

Document: represents the entire XML document, which is a tree structure

Element: represents an XML element and provides methods for working with its children, text, attributes, and namespaces

Attribute: represents an attribute contained in an element

Text: represents XML text content

XMLOutputter: the XML output class, which writes through JDK streams

Format: provides encoding, style, and layout settings for XML output

JDOM's output operations are much more convenient and intuitive than those of traditional DOM. So far we have seen JDOM's DOM-like side, but JDOM also supports SAX, so SAX can be used for parsing:

    import java.io.File;
    import java.util.List;
    // Assuming JDOM 2 package names; JDOM 1.x uses org.jdom instead
    import org.jdom2.Document;
    import org.jdom2.Element;
    import org.jdom2.input.SAXBuilder;

    SAXBuilder builder = new SAXBuilder();
    File file = new File("resource/demo01.xml");
    Document doc = builder.build(new File(file.getAbsolutePath()));
    // Get the root element
    Element root = doc.getRootElement();
    System.out.println(root.getName());
    // Get all children of the root element
    List<Element> list = root.getChildren();
    System.out.println(list.size());
    for (int x = 0; x < list.size(); x++) {
        Element e = list.get(x);
        // Retrieve the name of the element and its text
        String name = e.getName();
        System.out.println(name + "=" + e.getText());
        System.out.println("==================");
    }

DOM4J parsing

dom4j is a simple open source Java library for processing XML, XPath, and XSLT. It uses the Java collections framework and integrates fully with DOM, SAX, and JAXP. Download locations:

www.dom4j.org/dom4j-1.6.1…

Sourceforge.net/projects/do…

DOM4J, like JDOM, is a free open source XML component, but it is the more widely used of the two in current development frameworks: Hibernate and Spring, for example, use DOM4J for XML handling, so it is worth getting to know. Neither library is strictly better than the other; DOM4J simply appears in frameworks more often than JDOM. DOM4J also offers a number of additional features, such as convenient formatted output.

    import java.io.File;
    import java.util.Iterator;
    import org.dom4j.Document;
    import org.dom4j.Element;
    import org.dom4j.io.SAXReader;

    File file = new File("resource/outputdom4j.xml");
    SAXReader reader = new SAXReader();
    // Read the file into a Document
    Document doc = reader.read(file);
    // Get the root element
    Element root = doc.getRootElement();
    // Iterate over the child elements of the root
    Iterator<Element> iter = root.elementIterator();
    while (iter.hasNext()) {
        Element element = iter.next();
        System.out.println("value = " + element.getText());
    }

Extension: creating XML

DOM creation

To generate an XML file, create the document with the newDocument() method.

Outputting a DOM document is rather cumbersome: several objects have to be set up before anything is written.

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public static void createXml() throws Exception {
        // Get the parser factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Get the parser
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Create the document
        Document doc = builder.newDocument();
        Element root = doc.createElement("people");
        Element person = doc.createElement("person");
        Element name = doc.createElement("name");
        Element age = doc.createElement("age");
        name.appendChild(doc.createTextNode("lebyte"));
        age.appendChild(doc.createTextNode("10"));
        doc.appendChild(root);
        root.appendChild(person);
        person.appendChild(name);
        person.appendChild(age);

        // Write out: get the transformer factory, then the transformer
        TransformerFactory tsf = TransformerFactory.newInstance();
        Transformer ts = tsf.newTransformer();
        // Set the output encoding
        ts.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // Wrap the DOM node as the source tree of the transformation
        DOMSource source = new DOMSource(doc);
        // The file that receives the result
        File file = new File("src/output.xml");
        StreamResult result = new StreamResult(file);
        ts.transform(source, result);
    }
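When experimenting, it can be handy to look at the generated XML without touching the file system. The following is a minimal sketch of the same DOM-plus-Transformer approach that writes into a StringWriter instead of a file; the class and method names (DomToStringDemo, buildPeople, serialize) are hypothetical, and omitting the XML declaration is a choice made here only to keep the output short.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomToStringDemo {

    // Builds a people/person/name+age tree in memory.
    static Document buildPeople(String nameText, String ageText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("people");
            Element person = doc.createElement("person");
            Element name = doc.createElement("name");
            Element age = doc.createElement("age");
            name.appendChild(doc.createTextNode(nameText));
            age.appendChild(doc.createTextNode(ageText));
            doc.appendChild(root);
            root.appendChild(person);
            person.appendChild(name);
            person.appendChild(age);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build document", e);
        }
    }

    // Serializes the document to a string rather than to a file.
    static String serialize(Document doc) {
        try {
            Transformer ts = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> header so only the elements are printed.
            ts.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            ts.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildPeople("lebyte", "10")));
    }
}
```

StreamResult accepts a Writer, an OutputStream, or a File, so switching between an in-memory result and a file on disk only changes the argument passed to it.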

SAX creation

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Result;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerConfigurationException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.AttributesImpl;

    // Create a SAXTransformerFactory object
    SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
    File file = new File("src/output.xml");
    // FileOutputStream creates the file if it does not exist; try-with-resources closes it
    try (FileOutputStream out = new FileOutputStream(file)) {
        // Create a TransformerHandler from the SAXTransformerFactory
        TransformerHandler handler = stf.newTransformerHandler();
        // Get the Transformer from the handler
        Transformer tf = handler.getTransformer();
        // Set the Transformer's properties
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "yes");
        // Create a Result object and associate it with the handler
        Result result = new StreamResult(out);
        handler.setResult(result);

        // Open the document
        handler.startDocument();
        AttributesImpl attr = new AttributesImpl();
        // Create the root element, bookstore
        handler.startElement("", "", "bookstore", attr);
        // Create the book element with an id attribute
        attr.clear();
        attr.addAttribute("", "", "id", "", "1");
        handler.startElement("", "", "book", attr);
        // Create the name element and its text
        attr.clear();
        handler.startElement("", "", "name", attr);
        String title = "Cervical spondylosis rehabilitation guide";
        handler.characters(title.toCharArray(), 0, title.length());
        handler.endElement("", "", "name");
        // Close the book and bookstore elements, then the document
        handler.endElement("", "", "book");
        handler.endElement("", "", "bookstore");
        handler.endDocument();
    } catch (SAXException | IOException | TransformerConfigurationException e) {
        e.printStackTrace();
    }

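JDOM creation

    import java.io.File;
    import java.io.FileOutputStream;
    // Assuming JDOM 2 package names; JDOM 1.x uses org.jdom instead
    import org.jdom2.Attribute;
    import org.jdom2.Document;
    import org.jdom2.Element;
    import org.jdom2.output.XMLOutputter;

    // Create the elements
    Element person = new Element("person");
    Element name = new Element("name");
    Element age = new Element("age");
    // Create the attribute
    Attribute id = new Attribute("id", "1");
    // Set the text
    name.setText("lebyte");
    age.setText("10");
    // Assemble the document: person is the root element
    Document doc = new Document(person);
    person.addContent(name);
    name.setAttribute(id);
    person.addContent(age);
    // Write the document to a file
    XMLOutputter out = new XMLOutputter();
    File file = new File("resource/outputjdom.xml");
    out.output(doc, new FileOutputStream(file.getAbsoluteFile()));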

DOM4J creation

    import java.io.File;
    import java.io.FileOutputStream;
    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.io.OutputFormat;
    import org.dom4j.io.XMLWriter;

    // Use DocumentHelper to create the Document object
    Document document = DocumentHelper.createDocument();
    // Create the elements
    Element person = document.addElement("person");
    Element name = person.addElement("name");
    Element age = person.addElement("age");
    // Set the text
    name.setText("lebyte");
    age.setText("10");
    // Create the output format (pretty-printed, UTF-8)
    OutputFormat format = OutputFormat.createPrettyPrint();
    format.setEncoding("utf-8");
    File file = new File("resource/outputdom4j.xml");
    XMLWriter writer = new XMLWriter(new FileOutputStream(new File(file.getAbsolutePath())), format);
    // Write the document
    writer.write(document);
    writer.flush();
    writer.close();