Introduction to XML and Parsing XML with Dom4j


A. Introduction to XML

First, let’s take a quick look at what XML is. If you are already familiar with XML, skip section A; if you are in a hurry, go straight to the section on parsing XML with Dom4j.

1. What is XML

XML stands for EXtensible Markup Language
XML is a markup language, much like HTML
XML is designed to transmit data, not display it
XML tags are not predefined; you must define your own tags
XML is designed to be self-descriptive
XML is a W3C recommendation

2. The main role of XML

It is used to store data, and the data is self-descriptive
It can also be used as a configuration file for a project or module
It can also be used as a format for transmitting data over the network (although JSON is now the more common choice)

3. XML and HTML

XML is not a replacement for HTML.
XML and HTML are designed for different purposes:

XML is designed to transport and store data, with the focus on what the data is. HTML is designed to display data, with the focus on how the data looks.

XML tags are case-sensitive, whereas HTML tags are case-insensitive (the browser's HTML parser normalizes tag names to lowercase).

4. XML attributes

XML attributes are very similar to HTML attributes: they provide additional information about an element. A tag can carry multiple attributes, and the value of each attribute must be enclosed in quotes (single or double).
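As a quick illustration, the following sketch parses a tag that carries two attributes and reads both values back. It uses the JDK's built-in DOM parser (javax.xml.parsers) rather than Dom4j so that it runs without extra JARs; the class name AttributeDemo and the edition attribute are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of Dom4j or the original article.
public class AttributeDemo {
    // Parses an XML string with the JDK's built-in DOM parser.
    // Checked parser exceptions are wrapped in an unchecked exception to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the value of the named attribute on the root element ("" if it is absent).
    static String rootAttribute(String xml, String name) {
        Element root = parse(xml).getDocumentElement();
        return root.getAttribute(name);
    }

    public static void main(String[] args) {
        // One tag carrying two attributes; values may use double or single quotes.
        String xml = "<title lang=\"en\" edition='2'>Learning XML</title>";
        System.out.println(rootAttribute(xml, "lang"));    // en
        System.out.println(rootAttribute(xml, "edition")); // 2
    }
}
```

Note that getAttribute returns an empty string, not null, when the attribute does not exist.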

5. XML syntax rules

All XML elements must have a closing tag (or be written as a self-closing tag such as <br/>)
XML tags are case-sensitive
XML elements must be properly nested
XML documents must have a root element

The root element is the top-level element, that is, the only element in the document that has no parent element.

XML attribute values must be quoted
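A conforming parser enforces these rules: a document that breaks one of them is not well-formed, and the parser rejects it. The sketch below checks this with the JDK's built-in DOM parser (not Dom4j); WellFormedCheck and isWellFormed are hypothetical names, and each sample string is invented to break exactly one rule.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical example class, not part of Dom4j or the original article.
public class WellFormedCheck {
    // Returns true if the parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler (which prints "[Fatal Error]" to stderr) with one that just rethrows.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document broke a syntax rule
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<students><student id=\"001\"/></students>", // well-formed
            "<student><name>Yu</student>",                  // missing closing tag
            "<Name>Yu</name>",                              // tags are case-sensitive
            "<a><b></a></b>",                               // improper nesting
            "<a/><b/>",                                     // two top-level elements, no single root
            "<student id=001/>"                             // unquoted attribute value
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```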
Special characters in XML must be written as entity references:

Symbol	Entity reference in XML	Meaning
<	&lt;	Less than
>	&gt;	Greater than
&	&amp;	Ampersand
'	&apos;	Single quote (apostrophe)
"	&quot;	Double quote
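When generating XML text from code, each special character can be replaced with its entity reference from the table. A minimal sketch in plain Java, assuming you want to escape all five characters everywhere: the XML specification only strictly requires escaping < and & in ordinary text (plus > in the sequence ]]> and quotes inside attribute values), but escaping all five is the simplest safe choice. The class name XmlEscape is hypothetical.

```java
// Hypothetical example class, not part of Dom4j or the original article.
public class XmlEscape {
    // Replaces each character that has a predefined XML entity with that entity reference.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: 5 &lt; 6 &amp;&amp; &quot;a&quot; &gt; &apos;b&apos;
        System.out.println(escape("5 < 6 && \"a\" > 'b'"));
    }
}
```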

CDATA sections

A CDATA section tells the XML parser that its content is plain character data, not markup, so the parser does not interpret any XML syntax inside it. A CDATA section is written as <![CDATA[ ... ]]>. For example:


      
<?xml version="1.0" encoding="UTF-8"?>
<!-- encoding specifies the character encoding of the file -->
<students>
    <student id="001">
        <name>Mr.Yu</name>
        <age>21</age>
        <gender><![CDATA[<m>]]></gender>
    </student>

    <student id="002">
        <name>Xiao Ming</name>
        <age>20</age>
        <gender><![CDATA[<m>]]></gender>
    </student>
</students>
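To see what a parser does with a CDATA section, the following sketch parses the <gender> element from the example twice: once with CDATA and once with escaped entities. It uses the JDK's built-in DOM parser rather than Dom4j, and CdataDemo and its methods are hypothetical names. The text content is the same in both cases; only the node type differs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of Dom4j or the original article.
public class CdataDemo {
    // Parses XML with the JDK's DOM parser, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text content of the first <gender> element; CDATA content comes back as plain text.
    static String genderText(String xml) {
        return parse(xml).getElementsByTagName("gender").item(0).getTextContent();
    }

    // True if the first child of the first <gender> element is a CDATA section node.
    static boolean genderIsCdata(String xml) {
        Node first = parse(xml).getElementsByTagName("gender").item(0).getFirstChild();
        return first.getNodeType() == Node.CDATA_SECTION_NODE;
    }

    public static void main(String[] args) {
        String withCdata = "<student><gender><![CDATA[<m>]]></gender></student>";
        String escaped = "<student><gender>&lt;m&gt;</gender></student>";
        System.out.println(genderText(withCdata));    // <m>
        System.out.println(genderText(escaped));      // <m>
        System.out.println(genderIsCdata(withCdata)); // true
        System.out.println(genderIsCdata(escaped));   // false
    }
}
```

The CDATA node survives because DocumentBuilderFactory does not coalesce CDATA into text by default (see setCoalescing).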


B. Parsing XML with Dom4j

1. Tree structure and XML parsing technologies

1.1 Tree structure

Both HTML and XML files are markup documents, so both can be parsed with the DOM (Document Object Model), a standard developed by the W3C.

The XML file corresponding to the tree structure above:

<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

The Document object represents the entire document (either HTML or XML).
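To make the tree structure concrete, this sketch walks a trimmed copy of the bookstore document depth-first, starting from the root element of the Document, and records one line per element, indented by depth. It uses the JDK's org.w3c.dom API rather than Dom4j, and DomTreeWalk, walk and outline are hypothetical names. Whitespace-only text nodes between elements are skipped because only element nodes are recorded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of Dom4j or the original article.
public class DomTreeWalk {
    // Parses XML with the JDK's DOM parser, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk: records each element name, indented two spaces per level.
    static void walk(Node node, String indent, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(indent + node.getNodeName());
            indent = indent + "  ";
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), indent, out);
        }
    }

    // Returns the element outline of the whole document, starting at the root element.
    static List<String> outline(String xml) {
        List<String> out = new ArrayList<>();
        walk(parse(xml).getDocumentElement(), "", out);
        return out;
    }

    public static void main(String[] args) {
        String xml = "<bookstore><book category=\"COOKING\">"
                + "<title lang=\"en\">Everyday Italian</title><price>30.00</price>"
                + "</book></bookstore>";
        outline(xml).forEach(System.out::println);
        // bookstore
        //   book
        //     title
        //     price
    }
}
```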

1.2 XML parsing technologies

Early versions of the JDK provided two XML parsing technologies, DOM and SAX. Both are somewhat dated now, but it is still worth knowing about them.
DOM is specified by the W3C, and many programming languages provide their own implementation of it. Java implements DOM too, in the org.w3c.dom and javax.xml.parsers packages.
Java also supports SAX (Simple API for XML):
- SAX parsing works differently from the W3C DOM approach. It is event-driven: as the parser reads through the document sequentially, it reports what it is currently parsing (the start of an element, text content, the end of an element, and so on) to your code through callbacks. It does not build a tree of DOM objects in memory.
- As a result, SAX generally uses less memory and performs better than DOM parsing, although it cannot navigate back and forth in the document the way a DOM tree can.
Third-party parsers:
- JDOM offers a simpler, more Java-friendly alternative to the W3C DOM API.
- Dom4j began as a fork of JDOM and grew into an independent library.
- Pull parsing is mainly used in Android development. Like SAX it is event-based, but the application pulls each event from the parser instead of receiving callbacks.
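The callback style of SAX described above can be sketched with the JDK's built-in SAXParser and a DefaultHandler subclass. This hypothetical SaxTitles class collects the text of every <title> element as the parser streams through the document, without building a DOM tree:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of Dom4j or the original article.
public class SaxTitles {
    // Collects the text of every <title> element from SAX callbacks; no DOM tree is built.
    static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals("title")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append rather than assign.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("title")) {
                    result.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<bookstore>"
                + "<book><title lang=\"en\">Everyday Italian</title></book>"
                + "<book><title lang=\"en\">Harry Potter</title></book>"
                + "</bookstore>";
        System.out.println(titles(xml)); // [Everyday Italian, Harry Potter]
    }
}
```

The handler keeps only the state it needs (the current title buffer), which is why SAX scales to documents too large to hold in memory as a tree.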

2. Parsing XML with Dom4j

As explained above, Dom4j is a third-party parsing technology, so we need to add its class library to our project before we can use it to parse XML files.
Because Dom4j is not part of the JDK (it is not Sun's technology), we need to download the Dom4j JAR from the official Dom4j website. I have also uploaded the files to CSDN resources, where you can download them directly: download.csdn.net/download/Mr…

After unpacking the download, you will find these directories:

docs: the documentation provided with the library.
lib: the other third-party libraries that Dom4j depends on.
src: the Dom4j source code.

We need dom4j-1.6.1.jar, so import that JAR into the project. Once it is on the classpath, parsing XML with Dom4j takes the following steps:

Create a SAXReader object, which is needed to create the Document object.
Use the SAXReader to read the XML file and obtain the Document object.
Get the root element of the XML from the Document object.
From the root element, get all the <student> elements. element.elements(name) returns the list of child elements of the current Element that have the given name.
Iterate over the <student> elements and read each child element inside them.

The specific code is as follows:

import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class TestMain {
    public static void main(String[] args) {
        try {
            parseXml();
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void parseXml() throws DocumentException {
        // To create a Document object, we first need a SAXReader object
        SAXReader reader = new SAXReader();
        // Use the SAXReader to read the XML file and get the Document object
        Document document = reader.read("05_xml/xml/students.xml");
        // Get the root element of the XML from the Document object
        Element root = document.getRootElement();
        // Get all <student> children of the root; element.elements(name) returns the child elements with that name
        List<Element> students = root.elements("student");
        // Iterate over each <student> element and read its child elements
        for (Element student : students) {
            // Get the id attribute of <student>
            String id = student.attributeValue("id");
            // Get the <name>, <age> and <gender> child elements of <student>
            Element nameElement = student.element("name");
            Element ageElement = student.element("age");
            Element genderElement = student.element("gender");
            // getText() returns the text between the start tag and the end tag
            System.out.println("Student Number:" + id);
            System.out.println("Name:" + nameElement.getText());
            System.out.println("Age:" + ageElement.getText());
            System.out.println("Gender:" + genderElement.getText());
            System.out.println("* * * * * * * * * * * * * * * * * * * * * * * * * * * * *");
        }
    }
}
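For comparison, the same steps can be written without the Dom4j JAR by using the JDK's built-in DOM API. This is a sketch, not the author's code: JdkStudentParser, parseStudents and childText are hypothetical names, and it parses an XML string so that it is self-contained (DocumentBuilder also has a parse(File) overload for reading a file such as students.xml). Unlike Dom4j's element.elements("student"), getElementsByTagName searches all descendants rather than only direct children, which makes no difference for this file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class using only the JDK; not part of Dom4j or the original article.
public class JdkStudentParser {
    // Returns one "id,name,age,gender" line per <student> element.
    static List<String> parseStudents(String xml) {
        Document document;
        try {
            // Steps 1-2: create a parser and read the XML into a Document
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        // Step 3: get the root element
        Element root = document.getDocumentElement();
        // Step 4: get the <student> elements (searches all descendants of the root)
        NodeList students = root.getElementsByTagName("student");
        List<String> rows = new ArrayList<>();
        // Step 5: read the id attribute and the text of each child element
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            rows.add(student.getAttribute("id") + ","
                    + childText(student, "name") + ","
                    + childText(student, "age") + ","
                    + childText(student, "gender"));
        }
        return rows;
    }

    // Text of the first element with the given tag name under parent (CDATA is returned as plain text).
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<students>"
                + "<student id=\"001\"><name>Mr.Yu</name><age>21</age><gender><![CDATA[<m>]]></gender></student>"
                + "<student id=\"002\"><name>Xiao Ming</name><age>20</age><gender><![CDATA[<m>]]></gender></student>"
                + "</students>";
        parseStudents(xml).forEach(System.out::println);
        // 001,Mr.Yu,21,<m>
        // 002,Xiao Ming,20,<m>
    }
}
```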

Parsed XML file:


      
<?xml version="1.0" encoding="UTF-8"?>
<!-- encoding specifies the character encoding of the file -->
<students>
    <student id="001">
        <name>Mr.Yu</name>
        <age>21</age>
        <gender><![CDATA[<m>]]></gender>
    </student>

    <student id="002">
        <name>Xiao Ming</name>
        <age>20</age>
        <gender><![CDATA[<m>]]></gender>
    </student>
</students>



Output of the program: