Abstract: We often need to parse data written in different languages. Python provides many libraries for parsing or splitting data written in other languages. In this Python XML Parser tutorial, you’ll learn how to parse XML using Python.

This article is shared from the Huawei Cloud community post “Learning Python from Scratch | How to Parse and Modify XML in Python?”. The original author is Yuchuan.

Here are all the topics covered in this tutorial:

What is XML?

Python XML Parsing Modules

xml.etree.ElementTree Module

  • Using parse() function
  • Using fromstring() function
  • Finding Elements of Interest
  • Modifying XML files
  • Adding to XML
  • Deleting from XML

xml.dom.minidom Module

  • Using parse() function
  • Using parseString() function
  • Finding Elements of Interest

Let’s get started. 🙂

What is XML?

XML stands for Extensible Markup Language. It is similar in appearance to HTML, but XML is used to describe and carry data, while HTML is used to define how data is displayed. XML is widely used to send and receive data between a client and a server. Consider the following example:

Example:

<?xml version="1.0" encoding="utf-8"?>
<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <description>Two idly's with chutney</description>
    <calories>553</calories>
</food>
<food>
    <item name="breakfast">Paper Dosa</item>
    <price>$2.7</price>
</food>
<food>
    <item name="breakfast">Upma</item>
    <price>$3.65</price>
    <description>Rava upma with bajji</description>
    <calories>600</calories>
</food>
<food>
    <item name="breakfast">Bisi Bele Bath</item>
    <price>$4.50</price>
    <description>Bisi Bele Bath with sev</description>
    <calories>400</calories>
</food>
<food>
    <item name="breakfast">Kesari Bath</item>
    <price>$1.95</price>
    <description>Saffron-sweet rava</description>
    <calories>950</calories>
</food>
</metadata>

The above example shows the contents of a file I named “sample.xml,” and I’ll use the same content for all the upcoming examples in this Python XML parser tutorial.

Python XML Parsing Modules

Python allows you to parse XML documents using two standard-library modules: xml.etree.ElementTree and xml.dom.minidom (a minimal DOM implementation). Parsing means reading the information in a file and breaking it into its component parts, here the elements of an XML document. Let’s take a closer look at how to use these modules to parse XML data.

xml.etree.ElementTree module:

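This module represents XML data as a tree, which is the most natural representation of hierarchical data. The Element type stores this hierarchy in memory, and each element has the following properties: a tag (a string naming the element), attributes (stored in a dictionary), a text string, an optional tail string (text that follows the element’s closing tag), and child elements.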
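The Element properties can be inspected directly. The following is a minimal sketch that uses a tiny made-up fragment (not the full sample.xml) so that it runs on its own:

```python
import xml.etree.ElementTree as ET

# A small, made-up fragment used only to illustrate the Element properties.
snippet = '<food><item name="breakfast">Upma</item> served hot<price>$3.65</price></food>'
food = ET.fromstring(snippet)
item = food[0]

print(item.tag)         # the tag name of the element
print(item.attrib)      # the attributes, as a dictionary
print(item.text)        # the text inside the element
print(repr(item.tail))  # the text that follows the closing tag, if any
print(len(food))        # the number of child elements of <food>
```

Here the tail of the item element is the text " served hot", which sits between the closing item tag and the price tag.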

ElementTree is a class that wraps the structure of elements and allows conversion to and from XML. Now let’s try parsing the above XML file using the Python module.

There are two ways to parse XML using the ElementTree module. The first is the parse() function, and the second is the fromstring() function. The parse() function parses an XML document provided as a file, while fromstring() parses XML provided as a string, for example one written within triple quotes.
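The difference between the two entry points can be sketched as follows. The file name demo.xml and the one-item document are assumptions made for illustration; a temporary directory is used so the example does not depend on sample.xml being present:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml_text = '<metadata><food><item name="breakfast">Upma</item></food></metadata>'

# fromstring() takes the XML text itself and returns the root Element.
root_from_string = ET.fromstring(xml_text)

# parse() takes a file name (or file object) and returns an ElementTree;
# getroot() then gives the root Element.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.xml')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write(xml_text)
    root_from_file = ET.parse(path).getroot()

print(root_from_string.tag, root_from_file.tag)
```

Both calls end up with the same kind of root Element; only parse() gives you an ElementTree object that can later be written back with write().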

Using the parse() function:

As mentioned earlier, this function parses XML supplied as a file. Consider the following example:

Example:

import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()

As you can see, the first thing you need to do is import the xml.etree.ElementTree module. The parse() function then parses the “sample.xml” file. The getroot() method returns the root element of “sample.xml”.

When you execute the above code, no output is displayed, but the absence of an error indicates that the code ran successfully. To check the root element, you can simply print it, as follows:

Example:

import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()
print(myroot)

Output: <Element 'metadata' at 0x033589F0>

The above output indicates that the root element in our XML document is “metadata.”

Using the fromstring() function:

You can also use the fromstring() function to parse string data. To do this, pass the XML as a string, written here within triple quotes, as follows:

import xml.etree.ElementTree as ET

data = '''<?xml version="1.0" encoding="utf-8"?>
<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <description>Two idly's with chutney</description>
    <calories>553</calories>
</food>
</metadata>
'''
myroot = ET.fromstring(data)
# print(myroot)
print(myroot.tag)

The commented-out print(myroot) call would display the root element as before, while print(myroot.tag) prints only its tag. Note that the XML string is only a part of “sample.xml,” which keeps the example short. You can also use the full XML document.

You can retrieve the root tag using the tag attribute, as shown below:

Example:

print(myroot.tag)

Output: metadata

You can also slice the tag string to show only part of it.

Example:

print(myroot.tag[0:4])

Output: meta

As mentioned earlier, tags can also have attributes, which are stored in a dictionary. To check whether the root tag has any attributes, you can use the attrib attribute, as shown below:

Example:

print(myroot.attrib)

Output: {}

As you can see, the output is an empty dictionary because our root tag has no attributes.

Find elements of interest:

The root also has child tags. To retrieve the first child of the root tag, you can use the following command:

Example:

print(myroot[0].tag)

Output: food

Now, if you want to retrieve all the child tags of the root’s first child, you can iterate over it using a for loop, as shown below:

Example:

for x in myroot[0]:
     print(x.tag, x.attrib)

Output:

item {'name': 'breakfast'}
price {}
description {}
calories {}

All the items returned are the child tags of the first food element, together with their attributes.
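The same loop can be extended to every food element rather than only the first. The sketch below uses a shortened inline copy of the menu (an assumption, so that it runs without sample.xml) and records the tag and attributes of each child:

```python
import xml.etree.ElementTree as ET

# Shortened, inline version of the menu used only for illustration.
menu = '''<metadata>
<food><item name="breakfast">Paper Dosa</item><price>$2.7</price></food>
<food><item name="breakfast">Upma</item><price>$3.65</price></food>
</metadata>'''
root = ET.fromstring(menu)

seen = []
for index, food in enumerate(root):   # each direct child of the root
    for child in food:                # each child of that food element
        seen.append((index, child.tag, child.attrib))
        print(index, child.tag, child.attrib)
```

Iterating over an element yields its direct children in document order, which is why nested loops are enough to walk two levels of the tree.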

To extract the text from XML elements with ElementTree, you can use the text attribute. For example, to retrieve all the information about the first food item, use the following code:

Example:

for x in myroot[0]:
        print(x.text)

Output:

Idly
$2.5
Two idly's with chutney
553

As you can see, the text of the first food item has been returned as output. Now, if you want to display every item together with its price, you can use findall() to locate each food element and find() to locate the children you need. (The get() method, by contrast, reads an element’s attributes.)

Example:

for x in myroot.findall('food'):
    item =x.find('item').text
    price = x.find('price').text
    print(item, price)

Output:

Idly $2.5
Paper Dosa $2.7
Upma $3.65
Bisi Bele Bath $4.50
Kesari Bath $1.95

The above output shows all required items and the price for each item. With ElementTree, you can also modify XML files.
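Before turning to modification, here is a small sketch of how get() fits alongside find() and findall(), and of how items with a particular price could be selected. The inline menu, the wanted_price value, and the "type" attribute are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

menu = '''<metadata>
<food><item name="breakfast">Paper Dosa</item><price>$2.7</price></food>
<food><item name="breakfast">Upma</item><price>$3.65</price></food>
</metadata>'''
root = ET.fromstring(menu)

wanted_price = '$3.65'  # illustrative value
matches = []
for food in root.findall('food'):
    item = food.find('item')
    # findtext() returns the text of the first matching child
    if food.findtext('price') == wanted_price:
        # get() reads an attribute; the second argument is a default
        # returned when the attribute (here a hypothetical "type") is absent
        matches.append((item.text, item.get('name'), item.get('type', 'n/a')))

print(matches)
```

Prices are compared as strings here because the sample stores them with a currency symbol; converting them to numbers would need extra parsing.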

Modify the XML file:

You can modify the elements of an XML document. For example, the set() method sets an attribute on an element, and assigning to the text attribute changes its content. Let’s first look at how to add something to XML.

Add to XML:

The following example shows how to add content to the description of each item.

Example:

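for description in myroot.iter('description'):
    # append a note to the existing description text
    new_desc = str(description.text) + ' will be served'
    description.text = new_desc
    # mark the element as changed
    description.set('updated', 'yes')

# write the modified tree to a new file
mytree.write('new.xml')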

The write() function creates a new XML file and writes the updated tree to it. You can also overwrite the original file by passing its name instead. After executing the above code, you should see that a new file has been created with the updated results.

In the new file, each description ends with the added text and carries the attribute updated="yes". To add a new child tag, you can use the SubElement() function. For example, to add a new speciality tag to the first food element, you can do the following:
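To see what write() produces without creating a file on disk, the tree can be written to an in-memory buffer. This is a sketch under the assumption of a one-item document; io.BytesIO stands in for a real file:

```python
import io
import xml.etree.ElementTree as ET

source = '<metadata><food><description>Rava upma with bajji</description></food></metadata>'
tree = ET.ElementTree(ET.fromstring(source))

for description in tree.getroot().iter('description'):
    description.text = description.text + ' will be served'
    description.set('updated', 'yes')

# write() accepts a file name or a file object; BytesIO keeps the result in memory.
buffer = io.BytesIO()
tree.write(buffer, encoding='utf-8', xml_declaration=True)
result = buffer.getvalue().decode('utf-8')
print(result)
```

Passing xml_declaration=True asks write() to emit the XML declaration line, which is otherwise omitted for some encodings.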

Example:

ET.SubElement(myroot[0], 'speciality')
for x in myroot.iter('speciality'):
     new_desc = 'South Indian Special'
     x.text = str(new_desc)
 
mytree.write('output5.xml')

In output5.xml, a new speciality tag has been added under the first food tag. You can add the tag to a different food element by changing the index within the [] brackets. Now let’s take a look at how to use this module to delete items.
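Before moving on, note that SubElement() returns the element it creates, so its text can be set directly without a second loop. The sketch below also shows insert(), which places a child at a chosen position; the note tag and the one-item document are hypothetical:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<metadata><food><item name="breakfast">Upma</item><price>$3.65</price></food></metadata>'
)
food = root[0]

# SubElement() appends a new child and returns it.
speciality = ET.SubElement(food, 'speciality')
speciality.text = 'South Indian Special'

# insert() places an element at a chosen index instead of at the end.
note = ET.Element('note')  # hypothetical tag, for illustration only
note.text = 'served hot'
food.insert(1, note)

print([child.tag for child in food])
```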

Remove from XML:

To remove an attribute using ElementTree, you can call pop() on the element’s attrib dictionary. It removes the named attribute and, when a default such as None is supplied, does nothing if the attribute is absent.

Example:

myroot[0][0].attrib.pop('name', None)
 
# create a new XML file with the results
mytree.write('output5.xml')

In the resulting file, the name attribute has been removed from the first item tag. To remove an entire tag, use the remove() method of its parent, as shown below:

Example:

myroot[0].remove(myroot[0][0])
mytree.write('output6.xml')

In the resulting file, the first child element of the first food tag has been removed. To remove everything inside an element, use the clear() method, as shown below:

Example:

myroot[0].clear()
mytree.write('output7.xml')

When the above code is executed, the first food element is emptied: all of its child tags, text, and attributes are removed, although the empty food tag itself remains. So far in this Python XML parser tutorial, we have been using the xml.etree.ElementTree module. Now let’s look at how to parse XML using minidom.
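Before moving on, the three removal operations can be compared side by side. This sketch rebuilds the same small element for each case; the id attribute and the helper fresh_food() are assumptions added for illustration:

```python
import xml.etree.ElementTree as ET

def fresh_food():
    """Return a new copy of a small, made-up food element."""
    return ET.fromstring(
        '<food id="1"><item name="breakfast">Upma</item><price>$3.65</price></food>'
    )

# attrib.pop() deletes a single attribute and leaves the element in place.
food = fresh_food()
food[0].attrib.pop('name', None)
after_pop = ET.tostring(food, encoding='unicode')

# remove() deletes one child element from its parent.
food = fresh_food()
food.remove(food[0])
after_remove = ET.tostring(food, encoding='unicode')

# clear() drops all children, text, and attributes, keeping only the tag.
food = fresh_food()
food.clear()
after_clear = ET.tostring(food, encoding='unicode')

for label, text in [('pop', after_pop), ('remove', after_remove), ('clear', after_clear)]:
    print(label, text)
```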

xml.dom.minidom module:

This module is mostly used by people who are proficient in the DOM (Document Object Model). DOM applications typically begin by parsing XML into a DOM. In xml.dom.minidom, this can be done in the following ways:

Using the parse() function:

The first way is to use the parse() function, passing the XML file to parse as an argument. For example:

Example:

from xml.dom import minidom
p1 = minidom.parse("sample.xml")

After doing this, you will be able to split the XML file and get the data you need. You can also use this function to parse open files.

Example:

# open the file in a with block so that it is closed after parsing
with open('sample.xml') as dat:
    p2 = minidom.parse(dat)

In this case, the variable that stores the open file is provided as a parameter to the parsing function.

Use the parseString() method:

This method is used when you want to provide XML to be parsed as a string.

Example:


p3 = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')

You can use either of the above methods to parse XML. Now let’s try using this module to get the data.
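To recap the entry points described above, the following sketch parses the same XML once from a string and once from a file-like object. The one-item document is an assumption, and io.BytesIO stands in for an open file so that the example runs without sample.xml:

```python
import io
from xml.dom import minidom

xml_text = '<metadata><food><item name="breakfast">Upma</item></food></metadata>'

# parseString() takes the XML text directly.
doc_from_string = minidom.parseString(xml_text)

# parse() accepts a file name or an open file-like object.
doc_from_file = minidom.parse(io.BytesIO(xml_text.encode('utf-8')))

# documentElement is the root element of the parsed document.
print(doc_from_string.documentElement.tagName)
print(doc_from_file.documentElement.tagName)
```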

Find elements of interest:

After my file has been parsed, if I try to print it, the output returned will show a message that the variable storing the parsed data is a DOM object.

Example:

dat=minidom.parse('sample.xml')
print(dat)

Output:

<xml.dom.minidom.Document object at 0x03B5A308>

Use getElementsByTagName to access elements:

Example:

tagname= dat.getElementsByTagName('item')[0]
print(tagname)

If I try to get the first element using the getElementsByTagName method, I’ll see the following output:

Output:

<DOM Element: item at 0xc6bd00>

Note that only one output is returned because I used the [0] subscript for convenience, which will be removed in further examples.

To access the value of an attribute, I must use the value attribute as follows:

Example:

dat = minidom.parse('sample.xml')
tagname= dat.getElementsByTagName('item')
print(tagname[0].attributes['name'].value)

Output: breakfast

To retrieve the data that exists in these tags, you can use the data attribute, as shown below:

Example:

        
print(tagname[1].firstChild.data)

Output: Paper Dosa

You can also use the value attribute to retrieve the attribute value of any other item.

Example:

items = dat.getElementsByTagName('item')
print(items[1].attributes['name'].value)

Output: breakfast

To print out all the items available in our menu, you can iterate over them and return all the items.

Example:

for x in items:
    print(x.firstChild.data)

Output:

Idly
Paper Dosa
Upma
Bisi Bele Bath
Kesari Bath

To count the number of items on the menu, you can use the len() function, as shown below:

Example:

print(len(items))

Output: 5

This shows that our menu contains five items.
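The minidom steps above can be combined into one self-contained sketch. It uses a shortened two-item menu given inline (an assumption, so the count here is two rather than five) and getAttribute(), which returns an attribute value directly:

```python
from xml.dom import minidom

menu = '''<metadata>
<food><item name="breakfast">Paper Dosa</item><price>$2.7</price></food>
<food><item name="breakfast">Upma</item><price>$3.65</price></food>
</metadata>'''
doc = minidom.parseString(menu)
items = doc.getElementsByTagName('item')

# firstChild is the text node inside each item; data holds its string.
names = [node.firstChild.data for node in items]
# getAttribute() is a shortcut for attributes['name'].value.
kinds = [node.getAttribute('name') for node in items]

print(len(items), names, kinds)
```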

This brings us to the end of this Python XML parser tutorial. I hope you have understood everything clearly.
