How To pretty Print XML Files in Python?

Data comes in different formats and every company, even a small start-up, has to deal with it. From Excel to TSV to CSV to text and XML, each format has its own way of storing and presenting data. In this tutorial, we’ll focus on XML files and How to pretty print them in Python. Whether you’re working with inventory data or user information, having a clean and readable XML file can make all the difference. So let’s dive in and learn how to make our XML files look their best!


6 ways to pretty print XML files in python

XML files provide a simple and organized way to store data that can be easily read and understood by computers. In Python, there are different modules that can be used to pretty print XML files, which makes them easier to read and work with. These modules include:

  1. DOM: This library can be imported to pretty print XML files in Python.
  2. ElementTree: This module can be imported using the “etree” function.
  3. BeautifulSoup: This is another library in Python that can be used to pretty print XML files.
  4. lxm: This can be used with the “lxml.etree.parse()” function to pretty print XML files.
  5. Parsers: To use BeautifulSoup and the backend lxml (parser) libraries to pretty print XML files, they need to be installed.
  6. indent(): This function can be used to pretty print XML files.
  7. toprettyxml(): This function can be used to pretty print XML strings.

By using these different modules and functions, it’s possible to make XML files more organized and easier to read, which can be helpful for a variety of programming tasks.

Working on Anaconda navigator -> Jupyter notebook 

1) pretty printing xml files using DOM library

To pretty print XML files using DOM, first, you need to import the xml.dom.minidom library by executing the command “import xml.dom.minidom”. Then, you can read the XML file and pass it as an argument to the parseString method of the library. This method converts the XML string to an object, which can then be pretty-printed using the built-in toprettyxml method. You can store the result in a variable and print it using the print command.

# DOM Library

import xml.dom.minidom

Next, we need to open the XML file and convert it into an object. To do this, we use the “parseString” method from the xml.dom.minidom library. This method converts the XML string to an object.

To open the file and convert it into an object, we write the following code in our Python environment:

import xml.dom.minidom

with open(r"D:\DATA_SCIENCE\python\employee.xml") as xmldata:

    xml = xml.dom.minidom.parseString(xmldata.read())  

Once we have the XML object, we can pretty print it using the “toprettyxml” method. This method will indent the XML object data and return it as a pretty XML string.

To pretty print the XML object, we call the “toprettyxml” method and save it to a unique variable name. The code for this looks like this:

   xml_data_file = xml.toprettyxml()

print(xml_data_file)

Finally, to print the XML data file in Python, we can use the “print” function. To do this, we write the following code in the Python shell:

print(xml_data_file)

This will print the output of the XML file in a readable format. This can be helpful for debugging your program and its output.

2) Using ElementTree to print xml files

To pretty print XML files in Python, we can use the ElementTree XML library, which is a subset of the full ElementTree XML release.

To load the XML content into the ElementTree, we need to import the module and create a Tree Builder object to hold our XML document. We can do this by using the following command:

import xml.etree.ElementTree as ET

Once we have created the Tree Builder object, we can get the root element for the document tree using the “getroot()” method. This returns the root element for the tree.

XML files consist of elements, and tags define these elements in an XML document. The most significant element, which contains information about all other elements, is known as the “root.”

To pretty print the XML data or content in Python, we can use the “tostring” method from the ElementTree library. This method will parse the XML file from the device location and return an ElementTree object. To pretty print the contents of the XML file, we can call “ET.tostring()” and pass in the root element obtained in the previous step. We can also specify the encoding type to be “unicode.”

After we have obtained the pretty printed XML data, we can simply print it using the “print()” command.

To summarize, the steps to pretty print XML files in Python are:

Step 1: Import the ElementTree module and load the XML content into the ElementTree using a Tree Builder object.

Step 2: Get the root element for the document tree using the “getroot()” method.

Step 3: Use “ET.tostring()” to pretty print the XML data, with the root element obtained in Step 2 as an argument.

Step 4: Print the pretty printed XML data using the “print()” command.

import xml.etree.ElementTree as ET

k = ET.parse(r"D:\DATA_SCIENCE\python\employee.xml")

l = k.getroot()

m =  ET.tostring(l, encoding="unicode")

print(m)

This will print the output of the XML file in a readable format. This can be helpful for debugging your program and its output.

How to pretty Print XML Files in Python

3) uSING beautifulsoup TOOL

BeautifulSoup is compatible with Python  3.9.

BeautifulSoup tool is used in Python 3.9 to parse XML and HTML documents.

Step 01:

You need to install BeautifulSoup in your directory before you can use it.

To install it, go to the command prompt and type:

pip install lxml bs4

Step 02:

Then, in your Python environment, you need to import the BeautifulSoup module.

You can do this using the following command:

from bs4 import BeautifulSoup

Step 03:

Importing the open() function is a good idea. Use this syntax:

bs = BeautifulSoup(open(r"D:\DATA_SCIENCE\python\employee.xml"), 'xml')

Step 04:

You can use the prettify() module to make the contents of an XML file look more organized and easier to read.

To do this, use the following code:

pretty_xml = bs.prettify() 

print(pretty_xml)

4) using lxm or using lxml.etree.parse() to Pretty Printing XML files

To prettily print XML files, you can use the lxml.etree.tostring() method. First, parse or read the XML file and return an ElementTree object by using the lxml.etree.parse() method with a path to the XML file as its argument. The ElementTree object is named “tree” in our case.

Afterwards, you can use the method etree.tostring(tree, pretty_print=True, method=’xml’) to pretty print the XML file data by passing the ElementTree object (tree).

Lastly, print the XML file by using the print command.

Here’s an example code:

from lxml import etree

# parse the XML file and get the root element
tree = etree.parse("boat0.xml")

# prettify the XML data and assign to a variable
pretty_print_xml = etree.tostring(tree, pretty_print=True, method='xml')

# print the prettified XML data
print(pretty_print_xml)
b'<annotation>\n    <folder>images</folder>\n    <filename>boat0.png</filename>\n    <size>\n        <width>466</width>\n        <height>418</height>\n        <depth>3</depth>\n    </size>\n    <segmented>0</segmented>\n    <object>\n        <name>boat</name>\n        <pose>Unspecified</pose>\n        <truncated>0</truncated>\n        <occluded>0</occluded>\n        <difficult>0</difficult>\n        <bndbox>\n            <xmin>282</xmin>\n            <ymin>205</ymin>\n            <xmax>333</xmax>\n            <ymax>273</ymax>\n        </bndbox>\n    </object>\n</annotation>\n'

5) Install BeautifulSoup and the backend lxml (parser) libraries

To prettily print an XML file, you need to install two libraries: BeautifulSoup and lxml. The BeautifulSoup module has a parameter called “markup,” which takes in the markup of the file you want to parse or read. This allows BeautifulSoup to return the data in a more beautiful way.

The “markup” parameter requires a specific parser (“lxml”, “html.parser”, “lxml-xml” or “html5lib”) or (“html”, “xml”, “html5”). It is recommended that you specify a parser so that Beautiful Soup produces accurate results.

To install these libraries, run the command “pip install lxml bs4” in your command prompt or Python working environment. After installation, import the BeautifulSoup module from bs4 and use the open() function to read an XML file.

Then, iterate over the loaded XML file using a for loop and pass each element to the BeautifulSoup module with the “lxml-xml” markup parameter to pretty print the XML file using the prettify() function. Finally, print the results.

Here is the code to execute these steps:

from bs4 import BeautifulSoup
# read a file with open() read ‘r’ function
with open(r"D:\DATA_SCIENCE\python\boat0.xml", 'r') as xml:
   for i in xml:
      # iterate a loaded xml file over a BeautifulSoup ‘lxml-xml' to pretty print the xml file using prettify() function. 
      print(BeautifulSoup(i, 'lxml-xml').prettify())

This code will read the “boat0.xml” file, iterate through each element, and use the “lxml-xml” parser to prettily print the XML file using the prettify() function.

<?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?> <annotation/> <?xml version="1.0" encoding="utf-8"?> <folder>  images </folder> <?xml version="1.0" encoding="utf-8"?> <filename>  boat0.png </filename> <?xml version="1.0" encoding="utf-8"?> <size/> <?xml version="1.0" encoding="utf-8"?> <width>  466 </width> <?xml version="1.0" encoding="utf-8"?> <height>  418 </height> <?xml version="1.0" encoding="utf-8"?> <depth>  3 </depth> <?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?> <segmented>  0 </segmented> <?xml version="1.0" encoding="utf-8"?> <object/> <?xml version="1.0" encoding="utf-8"?> <name>  boat </name> <?xml version="1.0" encoding="utf-8"?> <pose>  Unspecified </pose> <?xml version="1.0" encoding="utf-8"?> <truncated>  0 </truncated> <?xml version="1.0" encoding="utf-8"?> <occluded>  0 </occluded> <?xml version="1.0" encoding="utf-8"?> <difficult>  0 </difficult> <?xml version="1.0" encoding="utf-8"?> <bndbox/> <?xml version="1.0" encoding="utf-8"?> <xmin>  282 </xmin> <?xml version="1.0" encoding="utf-8"?> <ymin>  205 </ymin> <?xml version="1.0" encoding="utf-8"?> <xmax>  333 </xmax> <?xml version="1.0" encoding="utf-8"?> <ymax>  273 </ymax> <?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?> 

6) indent() function to Pretty Print the XML files

To print XML files in a pretty format, you can use the indent() function from the xml.etree.ElementTree module in Python. To do this, follow these steps:

  1. Import the xml.etree.ElementTree module as ET.
  2. Create an object tree/element by passing an XML format string to the ET.XML() method.
  3. Call the ET.indent() function and pass the tree argument to it.
  4. Print the XML file using the ET.tostring() function. This function returns the object into the string format.

Here’s an example code:

import xml.etree.ElementTree as ET

tree = ET.XML("<annotation><folder>images</folder><filename>boat0.png</filename><size.<width>466</width><height>418</height><depth>3</depth></size></annotation>")
ET.indent(tree)

print(ET.tostring(tree, encoding='unicode', method= None))
<annotation>
  <folder>images</folder>
  <filename>boat0.png</filename>
  <size>
    <width>466</width>
    <height>418</height>
    <depth>3</depth>
  </size>
</annotation>

7) Pretty print XML string using toprettyxml()

To pretty print XML strings in Python, we can use the toprettyxml() function. This function returns a better version of the file.

To achieve this, we first import the dom module. Then, we define an XML-formatted string and store it in a variable. Using the dot operator, we call the parseString() function from the dom module, passing the XML string variable as an argument.

Once the string is read, we can print the XML string in a pretty version. We can use the print() function for this purpose. Therefore, we can easily pretty print XML files in Python by following these steps.

import xml.dom.minidom
xml_string = "<annotation><folder>images</folder><filename>boat0.png</filename><size><width>466</width><height>418</height><depth>3</depth></size></annotation>"
XML_File = xml.dom.minidom.parseString(xml_string)
pretty_print_xml = XML_File.toprettyxml()
print(pretty_print_xml)
<?xml version="1.0" ?>
<annotation>
	<folder>images</folder>
	<filename>boat0.png</filename>
	<size>
		<width>466</width>
		<height>418</height>
		<depth>3</depth>
	</size>
</annotation>

Conclusion

There are various ways to pretty print XML files in Python, and it’s not a complicated task. The first approach involves using the “dom” module, which is compatible with Python 3 and newer versions. This method is the simplest and easiest way to print XML content in a visually appealing way. However, the other two methods require importing the “etree” and “ElementTree” modules.

If you encounter errors such as “lxml not defined,” you should first install lxml in the command prompt by typing “pip install lxml,” or you can also type “help()” in the Python shell and execute the command for additional assistance.

If you’re interested in learning more about Python programming, you can visit Python Programming Tutorials.

Leave a Comment

Your email address will not be published. Required fields are marked *