HOW TO PRETTY PRINT XML FILES IN PYTHON?

This tutorial shows how to pretty print XML files in Python. In a world full of data, every piece of data is stored and compiled in some format. Every company, even a small start-up, has data. Every inventory store generates data daily. Even Meta (formerly Facebook) generates data daily, even hourly. All of that data is stored in files of a specific format: Excel, TSV (tab-separated values), CSV (comma-separated values), plain text, or XML. Each format is read and printed in a different way. On this page, we will learn how to pretty print XML files in Python. So let’s get on board!

XML files store data in a structured form that is easy for both programs and people to read. Python’s standard library provides core XML support in the xml package. To pretty print XML files in Python, you can use the following modules and functions:

  • dom in Python – import the DOM library (xml.dom.minidom) to pretty print XML files
  • etree in Python – import ElementTree to pretty print XML files
  • BeautifulSoup in Python to pretty print XML files in Python
  • Pretty Printing XML using lxml or using lxml.etree.parse() to Pretty Printing XML files
  • parsers in Python – Install BeautifulSoup and the backend lxml (parser) libraries To Pretty Print XML files
  • indent() function to Pretty Print the XML files
  • Pretty print XML string using toprettyxml()

The examples below were run in a Jupyter notebook launched from Anaconda Navigator.

You can use an XML file from any source as input.

Import the DOM library to pretty print XML files

All you have to do is import the following library.

Step 01:

import xml.dom.minidom

“minidom” is a minimal DOM implementation: it parses an XML file into a document object and can convert that object back to a string. It is part of Python’s standard library, so there is nothing to install. In a Jupyter notebook, press Shift+Enter to execute the code cell.

Step 02:

Then, run the following code in your Python environment:

import xml.dom.minidom
with open(r"D:\DATA_SCIENCE\python\employee.xml") as xmldata:
    # parse into a Document; the name "dom" avoids shadowing the xml package
    dom = xml.dom.minidom.parseString(xmldata.read())

xml_data_file = dom.toprettyxml()
print(xml_data_file)

The process follows:

  • First, open the file with open() and name the file object xmldata.
  • Then call the parseString() function from the xml.dom.minidom library and pass it the file contents, xmldata.read().
  • parseString() is a function that converts an XML string to a Document object.
with open(r"D:\DATA_SCIENCE\python\employee.xml") as xmldata:
    dom = xml.dom.minidom.parseString(xmldata.read())

The Document object has a method called “toprettyxml()” that returns the XML as an indented string. This can be useful for debugging a program and its output.

  • Call dom.toprettyxml() and save the result to a variable.
  • The toprettyxml() method indents the XML data and returns it as a pretty XML string.
    dom = xml.dom.minidom.parseString(xmldata.read())

xml_data_file = dom.toprettyxml()

Print the XML file in Python

Now it is time to print the pretty XML string.

Write the following command in the Python shell:

print(xml_data_file)

The print() call displays the indented XML.
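
One thing to watch for: if the XML file is already indented, toprettyxml() keeps the existing whitespace and adds its own, which leaves blank lines in the output. The sketch below is one possible workaround rather than part of the steps above. The helper name pretty_xml and the sample string are made up for illustration.

```python
import xml.dom.minidom


def pretty_xml(text, indent="  "):
    """Pretty print an XML string and drop whitespace-only lines (hypothetical helper)."""
    dom = xml.dom.minidom.parseString(text)
    pretty = dom.toprettyxml(indent=indent)
    # Lines that contain only whitespace come from the indentation
    # that was already present in the input, so filter them out.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


# Sample input that is already indented (made up for this example)
already_indented = "<a>\n  <b>1</b>\n</a>"
result = pretty_xml(already_indented)
print(result)
```

Filtering lines this way assumes that whitespace between tags carries no meaning in your data.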

ElementTree in Python to pretty print XML files

xml.etree.ElementTree is the ElementTree XML API that ships with Python’s standard library.

Step 01:

To work with an XML document, you first need to load its content into an ElementTree object.

Start by importing the module with the following command:

import xml.etree.ElementTree as ET

Step 02:

Next, parse the XML file with ET.parse(), which returns an ElementTree object, and then get the root element for this tree.

Use getroot(), which returns the root element for this tree.

k = ET.parse(r"D:\DATA_SCIENCE\python\employee.xml")
l = k.getroot()

An XML document is made of elements, and tags delimit each element: a start tag such as <tag> and a matching end tag such as </tag>. The outermost element, which contains all the other elements, is known as the “root.”

Step 03:

Use ET.tostring() to convert the XML content to a string. On its own, ET.tostring() returns the XML as it was parsed and adds no indentation, so call ET.indent() on the root first (available from Python 3.9) to get pretty output.

Call ET.indent(l) and then ET.tostring(l, encoding="unicode"), with “l” as the result of the previous step.

l = k.getroot()
ET.indent(l)
m =  ET.tostring(l, encoding="unicode")

tostring() is a function of the xml.etree.ElementTree module, which is part of the standard library, so nothing needs to be installed.

The last step is to print the string stored in the variable m. This shows all of your elements, indented, in order of appearance.

Step 04:

Then print the pretty XML file in Python using a simple print() command. 

import xml.etree.ElementTree as ET
k = ET.parse(r"D:\DATA_SCIENCE\python\employee.xml")
l = k.getroot()
# add indentation in place (Python 3.9+); tostring() alone does not indent
ET.indent(l)
m =  ET.tostring(l, encoding="unicode")
print(m)
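
To try the ElementTree steps without a file on disk, here is a minimal sketch that parses from an in-memory file object instead. The sample XML and the use of io.StringIO are assumptions made for illustration, and ET.indent() requires Python 3.9 or later.

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for an XML file (sample content made up for this example)
sample = io.StringIO("<employees><employee><name>Ann</name></employee></employees>")

tree = ET.parse(sample)           # returns an ElementTree object
ET.indent(tree, space="  ")       # adds indentation in place
pretty = ET.tostring(tree.getroot(), encoding="unicode")
print(pretty)
```

ET.indent() accepts either the ElementTree or its root element, so the call works at either point in the steps.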

BeautifulSoup in Python to pretty print XML files in Python

BeautifulSoup 4 works with Python 3, including Python 3.9.

Step 01:

You must first install BeautifulSoup and the lxml parser in your Python environment.

Open a command prompt and run:

pip install lxml bs4

Step 02:

Then import BeautifulSoup in your Python shell or Python environment with the following command:

from bs4 import BeautifulSoup

Step 03:

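Then open the XML file with the built-in open() function (no import is needed) and pass the file object to BeautifulSoup together with the 'xml' parser. Use this syntax:

with open(r"D:\DATA_SCIENCE\python\employee.xml") as f:
    bs = BeautifulSoup(f, 'xml')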

Step 04:

To pretty print the contents of an XML file, you can use the prettify() method.

Call it on the BeautifulSoup object like so:

pretty_xml = bs.prettify()
print(pretty_xml)

Pretty Printing XML using lxml or using lxml.etree.parse() to Pretty Printing XML files

Using the lxml.etree.tostring() function, you can pretty print XML files. First call lxml.etree.parse() with the path to the XML file; it reads the file and returns an ElementTree object, which in our case is tree.

Afterwards, invoke pretty_print_xml = etree.tostring(tree, pretty_print=True, method='xml'), passing the ElementTree object (tree), to get the data of the XML source file in pretty form.

Lastly, print the pretty XML through the print command. etree.tostring() returns bytes by default, so decode the result before printing; otherwise print() shows a b'...' literal with \n escapes instead of real line breaks.

from lxml import etree
tree = etree.parse("boat0.xml")
pretty_print_xml = etree.tostring(tree, pretty_print=True, method='xml')
# tostring() returns bytes; decode to get a printable string
print(pretty_print_xml.decode())
<annotation>
    <folder>images</folder>
    <filename>boat0.png</filename>
    <size>
        <width>466</width>
        <height>418</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>boat</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <occluded>0</occluded>
        <difficult>0</difficult>
        <bndbox>
            <xmin>282</xmin>
            <ymin>205</ymin>
            <xmax>333</xmax>
            <ymax>273</ymax>
        </bndbox>
    </object>
</annotation>
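
If the source file is already indented, lxml keeps the existing whitespace, and pretty_print=True may not re-indent it. A common approach is to parse with remove_blank_text=True, sketched here under the assumption that whitespace between tags carries no meaning in your data. The sample string is made up for illustration.

```python
from lxml import etree

# Drop whitespace-only text between tags so that pretty_print can re-indent
parser = etree.XMLParser(remove_blank_text=True)

# Sample input with uneven indentation (made up for this example)
root = etree.fromstring("<a>\n <b>1</b>\n</a>", parser)

# encoding="unicode" makes tostring() return str instead of bytes
pretty = etree.tostring(root, pretty_print=True, encoding="unicode")
print(pretty)
```

The same parser object can be passed to etree.parse() as its second argument when reading from a file.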

Install BeautifulSoup and the backend lxml (parser) libraries To Pretty Print XML files

To pretty print an XML file with BeautifulSoup, import the BeautifulSoup class from the bs4 module. Its constructor takes a markup parameter: a string or a file-like object holding the markup you want to parse or read. BeautifulSoup parses that markup and can return it in a more readable form.

Moreover, the features parameter names either a specific parser (“lxml”, “lxml-xml”, “html.parser”, or “html5lib”) or a type of markup (“html”, “html5”, or “xml”). It is advised that you name a specific parser so that Beautiful Soup gives the same results across environments.

The execution follows:

  • Run pip install lxml bs4 in your system cmd prompt or in your Python working environment.
  • When the installation is done, import BeautifulSoup from bs4.
  • Open the file with open() in read mode ('r').
  • Read the whole file and pass it to BeautifulSoup with the ‘lxml-xml’ parser. Parsing the file line by line in a for..in loop would treat every line as a separate document and lose the nesting.
  • Call prettify() on the result to pretty print the XML file.
from bs4 import BeautifulSoup
# open the file in read mode ('r')
with open(r"D:\DATA_SCIENCE\python\boat0.xml", 'r') as xml:
    # parse the whole document at once with the 'lxml-xml' parser
    soup = BeautifulSoup(xml.read(), 'lxml-xml')
# prettify() returns the parsed document as an indented string
print(soup.prettify())
<?xml version="1.0" encoding="utf-8"?>
<annotation>
 <folder>
  images
 </folder>
 <filename>
  boat0.png
 </filename>
 <size>
  <width>
   466
  </width>
  <height>
   418
  </height>
  <depth>
   3
  </depth>
 </size>
 <segmented>
  0
 </segmented>
 <object>
  <name>
   boat
  </name>
  <pose>
   Unspecified
  </pose>
  <truncated>
   0
  </truncated>
  <occluded>
   0
  </occluded>
  <difficult>
   0
  </difficult>
  <bndbox>
   <xmin>
    282
   </xmin>
   <ymin>
    205
   </ymin>
   <xmax>
    333
   </xmax>
   <ymax>
    273
   </ymax>
  </bndbox>
 </object>
</annotation>

indent() function to Pretty Print the XML files

The indent() function of xml.etree.ElementTree, available from Python 3.9, adds whitespace to a tree in place so that it prints as indented XML.

Here is how to pretty print xml files in Python using indent():

  • Import xml.etree.ElementTree as ET.
  • Create an element by passing an XML-formatted string to ET.XML().
  • Invoke ET.indent() and pass the element to it.
  • Print the XML using the ET.tostring() function, which returns the element as a string.
import xml.etree.ElementTree as ET
tree = ET.XML("<annotation><folder>images</folder><filename>boat0.png</filename><size><width>466</width><height>418</height><depth>3</depth></size></annotation>")
ET.indent(tree)
print(ET.tostring(tree, encoding='unicode'))
<annotation>
  <folder>images</folder>
  <filename>boat0.png</filename>
  <size>
    <width>466</width>
    <height>418</height>
    <depth>3</depth>
  </size>
</annotation>
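
The same approach also works when the pretty XML should go to a file instead of the screen. The following is a minimal sketch, assuming a four-space indent and a temporary output path. The file name pretty.xml and the sample XML are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.XML("<size><width>466</width><height>418</height></size>")
ET.indent(root, space="    ")  # four spaces per level instead of the default two

# Hypothetical output location inside a fresh temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "pretty.xml")

# Wrap the element in an ElementTree to write it with an XML declaration
ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

The space argument of ET.indent() controls the indentation string, so passing "\t" would indent with tabs.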

Pretty print XML string using toprettyxml()

The toprettyxml() method pretty prints an XML string: it returns an indented version of a parsed document.

The execution proceeds as follows:

  • Import the xml.dom.minidom module.
  • Define an XML-formatted string in a variable.
  • Call the module’s parseString() function using the dot (.) operator and pass the XML string variable as its argument.
  • After parsing the string, call toprettyxml() on the result to get the pretty version.
  • Use the print() function to print it. In this way, you can pretty print XML in Python.
import xml.dom.minidom
xml_string = "<annotation><folder>images</folder><filename>boat0.png</filename><size><width>466</width><height>418</height><depth>3</depth></size></annotation>"
XML_File = xml.dom.minidom.parseString(xml_string)
pretty_print_xml = XML_File.toprettyxml()
print(pretty_print_xml)
<?xml version="1.0" ?>
<annotation>
	<folder>images</folder>
	<filename>boat0.png</filename>
	<size>
		<width>466</width>
		<height>418</height>
		<depth>3</depth>
	</size>
</annotation>
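
As the output above shows, toprettyxml() indents with tabs by default and, when called on the document, starts with an XML declaration. The sketch below shows one way to change both, assuming you prefer two-space indentation and no declaration. The shortened sample string is made up for illustration.

```python
import xml.dom.minidom

xml_string = "<annotation><folder>images</folder></annotation>"
document = xml.dom.minidom.parseString(xml_string)

# Calling toprettyxml() on the root element (documentElement) instead of the
# document skips the XML declaration; indent sets the indentation string.
pretty = document.documentElement.toprettyxml(indent="  ")
print(pretty)
```

toprettyxml() also accepts a newl argument for the line separator.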

Conclusion

Pretty printing XML files in Python is not a difficult task. This tutorial covered different methods to print XML content in a Python environment. The first method uses the “dom” module (xml.dom.minidom), which works on Python 3 and later versions with nothing to install. It’s the simplest and easiest way to pretty print XML content in Python. The ElementTree methods need the xml.etree.ElementTree module, and the remaining methods need the lxml and BeautifulSoup packages.

If you get errors like “No module named 'lxml',” install lxml first by typing pip install lxml in the cmd prompt, or write help() in the Python shell for more help.

If you want to learn more about Python Programming, Visit Python Programming Tutorials.
