How to read xml python

Reading and Writing XML Files in Python

Extensible Markup Language, commonly known as XML, is a markup language designed to be easy for both humans and computers to interpret. It defines a set of rules for encoding a document in a specific format. This article describes methods for reading and writing XML files in Python.

Note: In general, the process of reading data from an XML file and analyzing its logical components is known as parsing. Therefore, when we refer to reading an XML file, we are referring to parsing the XML document.

In this article, we will look at two libraries that can be used to parse XML: BeautifulSoup (together with the lxml parser) and ElementTree.

Using BeautifulSoup with the lxml parser

To read and write the XML file we will use a Python library named BeautifulSoup. To install it, run pip install beautifulsoup4 in the terminal.

Beautiful Soup supports the HTML parser included in Python's standard library, but it also supports a number of third-party Python parsers. One is the lxml parser (used for parsing XML/HTML documents). lxml can be installed by running pip install lxml in the command processor of your operating system.

First we will learn how to read from an XML file and parse the data stored in it. Later we will learn how to create an XML file and write data to it.

Reading Data From an XML File

Parsing an XML file takes two steps: finding the tags of interest, and extracting data from them.
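The two steps can be sketched as follows. This is a minimal example rather than the article's original code: the sample document and its tag names (catalog, child, unique) are made up for illustration, and it assumes both beautifulsoup4 and lxml are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; the article's original file is not shown.
xml_data = """<catalog>
    <child name="Frank" test="0">FRANK likes EVERYONE</child>
    <unique>Add a video URL in here</unique>
    <child name="Texas" test="1">TEXAS is a PLACE</child>
</catalog>"""

# The "xml" feature asks Beautiful Soup to use lxml's XML parser.
soup = BeautifulSoup(xml_data, "xml")

# Step 1: find tags.
children = soup.find_all("child")                # every <child> tag
unique = soup.find("unique")                     # first <unique> tag
texas = soup.find("child", {"name": "Texas"})    # filter by attribute

# Step 2: extract data from the tags.
names = [tag["name"] for tag in children]
print(names)
print(unique.get_text())
print(texas.get_text())
```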


Writing an XML File

Writing an XML file is a straightforward process, because XML files are plain text and are not encoded in any special way. Modifying sections of an XML document requires parsing it first. In this example we modify some sections of the XML document used above.


Using ElementTree

The ElementTree module provides a plethora of tools for manipulating XML files. Best of all, it is included in Python's standard library, so no external modules need to be installed. Because XML is an inherently hierarchical data format, it is natural to represent it as a tree. ElementTree provides methods to represent a whole XML document as a single tree.

In the examples that follow, we look at separate methods for reading data from and writing data to XML files.

Reading XML Files

To read an XML file using ElementTree, we first import the ElementTree module from the xml.etree package under the name ET (a common convention). We then pass the file name of the XML file to ET.parse() to parse it, and get the root (parent tag) of the file with getroot(). Next we print the root element. We then display the attributes of the first sub-tag of the root using root[0].attrib: root[0] selects the first child of the root, and attrib returns its attributes. Finally, we display the text enclosed within the first sub-tag of the fifth sub-tag of the root.
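The steps above can be sketched as follows. The sample document is hypothetical and is parsed from a string so the example is self-contained; for a file on disk, ET.parse("file.xml").getroot() gives the same kind of root element. Because the sample has only two children, it reads the first sub-tag of the second child rather than the fifth.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not the article's original file).
xml_data = """<catalog>
    <book id="b1"><title>First</title></book>
    <book id="b2"><title>Second</title></book>
</catalog>"""

root = ET.fromstring(xml_data)   # root element of the parsed tree

print(root.tag)          # name of the root tag
print(root[0].attrib)    # attributes of the first child, as a dict
print(root[1][0].text)   # text of the first sub-tag of the second child
```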


Writing XML Files

Now we will look at some methods that can be used to write data to an XML document. In this example we create an XML file from scratch.
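A minimal sketch of building a document from scratch with ElementTree. The tag names and the output file name (catalog.xml) are illustrative assumptions, and the file is written to a temporary directory so nothing is overwritten.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build the tree in memory: <catalog><book id="b1"><title>First</title></book></catalog>
root = ET.Element("catalog")
book = ET.SubElement(root, "book", {"id": "b1"})
title = ET.SubElement(book, "title")
title.text = "First"

# Wrap the root in an ElementTree and write it to disk.
out_path = os.path.join(tempfile.mkdtemp(), "catalog.xml")
ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

# Read the file back to confirm what was written.
reloaded = ET.parse(out_path).getroot()
print(reloaded[0].get("id"), reloaded[0][0].text)
```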

How to Read XML File with Python and Pandas

In this quick tutorial, we'll cover how to read an XML file into a Pandas DataFrame or a Python data structure.

The short solution is a single call to pd.read_xml() with the path of the XML file.

With that single call we can convert an XML file to a Pandas DataFrame, and from there to a Python structure.

Below we cover multiple examples in greater detail.

Setup

Suppose we have a simple XML file with the following structure:

which we would like to read as a Pandas DataFrame like the one shown below:

   loc                         lastmod               changefreq
0  https://example.com/item-1  2022-06-02T00:00:00Z  weekly
1  https://example.com/item-2  2022-06-02T11:34:37Z  weekly
2  https://example.com/item-3  2022-06-03T19:24:47Z  weekly

or to get the links as a Python list.

Step 1: Read local XML File with read_xml()

The read_xml() method is described in the official pandas documentation, under pandas.read_xml.

To read a local XML file in Python we can pass the absolute path of the file:
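A minimal sketch of reading a local file, assuming pandas 1.3 or later is installed. The file content, tag names (urlset, url) and file name are assumptions chosen to match the table in this article; parser="etree" is used so the example does not depend on lxml.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file content shaped like the table shown in this article.
xml_data = """<urlset>
  <url><loc>https://example.com/item-1</loc><lastmod>2022-06-02T00:00:00Z</lastmod><changefreq>weekly</changefreq></url>
  <url><loc>https://example.com/item-2</loc><lastmod>2022-06-02T11:34:37Z</lastmod><changefreq>weekly</changefreq></url>
  <url><loc>https://example.com/item-3</loc><lastmod>2022-06-03T19:24:47Z</lastmod><changefreq>weekly</changefreq></url>
</urlset>"""

# Write the sample to a temporary file so we have an absolute path to read.
xml_path = os.path.join(tempfile.mkdtemp(), "sitemap.xml")
with open(xml_path, "w", encoding="utf-8") as f:
    f.write(xml_data)

# Each child of the root element becomes one row of the DataFrame.
df = pd.read_xml(xml_path, parser="etree")
print(df)
```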

The result will be:

   loc                         lastmod               changefreq
0  https://example.com/item-1  2022-06-02T00:00:00Z  weekly
1  https://example.com/item-2  2022-06-02T11:34:37Z  weekly
2  https://example.com/item-3  2022-06-03T19:24:47Z  weekly

The method has several useful parameters, including xpath, namespaces, elems_only, attrs_only, names, encoding, and parser.

Step 2: Read remote XML File with read_xml()

The first parameter of read_xml() is path_or_buffer, which the documentation describes as:

String, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be any valid XML string or a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file.

So we can read remote files the same way:

Step 3: Read XML File to Python list or dict

Now suppose you need to convert an XML file to a Python list or dictionary.

We can first convert the file to a DataFrame and then get the values from that DataFrame:
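A short sketch of that conversion, assuming pandas 1.3 or later. The sample XML and its tag names are illustrative, and the data is read from an in-memory buffer to keep the example self-contained.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data; tag names are assumptions for illustration.
xml_data = """<urlset>
  <url><loc>https://example.com/item-1</loc><changefreq>weekly</changefreq></url>
  <url><loc>https://example.com/item-2</loc><changefreq>weekly</changefreq></url>
</urlset>"""

df = pd.read_xml(StringIO(xml_data), parser="etree")

links = df["loc"].tolist()         # one column as a Python list
records = df.to_dict("records")    # list of dicts, one per row
by_column = df.to_dict("list")     # dict mapping column name to list of values

print(links)
print(records)
```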


Step 4: Read multiple remote XML Files in Python

Finally, let's see how to read multiple XML files that share the same structure with Python and Pandas.

Suppose that the files all share the following format:

We can use the following code to read all files in a given range and concatenate them into a single DataFrame:

The result is a list of DataFrames which can be concatenated into a single one by:
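A minimal sketch of the read-then-concatenate pattern, assuming pandas 1.3 or later. To stay runnable offline it builds the documents in memory with a hypothetical make_page() helper; with real remote files, each StringIO(...) would be replaced by the file's URL.

```python
from io import StringIO

import pandas as pd


def make_page(n):
    """Hypothetical stand-in for the n-th remote XML file."""
    return (
        "<urlset><url>"
        f"<loc>https://example.com/page-{n}</loc>"
        "<changefreq>weekly</changefreq>"
        "</url></urlset>"
    )


# Read every document in the range into its own DataFrame.
dfs = [pd.read_xml(StringIO(make_page(n)), parser="etree") for n in range(1, 4)]

# Concatenate the list into one DataFrame with a fresh 0..n-1 index.
df_all = pd.concat(dfs, ignore_index=True)
print(df_all)
```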

Now we have the information from all XML files in df_all.

It can be installed by:

To read XML file we can do:

The result is a dict:

Accessing elements can be done by:

Conclusion

In this article we covered several ways to read XML files with Python and Pandas. Now we know how to read local or remote XML files, using two Python libraries.

Reading and Writing XML Files in Python

XML, or Extensible Markup Language, is a markup language that is commonly used to structure, store, and transfer data between systems. While not as common as it used to be, it is still used in services like RSS and SOAP, as well as for structuring files like Microsoft Office documents.

With Python being a popular language for the web and data analysis, it’s likely you’ll need to read or write XML data at some point, in which case you’re in luck.

Throughout this article we’ll primarily take a look at the ElementTree module for reading, writing, and modifying XML data. We’ll also compare it with the older minidom module in the first few sections so you can get a good comparison of the two.

The XML Modules

The ElementTree module provides a more "Pythonic" interface for handling XML and is a good option for those not familiar with the DOM. It is also likely a better candidate for novice programmers due to its simple interface, which you'll see throughout this article.

In this article, the ElementTree module will be used in all examples, whereas minidom will also be demonstrated, but only for counting and reading XML documents.

XML File Example

In the examples below, we will be using the following XML file, which we will save as "items.xml":

As you can see, it’s a fairly simple XML example, only containing a few nested objects and one attribute. However, it should be enough to demonstrate all of the XML operations in this article.

Reading XML Documents

Using minidom

To parse an XML document with minidom, we call its parse() function with the file name. Here the file name can be a string containing the file path or a file-like object. The function returns a document object, which can be handled as an XML type. Thus, we can use the getElementsByTagName() method to find a specific tag.

Since each node can be treated as an object, we can access the attributes and text of an element using the properties of the object. In the example below, we have accessed the attributes and text of a specific node, and of all nodes together.
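A minimal sketch of that kind of access. The XML below is an assumed reconstruction of "items.xml" based on the description in this article (nested items with a single name attribute), and it is parsed with parseString() so the example does not need a file on disk.

```python
from xml.dom import minidom

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

doc = minidom.parseString(xml_data)
items = doc.getElementsByTagName("item")

# Attribute of one specific node, then of all nodes.
second_name = items[1].attributes["name"].value
all_names = [elem.attributes["name"].value for elem in items]

# Text of one specific node, then of all nodes.
second_text = items[1].firstChild.data
all_text = [elem.firstChild.data for elem in items]

print(second_name, all_names)
print(second_text, all_text)
```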

The result is as follows:

If we wanted to use an already-opened file, we can just pass our file object to parse like so:

Also, if the XML data was already loaded as a string then we could have used the parseString() function instead.

Using ElementTree

ElementTree presents us with a very simple way to process XML files. As always, in order to use it we must first import the module. In our code we use the import command with the as keyword, which allows us to use a simplified name (ET in this case) for the module in the code.

Following the import, we create a tree structure with the parse function, and we obtain its root element. Once we have access to the root node we can easily traverse around the tree, because a tree is a connected graph.

The code is as follows:
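A minimal sketch of the approach, using an assumed reconstruction of "items.xml" parsed from a string; with a file on disk, ET.parse("items.xml").getroot() would return the same kind of root element.

```python
import xml.etree.ElementTree as ET

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

root = ET.fromstring(xml_data)

# Children are reached by index: root[0] is <items>, root[0][0] the first <item>.
first_attrib = root[0][0].attrib                       # a plain dict
all_attribs = [elem.attrib for elem in root.iter("item")]

first_text = root[0][0].text
all_text = [elem.text for elem in root.iter("item")]

print(first_attrib, all_attribs)
print(first_text, all_text)
```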

The result will be as follows:

As you can see, this is very similar to the minidom example. One of the main differences is that the attrib object is simply a dictionary object, which makes it a bit more compatible with other Python code. We also don’t need to use value to access the item’s attribute value like we did before.

You may have noticed how accessing objects and attributes with ElementTree is a bit more Pythonic, as we mentioned before. This is because the XML data is parsed as simple lists and dictionaries, unlike with minidom where the items are parsed as custom xml.dom.minidom.Attr and "DOM Text nodes".

Counting the Elements of an XML Document

Using minidom

Keep in mind that this will only count the number of child items under the node you execute len() on, which in this case is the root node. If you want to find all sub-elements in a much larger tree, you'd need to traverse all elements and count each of their children.
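A minimal sketch of counting with minidom, assuming the same reconstructed "items.xml" content. Here len() is applied to the node list returned by getElementsByTagName().

```python
from xml.dom import minidom

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

doc = minidom.parseString(xml_data)
items = doc.getElementsByTagName("item")

# A NodeList supports len(), so counting matches is a single call.
item_count = len(items)
print(item_count)
```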

Using ElementTree

Similarly, the ElementTree module allows us to calculate the number of nodes connected to a node.
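A minimal sketch under the same assumed "items.xml" content. len() on an element counts only its direct children; iterating with iter() is one way to count every element in the tree.

```python
import xml.etree.ElementTree as ET

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

root = ET.fromstring(xml_data)

direct_children = len(root[0])                 # <item> elements under <items>
total_elements = sum(1 for _ in root.iter())   # every element, root included

print(direct_children, total_elements)
```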

The result is as follows:

Writing XML Documents

Using ElementTree

ElementTree is also great for writing data to XML files. The code below shows how to create an XML file with the same structure as the file we used in the previous examples.

SubElement(parent, tag, attrib={}, **extra)

Here parent is the parent node to connect to, tag is the name of the new element, attrib is a dictionary containing the element attributes, and extra are additional keyword arguments. This function returns an element, which can be used to attach other sub-elements, as we do in the following lines by passing items to the SubElement function.
3. Although we can add our attributes with the SubElement function, we can also use the set() method, as we do in the following code. The element text is set through the text property of the Element object.
4. In the last lines of the code we create a string out of the XML tree, and we write that data to a file we open.
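The steps above can be sketched as follows. The element names follow the structure this article describes for "items.xml"; writing to a temporary directory is an assumption made here so the example does not touch existing files.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Create the root element, then attach children with SubElement.
data = ET.Element("data")
items = ET.SubElement(data, "items")
item1 = ET.SubElement(items, "item")
item2 = ET.SubElement(items, "item")

# Attributes via set(), text via the text property.
item1.set("name", "item1")
item2.set("name", "item2")
item1.text = "item1abc"
item2.text = "item2abc"

# Serialize the tree (tostring returns bytes by default) and write it out.
xml_bytes = ET.tostring(data)
out_path = os.path.join(tempfile.mkdtemp(), "items2.xml")
with open(out_path, "wb") as f:
    f.write(xml_bytes)
```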

Executing this code will result in a new file, "items2.xml", which should be equivalent to the original "items.xml" file, at least in terms of the XML data structure. You'll probably notice, however, that the resulting string is only one line and contains no indentation.

Finding XML Elements

Using ElementTree

In addition, there is another helper function that returns the text of the first node that matches the given criterion:

Here is some example code to show you exactly how these functions operate:
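A minimal sketch of ElementTree's lookup methods: findall() returns all matches, find() returns the first match, and findtext() returns the text of the first match. The XML is an assumed reconstruction of "items.xml".

```python
import xml.etree.ElementTree as ET

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

root = ET.fromstring(xml_data)

all_items = root.findall("./items/item")         # list of every matching element
first_item = root.find("./items/item")           # first match, or None
first_text = root.findtext("./items/item")       # text of the first match
by_attr = root.find(".//item[@name='item2']")    # filter on an attribute value

print(len(all_items), first_item.get("name"), first_text, by_attr.text)
```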

And here is the result of running this code:

Modifying XML Elements

Using ElementTree

The ElementTree module presents several tools for modifying existing XML documents. The example below shows how to change the name of a node, change the name of an attribute and modify its value, and how to add an extra attribute to an element.

A node's text can be changed by assigning a new value to the text field of the node object. An attribute's value can be set with the set(name, value) method. The set method doesn't have to work on an existing attribute; it can also be used to define a new attribute.

The code below shows how to perform these operations:
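A minimal sketch of these modifications on an assumed reconstruction of "items.xml". The new values (newitem, new text, newitem2) mirror the names used in this article; the result is written to a temporary directory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

tree = ET.ElementTree(ET.fromstring(xml_data))
root = tree.getroot()

# Take a list first so renaming tags does not interfere with iteration.
for elem in list(root.iter("item")):
    elem.tag = "newitem"              # rename the node
    elem.text = "new text"            # replace its text
    elem.set("name", "newitem")       # overwrite an existing attribute
    elem.set("name2", "newitem2")     # add a new attribute

out_path = os.path.join(tempfile.mkdtemp(), "newitems.xml")
tree.write(out_path)
```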

After running the code, the resulting XML file "newitems.xml" will have an XML tree with the following data:

As we can see when comparing with the original XML file, the names of the item elements have changed to "newitem", the text to "new text", and the attribute "name2" has been added to both nodes.

You may also notice that XML data written this way (calling tree.write with a file name) contains newlines and indentation. This is because ElementTree preserves the whitespace it read from the parsed file; write() itself does not add any formatting.

Creating XML Sub-Elements

Using ElementTree

The ElementTree module has more than one way to add a new element. The first way we’ll look at is by using the makeelement() function, which has the node name and a dictionary with its attributes as parameters.

The second way is through the SubElement() function, which takes the parent element, the tag name, and a dictionary of attributes as inputs.

In our example below we show both methods. In the first case the node has no attributes, so we created an empty dictionary (attrib = {}). In the second case, we use a populated dictionary to create the attributes.
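A minimal sketch of both approaches on an assumed reconstruction of "items.xml". The attribute value seconditem2 is illustrative. Note that the standard-library documentation recommends SubElement() over calling makeelement() directly.

```python
import xml.etree.ElementTree as ET

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

root = ET.fromstring(xml_data)

# Way 1: makeelement() creates a detached element, which we then append.
attrib = {}
seconditems = root.makeelement("seconditems", attrib)
root.append(seconditems)

# Way 2: SubElement() creates the element and attaches it in one step.
attrib = {"name2": "seconditem2"}
seconditem = ET.SubElement(seconditems, "seconditem", attrib)
seconditem.text = "seconditemabc"

print(ET.tostring(root, encoding="unicode"))
```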

After running this code the resulting XML file will look like this:

As we can see when comparing with the original file, the "seconditems" element and its sub-element "seconditem" have been added. In addition, the "seconditem" node has "name2" as an attribute, and its text is "seconditemabc", as expected.

Deleting XML Elements

Using ElementTree

As you'd probably expect, the ElementTree module has the necessary functionality to delete a node's attributes and sub-elements.

Deleting an attribute
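The original snippet for this step is not shown here. One common way to delete an attribute, sketched below on an assumed reconstruction of "items.xml", relies on attrib being a regular dictionary, so pop() removes a key from it.

```python
import xml.etree.ElementTree as ET

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

root = ET.fromstring(xml_data)

# Remove "name" from the first <item>; the default avoids a KeyError
# if the attribute is absent.
removed = root[0][0].attrib.pop("name", None)

print(removed, root[0][0].attrib, root[0][1].attrib)
```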

The result will be the following XML file:

As we can see in the XML code above, the first item has no attribute "name".

Deleting one sub-element

One specific sub-element can be deleted using the remove() method, which takes the node that we want to remove as its argument.

The following example shows us how to use it:
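A minimal sketch of remove() on an assumed reconstruction of "items.xml". It removes the second item, which is an assumption based on the description in this section.

```python
import xml.etree.ElementTree as ET

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

root = ET.fromstring(xml_data)
items = root[0]

# remove() is called on the parent and takes the child element itself.
items.remove(items[1])

remaining = [elem.get("name") for elem in items]
print(remaining)
```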

The result will be the following XML file:

As we can see from the XML code above, there is now only one "item" node. The second one has been removed from the original tree.

Deleting all sub-elements

The ElementTree module presents us with the clear() function, which can be used to remove all sub-elements of a given element.

The example below shows us how to use clear() :
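A minimal sketch of clear() on an assumed reconstruction of "items.xml". Besides removing all sub-elements, clear() also resets the element's own attributes and text.

```python
import xml.etree.ElementTree as ET

# Assumed shape of "items.xml", reconstructed from the article's description.
xml_data = """<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>"""

root = ET.fromstring(xml_data)
items = root[0]

children_before = len(items)
items.clear()                 # drops children, attributes, text and tail
children_after = len(items)

print(children_before, children_after)
```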

The result will be the following XML file:

As we can see in the XML code above, all sub-elements of the "items" element have been removed from the tree.

Wrapping Up

Python offers several options to handle XML files. In this article we have reviewed the ElementTree module, and used it to parse, create, modify and delete XML files. We have also used the minidom module to parse XML files. Personally, I'd recommend using the ElementTree module as it is much easier to work with and is the more modern module of the two.

How to read XML file in Python

In this article, we will learn various ways to read XML files in Python. We will use some built-in modules and libraries available in Python and some related custom examples as well. Let’s first have a quick look over the full form of XML, introduction to XML, and then read about various parsing modules to read XML documents in Python.

Introduction to XML

In this article, we will look at four different ways to read XML documents using different XML modules. They are:

1. MiniDOM (Minimal Document Object Model)

2. BeautifulSoup alongside the lxml parser

3. ElementTree

4. Simple API for XML (SAX)

XML File: We are using this XML file to read in our examples.

Read XML File Using MiniDOM

minidom is a Python module used to read XML files. It provides the parse() function for this purpose. We must import minidom before using its functions in the application. The syntax of this function is given below.

Syntax

This function returns a document of XML type.

Example Read XML File in Python

Since each node is treated as an object, we can access the attributes and text of an element using the properties of the object. In the example below, we access the attributes and text of a selected node.

model #2 attribute:
model2
All attributes:
model1
model2
model #2 data:
model2abc
model2abc
All model data:
model1abc
model2abc

Read XML File Using BeautifulSoup alongside the lxml parser

Install both libraries with pip (pip install beautifulsoup4 and pip install lxml). After successful installation, import them in your Python code.

We are using this XML file to read with Python code.

Example Read XML File in Python

Let's read the above file using the BeautifulSoup library in a Python script.

Read XML File Using Element Tree

The ElementTree module provides multiple tools for manipulating XML files, and no installation is required. Because XML is a hierarchical data format, it is natural to represent it as a tree. ElementTree represents the whole XML document as a single tree.

Example Read XML File in Python

Read XML File Using Simple API for XML (SAX)

In this method, we first register callbacks for the events that occur, and then the parser proceeds through the document. This can be useful when documents are large or memory is limited, because the parser processes the file as it reads it from disk and the entire file is never stored in memory. Reading XML using this method requires creating a handler by subclassing xml.sax.ContentHandler.
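A minimal sketch of such a handler. The document structure (catalog, model, price, qty, company) and the ModelHandler class are assumptions modeled on the sample output shown in this section, not the article's original file or code.

```python
import xml.sax

# Hypothetical document shaped like the sample output in this section.
xml_data = b"""<catalog>
    <model number="ST001"><price>35000</price><qty>12</qty><company>Samsung</company></model>
    <model number="RW345"><price>46500</price><qty>14</qty><company>Onida</company></model>
</catalog>"""


class ModelHandler(xml.sax.ContentHandler):
    """Collects one dict per <model> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.models = []
        self._current = None
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "model":
            self._current = {"number": attrs["number"]}
        self._buffer = []   # start collecting text for this element

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        self._buffer.append(content)

    def endElement(self, name):
        if name in ("price", "qty", "company") and self._current is not None:
            self._current[name] = "".join(self._buffer).strip()
        elif name == "model":
            self.models.append(self._current)
            self._current = None
        self._buffer = []


handler = ModelHandler()
xml.sax.parseString(xml_data, handler)

for model in handler.models:
    print(model)
```

For a file on disk, xml.sax.parse(path, handler) drives the same handler without loading the whole document into memory.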

Note: xml.sax is part of the Python 3 standard library. Older examples written for Python 2 may need small changes, such as using print() as a function.

XML file

Python Code Example

*****Model*****
Model number: ST001
Price: 35000
Quantity: 12
Company: Samsung
*****Model*****
Model number: RW345
Price: 46500
Quantity: 14
Company: Onida
*****Model*****
Model number: EX366
Price: 30000
Quantity: 8
Company: Lenovo
*****Model*****
Model number: FU699
Price: 45000
Quantity: 12
Company: Acer

Conclusion

Reading and Writing XML Files in Python

XML, or Extensible Markup Language, is a markup language often used to structure, store, and transfer data between systems. Although not as common as it once was, it is still used in services such as RSS and SOAP, and for structuring files such as Microsoft Office documents.

Since Python is a popular language for the web and for data analysis, you will probably need to read or write XML data at some point, in which case you are in luck.

Throughout this article we will primarily look at the ElementTree module for reading, writing, and modifying XML files. We will also compare it with the older minidom module in the first few sections.

The XML Modules

The ElementTree module offers a more "Pythonic" interface for handling XML and is a good choice for those not familiar with the DOM. It is also likely a better candidate for novice programmers thanks to its simple interface, as you will see in this article.

XML File Example

In the examples below we will use the following XML file, which we will save as "items.xml":

As you can see, this is a fairly simple XML example containing only a few nested objects and one attribute. It should nevertheless be enough to demonstrate all of the XML operations in this article.

Reading XML Documents

Using minidom

Here the file name can be a string containing the file path or a file-like object. The function returns a document that can be handled as an XML type. We can therefore use the getElementsByTagName() method to find a specific tag.

Since each node can be treated as an object, we can access the attributes and text of an element through the object's properties. In the example below, we accessed the attributes and text of a single node and of all nodes together.

The result looks like this:

Also, if the XML data has already been loaded as a string, we could use the parseString() function instead.

Using ElementTree

Following the import, we create a tree structure with the parse function and obtain its root element. Once we have reached the root node, we can easily traverse the tree, because it is a connected graph.

With ElementTree we can, as in the example above, get a node's attributes and text using the objects associated with each node.

The code looks like this:

The result will look as follows:

Counting the Elements of an XML Document

Using minidom

Keep in mind that this code only counts the number of child elements of the node on which we call len(), in this case the root node. If you want to find all sub-elements in a much larger tree, you will have to traverse all elements and count each of their children.

Using ElementTree

Similarly, the ElementTree module lets us count the number of nodes connected to a given node.

The result looks like this:

Writing XML Documents

Using ElementTree

ElementTree is also good for writing data to XML files. The code below shows how to create an XML file with the same structure as the file we used in the previous examples.

Running this code produces a new file, "items2.xml", which should match the original "items.xml" file, at least in terms of the XML data structure. You may notice that the result is a single line with no indentation.

Finding XML Elements

Using ElementTree

In addition, there is another helper function that returns the text of the first node that matches the given criterion:

Here is some example code to show you how these functions work:

Here is the result of running this code:

Modifying XML Elements

Using ElementTree

The ElementTree module provides several tools for modifying existing XML documents. The example below shows how to change the name of a node, set the value of an attribute, and add an extra attribute to an element.

The code below shows how to perform all of these operations:

After running the code, the resulting XML file "newitems.xml" will have an XML tree with the following data:

As we can see when comparing with the original XML file, the names of the elements have changed to "newitem", the text to "new text", and the attribute "name2" has been added to both nodes.

You may also notice that XML data written this way (calling tree.write with a file name) contains newlines and indentation. This is because ElementTree preserves the whitespace it read from the parsed file; write() itself does not add any formatting.

Creating XML Sub-Elements

Using ElementTree

In the example below we show both methods. In the first case the node has no attributes, so we created an empty dictionary (attrib = {}). In the second case we use a populated dictionary to create the attributes.

After running the code, the resulting XML file will look like this:

As we can see when comparing with the original file, the "seconditems" element and its sub-element "seconditem" have been added. In addition, the "seconditem" node has "name2" as an attribute, and its text is "seconditemabc", as expected.

Deleting XML Elements

Using ElementTree

As you might expect, the ElementTree module has the necessary functionality to delete a node's attributes and sub-elements.

Deleting an attribute

The result will be the following XML file:

As we can see in the XML code above, the first item has no "name" attribute.

Deleting one sub-element

The following example shows how to use it:

The result will be the following XML file:

As we can see from the XML code above, there is now only one "item" node. The second one has been removed from the original tree.

Deleting all sub-elements

The example below shows how to use the clear() function:

The result will be the following XML file:

As we can see in the XML code above, all sub-elements of the "items" element have been removed from the tree.

Wrapping Up
