Below are the differences between two revisions of the page.
Next revision | Previous revision | ||
python:xml [2021/08/30 04:49] marclebrun created |
python:xml [2023/09/16 14:08] (current version) marclebrun |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== XML ====== | ====== XML ====== | ||
+ | |||
+ | ===== Using ElementTree ===== | ||
+ | |||
+ | * Doc : [[https://docs.python.org/3/library/xml.etree.elementtree.html]] | ||
+ | |||
+ | <code python> | ||
+ | import xml.etree.ElementTree as ET | ||
+ | tree = ET.parse("yourXMLfile.xml") | ||
+ | root = tree.getroot() | ||
+ | </code> | ||
+ | |||
+ | <code python> | ||
+ | # Print the tag and attributes of each direct child of the root | ||
+ | for child in root: | ||
+ |     print(child.tag, child.attrib) | ||
+ | </code> | ||
+ | |||
+ | <code python> | ||
+ | print(root[0][1].text) | ||
+ | </code> | ||
+ | |||
+ | <code python> | ||
+ | print(root.findall("myTag")) | ||
+ | print(root[0].find("myOtherTag")) | ||
+ | </code> | ||
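The snippets above can be tried without a file on disk. The following is a minimal sketch, assuming a small hypothetical catalog document (the `catalog`, `item`, `name` and `price` tags are made up for illustration) parsed from a string with `ET.fromstring`:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration
xml_text = """<catalog>
  <item id="a1"><name>Pen</name><price>1.50</price></item>
  <item id="b2"><name>Ink</name><price>4.00</price></item>
</catalog>"""

# fromstring() returns the root element directly (no tree.getroot() needed)
root = ET.fromstring(xml_text)

# Tag and attributes of each direct child of the root
tags = [(child.tag, child.attrib) for child in root]

# Index access: second child of the first <item>, here its <price>
first_price = root[0][1].text

# findall() returns matching direct children; find() returns the first match
names = [item.find("name").text for item in root.findall("item")]

print(tags)
print(first_price)
print(names)
```

Note that `find` and `findall` with a plain tag name only look at direct children of the element they are called on.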
+ | |||
+ | ===== Using the lxml library ===== | ||
* **lxml** tutorial: [[https://lxml.de/tutorial.html]] | * **lxml** tutorial: [[https://lxml.de/tutorial.html]] | ||
* Info: [[https://python.doctor/page-xml-python-xpath]] | * Info: [[https://python.doctor/page-xml-python-xpath]] | ||
+ | |||
+ | <code bash> | ||
+ | pip install lxml | ||
+ | </code> | ||
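Once lxml is installed, XPath queries are available through the `xpath` method of an element. The following is a minimal sketch, assuming a hypothetical `library` document (tag names and values are invented for illustration):

```python
from lxml import etree

# Hypothetical sample document, used only for illustration
xml_text = b"""<library>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
</library>"""

root = etree.fromstring(xml_text)

# text() selects the text content of every <title> under a <book>
titles = root.xpath("//book/title/text()")

# A predicate filters the books; @id selects the attribute value
emma_id = root.xpath("//book[title='Emma']/@id")

print(titles)
print(emma_id)
```

Unlike the standard library's ElementTree, which supports only a limited XPath subset in `find` and `findall`, lxml's `xpath` accepts full XPath 1.0 expressions.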
+ | |||
+ | ===== Using BeautifulSoup4 ===== | ||
+ | |||
+ | <code bash> | ||
+ | pip install beautifulsoup4 | ||
+ | </code> | ||
+ | |||
+ | <code python> | ||
+ | from bs4 import BeautifulSoup | ||
+ | import requests | ||
+ | |||
+ | # Map each sitemap URL (<loc>) to its last-modified date (<lastmod>) | ||
+ | xmlDict = {} | ||
+ | |||
+ | r = requests.get("http://www.site.co.uk/sitemap.xml") | ||
+ | xml = r.text | ||
+ | |||
+ | # The "xml" parser requires lxml to be installed | ||
+ | soup = BeautifulSoup(xml, "xml") | ||
+ | sitemapTags = soup.find_all("sitemap") | ||
+ | |||
+ | print("The number of sitemaps is {0}".format(len(sitemapTags))) | ||
+ | |||
+ | for sitemap in sitemapTags: | ||
+ |     xmlDict[sitemap.find_next("loc").text] = sitemap.find_next("lastmod").text | ||
+ | |||
+ | print(xmlDict) | ||
+ | </code> | ||