This example demonstrates how to parse a sitemap file with Groovy's XmlSlurper. A sitemap, typically in XML or HTML format, is a file hosted on a website that lists the pages available for crawlers or users to view. Below we request the XML version of the leveluplunch sitemap and print out its elements. The top-level node is a urlset that contains one or more url nodes. Each url node holds loc (the location of the page), lastmod (its last modified date) and priority. For a full listing and description of the elements, be sure to check out sitemaps.org.
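To see the urlset/url/loc structure without a network call, here is a minimal sketch that parses a small hand-written sitemap string. The example.com URLs are hypothetical placeholders, and the groovy.xml.XmlSlurper import assumes Groovy 3 or later (on Groovy 2.x the auto-imported groovy.util.XmlSlurper behaves the same way).

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; Groovy 2.x auto-imports groovy.util.XmlSlurper instead

// Hypothetical two-entry sitemap following the sitemaps.org 0.9 schema
def sampleSitemap = '''<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2014-10-21</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/about/</loc>
    <lastmod>2014-10-09</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>'''

// parseText returns the root element itself, so urlset.url walks its url children
def urlset = new XmlSlurper().parseText(sampleSitemap)

// Unqualified names such as url and loc match regardless of the xmlns namespace
def locations = urlset.url.collect { it.loc.text() }

println urlset.name()      // prints urlset
println urlset.url.size()  // prints 2
println locations
```

Because the root GPathResult is the urlset node, there is no need to write urlset.urlset.url; that is also why the test below can iterate urlset.url directly.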
Reading the sitemap
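import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x XmlSlurper is auto-imported from groovy.util
import org.junit.Test

@Test
public void parse_sitemap_xml() {
    // download the raw sitemap XML as a String
    def siteMapXml = "http://www.leveluplunch.com/sitemap.xml".toURL().text
    // parse it; the returned GPathResult is the root urlset node
    def urlset = new XmlSlurper().parseText(siteMapXml)
    // each url child holds the loc, lastmod and priority elements
    urlset.url.each {
        println it.loc
        println it.lastmod
        println it.priority
        println "^^^^^^^^"
    }
}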
Output
http://www.leveluplunch.com/blog/2014/10/21/solving-for-enum-inheritance-extend-mixin/
2014-10-21T11:46:18-05:00
0.8
^^^^^^^^
http://www.leveluplunch.com/blog/2014/10/09/why-agile-could-fail-in-large-enterprise/
2014-10-09T11:46:18-05:00
0.8
^^^^^^^^
http://www.leveluplunch.com/blog/2014/09/29/amazon-cloudfront-s3-subfolders-default-index/
2014-09-29T11:46:18-05:00
0.8
^^^^^^^^
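Printing each element is fine for a quick look. To work with the entries, a minimal sketch could convert each url node into a map with typed values and sort the pages newest first. It assumes Groovy 3 or later and that every lastmod carries a full timestamp with a UTC offset, as in the output above. The two entries are copied from that output and parsed from an inline string so the sketch runs without a network call.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+
import java.time.OffsetDateTime

// Two entries copied from the output above
def xml = '''<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.leveluplunch.com/blog/2014/10/09/why-agile-could-fail-in-large-enterprise/</loc>
    <lastmod>2014-10-09T11:46:18-05:00</lastmod>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.leveluplunch.com/blog/2014/10/21/solving-for-enum-inheritance-extend-mixin/</loc>
    <lastmod>2014-10-21T11:46:18-05:00</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>'''

// Turn each url node into a plain map with typed values
def entries = new XmlSlurper().parseText(xml).url.collect { url ->
    [loc     : url.loc.text(),
     lastmod : OffsetDateTime.parse(url.lastmod.text()),  // assumes a full timestamp with offset
     priority: new BigDecimal(url.priority.text())]
}

// sort(false, ...) returns a new list; the two-argument closure acts as a Comparator
def newestFirst = entries.sort(false) { a, b -> b.lastmod <=> a.lastmod }
newestFirst.each { println "${it.lastmod.toLocalDate()}  ${it.loc}" }
```

Parsing lastmod into an OffsetDateTime lets the comparison use real instants rather than strings. Keep in mind that the sitemaps.org protocol also allows date-only values such as 2014-10-21, which OffsetDateTime.parse rejects, so a sitemap mixing both forms would need a fallback.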