Archive for the 'xml' Category

17
Jan
12

easy xml parsing in c#

Just a simple usage example:


using System.Xml;
...
XmlDocument doc = new XmlDocument();
// loading a file
doc.Load("file.xml");

// searching for multiple nodes via XPath; this will select all <book> elements wherever they may be
XmlNodeList books = doc.SelectNodes("//book");

foreach(XmlNode node in books)
{
   // getting attributes; assuming <book Author="Whatever">
   XmlAttribute author = node.Attributes["Author"];
   // the indexer returns null when the attribute is missing
   if (author == null) continue;
   string value = author.Value;
}

30
Sep
11

validate an xml file using xmllint and an xsd


Here’s how:


xmllint.exe --schema some_schema.xsd some.xml

03
Sep
11

xml content not allowed in prolog


If you get this error:


build.xml:2: Content is not allowed in prolog

The first thing to check is that you don’t have anything, not even whitespace or a blank line, before:


<?xml version="1.0" encoding="utf-8"?>

Second, check that the file doesn’t use an unexpected encoding. For example, one of the reasons it failed for me was that build.xml was encoded as UCS-2 Little Endian.
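Both checks can be done by looking at the first bytes of the file. Here’s a rough Ruby sketch of that idea. The helper name `prolog_problem` and its messages are made up for illustration, and it only covers the common cases (byte order marks, NUL bytes from a 16-bit encoding, stray content before the declaration):

```ruby
# Hypothetical helper: describes what sits in front of the XML declaration,
# or returns nil when nothing suspicious is found.
# It works on raw bytes, so it does not depend on the file's real encoding.
def prolog_problem(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)

  # Byte order marks written by some editors
  return "UTF-16/UCS-2 little endian BOM" if data.start_with?("\xFF\xFE".b)
  return "UTF-16/UCS-2 big endian BOM"    if data.start_with?("\xFE\xFF".b)
  return "UTF-8 BOM"                      if data.start_with?("\xEF\xBB\xBF".b)

  # A 16-bit encoding without a BOM shows up as NUL bytes between the characters
  if data[0, 8].include?("\x00".b)
    return "NUL bytes at the start (UTF-16/UCS-2 without a BOM?)"
  end

  # Anything in front of the declaration, including whitespace
  idx = data.index("<?xml".b)
  return "#{idx} byte(s) before the XML declaration" if idx && idx > 0

  nil
end

puts prolog_problem("\n<?xml version=\"1.0\" encoding=\"utf-8\"?><project/>")
```

For a real file you would pass in the raw bytes, for example `prolog_problem(File.binread("build.xml"))`.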

03
Sep
11

nokogiri strict parsing


By default Nokogiri is very forgiving, but its forgiveness can be a source of bugs, as I experienced first-hand today. Here’s how you parse XML in a strict way:


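require "nokogiri"

# in strict mode malformed input raises Nokogiri::XML::SyntaxError instead of being repaired
Nokogiri::XML( File.read( file ) ) do |config|
   config.strict
end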

09
Aug
11

XPath with REXML


Here’s how you search by XPath in REXML:


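require "rexml/document"

# node is a REXML::Document or a REXML::Element
REXML::XPath.each(node,"//*[@WhateverAttribute]") do |found_element|
   # found_element is an element that has a WhateverAttribute attribute
end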
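For something you can paste and run, here’s a small self-contained sketch of the same search. The sample document and the `Author` attribute are invented for the example:

```ruby
require "rexml/document"

# Made-up sample document
xml = <<~XML
  <library>
    <book Author="Whatever"/>
    <shelf><book Author="Someone"/><magazine/></shelf>
  </library>
XML

doc = REXML::Document.new(xml)

# Collect the name and attribute value of every element that has an Author attribute
found = []
REXML::XPath.each(doc, "//*[@Author]") do |element|
  found << [element.name, element.attributes["Author"]]
end

p found
```

The `magazine` element has no `Author` attribute, so it is not visited.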

11
Jan
10

SAXException: Invalid byte 2 of 2-byte UTF-8 sequence


If you ever receive this exception: SAXException: Invalid byte 2 of 2-byte UTF-8 sequence, it means that the parser is reading the XML file as UTF-8, but the file contains bytes that aren’t valid UTF-8, usually because it was saved in a different encoding. If you’re the one creating that XML, make sure you use the proper encoding when writing the file (in this case UTF-8).
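If you want to find the offending byte, or fix the file on the way out, something along these lines should do. It’s only a sketch in Ruby: the helper name `first_invalid_utf8_offset` and the sample data are made up, and it assumes the bad file was really ISO-8859-1:

```ruby
# Hypothetical helper: byte offset of the first sequence that is not valid UTF-8,
# or nil when the whole input is valid.
def first_invalid_utf8_offset(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return nil if text.valid_encoding?

  offset = 0
  text.each_char do |ch|
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

# "Ö" saved as ISO-8859-1 is the single byte 0xD6. UTF-8 reads 0xD6 as the start
# of a 2-byte sequence, and the "L" that follows is not a valid second byte.
latin1 = "<a>K\xD6LN</a>".b
p first_invalid_utf8_offset(latin1)

# When producing the XML yourself, convert to UTF-8 before writing it out.
utf8 = latin1.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
p first_invalid_utf8_offset(utf8)
```

The other way to fix it is to leave the bytes alone and declare the real encoding in the XML declaration instead.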

15
Oct
09

getting a basename from a path using ant


If you have a path written in a file, and you’d like to get a basename from it, like in the following example:

/this/is/my/path.txt

after you run this target, you will have only path.txt. Here’s the code that makes that possible:


<target name="rgx" description="basename equivalent">
	<replaceregexp file="a.txt" match="(.*/)|(.*\\)" replace="" byline="true"/>
</target>
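To see what that regex does to each line, here’s the same replacement sketched in Ruby. The helper name `strip_directories` is invented for the example, and I’m assuming that, without a "g" flag, replaceregexp replaces only the first match on each line:

```ruby
# The pattern from the Ant target
ANT_PATTERN = %r{(.*/)|(.*\\)}

# Hypothetical helper mirroring byline="true": the pattern is applied
# to every line separately and only the first match is removed.
def strip_directories(text, pattern = ANT_PATTERN)
  text.lines.map { |line| line.sub(pattern, "") }.join
end

puts strip_directories("/this/is/my/path.txt\nC:\\work\\notes.txt\n")

# A path that mixes both separators is only cut at the last "/",
# because the first alternative wins. A character class handles both.
puts strip_directories("C:\\a/b\\c.txt\n")
puts strip_directories("C:\\a/b\\c.txt\n", %r{.*[/\\]})
```

If mixed separators can show up in your file, `match=".*[/\\]"` may be the safer pattern in the target.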

12
Oct
09

remove duplicate xml nodes using nokogiri


A few functions I wrote to remove duplicate nodes. From my tests, it appears to be working:


require "nokogiri"

# the data we're testing
data = <<EOF
<bla>
	<father>
		<mini id="3"/>
		<mini id="5"/>
		<mini id="3"/>
	</father>
</bla>
EOF

# check and see if the attributes are the same
# probably could be done shorter
def same_attributes?(attr1,attr2)
	attr1.each do |k,v|
		if attr2.has_key?(k)
			a1v = v.value
			a2v = attr2[k].value
			if a1v != a2v
				return false
			end
		else
			return false
		end
	end
	# do it the other way so no key is left out
	attr2.each do |k,v|
		if !attr1.has_key?(k)
			return false
		end
	end
	return true
end

# recursively check if 2 nodes are the same
def same_nodes?(node1,node2)
	return false if node1.nil? || node2.nil?
	return false if node1.name != node2.name
	return false if node1.text != node2.text
	return false unless same_attributes?(node1.attributes,node2.attributes)
	node1_kids = node1.children
	node2_kids = node2.children
	# zip alone would ignore extra children on node2
	return false if node1_kids.length != node2_kids.length
	# the nodes are equal only if every pair of children is equal
	node1_kids.zip(node2_kids).all? {|kid1,kid2| same_nodes?(kid1,kid2)}
end

# removes duplicate nodes recursively from a document
def remove_copies(node)
	node_names = node.children.select {|kid| kid.name != "text" }.collect {|k| k.name}
	node_names.uniq!
	node_names.each {|name| remove_duplicates(node,name)}
	node.children.each {|k| remove_copies(k)}
end

# remove named child duplicates from a node
def remove_duplicates(node,child_name)
	ex_childs = node.children.select {|kid| kid.name == child_name}
	node.children.each {|k| k.remove if k.name == child_name}
	added_nodes = []
	ex_childs.each do |ec|
		add_me = true
		added_nodes.each do |added_node|
			if same_nodes?(added_node,ec)
				add_me = false
			end
		end
		if add_me
			node.add_child(ec)
			added_nodes << ec
		end
	end
end

node = Nokogiri::XML(data)
remove_copies(node)
node.children.each {|c| puts c}

I’m curious if there are other solutions to this problem.



