Accessing and Manipulating XML Data
Terms you'll need to understand:
- DiffGram
- Document Object Model (DOM)
- Document Type Definition (DTD)
- Valid XML
- Well-formed XML
- XPath
Techniques you'll need to master:
- Retrieving information from XML files by using the Document Object Model, XmlReader class, XmlDocument class, and XmlNode class
- Synchronizing DataSet data with XML via the XmlDataDocument class
- Executing XML queries with XPath and the XPathNavigator class
- Validating XML against XML Schema Definition (XSD) and Document Type Definition (DTD) files
- Generating XML from SQL Server databases
- Updating SQL Server databases with DiffGrams
You can't use the .NET Framework effectively unless you're familiar with XML. XML is pervasive in .NET, and it's especially important for the distributed applications covered on the 70-310 exam. The System.Xml namespace contains classes to parse, validate, and manipulate XML. You can read and write XML, use XPath to navigate through an XML document, or check to see whether a particular document is valid XML by using the objects in this namespace.
NOTE
In this chapter, I've assumed that you're already familiar with the basics of XML, such as elements and attributes. If you need a refresher course on XML basics, refer to Appendix B, "XML Standards and Syntax."
Accessing an XML File
In this section, you'll learn how to extract information from an XML file. I'll start by showing you how you can use the XmlReader object to move through an XML file, extracting information as you go. Then you'll see how other objects, including the XmlNode and XmlDocument objects, provide a more structured view of an XML file.
I'll work with a very simple XML file named Books.xml that represents three books a computer bookstore might stock. Here's the raw XML file:
<?xml version="1.0" encoding="UTF-8"?>
<Books>
  <Book Pages="1109">
    <Author>Gunderloy, Mike</Author>
    <Title>Exam 70-306 Training Guide</Title>
    <Publisher>Que</Publisher>
  </Book>
  <Book Pages="357">
    <Author>Wildermuth, Shawn</Author>
    <Title>Pragmatic ADO.NET</Title>
    <Publisher>Addison-Wesley</Publisher>
  </Book>
  <Book Pages="484">
    <Author>Burton, Kevin</Author>
    <Title>.NET Common Language Runtime Unleashed</Title>
    <Publisher>Sams</Publisher>
  </Book>
</Books>
Understanding the DOM
The Document Object Model, or DOM, is an Internet standard for representing the information contained in an HTML or XML document as a tree of nodes. Like many other Internet standards, the DOM is an official standard of the World Wide Web Consortium, better known as the W3C. You can find it at http://www.w3.org/DOM.
In its simplest form, the DOM defines an XML document as a tree of nodes. The root element in the XML file becomes the root node of the tree, and other elements become child nodes. The DOM provides the standard for constructing this tree, including a classification for individual nodes and rules for which nodes can have children.
TIP
In the DOM, attributes are not represented as nodes within the tree. Rather, attributes are considered to be properties of their parent elements. You'll see later in the chapter that this is reflected in the classes provided by the .NET Framework for reading XML files.
Using an XmlReader Object
The XmlReader class is designed to provide forward-only, read-only access to an XML file. This class treats an XML file much as a forward-only cursor treats a resultset from a database. At any given time, there is one current node within the XML file, represented by a pointer that you can advance through the file. The class implements a Read method that moves the pointer to the next XML node and returns False when there are no more nodes to read. The XmlReader class has many other members, as shown in Table 3.1.
Table 3.1 Important Members of the XmlReader Class
Member | Type | Description
Depth | Property | The depth of the current node in the XML document
EOF | Property | A Boolean property that is True when the current node pointer is at the end of the XML file
GetAttribute | Method | Gets the value of an attribute
HasAttributes | Property | True when the current node contains attributes
HasValue | Property | True when the current node is a type that has a Value property
IsEmptyElement | Property | True when the current node represents an empty XML element
IsStartElement | Method | Determines whether the current node is a start tag
Item | Property | An indexed collection of attributes for the current node (if any)
MoveToElement | Method | Moves to the element containing the current attribute
MoveToFirstAttribute | Method | Moves to the first attribute of the current element
MoveToNextAttribute | Method | Moves to the next attribute
Name | Property | The qualified name of the current node
NodeType | Property | The type of the current node
Read | Method | Reads the next node from the XML file
Skip | Method | Skips the children of the current element
Value | Property | The value of the current node
The XmlReader class is a purely abstract class. That is, this class is marked with the MustInherit modifier; you cannot create an instance of XmlReader in your own application. Generally, you'll use the XmlTextReader class instead. The XmlTextReader class implements XmlReader for use with text streams. Here's how you might use this class to dump the nodes of an XML file to a ListBox control:
' Assumes Imports System.Xml at the top of the form's code file
Private Sub btnReadXml_Click( _
    ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnReadXML.Click
    Dim intI As Integer
    Dim strNode As String
    ' Create a new XmlTextReader on the file
    Dim xtr As XmlTextReader = _
        New XmlTextReader("Books.xml")
    ' Walk through the entire XML file
    Do While xtr.Read
        If (xtr.NodeType = XmlNodeType.Element) Or _
            (xtr.NodeType = XmlNodeType.Text) Then
            ' Indent the entry according to its depth
            strNode = ""
            For intI = 1 To xtr.Depth
                strNode &= " "
            Next
            strNode = strNode & xtr.Name & " "
            strNode &= xtr.NodeType.ToString
            If xtr.HasValue Then
                strNode = strNode & ": " & xtr.Value
            End If
            lbNodes.Items.Add(strNode)
            ' Now add the attributes, if any
            If xtr.HasAttributes Then
                While xtr.MoveToNextAttribute
                    strNode = ""
                    For intI = 1 To xtr.Depth
                        strNode &= " "
                    Next
                    strNode = strNode & xtr.Name & " "
                    strNode &= xtr.NodeType.ToString
                    If xtr.HasValue Then
                        strNode = strNode & ": " & _
                            xtr.Value
                    End If
                    lbNodes.Items.Add(strNode)
                End While
            End If
        End If
    Loop
    ' Clean up
    xtr.Close()
End Sub
Figure 3.1 shows the view of the sample Books.xml file produced by this code.
Figure 3.1 An XML file translated into schematic form by an XmlTextReader object.
NOTE
This and other examples in this chapter assume that the XML file is located in the bin folder of your Visual Basic .NET project.
The DOM includes nodes for everything in the XML file, including the XML declaration and any whitespace (such as the line feeds and carriage returns that separate lines of the file). On the other hand, the node tree doesn't include XML attributes, though you can retrieve them from the parent elements. However, the DOM and the XmlTextReader are flexible enough that you can customize their work as you like. Note the use of the NodeType property and the MoveToNextAttribute method in this example to display just the elements, text nodes, and attributes from the file.
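To get a feel for how many whitespace nodes a reader reports, you can count node types under different settings of the XmlTextReader's WhitespaceHandling property. The following is only a minimal sketch: it assumes a small inline XML string rather than Books.xml, and the WhitespaceSketch module and its CountNodes function are hypothetical names that I've made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports Microsoft.VisualBasic

Module WhitespaceSketch
    ' Hypothetical helper: counts the nodes of one type that an
    ' XmlTextReader reports under the given whitespace setting.
    Function CountNodes(ByVal xml As String, _
        ByVal handling As WhitespaceHandling, _
        ByVal wanted As XmlNodeType) As Integer
        Dim xtr As New XmlTextReader(New StringReader(xml))
        xtr.WhitespaceHandling = handling
        Dim intCount As Integer = 0
        Try
            Do While xtr.Read()
                If xtr.NodeType = wanted Then
                    intCount += 1
                End If
            Loop
        Finally
            ' Close the reader even if the XML is malformed
            xtr.Close()
        End Try
        Return intCount
    End Function

    Sub Main()
        ' Assumed sample data, indented the way a file on disk would be
        Dim xml As String = "<Books>" & vbCrLf & _
            "  <Book Pages=""357"">" & vbCrLf & _
            "    <Title>Pragmatic ADO.NET</Title>" & vbCrLf & _
            "  </Book>" & vbCrLf & _
            "</Books>"
        Console.WriteLine("Whitespace nodes (All): " & _
            CountNodes(xml, WhitespaceHandling.All, _
            XmlNodeType.Whitespace).ToString())
        Console.WriteLine("Whitespace nodes (None): " & _
            CountNodes(xml, WhitespaceHandling.None, _
            XmlNodeType.Whitespace).ToString())
        Console.WriteLine("Element nodes (None): " & _
            CountNodes(xml, WhitespaceHandling.None, _
            XmlNodeType.Element).ToString())
    End Sub
End Module
```

The element count should be the same under either setting; only the whitespace nodes between the tags come and go.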
CAUTION
Alternatively, you can retrieve attributes by using the Item property of the XmlTextReader. If the current node represents an element in the XML file, the following code will retrieve the value of the first attribute of the element:
xtr.Item(0)
This code will retrieve the value of an attribute named Pages:
xtr.Item("Pages")
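Here's a short sketch that puts attribute retrieval into a complete loop. It reads the Pages attribute of every Book element, either by name with GetAttribute or by position with Item. The inline XML string is an assumed, cut-down copy of Books.xml, and the AttributeSketch module and ReadPages function are hypothetical names used only for this illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module AttributeSketch
    ' Hypothetical helper: returns the Pages value of every Book
    ' element as a comma-separated string. When byName is True the
    ' value is fetched with GetAttribute("Pages"); otherwise it is
    ' fetched by position with Item(0).
    Function ReadPages(ByVal xml As String, _
        ByVal byName As Boolean) As String
        Dim xtr As New XmlTextReader(New StringReader(xml))
        Dim strResult As String = ""
        Try
            Do While xtr.Read()
                If xtr.NodeType = XmlNodeType.Element AndAlso _
                    xtr.Name = "Book" AndAlso xtr.HasAttributes Then
                    If strResult.Length > 0 Then strResult &= ","
                    If byName Then
                        strResult &= xtr.GetAttribute("Pages")
                    Else
                        strResult &= xtr.Item(0)
                    End If
                End If
            Loop
        Finally
            xtr.Close()
        End Try
        Return strResult
    End Function

    Sub Main()
        ' Assumed sample data based on Books.xml
        Dim xml As String = _
            "<Books>" & _
            "<Book Pages=""1109""><Title>Exam 70-306 Training Guide</Title></Book>" & _
            "<Book Pages=""357""><Title>Pragmatic ADO.NET</Title></Book>" & _
            "<Book Pages=""484""><Title>.NET Common Language Runtime Unleashed</Title></Book>" & _
            "</Books>"
        Console.WriteLine(ReadPages(xml, True))   ' prints 1109,357,484
        Console.WriteLine(ReadPages(xml, False))  ' prints 1109,357,484
    End Sub
End Module
```

Retrieving by name is usually the safer choice, because retrieving by position depends on the order in which the attributes happen to appear in the file.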
The XmlNode Class
The code you saw in the previous example deals with nodes as part of a stream of information returned by the XmlTextReader object. But the .NET Framework also includes another class, XmlNode, that can be used to represent an individual node from the DOM representation of an XML document. If you instantiate an XmlNode object to represent a particular portion of an XML document, you can alter the properties of the object and then write the changes back to the original file. The DOM provides two-way access to the underlying XML in this case.
NOTE
In addition to XmlNode, the System.Xml namespace also contains a set of classes that represent particular types of nodes: XmlAttribute, XmlComment, XmlElement, and so on. These classes all inherit from the XmlNode class.
The XmlNode class has a rich interface of properties and methods. You can retrieve or set information about the entity represented by an XmlNode object, or you can use its methods to navigate the DOM. Table 3.2 shows the important members of the XmlNode class.
Table 3.2 Important Members of the XmlNode Class
Member | Type | Description
AppendChild | Method | Adds a new child node to the end of this node's list of children
Attributes | Property | Returns the attributes of the node as an XmlAttributeCollection
ChildNodes | Property | Returns all child nodes of this node
CloneNode | Method | Creates a duplicate of this node
FirstChild | Property | Returns the first child node of this node
HasChildNodes | Property | True if this node has any children
InnerText | Property | The value of the node and all its children
InnerXml | Property | The markup representing only the children of this node
InsertAfter | Method | Inserts a new child node immediately after a specified child of this node
InsertBefore | Method | Inserts a new child node immediately before a specified child of this node
LastChild | Property | Returns the last child node of this node
Name | Property | The name of the node
NextSibling | Property | Returns the next child of this node's parent node
NodeType | Property | The type of this node
OuterXml | Property | The markup representing this node and its children
OwnerDocument | Property | The XmlDocument object that contains this node
ParentNode | Property | Returns the parent of this node
PrependChild | Method | Adds a new child node to the beginning of this node's list of children
PreviousSibling | Property | Returns the previous child of this node's parent node
RemoveAll | Method | Removes all children and attributes of this node
RemoveChild | Method | Removes a specified child of this node
ReplaceChild | Method | Replaces a child of this node with a new node
SelectNodes | Method | Selects a group of nodes matching an XPath expression
SelectSingleNode | Method | Selects the first node matching an XPath expression
WriteContentTo | Method | Writes all children of this node to an XmlWriter object
WriteTo | Method | Writes this node to an XmlWriter
The XmlDocument Class
You can't directly create an XmlNode object that represents an entity from a particular XML document. Instead, you can retrieve XmlNode objects from an XmlDocument object. The XmlDocument object represents an entire XML document. By combining the XmlNode and XmlDocument objects, you can navigate through the DOM representation of an XML document. For example, you can recursively dump the contents of an XML file to a ListBox control with this code:
' Assumes Imports System.Xml at the top of the form's code file
Private Sub btnReadXML_Click( _
    ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnReadXML.Click
    ' Create a new XmlTextReader on the file
    Dim xtr As XmlTextReader = _
        New XmlTextReader("Books.xml")
    ' Load the XML file to an XmlDocument,
    ' discarding whitespace nodes
    xtr.WhitespaceHandling = WhitespaceHandling.None
    Dim xd As XmlDocument = New XmlDocument()
    xd.Load(xtr)
    ' Get the document root
    Dim xnodRoot As XmlNode = xd.DocumentElement
    ' Walk the tree and display it
    Dim xnodWorking As XmlNode
    If xnodRoot.HasChildNodes Then
        xnodWorking = xnodRoot.FirstChild
        While Not IsNothing(xnodWorking)
            AddChildren(xnodWorking, 0)
            xnodWorking = xnodWorking.NextSibling
        End While
    End If
    ' Clean up
    xtr.Close()
End Sub

Private Sub AddChildren(ByVal xnod As XmlNode, _
    ByVal Depth As Integer)
    ' Add this node to the listbox
    Dim strNode As String
    Dim intI As Integer
    Dim intJ As Integer
    Dim atts As XmlAttributeCollection
    ' Only process Text and Element nodes
    If (xnod.NodeType = XmlNodeType.Element) Or _
        (xnod.NodeType = XmlNodeType.Text) Then
        strNode = ""
        For intI = 1 To Depth
            strNode &= " "
        Next
        strNode = strNode & xnod.Name & " "
        strNode &= xnod.NodeType.ToString
        strNode = strNode & ": " & xnod.Value
        lbNodes.Items.Add(strNode)
        ' Now add the attributes, if any
        atts = xnod.Attributes
        If Not atts Is Nothing Then
            For intJ = 0 To atts.Count - 1
                strNode = ""
                For intI = 1 To Depth + 1
                    strNode &= " "
                Next
                strNode = strNode & _
                    atts(intJ).Name & " "
                strNode &= atts(intJ).NodeType.ToString
                strNode = strNode & ": " & _
                    atts(intJ).Value
                lbNodes.Items.Add(strNode)
            Next
        End If
        ' And recursively walk
        ' the children of this node
        Dim xnodworking As XmlNode
        If xnod.HasChildNodes Then
            xnodworking = xnod.FirstChild
            While Not IsNothing(xnodworking)
                AddChildren(xnodworking, Depth + 1)
                xnodworking = xnodworking.NextSibling
            End While
        End If
    End If
End Sub
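Walking the tree is not the only way to reach a node. The SelectNodes and SelectSingleNode methods from Table 3.2 accept an XPath expression, and the XmlNode objects they return can be changed in place. The sketch below is one possible illustration, not a definitive pattern: it assumes a cut-down inline copy of Books.xml, and the XmlNodeSketch module with its LoadSample, CountLongBooks, and SetPages functions are hypothetical names of my own.

```vbnet
Imports System
Imports System.Xml

Module XmlNodeSketch
    ' Hypothetical helper: loads assumed sample data from a string.
    Function LoadSample() As XmlDocument
        Dim xd As New XmlDocument()
        xd.LoadXml( _
            "<Books>" & _
            "<Book Pages=""1109""><Title>Exam 70-306 Training Guide</Title></Book>" & _
            "<Book Pages=""357""><Title>Pragmatic ADO.NET</Title></Book>" & _
            "<Book Pages=""484""><Title>.NET Common Language Runtime Unleashed</Title></Book>" & _
            "</Books>")
        Return xd
    End Function

    ' Counts the books that have more than minPages pages.
    Function CountLongBooks(ByVal xd As XmlDocument, _
        ByVal minPages As Integer) As Integer
        Dim xnl As XmlNodeList = xd.SelectNodes( _
            "/Books/Book[@Pages > " & minPages.ToString() & "]")
        Return xnl.Count
    End Function

    ' Changes the Pages attribute of the book with the given title
    ' and returns the markup of the changed Book element, or
    ' Nothing if the book or the attribute is not found.
    ' Limitation: the title must not contain an apostrophe.
    Function SetPages(ByVal xd As XmlDocument, _
        ByVal title As String, ByVal pages As Integer) As String
        Dim xnodBook As XmlNode = xd.SelectSingleNode( _
            "/Books/Book[Title='" & title & "']")
        If xnodBook Is Nothing Then Return Nothing
        Dim att As XmlAttribute = xnodBook.Attributes("Pages")
        If att Is Nothing Then Return Nothing
        att.Value = pages.ToString()
        Return xnodBook.OuterXml
    End Function

    Sub Main()
        Dim xd As XmlDocument = LoadSample()
        Console.WriteLine(CountLongBooks(xd, 400))  ' prints 2
        Console.WriteLine(SetPages(xd, "Pragmatic ADO.NET", 500))
        Console.WriteLine(CountLongBooks(xd, 400))  ' prints 3
    End Sub
End Module
```

The change exists only in the in-memory document until you call the Save method of the XmlDocument object.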
The XmlDocument class includes a number of other useful members. Table 3.3 lists the most important of these.
Table 3.3 Important Members of the XmlDocument Class
Member | Type | Description
CreateAttribute | Method | Creates an attribute node
CreateElement | Method | Creates an element node
CreateNode | Method | Creates an XmlNode object
DocumentElement | Property | Returns the root XmlNode for this document
DocumentType | Property | Returns the node containing the DTD declaration for this document, if it has one
ImportNode | Method | Imports a node from another XML document
Load | Method | Loads an XML document into the XmlDocument
LoadXml | Method | Loads the XmlDocument from a string of XML data
NodeChanged | Event | Fires after the value of a node has been changed
NodeChanging | Event | Fires when the value of a node is about to be changed
NodeInserted | Event | Fires when a new node has been inserted
NodeInserting | Event | Fires when a new node is about to be inserted
NodeRemoved | Event | Fires when a node has been removed
NodeRemoving | Event | Fires when a node is about to be removed
PreserveWhitespace | Property | True if whitespace in the document should be preserved when loading or saving the XML
Save | Method | Saves the XmlDocument as a file or stream
WriteTo | Method | Saves the XmlDocument to an XmlWriter
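Several of the members in Table 3.3 work together when you build or extend a document in code. The following sketch shows one way this might look: it starts from an empty Books element, creates a Book element with CreateElement and CreateAttribute, listens for the NodeInserted event, and writes the result with Save. The XmlDocumentSketch module and its BuildSample, AddBook, and InsertedCount members are hypothetical names chosen for this example, and the book data is assumed sample data.

```vbnet
Imports System
Imports System.Xml

Module XmlDocumentSketch
    ' Number of NodeInserted events seen so far
    Private intInserted As Integer = 0

    Private Sub OnNodeInserted(ByVal sender As Object, _
        ByVal e As XmlNodeChangedEventArgs)
        intInserted += 1
    End Sub

    Function InsertedCount() As Integer
        Return intInserted
    End Function

    ' Creates a Book element with a Pages attribute and a Title
    ' child, and appends it to the root element of the document.
    Function AddBook(ByVal xd As XmlDocument, _
        ByVal title As String, ByVal pages As Integer) As XmlElement
        Dim xelBook As XmlElement = xd.CreateElement("Book")
        Dim attPages As XmlAttribute = xd.CreateAttribute("Pages")
        attPages.Value = pages.ToString()
        xelBook.Attributes.Append(attPages)
        Dim xelTitle As XmlElement = xd.CreateElement("Title")
        xelTitle.InnerText = title
        xelBook.AppendChild(xelTitle)
        xd.DocumentElement.AppendChild(xelBook)
        Return xelBook
    End Function

    ' Builds a one-book document, counting insert events as it goes.
    Function BuildSample() As XmlDocument
        Dim xd As New XmlDocument()
        xd.LoadXml("<Books />")
        AddHandler xd.NodeInserted, AddressOf OnNodeInserted
        AddBook(xd, "Pragmatic ADO.NET", 357)
        Return xd
    End Function

    Sub Main()
        Dim xd As XmlDocument = BuildSample()
        ' Unformatted markup of the root element and its children
        Console.WriteLine(xd.DocumentElement.OuterXml)
        Console.WriteLine("Insert events seen: " & _
            InsertedCount().ToString())
        ' Save accepts a file name, a Stream, a TextWriter,
        ' or an XmlWriter; here it writes to the console
        xd.Save(Console.Out)
    End Sub
End Module
```

Note that nodes must be created by the XmlDocument that will contain them. To bring in a node from a different document, use the ImportNode method first.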