Accessing an XML File
Access and Manipulate XML Data: Access an XML file by using the Document Object Model (DOM) and an XmlReader.
The most basic thing you can do with an XML file is open it and read it to find out what the file contains. The .NET Framework offers both unstructured and structured ways to access the data within an XML file. That is, you can treat the XML file either as a simple stream of information or as a hierarchical structure composed of different entities, such as elements and attributes.
In this section of the chapter you'll learn how to extract information from an XML file. I'll start by showing you how you can use the XmlReader object to move through an XML file, extracting information as you go. Then you'll see how other objects, including the XmlNode and XmlDocument objects, provide a more structured view of an XML file.
I'll work with a very simple XML file named Books.xml that represents three books that a computer bookstore might stock. Here's the raw XML file:
    <?xml version="1.0" encoding="UTF-8"?>
    <Books>
      <Book Pages="1088">
        <Author>Delaney, Kalen</Author>
        <Title>Inside Microsoft SQL Server 2000</Title>
        <Publisher>Microsoft Press</Publisher>
      </Book>
      <Book Pages="997">
        <Author>Burton, Kevin</Author>
        <Title>.NET Common Language Runtime</Title>
        <Publisher>Sams</Publisher>
      </Book>
      <Book Pages="392">
        <Author>Cooper, James W.</Author>
        <Title>C# Design Patterns</Title>
        <Publisher>Addison Wesley</Publisher>
      </Book>
    </Books>
Understanding the DOM
The Document Object Model, or DOM, is an Internet standard for representing the information contained in an HTML or XML document as a tree of nodes. Like many other Internet standards, the DOM is an official standard of the World Wide Web Consortium, better known as the W3C.
Even though there is a DOM standard, not all vendors implement the DOM in exactly the same way. The major issue is that there are actually several different standards grouped together under the general name of DOM. Also, vendors pick and choose which parts of these standards to implement. The .NET Framework includes support for the DOM Level 1 Core and DOM Level 2 Core specifications, but it also extends the DOM by adding objects, methods, and properties that are not part of the specification.
NOTE
DOM Background You can find the official DOM specifications at http://www.w3.org/DOM. For details of Microsoft's implementation in the .NET Framework, see the "XML Document Object Model (DOM)" topic in the .NET Framework Developer's Guide.
Structurally, an XML document is a series of nested items, including elements and attributes. Any nested structure can be transformed to an equivalent tree structure if the outermost nested item is made the root of the tree, the next-in items the children of the root, and so on. The DOM provides the standard for constructing this tree, including a classification for individual nodes and rules for which nodes can have children. Figure 2.1 shows how the Books.xml file might be represented as a tree.
Figure 2.1 You can represent an XML file as a tree of nodes.
In its simplest form, the DOM defines an XML document as a tree of nodes. The root element in the XML file becomes the root node of the tree, and other elements become child nodes.
TIP
Attributes in the DOM In the DOM, attributes are not children of the elements that carry them, so they do not appear as separate branches of the tree. Rather, attributes are treated as properties of their parent elements. You'll see later in the chapter that this is reflected in the classes provided by the .NET Framework for reading XML files.
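To make the tip concrete, here is a minimal sketch that uses the XmlDocument class (covered later in this section) on a one-book fragment of Books.xml. The class name DomAttributeSketch and the helper LoadFirstBook() are made up for illustration, and the XML is inlined as a string so that the sketch does not depend on a file.

```csharp
using System;
using System.Xml;

public static class DomAttributeSketch
{
    // One-book fragment of Books.xml, inlined so the sketch is self-contained.
    private const string Xml =
        "<Books><Book Pages=\"1088\">" +
        "<Title>Inside Microsoft SQL Server 2000</Title>" +
        "</Book></Books>";

    // Hypothetical helper: parses the fragment and returns the first <Book> element.
    public static XmlElement LoadFirstBook()
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(Xml);
        return (XmlElement)xd.DocumentElement.FirstChild;
    }

    public static void Main()
    {
        XmlElement book = LoadFirstBook();

        // The only child node of <Book> is <Title>; Pages is not among the children.
        Console.WriteLine("Child nodes: " + book.ChildNodes.Count);

        // The attribute is reached through the element's Attributes collection.
        Console.WriteLine("Attributes:  " + book.Attributes.Count);

        // An attribute has no parent in the tree; it names its element as its owner.
        XmlAttribute pages = book.Attributes["Pages"];
        Console.WriteLine("Pages:       " + pages.Value);
        Console.WriteLine("No parent:   " + (pages.ParentNode == null));
        Console.WriteLine("Owner:       " + pages.OwnerElement.Name);
    }
}
```

The point of the sketch is only that ChildNodes and Attributes are separate collections; nothing here changes the document.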
Using an XmlReader Object
The XmlReader class is designed to provide forward-only, read-only access to an XML file. This class treats an XML file similarly to the way that a forward-only cursor treats a resultset from a database. At any given time, there is one current node within the XML file, represented by a pointer that you can advance through the file but never move backward. The class implements a Read() method that advances the pointer to the next XML node and returns false when there are no more nodes to read. There are also many other members in the XmlReader class; I've listed some of these in Table 2.1.
Table 2.1 Important Members of the XmlReader Class
Member | Type | Description
Depth | Property | Specifies the depth of the current node in the XML document
EOF | Property | Represents a Boolean property that is true when the current node pointer is at the end of the XML file
GetAttribute() | Method | Gets the value of an attribute
HasAttributes | Property | Returns true when the current node contains attributes
HasValue | Property | Returns true when the current node can have a Value property
IsEmptyElement | Property | Returns true when the current node represents an empty XML element
IsStartElement() | Method | Determines whether the current node is a start tag
MoveToElement() | Method | Moves to the element containing the current attribute
MoveToFirstAttribute() | Method | Moves to the first attribute of the current element
MoveToNextAttribute() | Method | Moves to the next attribute
Name | Property | Specifies a qualified name of the current node
NodeType | Property | Specifies the type of the current node
Read() | Method | Reads the next node from the XML file
Skip() | Method | Skips the children of the current element
Value | Property | Specifies the value of the current node
The XmlReader class is a purely abstract class. You cannot create an instance of XmlReader in your own application. Generally, you'll use the XmlTextReader class instead. The XmlTextReader class implements XmlReader for use with text streams. Step-by-Step 2.1 shows you how to use the XmlTextReader class.
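As a rough illustration of how a few of the members in Table 2.1 fit together, the following sketch collects the text of every Title element in a single forward pass. The class name TitleReaderSketch and the method ReadTitles() are hypothetical, the XML is a trimmed inline copy of Books.xml rather than the file itself, and the generic List<string> assumes a version of C# that supports generics.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TitleReaderSketch
{
    // Trimmed inline copy of Books.xml (titles only), an assumption for this sketch.
    public const string BooksXml =
        "<Books>" +
        "<Book Pages=\"1088\"><Title>Inside Microsoft SQL Server 2000</Title></Book>" +
        "<Book Pages=\"997\"><Title>.NET Common Language Runtime</Title></Book>" +
        "<Book Pages=\"392\"><Title>C# Design Patterns</Title></Book>" +
        "</Books>";

    // Hypothetical helper: returns the text of each <Title> element, in document order.
    public static List<string> ReadTitles(TextReader input)
    {
        List<string> titles = new List<string>();
        XmlTextReader xtr = new XmlTextReader(input);
        try
        {
            // Read() advances the pointer and returns false at the end of the input.
            while (xtr.Read())
            {
                if (xtr.NodeType == XmlNodeType.Element
                    && xtr.Name == "Title"
                    && !xtr.IsEmptyElement)
                {
                    // Advance from the start tag to its content.
                    xtr.Read();
                    if (xtr.NodeType == XmlNodeType.Text)
                    {
                        titles.Add(xtr.Value);
                    }
                }
            }
        }
        finally
        {
            xtr.Close();
        }
        return titles;
    }

    public static void Main()
    {
        foreach (string title in ReadTitles(new StringReader(BooksXml)))
        {
            Console.WriteLine(title);
        }
    }
}
```

Because the reader is forward-only, the sketch never revisits a node; everything it needs is captured as the pointer passes over it.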
STEP BY STEP 2.1 - Using the XmlTextReader Class
- Create a new Visual C# .NET Windows application. Name the application 320C02.
- Right-click on the project node in Solution Explorer and select Add, Add New Item.
- Select the Local Project Items node in the Categories tree view. Select the XML File template. Name the new file Books.xml and click OK.
- Modify the code for the Books.xml file as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <Books>
      <Book Pages="1088">
        <Author>Delaney, Kalen</Author>
        <Title>Inside Microsoft SQL Server 2000</Title>
        <Publisher>Microsoft Press</Publisher>
      </Book>
      <Book Pages="997">
        <Author>Burton, Kevin</Author>
        <Title>.NET Common Language Runtime</Title>
        <Publisher>Sams</Publisher>
      </Book>
      <Book Pages="392">
        <Author>Cooper, James W.</Author>
        <Title>C# Design Patterns</Title>
        <Publisher>Addison Wesley</Publisher>
      </Book>
    </Books>

- Add a new form to the project. Name the new form StepByStep2_1.cs.
- Add a Button control (btnReadXml) and a ListBox control (lbNodes) to the form.
- Switch to the code view and add the following using directives:

    using System.Xml;
    using System.Text;

- Double-click the Button control and add the following code to handle the button's Click event:

    private void btnReadXml_Click(
        object sender, System.EventArgs e)
    {
        StringBuilder sbNode = new StringBuilder();
        // Create a new XmlTextReader on the file
        XmlTextReader xtr =
            new XmlTextReader(@"..\..\Books.xml");
        try
        {
            // Walk through the entire XML file
            while (xtr.Read())
            {
                sbNode.Length = 0;
                // Indent the entry according to the node's depth
                for (int intI = 1; intI <= xtr.Depth; intI++)
                {
                    sbNode.Append(" ");
                }
                sbNode.Append(xtr.Name + " ");
                sbNode.Append(xtr.NodeType.ToString());
                // Only some node types carry a value
                if (xtr.HasValue)
                {
                    sbNode.Append(": " + xtr.Value);
                }
                lbNodes.Items.Add(sbNode.ToString());
            }
        }
        finally
        {
            // Clean up, even if reading fails
            xtr.Close();
        }
    }

- Insert the Main() method to launch the form. Set the form as the startup object for the project.
- Run the project. Click the button. You'll see a schematic representation of the XML file, as shown in Figure 2.2.
Figure 2.2 An XML file translated into schematic form by an XmlTextReader object.
As you can see in Step-by-Step 2.1, the output has everything in the XML file, including the XML declaration and any whitespace (such as the line feeds and carriage returns that separate lines of the file). On the other hand, the output doesn't include XML attributes. But the XmlTextReader is flexible enough that you can customize its behavior as you like.
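One customization of this kind is the WhitespaceHandling property of XmlTextReader, which controls whether whitespace nodes are reported at all. The sketch below counts the nodes returned for the same small document under two settings. The class name WhitespaceSketch, the method CountNodes(), and the indented one-book XML string are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceSketch
{
    // A small indented document; the line breaks and spaces become whitespace nodes.
    public const string IndentedXml =
        "<Books>\n" +
        "  <Book Pages=\"392\">\n" +
        "    <Title>C# Design Patterns</Title>\n" +
        "  </Book>\n" +
        "</Books>";

    // Hypothetical helper: counts every node that Read() returns under the given setting.
    public static int CountNodes(string xml, WhitespaceHandling handling)
    {
        XmlTextReader xtr = new XmlTextReader(new StringReader(xml));
        xtr.WhitespaceHandling = handling;
        int count = 0;
        try
        {
            while (xtr.Read())
            {
                count++;
            }
        }
        finally
        {
            xtr.Close();
        }
        return count;
    }

    public static void Main()
    {
        // All is the default: whitespace between tags is reported as nodes.
        Console.WriteLine("All:  " + CountNodes(IndentedXml, WhitespaceHandling.All));
        // None suppresses whitespace nodes entirely.
        Console.WriteLine("None: " + CountNodes(IndentedXml, WhitespaceHandling.None));
    }
}
```

With WhitespaceHandling.None the reader reports only the start tags, the text, and the end tags, which is usually what you want when the indentation carries no meaning.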
Step-by-Step 2.2 shows an example where the code displays only elements, text, and attributes.
STEP BY STEP 2.2 - Using the XmlTextReader Class to Read Selected XML Entities
- Add a new form to the project. Name the new form StepByStep2_2.cs.
- Add a Button control (btnReadXml) and a ListBox control (lbNodes) to the form.
- Switch to the code view and add the following using directives:

    using System.Xml;
    using System.Text;

- Double-click the button and add the following code to handle the button's Click event:

    private void btnReadXml_Click(
        object sender, System.EventArgs e)
    {
        StringBuilder sbNode = new StringBuilder();
        // Create a new XmlTextReader on the file
        XmlTextReader xtr =
            new XmlTextReader(@"..\..\Books.xml");
        try
        {
            // Walk through the entire XML file
            while (xtr.Read())
            {
                // Process only Element and Text nodes
                if ((xtr.NodeType == XmlNodeType.Element) ||
                    (xtr.NodeType == XmlNodeType.Text))
                {
                    sbNode.Length = 0;
                    for (int intI = 1; intI <= xtr.Depth; intI++)
                    {
                        sbNode.Append(" ");
                    }
                    sbNode.Append(xtr.Name + " ");
                    sbNode.Append(xtr.NodeType.ToString());
                    if (xtr.HasValue)
                    {
                        sbNode.Append(": " + xtr.Value);
                    }
                    lbNodes.Items.Add(sbNode.ToString());
                    // Now add the attributes, if any
                    if (xtr.HasAttributes)
                    {
                        while (xtr.MoveToNextAttribute())
                        {
                            sbNode.Length = 0;
                            for (int intI = 1; intI <= xtr.Depth; intI++)
                            {
                                sbNode.Append(" ");
                            }
                            sbNode.Append(xtr.Name + " ");
                            sbNode.Append(xtr.NodeType.ToString());
                            if (xtr.HasValue)
                            {
                                sbNode.Append(": " + xtr.Value);
                            }
                            lbNodes.Items.Add(sbNode.ToString());
                        }
                        // Return the pointer to the element
                        // that owns the attributes
                        xtr.MoveToElement();
                    }
                }
            }
        }
        finally
        {
            // Clean up, even if reading fails
            xtr.Close();
        }
    }

- Insert the Main() method to launch the form. Set the form as the startup form for the project.
- Run the project. Click the button. You'll see a schematic representation of the elements and attributes in the XML file, as shown in Figure 2.3.
Figure 2.3 Selected entities from an XML file translated into schematic form by an XmlTextReader object.
Note that the Read() method of XmlTextReader never stops on an attribute; attributes are not returned as nodes in the main stream. However, XmlTextReader provides the MoveToNextAttribute() method, which lets you visit the attributes of the current element as though they were nodes. Alternatively, you can retrieve attributes by using indexers on the XmlTextReader object. If the current node represents an element in the XML file, then this code retrieves the value of the first attribute of the element:
xtr[0]
This code retrieves the value of an attribute named Pages:
xtr["Pages"]
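The following sketch shows the name-based indexer in context by adding up the Pages attribute of every Book element. The class name PagesSketch and the method TotalPages() are hypothetical, and the XML is an inline copy of Books.xml reduced to the Book elements and their attributes.

```csharp
using System;
using System.IO;
using System.Xml;

public static class PagesSketch
{
    // Inline copy of Books.xml reduced to the parts this sketch needs.
    public const string BooksXml =
        "<Books>" +
        "<Book Pages=\"1088\"/>" +
        "<Book Pages=\"997\"/>" +
        "<Book Pages=\"392\"/>" +
        "</Books>";

    // Hypothetical helper: sums the Pages attribute across all <Book> elements.
    public static int TotalPages(string xml)
    {
        int total = 0;
        XmlTextReader xtr = new XmlTextReader(new StringReader(xml));
        try
        {
            while (xtr.Read())
            {
                if (xtr.NodeType == XmlNodeType.Element && xtr.Name == "Book")
                {
                    // The indexer returns null when the attribute is absent.
                    string pages = xtr["Pages"];
                    if (pages != null)
                    {
                        total += XmlConvert.ToInt32(pages);
                    }
                }
            }
        }
        finally
        {
            xtr.Close();
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(TotalPages(BooksXml)); // 1088 + 997 + 392 = 2477
    }
}
```

Unlike MoveToNextAttribute(), the indexer leaves the pointer on the element, so the surrounding Read() loop does not need to reposition it.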
The XmlNode Class
The individual items in the tree representation of an XML file are called nodes. As you've seen in Step-by-Steps 2.1 and 2.2, many different entities within the XML file can be represented by nodes: elements, attributes, whitespace, end tags, and so on. The DOM distinguishes these different types of nodes by assigning a node type to each one. In the .NET Framework, the possible node types are listed by the XmlNodeType enumeration. Table 2.2 lists the members of this enumeration.
Table 2.2 Members of the XmlNodeType Enumeration
Member | Represents
Attribute | An XML attribute
CDATA | An XML CDATA section
Comment | An XML comment
Document | The document object at the root of the tree representation of the XML; it contains the outermost (root) element
DocumentFragment | A document fragment (a node or subtree that need not be a complete document)
DocumentType | A document type declaration (the reference to a Document Type Definition, or DTD)
Element | An XML element
EndElement | The closing tag of an XML element
EndEntity | The end of an included entity
Entity | An XML entity declaration
EntityReference | A reference to an entity
None | The value reported by an XmlReader object before Read() has been called
Notation | An XML notation
ProcessingInstruction | An XML processing instruction
SignificantWhitespace | Whitespace that must be preserved to re-create the original XML document
Text | The text content of an attribute, element, or other node
Whitespace | Space between actual XML markup items
XmlDeclaration | The XML declaration
The code you've seen so far in this chapter deals with nodes as part of a stream of information returned by the XmlTextReader object. But the .NET Framework also includes another class, XmlNode, which can be used to represent an individual node from the DOM representation of an XML document. If you obtain an XmlNode object that represents a particular portion of an XML document, you can alter the properties of the object and then write the changes back to the original file. In this case, the DOM provides two-way access to the underlying XML.
NOTE
Specialized Node Classes In addition to XmlNode, the System.Xml namespace also contains a set of classes that represent particular types of nodes: XmlAttribute, XmlComment, XmlElement, and so on. These classes all inherit from the XmlNode class.
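The two-way access described above can be sketched as follows: load some XML, change a property of one node, and serialize the result. The class name TwoWaySketch and the method UpdatePages() are made up for illustration, and the sketch works on an inline string and returns the modified markup rather than writing to Books.xml. To write back to a file you would call the document's Save() method instead of reading OuterXml.

```csharp
using System;
using System.Xml;

public static class TwoWaySketch
{
    // One-book fragment of Books.xml, inlined so the sketch is self-contained.
    public const string Xml =
        "<Books><Book Pages=\"392\">" +
        "<Title>C# Design Patterns</Title>" +
        "</Book></Books>";

    // Hypothetical helper: sets the Pages attribute of the first <Book>
    // and returns the markup of the whole modified document.
    public static string UpdatePages(string xml, string newPages)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml);

        // Obtain the node from the document, then alter one of its properties.
        XmlNode book = xd.DocumentElement.FirstChild;
        book.Attributes["Pages"].Value = newPages;

        return xd.OuterXml;
    }

    public static void Main()
    {
        Console.WriteLine(UpdatePages(Xml, "400"));
    }
}
```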
The XmlNode class has a rich interface of properties and methods. You can retrieve or set information about the entity represented by an XmlNode object, or you can use its methods to navigate the DOM. Table 2.3 shows the important members of the XmlNode class.
Table 2.3 Important Members of the XmlNode Class
Member | Type | Description
AppendChild() | Method | Adds a new child node to the end of this node's list of children
Attributes | Property | Returns the attributes of the node as an XmlAttributeCollection object
ChildNodes | Property | Returns all child nodes of this node
CloneNode() | Method | Creates a duplicate of this node
FirstChild | Property | Returns the first child node of this node
HasChildNodes | Property | Returns true if this node has any children
InnerText | Property | Specifies the concatenated values of the node and all its children
InnerXml | Property | Specifies the markup representing only the children of this node
InsertAfter() | Method | Inserts a new child node immediately after a specified child of this node
InsertBefore() | Method | Inserts a new child node immediately before a specified child of this node
LastChild | Property | Returns the last child node of this node
Name | Property | Specifies the node's name
NextSibling | Property | Returns the next child of this node's parent node
NodeType | Property | Specifies this node's type
OuterXml | Property | Specifies the markup representing this node and its children
OwnerDocument | Property | Specifies the XmlDocument object that contains this node
ParentNode | Property | Returns this node's parent
PrependChild() | Method | Adds a new child node to the beginning of this node's list of children
PreviousSibling | Property | Returns the previous child of this node's parent node
RemoveAll() | Method | Removes all children of this node
RemoveChild() | Method | Removes a specified child of this node
ReplaceChild() | Method | Replaces a child of this node with a new node
SelectNodes() | Method | Selects a group of nodes matching an XPath expression
SelectSingleNode() | Method | Selects the first node matching an XPath expression
WriteContentTo() | Method | Writes all children of this node to an XmlWriter object
WriteTo() | Method | Writes this node to an XmlWriter object
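As an illustration of two members from Table 2.3, the sketch below uses SelectNodes() and SelectSingleNode() with simple XPath expressions against an inline copy of the book data. The class name SelectSketch and both helper methods are hypothetical. The second helper builds its XPath expression by string concatenation, which assumes that the pages argument contains no quotation marks.

```csharp
using System;
using System.Xml;

public static class SelectSketch
{
    // Trimmed inline copy of Books.xml, an assumption for this sketch.
    public const string BooksXml =
        "<Books>" +
        "<Book Pages=\"1088\"><Title>Inside Microsoft SQL Server 2000</Title></Book>" +
        "<Book Pages=\"997\"><Title>.NET Common Language Runtime</Title></Book>" +
        "<Book Pages=\"392\"><Title>C# Design Patterns</Title></Book>" +
        "</Books>";

    // Hypothetical helper: parses the inline XML and returns its root element.
    public static XmlNode LoadRoot()
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(BooksXml);
        return xd.DocumentElement;
    }

    // Hypothetical helper: counts the <Title> elements found under each <Book>.
    public static int CountTitles(XmlNode root)
    {
        XmlNodeList titles = root.SelectNodes("Book/Title");
        return titles.Count;
    }

    // Hypothetical helper: returns the title of the first book with the given
    // Pages value, or null when no book matches.
    public static string FindTitleByPages(XmlNode root, string pages)
    {
        XmlNode title = root.SelectSingleNode("Book[@Pages='" + pages + "']/Title");
        return title == null ? null : title.InnerText;
    }

    public static void Main()
    {
        XmlNode root = LoadRoot();
        Console.WriteLine(CountTitles(root));
        Console.WriteLine(FindTitleByPages(root, "392"));
    }
}
```

The XPath expressions are evaluated relative to the node on which the method is called, which is why they start with Book rather than with /Books.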
The XmlDocument Class
There's no direct way to create an XmlNode object that represents an entity from a particular XML document. Instead, you can retrieve XmlNode objects from an XmlDocument object. The XmlDocument object represents an entire XML document. Step-by-Step 2.3 shows how you can use the XmlNode and XmlDocument objects to navigate through the DOM representation of an XML document.
STEP BY STEP 2.3 - Using the XmlDocument and XmlNode Classes
- Add a new form to the project. Name the new form StepByStep2_3.cs.
- Add a Button control (btnReadXml) and a ListBox control (lbNodes) to the form.
- Switch to the code view and add the following using directives:

    using System.Xml;
    using System.Text;

- Double-click the button and add the following code to handle the button's Click event:

    private void btnReadXml_Click(
        object sender, System.EventArgs e)
    {
        // Create a new XmlTextReader on the file
        XmlTextReader xtr =
            new XmlTextReader(@"..\..\Books.xml");
        try
        {
            // Load the XML file to an XmlDocument,
            // ignoring whitespace nodes
            xtr.WhitespaceHandling = WhitespaceHandling.None;
            XmlDocument xd = new XmlDocument();
            xd.Load(xtr);
            // Get the document root
            XmlNode xnodRoot = xd.DocumentElement;
            // Walk the tree and display it
            XmlNode xnodWorking;
            if (xnodRoot.HasChildNodes)
            {
                xnodWorking = xnodRoot.FirstChild;
                while (xnodWorking != null)
                {
                    AddChildren(xnodWorking, 0);
                    xnodWorking = xnodWorking.NextSibling;
                }
            }
        }
        finally
        {
            // Clean up, even if loading fails
            xtr.Close();
        }
    }

    private void AddChildren(XmlNode xnod, Int32 intDepth)
    {
        // Adds a node to the ListBox,
        // together with its children.
        // intDepth controls the depth of indenting
        StringBuilder sbNode = new StringBuilder();
        // Process only Text and Element nodes
        if ((xnod.NodeType == XmlNodeType.Element) ||
            (xnod.NodeType == XmlNodeType.Text))
        {
            sbNode.Length = 0;
            for (int intI = 1; intI <= intDepth; intI++)
            {
                sbNode.Append(" ");
            }
            sbNode.Append(xnod.Name + " ");
            sbNode.Append(xnod.NodeType.ToString());
            sbNode.Append(": " + xnod.Value);
            lbNodes.Items.Add(sbNode.ToString());
            // Now add the attributes, if any
            XmlAttributeCollection atts = xnod.Attributes;
            if (atts != null)
            {
                for (int intI = 0; intI < atts.Count; intI++)
                {
                    sbNode.Length = 0;
                    for (int intJ = 1; intJ <= intDepth + 1; intJ++)
                    {
                        sbNode.Append(" ");
                    }
                    sbNode.Append(atts[intI].Name + " ");
                    sbNode.Append(atts[intI].NodeType.ToString());
                    sbNode.Append(": " + atts[intI].Value);
                    // Add the text, not the StringBuilder itself,
                    // so that later changes do not alter the item
                    lbNodes.Items.Add(sbNode.ToString());
                }
            }
            // And recursively walk
            // the children of this node
            XmlNode xnodWorking;
            if (xnod.HasChildNodes)
            {
                xnodWorking = xnod.FirstChild;
                while (xnodWorking != null)
                {
                    AddChildren(xnodWorking, intDepth + 1);
                    xnodWorking = xnodWorking.NextSibling;
                }
            }
        }
    }

- Insert the Main() method to launch the form. Set the form as the startup form for the project.
- Run the project. Click the button. You'll see a schematic representation of the elements and attributes in the XML file.
Step-by-Step 2.3 uses recursion to visit all the nodes in the XML file. That is, it starts at the document's root node (returned by the DocumentElement property of the XmlDocument object) and visits each child of that node in turn. For each child, it displays the desired information, and then visits each child of that node in turn, and so on.
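The same recursive pattern can be written more compactly outside a Windows form. The sketch below is a variation rather than the code from Step-by-Step 2.3: it iterates over ChildNodes with foreach, starts at the root element itself, ignores attributes, and builds a string instead of filling a ListBox. The names OutlineSketch, AppendOutline(), and BuildOutline() are hypothetical.

```csharp
using System;
using System.Text;
using System.Xml;

public static class OutlineSketch
{
    // Appends one line for this node (if it is an Element or Text node),
    // then recurses into each child with one more level of indentation.
    public static void AppendOutline(XmlNode node, int depth, StringBuilder sb)
    {
        if (node.NodeType == XmlNodeType.Element)
        {
            sb.Append(' ', depth * 2).Append(node.Name).Append('\n');
        }
        else if (node.NodeType == XmlNodeType.Text)
        {
            sb.Append(' ', depth * 2).Append(node.Value).Append('\n');
        }

        foreach (XmlNode child in node.ChildNodes)
        {
            AppendOutline(child, depth + 1, sb);
        }
    }

    // Hypothetical helper: parses the XML and returns the indented outline.
    public static string BuildOutline(string xml)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(xml);
        StringBuilder sb = new StringBuilder();
        AppendOutline(xd.DocumentElement, 0, sb);
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(BuildOutline(
            "<Books><Book Pages=\"392\">" +
            "<Title>C# Design Patterns</Title>" +
            "</Book></Books>"));
    }
}
```

The recursion ends naturally at text nodes, because their ChildNodes collection is empty.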
In addition to the properties used in Step-by-Step 2.3, the XmlDocument class includes a number of other useful members. Table 2.4 lists the most important of these.
Table 2.4 Important Members of the XmlDocument Class
Member | Type | Description
CreateAttribute() | Method | Creates an attribute node
CreateElement() | Method | Creates an element node
CreateNode() | Method | Creates an XmlNode object
DocumentElement | Property | Returns the root element of this document as an XmlElement object
DocumentType | Property | Returns the node containing the DTD declaration for this document, if it has one
ImportNode() | Method | Imports a node from another XML document
Load() | Method | Loads an XML document into the XmlDocument object
LoadXml() | Method | Loads the XmlDocument object from a string of XML data
NodeChanged | Event | Occurs after the value of a node has been changed
NodeChanging | Event | Occurs when the value of a node is about to be changed
NodeInserted | Event | Occurs when a new node has been inserted
NodeInserting | Event | Occurs when a new node is about to be inserted
NodeRemoved | Event | Occurs when a node has been removed
NodeRemoving | Event | Occurs when a node is about to be removed
PreserveWhitespace | Property | Specifies whether whitespace in the document should be preserved when loading or saving the XML (true to preserve it)
Save() | Method | Saves the XmlDocument object as a file or stream
WriteTo() | Method | Saves the XmlDocument object to an XmlWriter object
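To show how a few of the members in Table 2.4 combine, here is a minimal sketch that loads XML from a string with LoadXml() and then builds and appends a new Book element by using CreateElement() and CreateAttribute(). The class name AddBookSketch, the method AddBook(), and the sample title and page count are invented for illustration; the sketch keeps everything in memory instead of calling Save().

```csharp
using System;
using System.Xml;

public static class AddBookSketch
{
    // Hypothetical helper: appends a new <Book Pages="..."><Title>...</Title></Book>
    // element to the root of the document and returns the new element.
    public static XmlElement AddBook(XmlDocument xd, string title, string pages)
    {
        // New nodes are always created by the document that will own them.
        XmlElement book = xd.CreateElement("Book");

        XmlAttribute pagesAtt = xd.CreateAttribute("Pages");
        pagesAtt.Value = pages;
        book.Attributes.Append(pagesAtt);

        XmlElement titleElement = xd.CreateElement("Title");
        titleElement.InnerText = title;
        book.AppendChild(titleElement);

        // A created node is not part of the tree until it is inserted.
        xd.DocumentElement.AppendChild(book);
        return book;
    }

    public static void Main()
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(
            "<Books><Book Pages=\"392\">" +
            "<Title>C# Design Patterns</Title>" +
            "</Book></Books>");

        // The title and page count below are placeholder values.
        AddBook(xd, "Sample Title", "100");
        Console.WriteLine(xd.OuterXml);
    }
}
```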
REVIEW BREAK
The Document Object Model (DOM) is a W3C standard for representing the information contained in an HTML or XML document as a tree of nodes.
The XmlReader class defines an interface for reading XML documents. The XmlTextReader class inherits from the XmlReader class to read XML documents from streams.
The XmlNode object can be used to represent a single node in the DOM.
The XmlDocument object represents an entire XML document.