Parsing XML with DOM

The movies file from last week serves as the example for this week's exercises (see the answers to last week's exercises). Save the file as "movies.xml". To read it, create a new instance of DOMDocument and call its load method; the other methods of DOMDocument can then be used to process the document. A complete list of the available methods can be found in the PHP manual. The example below shows some of them in use.

<?php
error_reporting(E_ALL);
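$doc = new DOMDocument();
// load() returns false if the file cannot be read or parsed.
if (!$doc->load("movies.xml")) {
  die("Could not load movies.xml");
}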

// validate() checks the document against its DTD and returns true or false.
echo "Is the document valid? ";
echo $doc->validate() ? "yes" : "no";
echo "<p>";

echo "The whole document:<br>";
echo $doc->saveXML();  
echo "<p>";

echo "Document type: ";  
// doctype is null if the file has no DOCTYPE declaration.
echo $doc->doctype ? $doc->doctype->name : "(none)";
echo "<p>";

echo "The top element: ";
$topElem = $doc->documentElement;
echo $topElem->tagName;
echo "<p>";

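// Loop over the children of the top element.
foreach ($topElem->childNodes as $item) {
  // Skip whitespace text nodes; XML_ELEMENT_NODE has the value 1.
  if ($item->nodeType == XML_ELEMENT_NODE) {
    echo $item->tagName . " : ";
    // Print each child element as "tagName = value".
    foreach ($item->childNodes as $bottomItem) {
      if ($bottomItem->nodeType == XML_ELEMENT_NODE) {
        echo $bottomItem->tagName . " = " . $bottomItem->nodeValue . "; ";
      }
    }
    echo "<br/>";
  }
}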
?> 

Notes:
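1) Whitespace between the tags of an XML document (spaces, tabs, line breaks) is stored in the DOM tree as text nodes that contain only whitespace. The if statements above, which compare nodeType with 1 (the value of the constant XML_ELEMENT_NODE), make sure that only element nodes are processed.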

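2) It is a good idea to turn error reporting on as a debugging aid. If you want to suppress notices, use error_reporting(E_ALL ^ E_NOTICE); instead. In PHP 7 and earlier this hides the messages about undefined variables; since PHP 8 undefined variables and undefined array keys are reported as warnings, so this setting no longer hides them. Once you are finished with a script, the error_reporting line can be commented out.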
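
The whitespace nodes from note 1 can be made visible by counting child nodes. The following is a minimal sketch, assuming a small inline document (not the real movies.xml) so that it runs on its own. It also shows the preserveWhiteSpace property of DOMDocument, which, when set to false before loading, asks the parser to drop whitespace-only text nodes.

```php
<?php
// Hypothetical two-movie document, inlined so the sketch runs without movies.xml.
$xml = "<movies>\n  <movie><title>A</title></movie>\n  <movie><title>B</title></movie>\n</movies>";

// Default behaviour: the whitespace between the tags is kept as text nodes.
$doc = new DOMDocument();
$doc->loadXML($xml);
$withWhitespace = $doc->documentElement->childNodes->length;

// With preserveWhiteSpace set to false before loading, those nodes are dropped.
$doc2 = new DOMDocument();
$doc2->preserveWhiteSpace = false;
$doc2->loadXML($xml);
$withoutWhitespace = $doc2->documentElement->childNodes->length;

echo "Child nodes with whitespace kept: $withWhitespace\n";
echo "Child nodes with whitespace dropped: $withoutWhitespace\n";
```

Checking nodeType, as in the example above the notes, works regardless of this setting, so it remains the safer habit.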

Exercises

Note: in order to see the XML output properly, you may need to use "view source" in your browser for the exercises.

1) Change the program so that instead of using "childNodes" it uses "nextSibling" for looping through the nodes.

2) Quite often it is not necessary to loop through the whole document in order to find some information. "DOMDocument::getElementsByTagName" returns a list of elements with that tag name. Print all the titles of the movies by looping through the nodes which have "title" as their tag name and printing their "textContent".

3) Similar to the previous exercise, use getElementsByTagName to find the year node with the value "1998" and print the textContent of its parentNode, that is, of the movie it belongs to.

4) Read the contents of the XML file into a data structure that contains the movies with their titles, actors and years. (For simplicity, store only the first actor of each movie.) Hints: create an array of movies: "$movies = array();". Create a temporary array for each child node. When processing the bottomItems, add the tagName/nodeValue pairs to the temporary array: "$temp[$bottomItem->tagName] = $bottomItem->nodeValue;" Note that a plain assignment overwrites an earlier value with the same tag name, so check whether the key is already set if you want to keep the first actor. Use "print_r($movies);" to verify the contents of the array.

5) Optional exercise: DOM can also be used to create XML documents. Create a movie element with author, title and year and add it to the movies XML file.

Parsing XML with SAX

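SAX-based parsing is event-driven: the parser reads the document from start to finish and calls handler functions whenever it encounters a start tag, an end tag or character data. In PHP it is available via the XML Parser extension, that is, the xml_* functions. (See the PHP manual for details.) The following example prints the tag names, the "ID" attributes and the character data. By default the parser converts element and attribute names to upper case, which is why the attribute is looked up as "ID".
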
<?php
error_reporting(E_ALL ^ E_NOTICE);
$file = "movies.xml";
$depth = 0; // current nesting depth, shared with the handlers via "global"
    
function startElement($parser, $name, $attrs) { 
    global $depth;
    for ($i = 0; $i < $depth; $i++) {
        echo "&nbsp;";
    }
    // Not every element has an ID attribute, so fall back to an empty string.
    $id = $attrs["ID"] ?? "";
    echo "$name: $id";
    $depth++;
}
 
function endElement($parser, $name) { 
    global $depth;
    $depth--;
    echo "<p>";
} 

function dataHandler($parser, $data) {
    echo $data;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "dataHandler");
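$fp = fopen($file, "r");
if (!$fp) {
    die("Could not open $file");
}

// Feed the file to the parser in chunks; the last call is flagged as final.
while ($data = fread($fp, 4096)) {
    xml_parse($xml_parser, $data, feof($fp));
}
fclose($fp);
xml_parser_free($xml_parser);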
?>
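
The handler above looks up the attribute as "ID" in capital letters because the parser's case folding option is switched on by default. The sketch below illustrates this on a small inline document (an assumption, not the real movies.xml). The helper collectNames is a hypothetical name introduced only for this illustration; xml_parser_set_option and XML_OPTION_CASE_FOLDING are part of the XML Parser extension.

```php
<?php
// Hypothetical helper: returns the element names as the start handler receives them.
function collectNames(string $xml, bool $caseFolding): array {
    $names = [];
    $parser = xml_parser_create();
    // Case folding is on by default; 0 switches it off, 1 switches it on.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) { $names[] = $name; },
        function ($parser, $name) { /* end tags are ignored in this sketch */ }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

// Assumed sample input, inlined so that the sketch runs on its own.
$xml = '<movies><movie id="m1"><title>Alpha</title></movie></movies>';

echo implode(', ', collectNames($xml, true)), "\n";   // names folded to upper case
echo implode(', ', collectNames($xml, false)), "\n";  // names as written in the file
```

With case folding switched off, the attribute in the example above would have to be looked up under the spelling used in the file.
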
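
The example below uses xml_parse_into_struct to parse an XML file into a data structure. Unlike the tree of nodes that DOM builds, the result is a pair of flat arrays: $vals lists the tags in document order, each with its type, nesting level, value and attributes, and $index maps each tag name to the positions of its entries in $vals.
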
<?php
error_reporting(E_ALL);
$filecontent = file_get_contents('movies.xml');
$p = xml_parser_create();
xml_parse_into_struct($p, $filecontent, $vals, $index);
xml_parser_free($p);
echo "<pre>";
echo "Index array\n";
print_r($index);
echo "\nVals array\n";
print_r($vals);
echo "</pre>";
?>
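
The print_r output of the two arrays can be hard to read for a larger file. As a rough sketch, the following prints one line per entry of $vals for a small inline document; the document and its element names are assumptions for illustration, and the real movies.xml may be structured differently. Each entry has a "tag", a "type" (open, complete, close or cdata) and a "level"; "value" and "attributes" are only present when the tag has them.

```php
<?php
// Assumed one-movie sample, written without whitespace between the tags.
$xml = '<movies><movie id="m1"><title>Alpha</title><year>1998</year></movie></movies>';

$parser = xml_parser_create();
xml_parse_into_struct($parser, $xml, $vals, $index);

// One line per entry, indented by nesting level.
foreach ($vals as $pos => $entry) {
    $indent = str_repeat('  ', $entry['level'] - 1);
    $value = isset($entry['value']) ? ' = ' . $entry['value'] : '';
    echo $pos . ': ' . $indent . $entry['tag'] . ' (' . $entry['type'] . ')' . $value . "\n";
}

// $index maps each (upper-cased) tag name to its positions in $vals.
echo 'Positions of YEAR entries in $vals: ', implode(', ', $index['YEAR']), "\n";
```

An element with children produces an "open" and a "close" entry, whereas an element that contains only text produces a single "complete" entry.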

Exercises

6) Try the first example. Change the code so that tag names are printed in italics and the ID attributes are printed in bold face.

7) Print only the titles of the movies. Hint: use a global variable that is shared between the startElement function and the dataHandler function. In startElement this variable is set to "true" if the current tag is a title tag, "false" otherwise.

8) Try the second example above. Using the $vals array, print only the movie titles.

9) Try the SimpleXML example below. It converts the XML file into an object and back into XML.

<?php
if (file_exists('movies.xml')) {
    // Load the file into a SimpleXMLElement object.
    $xml = simplexml_load_file('movies.xml');
    print_r($xml);
    // Convert the object back into an XML string.
    print $xml->asXML();
}
?>
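
Once the XML has been loaded, SimpleXML exposes child elements as object properties and attributes as array keys. The following is a minimal sketch, assuming an inline document with movie, title and year elements and an id attribute; these names are illustrative and may not match the real movies.xml.

```php
<?php
// Assumed sample document, loaded from a string instead of a file.
$xml = simplexml_load_string(
    '<movies><movie id="m1"><title>Alpha</title><year>1998</year></movie></movies>'
);

$summaries = [];
foreach ($xml->movie as $movie) {
    // Attributes are read with array syntax, child elements with property syntax.
    $id = (string) $movie['id'];
    $title = (string) $movie->title;
    $year = (int) $movie->year;
    $summaries[] = "$id: $title ($year)";
}

echo implode("\n", $summaries), "\n";
```

The casts are needed because each access returns another SimpleXMLElement object rather than a plain string or number.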