
Monday, June 27, 2016

Parsing XMLs with DOM Parser

DOM is the simpler of the two common XML parsing approaches, the other being SAX. It is programmatically less complicated but also less efficient than SAX. The DOM parser loads the whole document into main memory and builds a tree from it before your code sees any of it, whereas a SAX parser hands over each piece as it is encountered. The obvious drawback of holding the full document in memory is that memory use and parse time grow with the size of the document, and documents that do not fit in memory cannot be parsed at all.

To understand the DOM parser, we take an example XML file and parse it using DOM. Let's consider the following XML -

testXML.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<first>
    <second atName="one">
        <number id="one">1</number>
        <number id="two">2</number>
    </second>
    
    <second atName="two"> 
        <number id="one">1</number>
        <number id="two">2</number>
    </second>
</first>

Our goal is to parse this whole document and print its contents using the DOM parser. Before beginning with the example, let's look at some helper classes and interfaces -

DocumentBuilder - It defines the API to build a DOM document tree from an XML source. It is usually obtained by calling newDocumentBuilder() on a factory created with DocumentBuilderFactory.newInstance().

Node - It is an interface which represents a node in the DOM tree.

Attr - This is the interface which represents a single attribute of an element.

NamedNodeMap - This represents a collection of nodes that can be looked up by name; here it holds the attributes of an element.
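To make these types concrete, here is a minimal sketch of how they fit together. It parses an in-memory string rather than a file, and the class DomTypesSketch and its helper rootAttributes() are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomTypesSketch {

    // Hypothetical helper: returns the root element's attributes as "name=value" strings.
    public static List<String> rootAttributes(String xml) {
        try {
            // DocumentBuilder turns the XML source into a DOM tree.
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));

            // Node is the common interface for everything in the tree.
            Node root = doc.getDocumentElement();

            // NamedNodeMap holds the attributes; each entry is an Attr.
            NamedNodeMap attrs = root.getAttributes();
            List<String> result = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                result.add(attr.getName() + "=" + attr.getValue());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints [atName=one]
        System.out.println(rootAttributes("<second atName=\"one\"/>"));
    }
}
```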

In our example, the main() method first generates the DOM tree and the processNode() method traverses this tree printing the nodes as it encounters them.

DomParser.java


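import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Attr;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomParser {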

    public static void main(String[] args) {
        try {
            File file = new File("src/testXML.xml");
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(file);
            doc.getDocumentElement().normalize();

            String tab = "";

            System.out.println("Staring Parsing...");

            //Process root Node
            Node root = doc.getDocumentElement();
            System.out.println(root.getNodeName());
            processNode(root, "\t" + tab);

            System.out.println("Parsing Complete...");

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void processNode(Node node, String tab) {
        try {
            NodeList children = node.getChildNodes();

            for (int i = 0; i < children.getLength(); i++) {
                Node ele = children.item(i);

                //Printing the node name or the text value in case of a Text node
                if (ele.getNodeName().equals("#text")) {
                    System.out.print(" " + ele.getNodeValue());
                } else {
                    System.out.print(tab + ele.getNodeName());
                }

                //Printing attributes of the current node.
                if (ele.hasAttributes()) {
                    NamedNodeMap attrs = ele.getAttributes();
                    for (int j = 0; j < attrs.getLength(); j++) {
                        Attr attribute = (Attr) attrs.item(j);
                        System.out.print(" " + attribute.getName() + "=" + attribute.getValue());
                    }
                }

                //Process children 
                processNode(ele, "\t" + tab);
            }

        } catch (DOMException e) {
            e.printStackTrace();
        }

    }

}

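Text nodes report "#text" as their node name, so processNode() prints their value instead of a name. This includes the whitespace-only text nodes that sit between elements, which is why the output contains blank lines. The method processNode() calls itself recursively as it traverses the whole tree. The output for the above program is as follows -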

Output


Staring Parsing...
first
 
     second atName=one 
          number id=one 1 
          number id=two 2 
     
    
     second atName=two  
          number id=one 1 
          number id=two 2 
     
Parsing Complete...
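The blank lines in this output come from whitespace-only text nodes. If they are unwanted, one option is to check the node type and skip text that is empty after trimming. The following is only a sketch of that idea, not the code used above; the class DomTextSketch and its methods are hypothetical, and it parses an in-memory string to stay self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomTextSketch {

    // Hypothetical helper: walks the tree depth-first, recording element
    // names and any text that is not just whitespace.
    static void collect(Node node, List<String> out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.add(text);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.add("<" + child.getNodeName() + ">");
                collect(child, out);
            }
        }
    }

    public static List<String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node root = doc.getDocumentElement();
            List<String> out = new ArrayList<>();
            out.add("<" + root.getNodeName() + ">");
            collect(root, out);
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<first>\n  <number id=\"one\">1</number>\n  <number id=\"two\">2</number>\n</first>";
        // prints [<first>, <number>, 1, <number>, 2]
        System.out.println(parse(xml));
    }
}
```

Checking getNodeType() against Node.TEXT_NODE has the same effect here as comparing the node name with "#text", but it also makes it easy to ignore comments and other node types explicitly.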


The DOM parser is not the most efficient parser, but for small documents, where having the whole tree in memory is convenient, it can be very useful.

Sunday, September 7, 2014

Parsing XMLs with SAX Parser

In my previous post (Refer: XML Reading and Writing using Servlets in Java), I used the DOM parser to parse an XML document. The DOM parser has one big disadvantage, though: it loads the whole XML file into memory. With small XML files this makes little difference, but with big files it can hurt performance badly. SAX, on the other hand, parses an XML document as a stream. It never holds the whole file in memory, so it needs far less memory and usually performs much better on large documents.

Let's consider the following XML file -


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<first>
    <second atName="one">
        <number id="one">1</number>
        <number id="two">2</number>
    </second>
    
    <second atName="two"> 
        <number id="one">1</number>
        <number id="two">2</number>
    </second>
</first>
The SAX parser is event-driven: it fires events as it parses through the document (Source: saxproject.org). The most common events are as follows -

  • Start Document - Fired when the beginning of the document is encountered
  • Start Element - Fired when the opening tag of an element is encountered
  • Characters - Fired with the character data found between tags
  • End Element - Fired when the closing tag of an element is encountered
  • End Document - Fired when the end of the document is encountered
There are a few more events, but these are the most commonly used ones. To handle these events, we extend the DefaultHandler class.

public class SaxParser extends DefaultHandler 

DefaultHandler provides empty implementations of the event methods, so we only override the ones we need. My method GetXMLFile() starts the parsing.


private void GetXMLFile() throws SAXException, ParserConfigurationException, IOException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        // parse(String, DefaultHandler) expects a URI, so use forward slashes
        sp.parse("src/testXML.xml", this);
}

The first step is to get a SAXParser instance from the SAXParserFactory. Then we simply hand the XML file and the handler (this) to the parser. As it parses, it fires the events listed above, which I handle as follows.

The Start Document event -


@Override
public void startDocument() throws SAXException {
        System.out.println("Starting Parsing...");
}

The Start Element event -


@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        StringBuilder sb = new StringBuilder();
        
        if (qName.equals("first")) {
            sb.append(qName).append("\n");
        } else if (qName.equals("second")) {
            sb.append("\t").append(qName).append(" ").append(attributes.getValue("atName")).append("\n");
        } else if (qName.equals("number")) {
            sb.append("\t\t").append(qName).append(" ").append(attributes.getValue("id")).append(" ");
        }
        
        System.out.print(sb.toString());
}

The Characters event -


@Override
public void characters(char ch[], int start, int length) throws SAXException {
    System.out.print(new String(ch, start, length).trim());
}
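One caveat with this event: a SAX parser is allowed to deliver the text of a single element in several characters() calls, so trimming and printing each chunk separately can lose spaces inside the text. A common approach is to collect the chunks in a buffer and use them in endElement(). The sketch below illustrates that pattern under those assumptions; the class TextCollector and its numbers() helper are hypothetical and are not part of the code in this post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {

    private final StringBuilder buffer = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // A new element starts: discard whatever text came before it.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so only append here.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element is complete, so the buffer now holds its full text.
        if (qName.equals("number")) {
            values.add(buffer.toString().trim());
        }
        buffer.setLength(0);
    }

    // Hypothetical helper: returns the text of every <number> element.
    public static List<String> numbers(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<second><number id=\"one\">1 &amp; 2</number><number id=\"two\">3</number></second>";
        // prints [1 & 2, 3]
        System.out.println(numbers(xml));
    }
}
```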

And the end element event -


@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("first")) {
            System.out.println(qName);
        } else if (qName.equals("number")){
            System.out.println("");
        }
}

The end document event -


@Override
public void endDocument() throws SAXException {
    System.out.println("Parsing Complete...");
}

The events fire according to the content of the XML and finally generate the following output -


Starting Parsing...
first
 second one
  number one 1
  number two 2
 second two
  number one 1
  number two 2
first
Parsing Complete...
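To see the order in which the structural events fire without any formatting logic in the way, a handler can simply record each callback. This is only an illustrative sketch; the class EventTrace and its trace() helper are hypothetical, and it parses an in-memory string instead of the file used above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EventTrace extends DefaultHandler {

    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Hypothetical helper: parses the string and returns the recorded events in order.
    public static List<String> trace(String xml) {
        try {
            EventTrace handler = new EventTrace();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints one event per line, in the order the parser fired them.
        for (String event : trace("<first><second atName=\"one\"/></first>")) {
            System.out.println(event);
        }
    }
}
```

For the small input in main(), the start and end events nest exactly like the tags do: the element that opens last closes first.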

SAX parsers are very powerful and can go through even a large XML file efficiently, since they never build the whole document in memory.

I have attached the complete code just for reference.


package domsaxparser;

import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParser extends DefaultHandler {

    public static void main(String[] args) {
        try {
            SaxParser xmlEditor = new SaxParser();
            xmlEditor.GetXMLFile();
        } catch (SAXException se) {
            se.printStackTrace();
        } catch (ParserConfigurationException pce) {
            pce.printStackTrace();
        } catch (IOException ie) {
            ie.printStackTrace();
        }
    }

    private void GetXMLFile() throws SAXException, ParserConfigurationException, IOException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        // parse(String, DefaultHandler) expects a URI, so use forward slashes
        sp.parse("src/testXML.xml", this);
    }

    @Override
    public void startDocument() throws SAXException {
        System.out.println("Starting Parsing...");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        StringBuilder sb = new StringBuilder();

        if (qName.equals("first")) {
            sb.append(qName).append("\n");
        } else if (qName.equals("second")) {
            sb.append("\t").append(qName).append(" ").append(attributes.getValue("atName")).append("\n");
        } else if (qName.equals("number")) {
            sb.append("\t\t").append(qName).append(" ").append(attributes.getValue("id")).append(" ");
        }

        System.out.print(sb.toString());
    }

    @Override
    public void characters(char ch[], int start, int length) throws SAXException {
        System.out.print(new String(ch, start, length).trim());
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("first")) {
            System.out.println(qName);
        } else if (qName.equals("number")) {
            System.out.println("");
        }
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("Parsing Complete...");
    }

}

Have fun!!