Sunday, September 7, 2014

Parsing XMLs with SAX Parser

In my previous post (see: XML Reading and Writing using Servlets in Java), I used the DOM parser to parse an XML document. The DOM parser has one big disadvantage, though: it loads the whole XML file into memory. With small XML files that makes little difference, but with big files it can hurt performance badly. SAX, on the other hand, parses an XML document as a stream. It never loads the whole file into memory, so its memory footprint stays small and it usually copes much better with large files.

Let's consider the following XML file:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<first>
    <second atName="one">
        <number id="one">1</number>
        <number id="two">2</number>
    </second>
    
    <second atName="two"> 
        <number id="one">1</number>
        <number id="two">2</number>
    </second>
</first>
The SAX parser is event-driven: it fires events as it reads through the document (source: saxproject.org). The most common events are as follows -

  • Start Document - Fired when the beginning of the document is encountered
  • Start Element - Fired when the opening tag of an element is encountered
  • Characters - Fired with the characters found between two tags
  • End Element - Fired when the closing tag of an element is encountered
  • End Document - Fired when the end of the document is encountered
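
To see the order in which these events fire, here is a minimal sketch that records each callback for a tiny document parsed from a string. The class name EventTrace and the helper trace() are made up for this illustration and are not part of the code in this post.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: records the name of every SAX event it receives.
public class EventTrace extends DefaultHandler {

    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("startElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip whitespace-only text so the trace stays readable.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters " + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Parses the given XML text and returns the events in the order they fired.
    public static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<first><number id=\"one\">1</number></first>").forEach(System.out::println);
    }
}
```

For this input the trace should read: startDocument, startElement first, startElement number, characters 1, endElement number, endElement first, endDocument.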
There are a few more events, but these are the most commonly used ones. To handle these events, we extend the DefaultHandler class:

public class SaxParser extends DefaultHandler 

This class lets us override the event methods. My method GetXMLFile() starts the parsing.


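private void GetXMLFile() throws SAXException, ParserConfigurationException, IOException {
        // The factory picks the SAX implementation and creates the parser.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        // parse(String, ...) expects a URI, so use forward slashes; "this" receives the events.
        sp.parse("src/testXML.xml", this);
}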

The first step is to get a SAXParser instance from the SAXParserFactory. Then we simply hand the XML file to the parser, which parses it and fires the events listed above. I handle them in the following way.
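
As an aside, parse() is overloaded: besides a URI string it also accepts a File, an InputStream or an InputSource. The sketch below is only an illustration of the same factory and parser steps applied to XML held in a string; the class ParseSources and its ElementCounter handler are hypothetical names, not part of this post's code.

```java
import java.io.File;
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseSources {

    // Hypothetical stand-in for a real handler: counts start-element events.
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            count++;
        }
    }

    // Same two steps as in GetXMLFile(): factory -> parser -> parse(source, handler).
    public static int countElements(InputSource source) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter counter = new ElementCounter();
        parser.parse(source, counter);
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        // Source 1: XML text that is already in memory.
        String xml = "<first><second atName=\"one\"/></first>";
        System.out.println(countElements(new InputSource(new StringReader(xml))));

        // Source 2: a file named on the command line, passed as a system ID (URI).
        if (args.length > 0) {
            String systemId = new File(args[0]).toURI().toASCIIString();
            System.out.println(countElements(new InputSource(systemId)));
        }
    }
}
```

Parsing from a string like this is handy for trying out a handler without creating a file first.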

The Start Document event -


@Override
public void startDocument() throws SAXException {
        System.out.println("Starting Parsing...");
}

The Start Element event -


@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        StringBuilder sb = new StringBuilder();
        
        if (qName.equals("first")) {
            sb.append(qName).append("\n");
        } else if (qName.equals("second")) {
            sb.append("\t").append(qName).append(" ").append(attributes.getValue("atName")).append("\n");
        } else if (qName.equals("number")) {
            sb.append("\t\t").append(qName).append(" ").append(attributes.getValue("id")).append(" ");
        }
        
        System.out.print(sb.toString());
}
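
The handler above asks for attributes by name. Note that getValue(name) returns null when the attribute is missing, so the output would then contain the text "null". When the attribute names are not known in advance, the Attributes object can also be walked by index. A minimal sketch, assuming a made-up class AttributeDump:

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: lists every attribute of every element.
public class AttributeDump extends DefaultHandler {

    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        out.append(qName);
        // Walk the attributes by index instead of asking for a known name.
        for (int i = 0; i < attributes.getLength(); i++) {
            out.append(' ')
               .append(attributes.getQName(i))
               .append('=')
               .append(attributes.getValue(i));
        }
        out.append('\n');
    }

    // Parses the XML text and returns one line per element.
    public static String dump(String xml) throws Exception {
        AttributeDump handler = new AttributeDump();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump("<second atName=\"one\"><number id=\"two\">2</number></second>"));
    }
}
```

SAX does not promise any particular order for the attributes of one element, so code that uses the index-based calls should not depend on it.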

The Characters event -


@Override
public void characters(char[] ch, int start, int length) throws SAXException {
    // Also fired for the whitespace between tags; trim() reduces that to an empty string.
    System.out.print(new String(ch, start, length).trim());
}
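
One thing to keep in mind: a SAX parser is allowed to deliver the text of a single element in several characters() calls. Printing each chunk works for this small file, but if you need the complete value, a common pattern is to collect the chunks in a buffer and read it in endElement. A minimal sketch, assuming a made-up class TextCollector that gathers the values of the number elements:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects the text of every <number> element.
public class TextCollector extends DefaultHandler {

    private final StringBuilder buffer = new StringBuilder();
    private final List<String> values = new ArrayList<>();
    private boolean inNumber;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("number")) {
            inNumber = true;
            buffer.setLength(0); // start with an empty buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so append instead of overwrite.
        if (inNumber) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("number")) {
            values.add(buffer.toString().trim()); // the text is complete only now
            inNumber = false;
        }
    }

    // Parses the XML text and returns the collected values in document order.
    public static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(
                "<first><number id=\"one\"> 1 </number><number id=\"two\">2</number></first>"));
    }
}
```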

And the end element event -


@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("first")) {
            System.out.println(qName);
        } else if (qName.equals("number")){
            System.out.println("");
        }
}
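
Since every start element event is matched by an end element event, the two can also be used to keep track of the nesting depth. The sketch below is an alternative to the hard-coded tabs above, not the approach used in this post: it indents each element name by its depth, whatever the element is called. The class name IndentPrinter is made up for the illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: prints element names indented by nesting depth.
public class IndentPrinter extends DefaultHandler {

    private final StringBuilder out = new StringBuilder();
    private int depth;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        for (int i = 0; i < depth; i++) {
            out.append('\t');
        }
        out.append(qName).append('\n');
        depth++; // everything until the matching end tag is one level deeper
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--; // back to the parent's level
    }

    // Parses the XML text and returns the indented outline.
    public static String print(String xml) throws Exception {
        IndentPrinter handler = new IndentPrinter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(print("<first><second><number>1</number></second></first>"));
    }
}
```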

The end document event -


@Override
public void endDocument() throws SAXException {
    System.out.println("Parsing Complete...");
}

The events fire according to the content of the XML and finally generate the following output -


Starting Parsing...
first
 second one
  number one 1
  number two 2
 second two
  number one 1
  number two 2
first
Parsing Complete...

SAX parsers are powerful, and because they never hold the whole document in memory they can go through an XML file very efficiently.

I have attached the complete code just for reference.


package domsaxparser;

import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParser extends DefaultHandler {

    public static void main(String[] args) {
        try {
            SaxParser xmlEditor = new SaxParser();
            xmlEditor.GetXMLFile();
        } catch (SAXException se) {
            se.printStackTrace();
        } catch (ParserConfigurationException pce) {
            pce.printStackTrace();
        } catch (IOException ie) {
            ie.printStackTrace();
        }
    }

    private void GetXMLFile() throws SAXException, ParserConfigurationException, IOException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        // parse(String, ...) expects a URI, so use forward slashes.
        sp.parse("src/testXML.xml", this);
    }

    @Override
    public void startDocument() throws SAXException {
        System.out.println("Starting Parsing...");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        StringBuilder sb = new StringBuilder();

        if (qName.equals("first")) {
            sb.append(qName).append("\n");
        } else if (qName.equals("second")) {
            sb.append("\t").append(qName).append(" ").append(attributes.getValue("atName")).append("\n");
        } else if (qName.equals("number")) {
            sb.append("\t\t").append(qName).append(" ").append(attributes.getValue("id")).append(" ");
        }

        System.out.print(sb.toString());
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Also fired for the whitespace between tags; trim() reduces that to an empty string.
        System.out.print(new String(ch, start, length).trim());
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("first")) {
            System.out.println(qName);
        } else if (qName.equals("number")) {
            System.out.println("");
        }
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("Parsing Complete...");
    }

}

Have fun!!
