[SoapRMI] [ANN] MXP1 new fast and small parsing engine for XMLPULL

Aleksander Slominski aslom_at_cs.indiana.edu
Tue, 16 Apr 2002 18:24:59 -0500


hi,

i have completely rewritten the XPP3 parsing engine for XMLPULL; it is
now called MXP1 and is available from:

  http://www.extreme.indiana.edu/xgws/xsoap/xpp/mxp1/

the complete MXP1 parser (without the factory, but it can be used directly) is less than 20 KB.

to estimate how MXP1 performs i used the SAX benchmark
from http://piccolo.sourceforge.net/bench.html, which i
modified to run tests for XMLPULL, and i also added the following features:
* modified tests to parse from memory instead of from a file, to eliminate IO interference
* each test actually visits every element and appends its content to a StringBuffer
 (to measure the real time needed to visit every node!)
* added the ability to measure the overhead of creating new parser instances instead of reusing them
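
the setup described in the list above can be sketched roughly as follows. this is not the actual benchmark code: it is a minimal illustration that assumes the SAX parser bundled with the JDK (javax.xml.parsers) as a stand-in, and the class and method names (InMemoryParseBench, Collector, run, parseOnce) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness; names are illustrative, not taken from the real benchmark.
public class InMemoryParseBench {

    /** Visits every element and collects all character data. */
    public static final class Collector extends DefaultHandler {
        public final StringBuilder text = new StringBuilder();
        public int elements;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            elements++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so append each chunk.
            text.append(ch, start, length);
        }
    }

    /** Creates a namespace-aware parser (assumption: the JDK default implementation). */
    public static SAXParser newParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser();
    }

    /** Parses the document once from memory and returns what was collected. */
    public static Collector parseOnce(byte[] doc) throws Exception {
        Collector collector = new Collector();
        newParser().parse(new ByteArrayInputStream(doc), collector);
        return collector;
    }

    /** Parses doc from memory `iterations` times; returns elapsed milliseconds. */
    public static long run(byte[] doc, int iterations, boolean reuseParser) throws Exception {
        SAXParser shared = reuseParser ? newParser() : null;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Either reuse one instance or pay the creation cost on every pass.
            SAXParser parser = reuseParser ? shared : newParser();
            parser.parse(new ByteArrayInputStream(doc), new Collector());
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<root><a>one</a><b>two</b></root>".getBytes(StandardCharsets.UTF_8);
        run(doc, 200, true); // warm-up pass before timing
        System.out.println("reuse:    " + run(doc, 2000, true) + "ms");
        System.out.println("no reuse: " + run(doc, 2000, false) + "ms");
    }
}
```

comparing the two timings gives a rough idea of the parser creation overhead; holding the whole document in a byte array keeps file IO out of the measurement.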

you can get modified tests from http://www.extreme.indiana.edu/~aslom/xpp_sax2bench/

actual test results are at: http://www.extreme.indiana.edu/~aslom/xpp_sax2bench/results.html

in all but two tests MXP1 is the fastest parser, about 5-20% faster than the second fastest, Piccolo.
MXP1 is slower than Piccolo for 'Mostly text' and 'Random XML' because MXP1 always reports
text combined as one event. that means that in an application there is really no need to use a
StringBuffer to collect element content. i have kept the string buffer in all tests for symmetry, but
removing it speeds up the test to the same level as Piccolo,
for example when the USE_SB flag is false in XmlPullTest:


     C:\Forge\homepage\xpp_sax2bench>java -cp classes;parsers\xpp3_mxp1_beta1.jar XmlPullTest data\rand_100.xml  2000 ns_on
     using factory class org.xmlpull.mxp1.MXParserFactory
     namespaces: true
     reuse parser instances: true
     using parser class org.xmlpull.mxp1.MXParserCachingStrings
     Warming up the parser....
     count=1220
     Parsing data\rand_100.xml 2000 times by XmlPullTest
     Elapsed time: 7801ms
     Average parse time: 3.9005ms
     <benchmark elapsed="7801" iterations="2000"/>

and changing the flag to true slows down the parser by about 12% (8753ms vs 7801ms):


     C:\Forge\homepage\xpp_sax2bench>java -cp classes;parsers\xpp3_mxp1_beta1.jar XmlPullTest data\rand_100.xml  2000 ns_on
     using factory class org.xmlpull.mxp1.MXParserFactory
     namespaces: true
     reuse parser instances: true
     using parser class org.xmlpull.mxp1.MXParserCachingStrings
     Warming up the parser....
     count=1093
     Parsing data\rand_100.xml 2000 times by XmlPullTest
     Elapsed time: 8753ms
     Average parse time: 4.3765ms
     <benchmark elapsed="8753" iterations="2000"/>

and here is what Piccolo 0.8 reports:


     C:\Forge\homepage\xpp_sax2bench>java -cp classes;parsers/Piccolo-0.8.jar -Dorg.xml.sax.driver=com.bluecast.xml.Piccolo SAX2Test data\rand_100.xml 2000 ns_on
     using parser class com.bluecast.xml.Piccolo
     namespaces: true
     reuse parser instances: true
     Warming up the parser....
     count=1174
     Parsing data\rand_100.xml 2000 times by SAX2Test
     Elapsed time: 7511ms
     Average parse time: 3.7555ms
     <benchmark elapsed="7511" iterations="2000"/>
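
to illustrate why a pull parser that reports text as one combined event needs no extra buffer, here is a small sketch. it does not use the XMLPULL API or MXP1: as an assumption for illustration it uses the StAX pull reader that ships with the JDK (javax.xml.stream) with coalescing switched on, and the class name PullTextDemo is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo; StAX stands in for a pull parser that combines text.
public class PullTextDemo {

    /** Returns the text of every character-data event, in document order. */
    public static List<String> textEvents(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the reader to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> events = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.CHARACTERS) {
                // The whole element content arrives at once; no StringBuffer is needed.
                events.add(reader.getText());
            }
        }
        reader.close();
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textEvents("<a>x &amp; y</a>"));
    }
}
```

with a SAX handler the same content may arrive in several characters() calls, which is why the SAX tests have to append into a buffer.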

comments about MXP1 and the tests are welcome.

thanks,

alek