[SoapRMI] Re: xml parsers performance
Aleksander Slominski
aslom_at_cs.indiana.edu
Thu, 07 Mar 2002 23:39:04 -0500
Kapil Talwar wrote:
> I was wondering if you had any overall statistics on how much of CPU
> time goes into processing the overhead for a typical distributed
> application that communicates via XML/SOAP.
we do not have hard data beyond what is in our SC00 paper, available at
http://www.extreme.indiana.edu/xgws/index.html#papers
however, for small XML messages in RPC the network latency will dominate,
and there should be no big difference between SOAP and binary protocols
(if the SOAP implementation _is_ optimized).
> I've heard a lot of people
> mentioning that XML parsing is itself CPU intensive but, typically, how
> much of cpu time is spent in XML parsing in the overall scheme of things?
in the SC00 paper we argued that what really matters for maximum performance
is overlapping cpu-intensive xml parsing with network operations, and that this
can considerably speed up typical applications. see:
http://www.extreme.indiana.edu/xgws/papers/sc00_paper/node12.html
http://www.extreme.indiana.edu/xgws/papers/sc00_paper/node13.html
http://www.extreme.indiana.edu/xgws/papers/sc00_paper/figures/gant.gif
if you do not overlap, then for a typical RPC operation half of the time in the client
may be spent waiting for data to be received, processed and sent back by the server,
because the critical overlap between serialization and deserialization is not exploited
(optimally the network pipe between client and server is kept full, with one side
sending/serializing data while the other end is receiving/deserializing).
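
to make the overlap argument concrete, here is a toy timing model (a sketch only, not
taken from the SC00 paper). it assumes a message is split into equal chunks and that each
chunk costs a fixed time in each of three stages; the function name, the parameters and
the numbers in the usage note are all made up for illustration.

```python
def rpc_time(n_chunks, t_serialize, t_network, t_deserialize, overlapped):
    """Estimate the one-way time to move n_chunks through three stages.

    All times are per chunk and in the same (arbitrary) unit.
    This is a simplified pipeline model, not a measurement.
    """
    stages = (t_serialize, t_network, t_deserialize)
    if not overlapped:
        # Each stage handles the whole message before the next stage starts.
        return n_chunks * sum(stages)
    # Pipelined: the first chunk passes through every stage once, then the
    # slowest stage sets the pace for the remaining chunks.
    return sum(stages) + (n_chunks - 1) * max(stages)
```

with 10 chunks and one time unit per stage, this model gives 30 units without overlap
and 12 with it. real gains depend on how evenly the three stages are balanced.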
we have incorporated some of those ideas in XSOAP - for example, thanks to the
pull parser we can process data as soon as it arrives, and we are now looking at
how to use HTTP chunking to start sending the serialized XML as soon as possible.
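
as a rough illustration of processing data as it arrives, here is a minimal sketch that
uses python's standard XMLPullParser as a stand-in. it is not the XSOAP API; the
function name pull_items and the "item" element are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def pull_items(chunks, tag="item"):
    """Yield the text of each <tag> element while chunks are still arriving."""
    parser = ET.XMLPullParser(events=("end",))

    def drain():
        # Hand out every element that the parser has finished so far.
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                yield elem.text

    for chunk in chunks:      # chunks may split the XML at any point
        parser.feed(chunk)
        yield from drain()
    parser.close()            # raises ParseError if the document is incomplete
    yield from drain()        # pick up anything the parser reported late
```

because pull_items is a generator, the caller can start working on the first values
before the last chunk has been received, e.g.
list(pull_items(["<msg><item>a</it", "em><item>b</item>", "</msg>"])).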
i am not underestimating the importance of XML parsing, but there are other factors that
need to be considered - the weakest link in the chain (the slowest part of the system)
is the most important one to fix, and it will not always be XML parsing ...
also, in some situations factors other than raw speed may be more important.
for example, in XSOAP the memory footprint is pretty low because no DOM-like
representation of the whole XML input message is ever created; instead the
XML Pull Parser is used to deserialize incoming XML on the fly. if memory size
is a consideration, as on a busy server with multiple _concurrent_
connections, the ability to fit within memory limits may matter more for performance
than a really fast SOAP processor that causes the system to swap ...
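
the memory point can be sketched the same way. the example below is an illustration in
python, not XSOAP code; count_and_total and the "amount" element are invented names.
it reads a stream and keeps only running totals instead of a tree for the whole message.

```python
import io
import xml.etree.ElementTree as ET


def count_and_total(stream, tag="amount"):
    """Count and sum <tag> values without keeping the whole document in memory."""
    count, total = 0, 0.0
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)      # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            total += float(elem.text)
            root.clear()              # drop the elements parsed so far
    return count, total
```

for example, count_and_total(io.BytesIO(b"<msg><amount>1.5</amount><amount>2.5</amount></msg>"))
walks the message once and holds at most a few elements at any moment.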
there are some special considerations to look at when writing a performance-optimized
system. for example, as a side effect of streaming, if an error in the incoming XML
is detected, some already deserialized objects may have been created that will then be
discarded, so there should be no side effects (or there should be an undo defined ...).
this is an inherent risk when doing streaming.
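
one simple way to avoid side effects is to stage the deserialized values and apply them
only after the whole message has parsed cleanly. the sketch below shows that idea in
python; it is an assumption about how one could do it, not how XSOAP does it, and the
names deserialize_all_or_nothing and apply are hypothetical.

```python
import xml.etree.ElementTree as ET


def deserialize_all_or_nothing(chunks, apply, tag="item"):
    """Call apply(value) for every <tag> value, but only if the XML is well formed."""
    staged = []
    parser = ET.XMLPullParser(events=("end",))

    def collect():
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                staged.append(elem.text)

    try:
        for chunk in chunks:
            parser.feed(chunk)
            collect()
        parser.close()
        collect()
    except ET.ParseError:
        # Values deserialized before the error are dropped; apply() never ran.
        return False
    for value in staged:
        apply(value)
    return True
```

the trade-off is that the values are held until the end of the message, which gives up
part of the streaming benefit. an explicit undo is the alternative when that delay is
not acceptable.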
we do plan to run updated performance and interoperability tests for XSOAP java
and c++ soon, and we will post the results here (and apply any necessary fixes to XSOAP).
thanks,
alek