A Comparison of Alternative Encoding Mechanisms for Web Services
Min Cai, Shahram Ghandeharizadeh, Rolfe Schmidt, Saihong Song
Computer Science Department
University of Southern California
Los Angeles, California 90089-0781
{mincai,shahram,rrs,saihong}@usc.edu
Abstract
A web service is a
modular application that is published, advertised, discovered, and invoked
across a network, i.e., an Intranet or the Internet. It is based on a “software-as-services” model and may participate
as a component of other web services and applications. Binary and XML are two popular
encoding/decoding mechanisms for network messages. Binary encoding is used when performance is critical and XML encoding
is employed when interoperability with other web services and applications is
essential. With each, one may employ
compression to reduce message size prior to its transmission across the
network. These decisions have a
significant impact on response time and throughput. This paper reports on our experiences with a decision support
benchmark, TPC-H, using these alternatives on different hardware
platforms. We focus on queries and make
the following observations. First, when
the message size is large compared to the physical memory (RAM), XML encoding
diminishes response time because it inflates message size, causing the
operating system to write either a part or the entire message to disk. Compression reduces the message size and
enhances the throughput of a shared network.
With XML, we present numbers from XMill, a compression technique that
employs XML semantics. For queries that
produce a lot of XML data (more than a megabyte), XMill compressed XML messages
are smaller than the Zip compressed Binary messages. While this improves throughput of a networked environment with a
fixed bandwidth, XMill compressed messages are at times twice slower than Zip
compressed Binary messages. The
processor speed has a significant impact on the observed response times.
1. Introduction
Many organizations envision web services as an enabling component of Internet-scale computing. A web service is either a computation or an information service with a published interface. Its essence is a remote procedure call (RPC) that consumes and processes some input data in order to produce output data. It is a concept that renders web applications extensible: By identifying each component of a web application as a web service, an organization can combine these web services with others to rapidly develop new web applications. The new web application may consist of web services that span the boundaries of several (if not many) organizations.
Serialization is an integral component of both web services and network-centric database applications. It is the process of converting objects and their state information into a form appropriate for transmission across the network (and storage in a flat file). With a network connection, the receiving process consumes arriving messages, re-assembles the transmitted objects, and processes them. This process is termed de-serialization. Two common encoding mechanisms are XML and Binary. XML produces human readable text and is employed when interoperability with other web services and applications is essential. Binary produces streams that are compact and fast to parse, but not human-readable.
The focus of this study is to quantify the performance tradeoff associated with these two alternatives for a decision support benchmark. We do not report on the query execution times and have eliminated it from the presented response times. In addition, we analyse the role of compression in reducing the number of transmitted bytes with each encoding mechanism. With XML, we analyse two compression schemes: Zip/GZip library and XMill [Liefke & Suciu]. Both employ techniques based on Lempel-Ziv algorithm [Ziv & Lempel]. Our results demonstrate that without compression, the XML encoder results in message sizes that are at times five times larger than their Binary representation. With Zip, compressed XML messages are at most twice the size of their compressed Binary representation. With XMill, compressed XML messages are at times smaller than their Zip compressed Binary representation. This trend holds true for large messages (more than one megabyte). Otherwise, Zip compressed Binary messages are smaller.
We also investigated two alternative protocols, namely, TCP/IP [Postel] and HTTP [Fielding et. al.]. Our experiments reveal that these two protocols provide comparable performance. This is because the Hypertext Transfer Protocol (HTTP) is an application-level protocol that employs a reliable transport protocol. In our experimental configuration, it employs TCP/IP connections. HTTP does transmit more bytes based on its request-response paradigm [Fielding et. al.]. However, this is negligible when compared with the message size. We do not investigate the role of HTTP intermediary, i.e., proxy, gateway, and tunnel. To simplify discussion, we eliminate HTTP from further consideration.
Based on the obtained results, we develop analytical models that compute the throughput of the system when the network is a shared resource. An analysis of these models reveals compression can significantly enhance system throughput when the network bandwidth is limitted. The rest of this paper is organized as follows. Section 2 provides an overview of our experimental environment and the obtained results. Section 3 describes analytical models that compute system throughput using network as a bottleneck. We offer brief conclusions and future research directions in Section 4.

Figure
1: Bytes from Server to Client
2. Performance Evaluation
We quantified the performance of the alternative transmission protocols using TPC-H benchmark because it is a standard that provides documented queries and data sets. This enables others to re-create our experiments. TPC-H includes both retrieval and refresh queries. The refresh commands generate large requests and small responses. The retrieval queries offer a mix of commands that generate either (a) large requests and small responses, and (b) large requests and large responses. This motivated us to focus on retrieval queries and ignore refresh queries from further consideration. We report on 21 out of 22 queries because we could not implement query 15 in a timely manner.
Our hardware platform consists of two PCs. One is a server and the other is a client. (We analyse results with PCs configured with three different processor speeds: 450 MHz, 1 GHz, and 2 GHz.) The client and server were connected using a LINKSYS Ethernet (10/100 megabit per second) switch. Each machine is configured with a 20-gigabyte internal disk and 512 megabytes of memory (unless specified otherwise). The server is configured with Microsoft Windows 2000 Server, SQL Server 2000, and Visual Studio .NET Beta 2 release. The client is configured with Microsoft Windows 2000 Professional and Visual Studio .NET Beta 2 release. The server implements a web service that accepts one TPC-H retrieval query, processes the query using ADO.NET, and returns the obtained results back to the client. The client employs a TPC-H provided component that generates SQL retrieval query strings, invokes the server’s web service, and receives the obtained results.
The communication between the client and server uses the .NET remoting framework. For transmission, we use the TCP and HTTP channels provided with the .NET framework. For message formatting we use the SOAP [Box et. al.] and Binary formatters provided with the .NET framework. We extended this framework with two new formatters: a) compression using Zip/GZip library written entirely in C#, and b) XMill compression scheme [Liefke & Suciu]. XMill employs zlib, the library function for gzip. This framework configures channels and encoders without requiring modification of the application code. When performing our experiments, our system reconfigures at runtime to repeat a query workload while communicating over different channels with different encoders. All of our experiments were conducted in a single user mode with no background load on the underlying hardware. The client was configured to invoke the web service in an asynchronous manner.
We used a 1 Gigabyte TPC-H database for evaluation purposes. The presented results ignore the query execution time. They pertain to only the encoding, transmission, and decoding times for processing a query.

Figure
2: Comparison of XML and Binary message
sizes with and without compression.
2.1 Obtained results
Figure 1 shows the number of bytes transmitted from the server to the client on behalf of each 21 TPC-H retrieval queries (except query 15) using alternative protocols. This figure shows that different queries produce a different amount of data. With the binary formatter, the message sizes vary from a few hundred bytes to a few megabytes. With the XML formatter, the message varies from a few hundred bytes to tens of megabytes. The server produces the largest volume of data for Query 10, approximately 25 megabytes of data with XML.
A compression scheme such as Zip can be applied to both the Binary and XML formatters. XMill is a compression technique designed to capitalize on the XML semantics. These compression techniques reduce the message size. Figure 2 provides a comparison of these alternatives using Zip compressed Binary (Zip-Binary) messages as a yardstick. The y-axis of this figure shows the ratio in size between two techniques. For example, with Query 10, Binary messages are more than 2.7 times larger than their Zip-Binary counterparts. A size ratio less than 1.0 implies a smaller message size relative to Zip-Binary. XMill compressed XML (XMill-XML) produce smaller messages than Zip-Binary, i.e., 0.84 times the size with Query 2. In our experiments, in the worst-case scenario, Zip compressed XML (Zip-XML) messages are twice the sizes of Zip-Binary messages, see Figure 2. In the best case, they are approximately the same size. This is because a loss-less compression technique can effectively compress the repeated XML tags. To illustrate, Figure 3 shows the compression factor for each compression scheme and format. It show Zip-Binary, Zip-XML, and XMill-XML, i.e., XMill cannot be applied to Binary because Binary provides no semantic about its data. Note that compression factor is higher with XML. With Binary, compression factor ranges from 1.4 to 5.5. With XML, the Zip compression factor ranges from 2.1 to 19.6. With XMill, the compression factor is as high as 26 with query 16.

Figure 3: Impact of compression on each encoding scheme.
Figure 2 shows that
XMill-XML messages are at times smaller than Zip-Binary messages, e.g., see
query 2. This is because XMill groups data items with related meaning into containers and
compresses each independently [Liefke & Suciu]. This column-wise compression is generally better than row-wise
compression [Iyer and Wilhite]. At the same time, Figures 2 and 3 show that
XMill is not always superior to Zip compression technique for XML messages,
e.g., queries 1, 4, 5, 6, 7, 8, 12, 13, 14, 17, 18, 19 and 22. Generally speaking, when compared with Zip,
XMill is more effective with large messages.
The aforementioned queries transmit fewer than 9000 bytes. With the remaining queries that produce XML
messages that are tens of thousands of bytes in size, XMill outperforms
Zip.
Table 1 shows the
response time of TPC-H Query 10 with the alternative formatters with three different processor configurations: a) 450 MHz, b) 1 GHz, and c) 2 GHz. In each case, we used two identical PCs with
the same processor speed, one as the client and the other as the server. Each machine was configured with 512
megabytes of memory. The network
configuration was identical in each case; see the beginning of Section 2. In
these experiments, Binary provides the fastest response times. Zip-Binary is slower because it includes the
overhead of compressing the message at the transmitter and uncompressing it at
the receiver. XMill-XML provides the
worst response time. With the 2 GHz
configuration is almost twice slower than Zip-Binary. Even though XML is more data intensive than XMill-XML and
Zip-XML, it offers a better response time because a) it does not incur the
overhead of compression, and b) network is not a bottleneck resource.
|
|
450 MHz |
1 GHz |
2 GHz |
|||
|
Binary |
99,352.86 |
|
68,758.38 |
|
35,228.65 |
|
|
Zip-Binary |
121,845.54 |
|
71,094.68 |
|
37,889.58 |
|
|
XML |
165,754.20 |
|
93,419.93 |
|
64,950.00 |
|
|
Zip-XML |
210,949.00 |
|
96,041.87 |
|
65,598.96 |
|
|
XMill-XML |
220,843.56 |
|
120,955.14 |
|
70,009.38 |
|
Table 1:
Response time (millisecond) of alternative formatters for TPC-H Query
10.
With the 450 MHz configuration, we experimented with two different
memory configurations: 128 MB and 512 MB.
When memory is scarce (128 MB), the performance of XML formatter
diminishes for those queries that transmit a large volume of data, e.g., query
10. This is because XML expands the
size of a message, forcing the operating system to write a part of the message
to disk, increasing response time. In
these cases, XMill-XML and Zip-XML result in a better response time when
compared with uncompressed XML. We speculate
the following as an explanation for this observation. The Windows 2000 operating system copies messages from the user
space to the kernel space for network transmission. Once a compression technique is applies to the message at the
user space, the likelihood of the operating system exhausting physical memory
is minimized, improving observed response time.

Figure 4: Speedup of each technique as a function of processor (clock) speed
Figure 4 shows the speedup observed for each formatter as a function of the processor speed. In each case, the speedup is sub-linear. This is because the memory’s clock speed is the same for all configurations. This is important because query 10 requires the system to read and write a large amount of data for encoding, compression, transmission, decompression and decoding.
3. Analytical Models of Transmission to Estimate Throughput
The results of Section 2 demonstrate the impact of alternative formatters on the number of transmitted bytes per TPC-H query. This can quickly dominate and determine the throughput of the environment in realistic Internet deployments when network bandwidth is limitted. Here we present an analytical model to compute upper bounds on the throughput of a system as a function of fixed bandwidth between a server and multiple clients. Using these models and the experimental results of Section 2, we quantify the throughput of each formatter with different network characteristics. We start with a simple model of throughput. Next, we use a Markov process to model transmission failures. Finally, we apply these models with different system parameters to compare the alternative encoders.
3.1 A Simple Model
In the following discussion, assume a fixed application and communication channel. Moreover, lets assume the average request message size is Q megabits (Mb) and the average response message size is A Mb. The server component resides on a machine that is linked to the network with a bandwidth of B Mb/sec. We use T to denote the system throughput. Its upper bound is defined as:
T £ B ¸ (Q + A)
No matter what improvements are made to the server or the transmission protocol, this theoretical upper bound cannot be surpassed.
3.2 Transmission Failures
Our techniques employ TCP transmission protocol. This reliable protocol may retransmit packets in order to compensate for packet loss, resulting in increased traffic on the underlying shared network. We use a Markov process to model TCP packet transmission. This model computes the expected number of bits to transmit a message based on failure rates, packet sizes, and available bandwidth. Using this expectation, we derive upper bounds on throughput.
We start by explaining our model for environments that transmit a single packet of size P. Next, we extend this model to a message that consists of N packets. There are three basic states for transmitting a single packet:
The following state transition diagram shows these different states assuming: a) every packet has a failure probability l that is independent of its size, and b) the ACK message has size e bits.

Where each transition is labeled with the probability of the transition and the number of bits transmitted during the transmission. Given this diagram we can compute P*, the expected number of bits transmitted over the network before successfully completing the transmission. If no failures occur (with probability of (1- l)2) then a total of P + e bits are transmitted. When there is a failure, the system restarts with state t. This is described as:
P* = (1- l)2 (P + e) + l(P + P*) + l(1- l)( P + e + P*)
Which can be reduced to:
P* =
P ¸ (1- l)2 + e ¸ (1-l)
The extension of this to a message that consists of N packets is trivial. We represent this as N “transmitted” states t0, t1, t2, …, tN, and “received” states r0, r1, r2, …, rN. After successfully receiving ACK for the ith packet, the system moves to state t(i+1): transmitted the (i + 1)th packet. The new diagram is shown in Figure 4.

Figure 4: The state transition diagram for the message transmission Markov chain
Note that once we reach state ti, we will never visit state tj for any j < i, motivating the following two propositions:
Proposition 1: The expected number of bits sent over the network to transmit a message of size M with failure rate l and packet size P is:
M* = M[1 ¸(1- l)2 + e ¸(P-Pl)]
With this analysis, we employ our simple model to estimate the expected throughput as a function of the failure rate, packet size, and bandwidth:
T(l, P, B) = (1- l) B ¸ [(Q + A) [1¸(1- l) + e ¸P]]
In fact we can say more about this estimation, see Proposition 2.
Proposition 2: For a workload of W queries, T(l, P, B) is a good estimate for the expected throughput when average total message size is M, where M = Q + A. In particular:
T(l, P, B) £ Expected Throughput £ T(l, P, B) + O(1 ¸( WM2))
Proof: First define m to be the random variable representing the total number of bytes transmitted to finish the workload. Using Proposition 1, E[m] = WM* . Now note that we always have m > 0, and the function F(m) = 1/m is convex in this range. This allows us to use Jensen’s inequality [Folland] to state:
Expected Throughput = E[B ¸ m]
= B E[F(m)]
³ B F(E[m])
= T(l, P, B)
Proving that T is a lower bound. To prove the upper bound, we use the Taylor expansion of F around the point WM*. Denoting the minimum number of bytes that must be transmitted to complete the workload by mmin we can find a function a(l, P) so that the following holds when x > mmin
F(x) £ 1 ¸ WM*
- (x - WM*) ¸ (WM*)2
+ a(l, P) (x - WM*)2 ¸ (WM*)3
Now write
Expected Throughput = B E[F(m)] = B SF(i)Pr[m = i]
£ B ¸ (WM*) - S (i – WM*) ¸ (WM*2)Pr[m = i] + a(l, P) S (I - WM*)2 ¸ (WM*)3 Pr[m = i]
£ T(l, P, B) + a(l, P) BW Var[m] ¸ (WM*)3
= T(l, P, B) + a(l, P) BW2M Var[p] ¸ (WM*)3
= T(l, P, B) + O(1 ¸( WM2))
Where p is a random variable denoting the number of bytes needed to transmit a single packet. This completes the proof of proposition 2.
3.3 Implications of the Model
This formula has several intuitive implications, and can be used to quantify our basic thesis that when bandwidth is limited, an encoding mechanism that transmits fewer bytes results in a higher throughput. Inspection of the formula for T(l, P, B) leads to the following basic observation: The throughput bound increases as a function of bandwidth, larger packet size, and reduced packet loss rate.

Figure 5: Throughput estimates as a function of server bandwidth for a packet loss rate of 0.01%.
We now use these throughput estimates to study the data reported in Section 2. Figures 5-7 plot throughput estimates on the TPC-H query workload using XML, Binary, ZIP-Binary, Zip-XML, XMill-XML. For these figures, we used the average message sizes (request plus response) for the 21 TPC-H queries. These are as follows: for the XML formatter is 1.65 MB, for the Binary formatter is 0.54 MB, for the Zip-XML formatter is 0.22 MB, for the Zip-Binary is 0.21 MB, and for the XMill-XML is 0.17 MB. These plots serve two purposes: First, they show how the throughput bound varies over realistic choices of parameter values. Second, they quantify the benefit of XMill-XML format in a variety of realistic settings.
Figure 5 shows the theoretical upper bounds on throughput for a workload of random TPC-H queries as a function of bandwidth to the server. The bandwidth varies from 56Kbps to 1Gbps, and both x-axes and y-axes use a log scale. Because the average message size is the same for TCP and HTTP channels, only the TCP data are displayed. This figure assumes each packet is eight kilobytes, P = 8K, and each acknowledgement packet is one kilobyte, e = 1K. In Figure 5, XMill-XML encoder provides substantial performance improvement: its throughput is 270 times higher than that of the binary channel (and 800 times higher than the XML encoder).

Figure 6: Throughput estimates as a function of packet loss rate for 10 Mbps link.
Figure 6 shows the throughput as a function of the packet loss rate when the server’s network bandwidth is limited to 10 Mbps. Once again, XMill-XML provides a higher throughput because it transmits fewer bytes, i.e., packets. This figure also shows that while performance improves with increased reliability, the improvement is marginal with loss rates less than 10%. For example, as we reduce the packet loss rate