A longer version of this article appeared on WebReference.

Traffic volume on the Web is forecast to more than triple over the next three years, and the fastest-growing category is data. Data and content will remain the largest share of Web traffic, and because the majority of this information is dynamic, it does not lend itself to conventional caching technologies. The issues range from business-to-consumer response and order-confirmation times, to the time required to deliver business information to a road warrior on a wireless device, to the download time for rich media such as music or video. Not surprisingly, the number one complaint among Web users is lack of speed. That's where compression, in the form of mod_gzip, can help.

The Solution: Compression

The idea is to compress the data being sent out from your Web server and have the browser decompress it on the fly, reducing the amount of data transmitted and increasing the speed at which pages display. There are two ways to compress data coming from a Web server: dynamically and pre-compressed. Dynamic content acceleration compresses the data on the fly (useful for e-commerce applications, database-driven sites, and so on), while pre-compressed text data is generated beforehand and stored on the server (.html.gz files, etc.).

The goal is to send less data. To do this, the data must be analyzed and compressed in real time and decompressed at the other end with no user interaction. Since less data (fewer packets) is being sent, it consumes less bandwidth and arrives significantly faster. Network acceleration solutions need to focus on the formats used for data and content, including HTML, XML, SQL, Java, WML, and other text-based languages. Both approaches use HTTP compression and typically shrink HTML files to a third of their original size or smaller. To get an idea of the speed improvement involved, here's a live demonstration: the real-time Web server content acceleration test.
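As a rough illustration of the pre-compressed approach, here is a minimal Python sketch (the file name index.html is a placeholder) that generates a .html.gz copy of a static page ahead of time and reports the size reduction; a server set up to negotiate content encoding could then hand the compressed copy to gzip-aware clients.

```python
# A minimal sketch of the pre-compressed approach, assuming a static page
# named index.html (a placeholder): generate index.html.gz ahead of time
# and report the size reduction.
import gzip
import os

SOURCE = "index.html"       # hypothetical static page
TARGET = SOURCE + ".gz"     # the copy a server could hand to gzip-aware clients

with open(SOURCE, "rb") as src, gzip.open(TARGET, "wb") as dst:
    dst.write(src.read())

original = os.path.getsize(SOURCE)
compressed = os.path.getsize(TARGET)
print(f"{SOURCE}: {original:,} bytes -> {TARGET}: {compressed:,} bytes "
      f"({100 * (1 - compressed / original):.1f}% smaller)")
```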
Why Compress HTML?

HTML is used in most Web pages and forms the framework into which the rest of the page (images, objects, etc.) is placed. Unlike images (GIF, JPEG, PNG), which are already compressed, HTML is plain ASCII text, which is highly compressible. Compressing HTML can have a major impact on the performance of HTTP, especially as PPP lines fill up with data and the only way to obtain higher performance is to reduce the number of bytes transmitted. A compressed HTML page appears to pop onto the screen, especially over slower modems.

The Last Mile Problem

The Web is only as strong as its weakest link, and that link has always been, and will remain, the last mile to the consumer's desktop. Even with the rapid growth of residential broadband, the growth in narrowband users and in data far exceeds broadband's limited reach. Jakob Nielsen expects the standard data transmission speed to remain at 56K until at least 2003, so there is a distinct need to reduce download times. Caching data has its benefits, but only content reduction can make a significant difference in response time: it's always going to be faster to download a smaller file than a larger one.

Is Compression Built into the Browser?

Yes. Most browsers released since 1998/1999 support the HTTP 1.1 feature known as "content encoding." Essentially, the browser indicates to the server that it can accept content encoding, and if the server is capable it compresses the data and transmits it. The browser then decompresses the data and renders the page. Only HTTP 1.1 compliant clients request compressed files; clients that are not HTTP 1.1 compliant request and receive the files uncompressed, and so miss out on the improved download times. Internet Explorer versions 4 and above, Netscape 4.5 and above, Windows Explorer, and My Computer are all HTTP 1.1 compliant clients by default. To test your browser, click on this link (it works if you are outside a proxy server): http://12.17.228.52:7000/ and you'll get a chart of the results. To verify that Internet Explorer is configured to use the HTTP 1.1 protocol:
IE4/5: Setting HTTP 1.1

What is IETF Content-Encoding (or HTTP Compression)?

In a nutshell, it is simply a publicly defined way to compress HTTP content being transferred from Web servers down to browsers, using nothing more than public domain compression algorithms that are freely available. "Content-Encoding" and "Transfer-Encoding" are both clearly defined in the public IETF RFCs that govern the development and improvement of the HTTP protocol, the "language" of the World Wide Web.

"Content-Encoding" applies to methods of encoding and/or compression that have already been applied to documents before they are requested, also known as "pre-compressing pages." The concept never really caught on, because of the complex file-maintenance burden it represents, and few Internet sites use pre-compressed pages of any description. "Transfer-Encoding" applies to methods of encoding and/or compression used DURING the actual transmission of the data itself.

In modern practice, however, the two are effectively one and the same. Since most HTTP content from major online sites is now dynamically generated, the line has blurred between what happens before a document is requested and what happens while it is being transmitted; a dynamically generated HTML page doesn't even exist until someone asks for it. The original notion that all pages are "static" and already present on disk has quickly become outdated, and the once well-defined separation between "Content-Encoding" and "Transfer-Encoding" has turned into a rather pale shade of gray.

Unfortunately, the ability of a modern Web or proxy server to supply "Transfer-Encoding" in the form of compression is even less common than the spotty support for "Content-Encoding." Suffice it to say that whichever of the two publicly defined "encoding" methods is used to compress the requested content (static or dynamic), the result is the same: the user receives far fewer bytes than normal and everything happens much faster on the client side. The publicly defined exchange goes like this:
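Sketched in Python below (the host name and path are placeholders; a real browser performs the same steps internally):

```python
# A minimal sketch of the client's side of the exchange, using Python's
# standard library; the host name and path are placeholders, and a real
# browser performs the same steps internally.
import gzip
import http.client

conn = http.client.HTTPConnection("www.example.com")

# Step 1: the client requests a page and advertises that it can accept
# gzip-compressed content.
conn.request("GET", "/index.html", headers={"Accept-Encoding": "gzip"})

# Step 2: a capable server sees the Accept-Encoding header, compresses the
# response body, and labels it with "Content-Encoding: gzip".
response = conn.getresponse()
body = response.read()

# Step 3: the client notices the Content-Encoding header, decompresses the
# body, and renders the page as usual.
if response.getheader("Content-Encoding") == "gzip":
    body = gzip.decompress(body)

print(len(body), "bytes after decompression")
conn.close()
```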
Most popular Web servers are still unable to perform their part of this exchange: compressing the response before it is sent.
The original designers of the HTTP protocol did not foresee today's reality, in which so many people use the protocol that every single byte counts. The heavy use of pre-compressed graphics formats such as GIF, and the relative difficulty of further reducing that graphics content, makes it even more important that all other exchange formats be optimized as much as possible. The same designers also did not foresee that most HTTP content from major online vendors would be generated dynamically, so there is often no chance for a "static" compressed version of the requested document to exist at all. Public IETF Content-Encoding is still not a "complete" specification for the reduction of Internet content, but it does work, and the performance benefits achieved by using it are both obvious and dramatic.

What is GZIP?

It's a lossless compressed data format. The deflation algorithm used by gzip (and also by zip and zlib) is an open-source, patent-free variation of LZ77 (Lempel-Ziv 1977). It finds duplicated strings in the input data; the second occurrence of a string is replaced by a pointer to the previous one, in the form of a (distance, length) pair, where distances are limited to 32K bytes and lengths to 258 bytes. When a string does not occur anywhere in the previous 32K bytes, it is emitted as a sequence of literal bytes. (In this description, a "string" is an arbitrary sequence of bytes, not restricted to printable characters.)

Technical Overview

HTML/XML/JavaScript/text compression: does it make sense? The short answer is "only if it can get there quicker." In 99% of all cases it makes sense to compress the data. However, there are several problems that need to be solved to enable seamless transmission from the server to the consumer.
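Before looking at those problems, it's worth seeing the duplicated-string effect in practice. Here is a small illustration using Python's zlib module (which implements the same deflate algorithm); the sample markup is invented for the demonstration.

```python
# A small illustration (not a real LZ77 implementation) of why deflate
# works so well on HTML: repeated markup collapses into back-references,
# while random data barely shrinks at all.
import os
import zlib

# Repetitive markup, similar in spirit to the music-listing table
# discussed in the scenario below.
repetitive = b"<tr><td>Artist</td><td>Album</td><td>Track</td></tr>\n" * 2000
random_data = os.urandom(len(repetitive))

for label, data in (("repetitive HTML", repetitive), ("random bytes", random_data)):
    compressed = zlib.compress(data, 9)   # level 9 = best compression
    print(f"{label}: {len(data):,} -> {len(compressed):,} bytes")
```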
Let's create a simple scenario: an HTML file containing a large music listing in the form of a table, http://12.17.228.53:8080/music.htm, which is 679,188 bytes in length. Let's track this download over a 28.8K modem and compare the results before and after compression. The theoretical throughput of a 28.8K modem is 3,600 bytes per second; reality is more like 2,400 bytes per second, but for the sake of this article we will work at the theoretical maximum.

With no modem compression, the file would download in 188.66 seconds. With modem compression running, we can expect a download time of about 90 seconds on average, which indicates roughly a 2:1 compression factor: modem-to-modem compression effectively "halves" the file size on the wire. Note, however, that the server still had to keep the TCP/IP connection busy sending all 679,188 bytes down to the modem for transmission.

What happens if we compress the data on the server, prior to transmission? The file is 679,188 bytes in length; if we compress it using standard techniques (which are not optimized for HTML), we can expect it to shrink to 48,951 bytes, a 92.79% reduction. We are now transmitting only 48,951 bytes (plus some header information, which should also be compressed, but that's another story). Modem compression no longer plays a factor, because the data is already compressed. Where are the performance improvements?
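They fall straight out of the arithmetic. Using the figures above and the theoretical 3,600 bytes-per-second throughput:

```python
# The arithmetic behind the scenario above, using the figures given and the
# theoretical 3,600 bytes-per-second throughput of a 28.8K modem.
ORIGINAL_BYTES = 679_188
COMPRESSED_BYTES = 48_951
BYTES_PER_SECOND = 3_600

print(f"uncompressed download: {ORIGINAL_BYTES / BYTES_PER_SECOND:.1f} s")        # ~188.7 s
print(f"2:1 modem compression: {ORIGINAL_BYTES / 2 / BYTES_PER_SECOND:.1f} s")     # ~94 s
print(f"gzip before transmission: {COMPRESSED_BYTES / BYTES_PER_SECOND:.1f} s")    # ~13.6 s
print(f"size reduction: {100 * (1 - COMPRESSED_BYTES / ORIGINAL_BYTES):.2f}%")     # 92.79%
```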
Compression clearly makes sense, as long as it's seamless and doesn't kill server performance.

What Else Remains to Be Done?

A lot! Better algorithms need to be invented that compress the data stream more efficiently than gzip; remember, the LZ77 scheme gzip is built on was designed long before HTML came along. Any technique that adds a new compression algorithm will require a thin client to decode it, and possibly tunneling techniques to make it "firewall friendly." To sum up, we need:

- compression that is seamless to the end user and cheap enough not to hurt server performance;
- algorithms better tuned to HTML and other text-based Web formats than general-purpose gzip;
- a thin client capable of decoding any new algorithm; and
- tunneling techniques to make such a scheme "firewall friendly."
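As a sketch of what "seamless and doesn't kill server performance" implies in practice, the hypothetical helper below (it is not mod_gzip's actual logic; the MIME-type list and size threshold are illustrative assumptions) shows the kind of per-response decision a compressing server module has to make.

```python
# A hypothetical helper (not mod_gzip's actual logic) sketching the kind of
# per-response decision a server module must make to keep compression
# seamless and cheap: compress text-based formats only, skip formats that
# are already compressed, and skip tiny responses where the CPU cost
# outweighs the bandwidth savings. The type list and size threshold are
# illustrative assumptions.
TEXT_TYPES = {"text/html", "text/plain", "text/xml", "application/xml",
              "text/vnd.wap.wml", "application/x-javascript"}
MIN_SIZE = 1024  # bytes; below this, compression rarely pays off

def should_compress(content_type: str, content_length: int,
                    client_accepts_gzip: bool) -> bool:
    """Decide whether a response is worth gzip-compressing."""
    if not client_accepts_gzip:
        return False          # the client never advertised gzip support
    if content_length < MIN_SIZE:
        return False          # too small to be worth the CPU time
    # GIF, JPEG, PNG and other binary formats are already compressed.
    return content_type.split(";")[0].strip().lower() in TEXT_TYPES

print(should_compress("text/html; charset=iso-8859-1", 50_000, True))  # True
print(should_compress("image/gif", 50_000, True))                      # False
```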