Apache Today - Filtering I/O in Apache 2.0

Filtering I/O in Apache 2.0
Sep 20, 2000, 16:26 UTC

By Ryan Bloom

One of the holy grails of the Apache developers has always been filtered or layered I/O: the ability for one module to modify the data that was generated by an earlier module. This ability was originally slated for inclusion in Apache 2.0, but when work began in earnest on 2.0, the feature was pushed aside and marked for inclusion in 2.1 or 3.0. Two months ago, however, the Apache developers held a small meeting and designed filtered I/O for Apache 2.0. The work has started, and some filters have already been written. Over the next few months, I will explain how this feature works and how your modules can take advantage of it.

The general premise of the filtered I/O design in Apache 2.0 is that all data served by a web server can be broken into chunks. Each chunk of data comes from a single source: a file, a CGI program, or a module that generates it. We also knew that all of the data could always be represented as a string of characters, although that string may not be human-readable. Armed with that knowledge, we sat down to design the filtering system. One of our overriding goals was that the filtering logic needed to be performance-aware. It didn't matter if filters chose to ignore performance issues, but it did matter if the design hindered filters from knowing about performance issues. This meant that we needed to know more about the data than just what it was; we also needed to know where the data comes from and what its lifetime is.

This meta-data is important when actually writing the response to the network. For example, if we have a very simple request that is just a page from disk, then we want to use sendfile. (sendfile is provided by APR and is available on all platforms; if the platform doesn't have a native sendfile, then APR loops, reading the file and writing to the network.) If we take this example a step further and make the whole response an SSI page, where one element is a file from disk and the rest is generated, such as date strings, then we want to use a single sendfile call if possible. APR's sendfile provides an opportunity to include both header and trailer information with the file, which are sent using writev. In this example, we can send the HTTP headers, the full file, and the date string with one APR call (the number of system calls will differ depending on the platform). Keeping the meta-data accessible is obviously a good idea.

In order to keep the meta-data available, the Apache developers needed to find some way to pass everything from one filter to the next. The data structure that was designed to do this is called a bucket brigade. Each bucket brigade is composed of multiple buckets. The buckets contain the data that we are sending to the client, and the type of each bucket supplies the meta-data.
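As a rough illustration of that idea, the sketch below models a brigade as a linked list of buckets, each tagged with a type. Every name in it (toy_bucket, toy_brigade, and so on) is hypothetical and is not part of the Apache API; only the concept of typed buckets carried together in a brigade comes from the design described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bucket kinds. The tag is the meta-data: it says where
 * the bytes live and how long they stay valid. */
typedef enum { TOY_HEAP, TOY_FILE, TOY_IMMORTAL, TOY_EOS } toy_bucket_type;

typedef struct toy_bucket {
    toy_bucket_type type;
    const char *data;          /* bytes to send (NULL for an EOS bucket) */
    size_t len;
    struct toy_bucket *next;
} toy_bucket;

/* A brigade is simply an ordered list of buckets. */
typedef struct {
    toy_bucket *head;
    toy_bucket *tail;
} toy_brigade;

/* Append a bucket to the end of the brigade.
 * Returns 0 on success, -1 if the allocation fails.
 * This toy never copies or frees the data it points at. */
static int toy_brigade_append(toy_brigade *bb, toy_bucket_type type,
                              const char *data, size_t len)
{
    assert(bb != NULL);
    toy_bucket *b = malloc(sizeof *b);
    if (b == NULL)
        return -1;
    b->type = type;
    b->data = data;
    b->len = len;
    b->next = NULL;
    if (bb->tail != NULL)
        bb->tail->next = b;
    else
        bb->head = b;
    bb->tail = b;
    return 0;
}

/* Sum the payload bytes carried by every bucket in the brigade. */
static size_t toy_brigade_length(const toy_brigade *bb)
{
    size_t total = 0;
    for (const toy_bucket *b = bb->head; b != NULL; b = b->next)
        total += b->len;
    return total;
}

/* Free the bucket nodes (not the data) and leave the brigade empty. */
static void toy_brigade_destroy(toy_brigade *bb)
{
    toy_bucket *b = bb->head;
    while (b != NULL) {
        toy_bucket *next = b->next;
        free(b);
        b = next;
    }
    bb->head = bb->tail = NULL;
}
```

A filter walking such a list can look at each bucket's type before touching its data, which is what lets it make performance-aware decisions.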

Currently, we have a small number of bucket types, but the bucket API was designed to be extensible. The current bucket types are:

AP_BUCKET_HEAP

This bucket type is designed to store data allocated off the heap. This data will be available as long as the bucket is available. If the data needs to be modified and there is space in the bucket, it is acceptable to modify the data in place when using this bucket.

AP_BUCKET_TRANSIENT

This bucket stores data allocated off the stack. This means that when a filter function returns, the data is not guaranteed to still be valid the next time the function is called. If the data has not yet been written to the network, it must be converted to a heap bucket so that it is still available the next time the current filter is called.

AP_BUCKET_FILE

This bucket type references a file on the disk. When reading from this bucket, a new bucket is created in front of the file bucket. The data that was read from the file is stored in the new bucket. This is done so that we only read from the file once. When writing to the network, we determine how much of the data can be sent using sendfile.

AP_BUCKET_MMAP

This bucket references mmap'ed files. The data is treated much like the data in heap buckets, except that it cannot be modified in the bucket. If the data needs to be modified, a heap bucket must be created and the data must be copied into that bucket.

AP_BUCKET_IMMORTAL

This bucket type is a generic bucket. Any data type is valid in this bucket, but the data must be managed by some external entity. This is designed for data that a module will create and destroy. Perhaps the best way to describe this is with an example. Mod_mmap_static keeps a cache of mmap'ed files available to increase the performance of Apache. The mmap entities would be immortal buckets. Mod_mmap_static is in charge of creating and destroying the mmaps; the immortal buckets just reference them.

AP_BUCKET_POOL

This bucket references data allocated out of a pool. Pool data is guaranteed to be available as long as the pool is available. When this bucket is created, a cleanup is registered so that when the pool is cleared, if the data is still required, the bucket is converted into a heap bucket.

AP_BUCKET_PIPE

This bucket references a pipe. Pipes are interesting, because they destroy themselves as they are read. This means that if I have a pipe and I read data from it, I must save that data someplace or it will be lost. To accomplish this, when pipe buckets are read, a second bucket is created in front of the current bucket. The new bucket is used to store the data read from the pipe. This is very similar to file buckets, except that sendfile can't be used with pipes. Pipe buckets are most commonly used to return data from CGI scripts.

AP_BUCKET_EOS

This bucket does not contain any data. It signals to filters that no more data will be generated for consumption. This tells filters that this is the final time they will be called, so they need to send any data that they have saved in previous calls.
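To make the lifetime rules in this list a little more concrete, here is a minimal sketch of the transient-to-heap conversion described for AP_BUCKET_TRANSIENT. The types and functions (lt_bucket, lt_setaside, and so on) are hypothetical stand-ins, not the Apache implementation; the sketch only shows why a copy is needed before stack data goes out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { LT_TRANSIENT, LT_HEAP } lt_kind;

typedef struct {
    lt_kind kind;
    char *data;    /* owned by the bucket only when kind == LT_HEAP */
    size_t len;
} lt_bucket;

/* Wrap caller-owned memory (for example a stack buffer) without copying. */
static lt_bucket lt_transient(char *data, size_t len)
{
    assert(data != NULL || len == 0);
    lt_bucket b = { LT_TRANSIENT, data, len };
    return b;
}

/* Convert a transient bucket into a heap bucket by copying its bytes,
 * so the data survives after the original buffer is gone.
 * Heap buckets are left untouched. Returns 0 on success, -1 on failure. */
static int lt_setaside(lt_bucket *b)
{
    if (b->kind == LT_HEAP)
        return 0;
    char *copy = malloc(b->len > 0 ? b->len : 1);
    if (copy == NULL)
        return -1;
    if (b->len > 0)
        memcpy(copy, b->data, b->len);
    b->data = copy;
    b->kind = LT_HEAP;
    return 0;
}

/* Release the data only if the bucket owns it. */
static void lt_destroy(lt_bucket *b)
{
    if (b->kind == LT_HEAP)
        free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

A filter that cannot write a transient bucket immediately would call something like lt_setaside before returning; after that, changes to the original buffer no longer affect the saved data.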

All buckets include pointers to their accessor functions. The details of what these functions do are specific to each bucket type, but we can explain them generally.

read
split
setaside
destroy
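One way to picture a bucket that carries pointers to its own accessors is a struct of function pointers. The sketch below does this for a heap-style bucket. The semantics chosen for each function (read returns the data and its length, split cuts the bucket in two at an offset, setaside does nothing because heap data already persists, destroy frees the bucket) are assumptions made for illustration, and all names are hypothetical rather than taken from the Apache source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct vt_bucket vt_bucket;

struct vt_bucket {
    /* Accessors: each bucket type would plug in its own versions. */
    int  (*read)(vt_bucket *b, const char **data, size_t *len);
    int  (*split)(vt_bucket *b, size_t point);  /* tail becomes b->next */
    int  (*setaside)(vt_bucket *b);
    void (*destroy)(vt_bucket *b);
    char *data;
    size_t len;
    vt_bucket *next;
};

static vt_bucket *vt_heap_create(const char *src, size_t len);

/* Hand back a pointer to the bytes and their length. */
static int heap_read(vt_bucket *b, const char **data, size_t *len)
{
    *data = b->data;
    *len = b->len;
    return 0;
}

/* Keep the first `point` bytes in b; copy the rest into a new bucket
 * linked in directly after b. Returns -1 if point is out of range. */
static int heap_split(vt_bucket *b, size_t point)
{
    if (point > b->len)
        return -1;
    vt_bucket *tail = vt_heap_create(b->data + point, b->len - point);
    if (tail == NULL)
        return -1;
    tail->next = b->next;
    b->next = tail;
    b->len = point;
    return 0;
}

/* Heap data already outlives the current call, so nothing to do. */
static int heap_setaside(vt_bucket *b)
{
    (void)b;
    return 0;
}

static void heap_destroy(vt_bucket *b)
{
    free(b->data);
    free(b);
}

/* Create a heap bucket holding a private copy of src[0..len). */
static vt_bucket *vt_heap_create(const char *src, size_t len)
{
    assert(src != NULL || len == 0);
    vt_bucket *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(len > 0 ? len : 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    if (len > 0)
        memcpy(b->data, src, len);
    b->len = len;
    b->next = NULL;
    b->read = heap_read;
    b->split = heap_split;
    b->setaside = heap_setaside;
    b->destroy = heap_destroy;
    return b;
}
```

Code that handles buckets can then call b->read(b, ...) or b->split(b, ...) without knowing which bucket type it holds.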

There are just a few more concepts that we need to cover in this overview of filtering. The first is registering a filter with the server. This is done with ap_register_filter.

void ap_register_filter(const char *name, ap_filter_func filter_func, ap_filter_type ftype);

The first argument is a name for the filter. This name is used throughout the server to reference that filter function. The second is the filter function we are registering. The final argument is the type of filter. There are currently two filter types, content and connection. Content filters are filters that modify the data being sent. Connection filters are filters that dictate how the data is sent over the network. For example, an SSI filter is a content filter, while an SSL filter is a connection filter.

Once the filter has been registered, it must be added to the current request's filter stack before it will actually be called. This is done by calling ap_add_filter.

void ap_add_filter(const char *name, void *ctx, request_rec *r);

The first argument is the name of the filter to add. This should be the name that was registered for the desired filter. The second is a pointer to a structure that should be passed to the filter whenever it is called. This provides filters with a location to store any data that they may need to save between calls. The final argument is the request_rec to pass to the filter when it is called.
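The two steps above, registering a filter under a name and then adding it to a request by that name along with a context pointer, can be sketched with a small lookup table. This is a toy model under stated assumptions, not how Apache stores its filters: the reg_ names, the fixed-size arrays, and the simplified filter signature are all hypothetical, and the filter type argument is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REG_MAX 8

/* Simplified, hypothetical filter signature: context plus a string. */
typedef int (*reg_filter_func)(void *ctx, const char *data);

typedef struct { const char *name; reg_filter_func func; } reg_entry;
typedef struct { reg_filter_func func; void *ctx; } reg_active;

/* Stand-in for the per-request filter stack. */
typedef struct {
    reg_active filters[REG_MAX];
    size_t count;
} reg_request;

/* Server-wide table of registered filters. */
static reg_entry registry[REG_MAX];
static size_t registry_count;

/* Make a filter known to the server under `name`.
 * Returns 0 on success, -1 if the table is full. */
static int reg_register_filter(const char *name, reg_filter_func func)
{
    assert(name != NULL && func != NULL);
    if (registry_count == REG_MAX)
        return -1;
    registry[registry_count].name = name;
    registry[registry_count].func = func;
    registry_count++;
    return 0;
}

/* Look the filter up by name and attach it, with its context, to one
 * request. Returns -1 if the name is unknown or the stack is full. */
static int reg_add_filter(const char *name, void *ctx, reg_request *r)
{
    for (size_t i = 0; i < registry_count; i++) {
        if (strcmp(registry[i].name, name) == 0) {
            if (r->count == REG_MAX)
                return -1;
            r->filters[r->count].func = registry[i].func;
            r->filters[r->count].ctx = ctx;
            r->count++;
            return 0;
        }
    }
    return -1;
}

/* Example filter: uses ctx to keep a running byte count between calls. */
static int count_bytes(void *ctx, const char *data)
{
    size_t *total = ctx;
    *total += strlen(data);
    return 0;
}
```

The context pointer is what lets count_bytes remember its total from one call to the next, which is the role the ctx argument plays in the description above.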

There are two more function prototypes that must be discussed. The first is the prototype for filters themselves.

apr_status_t ap_filter_func(ap_filter_t *f, ap_bucket_brigade *b);

The first argument is a pointer to the current filter structure. This is where the ctx passed to ap_add_filter is stored, as well as a pointer to the next filter on the stack. The second is the current bucket brigade to filter.

The second function is ap_pass_brigade. This function passes bucket brigades from the current filter to the next filter. The prototype is:

apr_status_t ap_pass_brigade(ap_filter_t *filter, ap_bucket_brigade *bucket);

The first argument is a pointer to the next filter to call, and the second is the bucket brigade to pass to that filter.
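The calling pattern these two prototypes imply is that each filter does its work and then hands the brigade to the next filter on the stack. The sketch below imitates that shape with hypothetical stand-ins: ch_filter plays the role of ap_filter_t, ch_brigade is a fixed-size buffer rather than a real bucket brigade, and ch_pass_brigade only mimics the form of ap_pass_brigade. It is not code from the server.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a bucket brigade: one small buffer of bytes. */
typedef struct { char data[64]; size_t len; } ch_brigade;

typedef struct ch_filter ch_filter;
typedef int (*ch_filter_func)(ch_filter *f, ch_brigade *bb);

/* Stand-in for the filter structure: function, saved context, and a
 * pointer to the next filter on the stack. */
struct ch_filter {
    ch_filter_func func;
    void *ctx;
    ch_filter *next;
};

/* Hand the brigade to the given filter; a NULL filter ends the chain. */
static int ch_pass_brigade(ch_filter *next, ch_brigade *bb)
{
    if (next == NULL)
        return 0;
    return next->func(next, bb);
}

/* Content-style filter: upper-case the data, then pass it along. */
static int upper_filter(ch_filter *f, ch_brigade *bb)
{
    assert(bb->len <= sizeof bb->data);
    for (size_t i = 0; i < bb->len; i++)
        bb->data[i] = (char)toupper((unsigned char)bb->data[i]);
    return ch_pass_brigade(f->next, bb);
}

/* Final filter: append the bytes to a buffer kept in ctx, standing in
 * for the code that would write to the network. */
typedef struct { char out[128]; size_t len; } ch_sink;

static int sink_filter(ch_filter *f, ch_brigade *bb)
{
    ch_sink *s = f->ctx;
    if (s->len + bb->len >= sizeof s->out)
        return -1;
    memcpy(s->out + s->len, bb->data, bb->len);
    s->len += bb->len;
    s->out[s->len] = '\0';
    return ch_pass_brigade(f->next, bb);
}
```

Chaining upper_filter in front of sink_filter and passing a brigade containing "hello" to the first filter leaves "HELLO" in the sink's buffer; the sink keeps its accumulated output in ctx between calls.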

That covers all of the basic concepts for writing filters. In next month's article we will actually write a filter that can be inserted into an Apache 2.0 server.

