By Ken Coar
Webmasters are ever searching for ways to make their sites look cool and attractive. One way is to dress them up with images, logos, and other graphics--sometimes referred to as 'eye candy.' Of course, if you happen to be in the forefront of this in any way, you run the risk of having others cadge your art in order to dress up their sites. And they probably won't ask permission or pay you a royalty, either.
This article shows how you can use Apache configuration directives to limit access to your art so that it's more difficult to use elsewhere.
Simply put, there are two types of "infringement" involved here:
The first type not only causes your images to prettify someone else's site, but hurts you more directly because visitors to their site are hammering yours to get the images. Your log files get filled with access request entries, your bandwidth gets used -- and you're getting no benefit from it. This type of theft is almost completely preventable.
The second type of theft is more insidious. The 'borrower' doesn't cause your site to get pounded on for access to the images, since they've been copied to the borrower's site, but you probably weren't given any credit for the artwork--and you probably don't even know the theft happened. Because of the way the Web works, this type of theft can't really be prevented, but you can at least make it a little more difficult.
You can't completely prevent either of these, of course, but you can make them more difficult to do.
You're probably not going to want to protect every document on your site. Even if you do, for the sake of this article I'm assuming you only want to protect your artwork. So how do you indicate that the rules apply only to those files? With directives such as the following in your server config files:
<FilesMatch "\.(gif|jpg)">
    [limiting directives will go here]
</FilesMatch>
You can put a container such as this inside a <Directory> container, or inside a <VirtualHost> container, or outside any containers at all (in which case it applies to all such files on your server), or even inside .htaccess files. Put it wherever it makes sense to protect what you want protected.
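To see which file names a pattern of this kind selects, you can try the regular expression outside the server. The following is a minimal sketch that uses Python's re module as a stand-in for Apache's regex engine (an assumption; the two agree on simple patterns like this one). The pattern is written with an escaped dot, `\.(gif|jpg)`, and the file names are made up for illustration.

```python
import re

# The <FilesMatch> pattern, with the dot escaped so that it matches a
# literal "." rather than any character.
IMAGE_PATTERN = re.compile(r"\.(gif|jpg)")


def is_protected(filename: str) -> bool:
    """Return True if a (hypothetical) file name would match the pattern."""
    # The pattern is not anchored, so a match anywhere in the name counts;
    # re.search() mimics that.
    return IMAGE_PATTERN.search(filename) is not None


if __name__ == "__main__":
    for name in ["logo.gif", "photo.jpg", "index.html", "notes.jpg.txt"]:
        print(name, is_protected(name))
```

Because the pattern isn't anchored, a name such as notes.jpg.txt matches as well; ending the pattern with `$` would limit it to names that end in one of the extensions.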
The Referer Header Field
Down on the wire, where the browsers, spiders, and servers live, every request for a Web page includes a component called the HTTP request header. This contains information about the request, such as the user's preferred languages, the types of documents the client is able to handle -- and not least, the name of the item being requested. This information is conveyed in a series of name/value pairs called header fields.
One of these header fields is of particular importance to what we want to do. It's called the Referer field (yes, I know, it's misspelt--but that's how it's misspelt in the definition, too), and it indicates the URL of the client's last page if and only if the client is following a link. That is, if you're viewing page A, and click on a link to page B, the request for page B will include a Referer field that says "I'm following a link on page A." Browsers send the same field when they fetch an image embedded in a page, naming that page. If no link is being followed, such as if the user just typed B's URL into the Location field of the browser, there will be no Referer field in the request header.
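To make the name/value structure concrete, here is a small sketch that splits a request header into its fields. The sample request is invented for illustration (the path /images/logo.gif and the field values are assumptions), and the parser is deliberately simplified; it ignores details such as folded header lines.

```python
def parse_request_header(raw: str):
    """Split a raw HTTP request header into its request line and fields."""
    lines = raw.split("\r\n")
    request_line = lines[0]
    fields = {}
    for line in lines[1:]:
        if not line:
            # A blank line marks the end of the header.
            break
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        fields[name.strip().lower()] = value.strip()
    return request_line, fields


# Hypothetical request for an image embedded in one of our own pages.
SAMPLE = (
    "GET /images/logo.gif HTTP/1.0\r\n"
    "Accept: image/gif, image/jpeg\r\n"
    "Accept-Language: en\r\n"
    "Referer: http://my.apache.org/index.html\r\n"
    "\r\n"
)

if __name__ == "__main__":
    request_line, fields = parse_request_header(SAMPLE)
    print(request_line)
    print(fields.get("referer", "(no Referer field)"))
```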
How does this help? Well, it gives us a way to tell whether an image is being requested because it was linked to by one of our pages -- or by someone else's.
SetEnvIf
Using SetEnvIf to 'Tag' Images
For a simple case, suppose our Web site's main page is <http://my.apache.org/>. In this case, we want to restrict any artwork requests that don't originate on our site (i.e., only allow them if the image was linked to by one of our pages). We can do this by using an environment variable (also called an envariable) as a flag, and setting it if the conditions are right. Something like the following ought to do it:
SetEnvIfNoCase Referer "^http://my.apache.org/" local_ref=1
When Apache processes a request, it will examine the Referer field in the header, and set the environment variable local_ref to "1" if the value starts with our site address--i.e., is one of our pages.
The string inside the quotation marks is a regular expression pattern that the value must match in order for the environment variable to be set. Describing how to use regular expressions (REs) is far beyond the scope of this article; for now, just be aware that the SetEnvIf* directives use them.
The "NoCase" portion of the directive name means, "do this whether the Referer is 'http://my.apache.org/', or 'http://My.Apache.Org/', or 'http://MY.APACHE.ORG/'" -- in other words, ignore the upper/lower caseness of the value.
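The tagging step can be modelled in a few lines. This is only a sketch of the matching logic, not of how Apache implements it: the function name request_env and the plain dictionary of headers are assumptions made for illustration, and Python's re module stands in for the server's regex engine.

```python
import re

# Same pattern as in the SetEnvIfNoCase line; IGNORECASE plays the part
# of the "NoCase" suffix.
LOCAL_REFERER = re.compile(r"^http://my.apache.org/", re.IGNORECASE)


def request_env(headers: dict) -> dict:
    """Return the envariables a request would be tagged with."""
    env = {}
    # A missing Referer field is treated as an empty value, which the
    # pattern cannot match.
    referer = headers.get("Referer", "")
    if LOCAL_REFERER.search(referer):
        env["local_ref"] = "1"
    return env


if __name__ == "__main__":
    print(request_env({"Referer": "http://My.Apache.Org/index.html"}))
    print(request_env({"Referer": "http://elsewhere.example/page.html"}))
    print(request_env({}))
```

Note that the dots in the pattern are not escaped, so each matches any single character; writing them as `\.` would make the test stricter.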
The Order, Allow, and Deny directives allow us to control access to documents based upon the setting (or unset-ness) of an envariable. The first thing to do is to indicate the order in which Apache will process Allow and Deny directives; you do this with the Order directive as follows:
Order Allow,Deny
This means that Apache will go through any list of Allow directives it has that apply to the current request, and then repeat the process with any Deny directives. With this ordering, the default condition is 'denied'; that is, no one will be able to access anything unless there's an applicable Allow directive.
All right, so let's add the directive that will let local references work:
Order Allow,Deny
Allow from env=local_ref
This will let a request proceed if the local_ref envariable is set (with any value whatsoever). Any and all other requests will be denied because they don't meet the Allow conditions and the default is to deny access.
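The decision this ordering produces can be summarised as "at least one Allow matches and no Deny matches." Below is a minimal model of that rule, restricted to env= conditions; the function access_allowed and the envariable name banned are hypothetical and exist only for this illustration.

```python
def access_allowed(env, allow_envs, deny_envs=()):
    """Model 'Order Allow,Deny' for 'Allow from env=' and 'Deny from env='.

    env        -- dict of envariables set on the request
    allow_envs -- names used in 'Allow from env=NAME' directives
    deny_envs  -- names used in 'Deny from env=NAME' directives
    """
    # An envariable counts as set whatever its value is.
    allowed = any(name in env for name in allow_envs)
    denied = any(name in env for name in deny_envs)
    # The default is to deny: a request needs a matching Allow and
    # must not match any Deny.
    return allowed and not denied


if __name__ == "__main__":
    print(access_allowed({"local_ref": "1"}, ["local_ref"]))
    print(access_allowed({}, ["local_ref"]))
```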
Putting all these pieces together, we end up with a stanza of directives that looks something like this:
SetEnvIfNoCase Referer "^http://my.apache.org/" local_ref=1
<FilesMatch "\.(gif|jpg)">
    Order Allow,Deny
    Allow from env=local_ref
</FilesMatch>
These may all appear in your server-wide configuration files (e.g., httpd.conf), or you can put the <FilesMatch> container in one or more .htaccess files. The effect is the same: Within the scope of these directives, images can only be fetched if they were linked to from one of your pages.
I mentioned earlier that you can't fully prevent image theft. That's because of two things, which apply pretty much to the two different types of poaching respectively:
Referer value that happens to meet your criteria. In other words, by jiggering up the request so it looks like it's a reference from your site.
Though it's essentially impossible to foil someone who's really desperate to snitch your artwork, the steps described in this article should make it too difficult for the casual poacher.
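The Referer loophole exists because the field is supplied entirely by the client. The sketch below, which is also a handy way to test your own configuration, builds a request whose Referer claims to come from the site. It assumes a hypothetical image at http://my.apache.org/images/logo.gif, and the fetch helper performs a real network call, so it is defined but not invoked here.

```python
from urllib.request import Request, urlopen


def image_request(image_url, claimed_referer):
    """Build a request that carries whatever Referer value the caller picks."""
    return Request(image_url, headers={"Referer": claimed_referer})


def fetch(request):
    """Send the request and return the response body (needs network access)."""
    with urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    req = image_request(
        "http://my.apache.org/images/logo.gif",  # hypothetical image URL
        "http://my.apache.org/",                 # value the server will see
    )
    print(req.get_header("Referer"))
```

The server has no means of telling this request apart from one made by a browser that really was displaying one of your pages.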
Another thing you can do, depending upon how protective you are of your art, is to watermark the images. Watermarking a digital image consists of encoding a special 'signature' into the graphic so that it can be detected later. Digital watermarking doesn't degrade the quality of the image, and can be done in such a way that even a cropped subsection of the image contains the mark, and it's detectable even if the image has been otherwise edited since the mark was inserted. It's even possible to detect a watermark in an image that was printed and then scanned in, having left the digital realm altogether! If you watermark your images, there's an excellent chance you'll be able to prove snitching if you ever find a suspicious image on another site somewhere.
If you're not sure whether anyone is really after your artwork, you can use the same detection mechanism and envariable to log suspicious requests. For instance, if you add the following directives to your httpd.conf file, an entry will be made in the /usr/local/web/apache/logs/poachers_log file any time someone accesses one of your images without a valid Referer:
SetEnvIfNoCase Referer "^http://my.apache.org/" local_ref=1
SetEnvIfNoCase Request_URI "\.(gif|jpg)" is_image=1
RewriteEngine On
RewriteCond %{ENV:local_ref} !=1
RewriteCond %{ENV:is_image} =1
RewriteRule .* - [Last,Env=poach_attempt:1]
CustomLog logs/poachers_log common env=poach_attempt
This should have the effect of logging all attempts to access your images using one of the potential 'snitching' techniques described in this article. The first two lines set flags for the conditions (that the request was referred by a local document, and that it's for an image). The RewriteCond lines check that the first flag is not set and that the second one is; a SetEnvIf pattern can't be negated directly, so the 'not local' test is made here instead. The RewriteRule line sets a third flag combining the two, and the last line causes the logging of the request in a special file if that last flag is set. The log entry is written in Common Log Format, using the 'common' nickname that the stock httpd.conf defines with a LogFormat directive (if your configuration lacks it, define it or give the format string directly), but you could put together your own format just as easily.
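Once the log has collected some entries, a short script can show who is doing the poaching. This sketch assumes entries in Common Log Format; the sample lines, addresses, and file names are invented for illustration and are not real log data.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)$'
)


def count_by_host(lines):
    """Count logged image requests per client host, skipping malformed lines."""
    hits = Counter()
    for line in lines:
        match = CLF_LINE.match(line.strip())
        if match:
            hits[match.group("host")] += 1
    return hits


# Hypothetical entries of the kind poachers_log might contain.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 +0000] "GET /images/logo.gif HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2000:13:55:40 +0000] "GET /images/banner.jpg HTTP/1.0" 200 10240',
    '192.0.2.77 - - [10/Oct/2000:14:01:02 +0000] "GET /images/logo.gif HTTP/1.0" 403 -',
]

if __name__ == "__main__":
    for host, count in count_by_host(SAMPLE_LOG).most_common():
        print(host, count)
```

To run it against the real file, pass an open file object instead of SAMPLE_LOG, since the function accepts any iterable of lines.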
The techniques described in this article are geared toward a single purpose, but illustrate some of the capabilities of the Apache server. Here are some pointers to resources for further investigation:
Then there are the specific pieces of the Apache documentation that are directly related to the directives and commands described in this article:
<FilesMatch> documentation: <URL:http://www.apache.org/docs/mod/core.html#filesmatch>
mod_setenvif documentation: <URL:http://www.apache.org/docs/mod/mod_setenvif.html>
mod_access documentation: <URL:http://www.apache.org/docs/mod/mod_access.html>
mod_rewrite documentation: <URL:http://www.apache.org/docs/mod/mod_rewrite.html>
CustomLog directive: <URL:http://www.apache.org/docs/mod/mod_log_config.html>
Custom artwork can result from someone's effort, and taking without permission something that another has created is generally accepted as theft. This article has described a basic way to put your works of art behind a velvet rope--if you're so inclined. It won't stop determined thieves, but it should hopefully stymy or dissuade the more casual ones.
If you have a particular Apache-related topic that you'd like covered in a future article in this column, please let me know; drop me an email at <>. I do read and answer my email, usually within a few hours (although a few days may pass if I'm travelling or my mail volume is 'way up). If I don't respond within what seems to be a reasonable amount of time, feel free to ping me again.
Ken Coar is a member of the Apache Group and a director and vice president of the Apache Software Foundation. He is also a core member of the Jikes open-source Java compiler project, a contributor to the PHP project, the author of Apache Server for Dummies, and a contributing author to Apache Server Unleashed. He can be reached via email at <>.