Apache Today: Your Daily Source for Apache News and Information

Apache Guide: Logging with Apache--Understanding Your access_log
Aug 21, 2000, 18:01 UTC

By Rich Bowen

Apache comes with built-in mechanisms for logging activity on your server. In this series of articles, I'll talk about the standard way that Apache writes log files, and some of the tricks for getting more useful information and statistics out of your server.

This week we'll talk about the information that appears in your transfer log, and what it all means.

The standard log files

If you have done a default installation of Apache, when you run your server, two log files will get written. These files are called access_log (access.log on Windows) and error_log (error.log on Windows). These files can be found (again, if you did a default installation) in /usr/local/apache/logs. On Windows, the logs will be in the logs subdirectory of wherever you installed Apache. Packaged versions of Apache often put the log files in other places, and you'll have to poke around to find them, or check the configuration file for the configured location.

access_log

access_log is, as the name suggests, the log of all accesses to your server. Typical entries in this file look like:

        216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654

This line contains 7 pieces of information. Actually, two of them are blank in this example, but there is space for 7 pieces of information.
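
As a minimal sketch of how those seven slots can be pulled apart in a script, the following Python matches a line against the layout shown above. The regular expression and the field names (host, ident, user and so on) are labels chosen for this illustration, not anything Apache itself defines, and the pattern assumes the request contains no embedded double quotes.

```python
import re

# Seven slots: host, ident, user, [time], "request", status, bytes.
# The group names are illustrative labels, not Apache terminology.
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return the seven fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


sample = '216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654'
fields = parse_line(sample)
for name, value in fields.items():
    print(f"{name:8} {value}")
```

Every value comes back as a string, including the two ``-'' place-holders, so any conversion to numbers or dates is left to the caller.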

The first piece of information is the address of the remote host. That is, who is looking at your web site. In the example above, the host visiting my web site is 216.35.116.91, which is, incidentally, the IP address of the machine called si3001.inktomi.com. (I figured that out by looking up the address in DNS, with the nslookup utility.) inktomi.com is a company that makes web searching software. (I looked at their web site.) Since this same IP address requested the file robots.txt just a few seconds earlier, I suspect that this is a web searching spider that was indexing my web site. I'll talk about spiders in another column. So, just based on that first piece of information, and a glance back in the log file, I've already found out quite a bit of information about my visitors.

By default, this address is just the IP address of the remote host. You can tell Apache to look up all the host names, and put those host names in the log instead of the IP address. This is probably not a good idea, since it greatly slows down the logging process, and so slows down your entire server. And there are various tools that will go through your log after the fact, and resolve all the IP addresses to host names, so there's no real advantage to doing this anyway.

But, if you want to, you can tell Apache to do these lookups with the directive:

        HostNameLookups on

Setting HostNameLookups to double, rather than on, will cause Apache to follow the reverse lookup with a forward lookup on the name that it finds, to verify that the name points back to the IP address that you started with. The value is set to off by default.
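
To illustrate resolving addresses after the fact, with the same kind of verification that double performs, here is a small Python sketch. The function name resolve and the two stand-in lookup functions are made up for this example. The defaults use the standard socket module, and the stand-ins exist only so the example runs without a network connection.

```python
import socket


def resolve(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return a verified host name for ip, or None.

    reverse and forward can be swapped out so the logic can be tried without DNS.
    """
    try:
        name = reverse(ip)[0]          # IP address -> host name
        addresses = forward(name)[2]   # host name -> list of IP addresses
    except OSError:
        return None                    # one of the lookups failed
    # Accept the name only if it points back to the address we started with.
    return name if ip in addresses else None


# Stand-in lookups (hypothetical) that mimic the answers described in the article.
def fake_reverse(ip):
    return ("si3001.inktomi.com", [], [ip])


def fake_forward(name):
    return (name, [], ["216.35.116.91"])


print(resolve("216.35.116.91", fake_reverse, fake_forward))
```

Calling resolve with only the IP address uses real DNS, which is slow for a large log; a script that does this would probably want to remember names it has already looked up.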

The second slot, alas, is blank, and almost always will be. That's what that ``-'' is: a place-holder for the second piece of information. That is the location where you're supposed to get the identity of the visitor. That's not just their login name, but their email address, or other unique identifier. This information is supposed to be returned by identd, or directly by the browser. And in the old days, back when Netscape 0.9 was the dominant browser, you would usually have email addresses in this spot. However, it did not take long for unsavory marketing types to think that it would be a good idea to collect those email addresses and send them unsolicited email (also known as spam). So, before very long, this feature was removed from just about every browser on the market. You will almost never find information in this field.

The third piece of information is also blank. The information that would appear there is the username with which the visitor authenticated. This will appear, of course, only when you have required authentication for a particular resource. So for the majority of entries in your log file, for most sites, this will be blank.

Next we have the time when the request was made. This information is enclosed in square brackets, and is in what is called ``common log format'', or ``standard English format.'' So the request in the above example was made at 14:47:37 on Saturday, August 19. The -0400 on the end of the field means that the server is in a time zone 4 hours behind UTC. This tells you two things. One, that I tend to leave my column until the last minute, and two, that I appear to have the wrong time-zone set on my server. I'll have to make a note to take care of that ...
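
If you want to work with this field in a script, the layout maps onto a date format string fairly directly. The sketch below uses Python's datetime module; the constant name CLF_TIME is made up here, and the month abbreviation is assumed to be in English, which is what Python expects unless the locale has been changed.

```python
from datetime import datetime, timezone

# Layout of the bracketed field: day/month/year:hour:minute:second zone
CLF_TIME = "%d/%b/%Y:%H:%M:%S %z"

stamp = datetime.strptime("19/Aug/2000:14:47:37 -0400", CLF_TIME)
print(stamp.strftime("%A"))            # day of the week in server-local time
print(stamp.utcoffset())               # offset from UTC as a timedelta
print(stamp.astimezone(timezone.utc))  # the same instant expressed in UTC
```

Converting every entry to UTC this way makes it easier to compare logs from servers in different time zones.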

The next piece of information is probably the most useful piece of information in the record. It tells what request was actually made of the server. This is typically in the format METHOD RESOURCE PROTOCOL.

In the example above, the METHOD is GET. The other most common methods will be POST and HEAD. There are a number of other valid methods, but those three are what you will see most of the time.

The RESOURCE is the actual document, or URL, that was requested from the server. In this example, the client requested ``/'', which is the root, or front page, of the server. In most configurations, this corresponds to the file index.html in the DocumentRoot directory, but could be something else, depending on your server configuration.

The PROTOCOL is usually going to be HTTP, followed by a version number. The version number will be either 1.0 or 1.1, with most of the records being 1.0. As you probably know from other articles, HTTP is the protocol that makes the web work. HTTP/1.0 is the earlier version of this protocol, and 1.1 is the more recent version. However, most web clients still speak version 1.0.
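
Splitting the request into those three parts is simple enough to sketch in a few lines of Python. The function name split_request is invented for this example, and it deliberately gives up on anything that does not have exactly three space-separated parts, since malformed requests do turn up in logs.

```python
def split_request(request):
    """Split 'METHOD RESOURCE PROTOCOL' into a three-item tuple.

    Returns None when the request does not have exactly three parts.
    """
    parts = request.split(" ")
    if len(parts) != 3:
        return None
    method, resource, protocol = parts
    return method, resource, protocol


print(split_request("GET / HTTP/1.0"))
```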

The sixth piece of information is a status code. This tells you whether the request was successful, or encountered some problem. Most of the time, this is 200, which means that the transfer was successful, and everything went well. Hopefully. I'm not going to give the whole list of the status codes, and what they mean. You need to look in the documentation for that. But, in general, a status code that starts with 2 was successful. Starting with a 3 means that the request was redirected somewhere else for some reason. Starting with a 4 means that the user did something wrong, and starting with a 5 means that the server did something wrong.
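
The first-digit rule above is easy to express in code. This Python sketch uses category names that are my own shorthand for the groups just described; anything outside the 2 to 5 range is simply lumped together as "other".

```python
def status_class(code):
    """Map a status code to the broad category given by its first digit."""
    categories = {
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    return categories.get(int(code) // 100, "other")


for code in ("200", "302", "404", "500"):
    print(code, status_class(code))
```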

The seventh and final piece of information is the number of bytes that were transferred to the client, not counting the response headers. This can tell you if a transfer was interrupted (if the number is different from the size of the file). Adding them up will tell you how much data your server transferred in a day, or week, or whatever.
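
As a rough sketch of that adding up, the Python below totals the last field of each line, grouped by the date in the bracketed field. The function name bytes_per_day and the sample lines are made up for illustration (the addresses are reserved documentation addresses); a ``-'' in the byte field is treated as nothing sent.

```python
from collections import defaultdict


def bytes_per_day(lines):
    """Total the byte counts in lines, keyed by date such as '19/Aug/2000'."""
    totals = defaultdict(int)
    for line in lines:
        try:
            day = line.split("[", 1)[1].split(":", 1)[0]   # text between "[" and the first ":"
            size = line.rsplit(" ", 1)[1].strip()          # last field on the line
        except IndexError:
            continue                                       # not a log line, skip it
        if size.isdigit():                                 # "-" means no bytes were sent
            totals[day] += int(size)
    return dict(totals)


# Made-up sample entries in the same layout as the example above.
sample_lines = [
    '192.0.2.10 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654',
    '192.0.2.10 - - [19/Aug/2000:14:48:02 -0400] "HEAD / HTTP/1.0" 200 -',
    '192.0.2.11 - - [20/Aug/2000:09:00:00 -0400] "GET / HTTP/1.0" 200 654',
]
print(bytes_per_day(sample_lines))
```

On a real server you would pass an open file instead of a list, for example bytes_per_day(open(path)) with path set to wherever your access_log lives.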

Setting the location of your access_log

Where the access_log is located is actually a configuration option. If you look in your configuration file, httpd.conf, you should see a line that looks like the following:

        CustomLog /usr/local/apache/logs/access_log common

Note: If you're running an older version of Apache, this line might look a little different. It might be the TransferLog directive instead of the CustomLog directive. If that is the case, I really recommend that you upgrade if at all possible.

The CustomLog directive specifies where a particular log file should be stored, and what format that log should be in. Next week we'll talk about custom log formats. The log format described above is the common log format, which has been in use as the standard since the beginning of web servers. That's why it still contains the ident information field, even though almost no clients actually pass that information to the server.

The path specified there is the location of the log file. Note that this location should be secured against random users writing to it. The log file is opened by the user that starts the server (normally root), not by the user specified with the User directive, so a log directory that other users can write to is potentially a security problem.
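
One quick way to check for this is to look at the permission bits on the directory that holds the logs. The Python sketch below assumes a Unix-style system; the function name writable_by_others is invented for this example, and the demonstration runs against a scratch directory rather than a real log directory.

```python
import os
import stat
import tempfile


def writable_by_others(path):
    """Return True if the group or world write bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


# Demonstrate on a scratch directory with two different permission settings.
with tempfile.TemporaryDirectory() as scratch:
    os.chmod(scratch, 0o755)            # only the owner may write
    tight = writable_by_others(scratch)
    os.chmod(scratch, 0o777)            # anyone may write
    loose = writable_by_others(scratch)

print("0755 writable by others:", tight)
print("0777 writable by others:", loose)
```

On a real server you would pass the directory from your CustomLog line, such as /usr/local/apache/logs in a default installation. This only looks at the directory itself, so parent directories would need the same check.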

Upcoming articles

In my next few articles, I'll be talking about the following subjects: custom log formats; logging to a process, rather than to a file; the error log; getting useful statistics out of your log files; and whatever else you fine readers suggest to me.

Thanks for reading. Please send me a note if you have any suggestions or comments.
