Apache Today: Your Daily Source for Apache News and Information

Apache Guide: Logging with Apache--Understanding Your access_log
Aug 21, 2000, 18:01 UTC

By Rich Bowen

Apache comes with built-in mechanisms for logging activity on your server. In this series of articles, I'll talk about the standard way that Apache writes log files, and some of the tricks for getting more useful information and statistics out of your server.

This week we'll talk about the information that appears in your transfer log, and what it all means.

The standard log files

If you have done a default installation of Apache, two log files are written when you run your server: access_log (access.log on Windows) and error_log (error.log on Windows). With a default installation, these files are in /usr/local/apache/logs; on Windows, they are in the logs subdirectory of wherever you installed Apache. Packaged versions of Apache put the log files in various other places, so you may have to poke around to find them, or check the configuration file for the configured location.

access_log

access_log is, as the name suggests, the log of all accesses to your server. Typical entries in this file look like:

        216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654

This line contains 7 pieces of information. Actually, two of them are blank in this example, but there is space for 7 pieces of information.
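To make the seven slots concrete, here is a minimal Python sketch that splits the example entry into named fields. The names (host, ident, user, and so on) and the function parse_clf_line are my own labels for illustration, not anything Apache defines, and the pattern assumes a well-formed line with no embedded quotes in the request and no spaces in the user name.

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

def parse_clf_line(line):
    """Return the seven fields as a dict, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None

entry = parse_clf_line(
    '216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654'
)
for name, value in entry.items():
    print(name, "=", value)
```

The two blank slots come back as the literal string "-", which is how the place-holders appear in the file.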

The first piece of information is the address of the remote host. That is, who is looking at your web site. In the example above, the host visiting my web site is 216.35.116.91, which is, incidentally, the IP address of the machine called si3001.inktomi.com. (I figured that out by looking up the address in DNS, with the nslookup utility.) inktomi.com is a company that makes web searching software. (I looked at their web site.) Since this same IP address requested the file robots.txt just a few seconds earlier, I suspect that this is a web searching spider that was indexing my web site. I'll talk about spiders in another column. So, just based on that first piece of information, and a glance back in the log file, I've already found out quite a bit of information about my visitors.

By default, this address is just the IP address of the remote host. You can tell Apache to look up all the host names, and put those host names in the log instead of the IP address. This is probably not a good idea, since it greatly slows down the logging process, and so slows down your entire server. And there are various tools that will go through your log after the fact, and resolve all the IP addresses to host names, so there's no real advantage to doing this anyway.
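As a rough illustration of resolving addresses after the fact, the Python sketch below looks each distinct IP address up once and remembers the answer. The function name resolve_hosts and its lookup parameter are hypothetical; the parameter exists only so the resolver can be swapped out, and it defaults to the standard library's socket.gethostbyaddr.

```python
import socket

def resolve_hosts(addresses, lookup=socket.gethostbyaddr):
    """Map each distinct IP address to a host name, looking each one up once."""
    names = {}
    for address in addresses:
        if address in names:
            continue  # already resolved; repeat visitors cost nothing
        try:
            names[address] = lookup(address)[0]
        except OSError:
            # No reverse DNS entry (or DNS unavailable): keep the raw address.
            names[address] = address
    return names
```

Calling resolve_hosts(["216.35.116.91"]) on a machine with working DNS would return a one-entry dictionary. Because this runs offline against a finished log, a slow lookup delays only the report, not the web server.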

But, if you want to, you can tell Apache to do these lookups with the directive:

        HostNameLookups on

Setting HostNameLookups to double, rather than on, makes Apache follow the reverse lookup with a forward lookup on the name that it finds, to verify that the name points back to the IP address that you started with. The value is set to off by default.
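The idea behind a double lookup can be sketched in a few lines of Python. This is only an illustration of the check, not how Apache implements it; double_lookup and its reverse and forward parameters are names I made up, and the sketch handles IPv4 addresses only.

```python
import socket

def double_lookup(address,
                  reverse=socket.gethostbyaddr,
                  forward=socket.gethostbyname_ex):
    """Return the host name only if it resolves back to the given address."""
    try:
        name = reverse(address)[0]            # address -> name
        forward_addresses = forward(name)[2]  # name -> list of addresses
    except OSError:
        return None
    return name if address in forward_addresses else None
```

A name that does not map back to the original address is treated as unverified, so the function returns None for it.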

The second slot, alas, is blank, and almost always will be. That's what that ``-'' is: a place-holder for the second piece of information. That is the location where you're supposed to get the identity of the visitor. That's not just their login name, but their email address, or other unique identifier. This information is supposed to be returned by identd, or directly by the browser. And in the old days, back when Netscape 0.9 was the dominant browser, you would usually have email addresses in this spot. However, it did not take long for unsavory marketing types to think that it would be a good idea to collect those email addresses and send them unsolicited email (also known as spam). So, before very long, this feature was removed from just about every browser on the market. You will almost never find information in this field.

The third piece of information is also blank. The information that would appear there is the username with which the visitor authenticated. This will appear, of course, only when you have required authentication for a particular resource. So for the majority of entries in your log file, for most sites, this will be blank.

Next we have the time when the request was made. This information is enclosed in square brackets, and is in what is called ``common log format'', or ``standard English format.'' So the request in the above example was made at 14:47:37 on Saturday, August 19. The -0400 on the end of the field means that the server's time zone is 4 hours behind UTC. This tells you two things. One, that I tend to leave my column until the last minute, and two, that I appear to have the wrong time-zone set on my server. I'll have to make a note to take care of that ...
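If you want to work with this field in a program, a short Python sketch like the following turns it into a timezone-aware value. The function name parse_clf_time is my own, and the sketch assumes English month abbreviations, which is what the default locale gives you.

```python
from datetime import datetime, timezone

def parse_clf_time(field):
    """Parse a timestamp such as '[19/Aug/2000:14:47:37 -0400]'."""
    return datetime.strptime(field.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")

stamp = parse_clf_time("[19/Aug/2000:14:47:37 -0400]")
print(stamp.isoformat())                           # 2000-08-19T14:47:37-04:00
print(stamp.astimezone(timezone.utc).isoformat())  # 2000-08-19T18:47:37+00:00
```

Converting everything to UTC this way makes it easier to compare entries from servers in different time zones.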

The next piece of information is probably the most useful piece of information in the record. It tells what request was actually made of the server. This is typically in the format METHOD RESOURCE PROTOCOL.

In the example above, the METHOD is GET. The other most common methods will be POST and HEAD. There are a number of other valid methods, but those three are what you will see most of the time.

The RESOURCE is the actual document, or URL, that was requested from the server. In this example, the client requested ``/'', which is the root, or front page, of the server. In most configurations, this corresponds to the file index.html in the DocumentRoot directory, but could be something else, depending on your server configuration.

The PROTOCOL is usually going to be HTTP, followed by a version number. The version number will be either 1.0 or 1.1, with most of the records being 1.0. As you probably know from other articles, HTTP is the protocol that makes the web work. HTTP/1.0 was the earlier version of this protocol, and 1.1 is the more recent version. However, most web clients still speak version 1.0.
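Pulling the three parts of the request apart is a simple split on whitespace. Here is a minimal Python sketch; split_request is a hypothetical helper name, and anything that does not have exactly three parts is handed back unsplit rather than guessed at.

```python
def split_request(request):
    """Split a request string into (method, resource, protocol)."""
    parts = request.split()
    if len(parts) == 3:
        return tuple(parts)
    # Malformed or unusual request: return it whole in the resource slot.
    return (None, request, None)

method, resource, protocol = split_request("GET / HTTP/1.0")
print(method, resource, protocol)   # GET / HTTP/1.0
```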

The sixth piece of information is a status code. This tells you whether the request was successful, or encountered some problem. Most of the time, this is 200, which means that the transfer was successful, and everything went well. Hopefully. I'm not going to give the whole list of the status codes, and what they mean. You need to look in the documentation for that. But, in general, a status code that starts with 2 was successful. Starting with a 3 means that the request was redirected somewhere else for some reason. Starting with a 4 means that the user did something wrong, and starting with a 5 means that the server did something wrong.
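The first-digit rule is easy to express in code. The following Python sketch groups a status code by its leading digit; the function name status_class and the short labels are mine, chosen to match the description above.

```python
def status_class(status):
    """Classify an HTTP status code by its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(int(status) // 100, "unknown")

for code in ("200", "302", "404", "500"):
    print(code, status_class(code))
```

Counting entries per class is often a quicker health check than reading individual codes.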

The seventh and final piece of information is the total number of bytes that were transferred to the client, not counting the response headers. This can tell you if a transfer was interrupted (if the number is different from the size of the file). Adding them up will tell you how much data your server transferred in a day, or week, or whatever.
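Adding up the byte counts takes only a few lines. In this Python sketch, total_bytes is a hypothetical helper that takes any iterable of log lines (an open file works) and treats a final field of ``-'', which is written when no body was sent, as zero.

```python
def total_bytes(lines):
    """Sum the last field (bytes sent) over an iterable of log lines."""
    total = 0
    for line in lines:
        fields = line.split()
        if fields and fields[-1].isdigit():
            total += int(fields[-1])  # "-" and blank lines are skipped
    return total

sample = [
    '216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654',
    '216.35.116.91 - - [19/Aug/2000:14:47:40 -0400] "GET / HTTP/1.0" 304 -',
]
print(total_bytes(sample))   # 654
```

To total a real log, you could pass an open file object, for example total_bytes(open(path)), where path points at your own access_log.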

Setting the location of your access_log

Where the access_log is located is actually a configuration option. If you look in your configuration file, httpd.conf, you should see a line that looks like the following:

        CustomLog /usr/local/apache/logs/access_log common

Note: If you're running an older version of Apache, this line might look a little different. It might be the TransferLog directive instead of the CustomLog directive. If that is the case, I really recommend that you upgrade if at all possible.

The CustomLog directive specifies where a particular log file should be stored, and what format that log should be in. Next week we'll talk about custom log formats. The log format described above is the common log format, which has been in use as the standard since the beginning of web servers. That's why it still contains the ident information field, even though almost no clients actually pass that information to the server.

The path specified there is the location of the log file. Note that this location should be secured against random users writing to it. The log file is opened by the user who starts the server (normally root), not by the user specified with the User directive, so a log directory that other users can write to is potentially a security problem.
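One quick check, on a Unix-like system, is whether anyone other than the owner has write permission on the log directory. The Python sketch below looks only at the permission bits; writable_by_others is a name I made up, and the check does not examine parent directories or who the owner is.

```python
import os
import stat

def writable_by_others(path):
    """Return True if group or other users have write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
```

For the default layout you would call writable_by_others("/usr/local/apache/logs"); a True result suggests that the permissions on that directory deserve a closer look.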

Upcoming articles

In my next few articles, I'll be talking about the following subjects: custom log formats; logging to a process, rather than to a file; the error log; getting useful statistics out of your log files; and whatever else you fine readers suggest to me.

Thanks for reading. Please send me a note if you have any suggestions or comments.



 Talkback(s)
  Log Analyzer

I've been searching for an open-source Log Analyzer that will automatically "detect" new web-sites and place the respective web statistics into that site's structure.

For example, the moderately priced Urchin (http://www.urchin.com) detects web-sites that start at the root. For example:

http://www.MyBox.com/site1/blah.html
http://www.MyBox.com/site2/ugh.html

I want the Log Analyzer to ascertain from the Log file that there are actually two web sites to be processed: site1 and site2


Then Urchin will place a statistics directory named "urchin" into each site at:

http://www.MyBox.com/site1/urchin/
http://www.MyBox.com/site2/urchin/


As you can tell, I am enthusiastic about everything in Urchin except its price. We cannot afford its licensing structure, even though it is fairly priced.

Do you know of a free, open-source Log Analyzer with features comparable to Urchin's? I don't need fancy graphs.

Sincerely,

-- Niraj
  
  Aug 22, 2000, 13:15:30
   Re: Log Analyzer
I don't do massive virtual hosting (less than 100) but what I've done is simply create different logs for each site.

It isn't a big deal because early on I made a newsite.sh shell script and all I have to do is answer a few questions like site domain and user/pass for the web access to the logs. The rest is done in about 4 seconds. Then each night I have a cron job run analog to update the logs for each site.

I don't think that really answered your question but I think analog has a similar feature.

Leknor,
http://Leknor.com   
  Aug 22, 2000, 16:50:42
  Where can I find the status codes used in access_log?
Where can I find the status codes used in access_log? For example, what do 200 or 300 mean?   
  Jul 31, 2001, 07:16:30
  Configuring Apache
I wanted to block retrieval of all files except those with the extensions .html, .htm and .php. To do that, I added some lines to my httpd.conf file, and it looks like this:


Order allow,deny
allow from none
deny from all
AllowOverride none


What I actually wanted is for everything within a page to be shown. For example, when the HTML page at www.mypage.com/index.html includes a graphic file called logo.gif, the graphic is shown, but the URL www.mypage.com/logo.gif on its own shows nothing and gives the forbidden message instead.
But what I am experiencing is that no graphics are visible, even within a page.
Moreover, directory listings are not shown when there is no index.html file (I allowed directory listings).
take a look at the site where I am trying to configure my apache server :
http://yallara.cs.rmit.edu.au:58416/index.php   
  Aug 15, 2001, 23:21:01
  adobe premiere
I got adobe premiere with total training included, I used to be able to connect for training on line but now I can't. Could you tell me what to do.
Thanks Jan   
  Oct 22, 2001, 20:55:54
  htaccess
LS,

I want to give a user scripting access from a browser but without success.

I will explain here what I want:

User smuradin must be able to execute perl scripts from his own account. So I set up the following for him:

create the following dir. /home/smuradin/public_html/cgi-bin

in this directory I put a file .htaccess with contents Options +ExecCGI

When I try to execute a perl script I get a pop-up screen which wants to save the perl script.

What am I doing wrong??

I also modified the following files
srm.conf: ScriptAlias /cgi-bin/ /home/smuradin/public_html/cgi-bin/

access.conf: AllowOverride Options


Can some one help me with this


bye   
  Nov 14, 2001, 16:05:17
  Log analysis
Log analysis is part 3 or 4 of this series. I'll cover log analysis tools, and recommend several. I just have not gotten that far yet.

Wusage is one of the better free ones out there. www.boutell.com. I have a list of others somewhere - perhaps in my book - but I don't have it handy.

http://www.apacheunleashed.com/   
  Aug 23, 2000, 00:54:39
   Re: Log analysis
By far the absolute best log analyzer I have found is AccessProbe (www.AccessProbe.com). It has a clean feature that has kept my log file under 100 MB, and it contains 12 months' worth of data. It usually reaches 100 MB in two or three weeks.

Chris   
  Jul 12, 2001, 19:02:16
  Cluster logs
I've recently moved from a single server to a clustered web farm. I need a log analyzer that can handle logs from a web farm. WebTrends Enterprise with their ClusterTrends addition claims to do this, but it's costly. Any other ideas? (Free is nice!)   
  Sep 6, 2000, 04:00:05
   Re: Cluster logs
You might want to try out spong. It is fairly well documented and SHOULD work with clusters. Here is the URL: http://spong.sourceforge.net/   
  Sep 23, 2000, 15:23:42
  error code
I want to know why ".....Permission denied: cannot read directory for multi: /home" always is in my error_log?   
  Feb 3, 2001, 03:06:43
  access log
hi all,

I need to load the access log into MySQL from the Apache configuration, but I don't know how to configure this in the Apache web server.
Any ideas? Please contact me...

Thx

sincerely

-arlin-   
  Apr 20, 2001, 05:30:42
  Error on access_log
Hi,

My access_log is empty and only writes lines like this, on my error_log

[info] created shared memory segment #239618
[Mon May 14 10:16:34 2001] [notice] Apache-AdvancedExtranetServer/1.3.14 (Linux-Mandrake/2mdk) PHP/4.0.3pl1 configured -- resuming normal operations

One day it stopped writing and I don't know why. Can anyone help me?

Thanks,
Tiago Mota   
  May 14, 2001, 09:46:31
  Country data collecting
Hi,
I was wondering, is it possible to collect/store country data in the access_log file?
What I want is this:
Like the IP addresses stored in the access_log file, I want to have the browser language the user is using there as well. For example en/us.
Is this possible?

Regards,
Jordi   
  May 23, 2001, 15:17:08
  a complete list of all status codes?
I spent an hour looking over on apache.org for a full list of status codes. Couldn't find it. Anybody know how to find it?   
  May 30, 2001, 02:38:12
   Re: a complete list of all status codes?
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html   
  Jun 19, 2001, 14:02:47
All times are recorded in UTC.