"... as I put more and more pictures online, I started to notice some pretty creepy CPU loads. Worse than that, my ISP neighbors were also starting to complain. After investigation, I determined that I was getting hit by not-so-nice "spiders": Web programs that, given a few starting points, recursively (and rapidly) fetch the contents of many pages. I believe most of these to be people on fast data connections (like my current cable modem that brings the equivalent of 2 T-1's into my house for $40 per month, yes!) innocently asking their Web browser to download a whole area."
"So, rather than pull my pictures offline, I decided to implement a throttler. I didn't care as much about transfer bandwidth as I did CPU, so I chose to track recent CPU activity for each visitor. Of course, HTTP has no concept of a "session," so I took a very easy shortcut: tracking by IP address. Yes, I know I've ranted in discussion forums a lot about how an IP address is not a user. But, for the purpose of throttling it seemed the most expedient choice."
"Once I put my throttler in place, no IP address is allowed to suck more than seven percent of my CPU over a period of 15 seconds. Once the CPU threshold is reached, any additional request is met with a 503 error (service unavailable), which, according to RFC2616 (the HTTP/1.1 specification), also allows me to give a "retry after" value of 15 seconds to advise the program that this was a temporary condition."
Complete Story
Related Stories:
Improving mod_perl Driven Site's Performance -- Part II: Benchmarking Applications (Dec 16, 2000)
Editor's Note: Automating Apache with Apache Toolbox (Nov 22, 2000)