Performance Tuning by Tweaking Apache Configuration
By Stas Bekman

Correct configuration of the MinSpareServers, MaxSpareServers, StartServers, MaxClients and MaxRequestsPerChild directives is very important. All the above parameters should be specified on the basis of the resources you have. With a plain Apache server it's no big deal if you run many servers, since the processes are about 1MB each and don't eat a lot of your RAM. Generally the numbers are even smaller with memory sharing. The situation is different with mod_perl: I have seen mod_perl processes of 20MB and more. If you have MaxClients set to 50, that is 50 x 20MB = 1GB in the worst case, so these numbers must be chosen with your memory budget in mind.

Before you start this task you should be armed with the proper weapon. You need a crashme utility, which will load your server with the mod_perl scripts you possess. It must be able to emulate a multiuser environment, i.e. the behavior of multiple clients calling the mod_perl scripts on your server simultaneously. While there are commercial solutions, you can get away with free ones which do the same job. You can use the ApacheBench (ab) utility which comes with the Apache distribution, or the crashme script which uses LWP::Parallel::UserAgent.

It is important to make sure that you run the load generator (the client which generates the test requests) on a system that is more powerful than the system being tested. After all we are trying to simulate Internet users, where many users are trying to reach your service at once. Since the number of concurrent users can be quite large, your testing machine must be very powerful and capable of generating a heavy load. Of course you should not run the clients and the server on the same machine. If you do, your test results will be invalid: clients will eat CPU and memory that should be dedicated to the server, and vice versa.

Configuration Tuning with ApacheBench

I'm going to use the ab utility to load the server. Let's start with 100 requests made by 10 concurrent users:

  % ./ab -n 100 -c 10 http://www.example.com/perl/access/access.cgi

The results are:

  Document Path:        /perl/access/access.cgi
  Document Length:      16 bytes
  Concurrency Level:    10
  Time taken for tests: 1.683 seconds
  Complete requests:    100
  Failed requests:      0
  Total transferred:    16100 bytes
  HTML transferred:     1600 bytes
  Requests per second:  59.42
  Transfer rate:        9.57 kb/s received

  Connection Times (ms)
               min  avg   max
  Connect:       0   29   101
  Processing:   77  124  1259
  Total:        77  153  1360

The only numbers we really care about are:

  Complete requests:    100
  Failed requests:      0
  Requests per second:  59.42

Let's raise the request load to 100 x 10 (10 users, each makes 100 requests):

  % ./ab -n 1000 -c 10 http://www.example.com/perl/access/access.cgi

  Concurrency Level:    10
  Complete requests:    1000
  Failed requests:      0
  Requests per second:  139.76

As expected, nothing changes -- we have the same 10 concurrent users. Now let's raise the number of concurrent users to 50:

  % ./ab -n 1000 -c 50 http://www.example.com/perl/access/access.cgi

  Complete requests:    1000
  Failed requests:      0
  Requests per second:  133.01

We see that the server is capable of serving 50 concurrent users at 133 requests per second! Let's find the upper limit.

The above tests were performed with the following configuration:

  MinSpareServers 8
  MaxSpareServers 6
  StartServers 10
  MaxClients 50
  MaxRequestsPerChild 1500
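If you plan to repeat such runs at several request counts and concurrency levels, a tiny wrapper around ab saves typing and collects the interesting figures for you. This is only a sketch: it assumes ab sits in the current directory, as in the commands above, and that the URL points at one of your own mod_perl scripts.

  #!/usr/bin/perl -w
  # Sketch: run ab at several concurrency levels against one URL and pull
  # the "Requests per second" and "Failed requests" figures out of its output.
  use strict;

  my $url = 'http://www.example.com/perl/access/access.cgi';  # your script here
  my $nr  = 1000;                                             # total requests per run

  for my $nc (10, 50, 100) {                                  # concurrency levels to try
      my $out = `./ab -n $nr -c $nc $url 2>&1`;
      my ($rps)    = $out =~ /Requests per second:\s+([\d.]+)/;
      my ($failed) = $out =~ /Failed requests:\s+(\d+)/;
      printf "NR=%-6d NC=%-4d RPS=%-8s failed=%s\n",
             $nr, $nc, defined $rps ? $rps : 'n/a',
             defined $failed ? $failed : '?';
  }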
Now let's kill each child after it serves a single request. We will use the following configuration:

  MinSpareServers 8
  MaxSpareServers 6
  StartServers 10
  MaxClients 100
  MaxRequestsPerChild 1

Simulate 50 users, each generating a total of 20 requests:

  % ./ab -n 1000 -c 50 http://www.example.com/perl/access/access.cgi

The benchmark timed out with the above configuration. I watched the server's process table while the benchmark ran: with MaxRequestsPerChild set to 1, the parent had to fork a fresh child for every single request, and the server simply could not keep up.

Now let's put MaxRequestsPerChild back to 1500, but limit the number of children to 10:

  MinSpareServers 8
  MaxSpareServers 6
  StartServers 10
  MaxClients 10
  MaxRequestsPerChild 1500

I got 27.12 requests per second, which is better but still 4-5 times slower. (I got 133 with MaxClients set to 50.)

Summary: I have tested a few combinations of the server configuration variables (MinSpareServers, MaxSpareServers, StartServers, MaxClients and MaxRequestsPerChild).
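The combinations above were switched by hand, editing the configuration and restarting the server before each ab run. If you find yourself doing this a lot, the loop can be automated along the following lines. This is a sketch only: the include file path, the apachectl location and the restart delay are assumptions you will need to adapt, and httpd.conf has to Include the generated file.

  #!/usr/bin/perl -w
  # Sketch: write the directives under test into a small file pulled in from
  # httpd.conf via "Include", restart the server, run ab, record the RPS.
  use strict;

  my $include = '/usr/local/apache/conf/tuning.conf';    # assumed path
  my $restart = '/usr/local/apache/bin/apachectl';       # assumed path
  my $ab      = './ab -n 1000 -c 50 http://www.example.com/perl/access/access.cgi';

  my @tries = (
      { MaxClients => 50, MaxRequestsPerChild => 1500 },
      { MaxClients => 10, MaxRequestsPerChild => 1500 },
      { MaxClients => 50, MaxRequestsPerChild => 10   },
  );

  for my $conf (@tries) {
      open my $fh, '>', $include or die "can't write $include: $!";
      print $fh "MinSpareServers 8\nMaxSpareServers 6\nStartServers 10\n";
      print $fh "$_ $conf->{$_}\n" for sort keys %$conf;
      close $fh;

      system $restart, 'restart';
      sleep 10;                                           # let the children start up
      my ($rps) = `$ab 2>&1` =~ /Requests per second:\s+([\d.]+)/;
      printf "MC=%-4d MRPC=%-6d RPS=%s\n",
             $conf->{MaxClients}, $conf->{MaxRequestsPerChild},
             defined $rps ? $rps : 'timed out?';
  }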
The important parameters are MaxClients and MaxRequestsPerChild.

Also it is important to understand that we didn't test the response times in the tests above, but the ability of the server to respond under a heavy load of requests. If the test script were heavier, the numbers would be different but the conclusions very similar.

The benchmarks were run with:

  HW: RS6000, 1GB RAM
  SW: AIX 4.1.5, mod_perl 1.16, apache 1.3.3

The machine was running only the mysql, httpd-docs and mod_perl servers, and was _completely_ unloaded during the benchmarking. After each server restart, when I changed the server's configuration, I made sure that the scripts were preloaded by fetching a script at least once for every child (a small sketch of this warm-up step appears after the tables below).

It is important to notice that none of the requests timed out, even if a request was kept in the server's queue for more than a minute! That is the way ab works, which is fine for testing purposes but would be unacceptable in the real world: users will not wait more than five to ten seconds for a request to complete, and the client (i.e. the browser) will time out after a few minutes.

Now let's take a look at some real code whose execution time is more than a few milliseconds. We will do some real testing and collect the data into tables for easier viewing. I will use the following abbreviations:

  NR   = Total Number of Requests
  NC   = Concurrency
  MC   = MaxClients
  MRPC = MaxRequestsPerChild
  RPS  = Requests per second

Running a mod_perl script with lots of mysql queries (the script under test is mysqld-limited) (http://www.example.com/perl/access/access.cgi?do_sub=query_form), with the configuration:

  MinSpareServers 8
  MaxSpareServers 16
  StartServers 10
  MaxClients 50
  MaxRequestsPerChild 5000

gives us:

  NR    NC   RPS    comment
  ------------------------------------------------
  10    10   3.33   # not a reliable figure
  100   10   3.94
  1000  10   4.62
  1000  50   4.09

Conclusions: Here I wanted to show that when the application is slow (not because of Perl loading, code compilation and execution, but because it is limited by some external operation) it almost does not matter what load we place on the server; the RPS (requests per second) stays almost the same. Given that all the requests have been served, you have the ability to queue the clients, but be aware that anything that goes into the queue means a waiting client, and a client (browser) that might time out!

Now we will benchmark the same script without using mysql (code limited by Perl only): (http://www.example.com/perl/access/access.cgi). It's the same script, but it just returns the HTML form without making SQL queries.

  MinSpareServers 8
  MaxSpareServers 16
  StartServers 10
  MaxClients 50
  MaxRequestsPerChild 5000

  NR      NC   RPS    comment
  ------------------------------------------------
  10      10   26.95  # not a reliable figure
  100     10   30.88
  1000    10   29.31
  1000    50   28.01
  1000    100  29.74
  10000   200  24.92
  100000  400  24.95

Conclusions: This time the script we executed was pure Perl (not limited by I/O or mysql), so we see that the server serves the requests much faster. You can see that the number of requests per second is almost the same for any load, but goes lower when the number of concurrent clients goes beyond MaxClients.
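Here is the warm-up step mentioned above as a small sketch. After a restart the script is fetched repeatedly so that, with luck, every child compiles it before the measured run starts. The URL and the child count are assumptions, and a single sequential client is not guaranteed to hit every child, which is why the loop overshoots.

  #!/usr/bin/perl -w
  # Warm-up sketch: after a server restart, fetch the script enough times
  # that every child is likely to have compiled it before benchmarking.
  use strict;
  use LWP::UserAgent;

  my $url      = 'http://www.example.com/perl/access/access.cgi';
  my $children = 50;                   # roughly your MaxClients / StartServers
  my $ua       = LWP::UserAgent->new;

  for my $i (1 .. 3 * $children) {     # overshoot: requests are not spread evenly
      my $res = $ua->get($url);
      warn "warm-up request $i failed: ", $res->status_line, "\n"
          unless $res->is_success;
  }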
Now we will use the server to its full capacity, by keeping all MaxClients children alive all the time:

  MinSpareServers 50
  MaxSpareServers 50
  StartServers 50
  MaxClients 50
  MaxRequestsPerChild 5000

  NR     NC   RPS    comment
  ------------------------------------------------
  100    10   32.05
  1000   10   33.14
  1000   50   33.17
  1000   100  31.72
  10000  200  31.60

Conclusion: In this scenario there is no overhead involving the parent server loading new children, all the servers are available, and the only bottleneck is contention for the CPU.

Now we will reduce MaxClients to 10:

  MinSpareServers 8
  MaxSpareServers 10
  StartServers 10
  MaxClients 10
  MaxRequestsPerChild 5000

  NR    NC   RPS    comment
  ------------------------------------------------
  10    10   23.87  # not a reliable figure
  100   10   32.64
  1000  10   32.82
  1000  50   30.43
  1000  100  25.68
  1000  500  26.95
  2000  500  32.53

Conclusions: Very little difference! Ten servers were able to serve almost with the same throughput as 50 servers. Why? My guess is CPU throttling: it seems that 10 servers were serving requests five times faster than 50 servers could, because with 50 servers each child received its CPU time slice five times less frequently. So having a big value for MaxClients does not mean that the performance will be better.

Now we will start drastically to reduce MaxRequestsPerChild:

  MinSpareServers 8
  MaxSpareServers 16
  StartServers 10
  MaxClients 50

  NR    NC   MRPC  RPS   comment
  ------------------------------------------------
  100   10   10    5.77
  100   10   5     3.32
  1000  50   20    8.92
  1000  50   10    5.47
  1000  50   5     2.83
  1000  100  10    6.51

Conclusions: When we drastically reduce MaxRequestsPerChild, the performance starts to come closer to that of plain mod_cgi, since most of the time goes into forking new children and recompiling the code in them rather than serving requests.

Here are the numbers of this run with mod_cgi, for comparison:

  MinSpareServers 8
  MaxSpareServers 16
  StartServers 10
  MaxClients 50

  NR    NC   RPS   comment
  ------------------------------------------------
  100   10   1.12
  1000  50   1.14
  1000  100  1.13

Conclusion: mod_cgi is much slower. :) In the first test, when NR/NC was 100/10, mod_cgi was capable of 1.12 requests per second. In the same circumstances, mod_perl was capable of 32 requests per second, nearly 30 times faster! In the first test each client waited about 100 seconds to be served. In the second and third tests they waited 1000 seconds!

Choosing MaxClients

The MaxClients directive sets the limit on the number of simultaneous requests that can be supported: no more than this number of child server processes will be created. With mod_perl we want this number to be driven by the memory we have, so a simple rule of thumb is:

               Total RAM Dedicated to the Webserver
  MaxClients = ------------------------------------
                    MAX child's process size

So if I have 400MB left for the webserver to run with, and each child is limited to 10MB of memory, I can set MaxClients to 40.

You will be wondering what will happen to your server if there are more concurrent users than MaxClients at any time. This situation is signaled by the following message in the error_log:

  [Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting, consider raising the MaxClients setting

There is no problem -- any connection attempts over the MaxClients limit will normally be queued, up to a number based on the ListenBacklog directive, and served once a child process frees up. Still, it is logged as an error because clients are being put in the queue rather than getting served immediately, even though they do not get an error response. The condition can be allowed to persist, to balance available system resources and response time, but sooner or later you will need to get more RAM so you can start more child processes. The best approach is to try not to reach this condition at all, and if you reach it often you should start to worry about it.

It's important to understand how much real memory a child occupies. Your children can share memory between them when the OS supports that, but you must take action to allow the sharing to happen. We have discussed this in a previous article whose main topic was shared memory.
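To see what a child really costs, you can ask the OS how big the process is and how much of it is shared. One way to do that is sketched below; it assumes the GTop module (a Perl interface to libgtop) is installed and that your platform reports shared memory.

  #!/usr/bin/perl -w
  # Sketch: report total, resident and shared memory of a process via GTop.
  # Pass the pid of one of your httpd children on the command line.
  use strict;
  use GTop ();

  my $pid = shift || $$;               # default to this process if no pid given
  my $mem = GTop->new->proc_mem($pid);

  printf "pid %d: size %.1fMB  rss %.1fMB  shared %.1fMB  unshared ~%.1fMB\n",
         $pid,
         $mem->size  / 1024**2,
         $mem->rss   / 1024**2,
         $mem->share / 1024**2,
         ($mem->rss - $mem->share) / 1024**2;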
If you do take such action, chances are that your children will share a good deal of their memory and MaxClients can be set higher. To be on the safe side use this equation:

               Total_RAM + Shared_RAM_per_Child * (MaxClients - 1)
  MaxClients = ---------------------------------------------------
                              Max_Process_Size

which solves to:

               Total_RAM - Shared_RAM_per_Child
  MaxClients = ---------------------------------------
               Max_Process_Size - Shared_RAM_per_Child

Let's roll some calculations:

  Total_RAM            = 500MB
  Max_Process_Size     = 10MB
  Shared_RAM_per_Child = 4MB

               500 - 4
  MaxClients = ------- = 82
                10 - 4

With no sharing in place:

               500
  MaxClients = --- = 50
                10

With sharing in place you can have 64% more servers without buying more RAM.

If you improve sharing and keep it at that level, let's say:

  Total_RAM            = 500MB
  Max_Process_Size     = 10MB
  Shared_RAM_per_Child = 8MB

               500 - 8
  MaxClients = ------- = 246
                10 - 8

392% more servers! Now you can feel the importance of having as much shared memory as possible.

Choosing MaxRequestsPerChild

The MaxRequestsPerChild directive sets the limit on the number of requests that an individual child process will handle; after that many requests the child exits. Setting it to 0 means that the child will never expire.

If the children are left unbounded and your code leaks memory, then after a certain number of requests they will use up all the available memory and leave the server to die from memory starvation. Note that sometimes standard system libraries leak memory too, especially on OSes with bad memory management (e.g. Solaris 2.5 on the x86 architecture). If this is your case you can set MaxRequestsPerChild to a small number, so that each child is recycled before its memory consumption becomes a problem.

But beware -- if you set this number too low, you will lose some of the speed bonus you get from mod_perl, since the code has to be recompiled in every freshly started child. Consider using Apache::PerlRun if that is the case.

Another approach is to use the Apache::SizeLimit module, which kills off a child once it grows beyond a size limit you choose, so you don't have to guess a good value for MaxRequestsPerChild.
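For illustration, a startup.pl fragment for the Apache::SizeLimit approach might look like the sketch below. The variable names and thresholds follow the module's mod_perl 1.x interface as I recall it, so treat them as assumptions and check the documentation of the version you install; the handler also has to be enabled in httpd.conf (e.g. with PerlFixupHandler Apache::SizeLimit).

  # startup.pl fragment -- a sketch; values are in KB, so 10000 is roughly 10MB
  use Apache::SizeLimit ();

  $Apache::SizeLimit::MAX_PROCESS_SIZE       = 10000;  # kill a child above ~10MB
  $Apache::SizeLimit::MIN_SHARE_SIZE         = 4000;   # ...or one sharing less than ~4MB
  $Apache::SizeLimit::CHECK_EVERY_N_REQUESTS = 5;      # don't check on every request

  1;

Some people use such a size limit together with a generous MaxRequestsPerChild, so that a well-behaved child is still recycled eventually even if it never crosses the limit.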