Your Daily Source for Apache News and Information |
Breaking News | Preferences | Contribute | Triggers | Link Us | Search | About |
|
In this article, we will continue the topic started in the previous article. This time we talk about tools that help us with code profiling and memory usage measuring. Code Profiling TechniquesThe profiling process helps you to determine which subroutines or just snippets of code take the longest time to execute and which subroutines are called most often. Probably you will want to optimize those. When do you need to profile your code? You do that when you suspect that some part of your code is called very often and may be there is a need to optimize it to significantly improve the overall performance. For example if you have ever used the We will run a benchmark, once with diagnostics enabled and once disabled, on a subroutine called test_code. The code inside the subroutine does an arithmetic and a numeric comparison of two strings. It assigns one string to another if the condition tests true but the condition always tests false. To demonstrate the use Benchmark; use diagnostics; use strict; my $count = 50000; disable diagnostics; my $t1 = timeit($count,\&test_code); enable diagnostics; my $t2 = timeit($count,\&test_code); print "Off: ",timestr($t1),"\n"; print "On : ",timestr($t2),"\n"; sub test_code{ my ($a,$b) = qw(foo bar); my $c; if ($a == $b) { $c = $a; } } For only a few lines of code we get: Off: 1 wallclock secs ( 0.81 usr + 0.00 sys = 0.81 CPU) On : 13 wallclock secs (12.54 usr + 0.01 sys = 12.55 CPU) With Now let's fix the comparison the way it should be, by replacing my ($a,$b) = qw(foo bar); my $c; if ($a eq $b) { $c = $a; } and run the same benchmark again: Off: 1 wallclock secs ( 0.57 usr + 0.00 sys = 0.57 CPU) On : 1 wallclock secs ( 0.56 usr + 0.00 sys = 0.56 CPU) Now there is no overhead at all. The After we have verified that using the diagnostics.pl -------------- use diagnostics; print "Content-type:text/html\n\n"; test_code(); sub test_code{ my ($a,$b) = qw(foo bar); my $c; if ($a == $b) { $c = $a; } } Run it with the profiler enabled, and then create the profiling stastics with the help of dprofpp: % perl -d:DProf diagnostics.pl % dprofpp Total Elapsed Time = 0.342236 Seconds User+System Time = 0.335420 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 92.1 0.309 0.358 1 0.3089 0.3578 main::BEGIN 14.9 0.050 0.039 3161 0.0000 0.0000 diagnostics::unescape 2.98 0.010 0.010 2 0.0050 0.0050 diagnostics::BEGIN 0.00 0.000 -0.000 2 0.0000 - Exporter::import 0.00 0.000 -0.000 2 0.0000 - Exporter::export 0.00 0.000 -0.000 1 0.0000 - Config::BEGIN 0.00 0.000 -0.000 1 0.0000 - Config::TIEHASH 0.00 0.000 -0.000 2 0.0000 - Config::FETCH 0.00 0.000 -0.000 1 0.0000 - diagnostics::import 0.00 0.000 -0.000 1 0.0000 - main::test_code 0.00 0.000 -0.000 2 0.0000 - diagnostics::warn_trap 0.00 0.000 -0.000 2 0.0000 - diagnostics::splainthis 0.00 0.000 -0.000 2 0.0000 - diagnostics::transmo 0.00 0.000 -0.000 2 0.0000 - diagnostics::shorten 0.00 0.000 -0.000 2 0.0000 - diagnostics::autodescribe It's not easy to see what is responsible for this enormous overhead, even if % dprofpp -T and the output is: main::BEGIN diagnostics::BEGIN Exporter::import Exporter::export diagnostics::BEGIN Config::BEGIN Config::TIEHASH Exporter::import Exporter::export Config::FETCH Config::FETCH diagnostics::unescape ..................... 3159 times [diagnostics::unescape] snipped ..................... diagnostics::unescape diagnostics::import diagnostics::warn_trap diagnostics::splainthis diagnostics::transmo diagnostics::shorten diagnostics::autodescribe main::test_code diagnostics::warn_trap diagnostics::splainthis diagnostics::transmo diagnostics::shorten diagnostics::autodescribe diagnostics::warn_trap diagnostics::splainthis diagnostics::transmo diagnostics::shorten diagnostics::autodescribe So we see that two executions of If we comment out the Total Elapsed Time = 0.079974 Seconds User+System Time = 0.059974 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 0.00 0.000 -0.000 1 0.0000 - main::test_code It is possible to profile code running under mod_perl with the % setenv PERL5OPT -d:DProf % httpd -X -d `pwd` & ... make some requests to the server here ... % kill `cat logs/httpd.pid` % unsetenv PERL5OPT % dprofpp The Another approach is to use PerlModule Apache::DProf Remember that any PerlHandler that was pulled in before % dprofpp (Lookup the Measuring the Memory of the ProcessA very important aspect of performance tuning is to make sure that your applications don't use much memory; since if they do, you cannot run many servers and therefore in most cases under a heavy load the overall performance degrades. In addition the code may not be clean and leak memory, which is even worse, since if the same process serves many requests and after each request more memory is used; after a while all RAM will be used and the machine will start swapping (use the swap partition), which is a very undesirable event since it may lead to a machine crash. The simplest way to figure out how big the processes are and see whether they grow is to watch the output of For example the output of top(1): 8:51am up 66 days, 1:44, 1 user, load average: 1.09, 2.27, 2.61 95 processes: 92 sleeping, 3 running, 0 zombie, 0 stopped CPU states: 54.0% user, 9.4% system, 1.7% nice, 34.7% idle Mem: 387664K av, 309692K used, 77972K free, 111092K shrd, 70944K buff Swap: 128484K av, 11176K used, 117308K free 170824K cached PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND 29225 nobody 0 0 9760 9760 7132 S 0 12.5 2.5 0:00 httpd_perl 29220 nobody 0 0 9540 9540 7136 S 0 9.0 2.4 0:00 httpd_perl 29215 nobody 1 0 9672 9672 6884 S 0 4.6 2.4 0:01 httpd_perl 29255 root 7 0 1036 1036 824 R 0 3.2 0.2 0:01 top 376 squid 0 0 15920 14M 556 S 0 1.1 3.8 209:12 squid 29227 mysql 5 5 1892 1892 956 S N 0 1.1 0.4 0:00 mysqld 29223 mysql 5 5 1892 1892 956 S N 0 0.9 0.4 0:00 mysqld 29234 mysql 5 5 1892 1892 956 S N 0 0.9 0.4 0:00 mysqld Which starts with overall information of the system and then displays the most active processes at the given moment. So for example if we look at the But of course we want to see all the apache/mod_perl processes, and that's where % ps -o pid,user,rss,vsize,%cpu,%mem,ucomm -C httpd_perl PID USER RSS VSZ %CPU %MEM COMMAND 29213 root 8584 10264 0.0 2.2 httpd_perl 29215 nobody 9740 11316 1.0 2.5 httpd_perl 29216 nobody 9668 11252 0.7 2.4 httpd_perl 29217 nobody 9824 11408 0.6 2.5 httpd_perl 29218 nobody 9712 11292 0.6 2.5 httpd_perl 29219 nobody 8860 10528 0.0 2.2 httpd_perl 29220 nobody 9616 11200 0.5 2.4 httpd_perl 29221 nobody 8860 10528 0.0 2.2 httpd_perl 29222 nobody 8860 10528 0.0 2.2 httpd_perl 29224 nobody 8860 10528 0.0 2.2 httpd_perl 29225 nobody 9760 11340 0.7 2.5 httpd_perl 29235 nobody 9524 11104 0.4 2.4 httpd_perl Now you can see the resident ( You probably agree that using Note:
use GTop; my $gtop = GTop->new; my $proc_mem = $gtop->proc_mem($$); for (qw(size vsize share rss)) { printf " %s => %d\n", $_, $proc_mem->$_(); } When executed we see the following output (in bytes): size => 1900544 vsize => 3108864 share => 1392640 rss => 1900544 So if we are interested in to print the process resident memory segment before and after some event we just do it: For example if we want to see how much extra memory was allocated after a variable creation we can write the following code: use GTop; my $gtop = GTop->new; my $before = $gtop->proc_mem($$)->rss; my $x = 'a' x 10000; my $after = $gtop->proc_mem($$)->rss; print "diff: ",$after-$before, " bytes\n"; and the output diff: 20480 bytes So we can see that Perl has allocated extra 20480 bytes to create The If you are running a true BSD system, you may use print "used memory = ".(BSD::Resource::getrusage)[2]."\n" For more information refer to the Measuring the Memory Usage of SubroutinesWith help of
Now you can start to optimize your code. Or test which of the several implementations is of the least size. For example, let's compare As you will see below the first OO script uses about 2k bytes while the second script (procedural interface) uses about 5k. Here are the code examples and the numbers:
After executing each script in single server mode (-X) the results are:
Note, that the above is correct if you didn't precompile all use CGI qw(-compile :all); in the server startup script. Having: use CGI qw(header); or use CGI qw(:all); is essentially the same. You will have all the symbols precompiled at startup imported even if you ask for only one symbol. It seems to me like a bug, but probably that's how BTW, you can check the number of opcodes in the code by a simple command line run. For example, comparing 'my %hash' vs. 'my %hash = ()'. % perl -MO=Terse -e 'my %hash' | wc -l -e syntax OK 4 % perl -MO=Terse -e 'my %hash = ()' | wc -l -e syntax OK 10 The first one has less opcodes. Note that you shouldn't use References
Related Stories: |
|
No talkbacks posted. |
About Triggers | Media Kit | Security | Triggers | Login |
All times are recorded in UTC. Linux is a trademark of Linus Torvalds. Powered by Linux 2.4, Apache 1.3, and PHP 4 Legal Notices, Licensing, Reprints, & Permissions, Privacy Policy. |