Introduction

If your OS supports sharing of memory (and most sane systems do), you might save a lot of RAM by sharing it between child processes. This will allow you to run more processes and hopefully better satisfy the client, without investing extra money in more memory. This is only possible when you preload code at server startup.

However, during a child process's life its memory pages tend to become unshared. There is no way we can make Perl allocate memory so that (dynamic) variables land on different memory pages from constants, so the copy-on-write effect will hit you almost at random.

If you are preloading many modules you might be able to trade off the memory that stays shared against the time for an occasional fork by tuning MaxRequestsPerChild. The ideal is a point where your processes usually restart before too much memory becomes unshared. You should take some measurements to see if it makes a real difference, and to find the range of reasonable values. If you have success with this tuning the value of [...]

It is very important to understand that your goal is not to have [...]

Do not forget that if you preload most of your code at server startup, a newly forked child gets ready very quickly, because it inherits most of the preloaded code and the Perl interpreter from the parent process. During the life of the child its memory pages (which aren't really its own to start with, since it uses the parent's pages) gradually get `dirty': variables which were originally inherited and shared are updated or modified, and the copy-on-write happens. This reduces the number of shared memory pages, thus increasing the memory requirement. Killing the child and spawning a new one allows the new child to get back to the pristine shared memory of the parent process.

The recommendation is that [...]

How Shared Is My Memory?

You've probably noticed that the word shared is repeated many times in relation to mod_perl.
Indeed, shared memory might save you a lot of money, since with sharing in place you can run many more servers than without it.

How much shared memory do you have? You can see it either with the memory utility that comes with your system or with the GTop module:

    use GTop ();
    print "Shared memory of the current process: ",
        GTop->new->proc_mem($$)->share, "\n";
    print "Total shared memory: ",
        GTop->new->mem->share, "\n";

When you watch the output of the [...]

Calculating Real Memory Usage

I have shown how to measure the size of a process's shared memory, but we still want to know what the real memory usage is. Obviously this cannot be calculated simply by adding up the memory size of each process, because that wouldn't account for the shared memory.

On the other hand, we cannot just subtract the shared memory size from the total size to get the real memory usage, because in reality each process has a different history of processed requests, and therefore the shared memory is not the same for all processes.

So how do we measure the real memory size used by the server we run? It's probably too difficult to give the exact number, but I've found a way to get a fair approximation, which I verified in the following way. I calculated the real memory used, by the technique you will see in a moment, then stopped the Apache server and saw that the memory usage report indicated that the total used memory went down by almost the same number I had calculated. Note that some OSs do smart caching of memory pages, so you may not see the memory usage decrease as soon as it actually happens when you quit the application.

This is the technique I've used:
Please note that this might be incorrect for your system, so you use this number at your own risk.

I've used this technique to display real memory usage in the module [...]

    use GTop ();
    my $gtop = GTop->new;
    my $total_real = 0;
    my $max_shared = 0;
    # @mod_perl_pids is initialized by Apache::Scoreboard, irrelevant here
    my @mod_perl_pids = some_code();
    for my $pid (@mod_perl_pids) {
        my $proc_mem = $gtop->proc_mem($pid);
        my $size     = $proc_mem->size;
        my $share    = $proc_mem->share;
        # memory private to this process
        $total_real += $size - $share;
        # remember the largest shared size seen so far
        $max_shared  = $share if $max_shared < $share;
    }
    # count the shared memory once, using the largest value
    $total_real += $max_shared;

So as you see, we accumulate the difference between the reported and the shared memory:

    $total_real += $size - $share;

and at the end add the biggest shared size found among the processes:

    $total_real += $max_shared;

So now $total_real holds an approximation of the real memory usage.

Are My Variables Shared?

How do you find out if the code you write is shared between the processes or not? The code should be shared, except where it is on a memory page with variables that change. Some variables are read-only in usage and never change. For example, you may have some variables that use a lot of memory and that you want to stay read-only. As you know, a variable becomes unshared when the process modifies its value.

So imagine that you have a 10MB in-memory database that resides in a single variable; you perform various operations on it and want to make sure that the variable is still shared. For example, if you do some regular expression (regex) matching on this variable and want to use the pos() function [...]

The [...]

    MyShared.pm
    ---------
    package MyShared;
    use Apache::Peek;

    my $readonly = "Chris";

    sub match     { $readonly =~ /\w/g; }
    sub print_pos { print "pos: ", pos($readonly), "\n"; }
    sub dump      { Dump($readonly); }
    1;

This module declares the package MyShared [...]

The module also defines three subroutines: match(), print_pos() and dump() [...]

Here is the script that prints the process ID (PID) and calls all three functions.
The goal is to check whether [...]

    share_test.pl
    -------------
    use MyShared;
    print "Content-type: text/plain\r\n\r\n";
    print "PID: $$\n";
    MyShared::match();
    MyShared::print_pos();
    MyShared::dump();

Before you restart the server, in httpd.conf set:

    MaxClients 2

for easier tracking. You need at least two servers to compare the printouts of the test program. Having more than two can make the comparison harder.

Now open two browser windows and issue the request for this script several times in both windows, so that you get different PIDs reported in the two windows and each process has processed a different number of requests to the share_test.pl script.

In the first window you will see something like:

    PID: 27040
    pos: 1
    SV = PVMG(0x853db20) at 0x8250e8c
      REFCNT = 3
      FLAGS = (PADBUSY,PADMY,SMG,POK,pPOK)
      IV = 0
      NV = 0
      PV = 0x8271af0 "Chris"\0
      CUR = 5
      LEN = 6
      MAGIC = 0x853dd80
        MG_VIRTUAL = &vtbl_mglob
        MG_TYPE = 'g'
        MG_LEN = 1

And in the second window:

    PID: 27041
    pos: 2
    SV = PVMG(0x853db20) at 0x8250e8c
      REFCNT = 3
      FLAGS = (PADBUSY,PADMY,SMG,POK,pPOK)
      IV = 0
      NV = 0
      PV = 0x8271af0 "Chris"\0
      CUR = 5
      LEN = 6
      MAGIC = 0x853dd80
        MG_VIRTUAL = &vtbl_mglob
        MG_TYPE = 'g'
        MG_LEN = 2

We see that all the addresses of the supposedly big structure are the same (0x8250e8c and 0x8271af0) [...]

So given that the [...]

Now if you need to compare more than one variable, doing it by hand can be quite time consuming and error-prone.
Therefore it's better to change the testing script to dump the Perl data types into files (e.g. /tmp/dump.$$, where $$ is the PID of the process) [...]

So correcting the [...]

So this is the resulting code:

    MyShared2.pm
    ---------
    package MyShared2;
    use Devel::Peek;

    my $readonly = "Chris";

    sub match     { $readonly =~ /\w/g; }
    sub print_pos { print "pos: ", pos($readonly), "\n"; }
    sub dump {
        my $dump_file = "/tmp/dump.$$";
        print "Dumping the data into $dump_file\n";
        # save the original STDERR, then point STDERR at the dump file
        open OLDERR, ">&STDERR" or die "Can't dup STDERR: $!";
        open STDERR, ">" . $dump_file
            or die "Can't open $dump_file: $!";
        Dump($readonly);
        close STDERR;
        # restore the original STDERR
        open STDERR, ">&OLDERR" or die "Can't restore STDERR: $!";
    }
    1;

Now if I modify the script to use the modified module:

    share_test2.pl
    -------------
    use MyShared2;
    print "Content-type: text/plain\r\n\r\n";
    print "PID: $$\n";
    MyShared2::match();
    MyShared2::print_pos();
    MyShared2::dump();

and run it as before (with MaxClients 2), two dump files will be created in the directory /tmp. In our test these were created as /tmp/dump.1224 and /tmp/dump.1225. When I run diff(1):

    % diff /tmp/dump.1224 /tmp/dump.1225
    12c12
    <     MG_LEN = 1
    ---
    >     MG_LEN = 2

We see that the two padlists (of the variable $readonly) [...]

In fact, if we think about these results again, we come to the conclusion that there is no need for two processes to find out whether the variable gets modified (and therefore unshared). It's enough to check the data structure before the script was executed and after that. You can modify the [...]

If you want to watch whether some lexically scoped (with my) [...]

Surely another way of ensuring that a scalar is read-only and therefore sharable is to either use the constant pragma [...]

    MyConstant.pm
    -------------
    package MyConstant;
    use constant readonly => "Chris";

    sub match     { readonly =~ /\w/g; }
    sub print_pos { print "pos: ", pos(readonly), "\n"; }
    1;

    % perl -c MyConstant.pm
    Can't modify constant item in match position at MyConstant.pm line 5, near "readonly)"
    MyConstant.pm had compilation errors.
However this code is just right:

    MyConstant1.pm
    -------------
    package MyConstant1;
    use constant readonly => "Chris";

    sub match { readonly =~ /\w/g; }
    1;

Preloading Perl Modules at Server Startup

You can use the [...]

    PerlModule CGI
    PerlModule DBI

But an even better approach is to create a separate startup file (where you code in plain Perl) and put there things like:

    use DBI ();
    use Carp ();

Don't forget to prevent importing of the symbols exported by default by the module you are going to preload, by placing empty parentheses () [...]

Then you [...]

    PerlRequire /path/to/start-up.pl
    use CGI ();
    CGI->compile(':all');

The arguments to [...]

Let's conduct a memory usage test to show that preloading reduces memory requirements.

In order to have an easy measurement I will use only one child process; therefore I will use this setting:

    MinSpareServers 1
    MaxSpareServers 1
    StartServers 1
    MaxClients 1
    MaxRequestsPerChild 100

I'm going to use the [...]

    memuse.pl
    ---------
    use strict;
    use CGI ();
    use DB_File ();
    use LWP::UserAgent ();
    use Storable ();
    use DBI ();
    use GTop ();

    my $r = shift;
    $r->send_http_header('text/plain');
    my $proc_mem = GTop->new->proc_mem($$);
    my $size  = $proc_mem->size;
    my $share = $proc_mem->share;
    my $diff  = $size - $share;
    printf "%10s %10s %10s\n", qw(Size Shared Difference);
    printf "%10d %10d %10d (bytes)\n", $size, $share, $diff;

First I restart the server and execute this CGI script when none of the above modules are preloaded. Here is the result:

       Size     Shared       Diff
    4706304    2134016    2572288 (bytes)

Now I take all the modules:

    use strict;
    use CGI ();
    use DB_File ();
    use LWP::UserAgent ();
    use Storable ();
    use DBI ();
    use GTop ();

and copy them into the startup script, so they will get preloaded. The script remains unchanged. I restart the server and execute it again. I get the following:

       Size     Shared       Diff
    4710400    3997696     712704 (bytes)

Let's put the two results into one table:

    Preloading       Size     Shared       Diff
       Yes        4710400    3997696     712704 (bytes)
       No         4706304    2134016    2572288 (bytes)
    --------------------------------------------------
    Difference       4096    1863680   -1859584

You can clearly see that when the modules weren't preloaded, the shared memory was about 1864 kB (1863680 bytes) smaller than when the modules were preloaded.

Assuming that you have 256MB dedicated to the web server, if you didn't preload the modules, you could have:

    268435456 = X * 2572288 + 2134016

    X = (268435456 - 2134016) / 2572288 = 103

103 servers.
Now let's calculate the same thing with the modules preloaded:

    268435456 = X * 712704 + 3997696

    X = (268435456 - 3997696) / 712704 = 371

You can have about 3.6 times as many servers!

Remember that I mentioned before that memory pages get dirty and the size of the shared memory gets smaller with time? I have presented the ideal case, where the shared memory stays intact. Therefore the real numbers will be somewhat different, but not far from the numbers in our example.

Also, in your case the process size may well be bigger and the shared memory smaller, since you will use different modules and different code. So you may not get this fantastic ratio, but this example certainly helps to get a feel for the difference.
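
The two calculations above follow the same pattern, so they can be wrapped in a small helper. The following is only a sketch: the name max_servers() is hypothetical and not part of the original article, and it assumes the ideal case described above, where the shared memory is counted once and each child adds only its unshared (size minus shared) memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original article): estimate how many
# child processes fit into $ram bytes, assuming the shared memory is
# counted once and each child adds ($size - $shared) bytes of its own.
sub max_servers {
    my ($ram, $size, $shared) = @_;
    my $unshared = $size - $shared;
    die "process size must be larger than its shared size\n"
        unless $unshared > 0;
    return int( ($ram - $shared) / $unshared );
}

my $ram = 256 * 1024 * 1024;    # 268435456 bytes, as in the text

# Figures taken from the two memuse.pl measurements above
printf "without preloading: %d servers\n",
    max_servers($ram, 4706304, 2134016);    # prints 103
printf "with preloading:    %d servers\n",
    max_servers($ram, 4710400, 3997696);    # prints 371
```

Feeding it your own Size and Shared numbers gives a quick upper bound; since pages get dirty over time, the real figure will be lower.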