Your Daily Source for Apache News and Information  
The Perl You Need to Know: Personalization Methods Part 2
(Oct 27th, 12:00:00)

Originally appearing in the Web Developer's Virtual Library.

By

They say there are many ways to skin a cat, and although I've never tried any of them (despite provocation, until she bats her cute little eyes), the same can be said for architecting a personalization back-end. The "best" architecture for one system may be different than the ideal for another, where a system is the whole combination of machine, network connectivity, operating system, and web server software. To mix metaphors, we might think of the mixture of hardware and software as a recipe of sorts. And like a typical recipe, you might modify some of the ingredients we've used and achieve similar (or better) results. That said, the specific technologies used in this personalization series involve a recipe including:

You can certainly prepare similar dishes with the fancy china and truffles and caviar (in other words, expensive), but one of the nice things about this hearty recipe is that the only hard cost in dollars is the machine itself. Everything other than the server hardware is free software, which makes us warm, fuzzy and frugal, not to mention functional and fast. Unlike a typical recipe, though, this is not the part where we tell you how to mix, stir, and blend it all together. Nope, we're concerned with programming here, so we'll have to assume that you've already rolled, pinched, and "BAM!"'d your way into a working system (the links on each ingredient above lead to sites with copious support information).

We're using both a database and cookies to implement the personalization system. Why both? This personalization needs both a short-term and long-term "memory". For example, when people are younger, they tend only to remember what happened 5 minutes ago, but not 5 years ago. Yet as people grow older, they tend to remember what happened 5 years ago, but not 5 minutes ago. Our system cannot have either flaw -- it needs to remember both 5 minutes (heck, 5 seconds) ago, as well as 5 days, months, or years ago.

The database will be our long-term memory. It remembers the visitor's account information permanently, no matter how much time has elapsed between accesses to the account. For short-term memory, however, we run into the problem of HTTP's inherent statelessness. Granted, this is not a problem that makes the nightly news ("tonight with Peter Jennings, the inherent statelessness of HTTP and how it can harm your children"). The problem, in a nutshell, is that the web has no short-term memory whatsoever: each request arrives on its own, with no built-in link to the requests that came before it. As a result, a web application such as our personalization system can't rely on the web server to remember information specific to a user as they move from one web page to another within the site.

We could query the database -- our long-term memory -- every time the user navigates to another page within our site, but this would be inefficient and put a lot of strain on the database. Instead, we need a way to "preserve state" (sort of like marmalade), as they say, during the browsing session, without relying on the database. There are numerous ways to approach this matter, and one popular solution for Apache+mod_perl-based servers is the module Apache::Session. Using this module you can effectively create a global hash, tied to a long-term storage method (filesystem, database), where you can store and access user information from any Perl script running in the mod_perl environment. Apache::Session is well supported and worth investigating, but wasn't appropriate for the site which inspired our particular recipe.

Instead, we've decided to use client-side cookies for short-term memory. A client-side cookie is a small chunk of data that is stored on the visitor's machine. We can store virtually any arbitrary data in a cookie, and the cookie will be associated only with our particular web site. As the visitor moves through our site, the cookie is delivered back to our server, which allows us to read, and thus "remember", the chunk of data during the browsing session.
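Concretely, the browser sends every cookie it holds for our site back in a single Cookie request header, which a CGI script sees in the HTTP_COOKIE environment variable. CGI.pm parses this for us, but a minimal sketch of that parsing (using a hypothetical parse_cookie_header() helper, and leaving any %XX escaping in place) shows how little magic is involved:

```perl
use strict;
use warnings;

# Split a raw Cookie request header (what a CGI script finds in
# $ENV{HTTP_COOKIE}) into a name => value hash. Values are returned
# exactly as sent; unescaping %XX sequences is left to the caller.
sub parse_cookie_header {
    my ($header) = @_;
    my %cookies;
    return %cookies unless defined $header;
    for my $pair (split /;\s*/, $header) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $cookies{$name} = defined $value ? $value : '';
    }
    return %cookies;
}

# Example: both of our cookies come back together on every request.
my %jar = parse_cookie_header('site-auth=42%09Zog%09abc; site-prefs=matchtype&simple');
print "$_ => $jar{$_}\n" for sort keys %jar;
```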

Cookies can expire when the user closes the browser, or they can persist on disk for longer periods. Our aim, in using cookies for short-term memory, is to use them only during the browser session. We'll be using "session cookies", then, which are simply cookies that are erased once the browser is shut down (or crashes, whichever comes first).
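On the wire, the difference between a session cookie and a persistent one is a single attribute in the Set-Cookie response header. This sketch, built around a hypothetical set_cookie_line() helper rather than CGI.pm itself, shows both forms side by side:

```perl
use strict;
use warnings;

# Build a Set-Cookie response header line. Without an expires attribute
# the browser keeps the cookie only until it exits (a session cookie);
# with one, it stores the cookie on disk until that date.
sub set_cookie_line {
    my (%args) = @_;
    my $line = "Set-Cookie: $args{name}=$args{value}; path=$args{path}";
    $line .= "; expires=$args{expires}" if defined $args{expires};
    return $line;
}

# Session cookie: forgotten when the browser closes.
print set_cookie_line(name => 'site-auth', value => 'token', path => '/'), "\n";
# Persistent cookie: kept until the given date.
print set_cookie_line(name => 'site-auth', value => 'token', path => '/',
                      expires => 'Tue, 30-Jun-2037 00:00:00 GMT'), "\n";
```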

Long-term memory, courtesy of our personalization database, is really the backbone of this architecture. We need to design a relational database that is well suited to the type of information we want to store in each user's "account". We use the word "account" to mean a loose box of stuff that is assigned to a particular user. The "stuff" may reside in one, two, or more database tables, depending on the nature and needs of the system.

We'll divide the pieces of a user's account into semantically distinct tables, segregating the data required for login from data about the user, data representing user preferences, and so on. Let's imagine a small-scale personalization system containing two database tables. Remember that we're using MySQL syntax in these examples; column definition types may vary on other database systems.

user_info

CREATE TABLE user_info (
userid mediumint(8) unsigned NOT NULL auto_increment,
login blob NOT NULL,
pwd blob NOT NULL,
name blob NOT NULL,
created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (userid)
);

The user_info table will contain the vital account information for the user. Each user is assigned an integer-based unique ID, configured to simply auto increment each time a user record is inserted into the table. The maximum unsigned integer for this type is 16777215, so this table is limited to supporting some 16.7 million potential users.
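MEDIUMINT occupies 3 bytes in MySQL (TINYINT, used later for results_per_page, occupies 1), so the ceiling follows from simple arithmetic. A quick check, using a hypothetical max_unsigned() helper:

```perl
use strict;
use warnings;

# Largest value an unsigned integer of the given byte width can hold.
sub max_unsigned {
    my ($bytes) = @_;
    return 2**(8 * $bytes) - 1;
}

printf "unsigned MEDIUMINT (3 bytes): %d\n", max_unsigned(3);   # 16777215
printf "unsigned TINYINT   (1 byte):  %d\n", max_unsigned(1);   # 255
```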

The login name (login), password (pwd), and display name (name) fields are all binary large objects, or blobs. Our system will recommend that users select their e-mail address as their login name, while their on-screen display name will be stored in the name column. All three columns are blobs because the encryption we apply to this information yields binary data. More on this in a moment.

Finally, a created column will contain a timestamp when the user record is created, in case we ever want to know when a user joined the system. The userid is indexed as a primary key, and will be a unique identifier with which we relate to records in other tables within the personalization database.

Encryption of sensitive table data is especially important, in case a hacker gains access to the database. With account information encrypted, a stolen database table may not do much good to the spy. That said, encryption is a sprawling topic, and there are many levels of encryption of varying degrees of sophistication. Like an automobile, almost any type of security can be compromised by a determined vandal. At the least, a reasonable level of encryption will deter the "joy rider" who will move on to easier targets.

MySQL in particular offers several encryption functions worth looking into. In this case, we're using MySQL's encode() and decode() functions, which encrypt and decrypt data using a supplied password. (These two functions were deprecated in MySQL 5.7 and removed in 8.0, where AES_ENCRYPT() and AES_DECRYPT() fill the same role.) For example, consider the user's password. One possible algorithm is to encrypt it with encode(), supplying an encryption password calculated from the user's password plus a known value:

encode('mypassword','drowssapym#32-{sAP7!=}')

With this scheme, we can decrypt the user's password only with a key built partly from the user's own input (the attempted password, reversed) and partly from a known value (the string "#32-{sAP7!=}"). If you use this approach, devise a different sequence for each column that you want to encrypt: in this case, the login, pwd, and name columns.
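A tiny sketch of the key-derivation rule just described, assuming a hypothetical column_key() helper; the reversed-input rule and the "#32-{sAP7!=}" suffix are the article's illustrative choices, not a recommendation:

```perl
use strict;
use warnings;

# Build an encode()/decode() key from the user's own input (reversed)
# followed by a secret suffix that never leaves the server.
sub column_key {
    my ($user_input, $secret_suffix) = @_;
    return scalar(reverse $user_input) . $secret_suffix;
}

print column_key('mypassword', '#32-{sAP7!=}'), "\n";   # drowssapym#32-{sAP7!=}
```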

So, to wrap up user_info, let's consider a scenario. A visitor attempts to log in to our site, submitting their e-mail address as the login and the password "farout". We've written Perl code that attempts to pull their user record from the database, which might look like:

#!/usr/bin/perl
#Attempt user login via user_info table in database

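use strict;
use warnings;
use CGI;
use DBI;

my $cgiobj=CGI->new;
my $login=$cgiobj->param('login');
my $password=$cgiobj->param('password');
my ($userid,$userlogin,$username)=user_login($login,$password);
if ($userid) {
 #...login successful...
} else {
 #...login failed...
}

sub user_login {
 my ($login,$password)=@_;
 #the attempted password, reversed, forms part of the pwd decryption key
 my $passwordR=scalar reverse $password;
 #assume this subroutine connects to the database
 #and returns a database handle
 my $dbh=connect_to_DB();
 #placeholders (?) keep user input out of the SQL text itself;
 #login and name are decoded so that plaintext comes back
 #('clever_name_decryption_password' is a hypothetical key)
 my $sth=$dbh->prepare(q/select userid,
   decode(login,'clever_login_decryption_password'),
   decode(name,'clever_name_decryption_password')
   from user_info
   where decode(login,'clever_login_decryption_password')=?
   and decode(pwd,?)=?/);
 $sth->execute($login,$passwordR.'#32-{sAP7!=}',$password)
  || die "Failed to access DB in search of account ".$dbh->errstr;
 return $sth->fetchrow_array;
}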

Accepting the CGI input parameters login and password, the above code runs an SQL query that attempts to pull this user's record from user_info. The key is the query, which compares the results of decode() on the stored column values with the values submitted by the user. If a matching record is found, a list of column values is returned; otherwise an empty list is returned. We test this by checking whether $userid holds a value, and from there we know if the login was successful.

If the login was successful, we have also acquired some important account information for this user into $userid, $userlogin, and $username.

user_prefs

Account information in hand, we can proceed. Proceed with what? The goal in this example is to harvest the data that we want to store in our short-term memory, the cookie. We may not need all information from the database in our cookie, which is why, for instance, we only requested three fields in our earlier query. But we're not yet done collecting data from the database. Now that we've verified the user account, we want to grab some of the data from our second table user_prefs. Here we store some of the preferences that affect how the site appears or behaves for this user. The possibilities are nearly endless, but let's imagine a user_prefs table with some realistic preferences:

CREATE TABLE user_prefs (
userid mediumint(8) unsigned DEFAULT '0' NOT NULL,
matchtype enum('simple','advanced') DEFAULT 'simple' NOT NULL,
results_per_page tinyint(3) unsigned DEFAULT '10' NOT NULL,
match_color varchar(6) DEFAULT 'FFFFCC' NOT NULL,
PRIMARY KEY (userid)
);

Perhaps our web site is, or contains, a search service of some sort; the preferences above would fit such a service. The first column, userid, keys these records to the account records in user_info. The column matchtype is an ENUM, meaning it can contain exactly one of a fixed list of string values: in this case, our two fictional types of search. The preferred number of results to display on one page is contained in results_per_page, naturally, and match_color holds a hexadecimal color code for highlighting result matches. Again, these are hypothetical preferences, and you can easily imagine an extensive set (see Raging Search's customization system for an example).
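Because each column type constrains what a preference may hold, code that writes user_prefs (or trusts values read back from a cookie the user could edit) might check them first. A minimal sketch, assuming a hypothetical clean_prefs() helper that falls back to the table's DEFAULT values whenever a value would not fit its column:

```perl
use strict;
use warnings;

# Defaults mirror the DEFAULT clauses of the user_prefs table.
my %PREF_DEFAULTS = (
    matchtype        => 'simple',
    results_per_page => 10,
    match_color      => 'FFFFCC',
);

# Return a cleaned copy of a preferences hash: any value that would not
# fit its column (ENUM member, unsigned TINYINT, six hex digits) is
# replaced by that column's default.
sub clean_prefs {
    my (%in) = @_;
    my %out = %PREF_DEFAULTS;
    $out{matchtype} = $in{matchtype}
        if defined $in{matchtype}
        && $in{matchtype} =~ /\A(?:simple|advanced)\z/;
    $out{results_per_page} = $in{results_per_page}
        if defined $in{results_per_page}
        && $in{results_per_page} =~ /\A[0-9]{1,3}\z/
        && $in{results_per_page} <= 255;
    $out{match_color} = uc $in{match_color}
        if defined $in{match_color}
        && $in{match_color} =~ /\A[0-9A-Fa-f]{6}\z/;
    return %out;
}
```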

Now that we've looked at the database behind the personalization, our long-term memory as it were, it's time to see how we interact with client-side cookies to implement a short-term memory.

Baking with Julia

Since we successfully "logged in" the user, by virtue of pulling their user_info record, we can proceed with creating session cookies. We know that we'll succeed in pulling their preferences record, because we've established that all user accounts have a user_prefs record, created by a script invoked when building a new user's account (something we'll see in a future installment).

A cookie, as we explained last month, is a chunk of arbitrary data. This data is tagged such that it can only be "seen" (delivered to, really) by the host or domain that issued the cookie. Cookies can be set to expire at a given date and time; in the absence of an expiration date, they expire when the browser is closed. Cookies are not unlimited ("there's no free lunch", "it's not a buffet", take your pick), and browsers may limit the size and number of cookies that a server can issue. To be safe, you probably don't want to issue more than a handful of cookies, and you should keep each one under 4K in length. Cookies aren't meant for storing large quantities of data; if you have such a need, store the data in the back-end database and put a record number or other identifier in the cookie.
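A cheap guard against the size limit is to measure a cookie before baking it. The sketch below assumes the 4096-byte per-cookie minimum that RFC 6265 asks browsers to support (counting name, value, and attributes together) and a hypothetical cookie_fits() helper; it measures only name=value, so leave some headroom for attributes such as path and expires:

```perl
use strict;
use warnings;

# Minimum per-cookie storage that RFC 6265 asks browsers to support.
use constant MAX_COOKIE_BYTES => 4096;

# True if "name=value" fits within the limit. Assumes an ASCII
# (already escaped) value, so length() in characters equals bytes.
sub cookie_fits {
    my ($name, $value) = @_;
    return length("$name=$value") <= MAX_COOKIE_BYTES;
}

# A preferences cookie fits easily; a 5000-character blob does not.
print cookie_fits('site-prefs', 'matchtype&simple') ? "fits\n" : "too big\n";
print cookie_fits('site-prefs', 'x' x 5000)          ? "fits\n" : "too big\n";
```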

We're going to issue two cookies. The first, called 'site-auth', will act as an "authentication token"; the second, 'site-prefs', will hold the user's preferences. By "authentication token" we mean that this cookie and the data it contains are the key with which this user can access account-specific pages or services on our site. The presence of this cookie is what we mean when we say that a user is "logged in" to our web site. The absence, expiration, or deletion of this cookie will immediately render the user "logged out". It's important that the user not be able to modify the authentication token in order to masquerade as a different registered user. Just how important this is depends on the nature of your site and the level of encryption you wish to invoke.

In our case, we'll use a checksum to preserve the integrity of the 'site-auth' token. Continuing from the script we saw earlier, let's add a subroutine to bake the authentication token cookie.

use Digest::HMAC_MD5;
sub bake_auth_cookie {
 my ($userid,$username)=@_;
 my $cgiobj=CGI->new;
 #userid, username, and an HMAC-MD5 checksum keyed with a secret password
 my $hmac = Digest::HMAC_MD5->new("digest#1pass");
 #join with a tab before hashing: plain concatenation would give
 #userid 12 + "Bob" and userid 1 + "2Bob" the same checksum
 $hmac->add(join("\t",$userid,$username));
 my $cookie=
  $cgiobj->cookie(
            -name=>'site-auth',
            -value=>join("\t",$userid,$username,$hmac->b64digest),
            #-expires=>'+6M',# not used for session-only cookie
            -path=>'/'
           );

 return $cookie;
}

When called, we'll pass the $userid and $username parameters to &bake_auth_cookie. This subroutine uses a Perl module called Digest::HMAC_MD5 (part of the Digest-HMAC distribution on CPAN), which you may need to install if Perl complains that it can't find it. The module computes an HMAC: a checksum of some data, keyed with a secret password and built on the MD5 hash.

Specifically, we invoke Digest::HMAC_MD5 to create a digest object keyed to a specific password (in this fictional example, "digest#1pass"; you should choose some other secure password). Once this digest object is created, we add to it a string built from the $userid and $username values. The result is a checksum unique to this user's id and name, further keyed to the password we provided. Later, when we read back the cookie, we can recompute the checksum from the id and name in the cookie and compare it with the checksum stored there. If they differ, someone may have tampered with one of those values in an attempt to masquerade as a different user, and we can deny access to the page they attempted to view.
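The read-back check just described might look like the following sketch. To stay self-contained it computes HMAC-MD5 (RFC 2104) with the core Digest::MD5 module instead of Digest::HMAC_MD5, stripping the Base64 padding the way b64digest() does; the helper names are hypothetical. It also assumes the token was baked by hashing the userid and username joined with a tab, since plain concatenation (as in $userid.$username) gives userid 12 with "Bob" and userid 1 with "2Bob" the same checksum:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);
use MIME::Base64 qw(encode_base64);

# HMAC-MD5 as defined in RFC 2104, built on the core Digest::MD5 module.
# Output is Base64 with the trailing '=' padding removed, like b64digest().
sub hmac_md5_b64 {
    my ($key, $data) = @_;
    my $block = 64;                                 # MD5 block size in bytes
    $key = md5($key) if length($key) > $block;      # long keys are hashed first
    $key .= "\0" x ($block - length $key);          # then zero-padded
    my $inner = md5(($key ^ ("\x36" x $block)) . $data);
    my $mac   = md5(($key ^ ("\x5c" x $block)) . $inner);
    (my $b64 = encode_base64($mac, '')) =~ s/=+\z//;
    return $b64;
}

# Check a site-auth value of the form "userid<TAB>username<TAB>checksum".
# Returns (userid, username) when the checksum matches, else an empty list.
sub verify_auth_token {
    my ($value, $secret) = @_;
    my ($userid, $username, $mac) = split /\t/, $value, 3;
    return () unless defined $mac;
    my $expected = hmac_md5_b64($secret, join("\t", $userid, $username));
    return () unless $mac eq $expected;
    return ($userid, $username);
}
```

The plain eq comparison is fine for a sketch; a hardened version might compare in constant time so that response timing reveals nothing about the expected checksum.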

We can now create a cookie object using the data collected: the user's id, name, and checksum. These values are strung together, delimited by tab characters, in the -value parameter of the CGI module's cookie() call. Note that b64digest() outputs the checksum as Base64 text rather than raw binary. The -expires parameter has been commented out, resulting in a session-only cookie.

If you wanted the user's authentication token to hang around after the browser is closed, so that they can re-enter the web site next time without manually logging in, simply set an expiration value. We've included an example of "6 months hence" in the commented out code above. This is a technique typically used on sites which offer you the option "remember my login next time I visit", for example.
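CGI.pm turns a relative time such as '+6M' into an absolute date in the cookie's expires attribute, counting each "M" as a 30-day month, so '+6M' means 180 days. A sketch of that conversion, assuming a hypothetical cookie_date() helper that formats dates in the Netscape cookie style without depending on the server's locale:

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as "Wdy, DD-Mon-YYYY HH:MM:SS GMT". Day and month
# names come from the arrays above, not strftime, so locale can't change them.
sub cookie_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d-%s-%04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec);
}

# '+6M' at 30 days per month.
my $six_months = 6 * 30 * 24 * 60 * 60;
print cookie_date(time + $six_months), "\n";
```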

While we're talking about this cookie, writing another subroutine to expire it (that is, to log the user out) is simple. Create a cookie object with the same name (site-auth) and the same -path ('/'), an empty -value, and an -expires value in the past, such as '-1d'. When the cookie is output (which we haven't done yet), the browser will discard it immediately, removing the authentication token from the user's machine.
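Underneath, a logout cookie is just a Set-Cookie header whose name and path match the cookie being removed. The sketch below, with a hypothetical logout_cookie_line() helper, sends an empty value, an expires date in the past, and Max-Age=0; older browsers honor only expires, while RFC 6265 browsers give Max-Age precedence when both are present:

```perl
use strict;
use warnings;

# A Set-Cookie line telling the browser to discard the named cookie now.
# The name and path must match the cookie that was originally set.
sub logout_cookie_line {
    my ($name, $path) = @_;
    return "Set-Cookie: $name=; path=$path; "
         . "expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0";
}

print logout_cookie_line('site-auth', '/'), "\n";
```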

Fresh Out of the Oven

In the interest of saving time and space (the earthly, rather than cosmic, kinds), we can imagine that we've written a similar routine to create the site-prefs cookie. The difference would be that, upon receiving the user's $userid, the subroutine would first query the user_prefs table in the database to harvest the desired preferences.

These preferences would then be stored in the cookie. You could string the preferences together using a delimiter that you'll separate out for later; or, you can store a hash in a cookie. For instance, let's fast forward and imagine that you culled the preferences from user_prefs into a hash named %user_prefs. The hash keys are the field names (matchtype, results_per_page, etc.) and the hash values are the values from the database associated with those fields. You can simply store the entire hash in the cookie:
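CGI.pm takes care of squeezing the hash into a cookie, flattening it into its keys and values and escaping each one. The hand-rolled sketch below (not CGI.pm's own code; the helper names are hypothetical) shows the idea: join the escaped keys and values with '&', and split them apart again on the way back in:

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters, so that
# '&', ';', '=' and spaces cannot break the cookie value apart.
sub esc   { my $s = shift; $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge; $s }
sub unesc { my $s = shift; $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge; $s }

# Flatten a hash into "key&value&key&value" (keys sorted for stable output).
sub prefs_to_cookie_value {
    my (%prefs) = @_;
    return join '&', map { esc($_) } map { ($_, $prefs{$_}) } sort keys %prefs;
}

# Rebuild the hash; the -1 limit keeps a trailing empty value.
sub cookie_value_to_prefs {
    my ($value) = @_;
    return map { unesc($_) } split /&/, $value, -1;
}

my $v = prefs_to_cookie_value(matchtype => 'advanced', results_per_page => 25);
my %back = cookie_value_to_prefs($v);
print "$v\n";
```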

$cgiobj->cookie(
          -name=>'site-prefs',
          -value=>\%user_prefs,
          -path=>'/'
         );

Later, when you retrieve the above cookie, assign it to a hash (my %user_prefs = $cgiobj->cookie('site-prefs');) and you'll get the hash back, intact, ready-to-eat.

Recall that, when we attempted to pull the user's account information from user_info, an if statement forked the code depending on whether the account record was found or not. Let's go back to that crucial condition.

...
if ($userid) {
 #login successful
 my $cookie1=bake_auth_cookie($userid,$username);
 my $cookie2=bake_prefs_cookie($userid,$username);
 #a 302 status plus a Location header redirects the browser,
 #and both Set-Cookie headers go out in the same response
 print $cgiobj->header(
   -status=>'302 Found',
   -cookie=>[$cookie1,$cookie2],
   -location=>"http://".$cgiobj->server_name."/login-name.html"
 );
} else {
 #login failed
 print $cgiobj->redirect(
   "http://".$cgiobj->server_name."/login-fail.html");
}
...

When the user_info record is found, we bake the two cookies described earlier: the authentication token cookie, saved in $cookie1, and the preferences cookie -- whose subroutine we just imagined -- in $cookie2.

With the two cookies hot and gooey, they're ready to be served. In this case, after we serve the cookies we also want to redirect the user to a home page, login page, or wherever you want them to go once logged in. Because both cookies and page redirection are HTTP header data, we must send both the cookies and the redirection information in a single header() call, as seen in the code above. Once executed, the cookies will be saved by the browser, and the visitor will then be sent to the page specified in the -location parameter. Hopefully that page will be coded to read and act on the cookies, but that's a story for next month!
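To see why the cookies and the redirect must travel together, it helps to look at what the script actually emits: one header block, ending in a blank line, that carries the status, the Location, and one Set-Cookie line per cookie. A sketch that assembles such a block by hand, with a hypothetical redirect_with_cookies() helper and placeholder cookie values:

```perl
use strict;
use warnings;

# Assemble a CGI response header block that redirects and sets cookies.
# Everything lives in one header, terminated by an empty line, before
# any body is sent.
sub redirect_with_cookies {
    my ($location, @set_cookie_values) = @_;
    my @lines = ('Status: 302 Found', "Location: $location");
    push @lines, "Set-Cookie: $_" for @set_cookie_values;
    return join("\r\n", @lines) . "\r\n\r\n";
}

print redirect_with_cookies('http://www.example.com/login-name.html',
    'site-auth=TOKEN; path=/', 'site-prefs=PREFS; path=/');
```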

Finally, a failed login will reach the else clause of this if statement. In this case, we simply redirect the user to a page describing the failed login, probably with another login form so that they can re-attempt, and/or a link to your page where new accounts can be created -- also a topic for another month.


Printed from Apache Today (https://apachetoday.com).
https://apachetoday.com/news_story.php3?ltsn=2000-10-27-001-01-OS-SW-DT



All times are recorded in UTC.
Copyright INT Media Group, Incorporated All Rights Reserved.