"100+ Auto-Installing Software Titles For Your Web Site"
======================================================
Module mod_rewrite Tutorial (Part 3):
Rewriting URLs
------------------------------------------------------
by
Dirk Brockhausen
------------------------------------------------------

In the two preceding parts of this tutorial we
explained the basics of Rules and Conditions.

We will now follow up with two examples to illustrate
their use for somewhat more complex applications.

The first example deals with dynamically generated pages,
while the second covers calls for ".txt" files.

For our first example, let's assume that you want to
sell several items of merchandise on your web site.

Your clients are guided to various detailed product
descriptions via a script:

http://www.yoursite.com/cgi-bin/shop.cgi?product1
http://www.yoursite.com/cgi-bin/shop.cgi?product2
http://www.yoursite.com/cgi-bin/shop.cgi?product3

These URLs are included as links on your site.

If you want to submit these dynamic pages to the
search engines, you are confronted with the problem
that most of them will not accept URLs containing
the "?" character.

However, it would be perfectly possible to submit a URL of
the following format:

http://www.yoursite.com/cgi-bin/shop.cgi/product1

Here, the "?" character has been replaced by "/".

Yet more pleasing to the eye would be a URL of this
type:

http://www.yoursite.com/shop/product1

To the search engine, this appears to be just another
acceptable hyperlink, with "shop" appearing to be a
directory that contains the files "product1", "product2",
etc.

If a visitor clicks this link on a search engine's
results page, this URL must be reconverted to make sure
that "shop.cgi?product1" will actually be called.

To this effect we will make use of mod_rewrite with the
following entries:

RewriteEngine on
Options +FollowSymLinks
RewriteBase /
RewriteRule ^(.*)shop/(.*)$  $1cgi-bin/shop.cgi?$2
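
To illustrate the effect of this rule, here is how a
request for the first product from above would be
rewritten internally:

/shop/product1  -->  /cgi-bin/shop.cgi?product1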

The variables $1 and $2 are so-called "backreferences":
they refer to the text matched by the two parenthesized
groups in the pattern.

Everything in the requested URL located before "shop/" is
captured in $1, and everything following "shop/" is
captured in $2.
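
As a minimal sketch of the same capture logic in Perl 5
(the language of the script presented later in this
article; the URL path is just a sample value), the
backreferences behave like this:

#!/usr/bin/perl
# Sketch only: mimics the RewriteRule pattern from above.
my $url = "/shop/product1";
if ($url =~ m!^(.*)shop/(.*)$!) {
    # $1 = "/" (everything before "shop/"),
    # $2 = "product1" (everything after it)
    print "$1cgi-bin/shop.cgi?$2\n"; # "/cgi-bin/shop.cgi?product1"
}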

Up to this point, our examples made use of rules such as
this one:

RewriteRule ^\.htaccess$ - [F]

However, we did not yet achieve a true rewrite in the
sense that one URL would be switched to another.

For the entry in our current example:

RewriteRule ^(.*)shop/(.*)$  $1cgi-bin/shop.cgi?$2

this general syntax applies:

RewriteRule currentURL rewrittenURL

As you can see, this command executes a real rewrite.

In addition to installing the ".htaccess" file, you must
change all links in your normal HTML pages that follow the
format "cgi-bin/shop.cgi?product" to "shop/product"
(without the quotes).

When a spider visits a normal HTML page of this kind, it
will also follow and crawl the product links, because the
links no longer contain a question mark to deter it from
doing so.

So employing this method you can convert dynamically
generated product descriptions into seemingly static
web pages and feed them to the search engines.

---------

In our second example we will discuss how to
redirect calls for ".txt" files to a program script.

Many hosting providers running Apache make their system
log files available only in Common Log Format. This means
that these logs do not store visitors' referrers and user
agents.

For calls to "robots.txt", however, it is preferable to
have access to this information in order to learn more
about visiting spiders than merely their IP addresses.

To effect this, the entries in ".htaccess" should be as
follows:

RewriteEngine on
Options +FollowSymLinks
RewriteBase /
RewriteRule ^robots\.txt$ /text.cgi?%{REQUEST_URI}


Now, when "robots.txt" is called, this rule rewrites the
request internally to the program script "text.cgi".

Furthermore, a variable is passed to the script, which the
program then processes.

"REQUEST_URI" contains the path of the requested file. In
our example this is "/robots.txt".
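
In other words, a request for

http://www.yoursite.com/robots.txt

is passed to the script as

/text.cgi?/robots.txt

so "$ENV{'QUERY_STRING'}" in the script below will contain
"/robots.txt".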

The script will now read the contents of "robots.txt"
and will forward them to the web browser or the search
engine spider.

Finally, the visitor hit is archived in the log file. For
this purpose, the script reads environment variables such
as "$ENV{'HTTP_USER_AGENT'}", which provide the required
information.

Here is the source code for the CGI script mentioned
above:


#!/usr/bin/perl
# If required, adjust line above to point to Perl 5.
######################################################
# (c) Copyright 2000 by fantomaster.com              #
#     All rights reserved.                           #
######################################################

$stats_dir = "stats";
$log_file = "stats.log";

$remote_host   = "$ENV{'REMOTE_HOST'}";
$remote_addr   = "$ENV{'REMOTE_ADDR'}";
$user_agent    = "$ENV{'HTTP_USER_AGENT'}";
$referer       = "$ENV{'HTTP_REFERER'}";
$document_name = "$ENV{'QUERY_STRING'}"; # set by the RewriteRule: "/robots.txt"

open (FILE, "robots.txt") or die "Cannot open robots.txt: $!";
   @TEXT = <FILE>;
close (FILE);

&get_date;

&log_hits
("$date $remote_host $remote_addr $user_agent $referer $document_name\n");

print "Content-type: text/plain\n\n";
print @TEXT;

exit;

sub get_date {
   ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)=localtime();
   $mon++;
   $sec  = sprintf ("%02d", $sec);
   $min  = sprintf ("%02d", $min);
   $hour = sprintf ("%02d", $hour);
   $mday = sprintf ("%02d", $mday);
   $mon  = sprintf ("%02d", $mon);
   $year += 1900;   # localtime() returns years since 1900
   $date="$year-$mon-$mday, $hour:$min:$sec";
}

sub log_hits {
   open (HITS, ">>$stats_dir/$log_file") or die "Cannot open log file: $!";
      print HITS @_;
   close (HITS);
}
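
With all values filled in, each call appends one line to
"stats/stats.log" roughly like this (all values are
hypothetical; the referrer field will often be empty for
robots.txt requests):

2000-10-05, 14:23:01 spider.example.com 216.32.64.12 ExampleBot/1.0 http://www.example.com/ /robots.txt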



To install the script, upload it to your web site's main
(DocumentRoot) directory via FTP and change the file
permissions to 755.

Next, create the directory "stats".
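
If you have shell access, the corresponding commands might
look like this (a sketch; depending on your server's
setup, the "stats" directory may need to be world-writable
so the script can append to the log file):

chmod 755 text.cgi
mkdir stats
chmod 777 stats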

A more detailed description of how to install a script can
be found in our online manuals, e.g. here:

< http://www.fantomaster.com/fantomasSuite/logFrog/lfhelp.txt >


If your server's configuration does not permit
execution of Perl or CGI scripts in the main directory
(DocumentRoot), you may wish to try the following
RewriteRule instead:

RewriteRule ^robots\.txt$ /cgi-bin/text.cgi?%{REQUEST_URI}

Note, however, that in this case you will have to
modify the paths accordingly in the program script!
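
For instance, assuming "cgi-bin" resides directly below
the DocumentRoot, the open() call in the script might be
adjusted like this:

open (FILE, "../robots.txt") or die "Cannot open robots.txt: $!";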

-------

Finally, here's the solution to our quiz from the
previous issue of fantomNews:

=================================================
RewriteCond %{REMOTE_ADDR} ^216\.32\.64
RewriteRule ^.*$ - [F]

Quiz question:
--------------
If we don't write "^216\.32\.64\." for a regular
expression in the configuration above, but
"^216\.32\.64" instead, will we get the identical
effect, i.e. will this exclude the same IPs?
=================================================

The regular expression ^216\.32\.64
will apply e.g. to the following strings:

216.32.64
216.32.640
216.32.641
216.32.64a
216.32.64abc
216.32.64.12
216.32.642.12

Hence, "4" may be followed by any character string.

However, each octet of an IP address can only range from 0
to 255, which implies that, e.g., 216.32.642.12 is not a
valid IP.
The only valid IP in the list above is 216.32.64.12!

Although the two regular expressions "^216\.32\.64\." and
"^216\.32\.64" match different strings, no valid third
octet can begin with "64" followed by further digits,
since 640-649 all exceed 255. In any real address, a third
octet starting with "64" must therefore be the complete
octet "64", followed by a dot, so both expressions exclude
exactly the same range of IPs.
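
If you want to verify this yourself, a short Perl test
along these lines will do (the sample addresses are chosen
for illustration only; "216.32.640.1" could never occur in
REMOTE_ADDR):

#!/usr/bin/perl
foreach my $addr ("216.32.64.12", "216.32.65.1", "216.32.640.1") {
    print "$addr: ",
          ($addr =~ /^216\.32\.64/ ? "matches" : "no match"), "\n";
}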

(to be continued ...)

------------------------------------------------------

======================================================
This text may freely be republished or distributed
provided the following resource box is included intact
either at the beginning or the end of the article and
a complimentary copy or notice (link) is sent to the
author at the address specified below:

------------------------------------------------------
Dirk Brockhausen is the co-founder and principal of
fantomaster.com Ltd. (UK) and fantomaster.com GmbH
(Belgium), a company specializing in webmaster
software development, industrial-strength cloaking and
search engine positioning services. He holds a
doctorate in physics and has worked as an SAP
consultant and software developer since 1994. He is
also Technical Editor of fantomNews, a free newsletter
focusing on search engine optimization, available at:
< http://fantomaster.com/fantomnews-sub.html >
You can contact him at
mailto:[email protected]
(c) copyright 2000 by fantomaster.com
------------------------------------------------------


