How do forms really work?

07-09-2005
by John Saya

Did you ever enter your name on a web page? How about a credit card number? Maybe you clicked a box that asked if you were Male or Female? If you ever submitted any type of information while on a web site, then you submitted a form.

When someone enters information into a form on a web page, it is usually sent to a script on the web server to be processed. The script receives the form data as a set of name-value pairs. The names are the NAME attributes you gave the INPUT, SELECT, and TEXTAREA tags in your form, and the values are whatever the user typed in or selected. Users can also submit files with forms, but we won't cover that here.

Let's say you have a form that contains this information:

Name: <INPUT NAME=name VALUE="John">
Gender: <INPUT NAME=gender VALUE="Male">
Age: <INPUT NAME=age VALUE="25">

When this form is submitted, the name-value pairs are sent to the web server as one long string, which your script needs to parse. Parsing it isn't complicated, and there are plenty of existing routines that will do it for you. Browsers normally separate the pairs with ampersands, but some servers and scripts also accept semicolons, so the long string may look like either of these:

"name=John&gender=Male&age=25"
"name=John;gender=Male;age=25"
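Both formats can be split with the same regular expression. The sketch below is illustrative. The sub name `split_pairs` is hypothetical and not part of the article's scripts. It splits on either separator, then splits each pair at its first equal sign only. It leaves URL decoding for a later step.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw form string into [name, value] pairs.
# Pairs may be separated by "&" or ";"; each pair is split at its FIRST "=".
# URL decoding is a separate step and is not done here.
sub split_pairs {
    my ($raw) = @_;
    my @pairs;
    foreach my $pair (split /[&;]/, $raw) {
        next if $pair eq '';                      # skip empty segments such as "a=1&&b=2"
        my ($name, $value) = split /=/, $pair, 2;
        push @pairs, [ $name, defined $value ? $value : '' ];
    }
    return @pairs;
}

foreach my $raw ('name=John&gender=Male&age=25', 'name=John;gender=Male;age=25') {
    print join(', ', map { "$_->[0] => $_->[1]" } split_pairs($raw)), "\n";
}
```

Both lines of output are the same: `name => John, gender => Male, age => 25`.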

To parse it, we split the string at the ampersands or semicolons, and then split each pair at its equal sign. Be aware that the original string is URL-encoded, so that spaces, equal signs, ampersands, and other special characters in the user's input can't be confused with the separators. Most special characters are replaced by a percent sign followed by two hexadecimal digits, and a space is replaced by a plus sign (+). So after splitting, we do two things to each name and value:

1. Convert every "+" character to a space.
2. Convert every hexadecimal escape back to its original character. For example, "%3d" becomes "=".
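The two decoding steps above fit in a few lines. This is a minimal sketch. The sub name `url_decode` is a hypothetical name chosen for illustration. It uses the same substitution as the article's scripts. The order of the two steps matters. A literal plus sign typed by the user arrives as "%2B", so "+" has to become a space *before* the %XX escapes are decoded.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one URL-encoded name or value.
# Step 1: "+" becomes a space.
# Step 2: each %XX hexadecimal escape becomes the character it encodes.
# A "%" not followed by two hex digits is left as it is.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

print url_decode('John+Smith'), "\n";   # John Smith
print url_decode('a%3db'), "\n";        # a=b
print url_decode('1%2B1'), "\n";        # 1+1
```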

But where does the long string come from? That depends on the HTTP method used to submit the form, which is either GET or POST. For GET submissions, the string is in the environment variable QUERY_STRING. For POST submissions, the script reads it from standard input (STDIN), and the exact number of bytes to read is in the environment variable CONTENT_LENGTH.
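Here is a sketch of that choice, under a few assumptions. The sub name `read_form_string` is hypothetical. The environment hash and input filehandle are passed in as arguments, which is a change made only so the code can be tried without a web server. A real CGI script would pass `\%ENV` and `\*STDIN`. Perl's `read` can return fewer bytes than requested, so the sketch keeps reading until it has CONTENT_LENGTH bytes or the input ends.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still URL-encoded form string.
#   GET:  the string is in QUERY_STRING.
#   POST: read exactly CONTENT_LENGTH bytes from the input filehandle.
# $env and $in are parameters (an assumption) so this can run outside a server.
sub read_form_string {
    my ($env, $in) = @_;
    my $method = uc($env->{REQUEST_METHOD} || '');

    if ($method eq 'GET') {
        return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
    }
    if ($method eq 'POST') {
        my $wanted = $env->{CONTENT_LENGTH} || 0;
        my $buffer = '';
        # read() may return fewer bytes than asked for, so append until done
        while (length($buffer) < $wanted) {
            my $got = read($in, $buffer, $wanted - length($buffer), length($buffer));
            die "read failed: $!" unless defined $got;
            last if $got == 0;                     # input ended early
        }
        return $buffer;
    }
    die "Unsupported request method: $method\n";
}

# In a CGI script: my $raw = read_form_string(\%ENV, \*STDIN);
```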

To understand this better, it helps to know how GET and POST work. They are two different HTTP methods meant for different jobs, but both can send form submissions to the server.

GET is how your web browser downloads most files, such as web pages and pictures. It can also be used for most form submissions, as long as there isn't too much data. The maximum length depends on the browser and the server. Keep in mind that your web browser can cache GET responses, too. If you submit two identical GET requests, the first is sent to the web server, but the second may be answered from your browser's cache instead. This speeds up browsing, but it's a problem if you want to log every request, store data, or otherwise take an action for each request. GET includes all of the form information right in the URL. For example:

http://www.yourserver.com/cgi-bin/script.cgi?name1=value1&name2=value2

Everything after the question mark is sent to the server for the script to read. The code below reads and displays all of the name-value pairs passed to the server. If this code were the script at the URL above, requesting that URL would display this in your web browser:

name1 = value1
name2 = value2

Keep in mind that this script handles the GET method only.

#!/usr/bin/perl
# parse_get.cgi
use strict;
use warnings;

our %FORM;    # global hash that will hold the parsed name-value pairs

parse_form();

print "Content-type: text/html\n\n";

foreach my $key (sort keys %FORM) {
    print "$key = $FORM{$key}<br />\n";
}

exit;

sub parse_form {
    # QUERY_STRING may be unset if the script is run outside a web server
    my $query = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';

    # First we split all name-value pairs on & or ;
    foreach my $pair (split(/[&;]/, $query)) {
        next if $pair eq '';

        # Now convert all + signs to spaces
        $pair =~ s/\+/ /g;

        # Split the name-value pair at the first = sign
        my ($name, $value) = split('=', $pair, 2);
        $value = '' unless defined $value;

        # Convert all hexadecimal escapes (%XX) back to characters
        $name  =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

        # Assign to the global hash %FORM; repeated names are joined with "\0"
        $FORM{$name} .= "\0" if defined($FORM{$name});
        $FORM{$name} .= $value;
    }
}
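Encoding is the browser's half of the job. The sketch below builds a query string like the one in the example URL. It is an approximation. It follows the application/x-www-form-urlencoded rules that modern browsers use. Letters, digits and `*-._` stay as they are, a space becomes "+", and every other byte becomes %XX. The names `url_encode` and `build_query` are hypothetical. The sketch assumes byte strings, so text with non-ASCII characters should be encoded to UTF-8 first.

```perl
use strict;
use warnings;

# Hypothetical helper: URL-encode one name or value (byte strings assumed).
# Letters, digits and * - . _ are kept; every other byte becomes %XX;
# spaces become "+" last, so the "+" itself is never re-encoded.
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9*\-._ ])/sprintf('%%%02X', ord($1))/ge;
    $s =~ tr/ /+/;
    return $s;
}

# Hypothetical helper: join [name, value] pairs into a query string with "&".
sub build_query {
    my @pairs = @_;
    return join '&', map { url_encode($_->[0]) . '=' . url_encode($_->[1]) } @pairs;
}

my $url = 'http://www.yourserver.com/cgi-bin/script.cgi?'
        . build_query([ 'name1', 'value1' ], [ 'name2', 'value2' ]);
print "$url\n";   # http://www.yourserver.com/cgi-bin/script.cgi?name1=value1&name2=value2
```

Running the parsing script's decoding steps on this output recovers the original names and values.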


POST is normally used to send a chunk of data to the server to be processed. When an HTML form is submitted using POST, the form data travels in the message body of the request instead of in the URL. This is not as simple as using GET, but it is more versatile. For example, you can send entire files using POST. The data is also not limited by URL length the way GET data is, although servers can set their own limits. And because browsers don't answer a repeated POST from their cache, you can count on your script being called every time the form is submitted.
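To see where the data goes, here is a sketch that assembles the text of a simple HTTP/1.1 POST request. It is for illustration only. The sub name `post_request` is hypothetical, and the host and path are placeholders taken from the earlier example URL. The form data sits in the body after a blank line. The Content-Length header gives its size, and the web server hands that value to a CGI script as CONTENT_LENGTH.

```perl
use strict;
use warnings;

# Hypothetical helper: build the raw text of a form POST request.
# Headers end with a blank line (CRLF CRLF); the encoded form data follows.
sub post_request {
    my ($host, $path, $body) = @_;
    return "POST $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . length($body) . "\r\n"
         . "\r\n"
         . $body;
}

print post_request('www.yourserver.com', '/cgi-bin/script.cgi',
                   'name=John&gender=Male&age=25');
```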

The code below processes either a GET or a POST request sent to the server.

#!/usr/bin/perl
# parse_form.cgi
use strict;
use warnings;

our %FORM;    # global hash that will hold the parsed name-value pairs

parse_form();

print "Content-type: text/html\n\n";

foreach my $key (sort keys %FORM) {
    print "$key = $FORM{$key}<br />\n";
}

exit;

sub parse_form {
    my @pairs;
    my $method = uc($ENV{'REQUEST_METHOD'} || '');

    # If it's a GET request use the QUERY_STRING variable
    if ($method eq 'GET') {
        # Split the name-value pairs on & or ;
        @pairs = split(/[&;]/, $ENV{'QUERY_STRING'} || '');
    }

    # If it's a POST request read from STDIN and get the length
    # from the CONTENT_LENGTH environment variable
    elsif ($method eq 'POST') {
        my $buffer = '';
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'} || 0);

        # Split the name-value pairs on & or ;
        @pairs = split(/[&;]/, $buffer);
    }

    else {
        # If neither method is used, show an error message
        error('Unsupported request method');
    }

    foreach my $pair (@pairs) {
        next if $pair eq '';

        # Split each pair at the first = sign and assign to $name and $value
        my ($name, $value) = split(/=/, $pair, 2);
        $value = '' unless defined $value;

        # Convert + signs to spaces and hexadecimal escapes to characters
        $name =~ tr/+/ /;
        $name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        # If they try to include server side includes, erase them, so they
        # aren't a security risk if the html gets returned.  Another
        # security hole plugged up.
        $value =~ s/<!--.*-->//gs;

        # Remove HTML tags
        $value =~ s/<[^>]*>//g;

        # Assign the name-value pairs to the hash %FORM; repeated names
        # (for example, several checked boxes) are joined with ", "
        if (defined $FORM{$name} && $value ne '') {
            $FORM{$name} = "$FORM{$name}, $value";
        }
        elsif ($value ne '') {
            $FORM{$name} = $value;
        }
    }
}

sub error {
    my ($msg) = @_;
    print "Content-Type: text/html\n\n";
    print "<CENTER><H2>$msg</H2></CENTER>\n";
    exit;
}

You can use the code above to display all of the fields of any form you create on your web site.
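The script above deletes HTML tags from each value before echoing it. A common alternative is to escape HTML's special characters when printing. This keeps the user's text intact, but the browser shows it as plain text instead of interpreting it. The sketch below swaps in that technique. The sub name `escape_html` is hypothetical. The "&" is replaced first so that the entities produced afterwards are not escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: escape characters that are special in HTML.
# "&" must be handled first so "&lt;" etc. are not turned into "&amp;lt;".
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}

# Example: echo a submitted value safely instead of stripping its tags
my %FORM = (comment => '<b>Hi</b> & "bye"');
foreach my $key (sort keys %FORM) {
    print escape_html($key), ' = ', escape_html($FORM{$key}), "<br />\n";
}
```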

With this knowledge, there's a lot more you can do with your web site. Here's a great tip: why not start asking for your visitors' e-mail addresses, so you can keep in touch with them? You'll need a way to capture that information from their web browsers, but now you know how that's done.

You can always use existing software to design and manage your forms. Take a look at FormPRO II or FormSender for that.
