HTTP Client Browser Simulator Robot Monitoring Tool

Administrators use Nagios, Zenoss or something else to monitor their websites. Sometimes these sites are simple pages that you do an HTTP GET on and look for a string as a validation token that your service is up. But sometimes things are not so simple. You'll have an application in which you want to GET the first page, log in by entering a username and a password, then get the next page, and the next, all while keeping a valid session, click around some more without getting kicked out of the application, and finally log yourself out. During your navigation you might also want to send an email through the site and check that it was delivered, or verify database entries. Nagios, Zenoss or any other monitoring tool is not going to do this for you automatically. You must create your own robot.

I've seen a tool written for the navigation part. You press the record button and start browsing. You authenticate and navigate around. The tool, acting as a proxy, records your requests and all their parameters and generates a script with which you can replay your browsing. You can then use this to monitor your application without doing any coding yourself or even understanding how HTTP works.

That's brilliant, but there are two problems with such a tool. First, it doesn't work when content is changed dynamically by client-side JavaScript (DOM manipulation), or when a good security feature is in place, such as a hidden form field that the server generates differently each time and that must be sent back. Second, it cannot handle validations that are not HTTP related, such as an email check or a database entry verification for your session. So you know that you will have to take matters into your own hands and write your own HTTP browser robot that mimics a user.
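To make the hidden-field problem concrete, here is a minimal sketch of what a hand-written robot can do and a recorded script cannot: read the fresh value out of the page it just fetched and send it back. The field name csrf_token, the sample HTML and the helper hidden_field_value() are all made up for illustration, and a regex is only good enough for a sketch.

```perl
use strict;
use warnings;

# Pull the value of a named hidden <input> out of an HTML page.
# Assumes name= comes before value= inside the tag; a real robot would
# more likely use an HTML parser, since attribute order and quoting vary.
sub hidden_field_value {
    my ($html, $name) = @_;
    if ($html =~ /<input\b[^>]*\bname=["']\Q$name\E["'][^>]*\bvalue=["']([^"']*)["']/i) {
        return $1;
    }
    return;    # field not found
}

# Hypothetical login page returned by the server for this session.
my $login_page = <<'HTML';
<form action="/Login.php" method="post">
  <input type="hidden" name="csrf_token" value="a1b2c3d4">
  <input type="text" name="username">
</form>
HTML

# Merge the fresh token into the fields configured for the POST step.
my %fields = (username => 'robot', password => 'monitoring123');
my $token  = hidden_field_value($login_page, 'csrf_token');
$fields{csrf_token} = $token if defined $token;

print "csrf_token=$fields{csrf_token}\n";
```

In the robot described below, this kind of lookup would sit between the GET of the login page and the POST that follows it.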

My tool is the most powerful one because I wrote it (around March 2008) and thus know what it does, and I can change it however I like. I do have to know the application's behavior at each step and create a configuration file manually, but I don't have to be the application's developer. I use Firefox's Firebug to capture the parameters. The configuration is in YAML format. The client simulator reads the configuration and at each step hands the processing to a module written for that step's protocol, whether it is GET, POST, MAIL, DB, or whatever else you like.

There are two sections: global and main. Global contains information that is not related to the browser's navigation, such as the location of the HTTP cookie-jar file or the email settings needed to send an alert message. The main section has subsections 1, 2, 3 and so on. Each integer defines, in order, the action the browser script will take. Here is a sample configuration to clarify (YAML config format):

---

mysite.conf:
   global:
      email:
         FROM: farhad@mysite.com
         TO: web_operations@mysite.com,pagers@mysite.com
         Subject: Alert mysite.com needs attention
         content-type: text/html; charset="ISO-8859-1"; format=flowed
         msg: https://mysite.com/auth.php?method=autoLogin
      lwp:
         cookie-jar: cookiejar/mysite.com.cookie_jar

   main:
      1:
        proto: GET
        action:
           url: https://www.mysite.com/
           save: out/mysite.com_01.html
           validate: Welcome To mysite
           title: "Step 1: Welcome Page"

      2:
        proto: GET
        action:
           url: https://www.mysite.com/auth.php?method=autoLogin
           save: out/mysite.com_02.html
           validate: Enter Username
           title: "Step 2: Login Page"

      3:
        proto: POST
        action:
           url: https://www.mysite.com/Login.php
           save: out/mysite.com_03.html
           validate: Welcome
           title: "Step 3: Successfully logged in"
          
        fields:
           method: auth
           password: monitoring123
           username: robot


Here is a basic config for getting a page that is protected by the Basic authentication method:

---
basic_auth_login.conf:
   global:
      email:
         FROM: farhad.saberi@basicauth.com
         TO: farhad.saberi@basicauth.com
         Subject: Alert basicauth.com
         content-type: text/html; charset="ISO-8859-1"; format=flowed
         msg: http://basicauthsite.com
      lwp:
         cookie-jar: cookiejar/mysite.com.cookie_jar

   main:
      1:
         proto: GET_AUTH_BASIC
         action:
            url: http://basicauthsite.com/
            save: out/basicauthsite.01.html
            validate: Basic Auth will open in a new window
            title: "Step 1: Basic Auth login"
            realm: basicauth
            username: farhad.saberi
            password: robot123



The mysite.conf config above shows three steps in the main section that our automated client browser will act on. The first two use the GET protocol on the URLs https://www.mysite.com/ and https://www.mysite.com/auth.php?method=autoLogin and save the responses to the files out/mysite.com_01.html and out/mysite.com_02.html. The "validate" field is used to verify the result. If the response at step 2 contains the string "Enter Username" then we are good to continue to step 3, which is the POST. Each step also has a field called "title", which is informational only. If the response at a step fails to validate (a regex match) then we know there's a problem, and I include the "title" field in the email message to specify at which step the failure occurred.
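Since "validate" is applied as a regular expression, it is worth seeing how that differs from a plain substring check. The sketch below is only an illustration: body_validates() mirrors the match done in the modules further down, while body_contains() and the sample strings are assumptions of mine.

```perl
use strict;
use warnings;

# True when the response body matches the step's "validate" value,
# treating that value as a regular expression.
sub body_validates {
    my ($body, $validate) = @_;
    return $body =~ /$validate/ ? 1 : 0;
}

# Same check, but with the value quoted by \Q...\E so that it is matched
# as a literal string. Useful when the token contains ( ) ? . and friends.
sub body_contains {
    my ($body, $literal) = @_;
    return $body =~ /\Q$literal\E/ ? 1 : 0;
}

my $body = '<td>Balance (USD): 42</td>';
print body_validates($body, 'Balance'),       "\n";  # 1
print body_validates($body, 'Balance (USD)'), "\n";  # 0, parentheses act as a regex group
print body_contains($body,  'Balance (USD)'), "\n";  # 1
```

So a validate string copied straight off the page works as long as it has no regex metacharacters; otherwise it has to be escaped in the config.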

The third step, which uses the POST protocol, contains the "fields" that I will send. While setting up my configuration I navigate the site once and use Firefox's Firebug to see all of the POST fields that were sent. I just copy and paste them into my config. The YAML parser I use is YAML::Syck.

I will show the code for the browser and then the modules GET.pm and POST.pm that the browser will use. I mentioned DB and MAIL above as well. If there's a 4th step that requires a DB query, for example, you would write a DB.pm module, add a 4: step to your YAML config and set its "proto" field to DB. I actually have all of these but left them out, to show only the framework of this monitoring tool and keep this article from getting too long.

Here's the browser code:

#!/usr/bin/perl

use strict;
use warnings;
use File::Basename;
use Cwd 'abs_path';
use LWP::UserAgent;
use HTTP::Cookies;
use YAML::Syck;
use Time::HiRes qw(gettimeofday tv_interval);

unless(defined $ARGV[0]){
   die "usage: $0 <config>\nWhere <config> is a configuration file under configs directory.\n";
}

my $title=$ARGV[0]; #note that this must be the same YAML config title on top of the config file
my $scriptdir = dirname(abs_path($0));

push @INC, "${scriptdir}/pm";
require Mail::Sendmail;
require GET;
require GET_AUTH_BASIC;
require POST;

my $config = "${scriptdir}/configs/".$ARGV[0];
my $conf   = LoadFile($config);
my $global = $conf->{$title}->{'global'};
my $main = $conf->{$title}->{'main'};

my $tot_attempts = 1; # number of retries after a failed run
my $cj = $global->{'lwp'}->{'cookie-jar'};

# We make our own specialization of LWP::UserAgent that supplies a stored
# username/password when a document is protected by Basic authentication.
{
    package RequestAgent;
    our @ISA = qw(LWP::UserAgent);

    sub new
    {
        my $self = LWP::UserAgent::new(@_);
        $self->{username} = '';
        $self->{password} = '';
        $self->agent("lwp_robot ");
        $self;
    }

    # set_user_password() is called inside GET_AUTH_BASIC
    #
    sub set_user_password
    {
        my($self, $user, $password) = @_;
        $self->{username} = $user;
        $self->{password} = $password;
    }

    sub get_basic_credentials
    {
        my($self, $realm, $uri) = @_;
        return ($self->{username}, $self->{password});
    }
}

for(my $attempt=0 ; $attempt <= $tot_attempts ; $attempt++){
  print "attempt=${attempt}\n";
  my $flag=0; # set to 1 when a step failed and another attempt should be made
  my $cookie_jar=HTTP::Cookies->new(file => "${cj}",autosave => 1,ignore_discard => 1);
  HERE: {
       my $ua = RequestAgent->new;
          $ua->cookie_jar($cookie_jar);
       # sort numerically so that step 10 runs after step 9, not after step 1
       foreach my $k (sort { $a <=> $b } keys %{$main}){
           my $req = $main->{$k}->{'proto'}->new(\%{$main->{$k}->{'action'}}, \%{$main->{$k}->{'fields'}});

           print "$k. ". $main->{$k}->{'proto'} ." ... ";
           print $main->{$k}->{'action'}->{'title'} if ($main->{$k}->{'action'}->{'title'});

           $cookie_jar->load;
           my $start_t = [gettimeofday];
           my $res = $req->submit($ua,$cookie_jar);
           my $elapsed_t = tv_interval ($start_t, [gettimeofday]);

           # submit() returns undef on success; a defined $res is the failed response.
           if (defined $res && $attempt >= $tot_attempts){
               # last attempt failed as well: mail out the error (alert() exits)
               print " ... ERROR a attempt=$attempt flag=${flag}. $elapsed_t sec\n";
               alert($res, $k);
           }elsif(defined $res && $attempt < $tot_attempts){ # an earlier attempt failed: clean up and start over
               undef $ua;
               sleep 7;
               print " ... ERROR b attempt=$attempt flag=${flag}. $elapsed_t sec\n";
               $flag=1;
               $cookie_jar->clear;
               last HERE;
           }else{
               print " ... SUCCESS. $elapsed_t sec\n";
           }

           # $res is undef at this point (the step succeeded)
           $cookie_jar->extract_cookies( $res );
           $cookie_jar->save;
           sleep 3;
       }
  }
  if ($flag == 0){
        $cookie_jar->clear; #clean up and leave
        last;
  }
}

sub alert{
   my $res = shift;
   my $step= shift;

   my $title = $main->{$step}->{'action'}->{'title'};
   my $msg= $global->{'email'}->{'msg'};
   my $host=`hostname`;chomp $host;

   my %mail = (
        from => $global->{'email'}->{'FROM'},
        to => $global->{'email'}->{'TO'},
        subject => $global->{'email'}->{'Subject'} . " from $host",
       'content-type' => $global->{'email'}->{'content-type'}
   );
   $mail{body}  = $global->{'email'}->{'Subject'} . " Failed at ${title}\n<br>\n<br>";
   $mail{body} .= $msg ."\n<br>";
   $mail{body} .= $res->content ;
   # Mail::Sendmail was loaded with require, so sendmail() is not imported
   Mail::Sendmail::sendmail(%mail) || print "Error: $Mail::Sendmail::error\n";
   exit 1;
}

After I push "$scriptdir/pm" onto @INC, I go on to require my other modules, which reside under the script's own location. Mail::Sendmail has been installed from CPAN under the script's pm directory (YAML::Syck is loaded with use, so it has to live in Perl's default @INC). GET.pm, GET_AUTH_BASIC.pm and POST.pm were written by me. I can't say use because the script's pm directory is not yet in @INC at compile time, and I don't want to add it in a BEGIN block. Why? Because that would be a hard-coded path, and if I were to move my installation to another directory I would have to edit the script's BEGIN block with the new location. The require statement happens at run time while use happens at compile time, so that's why I require instead of use for those locally installed modules that are not in Perl's default @INC. Note that require does not import anything, so functions from those modules have to be called by their full name, as in Mail::Sendmail::sendmail().
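For completeness, there is another way to get a relocatable module path, which is not what the script above does. As far as I know, the core FindBin module works out the script's directory at compile time, so it can be combined with use lib without hard-coding anything. A minimal sketch, assuming the same pm directory layout:

```perl
use strict;
use warnings;
use FindBin qw($Bin);   # $Bin is the directory containing the running script
use lib "$Bin/pm";      # added to @INC at compile time, no hard-coded path

# With the path in place this early, locally installed modules could be
# loaded with "use" instead of "require", for example:
#   use Mail::Sendmail;   # would also import sendmail()
print "pm directory is on \@INC\n" if grep { $_ eq "$Bin/pm" } @INC;
```

Either approach survives moving the installation to another directory; the require version just keeps everything at run time.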

There's a $tot_attempts variable. I should have put its count into the config's global section :-) It controls how many times the script retries before giving up. If there's an error at any step, the browser script cleans up and starts over from the first step, and it only sends the alert when the last attempt fails too. With the value 1 used here that means one retry, so two runs in total.
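The retry logic on its own can be sketched like this. It is a simplified stand-in for the loop above, not a drop-in replacement: run_with_retries() and the retries key in the global hash are hypothetical names, the latter showing how the count could come from the config instead of the script.

```perl
use strict;
use warnings;

# Run $run (a code ref returning true on success) up to $retries + 1 times.
# $between, if given, is called after a failed run that will be retried;
# the script above uses that moment to clear the cookies and sleep.
sub run_with_retries {
    my ($retries, $run, $between) = @_;
    for my $attempt (0 .. $retries) {
        return $attempt if $run->($attempt);
        $between->($attempt) if $between && $attempt < $retries;
    }
    return;    # every attempt failed, time to send the alert
}

# Hypothetical: the retry count read from the config's global section.
my $global  = { retries => 1 };
my $retries = defined $global->{retries} ? $global->{retries} : 1;

# A run that fails the first time and succeeds the second time.
my $calls = 0;
my $used  = run_with_retries($retries, sub { return ++$calls >= 2 });
print defined $used ? "succeeded on attempt $used\n" : "gave up\n";
```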

Moving on, let's look at GET.pm and POST.pm. Each is written to handle the hash references passed as arguments, which are the "action" and "fields" subsections of that step's configuration.

package GET;
use warnings;
use strict;

sub new {
  my ($class) = shift;
  my $action = shift;
  my $fields = shift;
  my $self={ 'action' => {%$action}, 'fields' => {%$fields} };
  bless $self,$class;
  return $self;
}

sub submit{
  my $self = shift;
  my $ua = shift;
  my $cookie_jar = shift;
  use HTTP::Request;
  my $req=HTTP::Request->new(GET => $self->{'action'}->{'url'});

  foreach my $header (keys %{$self->{'action'}->{'headers'}}){
        $req->header($header => $self->{'action'}->{'headers'}->{$header} );
  }

  $cookie_jar->add_cookie_header( $req );
  my $res = $ua->request($req);

  if($self->{'action'}->{'save'}){
        open (FH,">",$self->{'action'}->{'save'}) || die("ERR open save file ".$self->{'action'}->{'save'}." failed: $!");
        print FH $res->content;
        close(FH);
  }
  my $validate_s = $self->{'action'}->{'validate'};
  if($res->content =~ /$validate_s/ ){
        return undef;
  }else{
        return $res;
  }
}
1;

And POST.pm:

package POST;
use warnings;
use strict;

sub new {
  my ($class) = shift;
  my $action = shift;
  my $fields = shift;
  my $self={ 'action' => {%$action}, 'fields' => {%$fields} };
  bless $self,$class;
  return $self;
}

sub submit{
  my $self = shift;
  my $ua = shift;
  my $cookie_jar = shift;

  use HTTP::Request::Common;
  my $req = POST($self->{'action'}->{'url'}, [ %{$self->{'fields'}} ]);

  $cookie_jar->add_cookie_header( $req );
  my $res = $ua->request($req);

  if($self->{'action'}->{'save'}){
     open (FH,">",$self->{'action'}->{'save'}) || die("ERR open save file ".$self->{'action'}->{'save'}." failed: $!");
     print FH $res->content;
     close(FH);
  }

  # Return undef when the response validates, the response itself otherwise.
  my $validate_s = $self->{'action'}->{'validate'};
  if($res->content =~ /$validate_s/ ){
        return undef;
  }else{
        return $res;
  }
}
1;

And GET_AUTH_BASIC.pm:

package GET_AUTH_BASIC;
use warnings;
use strict;

sub new {
  my ($class) = shift;
  my $action = shift;
  my $fields = shift;
  my $self={ 'action' => {%$action}, 'fields' => {%$fields} };
  bless $self,$class;
  return $self;
}

sub submit{
  my $self = shift;
  my $ua   = shift;
  my $cookie_jar = shift;
  use HTTP::Request;
  my $req=HTTP::Request->new(GET => $self->{'action'}->{'url'});

  foreach my $header (keys %{$self->{'action'}->{'headers'}}){
        $req->header($header => $self->{'action'}->{'headers'}->{$header} );
  }

  $cookie_jar->add_cookie_header( $req );

  $ua->set_user_password($self->{'action'}->{'username'}, $self->{'action'}->{'password'});

  my $res = $ua->request($req);
  if($self->{'action'}->{'save'}){
        open (FH,">",$self->{'action'}->{'save'}) || die("ERR open save file ".$self->{'action'}->{'save'}." failed: $!");
        print FH $res->content;
        close(FH);
  }
  my $validate_s = $self->{'action'}->{'validate'};
  unless($res->content =~ /$validate_s/ ){
        return $res;
  }else{
        return undef;
  }
}
1;


There you have a solid framework to build on. You can now easily add a DB.pm or a MAIL.pm too. For each proto in your YAML config you write your own module and make it do what you like.
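To show how little a new proto needs, here is a self-contained sketch of the contract the browser relies on: new() takes the action and fields hashes, and submit() returns undef on success or an object with a content method on failure. FILE_CHECK, its path setting and the run_steps() helper are invented for this example and are not part of my tool.

```perl
use strict;
use warnings;

# Hypothetical step module: succeeds when a file exists and is non-empty.
{
    package FILE_CHECK;
    sub new {
        my ($class, $action, $fields) = @_;
        return bless { action => { %{ $action || {} } },
                       fields => { %{ $fields || {} } } }, $class;
    }
    sub submit {
        my ($self) = @_;
        my $path = $self->{action}{path};
        return undef if -s $path;    # success
        return FILE_CHECK::Failure->new("file $path is missing or empty");
    }

    # Minimal failure object; alert() only needs ->content from it.
    package FILE_CHECK::Failure;
    sub new     { my ($class, $msg) = @_; return bless { msg => $msg }, $class }
    sub content { return $_[0]{msg} }
}

# Walk the steps in numeric order and dispatch on the "proto" string.
sub run_steps {
    my ($main) = @_;
    for my $k (sort { $a <=> $b } keys %$main) {
        my $step = $main->{$k};
        my $res  = $step->{proto}->new($step->{action}, $step->{fields})->submit;
        return "step $k failed: " . $res->content if defined $res;
    }
    return "all steps passed";
}

my $main = { 1 => { proto  => 'FILE_CHECK',
                    action => { path => '/nonexistent/heartbeat' } } };
print run_steps($main), "\n";
```

A DB.pm or MAIL.pm would follow the same shape, with the query or the mailbox check inside submit().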

Let me be explicit that this browser is not overly complicated, and it shouldn't be unless it has to be. Got that? :-) What I mean is this. Assume that a connection hangs forever. What if you add an SSH check in there? SSH can hang on you when going through a certain firewall, literally forever. I've seen it sit there for 20 hours! You would use signals, and most likely fork(), with a timeout setting that returns if your SSH attempt stalls for too long. But if you stick to the basic HTTP stuff then you won't need to do that. I hope that this article was helpful.
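For the signal half of that idea, the usual Perl idiom is alarm() wrapped in an eval. The sketch below shows only that part, with no fork(), and with_timeout() is a name I made up. It assumes a Unix-like system, and an alarm may not break out of every kind of blocking call, which is one reason to reach for fork() as well.

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds.
# Returns (1, result) when it completes and (0, undef) on timeout.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    alarm 0;                          # make sure no alarm is left pending
    return (1, $result) if $ok;
    die $@ unless $@ eq "timeout\n";  # rethrow errors that are not ours
    return (0, undef);
}

# Stand-in for a check that stalls: sleeps 5 seconds, limit is 1 second.
my ($done, $value) = with_timeout(1, sub { sleep 5; return 'finished' });
print $done ? "completed: $value\n" : "timed out\n";
```

For the plain HTTP steps none of this is needed, since LWP::UserAgent has its own timeout setting.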

About this Entry

This page contains a single entry by Farhad Saberi published on January 8, 2011 1:37 AM.
