Blogs

Swapping two variables in place without using a temporary variable (storage) seems to be a favourite question for some eccentric interviewers. I've been asked it on three separate occasions. The first two times I didn't know the answer and didn't bother looking it up. The third time I realized that perhaps I should figure this out.

The actual answer will be in the context of reversing all the words in a sentence without using a temporary variable. For example, the sentence "last night the DJ saved my life" should be reversed to "life my saved DJ the night last".

The swapping of two variables using no extra storage is done using a technique called the XOR swap algorithm. It is explained by many people over the internet. This is wikipedia's: http://en.wikipedia.org/wiki/XOR_swap_algorithm

so if A=1 0 1 1 0 1 0 0 1 1, and B= 1 1 0 1 1 0 1 0 0 0, then the swap happens in three XOR operations:

A = A ^ B;
B = A ^ B;
A = A ^ B;

The hat sign, ^, is the Perl notation for XOR. Just to show the three XOR operations:

# First step is to XOR the two variables and store the result in A.
A = A ^ B = 1 0 1 1 0 1 0 0 1 1 ^ 1 1 0 1 1 0 1 0 0 0 = 0 1 1 0 1 1 1 0 1 1;

# Now use the new A and XOR it with B, replacing B.
B = A ^ B = 0 1 1 0 1 1 1 0 1 1 ^ 1 1 0 1 1 0 1 0 0 0 = 1 0 1 1 0 1 0 0 1 1;

# Finally, XOR A and B again, storing the result in A.
A = A ^ B = 0 1 1 0 1 1 1 0 1 1 ^ 1 0 1 1 0 1 0 0 1 1 = 1 1 0 1 1 0 1 0 0 0;

Now A and B are swapped. The reversing of the words in a sentence happens in two steps. First reverse all letters within the sentence. Second, reverse all letters within each word. For the above example, the first step produces this string:

e f i l   y m   d e v a s   J D   e h t   t h g i n   t s a l

Reversing each word gives the desired output "life my saved DJ the night last".

I ignore the beginning and ending white spaces and only swap non-white space characters. Then to figure out each word I use a two variables as cursors, i and j. Where the integer i denotes the beginning position for the reversing and j the ending position. So a reverse happens between i and j. And i should never be bigger than j.

#!/usr/bin/perl


use strict;

use warnings;


sub reverse_{

   my $s = shift;

   my $i = shift;

   my $j = shift;


   for( ; $i < $j ; $i++, $j--){ 

       $s->[$i] = $s->[$i] ^ $s->[$j];

       $s->[$j] = $s->[$i] ^ $s->[$j];

       $s->[$i] = $s->[$i] ^ $s->[$j];

    }

}


sub main{

   my $s = $ARGV[0] || 'last night the DJ saved my life';


   my @s = split(//, $s);

   my $ssize = scalar(@s);


   # ignore beginning white spaces

   #

   my $i=0;

   while(1){

      if($i > $#s){ #string is all white spaces

          print "string is empty: '@s'\n"

          exit;

      }elsif($s[$i] =~ /\s/){

           $i++; 

      }else{

           last;

      }

   }


   # ignore trailing white spaces

   #

   my $j = $#s;

   while(1){

      if($s[$j] =~ /\s/){

          $j--;

      }else{

          last;

      }

   }

   # reverse @s until the end

   reverse_(\@s,$i, $j) if($j > $i);

   print "all letters reversed: @s\n";

   

   while($i < scalar(@s)){

        if($s[$i] =~ /\s/){

            $i++;

            next

        }

        if($i == $#s){ last; }

        $j=$i + 1;


        while (1){

             if($j == $#s){

                  last;

             }elsif($s[$j] =~ /\s/){

                  $j--; 

                  last;

             }else{

                  $j++;

             }

        }

        if($s[$j] =~ /\s/){

             print $s[$j];

             $j--;

        }

        reverse_(\@s,$i, $j) if ($j > $i);

        $i = $j + 1;

   }

   foreach my $w (@s){

      print "$w";

   }

   print "\n";

}

main();


farhadsaberi$ ./reverse_words.pl "it's the economy stupid"
all letters reversed: d i p u t s   y m o n o c e   e h t   s ' t i
stupid economy the it's

farhadsaberi$ ./reverse_words.pl "   the yellow     balloon "
all letters reversed:       n o o l l a b           w o l l e y   e h t  
    balloon     yellow the 


The second example shows that the beginning and ending white spaces as well white spaces between words are preserved.

build create a one single file rpm package - Linux_FreeBSD

| 0 Comments | 0 TrackBacks
Let's create an rpm package that contains a single file. This was so annoying for me to figure out that I don't want to ever remember or research how to do it again so here it is documented. All I want is to package the file /usr/bin/flashplayerdebugger into an RPM. This is a single binary file and my server installation on the cloud is forcing me to install it using rpm.

RPM seems to have a lot of predefined expectations that is tedious to find and learn. My starting point was this short article by IBM. http://www.ibm.com/developerworks/library/l-rpm1/

That article goes through a build, configure and make steps to build a binary and finally package that binary. In my case I already have the binary and just want to build the rpm right away. After many facing many errors and searching around, these are the steps I had to do to successfully create the rpm.

Create the following 5 directories:

my home is /home/fsaberi.
I created /home/fsaberi/flashplayerdebugger01

then the following 5 directories under /home/fsaberi/flashplayerdebugger01

BUILD/  RPMS/  SOURCES/  SPECS/  SRPMS/


the binary flashplayerdebugger01 is packaged under the SOURCES directory in this way, in a tar ball, so that it would work with the spec file shown further below:

[fsaberi@farm2-zcon-01-v1 flashplayerdebugger01]$ tar -tvf SOURCES/flashplayerdebugger01.tar
drwxr-xr-x f/u     0 20.. 15:30:27 flashplayerdebugger-01/
drwxr-xr-x f/u     0 20.. 15:30:27 flashplayerdebugger-01/usr/
drwxr-xr-x f/u     0 20.. 15:30:46 flashplayerdebugger-01/usr/bin/
-rwxr-xr-x f/u 16816172 201.. 15:30:46 flashplayerdebugger-01/usr/bin/flashplayerdebugger01


The spec file SPECS/flashplayerdebugger.spec would then look like this:

[fsaberi@farm2-zcon-01-v1 flashplayerdebugger01]$ cat SPECS/flashplayerdebugger.spec

%define _topdir         /home/fsaberi/flashplayerdebugger01
%define name        flashplayerdebugger
%define release        1
%define version     01
%define buildroot %{_topdir}/BUILD/%{name}-%{version}

BuildRoot:    %{buildroot}
Summary:         flash player debugger for blah2 server
License:         GPL
Name:             %{name}
Version:         %{version}
Release:         %{release}
Source:         %{name}%{version}.tar
Prefix:         /usr/bin
Group:             blah2

AutoReqProv: no

%description
flash player debugger file for blah2 server. Contains one binary file.

%prep
%setup -q

%build

%install
cp %{buildroot}/usr/bin/%{name}%{version} /usr/bin

%files
%defattr(-,root,root)
/usr/bin/flashplayerdebugger01
[fsaberi@farm2-zcon-01-v1 flashplayerdebugger01]$


A few things I learned is that %setup -q will go into the SOURCES directory and will expect to find a tar file or else it will error out. When the untar occurs it then expect to find a directory that will be %{name}-%{version}. If the hyphen is not there then it complains. It untars %{name}%{version}.tar under the BUILD directory and goes on to the %build and then the %install sections.

I'm packaging a single binary file so I left %build empty. %install will actually install the binary locally on the system before heading to the %files section. In the %files i specify the only binary I wanted to package.

The command to build the rpm is

To make this work, I had used:

   sudo rpmbuild  -vv -bb --clean  SPECS/flashplayerdebugger.spec

But then I got this error:

error: Installed (but unpackaged) file(s) found:
   /debugfiles.list
   /debuglinks.list
   /debugsources.list

Searching a bit more, meant that I had to create a ~/.rpmmacros file with this line in it:

%debug_package          %{nil}

Then everything worked and the RPM is created under the directory RPMS.

To package and copy just one file using rpmbuild, you will most likely want to disable the dependency check or your package won't install without requiring a lot of other packages. So add the tag AutoReqProv: no as shown in the above spec file. There are also AutoReq and AutoProv tags you can look into. But the AutoReqProv deals with both.

So much trouble just to rpm package one single file.

Perl detach process daemon - perl

| 0 Comments | 0 TrackBacks

This is how a running process can detach itself from the shell and become a daemon. Not just from the shell but sometimes from another parent process such as an HTTP server. One characteristic of a daemon is that its parent process ID will be one, that of the init process. It would be a good thing to check that your PPID is not one before attempting to daemonize yourself.


#!/usr/bin/perl

use POSIX 'setsid';


defined (my $kid = fork) or die "Cannot fork: $!\n";

if ($kid) {

    # Parent runs this block

    exit(0);

} else {


    close STDIN;

    close STDOUT;

    close STDERR;

    setsid or die "Can't start a new session: $!";

    umask(0027); # create files with perms -rw-r----- 

    chdir '/' or die "Can't chdir to /: $!";


    open STDIN'<', '/dev/null' or die $!;

    open STDOUT, '>', '/dev/null' or die $!;

    open STDERR, '>>', '/tmp/mysql_install.log';


    defined (my $grandkid = fork) or die "Cannot fork: $!\n";

    if($grandkid){

        exit(0);

    }else{

         # my daemon is here.

        sleep(60);

        system("rpm -e --nodeps mysql-5.0.77-4.el5_4.1");

        system("yum install -y MySQL-client-5.5.22-1.linux2.6");

        system("yum install -y MySQL-devel-5.5.22-1.linux2.6");

        system("yum install -y MySQL-shared-compat-5.5.22-2.linux2.6");

    }

}



But in the above code, the parent is unaware of the detached process's ID. The parent of the daemon will be init and init will be responsible for cleaning after it. But if we want to still know what PID it is so that we can explicitly send a signal to it from the grandparent the detached grandchild will have somehow communicate his process ID back to the grandparent.


#!/usr/bin/perl

use POSIX 'setsid';


pipe(READER, WRITER) || die "pipe failed: $!\n";

defined (my $kid fork) or die "Cannot fork: $!\n";

if ($kid) {

    # Parent runs this block

   close(WRITER);

   my $line = <READER>;

   chomp $line;

   $daemon_pid = $line;

   close(READER) || warn "kid existed $?";

   waitpid($kid,0);

    exit(0);

else {

    close(READER);

    close STDIN;

    close STDOUT;

    close STDERR;

    setsid or die "Can't start a new session: $!";

    umask(0027); # create files with perms -rw-r----- 

    chdir '/' or die "Can't chdir to /: $!";


    open STDIN,  '<''/dev/null' or die $!;

    open STDOUT'>''/dev/null' or die $!;

    open STDERR'>>''/tmp/mysql_install.log';


    defined (my $grandkid fork) or die "Cannot fork: $!\n";

    if($grandkid){

        print WRITER "$grandkid\n";

        close(WRITER);

        exit(0);

    }else{

         # my daemon is here.

        sleep(60);

        system("rpm -e --nodeps mysql-5.0.77-4.el5_4.1");

        system("yum install -y MySQL-client-5.5.22-1.linux2.6");

        system("yum install -y MySQL-devel-5.5.22-1.linux2.6");

        system("yum install -y MySQL-shared-compat-5.5.22-2.linux2.6");

    }

}


We used pipes for this purpose. Notice that the grandchild ends it's PID with a newline when writing it to the grandparent. That's because the reader < and > will block until it reads a newline character.

Hierarchical data representation in SQL Netsted Sets - perl

| 0 Comments | 0 TrackBacks
Storing hierarchical data in a relational database is not obvious. There are a few techniques and I studied one that is called Netsted Sets and popularized by Joe Celko in his book Trees and Hierarchies in SQL for Smarties. I found this technique in an online article a long time ago and studied it and have implemented it at work and it is always useful so short of buying that book, I will spend the time to explain it here. That article did not go into detail as I will here, in fact I don't see anyone else spending so much time detailing this one out. I have not read Joe Celko's book and wonder if there are any discrepancies.

Let's show what we're talking about:

sql_hierarchy_tree.png

Nested Sets representation in SQL is a form of Preorder Tree Traversal where every child of a node is visited first, depth first. Every node is represented by a left and a right number.  Our goal is to show the above tree, which is stored in a mySQL table for example, as this:

sql_hierarchy_side_view.gifThis is always very useful. Examples of websites that can use this are forums or a javascript menu. Forums can use this for parent to child relationships of reply messages.

Ok let's explain what's happening in the above tree with the red arrows. In a Preorder Tree Traversal we visit every child of a node first, and then the sibling of that child. So starting at node 'a', we visit 'b', then the children of 'b' which is 'c' and 'd', then continue depth first so next is 'e' and 'f'. Now that all children of 'b' are seen we can come back up and go to root's next sibling which is 'g'. Once all children of elephant_eye.jpg'g' are visited, 'h', we continue with the next child of root which is 'i'. Going left to right and depth first, nodes 'j' then 'k' are visited and since 'k' has a child, we have to visit it first before going to 'm'. We finally come back up from the right side to root's right.

So again, we go left to right, visit every left of a left node first, going depth first, and then once all left and depth first are visited, we can continue to visit the right node.

Understanding this left and depth first visiting order is crucial in understanding how every node in this technique is stored in a relational database table.

First let's show how a tree is built. This will show the SQL's involved in inserting a new node. Then we'll show the code that draws the tree, that is the depth of the indentation shown above. Then how to delete a node, how to find the path from one node up to its root. And then some basic facts and calculations such as how to determine the number of children of a node. (think how a comments section of a site knows how many replies a comment has received without actually retrieving all children of that node).

When the root node is born it looks like thissql_hierarchy_root_node.jpg
A SQL INSERT statement with left=1 and right=2 will create the root node. Then every child node will require the the updating of the left and right nodes of every node that would appear *after* that node. Let's show this:

sql_hierarchy_addNode_b.jpgYou can see that it required the right value of the root node to be incremented by 2. The four sql statements show what needs to be done and they are the same for every node added. In your HTML page you will obviously know the node a user clicks on to add a child value and that node has a left (parent left or plft) and right (prgt) values. Now let's add a child node 'c' to the parent node 'b':
sql_hierarchy_addNode_c.jpgwhat's a bit more interesting is the simplification of the m value. It is just the parent's right value minus one. The m value, which is really a reference point to know which left and right values need incrementing during a new node addition, can be further simplified. Let's show this by adding a new node 'd' to the parent 'b':
sql_hierarchy_addNode_d.jpgSo far only right values have been updated. Let's skip ahead a few steps and show a new node 'x' being added that affects the updating of left values as well.
sql_hierarchy_addNote_x.png
Now that new node insertion is cleared up, let's show how we would retrieve the nodes from the table and display them in proper order and indent a child node beneath its parent according to its depth. Just like a java script menu or the forum on the reddit website where replies to a message are indented below it. Let's do this for the same eight node tree above.

The piece of code that does it is this. There is a stack involved. In this php code it is the array $right. Also, the same code would apply whether you would display a subtree or the entire tree starting at root. That's why we specify the node's plft or prgt. The returning tree would be its subtree. (I have a php class that handles my SQL statements which is irrelevant here). I'm not an HTML DIV expert that's why the &nbsp; are used for indentation :-)

$right = array();  // define stack

$ans = $db->executeQuery("select txt,lft,rgt from T where lft between $plft AND $prgt ORDER by lft asc");

      while ($row = mysql_fetch_array($ans)){
          
                if(count($right)>0){
                          while($right[count($right)-1] < $row['rgt']){
                                    array_pop($right);
                          }
                }
                // indent txt according to its depth in the tree
               echo str_repeat('&nbsp;&nbsp;&nbsp;&nbsp;',count($right));
               echo $row['txt'] . "<br />";

               //add this node to the stack
               $right[] = $row['rgt'];
       }


In a few words, you define an empty stack (called $right). Then you loop over the rows retrieved. For each node, if the stack is not empty, keep popping stack until the top of the stack has a value that is smaller than the current node. Then what remains in the stack, that is the size of the stack, will define the depth of the node. For root this will be zero. For root's immediate children, b, d and e, the size of the stack (the size of the array $right) will be 1. So once the depth of the node is determined, you print the node txt and then you add this node's right value on top of the stack. Let's show this loop and its stack graphically:
sql_hierarchy_draw_tree_a.jpgThe first time we enter the loop the stack is empty so nothing's popped and the size of the stack being zero defines that the root node will be indented zero times. 16 is pushed onto the stack $right and now we come back to the top of the while loop. The next node is 'b' (ORDER by lft ASC);sql_hierarchy_draw_tree_b.png Since 'b's right is smaller than the top of the stack, nothing's popped. The size of the stack remains 1 so the indentation of 'b' will be 1. Then 5 is pushed onto stack. Next node is 'c':
sql_hierarchy_draw_tree_c.jpg
Current right value is 4. Nothing's popped from the stack because 4 is smaller than the top of the stack. The resulting size of the stack, 2 (values 16 and 5), is the depth of 'c'. Then 4 is pushed onto stack. The next node d is where we seeing the popping of the stack. Node c is at depth 2 and we see d we'll pop 5 and 4 because they are smaller than d's right value 9. This gives a stack size of 1, (value 16), defining correctly d's depth of 1.
sql_hierarchy_draw_tree_d.jpgThe same process continues with the rest of the nodes x, e, f and g, in that order because the SQL statement was ORDER by lft ASC.

Now that we know how to add to the tree and how to draw it, let's show how to delete a node from it. Remember the property of the PREORDER traversal of this tree. Every child's left value will be higher than that of its parent. And every left child of a parent will have a smaller left value than its right sibling. Look at the tree drawn at the top to see that 'c' has a bigger left value than its parent 'b.' Also that 'b' has a smaller left value than its sibling 'g.' What it all means is that when you delete a node, you have to update every other node that would be seen AFTER the deleted node in a preorder traversal. Or, update every node that would have a left and right value higher than the deleted node.

While showing how to delete a node or an entire sub tree, we'll also learn how to count the number of child nodes of any node. Given lft and rgt of a node, the count of its children is going to be ⌊ ( rgt - lft ) / 2 ⌋

For the tree at the very top of this article, we see that node i has 4 children. Or, ⌊ ( 25 - 16 ) / 2 ⌋ = 4. When we're going to delete a node, obviously all of its children are going to be deleted as well. Every node coming after the group of nodes deleted (because of pre order traversal) will have to have their left and right values updated. By how much is what we have to calculate. The number would have to be relative to how many nodes are going to disappear. Let's show the formula by deleting node "5 d 10" from our original tree at the top.
sql_hierarchy_delete_node.jpgSo 6 is the number that we have to reduce the left and right values of all nodes that come *after* node 'd' in our preorder traversal. You already know your deleting node's left and right values of 5 and 10:

DELETE from T where lft >= 5 and rgt <= 10.
UPDATE T set lft=lft - 6 where lft > 5.
UPDATE T set rgt=rgt - 6 where rgt > 10.

So far we've seen how to add a node, delete a node of a sub tree in case the node is not a leaf node, and how to draw the tree. We also saw how to calculate how many children a node has. Another cool property this technique allows us to find is the ancestry of a leaf node up to the root of the tree:
sql_hierarchy_node_ancestry.jpg
That's the path to a node. This SQL SELECT statement on any node will return all nodes that are between it and the root node. To draw it you would "ORDER by lft ASC" and use the same loop to draw the tree as discussed above.

One other property you can see is the discovery of a leaf node. If right minus left equals 1 then the node is a leaf node.

Finally, there comes the problem of knowing all immediate children of a node. That is, only one level deep. For 'a' it would be b,g and i. For 'i' it would be j,k and m. You can write a script to keep calling select statements starting from left node 'b', then selecting 'g' and then selecting 'i'. That's because it is obvious that the immediate left child of a parent will have its left number be the parent's plus 1. (b's 2 is a's 1 + 1) (j's 17 is i's 16 + 1). And then for every node, the left value of its immediate sibling is going to be that node's right value plus one. (g's 12 is b's 11 + 1) (k's 19 is j's 18 + 1). So knowing the two above rules one could write a select statement in a loop to keep fetching first the immediate left child and then the next sibling of that child until you get a node who's right value is the parent's right minus one. (i's 25 is a's 26 - 1) (m's right is i's 25 - 1). Not knowing how many siblings there are, calling select in a loop can become overwhelming.

The other option to knowing every node's immediate children in one SQL query is to add a depth column to the original table. This is also explained in an article on Wikipedia on Nested Sets model. The new depth column would look like this for the above example. 

txt left right depth
a1260
b2111
c342
d5102
e673
f893
g12151
h13142
...
You know when you have inserted your very first node, the root with lft=1 and rgt=2. You specify its depth to be zero. After that, the node's immediate child is inserted with a depth of "depth plus one." To retrieve all nodes just one level deeper one more constraint can be added:

SELECT txt,lft,rgt FROM T where lft between $plft AND $prgt AND depth=ParentDepth + 1 ORDER by lft ASC;

I hope this was of any use. If anyone sees a mistake please do let me know.

French - music

| 0 Comments | 0 TrackBacks

Céline Dion - Pour que tu m'aimes encore [ Live ] YouTube
Charles Aznavour - Mourir d'aimer YouTube
Claude François - Comme d'habitude YouTube
Jean-Jacques Goldman - Là-Bas YouTube
France Gall - Ella Elle l'a YouTube
Francis Cabrel - je t'aimais, Je t'aime et je t'aimerai YouTube
Garou, Daniel, Patric - Belle, the original cast YouTube
Joe Dassin - À toi YouTube
Joe Dassin - Et Si Tu N' Existais Pas YouTube
Mylene Farmer - California YouTube
Patrick Bruel - J'te l'dis quand meme YouTube
Patrick Bruel - Qui a le droit 1989 YouTube

Follow Symbolic Link Tree pstree - perl

| 0 Comments | 0 TrackBacks
Just like the native unix command pstree, or ptree, for processes, l want to follow a symbolic link and display its tree. A C developer once complained to me that he's lost in his home directory. His setup was such that every time he created a new version of any source file, a new symbolic link was made. I've no idea how but he told me that the links were so deep that he could no longer find the actual file he was opening. This was three years prior to this writing and this simple problem stuck to my mind. I can't believe that there isn't a native Unix command that follows each link and displays where it points to, kind of like pstree.

I have to admit that the script is simple and yet it took me a while to write it. Much longer than I thought so I must be a bad coder. But I think that I have a few legitimate complaints which contributed to my slow down. Here are my complaints which all evolve around Perl functions that deal with file and directories but what if they are used on a symbolic link instead. Consider the following files:

mac:links fsaberi$ pwd
/Users/fsaberi/tmp/links
mac:links fsaberi$ ls -l
total 24
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 13:41 a -> b
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 13:41 b -> c
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 13:41 c -> d

notice how file d doesn't exist. If I were to use Perl's -e file existence test on file a, it would return false because it will follow the link to b then to c and then when it seed that d doesn't exist it will say that a also doesn't exist. To me that's a wrongful interpretation of the existence of "a" itself.

If I were to use module Cwd's abs_path function on a, it would tell me that its absolute path is /Users/fsaberi/tmp/links/d. It followed it all the way. What I was expecting is the absolute path of "a" itself: /Users/fsaberi/tmp/links/a. The symbolic Link a is a valid unix file and it does have an absolute path as well. I wish there were options to Perl saying "don't follow symlinks."

Now consider this, where v circularly points back to itself:

04372B-fsaberi:links fsaberi$ ls -l
total 40
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:11 v -> w
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:11 w -> x
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:12 x -> y
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:12 y -> z
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:13 z -> v

If I were to open v, it would fail. It doesn't just look at v itself and try to open it allowing me to read "w" which is the string content of the file "v." I guess it's by design that you have to first check if it's a link with -l and then use readlink instead. Also, -e on the circular link v would return false! But that's a philosophical question. Does "v" not exist? In my opinion it does exist and -e should return true. Whether it is a symlink who down the road badly points to itself or not was not part of the original question. In my understanding, -e should return true if there exists an inode for that path. Perl doesn't agree with me though.

Even -d can trick you. If you test a link that eventually points to a directory, -d will return true. Again it follows the link. I expected it to say that this file is not a directory.

The program sltree.pl named after symbolic link tree outputs this on file v:

mac:links fsaberi$ sltree.pl /Users/fsaberi/tmp/links/v
/Users/fsaberi/tmp/links/v
  -> /Users/fsaberi/tmp/links/w
   -> /Users/fsaberi/tmp/links/x
    -> /Users/fsaberi/tmp/links/y
     -> /Users/fsaberi/tmp/links/z
      -> /Users/fsaberi/tmp/links/v (circular link)

or making the links a bit more complicated. 

mac:links fsaberi$ ls -l
total 40
lrwxr-xr-x  1 fsaberi  staff   1 Nov 21 14:11 v -> w
lrwxr-xr-x  1 fsaberi  staff   1 Nov 21 14:11 w -> x
lrwxr-xr-x  1 fsaberi  staff  21 Nov 21 14:46 x -> ../../../../tmp/ooo/o
lrwxr-xr-x  1 fsaberi  staff   1 Nov 21 14:12 y -> z
lrwxr-xr-x  1 fsaberi  staff  12 Nov 21 14:49 z -> /usr/bin/dig
mac:links fsaberi$ ls -l  ../../../../tmp/ooo
total 8
lrwxr-xr-x  1 fsaberi  wheel  34 Nov 21 14:48 o -> ../../../Users/fsaberi/tmp/links/y

mac:links fsaberi$ perl ../../Documents/workspace/sltree/sltree.pl ../../tmp/links/v
/Users/fsaberi/tmp/links/v
  -> /Users/fsaberi/tmp/links/w
   -> /Users/fsaberi/tmp/links/x
    -> /tmp/ooo/o
     -> /Users/fsaberi/tmp/links/y
      -> /Users/fsaberi/tmp/links/z
       -> /usr/bin/dig


where link "v" eventually points to the program dig. Here is the source:

#!/usr/bin/perl

use strict; use warnings;

use Cwd;

use File::Basename;


my $start_node = $ARGV[0];

unless($start_node){

print "Must provide a file as argument: $0 <file>\n"; exit;

}


my $tree = DFS_G($start_node);

print $tree->[0] ."\n";

for(my $i=1; $i < scalar(@$tree) ; $i++){

print ' ' x $i;

print " -> ". $tree->[$i] ."\n";

}


# normalizeNewLink_relative_to_OrigLink, returns an absolute path with the . and ..'s

# removed. If path is not absolute, then it will be relative to another path. Meaning

# that this other path will be prepended to the original path first.

# If that "relative to" path is not given, then it will be relative to current working

# directory. (ugliest function name ever! sorry about that.)

sub normalizeNewLink_relative_to_OrigLink{

my $newlink = shift;

my $relativeTo = shift || undef;

my $origlink = $newlink; # store orig path in case an error message needs it.


unless(defined $relativeTo){

$relativeTo = getcwd();

}

my ($filename, $directories, $suffix) ;

if(-l $relativeTo){

($filename, $directories, $suffix) = fileparse($relativeTo);

$relativeTo = $directories;

}

$relativeTo =~ s/\/$//;


# this is how i have to get abs_path so that perl doesn't fail in case it 

        # is a bad link.

unless($newlink =~ /^\//){

($filename, $directories, $suffix) = fileparse($newlink);

$filename .= $suffix;

$newlink = $relativeTo . '/'. $directories . $filename;

}


if($newlink eq '/'){

return $newlink;

}

my @dirs = split('/', $newlink);

my @newlink;

for(my $i = 0 ; $i < scalar(@dirs); $i++){

if ($dirs[$i] eq '.' || $dirs[$i] eq ''){

next;

}elsif($dirs[$i] eq '..'){

pop @newlink if (scalar(@newlink) > 0);

}else{

push @newlink, $dirs[$i];

}

}

my $l = scalar(@newlink);

my $p;

if($l == 0){

return "/"; # we were given something weird like /.././../../

}elsif($l == 1){

$p = '/' . $newlink[0];

}elsif($l > 1){

$newlink[0] = '/'. $newlink[0] if ($newlink[0] !~ /^\\/);

$p = join '/', @newlink;

}else{

die "Error: Could not normalize $origlink\n";

}

return $p;

}


sub DFS_G{

    my $node = shift;

    my %visited;

    my @tree;


    $node = normalizeNewLink_relative_to_OrigLink($node);

    push @tree, $node;

    $visited{$node} =1;


    my $newNode;

while(1){

   if(-l $node){

$newNode = readlink($node);

$newNode normalizeNewLink_relative_to_OrigLink($newNode,$node);

if(exists $visited{$newNode}){

push @tree, $newNode;

$tree[ $#tree ] .= ' (circular link)';

return \@tree;

}else{

$visited{$newNode} = 1;

push @tree, $newNode;

}

    }else{

# it is not a link. It is either a directory or some file. 

                # Figure out if it exists. I can use -e safely if it is NOT

                # a symbolic link, which it isn't here.

unless(-e $tree[ $#tree ]){

$tree[ $#tree ] .= ' (broken link)';

return \@tree;

    }

    $node = $newNode;

}

}


I think that what I first tried to mean by the function name DFS_G was depth first search graph :-) well, this is an extremely simplified form of a DFS on a graph. Every node has exactly one child. And we stop at a broken link, or when we first see a node already visited (circular link). So there is no need for gray and black coloring neither as is the technique in a graph traversal.


I have tested the script somewhat but you never know what can go wrong so if ever anyone uses this and finds a bug please do let me know.

Non Blocking Multiple Parallel Processing - perl

| 0 Comments | 0 TrackBacks

Parallel processing at the user level became a critical part of my job when I encountered two situations. One is when I'm working with thousands of similar servers that I need to query by SSH'ing to, or by any other method. The second is when I had to load test LAMP servers which is done by taking an existing http access log file and re-requesting every query.


The first, querying servers, is much easier to deal with. There are so many examples out there which copy from each other the gist of their parallel processing model. Parallel_processing_Cheetah_Speed.jpgFor each server they fork a new child until a number of maximum children is reached. Then when one child finishes and exists, a new process is forked with the next server in the queue as its argument. Maximum performance is not a requirement here as long as the thousand server is queried within a reasonable amount of time. Say five to twenty minutes.

The problem is with the second task, that of application load testing. If for every line of an Apache access log you were to fork a separate child then your own load testing server would come to a halt before a decently programmed mod_perl/php application running under Apache would. For querying servers I can fork 500 times and it might take my script half an hour to complete and that's fine. But for load testing the purpose is to run hundreds of queries as fast as possible. Remember that forking is the most expensive call a kernel performs. There's no way you can load test for performance and fork continuously at the same time. I realized that I needed to modify the parallel processing model so that there's no forking involved while the requests are going on. I chose the image for this blog appropriately, that of a Cheetah. Just as a Cheetah is the fast animal on land, so is Pcmd.pm the fastest parallel processed implementation that I could come up with.

Not only I must never fork, but I need a way to collect data from each child for statistics purposes. For example I want to know how long the average HTTP GET request is taking and which URI took the longest. That's how I came up with Pcmd which stands for Parallel Command. It first spawns MaxChildren processes and then for every available input (server IP's or http access log entry), it assigns one to each child in the READy status. Once a child has been assigned it is then placed in the RUNNing status. While a child is in RUNNing status the parent attempts to read from it while not blocking on the child's file handle. Once a complete message is returned by the child it is put in DONE status. The output of a DONE child is processed and the child is put back once again in READy status where it is available to receive another input from the parent. This bidirectional message passing using BSD's socketpair and a precise handling of each child's status, and a robust protocol for passing the messages to and from each child makes this model of parallel processing work so that no additional forking is needed. The same initially spawned children can handle as many inputs as there are.

Download Pcmd.pm here.  Instead of pasting the entire source, i'll show bits of it and explain why it's there and what it does. I'll specifically explain:

1- The different hash structures used to keep track of a child's status and outputs.
2- The different hash structures used to keep track of the message passing.
3- How to fork and create a non-blocking bidirectional socketpair pipes between the parent and the child process. 
4- How the parent reads from multiple non blocking handles.
5- How message passing from and to the child is implemented such that it is robust and eliminates confusion.
6- two client examples using Pcmd.pm that put it all together.

1- The parent process loops around $children amount of time and calls the function make_child who will create the communication pipes using socketpair, fork off a process and give this new process the CHILD end of the socketpair stream. The parent has to keep track of each child's communication pipe, birth time, child's output (answer back to parent), and child's status which can be READY, RUNN or DONE.

Available within the class:
$self->{children_s}->{$pid}; # status of $pid.
$self->{children_f}->{$pid} # The CHILD end of the socketpair stream the parent uses to R/W to $pid

Defined within run():
%children_t; # {pid} = birth time in seconds.
%children_a; # {$pid} = array ref to a list of strings. They are answers read back from $pid
Exported:
$self->{timedout_answers} = []; # if a child is killed because it stayed RUNNing more than $self->{timeout} seconds, any amount of incomplete message read from that child will be stored here.

2- Message passing to and from the child can be tricky. When the parent is reading from the child's pipe, how does he know that the child is done sending information and that it should be placed in the DONE status and no longer read from? To achieve this I borrowed the idea of how HTTP POST protocol does this. In this article I explained HTTP POST. In the comments below EOM stands for EndOfMessage.

my %children_Mstate; # {pid}= Message states while looking for EOM '0\r\n\r\n'
my %children_Mstring;# {pid}= Message string to hold 0, then \r, then \n, looking for EOM.
my %children_Msize;  # {pid}= just like HTTP POST's proto, this tells our loop how many chars to read

There are two subroutines used to read messages. The parent always uses read_from_child() and the child always uses read_from_parent(). The only difference between the two is that the child will loop until a full message is read from the parent (the child blocks on the read). Whereas the parent will read from the same non-blocking child pipe until a block occurs. As soon as the parent can't read anymore it returns and moves on to the next child. But if a full message is read, the child is placed in the DONE status by read_from_child().

3 - The easiest way to have a bi-directional conversation with a child is to use socketpair, IMHO. Spawning a child and establishing a bi-directional non-blocking streams between the child process and the parent is done in subroutine make_child. I've commented within the code what's happening so no extra paragraphs should be needed:

my $CHILD  = gensym; # parent uses this handle to talk to the child

my $PARENT = gensym; # child uses this handle to talk to the parent

socketpair($CHILD, $PARENT, AF_UNIX, SOCK_STREAM, PF_UNSPEC) ||  die "socketpair: $!";

  

my  $flags = 0;

fcntl($CHILD, F_GETFL, $flags) or die "Couldn't get flags for HANDLE : $!\n";

$flags |= O_NONBLOCK;

fcntl($CHILD, F_SETFL, $flags) or die "Couldn't set flags for HANDLE: $!\n";


$flags = 0;

fcntl($PARENT, F_GETFL, $flags) or die "Couldn't get flags for HANDLE : $!\n";

$flags |= O_NONBLOCK;

fcntl($PARENT, F_SETFL, $flags) or die "Couldn't set flags for HANDLE: $!\n";


# Once a filehandle is set for non-blocking I/O, the sysread or syswrite calls that 

# would block will instead return undef and set $! to EAGAIN:

my $old_fh = select($CHILD);

$|=1;

select($old_fh);

$old_fh = select($PARENT);

$|=1;

select($old_fh);


$CHILD->autoflush(1);

$PARENT->autoflush(1);

    

if (my $pid = fork()){

# parent records the child's birth and returns

close($PARENT); # parent doesn't need PARENT handle.

        $self->{children_f}->{$pid} = $CHILD;

return $pid;

}else{

die "cannot fork: $!" unless defined $pid;

close $CHILD; # child doesn't need CHILD handle. Uses PARENT to talk to parent.

$SIG{'INT'} = 'DEFAULT';

$SIG{'TERM'} = 'DEFAULT';

$self->child($PARENT);

close($CHILD); # in case child() doens't close the handle.

exit(0); # in case child doesn't exit.

}



4 and 5 - For each child created, the parent holds a non blocking end of a socketpair pipe in %children_f where each keys is a PID with its value being the pipe to that PID. Foreach PID, if it is in the RUNN status, then an attempt to read from its output is made. This attempt cannot block on a child that is hung because that would starve all other children. In order to be able to read and parse a string according to a set of rules, I don't see other options but to read one character at a time from the file handle.

The parent is expecting a child who will send its output mimicking HTTP POST's protocol. I first expect a number which terminates by a newline character. This number designates how many characters the child is sending. He's telling me regardless of the character you have to read this many characters. This sequence of number, newline, message, number, newline, message ... continues until the number is a zero followed by these tour characters: \r\n\r\n. To achieve the parsing of each message, the read goes through six different states, 0,1,5,6,7 and 8. It starts at state 0 and once state 8 is reached then a full child message has been read. The child is placed in the DONE status and the message is passed to the defined parent callback subroutine. These states are obviously stored in each PID's $children_Mstate{$pid} structure. The message parsing loop is this:

while($self->{children_s}->{$pid} ne 'DONE'){

    my $rv = sysread($self->{children_f}->{$pid}, $buf, $BUFSIZ);

if (!defined($rv) && $! == EAGAIN) {

last; # would block

}else{

# if state 0, read msg size until \n then set state to 1

if($children_Mstate->{$pid} == 0 && $buf eq "\n"){

$children_Mstate->{$pid} = 1;

}elsif($children_Mstate->{$pid} == 0 && $buf == 0 && \

                                               ($children_Msize->{$pid} == 0)){

  # if Msize ==0, it means that the 0 in buf is in first pos.

                  # Msg length integer does NOT have leading zeros. This is

                  # how i distinguish the zero in 40 with the 0 in '0\r\n\r\n'

        $children_Mstate->{$pid} = 5;

}elsif($children_Mstate->{$pid} == 0){

$children_Msize->{$pid} .= $buf;

}


# if state 1, read msg body until Msize is 0 then go back to state 0

elsif($children_Mstate->{$pid} == 1 && $children_Msize->{$pid} > 0){

$children_Mstring->{$pid} .= $buf;

$children_Msize->{$pid} -= 1;


#if this was the last char to read, go back to state 0.

if($children_Msize->{$pid} == 0){

    push @{$children_a->{$pid}}, $children_Mstring->{$pid};

    $children_Mstring->{$pid} = '';

    $children_Mstate->{$pid} = 0;

}

}elsif($children_Mstate->{$pid} == 5 && $buf eq "\r"){

$children_Mstate->{$pid} = 6;

}elsif($children_Mstate->{$pid} == 6 && $buf eq "\n"){

$children_Mstate->{$pid} = 7;

}elsif($children_Mstate->{$pid} == 7 && $buf eq "\r"){

$children_Mstate->{$pid} = 8;

}elsif($children_Mstate->{$pid} == 8 && $buf eq "\n"){

   # End of msg successfully reached. Child is DONE so parent 

                   # shouldn't attempt to continue reading.

                        $self->{children_s}->{$pid} = 'DONE';

                        $children_Mstate->{$pid} = 0;

}else{

my $state = $children_Mstate->{$pid};

my $str = $children_Mstring->{$pid};

print "ERROR read_from_child $pid impossible state ";

print "$state with but \"$buf\" Mstring: \"$str\"\n";

print "Please report this as a bug\n";

last;

}

}

}



The parent reading from a child and the child reading from the parent follow the same message passing mechanism. Both subroutines read_from_child and read_from_parent are identical except that when the parent sees EAGAIN is stops, with the state of that PID as is so that the read resumes at a subsequent time, and control is returned back to the main loop inside run(). But the child will keep on reading until a full message is passed because it doesn't have anything else to do :-)

6 - let's just give a very simple example of using Pcmd.pm. The synopsis actually suffices. We'll define the parent and child subroutines that Pcmd object uses as callbacks. We'll define a number of IP addresses in a text file called ip.txt. Pcmd reads ip.txt and assigns one IP to each child. You define what the child will do with this IP. Then when the child returns, its answer is passed to the parent callback for some post processing. The parent callback should keep things to a very minimum because the parent callback actually blocks the main running loop inside Pcmd. It should mostly store the answers and process them after all of the children are done.

Synopsis:

use strict; use warnings;

use Pcmd;


# Spawn 24 children and each child won't run a command longer than 90 seconds.

my $cmd = Pcmd->new( "children" => 24 );

$cmd->timeout(90);


# Each child will be given a line from the file in.txt (Or, STDIN, i.e.: cat in.txt | cmd.pl)

$cmd->input_file("in.txt");

OR

$cmd->input_file('-'); # the default is STDIN so you pipe to your script


# The input_delimiter defines $/ inside Pcmd.pm. Used to read your input_file

$cmd->input_delimiter("\n"); # the default. 

# Be very careful not to exit from a child_callback. Run anything that you feel

# might cause an error inside an eval block. Make sure that you always return 

# from child_callback. At this time of writing, Pcmd.pl won't detect that a child

# has suddenly and unexpectedly died. Just behave socially :-)

sub child_callback { # This is your child code, what runs in parallel

my $m = shift;

        chomp $m;

        my $o = your_func($m);

        return $o# Returned value is passed to parent_callback 

}


my @all_answers ;

sub parent_callback {
my $answer = shift;

push @all_answers, @$answer; #that's all I do, save the answer for later.

# or do whatever else with @$answer here ... 

# but carefull as you should return quickly from parent_callback

# so that Pcmd can go on working with its other children

        # in Pcmd.pm i'm not even dealing with a return value for parent_callback.

}


$cmd->callback(\&child_callback, 'child');

$cmd->callback(\&parent_callback,'parent');


$cmd->run(); 

foreach my $a (@all_answers){

print "I am parent. Child said: $a\n";

}


#Any child that timeout, if there was anything in its output stream, it would be store here

my $timedout_answers = $cmd->timedout_answers();

foreach my $a (@$timedout_answers){#each $a is a reference to an array of answers

foreach my $b (@$a){

print "timedout answers:\"" . $b ."\"\n" if ($b);

}

}


Or put URI's one per line in uris.txt and load test a drupal site:


#!/usr/bin/perl


use strict;

use warnings;

use Pcmd;

use HTTP::Tiny;

use Time::HiRes qw(gettimeofday tv_interval);


my $cmd = Pcmd->new( "children" => 24 );

$cmd->timeout(90);

$cmd->input_file("uris.txt");

$cmd->input_delimiter("\n"); # the default.



sub child_callback { # This is your child code, what runs in parallel

    

    my $uri = shift;

    chomp $uri;

    my $start_t = [gettimeofday];

    my $response = HTTP::Tiny->new->get("http://stage.farhadsaberi.com${uri}");

    my $elapsed_t = tv_interval ($start_t, [gettimeofday]);

    $elapsed_t = 'E' unless $response->{success};

    

    my $cacheHit;

    if (exists $response->{'headers'}->{'x-drupal-cache'}){

        $cacheHit = $response->{'headers'}->{'x-drupal-cache'} eq 'HIT' ? 'H' : 'M';

    }else{

        $cacheHit = 'N'; # means that the header is not even there

    }

    

    # format of the return is "H 1.235224 /" where the first letter is an H or M for

    # cache MISS or HIT. We want to count how many hits we get.

    return "$cacheHit $elapsed_t $uri"; # Returned value is passed to parent_callback

}


my @all_answers;

$|=1;

sub parent_callback {

    my $answer = shift;

    

    # ## spinner code if needed

    # ## print substr( "-/|\\", $spin++ % 4, 1 ), "\b";

    # ##

    

    push @all_answers, @$answer; #that's all I do, save the answer for later.

    my $requests= scalar(@all_answers);

    

    my $b = length $requests;

    print "requests: $requests";

    $b += 10;

    print "\b" x $b;

}


$cmd->callback(\&child_callback, 'child');

$cmd->callback(\&parent_callback,'parent');


$cmd->run();

my $tot = 0;

my $tot_c_hits = 0;          # total caches

my $tot_c_none_existant = 0; # No cache info in header

my $tot_c_miss = 0;          # Cache misses

my $tot_t = 0;               # total times


my %error_uris; # keys are the uris and value the count

foreach my $a (@all_answers){

    $a =~ /([HMN])\s(\d+\.?\d*|E)\s(.*)$/;

    my ($c, $t, $uri) = ($1, $2, $3);

    

    # #########

    # possible values:

    # $1: H= Cache Hit, M= Cache Miss, N= No Cache Header

    # $2: E= Error in HTTP respone, x.abc= Decimal number timing the request

    # $3: The requested URI

    # /\/\/\/\/\

    

    ++$tot_c_hits if ($c eq 'H');

    ++$tot_c_miss if ($c eq 'M');

    ++$tot_c_none_existant if ($c eq 'N');

    

    if($t =~ /^\d+\.?\d*/){

        $tot_t += $t;

        $tot++;

    }elsif($t eq 'E'){

        if(exists $error_uris{$uri}){

            $error_uris{$uri} += 1;

        }else{

            $error_uris{$uri} = 1;

        }

    }else{

        print "ERROR Bad child response: \"$a\"\n";

    }

}


my $all_requests_count = scalar(@all_answers);

print "There were $tot successful requests out of $all_requests_count\n";


my $avg = $tot_t / $tot;

print "average response time: $avg\n";

print "Cache Hits: $tot_c_hits. Missing Cache header: $tot_c_none_existant.\n";


print "\nUnsuccessful URI's:\n";

foreach my $k (keys %error_uris){

    print $error_uris{$k} . "  $k\n";

}


#Any child that timeout, if there was anything in its output stream, it would be store here

my $timedout_answers = $cmd->timedout_answers();

foreach my $a (@$timedout_answers){#each $a is a reference to an array of answers

    foreach my $b (@$a){

        print "timedout answers:\"" . $b ."\"\n" if ($b);

    }

}



I hope that there's some use to this for others. When I first started programming in parallel processing it was difficult to get a grasp of what's going on and why it would fail. But after many years of experience and especially many failed attempts it becomes clear. One note about signal handling. I did not explain what I put here and not all signals are handled. That's perhaps the subject of another article, once I would know what I would exactly write :-)

Clearly you will never die or exit within your child callback. This model does not expect any child to ever exit on its own. The parent is the only one with the discretion of terminating a hung child and replacing it with another one with the next item from its input. So, be paranoid and place anything that you think might exit or die your code in an eval block. Pcmd.pm will call the child callback in an eval block as well and if a die or any error has occurred the whole process group is terminated. If a child exits, the parent process will handle SIGCHLD by terminating all other children and exiting itself also. Therefore, never exit from a child process because it is programmed this way for maximum speed.

Persian - music

| 0 Comments | 0 TrackBacks

Favorite Persian Songs
Dariush - EshghYouTube
Dariush - Age Cheshmat Began AreYouTube
Elaheh - SaagiYouTube
Farzin - Eshghe manYouTube
Googoosh - ShekayatYouTube
Habib - Kharchangha-e MordabiYouTube
Habib - Vadie EshghYouTube
Javad Yassari - Sepideh DamYouTube
Golpa - Yade DelYouTube
Moein - Elaheh NazYouTube
Shahyar Ghambari - Ghadeghan (Forbidden)YouTube
Simin Ghanem - Marde ManYouTube
Simin Ghanem - SeebYouTube

POP - music

| 0 Comments | 0 TrackBacks

POP
Al Green - Let's Stay Together   YouTube
Black Eyed Peas - Meet Me Half Way   YouTube
Foreigner - I Want To Know What Love Is   YouTube
jermaine jackson - do what you do   YouTube
Gregory Abbott - Shake you down   YouTube
Hall & Oates - Sara Smile   YouTube
Kenny Rogers - Lady   YouTube
Patti Label - On My Own   YouTube
Patti Austin & James Ingram - Baby Come to me   YouTube
Queen - The Show Must Go On   YouTube
Womack - Tear Drops   YouTube

Chillout - music

| 0 Comments | 0 TrackBacks

Chillout
Above & Beyond pres. OceanLab - Breaking Ties (Flow Mix)   YouTube
Amber - Sexual (Afterlife Chillout Remix)   YouTube
Armin Van Buuren feat Justine Suissa - Burned With Desire (Chillout Mix)   YouTube
Arnica Montana - Sea,Sand And Sun   YouTube
ATB - Let You Go (Schiller Chillout Remix)   YouTube
Bat For Lashes - Laura   YouTube
Bliss - Wish You Were Here   YouTube
Cafe del Mar - Mandalay - Beautiful   YouTube
Café del Mar (LONDON) Kisses,You & I (D.B.)   YouTube
Cafe del Mar - Levitation - More than ever people   YouTube
Cafe del Mar - Love My Soul   YouTube
Deep Dish - I'm in love with a friend   YouTube
Delerium -Silence (Original Song from Karma) Feat. Sarah Mclachlan   YouTube
Enigma - Sadeness   YouTube
Garbage - Milk   YouTube
Goldfrapp - utopia   YouTube
Grace Jones - Strace (Frantic soundtrack)   YouTube
Karunesh - Punjab   YouTube
Underworld - Cups (Salt City Orchestra Mix)   YouTube
Wally Brill - A Loop In Time   YouTube