Follow Symbolic Link Tree pstree

| 0 Comments | 0 TrackBacks
Just like the native unix command pstree, or ptree, for processes, l want to follow a symbolic link and display its tree. A C developer once complained to me that he's lost in his home directory. His setup was such that every time he created a new version of any source file, a new symbolic link was made. I've no idea how but he told me that the links were so deep that he could no longer find the actual file he was opening. This was three years prior to this writing and this simple problem stuck to my mind. I can't believe that there isn't a native Unix command that follows each link and displays where it points to, kind of like pstree.

I have to admit that the script is simple and yet it took me a while to write it. Much longer than I thought so I must be a bad coder. But I think that I have a few legitimate complaints which contributed to my slow down. Here are my complaints which all evolve around Perl functions that deal with file and directories but what if they are used on a symbolic link instead. Consider the following files:

mac:links fsaberi$ pwd
/Users/fsaberi/tmp/links
mac:links fsaberi$ ls -l
total 24
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 13:41 a -> b
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 13:41 b -> c
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 13:41 c -> d

notice how file d doesn't exist. If I were to use Perl's -e file existence test on file a, it would return false because it will follow the link to b then to c and then when it seed that d doesn't exist it will say that a also doesn't exist. To me that's a wrongful interpretation of the existence of "a" itself.

If I were to use module Cwd's abs_path function on a, it would tell me that its absolute path is /Users/fsaberi/tmp/links/d. It followed it all the way. What I was expecting is the absolute path of "a" itself: /Users/fsaberi/tmp/links/a. The symbolic Link a is a valid unix file and it does have an absolute path as well. I wish there were options to Perl saying "don't follow symlinks."

Now consider this, where v circularly points back to itself:

04372B-fsaberi:links fsaberi$ ls -l
total 40
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:11 v -> w
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:11 w -> x
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:12 x -> y
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:12 y -> z
lrwxr-xr-x  1 fsaberi  staff  1 Nov 21 14:13 z -> v

If I were to open v, it would fail. It doesn't just look at v itself and try to open it allowing me to read "w" which is the string content of the file "v." I guess it's by design that you have to first check if it's a link with -l and then use readlink instead. Also, -e on the circular link v would return false! But that's a philosophical question. Does "v" not exist? In my opinion it does exist and -e should return true. Whether it is a symlink who down the road badly points to itself or not was not part of the original question. In my understanding, -e should return true if there exists an inode for that path. Perl doesn't agree with me though.

Even -d can trick you. If you test a link that eventually points to a directory, -d will return true. Again it follows the link. I expected it to say that this file is not a directory.

The program sltree.pl named after symbolic link tree outputs this on file v:

mac:links fsaberi$ sltree.pl /Users/fsaberi/tmp/links/v
/Users/fsaberi/tmp/links/v
  -> /Users/fsaberi/tmp/links/w
   -> /Users/fsaberi/tmp/links/x
    -> /Users/fsaberi/tmp/links/y
     -> /Users/fsaberi/tmp/links/z
      -> /Users/fsaberi/tmp/links/v (circular link)

or making the links a bit more complicated. 

mac:links fsaberi$ ls -l
total 40
lrwxr-xr-x  1 fsaberi  staff   1 Nov 21 14:11 v -> w
lrwxr-xr-x  1 fsaberi  staff   1 Nov 21 14:11 w -> x
lrwxr-xr-x  1 fsaberi  staff  21 Nov 21 14:46 x -> ../../../../tmp/ooo/o
lrwxr-xr-x  1 fsaberi  staff   1 Nov 21 14:12 y -> z
lrwxr-xr-x  1 fsaberi  staff  12 Nov 21 14:49 z -> /usr/bin/dig
mac:links fsaberi$ ls -l  ../../../../tmp/ooo
total 8
lrwxr-xr-x  1 fsaberi  wheel  34 Nov 21 14:48 o -> ../../../Users/fsaberi/tmp/links/y

mac:links fsaberi$ perl ../../Documents/workspace/sltree/sltree.pl ../../tmp/links/v
/Users/fsaberi/tmp/links/v
  -> /Users/fsaberi/tmp/links/w
   -> /Users/fsaberi/tmp/links/x
    -> /tmp/ooo/o
     -> /Users/fsaberi/tmp/links/y
      -> /Users/fsaberi/tmp/links/z
       -> /usr/bin/dig


where link "v" eventually points to the program dig. Here is the source:

#!/usr/bin/perl

use strict; use warnings;

use Cwd;

use File::Basename;


my $start_node = $ARGV[0];

unless($start_node){

print "Must provide a file as argument: $0 <file>\n"; exit;

}


my $tree = DFS_G($start_node);

print $tree->[0] ."\n";

for(my $i=1; $i < scalar(@$tree) ; $i++){

print ' ' x $i;

print " -> ". $tree->[$i] ."\n";

}


# normalizeNewLink_relative_to_OrigLink, returns an absolute path with the . and ..'s

# removed. If path is not absolute, then it will be relative to another path. Meaning

# that this other path will be prepended to the original path first.

# If that "relative to" path is not given, then it will be relative to current working

# directory. (ugliest function name ever! sorry about that.)

sub normalizeNewLink_relative_to_OrigLink{

my $newlink = shift;

my $relativeTo = shift || undef;

my $origlink = $newlink; # store orig path in case an error message needs it.


unless(defined $relativeTo){

$relativeTo = getcwd();

}

my ($filename, $directories, $suffix) ;

if(-l $relativeTo){

($filename, $directories, $suffix) = fileparse($relativeTo);

$relativeTo = $directories;

}

$relativeTo =~ s/\/$//;


# this is how i have to get abs_path so that perl doesn't fail in case it 

        # is a bad link.

unless($newlink =~ /^\//){

($filename, $directories, $suffix) = fileparse($newlink);

$filename .= $suffix;

$newlink = $relativeTo . '/'. $directories . $filename;

}


if($newlink eq '/'){

return $newlink;

}

my @dirs = split('/', $newlink);

my @newlink;

for(my $i = 0 ; $i < scalar(@dirs); $i++){

if ($dirs[$i] eq '.' || $dirs[$i] eq ''){

next;

}elsif($dirs[$i] eq '..'){

pop @newlink if (scalar(@newlink) > 0);

}else{

push @newlink, $dirs[$i];

}

}

my $l = scalar(@newlink);

my $p;

if($l == 0){

return "/"; # we were given something weird like /.././../../

}elsif($l == 1){

$p = '/' . $newlink[0];

}elsif($l > 1){

$newlink[0] = '/'. $newlink[0] if ($newlink[0] !~ /^\\/);

$p = join '/', @newlink;

}else{

die "Error: Could not normalize $origlink\n";

}

return $p;

}


sub DFS_G{

    my $node = shift;

    my %visited;

    my @tree;


    $node = normalizeNewLink_relative_to_OrigLink($node);

    push @tree, $node;

    $visited{$node} =1;


    my $newNode;

while(1){

   if(-l $node){

$newNode = readlink($node);

$newNode normalizeNewLink_relative_to_OrigLink($newNode,$node);

if(exists $visited{$newNode}){

push @tree, $newNode;

$tree[ $#tree ] .= ' (circular link)';

return \@tree;

}else{

$visited{$newNode} = 1;

push @tree, $newNode;

}

    }else{

# it is not a link. It is either a directory or some file. 

                # Figure out if it exists. I can use -e safely if it is NOT

                # a symbolic link, which it isn't here.

unless(-e $tree[ $#tree ]){

$tree[ $#tree ] .= ' (broken link)';

return \@tree;

    }

    $node = $newNode;

}

}


I think that what I first tried to mean by the function name DFS_G was depth first search graph :-) well, this is an extremely simplified form of a DFS on a graph. Every node has exactly one child. And we stop at a broken link, or when we first see a node already visited (circular link). So there is no need for gray and black coloring neither as is the technique in a graph traversal.


I have tested the script somewhat but you never know what can go wrong so if ever anyone uses this and finds a bug please do let me know.

No TrackBacks

TrackBack URL: http://www.farhadsaberi.com/cgi-bin/mt/mt-tb.cgi/19

Leave a comment

About this Entry

This page contains a single entry by Farhad Saberi published on November 20, 2012 9:32 PM.

Non Blocking Multiple Parallel Processing was the previous entry in this blog.

Hierarchical data representation in SQL Netsted Sets is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.