Perl FTP Retrieve All Files and Sub Directories

In this article I want to tackle everyone's old friend that refuses to die: the File Transfer Protocol, or FTP. I personally believe FTP will be around forever, especially on internal networks isolated from any possible outside intrusion. FTP servers are not feature rich. You can do basic file transfers and rename or move files. But one thing everyone tries to do at some point is retrieve all the files and directories under a remote directory, essentially copying the remote file system tree over to the local machine. Borrowing from the SCP world, we want to achieve the FTP equivalent of this:

scp  -r  remote_host.domain.com:/*  /local_directory

In my last article, Perl Find file with BFS and DFS, I explained the two algorithms we can use to traverse a directory structure. Combining the Breadth-First Search (BFS) algorithm with Perl's awesome Net::FTP::File module, we can do just that. The module is available from CPAN.

Here's a picture of the traversal of the same directory structure as in the previous article, but this time it shows the traversal in BFS order.
[Figure: BFS traversal order of the example directory tree]
Let's show the code that will do all the work and then explain it.

#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP::File;

my $ftp = Net::FTP->new('10.1.10.50', Debug => 0)
    or die "Cannot connect: $@\n";
$ftp->login("anonymous", "anonymous")
    or die "FTP: Cannot login: " . $ftp->message . "\n";
$ftp->binary;    # binary (image) mode so files such as image.jpg arrive intact

BFS('1'); # or '.' as the remote root directory where the crawling begins
$ftp->quit;
exit(0);

sub BFS{
     my $root=shift;
     my @queue = ($root);

     mkdir_locally($root);
     while (scalar(@queue) > 0 ){
          my @tmp_queue; 
          foreach my $remotedir (@queue){
                print "$remotedir\n";
                my($remotefiles,$remotedirs) = ftp_dirs_files($remotedir);
                mkdir_locally($_) foreach @$remotedirs;
                ftp_get_file($_)  foreach @$remotefiles;
                push @tmp_queue,@$remotedirs;
          }
          @queue = @tmp_queue;
     }
}

sub mkdir_locally{
    my $local_dir = shift;
    $local_dir =~ s/\/$//;
    my @dirs = split(/\//,$local_dir);
    my $dir;
    for(my $i=0; $i < scalar(@dirs) ; $i++) {
          $dir .= $dirs[$i]."/"; 
          unless(-d $dir){
                mkdir($dir) or die("can't mkdir $dir: $!");
          }
    }
}

sub ftp_dirs_files{
     my $dir=shift;
     my $dirinfo_href= $ftp->dir_hashref($dir);
     my (@remotedirs, @remotefiles);
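     # keys() returns entries in no particular order, so sibling directories
     # are visited in whatever order the hash happens to produce.
     foreach my $remotefile (keys %$dirinfo_href){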
          next if ($remotefile =~ /^\.$/ || $remotefile =~ /^\.\.$/);
          $remotefile = "${dir}/${remotefile}";
          if ($ftp->isfile($remotefile)){
                push(@remotefiles,$remotefile);
          }elsif($ftp->isdir($remotefile)){
                push(@remotedirs,$remotefile);
          }
     }
     #return lists of remote files names and directories names
     return(\@remotefiles,\@remotedirs); 
}

sub ftp_get_file{
    my $remotefile = shift;
    print "remotefile: $remotefile\n";
    $ftp->get($remotefile, $remotefile)
        or warn "FTP: cannot get $remotefile: " . $ftp->message;
}

There are a few subroutines here, so I'll explain each briefly. First we connect to the FTP server, log in, and immediately call BFS('1'). The argument '1' is the remote directory named '1' that we want to fetch, along with all of its subdirectories and files. If you want to retrieve everything starting from the directory you land in after logging in, call BFS('.') instead. In Unix, a dot means "the current directory."

The sub mkdir_locally() takes a single directory path as its argument, which may look like '1/2/3'. Since you cannot create directory 2 before first creating 1, we split the path on '/' and create each part in order from parent to child, skipping any directory that already exists.
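If you'd rather not split the path by hand, Perl's core File::Path module can do the parent-to-child creation for you. Here is a minimal sketch of an alternative mkdir_locally() built on make_path(), assuming relative paths like the ones BFS() produces. It is just one way to do it, not what the script above uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);

# Alternative mkdir_locally(): make_path() creates '1', then '1/2', then
# '1/2/3', skipping any that already exist, just like the manual loop.
sub mkdir_locally {
    my $local_dir = shift;
    # The error option collects failures in $err instead of letting
    # make_path() report them itself.
    make_path($local_dir, { error => \my $err });
    if (@$err) {
        for my $diag (@$err) {
            my ($dir, $message) = %$diag;
            warn "can't mkdir $dir: $message\n";
        }
        die "mkdir_locally($local_dir) failed\n";
    }
}

mkdir_locally('1/2/3');
print "1/2/3 exists\n" if -d '1/2/3';
```

Calling it a second time on the same path is harmless, because make_path() leaves existing directories alone.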

The sub ftp_dirs_files() takes a remote directory name as its argument. It fills two arrays, one with the names of the files and one with the names of the directories found under that remote directory, skipping the '.' and '..' entries. You might have noticed that I always construct a full path so that I never have to chdir on the server. The two arrays are returned to BFS(), which creates each directory in @remotedirs locally by calling mkdir_locally() and calls ftp_get_file() on each entry in @remotefiles to perform the actual "ftp get" that retrieves the file. BFS() then collects the directories into the queue for the next level down.
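ftp_dirs_files() relies on Net::FTP::File's isfile() and isdir() to classify each entry. If that module isn't available, one common workaround with plain Net::FTP is to try to CWD into the path and see whether the server accepts it. The sketch below is a hypothetical helper, is_remote_dir(), and it assumes the server refuses CWD into a regular file, which most servers do but the protocol doesn't promise. It also costs a few extra round trips per entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of Net::FTP or Net::FTP::File.
# Assumes the server answers CWD into a regular file with an error.
sub is_remote_dir {
    my ($ftp, $path) = @_;
    my $here = $ftp->pwd;                        # remember where we are
    defined $here or die "FTP: PWD failed: " . $ftp->message;
    return 0 unless $ftp->cwd($path);            # CWD refused: treat as a file
    $ftp->cwd($here)                             # step back so relative paths
        or die "FTP: cannot return to $here: " . $ftp->message;  # keep working
    return 1;
}
```

You would call is_remote_dir($ftp, $remotefile) where ftp_dirs_files() calls isdir(), and treat everything else as a file.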

At every step of the descent you can decide what to do with each remote directory and file you encounter. You have full control over what gets performed, rather than being at the mercy of bundled FTP client software.
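As one example of that control, here is a sketch of a hypothetical ftp_get_file_if_changed() that skips files already present locally with the same byte count, which makes re-running the crawl after an interruption much cheaper. It uses Net::FTP's size() and get(). It assumes the connection is in binary mode (call $ftp->binary after logging in) so the server's size matches the bytes on disk, and it treats equal sizes as equal contents, which is only a heuristic.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of ftp_get_file(). Like the original, it saves the
# file under the same relative path it has on the server.
sub ftp_get_file_if_changed {
    my ($ftp, $remotefile) = @_;

    # Some servers don't support SIZE; size() then returns undef and we
    # fall through to a normal download.
    my $remote_size = $ftp->size($remotefile);
    if (defined $remote_size && -f $remotefile && (-s _) == $remote_size) {
        print "skip: $remotefile\n";
        return 1;
    }

    print "remotefile: $remotefile\n";
    my $ok = $ftp->get($remotefile, $remotefile);
    warn "FTP: cannot get $remotefile: " . $ftp->message unless $ok;
    return $ok;
}
```

In BFS() you would call ftp_get_file_if_changed($ftp, $_) in place of ftp_get_file($_).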

Just as in the previous article, we create directories whose names represent the tree structure above, this time on the FTP server, and then run our program ftp.pl against them to demonstrate the BFS traversal:

mkdir  -p  1/2/5/11  1/2/6/12  1/2/6/13  1/2/7
mkdir  -p  1/3/8  1/3/9/14
mkdir  -p  1/4/10/15  1/4/10/16  1/4/10/17  1/4/10/18
mkdir  -p  1/4/10/17/19/21  1/4/10/17/20

Running the program produces output that shows the BFS nature of the traversal:
 $ ./ftp.pl
1
1/4
1/3
1/2
1/4/10
1/3/8
1/3/9
1/2/6
1/2/7
1/2/5
1/4/10/18
1/4/10/16
1/4/10/17
1/4/10/15
1/3/9/14
remotefile: 1/3/9/14/image.jpg
1/2/6/13
1/2/6/12
1/2/5/11
1/4/10/17/19
1/4/10/17/20
1/4/10/17/19/21

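Refer back to the previous article for a detailed explanation of this output. In short, BFS visits every directory at one depth before it visits any directory one level deeper, so all siblings are visited before any of their children. The order among siblings (1/4 before 1/3 before 1/2, for example) is simply the order in which Perl returns the keys of the hash from dir_hashref(), which is not sorted.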
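To see the level-by-level order without an FTP server, the following sketch runs the same queue-swapping loop as BFS() over an in-memory copy of the tree built by the mkdir -p commands above. The names %children and bfs_order() are made up for this illustration, and unlike the hash from dir_hashref() it sorts each directory's children, so the order within a level is repeatable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Leaf paths from the mkdir -p commands above.
my @paths = qw(
    1/2/5/11 1/2/6/12 1/2/6/13 1/2/7
    1/3/8 1/3/9/14
    1/4/10/15 1/4/10/16 1/4/10/17 1/4/10/18
    1/4/10/17/19/21 1/4/10/17/20
);

# Build parent => { child => 1 }, creating every intermediate level the
# way mkdir -p does.
my %children;
for my $path (@paths) {
    my @parts = split m{/}, $path;
    for my $i (1 .. $#parts) {
        my $parent = join '/', @parts[0 .. $i - 1];
        my $child  = join '/', @parts[0 .. $i];
        $children{$parent}{$child} = 1;
    }
}

# Same level-by-level queue swap as BFS(), but children are sorted.
sub bfs_order {
    my $root  = shift;
    my @queue = ($root);
    my @order;
    while (@queue) {
        my @next_level;
        for my $dir (@queue) {
            push @order, $dir;
            push @next_level, sort keys %{ $children{$dir} || {} };
        }
        @queue = @next_level;
    }
    return @order;
}

print "$_\n" for bfs_order('1');
```

It prints 1, then 1/2 1/3 1/4, then 1/2/5 1/2/6 1/2/7 1/3/8 1/3/9 1/4/10, and so on down to 1/4/10/17/19/21: the same 21 directories, level for level, as the ftp.pl output, just in a different order within each level.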

I put a file, image.jpg, under 1/3/9/14 so that the print statement in ftp_get_file() shows when its retrieval happens. I hope you have found this article useful.

About this Entry

This page contains a single entry by Farhad Saberi published on October 27, 2010 9:41 PM.
