HTTP Post stream upload a file chunked transfer

Let's upload a file using our own method instead of Perl's LWP. It is really easy to do just about anything with LWP, but uploading a file can be a bit more of a challenge when the HTTP server you're dealing with does not handle multipart POST messages correctly. In that case you have to dig down and write your own uploading code.

Furthermore, I want to upload without first reading the entire file into memory. LWP does support this, though it still forces your upload to be a multipart POST. I'll first explain how LWP handles uploads, what the HTTP multipart message looks like, and what LWP's limitation is. Then, going my own way once again, I will show my Perl program that uses the IO::Socket::SSL module to stream a file over a secure HTTPS connection, so it never runs out of memory no matter how large the file is.

LWP's idea of a file upload is to mimic POSTing an HTML form's data. This is shown in its HTTP::Request::Common documentation:

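use LWP::UserAgent;
use HTTP::Request::Common;    # exports POST

my $ua  = LWP::UserAgent->new;
my $res = $ua->request(POST 'http://www.perl.org/survey.cgi',
                        Content_Type => 'form-data',
                        Content      => [ name   => 'Gisle Aas',
                                          email  => 'gisle@aas.no',
                                          gender => 'M',
                                          born   => '1964',
                                          init   => ["$ENV{HOME}/.profile"],    # file to upload
                                        ]
                      );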
You need to use HTTP::Request::Common, which exports the subroutine POST, and you create your HTTP::Request object as shown above. For LWP to build its multipart/form-data message for the upload, you must specify a content type of 'form-data' as one of the request headers. If you want LWP to open the file and read it in for you automatically, you must place the file name inside an anonymous array reference (the square brackets around $ENV{HOME}/.profile).


Now what if your .profile is 20 TB in size? Let's just assume it is. The code above will run out of memory, because by default LWP reads the entire file into memory before sending it. The solution is to stream the file up instead. One HTTP mechanism for streaming a body without declaring its total length up front is chunked transfer encoding: you tell the HTTP server that the data will arrive in small chunks, so it should keep reading until you signal that there's nothing left to send. You do this with the request header "Transfer-Encoding: chunked".

You instruct LWP to stream up your file by setting this package variable in your code:

$HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1;
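HTTP::Request::Common's documentation says that with this flag set, the request's content becomes a subroutine closure that reads the file on demand and returns it in pieces. The sketch below illustrates that idea in plain core Perl. It is not LWP's actual code: make_file_reader is a hypothetical helper, and the 20,000-byte temporary file merely stands in for a huge upload.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return a closure that yields the next piece of $path on each call and an
# empty string at end of file, so only one piece is in memory at a time.
# make_file_reader is a hypothetical helper, not part of LWP.
sub make_file_reader {
    my ($path, $piece_size) = @_;
    open(my $fh, '<', $path) or die "can't open $path: $!\n";
    binmode $fh;
    return sub {
        my $n = read($fh, my $buf, $piece_size);
        die "read error on $path: $!\n" unless defined $n;
        return $n ? $buf : '';
    };
}

# Demo: a 20,000-byte temporary file stands in for the huge upload.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print $tmp 'x' x 20_000;
close $tmp;

my $next_piece = make_file_reader($path, 8192);
my ($pieces, $total) = (0, 0);
while (length(my $piece = $next_piece->())) {
    $pieces++;
    $total += length $piece;    # an uploader would send $piece here
}
print "$pieces pieces, $total bytes\n";
```

Memory use stays at one 8192-byte piece regardless of file size, which is the whole point of streaming.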
By the way, chunked transfer encoding is part of the HTTP/1.1 specification, so your request must say HTTP/1.1, not 1.0. Either way, chunked or not, LWP will construct a multipart/form-data message that looks something like this:

POST http://www.perl.org/survey.cgi
  Content-Length: 388
  Content-Type: multipart/form-data; boundary="6G+f"

  --6G+f
  Content-Disposition: form-data; name="name"

  Gisle Aas
  --6G+f
  Content-Disposition: form-data; name="email"

  gisle@aas.no
  --6G+f
  Content-Disposition: form-data; name="gender"

  M
  --6G+f
  Content-Disposition: form-data; name="born"

  1964
  --6G+f
  Content-Disposition: form-data; name="init"; filename=".profile"
  Content-Type: text/plain

  PATH=/local/perl/bin:$PATH
  export PATH

  --6G+f--
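To see how the pieces of that message fit together, here is a minimal hand-built version in core Perl. build_form_data is a hypothetical helper written only for illustration (LWP builds this for you). It uses CRLF line endings, as multipart bodies do, and makes no attempt to pick a boundary guaranteed not to appear in the data.

```perl
use strict;
use warnings;

# Assemble a multipart/form-data body from plain fields plus one file part.
# build_form_data is a hypothetical helper; $fields is [name => value, ...].
sub build_form_data {
    my ($boundary, $fields, $file) = @_;
    my $body = '';
    my @f = @$fields;
    while (my ($name, $value) = splice(@f, 0, 2)) {
        $body .= "--$boundary\r\n"
               . qq(Content-Disposition: form-data; name="$name"\r\n\r\n)
               . "$value\r\n";
    }
    $body .= "--$boundary\r\n"
           . sprintf(qq(Content-Disposition: form-data; name="%s"; filename="%s"\r\n),
                     $file->{name}, $file->{filename})
           . "Content-Type: $file->{type}\r\n\r\n"
           . "$file->{data}\r\n";
    return $body . "--$boundary--\r\n";    # closing boundary has trailing --
}

my $body = build_form_data('6G+f',
    [ name => 'Gisle Aas', email => 'gisle@aas.no', gender => 'M', born => '1964' ],
    { name => 'init', filename => '.profile', type => 'text/plain',
      data => "PATH=/local/perl/bin:\$PATH\nexport PATH\n" });

print qq(Content-Type: multipart/form-data; boundary="6G+f"\r\n);
print "Content-Length: ", length($body), "\r\n\r\n", $body;
```

Each part opens with --boundary, carries its own headers, a blank line and its value; the last boundary gets two extra dashes.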

The PATH=/local... stuff is the content of the file .profile. In the case of a chunked transfer, the request carries a "Transfer-Encoding: chunked" header in place of Content-Length, and the whole body (boundary lines, part headers and file data alike) is sent as a series of chunks. Assuming each of the file's two lines and the closing boundary go out as chunks of their own, the end of the body would look like this on the wire:

  1B
  PATH=/local/perl/bin:$PATH

  C
  export PATH

  C

  --6G+f--

  0


So what's the difference? Each chunk starts with a line giving its size in hexadecimal, written without a 0x prefix (1B means 27 bytes: 26 characters plus 1 newline) and ended by \r\n. The server reads that many bytes and stores them as content, then skips the \r\n that closes the chunk. The next size line, C (12 bytes: export PATH plus its newline), tells it how much to read next; the third chunk, also 12 bytes, carries a \r\n and the closing boundary line. This goes on until the client sends a zero-length chunk: a 0, then \r\n, then one final \r\n.
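The framing just described can be sketched in a few lines of core Perl: one function wraps a piece of data as a chunk, and a matching decoder does what the server does on the other end. encode_chunk, last_chunk and decode_chunked are hypothetical names; the decoder ignores chunk extensions and trailers, which a real server must also handle.

```perl
use strict;
use warnings;

# Frame one piece of data as an HTTP/1.1 chunk: hex size, CRLF, data, CRLF.
# length() counts bytes here because the data is a plain byte string.
sub encode_chunk {
    my ($data) = @_;
    return sprintf("%X\r\n", length $data) . $data . "\r\n";
}

# The zero-length chunk plus the final empty line that end the body.
sub last_chunk { return "0\r\n\r\n" }

# Undo the framing as a server would: read a size line, then that many bytes.
sub decode_chunked {
    my ($wire) = @_;
    my $body = '';
    while ($wire =~ s/\A([0-9A-Fa-f]+)\r\n//) {
        my $size = hex $1;
        last if $size == 0;
        $body .= substr($wire, 0, $size, '');    # take $size bytes of data
        $wire =~ s/\A\r\n// or die "missing CRLF after chunk\n";
    }
    return $body;
}

my @pieces = ("PATH=/local/perl/bin:\$PATH\n", "export PATH\n");
my $wire   = join('', map { encode_chunk($_) } @pieces) . last_chunk();

(my $shown = $wire) =~ s/\r\n/\\r\\n\n/g;    # make the CRLFs visible
print $shown;
print "decoded: ", decode_chunked($wire);
```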

Now for the limitation with LWP. I was writing code to upload a file to an HTTP server, and my upload kept failing. The team responsible for the HTTP server told me that my message must have a content type of application/octet-stream. To restrict me further, their HTTP server did not handle multipart/form-data messages properly, so I had to upload the file without wrapping it in a multipart POST: just a straight PUT or POST, NOT multipart as shown above.

But LWP does not give you this flexibility. It forces two things on you. First, your HTTP request message will be multipart. Second, the request's own content type will be "multipart/form-data". In fact, if you don't say "form-data" in your Perl code, the file upload breaks altogether. LWP does let you change the content type of the individual parts of your multipart request, but not of the request as a whole. I could explicitly make .profile's part application/octet-stream, but since the request itself would still be multipart/form-data, the server handling it would reject the entire message; and they had told me they don't support multipart requests properly. For these reasons I concluded that I could not use LWP, and no other module I searched for offered a straightforward solution. So I was forced to write my own uploading code, reading the input file a piece at a time so as not to run out of memory if the input file was very large.

Here's the HTTPS upload code for reference; I don't believe it needs much explanation. When dealing with sockets you have to take lots of precautions and check for errors on every print. Most of that has been left out so that the gist of the chunked transfer upload over HTTPS is easy to see.


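use strict;
use warnings;
use IO::Socket::SSL;

$IO::Socket::SSL::DEBUG = 2;    # SSL debug tracing; lower or remove once it works

# Inputs taken from the command line here; adapt to your own program.
my ($server, $port, $url, $src_file) = @ARGV;
die "usage: $0 server port url-path file\n" unless defined $src_file;
my $auth_string = 'your-auth-token';    # placeholder value
my $verbose     = 2;

exit(upload_file() ? 0 : 1);

sub upload_file {
    open(my $fh, '<', $src_file)
        or die "create: can't open src file $src_file: $!\n";
    binmode $fh;

    my $sock = IO::Socket::SSL->new(PeerAddr => $server,
                                    PeerPort => $port,
                                    Proto    => 'tcp')
        or die "unable to create socket: " . IO::Socket::SSL::errstr() . "\n";
    print "create: socket connected to ${server}:${port}\n" if $verbose > 1;
    binmode $sock;

    # HTTP header lines end in CRLF; an empty line ends the header block.
    $sock->print("POST $url HTTP/1.1\r\n");
    $sock->print("Host: ${server}:${port}\r\n");
    $sock->print("Content-Type: application/octet-stream\r\n");
    $sock->print("farhadsaberi_com_auth: " . $auth_string . "\r\n");
    $sock->print("Any_Other_Header: Some more Header info here for ya ... \r\n");
    $sock->print("Connection: close\r\n");    # lets getlines() below hit EOF
    $sock->print("Transfer-Encoding: chunked\r\n\r\n");

    # Each chunk: size in hex, CRLF, up to 8192 bytes of file data, CRLF.
    my ($filebuf, $bytes);
    while ($bytes = read($fh, $filebuf, 8192)) {
        my $hex = sprintf("%X", $bytes);
        unless ($sock->print($hex . "\r\n" . $filebuf . "\r\n")) {
            warn "Error printing to socket " . IO::Socket::SSL::errstr() . "\n";
            return 0;
        }
    }
    die "create: read error on $src_file: $!\n" unless defined $bytes;

    # A zero-length chunk followed by an empty line ends the body.
    $sock->print("0\r\n\r\n") or return 0;

    # The server closes the connection after its response, so read to EOF.
    my @buf = $sock->getlines();
    close($fh);
    $sock->close();
    print @buf;
    return 1;
}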

@buf now holds the server's response. Print it out and the first line should be an HTTP status line like this:

HTTP/1.1 201

The rest will be whatever response headers the server chose to send back, especially Set-Cookie headers, which you will have to parse correctly and store if you need to reconnect right away. I figured out how to do that by reading LWP's source code, but that's another story. Hope this helps.
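As a starting point for that parsing, here is a minimal sketch that splits the lines returned by getlines() into a status code, headers and Set-Cookie values. parse_response is a hypothetical helper, the sample response is made up for illustration, and it does not handle folded header lines. For full cookie rules (domains, paths, expiry), LWP's HTTP::Cookies module is the more complete tool.

```perl
use strict;
use warnings;

# Split raw response lines into status code, headers and Set-Cookie values.
# Header names are compared case-insensitively; repeated headers other than
# Set-Cookie keep only their last value in %headers.
sub parse_response {
    my @lines = @_;
    my $status_line = shift(@lines) // '';
    $status_line =~ s/\r?\n\z//;
    my ($code) = $status_line =~ m{\AHTTP/\d\.\d\s+(\d{3})}
        or die "not an HTTP status line: $status_line\n";
    my (%headers, @cookies);
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';                      # empty line ends the headers
        my ($name, $value) = $line =~ /\A([^:]+):\s*(.*)\z/ or next;
        push @cookies, $value if lc $name eq 'set-cookie';
        $headers{lc $name} = $value;
    }
    return ($code, \%headers, \@cookies);
}

# A made-up response, shaped like what getlines() returns.
my @buf = ("HTTP/1.1 201 Created\r\n",
           "Set-Cookie: session=abc123; Path=/\r\n",
           "Content-Length: 0\r\n",
           "\r\n");
my ($code, $headers, $cookies) = parse_response(@buf);
print "status $code\n";
print "cookie: $_\n" for @$cookies;
```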


This page contains a single entry by Farhad Saberi published on January 24, 2012 9:30 PM.
