
Serialising and unserialising data for storage and transfer in Perl

Archive - Originally posted on "The Horse's Mouth" - 2012-02-28 22:09:25 - Graham Ellis

If you want to save a series of strings to a file, or pass them over a network connection, you'll need to delimit them - add in a special character so that the receiving / reading program will know where one piece of data ends and the next starts. The problem comes when the chosen special character can also appear within a string: the receiver then has no way to tell which occurrence marks the real end. The process of transforming the strings so that they can be transferred / saved without that ambiguity is known as serialisation.
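To make the delimiter problem concrete, here is a minimal sketch. The field values are made up for illustration, and the comma delimiter is just an assumption; any character that can turn up in the data would fail in the same way.

```perl
use strict;
use warnings;

# Hypothetical sample data: the second field contains the delimiter itself.
my @fields = ("George", "apples,pears", "42");

# Naive serialisation: join on a comma with no escaping at all.
my $line = join(",", @fields);

# The reader splits on the same delimiter and gets too many fields back.
my @recovered = split(/,/, $line);

printf "sent %d fields, received %d\n", scalar(@fields), scalar(@recovered);
# prints: sent 3 fields, received 4
```

The reader cannot tell the comma inside "apples,pears" from the commas that separate the fields, which is exactly the ambiguity that serialisation has to remove.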

One of the common ways of serialising data is to add \ characters in front of "specials", but that becomes a bit of an issue if the string includes control codes. Another way is to use "URL encoding" where special characters are replaced by a % character followed by 2 hex digits. That works for any ASCII character, and also has the advantages that it's a familiar format to many people, that words and numbers are still readable, and that the stored / transferred string isn't that much longer than the original. Another requirement of a serialisation algorithm is that it must be easily reversed - again, that's true for URL encoding.

I was discussing how to do URL encoding and decoding in Perl today (on a Tcl course during a coffee break, but that's another story) and my complete demonstration is [here]. And, being Perl, both encoding and decoding are very short - one line - algorithms.

Encoding:
  $serialised =~ s/([^\w])/"%".sprintf("%02x",ord($1))/eg;
Decoding:
  $reverted =~ s/%(..)/pack("C",hex($1))/eg;


Running that code within a sample program:

  munchkin:ftcl grahamellis$ perl itsperl
  'We love C++!' said George
  %27We%20love%20C%2b%2b%21%27%20said%20George%0a
  'We love C++!' said George
  munchkin:ftcl grahamellis$
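
Putting the two one-liners to work on a series of strings might look something like the sketch below. It is not the demonstration program linked above. The sub names url_encode and url_decode, the sample strings, and the choice of newline as the record delimiter are all my assumptions for illustration. It also assumes plain byte / ASCII strings, since a character with a code above 255 would need more than 2 hex digits.

```perl
use strict;
use warnings;

# Hypothetical helper names wrapping the two one-liners from the post.
sub url_encode {
    my ($text) = @_;
    # Every non-word character (including % and newline) becomes %xx.
    $text =~ s/([^\w])/"%".sprintf("%02x",ord($1))/eg;
    return $text;
}

sub url_decode {
    my ($text) = @_;
    # Each % followed by two characters is turned back into one byte.
    $text =~ s/%(..)/pack("C",hex($1))/eg;
    return $text;
}

# Made-up sample strings containing quotes, %, commas and newlines.
my @strings = ("'We love C++!' said George\n", "100% sure", "a,b\nc");

# Newline is safe as a delimiter because encoding turns any newline
# inside the data into %0a.
my $stored = join("\n", map { url_encode($_) } @strings);

# Reading back: split on the delimiter first, then decode each piece.
# The -1 limit keeps trailing empty strings rather than dropping them.
my @restored = map { url_decode($_) } split(/\n/, $stored, -1);

my $same = @restored == @strings;
for my $i (0 .. $#strings) {
    $same = 0 unless $same && $restored[$i] eq $strings[$i];
}
print "round trip ", ($same ? "ok" : "FAILED"), "\n";
```

Because % is itself a non-word character it is stored as %25, so a literal % in the data can never be mistaken for the start of an encoded character when decoding.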