Uuencoding transforms binary data into a text-based form suitable for delivery through a text-only mailer. The uu in uuencode comes from the phrase UNIX-to-UNIX. Prior to the general availability of the Internet, UNIX computers would store and forward mail through telephone connections. Many of these connections would support only 7-bit ASCII. Uuencoding was a way for users to send binary files through this early mail system. Even though mailers can now transport binary attachments, they still do so by using a binary-to-text encoding, typically the Base64 encoding defined by MIME.
Uuencode is a filter; it reads from standard input and writes to standard output. For example, to uuencode the file test.bin, use the command
uuencode test.bin < test.bin > test.bin.uue
Why does test.bin appear both as a command-line argument and as redirected input? Because the argument is used to create a header line within the uuencoded file, naming the file to be recreated by the uudecode command. Also, the mode (protection bits) of test.bin is incorporated into the header so that it can be reproduced at the destination. Clearly, the files specified by the command-line argument and the input redirection need not be the same. Regardless of whether the file specified by the command-line argument exists, the mode in the header line is taken from the mode of standard input.
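As a rough illustration of how such a header line might be assembled, here is a minimal Python sketch. The function names format_header and header_from_fd are hypothetical and are not part of any uuencode program; the sketch only assumes that the mode is read from the open input descriptor, as described above.

```python
import os
import stat


def format_header(name, st_mode):
    """Build a 'begin' line from a file name and a stat-style mode word."""
    # Keep only the user/group/other permission bits and print them in octal.
    perms = stat.S_IMODE(st_mode) & 0o777
    return f"begin {perms:o} {name}"


def header_from_fd(name, fd=0):
    """Take the mode from an open descriptor (0 is standard input)."""
    # The name comes from the caller; only the mode comes from the descriptor.
    return format_header(name, os.fstat(fd).st_mode)
```

With a mode of 0o644 and the name x.html, format_header produces the line "begin 644 x.html".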
Continuing with the example, after test.bin.uue has been transmitted (through a mailer, via FTP, or even using a simple mv command) the command
uudecode test.bin.uue
reproduces test.bin in the current directory with the mode specified by the header.
A mailer will often prepend and append additional lines to mail messages. Most uudecode programs therefore scan the source for the uuencode header line and ignore anything before it. After uudecoding the file, some uudecode programs scan for an additional header line, allowing the user to encode several files into one mail message.
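The scanning behavior can be sketched in a few lines of Python. This is an illustration under assumptions, not the logic of any particular uudecode program: the name find_uu_sections and the regular expression are invented for the example, and the sketch simply skips every line that is not inside a begin/end pair.

```python
import re

# Header line: keyword, octal mode, file name. The keyword is matched
# without regard to case.
BEGIN_RE = re.compile(r"^begin\s+([0-7]{3,4})\s+(\S.*)$", re.IGNORECASE)


def find_uu_sections(lines):
    """Yield (mode, name, data_lines) for each begin/end pair in lines."""
    it = iter(lines)
    for line in it:
        m = BEGIN_RE.match(line.rstrip("\r\n"))
        if not m:
            continue  # mail headers, signatures, and other surrounding text
        mode, name = int(m.group(1), 8), m.group(2)
        body = []
        for data in it:
            if data.strip().lower() == "end":
                break
            # Strip only the line terminator; trailing spaces may be data.
            body.append(data.rstrip("\r\n"))
        yield mode, name, body
```

Because the function keeps consuming the same iterator after each end line, a second header later in the message is found in the same pass.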
A uuencoded file contains the following sections:
begin <mode> <file-name>
full-data
residual-data
<empty line>
end
Each of these sections consists of exactly one line, except for the full-data section, which is zero or more lines, and the residual-data section, which is zero or one line. The keywords begin and end are case insensitive. <mode> is an octal number, with each octal digit representing the 3-bit protection code for user, group, and other respectively, as used by chmod. Lines are delimited by the line-end convention for the file system (that is, <cr><nl> for DOS, <nl> for UNIX, binary line-length prefix for VMS, etc.).
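To make the meaning of <mode> concrete, the following Python sketch expands the octal digits into the familiar rwx notation. The function name describe_mode is hypothetical and exists only for this illustration.

```python
def describe_mode(mode_text):
    """Expand an octal mode such as '644' into rwx form ('rw-r--r--')."""
    bits = int(mode_text, 8)
    out = []
    for shift in (6, 3, 0):  # user, group, other
        triple = (bits >> shift) & 0o7
        out.append(
            ("r" if triple & 4 else "-")
            + ("w" if triple & 2 else "-")
            + ("x" if triple & 1 else "-")
        )
    return "".join(out)
```

For example, describe_mode("644") gives "rw-r--r--", meaning read and write for the user and read-only for group and other.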
Each line of uuencoded data consists of an origin-32 line length followed by 4-character tuples, each containing a modified origin-32 representation of four numbers in the range 0 to 63. Each 4-tuple represents 3 bytes of binary data. The line length is the number of bytes of binary data contained in the line, not the number of 4-tuples or of characters. This distinction matters only on the residual-data line; every full-data line contains 60 data characters, representing 60 * 3 / 4 = 45 bytes of data. Thus the length field of a full-data line is always 45, transmitted as the character with code 45 + 32 = 77.
(Actually, since each line contains its own length, the format shown above is a simplification showing the usual format of uuencoded files.)
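The line-length arithmetic can be checked with a short Python sketch. The names FULL_LINE_BYTES, line_layout, and split_file are made up for this illustration, and the sketch assumes the usual convention of 45 data bytes per full line.

```python
FULL_LINE_BYTES = 45  # conventional size of a full-data line


def line_layout(nbytes):
    """Return (length character, number of data characters) for one line."""
    if not 1 <= nbytes <= FULL_LINE_BYTES:
        raise ValueError("a data line carries 1 to 45 bytes")
    length_char = chr(32 + nbytes)        # origin-32 byte count
    data_chars = 4 * ((nbytes + 2) // 3)  # one 4-tuple per 3 bytes, rounded up
    return length_char, data_chars


def split_file(total_bytes):
    """Return (number of full-data lines, bytes on the residual-data line)."""
    return divmod(total_bytes, FULL_LINE_BYTES)
```

Here line_layout(45) returns ("M", 60), and a 217-byte input splits into four full lines plus a 37-byte residual line whose length character is "E".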
Origin-32 is used because 32 (the space character) is the first "printable" character in the ASCII collating sequence. The line length in a full-data line shows as the character 'M' because ord('M') = 45 + ord(' '). In origin-32, the binary value zero would be encoded as 32 (the space character). Since some mailers truncate trailing spaces and others replace internal spaces by tabs and spaces, data would be lost. Modified origin-32 instead uses 32+64 = 96 (the "`" character) to represent zero.
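A minimal Python sketch of the character mapping follows. The names enc6 and dec6 are hypothetical. The decoder masks with 0x3F so that both the space character and "`" decode to zero, which is one common way to accept either form.

```python
def enc6(value):
    """Map a 6-bit value (0..63) to its modified origin-32 character."""
    if not 0 <= value <= 63:
        raise ValueError("value must fit in 6 bits")
    # Zero becomes '`' (code 96 = 32 + 64) instead of a fragile space.
    return "`" if value == 0 else chr(32 + value)


def dec6(char):
    """Map a character back to its 6-bit value; ' ' and '`' both give 0."""
    return (ord(char) - 32) & 0x3F
```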
It takes 8 bits to represent 256 values and 6 bits to represent 64 values. There are 24 bits in 3 bytes, requiring four 6-bit values. For example, the 3-byte value 0x3C4854 is represented in binary as 0011 1100 0100 1000 0101 0100. Taken as 4 groups of 6-bit numbers, this is 001111 000100 100001 010100, or in decimal, 15, 4, 33, 20. In origin-32, these four decimal numbers become 47, 36, 65, 52, which are the ASCII characters /, $, A, and 4.
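The worked example can be reproduced with a few lines of Python. The function name split_3_to_4 is invented for this sketch.

```python
def split_3_to_4(b0, b1, b2):
    """Split three 8-bit bytes into four 6-bit values, high bits first."""
    word = (b0 << 16) | (b1 << 8) | b2  # 24 bits in one integer
    return [(word >> shift) & 0x3F for shift in (18, 12, 6, 0)]


values = split_3_to_4(0x3C, 0x48, 0x54)  # [15, 4, 33, 20]
origin32 = [v + 32 for v in values]      # [47, 36, 65, 52]
text = "".join(chr(c) for c in origin32) # "/$A4"
```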
If there are fewer than 3 bytes left to encode, as will happen two times out of three in a file of arbitrary length, one or two bytes of zeros are supplied and 3-to-4 encoding takes place as usual. However, the transmitted length for the data line counts just the residual data bytes; the extra zero bytes supplied at encoding time are deleted at decoding time.
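Putting the length character, the 3-to-4 conversion, and the zero padding together, a single data line could be encoded roughly as follows. This is a sketch under the assumptions stated in the text (zero bytes as padding, "`" for the value zero); the name uu_encode_line is hypothetical, and real encoders may differ in detail.

```python
def uu_encode_line(chunk):
    """Encode 1 to 45 bytes as one uuencoded data line (no line terminator)."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a data line carries 1 to 45 bytes")
    # Pad with zero bytes up to a multiple of 3; the length character still
    # records only the real byte count.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    out = [chr(32 + len(chunk))]
    for i in range(0, len(padded), 3):
        word = int.from_bytes(padded[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            value = (word >> shift) & 0x3F
            out.append("`" if value == 0 else chr(32 + value))
    return "".join(out)
```

For instance, the single byte "a" encodes as the line !80``, where "!" is the length 1 and the two trailing "`" characters come entirely from the padding.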
The following example uses text rather than binary because it's easier to show here. The six-line file x.html contains
<HTML>
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META NAME="GENERATOR" CONTENT="Mozilla/4.02 [en] (WinNT; I) [Netscape]">
   <TITLE>Process Synchronization</TITLE>
</HEAD>
When uuencoded, this becomes
begin 644 x.html
M/$A434P^"CQ(14%$/@H@(" \345402!(5%10+45154E6/2)#;VYT96YT+51Y
M<&4B($-/3E1%3E0](G1E>'0O:'1M;#L@8VAA<G-E=#UI<V\M.#@U.2TQ(CX*
M(" @/$U%5$$@3D%-13TB1T5.15)!5$]2(B!#3TY414Y4/2)-;WII;&QA+S0N
M,#(@6V5N72 H5VEN3E0[($DI(%M.971S8V%P95TB/@H@(" \5$E43$4^4')O
E8V5S<R!3>6YC:')O;FEZ871I;VX\+U1)5$Q%/@H\+TA%040^"DE4
end
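To see how the length character governs decoding, the residual-data line of the example can be decoded with a small Python sketch. The names uu_decode_line and LAST_LINE are invented for this illustration. The decoder converts every 4-tuple and then keeps only as many bytes as the length character announces, so whatever padding the encoder appended is discarded.

```python
def uu_decode_line(line):
    """Decode one uuencoded data line into bytes."""
    nbytes = (ord(line[0]) - 32) & 0x3F
    data = line[1:]
    out = bytearray()
    for i in range(0, 4 * ((nbytes + 2) // 3), 4):
        # Re-pad with spaces in case a mailer trimmed trailing blanks.
        quad = data[i:i + 4].ljust(4)
        word = 0
        for char in quad:
            word = (word << 6) | ((ord(char) - 32) & 0x3F)
        out += word.to_bytes(3, "big")
    return bytes(out[:nbytes])  # drop the padding bytes


# The last data line of the example; 'E' announces 37 bytes.
LAST_LINE = r'''E8V5S<R!3>6YC:')O;FEZ871I;VX\+U1)5$Q%/@H\+TA%040^"DE4'''
tail = uu_decode_line(LAST_LINE)
```

The 13 four-character tuples on that line carry 39 bytes, of which only the first 37 are kept.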