UP Paper 101 US-M-ODOWN
Reducing the Length of Shannon-Fano-Elias Codes and Shannon-Fano Codes
Ruan,XiaoyuNorth Dakota State University
Katti,RajendraNorth Dakota State University
In many applications both compression and encryption are desirable for data transmission and storage. The traditional way is to first compress the data and then encrypt it. This strategy achieves “absolute” secrecy. The implementations of such systems would be relatively expensive since both encoder and encrypter are present. We consider using compression schemes for encryption. For some common applications, although both compression and encryption are needed, the requirement for security is minor. Consider storing a large textual database on a CD-ROM. The text of the database needs to be compressed for memory efficiency and encrypted to prevent illegal use. It is only required that the cryptanalysis is as expensive as the protected materials themselves. Another example is embedded multimedia systems that require both data and programs to be compressed and encrypted and stored in main memory. In those cases using compression as a means to achieve security can avoid extra hardware overhead, reduce memory requirement, and protect copyrighted materials. Huffman coding is one of the best-known compression techniques that has an optimal expected length H<= L<H+1 where H represents the entropy of the source. It is shown that if the cryptanalyst does not know the probability mass function (PMF) of the source, then a Huffman coded file is difficult to cryptanalyze. However, this assumption of the cryptanalyst not knowing the PMF is often invalid especially for text data. If the cryptanalyst also knows the construction rule, then Huffman codes can be easily decoded and essentially provide no security because, the probabilities of source symbols must be ordered non-increasingly before encoding, resulting in a unique possible codeword set. Shannon-Fano-Elias code was first introduced in a 1963 information theory book. The probabilities of symbols can be input with arbitrary order in encoding. This feature might be employed to increase the ambiguity of the codes and make the code difficult to break. Unfortunately, the compression efficiency of Shannon-Fano-Elias coding is unsatisfactory. In order to improve its efficiency, we propose two algorithms to reduce the length of Shannon-Fano-Elias codes. Experimental results show that for English alphabet ordered from A to Z and Z to A, the expected lengths decreases by 19.9% and 21.1%, respectively. We show that the expected length of Shannon-Fano codes can also be reduced by using the same strategy.

Xiaoyu Ruan received the B.S. degree in electronics engineering from Fudan University, China in 2003, and the M.S. degree in computer engineering from North Dakota State University in 2005. He is working toward the Ph.D. degree in the Department of Electrical and Computer Engineering at North Dakota State University. In the mean time he is also working as a system security engineer with Corsec Security, Inc., Fairfax, VA. His research interests include VLSI, data compression, cryptography, and network security.