# 压缩算法之LZSS—lzss 算法实现

## 二、LZSS算法实现

Python实现

import ctypes
import os

class LZSS():
def __init__(self, preBufSizeBits):
self.threshold = 2


C实现

## 三、性能分析

[En]

Because the code is the most basic implementation, and the matching string search function is not optimized, so the time performance is relatively low, we only compare the byte size of files before and after compression. It is important to mention that there is a threshold parameter 2 in the code, which means that the matching string is greater than or equal to 2 before compression is necessary, because (position, length) the output length of this tag is 16 bits, so it is obvious that only 1 byte of the matching string can not be compressed in this way. You can also find a problem, when writing to the file, the matching length is not the actual value, because the matching length of 0 and 1 does not exist, so simply treat 0 as 2 and 1 as 3, so you can expand the matching length by two lengths.

[En]

After determining that the output format of (position, length) is two bytes, the next step is how to select the number of bits that should be occupied by “position” and “length”. We can set it to a combination of (8), (9), (10), etc., but the bit set for “position” must be greater than or equal to the bit of “length”. The reason is that the sliding window size should be greater than or equal to the forward buffer size, otherwise it doesn’t make sense. For the same file, if the two parameters are different, the compression ratio will also be different. The following is a comparison of the file size of a log file after selecting different parameters to compress. In the figure, preBufSizeBits refers to the number of bits occupied by “length”:

[En]

The uncompressed original file size is 5.06MB, you can clearly see that preBufSizeBits is not the smaller the better, nor the bigger the better, because if the preBufSizeBits is small, then the forward buffer will be smaller, and the matching data string will be smaller; if the preBufSizeBits is large, then although the forward buffer becomes larger, the sliding window will shrink, and the matching range of the data string will become smaller. So you need to choose an appropriate value to make the best file compression performance.

[En]

This is the end of the internal comparison of the algorithm, and then we compare the compression performance with the current popular ZIP and RAR, although it feels beyond our reach, but only when there is a comparison can we make progress.

115.log是原始文件，encode2到encode8其实就是上边性能图里不同preBufSizeBits时生成的压缩文件，decode2到decode8是对应再解压缩出来的文件。115.rar是用电脑上RAR软件压缩的文件，115.zip是用电脑上ZIP软件压缩的文件。我们这个算法最好的性能是压缩到了197KB，和ZIP的161KB差距不是特别大，和RAR就差了相当一大截了。 因为ZIP其实也是类似的滑动窗口匹配压缩，所以接下来优化LZSS算法的话，还是要尽量向ZIP的性能看齐。

## 四、总结

[En]

Next, I will continue to think about the improvement of the LZSS algorithm, including the compression ratio and the performance of compression / decompression time. Because I am an embedded engineer, I will think about how to use data compression on embedded devices. When I look up information on the Internet, I will also find some open source compression libraries, but there may be some problems and limitations in use. Of course, it is best that we know the algorithm we use. In this way, we can modify and apply flexibly according to the needs and characteristics of the project, and we won’t panic if anything goes wrong.

Original: https://www.cnblogs.com/pengkunfan/p/16078537.html
Author: midu
Title: 压缩算法之LZSS—lzss 算法实现

(0)