压缩算法之LZSS—lzss 算法实现

【自取】最近整理的,有需要可以领取学习:

一、前言

本文是基于我的上一篇博客《无损压缩算法专题——无损压缩算法介绍》的基础上来实现的,博客链接https://blog.csdn.net/qq_34254642/article/details/103651815,这一篇当中实现基本的LZSS算法功能,先不做改进,所以算法效率比较低,但是便于理解。写了Python和C两个版本,两种语言的代码结构是一样的,代码中都有详尽的注释。实现了对任意文件的压缩和解压功能。

二、LZSS算法实现

Python实现

import ctypes
import os

class LZSS():
    def __init__(self, preBufSizeBits):
        self.threshold = 2

C实现

三、性能分析

由于代码是最基本的实现,并且匹配字符串搜索功能没有优化,所以时间性能比较低,所以我们只比较压缩前后文件的字节大小。值得一提的是,代码中有一个阈值参数2,这意味着匹配的字符串在需要压缩之前大于或等于2,因为(位置,长度)这个标签的输出长度是16比特,所以显然只有1个字节的匹配字符串不能用这种方式压缩。你也可以发现一个问题,在写入文件时,匹配的长度不是实际值,因为0和1的匹配长度不存在,所以简单地将0作为2,1作为3,这样就可以将匹配长度扩展两个长度。

[En]

Because the code is the most basic implementation, and the matching string search function is not optimized, so the time performance is relatively low, we only compare the byte size of files before and after compression. It is important to mention that there is a threshold parameter 2 in the code, which means that the matching string is greater than or equal to 2 before compression is necessary, because (position, length) the output length of this tag is 16 bits, so it is obvious that only 1 byte of the matching string can not be compressed in this way. You can also find a problem, when writing to the file, the matching length is not the actual value, because the matching length of 0 and 1 does not exist, so simply treat 0 as 2 and 1 as 3, so you can expand the matching length by two lengths.

在确定(位置,长度)的输出格式为两个字节后,下一步是如何选择应由“位置”和“长度”占用的位数。我们可以将其设置为(8)、(9)、(10)等的组合,但为“位置”设置的位必须大于或等于“长度”位。原因是滑动窗口大小应该大于或等于前向缓冲区大小,否则没有意义。对于同一个文件,如果两个参数不同,压缩比也会不同。以下是选择不同参数进行压缩后日志文件的文件大小比较。图中preBufSizeBits指的是长度占用的位数:

[En]

After determining that the output format of (position, length) is two bytes, the next step is how to select the number of bits that should be occupied by “position” and “length”. We can set it to a combination of (8), (9), (10), etc., but the bit set for “position” must be greater than or equal to the bit of “length”. The reason is that the sliding window size should be greater than or equal to the forward buffer size, otherwise it doesn’t make sense. For the same file, if the two parameters are different, the compression ratio will also be different. The following is a comparison of the file size of a log file after selecting different parameters to compress. In the figure, preBufSizeBits refers to the number of bits occupied by “length”:

压缩算法之LZSS---lzss 算法实现

未压缩的原始文件大小为5.06MB,可以清楚地看到preBufSizeBits不是越小越好,也不是越大越好,因为如果preBufSizeBits越小,那么前向缓冲区就越小,匹配的数据串就越小;如果preBufSizeBits很大,那么虽然前向缓冲区变大,但滑动窗口会缩小,数据串的匹配范围会变小。因此,您需要选择适当的值才能获得最佳的文件压缩性能。

[En]

The uncompressed original file size is 5.06MB, you can clearly see that preBufSizeBits is not the smaller the better, nor the bigger the better, because if the preBufSizeBits is small, then the forward buffer will be smaller, and the matching data string will be smaller; if the preBufSizeBits is large, then although the forward buffer becomes larger, the sliding window will shrink, and the matching range of the data string will become smaller. So you need to choose an appropriate value to make the best file compression performance.

这是算法内部比较的结束,然后我们将压缩性能与目前流行的ZIP和RAR进行比较,虽然感觉力不从心,但只有进行比较才能有所进步。

[En]

This is the end of the internal comparison of the algorithm, and then we compare the compression performance with the current popular ZIP and RAR, although it feels beyond our reach, but only when there is a comparison can we make progress.

压缩算法之LZSS---lzss 算法实现

115.log是原始文件,encode2到encode8其实就是上边性能图里不同preBufSizeBits时生成的压缩文件,decode2到decode8是对应再解压缩出来的文件。115.rar是用电脑上RAR软件压缩的文件,115.zip是用电脑上ZIP软件压缩的文件。我们这个算法最好的性能是压缩到了197KB,和ZIP的161KB差距不是特别大,和RAR就差了相当一大截了。 因为ZIP其实也是类似的滑动窗口匹配压缩,所以接下来优化LZSS算法的话,还是要尽量向ZIP的性能看齐。

四、总结

接下来,我将继续思考LZSS算法的改进,包括压缩比和压缩/解压缩时间的性能。因为我是一名嵌入式工程师,所以我会考虑如何在嵌入式设备上使用数据压缩。当我在网上查找信息时,我也会发现一些开源的压缩库,但在使用中可能会有一些问题和限制。当然,我们最好知道我们使用的算法。这样,我们就可以根据项目的需求和特点灵活修改和应用,出了什么问题也不会慌。

[En]

Next, I will continue to think about the improvement of the LZSS algorithm, including the compression ratio and the performance of compression / decompression time. Because I am an embedded engineer, I will think about how to use data compression on embedded devices. When I look up information on the Internet, I will also find some open source compression libraries, but there may be some problems and limitations in use. Of course, it is best that we know the algorithm we use. In this way, we can modify and apply flexibly according to the needs and characteristics of the project, and we won’t panic if anything goes wrong.

Original: https://www.cnblogs.com/pengkunfan/p/16078537.html
Author: midu
Title: 压缩算法之LZSS—lzss 算法实现

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/6520/

转载文章受原作者版权保护。转载请注明原作者出处!

(0)

大家都在看

发表回复

登录后才能评论
免费咨询
免费咨询
扫码关注
扫码关注
联系站长

站长Johngo!

大数据和算法重度研究者!

持续产出大数据、算法、LeetCode干货,以及业界好资源!

2022012703491714

微信来撩,免费咨询:xiaozhu_tec

分享本页
返回顶部