Python3: reading large files and logs line by line in reverse with read_reverse_bigfile

Use case:

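The need to read a large file in reverse usually comes from wanting the last few hundred lines of a very large log file. For a small file, lines = file.readlines()[-200:] gets them directly.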

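For a large file, readlines() loads the entire content into memory. At GB sizes there may not be enough memory, and building a list of every line only to slice off the last few is slow. Reading in reverse with a generator is a better fit.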
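
For comparison, a forward scan does not have to keep every line either: collections.deque with maxlen holds only the most recent n lines. The sketch below (the name tail_forward is hypothetical) keeps memory bounded, but it still reads the whole file from the first byte, which is exactly the cost that reading in reverse avoids.

```python
import io
from collections import deque


def tail_forward(f, n=200):
    """Return the last n lines of an open text file by reading it front to back.

    deque(maxlen=n) drops the oldest line whenever a new one arrives, so at
    most n lines are held in memory, but every line is still read.
    """
    return list(deque(f, maxlen=n))


# Works the same on a real file opened with open(path, encoding='utf-8')
demo = io.StringIO('a\nb\nc\nd\n')
print(tail_forward(demo, n=2))  # ['c\n', 'd\n']
```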

Reading a large file in reverse relies mainly on:
  • the file object's seek() method, to move the file pointer (here, relative to the end of the file)
  • re.finditer from the re module, to find line terminators; each Match object's end() gives the offset just past a terminator
  • a yield-based generator, to save memory
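
The three building blocks can be tried on an in-memory file. This is a minimal sketch, assuming io.BytesIO as a stand-in for a file opened in 'rb' mode; the helper complete_lines_reversed is hypothetical and only returns lines that lie entirely inside one chunk.

```python
import io
import re

# io.BytesIO stands in for a file opened with open(path, 'rb')
data = b'first\nsecond\nthird\nfourth\n'
f = io.BytesIO(data)

# 1. seek()/tell(): whence=2 makes the offset relative to the end of the file
f.seek(0, 2)
size = f.tell()            # 26 bytes in total
f.seek(-10, 2)             # jump to 10 bytes before the end
chunk = f.read(10)         # b'rd\nfourth\n'

# 2. re.finditer + Match.end(): offsets just past each separator
ends = [m.end() for m in re.finditer(b'\n', chunk)]   # [3, 10]


# 3. yield: hand out lines one at a time instead of building a list
def complete_lines_reversed(chunk, sep=b'\n'):
    """Yield the lines of `chunk` delimited by a separator on both sides, last first.

    Bytes before the first separator may be the end of a line that began
    earlier in the file; a full reader carries them over to the next chunk.
    """
    ends = [m.end() for m in re.finditer(re.escape(sep), chunk)]
    for start, end in zip(reversed(ends[:-1]), reversed(ends[1:])):
        yield chunk[start:end]


print(list(complete_lines_reversed(chunk)))   # [b'fourth\n']
```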

Performance check:

A quick check on Windows 10 with 16 GB RAM, an 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz processor, and Python 3.7.11:

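```python
import re


def read_reverse_bigfile(filepath, encoding='utf-8', separator=b'\n', single_size=1024 * 1024):
    r"""
    :param filepath: path of the file
    :param encoding: character encoding, default utf-8
    :param separator: line terminator (bytes), default b'\n'
    :param single_size: number of bytes read per chunk, default 1024*1024
    :return: generator yielding lines from last to first
    """
    with open(filepath, 'rb') as f:
        try:
            f.seek(0, 2)                      # move to the end of the file
            position = f.tell()               # file size in bytes
            if position > single_size:
                f.seek(-single_size, 2)       # start of the last chunk
            else:
                f.seek(0, 0)                  # small file: read it as one chunk
        except OSError:
            return 'Blank file'
        line = b''                            # partial line carried across chunks
        while True:
            chunk = f.read(single_size)
            # offsets just past every separator in this chunk
            index_list = [match.end() for match in re.finditer(separator, chunk)]
            index = None
            while index_list:
                target = index_list.pop()     # walk the separators from last to first
                if index is None:
                    line = chunk[target:] + line
                else:
                    line = chunk[target:index] + line
                if line:
                    yield line.decode(encoding=encoding)
                line = b''
                index = target
            else:
                # bytes before the first separator belong to a line that
                # starts in an earlier chunk: keep them for the next round
                line = chunk[:index] + line
            position = f.tell()
            if position > 2 * single_size and single_size > 0:
                f.seek(-2 * single_size, 1)   # back to the start of the previous chunk
            else:
                f.seek(0, 0)
                single_size = position - single_size  # bytes left before this chunk
                if single_size <= 0:
                    if line:                  # first line of the file (nothing for an empty file)
                        yield line.decode(encoding=encoding)
                    return 'End'
```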
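
If only the last n lines are needed, a reverse generator can be cut short with itertools.islice so that reading stops early. The sketch below is a simplified variant, not the author's read_reverse_bigfile: it uses bytes.split instead of re.finditer, seeks with absolute offsets, strips the newline from each line, and the names iter_lines_reversed and tail are hypothetical. It assumes an encoding such as UTF-8, in which the byte b'\n' never appears inside a multi-byte character.

```python
import io
from itertools import islice


def iter_lines_reversed(f, block_size=1024 * 1024, encoding='utf-8'):
    """Yield the lines of a binary file object from last to first, newline stripped.

    Blocks are read from the end towards the start. The bytes before the first
    newline of a block may be the end of a line that began in an earlier block,
    so they are carried over and joined onto that earlier block.
    """
    f.seek(0, 2)                       # jump to the end to learn the size
    size = pos = f.tell()
    carry = b''                        # partial line waiting for its beginning
    while pos > 0:
        step = min(block_size, pos)
        pos -= step
        f.seek(pos)                    # absolute seek to the start of this block
        parts = (f.read(step) + carry).split(b'\n')
        carry = parts[0]
        rest = parts[1:]
        if pos + step == size and rest and rest[-1] == b'':
            rest.pop()                 # a final newline does not open an extra empty line
        for line in reversed(rest):
            # UTF-8 never uses byte 0x0A inside a multi-byte character,
            # so each complete line can be decoded on its own.
            yield line.decode(encoding)
    if size:
        yield carry.decode(encoding)   # the file's first line


def tail(f, n=200, **kwargs):
    """Return the last n lines in file order; reading stops once n lines are found."""
    return list(islice(iter_lines_reversed(f, **kwargs), n))[::-1]


demo = io.BytesIO(b'line1\nline2\nline3\n')
print(tail(demo, 2, block_size=4))    # ['line2', 'line3']
```

With a real log, the same call works on open(path, 'rb'). Because islice stops pulling from the generator after n lines, blocks further towards the start of the file are never read.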


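```python
if __name__ == '__main__':
    import os
    import time

    import psutil  # third-party package: pip install psutil

    pid = os.getpid()
    p = psutil.Process(pid)

    fp = r'./test30.txt'

    # reverse generator: read every line from last to first
    start_time = time.time()
    rrb = list(read_reverse_bigfile(fp))
    print(len(rrb))
    print(f'elapsed 0 (reverse generator): {time.time() - start_time}')

    # baseline: readlines() loads the whole file at once
    start_time = time.time()
    with open(fp, encoding='utf-8') as f:
        orl = f.readlines()
        print(len(orl))
    print(f'elapsed 1 (readlines): {time.time() - start_time}')

    # USS of the whole process after both runs (not per approach)
    info = p.memory_full_info().uss / (1024 * 1024)
    print(f'memory: {info}MB')
```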
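
To compare approaches one at a time, each can be timed and traced separately. This sketch uses the standard library's time.perf_counter and tracemalloc instead of psutil; note that tracemalloc counts only memory allocated by Python, not the whole process footprint that USS reports. The helpers measure and last_lines_readlines and the generated temporary file are assumptions for illustration.

```python
import os
import tempfile
import time
import tracemalloc


def measure(fn, *args):
    """Run fn(*args) once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        result = fn(*args)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()   # (current, peak)
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def last_lines_readlines(path, n=200):
    """Baseline: load every line, then keep the last n."""
    with open(path, encoding='utf-8') as f:
        return f.readlines()[-n:]


# Hypothetical test file; substitute the real log (e.g. ./test30.txt).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    tmp.writelines(f'log line {i}\n' for i in range(100_000))
    path = tmp.name

result, secs, peak = measure(last_lines_readlines, path)
print(len(result), f'{secs:.3f}s', f'peak {peak / 2**20:.1f} MiB')
os.remove(path)
```

Any other tail function that takes a path can be passed to measure in the same way, so each run reports its own time and peak.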


Original: https://www.cnblogs.com/yougnen/p/16081877.html
Author: 阿伦来啦
Title: Python3: reading large files and logs line by line in reverse with read_reverse_bigfile
