Memory usage in pandas

Contents

Measuring memory usage

info

memory_usage

Relationship between data types and memory

info

Calling info() on a DataFrame prints its memory usage, index included.
For example, info() reports the memory usage of the DataFrame built below:

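import numpy as np
import pandas as pd

# One column per dtype, so the footprint of each type can be compared.
dtypes = [
    "int8",
    "uint8",
    "int16",
    "int32",
    "int64",
    "float64",
    "datetime64[ns]",
    "timedelta64[ns]",
    "complex128",
    "object",
    "bool",
]
n = 5000

# Random integers in [0, 100), cast to each dtype in turn.
data = {"col_" + t: np.random.randint(100, size=n).astype(t) for t in dtypes}

df = pd.DataFrame(data)

# Add a categorical copy of the object column.
df["categorical"] = df["col_object"].astype("category")

# memory_usage="deep" also measures the Python objects held in col_object;
# the total shown in the output corresponds to this mode.
df.info(memory_usage="deep")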

output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   col_int8             5000 non-null   int8
 1   col_uint8            5000 non-null   uint8
 2   col_int16            5000 non-null   int16
 3   col_int32            5000 non-null   int32
 4   col_int64            5000 non-null   int64
 5   col_float64          5000 non-null   float64
 6   col_datetime64[ns]   5000 non-null   datetime64[ns]
 7   col_timedelta64[ns]  5000 non-null   timedelta64[ns]
 8   col_complex128       5000 non-null   complex128
 9   col_object           5000 non-null   object
 10  col_bool             5000 non-null   bool
 11  categorical          5000 non-null   category
dtypes: bool(1), category(1), complex128(1), datetime64[ns](1), float64(1), int16(1), int32(1), int64(1), int8(1), object(1), timedelta64[ns](1), uint8(1)
memory usage: 463.8 KB
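
When a frame contains object columns, a plain info() call counts only the pointers stored in those columns and appends "+" to the total to mark it as a lower bound. The sketch below shows the difference on a small, hypothetical frame (`small` and the helper `memory_line` are illustrative names, not part of the original example):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-column frame: one fixed-width column, one object column.
small = pd.DataFrame(
    {
        "ints": np.arange(1000, dtype="int64"),
        "objs": np.arange(1000).astype("object"),
    }
)


def memory_line(frame, **kwargs):
    """Return the 'memory usage' line that info() prints."""
    buf = io.StringIO()
    frame.info(buf=buf, **kwargs)
    lines = buf.getvalue().splitlines()
    return [ln for ln in lines if ln.startswith("memory usage")][0]


shallow_line = memory_line(small)                    # lower bound, marked with "+"
deep_line = memory_line(small, memory_usage="deep")  # inspects every Python object

print(shallow_line)
print(deep_line)
```

The deep variant is slower on large frames because it has to visit every object, which is presumably why it is not the default.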

memory_usage

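Per-column memory usage is available from the memory_usage() method. It returns a Series whose index holds the column names and whose values are each column's memory usage in bytes. For the DataFrame above, memory_usage gives the per-column figures and, once summed, the total.

To get accurate figures, pass deep=True, which also measures the Python objects held in object columns: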

df.memory_usage(deep=True)
output
Index                     128
col_int8                 5000
col_uint8                5000
col_int16               10000
col_int32               20000
col_int64               40000
col_float64             40000
col_datetime64[ns]      40000
col_timedelta64[ns]     40000
col_complex128          80000
col_object             179800
col_bool                 5000
categorical              9968
dtype: int64
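
To see what deep=True changes, the following sketch compares the shallow and deep results on a small, hypothetical frame (`frame` and its column names are made up for illustration). Only the object column should differ. The index=False argument drops the "Index" entry.

```python
import numpy as np
import pandas as pd

n = 1000
frame = pd.DataFrame(
    {
        "ints": np.arange(n, dtype="int64"),
        "objs": np.arange(n).astype("object"),
    }
)

shallow = frame.memory_usage()              # object column: one pointer per row
deep = frame.memory_usage(deep=True)        # adds the size of each Python object
no_index = frame.memory_usage(index=False)  # leaves out the "Index" entry

print(shallow)
print(deep)
print(no_index)
```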

df.memory_usage(deep=True).sum()
output
474896
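
As a quick check, the per-column figures printed above can be added up by hand. The sketch below copies the byte counts from that output, sums them, and converts the result to kilobytes of 1024 bytes, which gives the 463.8 KB figure reported by info() earlier.

```python
# Byte counts copied from the memory_usage(deep=True) output above.
usage = {
    "Index": 128,
    "col_int8": 5000,
    "col_uint8": 5000,
    "col_int16": 10000,
    "col_int32": 20000,
    "col_int64": 40000,
    "col_float64": 40000,
    "col_datetime64[ns]": 40000,
    "col_timedelta64[ns]": 40000,
    "col_complex128": 80000,
    "col_object": 179800,
    "col_bool": 5000,
    "categorical": 9968,
}

total_bytes = sum(usage.values())
total_kb = total_bytes / 1024

print(total_bytes)           # 474896
print(f"{total_kb:.1f} KB")  # 463.8 KB
```

The fixed-width columns follow a simple rule: 5000 rows times the element size (1 byte for int8, 8 bytes for int64, 16 bytes for complex128). The object column is the outlier because every element is a separate Python object.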

Relationship between data types and memory

Data type    Description
bool_        Boolean (True or False) stored as a byte
int_         Default integer type (same as C long; normally either int64 or int32)
intc         Identical to C int (normally int32 or int64)
intp         Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8         Byte (-128 to 127)
int16        Integer (-32768 to 32767)
int32        Integer (-2147483648 to 2147483647)
int64        Integer (-9223372036854775808 to 9223372036854775807)
uint8        Unsigned integer (0 to 255)
uint16       Unsigned integer (0 to 65535)
uint32       Unsigned integer (0 to 4294967295)
uint64       Unsigned integer (0 to 18446744073709551615)
float_       Shorthand for float64
float16      Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32      Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64      Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_     Shorthand for complex128
complex64    Complex number, represented by two 32-bit floats
complex128   Complex number, represented by two 64-bit floats
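
The link between a dtype and memory is its element size: a fixed-width column takes roughly the row count times the bytes per element. A minimal sketch, assuming values that are known to fit the smaller type (the names `values` and `small_values` are illustrative):

```python
import numpy as np
import pandas as pd

# Bytes per element for a few fixed-width dtypes.
for name in ["bool", "int8", "int16", "int32", "int64",
             "float32", "float64", "complex128"]:
    print(name, np.dtype(name).itemsize)

n = 5000
values = pd.Series(np.random.randint(100, size=n), dtype="int64")

# The values lie in [0, 100), so int8 (-128 to 127) holds them without loss.
small_values = values.astype("int8")

before = values.memory_usage(index=False)       # n * 8 bytes
after = small_values.memory_usage(index=False)  # n * 1 byte
print(before, after)
```

Casting to a narrower type is only safe when the value range is known. Values outside the target range would not survive the conversion intact, so it is worth checking the minimum and maximum first.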

Original: https://blog.csdn.net/haohaizijhz/article/details/122722847
Author: 只要开始永远不晚
Title: pandas的内存使用 (Memory usage in pandas)
