pandas模块总结一

pandas模块

pandas是建立在numpy之上的,用于数据操纵和分析的一个库。
pandas常用的数据结构有两种:Series和DataFrame。

  1. Series的基本操作

Series对象本质上由两个数组构成,一个数组构成对象的键(index索引),一个数组构成对象的值(values),键 -> 值。

2.1 Series的创建方法

import pandas as pd
a = pd.Series(data, index)

举例:

import pandas as pd
import numpy as np

p1 = pd.Series([1, 2, 34, 32, 56, 788, 67])
print(p1)
print(type(p1))
print(p1.dtype)

p2 = pd.Series([1, 2, 34, 56, 34, 12, 78], index=list("abcdefg"))
print(p2)

temp_dict = {"name": "wuyanzu", "age": 22, "tel": 10086}
p3 = pd.Series(temp_dict)
print(p3)

执行结果:
    0      1
    1      2
    2     34
    3     32
    4     56
    5    788
    6     67
    dtype: int64
    <class 'pandas.core.series.Series'>
    int64

    a     1
    b     2
    c    34
    d    56
    e    34
    f    12
    g    78
    dtype: int64

    name    wuyanzu
    age          22
    tel       10086
    dtype: object

2.2 获取Series中数据的index、values

import pandas as pd
import numpy as np

temp_dict = {"name": "wuyanzu", "age": 22, "tel": 10086}
p3 = pd.Series(temp_dict)
print(p3)
print(p3.dtype)
print(p3.index)
print(type(p3.index))
print(p3.values)
print(type(p3.values))

执行结果:
    name    wuyanzu
    age          22
    tel       10086
    dtype: object

    object

    Index(['name', 'age', 'tel'], dtype='object')
    <class 'pandas.core.indexes.base.Index'>

    ['wuyanzu' 22 10086]
    <class 'numpy.ndarray'>

2.3 Series的切片和索引

2.3.1 切片

切片直接传入start、end、步长即可,步长默认为1。

import pandas as pd
import numpy as np
import string

t = pd.Series(np.arange(10), index=list(string.ascii_uppercase[:10]))
print("创建Series类型的对象:\n", t)

t1 = t[0:5]
print("切片得到的结果:\n", t1)

执行结果:
    创建Series类型的对象:
        A    0
        B    1
        C    2
        D    3
        E    4
        F    5
        G    6
        H    7
        I    8
        J    9
        dtype: int32
    切片得到的结果:
        A    0
        B    1
        C    2
        D    3
        E    4
        dtype: int32

2.3.2 索引

一个Series对象中的元素其实就是一组组”键值对”。
(1)传入单个索引,获得对应索引的值;
(2)传入多个索引组成的列表时,获得对应索引的键值对;
(3)传入的是某个”值”,返回就是该值;
(4)传入条件语句,返回满足条件的键值对;
(5)传入的单个索引或索引列表中的索引不存在时,返回的值、键值对中的值为nan(验证后报错?)。

t = pd.Series(np.arange(10), index=list(string.ascii_uppercase[:10]))
print("创建Series类型的对象:\n", t)

t1 = t[0:5]
print("切片得到的结果:\n", t1)

print(t[1])

print(t["A"])

t2 = t[['A', 'D', 'E']]
print(t2)

print(t[t > 4])

执行结果:
    创建Series类型的对象:
        A    0
        B    1
        C    2
        D    3
        E    4
        F    5
        G    6
        H    7
        I    8
        J    9
        dtype: int32

    切片得到的结果:
        A    0
        B    1
        C    2
        D    3
        E    4
        dtype: int32

        1

        0

        A    0
        D    3
        E    4
        dtype: int32

        F    5
        G    6
        H    7
        I    8
        J    9
dtype: int32

2.4 pandas读取外部数据

import pandas as pd

df = pd.read_csv("D:/人工智能课程/【4】14100_HM数据科学库课件/数据分析资料/day04/code/dogNames2.csv")
print(df)

执行结果:
            Row_Labels  Count_AnimalName
    0              1                 1
    1              2                 2
    2          40804                 1
    3          90201                 1
    4          90203                 1
    ...          ...               ...
    16215      37916                 1
    16216      38282                 1
    16217      38583                 1
    16218      38948                 1
    16219      39743                 1

    [16220 rows x 2 columns]

3.DataFrame的基本操作

3.1 DataFrame的创建

3.1.1 传入可迭代对象创建DataFrame对象

DataFrame对象既有行索引,又有列索引。
(1)行索引:表明不同行,横向索引,叫index,axis=0;
(2)列索引:表明不同列,纵向索引,叫columns,axis=1.

pandas模块总结一
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(3, 4))
print(df)

执行结果:
       0  1   2   3
    0  0  1   2   3
    1  4  5   6   7
    2  8  9  10  11

3.1.2 传入可迭代对象、index、columns创建DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list("abc"), columns=list("WXYZ"))
print(df)

执行结果:
       W  X   Y   Z
    a  0  1   2   3
    b  4  5   6   7
    c  8  9  10  11

3.1.3 传入列表创建DataFrame

传入的列表中某个键值对不存在时,保留对应的位置,返回NAN。

import pandas as pd
import numpy as np

list1 = {"name": ["xiaoming", "xiaowang"], "age": [15, 23], "tel": [10023, 12340]}
df1 = pd.DataFrame(list1)
print(df1)

list2 = [{"name": "Tom", "age": 18, "tel": 10089}, {"age": 22, "tel": 10078}, {"name": "Lucy", "tel": 12789}]
df2 = pd.DataFrame(list2)
print(df2)

执行结果:
            name     age    tel
        0  xiaoming   15  10023
        1  xiaowang   23  12340

            name  age   tel
        0   Tom  18.0  10089
        1   NaN  22.0  10078
        2  Lucy   NaN  12789

3.2 DataFrame的基础属性

import pandas as pd
import numpy as np

list2 = [{"name": "Tom", "age": 18, "tel": 10089}, {"age": 22, "tel": 10078}, {"name": "Lucy", "tel": 12789}]
print(list2)
df2 = pd.DataFrame(list2)
print(df2)

print("DataFrame对象的行数、列数:", df2.shape)
print("DataFrame对象的列数据类型:", df2.dtypes)
print("DataFrame对象的数据维度:", df2.ndim)
print("DataFrame对象的行索引:", df2.index)
print("DataFrame对象的列索引:", df2.columns)
print("DataFrame对象的对象值:", df2.values)

执行结果:
       name   age    tel
    0   Tom  18.0  10089
    1   NaN  22.0  10078
    2  Lucy   NaN  12789

    DataFrame对象的行数、列数: (3, 3)

    DataFrame对象的列数据类型:
            name     object
            age     float64
            tel       int64
            dtype: object

    DataFrame对象的数据维度: 2

    DataFrame对象的行索引: RangeIndex(start=0, stop=3, step=1)

    DataFrame对象的列索引: Index(['name', 'age', 'tel'], dtype='object')

    DataFrame对象的对象值:
            [['Tom' 18.0 10089]
            [nan 22.0 10078]
            ['Lucy' nan 12789]]

3.3 DataFrame对象的整体情况查询

df.head(3)
df.tail(3)
df.info()
df.describe()
import pandas as pd
import numpy as np

list2 = [{"name": "Tom", "age": 18, "tel": 10089}, {"age": 22, "tel": 10078}, {"name": "Lucy", "tel": 12789}]
print(list2)
df2 = pd.DataFrame(list2)
print(df2)

print("头部2行:", df2.head(2))
print("尾部2行:", df2.tail(2))
print("相关信息概览:")
print(df2.info())
print("快速综合统计结果:\n", df2.describe())

执行结果:
    头部2行:
            name   age    tel
        0  Tom  18.0  10089
        1  NaN  22.0  10078
    尾部2行:
            name   age    tel
        1   NaN  22.0  10078
        2  Lucy   NaN  12789
    相关信息概览:
            <class 'pandas.core.frame.DataFrame'>
            RangeIndex: 3 entries, 0 to 2
            Data columns (total 3 columns):

            ---  ------  --------------  -----
            0   name    2 non-null      object
            1   age     2 non-null      float64
            2   tel     3 non-null      int64
            dtypes: float64(1), int64(1), object(1)
            memory usage: 200.0+ bytes
            None

    快速综合统计结果:
                    age           tel
        count   2.000000      3.000000
        mean   20.000000  10985.333333
        std     2.828427   1562.030836
        min    18.000000  10078.000000
        25%    19.000000  10083.500000
        50%    20.000000  10089.000000
        75%    21.000000  11439.000000
        max    22.000000  12789.000000

3.4 DataFrame对象排序

import pandas as pd
import numpy as np

list2 = [{"name": "Tom", "age": 18, "tel": 10089}, {"name": "Lulu", "age": 34, "tel": 10078},
         {"name": "Lucy", "age": 23, "tel": 12789}]
df2 = pd.DataFrame(list2)
print(df2)

print("排序:")
df_sort = df2.sort_values(by="age", ascending=False)
print(df_sort)

执行结果:
            name  age    tel
        0   Tom   18  10089
        1  Lulu   34  10078
        2  Lucy   23  12789

    排序:
            name  age    tel
        1  Lulu   34  10078
        2  Lucy   23  12789
        0   Tom   18  10089

3.5 pandas取行或者取列

(1)方括号写数组,表示取行,对行进行操作
(2)写字符串,表示的去列索引,对列进行操作

import pandas as pd
import numpy as np

list2 = [{"name": "Tom", "age": 18, "tel": 10089}, {"name": "Lulu", "age": 34, "tel": 10078},
         {"name": "Lucy", "age": 23, "tel": 12789}]
df2 = pd.DataFrame(list2)
print(df2)

print("取行、取列操作")
print("取0、1行:\n",df2[0:2])
print("取age列:\n",df2["age"])
print("取0、1行,age列\n",df2[0:2]["age"])

执行结果:
            name  age    tel
        0   Tom   18  10089
        1  Lulu   34  10078
        2  Lucy   23  12789

    取行、取列操作
    取0、1行:
            name  age    tel
        0   Tom   18  10089
        1  Lulu   34  10078

    取age列:
        0    18
        1    34
        2    23
        Name: age, dtype: int64

    取0、1行,age列
        0    18
        1    34
        Name: age, dtype: int64

3.5 pandas获取行列数据之loc、iloc

(1)df.loc[]通过 索引获取数据。
(2)df.iloc[]通过 位置获取数据 。
(3)行和列之间使用逗号”,”分隔开来;loc中使用 ” : “切片时,左闭右闭,包括[ ] 右边的元素。

3.5.1 df.loc[]

import pandas as pd
import numpy as np

t = pd.DataFrame(np.arange(12).reshape(3, 4), index=list("abc"), columns=list("WXYZ"))
print(t)

print(t.loc["a", "W"])
print(type(t.loc["a", "W"]))

print(t.loc[["a"], ["W"]])
print(type(t.loc[["a"], ["W"]]))

print(t.loc["a", ["W", "X"]])
print(type(t.loc["a", ["W", "X"]]))

print(t.loc[["a","c"],["X","Z"]])
print(type(t.loc[["a","c"],["X","Z"]]))

print(t.loc["a":"c","W":"Z"])
print(type(t.loc["a":"c","W":"Z"]))

执行结果:
           W  X   Y   Z
        a  0  1   2   3
        b  4  5   6   7
        c  8  9  10  11

        0
        <class 'numpy.int32'>

            W
        a   0
        <class 'pandas.core.frame.DataFrame'>

        W    0
        X    1
        Name: a, dtype: int32
        <class 'pandas.core.series.Series'>

            X   Z
        a  1   3
        c  9  11
        <class 'pandas.core.frame.DataFrame'>

            W  X   Y   Z
        a  0  1   2   3
        b  4  5   6   7
        c  8  9  10  11
        <class 'pandas.core.frame.DataFrame'>

3.5.2 df.iloc[]

import pandas as pd
import numpy as np

t = pd.DataFrame(np.arange(12).reshape(3, 4), index=list("abc"), columns=list("WXYZ"))
print(t)

print(t.iloc[0])
print(type(t.iloc[0]))

print(t.iloc[0, 0])
print(type(t.iloc[0, 0]))

print(t.iloc[[0], [0]])
print(type(t.iloc[[0], [0]]))

print(t.iloc[0, [0, 1, 2]])
print(type(t.iloc[0, [0, 1, 2]]))

print(t.iloc[[0, 2], [1, 3]])
print(type(t.iloc[[0, 2], [1, 3]]))

print(t.iloc[0:2, 0:3])
print(type(t.iloc[0:2, 0:3]))

执行结果:
           W  X   Y   Z
        a  0  1   2   3
        b  4  5   6   7
        c  8  9  10  11

        W    0
        X    1
        Y    2
        Z    3
        Name: a, dtype: int32
        <class 'pandas.core.series.Series'>

        0
        <class 'numpy.int32'>

           W
        a  0
        <class 'pandas.core.frame.DataFrame'>

        W    0
        X    1
        Y    2
        Name: a, dtype: int32
        <class 'pandas.core.series.Series'>

            X   Z
        a  1   3
        c  9  11
        <class 'pandas.core.frame.DataFrame'>

            W  X  Y
        a  0  1  2
        b  4  5  6
        <class 'pandas.core.frame.DataFrame'>

3.5.3 赋值更改数据

import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4), index=list("abc"), columns=list("WXYZ"))
print(t)

t.loc["a", "W"] = 999
print(t)

t.iloc[2, 3] = 666
print(t)

执行结果:
           W  X   Y   Z
        a  0  1   2   3
        b  4  5   6   7
        c  8  9  10  11

           W  X   Y   Z
        a  999  1   2   3
        b    4  5   6   7
        c    8  9  10  11

           W  X   Y    Z
        a  999  1   2    3
        b    4  5   6    7
        c    8  9  10  666

3.6 pandas 布尔索引筛选

import pandas as pd
import numpy as np

t = pd.DataFrame(np.arange(12).reshape(3, 4), index=list("abc"), columns=list("WXYZ"))
print(t)

print(t[t["W"] > 6])

print(t[(t["W"]>3)&(t["X"]<10)])

执行结果:
       W  X   Y   Z
    a  0  1   2   3
    b  4  5   6   7
    c  8  9  10  11

       W  X   Y   Z
    c  8  9  10  11

       W  X   Y   Z
    b  4  5   6   7
    c  8  9  10  11

3.7 pandas 字符串方法

import pandas as pd
import numpy as np
import string

temp_str = ["hello", "world", "Tom", "abcdefg",
            "huweiyong", "wuyanzu", "jinchengwu", "jinxin",
            "haiqing", "john", "huihui", "Li"]
t = pd.DataFrame(np.array(temp_str).reshape(3, 4), index=list("abc"), columns=list("WXYZ"))
print(t)

print(t[(t["W"].str.len() > 5) & (t["Z"].str.len() < 3)])

执行结果:
                 W        X           Y        Z
        a      hello    world         Tom  abcdefg
        b  huweiyong  wuyanzu  jinchengwu   jinxin
        c    haiqing     john      huihui       Li

                 W        X           Y        Z
        c  haiqing      john       huihui       Li

pandas模块总结一

Original: https://blog.csdn.net/weixin_43188487/article/details/121343584
Author: 北四金城武
Title: pandas模块总结一

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/678273/

转载文章受原作者版权保护。转载请注明原作者出处!

(0)

大家都在看

亲爱的 Coder【最近整理,可免费获取】👉 最新必读书单  | 👏 面试题下载  | 🌎 免费的AI知识星球