10 Minutes to pandas (Chinese-English parallel text), Part 01

August 6, 2023, 9:24 PM • Python

This part covers three topics:
1. Object creation
2. Viewing data
3. Selection

Import the packages:

import numpy as np
import pandas as pd

1. Object creation

Create a Series by passing a list of values, letting pandas create a default integer index:

s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
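A small sketch of what pandas did here: the default index is a RangeIndex, and the single np.nan forces the whole Series to float64, because NumPy integer arrays have no missing-value marker. The s_int name is illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# No index was passed, so pandas generated 0, 1, ..., 5 as a RangeIndex.
print(s.index)  # RangeIndex(start=0, stop=6, step=1)

# np.nan is a float, so the integers are upcast and the dtype is float64.
print(s.dtype)  # float64

# The same values without the missing entry keep an integer dtype.
s_int = pd.Series([1, 3, 5, 6, 8])
print(s_int.dtype)
```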

Create a DataFrame by passing a NumPy array, with a datetime index and labeled columns. First, create the datetime index:


dates = pd.date_range("2013/01/01", periods=6, freq="D")
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')


# columns=list("ABCD") is equivalent to columns=["A", "B", "C", "D"]
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df

                   A         B         C         D
2013-01-01 -0.520896 -0.340412 -1.265841 -0.419562
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623
2013-01-03  1.380236 -1.922205  1.406446 -1.534292
2013-01-04  1.049023  0.363657 -0.479516 -0.243051
2013-01-05  0.720896  0.821581  0.369389 -0.133051
2013-01-06 -0.337006 -0.329537  1.296696 -2.602595
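np.random.randn draws new numbers on every run, so your table will not match the one above. A minimal sketch, assuming you want reproducible data, seeds NumPy's Generator API instead; the seed 42 is an arbitrary choice, and the resulting numbers also differ from the table above.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")

# A seeded Generator makes the "random" frame identical on every run.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

# Re-creating the generator with the same seed reproduces the same frame.
rng_again = np.random.default_rng(42)
df_again = pd.DataFrame(rng_again.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df.shape)             # (6, 4)
print(df.equals(df_again))  # True
```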

Create a DataFrame by passing a dictionary of objects that can be converted into a Series-like structure:

df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
df2

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
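In the dictionary above, only C, D, and E carry a length (4); the scalars in A, B, and F are repeated for every row. A reduced sketch of that broadcasting, using a subset of the same columns. With scalars alone there is nothing to take the length from, so pandas raises a ValueError unless an index is passed (the df_scalars name is illustrative).

```python
import pandas as pd

# The Series fixes the length (4 rows); the scalars "A" and "F" are broadcast.
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "F": "foo",
    }
)
print(df2.shape)  # (4, 3)

# Only scalars: pandas cannot infer the number of rows.
try:
    pd.DataFrame({"A": 1.0, "F": "foo"})
    all_scalar_error = False
except ValueError:
    all_scalar_error = True

# Passing an explicit index resolves the ambiguity.
df_scalars = pd.DataFrame({"A": 1.0, "F": "foo"}, index=[0, 1])
print(df_scalars.shape)  # (2, 2)
```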

The columns of the resulting DataFrame have different dtypes:


df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
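Because each column has its own dtype, you can split the frame by dtype. A short sketch using DataFrame.select_dtypes on the same df2: "number" matches the float64, float32, and int32 columns, while the datetime, categorical, and object columns fall on the other side.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

# Numeric columns only (float64, float32, int32 here).
numeric = df2.select_dtypes(include="number")
print(list(numeric.columns))  # ['A', 'C', 'D']

# Everything else: datetime, category and object columns.
other = df2.select_dtypes(exclude="number")
print(list(other.columns))  # ['B', 'E', 'F']
```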

If you're using IPython, tab completion for column names (as well as public attributes) is enabled automatically. Typing df2.<TAB> offers the columns A, B, C, and D as completions; E and F are there as well, alongside the DataFrame's public methods and attributes.

Separately, describe() summarizes the numeric columns (A, C, and D in the output below):

df2.describe()

         A    C    D
count  4.0  4.0  4.0
mean   1.0  1.0  3.0
std    0.0  0.0  0.0
min    1.0  1.0  3.0
25%    1.0  1.0  3.0
50%    1.0  1.0  3.0
75%    1.0  1.0  3.0
max    1.0  1.0  3.0
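IPython's completer builds its candidate list largely from dir(). As a small sketch, pandas adds column labels that are valid Python identifiers to dir() for a DataFrame, which is what makes df2.A-style completion work. A label with a space, such as the hypothetical "my col" below, cannot be an attribute and is left out.

```python
import pandas as pd

# Hypothetical column names chosen to show the identifier rule.
df = pd.DataFrame({"price": [1.0, 2.0], "my col": [3, 4]})
names = dir(df)

print("price" in names)     # True: identifier-like label
print("describe" in names)  # True: a regular method
print("my col" in names)    # False: not a valid identifier

# Such columns are still reachable with bracket indexing.
print(df["my col"].tolist())  # [3, 4]
```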

2. Viewing data

Here is how to view the top and bottom rows of the frame:


df.head()


df.tail(3)

                   A         B         C         D
2013-01-04  1.049023  0.363657 -0.479516 -0.243051
2013-01-05  0.720896  0.821581  0.369389 -0.133051
2013-01-06 -0.337006 -0.329537  1.296696 -2.602595
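A quick sketch of the row counts involved: head() defaults to 5 rows, tail(3) returns the last three, and a negative n in head() drops that many rows from the end. The data is random, so only the shapes and labels are checked.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

first = df.head()           # n defaults to 5
last3 = df.tail(3)          # 2013-01-04 through 2013-01-06
all_but_last = df.head(-1)  # every row except the last one

print(len(first), len(last3), len(all_but_last))  # 5 3 5
print(last3.index[0])  # 2013-01-04 00:00:00
```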

Display the index and columns:


df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')


df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

DataFrame.to_numpy() gives a NumPy representation of the underlying data. This can be an expensive operation when your DataFrame has columns with different data types, which comes down to a fundamental difference between pandas and NumPy: a NumPy array has one dtype for the entire array, while a pandas DataFrame has one dtype per column. When you call DataFrame.to_numpy(), pandas finds the NumPy dtype that can hold all of the dtypes in the DataFrame. That may end up being object, which requires casting every value to a Python object.

For df, our DataFrame of all floating-point values, DataFrame.to_numpy() is fast and doesn't require copying data:

df.to_numpy()

For df2, the DataFrame with multiple dtypes, DataFrame.to_numpy() is relatively expensive:

df2.to_numpy()

array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)

Note: DataFrame.to_numpy() does not include the index or column labels in the output.
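A sketch of both points above: the array's dtype is the common dtype of all columns, and since the labels are dropped you have to pass them back to rebuild a frame. The small mixed and ints_floats frames are hypothetical examples.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# All columns are float64, so one NumPy dtype fits them all.
arr = df.to_numpy()
print(arr.dtype, arr.shape)  # float64 (6, 4)

# Floats mixed with strings force the common dtype to object.
mixed = pd.DataFrame({"x": [1.0, 2.0], "s": ["a", "b"]})
print(mixed.to_numpy().dtype)  # object

# Integers and floats share a numeric common dtype.
ints_floats = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5]})
print(ints_floats.to_numpy().dtype)  # float64

# The array carries no labels; supply them again to rebuild the frame.
rebuilt = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(rebuilt.equals(df))  # True
```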

describe() shows a quick statistical summary of the data:

df.describe()

              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean   0.336962 -0.044547  0.204596 -0.925862
std    0.812650  1.096672  1.037980  0.961737
min   -0.520896 -1.922205 -1.265841 -2.602595
25%   -0.320375 -0.337693 -0.384536 -1.306375
50%    0.225206  0.017060  0.134896 -0.521093
75%    0.966991  0.707100  1.064869 -0.287179
max    1.380236  1.139635  1.406446 -0.133051
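To see where the rows of that summary come from, here is a small sketch checking them against the individual reductions: "std" is the sample standard deviation (ddof=1, the DataFrame.std() default), "50%" is the median, and "count" counts non-missing values.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

summary = df.describe()

# Sample standard deviation (divides by n - 1).
print(np.allclose(summary.loc["std"], df.std()))  # True
# Median and count of non-missing values.
print(np.allclose(summary.loc["50%"], df.median()))  # True
print(summary.loc["count"].tolist())  # [6.0, 6.0, 6.0, 6.0]
```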

Transposing your data:

df.T

   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
A   -0.520896   -0.270485    1.380236    1.049023    0.720896   -0.337006
B   -0.340412    1.139635   -1.922205    0.363657    0.821581   -0.329537
C   -1.265841   -0.099596    1.406446   -0.479516    0.369389    1.296696
D   -0.419562   -0.622623   -1.534292   -0.243051   -0.133051   -2.602595

Sorting by an axis (axis=1 sorts the column labels, axis=0 sorts the row index):


df.sort_index(axis=1, ascending=False)

                   D         C         B         A
2013-01-01 -0.419562 -1.265841 -0.340412 -0.520896
2013-01-02 -0.622623 -0.099596  1.139635 -0.270485
2013-01-03 -1.534292  1.406446 -1.922205  1.380236
2013-01-04 -0.243051 -0.479516  0.363657  1.049023
2013-01-05 -0.133051  0.369389  0.821581  0.720896
2013-01-06 -2.602595  1.296696 -0.329537 -0.337006


df.sort_index(axis=0, ascending=False)

                   A         B         C         D
2013-01-06 -0.337006 -0.329537  1.296696 -2.602595
2013-01-05  0.720896  0.821581  0.369389 -0.133051
2013-01-04  1.049023  0.363657 -0.479516 -0.243051
2013-01-03  1.380236 -1.922205  1.406446 -1.534292
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623
2013-01-01 -0.520896 -0.340412 -1.265841 -0.419562
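A small sketch confirming what the two calls above do: axis=1 reorders the columns, axis=0 reorders the rows, and in both cases each value stays attached to its labels.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

by_columns = df.sort_index(axis=1, ascending=False)
by_rows = df.sort_index(axis=0, ascending=False)

print(list(by_columns.columns))  # ['D', 'C', 'B', 'A']
print(by_rows.index[0].date())   # 2013-01-06

# Sorting only reorders labels; the row for 2013-01-03 is unchanged.
print(by_rows.loc[dates[2]].equals(df.loc[dates[2]]))  # True
```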

Sorting by values:

df.sort_values(by="B")

                   A         B         C         D
2013-01-03  1.380236 -1.922205  1.406446 -1.534292
2013-01-01 -0.520896 -0.340412 -1.265841 -0.419562
2013-01-06 -0.337006 -0.329537  1.296696 -2.602595
2013-01-04  1.049023  0.363657 -0.479516 -0.243051
2013-01-05  0.720896  0.821581  0.369389 -0.133051
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623
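sort_values also accepts a descending flag and several keys. A sketch under these assumptions: the scores frame and its "team"/"points" columns are hypothetical, used to show a tie-breaking second key sorted in the opposite direction.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

by_b = df.sort_values(by="B")
by_b_desc = df.sort_values(by="B", ascending=False)
print(by_b["B"].is_monotonic_increasing)        # True
print(by_b_desc["B"].is_monotonic_decreasing)   # True

# Several keys: sort by team ascending, then points descending within a team.
scores = pd.DataFrame({"team": ["x", "y", "x", "y"], "points": [3, 1, 2, 5]})
ordered = scores.sort_values(by=["team", "points"], ascending=[True, False])
print(ordered["points"].tolist())  # [3, 2, 5, 1]
```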

3. Selection

Note: While standard Python / NumPy expressions for selecting and setting are intuitive and handy for interactive work, for production code we recommend the optimized pandas data access methods .at, .iat, .loc, and .iloc. See the indexing documentation: Indexing and Selecting Data, and MultiIndex / Advanced Indexing.

Selecting a single column, which yields a Series, equivalent to df.A:

df["A"]

2013-01-01   -0.520896
2013-01-02   -0.270485
2013-01-03    1.380236
2013-01-04    1.049023
2013-01-05    0.720896
2013-01-06   -0.337006
Freq: D, Name: A, dtype: float64
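A short sketch of the return types: a single label gives a Series (the same as attribute access df.A), while a list of labels gives a DataFrame, even when the list holds one label.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

col = df["A"]
print(type(col).__name__)  # Series
print(col.equals(df.A))    # True

# A list of labels keeps the result two-dimensional.
print(type(df[["A"]]).__name__)  # DataFrame
print(df[["A", "C"]].shape)      # (6, 2)
```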

Selecting via [] with a slice selects rows:

df[0:3]

                   A         B         C         D
2013-01-01 -0.520896 -0.340412 -1.265841 -0.419562
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623
2013-01-03  1.380236 -1.922205  1.406446 -1.534292

df["2013-01-02":"2013-05-04"]

                   A         B         C         D
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623
2013-01-03  1.380236 -1.922205  1.406446 -1.534292
2013-01-04  1.049023  0.363657 -0.479516 -0.243051
2013-01-05  0.720896  0.821581  0.369389 -0.133051
2013-01-06 -0.337006 -0.329537  1.296696 -2.602595
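The two [] slices above follow different endpoint rules. A sketch on random data: an integer slice is positional and excludes the stop position, while a date-string slice on the DatetimeIndex includes the end label, and an end date past the data simply stops at the last row.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Positional: rows 0, 1 and 2; position 3 is excluded.
print(len(df[0:3]))  # 3

# Label-based: Jan 2, Jan 3 and Jan 4, end label included.
print(len(df["20130102":"20130104"]))  # 3

# An end label beyond the index stops at the last available date.
print(df["2013-01-02":"2013-05-04"].index[-1].date())  # 2013-01-06
```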

Selection by label

See more in Selection by Label.

For getting a cross section using a label:

df.loc[dates[0]]

A   -0.520896
B   -0.340412
C   -1.265841
D   -0.419562
Name: 2013-01-01 00:00:00, dtype: float64

Selecting on a multi-axis by label:

df.loc[:, ["A", "B"]]

                   A         B
2013-01-01 -0.520896 -0.340412
2013-01-02 -0.270485  1.139635
2013-01-03  1.380236 -1.922205
2013-01-04  1.049023  0.363657
2013-01-05  0.720896  0.821581
2013-01-06 -0.337006 -0.329537

Label slicing includes both endpoints:

df.loc["20130102":"20130104", ["A", "B"]]

                   A         B
2013-01-02 -0.270485  1.139635
2013-01-03  1.380236 -1.922205
2013-01-04  1.049023  0.363657

Reduction in the dimensions of the returned object:

df.loc["20130102", ["A", "B"]]

A   -0.270485
B    1.139635
Name: 2013-01-02 00:00:00, dtype: float64
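A sketch of the rule behind that reduction: each axis given a scalar label is dropped from the result, while an axis given a list is kept. Here the same row is selected once as a scalar label and once inside a one-element list.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

row = df.loc[dates[1], ["A", "B"]]      # scalar row label -> Series
block = df.loc[[dates[1]], ["A", "B"]]  # list with one label -> DataFrame
value = df.loc[dates[1], "A"]           # scalars on both axes -> scalar

print(row.ndim, block.ndim, block.shape)  # 1 2 (1, 2)
print(np.isscalar(value))                 # True
```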

For getting a scalar value:

df.loc[dates[0], "A"]

-0.52089556678858

For getting fast access to a scalar (equivalent to the prior method):

df.at[dates[0], "A"]

-0.52089556678858
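A minimal sketch showing that .at and .loc agree for a single cell, and that .at can also set one value in place (the same pattern the setting examples later in this section use).

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Both accessors read the same cell.
print(df.at[dates[0], "A"] == df.loc[dates[0], "A"])  # True

# .at also writes a single value.
df.at[dates[0], "A"] = 0.0
print(df.loc[dates[0], "A"])  # 0.0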

Selection by position

See more in Selection by Position.

Select via the position of the passed integers:

df.iloc[3]

A    1.049023
B    0.363657
C   -0.479516
D   -0.243051
Name: 2013-01-04 00:00:00, dtype: float64

By integer slices, acting similar to NumPy/Python:

df.iloc[3:5, 0:2]

By lists of integer positions, similar to the NumPy/Python style:

df.iloc[[1, 2, 4], [0, 2]]

                   A         C
2013-01-02 -0.270485 -0.099596
2013-01-03  1.380236  1.406446
2013-01-05  0.720896  0.369389

For slicing rows explicitly:

df.iloc[1:3, :]

                   A         B         C         D
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623
2013-01-03  1.380236 -1.922205  1.406446 -1.534292

For slicing columns explicitly:

df.iloc[:, 1:3]

                   B         C
2013-01-01 -0.340412 -1.265841
2013-01-02  1.139635 -0.099596
2013-01-03 -1.922205  1.406446
2013-01-04  0.363657 -0.479516
2013-01-05  0.821581  0.369389
2013-01-06 -0.329537  1.296696

For getting a value explicitly:

df.iloc[1, 1]

1.139635 (rounded to six decimal places)

For getting fast access to a scalar (equivalent to the prior method):

df.iat[1, 1]

1.139635 (rounded to six decimal places)
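A sketch of how positions map to labels: position (1, 1) is the cell at row label df.index[1] and column label df.columns[1], so .iat and .at read the same value. Negative positions count from the end, shown here with .iloc.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Position (1, 1) corresponds to label (2013-01-02, "B").
print(df.iat[1, 1] == df.at[df.index[1], df.columns[1]])  # True

# -1 selects the last row and the last column, as with Python lists.
print(df.iloc[-1, -1] == df.at[dates[-1], "D"])  # True
```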

Boolean indexing

Using a single column's values to select data:

df[df["A"] > 0]

                   A         B         C         D
2013-01-03  1.380236 -1.922205  1.406446 -1.534292
2013-01-04  1.049023  0.363657 -0.479516 -0.243051
2013-01-05  0.720896  0.821581  0.369389 -0.133051

Selecting values from a DataFrame where a boolean condition is met:

df[df > 0]

                   A         B         C   D
2013-01-01       NaN       NaN       NaN NaN
2013-01-02       NaN  1.139635       NaN NaN
2013-01-03  1.380236       NaN  1.406446 NaN
2013-01-04  1.049023  0.363657       NaN NaN
2013-01-05  0.720896  0.821581  0.369389 NaN
2013-01-06       NaN       NaN  1.296696 NaN
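Indexing a DataFrame with a boolean DataFrame keeps the shape and blanks out the cells where the condition is False. A sketch showing it matches DataFrame.where, which additionally lets you choose a replacement value other than NaN (0.0 here, an arbitrary choice).

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

masked = df[df > 0]
print(masked.equals(df.where(df > 0)))  # True

# Replace failing cells with 0.0 instead of NaN.
clipped = df.where(df > 0, 0.0)
print((clipped >= 0).all().all())  # True
```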

Using the isin() method for filtering:

df2 = df.copy()
df2["E"] = ["one", "one", "two", "three", "four", "three"]
df2

                   A         B         C         D      E
2013-01-01 -0.520896 -0.340412 -1.265841 -0.419562    one
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623    one
2013-01-03  1.380236 -1.922205  1.406446 -1.534292    two
2013-01-04  1.049023  0.363657 -0.479516 -0.243051  three
2013-01-05  0.720896  0.821581  0.369389 -0.133051   four
2013-01-06 -0.337006 -0.329537  1.296696 -2.602595  three

df2[df2["E"].isin(["two", "four"])]

                   A         B         C         D     E
2013-01-03  1.380236 -1.922205  1.406446 -1.534292   two
2013-01-05  0.720896  0.821581  0.369389 -0.133051  four
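isin() is shorthand for OR-ing equality tests, and ~ negates the mask. A small sketch checking both on the same df2 construction (random numeric columns, same E labels).

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df2 = df.copy()
df2["E"] = ["one", "one", "two", "three", "four", "three"]

by_isin = df2[df2["E"].isin(["two", "four"])]
by_or = df2[(df2["E"] == "two") | (df2["E"] == "four")]
print(by_isin.equals(by_or))   # True
print(by_isin["E"].tolist())   # ['two', 'four']

# Rows whose E is NOT in the list.
print(len(df2[~df2["E"].isin(["two", "four"])]))  # 4
```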

Setting a new column automatically aligns the data by the indexes. Here s1 starts on 2013-01-02, so the 2013-01-01 row gets NaN, and s1's last value (dated 2013-01-07, which is not in df's index) is dropped:

s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
df["F"] = s1
df

                   A         B         C         D    F
2013-01-01 -0.520896 -0.340412 -1.265841 -0.419562  NaN
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623  1.0
2013-01-03  1.380236 -1.922205  1.406446 -1.534292  2.0
2013-01-04  1.049023  0.363657 -0.479516 -0.243051  3.0
2013-01-05  0.720896  0.821581  0.369389 -0.133051  4.0
2013-01-06 -0.337006 -0.329537  1.296696 -2.602595  5.0
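One way to reason about the alignment: assigning a Series to a column behaves like reindexing that Series to the frame's index first. A sketch checking this with pandas.testing (names are ignored because the column is called "F" while s1 is unnamed).

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
df["F"] = s1

# Same result as reindexing s1 onto df's index (missing labels become NaN).
expected = s1.reindex(df.index)
pd.testing.assert_series_equal(df["F"], expected, check_names=False)

print(df["F"].isna().tolist())  # [True, False, False, False, False, False]
# The value 6 (dated 2013-01-07) has no matching row and is dropped.
print(6 in df["F"].tolist())  # False
```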

Setting values by label:

df.at[dates[0], "A"] = 0
df

                   A         B         C         D    F
2013-01-01  0.000000 -0.340412 -1.265841 -0.419562  NaN
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623  1.0
2013-01-03  1.380236 -1.922205  1.406446 -1.534292  2.0
2013-01-04  1.049023  0.363657 -0.479516 -0.243051  3.0
2013-01-05  0.720896  0.821581  0.369389 -0.133051  4.0
2013-01-06 -0.337006 -0.329537  1.296696 -2.602595  5.0

Setting values by position:

df.iat[0, 1] = 0
df

                   A         B         C         D    F
2013-01-01  0.000000  0.000000 -1.265841 -0.419562  NaN
2013-01-02 -0.270485  1.139635 -0.099596 -0.622623  1.0
2013-01-03  1.380236 -1.922205  1.406446 -1.534292  2.0
2013-01-04  1.049023  0.363657 -0.479516 -0.243051  3.0
2013-01-05  0.720896  0.821581  0.369389 -0.133051  4.0
2013-01-06 -0.337006 -0.329537  1.296696 -2.602595  5.0

Setting by assigning with a NumPy array:

df.loc[:, "D"] = np.array([5] * len(df))
df

                   A         B         C  D    F
2013-01-01  0.000000  0.000000 -1.265841  5  NaN
2013-01-02 -0.270485  1.139635 -0.099596  5  1.0
2013-01-03  1.380236 -1.922205  1.406446  5  2.0
2013-01-04  1.049023  0.363657 -0.479516  5  3.0
2013-01-05  0.720896  0.821581  0.369389  5  4.0
2013-01-06 -0.337006 -0.329537  1.296696  5  5.0

A where operation with setting:

df2 = df.copy()
df2[df2 > 0] = -df2
df2

                   A         B         C  D    F
2013-01-01  0.000000  0.000000 -1.265841 -5  NaN
2013-01-02 -0.270485 -1.139635 -0.099596 -5 -1.0
2013-01-03 -1.380236 -1.922205 -1.406446 -5 -2.0
2013-01-04 -1.049023 -0.363657 -0.479516 -5 -3.0
2013-01-05 -0.720896 -0.821581 -0.369389 -5 -4.0
2013-01-06 -0.337006 -0.329537 -1.296696 -5 -5.0

df2[df2 < 0] = -df2 + 1
df2

                   A         B         C  D    F
2013-01-01  0.000000  0.000000  2.265841  6  NaN
2013-01-02  1.270485  2.139635  1.099596  6  2.0
2013-01-03  2.380236  2.922205  2.406446  6  3.0
2013-01-04  2.049023  1.363657  1.479516  6  4.0
2013-01-05  1.720896  1.821581  1.369389  6  5.0
2013-01-06  1.337006  1.329537  2.296696  6  6.0
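The boolean-mask assignment used above mutates the frame. A sketch of a non-mutating alternative, assuming all-float columns for simplicity: DataFrame.mask(cond, other) replaces values where cond is True, so it reproduces the first step (flipping positive values) while leaving NaN cells untouched.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df["F"] = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=pd.date_range("20130102", periods=6))

# In-place version, as in the text.
by_setting = df.copy()
by_setting[by_setting > 0] = -by_setting

# Returns a new frame; df itself is unchanged.
by_mask = df.mask(df > 0, -df)

# equal_nan=True treats the NaN in column F as matching.
print(np.allclose(by_setting.to_numpy(), by_mask.to_numpy(), equal_nan=True))  # True
print((by_mask.fillna(0) <= 0).all().all())  # True
```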

Original: https://blog.csdn.net/u012338969/article/details/124558022
Author: 雪龙无敌
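Title: pandas10minnutes_中英对照01 (10 Minutes to pandas, Chinese-English parallel text, Part 01)

Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/738468/

Reposted articles remain under the original author's copyright. When reposting, please credit the original author and source.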
