Machine Learning and Data Science (2): Introduction to pandas in Python

August 9, 2023, 12:07 AM • Python

Creating a series

import pandas as pd
series_name = pd.Series(["1", "2", "3"])

Creating a dataframe

dataframe_name = pd.DataFrame({"key1_name": value1_name, "key2_name": value2_name})
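As a minimal sketch of the two constructors above, assuming some made-up car data (the names `colours`, `cars`, `make` and `colour` are illustrative and not from the original):

```python
import pandas as pd

# A Series is a one-dimensional labelled array
colours = pd.Series(["red", "blue", "white"])

# A DataFrame can be built from a dict that maps column names to values
# of equal length (lists, Series, ...)
cars = pd.DataFrame({"make": ["BMW", "Toyota", "Honda"], "colour": colours})

print(cars.shape)  # (number of rows, number of columns)
```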

Importing data

dataframe_name = pd.read_csv("file_name")

dataframe_name = pd.read_csv("URL_of_the_file")

Exporting data

dataframe_name.to_csv("file_name_you_want_to_store_as")
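A rough round trip might look like the sketch below. To keep it self-contained it reads from an in-memory string instead of a real file or URL; `csv_text` and its contents are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV file; read_csv also accepts a file path or a URL
csv_text = "make,price\nBMW,30000\nToyota,20000\n"
cars = pd.read_csv(io.StringIO(csv_text))

# With no path argument, to_csv returns the CSV text instead of writing a file.
# index=False leaves out the index column, which is written by default.
exported = cars.to_csv(index=False)
print(exported)
```

Without `index=False`, re-importing the exported file usually produces an extra unnamed column holding the old index.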

Describing data

.dtypes shows us what datatype each column contains.

dataframe_name.dtypes

.describe() gives you a quick statistical overview of the numerical columns.

dataframe_name.describe()

.info() prints a concise summary of a DataFrame: the index range, the column names, the non-null count and dtype of each column, and the memory usage.

dataframe_name.info()

You can also call various statistical and mathematical methods such as .mean() or .sum() directly on a DataFrame or Series.

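dataframe_name.mean(numeric_only=True)  # numeric_only=True skips text columns, which otherwise raise a TypeError in pandas 2.0+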

series_name.mean()

dataframe_name.sum()

series_name.sum()
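The describing methods above can be tried on a small example. This is only a sketch with invented data; `cars` and its columns are assumptions.

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["BMW", "Toyota", "Honda"],
    "price": [30000, 20000, 25000],
    "doors": [4, 4, 2],
})

print(cars.dtypes)      # one dtype per column
print(cars.describe())  # count, mean, std, min, quartiles, max of numeric columns
cars.info()             # prints its summary directly

# numeric_only=True skips the text column "make"
column_means = cars.mean(numeric_only=True)
price_total = cars["price"].sum()
```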

.columns will show you all the columns of a DataFrame.

dataframe_name.columns

.index will show you the values in a DataFrame's index (the row labels displayed on the far left).

dataframe_name.index

len() returns the number of rows in a DataFrame.

len(dataframe_name)
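A small sketch of these three, assuming a made-up DataFrame with custom row labels (`"a"`, `"b"`, `"c"` are illustrative):

```python
import pandas as pd

cars = pd.DataFrame(
    {"make": ["BMW", "Toyota", "Honda"], "price": [30000, 20000, 25000]},
    index=["a", "b", "c"],
)

column_names = list(cars.columns)  # the column labels
row_labels = list(cars.index)      # the row labels
n_rows = len(cars)                 # number of rows, not number of cells
```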

Viewing and selecting data

.head() allows you to view the first 5 rows of your DataFrame.

dataframe_name.head()

.tail() allows you to see the bottom 5 rows of your DataFrame. This is helpful if your changes are influencing the bottom rows of your data.

dataframe_name.tail()
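Both methods also accept the number of rows to return. A minimal sketch, using an invented ten-row DataFrame:

```python
import pandas as pd

numbers = pd.DataFrame({"n": range(10)})

first_five = numbers.head()   # default is 5 rows
last_three = numbers.tail(3)  # pass a number to change how many rows you get
```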

.loc[] takes an index label as input and selects the rows of your Series or DataFrame whose index matches that label. If the label appears more than once, every matching row is returned.

dataframe_name.loc[index_label]

series_name.loc[index_label]

.iloc[] does a similar thing but selects by integer position (starting at 0), regardless of the index labels.

dataframe_name.iloc[position]

series_name.iloc[position]
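The difference between the two only shows up when the index is not the default 0, 1, 2, ... The sketch below assumes a made-up Series with a custom, partly repeated index:

```python
import pandas as pd

# Custom index labels: note that the label 3 appears twice
animals = pd.Series(["cat", "dog", "bird", "panda"], index=[0, 3, 9, 3])

by_label = animals.loc[3]      # every row whose index label is 3
by_position = animals.iloc[3]  # the single row at position 3 (the fourth row)

print(by_label)
print(by_position)
```

Here `.loc[3]` returns a Series holding "dog" and "panda", while `.iloc[3]` returns just "panda".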

If you want to select a particular column, you can use ['COLUMN_NAME'].

dataframe_name['column name']

Boolean indexing works with column selection too. Using it will select the rows which fulfill the condition in the brackets.

dataframe_name[dataframe_name['column name'] > a_condition]
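To illustrate, the comparison inside the brackets produces a Series of True/False values, and only the rows marked True are kept. A sketch with invented data and an arbitrary threshold:

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["BMW", "Toyota", "Honda"],
    "price": [30000, 20000, 25000],
})

mask = cars["price"] > 22000  # boolean Series, one value per row
expensive = cars[mask]        # keeps only the rows where mask is True
```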

pd.crosstab() is a great way to view two different columns together and compare them.

pd.crosstab(dataframe_name["column_name_1"], dataframe_name["column_name_2"])
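As a sketch of what the result looks like, the example below counts how often each combination of two columns occurs. The data is made up.

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["BMW", "Toyota", "Toyota", "Honda"],
    "doors": [4, 4, 2, 4],
})

# Rows are the unique values of "make", columns the unique values of "doors",
# and each cell is the number of rows with that combination
counts = pd.crosstab(cars["make"], cars["doors"])
print(counts)
```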

If you want to compare more columns in the context of another column, you can use .groupby().

# Group by one column and find the mean of the other (numeric) columns
dataframe_name.groupby(["column_name"]).mean(numeric_only=True)
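A minimal grouped-mean sketch, assuming invented car data with one text column that should be left out of the average:

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["Toyota", "Toyota", "Honda"],
    "colour": ["white", "blue", "red"],
    "price": [20000, 24000, 25000],
})

# One row per make; numeric_only=True ignores the text column "colour"
mean_by_make = cars.groupby("make").mean(numeric_only=True)
print(mean_by_make)
```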

%matplotlib inline is a special command which tells Jupyter to show your plots. Commands with % at the front are called magic commands.

# Import matplotlib and tell Jupyter to show plots
import matplotlib.pyplot as plt
%matplotlib inline

You can visualize a column by calling .plot() on it.

dataframe_name["column_name"].plot()

You can see the distribution of a column by calling .hist() on it.

dataframe_name["column_name"].hist()

Manipulating data

# Lowercase the strings in a column (returns a new Series; assign it back to keep the change)
dataframe_name["column_name"].str.lower()

Some functions have a parameter called inplace which means a DataFrame is updated in place without having to reassign it.

.fillna() is a function which fills missing data.

# With inplace=False (the default), fillna() returns a new Series; the original column keeps its missing values unless you assign the result back
dataframe_name["column_name"].fillna(dataframe_name["column_name"].mean(),
                                     inplace=False)
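The effect of not reassigning can be checked directly. This sketch assumes a made-up `odometer` column with one missing value:

```python
import pandas as pd

cars = pd.DataFrame({"odometer": [100.0, None, 300.0]})

# Returns a filled copy; cars itself is untouched
cars["odometer"].fillna(cars["odometer"].mean())
still_missing = int(cars["odometer"].isna().sum())

# Assigning the result back is what actually updates the column.
# .mean() skips missing values, so the fill value here is 200.0.
cars["odometer"] = cars["odometer"].fillna(cars["odometer"].mean())
```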

Let’s say you wanted to remove any rows which had missing data and only work with rows which had complete coverage.

You can do this using .dropna().

dataframe_name.dropna(inplace = True)
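A sketch of what gets dropped, using invented data in which two of the three rows have a missing value. It uses reassignment instead of `inplace=True`; both leave you with only the complete rows.

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["BMW", None, "Honda"],
    "price": [30000.0, 20000.0, None],
})

# Any row containing at least one missing value is removed
complete = cars.dropna()
print(complete)
```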

You can remove a column using .drop('COLUMN_NAME', axis=1).

dataframe_name = dataframe_name.drop("column_name", axis=1)
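For illustration, with a made-up `doors` column to remove:

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["BMW", "Toyota"],
    "price": [30000, 20000],
    "doors": [4, 4],
})

# axis=1 means "drop a column"; the default axis=0 would look for a row label
without_doors = cars.drop("doors", axis=1)
```

`.drop()` returns a new DataFrame, so `cars` still contains `doors` until the result is assigned back to it.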

To shuffle the order of the dataframe you could use .sample(frac=1).

.sample() randomly samples different rows from a DataFrame. The frac parameter dictates the fraction, where 1 = 100% of rows, 0.5 = 50% of rows, 0.01 = 1% of rows.

dataframe_1 = dataframe_name.sample(frac=1)

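To get the index back in order (0, 1, 2, ...) after shuffling, use .reset_index(). Passing drop=True discards the old index instead of adding it as a new column, and the result must be assigned back.

dataframe_1 = dataframe_1.reset_index(drop=True)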
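Shuffling and re-indexing together might look like the sketch below. The data is made up, and `random_state=42` is only there to make the shuffle repeatable.

```python
import pandas as pd

numbers = pd.DataFrame({"n": range(5)})

# frac=1 returns all rows in random order; the old index labels travel with them
shuffled = numbers.sample(frac=1, random_state=42)

# drop=True throws the shuffled labels away and numbers the rows 0..4 again
shuffled = shuffled.reset_index(drop=True)
print(shuffled)
```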

What if you wanted to apply a function to every value in a column? You can do so using the .apply() method and passing it a function, such as a lambda.

dataframe_name["column_name"].apply(lambda x: x * 2)  # replace x * 2 with your own expression

Original: https://blog.csdn.net/stellalxy/article/details/125177038
Author: stellalxy
Title: Machine Learning and Data Science (2): Introduction to pandas in Python
