Creating a series
import pandas as pd
series_name = pd.Series([1, 2, 3])
Creating a dataframe
dataframe_name = pd.DataFrame({"key1_name": value1_name, "key2_name": value2_name})
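As a concrete sketch of both constructors, the snippet below builds a small Series and then uses it as one column of a DataFrame. The names `prices` and `cars` and all the values are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labelled array.
prices = pd.Series([4000, 5000, 7000])

# A DataFrame is a table; each dictionary key becomes a column name.
cars = pd.DataFrame({
    "make": ["Toyota", "Honda", "BMW"],
    "price": prices,
})

print(cars)
```

Each dictionary value can be a list or a Series, as long as the lengths line up.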
Importing data
dataframe_name = pd.read_csv("file_name")
dataframe_name = pd.read_csv("URL_of_the_file")
Exporting data
dataframe_name.to_csv("file_name_you_want_to_store_as")
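Here is a minimal round trip, assuming a throwaway temporary directory and a hypothetical file name `cars.csv`. It writes a DataFrame to CSV and reads it back.

```python
import os
import tempfile

import pandas as pd

cars = pd.DataFrame({"make": ["Toyota", "Honda"], "price": [4000, 5000]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "cars.csv")  # hypothetical file name
    # index=False leaves the row index out of the file.
    cars.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```

Without `index=False`, `.to_csv()` also writes the index, and it comes back as an extra `Unnamed: 0` column when you read the file again.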
Describing data
.dtypes shows us what datatype each column contains.
dataframe_name.dtypes
.describe() gives you a quick statistical overview of the numerical columns.
dataframe_name.describe()
.info() prints a concise summary of a DataFrame: the index range, the column names, the non-null count and dtype of each column, and the memory usage.
dataframe_name.info()
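You can also call various statistical and mathematical methods such as .mean() or .sum() directly on a DataFrame or Series. From pandas 2.0 onwards, pass numeric_only=True to .mean() if the DataFrame also contains non-numeric columns, otherwise the call raises a TypeError.
dataframe_name.mean(numeric_only=True)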
series_name.mean()
dataframe_name.sum()
series_name.sum()
.columns will show you all the columns of a DataFrame.
dataframe_name.columns
.index will show you the values in a DataFrame’s index (the column on the far left).
dataframe_name.index
len() will show you the number of rows in a DataFrame.
len(dataframe_name)
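To see these inspection tools side by side, here is a small sketch on a made-up `cars` table (the column names and values are assumptions, not data from this post).

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["Toyota", "Honda", "BMW"],
    "odometer_km": [150000, 88000, 11000],
    "price": [4000.0, 5000.0, 22000.0],
})

print(cars.dtypes)      # one dtype per column
print(cars.describe())  # count, mean, std, min, quartiles, max for numeric columns
cars.info()             # prints its summary directly

# numeric_only=True skips the text column "make".
numeric_means = cars.mean(numeric_only=True)
print(numeric_means)

print(list(cars.columns))
print(len(cars))
```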
Viewing and selecting data
.head() allows you to view the first 5 rows of your DataFrame.
dataframe_name.head()
.tail() allows you to see the bottom 5 rows of your DataFrame. This is helpful if your changes are influencing the bottom rows of your data.
dataframe_name.tail()
.loc[] selects by index label. It returns whichever rows of your Series or DataFrame carry that label, so a label that appears more than once returns several rows.
dataframe_name.loc[index_label]
series_name.loc[index_label]
.iloc[] does a similar thing but selects by integer position, counting from 0, regardless of the index labels.
dataframe_name.iloc[position]
series_name.iloc[position]
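The difference between the two is easiest to see on a Series whose index is not simply 0, 1, 2, 3. The `animals` Series below is a made-up example with a deliberately repeated label.

```python
import pandas as pd

# Custom index labels; note that the label 3 appears twice.
animals = pd.Series(["cat", "dog", "bird", "panda"], index=[0, 3, 9, 3])

# .loc matches labels: every row labelled 3 comes back.
by_label = animals.loc[3]

# .iloc matches positions: the row at position 3 (the fourth row) comes back.
by_position = animals.iloc[3]

print(by_label)
print(by_position)
```

With the default 0 to n-1 index, the label and the position of a row are the same number, which is why the two accessors can look interchangeable at first.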
If you want to select a particular column, you can use ['COLUMN_NAME'].
dataframe_name['column name']
Boolean indexing works with column selection too. The comparison inside the brackets produces one True or False per row, and only the rows where it is True are kept.
dataframe_name[dataframe_name['column name'] > threshold_value]
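A short sketch of boolean indexing, using a made-up `cars` table and an arbitrary cut-off of 100000 km:

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["Toyota", "Honda", "BMW", "Nissan"],
    "odometer_km": [150000, 88000, 11000, 213000],
})

# The comparison yields a boolean Series, one value per row.
mask = cars["odometer_km"] > 100000

# Indexing with the mask keeps only the rows where it is True.
high_mileage = cars[mask]
print(high_mileage)
```

Writing `cars[cars["odometer_km"] > 100000]` in one line does the same thing; naming the mask just makes the two steps visible.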
pd.crosstab() is a great way to view two different columns together and compare them.
pd.crosstab(dataframe_name["column_name_1"], dataframe_name["column_name_2"])
If you want to compare more columns in the context of another column, you can use .groupby().
Group by one column and find the mean of the other numeric columns
dataframe_name.groupby(["column_name"]).mean(numeric_only=True)
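Here is a small sketch of both tools on a made-up `cars` table. `pd.crosstab()` counts how often each pair of values occurs together, while `.groupby()` aggregates the remaining columns per group.

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["Toyota", "Honda", "Toyota", "BMW"],
    "doors": [4, 4, 3, 5],
    "price": [4000, 5000, 7000, 22000],
})

# Rows are makes, columns are door counts, cells are frequencies.
door_counts = pd.crosstab(cars["make"], cars["doors"])
print(door_counts)

# One row per make, holding the mean of each numeric column.
mean_by_make = cars.groupby("make").mean(numeric_only=True)
print(mean_by_make)
```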
%matplotlib inline is a special command which tells Jupyter to show your plots. Commands with % at the front are called magic commands.
Import matplotlib and tell Jupyter to show plots
import matplotlib.pyplot as plt
%matplotlib inline
You can visualize a column by calling .plot() on it.
dataframe_name["column_name"].plot()
You can see the distribution of a column by calling .hist() on it.
dataframe_name["column_name"].hist()
Manipulating data
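Convert a text column to lowercase. .str.lower() returns a new Series, so assign the result back to the column to keep the change.
dataframe_name["column_name"] = dataframe_name["column_name"].str.lower()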
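A quick sketch on a made-up `make` column shows why the reassignment matters: the call on its own leaves the DataFrame untouched.

```python
import pandas as pd

cars = pd.DataFrame({"make": ["Toyota", "HONDA", "bmw"]})

# .str.lower() returns a new Series; cars is not modified here.
lowered = cars["make"].str.lower()
unchanged = cars["make"].tolist()

# Assigning the result back stores the lowercase values.
cars["make"] = lowered
print(cars)
```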
Some functions have a parameter called inplace. With inplace=True, the DataFrame is updated in place without having to reassign it.
.fillna() is a function which fills missing data.
With inplace=False (the default), the original column is not changed. The call returns a new Series with the missing values replaced by the mean, so assign the result back to keep it.
dataframe_name["column_name"] = dataframe_name["column_name"].fillna(
    dataframe_name["column_name"].mean(), inplace=False)
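The sketch below, on a made-up `odometer_km` column with one missing value, shows that a plain `.fillna()` call leaves the original column alone until you store its result.

```python
import pandas as pd

cars = pd.DataFrame({"odometer_km": [150000.0, None, 50000.0]})

# Returns a new Series; .mean() ignores the missing value.
filled = cars["odometer_km"].fillna(cars["odometer_km"].mean())

# The original column still has its gap at this point.
missing_before = int(cars["odometer_km"].isna().sum())

# Assigning the result back makes the change stick.
cars["odometer_km"] = filled
missing_after = int(cars["odometer_km"].isna().sum())

print(missing_before, missing_after)
```

Reassigning like this is usually a safer habit than relying on `inplace=True` for a single selected column, since the assignment makes it explicit which object ends up modified.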
Let’s say you wanted to remove any rows which had missing data and only work with rows which had complete coverage.
You can do this using .dropna().
dataframe_name.dropna(inplace=True)
You can remove a column using .drop('COLUMN_NAME', axis=1).
dataframe_name = dataframe_name.drop("column_name", axis=1)
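As a sketch, here are both removals on a made-up table with gaps in two columns. Neither call uses `inplace`, so each returns a new DataFrame and `cars` itself stays as it was.

```python
import pandas as pd

cars = pd.DataFrame({
    "make": ["Toyota", None, "BMW"],
    "price": [4000.0, 5000.0, None],
    "notes": ["a", "b", "c"],
})

# Keep only rows with no missing value in any column.
complete_rows = cars.dropna()

# Remove the "notes" column; axis=1 means "columns".
without_notes = cars.drop("notes", axis=1)

print(complete_rows)
print(without_notes)
```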
To shuffle the order of the dataframe you could use .sample(frac=1).
.sample() randomly samples different rows from a DataFrame. The frac parameter dictates the fraction, where 1 = 100% of rows, 0.5 = 50% of rows, 0.01 = 1% of rows.
dataframe_1 = dataframe_name.sample(frac=1)
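To put the index back in order, use .reset_index(). It returns a new DataFrame, so assign the result. Passing drop=True discards the shuffled index instead of keeping it as an extra column.
dataframe_1 = dataframe_1.reset_index(drop=True)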
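A small sketch of shuffling and then tidying the index, on a made-up table. The `random_state=42` argument is an addition here, used only so the shuffle is repeatable between runs.

```python
import pandas as pd

cars = pd.DataFrame({"make": ["Toyota", "Honda", "BMW", "Nissan"]})

# frac=1 samples 100% of the rows, i.e. returns all of them in random order.
shuffled = cars.sample(frac=1, random_state=42)

# The rows keep their original labels, so the index is now out of order.
print(shuffled.index.tolist())

# drop=True discards the old labels and numbers the rows 0..n-1 again.
tidy = shuffled.reset_index(drop=True)
print(tidy.index.tolist())
```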
What if you wanted to apply a function to every value in a column? You can do so using the .apply() method and passing it a lambda function.
dataframe_name["column_name"].apply(lambda x: x * 2)  # replace x * 2 with your own expression
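For example, here is a sketch that converts a made-up odometer column from kilometres to metres. The column names are assumptions chosen for illustration.

```python
import pandas as pd

cars = pd.DataFrame({"odometer_km": [150000, 88000, 11000]})

# The lambda runs once per value in the column.
cars["odometer_m"] = cars["odometer_km"].apply(lambda km: km * 1000)

# For simple arithmetic, the vectorized form gives the same result.
vectorized = cars["odometer_km"] * 1000

print(cars)
```

`.apply()` earns its keep when the logic does not fit a single arithmetic expression; for plain maths like this, operating on the column directly is shorter and typically faster.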
Original: https://blog.csdn.net/stellalxy/article/details/125177038
Author: stellalxy
Title: Machine Learning and Data Science (2): Introduction to pandas in Python