Python操作Excel
- 1.读取文件
* - 1.1 pandas直接读取
– - 1.2 模块式读取,主程序调用
- 1.3 tkiner选择读取
- 1.4 pd创建dataframe
- 2.写入文件
* - 2.1 pandas写入工作簿多个工作表
- 2.2 xlwings写入
– - 3.常用数据清洗与处理
* - 3.1 df查看信息
- 3.2 列数据筛选(导入时)
- 3.3 列数据处理
– - 3.4 行数据筛选
– - 3.5 空值处理
–
+ - 4.数据透视&分组
* - 4.1.1 pd.pivot
- 4.1.2 group by
- 4.1.3 思路:添加小计
- 4.1.4 思路:数据透视变形
- 5.df之间
* - 5.1 pd.merge 匹配
– - 5.2 pd.concat()纵向合并两df
- 5.3 两表根据关联列,A表中剔除/保留含B表列
- 6 python链接Mysql
- 没什么用的小技巧
* - 运行计时
- 程序内进度条
1.读取文件
1.1 pandas直接读取
1.1.1 读取excel
字典形式导入
src1 = {
'file': r'd:\user\医药销售09月薪资考核数据1010.xlsx',
'sht': 'Sheet1',
'cols': 'AE,AF,I,J,K,Z',
'names': ['客户名称', 'yj账号', 'yj网点', '上期逾期期末未回款金额', '工资月份', '大区'],
'colsorder': ['工资月份', '大区', '客户名称', 'yj账号', 'yj网点', '上期逾期期末未回款金额']
}
src1['data'] = pd.read_excel(src1['file'], sheet_name=src1['sht'], header=None, skiprows=1, usecols=src1['cols'], names=src1['names'])
1.1.2 读取csv
类似于导入excel,需注意encoding = ”utf-8′ 或 encoding = ”gbk’
df = pd.read_csv(file, encoding='utf-8', usecols=cols, low_memory=False)
1.1.3 读取文件夹
os.chdir(r'd:\user\桌面\BP-js12.7')
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
js = pd.concat([pd.read_csv(f, encoding='gbk', low_memory=False) for f in all_filenames ])
1.2 模块式读取,主程序调用
思路:现有2张表,读入过程用函数封装,以模块形式调用。
具体做法:
①建立2个py文件,分别定义函数get_jsdf, get_bcmddf;
②将这2个文件(模块)放入同一文件夹《hkhsm》下,分别命名为importm_***。
》 导入jsdf,命名为importm_js.py
import os
import pandas as pd
import numpy as np
def get_jsdf(jsdir):
files = getfiles(jsdir, '.csv')
print(files)
file = files[0]
df = getdf(file)
for f in files[1:]:
newdf = getdf(f)
df = pd.concat([df, newdf])
return df
def getdf(file):
print(file)
df = pd.read_csv(file, encoding='gbk', low_memory=False)
df = df[df['大区'].isin(['ws大区', 'ky部', 'Tx']) & ~df['地区名称'].isin(['kyab区', 'kycd区'])]
df['yj账号'] = pd.to_numeric(df['yj账号'], errors='coerce').fillna(df['js账号'])
df['yj账号'] = df['yj账号'].astype(np.int64)
df['未收(未付)-本位币'] = df['未收(未付)-本位币'].apply(lambda x: 0.0 if x < 0.0 else x)
return df
def getfiles(path, extension):
list = os.listdir(path)
result = []
for i in range(0, len(list)):
temp = os.path.join(path, list[i])
if os.path.isfile(temp) and os.path.splitext(temp)[1] == extension:
result.append(temp)
return result
if __name__ == '__main__':
jsdir = r'd:\user\桌面\hkhs原始数据1\BP-js'
df = get_jsdf(jsdir)
print(df.shape)
print(df)
print(df.info())
》 导入bcmd,命名为importm_bcmd.py
import pandas as pd
import os
def get_bcmddf(path, shortname):
file = os.path.join(path, shortname)
md = pd.read_excel(file, usecols='A')
return md
if __name__ == '__main__':
bcmd = get_bcmddf(r'd:\user\桌面\货款回收原始数据1', '补充名单.xlsx')
print(bcmd.info())
print(bcmd)
》建立一个关联py,命名为import_.py,将3张表导入
from .importm_js import get_jsdf
from .importm_bcmd import get_bcmddf
》在主程序调用模块
import hkhsm.import_ as im
》在主程序确定路径
》在主程序导入
mydir = r'd:\user\桌面\hkhs原始数据1'
jsdir = os.path.join(mydir, 'BP-js')
jsdf = im.get_jsdf(jsdir)
bcmd = im.get_bcmddf(mydir, 'bcmd.xlsx')
python
book = xw.Book()
book.sheets(1).range('A1').options(index=False).value = df
</code></pre>
<h2>1.3 tkiner选择读取</h2>
<p>单个文件</p>
<pre><code class="language-python">import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
</code></pre>
<p>获取多个文件</p>
<ul>
<li>filedialog.askopenfilenames(),返回一个包含多个文件路径的元组(tuple)。通过迭代来对每个文件操作,以实现批处理。</li>
<li>批量读一个文件内容,然后写入另外一个文件中</li>
</ul>
<pre><code class="language-python">import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilenames()
for f in file_path:
fo = f.split('.')[0]+'.csv'
with open(fo,'w') as foo:
with open(f,'r') as fn:
fn.readline()
for line in fn.readlines():
li = line.strip().split()
foo.write('%f,%f\n'%(float(li[1]),float(li[0])))
print(li)
</code></pre>
<h2>1.4 pd创建dataframe</h2>
<pre><code class="language-python"> df = pd.DataFrame(columns=['数量', '金额'], index=['配送费', '1-20日', '20日后日均', '仓库操作费', '小计'])
df['数量'] = [ps_num, beforetwenty_num, aftertwenty_num, warehouse_operation_num, sum]
df['金额'] = [ps_bill, beforetwenty_bill, aftertwenty_bill, warehouse_operation_bill, Sum_bill]
</code></pre>
<p>2.写入文件</p>
<h2>2.1 pandas写入工作簿多个工作表</h2>
<pre><code class="language-python">xlsx = pd.ExcelWriter(r"d:\user\桌面\yy.xlsx")
table6.to_excel(xlsx, sheet_name='当期数据明细', index=False)
table1.to_excel(xlsx, sheet_name='明细1', index=False)
table3.to_excel(xlsx, sheet_name='明细2', index=False)
table5.to_excel(xlsx, sheet_name='明细3', index=False)
xlsx.close()
</code></pre>
<h2>2.2 xlwings写入</h2>
<h3>2.2.1 写入已有工作簿多个sheet</h3>
<pre><code class="language-python">s3 = table
s3 = pd.DataFrame(s3)
wd = xw.Book(r'E:\模板.xlsm')
sht = wb.sheets('py明细')
sht = wb.sheets('第一')
sht.range('A1').value = s3
wb.save()
</code></pre>
<h3>2.2.2 单个写入</h3>
<pre><code class="language-python">book = xw.Book()
book.sheets(1).range('A1').options(index=False).value = table
</code></pre>
<h3>2.2.3 思路:带格式–创建模板,复制模板写入</h3>
<pre><code class="language-python"> bill_dir = './在途账单'
template = './Template.xlsx'
dest = os.path.join(os.path.dirname(template), "综合物流.xlsx")
shutil.copyfile(template, dest)
sht = xw.Book(dest).sheets('Sheet1')
count = 0
start_row = 2
start_column = 2
sht[start_row - 2, start_column + count*2].value = dq
sht[start_row - 1, start_column + count*2].value = "数量"
sht[start_row - 1, start_column + count*2 + 1].value = '金额'
</code></pre>
<p>3.常用数据清洗与处理</p>
<h2>3.1 df查看信息</h2>
<pre><code class="language-python"> print(df)
print(df.shape)
print(df.info())
print(df.head())
</code></pre>
<h2>3.2 列数据筛选(导入时)</h2>
<h2>3.3 列数据处理</h2>
<h3>3.3.1 df挑选要展示的列</h3>
<pre><code class="language-python">cols2 = ['地区', 'yj账号', 'lxr']
dfcjb2 = dfcjb[cols2]
</code></pre>
<h3>3.3.2 修改df列名</h3>
<pre><code class="language-python">df.rename(columns={'kh卡号': 'yj账号'}, inplace=True)
</code></pre>
<h3>3.3.3 df增加一列</h3>
<pre><code class="language-python">
s1['工资'] = '9月'
s1['当期回款金额'] = None
</code></pre>
<h3>3.3.4 split()分列/map(),list[:]分列(取某几位)</h3>
<pre><code class="language-python">
s1['工号1'] = s1['转至组织'].astype(str).apply(lambda x: x.split('/')[0])
</code></pre>
<pre><code class="language-python">s2 = s1[s1['日期'].map(lambda x: x[-2:]) '20']
</code></pre>
<pre><code class="language-python">
sxty['结算3'] = [x[:3] for x in (sxty['结算账号'])]
</code></pre>
<h3>3.3.5 函数(def,lambda)–A\B列作为条件,填充C列</h3>
<p>例子:如果【用户类型】=专席且【转办状态】已转线 和任务分发,则【归属】填为坐席;
否则,归属=原用户类型。</p>
<p>方法一</p>
<pre><code class="language-python">def function(a, b):
if a == '专席' and b in ['已转线', '任务分发']:
return '坐席'
elif a == '专席' and b not in ['已转线', '任务分发']:
return '专席'
else:
return '坐席'
s1['归属'] = s1.apply(lambda x: function(x.用户类型, x.转办状态), axis=1)
</code></pre>
<p>方法二</p>
<pre><code class="language-python">df['归属'] = df.apply(lambda x: '坐席' if (x['用户类型'] == '专席') and (x['转办状态'] in ['已转线', '任务分发']) else x['用户类型'], axis=1)
</code></pre>
<h3>3.3.6 函数(lambda)–小数转换为百分比</h3>
<pre><code class="language-python">data1['比率1'] = data1['比率1'].apply(lambda x: format(x, '.2%'))
</code></pre>
<h3>3.3.7 数字转为十位文本格式</h3>
<pre><code class="language-python">
for i in data1.index:
if len(data1.at[i,'yj账号']) < 10:
data1.at[i, 'yj账号'] = '0' * (10 - len(data1.at[i,'yj账号'])) + data1.at[i, 'yj账号']
</code></pre>
<h3>3.3.8 numpy/df-文本转为int类型</h3>
<pre><code class="language-python">df['yj账号'] = df['yj账号'].astype(np.int64)
</code></pre>
<pre><code class="language-python">df['yj账号'] = int(df['yj账号'])
</code></pre>
<h3>3.3.9 .loc()–A列值=B列值</h3>
<pre><code class="language-python">sxdl.loc[:, '结算账号'] = sxdl.loc[:, '客户卡号']
</code></pre>
<h3>3.3.10 如果A列为特定值,在B列填为特定值</h3>
<pre><code class="language-python">
sxty.loc[sxty.结算账号 == '0000000003', '结算地区'] = '安徽区'
</code></pre>
<h2>3.4 行数据筛选</h2>
<p>方法一 筛选出df:A列某值和B列某值</p>
<pre><code class="language-python">data1 = data1[(data1['工资月份'] == '10月') & (data1['大区'] == 'ws')]
</code></pre>
<p>方法二 筛选出df:A列某些值和不要B列某些值</p>
<pre><code class="language-python">df = df[df['大区'].isin(['ws', 'ky部', 'TS']) & ~df['地区名称'].isin(['kyhx区', 'kygm区'])]
</code></pre>
<h3>3.4.1 去重</h3>
<pre><code class="language-python">customer = customer.drop_duplicates('月结账号', keep='first')
</code></pre>
<h3>3.4.2 ceil() —0.5向上进制</h3>
<pre><code class="language-python">
import math
def NumTrans(x):
return (math.ceil(x/0.5))*0.5
print(NumTrans(5.792))
</code></pre>
<h3>3.4.3 思路:四舍五入 2378进制</h3>
<pre><code class="language-python">
def XL(l):
if l < 10:
return round((l*10)+0.5)/10
elif l < 100:
return round(((l*10)+0.5)/10*2, 0)/2
else:
return round(((l*10)+0.5)/10, 0)
</code></pre>
<h3>3.4.3 lambda-将<0值填为0</h3>
<pre><code class="language-python">df['未收(未付)-本位币'] = df['未收(未付)-本位币'].apply(lambda x: 0.0 if x < 0.0 else x)
</code></pre>
<h3>3.4.4 df s2筛选后,s3为s1剩余部分</h3>
<pre><code class="language-python">
s1 = s0[(s0['应付金额'] >= 0) & (s0['增值费用'] == '运费')]
s2 = s1[s1['日期'].map(lambda x: x[-2:]) '20']
s3 = s1[(~s1.isin(s2))]
s3 = s3[~s3['日期'].fillna('null').isin(['null'])]
</code></pre>
<h2>3.5 空值处理</h2>
<h4>3.5.1 空值填为0</h4>
<pre><code class="language-python">cc['sdcw'] = cc['sdcw'].fillna(0)
</code></pre>
<pre><code class="language-python">table2['未回款'] = table2['未回款'].apply(lambda x: 0.0 if numpy.isnan(x) else x)
</code></pre>
<h4>3.5.2 若为空值则填为入另一列对应的值</h4>
<pre><code class="language-python">df['yj账号'] = pd.to_numeric(df['yj账号'], errors='coerce').fillna(df['js账号'])
</code></pre>
<p>4.数据透视&分组</p>
<h2>4.1.1 pd.pivot</h2>
<pre><code class="language-python">s2 = pd.pivot_table(s1, index=['考核', '跟进区', '客户名称', 'js账号', 'js网点'], values=['应收应付-CNY', '已核销金额-CNY', '未收(未付)-CNY'], aggfunc=np.sum)
s2 = s2.reset_index()
</code></pre>
<h2>4.1.2 group by</h2>
<pre><code class="language-python">dq_groups = bp3.groupby(['地区'])
ys = dq_groups['应收(应付)-本位币'].sum()/10000
ye = dq_groups['未收(未付)-本位币'].sum()/10000
pt_dq = pd.DataFrame({'应收金额': ys, '应收余额': ye})
</code></pre>
<p>例子:
<img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20201231195352558.png" />
方法一:
<img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20201231195512629.png" /></p>
<p>方法二:
<img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20201231195521263.png" /></p>
<h2>4.1.3 思路:添加小计</h2>
<p><img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20210303190502986.png" />
<img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20210303191740244.png" /></p>
<h2><a name="414__446">;</a> 4.1.4 思路:数据透视变形</h2>
<p><img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20210303190622476.png" />
<img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20210303191619512.png" /></p>
<p>5.df之间</p>
<h2>5.1 pd.merge 匹配</h2>
<pre><code class="language-python">bp1 = pd.merge(bpdf, lxr1, on=['yj账号'], how='left', validate="m:1")
</code></pre>
<pre><code class="language-python">table2 = pd.merge(s4, s5, on=['yj账号', 'yj网点'], how='left', validate="m:1")
</code></pre>
<pre><code class="language-python">table6 = data1.merge(customer, how='left', left_on='yj账号', right_on='yj号', validate='many_to_one')
</code></pre>
<h3>5.1.1 思路:A表匹,匹不到用B表</h3>
<pre><code class="language-python">
sxdl = pd.merge(sxdl, wddm1, how='left', left_on='所属区域代码', right_on='区部代码', validate='many_to_one')
sxdl['区部名称'] = sxdl['区部名称'].fillna(-1)
print(sxdl.shape)
for i in sxdl.index:
if sxdl.at[i, '区部名称'] == -1:
wd3 = sxdl.at[i, '网点']
dfwddm = wddm2[wddm2['网点'] == wd3]
if dfwddm.shape[0] > 0:
sxdl.at[i, '区部名称'] = dfwddm.iat[0, 1]
sxdl.at[i, '大区名称'] = dfwddm.iat[0, 2]
</code></pre>
<h2>5.2 pd.concat()纵向合并两df</h2>
<pre><code class="language-python">jsdf = jsdf
ywdf = ywdf
bpdf = pd.concat([jsdf, ywdf])
</code></pre>
<h2>5.3 两表根据关联列,A表中剔除/保留含B表列</h2>
<pre><code class="language-python">jsdf = jsdf[~jsdf['月结账号'].isin(tcmd['子账号'].values)]
jsdf = jsdf[jsdf['月结账号'].isin(tcmd['子账号'].values)]
</code></pre>
<p>6 python链接Mysql</p>
<pre><code class="language-python">import pandas as pd
from sqlalchemy import create_engine
from lyhk_sql_sentence import *
def get_engine():
engine = create_engine('mysql+pymysql://root:1111@localhost:3306/p&mrecovery')
return engine
def to_sql(excel_name, table_name):
engine = get_engine()
excel_name.to_sql(table_name, engine, index=False, if_exists='replace')
def read_sql(sql_sen):
engine = get_engine()
table = pd.read_sql_query(sql_sen, engine)
return table
if __name__ == '__main__':
to_sql(jcb, 'jcb')
to_sql(jsyw, 'jsyw')
mx = read_sql(sql_mx())
</code></pre>
<p>lyhk_sql_sentence.py</p>
<pre><code class="language-python">def sql_mx():
mx ='''
SELECT
,
FROM jsyw
GROUP BY
,
'''
return mx
</code></pre>
<p>没什么用的小技巧</p>
<h2>运行计时</h2>
<pre><code class="language-python">
import time
time_start = time.time()
time_end = time.time()
print('Time cost = %fs' % (time_end - time_start))
</code></pre>
<h2>程序内进度条</h2>
<p>
import time
def ProgressBar(num):
"""
:param num: 传入进度条总量值
:return:
"""
total = 20
if num == 0:
print('传入参数不可为0,仅接受int类型')
elif type(num) != int:
print('传入参数仅接受int类型')
else:
for item in range(num + 1):
test_num = int((item / num) * 100)
step = int(test_num / (100 / total))
now = r"[ %s%s ]%s" % ("#" * step, ' ' * (total - step), str(test_num))
sys.stdout.write("\r%s%%" % now)
sys.stdout.flush()
time.sleep(0.1)
Original: https://blog.csdn.net/m0_47681342/article/details/112026513
Author: 77yiya
Title: 【总结】Python数据处理-操作Excel
原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/752899/
转载文章受原作者版权保护。转载请注明原作者出处!