【总结】Python数据处理-操作Excel

Python操作Excel

1.读取文件

1.1 pandas直接读取

1.1.1 读取excel

字典形式导入

src1 = {
    'file': r'd:\user\医药销售09月薪资考核数据1010.xlsx',
    'sht': 'Sheet1',
    'cols': 'AE,AF,I,J,K,Z',
    'names': ['客户名称',  'yj账号', 'yj网点', '上期逾期期末未回款金额', '工资月份', '大区'],
    'colsorder': ['工资月份', '大区', '客户名称', 'yj账号', 'yj网点', '上期逾期期末未回款金额']
    }
src1['data'] = pd.read_excel(src1['file'], sheet_name=src1['sht'], header=None, skiprows=1, usecols=src1['cols'], names=src1['names'])

1.1.2 读取csv

类似于导入excel,需注意encoding = ”utf-8′ 或 encoding = ”gbk’

df = pd.read_csv(file, encoding='utf-8', usecols=cols, low_memory=False)

1.1.3 读取文件夹

os.chdir(r'd:\user\桌面\BP-js12.7')

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

js = pd.concat([pd.read_csv(f, encoding='gbk', low_memory=False) for f in all_filenames ])

1.2 模块式读取,主程序调用

思路:现有2张表,读入过程用函数封装,以模块形式调用。
具体做法:
①建立2个py文件,分别定义函数get_jsdf, get_bcmddf;
②将这2个文件(模块)放入同一文件夹《hkhsm》下,分别命名为importm_***。

》 导入jsdf,命名为importm_js.py

import os
import pandas as pd
import numpy as np

def get_jsdf(jsdir):
    files = getfiles(jsdir, '.csv')
    print(files)
    file = files[0]
    df = getdf(file)
    for f in files[1:]:
        newdf = getdf(f)
        df = pd.concat([df, newdf])
    return df

def getdf(file):
    print(file)
    df = pd.read_csv(file, encoding='gbk', low_memory=False)
    df = df[df['大区'].isin(['ws大区', 'ky部', 'Tx']) & ~df['地区名称'].isin(['kyab区', 'kycd区'])]

    df['yj账号'] = pd.to_numeric(df['yj账号'], errors='coerce').fillna(df['js账号'])
    df['yj账号'] = df['yj账号'].astype(np.int64)
    df['未收(未付)-本位币'] = df['未收(未付)-本位币'].apply(lambda x: 0.0 if x < 0.0 else x)
    return df

def getfiles(path, extension):
    list = os.listdir(path)
    result = []
    for i in range(0, len(list)):
        temp = os.path.join(path, list[i])
        if os.path.isfile(temp) and os.path.splitext(temp)[1] == extension:
            result.append(temp)
    return result

if __name__ == '__main__':
    jsdir = r'd:\user\桌面\hkhs原始数据1\BP-js'
    df = get_jsdf(jsdir)

    print(df.shape)
    print(df)
    print(df.info())

》 导入bcmd,命名为importm_bcmd.py

import pandas as pd
import os

def get_bcmddf(path, shortname):
    file = os.path.join(path, shortname)
    md = pd.read_excel(file, usecols='A')
    return md

if __name__ == '__main__':
    bcmd = get_bcmddf(r'd:\user\桌面\货款回收原始数据1', '补充名单.xlsx')
    print(bcmd.info())
    print(bcmd)

》建立一个关联py,命名为import_.py,将3张表导入

from .importm_js import get_jsdf
from .importm_bcmd import get_bcmddf

》在主程序调用模块

import hkhsm.import_ as im

》在主程序确定路径
》在主程序导入


mydir = r'd:\user\桌面\hkhs原始数据1'

jsdir = os.path.join(mydir, 'BP-js')
jsdf = im.get_jsdf(jsdir)

bcmd = im.get_bcmddf(mydir, 'bcmd.xlsx')

python book = xw.Book() book.sheets(1).range('A1').options(index=False).value = df </code></pre> <h2>1.3 tkiner选择读取</h2> <p>单个文件</p> <pre><code class="language-python">import tkinter as tk from tkinter import filedialog root = tk.Tk() root.withdraw() file_path = filedialog.askopenfilename() </code></pre> <p>获取多个文件</p> <ul> <li>filedialog.askopenfilenames(),返回一个包含多个文件路径的元组(tuple)。通过迭代来对每个文件操作,以实现批处理。</li> <li>批量读一个文件内容,然后写入另外一个文件中</li> </ul> <pre><code class="language-python">import tkinter as tk from tkinter import filedialog root = tk.Tk() root.withdraw() file_path = filedialog.askopenfilenames() for f in file_path: fo = f.split('.')[0]+'.csv' with open(fo,'w') as foo: with open(f,'r') as fn: fn.readline() for line in fn.readlines(): li = line.strip().split() foo.write('%f,%f\n'%(float(li[1]),float(li[0]))) print(li) </code></pre> <h2>1.4 pd创建dataframe</h2> <pre><code class="language-python"> df = pd.DataFrame(columns=['数量', '金额'], index=['配送费', '1-20日', '20日后日均', '仓库操作费', '小计']) df['数量'] = [ps_num, beforetwenty_num, aftertwenty_num, warehouse_operation_num, sum] df['金额'] = [ps_bill, beforetwenty_bill, aftertwenty_bill, warehouse_operation_bill, Sum_bill] </code></pre> <p>2.写入文件</p> <h2>2.1 pandas写入工作簿多个工作表</h2> <pre><code class="language-python">xlsx = pd.ExcelWriter(r"d:\user\桌面\yy.xlsx") table6.to_excel(xlsx, sheet_name='当期数据明细', index=False) table1.to_excel(xlsx, sheet_name='明细1', index=False) table3.to_excel(xlsx, sheet_name='明细2', index=False) table5.to_excel(xlsx, sheet_name='明细3', index=False) xlsx.close() </code></pre> <h2>2.2 xlwings写入</h2> <h3>2.2.1 写入已有工作簿多个sheet</h3> <pre><code class="language-python">s3 = table s3 = pd.DataFrame(s3) wd = xw.Book(r'E:\模板.xlsm') sht = wb.sheets('py明细') sht = wb.sheets('第一') sht.range('A1').value = s3 wb.save() </code></pre> <h3>2.2.2 单个写入</h3> <pre><code class="language-python">book = xw.Book() book.sheets(1).range('A1').options(index=False).value = table </code></pre> <h3>2.2.3 思路:带格式–创建模板,复制模板写入</h3> <pre><code class="language-python"> bill_dir = './在途账单' template = './Template.xlsx' dest = os.path.join(os.path.dirname(template), "综合物流.xlsx") shutil.copyfile(template, dest) sht = xw.Book(dest).sheets('Sheet1') count = 0 start_row = 2 start_column = 2 sht[start_row - 2, start_column + count*2].value = dq sht[start_row - 1, start_column + count*2].value = "数量" sht[start_row - 1, start_column + count*2 + 1].value = '金额' </code></pre> <p>3.常用数据清洗与处理</p> <h2>3.1 df查看信息</h2> <pre><code class="language-python"> print(df) print(df.shape) print(df.info()) print(df.head()) </code></pre> <h2>3.2 列数据筛选(导入时)</h2> <h2>3.3 列数据处理</h2> <h3>3.3.1 df挑选要展示的列</h3> <pre><code class="language-python">cols2 = ['地区', 'yj账号', 'lxr'] dfcjb2 = dfcjb[cols2] </code></pre> <h3>3.3.2 修改df列名</h3> <pre><code class="language-python">df.rename(columns={'kh卡号': 'yj账号'}, inplace=True) </code></pre> <h3>3.3.3 df增加一列</h3> <pre><code class="language-python"> s1['工资'] = '9月' s1['当期回款金额'] = None </code></pre> <h3>3.3.4 split()分列/map(),list[:]分列(取某几位)</h3> <pre><code class="language-python"> s1['工号1'] = s1['转至组织'].astype(str).apply(lambda x: x.split('/')[0]) </code></pre> <pre><code class="language-python">s2 = s1[s1['日期'].map(lambda x: x[-2:]) '20'] </code></pre> <pre><code class="language-python"> sxty['结算3'] = [x[:3] for x in (sxty['结算账号'])] </code></pre> <h3>3.3.5 函数(def,lambda)–A\B列作为条件,填充C列</h3> <p>例子:如果【用户类型】=专席且【转办状态】已转线 和任务分发,则【归属】填为坐席; 否则,归属=原用户类型。</p> <p>方法一</p> <pre><code class="language-python">def function(a, b): if a == '专席' and b in ['已转线', '任务分发']: return '坐席' elif a == '专席' and b not in ['已转线', '任务分发']: return '专席' else: return '坐席' s1['归属'] = s1.apply(lambda x: function(x.用户类型, x.转办状态), axis=1) </code></pre> <p>方法二</p> <pre><code class="language-python">df['归属'] = df.apply(lambda x: '坐席' if (x['用户类型'] == '专席') and (x['转办状态'] in ['已转线', '任务分发']) else x['用户类型'], axis=1) </code></pre> <h3>3.3.6 函数(lambda)–小数转换为百分比</h3> <pre><code class="language-python">data1['比率1'] = data1['比率1'].apply(lambda x: format(x, '.2%')) </code></pre> <h3>3.3.7 数字转为十位文本格式</h3> <pre><code class="language-python"> for i in data1.index: if len(data1.at[i,'yj账号']) < 10: data1.at[i, 'yj账号'] = '0' * (10 - len(data1.at[i,'yj账号'])) + data1.at[i, 'yj账号'] </code></pre> <h3>3.3.8 numpy/df-文本转为int类型</h3> <pre><code class="language-python">df['yj账号'] = df['yj账号'].astype(np.int64) </code></pre> <pre><code class="language-python">df['yj账号'] = int(df['yj账号']) </code></pre> <h3>3.3.9 .loc()–A列值=B列值</h3> <pre><code class="language-python">sxdl.loc[:, '结算账号'] = sxdl.loc[:, '客户卡号'] </code></pre> <h3>3.3.10 如果A列为特定值,在B列填为特定值</h3> <pre><code class="language-python"> sxty.loc[sxty.结算账号 == '0000000003', '结算地区'] = '安徽区' </code></pre> <h2>3.4 行数据筛选</h2> <p>方法一 筛选出df:A列某值和B列某值</p> <pre><code class="language-python">data1 = data1[(data1['工资月份'] == '10月') & (data1['大区'] == 'ws')] </code></pre> <p>方法二 筛选出df:A列某些值和不要B列某些值</p> <pre><code class="language-python">df = df[df['大区'].isin(['ws', 'ky部', 'TS']) & ~df['地区名称'].isin(['kyhx区', 'kygm区'])] </code></pre> <h3>3.4.1 去重</h3> <pre><code class="language-python">customer = customer.drop_duplicates('月结账号', keep='first') </code></pre> <h3>3.4.2 ceil() —0.5向上进制</h3> <pre><code class="language-python"> import math def NumTrans(x): return (math.ceil(x/0.5))*0.5 print(NumTrans(5.792)) </code></pre> <h3>3.4.3 思路:四舍五入 2378进制</h3> <pre><code class="language-python"> def XL(l): if l < 10: return round((l*10)+0.5)/10 elif l < 100: return round(((l*10)+0.5)/10*2, 0)/2 else: return round(((l*10)+0.5)/10, 0) </code></pre> <h3>3.4.3 lambda-将<0值填为0</h3> <pre><code class="language-python">df['未收(未付)-本位币'] = df['未收(未付)-本位币'].apply(lambda x: 0.0 if x < 0.0 else x) </code></pre> <h3>3.4.4 df s2筛选后,s3为s1剩余部分</h3> <pre><code class="language-python"> s1 = s0[(s0['应付金额'] >= 0) & (s0['增值费用'] == '运费')] s2 = s1[s1['日期'].map(lambda x: x[-2:]) '20'] s3 = s1[(~s1.isin(s2))] s3 = s3[~s3['日期'].fillna('null').isin(['null'])] </code></pre> <h2>3.5 空值处理</h2> <h4>3.5.1 空值填为0</h4> <pre><code class="language-python">cc['sdcw'] = cc['sdcw'].fillna(0) </code></pre> <pre><code class="language-python">table2['未回款'] = table2['未回款'].apply(lambda x: 0.0 if numpy.isnan(x) else x) </code></pre> <h4>3.5.2 若为空值则填为入另一列对应的值</h4> <pre><code class="language-python">df['yj账号'] = pd.to_numeric(df['yj账号'], errors='coerce').fillna(df['js账号']) </code></pre> <p>4.数据透视&分组</p> <h2>4.1.1 pd.pivot</h2> <pre><code class="language-python">s2 = pd.pivot_table(s1, index=['考核', '跟进区', '客户名称', 'js账号', 'js网点'], values=['应收应付-CNY', '已核销金额-CNY', '未收(未付)-CNY'], aggfunc=np.sum) s2 = s2.reset_index() </code></pre> <h2>4.1.2 group by</h2> <pre><code class="language-python">dq_groups = bp3.groupby(['地区']) ys = dq_groups['应收(应付)-本位币'].sum()/10000 ye = dq_groups['未收(未付)-本位币'].sum()/10000 pt_dq = pd.DataFrame({'应收金额': ys, '应收余额': ye}) </code></pre> <p>例子: <img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20201231195352558.png" /> 方法一: <img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20201231195512629.png" /></p> <p>方法二: <img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20201231195521263.png" /></p> <h2>4.1.3 思路:添加小计</h2> <p><img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20210303190502986.png" /> <img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20210303191740244.png" /></p> <h2><a name="414__446">;</a> 4.1.4 思路:数据透视变形</h2> <p><img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20210303190622476.png" /> <img alt="【总结】Python数据处理-操作Excel" src="https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230809/20210303191619512.png" /></p> <p>5.df之间</p> <h2>5.1 pd.merge 匹配</h2> <pre><code class="language-python">bp1 = pd.merge(bpdf, lxr1, on=['yj账号'], how='left', validate="m:1") </code></pre> <pre><code class="language-python">table2 = pd.merge(s4, s5, on=['yj账号', 'yj网点'], how='left', validate="m:1") </code></pre> <pre><code class="language-python">table6 = data1.merge(customer, how='left', left_on='yj账号', right_on='yj号', validate='many_to_one') </code></pre> <h3>5.1.1 思路:A表匹,匹不到用B表</h3> <pre><code class="language-python"> sxdl = pd.merge(sxdl, wddm1, how='left', left_on='所属区域代码', right_on='区部代码', validate='many_to_one') sxdl['区部名称'] = sxdl['区部名称'].fillna(-1) print(sxdl.shape) for i in sxdl.index: if sxdl.at[i, '区部名称'] == -1: wd3 = sxdl.at[i, '网点'] dfwddm = wddm2[wddm2['网点'] == wd3] if dfwddm.shape[0] > 0: sxdl.at[i, '区部名称'] = dfwddm.iat[0, 1] sxdl.at[i, '大区名称'] = dfwddm.iat[0, 2] </code></pre> <h2>5.2 pd.concat()纵向合并两df</h2> <pre><code class="language-python">jsdf = jsdf ywdf = ywdf bpdf = pd.concat([jsdf, ywdf]) </code></pre> <h2>5.3 两表根据关联列,A表中剔除/保留含B表列</h2> <pre><code class="language-python">jsdf = jsdf[~jsdf['月结账号'].isin(tcmd['子账号'].values)] jsdf = jsdf[jsdf['月结账号'].isin(tcmd['子账号'].values)] </code></pre> <p>6 python链接Mysql</p> <pre><code class="language-python">import pandas as pd from sqlalchemy import create_engine from lyhk_sql_sentence import * def get_engine(): engine = create_engine('mysql+pymysql://root:1111@localhost:3306/p&mrecovery') return engine def to_sql(excel_name, table_name): engine = get_engine() excel_name.to_sql(table_name, engine, index=False, if_exists='replace') def read_sql(sql_sen): engine = get_engine() table = pd.read_sql_query(sql_sen, engine) return table if __name__ == '__main__': to_sql(jcb, 'jcb') to_sql(jsyw, 'jsyw') mx = read_sql(sql_mx()) </code></pre> <p>lyhk_sql_sentence.py</p> <pre><code class="language-python">def sql_mx(): mx =''' SELECT , FROM jsyw GROUP BY , ''' return mx </code></pre> <p>没什么用的小技巧</p> <h2>运行计时</h2> <pre><code class="language-python"> import time time_start = time.time() time_end = time.time() print('Time cost = %fs' % (time_end - time_start)) </code></pre> <h2>程序内进度条</h2> <p>
import time

def ProgressBar(num):
"""
:param num: 传入进度条总量值
:return:
"""
total = 20
if num == 0:
print('传入参数不可为0,仅接受int类型')
elif type(num) != int:
print('传入参数仅接受int类型')
else:
for item in range(num + 1):
test_num = int((item / num) * 100)
step = int(test_num / (100 / total))
now = r"[ %s%s ]%s" % ("#" * step, ' ' * (total - step), str(test_num))
sys.stdout.write("\r%s%%" % now)
sys.stdout.flush()
time.sleep(0.1)

Original: https://blog.csdn.net/m0_47681342/article/details/112026513
Author: 77yiya
Title: 【总结】Python数据处理-操作Excel

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/752899/

转载文章受原作者版权保护。转载请注明原作者出处!

(0)

大家都在看

亲爱的 Coder【最近整理,可免费获取】👉 最新必读书单  | 👏 面试题下载  | 🌎 免费的AI知识星球