Machine Learning for Credit Card Fraud Detection

September 29, 2023, 7:04 PM • Python


I. Machine Learning for Credit Card Fraud Detection
1.1 Introduction
1.2 Case Study

I. Machine Learning for Credit Card Fraud Detection

1.1 Introduction

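Data source: the Kaggle Credit Card Fraud Detection dataset, https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud?resource=download.
This article applies XGBoost, random forest, KNN, logistic regression, SVM, and decision trees to the credit card fraud detection problem.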
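All of these models except XGBoost come from scikit-learn, and XGBoost's `XGBClassifier` follows the same `fit`/`predict` convention, so they can be compared in a single loop. The sketch below is not the article's code: it assumes a synthetic dataset from `make_classification` (about 2% positives, standing in for the real data) and leaves XGBoost out so it runs with scikit-learn alone. It scores with F1 because accuracy can look high even for a model that never catches a fraud.

```python
# Sketch only: synthetic data stands in for creditcard.csv; parameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# About 2% positives to mimic a rare "fraud" class (the real data is far more skewed)
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.98],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # xgboost.XGBClassifier() could be added here the same way if installed
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # zero_division=0 yields 0.0 (no warning) if a model predicts no positives
    scores[name] = f1_score(y_test, model.predict(X_test), zero_division=0)

for name, score in scores.items():
    print(f"{name:<20} F1 = {score:.3f}")
```

Keeping the models in a dict makes it easy to add or drop a classifier without touching the evaluation code.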

1.2 Case Study

1.2.1 Import the required modules into the Python environment


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from termcolor import colored as cl
import itertools
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score
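Both `accuracy_score` and `f1_score` are imported above. The sketch below (an illustration, not part of the original article) shows why accuracy alone is a weak signal for this dataset: a constant predictor that labels every transaction as normal already reaches about 99.83% accuracy, while its F1 score for the fraud class is 0. The counts used (492 frauds among 284,807 rows) are the dataset figures printed in this article.

```python
# Sketch: a "model" that never flags fraud, scored on the dataset's class counts.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

n_total, n_fraud = 284807, 492
y_true = np.zeros(n_total, dtype=int)
y_true[:n_fraud] = 1                        # label 1 = fraud
y_pred = np.zeros(n_total, dtype=int)       # always predicts 0 (normal)

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"accuracy = {acc:.4f}")              # accuracy = 0.9983
print(f"F1       = {f1:.4f}")               # F1       = 0.0000
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")   # TN=284315 FP=0 FN=492 TP=0
```

The confusion matrix makes the failure explicit: every fraud ends up as a false negative.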

1.2.2 Read the data and drop the unused Time column

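About the data: we use the Kaggle Credit Card Fraud Detection dataset. Features V1 to V28 are principal components obtained with PCA; the Time feature is dropped because it is not useful for building the model.
The remaining features are Amount, which holds the transaction amount, and Class, which records whether the transaction is fraudulent: class 1 denotes fraud and class 0 denotes a normal transaction.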
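Because Class is 1 for fraud and 0 for normal transactions, the fraud rate is simply the mean of that column. A small sketch on a made-up five-row frame (the values are hypothetical, not taken from creditcard.csv):

```python
import pandas as pd

# Hypothetical toy rows; only the 0/1 label convention mirrors the real data
toy = pd.DataFrame({"Amount": [149.62, 2.69, 378.66, 1.00, 529.00],
                    "Class":  [0, 0, 0, 1, 1]})

label_names = {0: "normal", 1: "fraud"}
counts = toy["Class"].map(label_names).value_counts()
fraud_rate = toy["Class"].mean()           # share of rows labelled 1

print(counts)
print(f"fraud rate: {fraud_rate:.0%}")     # fraud rate: 40%
```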

df = pd.read_csv(r'../creditcard.csv')
print("Data's columns contain:\n", df.columns)
print("Data shape:\n", df.shape)
df.drop('Time', axis=1, inplace=True)
pd.set_option('display.max_columns', df.shape[1])
print(df.head())
'''
Data's columns contain:
 Index(['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount',
       'Class'],
      dtype='object')
Data shape:
 (284807, 31)
         V1        V2        V3        V4        V5        V6        V7  \
0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599
1  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803
2 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461
3 -0.966272 -0.185226  1.792993 -0.863291 -0.010309  1.247203  0.237609
4 -1.158233  0.877737  1.548718  0.403034 -0.407193  0.095921  0.592941

         V8        V9       V10       V11       V12       V13       V14  \
0  0.098698  0.363787  0.090794 -0.551600 -0.617801 -0.991390 -0.311169
1  0.085102 -0.255425 -0.166974  1.612727  1.065235  0.489095 -0.143772
2  0.247676 -1.514654  0.207643  0.624501  0.066084  0.717293 -0.165946
3  0.377436 -1.387024 -0.054952 -0.226487  0.178228  0.507757 -0.287924
4 -0.270533  0.817739  0.753074 -0.822843  0.538196  1.345852 -1.119670

        V15       V16       V17       V18       V19       V20       V21  \
0  1.468177 -0.470401  0.207971  0.025791  0.403993  0.251412 -0.018307
1  0.635558  0.463917 -0.114805 -0.183361 -0.145783 -0.069083 -0.225775
2  2.345865 -2.890083  1.109969 -0.121359 -2.261857  0.524980  0.247998
3 -0.631418 -1.059647 -0.684093  1.965775 -1.232622 -0.208038 -0.108300
4  0.175121 -0.451449 -0.237033 -0.038195  0.803487  0.408542 -0.009431

        V22       V23       V24       V25       V26       V27       V28  \
0  0.277838 -0.110474  0.066928  0.128539 -0.189115  0.133558 -0.021053
1 -0.638672  0.101288 -0.339846  0.167170  0.125895 -0.008983  0.014724
2  0.771679  0.909412 -0.689281 -0.327642 -0.139097 -0.055353 -0.059752
3  0.005274 -0.190321 -1.175575  0.647376 -0.221929  0.062723  0.061458
4  0.798278 -0.137458  0.141267 -0.206010  0.502292  0.219422  0.215153

   Amount  Class
0  149.62      0
1    2.69      0
2  378.66      0
3  123.50      0
4   69.99      0
'''
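The code above loads every column and then drops Time. As an alternative (an assumption, not what the article does), `pd.read_csv` accepts a callable for `usecols`, so Time can be skipped while the file is parsed. The inline CSV below is a hypothetical stand-in for creditcard.csv:

```python
import io
import pandas as pd

# Tiny inline CSV standing in for creditcard.csv (hypothetical values)
csv_text = "Time,V1,Amount,Class\n0,-1.36,149.62,0\n0,1.19,2.69,0\n"

# usecols accepts a callable: keep every column whose name is not "Time"
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda c: c != "Time")
print(df.columns.tolist())   # ['V1', 'Amount', 'Class']
```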

1.2.3 Exploratory data analysis and data preprocessing

cases = len(df)
nonfraud_cases = df[df.Class == 0]
fraud_cases = df[df.Class == 1]
# Percentage of transactions that are fraudulent
fraud_percentage = round(len(fraud_cases) / cases * 100, 2)
print(cl('CASE COUNT', attrs=['bold']))
print(cl('-' * 40, attrs=['bold']))
print(cl('Total number of cases are {}'.format(cases), attrs=['bold']))
print(cl('Number of Non-fraud cases are {}'.format(len(nonfraud_cases)), attrs=['bold']))
print(cl('Number of fraud cases are {}'.format(len(fraud_cases)), attrs=['bold']))
print(cl('Percentage of fraud cases is {}%'.format(fraud_percentage), attrs=['bold']))
print(cl('-' * 40, attrs=['bold']))
print(cl('CASE AMOUNT STATISTICS', attrs=['bold']))
print(cl('-' * 40, attrs=['bold']))
print(cl('NON-FRAUD CASE AMOUNT STATS', attrs=['bold']))
print(nonfraud_cases.Amount.describe())
print(cl('-' * 40, attrs=['bold']))
print(cl('FRAUD CASE AMOUNT STATS', attrs=['bold']))
print(fraud_cases.Amount.describe())
print(cl('-' * 40, attrs=['bold']))

# Standardize Amount to zero mean and unit variance
sc = StandardScaler()
amount = df.Amount.values
df['Amount'] = sc.fit_transform(amount.reshape(-1, 1)).ravel()
print(cl(str(df.Amount.head(10)), attrs=['bold']))

# Features: all columns except Class; target: Class (1 = fraud)
x = df.drop('Class', axis=1).values
y = df.Class.values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
'''
CASE COUNT
CASE AMOUNT STATISTICS
FRAUD CASE AMOUNT STATS
count 492.000000
mean 122.211321
std 256.683288
min 0.000000
25% 1.000000
50% 9.250000
75% 105.890000
max 2125.870000
Name: Amount, dtype: float64
'''
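In the preprocessing code above, `StandardScaler` is fitted on the whole dataset before `train_test_split`, so statistics from the test rows also shape the scaling, and the split is not stratified even though frauds make up only about 0.17% of the rows. A common alternative, sketched here under assumptions (synthetic, exponentially distributed amounts and about 5% positive labels instead of the real columns), is to split first with `stratify=y` and fit the scaler on the training portion only. The final check confirms that `StandardScaler` computes z = (x - mean) / std using the population standard deviation.

```python
# Sketch with synthetic stand-ins for the Amount column and the Class labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
amount = rng.exponential(scale=90.0, size=(1000, 1))   # hypothetical skewed amounts
y = (rng.random(1000) < 0.05).astype(int)              # hypothetical ~5% positives

# stratify=y keeps the positive rate (nearly) equal in both splits
a_train, a_test, y_train, y_test = train_test_split(
    amount, y, test_size=0.2, stratify=y, random_state=0)

sc = StandardScaler()
a_train_std = sc.fit_transform(a_train)   # mean/std learned from training rows only
a_test_std = sc.transform(a_test)         # test rows reuse the training statistics

# StandardScaler uses the population std (ddof=0), like ndarray.std by default
manual = (a_train - a_train.mean(axis=0)) / a_train.std(axis=0)
print(np.allclose(a_train_std, manual))   # True
print(y_train.mean(), y_test.mean())
```

Because the test rows never influence `sc.mean_` or `sc.scale_`, the test set behaves more like genuinely unseen transactions.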

Original: https://blog.csdn.net/qq_40216188/article/details/125853308
Author: 西西先生666
Title: Machine Learning for Credit Card Fraud Detection

