# 9 Common Data Analysis Methods

## 1. Funnel Analysis

The figure shows a classic marketing funnel: the stages of the whole process from acquiring a user to the final purchase. The conversion rate between adjacent stages is the metric that quantifies how each step performs. The funnel model therefore works in three moves: split the purchase process into steps, measure each step by its conversion rate, and use abnormal values to locate the problematic step so that it can be fixed and optimized. The end goal is to raise the overall purchase conversion rate.

The core idea of the funnel model comes down to decomposition and quantification. To analyze e-commerce conversion, for example, we monitor user conversion at each level and look for optimization points at each level. For users who do not follow the expected flow, we map their conversion paths separately and shorten those paths to improve the user experience.
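The decompose-and-quantify idea can be sketched in a few lines of Python. The stage names and user counts below are hypothetical, and `funnel_conversion` and `weakest_step` are illustrative helpers rather than part of any library.

```python
def funnel_conversion(steps):
    """Return (name, users, step conversion, overall conversion) per stage.

    `steps` is an ordered list of (name, user_count) pairs, top of funnel first.
    """
    rows = []
    first_count = steps[0][1]
    prev_count = None
    for name, count in steps:
        # Conversion relative to the previous stage; the first stage has none.
        step_rate = None if prev_count is None else count / prev_count
        overall_rate = count / first_count
        rows.append((name, count, step_rate, overall_rate))
        prev_count = count
    return rows


def weakest_step(rows):
    """Name of the stage with the lowest conversion from its predecessor."""
    candidates = [r for r in rows if r[2] is not None]
    return min(candidates, key=lambda r: r[2])[0]


# Hypothetical stage counts, for illustration only.
funnel = [("visit", 10000), ("view_product", 4000), ("add_to_cart", 1000), ("pay", 600)]
rows = funnel_conversion(funnel)
for name, count, step_rate, overall_rate in rows:
    step_text = "-" if step_rate is None else f"{step_rate:.1%}"
    print(f"{name:<14}{count:>7}  step: {step_text:>6}  overall: {overall_rate:.1%}")
print("Lowest step conversion:", weakest_step(rows))
```

The stage with the lowest step conversion is only a candidate for investigation; whether its rate is truly abnormal depends on a baseline for that step.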

## 2. Comparative Analysis

Comparative analysis compares two or more sets of data and analyzes the differences between them, revealing how the things those data represent develop, change, and follow regular patterns. It makes a change or gap in some aspect easy to see, and it expresses that change or gap accurately and quantitatively. Comparative analysis falls into two categories: static comparison and dynamic comparison.

Isolated data are meaningless; differences only show up through comparison. Examples include year-over-year and month-over-month comparisons in the time dimension, growth rates, fixed-base ratios, comparisons with competitors, and comparisons between categories, characteristics, and attributes. Comparison reveals patterns in how data change, is used very frequently, and is often combined with other methods.

### 0. Where Comparative Analysis Adds Value

Static comparison: comparing an indicator across different populations at the same point in time, such as different departments, regions, or countries. It is also called horizontal comparison.

Dynamic comparison: comparing the values of an indicator across different periods for the same population. It is also called vertical comparison.

The two methods can be used separately or in combination.

### 1. Comparison Across Time

This compares the same indicator across different time dimensions, using year-over-year, period-over-period, and fixed-base comparisons.

A year-over-year comparison is made against the same period of the previous year; the period can be a season, month, week, or day.

A period-over-period comparison is made against the immediately preceding period, such as this month against last month or this week against last week. (A comparison against the following period also exists, sometimes called a post-comparison.)

A fixed-base comparison is made against one specified base period, such as comparing each month's sales in 2013 against January 2013.

The figure above compares monthly sales. The time range is the same (all values are monthly totals), the indicator and its definition are the same, and every value describes the enterprise as a whole, so the data are comparable.
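The three time comparisons differ only in which base period the current value is measured against. A minimal sketch, using made-up monthly sales figures and a hypothetical `growth` helper:

```python
def growth(current, base):
    """Relative change of `current` against `base`, e.g. 0.10 for +10%."""
    return (current - base) / base


# Hypothetical monthly sales keyed by (year, month); the values are made up.
sales = {
    (2012, 3): 100.0,
    (2013, 1): 120.0,
    (2013, 2): 110.0,
    (2013, 3): 132.0,
}

this_month = (2013, 3)
yoy = growth(sales[this_month], sales[(2012, 3)])         # same month last year
mom = growth(sales[this_month], sales[(2013, 2)])         # previous month
fixed_base = growth(sales[this_month], sales[(2013, 1)])  # fixed base period

print(f"YoY: {yoy:.1%}, MoM: {mom:.1%}, fixed-base: {fixed_base:.1%}")
```

Using one function for all three keeps the calculation method identical, which is one of the comparability requirements for any comparison.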

### 2. Comparison Across Space

This compares data from different spaces, such as North China and South China, Beijing and Shanghai, or the Shanghai Gubei store and the Chengdu Chunxi Road store. Three kinds are common: comparison with a similar space, where the objects compared must be of a similar form; comparison with an advanced space, meaning a top performer of the same form; and comparison with an expanded space, such as Beijing against the whole country or the Beijing Wangfujing store against all of Beijing. Comparison with competitors also belongs in this category.

### 3. Comparison Against Plan

Comparison against the planned standard is a very important part of sales tracking, since performance reviews are all measured against plan. For example, compare the sales actually achieved with the amount set in the sales plan to see whether the plan was met and, if not, why.

### 4. Comparison Against Empirical or Theoretical Values

An empirical standard is a value distilled from extensive practice, a theoretical standard is a value derived from theory, and an average is the mean over a given space or period.

Take the single-item rate as an example: the share of all sales receipts that contain only one item. The reference value is below 40%. If the data exceed 40%, consider how to adjust policy to encourage customers to make related purchases. This reference value of below 40% is a theoretical value.
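As a rough illustration of checking an indicator against a reference value, the sketch below computes the single-item rate from a list of receipts and compares it with the 40% reference from the text. The receipts and the `single_item_rate` helper are hypothetical.

```python
def single_item_rate(receipts):
    """Share of receipts that contain exactly one item."""
    single = sum(1 for items in receipts if len(items) == 1)
    return single / len(receipts)


# Hypothetical receipts, each a list of purchased items.
receipts = [
    ["shirt"],
    ["shirt", "tie"],
    ["socks"],
    ["jeans", "belt", "socks"],
    ["hat"],
]

REFERENCE = 0.40  # reference value quoted in the text
rate = single_item_rate(receipts)
exceeds_reference = rate > REFERENCE
print(f"Single-item rate: {rate:.0%}, above reference: {exceeds_reference}")
```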

In comparative analysis, total, relative, or average indicators can each be used alone or in combination. The results can be expressed as relative numbers such as percentages or multiples.

① The scope, calculation method, and unit of measurement of the indicators must be the same; that is, they must be measured in the same unit or against the same standard.

② The objects being compared must be comparable.

③ The types of indicators being compared must be consistent. Whether they are absolute, relative, average, or some other type of indicator, both sides of the comparison must use the same type.

## 3. Cluster Analysis

Cluster analysis is an exploratory data analysis method. It is typically used to group and classify seemingly unordered objects so that the objects under study can be better understood. A good clustering has high similarity among objects within a group and low similarity between groups. In user research, cluster analysis helps with many problems, such as classifying website information, finding relationships in page click behavior, and segmenting users. User segmentation is the most common case.
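The text does not name a specific clustering algorithm. As one common choice, the sketch below runs a plain k-means on two hypothetical user features (visits per month, average order value), starting from hand-picked centroids so that the result is deterministic. In practice the features would usually be scaled first and a library implementation used.

```python
import math


def assign(points, centroids):
    """Attach each point to the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, iterations=10):
    """Plain k-means on numeric tuples, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for old, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical users as (visits per month, average order value).
users = [(1, 20), (2, 25), (3, 22), (20, 200), (22, 210), (21, 190)]
centroids, clusters = kmeans(users, centroids=[(1, 20), (20, 200)])
for center, members in zip(centroids, clusters):
    print(center, members)
```

Members of each resulting group sit close to their own centroid and far from the other one, which is the within-group and between-group similarity requirement described above.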

## 4. Path Analysis

User path analysis tracks the path users take from a start event to an end event; in other words, it monitors how users flow through the product. It can be used to measure the effect of website optimization or marketing campaigns and to understand users' behavioral preferences. Its ultimate purpose is to serve business goals: guide users through the product's optimal path more efficiently and, in the end, prompt them to pay. How is a user behavior path analyzed?

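(1) Count each first step users take when using the website or app, then compute the flow and conversion of every subsequent step in turn, so that the data faithfully reproduce the user's entire journey from opening the app to leaving.

(2) Look at the distribution of paths users take when using the product.

For example: after visiting the home page of an e-commerce product, what percentage of users searched, what percentage visited a category page, and what percentage went straight to a product detail page.

(3) Analyze paths for optimization.

For example: which path users take most often, and at which step users are most likely to drop off.

(4) Identify user behavior characteristics from their paths.

For example: analyze whether users are goal-oriented, leaving as soon as they are done, or browsing without a clear purpose.

(5) Segment users.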
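Steps (2) and (3) above can be sketched by counting what users do immediately after a given page. The session data and page names are hypothetical, and `next_step_distribution` is an illustrative helper; leaving the product is recorded as a synthetic `"(exit)"` step.

```python
from collections import Counter


def next_step_distribution(sessions, page):
    """Share of each step taken immediately after `page`, across all sessions."""
    counts = Counter()
    for path in sessions:
        # Pair every page with the one that follows it; the last page is followed by an exit.
        for current, following in zip(path, path[1:] + ["(exit)"]):
            if current == page:
                counts[following] += 1
    total = sum(counts.values())
    return {step: n / total for step, n in counts.items()}


# Hypothetical sessions, each an ordered list of pages visited.
sessions = [
    ["home", "search", "detail", "pay"],
    ["home", "category", "detail"],
    ["home", "detail"],
    ["home", "search"],
]

after_home = next_step_distribution(sessions, "home")
after_detail = next_step_distribution(sessions, "detail")
print("After home:", after_home)
print("After detail:", after_detail)
```

A high `"(exit)"` share after a page marks it as a likely drop-off point worth examining.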

## 5. Pareto Analysis

The Pareto principle is the classic 80/20 rule. In personal wealth, for example, it is often said that 20% of the world's people control 80% of the wealth. In data analysis, it can be read as 20% of the data producing 80% of the effect, so analysis should dig into that 20%. Applying the 80/20 rule usually involves ranking, with the top 20% treated as the data that matter. The 80/20 method concentrates analysis on the key items and applies to any industry: find the key items, identify their characteristics, and then consider how to move the remaining 80% toward that 20% to improve results.

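The ABC analysis model can be used to classify not only products and sales, but also customers and customer transaction amounts. For example: which customers contribute 80% of the company's profit, and what share of all customers do they represent? If the answer is 20%, then with limited resources you know to prioritize maintaining that 20% of customers.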
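The question "which customers contribute 80% of profit, and what share of customers are they" can be answered by ranking and accumulating. The sketch below uses made-up profit figures; `top_contributors` is a hypothetical helper.

```python
def top_contributors(profit_by_customer, target_share=0.8):
    """Smallest set of top-ranked customers whose combined profit reaches `target_share`."""
    total = sum(profit_by_customer.values())
    ranked = sorted(profit_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0
    for customer, profit in ranked:
        if running / total >= target_share:
            break
        selected.append(customer)
        running += profit
    return selected, running / total


# Hypothetical profit per customer; the values are made up.
profits = {
    "A": 520, "B": 300, "C": 50, "D": 40, "E": 30,
    "F": 20, "G": 15, "H": 12, "I": 8, "J": 5,
}

key_customers, profit_share = top_contributors(profits)
customer_share = len(key_customers) / len(profits)
print(f"{customer_share:.0%} of customers ({key_customers}) bring {profit_share:.0%} of profit")
```

Real data rarely split exactly 80/20; the point of the calculation is to see how concentrated the contribution actually is.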

## 6. Formula Decomposition

Formula decomposition takes a given indicator and uses a formula to break it down into its influencing factors.

The figure below analyzes why a product's sales are low by decomposing the indicator with the formula method:
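The exact formula used in the figure is not given in the text. As a rough sketch, assume the hypothetical decomposition sales = visitors × conversion rate × average order value, and compare each factor between two periods to see which one moved the most. All numbers below are made up.

```python
def sales(visitors, conversion_rate, avg_order_value):
    """Assumed decomposition: sales = visitors * conversion rate * average order value."""
    return visitors * conversion_rate * avg_order_value


# Hypothetical factor values for two periods.
last_period = {"visitors": 10000, "conversion_rate": 0.05, "avg_order_value": 80.0}
this_period = {"visitors": 10500, "conversion_rate": 0.03, "avg_order_value": 82.0}

# Relative change of each factor between the two periods.
changes = {name: this_period[name] / last_period[name] - 1 for name in last_period}
biggest_drop = min(changes, key=changes.get)

print(f"Sales: {sales(**last_period):.0f} -> {sales(**this_period):.0f}")
for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
print("Factor that fell the most:", biggest_drop)
```

Each factor can then be decomposed further in the same way, for instance splitting visitors by channel.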

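## 7. A/B Testing

An A/B test presents two or more versions of a web or app interface or flow to similar visitor groups over the same time period, collects user experience data and business data for each group, and finally evaluates which version is best and adopts it. The A/B testing process is as follows: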

(1) Analyze the current state and form a hypothesis:

Analyze the business data, identify the most critical point to improve, form a hypothesis about how to optimize it, and propose optimization suggestions. For example, suppose we find that the user conversion rate is low and hypothesize that the promotional landing page converts too poorly; the next step is to find a way to improve it.

(2) Set goals and draw up a plan:

Set primary goals to measure how well each optimized version performs, and secondary goals to evaluate each version's impact on other aspects.

(3) Design and development:

(4) Allocate traffic:

Decide the traffic split for each live test version. At the start, the optimized variant can receive a small share of traffic, which is then increased gradually as appropriate.

(5) Collect and analyze data:

Collect the experiment data and judge validity and effect. If statistical significance reaches 95% or higher and holds for a period of time, the experiment can end. If it is below 95%, the test may need to run longer. If significance stays below 95%, or even below 90%, for a long time, decide whether to stop the test.

(6) Finally:

Based on the test results, decide whether to release the new version, adjust the traffic split and keep testing, or, if the test did not achieve the desired effect, continue refining the variant and develop and launch a new test.
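The text does not say which statistical test produces the 95% figure. One common choice for conversion rates is a two-sided two-proportion z-test, where "95% significance" corresponds to a p-value below 0.05. The sketch below assumes that test and uses made-up visitor and conversion counts.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: version A (control) and version B (variant).
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
significant = p_value < 0.05  # the 95% level mentioned in the text
print(f"z = {z:.2f}, p = {p_value:.4f}, significant at 95%: {significant}")
```

Checking the p-value repeatedly while the test is still running inflates the false-positive rate, which is one reason to require the result to hold for a period of time before stopping.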

## 8. Quadrant Analysis

Quadrant analysis divides data along two or more dimensions and expresses the values of interest as coordinates. It moves directly from values to strategy, which then drives concrete action. The quadrant method is a strategy-driven way of thinking, often used in product analysis, market analysis, customer management, and merchandise management.

(1) Find the common causes of problems

Quadrant analysis groups events with the same characteristics so that their common causes can be summarized. In the advertising case above, for example, the first quadrant yields the effective promotion channels and strategies, while the third and fourth quadrants identify ineffective channels that can be ruled out.

(2) Build group-specific optimization strategies

Ads with a high click-through rate and high conversion reach a well-targeted audience and are efficient.

Ads with a high click-through rate but low conversion draw most of their clicks from the ad itself; the low conversion indicates that the audience the ad content targets does not quite match the product's actual audience.

Ads with high conversion but a low click-through rate have content that fits the product's actual audience, but the content needs to be optimized to attract more clicks.

Ads with a low click-through rate and low conversion can be dropped.
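The four groups above amount to a two-threshold classification. A minimal sketch, with hypothetical ads and cutoff values chosen purely for illustration:

```python
def classify_ad(ctr, cvr, ctr_cutoff, cvr_cutoff):
    """Place an ad in one of four quadrants by click-through and conversion rate."""
    high_click = ctr >= ctr_cutoff
    high_conv = cvr >= cvr_cutoff
    if high_click and high_conv:
        return "efficient: audience is well targeted"
    if high_click:
        return "review targeting: clicks do not convert"
    if high_conv:
        return "improve creative: converts but attracts few clicks"
    return "consider dropping"


# Hypothetical ads as (click-through rate, conversion rate); the cutoffs are assumed.
ads = {
    "ad_a": (0.08, 0.06),
    "ad_b": (0.09, 0.01),
    "ad_c": (0.01, 0.07),
    "ad_d": (0.01, 0.005),
}
CTR_CUTOFF, CVR_CUTOFF = 0.05, 0.03

labels = {
    name: classify_ad(ctr, cvr, CTR_CUTOFF, CVR_CUTOFF)
    for name, (ctr, cvr) in ads.items()
}
for name, label in labels.items():
    print(name, "->", label)
```

The cutoffs are the main design decision; medians or business targets are typical choices, and the resulting groups shift with them.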

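## 9. Retention Analysis

### Type 1: Daily Retention

Daily retention can be broken down as follows:

(1) Next-day retention rate: (number of users added on day 1 who log in again on day 2) / (total number of users added on day 1)

(2) Day-3 retention rate: (number of users added on day 1 who log in on day 3) / (total number of users added on day 1)

(3) Day-7 retention rate: (number of users added on day 1 who log in on day 7) / (total number of users added on day 1)

(4) Day-14 retention rate: (number of users added on day 1 who log in on day 14) / (total number of users added on day 1)

(5) Day-30 retention rate: (number of users added on day 1 who log in on day 30) / (total number of users added on day 1)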
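All five formulas share one shape, differing only in the day offset. The sketch below follows the numbering used above, where day 1 is the signup day, so day n falls n - 1 days later. The user IDs, dates, and the `day_n_retention` helper are hypothetical.

```python
from datetime import date, timedelta


def day_n_retention(signup_dates, login_dates, cohort_day, n):
    """Day-n retention for the users who signed up on `cohort_day`.

    Day 1 is the signup day itself, so n=2 gives next-day retention.
    """
    cohort = {user for user, day in signup_dates.items() if day == cohort_day}
    target = cohort_day + timedelta(days=n - 1)
    retained = {user for user in cohort if target in login_dates.get(user, set())}
    return len(retained) / len(cohort)


# Hypothetical signup dates and later login dates per user.
signups = {
    "u1": date(2023, 1, 1),
    "u2": date(2023, 1, 1),
    "u3": date(2023, 1, 1),
    "u4": date(2023, 1, 1),
    "u5": date(2023, 1, 2),
}
logins = {
    "u1": {date(2023, 1, 2), date(2023, 1, 3)},
    "u2": {date(2023, 1, 2)},
    "u3": {date(2023, 1, 7)},
    "u5": {date(2023, 1, 3)},
}

cohort_day = date(2023, 1, 1)
retention = {n: day_n_retention(signups, logins, cohort_day, n) for n in (2, 3, 7)}
for n, rate in retention.items():
    print(f"Day-{n} retention: {rate:.0%}")
```

Users who signed up on other days, such as `u5` here, are excluded from the cohort, so their logins never count toward its retention.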

### Type 2: Weekly Retention

The weekly retention rate is the share of the first week's new users who still log in during each subsequent week.

### Type 3: Monthly Retention

The monthly retention rate is the share of the first month's new users who still log in during each subsequent month. Retention is computed for new users, and the result is a half-filled (triangular) matrix report in which only half of the cells contain data: each row corresponds to a cohort date, and each column gives the retention rate for a later time period. Under normal circumstances, retention decreases over time.

Taking monthly retention as an example, the following is the monthly user retention curve that results:

Original: https://blog.51cto.com/u_15668438/5576053
Author: lanxiaofang
Title: 9种常用的数据分析方法
