Lagrange Interpolation in Python

May 26, 2023, 10:58 PM • Big Data • 90 reads

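Data analysis

Data cleaning: missing values can be handled in three ways: (1) delete the records, (2) impute the data, (3) leave them untreated.

Common imputation methods

Interpolation: Lagrange interpolation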

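From basic algebra, for n known points in the plane (no two sharing the same x-coordinate) there is a polynomial of degree at most n-1 whose curve passes through all n points.

1) Find the polynomial of degree n-1 that passes through the n known points.

2) Substitute the x of the missing value into the polynomial to obtain the approximate value L(x).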
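The polynomial in step 1 can be written in the standard Lagrange form. Writing the known points as $(x_i, y_i)$ for $i = 1, \dots, n$ (this notation is assumed here for illustration), it reads:

```latex
L(x) = \sum_{i=1}^{n} y_i \, l_i(x), \qquad
l_i(x) = \prod_{\substack{j=1 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}
```

Each basis polynomial $l_i$ equals 1 at $x_i$ and 0 at every other $x_j$, so $L(x_i) = y_i$ for every known point.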

# Lagrange interpolation code
import pandas as pd  # data analysis library
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import lagrange  # Lagrange interpolation function

inputfile = '../data/catering_sale.xls'  # path to the sales data
# output path; writing .xls needs the xlwt engine, which pandas 2.0 dropped, so use .xlsx on newer versions
outputfile = '../tmp/sales.xls'

data = pd.read_excel(inputfile)  # read the data
# '销量' is the sales-volume column; select the out-of-range values, data[column][rows]
temp = data['销量'][(data['销量'] < 400) | (data['销量'] > 5000)]
for i in range(temp.shape[0]):
    data.loc[temp.index[i], '销量'] = np.nan  # replace out-of-range values with NaN

# Custom interpolation function for a column vector
# s is the column, n is the position to interpolate, k is the number of points taken before and after it (default 5)
def ployinterp_column(s, n, k=5):
    # take up to k points on each side of n, staying inside the column
    # (with iloc, a negative position wraps around and a position past the end raises IndexError)
    idx = [p for p in list(range(n - k, n)) + list(range(n + 1, n + 1 + k)) if 0 <= p < len(s)]
    y = s.iloc[idx]  # s is the column of data that was passed in
    y = y[y.notnull()]  # drop missing values
    f = lagrange(y.index, list(y))
    return f(n)  # interpolate and return the result

# Check element by element whether interpolation is needed
for i in data.columns:
    for j in range(len(data)):
        if (data[i].isnull())[j]:  # interpolate if the value is missing
            data.loc[j, i] = ployinterp_column(data[i], j)

data.to_excel(outputfile)  # write the result to file
print("success")
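Since catering_sale.xls is not included here, the neighbour-window idea can be checked on a small synthetic column instead. The sketch below is only an illustration: the helper name `interp_at` and the series `demo` are made up for this example, and the data is y = x**2 so that the correct answer is known in advance.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import lagrange


def interp_at(s, n, k=5):
    """Interpolate position n of Series s from up to k valid neighbours per side."""
    # keep only neighbouring positions that actually exist in the series
    idx = [p for p in range(n - k, n + k + 1) if p != n and 0 <= p < len(s)]
    y = s.iloc[idx].dropna()
    poly = lagrange(y.index.to_numpy(), y.to_numpy())
    return float(poly(n))


# synthetic column (hypothetical data): y = x**2 with a gap at position 5
demo = pd.Series([float(x ** 2) for x in range(11)])
demo.iloc[5] = np.nan

filled = interp_at(demo, 5, k=2)  # uses positions 3, 4, 6 and 7
print(filled)  # close to 25, the true value of 5**2
```

Because the four neighbours lie on a parabola, the interpolating polynomial reproduces it and the gap is filled with a value very close to 25.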

Output:

The code runs.

Problems

This code does not raise SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

I do not know the proper way to eliminate that warning; I only searched for a workaround and got the code to run. It seems that several values cannot be assigned at once and have to be assigned one at a time.
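Assigning one value at a time is not the only way around the warning. pandas typically raises it for chained indexing such as `data['销量'][mask] = ...`, whereas a single `.loc` call that takes both the row mask and the column name can set all selected values in one step. A minimal sketch on made-up numbers (the DataFrame `df` and its values are hypothetical, not taken from catering_sale.xls):

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the '销量' (sales volume) column of the real file
df = pd.DataFrame({"销量": [3442.1, 60.0, 2699.3, 9106.4, 3036.8]})

# one .loc call with a boolean row mask and a column label
# sets every out-of-range value at once, without chained indexing
mask = (df["销量"] < 400) | (df["销量"] > 5000)
df.loc[mask, "销量"] = np.nan

print(df["销量"].isna().tolist())  # [False, True, False, True, False]
```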

Finally

On closer inspection, though, something is wrong with the interpolated values: printing them shows that one of them is an outlier.

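During cleaning we set every value below 400 or above 5000 to missing and then filled the gaps with Lagrange interpolation, hoping to get rid of the extreme values. Instead, one of the interpolated values came out negative, and wildly so. I checked and found no mistake in the code, so I printed the data points used and the fitted Lagrange polynomial: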
f = -0.008874 x^9 + 11.53 x^8 - 6657 x^7 + 2.242e+06 x^6 - 4.854e+08 x^5 + 7.005e+10 x^4 - 6.74e+12 x^3 + 4.168e+14 x^2 - 1.504e+16 x + 2.411e+17

I could not find a mistake, so I wondered whether the fit simply was not accurate enough. I added more points, but the result did not improve; it got even worse. This is overfitting: the model fits the training data very well but does poorly on test data.
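That adding points can make a polynomial interpolant worse is easy to reproduce on a textbook case. The sketch below does not use the catering data: it interpolates the function 1/(1 + 25x^2) (chosen here purely for illustration) on equally spaced nodes and measures the largest error between the nodes. The names `runge` and `max_error` are made up for this example.

```python
import numpy as np
from scipy.interpolate import lagrange


def runge(x):
    # smooth test function, chosen for illustration only
    return 1.0 / (1.0 + 25.0 * x ** 2)


def max_error(num_nodes):
    """Largest gap between runge and its Lagrange interpolant on [-1, 1]."""
    nodes = np.linspace(-1.0, 1.0, num_nodes)
    poly = lagrange(nodes, runge(nodes))
    grid = np.linspace(-1.0, 1.0, 1001)
    return float(np.max(np.abs(poly(grid) - runge(grid))))


err_5 = max_error(5)    # degree-4 polynomial through 5 nodes
err_11 = max_error(11)  # degree-10 polynomial through 11 nodes
print(err_5, err_11)
```

Both interpolants pass through every one of their nodes, yet the 11-node version swings much further away from the function near the ends of the interval than the 5-node version does.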

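For example, in the data set below, the curve fitted with a degree-4 polynomial passes through few of the points, while the degree-14 fit passes through more of them; on test data, however, the degree-14 model may predict values that are wildly off: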

In the end I reduced the number of points. Taking 4 points on each side gave a good result, and 3, 2, or 1 points on each side (1 gives a straight line, not recommended) were also acceptable. So fitting with five points on each side is not wrong in itself; the fitted function just happens to be wildly off at that particular point.
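The same experiment can be sketched on a small made-up series by varying the number of neighbours per side. Everything here is an assumption for illustration: the helper `interp_at`, the series `sales`, and its values are hypothetical and are not the real catering data, so the printed numbers only show how the estimate moves with k.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import lagrange


def interp_at(s, n, k):
    """Lagrange-interpolate position n from up to k valid neighbours per side."""
    idx = [p for p in range(n - k, n + k + 1) if p != n and 0 <= p < len(s)]
    y = s.iloc[idx].dropna()
    return float(lagrange(y.index.to_numpy(), y.to_numpy())(n))


# made-up daily sales with a gap at position 7 (not the real data set)
sales = pd.Series([3100.0, 2900.0, 3300.0, 2700.0, 3500.0, 2600.0, 3400.0,
                   np.nan, 2800.0, 3600.0, 2500.0, 3200.0, 3000.0, 2750.0, 3150.0])

# interpolated value for k = 1 .. 5 neighbours on each side
results = {k: interp_at(sales, 7, k) for k in range(1, 6)}
for k, value in results.items():
    print(k, round(value, 1))
```

With k=1 the polynomial is the straight line through the two adjacent values, so the estimate is simply their average; larger k raises the degree of the polynomial and lets the estimate drift further from the neighbouring values.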

Original: https://www.cnblogs.com/hjk-airl/p/15766870.html
Author: hjk-airl
Title: 拉格朗日插值法–Python

