【Python】A comparison of merge, join, concat and append

merge

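  • The merge function joins the rows of two datasets on one or more keys.
  • Use case: two tables that share a primary key but hold different features are combined by matching on that key. If each key appears at most once in each table, the merge adds no rows, and the result contains the columns of both tables, with the shared key column appearing only once. If a key is duplicated, every matching pair of rows produces an output row, so the row count can grow.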
def merge(left, right, how='inner', on=None, left_on=None, right_on=None,
     left_index=False, right_index=False, sort=False,
     suffixes=('_x', '_y'), copy=True, indicator=False,
     validate=None):
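To check the row-count behaviour described above, here is a minimal sketch with two small hypothetical tables (`left`, `right`, `right_dup` and their columns are made up for illustration). It also uses merge's `validate` parameter from the signature, which makes pandas raise `pandas.errors.MergeError` when the keys are not unique as promised.

```python
import pandas as pd

# Hypothetical tables sharing the primary key 'id'
left = pd.DataFrame({'id': [1, 2, 3], 'age': [25, 32, 47]})
right = pd.DataFrame({'id': [1, 2, 3], 'city': ['Oslo', 'Lima', 'Pune']})

# Unique keys on both sides: 3 rows in, 3 rows out; 'id' appears only once
one_to_one = pd.merge(left, right, on='id', validate='one_to_one')
print(one_to_one.shape)  # (3, 3)

# id=1 appears twice on the right, so it produces two output rows
right_dup = pd.DataFrame({'id': [1, 1, 2, 3],
                          'city': ['Oslo', 'Bergen', 'Lima', 'Pune']})
many = pd.merge(left, right_dup, on='id')
print(many.shape)  # (4, 3)

# validate makes the uniqueness assumption explicit and fails loudly when it breaks
try:
    pd.merge(left, right_dup, on='id', validate='one_to_one')
    duplicates_detected = False
except pd.errors.MergeError:
    duplicates_detected = True
print(duplicates_detected)  # True
```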

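1. Default arguments: the columns whose names appear in both frames are used as the join keys, and the default how='inner' keeps only keys found in both.
2. how='left': keep every row of the left frame; where the right frame has no matching key, its columns are filled with NaN.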
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'key': ['one', 'two', 'two'],
          'data1': np.arange(3)})
df2 = pd.DataFrame({'key': ['one', 'three', 'three'],
          'data2': np.arange(3)})
df3 = pd.merge(df1, df2)  # inner join on the shared column 'key'
df4 = pd.merge(df1, df2, how='left')  # keep every row of df1
print(df1)
print(df2)
print(df3)
print(df4)
   key  data1
0  one      0
1  two      1
2  two      2
     key  data2
0    one      0
1  three      1
2  three      2
   key  data1  data2
0  one      0      0
   key  data1  data2
0  one      0    0.0
1  two      1    NaN
2  two      2    NaN
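Only inner and left joins are shown above. The following sketch reuses the same df1 and df2 and counts how many rows each how value produces. It then applies indicator=True, a merge parameter from the signature, which adds a _merge column recording which side each row came from.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'key': ['one', 'two', 'two'], 'data1': np.arange(3)})
df2 = pd.DataFrame({'key': ['one', 'three', 'three'], 'data2': np.arange(3)})

# Number of rows produced by each join type
row_counts = {how: len(pd.merge(df1, df2, on='key', how=how))
              for how in ['inner', 'left', 'right', 'outer']}
print(row_counts)  # {'inner': 1, 'left': 3, 'right': 3, 'outer': 5}

# indicator=True labels each row 'both', 'left_only' or 'right_only'
flagged = pd.merge(df1, df2, on='key', how='outer', indicator=True)
origin = flagged['_merge'].value_counts()
print(origin['both'], origin['left_only'], origin['right_only'])  # 1 2 2
```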

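3. To join on several keys, pass them to on as a list. If on is omitted, it defaults to all columns present in both frames; how='outer' keeps the union of the keys.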

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'key': ['one', 'two', 'two'],
          'value': ['a', 'b', 'c'],
          'data1': np.arange(3)})
df2 = pd.DataFrame({'key': ['one', 'two', 'three'],
          'value': ['a', 'c', 'c'],
          'data2': np.arange(3)})
df5 = pd.merge(df1, df2)  # joins on both shared columns, 'key' and 'value'
df6 = pd.merge(df1, df2, on=['key', 'value'], how='outer')  # union of the (key, value) pairs
print(df1)
print(df2)
print(df5)
print(df6)

   key  value  data1
0  one      a      0
1  two      b      1
2  two      c      2
     key  value  data2
0    one      a      0
1    two      c      1
2  three      c      2
   key  value  data1  data2
0  one      a      0      0
1  two      c      2      1
     key  value  data1  data2
0    one      a    0.0    0.0
1    two      b    1.0    NaN
2    two      c    2.0    1.0
3  three      c    NaN    2.0
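If on is given only 'key', the shared value column is no longer a join key. In that case merge keeps both copies and renames them using the suffixes parameter, which defaults to ('_x', '_y'). Here is a short sketch with the same two frames; the custom suffixes are chosen only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'key': ['one', 'two', 'two'],
                    'value': ['a', 'b', 'c'],
                    'data1': np.arange(3)})
df2 = pd.DataFrame({'key': ['one', 'two', 'three'],
                    'value': ['a', 'c', 'c'],
                    'data2': np.arange(3)})

# Join on 'key' only: the overlapping 'value' column gets a suffix on each side
by_key = pd.merge(df1, df2, on='key', suffixes=('_left', '_right'))
print(list(by_key.columns))
# ['key', 'value_left', 'data1', 'value_right', 'data2']
print(len(by_key))  # 3: 'one' matches once, each of the two 'two' rows matches once
```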

4. When the key columns have different names in the two objects, specify them separately with left_on and right_on.

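import pandas as pd
import numpy as np

# Same data as in section 3, but the key columns are named key1 and key2
df1 = pd.DataFrame({'key1': ['one', 'two', 'two'], 'value': ['a', 'b', 'c'], 'data1': np.arange(3)})
df2 = pd.DataFrame({'key2': ['one', 'two', 'three'], 'value': ['a', 'c', 'c'], 'data2': np.arange(3)})
df7 = pd.merge(df1, df2, left_on=['key1', 'data1'], right_on=['key2', 'data2'], how='outer')
print(df7)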
   key1  value_x  data1   key2  value_y  data2
0   one        a    0.0    one        a    0.0
1   two        b    1.0    two        c    1.0
2   two        c    2.0    NaN      NaN    NaN
3   NaN      NaN    NaN  three        c    2.0
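With left_on/right_on, both key columns are kept in the result, as the key1 and key2 columns in the output show. The sketch below uses small hypothetical frames to show two follow-ups. First, it drops the redundant key column after the merge. Second, when the right-hand key lives in the index rather than in a column, it matches a left column against that index with right_index=True.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['one', 'two', 'two'], 'data1': [0, 1, 2]})
right = pd.DataFrame({'key2': ['one', 'two'], 'data2': [10, 20]})

# Both key columns are kept; drop one and rename the other
merged = pd.merge(left, right, left_on='key1', right_on='key2')
tidy = merged.drop(columns='key2').rename(columns={'key1': 'key'})
print(list(tidy.columns))  # ['key', 'data1', 'data2']

# If the right key is the index, match a column against it with right_index=True
right_indexed = right.set_index('key2')
by_index = pd.merge(left, right_indexed, left_on='key1', right_index=True, how='left')
print(by_index['data2'].tolist())  # [10, 20, 20]
```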

join

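  • The join method combines the columns of two DataFrames into one DataFrame, matching rows by index label (with on=, a column of the calling frame is matched against the other frame's index instead).
  • Its parameters mean largely the same as merge's; the main difference is that join defaults to a left outer join (how='left').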
def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
     sort=False):

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A1'],
          'B': ['B0', 'B1', 'B2']},
          index=['K0', 'K1', 'K2'])
df2 = pd.DataFrame({'C': ['C1', 'C2', 'C3'],
          'D': ['D0', 'D1', 'D2']},
          index=['K0', 'K1', 'K3'])
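df3 = df1.join(df2)  # default how='left': keeps df1's index K0, K1, K2
df4 = df1.join(df2, how='outer')  # union of both indexes
df5 = df1.join(df2, how='inner')  # intersection of both indexes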
print(df1)
print(df2)
print(df3)
print(df4)
print(df5)

     A   B
K0  A0  B0
K1  A1  B1
K2  A1  B2
     C   D
K0  C1  D0
K1  C2  D1
K3  C3  D2
     A   B    C    D
K0  A0  B0   C1   D0
K1  A1  B1   C2   D1
K2  A1  B2  NaN  NaN
      A    B    C    D
K0   A0   B0   C1   D0
K1   A1   B1   C2   D1
K2   A1   B2  NaN  NaN
K3  NaN  NaN   C3   D2
     A   B   C   D
K0  A0  B0  C1  D0
K1  A1  B1  C2  D1
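Because join matches on the index, df1.join(df2) should produce the same frame as merge with left_index=True, right_index=True and how='left'. The sketch below checks that equivalence with the frames above. It also shows that lsuffix/rsuffix become mandatory once the two frames share a column name; joining df1 with itself is just a convenient way to force that overlap.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A1'], 'B': ['B0', 'B1', 'B2']},
                   index=['K0', 'K1', 'K2'])
df2 = pd.DataFrame({'C': ['C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2']},
                   index=['K0', 'K1', 'K3'])

# join on the index is shorthand for an index-to-index merge
via_join = df1.join(df2)
via_merge = pd.merge(df1, df2, left_index=True, right_index=True, how='left')
print(via_join.equals(via_merge))  # True

# Overlapping column names need suffixes, otherwise join raises ValueError
try:
    df1.join(df1)
    overlap_error = False
except ValueError:
    overlap_error = True
both = df1.join(df1, lsuffix='_l', rsuffix='_r')
print(overlap_error, list(both.columns))  # True ['A_l', 'B_l', 'A_r', 'B_r']
```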

concat

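  • Concatenates objects along a chosen axis: vertically with axis=0 (the default) or horizontally with axis=1. The join parameter decides how the other axis is combined: 'outer' takes the union, 'inner' the intersection.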
# Signature from pandas before 1.0: join_axes was removed in 1.0, and sort now defaults to False
def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
      keys=None, levels=None, names=None, verify_integrity=False,
      sort=None, copy=True):
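The join_axes parameter in the signature above does not exist in current pandas. The pandas deprecation notice recommended calling .reindex on the concatenated result instead. Here is a minimal sketch with hypothetical frames a and b that keeps only a's row labels.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 2, 3]}, index=['r0', 'r1', 'r2'])
b = pd.DataFrame({'y': [10, 20]}, index=['r1', 'r3'])

# Old (pandas < 1.0): pd.concat([a, b], axis=1, join_axes=[a.index])
# Now: concatenate with the default outer join, then keep a's row labels
result = pd.concat([a, b], axis=1).reindex(a.index)
print(result.index.tolist())  # ['r0', 'r1', 'r2']
print(result['y'].tolist())   # [nan, 10.0, nan]
```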

Example 1

import pandas as pd
import numpy as np

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])
s3 = pd.concat([s1, s2])  # original labels are kept, so 0 and 1 repeat
s4 = pd.concat([s1, s2], ignore_index=True)  # renumber the result 0..3
print(s1)
print(s2)
print(s3)
print(s4)
0    a
1    b
dtype: object
0    c
1    d
dtype: object
0    a
1    b
0    c
1    d
dtype: object
0    a
1    b
2    c
3    d
dtype: object
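s3 above keeps the duplicate labels 0 and 1. Two other concat parameters from the signature deal with that. keys labels each input with an outer index level, and verify_integrity=True raises a ValueError instead of allowing duplicate labels. Here is a sketch with the same two Series.

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

# keys adds an outer index level naming the source of each row
labelled = pd.concat([s1, s2], keys=['s1', 's2'])
print(labelled.loc['s2'].tolist())  # ['c', 'd']

# verify_integrity=True refuses to build an index with duplicate labels
try:
    pd.concat([s1, s2], verify_integrity=True)
    duplicates_allowed = True
except ValueError:
    duplicates_allowed = False
print(duplicates_allowed)  # False
```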

Example 2

import pandas as pd
import numpy as np

df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['A', 0])
df2 = pd.DataFrame([['a', 1], ['b', 2]], columns=['B', 0])
df3 = pd.concat([df1, df2], join='inner')  # stack rows; keep only the shared column 0
print(df1)
print(df2)
print(df3)
   A  0
0  a  1
1  b  2

   B  0
0  a  1
1  b  2

   0
0  1
1  2
0  1
1  2
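For contrast, the default join='outer' applied to the same two frames keeps every column. Cells for which a frame has no value become NaN. A sketch:

```python
import pandas as pd

df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['A', 0])
df2 = pd.DataFrame([['a', 1], ['b', 2]], columns=['B', 0])

# Default join='outer': union of the columns A, 0 and B
outer = pd.concat([df1, df2])
print(outer.shape)  # (4, 3)
# df1's rows have no 'B' and df2's rows have no 'A': 2 + 2 missing cells
print(int(outer.isna().sum().sum()))  # 4
```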

Example 3

import pandas as pd
import numpy as np

df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['A', 0])
df2 = pd.DataFrame([['a', 1], ['b', 2]], columns=['B', 0])
df3 = pd.concat([df1, df2], axis=1)  # place side by side, aligned on the row index
print(df1)
print(df2)
print(df3)
   A  0
0  a  1
1  b  2

   B  0
0  a  1
1  b  2

   A  0  B  0
0  a  1  a  1
1  b  2  b  2
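The result above has two columns labelled 0, so df3[0] would return both of them. The sketch below shows keys as one way to keep them apart. It also shows that axis=1 aligns rows by index label, not by position; a and b are hypothetical frames.

```python
import pandas as pd

df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['A', 0])
df2 = pd.DataFrame([['a', 1], ['b', 2]], columns=['B', 0])

# keys turns the columns into a MultiIndex, so the two 0 columns stay distinct
wide = pd.concat([df1, df2], axis=1, keys=['left', 'right'])
print(wide.columns.tolist())
# [('left', 'A'), ('left', 0), ('right', 'B'), ('right', 0)]

# Rows are matched by index label; join controls which labels survive
a = pd.DataFrame({'x': [1, 2]}, index=[0, 1])
b = pd.DataFrame({'y': [3, 4]}, index=[1, 2])
outer_shape = pd.concat([a, b], axis=1).shape                # labels 0, 1, 2
inner_shape = pd.concat([a, b], axis=1, join='inner').shape  # only label 1
print(outer_shape, inner_shape)  # (3, 2) (1, 2)
```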

append

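  • Adds rows to the end of a DataFrame and can grow it in both directions: rows go at the bottom, and any column that exists only in the appended object becomes a new column. Missing cells are filled with NaN. With ignore_index=True, the original index labels are discarded and the rows are renumbered. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0 in favor of pd.concat.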
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A1'],
          'B': ['B0', 'B1', 'B2']},
          index=['K0', 'K1', 'K2'])
s2 = pd.Series(['X0','X1'], index=['A','B'])
result = df1.append(s2, ignore_index=True)  # pandas < 2.0 only; ignore_index=True is needed because s2 has no name
print(df1)
print(s2)
print(result)
     A   B
K0  A0  B0
K1  A1  B1
K2  A1  B2

A    X0
B    X1
dtype: object

    A   B
0  A0  B0
1  A1  B1
2  A1  B2
3  X0  X1
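Since DataFrame.append no longer exists in pandas 2.0, the call above fails on current versions. Here is a sketch of the usual replacement with pd.concat. The Series is first turned into a one-row DataFrame whose columns are its index labels; the printed table should match the output above.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A1'],
                    'B': ['B0', 'B1', 'B2']},
                   index=['K0', 'K1', 'K2'])
s2 = pd.Series(['X0', 'X1'], index=['A', 'B'])

# s2.to_frame().T is a 1x2 DataFrame with columns 'A' and 'B'
result = pd.concat([df1, s2.to_frame().T], ignore_index=True)
print(result)
print(result.index.tolist())  # [0, 1, 2, 3]
```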

Summary

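  • concat: glues multiple objects together along one axis (rows with axis=0, columns with axis=1).
  • merge: joins the rows of different DataFrames on one or more keys.
  • join: matches rows on the index; how='inner' keeps the intersection of the labels, how='outer' the union, and the default how='left' keeps the calling frame's index.
  • append: adds rows to the end of a DataFrame; removed in pandas 2.0 in favor of concat.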

Original: https://blog.csdn.net/weixin_44485744/article/details/117220232
Author: 漆黑丶
Title: 【Python】merge、join、concat和append用法比较

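Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/742424/

Reposted articles remain under the original author's copyright. When reposting, please credit the original author.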
