In [1]: import pandas as pd



In [2]: import numpy as np



In [3]: df1 = pd.DataFrame(np.ones((2, 4)), columns=list("abcd"), index=list("AB"))



In [4]: df1

Out[4]:

a b c d

A 1.0 1.0 1.0 1.0

B 1.0 1.0 1.0 1.0



In [5]: df2 = pd.DataFrame(np.zeros((3, 3)), columns=list("xyz"), index=list("ABC"))



In [6]: df2

Out[6]:

x y z

A 0.0 0.0 0.0

B 0.0 0.0 0.0

C 0.0 0.0 0.0



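join: combines two DataFrames on their row index. By default it is a left join, so the result keeps the index of the DataFrame the method is called on.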



In [7]: df1.join(df2)

Out[7]:

a b c d x y z

A 1.0 1.0 1.0 1.0 0.0 0.0 0.0

B 1.0 1.0 1.0 1.0 0.0 0.0 0.0



In [8]: df2.join(df1)

Out[8]:

x y z a b c d

A 0.0 0.0 0.0 1.0 1.0 1.0 1.0

B 0.0 0.0 0.0 1.0 1.0 1.0 1.0

C 0.0 0.0 0.0 NaN NaN NaN NaN
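
The two results differ because join keeps the caller's index unless told otherwise. The following is a minimal sketch of that behaviour using the same df1 and df2; the variable names left, outer, and inner are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 4)), columns=list("abcd"), index=list("AB"))
df2 = pd.DataFrame(np.zeros((3, 3)), columns=list("xyz"), index=list("ABC"))

# Default how="left": the result keeps the caller's (df1's) index.
left = df1.join(df2)

# how="outer" keeps the union of both indexes; missing cells become NaN.
outer = df1.join(df2, how="outer")

# how="inner" keeps only the labels present in both indexes.
inner = df2.join(df1, how="inner")

print(left.shape)   # (2, 7)
print(outer.shape)  # (3, 7)
print(inner.shape)  # (2, 7)
```

Passing how explicitly makes the intent visible to the reader even when it matches the default.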

merge: combines two DataFrames on the values of a specified column, using a specified join type







In [41]: df3 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list("fax"))



In [42]: df3

Out[42]:

f a x

0 0 1 2

1 3 4 5

2 6 7 8



In [44]: df1 = pd.DataFrame(np.ones((2, 4)), columns=list("abcd"), index=list("AB"))



In [45]: df1

Out[45]:

a b c d

A 1.0 1.0 1.0 1.0

B 1.0 1.0 1.0 1.0



In [47]: df1.merge(df3, on="a")

Out[47]:

a b c d f x

0 1.0 1.0 1.0 1.0 0 2

1 1.0 1.0 1.0 1.0 0 2









In [49]: df1.loc["A", "a"] = 100



In [50]: df1

Out[50]:

a b c d

A 100.0 1.0 1.0 1.0

B 1.0 1.0 1.0 1.0



In [51]: df1.merge(df3, on="a")

Out[51]:

a b c d f x

0 1.0 1.0 1.0 1.0 0 2

Merging df1 and df3 on column "a" keeps only the rows whose "a" values appear in both DataFrames. After the assignment above, the only shared value is 1, so a single row remains.
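
To make the matching rule concrete, here is a small sketch that runs the merge before and after the key is changed. It assumes df3 was built with np.arange(9).reshape((3, 3)) and columns "fax", which is inferred from the displayed output rather than stated in full; the names before and after are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 4)), columns=list("abcd"), index=list("AB"))
# Assumed construction of df3, inferred from its printed output.
df3 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list("fax"))

# Both rows of df1 have a == 1 and one row of df3 has a == 1: two matches.
before = df1.merge(df3, on="a")

# After changing one key, only row "B" of df1 still matches df3.
df1.loc["A", "a"] = 100
after = df1.merge(df3, on="a")

print(len(before), len(after))  # 2 1
# Merging on a column discards the original labels ("A", "B") and
# produces a fresh integer index.
print(after.index.tolist())     # [0]
```

Note that merge compares column values, not index labels, which is why the "A" and "B" labels do not appear in the result.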



In [52]: df3

Out[52]:

f a x

0 0 1 2

1 3 4 5

2 6 7 8









In [53]: df1

Out[53]:

a b c d

A 100.0 1.0 1.0 1.0

B 1.0 1.0 1.0 1.0



In [54]: df3

Out[54]:

f a x

0 0 1 2

1 3 4 5

2 6 7 8



The default is an inner join

In [55]: df1.merge(df3, on="a", how="inner")

Out[55]:

a b c d f x

0 1.0 1.0 1.0 1.0 0 2









In [56]: df3

Out[56]:

f a x

0 0 1 2

1 3 4 5

2 6 7 8



In [57]: df1

Out[57]:

a b c d

A 100.0 1.0 1.0 1.0

B 1.0 1.0 1.0 1.0





Outer join



In [58]: df1.merge(df3, on="a", how="outer")

Out[58]:

a b c d f x

0 100.0 1.0 1.0 1.0 NaN NaN

1 1.0 1.0 1.0 1.0 0.0 2.0

2 4.0 NaN NaN NaN 3.0 5.0

3 7.0 NaN NaN NaN 6.0 8.0











In [59]: df1

Out[59]:

a b c d

A 100.0 1.0 1.0 1.0

B 1.0 1.0 1.0 1.0



In [60]: df3

Out[60]:

f a x

0 0 1 2

1 3 4 5

2 6 7 8



Left join

In [61]: df1.merge(df3, on="a", how="left")

Out[61]:

a b c d f x

0 100.0 1.0 1.0 1.0 NaN NaN

1 1.0 1.0 1.0 1.0 0.0 2.0

Right join



In [62]: df1.merge(df3, on="a", how="right")

Out[62]:

a b c d f x

0 1.0 1.0 1.0 1.0 0 2

1 4.0 NaN NaN NaN 3 5

2 7.0 NaN NaN NaN 6 8









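The default merge mode is how="inner": only keys present in both DataFrames are kept (intersection)



how="outer": keys from either DataFrame are kept (union), and missing values are filled with NaN



how="left": every key from the left DataFrame is kept, and missing values from the right are filled with NaN



how="right": every key from the right DataFrame is kept, and missing values from the left are filled with NaN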
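
The four modes can be compared side by side by looking at which key values survive each merge. This is a sketch under the same assumption about how df3 was constructed (inferred from its printed output); the keys dictionary and the outer variable are names introduced for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 4)), columns=list("abcd"), index=list("AB"))
df1.loc["A", "a"] = 100
# Assumed construction of df3, inferred from its printed output.
df3 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list("fax"))

# Collect the key values that each join type keeps. They are sorted here
# because the row order of an outer merge can differ between pandas versions.
keys = {
    how: sorted(df1.merge(df3, on="a", how=how)["a"].tolist())
    for how in ("inner", "outer", "left", "right")
}
for how, values in keys.items():
    print(how, values)

# indicator=True adds a "_merge" column recording where each row came from:
# "both", "left_only", or "right_only".
outer = df1.merge(df3, on="a", how="outer", indicator=True)
print(outer[["a", "_merge"]])
```

The _merge column is a convenient way to check which rows were filled with NaN because one side had no matching key.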