Join Two Pandas DataFrames on a Common Column in Python

On this page, we will learn how to join two pandas dataframes on a common column in Python. To do this, we use the merge function, which matches rows from both dataframes wherever the values in a specified key column are equal and combines their remaining columns into one result. The concat function covers a different case: it stacks dataframes on top of each other (or side by side) and aligns them by column name rather than by matching key values.

  • The examples were run in a Jupyter Notebook in an Anaconda environment.

If you want to learn more about Python Programming, visit Python Programming Tutorials.

Join on a common column using the merge function.

One way to do this is to use the merge function. It combines the two dataframes into one: the common column appears once in the result, and each row carries the values from both dataframes that belong to that row's key.

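In this case, “on” specifies the name of the column to join on, which must exist in both dataframes.

The “how” argument defines whether the join is inner, outer, left, or right. With “outer”, the result contains the union of the keys from both dataframes, and any value missing on one side is filled with NaN.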
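To make the difference between the four join types concrete, here is a minimal sketch that merges the same two small dataframes with each value of “how” and records how many rows come back. The shortened data and the names left, right, and row_counts are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical, shortened data: two keys are shared, one is unique to each side
left = pd.DataFrame({"flower": ["Red Ginger", "Tree Poppy", "water lily"],
                     "test": ["similarities", "accuracy", "classification"]})
right = pd.DataFrame({"flower": ["Red Ginger", "Tree Poppy", "sun flower"],
                      "cluster": ["cluster_1", "cluster_2", "cluster_4"]})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="flower", how=how)
    row_counts[how] = len(merged)
    # Show which keys survive each kind of join
    print(how, sorted(merged["flower"]))

print(row_counts)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```

An inner join keeps only the keys found in both dataframes, left and right keep every key of one side, and outer keeps all of them.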

The join on the common column is carried out in the following steps:

  • Create the two dataframes in pandas in your Python script.
  • Then, call the merge function to join them on the column they share.
  • In the merge function, pass the arguments flower, test, on='flower', how="outer", where flower and test are the two dataframes and 'flower' is the column they have in common. If “on” is None and the merge is not on indexes, pandas joins on the columns that appear in both dataframes. With “outer”, the keys from both dataframes are kept.
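The default behaviour of “on” can be checked with a short sketch. It assumes two small dataframes whose only shared column is 'flower', and compares a merge with on='flower' against one that leaves “on” out; the names explicit, implicit, and same are illustrative.

```python
import pandas as pd

# Hypothetical, shortened data: 'flower' is the only column the frames share
flower = pd.DataFrame({"flower": ["Red Ginger", "Tree Poppy"],
                       "test": ["similarities", "accuracy"]})
test = pd.DataFrame({"flower": ["Red Ginger", "rose flower"],
                     "cluster": ["cluster_1", "cluster_3"]})

# Join column named explicitly
explicit = pd.merge(flower, test, on="flower", how="outer")
# "on" left as None: pandas falls back to the columns common to both frames
implicit = pd.merge(flower, test, how="outer")

same = explicit.equals(implicit)
print(same)  # True
```

Naming the column explicitly is still the safer habit, because a second shared column added later would silently become part of the join key.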

In the following example, we join two pandas dataframes on a common column in Python.

import pandas as pd
# Left dataframe: one test label per flower
flower = pd.DataFrame(
    {'flower': ['Red Ginger', 'Tree Poppy', 'passion flower', 'water lily'],
     'test': ['similarities', 'accuracy', 'correctness', 'classification']},
    index=[0, 1, 2, 3])
# Right dataframe: one cluster label per flower
test = pd.DataFrame(
    {'flower': ['Red Ginger', 'Tree Poppy', 'rose flower', 'sun flower'],
     'cluster': ['cluster_1', 'cluster_2', 'cluster_3', 'cluster_4']},
    index=[4, 5, 6, 7])
# Outer join on the shared 'flower' column keeps the keys from both dataframes
analysis = pd.merge(flower, test, on='flower', how='outer')
analysis
           flower            test    cluster
0      Red Ginger    similarities  cluster_1
1      Tree Poppy        accuracy  cluster_2
2  passion flower     correctness        NaN
3      water lily  classification        NaN
4     rose flower             NaN  cluster_3
5      sun flower             NaN  cluster_4
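When an outer merge like the one above produces NaN values, it can help to see which dataframe each row came from. The sketch below uses the indicator parameter of merge, which adds a _merge column; the origin dictionary is a hypothetical helper built only to make the result easy to read.

```python
import pandas as pd

flower = pd.DataFrame(
    {"flower": ["Red Ginger", "Tree Poppy", "passion flower", "water lily"],
     "test": ["similarities", "accuracy", "correctness", "classification"]})
test = pd.DataFrame(
    {"flower": ["Red Ginger", "Tree Poppy", "rose flower", "sun flower"],
     "cluster": ["cluster_1", "cluster_2", "cluster_3", "cluster_4"]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
analysis = pd.merge(flower, test, on="flower", how="outer", indicator=True)

# Map each flower to the side(s) it was found on
origin = dict(zip(analysis["flower"], analysis["_merge"].astype(str)))
for name, side in origin.items():
    print(name, "->", side)
```

Rows marked left_only or right_only are exactly the ones that carry NaN in the columns of the other dataframe.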

Combine two dataframes using the concat function.

You can also use pandas’ concat function when you want to stack the two dataframes into a single dataframe instead of matching their rows on a key. With join='outer', the result keeps every column that appears in either dataframe; with join='inner', it keeps only the columns the dataframes have in common. Where a row has no value for a column, as happens for the columns outside that intersection, the cell is filled with NaN.
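As a small illustration of the join argument of concat, the sketch below concatenates two short dataframes twice and compares the resulting columns. The shortened data and the names outer and inner are assumptions made for this example.

```python
import pandas as pd

# Hypothetical, shortened data: only the 'flower' column is shared
flower = pd.DataFrame({"flower": ["Red Ginger", "Tree Poppy"],
                       "test": ["similarities", "accuracy"]})
test = pd.DataFrame({"flower": ["Iris", "Jasmine"],
                     "cluster": ["cluster_1", "cluster_2"]})

# Union of the columns; cells with no source value become NaN
outer = pd.concat([flower, test], join="outer")
# Intersection of the columns; nothing needs to be filled in
inner = pd.concat([flower, test], join="inner")

print(list(outer.columns))
print(list(inner.columns))
print(int(outer.isna().sum().sum()), "missing cells in the outer result")
```

Both results have four rows; only the set of columns differs.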

Let’s say you have two pandas dataframes called flower and test; you can concatenate them like so:

import pandas as pd
flower = pd.DataFrame(
    {'flower': ['Red Ginger', 'Tree Poppy', 'passion flower', 'water lily'],
     'test': ['similarities', 'accuracy', 'correctness', 'classification']},
    index=[0, 1, 2, 3])
test = pd.DataFrame(
    {'flower': ['Iris', 'Jasmine', 'rose flower', 'sun flower'],
     'cluster': ['cluster_1', 'cluster_2', 'cluster_3', 'cluster_4']},
    index=[4, 5, 6, 7])
# Stack the rows of both dataframes; join='outer' keeps the union of the columns
pd.concat([flower, test], join='outer')
           flower            test    cluster
0      Red Ginger    similarities        NaN
1      Tree Poppy        accuracy        NaN
2  passion flower     correctness        NaN
3      water lily  classification        NaN
4            Iris             NaN  cluster_1
5         Jasmine             NaN  cluster_2
6     rose flower             NaN  cluster_3
7      sun flower             NaN  cluster_4
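The concat example above uses dataframes with no flower in common, so it does not show what happens when a key appears on both sides. The following sketch, with assumed shortened data, suggests that concat keeps such a key as two separate rows while merge combines them into one.

```python
import pandas as pd

# Hypothetical data: 'Red Ginger' appears in both dataframes
flower = pd.DataFrame({"flower": ["Red Ginger", "Tree Poppy"],
                       "test": ["similarities", "accuracy"]})
test = pd.DataFrame({"flower": ["Red Ginger", "sun flower"],
                     "cluster": ["cluster_1", "cluster_4"]})

# concat stacks rows and never matches them on values
stacked = pd.concat([flower, test], ignore_index=True)
# merge matches rows whose 'flower' values are equal
merged = pd.merge(flower, test, on="flower", how="outer")

print(len(stacked))  # 4 rows: 'Red Ginger' appears twice
print(len(merged))   # 3 rows: 'Red Ginger' appears once, with both values
```

If the goal is one row per key with values from both dataframes, merge is the function to reach for; concat is better suited to appending rows that share the same columns.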

Conclusion

On this page, two methods for combining two pandas dataframes that share a common column in Python were discussed with examples: merge(), which joins rows on matching key values, and concat(), which stacks dataframes and aligns them by column name. Pandas offers other ways to combine dataframes as well, but this page covers only these two.
