Pandas Concat: Combining DataFrames with Ease
rajneesh
When analyzing data, we often need to combine several datasets before the overall analysis can begin. pandas is one of the most widely used Python libraries for working with datasets, and it provides simple functions to merge, join and concatenate them. This post focuses on the concat function. I will cover its most important and commonly used parameters so that you can combine your data and get the output you need.
What is pandas.concat?
pandas.concat is a function in the pandas library that combines two or more Series or DataFrame objects into a single object. It concatenates the data along a particular axis, with optional set logic (union or intersection) applied to the labels on the other axis.
Basic Usage of pandas.concat
To concatenate two DataFrame objects, pass them as a list to the concat function. Here's a simple example:
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'],
                    'B': ['B3', 'B4', 'B5']})
# Stack df2 below df1 (default axis=0); the original row labels are kept
result = pd.concat([df1, df2])
print(result)
# output:
'''
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
0  A3  B3
1  A4  B4
2  A5  B5
'''
Detailed Explanation of Parameters
objs
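Description: The Series or DataFrame objects to concatenate.
Type: a sequence (e.g., list or tuple) or a mapping (e.g., dict) of Series or DataFrame objects. If a mapping is passed, its keys are used as the keys argument.
result = pd.concat([df1, df2])
# output: the same result as in the code block above.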
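As a small sketch of the other input types that objs accepts, the example below passes a tuple of Series and then a dict of DataFrames. The names s1, s2 and the dict labels 'x' and 'y' are made up for illustration only.

```python
import pandas as pd

s1 = pd.Series(['A0', 'A1'], name='A')
s2 = pd.Series(['A2', 'A3'], name='A')

# A tuple works as well as a list; concatenating Series gives a Series
series_result = pd.concat((s1, s2))
print(series_result)

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'],
                    'B': ['B3', 'B4', 'B5']})

# With a dict, the dict keys label each input in an outer index level
dict_result = pd.concat({'x': df1, 'y': df2})

# Select only the rows that came from the frame stored under 'y'
print(dict_result.loc['y'])
```

Passing a dict is a convenient shortcut when the inputs already have natural names, because the labels travel with the data.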
axis
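Description: The axis to concatenate along. axis=0 (or 'index') stacks the objects row-wise, one below the other; axis=1 (or 'columns') places them side by side.
Type: 0 or 'index', 1 or 'columns'
Default: 0
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']})
# A frame with different column names (named df3 so that df2 from the basic example stays unchanged)
df3 = pd.DataFrame({'C': ['A3', 'A4', 'A5'],
                    'D': ['B3', 'B4', 'B5']})
result = pd.concat([df1, df3], axis=1)
# output:
'''
    A   B   C   D
0  A0  B0  A3  B3
1  A1  B1  A4  B4
2  A2  B2  A5  B5
'''
# output with axis=0: the column names do not match, so the missing cells are filled with NaN
'''
     A    B    C    D
0   A0   B0  NaN  NaN
1   A1   B1  NaN  NaN
2   A2   B2  NaN  NaN
0  NaN  NaN   A3   B3   <- the rows of the C and D dataset start on this line
1  NaN  NaN   A4   B4
2  NaN  NaN   A5   B5
'''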
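To make the effect of axis easier to check, here is a minimal sketch that compares the shapes of both results. The frame names left and right are illustrative; the string aliases 'index' and 'columns' are accepted in place of 0 and 1.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']})
right = pd.DataFrame({'C': ['A3', 'A4', 'A5'],
                      'D': ['B3', 'B4', 'B5']})

# axis='columns' is an alias for axis=1: frames are placed side by side
side_by_side = pd.concat([left, right], axis='columns')

# axis='index' is an alias for axis=0: frames are stacked row-wise
stacked = pd.concat([left, right], axis='index')

print(side_by_side.shape)          # (3, 4)
print(stacked.shape)               # (6, 4)

# Stacking frames with different columns leaves gaps, filled with NaN
print(stacked.isna().sum().sum())  # 12
```

Checking the shape after a concat is a quick way to confirm that the data was combined in the direction you intended.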
ignore_index
Description: If True, the index values along the concatenation axis are not used. The resulting axis is labeled 0, 1, ..., n - 1 instead, where n is the total length.
Type: boolean
Default: False
result = pd.concat([df1, df2], ignore_index=True)
#output:-
'''
A B
0 A0 B0
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 B4
5 A5 B5
'''
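One practical reason to set ignore_index=True is that duplicate row labels make label-based lookups ambiguous. The sketch below, which uses the illustrative names kept and fresh, shows the difference.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'],
                    'B': ['B3', 'B4', 'B5']})

kept = pd.concat([df1, df2])                      # labels 0, 1, 2, 0, 1, 2
fresh = pd.concat([df1, df2], ignore_index=True)  # labels 0 to 5

print(kept.index.is_unique)   # False
print(len(kept.loc[[0]]))     # 2, because label 0 matches one row from each frame
print(fresh.index.is_unique)  # True
print(fresh.loc[3, 'A'])      # A3
```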
keys
Description: A sequence of labels, one per input object. The labels become the outermost level of a hierarchical index, which makes it easy to see where each dataset starts.
Type: sequence, default None
result = pd.concat([df1, df2], keys=['first dataset', 'sec dataset'])
# output:
'''
                  A   B
first dataset 0  A0  B0
              1  A1  B1
              2  A2  B2
sec dataset   0  A3  B3
              1  A4  B4
              2  A5  B5
'''
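Once the keys are in place, they can be used to pull a single dataset back out. The following is a minimal sketch; the level names 'source' and 'row' are my own choice, passed through the names parameter of concat.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'],
                    'B': ['B3', 'B4', 'B5']})

# names labels the index levels created by keys (level names are illustrative)
result = pd.concat([df1, df2],
                   keys=['first dataset', 'sec dataset'],
                   names=['source', 'row'])

# Select all rows that came from the first input
first = result.loc['first dataset']
print(first)

# Turn the key level into an ordinary column if that is easier to work with
flat = result.reset_index(level='source')
print(flat.columns.tolist())
```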
verify_integrity
Description: If True, check whether the new concatenated axis contains duplicate labels and raise a ValueError if it does.
Type: boolean, default False
try:
    # df1 and df2 both use the row labels 0, 1, 2, so this raises a ValueError
    result = pd.concat([df1, df2], verify_integrity=True)
except ValueError as e:
    print("ValueError:", e)
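For contrast, here is a small sketch of a case that passes the check. It assumes a second frame, here called df_unique, whose row labels do not overlap with those of df1.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']})
# Row labels 3, 4, 5 do not clash with df1's labels 0, 1, 2
df_unique = pd.DataFrame({'A': ['A3', 'A4', 'A5'],
                          'B': ['B3', 'B4', 'B5']},
                         index=[3, 4, 5])

# No exception: every label on the concatenated axis is unique
result = pd.concat([df1, df_unique], verify_integrity=True)
print(result.index.tolist())  # [0, 1, 2, 3, 4, 5]

# Concatenating a frame with itself repeats every label and fails the check
try:
    pd.concat([df1, df1], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```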
sort
Description: Sort the non-concatenation axis (the column labels when axis=0) if it is not already aligned when join is 'outer'.
Type: boolean, default False
result = pd.concat([df1, df2], sort=True)
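With df1 and df2 the columns are already aligned, so sort=True changes nothing visible. The sketch below uses two made-up frames, df_a and df_b, whose columns differ in name and order, and also shows the join parameter of concat, which selects between the union ('outer', the default) and the intersection ('inner') of the column labels.

```python
import pandas as pd

df_a = pd.DataFrame({'B': ['B0', 'B1'], 'A': ['A0', 'A1']})
df_b = pd.DataFrame({'A': ['A2', 'A3'], 'C': ['C2', 'C3']})

# sort=False keeps the columns in order of first appearance
unsorted_result = pd.concat([df_a, df_b], sort=False)
print(unsorted_result.columns.tolist())  # ['B', 'A', 'C']

# sort=True orders the column labels
sorted_result = pd.concat([df_a, df_b], sort=True)
print(sorted_result.columns.tolist())    # ['A', 'B', 'C']

# join='inner' keeps only the columns present in every input
inner_result = pd.concat([df_a, df_b], join='inner')
print(inner_result.columns.tolist())     # ['A']
```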
Conclusion
In this blog post, I have covered the most commonly used parameters of the pandas concat function: objs, axis, ignore_index, keys, verify_integrity and sort, each with an example. The post is aimed mainly at pandas beginners, so I have kept the examples simple.