This article explains how to drop a DataFrame index column in pandas. A large dataset slows down model training and makes it computationally expensive. Removing unnecessary columns, or an index that carries no useful information, leaves a smaller DataFrame that is faster to process and easier to draw insights from.
*Tested in a Jupyter Notebook in an Anaconda environment with Python 3.
Why is there a necessity to drop columns in machine learning?
Dropping columns from a dataset helps avoid overfitting and other problems caused by too many variables in a model, which in turn helps the model produce more reliable and accurate predictions. Dropping a column deletes a variable that is redundant or does not relate to the other variables in the dataset.
How to remove a column in a data frame?
The usual way to drop a column from a DataFrame is the drop() method. Note that drop() is a pandas DataFrame method, not a Python built-in function. Pass it the name of the column you want to remove together with axis='columns', and it returns a new DataFrame without that column.
DF = df.drop("test", axis='columns')
You can also use Python's del statement, with some restrictions: it removes only one column at a time and always modifies the DataFrame in place.
To remove a column from your DataFrame, call drop("column_name", axis='columns'), where:
column_name
A string containing the name of the column you want to delete from the dataframe.
Dropping columns from a DataFrame is a straightforward procedure. When drop() executes, it returns a new DataFrame without the column whose quoted name was passed as the argument. The original DataFrame is left unchanged unless inplace=True is set.
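As a minimal sketch of the call described above, the snippet below drops one column and then several columns at once. The sample data and the extra "score" column are made up for illustration; the columns= keyword is an alternative spelling of passing the label with axis='columns'.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "flower": ["Red Ginger", "Tree Poppy"],
    "test": ["similarities", "accuracy"],
    "score": [0.9, 0.8],
})

# columns= is equivalent to passing the label with axis='columns'
one_removed = df.drop(columns="test")

# A list of labels removes several columns in one call
two_removed = df.drop(columns=["test", "score"])

print(list(one_removed.columns))  # ['flower', 'score']
print(list(two_removed.columns))  # ['flower']
print(list(df.columns))           # ['flower', 'test', 'score'] (original unchanged)
```

Because drop() returns a new DataFrame by default, the result has to be assigned to a variable to be kept.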
What is the difference between drop and del columns in Python?
drop() and del are two different ways to remove data from a DataFrame. drop() is a DataFrame method that removes rows or columns by label and, by default, returns a new DataFrame while leaving the original untouched. del is a Python statement that removes a single column from the DataFrame itself; it cannot remove rows and does not return anything.
- drop() works on both columns and rows, whereas del only affects columns.
- drop() can remove more than one item at once; del handles only one item at a time.
- drop() can operate in place or return a copy; del only operates in place.
With inplace=False (the default), drop() lets you create a subset DataFrame while keeping the original DataFrame intact; del does not.
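The differences listed above can be sketched side by side. The data below is hypothetical and the variable names are chosen only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "flower": ["Red Ginger", "Tree Poppy", "water lily"],
    "test": ["similarities", "accuracy", "classification"],
    "score": [0.9, 0.8, 0.7],
})

# drop() returns a new DataFrame; df itself still has all three columns
subset = df.drop(columns=["test"])

# drop() can also remove rows by index label
rows_removed = df.drop(index=[0])

# del removes exactly one column and changes df in place
del df["score"]

print(list(subset.columns))      # ['flower', 'score']
print(list(rows_removed.index))  # [1, 2]
print(list(df.columns))          # ['flower', 'test']
```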
Example code for using the del statement in a DataFrame
import pandas as pd
# Build a small DataFrame with two columns
col_1 = pd.DataFrame({'flower': ['Red Ginger', 'Tree Poppy', 'passion flower', 'water lily'],
                      'test': ['similarities', 'accuracy', 'correctness', 'classification']},
                     index=[0, 1, 2, 3])
DF = pd.DataFrame(col_1)
print('DataFrame:', DF)
print('\n ')
# del removes the 'test' column from DF in place
del DF['test']
DF
DataFrame: flower test
0 Red Ginger similarities
1 Tree Poppy accuracy
2 passion flower correctness
3 water lily classification
flower | |
0 | Red Ginger |
1 | Tree Poppy |
2 | passion flower |
3 | water lily |
Here is how the drop() function works
import pandas as pd
# Build a small DataFrame with two columns
col_1 = pd.DataFrame({'flower': ['Red Ginger', 'Tree Poppy', 'passion flower', 'water lily'],
                      'test': ['similarities', 'accuracy', 'correctness', 'classification']},
                     index=[0, 1, 2, 3])
print('col_1:', col_1)
print('\n ')
df = pd.DataFrame(col_1)
# drop() returns a new DataFrame without the 'test' column
DF = df.drop("test", axis='columns')
DF
col_1: flower test
0 Red Ginger similarities
1 Tree Poppy accuracy
2 passion flower correctness
3 water lily classification
flower | |
0 | Red Ginger |
1 | Tree Poppy |
2 | passion flower |
3 | water lily |
The example above builds a DataFrame called "df" from the "col_1" DataFrame by passing "col_1" as an argument to the pd.DataFrame() constructor.
The drop() method then removes the 'test' column from the "df" DataFrame: the "test" column label is passed as the first argument and "columns" as the value of the "axis" parameter. The result, which contains only the 'flower' column, is saved in a new variable, "DF".
Example code using the drop parameter of reset_index() in a DataFrame
# Dropping an Index Column in Pandas
import pandas as pd
DF = pd.read_csv("WHO-COVID-19-global-data-Copy1.csv")
# Display the first 5 rows only
DF.head()
Date_reported | Country_code | Country | WHO_region | New_cases | Cumulative_cases | New_deaths | Cumulative_deaths | |
0 | 2023-01-01 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
1 | 2023-01-02 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
2 | 2023-01-03 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
3 | 2023-01-04 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
4 | 2023-01-05 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
# set_index() moves the 'WHO_region' column into the index
col_1 = pd.read_csv("WHO-COVID-19-global-data-Copy1.csv").set_index('WHO_region')
# reset_index(drop=True) discards that index instead of restoring it as a column
df = col_1.reset_index(drop=True, inplace=False)
df.head()
Date_reported | Country_code | Country | New_cases | Cumulative_cases | New_deaths | Cumulative_deaths | |
0 | 2023-01-01 | ZW | Zimbabwe | 0 | 259947 | 0 | 5635 |
1 | 2023-01-02 | ZW | Zimbabwe | 0 | 259947 | 0 | 5635 |
2 | 2023-01-03 | ZW | Zimbabwe | 0 | 259947 | 0 | 5635 |
3 | 2023-01-04 | ZW | Zimbabwe | 0 | 259947 | 0 | 5635 |
4 | 2023-01-05 | ZW | Zimbabwe | 0 | 259947 | 0 | 5635 |
Here is how set_index() and reset_index() together drop the 'WHO_region' column. The procedure is straightforward and widely used.
- The set_index() method is called on the DataFrame returned by read_csv() with the argument 'WHO_region', which sets the index of the DataFrame to the 'WHO_region' column. This means that 'WHO_region' is no longer a regular column of the DataFrame but serves as the index for each row. The result is stored in col_1.
- Afterwards, the reset_index() method is called on col_1 with the drop parameter set to True. This resets the index of the DataFrame to the default integer index and discards the 'WHO_region' values instead of restoring them as a column.
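The two steps above can be sketched on a small in-memory frame. The rows below are hypothetical stand-ins for the WHO file, and the comparison with drop(columns=...) is added only to show that both routes give the same result here.

```python
import pandas as pd

# Hypothetical rows shaped like the WHO file, invented for this illustration
data = pd.DataFrame({
    "Country": ["Zimbabwe", "Zimbabwe", "Zambia"],
    "WHO_region": ["AFRO", "AFRO", "AFRO"],
    "New_cases": [0, 0, 5],
})

# Step 1: 'WHO_region' leaves the columns and becomes the index
indexed = data.set_index("WHO_region")

# Step 2: drop=True throws the index away and restores 0, 1, 2, ...
dropped = indexed.reset_index(drop=True)

print(list(indexed.columns))  # ['Country', 'New_cases']
print(list(dropped.columns))  # ['Country', 'New_cases']
print(list(dropped.index))    # [0, 1, 2]

# In this case the result matches dropping the column directly
print(dropped.equals(data.drop(columns="WHO_region")))
```

When the goal is only to remove a regular column, calling drop(columns=...) directly is the shorter route; the set_index() and reset_index() pair is mainly useful when the column is already serving as the index.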
Methods to Drop a DataFrame index column
The following methods for dropping a DataFrame index column are demonstrated and discussed in detail with example code.
- Dropping an index column in Pandas Dataframe with .set_index() and .reset_index()
- Reset Index using drop parameter
- Reset Index without using drop parameter
- Remove Index Column While Exporting a CSV
- Remove Index Column While Importing a CSV
Dropping an index column in Pandas Dataframe with .set_index() and .reset_index()
# Dropping an Index Column in Pandas
import pandas as pd
col_1 = pd.read_csv("WHO-COVID-19-global-data-Copy1.csv")
# assign() adds an 'Index' column; set_index() then makes it the index
df = col_1.assign(Index=range(len(col_1))).set_index('Index')
# Relabel the first three index values
df = df.rename(index={0: "a", 1: "b", 2: "c"})
df.head()
Date_reported | Country_code | Country | WHO_region | New_cases | Cumulative_cases | New_deaths | Cumulative_deaths | |
Index | ||||||||
a | 2023-01-01 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
b | 2023-01-02 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
c | 2023-01-03 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
3 | 2023-01-04 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
4 | 2023-01-05 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
Reset Index using drop parameter
# Dropping an Index Column in Pandas
import pandas as pd
col_1 = pd.read_csv("WHO-COVID-19-global-data-Copy1.csv")
# assign() adds an 'Index' column; set_index() then makes it the index
df = col_1.assign(Index=range(len(col_1))).set_index('Index')
# Relabel the first three index values
df = df.rename(index={0: "a", 1: "b", 2: "c"})
# Discard the custom index and restore the default integer index
df = df.reset_index(drop=True)
df.head()
Date_reported | Country_code | Country | WHO_region | New_cases | Cumulative_cases | New_deaths | Cumulative_deaths | |
0 | 2023-01-01 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
1 | 2023-01-02 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
2 | 2023-01-03 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
3 | 2023-01-04 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
4 | 2023-01-05 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
How the whole procedure to drop a DataFrame index column executes
The procedure is straightforward; here is how it operates:
- The assign() method is called on col_1 with the argument Index=range(len(col_1)). It creates a new column called 'Index' holding the integers from 0 up to, but not including, the length of the DataFrame.
- The set_index() method is then called on the result with the argument 'Index', which sets the index of the DataFrame to the newly created 'Index' column. This means that 'Index' is no longer a regular column of the DataFrame but serves as the index for each row.
- Then, the rename() method is called on df with the argument index={0: "a", 1: "b", 2: "c"}. It renames the first three index labels to 'a', 'b', and 'c'.
- The resulting DataFrame is assigned to the variable df, and the head() method is called on it to display the first 5 rows. In the first example, these rows show the index named 'Index' with the renamed labels 'a', 'b', and 'c'.
- In the second example, reset_index(drop=True) is called on df before head(). This replaces the custom index with the default integer index and discards the 'Index' labels.
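The steps above can be traced on a small in-memory frame. The five rows below are hypothetical, and the intermediate variable names are chosen only for this sketch.

```python
import pandas as pd

# Hypothetical five-row frame standing in for the WHO file
data = pd.DataFrame({
    "Country": ["Zimbabwe"] * 5,
    "New_cases": [0, 0, 0, 0, 0],
})

# assign() adds the 'Index' column, set_index() turns it into the index
with_index = data.assign(Index=range(len(data))).set_index("Index")

# rename() relabels only the index values listed in the mapping
renamed = with_index.rename(index={0: "a", 1: "b", 2: "c"})

# reset_index(drop=True) discards the custom labels entirely
restored = renamed.reset_index(drop=True)

print(list(renamed.index))    # ['a', 'b', 'c', 3, 4]
print(renamed.index.name)     # Index
print(list(restored.index))   # [0, 1, 2, 3, 4]
print(list(restored.columns)) # ['Country', 'New_cases']
```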
Reset Index without using drop parameter
The reset_index() method is called on col_1 without any arguments. This resets the index of the DataFrame to the default integer index and adds a new column called 'index' that contains the values of the previous index. The resulting DataFrame is assigned to the variable drop_index, and the head() method is called on it to display the first 5 rows. These rows include the new 'index' column, so the previous index is kept as data rather than dropped.
# Dropping an Index Column in Pandas
import pandas as pd
col_1 = pd.read_csv("WHO-COVID-19-global-data-Copy1.csv")
# Reset the index by setting existing index as column
drop_index = col_1.reset_index()
drop_index.head()
index | Date_reported | Country_code | Country | WHO_region | New_cases | Cumulative_cases | New_deaths | Cumulative_deaths | |
0 | 0 | 2023-01-01 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
1 | 1 | 2023-01-02 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
2 | 2 | 2023-01-03 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
3 | 3 | 2023-01-04 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
4 | 4 | 2023-01-05 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
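The name of the column that reset_index() adds depends on whether the old index had a name. The sketch below uses hypothetical data to show both cases; rename_axis() is used here only to give the index a name.

```python
import pandas as pd

# Hypothetical data with country codes as an unnamed index
data = pd.DataFrame({"New_cases": [0, 3]}, index=["ZW", "ZM"])

# An unnamed index comes back as a column called 'index'
unnamed = data.reset_index()

# A named index comes back as a column carrying that name
named = data.rename_axis("Country_code").reset_index()

print(list(unnamed.columns))  # ['index', 'New_cases']
print(list(named.columns))    # ['Country_code', 'New_cases']
```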
Remove Index Column While Exporting a CSV
pandas can leave the index out of an exported file: pass index=False to the to_csv() method. With index=True, which is the default, the index is written as the first column of the file.
The following example code exports the drop_index DataFrame to a CSV file named "drop.csv" using the to_csv() method with the argument index=True. Setting the index parameter to True means that the index of the DataFrame is included in the exported CSV file.
# Export a CSV file with the index included
import pandas as pd
# drop_index is the DataFrame created in the previous section
# to_csv() returns None when given a file path, so the result is not assigned
drop_index.to_csv("drop.csv", index=True)
The index is dropped from the exported file when index=False is passed. With the index parameter set to False, the exported CSV file does not include the default index column.
# Remove index while exporting csv file
drop_index.to_csv("drop.csv", index=False)
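To see the effect of the index parameter without writing a file, to_csv() can be called with no path, in which case it returns the CSV text as a string. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for this illustration
data = pd.DataFrame({
    "Country": ["Zimbabwe", "Zambia"],
    "New_cases": [0, 5],
})

# With no path argument, to_csv() returns the CSV content as a string
with_index = data.to_csv(index=True)
without_index = data.to_csv(index=False)

# Header line with the index: an empty first field, then the column names
print(with_index.splitlines()[0])     # ,Country,New_cases
# Header line without the index: only the column names
print(without_index.splitlines()[0])  # Country,New_cases
```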
Remove Index Column While importing a CSV
The following example code reads a CSV file named "WHO-COVID-19-global-data-Copy1.csv" into a Pandas DataFrame using the read_csv() method, with the argument index_col=False. The index_col parameter specifies which column or columns should be used as the index of the resulting DataFrame. In this case, index_col=False means that no column will be used as the index, and a default integer index will be assigned to the DataFrame.
# Drop Index while importing a CSV file to Python script
import pandas as pd
df = pd.read_csv("WHO-COVID-19-global-data-Copy1.csv", index_col=False)
df.head()
Date_reported | Country_code | Country | WHO_region | New_cases | Cumulative_cases | New_deaths | Cumulative_deaths | |
0 | 2023-01-01 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
1 | 2023-01-02 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
2 | 2023-01-03 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
3 | 2023-01-04 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
4 | 2023-01-05 | ZW | Zimbabwe | AFRO | 0 | 259947 | 0 | 5635 |
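A related situation arises when a file was exported with its index included: reading it back with default settings turns the old index into an extra column. The sketch below, using hypothetical data and an in-memory buffer instead of a real file, shows one way to handle that with index_col=0.

```python
import io
import pandas as pd

# Hypothetical data exported with its index included
data = pd.DataFrame({
    "Country": ["Zimbabwe", "Zambia"],
    "New_cases": [0, 5],
})
csv_text = data.to_csv(index=True)

# Default read: the saved index shows up as an extra 'Unnamed: 0' column
as_column = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the index instead
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(as_column.columns))  # ['Unnamed: 0', 'Country', 'New_cases']
print(list(as_index.columns))   # ['Country', 'New_cases']
```

Exporting with index=False in the first place avoids the extra column altogether.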
Conclusion
This page demonstrated how to remove a DataFrame index column in Pandas. Dropping an index column can look tricky, but it is not a difficult task, and this tutorial discussed several methods for doing it. Sometimes it is necessary to remove a column that is used for indexing because it may cause the learning model to overfit or slow down processing.
If you want to learn more about Python Programming, visit Python Programming Tutorials.