How to drop column pandas
How To Drop Column in Pandas Dataframe – Definitive Guide
A pandas DataFrame is a data structure that stores values in a tabular format. During data analysis, you may need to drop a column from a dataframe.
You can drop a column from a pandas dataframe using the df.drop("column_name", axis=1, inplace=True) statement.
If You’re in a Hurry…
Pass the column name to df.drop() with axis=1 to drop the column from the pandas dataframe.
Note: If more than one column has the same name, all the columns with that name will be dropped.
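A minimal sketch of that statement, assuming the sample data shown in the tables of this tutorial (rebuilt here so the snippet runs on its own). The last two lines use a hypothetical one-row frame with a duplicated "A" label to illustrate the note about duplicate names.

```python
import pandas as pd

# Sample data rebuilt from the tables in this tutorial
df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# axis=1 targets columns; inplace=True changes df itself and returns None
df.drop("Difficulty_Score", axis=1, inplace=True)
print(df.columns.tolist())  # ['Lang', 'Difficulty', 'Type']

# Hypothetical frame with a duplicated label: every "A" column is dropped
dup = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "A"])
print(dup.drop(columns="A").columns.tolist())  # ['B']
```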
If You Want to Understand Details, Read on…
In this tutorial, you’ll learn how to use the drop() and pop() methods to delete columns in pandas in various scenarios.
The drop() method returns a new dataframe without the deleted column (or, with inplace=True, modifies the dataframe itself and returns None). Use drop() when you want to remove the column and don’t need to perform any operation on the deleted column.
The pop() method removes the column from the dataframe and returns it as a Series. Use pop() when you want to keep the removed column temporarily for some other operation.
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
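The sample dataframe can be built as follows. The values are reconstructed from the table below; pd.NaT is assumed for the missing Type value, which is why that cell displays as NaT.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    # pd.NaT marks the unknown Type for Cobol
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})
print(df)
```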
Dataframe Looks Like
| Lang | Difficulty | Difficulty_Score | Type |
---|---|---|---|---|
0 | Java | Medium | 5 | Statically Typed |
1 | Python | Easy | 2 | Dynamically Typed |
2 | Cobol | Hard | 10 | NaT |
3 | Javascript | Medium | 8 | Dynamically typed |
Now you’ll see the various methods to drop columns in pandas.
Drop Column By Index
In this section, you’ll learn how to drop a column by index in a pandas dataframe.
You can use df.columns[index] to identify the name of the column at that index position and pass that name to the drop method.
The index is 0-based: use 0 to delete the first column, 1 to delete the second column, and so on.
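A minimal sketch of dropping by position, using the sample data from the tables in this tutorial:

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# Position 2 is the third column (0-based), i.e. Difficulty_Score
col_name = df.columns[2]
df = df.drop(col_name, axis=1)
print(df)
```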
After the drop operation, print df to check the result. The third column, Difficulty_Score, has been deleted, as shown below.
Dataframe Looks Like
| Lang | Difficulty | Type |
---|---|---|---|
0 | Java | Medium | Statically Typed |
1 | Python | Easy | Dynamically Typed |
2 | Cobol | Hard | NaT |
3 | Javascript | Medium | Dynamically typed |
You’ve removed the column using its index.
Next, you’ll remove the column by name.
Drop Column By Name
In this section, you’ll learn how to drop columns by name in Pandas dataframe.
You can pass the column name directly to the drop method.
If the column exists, it’s dropped from the dataframe. If it doesn’t exist, a KeyError is raised. You can control this behavior with errors='ignore', which is covered in detail later in this tutorial.
After the drop operation, print df to check the result. The Difficulty_Score column has been deleted, as shown below.
Dataframe Looks Like
| Lang | Difficulty | Type |
---|---|---|---|
0 | Java | Medium | Statically Typed |
1 | Python | Easy | Dynamically Typed |
2 | Cobol | Hard | NaT |
3 | Javascript | Medium | Dynamically typed |
You’ve removed the column from the dataframe using its name.
Next, you’ll learn how to drop multiple columns by index.
Drop Multiple Columns by Index
In this section, you’ll learn how to drop multiple columns by index.
You can use df.columns[[index1, index2, ...]] to look up the names of the columns at those index positions and pass that list of names to the drop method.
The index is 0-based: use 0 to delete the first column, 1 to delete the second column, and so on.
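A minimal sketch, assuming the sample data from this tutorial. Note the double brackets: the inner list holds the positions.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# Indexing df.columns with a list of positions returns an Index of names
cols = df.columns[[1, 2]]
df = df.drop(cols, axis=1)
print(df)
```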
After the drop operation, print df to check the result. The columns Difficulty and Difficulty_Score, at indexes 1 and 2, have been deleted.
Dataframe Looks Like
| Lang | Type |
---|---|---|
0 | Java | Statically Typed |
1 | Python | Dynamically Typed |
2 | Cobol | NaT |
3 | Javascript | Dynamically typed |
You’ve deleted multiple columns by index in a pandas dataframe.
Next, you’ll learn how to drop columns by a list of names.
Drop Columns By List of Names
In this section, you’ll learn how to drop columns by a list of names.
You can do this by passing the columns as a list, ["column 1", "column 2"], to the drop method.
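A minimal sketch using the sample data from this tutorial:

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# A list of labels drops all of them in one call
df = df.drop(["Difficulty_Score", "Type"], axis=1)
print(df)
```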
After the drop operation, print df to check the result. The columns Difficulty_Score and Type have been deleted.
Dataframe Looks Like
| Lang | Difficulty |
---|---|---|
0 | Java | Medium |
1 | Python | Easy |
2 | Cobol | Hard |
3 | Javascript | Medium |
You’ve deleted multiple columns using a list of column names.
Next, you’ll see how to delete a column only if it exists.
Drop Column If It Exists
In this section, you’ll learn how to drop a column only if it exists in the dataframe.
Here, you’ll control the error behavior of the delete operation with the errors='ignore' parameter.
By default, if a column doesn’t exist in the dataframe during the drop operation, an error such as KeyError: "['Difficulty_Score' 'Type'] not found in axis" is raised.
To drop a column only if it exists, without raising any error, specify errors='ignore' in the drop method.
Here, you’re deleting the columns Difficulty_Score and Type. They’re deleted, and the dataframe consists of only two columns, Lang and Difficulty.
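A minimal sketch of both behaviors, assuming the sample data from this tutorial. "Not_A_Column" is a hypothetical label chosen because it does not exist in the dataframe.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# errors="ignore" drops the labels that exist and skips the missing one
result = df.drop(["Difficulty_Score", "Type", "Not_A_Column"], axis=1, errors="ignore")
print(result.columns.tolist())  # ['Lang', 'Difficulty']

# With the default errors="raise", the missing label raises KeyError
raised = False
try:
    df.drop(["Not_A_Column"], axis=1)
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```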
Dataframe Looks Like
| Lang | Difficulty |
---|---|---|
0 | Java | Medium |
1 | Python | Easy |
2 | Cobol | Hard |
3 | Javascript | Medium |
Without errors='ignore', trying to drop a column that doesn’t exist raises the KeyError exception instead.
This is how you can delete a column only if it exists and ignore errors if it doesn’t exist in the dataframe.
Next, you’ll see how to drop a column that doesn’t have a name.
Drop Column With No Name
In this section, you’ll see how to drop a column with no name.
A pandas dataframe can contain a column with a blank name, in other words, a column without a name.
Assume that the sample dataframe column with index 2 doesn’t have a name. Read rename column in pandas to learn more about renaming, or removing the name of, a column in a pandas dataframe.
You can drop such a column by getting its name with df.columns[2] and passing it to the drop method.
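A minimal sketch, assuming the sample data from this tutorial. The unnamed column is simulated by renaming Difficulty_Score to an empty string; that rename is only an illustration, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# Simulate a column without a name
df = df.rename(columns={"Difficulty_Score": ""})

# Look up the (blank) name at position 2 and drop that column
df = df.drop(df.columns[2], axis=1)
print(df)
```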
The column with index 2 is deleted.
Dataframe Looks Like
| Lang | Difficulty | Type |
---|---|---|---|
0 | Java | Medium | Statically Typed |
1 | Python | Easy | Dynamically Typed |
2 | Cobol | Hard | NaT |
3 | Javascript | Medium | Dynamically typed |
This is how you can delete columns without names.
Next, you’ll see how to drop columns with NaN values.
Drop Column With NaN
In this section, you’ll learn how to drop columns with NaN values.
NaN marks missing data: it denotes a cell in the dataframe whose value you don’t know.
When working with dataframes, you may need to delete a column that has this type of missing data.
In the same dataframe, the column Type has missing data for the row index 2 as shown below.
| Lang | Difficulty | Difficulty_Score | Type |
---|---|---|---|---|
2 | Cobol | Hard | 10 | NaT |
You can use the dropna() method to drop such columns.
To delete every column that has at least one missing value, call dropna() with axis=1.
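A minimal sketch, assuming the sample data from this tutorial, where the missing Type value is stored as pd.NaT (pandas treats NaT as missing, just like NaN):

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# axis=1 checks columns; how="any" (the default) drops a column
# that contains at least one missing value
df = df.dropna(axis=1, how="any")
print(df)
```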
Now, the column Type is dropped from the dataframe, as shown below.
Dataframe Looks Like
| Lang | Difficulty | Difficulty_Score |
---|---|---|---|
0 | Java | Medium | 5 |
1 | Python | Easy | 2 |
2 | Cobol | Hard | 10 |
3 | Javascript | Medium | 8 |
This is how you can delete columns that have missing data.
Next, you’ll learn how to drop all columns after a specific column.
Drop All Columns After Specific Column
In this section, you’ll learn how to drop all columns after a specific column.
For example, you may need to do this when you want to perform an operation on the first three columns.
You can achieve this by using the df.loc indexer, which selects rows or columns by label.
Use df.loc[:, :'specific_column'] to create a dataframe with all the rows and the columns up to and including specific_column.
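A minimal sketch, assuming the sample data from this tutorial. Unlike Python list slicing, a loc label slice includes its end label.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# Keep every row and the columns from the start up to and including "Difficulty"
df = df.loc[:, :"Difficulty"]
print(df)
```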
Here, df.loc[] returns a dataframe with all rows and the columns up to and including Difficulty.
Dataframe Looks Like
| Lang | Difficulty |
---|---|---|
0 | Java | Medium |
1 | Python | Easy |
2 | Cobol | Hard |
3 | Javascript | Medium |
This is how you can drop all columns after a specific column.
Next, you’ll learn how to drop columns based on row value.
Drop Column Based on Row Value
In this section, you’ll learn how to drop columns based on row values.
You may want to do this when a column contains a specific value and you want to exclude that column from the data analysis.
You can evaluate the row values by using an if statement.
In the if statement, you pass the condition that needs to be evaluated.
For example, to drop the Difficulty_Score column if it contains a value greater than 7, test that condition in an if statement and call drop() inside it.
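One possible sketch, assuming the sample data from this tutorial and reading "has a value greater than 7" as "at least one row is greater than 7":

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# The comparison gives a boolean Series; .any() is True if any row matches
if (df["Difficulty_Score"] > 7).any():
    df = df.drop("Difficulty_Score", axis=1)
print(df)
```

Using .all() instead of .any() would drop the column only when every row exceeds the threshold.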
Since the Difficulty_Score column contains values greater than 7 (10 and 8), it’s deleted from the dataframe. If it didn’t contain such a value, the column wouldn’t be removed.
Dataframe Looks Like
| Lang | Difficulty | Type |
---|---|---|---|
0 | Java | Medium | Statically Typed |
1 | Python | Easy | Dynamically Typed |
2 | Cobol | Hard | NaT |
3 | Javascript | Medium | Dynamically typed |
This is how you can drop columns based on row values using if statements.
Next, you’ll learn how to delete columns from a pandas dataframe using the pop() method.
Drop Column Using pop()
You can use this method when you want to pop a column out of the dataframe and store it in a separate object (a Series) to perform some temporary operations.
You can use df.pop("Difficulty_Score") to pop the Difficulty_Score column out of the dataframe. It removes the column from the dataframe and returns it, so you can store it in an object such as popped_df.
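A minimal sketch, assuming the sample data from this tutorial:

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# pop() removes the column from df in place and returns it as a Series
popped_df = df.pop("Difficulty_Score")
print(popped_df)
print(df)
```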
The popped column (a Series) looks like:
| Difficulty_Score |
---|---|
0 | 5 |
1 | 2 |
2 | 10 |
3 | 8 |
Dataframe Looks Like
| Lang | Difficulty | Type |
---|---|---|---|
0 | Java | Medium | Statically Typed |
1 | Python | Easy | Dynamically Typed |
2 | Cobol | Hard | NaT |
3 | Javascript | Medium | Dynamically typed |
This is how you can drop columns from pandas using the pop() method.
Next, you’ll learn how to drop columns using iloc.
Pandas Drop Column Using iloc
In this section, you’ll learn how to drop columns using iloc.
You can achieve this by using the df.iloc indexer, which selects rows or columns by integer position.
df.iloc[:, 1:3] selects the columns at positions 1 and 2; the end position 3 is excluded. Because positions are 0-based, these are the second and third columns.
When you pass the labels of these columns to the drop method, the second and third columns, Difficulty and Difficulty_Score, are dropped from the dataframe.
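A minimal sketch, assuming the sample data from this tutorial. It passes the selected columns’ labels (.columns) to drop() rather than the selected dataframe itself.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", pd.NaT, "Dynamically typed"],
})

# Positions 1 and 2; the end position 3 is excluded
cols_to_drop = df.iloc[:, 1:3].columns
df = df.drop(cols_to_drop, axis=1)
print(df)
```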
Dataframe Looks Like
| Lang | Type |
---|---|---|
0 | Java | Statically Typed |
1 | Python | Dynamically Typed |
2 | Cobol | NaT |
3 | Javascript | Dynamically typed |
This is how you can delete columns from pandas dataframe using iloc.
Conclusion
To summarize, you’ve learned how to drop columns from pandas dataframe with various methods available. You’ve also learned about the sample use-cases when each of these methods will be useful.
Use Pandas to Drop Columns and Rows
Working with bigger dataframes, you’ll find yourself wanting to use Pandas to drop columns or rows.
Pandas has a number of different ways to do this. In this post, you’ll learn all you need to know about the drop function.
Putting Together the Dataframe
To get started, let’s put together a sample dataframe that you can use throughout the rest of the tutorial.
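A minimal sketch of that dataframe. The values are reconstructed from the tables in this post; the sixth row (Kate) is taken from the later outputs, since df.head() shows only the first five rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Nik", "Jim", "Alice", "Jane", "Matt", "Kate"],
    "Score": [100, 120, 96, 75, 68, 123],
    "Height": [178, 180, 160, 165, 185, 187],
    "Weight": [180, 175, 143, 155, 167, 189],
})

# head() returns the first five rows by default
print(df.head())
```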
By using the df.head() function, you can see what the dataframe’s first five rows look like:
| Name | Score | Height | Weight |
---|---|---|---|---|
0 | Nik | 100 | 178 | 180 |
1 | Jim | 120 | 180 | 175 |
2 | Alice | 96 | 160 | 143 |
3 | Jane | 75 | 165 | 155 |
4 | Matt | 68 | 185 | 167 |
How to use the drop function in Pandas
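The Pandas drop function is a helpful function to drop columns and rows. Its signature looks like this: DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise').
Here is what the arguments mean: labels are the row or column labels to drop; axis picks rows (0 or 'index') or columns (1 or 'columns'); index and columns name the labels to drop directly, without needing axis; level picks one level of a MultiIndex; inplace=True modifies the dataframe instead of returning a new one; and errors='ignore' skips labels that don’t exist.
Throughout this tutorial, we’ll focus on the axis, index, and columns arguments.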
How to drop columns in Pandas
Drop a Single Column in Pandas
There are multiple ways to drop a column in Pandas using the drop function.
If you wanted to drop the Height column, you could write print(df.drop('Height', axis=1)).
This prints out:
| Name | Score | Weight |
---|---|---|---|
0 | Nik | 100 | 180 |
1 | Jim | 120 | 175 |
2 | Alice | 96 | 143 |
3 | Jane | 75 | 155 |
4 | Matt | 68 | 167 |
Personally, I find the axis argument a little awkward.
You can use the columns argument to avoid specifying an axis at all: print(df.drop(columns='Height')).
This prints out the exact same dataframe as above:
| Name | Score | Weight |
---|---|---|---|
0 | Nik | 100 | 180 |
1 | Jim | 120 | 175 |
2 | Alice | 96 | 143 |
3 | Jane | 75 | 155 |
4 | Matt | 68 | 167 |
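A minimal sketch confirming that both forms give the same result, assuming the sample data reconstructed from this post:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Nik", "Jim", "Alice", "Jane", "Matt", "Kate"],
    "Score": [100, 120, 96, 75, 68, 123],
    "Height": [178, 180, 160, 165, 185, 187],
    "Weight": [180, 175, 143, 155, 167, 189],
})

by_axis = df.drop("Height", axis=1)
by_columns = df.drop(columns="Height")

# Both calls return a new dataframe; df itself still has Height
print(by_axis.head())
print(by_axis.equals(by_columns))  # True
```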
Drop Multiple Columns in Pandas
In order to drop multiple columns, follow the same steps as above, but put the names of columns into a list.
If you wanted to drop the Height and Weight columns, you could write either df.drop(['Height', 'Weight'], axis=1) or df.drop(columns=['Height', 'Weight']).
Both of these return:
| Name | Score |
---|---|---|
0 | Nik | 100 |
1 | Jim | 120 |
2 | Alice | 96 |
3 | Jane | 75 |
4 | Matt | 68 |
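How to drop rows that contain a certain value in Pandas
Pandas makes it easy to drop rows based on a condition.
For example, if you wanted to drop any rows where the weight was less than 160, you could write df = df.drop(df[df['Weight'] < 160].index). The condition df['Weight'] < 160 marks the matching rows, df[...] selects them, and .index passes their row labels to drop().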
This returns the following:
| Name | Score | Height | Weight |
---|---|---|---|---|
0 | Nik | 100 | 178 | 180 |
1 | Jim | 120 | 180 | 175 |
4 | Matt | 68 | 185 | 167 |
5 | Kate | 123 | 187 | 189 |
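A minimal, runnable sketch of that row filter, assuming the sample data reconstructed from this post:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Nik", "Jim", "Alice", "Jane", "Matt", "Kate"],
    "Score": [100, 120, 96, 75, 68, 123],
    "Height": [178, 180, 160, 165, 185, 187],
    "Weight": [180, 175, 143, 155, 167, 189],
})

# Row labels of the rows whose Weight is below 160
low_weight_rows = df[df["Weight"] < 160].index
# axis=0 (rows) is the default, so these labels are treated as row labels
df = df.drop(low_weight_rows)
print(df)
```

Keeping the complementary rows, df[df["Weight"] >= 160], gives the same result without calling drop().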
How to drop columns by number in Pandas
To drop columns using the column number, you can use the iloc selector.
For example, if you wanted to drop the columns at positions 1 and 2 (the slice 1:3, which excludes position 3), you could write df.drop(df.iloc[:, 1:3].columns, axis=1).
To learn more about the iloc selector (and all the other selectors!), check out this comprehensive guide to 4 Ways to Use Pandas to Select Columns in a Dataframe.
This returns the following dataframe:
| Name | Weight |
---|---|---|
0 | Nik | 180 |
1 | Jim | 175 |
2 | Alice | 143 |
3 | Jane | 155 |
4 | Matt | 167 |
5 | Kate | 189 |
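A minimal, runnable sketch of the positional drop, assuming the sample data reconstructed from this post:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Nik", "Jim", "Alice", "Jane", "Matt", "Kate"],
    "Score": [100, 120, 96, 75, 68, 123],
    "Height": [178, 180, 160, 165, 185, 187],
    "Weight": [180, 175, 143, 155, 167, 189],
})

# iloc[:, 1:3] selects positions 1 and 2 (Score and Height);
# .columns turns that selection into labels that drop() accepts
df = df.drop(df.iloc[:, 1:3].columns, axis=1)
print(df)
```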
Conclusion
Thanks for reading all the way to here!
In this tutorial, we learned how to use the drop function in Pandas. Specifically, we learned how to drop single columns/rows, multiple columns/rows, and how to drop columns or rows based on different conditions.
If you still want to dive a little deeper into the drop function, check out the official documentation.
thatascience
8 Ways to Drop Columns in Pandas
Often there is a need to modify a pandas dataframe to remove unnecessary columns or to prepare the dataset for model building. Columns can be manipulated in many ways in pandas; for instance, the df.drop method can drop selected columns. In this comprehensive tutorial, we will learn how to drop columns in a pandas dataframe in the following 8 ways:
1. Making use of “columns” parameter of drop method
2. Using a list of column names and axis parameter
3. Select columns by indices and drop them: Pandas drop unnamed columns
4. Pandas slicing columns by index: Pandas drop columns by Index
5. Pandas slicing columns by name
6. Python’s “del” keyword
7. Selecting columns with regex patterns to drop them
8. Dropna : Dropping columns with missing values
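Method 6 works without drop() at all. A minimal sketch, using a hypothetical dataframe (the original tutorial’s data isn’t shown here):

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"id": [1, 2], "value": [10, 20], "label": ["x", "y"]})

# del removes a single column from df in place and returns nothing
del df["id"]
print(df.columns.tolist())  # ['value', 'label']
```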
This detailed tutorial shows how to drop a pandas column by index, ways to drop unnamed columns, how to drop multiple columns, uses of the pandas drop method, and much more. Furthermore, method 8 shows various uses of the pandas dropna method to drop columns with missing values.
Let’s get started.
First, let’s understand the pandas drop method and its parameters.
Pandas Dataframe’s drop() method
DataFrame.drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
labels: String/List of column names or row index value.
axis: 0 or “index” for rows. 1 or “columns” for columns.
index: to provide row labels
columns: to provide column names
level: to specify level in case of multi-index dataframes
inplace: modifies original dataframe if set True
errors: if set to 'ignore', suppresses errors (e.g., when a provided column does not exist in the dataframe).
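Two of these parameters are easy to misread, so here is a minimal sketch of inplace and level, using hypothetical data:

```python
import pandas as pd

# Hypothetical two-column frame
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# inplace=True modifies df directly and returns None
returned = df.drop(columns="B", inplace=True)
print(returned, df.columns.tolist())  # None ['A']

# level: with MultiIndex columns, match a label on one level only
cols = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
mi = pd.DataFrame([[1, 2, 3]], columns=cols)
mi = mi.drop(columns="x", level=1)  # removes every column whose second-level label is "x"
print(mi.columns.tolist())  # [('a', 'y')]
```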
Drop column in pandas python
Deleting or dropping a column in python pandas is done with the drop() function. Here we will focus on dropping single and multiple columns in pandas by index (with the iloc indexer), by column name (with the ix indexer, which was removed in pandas 1.0, or with loc) and by position. We will also drop columns whose names start with, end with, or contain a character, as well as columns matched by a regular expression or a like% pattern. Let’s see an example of each.
First, let’s create a dataframe.
Create Dataframe
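The tutorial’s own data isn’t reproduced here, so the following is a hypothetical stand-in: only the column names (Name, Country, Age, Score) are chosen to be consistent with the examples below, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical values; column names chosen to match the examples below
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Country": ["USA", "UK", "India"],
    "Age": [26, 27, 25],
    "Score": [89, 87, 67],
})
print(df)
```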
Delete or drop column in pandas by column name using drop() function
Let’s see an example of how to drop a column by name in python pandas
The above code drops the column named ‘Age’; the argument axis=1 denotes columns.
Drop single column in pandas by using column index
Let’s see an example on dropping the column by its index in python pandas
In the above example, the column with index 3 (the 4th column) is dropped.
Delete a column based on column name:
In the above example, the column with the name ‘Age’ is deleted.
Drop multiple columns based on column name in pandas
Let’s see an example of how to drop multiple columns by name in python pandas
The above code drops the columns named ‘Age’ and ‘Score’. The argument axis=1 denotes columns.
Drop multiple columns based on column index in pandas
Let’s see an example of how to drop multiple columns by index.
In the above example, the columns with index 1 (2nd column) and index 3 (4th column) are dropped.
Drop multiple columns between two column indices in pandas
Let’s see an example of how to drop multiple columns between two indices using the iloc indexer.
In the above example, the columns with index 1 (2nd column) and index 2 (3rd column) are dropped.
Drop multiple columns between two column names in pandas
Let’s see an example of how to drop multiple columns between two column names using the ix indexer (removed in pandas 1.0) or the loc indexer.
In the above example, the columns from “country” through “score” are removed, so the resultant dataframe has 3 fewer columns.
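Because ix no longer exists in current pandas, a minimal sketch with loc, using the hypothetical Name/Country/Age/Score data (the capitalized names are an assumption):

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Country": ["USA", "UK", "India"],
    "Age": [26, 27, 25],
    "Score": [89, 87, 67],
})

# A loc label slice includes both ends: Country, Age and Score
df = df.drop(df.loc[:, "Country":"Score"].columns, axis=1)
print(df)
```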
Drop multiple columns that start with a character in pandas
Let’s see an example of how to drop multiple columns that start with a character in pandas using the loc indexer.
In the above example, columns whose names start with “A” are dropped.
Drop multiple columns that end with a character in pandas
Let’s see an example of how to drop multiple columns that end with a character using the loc indexer.
In the above example, columns whose names end with “e” are dropped.
Drop multiple columns that contain a character (like%) in pandas
Let’s see an example of how to drop multiple columns that contain a character (like%) in pandas using the loc indexer.
In the above example, columns whose names contain “sc” are dropped. case=False makes the match case-insensitive.
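The three name-based drops above can be sketched with boolean masks over df.columns and loc, using the hypothetical Name/Country/Age/Score data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Country": ["USA", "UK", "India"],
    "Age": [26, 27, 25],
    "Score": [89, 87, 67],
})

# Each .str method returns a boolean array; ~ keeps the columns that do NOT match
no_a = df.loc[:, ~df.columns.str.startswith("A")]
no_e = df.loc[:, ~df.columns.str.endswith("e")]
no_sc = df.loc[:, ~df.columns.str.contains("sc", case=False)]

print(no_a.columns.tolist())   # ['Name', 'Country', 'Score']
print(no_e.columns.tolist())   # ['Country']
print(no_sc.columns.tolist())  # ['Name', 'Country', 'Age']
```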
Drop columns using regular expression in pandas – regex
Let’s see an example of how to drop columns using regular expressions – regex.
In the above example, columns whose names start with “sc” are dropped using a regular expression.
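One way to sketch this with the hypothetical data: DataFrame.filter(regex=...) selects matching column names, and its .columns are passed to drop(). The (?i) flag makes the pattern case-insensitive, which is an assumption made so that "^sc" matches the capitalized "Score".

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Country": ["USA", "UK", "India"],
    "Age": [26, 27, 25],
    "Score": [89, 87, 67],
})

# Columns whose names start with "sc", ignoring case
to_drop = df.filter(regex="(?i)^sc").columns
df = df.drop(columns=to_drop)
print(df)
```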
Pandas Drop Column: How to Drop Column in DataFrame
Pandas DataFrame drop() method allows us to remove columns and rows from the DataFrame object.
Pandas Drop Column
To drop or remove a column in a DataFrame, use the Pandas DataFrame drop() method. The df.drop() method deletes specified labels from rows or columns. It removes rows or columns either by specifying label names and the corresponding axis, or by specifying index or column names directly.
When using a multi-index, labels on different levels can be removed by specifying the level.
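Syntax
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')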
Parameters
labels: single label or list-like
Index or column labels to drop.
axis: {0 or 'index', 1 or 'columns'}, default 0
Whether to drop labels from the index (0 or 'index') or columns (1 or 'columns').
index: single label or list-like
Alternative to defining the axis (labels, axis=0 is equivalent to index=labels).
columns: single label or list-like
Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
level: int or level name, optional
For MultiIndex, the level from which the labels will be removed.
inplace: bool, default False
If False, return a copy. Otherwise, do the operation in place and return None.
errors: {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress the error and drop only the existing labels.
Return Value
The drop() function returns the DataFrame without the removed index or column labels.
Raises
The drop() method raises a KeyError if any of the labels are not found in the selected axis.
How to Drop Column in DataFrame
Dropping one or more columns from a DataFrame can be achieved in multiple ways.
Removing columns using df.drop()
To create a DataFrame from Dictionary, use the pd.DataFrame.from_dict() function.
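The article’s data isn’t reproduced here, so this sketch uses hypothetical values. Only the column names Streaming and Season, at positions 1 and 2, come from the article; "Show" and "Episodes" are invented names.

```python
import pandas as pd

# Hypothetical data: four rows and four columns
data = {
    "Show": ["Show A", "Show B", "Show C", "Show D"],
    "Streaming": ["Netflix", "Hulu", "Netflix", "Prime Video"],
    "Season": [1, 2, 3, 4],
    "Episodes": [8, 10, 9, 6],
}
df = pd.DataFrame.from_dict(data)
print(df.shape)  # (4, 4)
```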
You can see that DataFrame is created with four rows and four columns.
To drop a single column from a DataFrame, use the drop() method and pass just that one column in the columns list.
You can see that we removed the Season column, and it no longer appears in the DataFrame.
Removing multiple columns from DataFrame
To remove multiple columns from a DataFrame, pass the list of columns to be removed to the drop() function.
You can see that we passed a list of columns, Season and Streaming, and in the output they are removed from the DataFrame.
Removing columns based on the column index
To remove columns by index, use the df.columns attribute (it’s an attribute, not a function).
In this example, we want to remove the columns at index 1 and 2, which are Streaming and Season. So, we look up their names with the df.columns[] property, passing the column indexes as a list, and pass the result to drop().
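A minimal sketch with the hypothetical show data (only the Streaming and Season names come from the article):

```python
import pandas as pd

data = {
    "Show": ["Show A", "Show B", "Show C", "Show D"],
    "Streaming": ["Netflix", "Hulu", "Netflix", "Prime Video"],
    "Season": [1, 2, 3, 4],
    "Episodes": [8, 10, 9, 6],
}
df = pd.DataFrame.from_dict(data)

# df.columns is an Index, so it's indexed with [] rather than called
cols = df.columns[[1, 2]]
df = df.drop(cols, axis=1)
print(df)
```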
Drop Columns using iloc[ ] and drop()
To remove all the columns between the specific columns, use the iloc[ ] and drop() method.
pandas.DataFrame.iloc is an inbuilt property for purely integer-location based indexing, i.e., selection by position. We use it to get the columns at the given positions and then pass their labels to the drop() method to remove them.
Drop Columns using loc[ ] and drop()
Pandas DataFrame loc[] is used to access the group of rows and columns by labels or a Boolean array. See the following code.
In this example, we use the loc[ ] indexer to select a range of columns by label and remove those columns from the DataFrame using the df.drop() method.
The difference between slicing with loc[ ] and iloc[ ] is that iloc[ ] excludes the end position of the range, while loc[ ] includes the end label.
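A minimal sketch of that difference, using the hypothetical show data: the positional slice needs an end of 3 to reach Season, while the label slice names Season directly.

```python
import pandas as pd

data = {
    "Show": ["Show A", "Show B", "Show C", "Show D"],
    "Streaming": ["Netflix", "Hulu", "Netflix", "Prime Video"],
    "Season": [1, 2, 3, 4],
    "Episodes": [8, 10, 9, 6],
}
df = pd.DataFrame.from_dict(data)

# iloc slice 1:3 covers positions 1 and 2; the end position is excluded
by_position = df.drop(df.iloc[:, 1:3].columns, axis=1)
# loc slice "Streaming":"Season" includes both end labels
by_label = df.drop(df.loc[:, "Streaming":"Season"].columns, axis=1)

print(by_position.equals(by_label))  # True
```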
Suppressing Errors in Dropping Columns and Rows
If the DataFrame doesn’t contain the given labels, KeyError is raised.
We can suppress this error by specifying errors='ignore' in the drop() function call.
Conclusion
Pandas DataFrame drop() is a beneficial method to remove unwanted columns and rows. We have seen how to use iloc[] and loc[] with the drop() method.