How to change type of column pandas
pandas.DataFrame.astype
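Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, ...}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more of the DataFrame's columns to column-specific types.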
copy bool, default True
Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).
errors {'raise', 'ignore'}, default 'raise'
Control raising of exceptions on invalid data for provided dtype.
raise : allow exceptions to be raised
ignore : suppress exceptions. On error return original object.
Returns casted same type as caller
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to a numeric type.
numpy.ndarray.astype : Cast a numpy array to a specified type.
Deprecated since version 1.3.0: Using astype to convert from timezone-naive dtype to timezone-aware dtype is deprecated and will raise in a future version. Use Series.dt.tz_localize() instead.
Create a DataFrame:
Cast all columns to int32:
Cast col1 to int32 using a dictionary:
Create a series:
Convert to categorical type:
Convert to ordered categorical type with custom ordering:
Note that using copy=False and changing data on a new pandas object may propagate changes:
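The example captions above lost their code during extraction. A minimal sketch of what they describe, assuming a small two-column DataFrame (the names col1, col2, and ser are illustrative); the copy=False case is left out because its effect depends on the pandas version:

```python
import pandas as pd

# Create a DataFrame with two integer columns.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Cast all columns to int32.
all_int32 = df.astype("int32")

# Cast only col1 to int32 using a dictionary; col2 keeps its original dtype.
col1_int32 = df.astype({"col1": "int32"})

# Create a Series and convert it to a categorical type.
ser = pd.Series([1, 2], dtype="int32")
as_category = ser.astype("category")

# Convert to an ordered categorical type with a custom ordering (2 < 1).
cat_dtype = pd.CategoricalDtype(categories=[2, 1], ordered=True)
as_ordered = ser.astype(cat_dtype)

print(all_int32.dtypes)
print(col1_int32.dtypes)
print(as_ordered)
```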
Change column type in pandas
I want to convert a table, represented as a list of lists, into a pandas DataFrame. As an extremely simplified example:
What is the best way to convert the columns to the appropriate types, in this case columns 2 and 3 into floats? Is there a way to specify the types while converting to DataFrame? Or is it better to create the DataFrame first and then loop through the columns to change the type for each column? Ideally I would like to do this in a dynamic way because there can be hundreds of columns and I don’t want to specify exactly which columns are of which type. All I can guarantee is that each columns contains values of the same type.
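The question's example was lost in extraction. One way the situation could look in code, assuming a hypothetical three-column table whose values all arrive as strings (the column names and the helper to_numeric_if_possible are made up for illustration):

```python
import pandas as pd

# Hypothetical table: every value is a string; columns 2 and 3 hold numbers.
table = [["a", "1.2", "4.2"], ["b", "70", "0.03"], ["x", "5", "0"]]
df = pd.DataFrame(table, columns=["name", "x", "y"])


def to_numeric_if_possible(column):
    """Return the column as numeric, or unchanged if it cannot be parsed."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column


# No column names are hard-coded, so this scales to many columns.
converted = df.apply(to_numeric_if_possible)
print(converted.dtypes)
```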
You have four main options for converting types in pandas: to_numeric(), astype(), infer_objects(), and convert_dtypes().
Read on for more detailed explanations and usage of each of these methods.
1. to_numeric()
This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
Basic usage
The input to to_numeric() is a Series or a single column of a DataFrame.
As you can see, a new Series is returned. Remember to assign this output to a variable or column name to continue using it:
You can also use it to convert multiple columns of a DataFrame via the apply() method:
As long as your values can all be converted, that’s probably all you need.
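A short sketch of the basic usage described above, with made-up data (the Series s and the columns a, b, c are assumptions for illustration):

```python
import pandas as pd

# A Series mixing strings and numbers has the object dtype.
s = pd.Series(["8", 6, "7.5", 3, "0.9"])
numeric = pd.to_numeric(s)  # returns a new Series; s itself is unchanged

df = pd.DataFrame({"a": ["1", "2"], "b": ["3.5", "4"], "c": ["x", "y"]})

# Assign the result back to a column to keep it.
df["a"] = pd.to_numeric(df["a"])

# Convert several columns at once with apply().
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric)
print(df.dtypes)
```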
Error handling
But what if some values can’t be converted to a numeric type?
Here’s an example using a Series of strings s which has the object dtype:
The default behaviour is to raise if it can’t convert a value. In this case, it can’t cope with the string ‘pandas’:
Rather than fail, we might want ‘pandas’ to be considered a missing/bad numeric value. We can coerce invalid values to NaN as follows using the errors keyword argument:
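The third option for errors is just to ignore the operation if an invalid value is encountered. Note that errors='ignore' is deprecated in to_numeric() as of pandas 2.2; newer code is expected to catch the exception explicitly instead:
This last option is particularly useful when you want to convert your entire DataFrame but don’t know which of its columns can be converted reliably to a numeric type. In that case, just write: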
The function will be applied to each column of the DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (e.g. they contain non-digit strings or dates) will be left alone.
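The error-handling behaviours above can be sketched as follows. The Series is an assumed example built around the string 'pandas' mentioned in the text; the errors='ignore' variant is omitted because it is deprecated in recent pandas releases:

```python
import pandas as pd

s = pd.Series(["1", "2", "4.7", "pandas", "10"])

# Default (errors="raise"): the unparseable string triggers a ValueError.
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True

# errors="coerce": invalid values become NaN, so the result is float64.
coerced = pd.to_numeric(s, errors="coerce")
print(raised)
print(coerced)
```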
Downcasting
By default, conversion with to_numeric() will give you either an int64 or float64 dtype (or whatever integer width is native to your platform).
Downcasting to ‘integer’ uses the smallest possible integer that can hold the values:
Downcasting to ‘float’ similarly picks a smaller than normal floating type:
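A small sketch of downcasting, assuming a Series of small integers:

```python
import pandas as pd

s = pd.Series([1, 2, -7])

# Smallest integer type that can hold every value.
small_int = pd.to_numeric(s, downcast="integer")

# Smallest floating type pandas will downcast to.
small_float = pd.to_numeric(s, downcast="float")

print(small_int.dtype, small_float.dtype)
```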
2. astype()
The astype() method enables you to be explicit about the dtype you want your DataFrame or Series to have. It’s very versatile in that you can try and go from one type to any other.
Basic usage
Just pick a type: you can use a NumPy dtype (e.g. np.int16), some Python types (e.g. bool), or pandas-specific types (like the categorical dtype).
Call the method on the object you want to convert and astype() will try and convert it for you:
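For illustration, a sketch with one conversion per kind of target type (the DataFrame and its column names are assumptions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0, 1, 0], "c": ["x", "y", "x"]})

df["a"] = df["a"].astype(np.int16)    # NumPy dtype
df["b"] = df["b"].astype(bool)        # Python type
df["c"] = df["c"].astype("category")  # pandas-specific dtype

print(df.dtypes)
```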
Be careful
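astype() is powerful, but it will sometimes convert values "incorrectly". For example:
These are small integers, so how about converting to an unsigned 8-bit type to save memory? If the Series contains a negative value such as -7, the conversion succeeds without an error, but the value wraps around to 249 (that is, 2^8 - 7).
Trying to downcast using pd.to_numeric(s, downcast='unsigned') instead could help prevent this error.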
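A sketch of the pitfall, assuming a Series that contains one negative value:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, -7])

# astype() does not complain; the negative value wraps around as uint8.
wrapped = s.astype(np.uint8)

# to_numeric() only downcasts when every value fits, so with a negative
# value present the Series keeps its signed integer dtype.
safe = pd.to_numeric(s, downcast="unsigned")

print(wrapped.tolist())
print(safe.dtype)
```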
3. infer_objects()
Version 0.21.0 of pandas introduced the method infer_objects() for converting columns of a DataFrame that have an object datatype to a more specific type (soft conversions).
For example, here’s a DataFrame with two columns of object type. One holds actual integers and the other holds strings representing integers:
Column ‘b’ has been left alone since its values were strings, not integers. If you wanted to force both columns to an integer type, you could use df.astype(int) instead.
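A sketch of the example described above, with assumed values for the two object columns:

```python
import pandas as pd

# Both columns are stored with the object dtype.
df = pd.DataFrame({"a": [7, 1, 5], "b": ["3", "2", "1"]}, dtype="object")

# Soft conversion: only column 'a' holds real integers, so only it changes.
inferred = df.infer_objects()

# Forcing both columns to integers parses the strings in 'b' as well.
forced = df.astype(int)

print(inferred.dtypes)
print(forced.dtypes)
```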
4. convert_dtypes()
Version 1.0 and above includes a method convert_dtypes() to convert Series and DataFrame columns to the best possible dtype that supports the pd.NA missing value.
Since column ‘a’ held integer values, it was converted to the Int64 type (which is capable of holding missing values, unlike int64).
Column ‘b’ contained string objects, so was changed to pandas’ string dtype.
By default, this method will infer the type from object values in each column. We can change this by passing infer_objects=False:
Now column ‘a’ remained an object column: pandas knows it can be described as an ‘integer’ column (internally it ran infer_dtype) but didn’t infer exactly what dtype of integer it should have so did not convert it. Column ‘b’ was again converted to ‘string’ dtype as it was recognised as holding ‘string’ values.
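A sketch of both calls, reusing an assumed DataFrame with two object columns:

```python
import pandas as pd

df = pd.DataFrame({"a": [7, 1, 5], "b": ["3", "2", "1"]}, dtype="object")

# Default: infer from the object values, then pick pd.NA-aware dtypes.
converted = df.convert_dtypes()

# Skip the object inference step.
no_infer = df.convert_dtypes(infer_objects=False)

print(converted.dtypes)
print(no_infer.dtypes)
```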
How to Change Column Type in Pandas Dataframe: Definitive Guide
Pandas Dataframe is a powerful two-dimensional data structure that can be used to store and manipulate data for your Data analysis tasks.
You can change the column type in pandas dataframe using the df.astype() method.
Once you create a dataframe, you may need to change the column type of a dataframe for reasons like converting a column to a number format which can be easily used for modeling and classification.
In this tutorial, you’ll learn how to change the column type of the pandas dataframe using the astype() and to_numeric() methods.
If You’re in a Hurry…
You can use the below code snippet to change the column type of the pandas dataframe using the astype() method.
This is how you can convert data types of columns in the dataframe.
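The quick snippet itself is missing from the text. A minimal sketch of an astype() call of this kind, assuming a two-column dataframe named after the tutorial's sample columns:

```python
import pandas as pd

df = pd.DataFrame({"Unit_Price": [500.0, 200.0], "No_Of_Units": [5, 5]})

# Pass a {column: dtype} dictionary to change only the listed columns.
df = df.astype({"No_Of_Units": float})

print(df.dtypes)
```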
If You Want to Understand Details, Read on…
In this detailed tutorial, you’ll learn how to change column type in pandas dataframe using different methods provided by pandas itself, along with examples of different types of conversion.
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
Snippet
Datatypes of Columns
Note: The String types are displayed as objects.
Printing the dataframe
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
You have the sample dataframe created with different data types.
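The snippet that builds the sample dataframe is missing. It can be reconstructed from the table above; the following is a sketch under the assumption that the values were given as plain Python lists, so the exact dtypes of the original may differ:

```python
import pandas as pd

# Values copied from the table above.
data = {
    "product_name": ["Keyboard", "Mouse", "Monitor", "CPU", "Speakers"],
    "Unit_Price": [500, 200, 5000, 10000, 250.50],
    "No_Of_Units": [5, 5, 10, 20, 8],
    "Available_Quantity": [5, 10, 11, 15, "Not Available"],
    "Available_Since_Date": [
        "11/5/2021", "4/23/2021", "08/21/2021", "09/18/2021", "01/05/2021",
    ],
}
df = pd.DataFrame(data)

print(df.dtypes)
print(df)
```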
Next, you’ll see how different types of columns can be cast to another format.
Pandas Change Column Type To String
You can convert a column to string by calling the astype() method with str as the target datatype.
In the sample dataframe, the column Unit_Price is float64. When the below line is executed, Unit_Price column will be converted to String format.
Snippet
The df.dtypes will print the types of the column.
Datatypes of Columns
Refer to this link to understand why String is displayed as an object.
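A sketch of the conversion described in this section, assuming only the Unit_Price column of the sample dataframe:

```python
import pandas as pd

df = pd.DataFrame({"Unit_Price": [500.0, 200.0, 250.5]})

# Cast the float column to strings.
df["Unit_Price"] = df["Unit_Price"].astype(str)

print(df.dtypes)
```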
You’ve learned how to cast a column type to String.
Next, you’ll see how to convert column type to int.
Pandas Change Column Type To Int
You can convert a column to int using the to_numeric() method or astype() method.
Let’s look at both methods in detail.
Using to_numeric()
to_numeric() method will convert a column to int or float based on the values available in the column.
Example: The Unit_Price column in the sample dataframe contains decimal numbers and the No_Of_Units column contains only numbers.
Datatypes after converting it using the to_numeric() method.
Datatypes of Columns
Printing the dataframe
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
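A sketch of to_numeric() choosing int or float from the values. It assumes, for illustration only, that both columns arrive as strings:

```python
import pandas as pd

# Assumption: the numbers are stored as text before conversion.
df = pd.DataFrame({
    "Unit_Price": ["500", "200", "250.5"],
    "No_Of_Units": ["5", "5", "8"],
})

df["Unit_Price"] = pd.to_numeric(df["Unit_Price"])    # decimals present: float64
df["No_Of_Units"] = pd.to_numeric(df["No_Of_Units"])  # whole numbers: int64

print(df.dtypes)
```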
Now, you’ll see how to handle exceptions while using to_numeric() method.
Error Handling in to_numeric
Error handling is good programming practice, because any conversion can fail on unexpected data.
You can use the optional errors parameter to specify how errors should be handled.
errors='raise' will raise the error.
For example, the Available_Quantity column in the sample dataframe contains a String value Not Available in one of the cells. It cannot be converted to a number. In this case, the conversion will raise the error.
Snippet
An error will be raised as ValueError: Unable to parse string "Not Available" as follows.
Error Output
This is how you can raise the error and stop the conversion if there is any problem during conversion.
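A sketch of the raising behaviour, assuming a Series with the same values as the Available_Quantity column. Catching the exception leaves the original data untouched:

```python
import pandas as pd

qty = pd.Series([5, 10, 11, 15, "Not Available"], name="Available_Quantity")

message = None
try:
    qty = pd.to_numeric(qty, errors="raise")
except ValueError as exc:
    # The conversion stopped; qty still holds the original values.
    message = str(exc)

print(message)
```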
Next, you’ll see how to ignore the errors.
Ignoring the errors
For example, when you convert the Available_Quantity column, which contains a String value, to int, an error will occur.
When errors='ignore' is used, the conversion stops silently without raising an error and the original column is returned unchanged. Note that errors='ignore' is deprecated in to_numeric() as of pandas 2.2.
Snippet
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
This is how you can ignore the errors while converting.
Coercing the Error
To coerce is to force something. In this context, you force the to_numeric() method to convert the column even though it contains some invalid values.
It converts every value it can and replaces the invalid values with NaN.
Snippet
You can see that the Available_Quantity column is converted to float64. The String values in the column are converted to NaN, which denotes Not A Number.
You can see that in the below visualized dataframe.
Datatypes of Columns
Printing the dataframe
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5.0 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10.0 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11.0 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15.0 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | NaN | 01/05/2021 |
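The coercion shown in the table can be sketched like this, assuming a Series with the same values as the Available_Quantity column:

```python
import pandas as pd

qty = pd.Series([5, 10, 11, 15, "Not Available"], name="Available_Quantity")

# Invalid entries become NaN, which forces the float64 dtype.
coerced = pd.to_numeric(qty, errors="coerce")

print(coerced)
```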
This is how you can use the to_numeric() to convert the column to any of the number types.
Next, you’ll learn about the astype() method.
Using astype()
astype() method is used to convert columns to any type specified in the method parameter.
You can convert column to int by specifying int in the method parameter as shown below.
Snippet
Datatypes of Columns
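Note: astype(int) uses the platform's default integer type, which is int32 on Windows with NumPy versions before 2.0 and int64 on most other platforms, whereas to_numeric() produces int64 by default.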
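A sketch of the astype() call, assuming a float version of the No_Of_Units column. Naming the width explicitly avoids depending on the platform default:

```python
import pandas as pd

units = pd.Series([5.0, 5.0, 10.0, 20.0, 8.0], name="No_Of_Units")

as_int = units.astype(int)        # platform default integer width
as_int64 = units.astype("int64")  # explicit 64-bit integers

print(as_int.dtype, as_int64.dtype)
```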
Now, let’s see how to handle errors during astype() conversion.
Error Handling in astype()
As said before, errors are part of any program, so you need to specify how they are handled when they occur.
errors='raise' will raise the error.
For example, the Available_Quantity column in the sample dataframe contains a String value Not Available in one of the cells. It cannot be converted to a number. In this case, the conversion will raise the error.
Snippet
Error will be raised as below.
Error Output
You’ve raised the error during conversion.
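A sketch of astype() raising on the invalid value, assuming a Series with the same values as the Available_Quantity column:

```python
import pandas as pd

qty = pd.Series([5, 10, 11, 15, "Not Available"], name="Available_Quantity")

caught = False
try:
    qty.astype(int)  # errors="raise" is the default
except (ValueError, TypeError):
    # "Not Available" cannot be interpreted as an integer.
    caught = True

print(caught, qty.dtype)
```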
Next, you’ll see how to ignore the errors.
Ignoring the errors
For example, when you convert the Available_Quantity column, which contains a String value, to int, an error will occur.
When errors='ignore' is used, the conversion stops silently without raising an error, and you’ll have the original dataframe intact.
Datatypes of Columns
You can see that the Available_Quantity column is still of type object, which means it was not converted, and no error was raised either.
Printing the dataframe
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
This is how you can ignore the errors during conversion.
Note:
astype() does not coerce invalid values. It either converts the whole column or, with errors='ignore', returns the original values. Hence, you’ll not be able to use errors='coerce' with the astype() method.
You’ve learned how to cast column type to int.
Pandas Change Column Type From Object to Int64
You can do it by using the to_numeric() method as shown below. It automatically converts numbers to int64 by default.
Snippet
Datatypes of Columns
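If you just specify int in astype(), the column is converted to the platform's default integer type, which is int32 on some platforms (for example, Windows with NumPy versions before 2.0).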
Snippet
Datatypes of Columns
You can pass np.int64 to astype() to convert the column to int64.
Snippet
Datatypes of Columns
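Both routes from object to int64 can be sketched as follows, assuming an object column that holds digit strings:

```python
import numpy as np
import pandas as pd

qty = pd.Series(["5", "10", "11", "15"], dtype="object")

via_to_numeric = pd.to_numeric(qty)  # int64 by default
via_astype = qty.astype(np.int64)    # explicit 64-bit target

print(via_to_numeric.dtype, via_astype.dtype)
```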
Pandas Change Column Type From Int To String
In this section, you’ll learn how to change column type from Int to String.
You can use the astype() method to convert an int column to a String.
In the sample dataframe, the column No_Of_Units is of number type. Now you’ll convert it to string.
Snippet
Datatypes of Columns
Note: Refer to this link to understand why String is displayed as an object.
This is how you can cast the int column to String or Object.
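A sketch of the int to string conversion, assuming the values of the No_Of_Units column:

```python
import pandas as pd

units = pd.Series([5, 5, 10, 20, 8], name="No_Of_Units")

# Each integer becomes its string representation.
as_str = units.astype(str)

print(as_str.tolist())
```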
Next, you’ll see how to convert column type to float.
Pandas Change Column Type To Float
In this section, you’ll learn how to change column type to float.
You can use the astype() method to convert a column to float.
Datatypes of Columns
Printing the dataframe
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
Now, let’s try to convert the column Available_Quantity to float. It has a non-numeric value, Not Available, in one of the cells.
Note that you’re using errors='coerce' (an option of to_numeric(), not of astype()), which forces the conversion of the values that can be converted.
Datatypes of Columns
The column is converted to float64 without any problems. The non-numeric characters are converted to NaN which means Not A Number.
Printing the dataframe
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5.0 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10.0 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11.0 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15.0 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | NaN | 01/05/2021 |
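A sketch of both float conversions in this section. It assumes the coercion is done with to_numeric(), since astype() has no 'coerce' option:

```python
import pandas as pd

units = pd.Series([5, 5, 10, 20, 8], name="No_Of_Units")
qty = pd.Series([5, 10, 11, 15, "Not Available"], name="Available_Quantity")

# A clean integer column converts directly.
units_float = units.astype(float)

# A column with an invalid value is coerced first; the bad cell becomes NaN.
qty_float = pd.to_numeric(qty, errors="coerce").astype(float)

print(units_float.dtype, qty_float.dtype)
```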
Next, you’ll learn how to cast column type to Datetime.
Pandas Change Column Type To Datetime64
You can use the method to_datetime() to convert a string to DateTime.
In the sample dataframe, the column Available_Since_Date has the date value as a String type.
You’ll convert the column type to datetime using the below snippet.
Snippet
Datatypes of Columns
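to_datetime() also supports error handling through its errors parameter: 'raise' raises an exception for values that cannot be parsed, and 'coerce' sets them to NaT.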
This is how you can convert column type to DateTime.
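A sketch of the datetime conversion, using the dates from the sample table. The month/day/year format string is an assumption inferred from values such as 4/23/2021:

```python
import pandas as pd

dates = pd.Series(
    ["11/5/2021", "4/23/2021", "08/21/2021", "09/18/2021", "01/05/2021"],
    name="Available_Since_Date",
)

# An explicit format avoids ambiguity between day-first and month-first.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")

# errors="coerce" turns unparseable entries into NaT instead of raising.
with_bad = pd.Series(["11/5/2021", "not a date"])
coerced = pd.to_datetime(with_bad, format="%m/%d/%Y", errors="coerce")

print(parsed.dtype)
```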
Next, you’ll see how to convert multiple columns to int.
Pandas Convert Multiple Columns to Int
In this section, you’ll learn how to convert multiple columns to int using the astype() method.
The example converts only one column because the sample dataframe has only one numeric column.
Datatypes of Columns
Printing the dataframe
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
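To show the multi-column form of astype(), here is a sketch on a hypothetical dataframe with two float columns (the whole-number prices are an assumption so that the cast to int loses nothing):

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Unit_Price": [500.0, 200.0],
    "No_Of_Units": [5.0, 5.0],
})

# Build one {column: dtype} mapping and convert all listed columns together.
cols = ["Unit_Price", "No_Of_Units"]
df = df.astype({col: "int64" for col in cols})

print(df.dtypes)
```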
Next, let’s convert multiple columns using the to_numeric() method.
You have to use the apply() method to apply the function to_numeric() to the specified columns as shown below.
The example converts only one column because the sample dataframe has only one numeric column.
Datatypes of Columns
Printing the dataframe
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
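A sketch of apply() with to_numeric() over several columns, assuming a hypothetical dataframe whose numbers are stored as text:

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Unit_Price": ["500", "200.5"],
    "No_Of_Units": ["5", "5"],
})

# apply() calls pd.to_numeric once per selected column.
cols = ["Unit_Price", "No_Of_Units"]
df[cols] = df[cols].apply(pd.to_numeric)

print(df.dtypes)
```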
This is how you can convert multiple column types to another format.
Next, you’ll see how to cast all columns to another type.
Pandas Convert All Columns
In this section, you’ll learn how to change the column type of all columns in a dataframe. For example, Converting All Object Columns To String.
You can also use the astype() method to convert all columns.
Select the list of columns from the dataframe, call astype() on the selection, and pass the target datatype.
For example, pass str to convert all columns to string.
Snippet
Datatypes of Columns
You can see that all the columns of the dataframe are converted to String and it is displayed as an object.
Refer to this link to understand why String is displayed as an object.
Printing the dataframe
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10 | 11 | 08/21/2021 |
3 | CPU | 10000.0 | 20 | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8 | Not Available | 01/05/2021 |
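A sketch of converting every column to string, assuming a two-row slice of the sample dataframe:

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Unit_Price": [500.0, 200.0],
    "No_Of_Units": [5, 5],
})

# Select all columns and cast them in one call.
all_columns = list(df.columns)
df[all_columns] = df[all_columns].astype(str)

print(df.dtypes)
```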
Conclusion
To summarize, you’ve learned how to change column type in pandas dataframe.
You’ve used the to_numeric() and astype() methods to change column types, and you’ve seen how to use them for various type conversions along with error handling.
How To Change Column Type in Pandas DataFrames
Exploring 3 different options for changing dtypes of columns in pandas
Introduction
One of the most common actions one needs to undertake when working with pandas DataFrames is data type (or dtype) casting. In today’s article, we are going to explore 3 distinct ways of changing the type of columns in pandas. These methods are astype(), to_numeric(), and convert_dtypes().
Before we start discussing the various options you can use to change the type of certain column(s), let’s first create a dummy DataFrame that we’ll use as an example throughout the article.
Using astype()
Let’s suppose we want to convert column A (which is currently a string of type object) into a column holding integers. To do so, we simply need to call astype on the pandas DataFrame object and explicitly define the dtype we wish to cast the column to.
You can even cast multiple columns in one go. For example,
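The article's dummy DataFrame and astype() calls are missing from the text. A sketch under the assumption of a three-column frame in which column A holds digit strings:

```python
import pandas as pd

# Hypothetical dummy DataFrame: A holds strings, B integers, C text.
df = pd.DataFrame(
    [("1", 1, "hi"), ("2", 2, "bye"), ("3", 3, "hello")],
    columns=list("ABC"),
)

# Cast a single column.
single = df.astype({"A": "int64"})

# Cast multiple columns in one go.
multiple = df.astype({"A": "int64", "B": "float64"})

print(multiple.dtypes)
```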
Additionally, you can even instruct astype() how to behave in case it observes invalid data for the provided dtype. This can be achieved by passing the corresponding errors argument. You can choose to ‘raise’ Exceptions on invalid data or ‘ignore’ to suppress exceptions.
For instance, let’s suppose we have a column with mixed dtypes as the Series shown below:
As of version 1.3.0, astype for timezone-naive type to timezone-aware dtype conversion has been deprecated. You should now be using Series.dt.tz_localize() instead.
Using to_numeric()
pandas.to_numeric is used to convert columns with non-numeric dtypes to the most suitable numeric type. For example, to cast column A into int, all you need to run is
Now if you want to convert multiple columns into numeric types, you have to use the apply() method as shown below:
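A sketch of both calls, assuming a hypothetical frame in which columns A and B hold numeric strings:

```python
import pandas as pd

df = pd.DataFrame({"A": ["1", "2", "3"], "B": ["1.5", "2", "3"], "C": ["x", "y", "z"]})

# Single column.
a_numeric = pd.to_numeric(df["A"])

# Multiple columns via apply().
df[["A", "B"]] = df[["A", "B"]].apply(pd.to_numeric)

print(df.dtypes)
```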
You may also want to take a look at to_datetime() and to_timedelta() methods should you wish to cast a column to datetime or timedelta respectively.
Using convert_dtypes()
convert_dtypes() method is included as of pandas version 1.0.0 and is used to convert columns to the best possible dtypes using dtypes supporting pd.NA (missing values). This means that the dtype will be determined at runtime, based on the values included in the specified column(s).
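A sketch of convert_dtypes() on a hypothetical frame with missing values, which is where the pd.NA-aware dtypes matter:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None],         # stored as float64 because of the missing value
    "B": ["x", None, "z"],     # object
    "C": [True, False, None],  # object
})

converted = df.convert_dtypes()

print(converted.dtypes)
```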
Final Thoughts
In today’s article we explored several options pandas offers for casting the data type of specific column(s) of a DataFrame. We discussed how to use astype() in order to explicitly specify the dtypes of columns. Additionally, we explored how to use the to_numeric() method so that columns get converted into numerical types. Finally, we showcased how to use the convert_dtypes() method, which figures out the most suitable dtypes of the columns based on the values included in each of the columns.
Change Column Data Type in Pandas
Working with data is rarely straightforward.
Mostly one needs to perform various transformations on the imported dataset, to make it easy to analyze.
In all of my projects, pandas never detects the correct data type for all the columns of the imported dataset. But at the same time, pandas offers a range of methods to easily convert the column data types.
Here, you will get all the methods for changing the data type of one or more columns in Pandas and certainly the comparison amongst them.
Throughout the read, the resources are indicated with 📚, the shortcuts are indicated with ⚡️ and the takeaways are denoted by 📌. Don’t forget to check out an interesting 💡 project idea at the end of this read.
You can quickly follow along with this Notebook 📌.
To make it easier to understand, let’s create a simple DataFrame.
Using this example, it will be much easier to understand — how to change the data type of columns in Pandas.
pandas.DataFrame.astype()
Pandas has a solution. The second optional argument of this method, errors, lets you decide how errors are handled. It defaults to 'raise', meaning errors are raised and no output is returned. Assign 'ignore' to this argument to ignore the errors and return the original value.
❓ Want to change the data type of all the columns in one go ❓
⚡️ Just pass the dictionary of column name & data type pairs to this method and the problem is solved.
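The article's own DataFrame is not preserved here, so this sketch uses a hypothetical one (the columns id, price, and flag are made up) to show the dictionary form:

```python
import pandas as pd

df = pd.DataFrame({"id": ["1", "2"], "price": ["9.99", "19.5"], "flag": [1, 0]})

# One dictionary of column name and data type pairs converts every column.
df = df.astype({"id": "int64", "price": "float64", "flag": bool})

print(df.dtypes)
```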
pandas.to_DataType()
Well well, there is no such method called pandas.to_DataType(), however, if the word DataType is replaced by the desired data type, you can get the below 2 methods.
pandas.to_numeric()
This method is used to convert the data type of the column to the numerical one. As a result, the float64 or int64 will be returned as the new data type of the column based on the values in the column.
pandas.to_datetime()
Here the column gets converted to the DateTime data type. This method accepts a number of optional arguments (such as format, dayfirst, and errors) that help you decide how to parse the dates.
❓ Need to change the data types of multiple columns at a time ❓
Similar to pandas.DataFrame.astype() the method pandas.to_numeric() also gives you the flexibility to deal with the errors.
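A sketch of both to_* functions, assuming a hypothetical frame with some unparseable entries (the columns qty, price, and when are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "qty": ["1", "2", "n/a"],
    "price": ["9.5", "x", "3"],
    "when": ["2021-01-05", "2021-04-23", "2021-08-21"],
})

# Multiple columns at a time; invalid entries are coerced to NaN.
num_cols = ["qty", "price"]
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce")

# Dates in ISO format parse without extra arguments.
df["when"] = pd.to_datetime(df["when"])

print(df.dtypes)
```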
pandas.DataFrame.convert_dtypes()
This method automatically detects the most suitable data type for each column. By default, object columns are inferred first, so those holding text are converted to the string dtype.
In my experience, this method offers poor control over the data type conversion.
In this quick read, I demonstrated how the data type of single or multiple columns can be changed quickly. I frequently use the method pandas.DataFrame.astype() as it provides better control over the different data types and has few optional arguments. Depending on the analysis requirements, other methods can be a better fit: for example, for converting a column to datetime64[ns], pandas.to_datetime() is much more straightforward.
💡 Project Idea!!
It can be a good idea to start with a new dataset, assess and clean it by practicing Data Wrangling techniques and store it in a SQL Database to finally visualize it in Power BI.
Additionally, this project idea can be implemented with the resources given in it. As I always say, I am open to constructive feedback and knowledge sharing through LinkedIn.