How to iterate over dataframe rows
Pandas – Iterate over Rows of a Dataframe
Pandas dataframes are well suited to accessing and manipulating tabular data in Python, and it is sometimes useful to loop over their rows. In this tutorial, we’ll look at several methods for iterating over the individual rows of a pandas dataframe.
How to iterate through the rows of a dataframe?
In Pandas, the iterrows() function is generally used to iterate over the rows of a dataframe as (index, Series) tuple pairs. You can also use the itertuples() function which iterates over the rows as named tuples.
Let’s look at some examples of how to iterate over a dataframe’s rows.
First, let’s create a sample dataframe which we’ll be using throughout this tutorial. You can follow along by using the code in this tutorial and implementing it in the environment of your choice.
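The sample dataframe itself is not reproduced in this text, so the following is a hypothetical stand-in. The column names and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; not the original tutorial's values.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol"],
        "Age": [29, 35, 41],
        "City": ["Austin", "Boston", "Chicago"],
    }
)

print(df)
```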
Using Pandas iterrows() to iterate over rows
The Pandas iterrows() function is used to iterate over dataframe rows as (index, Series) tuple pairs. Using it we can access the index and content of each row. The content of a row is represented as a Pandas Series.
Since iterrows returns an iterator we use the next() function to get an individual row. We can see below that it is returned as an (index, Series) tuple.
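As a minimal sketch of this step, assuming a small made-up dataframe with Name and Age columns, next() can be used to pull the first pair from the iterator:

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [29, 35]})

# iterrows() returns a generator, so next() yields the first (index, Series) pair.
index, row = next(df.iterrows())

print(index)        # index label of the first row
print(type(row))    # the row contents arrive as a pandas Series
print(row["Name"])  # values are looked up by column name
```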
Iterating over all rows using iterrows()
Generally, iterrows() is used along with for to loop through the rows. The contents of a row are returned as a Series and hence can be accessed by their column name as shown below –
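A loop of this kind might look like the following sketch; the dataframe and the names list are illustrative assumptions rather than the tutorial's original data.

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [29, 35]})

names = []
for index, row in df.iterrows():
    # row is a Series, so each value is accessed by its column name.
    print(index, row["Name"], row["Age"])
    names.append(row["Name"])
```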
The Pandas documentation mentions that “You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.” See the example below –
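The sketch below, built on a made-up dataframe with mixed column types, illustrates the warning: assigning to the row object inside the loop leaves the dataframe unchanged.

```python
import pandas as pd

# Hypothetical frame with mixed dtypes (strings and integers).
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [29, 35]})

for index, row in df.iterrows():
    # row is a separate Series built for this iteration, so the
    # assignment below changes only that temporary object.
    row["Age"] = row["Age"] + 1

# The original column still holds its original values.
print(df["Age"].tolist())
```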
Using Pandas itertuples() to iterate over rows
The Pandas itertuples() function is used to iterate over dataframe rows as named tuples.
You can also drop the index and give a custom name to the rows returned by itertuples() by passing its index and name arguments.
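A short sketch of both variants, assuming a made-up dataframe and the hypothetical tuple name "Person":

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [29, 35]})

# Default behaviour: each tuple is named "Pandas" and its first field is the index.
for row in df.itertuples():
    print(row)

# index=False drops the index field; name sets the named tuple's type name.
people = list(df.itertuples(index=False, name="Person"))
print(people[0])
```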
Named tuples associate field names with values, somewhat like the keys of a dictionary, and there are several ways to access those values: by attribute, by position, or with getattr(). See the example below –
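The access styles can be sketched as follows, again on a made-up dataframe:

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [29, 35]})

row = next(df.itertuples())  # first row as a named tuple

by_attribute = row.Name           # access a field by attribute name
by_position = row[1]              # position 0 holds the index, so 1 is "Name"
by_getattr = getattr(row, "Age")  # handy when the field name is held in a variable

print(by_attribute, by_position, by_getattr)
```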
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel and pandas version 1.0.5.
Pandas Iterate Over Rows with Examples
1. Using DataFrame.iterrows() to Iterate Over Rows
First, let’s create a DataFrame.
Yields below output.
Let’s see what a row looks like by printing it.
Yields below output.
Note: The pandas documentation states that “You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.”
2. Using DataFrame.itertuples() to Iterate Over Rows
Pandas DataFrame.itertuples() is a widely used way to iterate over rows: it returns an iterator that yields one named tuple per row. itertuples() is generally faster than iterrows() and, unlike iterrows(), it preserves each column's data type across the row.
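The data type point can be illustrated with a small hypothetical frame holding one integer column and one float column. The column names qty and price are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical frame: "qty" is an integer column, "price" is a float column.
df = pd.DataFrame({"qty": [1, 2], "price": [9.5, 12.0]})

# iterrows() packs each row into a single Series, which has one dtype,
# so the integer value is upcast to a float alongside the price.
_, series_row = next(df.iterrows())
print(type(series_row["qty"]))

# itertuples() reads each column separately, so the integer stays an integer.
tuple_row = next(df.itertuples(index=False))
print(type(tuple_row.qty))
```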
Yields below output.
Let’s give the tuple a custom name.
Yields below output.
4. DataFrame.apply() to Iterate
You can also use the apply() method of the DataFrame with axis=1 to run a function, such as a lambda, on each row. For more details, refer to DataFrame.apply().
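A minimal sketch of row-wise apply(), assuming a made-up dataframe and an illustrative lambda that builds a label from two columns:

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [29, 35]})

# axis=1 passes each row to the function as a Series.
labels = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)

print(labels.tolist())
```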
Yields below output.
5. Iterating using for & DataFrame.index
Yields below output.
6. Using for & DataFrame.loc
Yields same output as above.
7. Using For & DataFrame.iloc
Yields below output.
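The approaches named in sections 5 to 7 can be sketched as follows. The dataframe and its r1/r2 index labels are assumptions made for illustration, and the original examples may have differed in detail.

```python
import pandas as pd

# Hypothetical frame with string index labels.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Age": [29, 35]},
    index=["r1", "r2"],
)

names_by_label = []
for label in df.index:
    # loc looks up a cell by index label and column name.
    names_by_label.append(df.loc[label, "Name"])

ages_by_position = []
for i in range(len(df)):
    # iloc looks up a cell by integer row and column position.
    ages_by_position.append(df.iloc[i, 1])

print(names_by_label, ages_by_position)
```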
8. Using DataFrame.items() to Iterate Over Columns
DataFrame.items() iterates over a pandas DataFrame column by column. Each iteration yields a (column name, Series) tuple: the first element is the column label and the second is that column's data as a Series.
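A brief sketch of column-wise iteration on a made-up dataframe:

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [29, 35]})

columns_seen = {}
for column_name, column in df.items():
    # column_name is the label; column is the whole column as a Series.
    columns_seen[column_name] = column.tolist()
    print(column_name, column.tolist())
```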
Yields below output.
9. Performance of Iterating DataFrame
Iterating over a DataFrame is generally discouraged because it performs poorly on large datasets. Use it only once you have exhausted the other options. Before using the examples in this article, check whether one of the following would work instead: 1) vectorization, 2) Cython routines, 3) list comprehensions (vanilla for loop).
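To make the alternatives concrete, the sketch below computes the same per-row product three ways. The price and qty columns are hypothetical, and Cython routines are left out.

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-by-row loop: works, but is the slowest option on large frames.
totals_loop = []
for row in df.itertuples(index=False):
    totals_loop.append(row.price * row.qty)

# List comprehension over the column values.
totals_comprehension = [p * q for p, q in zip(df["price"], df["qty"])]

# Vectorized: one expression over whole columns, no Python-level loop.
df["total"] = df["price"] * df["qty"]

print(df)
```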
10. Complete Example of pandas Iterate over Rows
Conclusion
DataFrame provides several ways to loop over rows and access columns or cells. Manually looping over rows is not recommended, however, because it degrades performance on large datasets. Each approach in this article behaves differently, so choose the one that suits your use case.
How to Iterate Over Rows in a Pandas DataFrame
Introduction
Let’s look at the three main ways to iterate over a DataFrame: items(), iterrows(), and itertuples().
Iterating DataFrames with items()
Let’s set up a DataFrame with some data of fictional people:
Note that we are using IDs as our DataFrame's index. Let’s take a look at the DataFrame:
This results in:
We’ve successfully iterated over all rows in each column. Notice that the index column stays the same over the iteration, as this is the associated index for the values. If you don’t define an index, pandas assigns a default integer index starting at 0.
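We can also pick out a single entry of each column by its position, much as we would index a Python list:
Note that positions are zero-based, so data[1] refers to the second row. With a non-integer index such as ours, recent pandas versions deprecate this positional shortcut, and data.iloc[1] is the explicit equivalent. You will see this output: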
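A sketch of this lookup follows. The dataframe is a hypothetical stand-in for the article's table of fictional people, and it uses iloc for the positional access.

```python
import pandas as pd

# Hypothetical stand-in data with ID strings as the index.
df = pd.DataFrame(
    {"first_name": ["John", "Jane", "Marry"], "age": [34, 29, 37]},
    index=["id001", "id002", "id003"],
)

second_values = {}
for column_name, data in df.items():
    # data is a whole column as a Series; iloc[1] is its second entry by position.
    second_values[column_name] = data.iloc[1]
    print(column_name, data.iloc[1])
```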
The output would be the same as before:
Iterating DataFrames with iterrows()
While df.items() works column by column, running one cycle per column, we can use iterrows() to get the entire row of data for each index.
Let’s try iterating over the rows with iterrows() :
In the for loop, i represents the index column (our DataFrame has indices from id001 to id006 ) and row contains the data for that index in all columns. Our output would look like this:
They both produce this output:
Iterating DataFrames with itertuples()
The itertuples() function will also return a generator, which generates row values in tuples. Let’s try this out:
You’ll see this in your Python shell:
We can leave out the index column by setting the index parameter to False :
Our tuples will no longer have the index displayed:
Now our output would be:
Iteration Performance with Pandas
The official Pandas documentation warns that iteration is a slow process. If you’re iterating over a DataFrame to modify the data, vectorization would be a quicker alternative. Also, it’s discouraged to modify data while iterating over rows as Pandas sometimes returns a copy of the data in the row and not its reference, which means that not all data will actually be changed.
For small datasets you can use the to_string() method to display all the data. For larger datasets that have many columns and rows, you can use head(n) or tail(n) methods to print out the first n rows of your DataFrame (the default value for n is 5).
Speed Comparison
To measure the speed of each method, we wrapped them in functions that execute them 1000 times and return the average execution time.
To test these methods, we use both print() and list.append() to provide better comparison data and to cover common use cases. To keep the comparison fair, we iterate over the DataFrame and print or append only one value per loop.
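A harness along these lines could reproduce the measurement. The function names, the 100-row frame, and the use of timeit are assumptions rather than the article's actual code, and only the append() variant is shown. The run count is kept small here so the sketch finishes quickly.

```python
import timeit

import pandas as pd

# Hypothetical test frame; the article's data is not reproduced here.
df = pd.DataFrame({"a": range(100), "b": range(100)})


def with_items():
    out = []
    for name, column in df.items():  # one cycle per column
        out.append(name)
    return out


def with_iterrows():
    out = []
    for _, row in df.iterrows():  # one Series per row
        out.append(row["a"])
    return out


def with_itertuples():
    out = []
    for row in df.itertuples():  # one named tuple per row
        out.append(row.a)
    return out


def average_seconds(func, runs=1000):
    """Run func `runs` times and return the mean time per run in seconds."""
    return timeit.timeit(func, number=runs) / runs


for func in (with_items, with_iterrows, with_itertuples):
    print(func.__name__, average_seconds(func, runs=10))
```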
Here’s what the return values look like for each method:
For example, while items() would cycle column by column:
iterrows() would provide all column data for a particular row:
And finally, a single row for the itertuples() would look like this:
Here are the average results in seconds:
Method | Speed (s) | Test Function |
items() | 1.349279541666571 | print() |
iterrows() | 3.4104003086661883 | print() |
itertuples() | 0.41232967500279 | print() |
Method | Speed (s) | Test Function |
items() | 0.006637570998767235 | append() |
iterrows() | 0.5749766406661365 | append() |
itertuples() | 0.3058610513350383 | append() |
Please note that these test results highly depend on other factors like OS, environment, computational resources, etc. The size of your data will also have an impact on your results.
Conclusion
Iterate over Rows of DataFrame in Pandas
This article will discuss six different techniques to iterate over a dataframe row by row. Then we will also discuss how to update the contents of a Dataframe while iterating over it row by row.
Suppose we have a dataframe i.e
Contents of the created dataframe are,
Let’s see different ways to iterate over the rows of this dataframe,
Loop over Rows of Pandas Dataframe using iterrows()
Dataframe class provides a member function iterrows() i.e.
DataFrame.iterrows()
It returns an iterator that can be used to iterate over all the rows of a dataframe as tuples. For each row it yields a tuple containing the index label and the row contents as a Series.
Let’s iterate over all the rows of above created dataframe using iterrows() i.e.
Important points about Dataframe.iterrows()
Loop over Rows of Pandas Dataframe using itertuples()
Dataframe class provides a member function itertuples() i.e.
DataFrame.itertuples()
For each row it yields a named tuple containing all the column names and their values for that row. Let’s use it to iterate over all the rows of above created dataframe i.e.
For every row in the dataframe a named tuple is returned. From named tuple you can access the individual values by indexing i.e.
To access the 1st value i.e. value with tag ‘index’ use,
To access the 2nd value i.e. value with tag ‘Name’ use
Named Tuples without index
If we don’t want the index column to be included in these named tuples, we can pass the argument index=False i.e.
Named Tuples with custom names
By default the named tuple returned is named Pandas; we can provide a custom name through the name argument i.e.
Pandas – Iterate over Rows as dictionary
We can also iterate over the rows of dataframe and convert them to dictionary for accessing by column label using same itertuples() i.e.
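One way to do this is the named tuple's _asdict() method, sketched below on a made-up dataframe. Note that itertuples() renames columns whose labels are not valid Python identifiers to positional names, so the dictionary keys match the column labels only when the labels are valid identifiers.

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [29, 35]})

rows_as_dicts = []
for row in df.itertuples():
    # _asdict() converts the named tuple into a mapping of field name to value.
    record = row._asdict()
    rows_as_dicts.append(record)
    print(record["Name"], record["Age"])
```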
Iterate over Rows of Pandas Dataframe using index position and iloc
We can calculate the number of rows in a dataframe. Then loop through 0th index to last row and access each row by index position using iloc[] i.e.
Iterate over rows in Dataframe in reverse using index position and iloc
Get the number of rows in a dataframe. Then loop through last index to 0th index and access each row by index position using iloc[] i.e.
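The reverse walk can be sketched as follows, assuming a made-up three-row dataframe:

```python
import pandas as pd

# Hypothetical three-row frame used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Carol"], "Age": [29, 35, 41]})

reversed_names = []
# Walk integer positions from the last row (len(df) - 1) down to 0.
for i in range(len(df) - 1, -1, -1):
    row = df.iloc[i]  # row at position i, returned as a Series
    reversed_names.append(row["Name"])

print(reversed_names)
```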
Iterate over rows in dataframe using index labels and loc[]
Since Dataframe.index returns a sequence of index labels, we can iterate over those labels and access each row by index label using loc[] i.e.
Pandas : Iterate over rows and update
What if we want to change values while iterating over the rows of a Pandas Dataframe?
Because Dataframe.iterrows() returns a copy of each row's contents in the tuple, updating that copy has no effect on the actual dataframe. To update the dataframe itself, we can iterate over its rows using iterrows() and write to each cell through the at[] accessor, using the row's index label and the column name.
Let’s see an example,
Suppose we have a dataframe i.e
Contents of the created dataframe df are,
Let’s update each value in column ‘Bonus’ by multiplying it with 2 while iterating over the dataframe row by row i.e.
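A minimal sketch of this update, assuming a made-up dataframe with Name and Bonus columns:

```python
import pandas as pd

# Hypothetical frame; the article's actual values are not reproduced here.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Bonus": [100, 250]})

for index, row in df.iterrows():
    # Write through df.at[...] so the change lands in the dataframe itself,
    # not in the temporary row copy.
    df.at[index, "Bonus"] = row["Bonus"] * 2

print(df)
```

For a simple column-wide change like this one, the vectorized form df["Bonus"] = df["Bonus"] * 2 gives the same result without a loop.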
Dataframe got updated i.e. we changed the values while iterating over the rows of Dataframe. Bonus value for each row became double.
The complete example is as follows,
Output:
Summary
We learned about different ways to iterate over all rows of dataframe and change values while iterating.
How to Iterate Over Rows in a Pandas DataFrame
Discussing how to iterate over rows in pandas and why it’s better to avoid it (if possible)
Introduction
Iterating over pandas DataFrames is definitely not a best practice, and you should consider doing so only when it is absolutely necessary and you have exhausted every other option that is likely to be more elegant and efficient.
Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided.
In today’s article, we will discuss how to avoid iterating through DataFrames in pandas. We’ll also go through a “checklist” that you may need to reference every time before choosing to go with an iterative approach. Additionally, we will explore how to do so in cases where no other option is suitable to your specific use-case. Lastly, we will discuss why you should avoid modifying pandas object while iterating over them.
Do you really need to iterate over rows?
As highlighted in the official pandas documentation, the iteration through DataFrames is very inefficient and it can usually be avoided. Usually, pandas newcomers are not familiar with the concept of vectorisation and are unaware that most operations in pandas should (and can) be performed in a non-iterative context.
Before attempting to iterate through pandas objects, you must first ensure that none of the options below suit the needs of your use-case:
Iterating over the rows of a DataFrame
In case none of the above options will work for you, then you may still want to iterate through pandas objects. You can do so using either iterrows() or itertuples() built-in methods.
Before seeing both methods in action, let’s create an example DataFrame that we’ll use to iterate over.
Sources:
- http://sparkbyexamples.com/pandas/iterate-over-rows-in-pandas-dataframe/
- http://stackabuse.com/how-to-iterate-over-rows-in-a-pandas-dataframe/
- http://thispointer.com/pandas-6-different-ways-to-iterate-over-rows-in-a-dataframe-update-while-iterating-row-by-row/
- http://towardsdatascience.com/how-to-iterate-over-rows-in-a-pandas-dataframe-6aa173fc6c84