A bit more than 6 years ago, I posted Fun With DataSets, in which I wrote about the Proposed and Current versions of a DataRow. A code snippet in the post was iterating through each DataSet’s collection of DataTables and each DataTable’s collection of DataRows. I was using for loops. Years later, I have 30 comments on that post, but in the most recent thread of comments, we debated the merits of using for loops or foreach loops … which methodology is faster?
I had been under the impression, since way back in 2002 and the .NET 1.1 days, that the for loop was faster. It may have been true for 1.1 and I, unfortunately, held on to that belief all these years!! But my recent tests prove that wrong for all subsequent versions of .NET … foreach is much faster, roughly 7 to 8 times faster! I was curious whether it was always so, so I changed the TargetFramework in my test application to be able to test all the way back to the 2.0 Framework (I couldn’t get back as far as 1.1). And yes, it was the same with the 2.0 Framework … foreach wins each time!
Here is the code I used for testing this:
// First, I retrieved a few thousand rows out of a database.
// Then merged them together in a loop about 7 times,
// resulting in a DataSet called dsGlobal, containing about 2 million rows
this.GetALotOfData();
// Then, in a button click:
private void button1_Click(object sender, EventArgs e)
{
    // Be sure to add a using System.Diagnostics;
    Stopwatch oWatch = new Stopwatch();
    string msg = "";
    if (this.Toggle)
    {
        // Time the for loop
        this.Toggle = false;
        oWatch.Start();
        msg = this.WithFor();
        oWatch.Stop();
    }
    else
    {
        // Time the foreach loop
        this.Toggle = true;
        oWatch.Start();
        msg = this.WithForEach();
        oWatch.Stop();
    }
    this.txtResults.Text += string.Format("{0} in {1} milliseconds\r\n", msg, oWatch.ElapsedMilliseconds);
}
private string WithFor()
{
    // In my original blog post that I referenced above, I looped like this:
    for (int nTable = 0; nTable < this.dsGlobal.Tables.Count; nTable++)
    {
        for (int nRow = 0; nRow < this.dsGlobal.Tables[nTable].Rows.Count; nRow++)
        {
            if (this.dsGlobal.Tables[nTable].Rows[nRow].HasVersion(DataRowVersion.Proposed))
            {
                this.dsGlobal.Tables[nTable].Rows[nRow].EndEdit();
            }
        }
    }
    return "For";
}
private string WithForEach()
{
    foreach (DataTable dt in this.dsGlobal.Tables)
    {
        foreach (DataRow row in dt.Rows)
        {
            if (row.HasVersion(DataRowVersion.Proposed))
                row.EndEdit();
        }
    }
    return "Foreach";
}
Each test run was almost identical, time-wise, with for averaging right around 800 milliseconds and foreach averaging around 100 milliseconds, for almost 2 million rows (the DataSet contained only one DataTable). Here are the results:
Time using 1,943,936 Rows
For in 832 milliseconds
Foreach in 101 milliseconds
For in 768 milliseconds
Foreach in 102 milliseconds
For in 772 milliseconds
Foreach in 102 milliseconds
For in 858 milliseconds
Foreach in 103 milliseconds
For in 757 milliseconds
Foreach in 132 milliseconds
For in 750 milliseconds
Foreach in 102 milliseconds
Now, in the comment thread from that first post, my commenter was concerned about the way I was doing the for loop, thinking that there might be some overhead in having to evaluate this.dsGlobal.Tables[nTable].Rows[nRow] every time through the loop. So I added another method with a modified for loop, and tested it again:
private string WithForRowsCollection()
{
    DataRowCollection rows;
    for (int nTable = 0; nTable < this.dsGlobal.Tables.Count; nTable++)
    {
        rows = this.dsGlobal.Tables[nTable].Rows;
        for (int nRow = 0; nRow < rows.Count; nRow++)
        {
            if (rows[nRow].HasVersion(DataRowVersion.Proposed))
            {
                rows[nRow].EndEdit();
            }
        }
    }
    return "For (rows)";
}
It didn't really make much difference. My original for loop tended to be about 8 times slower, and the modified for loop about 7 times slower. Both for loops (the original and the one using a rows collection variable) averaged between 700 and 800 milliseconds, with the first in the high 700s and the second in the low 700s.
Time using 1,943,936 Rows
For (rows) in 687 milliseconds
Foreach in 102 milliseconds
For (rows) in 670 milliseconds
Foreach in 112 milliseconds
For (rows) in 775 milliseconds
Foreach in 131 milliseconds
For (rows) in 675 milliseconds
Foreach in 102 milliseconds
For (rows) in 757 milliseconds
Foreach in 140 milliseconds
For (rows) in 763 milliseconds
Foreach in 103 milliseconds
Now, granted, we’re only talking about 700 to 800 milliseconds here … but in any time-critical application, this could make a huge difference. I’m not really talking about applications with User Interfaces … it’s not anything a user would notice at all. But for any kind of service application that runs standalone or in collaboration with other services (such as Windows Service applications that run server-side), it could be very important!
And now, I think I’ll go update my old Fun With DataSets post to include a link here and to post the new code!
Happy coding! =0)
Hi Bonnie,
I believe the foreach is faster because retrieving the next value via an enumerator is faster than looking it up through an indexer. The for loop may also be using more local variable space.
That said, I'd say your 'for loop' version is unnecessarily accessing values through multiple properties.
While foreach will still be faster, eliminating the 'dotting' should reduce the gap substantially.
string WithForImproved()
{
    var tables = dsGlobal.Tables;
    int tableCount = tables.Count;
    for (int tableIndex = 0; tableIndex < tableCount; tableIndex++)
    {
        DataTable table = tables[tableIndex];
        int rowCount = table.Rows.Count;
        for (int rowIndex = 0; rowIndex < rowCount; rowIndex++)
        {
            DataRow row = table.Rows[rowIndex];
            if (row.HasVersion(DataRowVersion.Proposed))
            {
                row.EndEdit();
            }
        }
    }
    return "For Improved";
}
By the way, there's a good post over at http://www.dotnetperls.com/for-foreach, which discusses the performance and memory implications of both approaches.
Cheers,
Daniel
Thanks for your reply, Daniel. That was also pointed out to me by someone else and I had already re-tested using code similar to what you posted. It did not amount to any significant difference at all. The real culprit is the implementation of DataSet/DataTable and/or the DataRowCollection (or maybe just Collections in general...?).
At any rate, I decided to test this hypothesis by adding the data from the DataTable to a List instead. Wow, what a difference! Still the same number of Rows (just under 2 million), but only taking roughly 30 (for) to 50 (foreach) milliseconds!!!
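In case anyone wants to try that comparison themselves, the List test looked something like this (just a sketch, reusing the dsGlobal DataSet and the Stopwatch harness from the post above; time each of the two loops separately):

// Copy the rows into a List<DataRow> once, up front.
// This isolates the cost of DataRowCollection's indexer: List<T>'s
// indexer is a plain array access, so both loop styles are fast.
List<DataRow> rowList = new List<DataRow>();
foreach (DataTable dt in this.dsGlobal.Tables)
{
    foreach (DataRow row in dt.Rows)
        rowList.Add(row);
}

// for over the List
for (int i = 0; i < rowList.Count; i++)
{
    if (rowList[i].HasVersion(DataRowVersion.Proposed))
        rowList[i].EndEdit();
}

// foreach over the List
foreach (DataRow row in rowList)
{
    if (row.HasVersion(DataRowVersion.Proposed))
        row.EndEdit();
}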
Very nice example Bonnie, thanks !!!
Using this example, how I can see if there has been a shift in rows for each table in dataset ?
Is this Ok:
private void dgw_DoubleClick(object sender, EventArgs e)
{
    if (IsChanged() == true)
    {
        if (AskToSave() == 2)
            return;
    }
    //invite other forms...
}
private int AskToSave()
{
    int lnanswer = 2;
    switch (MessageBox.Show("Do you want to save...?", "warning", MessageBoxButtons.YesNoCancel, MessageBoxIcon.Question))
    {
        case DialogResult.Yes:
            OnSave();
            lnanswer = 6;
            break;
        case DialogResult.No:
            OnRevert();
            lnanswer = 7;
            break;
        case DialogResult.Cancel:
            break;
    }
    return lnanswer;
}
private bool IsChanged()
{
    return ds.HasChanges();
}
I don't know what you mean by this:
"Using this example, how I can see if there has been a shift in rows for each table in dataset ?"
A "shift in rows"? Do you mean a change to the data in a row? Did you read my "Fun With DataSets" post that I linked to above? Put that updated bit of code at the end of that post into your IsChanged() method, like so:
private bool IsChanged()
{
    if (ds == null)
        return false;
    DataRowCollection rows;
    for (int nTable = 0; nTable < ds.Tables.Count; nTable++)
    {
        rows = ds.Tables[nTable].Rows;
        for (int nRow = 0; nRow < rows.Count; nRow++)
        {
            if (rows[nRow].HasVersion(DataRowVersion.Proposed))
            {
                rows[nRow].EndEdit();
            }
        }
    }
    return ds.HasChanges();
}