Sunday, February 26, 2017

It's All About The Data

Every application has to deal with *some* kind of data. Where it's stored externally and how it's used in the application can vary widely ... but in this blog post, I will deal with SQL Server database storage and reading the data into either a DataSet, DataTable or a List of objects. I will *not* talk about Entity Framework nor LINQ-to-SQL, mainly because I pretty much only use DataSets.

I have seen many posts on Forums, blogs and elsewhere, recommending the use of the Load() method of a DataSet or DataTable. The Load() method takes an IDataReader parameter (so, in this case, I'd use a SqlDataReader). Supposedly, according to the many times that I've seen this recommended, it is supposed to be the fastest way to do this.

Personally, I have always used a SqlDataAdapter and its Fill() method to put the result set(s) returned from a SQL call into a DataTable or a DataSet. And I always thought that this was the fastest (although I had never tested that).

A couple of days ago, I saw this Load() method recommended again, several times ... and again I wondered about the performance of the Load() vs the Fill(), so I decided that it's about time to run some performance tests to put the question to rest, once and for all. And, while I was at it, I also decided to throw in a test putting data into a List<T> in addition to the DataSet/DataTable tests (not for my benefit, since I seldom use List<T> in this manner; but for you, Dear Reader).

So, first, here is the benchmarking code:

private void TestLoadvsFillvsList()
{
DataTable dtGlobal;
DataSet dsGlobal;
decimal LoadMilli;
decimal FillMilli;
decimal ListMilli;

Stopwatch oWatch = new Stopwatch();

// The DataTable.Load(IDataReader) method:
using (SqlConnection conn = newSqlConnection(this.ConnectionString))
{
SqlCommand sc = new SqlCommand("select * from Logdata", conn);
conn.Open();
this.dtGlobal = new DataTable();

oWatch.Start();
SqlDataReader dr = sc.ExecuteReader();
this.dtGlobal.Load(dr);
oWatch.Stop();

// If you have multiple SELECTs in your SqlCommand, you can put the multiple result sets
// into a DataSet with syntax similar to the following examples:
//this.dsGlobal.Load(dr, LoadOption.OverwriteChanges, dtLogData, dtMessage);
//this.dsGlobal.Load(dr, LoadOption.OverwriteChanges, "Table1", "Table2");
}
LoadMilli = oWatch.ElapsedMilliseconds;
oWatch.Reset();
Console.WriteLine("Load Time {0} milliseconds, Row Count, {1}", LoadMilli, this.dtGlobal.Rows.Count);

// The DataAdapter.Fill(DataSet/DataTable) method:
using (SqlConnection conn = new SqlConnection(this.ConnectionString))
{
SqlCommand sc = new SqlCommand("select * from Logdata", conn);
conn.Open();
this.dsGlobal = new DataSet();

oWatch.Start();
SqlDataAdapter da = new SqlDataAdapter(sc);
da.Fill(this.dsGlobal); // could also Fill(this.dtGlobal)
oWatch.Stop();
}
FillMilli = oWatch.ElapsedMilliseconds;
oWatch.Reset();
Console.WriteLine("Fill Time {0} milliseconds, Row Count, {1}", FillMilli, this.dsGlobal.Tables[0].Rows.Count);

// The while (dr.Read()) method:
List<LogData> logList = new List<LogData>();
using (SqlConnection conn = new SqlConnection(this.ConnectionString))
{
SqlCommand sc = new SqlCommand("select * from Logdata", conn);
conn.Open();

LogData oLog; ;
oWatch.Start();
SqlDataReader dr = sc.ExecuteReader();
while (dr.Read())
{
oLog = new LogData();
oLog.logdatakey = (long)dr["logdatakey"];
oLog.logdatetime = (DateTime)dr["logdatetime"];
oLog.message = dr["message"].ToString();
oLog.category = dr["category"].ToString();
logList.Add(oLog);
}
oWatch.Stop();
}
ListMilli = oWatch.ElapsedMilliseconds;
oWatch.Reset();
Console.WriteLine("List Time {0} milliseconds, Row Count, {1}", ListMilli, logList.Count);
Console.WriteLine("------------------------------------------------");

if (LoadMilli > FillMilli)
Console.WriteLine("Fill is {0:0.00} faster than Load", LoadMilli / FillMilli);
else
Console.WriteLine("Load is {0:0.00} faster than Fill", FillMilli / LoadMilli);

Console.WriteLine("A List of objects is faster than either one!", FillMilli / LoadMilli);
Console.WriteLine("List is {0:0.00} times faster than Load", LoadMilli / ListMilli);
Console.WriteLine("List is {0:0.00} times faster than Fill", FillMilli / ListMilli);
Console.WriteLine("------------------------------------------------");
}
public class LogData
{
public long logdatakey { get; set; }
public DateTime logdatetime { get; set; }
public string message { get; set; }
public string category { get; set; }
}

OK, so now notice in the last set of Console.WriteLine() statements above where I state that using a List<T> is *always* faster than using a DataSet/DataTable (and by a lot, as I'll get to in a minute). What this tells me is that if you have no use for DataSets at all, then you'll do just fine using a SqlDataReader to add your data to a List<T>, and then use your List elsewhere in your processing.

I used a sample size of 425,376 rows in the database table. The average time for the 3 methods were approximately as follows, in milliseconds:

Load 4040
Fill 3030
List 2020

So, doing the math:
The Fill about 1.33 times faster than the Load.
The List is about 2 times faster than the Load.
The List is about 1.5 times faster than the Fill.

I feel vindicated! I can now safely reply to these forum posts that, indeed, the .Fill() is significantly faster than the .Load()!!

Happy coding!! :0)