Geek Goddess: LINQ With DataSets

Wednesday, April 06, 2011

LINQ With DataSets

Today's blog post is going to be about using LINQ with DataSets. As anyone who reads my blog regularly (or various forum posts) might already know, I've been a big fan of Typed DataSets since the very beginning of .NET (all the way back to the pretty buggy 1.0 version). However, LINQ is fairly new to me. LINQ *itself* is not new to me ... it has been around for a few years now and I've certainly been aware of it. But I haven't utilized it much because, until .NET 3.5, there was no real support for Typed DataSets and that's really all I wanted to be able to use if for. Sure, you could still use LINQ with both regular DataSets and Typed DataSets ... as long as you used .AsEnumerable() ... but you couldn't take advantage of the Typed nature of a Typed DataSet, so I didn't see any point in messing around with it.

I've recently begun using it more with Typed DataSets because, surprisingly to me, for some things it seems to be faster. Since .NET 3.5, Typed DataSets have been getting auto-generated slightly different than they have in the past (and I'm not talking about the TableAdapter generation, which I despise ... see my blog post about avoiding that fiasco: http://geek-goddess-bonnie.blogspot.com/2010/04/create-xsd.html).

Prior to 3.5, the DataTable definition in a generated Typed DataSet looked something like this:

public partial class PersonnelDataTable : global::System.Data.DataTable, global::System.Collections.IEnumerable
{
    // rest of the code here
}

The minor difference is this:

public partial class PersonnelDataTable : global::System.Data.TypedTableBase<PersonnelRow>
{
    // rest of the code here
}

TypedTableBase is derived from DataTable, so really nothing changes. But, it allows the DataTable to be used in LINQ queries without specifying .AsEnumerable(), allows us to use the Typed column name properties of the DataRows in our LINQ queries *and* it's also much faster.

As I already mentioned above, LINQ *can* easily be used with regular DataSets/DataTables (it's use is NOT limited to Typed DataSets), but I am not including examples of that in this blog post. As with the 2.0 Typed DataSets, all that is needed is to use .AsEnumerable() with the DataTable, so the code will be almost identical.

So, let's see a few examples.

First, let's look at a typical use for a DataSet, selecting some rows. Using LINQ with a Typed DataSet ends up being quite a bit faster than using the DataTable.Select() method. In all my tests, I used a DataTable containing 270,000 rows. The Select in this test, selected 30,000 of those rows. I also did comparisons between 2.0 DataTable and 3.5 TypedTableBase, just for the heck of it.

// ds20 is a 2.0 Typed DataSet, which uses plain a plain old DataTable
DataRow[] dsSelect20 = ds20.Personnel.Select("lastname = 'Berent'");
DataRow[] linqSelect = ds20.Personnel.AsEnumerable()
 .Where(row => row.Field<string>("lastname") == "Berent")
 .Select(row => row).ToArray();

The above are your only two choices when using untyped or 2.0 Typed DataSets. Benchmark timing results show that LINQ is about 6 times faster than the old DataTable.Select() method:

Select: 5,625,000 ticks
LINQ: 937,500 ticks

If you're using 3.5 Typed DataSets, you have a few more options with LINQ. Generally, your LINQ statement will look like this:

// ds35 is a 3.5 Typed DataSet, which uses the new TypedTableBase class for it's DataTable
DataRow[] linqSelectLastName = ds35.Personnel
 .Where(row => row.lastname == "Berent")
 .Select(row => row).ToArray();

Note the differences: you don't need Personnel.AsEnumerable() and you can use the typed row column names, row.lastname. There is one caveat though to the above syntax. If any row in the Personnel table contains DBNull.Value in the lastname column, the above syntax will throw an exception. You must also check for the null, so the actual statement will need to be this:

DataRow[] linqSelectLastName = ds35.Personnel
 .Where(row => row.IslastnameNull() == false && row.lastname == "Berent")
 .Select(row => row).ToArray();

You could also use the untyped syntax of the column and then you don't need to check for DBNull, but I like to take advantage of the typed nature of a Typed DataSet. That is the purpose of them, after all. =0) Also, using the untyped syntax is slightly slower, but probably not significantly so.

DataRow[] linqSelectUntypedLastName = ds35.Personnel
 .Where(row => row.Field<string>("lastname") == "Berent")
 .Select(row => row).ToArray();

Incidentally, my data *does* contain DBNull.Value in the lastname column, so that my benchmark timing tests are valid. The timing for the DataSet.Select() yields roughly the same results either way, but the LINQ is about 1.5 times faster with the TypedTableBase than with a regular DataTable, making LINQ about 9 times faster than a regular DataSet.Select()!! Here are the results using a 3.5 Typed DataSet:

Select: 5,937,500 ticks
LINQ: 625,000 ticks
LINQ (untyped syntax): 781,250 ticks

So, I think that I'll wind up this post for now. There are plenty of other uses of LINQ that I should compare with old DataSet/DataTable functionality, but I think I'll save that for another post. This one is long enough!

Until next time ... happy coding!

15 comments:

John GroveJuly 12, 2011 7:55 AM
That is very interesting Bonnie. I have noticed on some queries that LINQ will throw an exception when it has nulls. That is why if I think a table may have a null value, I will use the old DataView.

I seem to still prefer this:

//DataView
DataView dv = new DataView(tableOne, "OrderID <> 'XXXX'", "OrderTime desc", DataViewRowState.CurrentRows);

//DataView
DataTable newTable = new DataView(tableOne, "OrderID <> 'XXXX'",
"OrderTime desc", DataViewRowState.CurrentRows).ToTable();

To this:
IEnumerable ordered = tableOne.AsEnumerable()
.Where(i => i.Field("OrderID ") != 'XXXX')
.OrderByDescending(i => i.Field("OrderTime "));

DataTable newTable = tableOne.AsEnumerable()
.Where(i => i.Field("OrderID ") != 'XXXX')
.OrderByDescending(i => i.Field("OrderTime "))
.CopyToDataTable();
ReplyDelete
Replies
BonnieJuly 13, 2011 6:32 PM
Hi John! How ya been?

One *does* get used to the look of LINQ after awhile, but most of the times I still use older DataSet notation/syntax. It's definitely easier on the eyes. ;0)
ReplyDelete
Replies
DanielFebruary 19, 2012 2:57 AM
Hello Bonnie, could you tell me if I'm doing something wrong? I looks like LINQ ignores completely the presence of a primary key column:

I am comparing two very simple searches over a column declared as primary key

Dim row = (From uo In ds.UOS Where uo.UO_ID = ID).FirstOrDefault
Dim row2 = ds.UOS.FindByUO_ID(ID)

The table is filled with a sequential process very simple, where row1 will have key "ID01", row 2 will have "ID02" and so on.

Comparing the two methods when the table is filled with 1000 rows:
------------
Searching "ID01":
LINQ: 140 ticks
FindByKey: 103 ticks

Searching an inexistent element, "ID01x"
LINQ: 2150 ticks
FindByKey: 110 ticks

With 100.000 rows:
-------------
Searching "ID01:
LINQ: 211 ticks
FindByKey: 146 ticks

Searching an inexistent element, "ID01x":
LINQ: 199.252 ticks
FindByKey: 193 ticks
ReplyDelete
Replies
DanielFebruary 19, 2012 1:20 PM
Bonnie, thank you very much for your comment. I had read your other post and then I knew that LINQ is not always the better choice, as you said. It seems clear that .FindByKEY is optimal in that scenario, but I thought that LINQ will behave better, really.

The reason I was interested in LINQ was because it allows deferred query evaluation and permits to execute it several times with different sets. I have developing a framework and needed a way to abstract any query over a dataset in some way so that it could be executed again over another (and related) dataset, if necessary.

Probably it still could be useful in other scenarios, or even in this one if the size of the dataset is small, if this way I get that feature (assuming the LINQ overhead in this last case). Anyway, I will continue searching a better solution to my problem.

Thanks again
Daniel
ReplyDelete
Replies
DanielFebruary 20, 2012 10:47 AM
Hello
Yes, I could use .Find, I know, but I have been thinking on something like a writeonly property that could receive an IEnumerable(Of DataRow). The framework knows how to manage that query, applying it if necessary on another dataset (assuming it knows the variable/property that have been used to define the query).

Dim Rows= From r In ds.<..any query..> Select CType(r, DataRow)

In framework: ¿no results and it should? -> ds = AlternativeDataSet, and requery...
The logic of the query resides outside the framework, and could return any rows of the typed dataset.

The use of .Find can be a particular case, although most times query expression would be some kind of .Select, and there LINQ behaves ok. I tried with FindByKey because I thought it could be a reference test, simple but representative. But as I have seen it is a poor case to compare LINQ with.

Anyway is not a madure idea, Boonie. Don't worry, your comments have been very useful to me. Thanks.

(Sorry for my very bad english..)

Regards, Daniel
ReplyDelete
Replies
OsOFebruary 19, 2014 9:09 AM
Hi Bonnie,
sorry to bring this up after so much time.
I am doing some tests to compare the time it takes to find data in a DataSet using a LINQ query and the traditional Select() method. And I see the opposite behavior, the LINQ query is performing way worst than the Select method.
I tried to create the DataSet.xsd in the way you suggested (http://geek-goddess-bonnie.blogspot.com.ar/2010/04/create-xsd.html) to avoid code that is not necessary, but still got the same result.
Perhaps is the way I am doing the searches:
I populate the DataSet with 270000 rows, then I create a List with 100 random numbers. I will perform a search with LINQ and the Select method with each of those numbers. The numbers can be found or not in the DataSet, I did this on purpose to make sure I get the worst scenario of a full scan. I repeat this 10 times.
In my tests, LINQ performs almost 1000 times slower than the Select(). This is weird since I saw people is saying the opposite.
If you'd like I am happy to share the code. Any comment is welcome. Thanks
ReplyDelete
Replies
AnonymousFebruary 23, 2014 6:12 AM
http://www.sqlaspnet.com/interview/207/What%20is%20the%20benefit%20of%20using%20LINQ%20on%20Dataset
ReplyDelete
Replies
AnonymousNovember 13, 2019 1:13 AM
Hi Bonnie,

How can implement this examples LINQ with combination BBDataAccess (part III - https://geek-goddess-bonnie.blogspot.com/2009/10/dataaccess-part-iii.html) (crud example) and is it possible?
And with Passing Data Between Forms Redux (https://geek-goddess-bonnie.blogspot.com/2012/12/passing-data-between-forms-redux_31.html)

Thanks
ReplyDelete
Replies
AnonymousNovember 14, 2019 9:03 AM
Hi Bonnie,

Bonnie wrote:
..."Personally, I still prefer Typed DataSets. Call me 'old school' "

Yes, I totally understand you and I absolutely agree with you.

In fact, I wanted confirmation that it was using DataSets still a good way, right?

Thanks for answer
ReplyDelete
Replies

Add comment

Geek Goddess

Wednesday, April 06, 2011

LINQ With DataSets

15 comments:

Blog Archive

About Me

Any Spare Change?

Number of Visitors

Links