Wednesday, November 23, 2011

NAS Gone Bad

Today’s post is not about my usual topic (.NET programming), but about the solution I used when my NAS (Network Attached Storage) went bad. It was a Western Digital My Book World Edition with a 1 TB Caviar Green SATA WD10EADS 3.5 inch drive (the one with the white light). The thing is, I was pretty sure my drive (and therefore my data!) was just fine. It’s just that the enclosure and all the electronics that make it “network attachable” were the parts that went south. Or so I hoped.

I took it apart (did I mention that I *hate* doing hardware stuff!!) and got the drive out. I thought I had bookmarked a link to a video showing how to take it apart, but I guess I didn’t, and I can’t find the same video at the moment. Google for it though … you’ll find some “how-to” for taking the drive out of the enclosure. If you’re a whiz at hardware, you probably don’t even need a how-to video. It’s not rocket science … even I could do it! ;0)

Then I hooked it up to a SATA-to-USB converter and plugged the USB into my laptop. At this point, Windows Explorer would not assign a drive letter, but I could see the drive with the Disk Management utility. It was listed as having a Healthy but Unknown partition.

After a little more Googling, I discovered that the drives that get put into these NAS devices are just about always formatted with some kind of Linux file system … not readable by Windows!! Grrrr …. now I have a lot more Googling to do!

I found lots of discussions on the topic … apparently some of these NAS devices (not just Western Digital) give up the ghost on a regular basis! You can buy NAS devices that are disk-less … in other words, they’re just the enclosure with all the electronics for making it network attachable and you supply your own drive (I assume an unformatted drive). Which means that if Western Digital sold the same enclosure disk-less for this particular model, then I could just buy the enclosure, pop in my drive and be back in business!!!   Unfortunately, I couldn’t find one and I suspect that they don’t exist for my model. Grrrr …. back to the drawing board!

I found a ton of links talking about how to create a Linux boot disk and boot your machine to Linux and enter in a whole bunch of gibberish Linux commands and it may (or may not) do the trick. There were like 20 steps in some of these how-to discussions! And I know absolutely nothing about Linux. I really did not want to mess with this! So, I kept looking …

Then I found a nifty little driver that allows a Windows machine to read a Linux-formatted drive! Because it’s a driver, it works seamlessly with Windows. This was just what I was looking for! And it’s free! I downloaded it (Ext2 Driver for Windows). It works for both Ext2 and Ext3 formatted drives, and it installed just fine on my computer. Unfortunately, it didn’t do the trick … why? Because my drive wasn’t formatted with Ext2 or 3!!  More Googling showed me that it was probably an XFS formatted drive. Arrggghhh!!! Will it never end?!?!?!

One more round of Googling and I finally found a solution! It wasn’t free, but it was only about $30. I could live with that if it worked (UFS Explorer). It’s a utility that can read all kinds of partition formats, and there was a free trial download (limited to copying files smaller than 64KB) … so I downloaded it to take it for a test drive. If it worked, I would buy it. You can only read/copy from the drive … you can’t write to it. But I intended to copy all the files off of the drive, then reformat it to NTFS and install it directly in my machine (yuck … more hardware stuff). Luckily, I already had a 2 TB USB drive with plenty of room on it. The trial copied the small files flawlessly, so I plunked down the $30 and bought it. The full version also worked flawlessly, copying every single one of my files off of the XFS partition!  Wow! Disaster averted!

Now, all that’s left to do is reformat the 1 TB and stick it in my machine. But, since that’s hardware stuff, I’ll procrastinate on that task just a little bit longer.  =0)

Saturday, October 08, 2011

XmlSerializer and Implemented Interfaces

Here’s the scenario:

  1. Get some data from calls to a third-party API.
  2. Serialize that data to XML.
  3. Use XSLT to Transform that XML into a common set of classes.

The API utilizes Interfaces for all the objects it returns, and it would have been nice if I could have simply used either XmlSerializer or DataContractSerializer on the objects I received, but of course nothing is simple. Suffice it to say that it doesn’t work for either serializer … XmlSerializer flat-out refuses to deal with interface types, and DataContractSerializer has its own issues with them.
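To give you a flavor of the problem, here’s a minimal sketch (IVehicle is a made-up stand-in for one of the third-party Interfaces) … the XmlSerializer won’t even finish constructing when you hand it an interface type:

using System.Xml.Serialization;

public interface IVehicle
{
    int VehicleID { get; set; }
    string Make { get; set; }
}

// Throws a NotSupportedException (something along the lines of
// "Cannot serialize interface IVehicle") before you ever get
// to call Serialize():
XmlSerializer serializer = new XmlSerializer(typeof(IVehicle));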

So, I’ve spent the last day or two working on implementing my own classes from the various third-party Interfaces for the objects I need. Basically, I’ll be using the API to get some data. Then I’ll have to copy the data into my own classes and then serialize it to XML.

Theoretically, creating classes that implement an Interface should be easy (simply right-click on the Interface name and choose “Implement Interface”). But, nooooo … even though I’m using Visual Studio 2010, it still generates properties the same way it did before the “invention” of automatic properties. In other words, instead of generating a property like this:

public int VehicleID { get; set; }

The property gets generated like this:

public int VehicleID
{
    get
    {
        throw new NotImplementedException();
    }
    set
    {
        throw new NotImplementedException();
    }
}

When you’re talking about implementing a ton of properties, you don’t want to have to fix each of these manually. I knew there had to be a way to generate these Interface properties differently, but I had forgotten all about code snippets. I had the old “slap on the forehead” moment when I read this blog post, written by Daniel Cazzulino, about how to do this:

http://blogs.clariusconsulting.net/kzu/how-to-replace-default-interface-property-implementation-expansion-with-automatic-properties/

Duh! Why hadn’t I remembered how to do that myself? Gettin’ old I guess. =0(

But wait, there’s more. Some of these Interface properties did NOT have setters, and so they get generated with only the get. While this doesn’t cause a problem with DataContractSerializer, the XmlSerializer will not work unless you have both getters and setters. Daniel mentions, in an update at the end of his blog post, how to take care of this by producing private setters for the generated property:

public int IncidentID { get; private set; }

Unfortunately, because I’m copying data into my own classes, my class needs to have a public setter. In my case, that doesn’t really matter as far as the functionality of these classes goes … I’m only using them so I can serialize the data to XML, so I don’t care that the “set” is exposed publicly.
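Putting it all together, here’s a minimal sketch of the pattern (IIncident and the GetIncident call are made-up stand-ins for the real third-party stuff):

using System.Xml.Serialization;

public interface IIncident
{
    int IncidentID { get; }           // no setter in the Interface
    string Description { get; set; }
}

public class Incident : IIncident
{
    // public setter, so that XmlSerializer will serialize it AND
    // so that I can copy data into it from the API's object
    public int IncidentID { get; set; }
    public string Description { get; set; }
}

// copy the data from the API's object into my class, then serialize:
IIncident apiIncident = api.GetIncident();   // hypothetical API call
Incident myIncident = new Incident
{
    IncidentID = apiIncident.IncidentID,
    Description = apiIncident.Description
};
// writer is whatever XmlWriter/TextWriter you're writing to
new XmlSerializer(typeof(Incident)).Serialize(writer, myIncident);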

The two serializers produce very different XML. The task I will tackle next week is to see if that matters when I use that XML for my XSLT Transformations. For those interested, I’ll probably post the results of that little experiment.

Friday, September 30, 2011

DISTINCT - LINQ versus DataSet

Wow, it’s been way too long since the last time I wrote a blog post. I’ve been busy, but that’s really no excuse. In my last post in April, I wrote about using LINQ with Typed DataSets. I did some benchmark comparisons with Selects and LINQ was considerably faster. Today I’m going to compare the DISTINCT capabilities of LINQ versus DataSets.

As before, I’m using the same set of data … a DataTable containing 270,000 rows. I’ve filled the DataTable such that there’s a random number of duplicates in the rows, but it seems that the DISTINCT values always fall between 4750 and 4800 rows.

There are some interesting results. When selecting DISTINCT values over all columns in the DataTable, it definitely is best to use the old DataSet methods rather than LINQ. And, also surprising, the old-style 2.0 Typed DataSets (and likewise, plain old vanilla untyped DataSets) are actually faster than the newer 3.5 Typed DataSets (refer to my last post, linked above, which explains the difference between the two).

// ds20 is a 2.0 Typed DataSet (a regular DataTable)
// ds35 is the 3.5 Typed DataSet (a TypedTableBase) 
DataTable dt20 = ds20.Personnel.DefaultView.ToTable(true);
DataTable dt35 = ds35.Personnel.DefaultView.ToTable(true);

// Note that you MUST use the DataRowComparer.
// If you don't, you get all rows returned ... it won't find the duplicates.
DataTable dtLinq20 = ds20.Personnel.AsEnumerable()
    .Select(row => row).Distinct(DataRowComparer.Default)
    .CopyToDataTable();

// note that the only difference in syntax between
// 2.0 and 3.5 is that you don't use .AsEnumerable()
DataTable dtLinq35 = ds35.Personnel
    .Select(row => row).Distinct(DataRowComparer.Default)
    .CopyToDataTable();

Benchmarking results show that the DefaultView.ToTable() is 7 times faster than LINQ for ds20 (and 8.5 times faster for ds35). The DefaultView.ToTable() is only slightly faster for ds20 than for ds35 (not significantly though), and with LINQ, ds20 is about 1.5 times faster than ds35. Most peculiar!

  • ToTable (ds20):    77,937,500 ticks
  • ToTable (ds35):    90,718,750 ticks
  • LINQ (ds20):       554,656,250 ticks
  • LINQ (ds35):       781,218,750 ticks
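(By the way, if you want to play along at home, a bare-bones Stopwatch harness like this sketch will do the trick … the Benchmark/TimeIt helper is just something I made up for illustration, and your tick counts will obviously vary with your hardware:)

using System;
using System.Data;
using System.Diagnostics;

public static class Benchmark
{
    // times one distinct query and reports elapsed Stopwatch ticks
    public static long TimeIt(string label, Func<DataTable> distinctQuery)
    {
        Stopwatch sw = Stopwatch.StartNew();
        DataTable result = distinctQuery();
        sw.Stop();
        Console.WriteLine("{0}: {1} rows in {2:N0} ticks",
            label, result.Rows.Count, sw.ElapsedTicks);
        return sw.ElapsedTicks;
    }
}

// usage:
// Benchmark.TimeIt("ToTable (ds20)",
//     () => ds20.Personnel.DefaultView.ToTable(true));
// Benchmark.TimeIt("LINQ (ds20)",
//     () => ds20.Personnel.AsEnumerable()
//         .Select(row => row).Distinct(DataRowComparer.Default)
//         .CopyToDataTable());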

But that’s where the superiority of the DefaultView.ToTable() ends. Once you start looking for specific columns to be distinct, LINQ is faster. Let’s look at the one-column scenario. First, the DataSet code (since the DataSet code is exactly the same for ds20 as for ds35, I’ll only show one):

DataTable dt20 = ds20.Personnel.DefaultView.ToTable(true, "firstname");

Pretty straightforward … nothing fancy. The benchmark times for both ds20 and ds35 are almost identical.

But now, the LINQ gets trickier. Unfortunately, you’ve got to create the resulting DataTable (including the column) before you can “fill” it from the LINQ query. If there’s a workaround for this, I sure haven’t found it. Here are the two LINQ queries:

// note that you have to create the DataTable first, 
// AND add the column!
DataTable dt = new DataTable();
dt.Columns.Add("firstname");
DataTable dtLinq20 = ds20.Personnel.AsEnumerable()
    .Select(row =>
    {
        DataRow newRow = dt.NewRow();
        newRow["firstname"] = row.Field<string>("firstname");
        return newRow;
    })
    .Distinct(DataRowComparer.Default).CopyToDataTable();

// note that there are 3 differences in syntax for ds20 & ds35:
// 1) as before, you don't use .AsEnumerable()
// 2) you must check for null
// 3) you can use the typed column syntax (row.firstname)
DataTable dt = new DataTable();
dt.Columns.Add("firstname");
DataTable dtLinq35 = ds35.Personnel
    .Select(row =>
    {
        DataRow newRow = dt.NewRow();
        newRow["firstname"] = row.IsfirstnameNull() ? "" : row.firstname;
        return newRow;
    }).Distinct(DataRowComparer.Default).CopyToDataTable();

Now for the benchmark numbers.  LINQ is more than 5 times faster than the DefaultView.ToTable() … (5.25 times faster for ds20 and 5.5 for ds35).

  • ToTable:     15,406,250 ticks
  • LINQ (ds20):    2,937,500 ticks
  • LINQ (ds35):    2,781,250 ticks

As more columns get added, the LINQ advantage diminishes. At some point, the DefaultView.ToTable() methodology becomes preferable. I suspect that point depends both on the number of columns you want to be distinct and on the number of rows you are processing. The syntax for the DefaultView.ToTable() changes slightly when you have more than one column. Here it is:

DataTable dt20 = ds20.Personnel.DefaultView
    .ToTable(true, new string[] { "firstname", "lastname" });

The LINQ syntax doesn’t change, you just keep adding the columns:

DataTable dt = new DataTable();
dt.Columns.Add("firstname");
dt.Columns.Add("lastname");
DataTable dtLinq35 = ds35.Personnel
    .Select(row =>
    {
        DataRow newRow = dt.NewRow();
        newRow["firstname"] = row.IsfirstnameNull() ? "" : row.firstname;
        newRow["lastname"] = row.IslastnameNull() ? "" : row.lastname;
        return newRow;
    }).Distinct(DataRowComparer.Default).CopyToDataTable();

In the two-column case above, LINQ was just slightly under 5 times faster.

So, just for giggles, I thought I’d benchmark half the columns. My DataTable has 32 columns, so I tried a distinct on the first 16 columns. And now, it seemed, we were back to the DefaultView.ToTable() being superior, by about 7 times (turns out I was wrong … keep reading). Now my curiosity got the better of me … I wasted some more time trying to find “the sweet spot” … the number of columns where the performance was about equal. After all, I was a math major in college and I love to play with numbers!

But, as I played more with this, I discovered something very interesting. The time it takes for LINQ depends on the order of the columns!!! In other words, if the first column in your list doesn’t have a lot of distinct values (mine had 7), but the next one in your list does (mine had 4700+), it will take a lot longer. If you swap the column order of those two columns, BAM! Very fast! Instead of the LINQ query being 7 times slower than the DefaultView.ToTable(), it was now 2.25 times faster! My suspicion is that DataRowComparer.Default computes its hash code from the first column only … so when the first column has only a handful of distinct values, every row lands in one of a handful of hash buckets and Distinct() has to fall back on comparing rows field-by-field.

Curiously, this doesn’t seem to matter with the DefaultView.ToTable() … it benchmarked at about the same time with the columns in any order!

  • ToTable:                                42,750,000 ticks
  • LINQ (original order of 16 columns):   305,375,000 ticks
  • LINQ (swapped order of columns):        19,125,000 ticks

I suspect that my very first test, the one with all the columns, suffered from this anomaly … in other words, the LINQ query was so very much slower because that problem column (the one with only 7 distinct values) happened to be the first column in the DataTable, and the syntax of that query didn’t specify columns.

After discovering this quirk, I went back and re-ran the LINQ query that used only 2 columns, but I did it this time with these two particular columns where I noticed this problem. Sure enough, with only a two column distinct query, if the columns were in the wrong order, the query slowed to a crawl … becoming 14.5 times slower than the DefaultView.ToTable()!!!  Wow!
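For illustration, here’s roughly what that two-column swap looks like (“gender” is a hypothetical stand-in for my 7-distinct-value column):

// SLOW: the low-cardinality column comes first
DataTable dtSlow = new DataTable();
dtSlow.Columns.Add("gender");      // only 7 distinct values
dtSlow.Columns.Add("firstname");   // 4700+ distinct values
DataTable distinctSlow = ds35.Personnel
    .Select(row =>
    {
        DataRow newRow = dtSlow.NewRow();
        newRow["gender"] = row.IsgenderNull() ? "" : row.gender;
        newRow["firstname"] = row.IsfirstnameNull() ? "" : row.firstname;
        return newRow;
    }).Distinct(DataRowComparer.Default).CopyToDataTable();

// FAST: swap them, so the high-cardinality column comes first
DataTable dtFast = new DataTable();
dtFast.Columns.Add("firstname");
dtFast.Columns.Add("gender");
DataTable distinctFast = ds35.Personnel
    .Select(row =>
    {
        DataRow newRow = dtFast.NewRow();
        newRow["firstname"] = row.IsfirstnameNull() ? "" : row.firstname;
        newRow["gender"] = row.IsgenderNull() ? "" : row.gender;
        return newRow;
    }).Distinct(DataRowComparer.Default).CopyToDataTable();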

So, what’s the takeaway from this little experiment? Good question … I guess it’s that the data makes the difference. If you’re pretty sure the data in your columns isn’t too lopsided, distinct-wise, then LINQ is clearly faster for probably all scenarios except the one including all columns (I did not test that, it’s only a guess … the most columns I tested was half). However, if you’re not certain of the distinctness of the data in your columns, perhaps it’s best to stick with the tried-and-true DefaultView.ToTable() method.

Wednesday, April 06, 2011

LINQ With DataSets

Today's blog post is going to be about using LINQ with DataSets. As anyone who reads my blog regularly (or various forum posts) might already know, I've been a big fan of Typed DataSets since the very beginning of .NET (all the way back to the pretty buggy 1.0 version). However, LINQ is fairly new to me. LINQ *itself* is not new to me ... it has been around for a few years now and I've certainly been aware of it. But I haven't utilized it much because, until .NET 3.5, there was no real support for Typed DataSets and that's really all I wanted to be able to use it for. Sure, you could still use LINQ with both regular DataSets and Typed DataSets ... as long as you used .AsEnumerable() ... but you couldn't take advantage of the Typed nature of a Typed DataSet, so I didn't see any point in messing around with it.

I've recently begun using it more with Typed DataSets because, surprisingly to me, for some things it seems to be faster. Since .NET 3.5, Typed DataSets have been getting auto-generated slightly differently than they were in the past (and I'm not talking about the TableAdapter generation, which I despise ... see my blog post about avoiding that fiasco: http://geek-goddess-bonnie.blogspot.com/2010/04/create-xsd.html).

Prior to 3.5, the DataTable definition in a generated Typed DataSet looked something like this:

public partial class PersonnelDataTable : global::System.Data.DataTable, global::System.Collections.IEnumerable
{
    // rest of the code here
}


Starting with 3.5, the minor difference is this:

public partial class PersonnelDataTable : global::System.Data.TypedTableBase<PersonnelRow>
{
    // rest of the code here
}


TypedTableBase is derived from DataTable, so really nothing changes. But it allows the DataTable to be used in LINQ queries without specifying .AsEnumerable(), allows us to use the Typed column name properties of the DataRows in our LINQ queries, *and* it's also much faster.

As I already mentioned above, LINQ *can* easily be used with regular DataSets/DataTables (its use is NOT limited to Typed DataSets), but I am not including examples of that in this blog post. As with the 2.0 Typed DataSets, all that is needed is to use .AsEnumerable() with the DataTable, so the code will be almost identical.

So, let's see a few examples.

First, let's look at a typical use for a DataSet: selecting some rows. Using LINQ with a Typed DataSet ends up being quite a bit faster than using the DataTable.Select() method. In all my tests, I used a DataTable containing 270,000 rows. The Select in this test selected 30,000 of those rows. I also did comparisons between the 2.0 DataTable and the 3.5 TypedTableBase, just for the heck of it.

// ds20 is a 2.0 Typed DataSet, which uses a plain old DataTable

DataRow[] dsSelect20 = ds20.Personnel.Select("lastname = 'Berent'");

DataRow[] linqSelect = ds20.Personnel.AsEnumerable()
    .Where(row => row.Field<string>("lastname") == "Berent")
    .Select(row => row).ToArray();


The above are your only two choices when using untyped or 2.0 Typed DataSets. Benchmark timing results show that LINQ is about 6 times faster than the old DataTable.Select() method:

  • Select:  5,625,000 ticks
  • LINQ:      937,500 ticks

If you're using 3.5 Typed DataSets, you have a few more options with LINQ. Generally, your LINQ statement will look like this:

// ds35 is a 3.5 Typed DataSet, which uses the new TypedTableBase class for its DataTable

DataRow[] linqSelectLastName = ds35.Personnel
    .Where(row => row.lastname == "Berent")
    .Select(row => row).ToArray();


Note the differences: you don't need Personnel.AsEnumerable() and you can use the typed row column names, row.lastname. There is one caveat to the above syntax, though. If any row in the Personnel table contains DBNull.Value in the lastname column, the above syntax will throw an exception. You must also check for the null, so the actual statement will need to be this:

DataRow[] linqSelectLastName = ds35.Personnel
    .Where(row => row.IslastnameNull() == false && row.lastname == "Berent")
    .Select(row => row).ToArray();


You could also use the untyped syntax of the column and then you don't need to check for DBNull, but I like to take advantage of the typed nature of a Typed DataSet. That is the purpose of them, after all. =0) Also, using the untyped syntax is slightly slower, but probably not significantly so.

DataRow[] linqSelectUntypedLastName = ds35.Personnel
    .Where(row => row.Field<string>("lastname") == "Berent")
    .Select(row => row).ToArray();


Incidentally, my data *does* contain DBNull.Value in the lastname column, so my benchmark timing tests are valid. The timing for the DataTable.Select() yields roughly the same results either way, but the LINQ is about 1.5 times faster with the TypedTableBase than with a regular DataTable, making LINQ about 9.5 times faster than a regular DataTable.Select()!!  Here are the results using a 3.5 Typed DataSet:

  • Select:                  5,937,500 ticks
  • LINQ:                      625,000 ticks
  • LINQ (untyped syntax):     781,250 ticks

So, I think that I'll wind up this post for now. There are plenty of other uses of LINQ that I should compare with old DataSet/DataTable functionality, but I think I'll save that for another post. This one is long enough!

Until next time ... happy coding!

Sunday, January 30, 2011

Passing Data Between Forms

There are several different approaches one could take to "pass" information, such as a DataSet, from one Form to another. I see this question asked a lot on the forums, so here's a quick summary. You could use one of these approaches, or combine several of them so that the developer has the flexibility to use whichever approach is appropriate. I'm going to show 3 different solutions to the problem:

First, let's assume we have a MainForm, and it instantiates and shows Form1, and needs to pass a DataSet to it.

1) Form1 can have a DataSet parameter in its constructor:

public class Form1 : Form
{
    private DataSet oData;

    public Form1(DataSet ds)
    {
        this.oData = ds;
        InitializeComponent();
    }
}

// called from MainForm like this:
Form1 oForm = new Form1(this.dsCustomer);
oForm.Show();

2) Form1 can expose its DataSet field as a public Property:

public class Form1 : Form
{
    public DataSet oData { get; set; }

    public Form1()
    {
        InitializeComponent();
    }
}

// called from MainForm like this:
Form1 oForm = new Form1();
oForm.oData = this.dsCustomer;
oForm.Show();
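The difference between 1) and 2) is mostly a question of when the data has to be available: the constructor approach guarantees that Form1 always has its DataSet from the moment it exists, while the property approach lets you set (or even swap out) the DataSet at any time … you just have to remember to set it before the form tries to use it.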

3) Another option is to use Interfaces. For example, say that your Form1 allows the user to make some changes to the data in the DataSet and you'd like to have MainForm automatically reflect those changes ... even while you're still working in Form1.

First, let's define the Interface:

public interface IDisplayCustomer
{
    CustomerDataSet dsCustomer { get; set; }
    void DisplayCustomer();
}

MainForm would then be defined like this:

public class MainForm : Form, IDisplayCustomer
{
    public CustomerDataSet dsCustomer { get; set; }

    public void DisplayCustomer()
    {
        // code here to do stuff with this.dsCustomer
    }

    ...

    // then code elsewhere to instantiate and show Form1:
    this.dsCustomer = new CustomerDataSet(); // or other code to fill the dataset
    Form1 oForm = new Form1(this);
    oForm.Show();
}

And Form1 would look like this:

public class Form1 : Form
{
    private CustomerDataSet oData;
    private IDisplayCustomer CallingControl;

    public Form1(IDisplayCustomer callingControl)
    {
        this.CallingControl = callingControl;
        this.oData = callingControl.dsCustomer;
        InitializeComponent();
    }

    public void DoStuff()
    {
        // code to do stuff with this.oData
        ...
        // and then redisplay in calling form
        this.CallingControl.DisplayCustomer();
    }
}
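One nice side-effect of the Interface approach: Form1 isn't coupled to MainForm at all. Any Form (or UserControl, for that matter) that implements IDisplayCustomer can instantiate and show a Form1 and get the same live-update behavior.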

UPDATE:
After getting asked about this a lot, I've decided to add a 4th idea to be considered, expanding on the above 3rd example, which used an Interface. See my new post about that here: Redux