Friday, January 31, 2020

Merge Tables And Sum Columns

A question on the MSDN Forums prompted me to write the topic of today's blog post. The question, in a nutshell, involved how to merge data into one DataTable from two DataTables originating from two different sources and provide a sum of a common column in that new DataTable.

I will illustrate the concept with an example from my fictitious ride-sharing system that I used in two previous blog posts:

https://geek-goddess-bonnie.blogspot.com/2019/09/parsing-data-with-metadata.html
https://geek-goddess-bonnie.blogspot.com/2019/12/create-dictionary-using-linq.html

The RideShare application maintains a database containing all the cars for each of the Companies that subscribe to their Service. One of the Services that this RideShare application provides is to keep track of the mileage that each car has driven every day in its database.  Each Car Company will send information to RideShare from every car that the company operates. This is a very simplistic version of this hypothetical system, because in reality we'd keep a lot more information (such as a TripID, probably a begin/end time for each Trip, etc.).  But since I'm only using this for an example of how to "Merge and Sum", I'm not going to bother with all that. I'll only be interested in the TripMileage and the TripDate (since we'll be summing all Mileages for each Date for each Car).

Let's say that RideShare is a startup company and currently only has two Customers, GeeksRide and UberGeeks. The XML for the DataSet that RideShare has retrieved from its database for today's trips, looks like this:

<RideShareOwnerDataSet>
<CompanyInfo>
<CarCompanyID>1</CarCompanyID>
<CarCompany>GeeksRide</CarCompany>
</CompanyInfo>
<CompanyInfo>
<CarCompanyID>2</CarCompanyID>
<CarCompany>UberGeek</CarCompany>
</CompanyInfo>
<CarInfo>
<CarNumberID>1</CarNumberID>
<CarCompanyID>1</CarCompanyID>
<CarNumber>C001</CarNumber >
<TripMileage>111.1</TripMileage >
<TripDate>2019-01-31</TripDate >
</CarInfo>
<CarInfo>
<CarNumberID>1</CarNumberID>
<CarCompanyID>2</CarCompanyID>
<CarNumber>C201</CarNumber >
<TripMileage>121.1</TripMileage >
<TripDate>2019-01-31</TripDate >
</CarInfo>
<CarInfo>
<CarNumberID>2</CarNumberID>
<CarCompanyID>2</CarCompanyID>
<CarNumber>C202</CarNumber >
<TripMileage>222.2</TripMileage >
<TripDate>2019-01-31</TripDate >
</CarInfo>
<CarInfo>
<CarNumberID>3</CarNumberID>
<CarCompanyID>2</CarCompanyID>
<CarNumber>C203</CarNumber >
<TripMileage>333.3</TripMileage >
<TripDate>2019-01-31</TripDate >
</CarInfo>
</RideShareOwnerDataSet>

During the course of the day, RideShare receives data from it's two customers (perhaps after every trip, or perhaps at intervals during the day ... that's totally up to the Customers as to when they send data to RideShare). So, let's say that RideShare receives data from UberGeek. It looks like this:

<RideShareOwnerDataSet>
<CompanyInfo>
<CarCompanyID>2</CarCompanyID>
<CarCompany>UberGeek</CarCompany>
</CompanyInfo >
<CarInfo>
<CarNumberID>2</CarNumberID>
<CarCompanyID>2</CarCompanyID>
<CarNumber>C202</CarNumber >
<TripMileage>22.2</TripMileage >
<TripDate>2019-01-31</TripDate >
</CarInfo>
<CarInfo>
<CarNumberID>4</CarNumberID>
<CarCompanyID>2</CarCompanyID>
<CarNumber>C204</CarNumber >
<TripMileage>42.2</TripMileage >
<TripDate>2019-01-31</TripDate >
</CarInfo>
</RideShareOwnerDataSet>

UberGeek is reporting the mileage from two different cars, one (ID 2) is already in the database and the other (ID 4) is a brand new car. We need to merge this new information with the data from the database, and add up all mileage, so that each car has only one row that contains the total mileage for this day so far.

I decided to try my hand at doing this with LINQ.

I chose to simply create a new DataSet/DataTable to hold the revised data, rather than updating the existing DataSet (since a new DataTable was the original problem posted to the Forum).

I am also using a Typed DataSet as it is much better in terms of usability for just about anything you want to do with DataSets. Since I'm providing the XML above, it can easily be used to create a Typed DataSet. See the second link to my blog above (Create Dictionary Using LINQ), and scroll down the section "XML Data and a Typed DataSet) for info about how to do this if you don't already know.

So, let's start looking at code. I took the above XML and saved it to two different XML files and then read them into two DataSets.

RideShareOwnerDataSet dsRideShare = new RideShareOwnerDataSet();
RideShareOwnerDataSet dsCustomerRide = new RideShareOwnerDataSet();

dsRideShare.ReadXml("RideShareFromDatabase.xml");
dsCustomerRide.ReadXml("RideShareFromCustomer.xml");

var dsResult = this.MergeAndSumData(dsRideShare, dsCustomerRide);

You can then use the dsResult DataSet to update your database, however that is beyond the scope of this post.

Here is the MergeAndSumData() method:

private RideShareOwnerDataSet MergeAndSumData(RideShareOwnerDataSet dsDatabase, RideShareOwnerDataSet dsCustomer)
{
RideShareOwnerDataSet dsResult = new RideShareOwnerDataSet();
RideShareOwnerDataSet.CarInfoDataTable dtResult = dsResult.CarInfo;

// First, we Merge the CarInfo DataTables from both DataSets into one DataTable.
var dtMerged = dsDatabase.CarInfo.Copy() as RideShareOwnerDataSet.CarInfoDataTable;
dtMerged.Merge(dsCustomer.CarInfo);

var test =
dtMerged.Where(row => !row.IsTripDateNull())
.GroupBy(row => new { row.CarCompanyID, row.CarNumberID, row.CarNumber, row.TripDate }, (key, group) => new
{
key.CarNumberID,
key.CarCompanyID,
key.CarNumber,
key.TripDate,
// And Sum the Mileages for each car here:
TripMileage = group.Sum(s => s.TripMileage)
})
.Select(row => dsResult.CarInfo.LoadDataRow(new object[]
{
row.CarNumberID,
row.CarCompanyID,
row.CarNumber,
row.TripDate,
row.TripMileage
}, LoadOption.Upsert))
.ToList();

// Since we are returning the entire DataSet, we might as well put the original CompanyInfo data in its DataTable
// in case the calling method might need to have both DataTables contain data.
dsResult.CompanyInfo.Merge(dsDatabase.CompanyInfo);

return dsResult;
}

A couple of things to note about the above method:

  • We're using a .ToList() at the end of the LINQ query, but we really don't care about the List (which ends up in the variable called test) ... it's used simply because the act of running .ToList() then executes the query. Until then, LINQ hasn't done anything and dsResult.CarInfo will still be empty.
  • The order that the column variables are used in the LoadDataRow() is important. It corresponds to the order that columns are defined in the DataTable.

That's all there is to it. And while the LINQ looks fairly straightforward now that I've written it, I had not used LoadDataRow much in the past, and never had I used it in a LINQ query! So, this post took a lot of experimenting and Googling. Which is why it took me so long to finish writing it!! 

And now you know the rest of the story. 

Happy Coding!!  =0)

No comments:

Post a Comment