Tuesday, July 28, 2009

Fluent Interface Pattern for Composing Entity Framework Queries

I’ve been doing a fair amount of work with Entity Framework recently.  There are some things about EF that make me want to throw it out the window, but this post is about something that I really like: the ability to eliminate redundant code from my BLL and DAL and create a fluent interface for composing queries.

The problem we want to solve

So here’s a typical scenario.  I have a blog aggregator application that I’m building.  I use Entity Framework to create a BlogPost entity and its data mappings.  Great, now I’m ready to create a BlogRepository class that will contain all of my queries for getting blog posts.  So I write the first data access method and it looks something like this.

public List<BlogPost> BlogPostSetByWeek_GetPage(int pageIndex, int pageSize, DateTime startDate)
{
    // Normalize to the start of the week, then build a one-week window.
    startDate = ToSunday(startDate);
    DateTime startUtc = startDate.Date;
    DateTime endUtc = startDate.AddDays(7).Date;
    int skipCount = pageSize * pageIndex;

    // Half-open range [startUtc, endUtc) so a post stamped exactly at the
    // start of the week is not dropped.
    var query = from p in Context.BlogPostSet.Include("Categories")
                where p.PostedUtc >= startUtc && p.PostedUtc < endUtc
                orderby p.PostedUtc descending
                select p;

    List<BlogPost> postSet = query.Skip(skipCount).Take(pageSize).ToList<BlogPost>();
    return postSet;
}

The above method takes a startDate and some paging parameters and then returns the specified page of results in the shape of a generic list of BlogPost entities.  How easy was that!! 
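The ToSunday helper used above isn't shown in this post. Here is a minimal sketch, assuming it simply rolls a date back to the most recent Sunday and drops the time of day. The class name DateHelpers is made up for the example, and the real helper may treat week boundaries or time zones differently.

```csharp
using System;

public static class DateHelpers
{
    // Assumed behavior: return midnight on the most recent Sunday
    // (the same day if the date already falls on a Sunday).
    public static DateTime ToSunday(DateTime date)
    {
        // DayOfWeek.Sunday is 0, so the enum value is the number of days since Sunday.
        return date.Date.AddDays(-(int)date.DayOfWeek);
    }

    public static void Main()
    {
        // July 28, 2009 was a Tuesday, so the result is Sunday, July 26.
        Console.WriteLine(ToSunday(new DateTime(2009, 7, 28)).ToString("yyyy-MM-dd")); // 2009-07-26
    }
}
```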

Now for the next step.  I need a query that’s exactly like the one above, but this time I want the entire list of results instead of just a page.  After that I need another query that sorts the BlogPosts by Category instead of by PostedUtc, then another that sorts by BlogName, and on and on.  So how do I handle this?  I could create a completely new EF query for each one.  Or I could use Entity SQL instead of LINQ to Entities and build the query text with a bunch of conditional blocks.  Neither of those solutions really appeals to me.  First, I don’t like the idea of rewriting the same query over and over with minor differences in criteria or sort order; every copy is one more place to fix when the filter logic changes.  Second, I don’t want to use Entity SQL because I like the strong typing that I get with LINQ to Entities, and I would need a lot of conditionals to handle all of the possible query combinations, which sounds like a mess.

The Solution

So I was thinking about how much I hate duplicating the same query code over and over when I realized something.  Microsoft has made the query an object.  I didn’t really appreciate the significance of that before.  The query is no longer just text; it is now an object, an ObjectQuery<> object to be precise.  The cool part is that if I write methods that take an ObjectQuery as a parameter and return an ObjectQuery, I can chain them together and use them to compose queries.

How could this work?  I looked at the queries in my BLL and found that each of them consists of 3 major components:

[Image: the three components of a query: filter, sort, and projection]

Looking at this breakdown, I realized that I could have a Filter Method that creates an ObjectQuery that gets the data I’m looking for.  I could pass that ObjectQuery to a Sort Method that applies a sort and returns the modified ObjectQuery, and then pass that to a Projection Method that applies paging, shapes the data, and executes the ObjectQuery.

So, when all this is said and done I should be able to compose Entity Framework queries by combining a Filter Method, a Sort Method, and a Projection Method.  The end result should be data access code that looks like this:

List<BlogPost> postSet = GetBlogPostSet().SortBy("Date").GetPage(pageIndex, pageSize);

List<BlogPost> postSet = GetBlogPostSet().SortBy("ID").GetPage(pageIndex, pageSize);

List<BlogPost> postSet = GetBlogPostSet().SortBy("ID").GetAll();

Building an Example

So, I coded it up and it works pretty well.  The first step is creating a Filter Method.  This method takes search criteria as parameters and returns an ObjectQuery. Below is my filter method for getting the BlogPost entities for a given week. 

// GetBlogPostSetForWeek
private ObjectQuery<BlogPost> GetBlogPostSetForWeek(DateTime startDate)
{
    startDate = ToSunday(startDate);
    DateTime startUtc = startDate.Date;
    DateTime endUtc = startDate.AddDays(7).Date;

    // Half-open range [startUtc, endUtc); nothing is executed here,
    // the method only builds the query.
    var query = from p in Context.BlogPostSet.Include("Categories")
                where p.PostedUtc >= startUtc && p.PostedUtc < endUtc
                select p;

    return (ObjectQuery<BlogPost>)query;
}

Now I need to create my Sort Method.  This method will take the results of my Filter Method as a parameter, along with an enum that tells the method what sort to apply.  Note that I’m using strongly typed object queries of type ObjectQuery<BlogPost>.  The strong typing serves two purposes.  First, it lets my Sort Method know that I’m dealing with BlogPost entities, which tells me what fields are available to sort by.  Second, the strong typing provides a distinct method signature, so I can have multiple methods called SortBy that each handle ObjectQueries returning a different type of entity.  I can have a SortBy(ObjectQuery<BlogPost>), a SortBy(ObjectQuery<Person>), etc.

One other thing.  I want to chain these methods together, fluent interface style.  For that reason I’m implementing both SortBy and my GetPage as extension methods. Here’s the code for the SortBy method.

// SortBy
internal static ObjectQuery<BlogPost> SortBy(this ObjectQuery<BlogPost> query, Enums.BlogPostSortOption sortOption)
{
    switch (sortOption)
    {
        case Enums.BlogPostSortOption.ByDate:
            return (ObjectQuery<BlogPost>)query.OrderByDescending(p => p.PostedUtc);
        case Enums.BlogPostSortOption.BySite:
            return (ObjectQuery<BlogPost>)query.OrderBy(p => p.BlogProfile.BlogName);
        case Enums.BlogPostSortOption.ByVote:
            // No vote sort is applied yet, so the query comes back unsorted.
            // LINQ to Entities needs an ordered query before Skip, so a
            // paged projection will throw on this path until a sort is added.
            return query;
        default:
            return (ObjectQuery<BlogPost>)query.OrderByDescending(p => p.PostedUtc);
    }
}

Lastly, we need to create a Projection Method.  Below is the GetPage method.  It takes the ObjectQuery<BlogPost> from the SortBy method, applies paging logic to it, executes the query, then returns the results as a List<BlogPost>.

// GetPage
internal static List<BlogPost> GetPage(this ObjectQuery<BlogPost> query, int pageIndex, int pageSize)
{
    int skipCount = pageSize * pageIndex;
    return query.Skip(skipCount).Take(pageSize).ToList<BlogPost>();
}
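The GetAll projection from the earlier example is never shown in the post. Below is a rough sketch of how it and a paging projection might look when written generically against IQueryable<T>, run here over an in-memory list so it works without an EF model. The class names and the choice to require IOrderedQueryable<T> for paging are my assumptions, not the code from my repository; that parameter type just lets the compiler reject paging over an unsorted query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProjectionExtensions
{
    // Returns one page of an already-sorted query. pageIndex is zero-based,
    // matching the skipCount arithmetic in GetPage above.
    public static List<T> GetPage<T>(this IOrderedQueryable<T> query, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        return query.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }

    // Executes the query and returns every row, with no paging.
    public static List<T> GetAll<T>(this IQueryable<T> query)
    {
        return query.ToList();
    }
}

public static class ProjectionDemo
{
    public static void Main()
    {
        // In-memory stand-in for an entity set, used only to exercise the methods.
        IQueryable<int> numbers = Enumerable.Range(1, 10).AsQueryable();

        List<int> page = numbers.OrderBy(n => n).GetPage(1, 3); // second page of three
        List<int> all = numbers.GetAll();

        Console.WriteLine(string.Join(",", page)); // 4,5,6
        Console.WriteLine(all.Count);              // 10
    }
}
```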

So that’s it.  I now have all the pieces needed to create my data access methods without duplicating query logic over and over.  If I want a page of the week’s blog posts ordered by date, I can use the code:

  Enums.BlogPostSortOption sort = Enums.BlogPostSortOption.ByDate;

  return GetBlogPostSetForWeek(startDate).SortBy(sort).GetPage(pageIndex, pageSize);

To sort those same results by BlogName I can use the code:

  Enums.BlogPostSortOption sort = Enums.BlogPostSortOption.BySite;

  return GetBlogPostSetForWeek(startDate).SortBy(sort).GetPage(pageIndex, pageSize);

If I want to get BlogPosts by category instead of by week, I just write a new filter method named GetBlogPostSetForCategory and it plugs right in:

  return GetBlogPostSetForCategory(category).SortBy(sort).GetPage(pageIndex, pageSize);
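GetBlogPostSetForCategory isn't shown here, so the following is only a sketch of how a second filter plugs into the same chain. It runs against an in-memory list rather than an EF context. The Post class, its Category string property, the PostSortOption enum, and the method names are all hypothetical stand-ins for the real entity model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the BlogPost entity; only the fields the sketch needs.
public class Post
{
    public string Title { get; set; }
    public string Category { get; set; }
    public DateTime PostedUtc { get; set; }
}

public enum PostSortOption { ByDate, ByTitle }

public static class PostQueries
{
    // Filter method: narrows the source but does not execute anything yet.
    public static IQueryable<Post> ForCategory(this IQueryable<Post> posts, string category)
    {
        return posts.Where(p => p.Category == category);
    }

    // Sort method: same shape as SortBy above, written against IQueryable.
    public static IOrderedQueryable<Post> SortBy(this IQueryable<Post> posts, PostSortOption option)
    {
        switch (option)
        {
            case PostSortOption.ByTitle:
                return posts.OrderBy(p => p.Title);
            default:
                return posts.OrderByDescending(p => p.PostedUtc);
        }
    }
}

public static class PostQueriesDemo
{
    public static void Main()
    {
        IQueryable<Post> source = new List<Post>
        {
            new Post { Title = "B", Category = "EF",  PostedUtc = new DateTime(2009, 7, 1) },
            new Post { Title = "A", Category = "EF",  PostedUtc = new DateTime(2009, 7, 2) },
            new Post { Title = "C", Category = "MVC", PostedUtc = new DateTime(2009, 7, 3) },
        }.AsQueryable();

        // Filter, sort, then project; ToList is the step that actually runs the query.
        List<Post> byTitle = source.ForCategory("EF").SortBy(PostSortOption.ByTitle).ToList();
        foreach (Post p in byTitle) Console.WriteLine(p.Title);
    }
}
```

Swapping the filter or the sort option changes one link in the chain and leaves the others alone, which is the whole point of the pattern.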

Conclusion

So that's it.  This technique has significantly reduced the amount of data access code in my Repository classes and the time that it takes to write it.  I also like the fact that I’m not writing the same paging and sorting code over and over in different queries.  If you see any advantages or disadvantages to the technique, please leave a comment and let me know what you think.  Also, if you’re aware of anyone else using a similar method, please send me a link at rlacovara@gmail.com.  I would like to check it out.

Saturday, July 11, 2009

Chaining the C# Null Coalescing Operator

This is something that came up in the comments last week.  I was refactoring some code and wound up with the accessor below, which performs multiple null checks while trying to assign a value to _currentUser.

// CurrentUser
private WebUser _currentUser;
public WebUser CurrentUser
{
   get
   {
     if (_currentUser == null) _currentUser = GetWebUserFromSession();
     if (_currentUser == null) _currentUser = GetWebUserFromTrackingCookie();
     if (_currentUser == null) _currentUser = CreateNewWebUser();
     return _currentUser;
   }
}

Now this code is pretty clear, but a reader named Brian pointed out in a comment that we could shorten the code a bit by using the Null Coalescing operator “??”.  If you haven’t used the Null Coalescing operator yet, check it out.  It’s great for data access code where a method may return data or a null value.  The basic syntax is illustrated in this block of code from MSDN (x is a nullable int here):

   // y = x, unless x is null, in which case y = -1.
   int y = x ?? -1;

   // Assign i to return value of method, unless
   // return value is null, in which case assign
   // default value of int to i.
   int i = GetNullableInt() ?? default(int);

So, this is really cool, but there’s more. What Brian pointed out was that you can chain the ?? operator to do multiple null checks.  So, my block of code above can be rewritten like this:

// CurrentUser
private WebUser _currentUser;
public WebUser CurrentUser
{
   get
   {
     _currentUser = _currentUser ??
                    GetWebUserFromSession() ??
                    GetWebUserFromTrackingCookie() ??
                    CreateNewWebUser();
     return _currentUser;
   }
}

Note that normally the _currentUser assignment would fit all on one line but due to the width limitation of my blog I broke it up into multiple lines.  So, C# Null Coalescing operator chaining. I like it. I’ll be adding it to my toolbox.
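One thing worth checking before trusting the chained version is that it keeps the lazy behavior of the if blocks: a later lookup should only run when everything before it came back null. Here is a small sketch that counts the calls. The three lookup methods are hypothetical stand-ins that return strings instead of WebUser instances.

```csharp
using System;

public static class CoalesceDemo
{
    public static int Calls; // counts how many fallback lookups actually ran

    // Hypothetical lookups standing in for the session, cookie, and factory methods.
    static string FromSession() { Calls++; return null; }
    static string FromCookie()  { Calls++; return "cookie-user"; }
    static string CreateNew()   { Calls++; return "new-user"; }

    public static string Resolve(string cached)
    {
        // Operands are evaluated left to right, and evaluation stops
        // at the first one that is not null.
        return cached ?? FromSession() ?? FromCookie() ?? CreateNew();
    }

    public static void Main()
    {
        Console.WriteLine(Resolve(null));     // cookie-user
        Console.WriteLine(Calls);             // 2 (CreateNew never ran)
        Console.WriteLine(Resolve("cached")); // cached
        Console.WriteLine(Calls);             // still 2 (no lookups ran)
    }
}
```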

 

Addendum:

Hey, you guys pretty much hated this change to the code. I got no blog comments but I received quite a few emails about this technique and the prevailing opinion seems to be that the original syntax with the multiple if blocks was clearer.  In fact, not a single person who emailed me liked the ?? method. So I took another look and I think I agree.  The original code does seem a little clearer to me and I don’t really think I was able to make my code any shorter or simpler by using ??.  So this may not have been the best example, but I do still think this is a very cool technique for the right situation. As always, I just need to make sure that I’m refactoring because the changes make the code cleaner, not just because they’re clever.

Friday, July 3, 2009

SOLID C# Code: Smaller Methods == Clean Code?

I’m a big fan of Robert Martin.  His book Agile Principles, Patterns, and Practices in C# is still one of the three most important programming books I’ve ever read.  If you listen to Uncle Bob for any amount of time, it won’t be long before you start hearing terms like SOLID Principles and Clean Code.  These are concepts that are widely known, but still a bit tough to define.  What exactly is clean code?  What does SOLID code look like?

Well, I’m not smart enough to provide a definition that’s any better than what’s already out there, but I can point at an example and say “that smells pretty SOLID”.  So, below is an example of some code that I wrote for an ASP.NET MVC application.  CurrentUser is a property that wraps an instance of my WebUser class, which represents, you guessed it, the currently logged in user for my web app.  When the CurrentUser property returns a user, there are four possible places that it might have to check to find that user:

  1. There may already be a WebUser instance in the private member _currentUser,
  2. There may be a WebUser instance stored in the Session,
  3. There may be a TrackingCookie in the HTTP Request that can be used to get an existing WebUser,
  4. We may have none of the above, in which case we have to create a new WebUser.

So, with that said, let’s take a look at my first version of this code.

// CurrentUser
private WebUser _currentUser;
public WebUser CurrentUser
{
    get
    {
        // Do we already have the CurrentUser?
        if (_currentUser == null)
        {
            // Try to get the user from Session
            Object obj  = HttpContext.Current.Session["__currentUser"];
            if (obj != null)
            {
                _currentUser = (WebUser)obj;
                return _currentUser;
            }

            // Try to get the user from a TrackingCookie
            SecurityHelper secHelper = new SecurityHelper();
            WebUserRepository rep = new WebUserRepository();
            if (secHelper.TrackingGuid != Guid.Empty)
            {
                _currentUser = rep.GetWebUserByTrackingGuid(secHelper.TrackingGuid);
                if (_currentUser != null) return _currentUser;
            }

            // If we still don't have a user then we need to create a new
            // WebUser with a new TrackingGuid.
            WebUserFactory factory = new WebUserFactory();
            _currentUser = factory.CreateWebUser();
        }
        return _currentUser;
    }
}

 

 

Hmmmmmm. Not too horrible, not too long and ungainly, but it definitely has some code smell (the bad kind).  First, I’m not thrilled about the multiple returns nested in conditional blocks. Also, the code is doing a number of different things that I felt needed comments to explain.  Now I love comments, and I’m a firm believer that when in doubt, comment.  But I’ve also come to realize that it is possible to design code in such a way that it can be just as clear, and just as understandable, without the need for comments. 

So how do we reach this promised land of understandability? The main technique that I’ve been using is simply to take my big blocks of code that do several different things, and break them up into several separate, clearly named methods, and then just call them from my main block. Here’s how I applied this technique to my CurrentUser property.

 

 

// CurrentUser
private WebUser _currentUser;
public WebUser CurrentUser
{
   get
   {
     if (_currentUser == null) _currentUser = GetWebUserFromSession();
     if (_currentUser == null) _currentUser = GetWebUserFromTrackingCookie();
     if (_currentUser == null) _currentUser = CreateNewWebUser();
     return _currentUser;
   }
}

 

 

// GetWebUserFromSession
private WebUser GetWebUserFromSession()
{
     Object obj = HttpContext.Current.Session["__currentUser"];
     return obj == null ? null : (WebUser)obj;
}

// GetWebUserFromTrackingCookie
private WebUser GetWebUserFromTrackingCookie()
{
     SecurityHelper secHelper = new SecurityHelper();
     WebUserRepository rep = new WebUserRepository();
     if (secHelper.TrackingGuid == Guid.Empty)
         return null;
     else
         return rep.GetWebUserByTrackingGuid(secHelper.TrackingGuid);
}

// CreateNewWebUser
private WebUser CreateNewWebUser()
{
     WebUserFactory factory = new WebUserFactory();
     return factory.CreateWebUser();
}

 

Now I realize that in this second version I wrote more code than the first, and I do a couple of extra null checks in my main property code, but look at how easily understandable everything is.  I mean really. There’s not a single comment, but can you imagine anybody not understanding exactly what’s going on in this block?

   get
   {
     if (_currentUser == null) _currentUser = GetWebUserFromSession();
     if (_currentUser == null) _currentUser = GetWebUserFromTrackingCookie();
     if (_currentUser == null) _currentUser = CreateNewWebUser();
     return _currentUser;
   }

Plus, because I factored out methods like GetWebUserFromSession() and GetWebUserFromTrackingCookie() I can now use those methods in other parts of my class without having to rewrite the functionality.  So overall, I think this version smells much more SOLID.  What do you think?  If anyone has ideas or favorite techniques for how to get more SOLID, please leave a comment.