Friday, July 3, 2009

SOLID C# Code: Smaller Methods == Clean Code?

I’m a big fan of Robert Martin.  His book on Agile Patterns in C# is still one of the three most important programming books I’ve ever read.  If you listen to Uncle Bob for any amount of time, it won’t be long before you start hearing terms like SOLID Principles and Clean Code.  These are concepts that are widely known, but still a bit tough to define.  What exactly is clean code?  What does SOLID code look like?

We’ll I’m not smart enough to provide a definition that’s any better than what’s already out there, but I can point at an example and say “that smells pretty SOLID”.  So, below is an example of some code that I wrote for an ASP.Net MVC application.  CurrentUser is a property that wraps an instance of my WebUser class, which represents, you guessed it, the currently logged in user for my web app. When the CurrentUser property returns a user, there are four possible places that it might have to check to find that user:

  1. There may already be a WebUser instance in the private member _currentUser,
  2. There may be a WebUser instance stored in the Session,
  3. There may be a TrackingCookie in the HTTP Request that can be used to get an existing WebUser,
  4. We may have none of the above, in which case we have to create a new WebUser.

So, with that said, let’s take a look at my first version of this code.

// CurrentUser

private WebUser _currentUser;

public WebUser CurrentUser

{

    get

    {

        // Do we already have the CurrentUser?

        if (_currentUser == null)

        {

            // Try to get the user from Session

            Object obj  = HttpContext.Current.Session["__currentUser"];

            if (obj != null)

            {

                _currentUser = (WebUser)obj;

                return _currentUser;

            }

            // Try to get the user from a TrackingCookie

            SecurityHelper secHelper = new SecurityHelper();

            WebUserRepository rep = new WebUserRepository();

            if (secHelper.TrackingGuid != Guid.Empty)

            {                      

                _currentUser = rep.GetWebUserByTrackingGuid(secHelper.TrackingGuid);

                if (_currentUser != null) return _currentUser;

            }

            // If we still don't have a user then we need to create a new

            // WebUser with a new TrackingGuid.

            WebUserFactory factory = new WebUserFactory();

            _currentUser = factory.CreateWebUser();

        }

        return _currentUser;

    }

}

 

 

Hmmmmmm. Not too horrible, not too long and ungainly, but it definitely has some code smell (the bad kind).  First, I’m not thrilled about the multiple returns nested in conditional blocks. Also, the code is doing a number of different things that I felt needed comments to explain.  Now I love comments, and I’m a firm believer that when in doubt, comment.  But I’ve also come to realize that it is possible to design code in such a way that it can be just as clear, and just as understandable, without the need for comments. 

So how do we reach this promised land of understandability? The main technique that I’ve been using is simply to take my big blocks of code that do several different things, and break them up into several separate, clearly named methods, and then just call them from my main block. Here’s how I applied this technique to my CurrentUser property.

 

 

// CurrentUser

 private WebUser _currentUser;

 public WebUser CurrentUser

{

   get

   {

     if (_currentUser == null) _currentUser = GetWebUserFromSession();

     if (_currentUser == null) _currentUser = GetWebUserFromTrackingCookie();

     if (_currentUser == null) _currentUser = CreateNewWebUser();

     return _currentUser;

   }

}

 

 

// GetWebUserFromSession

private WebUser GetWebUserFromSession()

{

     Object obj = HttpContext.Current.Session["__currentUser"];

     return obj == null ? null : (WebUser)obj;

}

 

// GetWebUserFromTrackingCookie

private WebUser GetWebUserFromTrackingCookie()

{

     SecurityHelper secHelper = new SecurityHelper();

     WebUserRepository rep = new WebUserRepository();

     if (secHelper.TrackingGuid == Guid.Empty)

         return null;

     else

         return rep.GetWebUserByTrackingGuid(secHelper.TrackingGuid);

}

 

// CreateNewWebUser

private WebUser CreateNewWebUser()

{

     WebUserFactory factory = new WebUserFactory();

     return factory.CreateWebUser();

}

 

Now I realize that in this second version I wrote more code than the first, and I do a couple of extra null checks in my main property code, but look at how easily understandable everything is.  I mean really. There’s not a single comment, but can you imagine anybody not understanding exactly what’s going on in this block?

   get

   {

     if (_currentUser == null) _currentUser = GetWebUserFromSession();

     if (_currentUser == null)  _currentUser = GetWebUserFromTrackingCookie();

     if (_currentUser == null)  _currentUser = CreateNewWebUser();

     return _currentUser;

   }

Plus, because I factored out methods like GetWebUserFromSession() and GetWebUserFromTrackingCookie() I can now use those methods in other parts of my class without having to rewrite the functionality.  So overall, I think this version smells much more SOLID.  What do you think?  If anyone has ideas or favorite techniques for how to get more SOLID, please leave a comment.

22 comments:

  1. The version without comments is much easier to follow.

    Might be able to clean it a little more architecturaly. You are basically doing this:

    if(repository.Contains(userid)) return repository[userid];
    return repository.CreateUser();

    Either get the user from some type of stored location or create a new one.

    ReplyDelete
  2. Hi Rudy,

    This is an interesting example. I agree the principle that it is much better that code is readable by itself as comments are often misleading (particularly after bugs have been addressed!). For example, in a recent project I was working with a multilingual team and some developers kept writing the comments in French (against the agreed policy).

    However, your refactored solution has two problems as I see it:

    1) You now need to understand more code that is distributed within your class rather than just following an execution thread in one place. This particular code is trivial, but I've seen examples where the code is so decentralised that it actually makes it harder to understand and to find and address bugs.

    2) I think you could improve your central property getter [sic] by inverting the logic. Your current implementation requires an understanding of the output of each called function. This could be changed as follows:

    get

    {
    bool hasUser = (_currentUser != null);
    if (! hasUser) GetWebUserFromSession(out _currentUser, out hasUser);
    if (! hasUser) GetWebUserFromTrackingCookie(out _currentUser, out hasUser);
    if (! hasUser) _currentUser = CreateNewWebUser();

    return _currentUser;

    }

    The advantage is that the changes in state are clearly identified and are also clear in the called function. I think using two "out" parameters is better than one "out" and one "return" in this case.

    Thoughts?

    ReplyDelete
  3. I mildly disagree with adding hasUser as described. More moving parts = more bugs. In general, use of out parameters should be sparse in favor of letting inputs be the parameters and outputs be the return values.

    Note in C# 3.0, there is a more compact way of saying the same thing:

    _currentUser = _currentUser ?? GetWebUserFromSession() ?? GetWebUserFromTrackingCookie() ?? CreateNewWebUser();

    ReplyDelete
  4. I think it's a hard to try to make all encompassing rules for this sort of thing. But I think you have a good example.

    I don't like the hasUser solution (no offence Steve) and I was about to propose

    if (_currentUser == null) {
    if (GetWebUserFromSession(out _currentUser) == false) {
    if (GetWebUserFromTrackingCookie(out _currentUser) == false) {
    CreateNewWebUser();
    }
    }
    }

    Where each function returns a bool rather than use a 2nd out param.

    But I really like Brian's solution. Especially if you spread it across multiple lines:

    _currentUser = _currentUser
    ?? GetWebUserFromSession()
    ?? GetWebUserFromTrackingCookie()
    ?? CreateNewWebUser();

    ReplyDelete
  5. There is two or three things that I dont like abt the code.
    1. When I call a get accessor I will not except any side effect. And in this case the _current user is getting set by the get accessor. Instead of using a get Accessor a function with meaning full name would have been better. If we still have to make it a get accessor then get should be only two or three lines of code with a function call that Loads the _currentyuser.

    2. The way it is coded there will be a lot of reading and writing into session and cookie and similar stuff. SO I will encapsulate all that access into a seperate class say StateInfo and create methods and accesors to it.

    Last onething I like abt you code is that there is not a public get and set for each private variable which I usally see in lot of programs.

    I would have rewritten your code as follows

    private WebUser _currentUser;
    public WebUser CurrentUser
    {
    get
    {
    if (_currentUser == null)
    LoadCurrentUser(); //_currentUser Variable is set in this method
    return _currentUser;
    }
    }

    private void LoadCurrentUser()
    {
    Object objUser = Session["__currentUser"];
    if (objUser != null)
    {
    _currentUser = (WebUser)objUser;
    }
    else
    {
    Guid TrackingGuid = new SecurityHelper().TrackingGuid;
    if (TrackingGuid != Guid.Empty)
    _currentUser = new WebUserRepository().GetWebUserByTrackingGuid(TrackingGuid);
    }

    if (_currentUser == null) // Not in Cookie nor in Session
    _currentUser = new WebUserFactory().CreateWebUser();
    }

    ReplyDelete
  6. I also think that Brian's "??" solution is the most readable and elegant.

    "??" is a bit obscure - I had to look it up. My "Programming C# 3.0" book had no mention, and Google's top results came up blank too.

    But it's in the VS2008 help, related to Nullable Types.

    Good tip - thanks!

    -Tom Bushell

    ReplyDelete
  7. Good comments. I especially like Russ's and Brian's approaches. Brian I didn't even know that you could chain the null coalescing operator like that. I'm definitely adding that to my toolbox.

    BTW I didn't mention this in the post because I thought it might be too much info, but this CurrentUser property is part of my StateBox class which is probably the same approach that sib suggested. I have a PageBase class that contains an RAStateBox property and the StateBox encapsulates all info related to application state, logged in user, etc.

    ReplyDelete
  8. I have no complaints with the ?? operator. Indeed, it is an elegant solution. However, the code is less readable. As has been pointed out, several developers aren't aware of what it does and what side effects it may have. That doesn't mean that they shouldn't know it of course.

    The compiler turns them all into the same code anyway.

    The reason I introduced the temporary variable was for readability. Giving the variable a name and using separate statements had the following implications:

    1) There were known changes by each statement, both on the out parameters and no side effects
    2) It was clear what the ouput of the method represented, rather than an imprecise interpretation

    As always, this is a question of what is easier for one person to read. My main point was to indicate that the dependent methods were not clearly identifying the changes that they introduced and as such meant that the dependent methods also needed to be understood.

    I also agree with the comments that this Get accessor is perhaps the wrong place for this code. Personally, I don't think that the authentication/session startup should take place on access through a get accessor. Instead, this should be part of a standard pipeline so that the test always occurs at the same place within a request lifecycle. The get accessor just isn't really the best place for that code (in my view).

    As always, just thoughts.

    PS. Sorry you didn't like the 'hasUser' - I actually think that introducing variables is the essence of readable code, not the reverse. It is possible to remove most variables from routines if you try hard enough (byte[] for example), but it isn't generally considered readable to do so :P

    ReplyDelete
  9. On one hand you could say that is a perfect "chain of responsibility" pattern, and as menioned in the first comment, do a simple IUserRepo.GetUser() and then chain your various methods there.

    However, I also see people adding way too much infrastructure just to show a pattern or method when in fact you really will only be doing three calls pretty much "no matter what" so what add all of the complicated overhead.

    In that case, factoring out the method is fine, but why not just use something simple as a #region directive and some nice comments to indicate "first check the session" and "then check the cookie" ... nice, simple, and SOLID at the presentation level instead of the backend. :)

    ReplyDelete
  10. I way prefer well named methods over comments.

    Over the years I've read so many misleading or meaningless comments I've gained a subconscious filter for green text ;)

    I reckon most devs read the code first and only resort to comments when they are really stuck.

    ReplyDelete
  11. I agree that good well done code is important (SOLID). But to me that does not mean don't do any comments. I come back to code months later or have a new employee and the comments help. Yes they can be out of date just as much as anything else. But that does not mean lets not do any commenting.

    ReplyDelete
  12. I think of all the alternatives proposed, Rudy's original is still the cleanest by an exceptional margin.

    The '??' solution seems clever for no reason. Sure, I expect it's as easy to figure out as it looks, but it requires I go look up those semantics to be sure. It shortens the code only if you measure bytes used in source.

    Adding an unnecessary flag is...well... adding an unnecessary flag. Plus, with that solution you're assigning/creating the member someplace and testing for it's successful operation through use of another level of indirection someplace else. Eh, more moving parts than the original.

    I'm not sure why 'not liking side effects in the accessor' but liking them in something called by the accessor makes sense. The addition of a stack frame doesn't change it in any appreciable way.

    ReplyDelete
  13. I really like the coalesce option noted here, especially when done out over several lines. I'm really not a fan of nesting code unnecessarily (I've got a blog post of my own about it), so the nested if structure smells really bad to me.

    The use of #region directives can be helpful when writing lots of code, but I always stop myself and first ask if whatever is going inside that region should be put into another method. Smaller methods make TDD easier :)

    Been listening to Uncle Bob and following him on twitter for a while, too. Smart guy, even if I disagree with his politics, I learn quite a bit!

    ReplyDelete
  14. One way to make the code cleaner may be to have the helper methods return a boolean value and update the _currentUser.

    private WebUser _currentUser;

    public WebUser CurrentUser
    {
    get
    {
    if (_currentUser != null || !WebUserCreatedFromSession() && !WebUserCreatedFromTrackingCookie())
    CreateNewWebUser();

    return _currentUser;
    }
    }

    private bool WebUserCreatedFromSession()
    {
    _currentUser = (WebUser )HttpContext.Current.Session["__currentUser"];
    return _currentUser != null;
    }

    private bool WebUserCreatedFromTrackingCookie()
    {
    var guid = new SecurityHelper().TrackingGuid;

    if (guid == Guid.Empty)
    return false;

    _currentUser = new WebUserRepository().GetWebUserByTrackingGuid(guid);
    return _currentUser != null;
    }

    private void CreateNewWebUser()
    {
    _currentUser = new WebUserFactory().CreateWebUser();
    }

    ReplyDelete
  15. Just noticed that '&&' in my previous comment should have been a '||'. Shoulda written unit tests :).

    ReplyDelete
  16. Think I like this slightly revised version better:

    private WebUser _currentUser;

    public WebUser CurrentUser
    {
    get
    {
    if (_currentUser == null)
    CreateUserFromSession();

    if (_currentUser == null)
    CreateUserFromTrackingCookie();

    if (_currentUser == null)
    CreateNewWebUser();

    return _currentUser;
    }
    }

    private void CreateUserFromSession()
    {
    _currentUser = (WebUser )HttpContext.Current.Session["__currentUser"];
    }

    private void CreateUserFromTrackingCookie()
    {
    var guid = new SecurityHelper().TrackingGuid;

    if (guid != Guid.Empty)
    _currentUser = new WebUserRepository().GetWebUserByTrackingGuid(guid);
    }

    private void CreateNewWebUser()
    {
    _currentUser = new WebUserFactory().CreateWebUser();
    }

    ReplyDelete
  17. Hi Rudy,

    Very nice and clear example of cleaning up code.
    I actually like your own version more than the others suggested here:

    1. Introducing the hasUser is making things just a bit more complex in my eyes.

    2. Using out-parameters should be a last resort. If You feel the need to return more than one thing from a method, it's probably doing more than one thing ( = SRP-violation).

    3. The null coalesce operator (??) is fine by itself, but chaining more of them together resembles the ? : operator (ternary conditional evaluation), which is somewhat widely agreed upon lowers readability.

    My only suggestion for improving your own code is to make the WebUser class implement IPrincipal and hook it onto the current Thread when authentication occurs. That way you have it handy anywhere, and you don't have to worry about extracting it from storage everytime you need it.

    Just my humble opinion :-)

    Mikkel

    ReplyDelete
  18. Why the comments

    // GetWebUserFromSession
    private WebUser GetWebUserFromSession()

    This strikes me as clutter, with no value added. Otherwise great job!

    ReplyDelete
  19. It just goes to show that we should strive to write self documenting code.

    Simplifying our code makes it more maintainable and extensible.

    Good article!

    ReplyDelete
  20. Good post; even though personally I'm not a fan of single line if statements; the second one was definatley more clear. Keep up the good work!

    ReplyDelete
  21. Good Article !!!

    I have made an attempt to revise the code using one small approach. And I would like to hear your views on the same.

    The problem:
    If a method has a possibility of returning null value, the client of that method has to use a if condition to verify the return value is null or not. Most of the time, the check is not performed for number of unknown reasons. As a result, NullPointerException getting thrown.

    The solution
    I propose to use the method signature as "bool TryGetX(out obj)".

    So the revised code is

    // CurrentUser

    private WebUser _currentUser;

    public WebUser CurrentUser
    {

    get
    {

    if (_currentUser == null &&
    !TryGetWebUserFromSession(out _currentUser) &&
    !TryGetWebUserFromTrackingCookie(out _currentUser))

    _currentUser = CreateNewWebUser();

    return _currentUser;

    }

    }





    // GetWebUserFromSession

    private bool TryGetWebUserFromSession(out WebUser webUser)
    {

    Object obj = HttpContext.Current.Session["__currentUser"];

    webUser = null ? null : (WebUser)obj;

    return (webUser != null);

    }



    // GetWebUserFromTrackingCookie

    private bool GetWebUserFromTrackingCookie(out WebUser webUser)
    {

    SecurityHelper secHelper = new SecurityHelper();

    WebUserRepository rep = new WebUserRepository();

    if (secHelper.TrackingGuid == Guid.Empty)

    webUser = null;

    else

    webUser = rep.GetWebUserByTrackingGuid(secHelper.TrackingGuid);


    return (webUser != null);

    }



    // CreateNewWebUser

    private WebUser CreateNewWebUser()
    {

    WebUserFactory factory = new WebUserFactory();

    return factory.CreateWebUser();

    }

    ReplyDelete
  22. Not a bad idea, but you could simply GetWebUserFromSession even further

    private WebUser GetWebUserFromSession()
    {
    return HttpContext.Current.Session["__currentUser"] as WebUser;
    }

    Also avoids a potential NullReferenceException.

    And then he notices the date of the post... oops..

    ReplyDelete