This post explains why Information Hiding isn't entirely the same thing as Encapsulation.

Despite being an old concept, Encapsulation is one of the most misunderstood principles of object-oriented programming. Particularly for C# and Visual Basic.NET developers, this concept is confusing because those languages have properties. Other languages, including F#, have properties too, but it would be far from me to suggest that F# programmers are confused :)

(As an aside, even in languages without formal properties, such as e.g. Java, you often see property-like constructs. In Java, for example, a pair of a getter and a setter method is sometimes informally referred to as a property. In the end, that's also how .NET properties are implemented. Thus, the present article can also be applied to Java and other 'property-less' languages.)

In my experience, the two most important aspects of encapsulation are Protection of Invariants and Information Hiding. Earlier, I have had much to say about Protection of Invariants, including the invariants of properties. In this post, I will instead focus on Information Hiding.

With all that confusion (which I will get back to), you would think that Information Hiding is really hard to grasp. It's not - it's really simple, but I think that the name erects an effective psychological barrier to understanding. If instead we were to call it Implementation Hiding, I think most people would immediately grasp what it's all about.

However, since Information Hiding has this misleading name, it becomes really difficult to understand what it means. Does it mean that all information in an object should be hidden from clients? How can we reconcile such a viewpoint with the fundamental concept that object-orientation is about data and behavior? Some people take the misunderstanding so far that they begin to evangelize against properties as a design principle. Granted, too heavy a reliance on properties leads to violations of the Law of Demeter as well as Feature Envy, but without properties, how can a client ever know the state of a system?

Direct field access isn't the solution, as this discloses data to an even larger degree than properties. Still, in the lack of better guidance, the question of Encapsulation often degenerates to the choice between fields and properties. Perhaps the most widely known and accepted .NET design guideline is that data should be exposed via properties. This again leads to the redundant 'Automatic Property' language feature.

Tell me again: how is this

public string Name { get; set; }

better than this?

public string Name;

The Design Guidelines for Developing Class Libraries isn't wrong. It's just important to understand why properties are preferred over fields, and it has only partially to do with Encapsulation. The real purpose of this guideline is to enable versioning of types. In other words: it's about backwards compatibility.

An example demonstrating how properties enable code to evolve while maintaining backwards compatibility is in order. This example also demonstrates Implementation Hiding.

Example: a tennis game

The Tennis kata is one of my favorite TDD katas. Previously, I posted my own, very specific take on it, but I've also, when teaching, asked groups to do it as an exercise. Often participants arrive at a solution not too far removed from this example:

public class Game
{
    private int player1Score;
    private int player2Score;
 
    public void PointTo(Player player)
    {
        if (player == Player.One)
            if (this.player1Score >= 30)
                this.player1Score += 10;
            else
                this.player1Score += 15;
        else
            if (this.player2Score >= 30)
                this.player2Score += 10;
            else
                this.player2Score += 15;
    }
 
    public string Score
    {
        get
        {
            if (this.player1Score == this.player2Score &&
                this.player1Score >= 40)
                return "Deuce";
            if (this.player1Score > 40 &&
                this.player1Score == this.player2Score + 10)
                return "AdvantagePlayerOne";
            if (this.player2Score > 40 &&
                this.player2Score == this.player1Score + 10)
                return "AdvantagePlayerTwo";
            if (this.player1Score > 40 &&
                this.player1Score >= this.player2Score + 20)
                return "GamePlayerOne";
            if (this.player2Score > 40)
                return "GamePlayerTwo";
            var score1Word = ToWord(this.player1Score);
            var score2Word = ToWord(this.player2Score);
            if (score1Word == score2Word)
                return score1Word + "All";
            return score1Word + score2Word;
        }
    }
 
    private string ToWord(int score)
    {
        switch (score)
        {
            case 0:
                return "Love";
            case 15:
                return "Fifteen";
            case 30:
                return "Thirty";
            case 40:
                return "Forty";
            default:
                throw new ArgumentException(
                    "Unexpected score value.",
                    "score");
        }
    }
}

Granted: there's a lot more going on here than just a single property, but I wanted to provide an example of enough complexity to demonstrate why Information Hiding is an important design principle. First of all, this Game class tips its hat to that other principle of Encapsulation by protecting its invariants. Notice that while the Score property is public, it's read-only. It wouldn't make much sense if the Game class allowed an external caller to assign a value to the property.

When it comes to Information Hiding the Game class hides the implementation details by exposing a single Score property, but internally storing the state as two integers. It doesn't hide the state of the game, which is readily available via the Score property.

The benefit of hiding the data and exposing the state as a property is that it enables you to vary the implementation independently from the public API. As a simple example, you may realize that you don't really need to keep the score in terms of 0, 15, 30, 40, etc. Actually, the scoring rules of a tennis game are very simple: in order to win, a player must win at least four points with at least two more points than the opponent. Once you realize this, you may decide to change the implementation to this:

public class Game
{
    private int player1Score;
    private int player2Score;
 
    public void PointTo(Player player)
    {
        if (player == Player.One)
            this.player1Score += 1;
        else
            this.player2Score += 1;
    }
 
    public string Score
    {
        get
        {
            if (this.player1Score == this.player2Score &&
                this.player1Score >= 3)
                return "Deuce";
            if (this.player1Score > 3 &&
                this.player1Score == this.player2Score + 1)
                return "AdvantagePlayerOne";
            if (this.player2Score > 3 &&
                this.player2Score == this.player1Score + 1)
                return "AdvantagePlayerTwo";
            if (this.player1Score > 3 &&
                this.player1Score >= this.player2Score + 2)
                return "GamePlayerOne";
            if (this.player2Score > 3)
                return "GamePlayerTwo";
            var score1Word = ToWord(this.player1Score);
            var score2Word = ToWord(this.player2Score);
            if (score1Word == score2Word)
                return score1Word + "All";
            return score1Word + score2Word;
        }
    }
 
    private string ToWord(int score)
    {
        switch (score)
        {
            case 0:
                return "Love";
            case 1:
                return "Fifteen";
            case 2:
                return "Thirty";
            case 3:
                return "Forty";
            default:
                throw new ArgumentException(
                    "Unexpected score value.",
                    "score");
        }
    }
}

This is a true Refactoring because it modifies (actually simplifies) the internal implementation without changing the external API one bit. Good thing those integer fields were never publicly exposed.

The tennis game scoring system is actually a finite state machine and from Design Patterns we know that this can be effectively implemented using the State pattern. Thus, a further refactoring could be to implement the Game class with a set of private or internal state classes. However, I will leave this as an exercise to the reader :)

The progress towards the final score often seems to be an important aspect in sports reporting, so another possible future development of the Game class might be to enable it to not only report on its current state (as the Score property does), but also report on how it arrived at that state. This might prompt you to store the sequence of points as they were scored, and calculate past and current state based on the history of points. This would cause you to change the internal storage from two integers to a sequence of points scored.

In both the above refactoring cases, you would be able to make the desired changes to the Game class because the implementation details are hidden.

Conclusion

Information Hiding isn't Data Hiding. Implementation Hiding would be a much better word. Properties expose data about the object while hiding the implementation details. This enables you to vary the implementation and the public API independently. Thus, the important benefit from Information Hiding is that it enables you to evolve your API in a backwards compatible fashion. It doesn't mean that you shouldn't expose any data at all, but since properties are just as much members of a class' public API as its methods, you must exercise the same care when designing properties as when designing methods.

Update (2012.12.02): A reader correctly pointed out to me that there was a bug in the Game class. This bug has now been corrected.


Comments

I have read through several of your posts on encapsulation and information hiding and have also noticed throughout several discussions that information hiding is an often misunderstood/underestimated concept. Personally I think a lot of problems arise from ambiguous definitions, rather than the name "information hiding". A particularly enlightening source for me was browsing through all the different definitions which have been formulated over the years, as compiled by Edward V. Berard.

http://www.itmweb.com/essay550.htm

In his conclusion he makes the following uncommon distinction: "Abstraction, information hiding, and encapsulation are very different, but highly-related, concepts. One could argue that abstraction is a technique that helps us identify which specific information should be visible, and which information should be hidden. Encapsulation is then the technique for packaging the information in such a way as to hide what should be hidden, and make visible what is intended to be visible."

Within a theoretical discussion, I agree with his conclusion that "a stronger argument can be made for keeping the concepts, and thus the terms, distinct". A key thing to keep in mind here according to this definition encapsulating something doesn't necessarily mean it is hidden.

When you refer to "encapsulation" you seem to refer to e.g. Rumbaugh's definition: "Encapsulation (also information hiding) consists of separating the external aspects of an object which are accessible to other objects, from the internal implementation details of the object, which are hidden from other objects." who ironically doesn't seem to make a distinction between information hiding and encapsulation, highlighting the problem even more. This also corresponds to the first listed definition on the wikipedia article on encapsulation you link to, but not the second.

1. A language mechanism for restricting access to some of the object's components.

2. A language construct that facilitates the bundling of data with the methods (or other functions) operating on that data.

When hiding information we almost always (always?) encapsulate something, which is probably why many see them as the same thing.

In a practical setting where we are just quickly trying to convey ideas things get even more problematic. When I refer to "encapsulation" I usually refer to information hiding, because that is generally my intention when encapsulating something. Interpreting it the other way around where encapsulation is seen solely as an OOP principle where state is placed within the context of a class with restricted access is more problematic when no distinction is made with information hiding. The many other ways in which information can be hidden are neglected.

Keeping this warning in mind on the ambiguous definitions of encapsulation and information hiding, I wonder how you feel about my blog posts where I discuss information hiding beyond the scope of the class. To me this is a useful thing to do, for all the same reasons as information hiding in general is a good thing to do.

In "Improved encapsulation using lambdas" I discuss how a piece of reusable code can be encapsulated within the scope of one function by using lambdas.

http://whathecode.wordpress.com/2011/06/05/improved-encapsulation-using-lambdas/

In "Beyond private accessibility" I discuss how variables used only within one function could potentially be hidden from other functions in a class.

http://whathecode.wordpress.com/2011/06/13/beyond-private-accessibility/

I'm glad I subscribed to this blog, you seem to talk about a lot of topics I'm truly interested in. :)
2012-11-27 21:56 UTC
And speaking about encapsulation, I think that the score comparing logic should be ENCAPSULATED into its own PlayerScore class, something like [ https://gist.github.com/4159602 ](no link as the blog errors if i'm using html tags). So you can change anytime how the score is calculated and how many score points a tennis point really is. Also, it's testable and the Game object just manage where the points go, instead of also implementing scoring rules.
2012-11-28 07:33 UTC
Steven, thank you for your thoughtful comment. There are definitely many ways to think about encapsulation, and I don't think I have the only true definition. Basically I'm just trying to make sense of a lot of things that I know, from experience and intuition, are correct.

The main purpose of this post is to drive a big stake through the notion that properties = Encapsulation.

The lambda trick you describe is an additional way to scope a method - another is to realize that interfaces can also be used for scoping.
2012-11-28 07:46 UTC
"The main purpose of this post is to drive a big stake through the notion that properties = Encapsulation."

While I agree 100% with the intent of that statement, I just wanted to warn you that according to some definitions, a property encapsulates a backing field, a getter, and a setter. Whether or not that getter and setter hide anything or are protected, is assumed by some to be an entirely different concept, namely information hiding.

However using the 'mainstream' OOP definition of encapsulation you are absolutely right that a property with a public getter and setter (without any implementation logic inside) doesn't encapsulate anything as no access restrictions are imposed.

And as per my cause, neither does moving something to the scope of the class with access restrictions provide the best possible information hiding in OOP. You are only hiding complexity from fellow programmers who will consume the class, but not for those who have to maintain it (or extend from it in the case of protected members).
2012-11-28 09:39 UTC
@Mike

You are probably right his example can be further encapsulated, but if you are taking the effort of encapsulating something I would at least try to make it reusable instead of making a specific 'PlayerScore' class.

A score is basically a range of values which is a construct which is missing from C# and Java, but e.g. Ruby has. From experience I can tell you implementing it yourself is truly useful as you start seeing opportunities to use it everywhere: https://github.com/Whathecode/Framework-Class-Library-Extension/blob/master/Whathecode.System/Arithmetic/Range/Interval.cs

Probably I would create a generic or abstract 'Score' class which is composed of a range and possibly some additional logic (Reset(), events, ...)

As to moving the logic of the game (the score comparison) to the 'PlayerScore' class, I don't think I entirely agree. This is the concern of the 'Game' class and not the 'Score'. When separating these concerns one could use the 'Score' class in different types of games as well.
2012-11-28 10:00 UTC
@Steven Once in a while I use delegates, mainly Funcs to encapsulate utility functions inside private functions to get that additional encapsulation-level.

However I feel that it's just not a very good fit in C# compared to functional languages like F# or Clojure where functions are truely first class and defining a function inside another function is fully supported with the same syntax as regular functions. What I really would like if C# simply would let you define functions inside functions with the usual syntax.

Regarding performance I don't know how C# does it but I imagine it's similar to the F# inner function performance hit is negligible: http://stackoverflow.com/questions/7920234/what-are-the-performance-side-effects-of-defining-functions-inside-a-recursive-f

Maybe it's better to encapsulate the private func as a public func in a nested private (static?) class and then have the inner function as a private function of that class.
2012-11-29 05:59 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or Google Plus, or somewhere else with a permalink. Ping me with the link, and I may add it as a comment.

Published

Tuesday, 27 November 2012 13:53:59 UTC

Tags