SOLID: the next step is Functional

Monday, 10 March 2014 08:33:00 UTC

If you take the SOLID principles to their extremes, you arrive at something that makes Functional Programming look quite attractive.

You may have seen this one before, but bear with me :)

The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."

Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.

On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened.

- Anton van Straaten

While this is a lovely parable, it's not a new observation that objects and closures seem closely related, and there has been much discussion back and forth about this already. Still, in light of a recent question and answer about how to move from Object-Oriented Composition to Functional Composition, I'd like to explain how, in my experience, the SOLID principles lead to a style of design that makes Functional Programming quite attractive.

A SOLID road map #

In a previous article, I've described how application of the Single Responsibility Principle (SRP) leads to many small classes. Furthermore, if you rigorously apply the Interface Segregation Principle (ISP), you'll understand that you should favour Role Interfaces over Header Interfaces.

If you keep driving your design towards smaller and smaller interfaces, you'll eventually arrive at the ultimate Role Interface: an interface with a single method. This happens to me a lot. Here's an example:

public interface IMessageQuery
{
    string Read(int id);
}

If you apply the SRP and ISP like that, you're likely to evolve a code base with many fine-grained classes that each have a single method. That has happened to me more than once; AutoFixture is an example of a big and complex code base that looks like that, but my other publicly available code bases tend to look the same. In general, this works well; the most consistent problem is that it tends to be a bit verbose.

Objects as data with behaviour #

One way to characterise objects is that they are data with behaviour; that's a good description. In practice, when you have many fine-grained classes with a single method, you may have classes like this:

public class FileStore : IMessageQuery
{
    private readonly DirectoryInfo workingDirectory;

    public FileStore(DirectoryInfo workingDirectory)
    {
        this.workingDirectory = workingDirectory;
    }

    public string Read(int id)
    {
        var path = Path.Combine(
            this.workingDirectory.FullName,
            id + ".txt");
        return File.ReadAllText(path);
    }
}

This FileStore class is a simple example of data with behaviour.

  • The behaviour is the Read method, which figures out a file path for a given ID and returns the contents of the file.
  • The data (also sometimes known as the state) is the workingDirectory field.

In this example, the data is immutable and passed in via the constructor, but it could also have been a public, writeable property, or even a public field.

The workingDirectory field is a Concrete Dependency, but it could also have been a primitive value or an interface or abstract base class. In the last case, we would often call the pattern Constructor Injection.
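To illustrate the last case, here is a minimal sketch of Constructor Injection where the data is itself an interface. The CachingMessageQuery and CountingMessageQuery classes are hypothetical names introduced only for this illustration; they are not part of any code base discussed here.

```csharp
using System;
using System.Collections.Generic;

public interface IMessageQuery
{
    string Read(int id);
}

// Hypothetical Decorator: its data is another IMessageQuery,
// supplied via Constructor Injection.
public class CachingMessageQuery : IMessageQuery
{
    private readonly IMessageQuery inner;
    private readonly Dictionary<int, string> cache =
        new Dictionary<int, string>();

    public CachingMessageQuery(IMessageQuery inner)
    {
        if (inner == null)
            throw new ArgumentNullException("inner");
        this.inner = inner;
    }

    public string Read(int id)
    {
        string message;
        if (!this.cache.TryGetValue(id, out message))
        {
            // Only ask the decorated query on a cache miss.
            message = this.inner.Read(id);
            this.cache[id] = message;
        }
        return message;
    }
}

// Hypothetical stand-in for FileStore that counts how often it is called.
public class CountingMessageQuery : IMessageQuery
{
    public int Calls;

    public string Read(int id)
    {
        this.Calls++;
        return "message " + id;
    }
}

public static class Program
{
    public static void Main()
    {
        var inner = new CountingMessageQuery();
        IMessageQuery query = new CachingMessageQuery(inner);

        Console.WriteLine(query.Read(42)); // message 42
        Console.WriteLine(query.Read(42)); // message 42 (served from the cache)
        Console.WriteLine(inner.Calls);    // 1
    }
}
```

The shape is the same as FileStore: a single method plus a field assigned in the constructor. Only the type of the field differs.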

Obviously, the data could be multiple values, instead of a single value.

The FileStore example class implements the IMessageQuery interface, so it's a very representative example of what happens when you take the SRP and ISP to their logical conclusions. It's a fine class, although a little verbose.

When designing like this, not only do you have to come up with a name for the interface itself, but also for the method, and for each concrete class you create to implement the interface. Naming is difficult, and in such cases, you have to name the same concept twice or more. This often leads to interfaces named IFooer with a method called Foo, IBarer with a method called Bar, etc. You get the picture. This is a smell (that also seems vaguely reminiscent of the Reused Abstractions Principle). There must be a better way.

Hold that thought.

Functions as pure behaviour #

As the introductory parable suggests, perhaps Functional Programming offers an alternative. Before you learn about Closures, though, you'll need to understand Functions. In Functional Programming, a Function is often defined as a Pure Function - that is: a deterministic operation without side-effects.

Since C# has some Functional language support, I'll first show you the FileStore.Read method as a Pure Function in C#:

Func<DirectoryInfo, int, string> read = (workingDirectory, id) =>
    {
        var path = Path.Combine(workingDirectory.FullName, id + ".txt");
        return File.ReadAllText(path);
    };

This Function does the same as the FileStore.Read method, but it has no data. You must pass in the working directory as a function argument just like the ID. This doesn't seem equivalent to an object.

Closures as behaviour with data #

A Closure is an important concept in Functional Programming. In C# it looks like this:

var workingDirectory = new DirectoryInfo(Environment.CurrentDirectory);
Func<int, string> read = id =>
    {
        var path = Path.Combine(workingDirectory.FullName, id + ".txt");
        return File.ReadAllText(path);
    };

This is called a Closure because the Function closes over the Outer Variable workingDirectory. Effectively, the function captures the Outer Variable, so the directory remains available whenever the function is later invoked.
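Strictly speaking, a C# lambda captures the variable itself rather than a copy of its current value. In the example above that distinction doesn't matter, because workingDirectory is never reassigned. The following is a minimal sketch of the difference, using a hypothetical prefix variable and a made-up ClosureDemo class so that it runs without touching the disk:

```csharp
using System;

public static class ClosureDemo
{
    public static string[] Run()
    {
        var prefix = "a";

        // The lambda closes over the local variable 'prefix'.
        Func<int, string> name = id => prefix + id + ".txt";
        var before = name(42);

        // The lambda observes the reassignment, because it captured
        // the variable, not the value it held when the lambda was created.
        prefix = "b";
        var after = name(42);

        return new[] { before, after };
    }

    public static void Main()
    {
        foreach (var s in Run())
            Console.WriteLine(s);
        // a42.txt
        // b42.txt
    }
}
```

This is one reason to prefer closing over variables that are assigned once and never changed.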

What does that compile to?

Obviously, the above C# code compiles to IL, but if you reverse-engineer the IL back to C#, this is what it looks like:

[CompilerGenerated]
private sealed class <>c__DisplayClass3
{
    public DirectoryInfo workingDirectory;

    public string <UseClosure>b__2(int id)
    {
        return File.ReadAllText(
            Path.Combine(this.workingDirectory.FullName, id + ".txt"));
    }
}

It's a class with a field and a method! Granted, the names look somewhat strange, and the field is a public, mutable field, but it's essentially identical to the FileStore class!

Closures are behaviour with data, whereas objects are data with behaviour. Hopefully, the opening parable makes sense to you now. This is an example of one of Erik Meijer's favourite design concepts called duality.

Partial Function Application #

Another way to close over data is called Partial Function Application, but the result is more or less the same. Given the original pure function:

Func<DirectoryInfo, int, string> read = (workingDirectory, id) =>
    {
        var path = Path.Combine(workingDirectory.FullName, id + ".txt");
        return File.ReadAllText(path);
    };

you can create a new Function from the first Function by only invoking it with some of the arguments:

var wd = new DirectoryInfo(Environment.CurrentDirectory);
Func<int, string> r = id => read(wd, id);

The r function also closes over the wd variable, and the compiled IL is very similar to before.
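If you find yourself doing this often in C#, the pattern can be captured once in a small helper. The following is only a sketch: the Apply extension method is a hypothetical name, not something the .NET Base Class Library provides, and the fileName function is a stand-in for read that builds a file name instead of reading a file.

```csharp
using System;

public static class FuncExtensions
{
    // Hypothetical helper: fixes the first argument of a two-argument
    // function and returns a function of the remaining argument.
    public static Func<T2, TResult> Apply<T1, T2, TResult>(
        this Func<T1, T2, TResult> f,
        T1 arg1)
    {
        return arg2 => f(arg1, arg2);
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the read function; it only builds a file name.
        Func<string, int, string> fileName =
            (directory, id) => directory + "/" + id + ".txt";

        // Partially apply the function by supplying only the directory.
        Func<int, string> inMessages = fileName.Apply("messages");

        Console.WriteLine(inMessages(42)); // messages/42.txt
    }
}
```

The helper is itself implemented with a Closure over f and arg1, which is why the two techniques end up compiling to much the same thing.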

Just use F#, then! #

If SOLID leads you to many fine-grained classes with a single method, C# starts to get in the way. A class like the above FileStore class is proper Object-Oriented Code, but is quite verbose; the Closures and Partially Applied Functions compile, but are hardly idiomatic C# code.

On the other hand, in F#, the above Closure is simply written as:

let workingDirectory = DirectoryInfo(Environment.CurrentDirectory)
let read id = 
    let path = Path.Combine(workingDirectory.FullName, id.ToString() + ".txt")
    File.ReadAllText(path)

The read value is a Function with the signature 'a -> string, which means that it takes a value of the generic type 'a (in C#, it would typically have been named T) and returns a string. This is just a more general version of the IMessageQuery.Read method. When 'a is int, it's the same signature, but in F#, I only had to bother naming the Function itself. Functions are anonymous interfaces, so these are also equivalent.

Likewise, if you have a Pure Function like this:

let read (workingDirectory : DirectoryInfo) id =
    let path = Path.Combine(workingDirectory.FullName, id.ToString() + ".txt")
    File.ReadAllText(path)

the Partially Applied Function is written like this:

let wd = DirectoryInfo(Environment.CurrentDirectory)
let r = read wd

The r Function is another Function that takes an ID as input, and returns a string, but notice how much less ceremony is involved.

Summary #

SOLID, particularly the SRP and ISP, leads you towards code bases with many fine-grained classes with a single method. Such objects represent data with behaviour, but can also be modelled as behaviour with data: Closures. When that happens repeatedly, it's time to make the switch to a Functional Programming Language like F#.


Comments

Looks like we had similar thoughts at the same time - mine are here
It's surprising to me that we've not moved more to the functional paradigm as an industry, when so many pieces of evidence point to it working more effectively than OO.
It feels like people can't seem to break away from those curly braces, which is perhaps why Scala is doing so well on the JVM.
2014-03-10 11:51 UTC
I like where you're going with this post, but I just can't get my head round how you would consume the closure you've written. Most of the time you would consume the IMessageQuery by taking it in your class's constructor and letting your DI framework new it up for you:
public class MyService
{
	...

	public MyService(IMessageQuery messageQuery)
	{...}
}
How would you do this with a closure? Your function no longer has a type that we can use (it's just int -> string). Surely your service doesn't look like this?
type MyService (messageQuery: int -> string) = ...
How would you register the types for injection in this example?
2014-03-11 10:05 UTC
Great explanation and justification! I believe the story goes further beyond SOLID into many other patterns. I wrote a post about OOP patterns from Functional Perspective.
2014-03-11 13:18 UTC

Richard, thank you for writing. You ask "Surely your service doesn't look like this? type MyService (messageQuery: int -> string) = ..."

Probably not. Why even have a class? A client consuming the closure would just take it as a function argument:

let myClient f =
    let message = f 42
    // Do something else interesting...
    // Return a result...

Here, f is a function with the int -> string signature, and myClient is another function. Just as you can keep on composing classes using the Composite, Decorator, and Adapter patterns, you can keep on composing functions with other functions by taking functions as function arguments.
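For readers more at home in C#, the same idea can be sketched with Func delegates. The Composition class and its Trimmed and CountWords methods are hypothetical names made up for this illustration; the point is only that both the Decorator and the client are functions that take a function as an argument.

```csharp
using System;

public static class Composition
{
    // Decorator as a function: takes a query function and returns
    // a new function with the same int -> string signature.
    public static Func<int, string> Trimmed(Func<int, string> read)
    {
        return id => read(id).Trim();
    }

    // Client as a function: takes the query as an ordinary argument.
    public static int CountWords(Func<int, string> read, int id)
    {
        var message = read(id);
        return message.Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        // Stand-in for a query that would normally read from a store.
        Func<int, string> read = id => "  message number " + id + "  ";

        var trimmed = Trimmed(read);

        Console.WriteLine("[" + trimmed(42) + "]"); // [message number 42]
        Console.WriteLine(CountWords(trimmed, 42)); // 3
    }
}
```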

At the top level of your application, you may have to implement a class to fit into a framework. For an example of integrating with the ASP.NET Web API, see my A Functional Architecture with F# Pluralsight course.

When it comes to integrating with a DI Container, I tend to not care about that these days. I prefer composing the application with Poor Man's DI, and that works beautifully with F#.

2014-03-11 16:57 UTC

Excellent post!

Under "Partial Function Application", you state "Given the original pure function" - the file I/O would appear to make that impure. Similarly under "Just use F#, then!" with "Likewise, if you have a Pure Function like this".

2014-03-12 17:40 UTC

Bill, you are correct! I may have gotten a little carried away at that point. The method is side-effect-free, and deterministic (unless someone comes by and changes the file), but it does depend on state on disk. Thank you for pointing that out; I stand corrected. Hopefully, that mistake of mine doesn't detract from the overall message.

2014-03-12 19:26 UTC
Leif Battermann #

Hey Mark, obviously switching to F# is not always that easy. I currently have a situation very similar to the one you describe in this post. I refactored the code to using partial application and a functional programming style with C# which works fine. You are saying that the two approaches are actually more or less the same thing which I can see. I am wondering now what the benefit is from refactoring to a functional style with partial application? Does it make sense to do that using C#? The dependencies that I inject are repositories with DB access. So I don't get the true benefits of FP because of the state of the DB. Is it still reasonable to switch to the FP approach? Personally I just like the style and I think it is a little bit cleaner to have no constructors and private fields. Any thoughts on that? Thanks, Leif.

2014-04-02 11:21 UTC

Leif, thank you for writing. Is there value in adopting a functional style in C#? Yes, I think so, but not (in my opinion) from closures or partial function application. While it's possible to do this in C#, the syntax is awkward compared to F#. It also goes somewhat against the grain of C#.

The main benefit from FP is immutable state, which makes it much easier to reason about the code and the state of the application. Once you understand how to model a problem around immutable data, even C# code becomes much easier to reason about, so I definitely think it makes sense to adopt patterns for working with immutable data in C#.

For years, I've written C# code like that. Not only is it possible, but I strongly prefer it over more 'normal' C# with mutable state. Still, there's a lot of boilerplate code you have to write in C#, such as constructors and read-only property pairs, copy-and-update methods, structural equality, etc. After having done that for a couple of years, I got tired of writing all that boilerplate code, when I get it for free in F#.
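To give an impression of that boilerplate, here is a minimal sketch of an immutable class written in that style. The Message class is a hypothetical example invented for this illustration; it shows a constructor with read-only properties, a copy-and-update method, and structural equality.

```csharp
using System;

// Hypothetical immutable class, shown only to illustrate the boilerplate.
public sealed class Message
{
    private readonly int id;
    private readonly string text;

    public Message(int id, string text)
    {
        if (text == null)
            throw new ArgumentNullException("text");
        this.id = id;
        this.text = text;
    }

    // Read-only properties: no setters, so the state can't change.
    public int Id { get { return this.id; } }
    public string Text { get { return this.text; } }

    // Copy-and-update: returns a new instance instead of mutating this one.
    public Message WithText(string newText)
    {
        return new Message(this.id, newText);
    }

    // Structural equality: two messages are equal if their values are.
    public override bool Equals(object obj)
    {
        var other = obj as Message;
        return other != null
            && this.id == other.id
            && this.text == other.text;
    }

    public override int GetHashCode()
    {
        return this.id ^ this.text.GetHashCode();
    }
}

public static class Program
{
    public static void Main()
    {
        var original = new Message(42, "foo");
        var updated = original.WithText("bar");

        Console.WriteLine(original.Text);                           // foo
        Console.WriteLine(updated.Text);                            // bar
        Console.WriteLine(updated.Equals(new Message(42, "bar")));  // True
    }
}
```

In F#, a two-field record declaration gives you the equivalent of all of the above.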

Like you, I still have a large body of C# code that I have to maintain, so while I choose F# for most new development, I write 'functional C#' in my C# code bases. Even if there are small pockets of mutable state here and there (like you describe), I still think it makes sense to keep as much as possible immutable.

2014-04-03 17:34 UTC

Using NuGet with autonomous repositories

Monday, 03 February 2014 16:06:00 UTC

NuGet is a great tool if used correctly. Here's one way to do it.

In my recent post about NuGet, I described why the Package Restore feature is insidious. As expected, this provoked some readers, who didn't like my recommendation of adding NuGet packages to source control. That's understandable; the problem with a rant like my previous post is that while it tells you what not to do, it's not particularly constructive. While I told you to store NuGet packages in your source control system, I didn't describe patterns for doing it effectively. My impression was that it's trivial to do this, but based on the reactions I got, I realize that this may not be the case. Could it be that some readers react strongly because they don't know what else to do (than to use NuGet Package Restore)? In this post, I'll describe a way to use and organize NuGet packages that has worked well for me in several organizations.

Publish/Subscribe #

In Grean we use NuGet in a sort of Publish/Subscribe style. This is a style I've also used in other organizations, to great effect. It's easy: create reusable components as autonomous libraries, and publish them as NuGet packages. If you don't feel like sharing your internal building blocks with the rest of the world, you can use a custom, internal package repository, or you can use MyGet (that's what we do in Grean).

A reusable component may be some package you've created for internal use. Something that packages the way you authenticate, log, instrument, render, etc. in your organization.

Every time you have a new version of one of your components (let's call it C1), you publish the NuGet package.

Diagram showing pull and push from repositories.

Just like other Publish/Subscribe systems, the only other party that you rely on at this moment is the queue/bus/broker - in this case the package repository, like NuGet.org or MyGet.org. No other systems need to be available to do this.

You do this for every reusable component you want to publish. Each is independent of other components.

Pull based on need #

In addition to reusable components, you probably also build systems; that is, applications that actually do something. You probably build those systems on top of reusable components - yours, and other publicly available NuGet packages. Let's call one such system S1.

Whenever you need a NuGet package (C1), you add it to the Visual Studio project where you need it, and then you commit your changes to that system's source control. It effectively means checking in the NuGet package, including all the binaries, to source control. However, the S1 repository is not the same repository as the C1 repository. Both are autonomous systems.

The only system you need to be available when you add the NuGet package C1 is the NuGet package source (NuGet.org, MyGet.org, etc.). The only system you need to be available to commit the changes to S1 is your source control system, and if you use a Distributed Version Control System (DVCS), it's always going to be available.

Pretty trivial so far.

"This isn't pub/sub," you'll most likely say. That's right, not in the traditional sense. Still, if you adopt the pattern language of Enterprise Integration Patterns, you can think of yourself (and your colleagues) as a Polling Consumer.

"But," I suppose you'll say, "I'm not polling the repository and pulling down every package ever published."

True, but you could, and if you did, you'd most likely be filtering away most package updates, because they don't apply to your system. That corresponds to applying a Message Filter.

This last part is important, so let me rephrase it:

Just because your system uses a particular NuGet package, it doesn't mean that you have to install every single version ever published.

It seems to me that at least some of the resistance to adding packages to your repository is based on something like that. As Urs Enzler writes:

[Putting packages in source control is] "not an option if your repo grows > 100GB per month due to monthly updates of BIG nuget packages"

While I'm not at all in possession of all the facts regarding Urs Enzler's specific problems, it just got me thinking: do you really need to update your local packages every time a new package is published? You shouldn't have to, I think.

As an example, consider my own open source project AutoFixture, which keeps a fairly high release cadence. It's released according to the principles of Continuous Delivery, so every time there's a new feature or fix, we release a new NuGet package. In 2013, we released 47 versions of the AutoFixture NuGet package, including one major release. That's almost a release every week, but while I use AutoFixture in many other projects, I don't try to keep up with it. I just install AutoFixture when I start a new project, and then I mostly update the package if I need one of the new features or bug fixes. Occasionally, I also update packages in order to not fall too much behind.

As a publicly visible case, consider Hyprlinkr, which uses AutoFixture as one of its dependencies. While going through Hyprlinkr's NuGet packages recently, I discovered that the Hyprlinkr code base was using AutoFixture 2.12.0 - an 18-month-old version! I simply hadn't needed to update the package during that time. AutoFixture follows Semantic Versioning, and we go to great lengths to ensure that we don't break existing functionality (unless we do a major release).
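The Semantic Versioning rule of thumb can be expressed in a few lines. This is only a sketch under simplifying assumptions: it uses System.Version, which knows nothing about pre-release tags, and the PackageUpdates class and the candidate version numbers are made up for illustration. Only 2.12.0 is taken from the Hyprlinkr example.

```csharp
using System;

public static class PackageUpdates
{
    // Under Semantic Versioning, a newer version with the same major
    // number is expected not to break existing functionality.
    public static bool IsCompatibleUpdate(Version installed, Version candidate)
    {
        return candidate > installed && candidate.Major == installed.Major;
    }

    public static void Main()
    {
        var installed = new Version(2, 12, 0);

        // Candidate version numbers below are made up for illustration.
        Console.WriteLine(
            IsCompatibleUpdate(installed, new Version(2, 16, 2))); // True
        Console.WriteLine(
            IsCompatibleUpdate(installed, new Version(3, 0, 0)));  // False
    }
}
```

A compatible update is one you can take whenever you need it, which also means you can safely postpone it.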

Use the NuGet packages you need, commit them to source control, and update them as necessary. For all well-designed packages, you should be able to skip versions without ill effects. This enables you to treat the code bases for each system (S1, S2, etc.) as autonomous systems. Everything you need in order to work with that code base is right there in the source code repository.

Stable Dependency Principle #

What if you need to keep up-to-date with a package that rapidly evolves? From Urs Enzler's tweet, I get the impression that this is the case not only for Urs, but for other people too. Imagine that the creator of such a package frequently publishes new versions, and that you have to keep up to date. If that's the case, it must imply that the package isn't stable, because otherwise, you'd be able to skip updates.

Let me repeat that:

If you depend on a NuGet package, and you have to stay up-to-date, it implies that the package is unstable.

If this is the case, you have an entirely different problem on your hands. It has nothing to do with NuGet Package Restore, or whether you're keeping packages in source control or not. It means that you're violating the Stable Dependencies Principle (SDP). If you feel pain in that situation, that's expected, but the solution isn't Package Restore, but a better dependency hierarchy.

If you can invert the dependency, you can solve the problem. If you can't invert the dependency, you'd probably benefit from an Anti-corruption Layer. There are plenty of better solutions that address the root cause of your problems. NuGet Package Restore, on the other hand, is only symptomatic relief.


Comments

Can you elaborate a bit on not breaking existing functionality in newer versions (as long as they have one major version)? What tools are you using to achieve that? I read your post on Semantic Versioning from a couple of months ago. I manage an OSS project and it has quite a big public API - each release I try hard to think of anything I or other contributors might have broken. Are you saying that you rely strictly on the programmer's deep knowledge of the project when deciding on a new version number? Also, do you build AutoFixture or any other .NET project of yours for Linux/Mono?

2014-02-03 19:00 UTC

For AutoFixture, as well as other OSS projects I maintain, we rely almost exclusively on unit tests, keeping in mind that trustworthy tests are append-only. AutoFixture has some 4000+ unit tests, so if none of those break, I feel confident that a release doesn't contain breaking changes.

For my other OSS projects, the story is the same, although the numbers differ.

These are much smaller projects than AutoFixture, but since they were all built with TDD, they have excellent code coverage.

Currently, I don't build any of these .NET projects for Mono, as I've never had the need.

2014-02-04 8:48 UTC

So you verify behaviour didn't change with the help of automated tests and good test coverage. What I had in mind is some technique to verify not only that the desired behaviour is in place, but also the public API (method signatures, class constructors, set of public types). I should probably clarify that in one of my projects the public API is not fully covered by unit tests. The most critical parts of it are covered, but not all of it. Let's say that the upcoming release contains bugfixes as well as new features. I also decided that a couple of public API methods are obsolete and deleted them. That makes a breaking change. Let's say I had a lot on my mind and I forgot about the fact that I made those changes. Some time goes by, I'd like to push a new version with all these changes to NuGet, but I'd like to double-check that the public API is still in place compared to the last release. Are there some tools that help with that, maybe the ones you use? Or do you rely fully on the tests and your process in that regard? My approach to releases and versioning is a LOT more error prone than yours, clearly, that's the part of my projects that I'd like to improve.

2014-02-05 23:20 UTC

The only technique I rely on apart from automated tests is code reviews. When I write code myself, I always keep in mind if I'm breaking anything. When I receive Pull Requests (PR), I always review them with an eye towards breaking changes. Basically, if a PR changes an existing test, I review it very closely. Obviously, any change that involves renaming of types or members, or that changes public method signatures, is out of the question.

While I'm not aware of any other technique than discipline that will protect against breaking changes, you could always try to check out the tests you have against a previous version, and see if they all pass against the new version. If they don't, you have a breaking change.

You can also make a diff of everything that's happened since your last release, and then meticulously look through all types and members to see if anything was renamed, or method signatures changed. This will also tell you if you have breaking changes.
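One way to make such a review less tedious might be to dump the public API of each release to a text file and compare the two files with an ordinary diff tool. The following is a minimal sketch using reflection, not a tool I use myself; the PublicApi class is a hypothetical name, and the output format is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class PublicApi
{
    // Produces one line per public type, and one line per public member
    // declared on that type, sorted so that two dumps can be diffed.
    public static IEnumerable<string> Describe(Assembly assembly)
    {
        var flags = BindingFlags.Public | BindingFlags.Instance |
                    BindingFlags.Static | BindingFlags.DeclaredOnly;

        return assembly.GetExportedTypes()
            .SelectMany(t => new[] { t.FullName }.Concat(
                t.GetMembers(flags).Select(m => t.FullName + " :: " + m)))
            .OrderBy(line => line, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main(string[] args)
    {
        // args[0] is assumed to be the path to the compiled assembly.
        foreach (var line in Describe(Assembly.LoadFrom(args[0])))
            Console.WriteLine(line);
    }
}
```

Any line that disappears or changes between two dumps is a candidate breaking change. Like the manual diff, this says nothing about changes in behaviour.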

However, in the end, if you find no breaking changes using these approaches, it's still not a guarantee that you have no breaking changes, because you may have changed the behaviour of some methods. Since you don't have full test coverage, it's hard to tell.

What you could try to do, is to have Pex create a full test suite for your latest released version. This test suite will give you a full snapshot of the behaviour of that release. You could then try to run that test suite on your release candidate to see if anything changed. I haven't tried this myself, and I presume that there's still a fair bit of work involved, but perhaps it's worth a try.

2014-02-06 14:51 UTC

How to use FSharp.Core 4.3.0 when all you have is 4.3.1

Thursday, 30 January 2014 18:39:00 UTC

If you only have F# 3.1 installed on a machine, but need to use a compiled application that requires F# 3.0, here's what you can do.

This post uses a particular application, Zero29, as an example in order to explain a problem and one possible solution. However, the post isn't about Zero29, but rather about a particular F# DLL hell.

Currently, I'm repaving one of my machines, which is always a good idea to do regularly, because it's a great remedy against works on my machine syndrome. This machine doesn't yet have a lot of software, but it does have Visual Studio 2013 and F# 3.1.

Working with a code base, I wanted to use Zero29 to increment the version number of the code, so first I executed:

$ packages/Zero29.0.4.1/tools/Zero29 -l

which promptly produced this error message:

Unhandled Exception: System.IO.FileNotFoundException:
Could not load file or assembly
'FSharp.Core, Version=4.3.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
or one of its dependencies. The system cannot find the file specified.
   at Ploeh.ZeroToNine.Program.main(String[] argv)

On one level, this makes sense, because Zero29 0.4.1 was compiled against F# 3.0 (which corresponds to FSharp.Core 4.3.0.0).

On another level, this is surprising, since I do have F# 3.1 (FSharp.Core 4.3.1.0) on my machine. Until the error message appeared, I had lived with the naïve assumption that when you install F# 3.1, it would automatically add redirects from FSharp.Core 4.3.0.0 to 4.3.1.0, or perhaps make sure that FSharp.Core 4.3.0.0 was also available. Apparently, I've become too used to Semantic Versioning, which is definitely not the versioning scheme used for F#.

Here's one way to address the issue.

Although Zero29 is my own (and contributors') creation, I didn't want to recompile it just to deal with this issue; it should also be usable for people with F# 3.0 on their machines.

Even though it's a compiled program, you can still add an application configuration file to it, so I created an XML file called Zero29.exe.config, placed it alongside Zero29.exe, and added this content:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="FSharp.Core"
                          publicKeyToken="b03f5f7f11d50a3a"
                          culture="neutral"/>
        <bindingRedirect oldVersion="4.3.0.0"
                         newVersion="4.3.1.0"/>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>

This solved the problem, although I now have the derived problem that this new file isn't part of the Zero29 NuGet package, and I don't know if it's going to ruin my colleagues' ability to use Zero29 if I check it into source control...

Another option may be to add the redirect to machine.config, instead of an application-specific redirect, but I have no desire to manipulate my machine.config files if I can avoid it, so I didn't try that.


NuGet Package Restore considered harmful

Wednesday, 29 January 2014 20:06:00 UTC

The NuGet Package Restore feature is a really bad idea; this post explains why.

One of the first things I do with a new installation of Visual Studio is to disable the NuGet Package Restore feature. There are many reasons for that, but it all boils down to this:

NuGet Package Restore introduces more problems than it solves.

Before I tell you about all those problems, I'll share the solution with you: check your NuGet packages into source control. Yes, it's that simple.

Storage implications #

If you're like most other people, you don't like that solution, because it feels inefficient. And so what? Let's look at some numbers.

All of these repositories contain NuGet packages, but keep in mind that even if we'd used Package Restore instead, these repositories wouldn't have been empty - they would still have taken up some space on disk.

On my laptops I'm using Lenovo-supported SSDs, so they're fairly expensive drives. Looking up current prices, a rough estimate puts those disks at approximately 1 USD per GB.

On average, each of my repositories containing NuGet packages costs me four cents of disk drive space.

Perhaps I could have saved some of this money with Package Restore...

Clone time #

Another problem that the Package Restore feature seems to address, is the long time it takes to clone a repository - if you're on a shaky internet connection in a train. While it can be annoying to wait for a repository to clone, how often do you do that, compared to normal synchronization operations such as pull, push or fetch?

What should you be optimizing for? Cloning, which you do once in a while? Or fetch, pull, and push, which you do several times a day?

In most cases, the amount of time it takes to clone a repository is irrelevant.

To summarize so far: the problems that Package Restore solves are a couple of cents of disk cost, as well as making a rarely performed operation faster. From where I stand, it doesn't take a lot of problems before they outweigh the benefits - and there are plenty of problems with this feature.

Fragility #

The more moving parts you add to a system, the greater the risk of failure. If you use a Distributed Version Control System (DVCS) and keep all NuGet packages in the repository, you can work when you're off-line. With Package Restore, you've added a dependency on at least one package source.

  • What happens if you have no network connection?
  • What happens if your package source (e.g. NuGet.org) is down?
  • What happens if you use multiple package sources (e.g. both NuGet.org and MyGet.org)?

You may think that this is unlikely to happen, but apparently, NuGet.org was down today.

This is a well-known trait of any distributed system: The system is only as strong as its weakest link. The more services you add, the higher the risk that something breaks.

Custom package sources #

NuGet itself is a nice system, and I often encourage organizations to adopt it for internal use. You may have reusable components that you want to share within your organization, but not with the whole world. In Grean, we have such components, and we use MyGet to host the packages. This is great, but if you use Package Restore, now you depend on multiple services (NuGet.org and MyGet.org) to be available at the same time.

While MyGet is a nice and well-behaved NuGet host, I've also worked with internal NuGet package sources, set up as an internal service in an organization. Some of these are not as well-behaved. In one case, 'old' packages were deleted from the package source, which had the consequence that when I later wanted to use an older version of the source code, I couldn't complete a Package Restore because the package with the desired version number was no longer available. There was simply no way to build that version of the code base!

Portability #

One of the many nice things about a DVCS is that you can xcopy your repository and move it to another machine. You can also copy it and give it to someone else. You could, for example, zip it and hand it over to an external consultant. If you use Package Restore and internal package sources, the consultant will not be able to compile the code you gave him or her.

Setup #

Perhaps you don't use external consultants, but maybe you set up a new developer machine once in a while. Perhaps you occasionally get a new colleague, who needs help with setting up the development environment. Particularly if you use custom package feeds, making it all work is yet another custom configuration step you need to remember.

Bandwidth cost #

As far as I've been able to tell, the purpose of Package Restore is efficiency. However, every time you compile with Package Restore enabled, you're using the network.

Consider a Build Server. Every time it makes a build, it should start with a clean slate. It can get the latest deltas from the shared source control repository, but it should start with a clean working folder. This means that every time it builds, it'll need to download all the NuGet packages via Package Restore. This not only wastes bandwidth, but takes time. In contrast, if you keep NuGet packages in the repository itself, the Build Server has everything it needs as soon as it has the latest version of the repository.

The same goes for your own development machine. Package Restore will make your compile process slower.

Glitches #

Finally, Package Restore simply doesn't work very well. Personally, I've wasted many hours troubleshooting problems that turned out to be related to Package Restore. Allow me to share one of these stories.

Recently, I encountered this sight when I opened a solution in Visual Studio:

My problem was that at first, I didn't understand what was wrong. Even though I store NuGet packages in my repositories, all of a sudden I got this error message. It turned out that this happened at the time when NuGet switched to enabling Package Restore by default, and I hadn't gotten around to disabling it again.

The strange thing was that everything compiled and worked just great, so why was I getting that error message?

After much digging around, it turned out that the ImpromptuInterface.FSharp package was missing a .nuspec file. You may notice that ImpromptuInterface.FSharp is also missing in the package list above. All binaries, as well as the .nupkg file, were in the repository, but the ImpromptuInterface.FSharp.1.2.13.nuspec was missing. I hadn't noticed for weeks, because I didn't need it, but NuGet complained.

After I added the appropriate .nuspec file, the error message went away.
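A check like the following could have found the culprit faster. It is only a sketch: it assumes a packages folder with one sub-folder per package, each expected to hold both a .nupkg and a .nuspec file, which is inferred from the story above rather than from any NuGet specification. The function name is made up.

```python
from pathlib import Path


def packages_missing_nuspec(packages_dir):
    """Return names of package folders that hold a .nupkg but no .nuspec file."""
    missing = []
    for folder in sorted(Path(packages_dir).iterdir()):
        if not folder.is_dir():
            continue
        has_nupkg = any(folder.glob("*.nupkg"))
        has_nuspec = any(folder.glob("*.nuspec"))
        if has_nupkg and not has_nuspec:
            missing.append(folder.name)
    return missing


# Example: list offending folders under ./packages, if that folder exists.
if Path("packages").is_dir():
    for name in packages_missing_nuspec("packages"):
        print(f"missing .nuspec: {name}")
```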

The resolution to this problem turned out to be easy, and benign, but I wasted an hour or two troubleshooting. It didn't make me feel productive at all.

This story is just one among many run-ins I've had with NuGet Package Restore, before I decided to ditch it.

Just say no #

The Package Restore feature solves these problems:

  • It saves a nickel per repository in storage costs.
  • It saves time when you clone a new repository, which you shouldn't be doing that often.
On the other hand, it
  • adds complexity
  • makes it harder to use custom package sources
  • couples your ability to compile to having a network connection
  • makes it more difficult to copy a code base
  • makes it more difficult to set up your development environment
  • uses more bandwidth
  • leads to slower build times
  • just overall wastes your time

For me, the verdict is clear. The benefits of Package Restore don't warrant the disadvantages. Personally, I always disable the feature and instead check in all packages in my repositories. This never gives me any problems.


Comments

"Hah, not sure I'm doing this commenting thing right, but here it goes anyways."
"So going on two years from when you wrote this post, is this still how you feel about nuget packages being included in the repository? I have to say, all the points do seem to still apply, and I found myself agreeing with many of them, but I haven't been able to find many opinions that mirror it. Most advice on the subject seems to be firmly in the other camp (not including nuget packages in the repo), though, as you note, the tradeoff doesn't seem to be a favorable one."
2015-12-03 15:36 UTC

Blake, thank you for writing. Yes, this is still how I feel; nothing has changed.

2015-12-03 21:17 UTC
Peter #

Mark, completely agree with all your points, however in the future, not using package restore will no longer be an option. See Project.json all the things, most notably "Packages are now stored in a per-user cache instead of alongside the solution".

2015-02-10 00:38 UTC

Like Peter, I am also interested in what you do now.

When you wrote that post, NuGet package dependencies were specified (in part) by packages.config files. Then came project.json. The Microsoft-recommended approach these days is PackageReference. The first approach caches the "restored" NuGet packages in the repository, but the latter two (as Peter said) only cache in a global location (namely %userprofile%\.nuget\packages). I expect that you are using the PackageReference approach now, is that correct?

I see where Peter is coming from. It does seem at first like NuGet restore is now "necessary". Of course it is still possible to commit the NuGet packages in the repository. Then I could add this directory as a local NuGet package source and restore the NuGet packages, which will copy them from the repository to the global cache (so that the build can copy the corresponding DLLs from the global cache to the output directory).

However, maybe it is possible to specify the location of the cached NuGet packages when building the solution. I just thought of this possibility while writing this, so I haven't been able to fully investigate it. This seems reasonable to me, and my initial searches also seem to point toward this being possible.

So how do you handle NuGet dependencies now? Does your build obtain them from the global cache or have you found a way to point the build to a directory of your choice?

2019-08-04 03:38 UTC

Tyson, thank you for writing. Currently, I use the standard tooling. I've lost that battle.

My opinion hasn't changed, but while it's possible to avoid package restore on .NET, I'm not aware of how to do that on .NET Core. I admit, however, that I haven't investigated this much.

I haven't done that much .NET Core development, and when I do, I typically do it to help other people. The things I help with typically relate to architecture, modelling, or testing. It can be hard for people to learn new things, so I aim at keeping the level of new things people have to absorb as low as possible.

Since people rarely come to me to learn package management, I don't want to rock that boat while I attempt to help people with something completely different. Therefore, I let them use their preferred approach, which is almost always the standard way.

2019-08-04 9:26 UTC

A Functional architecture with F#

Wednesday, 22 January 2014 22:48:00 UTC

My new Pluralsight course, A Functional Architecture with F#, is now available.

Whenever I've talked to object-oriented developers about F#, a common reaction has been that it looks enticing, but that they don't see how they'd be able to build a 'normal' application with it. F# has gained a reputation for being a 'niche' language, good for scientific computation and financial calculations, but not useful for mainstream applications.

Not only is F# a Turing-complete, general purpose programming language, but it has many advantages to offer compared to, say, C#. That said, though, building a 'normal' application with F# will only make sense if you know how to work with the language, and define an architecture that takes advantage of all it has to offer. Therefore, I thought that it would be valuable to show one possible way to do this, through a comprehensive example.

This was the motivation behind my new Pluralsight course A Functional Architecture with F#, which is now available! In it, you'll see extensive code demos of a web application written entirely in F# (and a bit of GUI in JavaScript, but the course only shows the F# code).

If you don't already have a Pluralsight account, you can get a free trial of up to 200 minutes.


Comments

...you'll see extensive code demos of a web application written entirely in F#...

What about desktop applications written entirely in F#? When you create a desktop application in F#, what do you use to create the GUI?

I am currently writing my first application in F# and need to decide what we will use to create the GUI. There seem to be many ways this could be done.

  1. For simplicity, I started using WPF with the code-behind written in C#. I am satisfied with this initial (temporary) GUI for now, but the C#/F# interop is ugly to read and painful to write.
  2. I could stick with WPF but write the code-behind in F#. I found two ways to do this:
    1. FsXaml and
    2. Elmish.WPF.
  3. Another possibility is the video game engine Unity.
  4. I also found a XAML-based approach called Avalonia. However, their website says they are "currently in a beta phase, which means that the framework is generally usable for writing applications, but there may be some bugs and breaking changes as we continue development."

There are probably many more that I missed as well.

Among these, Elmish.WPF stands out to me. Their page communicates a strong ethos, which I find both enticing and convincing. The core idea seems to be the Elm Architecture, which I have also seen expressed as the MVU architecture, where MVU stands for Model, View, Update.

Have you come across that architecture before? I am very interested to hear your opinion about it.

P.S. Also worth mentioning is Fabulous, which uses Xamarin.Forms, so this seems like a good choice for mobile app development.

2019-08-04 01:15 UTC

Tyson, thank you for writing. I haven't done any desktop application development in ten years, so I don't think I'm qualified to make recommendations.

2019-08-04 9:34 UTC

Do your web applications include GUIs? If so, what UI framework(s) do you like to use there (such as Angular, React, Elm, etc.)?

P.S. I have been investigating Elmish.WPF and love what I have found. The Elm / MVU / Model, View, Update architecture seems to be a specific (or the ultimate?) application of the functional programming principle of pushing impure behavior to the boundary of the application.

2019-08-22 18:40 UTC

Tyson, seriously, I think that the last time I wrote any GUI code was with Angular some time back in 2014, after which I walked away in disgust. Since then, I've mostly written REST APIs, with the occasional daemon and console application thrown in here and there.

Many of the REST APIs I've helped develop are consumed by apps with GUIs, but someone else developed those, and I wasn't involved.

2019-08-22 20:35 UTC

REST efficiency

Monday, 20 January 2014 07:26:00 UTC

A fully RESTful API often looks inefficient from a client perspective, until you learn to change that perspective.

One of my readers, Filipe Ximenes, asks the following question of me:

"I read your post about avoiding hackable urls and found it very interesting. I'm currently studying REST and I'm really interested in building true RESTful APIs. One thing that is bothering me is how to access resources that are not in the API root. Eg: consider the following API flow:

"root > users > user details > user messages

"Now consider that one client wants to retrieve all the messages from a user. Does it need to "walk" the whole API (from its root to "user messages")? This does not seem very efficient to me. Am I missing something? What would be a better solution for this?"

This is a common question which isn't particularly tied to avoiding hackable URLs, but simply to the hypermedia nature of a level 3 RESTful API.

The short answer is that it's probably not particularly inefficient. There are several reasons for that.

HTTP caching #

One of the great advantages of RESTful design is that instead of abstracting HTTP away, it very explicitly leverages the protocol. HTTP has built-in caching, so even if an API forces a client to walk the API as in the question above, it could conceivably result in only a single HTTP request:

HTTP caching sequence diagram.

This cache could be anywhere between the client and the service. It could be a proxy server, a reverse proxy, or it could even be a local cache on the client machine; think of a Browser's local cache. It could be a combination of all of those caches. Conceivably, if a local cache is involved, a client could walk the API as described above with only a single (or even no) network request involved, because most of the potential requests would be cache hits.
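The effect can be illustrated with a small simulation. This is a sketch under stated assumptions, not a real HTTP client: the URLs, the in-memory dictionary standing in for a cache, and the rule that the navigation resources are cacheable while the messages are not, are all made up for illustration. A real cache would honour headers such as Cache-Control and would expire entries.

```python
# Simulated API: each URL maps to the link a client would follow next.
# The URLs and the cacheability rules are assumptions for illustration only.
NEXT_LINK = {
    "/": "/users",
    "/users": "/users/1234",
    "/users/1234": "/users/1234/messages",
    "/users/1234/messages": None,
}
CACHEABLE = {"/", "/users", "/users/1234"}  # navigation resources change rarely


class CachingClient:
    def __init__(self):
        self.cache = {}
        self.network_requests = 0

    def get(self, url):
        """Return the next link for url, using the cache when possible."""
        if url in self.cache:
            return self.cache[url]  # cache hit: no network traffic
        self.network_requests += 1  # cache miss: one (simulated) HTTP request
        next_link = NEXT_LINK[url]
        if url in CACHEABLE:
            self.cache[url] = next_link
        return next_link

    def walk(self, url="/"):
        """Follow links from url until a resource has no further link."""
        while (next_link := self.get(url)) is not None:
            url = next_link
        return url


client = CachingClient()
client.walk()
first = client.network_requests
client.walk()
second = client.network_requests - first
print(first, second)
```

The first walk costs four simulated requests. The second walk costs one, because only the uncached messages resource has to be fetched again.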

This is one of the many beautiful aspects of REST. By leveraging the HTTP protocol, you can use the internet as your caching infrastructure. Even if you want a greater degree of control, you can use off-the-shelf software for your caching purposes.

Cool URLs #

As the RESTful Web Services Cookbook describes, URLs should be cool. This means that once you've given a URL to a client, you should honour requests for that URL in the future. This means that clients can 'bookmark' URLs if they like. That includes the final URL in the flow above.

Short-cut links #

Finally, an API can provide short-cut links to a client. Imagine, for example, that when you ask for a list of users, you get this:

<users xmlns:atom="http://www.w3.org/2005/Atom">
  <user>
    <links>
      <atom:link rel="user-details" href="/users/1234" />
      <atom:link rel="user-messages" href="/users/1234/messages" />
    </links>
    <name>Foo</name>
  </user>
  <user>
    <links>
      <atom:link rel="user-details" href="/users/5678" />
      <atom:link rel="user-messages" href="/users/5678/messages" />
    </links>
    <name>Bar</name>
  </user>
  <user>
    <links>
      <atom:link rel="user-details" href="/users/9876" />
      <atom:link rel="user-messages" href="/users/9876/messages" />
    </links>
    <name>Baz</name>
  </user>
</users>

As you can see in this example, a list of users can provide a short-cut to a user's messages, enabling a client to follow a more direct path:

root > users > user messages

The client would have to prioritize links of the relationship type user-messages over links of the user-details type.
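A client could implement that prioritisation along these lines. The sketch parses a shortened copy of the XML above with Python's standard library; the preference list and the function name are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Shortened copy of the users document shown above.
USERS_XML = """<users xmlns:atom="http://www.w3.org/2005/Atom">
  <user>
    <links>
      <atom:link rel="user-details" href="/users/1234" />
      <atom:link rel="user-messages" href="/users/1234/messages" />
    </links>
    <name>Foo</name>
  </user>
</users>"""

# Relationship types in order of preference: the short-cut comes first.
PREFERRED_RELS = ["user-messages", "user-details"]


def best_link(user_element):
    """Return the href of the most preferred link, or None if none is known."""
    links = {
        link.get("rel"): link.get("href")
        for link in user_element.iter(f"{{{ATOM}}}link")
    }
    for rel in PREFERRED_RELS:
        if rel in links:
            return links[rel]
    return None


root = ET.fromstring(USERS_XML)
for user in root.findall("user"):
    print(user.findtext("name"), "->", best_link(user))
```

Because the client selects links by relationship type rather than by URL structure, it falls back to user-details if the service stops offering the short-cut.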

Summary #

Efficiency is a common concern about HATEOAS systems, particularly because a client should always start at a published URL. Often, the only published URL is the root URL, which forces the client to walk the rest of the API. This seems inefficient, but doesn't have to be because of all the other built-in mechanisms that work to effectively counter what at first looks like an inefficiency.


Hyprlinkr 1.0.0

Friday, 17 January 2014 19:10:00 UTC

Hyprlinkr 1.0.0 is released.

According to the definition of Semantic Versioning, Hyprlinkr has been in pre-release for more than a year. With the release of ASP.NET Web API 2, I thought it was a good occasion to look at a proper release version.

I've tested Hyprlinkr against Web API 2, and apart from some required assembly redirects, it passes all tests against Web API 2 as well as Web API 1. Being able to support both Web API 1 and 2 is important, I think, because not everyone will be able to migrate to Web API 2 right away.

Since Hyprlinkr is finally out of pre-release mode, it also means that no breaking changes will be introduced before Hyprlinkr 2, which isn't even on the drawing board yet. Since this constitutes a contract, I also trimmed down the API a bit before releasing Hyprlinkr 1.0.0, but all the essential methods are still available.


ZeroToNine

Wednesday, 11 December 2013 12:37:00 UTC

Introducing ZeroToNine, a tool for maintaining .NET Assembly versions across multiple files.

When working with Semantic Versioning in my .NET projects, I prefer to explicitly update the version information in all relevant AssemblyInfo files. However, doing that by hand is quite tedious when you have many AssemblyInfo files, so instead, I rely on an automated tool.

For years, I used a PowerShell script, but recently, I decided to start over and write a 'real' tool, deployable via NuGet. It's called ZeroToNine, is free, and open source. Using it looks like this:

Zero29 -i minor
This increments the minor version in all AssemblyInfo files in all subdirectories beneath your present working directory.
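To illustrate what such an increment means for a single file, here is a minimal sketch. It is not how ZeroToNine is implemented (the tool is written in F#); the regular expression, the function name, and the choice to reset the lower numbers to zero are assumptions made for this example.

```python
import re

# Matches [AssemblyVersion("a.b.c")] and [AssemblyFileVersion("a.b.c.d")] bodies.
VERSION_ATTRIBUTE = re.compile(
    r'(Assembly(?:File)?Version\(")(\d+)\.(\d+)\.(\d+)(\.\d+)?("\))'
)


def increment_minor(assembly_info_text):
    """Bump the minor number and reset the lower numbers in every version attribute."""

    def bump(match):
        prefix, major, minor, _patch, revision, suffix = match.groups()
        tail = ".0" if revision else ""  # keep a fourth number only if one was there
        return f"{prefix}{major}.{int(minor) + 1}.0{tail}{suffix}"

    return VERSION_ATTRIBUTE.sub(bump, assembly_info_text)


before = (
    '[assembly: AssemblyVersion("1.3.2.0")]\n'
    '[assembly: AssemblyFileVersion("1.3.2.0")]'
)
print(increment_minor(before))
```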

This is great, because it enables me to do a complete pull of a pull request, build it and run all tests, assign a new version, and push it, without ever leaving the command-line. Since I already do all my Git work in Git Bash, modifying the AssemblyInfo files was the last step I needed to make available from the command line. The main logic is implemented in a library, so if you don't like command-line tools, but would like to build another tool based on ZeroToNine, you can do that too.

It's available via NuGet, and is written in F#.


Comments

Jeff Soper #

Can you clarify where one would install this when adding the NuGet package to a solution of several projects?

Your documentation says that it will update AssemblyInfo files in all subdirectories beneath the present working directory, but I thought that NuGet packages are applied at a project level, not at a solution level. So, wouldn't this mean that I would be running your tool from one of the many project directories, in which only that project's AssemblyInfo file would be affected?

I'm sure I'm not grasping something simple, but I'm anxious to incorporate this into my workflow!

2014-01-23 19:26 UTC

NuGet packages can contain executable tools as well as, or instead of, libraries. These executables can be found in the package's tools folder. This is what the Zero29 package does. It's not associated with any particular Visual Studio project.

As an example, using Zero29 from the root of the Albedo folder, you can do this:

$ Src/packages/Zero29.0.4.0/tools/Zero29.exe -l

There are other NuGet packages that work in the same way; e.g. NuGet.CommandLine and xunit.runners.

The ZeroToNine NuGet package, on the other hand, is a 'normal' library, so installs as a reference to a particular Visual Studio project.

2014-01-24 19:42 UTC

Semantic Versioning with Continuous Deployment

Tuesday, 10 December 2013 15:19:00 UTC

When you use Semantic Versioning with Continuous Deployment, version numbers must be checked into source control systems by programmers.

If you aren't already using Semantic Versioning, you should. It makes it much easier to figure out how to version your releases. Even if you're 'just' building software for your internal organization, or a single customer, you should still care about versioning of the software you release. Instead of an ad-hoc versioning scheme, Semantic Versioning offers a set of easy-to-understand rules about when to increment which version number.

In short, you

  • increment the patch version (e.g. from 2.3.4 to 2.3.5) when you only release bug fixes and the like
  • increment the minor version (e.g. from 1.3.2 to 1.4.0) when you add new features
  • increment the major version (e.g. from 3.2.9 to 4.0.0) when you introduce breaking changes
This makes it much easier for you when you need to make a decision on a new version number, and it also makes it much easier for consumers of your software to understand when an update is 'safe', and when they should set aside some time to test compatibility.
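The three rules can be written down as a small function. This is a sketch that covers only the MAJOR.MINOR.PATCH core described here; the change labels are names chosen for this example, and pre-release tags and build metadata are ignored.

```python
def bump(version, change):
    """Return the next version for a change that is a 'fix', 'feature' or 'breaking'."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown kind of change: {change!r}")


print(bump("2.3.4", "fix"))       # 2.3.5
print(bump("1.3.2", "feature"))   # 1.4.0
print(bump("3.2.9", "breaking"))  # 4.0.0
```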

Continuous Deployment #

While Semantic Versioning is great, it requires a bit of consideration when combined with Continuous Deployment. Every time you deploy a new version, you should increment the version number.

Continuous Delivery and Continuous Deployment rely on automation. A code check-in triggers an automated build, which is subsequently processed by a Deployment Pipeline, and potentially released to end-users. Each released (or releasable) build should have a unique version.

Traditionally, Build Servers have had the responsibility of incrementing version numbers - typically by incrementing a build number, like this:

  1. 3.7.11.942
  2. 3.7.12.958
  3. 3.7.13.959
  4. 3.7.14.979
  5. 3.7.15.987
where the fourth number is a revision number, that may correspond to a revision ID in a source control system (whether or not that makes sense, depends on which version control system you use).

Unfortunately, this versioning scheme is wrong if you combine Semantic Versioning with Continuous Deployment. Even if you throw away the fourth build number, you're left with a sequence like this:

  1. 3.7.11 (bug fix)
  2. 3.7.12 (partial new feature, hidden behind a Feature Toggle.)
  3. 3.7.13 (performance improvement)
  4. 3.7.14 (completed feature initiated in 3.7.12)
  5. 3.7.15 (breaking changes in public API)
That's not Semantic Versioning.

Semantic Versioning might look like this:

  1. 3.7.11 (bug fix)
  2. 3.7.12 (partial new feature, hidden behind a Feature Toggle.)
  3. 3.7.13 (performance improvement)
  4. 3.8.0 (completed feature initiated in 3.7.12)
  5. 4.0.0 (breaking changes in public API)
This doesn't work well with automatically incrementing the version number.
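The difference between the two sequences can be reproduced in a few lines. In this sketch, the list of decisions is an assumption chosen to match the example above: each entry records which part of the version a programmer decided to increment, while the counter-based scheme always increments the third number.

```python
# Decisions a programmer might record for five consecutive releases.
# The labels are assumptions chosen to reproduce the example sequence above.
RELEASES = [
    ("patch", "bug fix"),
    ("patch", "partial new feature, hidden behind a Feature Toggle"),
    ("patch", "performance improvement"),
    ("minor", "completed feature"),
    ("major", "breaking changes in public API"),
]


def next_version(version, part):
    """Increment the given part of a (major, minor, patch) tuple."""
    major, minor, patch = version
    if part == "major":
        return (major + 1, 0, 0)
    if part == "minor":
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)


auto = semantic = (3, 7, 10)
for part, description in RELEASES:
    auto = next_version(auto, "patch")       # what a counter-based Build Server does
    semantic = next_version(semantic, part)  # what the programmer decided
    print(".".join(map(str, auto)), ".".join(map(str, semantic)), description)
```

The two columns agree for the first three releases and then diverge: the counter ends at 3.7.15, whereas the recorded decisions end at 4.0.0.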

Versioning is a programmer decision #

With Continuous Deployment, every time you integrate code (check in, merge, rebase, whatever), you produce a version of the software that will be deployed. This means that every time you integrate, something or somebody should assign a new version to the software.

The rules of Semantic Versioning require explicit decisions to be made. Only the development team understands what a particular commit contains. Is it a fix? Is it a new feature? Is it a breaking change? A Build Server doesn't know how to answer these questions, but you do.

A few years ago, I changed the delivery scheme for my open source project AutoFixture to use Semantic Versioning with Continuous Deployment. When I did that, I realised that I could no longer rely on a Build Server for controlling the version. Instead, I would have to explicitly control the versioning as part of the commit process.

Because AutoFixture is a .NET project, I decided to use the version assignment mechanism already present in the framework: The [AssemblyVersion] and [AssemblyFileVersion] attributes that you typically put in AssemblyInfo files.

The version control system used for AutoFixture is Git, so it works like this in practice:

  1. A programmer adds one or more commits to a branch.
  2. The programmer sends a pull request.
  3. I pull down the commits from the pull request.
  4. I increment all the version attributes in all the AssemblyInfo files, and commit that change.
  5. I push the commits to master.
  6. The Build Server picks up the new commits, and the Deployment Pipeline kicks in.
This works well. You can see an example of this process if you examine the commit log for AutoFixture. The only problem is that AutoFixture has 28 AssemblyInfo files (each with two version attributes) that I must update and keep in sync. That's a lot of work, so obviously a target for automation, but that's the subject for another blog post.
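Keeping that many files in sync is the kind of thing a script can at least verify. The following sketch walks a directory tree and reports whether all version attributes agree. The file name pattern, the regular expression, and the function names are assumptions for this example, and lines that are commented out with // are skipped.

```python
import re
from pathlib import Path

VERSION = re.compile(r'Assembly(?:File)?Version\("([^"]+)"\)')


def versions_in_tree(root):
    """Map each distinct version string to the AssemblyInfo files that contain it."""
    found = {}
    for path in sorted(Path(root).rglob("AssemblyInfo.*")):
        if not path.is_file():
            continue
        for line in path.read_text(encoding="utf-8-sig").splitlines():
            if line.lstrip().startswith("//"):
                continue  # ignore commented-out attributes
            for version in VERSION.findall(line):
                found.setdefault(version, set()).add(str(path))
    return found


def in_sync(root):
    """True when every version attribute under root carries the same version."""
    return len(versions_in_tree(root)) <= 1
```

Calling in_sync on the root of a working copy returns False as soon as one file has been left behind, and versions_in_tree shows which files carry which version.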

After more than two years of experience with this way of controlling software versions, I'm consistently using this approach for all my open source software, as well as the internal software we create in Grean.

Summary #

If you want to use Continuous Deployment (or Delivery) with Semantic Versioning, the assignment of a new version number is a programmer decision. Only a human understands when a commit constitutes a bug fix, a new feature, or a breaking change. The new version number must be committed to the version control system, so that whoever or whatever compiles and/or releases the software will always use the same version number for the same version of the source code.

The version number is kept in the source control system, together with the source code. It's not the realm of a Build Server.


Comments

You wrote that a Build Server doesn't know how to answer some questions. I think that it could - if every commit contains a link to an issue ID and the Build Server is able to check type, state, etc. of issues related to a given build, then the Build Server could theoretically make the decision about version incrementing.
2013-12-10 17:02 UTC

Augi, it's true that you can create other approaches in order to attempt to address the issue, but the bottom line remains that a human being must make the decision about how to increment the version number. As you suggest, you can put information guiding that decision outside the source code itself, but then you'd be introducing another movable part that can break. If you do something like you suggest, you'll still need to add some machine-readable metadata to the linked issue ID. To add insult to injury, this also makes it more difficult to reproduce what the Build Server does on your local machine - particularly if you're attempting to build while offline.

While it sounds like it would be possible, what do you gain by doing something like that?

2013-12-10 19:16 UTC

Mark, I also have a Visual Studio solution or two with multiple AssemblyInfo.cs files (although not as many as you) and wish to use a common version number for each contained project. I came up with the following approach, which doesn't require any automation. It only uses the Visual Studio/MSBuild <Link /> functionality. The key is simply to use the Add As Link functionality for common attributes.

Simply put, I split out the common information (Version info and company/copyright/trademark info) from projects' AssemblyInfo.cs files into another file called SolutionAssemblyInfo.cs. I place that file at the root of the solution (outside of any project folders). Then, for each project, remove the version information from the AssemblyInfo.cs file and use the 'Add As Link' function in the 'Add Existing Item' function in Visual Studio to link to the SolutionAssemblyInfo.cs file. With that, you have only one place to update the version information: the SolutionAssemblyInfo.cs file. Any change to that version information will be included in each project.

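That might be enough information to get you going, but if not, I'll expand and outline the specific process. The basic idea is to look at the AssemblyInfo.cs file as having two sets of metadata:

  • Metadata specific to each project (title, description, product, GUID, and so on)
  • Metadata shared by all projects in the solution (company, copyright, trademark, and version information)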

You can separate the shared metadata into a common AssemblyInfo.cs file. Then, by linking to that common file in each project (as opposed to including), you won't need to update 28 files; you'll only need to update the common one.

Assume I have the following AssemblyInfo.cs file for one of my projects:

// AssemblyInfo.cs
using System.Reflection;
using System.Runtime.InteropServices;

[assembly: AssemblyTitle("SampleProject")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("Company Name")]
[assembly: AssemblyProduct("SampleProject")]
[assembly: AssemblyCopyright("Copyright (c) Company Name 2013")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]

[assembly: ComVisible(false)]

[assembly: Guid("7ae5f3ab-e519-4c44-bb65-489305fc36b0")]

[assembly: AssemblyVersion("1.0.0.0")]
[assembly: AssemblyFileVersion("1.0.0.0")]
				
I split this out into two files:
// AssemblyInfo.cs
using System.Reflection;
using System.Runtime.InteropServices;

[assembly: AssemblyTitle("SampleProject")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyProduct("SampleProject")]
[assembly: AssemblyCulture("")]

[assembly: ComVisible(false)]

[assembly: Guid("7ae5f3ab-e519-4c44-bb65-489305fc36b0")]
				
and
// SolutionAssemblyInfo.cs
using System.Reflection;

[assembly: AssemblyCompany("Company Name")]
[assembly: AssemblyCopyright("Copyright (c) Company Name 2013")]
[assembly: AssemblyTrademark("")]

[assembly: AssemblyVersion("1.0.0.0")]
[assembly: AssemblyFileVersion("1.0.0.0")]
// Depending on your needs, AssemblyInformationalVersion as well?
				

The SolutionAssemblyInfo.cs goes in the root of your solution and should initially not be included in any projects. Then, for each project:

  • Remove all attributes that went into SolutionAssemblyInfo.cs
  • Right-click the project and "Add..., Existing Item..."
  • Navigate to the SolutionAssemblyInfo.cs file
  • Instead of clicking the "Add" button, click the little drop-down on it and select "Add As Link"
  • If you want the new linked SolutionAssemblyInfo.cs file to show up under the Properties folder (like the AssemblyInfo.cs file), just drag it from the project root into the Properties folder. Unfortunately, you can't simply add the link to the Properties folder directly (at least not in VS 2012).
(Note: it looks like there may be an easier method in VS 2012+ only to get the "Add As Link" function, mentioned on StackOverflow, by simply dragging and dropping while holding the Alt key.)

That's it. Now, you will be able to access this SolutionAssemblyInfo.cs file from any of your projects and any changes you make to that file will persist into the linked file, being shared with all projects.

The downside to this, as opposed to an automation solution, is that you need to repeat this process (starting with "Remove all attributes...") for all new projects you add. However, in my opinion, that's a small, one-time-per-project price to pay. With the above, you let the established tool work for you with built-in features.

2013-12-10 18:00 UTC

Chris, thank you for your comment. That may actually be a good way to do it, too. While I did know about the add as link feature of Visual Studio, I've had bad experiences with it in the past. This may actually be a Pavlovian reaction on my part, because I must admit that I can no longer remember what those bad experiences were :$

2013-12-10 20:38 UTC
Laurence Evans #

I had been thinking about this a bit myself and I believe an easy solution to this is to just make use of a modified branching structure in the same/similar setup as the versioning. So you'd have your major/minor/build branches and your build server could increment numbers differently depending on which branch you update, which would fully take care of the automation side of things for you. This would be rather trivial to set up and maintain but fulfill your requirements set out in the post.

Of course you would have to be quite disciplined as to which branch you commit your code to, but I don't see that being too much of an overhead; you usually know when you're going to be patching/creating new features or introducing breaking changes before you start working. Worst case, make use of git stash save/pop to move work between branches.

Could call this semantic branching?

2013-12-10 23:54 UTC

Could call this semantic branching?

You might be interested in the GitFlowVersion project, which leverages some of the concepts you mention.

2013-12-11 09:54 UTC

Laurence, Marijn, thank you for your comments. As cool as Git is (currently, I'm not aware of anything better), I don't like coupling a process unnecessarily to a particular tool. What if, some day, something better than Git arrives?

Additionally, I agree with Martin Fowler, Jez Humble, and others, that branching is essentially evil, so I don't think it's a good idea to build an entire versioning scheme around branches.

As an alternative, I've introduced ZeroToNine, a command-line tool (and library) that works independently of any version control system.

2013-12-11 12:55 UTC

you'd be introducing another movable part that can break...
...sounds like it would be possible, what do you gain by doing something like that?

Humans are/have moving parts that can break too ;). In large organisations there are often as many differences of opinion as there are people. One developer's "breaking change" or "feature" is another's "improvement" or "bugfix". Human decision making also introduces arbitrarily variable logic. Software projects rotate developers in and out all the time. Will they all apply consistent logic to their versioning decisions?

Developers can make a decision about releases without incrementing a number in a file. They can for example click on a "Push to NuGet" or "Release to GitHub" button (which is a human, developer decision). It's then trivial for a CI server to calculate or increment a PATCH number based on the last NuGet or GitHub Push/Release. A MINOR version can be easily determined by linking to an issue tracker with issues that are linked to milestones. A MAJOR version is probably simplest when a human increments it, but I see no reason why it couldn't also be triggered by monitoring changes or breakages to existing unit tests (for example). Considering the clarity of the semver MAJOR.MINOR.PATCH definitions, I think an algorithm determining the version number is more consistent than a human decision. For example (in pseudo-code):

while (a release request is in progress)
    if (app has previously been 'released' to some repository AND has subsequently changed in any way)
        increment PATCH...
            unless (all issues and features in next milestone have been marked closed, subsequent to last release)
                increment MINOR and reset PATCH to zero...
                    unless (unit tests that predate release have changed OR dependent application unit tests are failing OR some other determination of breaking change) 
                        increment MAJOR and reset MINOR and PATCH to zero...
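
For illustration only, one literal reading of the pseudo-code above could be sketched in Python like this. The nesting is taken at face value, so a breaking change is only considered once the milestone is closed. The three boolean inputs are hypothetical stand-ins for whatever a CI server, issue tracker, and test suite would report; how to compute them reliably is left open.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def next_version(current: Version,
                 changed_since_release: bool,
                 milestone_closed: bool,
                 breaking_change: bool) -> Version:
    """Pick the next version, following the nesting of the pseudo-code."""
    if not changed_since_release:
        # Nothing changed since the last release: keep the version.
        return current
    if not milestone_closed:
        # Default case: only PATCH moves.
        return current._replace(patch=current.patch + 1)
    if not breaking_change:
        # Milestone closed: bump MINOR and reset PATCH.
        return Version(current.major, current.minor + 1, 0)
    # Milestone closed and a breaking change detected: bump MAJOR.
    return Version(current.major + 1, 0, 0)
```

With these assumptions, `next_version(Version(1, 2, 3), True, True, False)` yields `Version(1, 3, 0)`.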

2014-04-02 17:25 UTC

Rob, thank you for writing. If I could have defined a trustworthy automated system to figure out semantic versioning, I would have done it. Is your proposed algorithm sufficiently robust?

  • How does the algorithm determine if all issues and features in the 'next milestone' have been marked closed? What if your issue tracking system is off-line?
  • Considering that the context here is Continuous Delivery, would one really have to create and maintain a 'milestone' for every update?
  • When an 'issue' is resolved, how does the algorithm know if it was a new feature, or a bug fix?
  • How does the algorithm determine if a unit test has changed in a breaking fashion? A unit test could have been refactored simply to make it easier to read, without actually changing the behaviour of the system.
  • How does the algorithm determine if a failing test was due to a breaking change, or that the actual reason was a badly written test?

2014-04-04 18:11 UTC

What about versioning based on branch names? I mean, what if we name a branch according to what it is supposed to do in the end, for instance feature-xxx, major-xxx, patch-xxx? Where I want to go is to automate the semantic versioning every time a pull/merge request is accepted. The CI/CD tool, through a shell script for instance, can then just look at the last commit comment, which is usually 'Merge branch xxx into master' (where xxx can be feature-yyy, major-yyy, patch-yyy), and increment the version according to the branch merged. If it's a feature, it increases the digit in the middle and resets the last one. On the other hand, if it's a patch, it only increases the last digit. Would it work? I mean, the assignment of the new version is still a programmer decision, which is made when they branch from master.
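
As a rough sketch of the idea above, the following Python function reads a merge commit message and bumps the version accordingly. The prefixes (major-, feature-, patch-) come from the comment; the function name, the regular expression, and the choice to leave the version untouched for unrecognised messages are assumptions made for illustration.

```python
import re

# Branch-name prefixes mapped to the index of the version part they bump.
BUMP_BY_PREFIX = {"major": 0, "feature": 1, "patch": 2}

# Matches messages such as "Merge branch 'feature-login' into master",
# with or without quotes around the branch name.
MERGE_PATTERN = re.compile(
    r"^Merge branch '?(major|feature|patch)-[^' ]+'? into ")


def bump_from_merge(version: str, commit_message: str) -> str:
    """Return the version that follows from the merged branch's prefix."""
    match = MERGE_PATTERN.match(commit_message)
    if match is None:
        # Not a recognised merge commit: leave the version alone.
        return version
    parts = [int(p) for p in version.split(".")]
    index = BUMP_BY_PREFIX[match.group(1)]
    parts[index] += 1
    # Reset every part to the right of the one that was bumped.
    for i in range(index + 1, len(parts)):
        parts[i] = 0
    return ".".join(str(p) for p in parts)
```

Merging a feature branch into version 1.4.2 would, under these assumptions, produce 1.5.0, while a patch branch would produce 1.4.3.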

2015-08-06 00:02 UTC

Gus, thank you for writing. I think your suggestion could work as well, although I haven't tried it.

The advantage of your suggestion is that it's more declarative. You address the question of what should happen (major, minor, patch), instead of how it should happen (which number). That's usually a good thing.

The disadvantage is that you push the burden of this to a central build step (on a build server, I presume), so it introduces more moving parts into the process, as well as a single point of failure.

Fortunately, the evaluation of advantages versus disadvantages can be a personal (or team) decision, so one can choose the option one likes best. It's always good to have options to choose from in the first place, so thank you for sharing!

2015-08-06 06:06 UTC

Mark, I feel a bit outdated responding to a post that has a 4th birthday coming up. I completely agree with Semantic Versioning, even for cloud application deployments, which my team is working with at this moment. I am intrigued as to how your workflow is working.

In our current workflow, we force a version and changelog entry to be associated with each Pull Request. Thus, the developer increments the version as part of the PR, and our auditor pipeline ensures that the version and changelog are updated. The team of course still has to ensure that this version is correct, i.e. "are you sure this is a micro change? It looks like a new feature to me", or "it looks to me like you broke API compatibility, so this is a major increment". The issue we are starting to hit with this early model is that our team is growing, and we face constant merge conflicts in our version file and Changelog. (It's a Ruby on Rails project, so we use a config.yml for the version, which is read in at runtime by our app and displayed as a link on the app's page back to our Changelog version.)

It appears that in your workflow you have hooks set up so that these changes are initiated by the person merging the code, such that these files are only changed post-merge and committed then. If this is elaborated on in one of your books, let me know and my team could take a "work" break to go do some reading. I appreciate your time on the matter.

2017-08-09 17:54 UTC

Jonathan, thank you for writing, and don't worry about the age of the post. It only pleases me that apparently I've managed to produce something durable enough to be of interest four years later.

In the interest of full disclosure, the busiest code base on which I've ever used this technique is AutoFixture, and to be clear, I've handed off the reins of that project. I've worked on bigger and busier code bases than that, but these were internal enterprise code bases that didn't use Semantic Versioning.

Based on my experience with AutoFixture, I'd still use the process I used back then. It went something like this:

  1. If a code review resulted in my decision to accept a pull request, I'd pull it down on my laptop.
  2. Once on my laptop, I'd run the entire build locally. While I do realise that GitHub has a merge button, I regarded this as an extra verification step. While we had CI servers running, I think it never hurts to verify that it also builds on a developer's machine. Otherwise, you'd just have a problem the next time someone pulls master.
  3. If the build passed, I'd merge the branch locally.
  4. I'd then run a single Zero29 command to update all version information in all appropriate files.
  5. This single command would modify a set of text files, which I'd then check in. If you look at the AutoFixture commit history, you'll see lots of those check-ins.
  6. Once checked in, I'd tag the commit with the version. Often I'd use a cryptic bash command that I no longer remember to first read the current version with Zero29, then pipe that number to some other utility that could produce the appropriate tag, and then pipe that to git tag. The point is: that could be an automated step as well.
  7. Then I'd build the release binaries. That would be one other command.
  8. Penultimately, I'd publish the release by pushing all binaries to NuGet.org. That would be one other bash command.
  9. Finally, I'd push master and the new tag to GitHub.

As you can tell, that's fewer than a dozen bash commands. I could have automated most of it to one or two shell scripts, but I never got around to doing that because I rather enjoyed the process. Consider that I didn't do this every day. If I had to do it several times a day, I would probably automate it more.
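
To make the sequence concrete, here is a small Python sketch that lists the commands for one release in order, without running them. It is not a record of the actual commands used: the script names (build.sh, build-release.sh, publish-nuget.sh), the local branch name, the commit message, and the v-prefixed tag are hypothetical placeholders, and the Zero29 increment syntax is assumed from memory of its documentation and should be checked there.

```python
def release_plan(pr_number: int, part: str, new_version: str) -> list[str]:
    """Return the shell commands for one release, in order (dry run only)."""
    branch = f"pr-{pr_number}"   # hypothetical local branch name
    tag = f"v{new_version}"      # assumed tag format
    return [
        # 1. Pull the accepted pull request down locally.
        f"git fetch origin pull/{pr_number}/head:{branch}",
        f"git checkout {branch}",
        # 2. Run the entire build locally (placeholder script).
        "./build.sh",
        # 3. Merge the branch locally.
        "git checkout master",
        f"git merge {branch}",
        # 4. Update all version information (assumed Zero29 syntax).
        f"Zero29 -i {part}",
        # 5. Check in the modified files.
        f'git commit -am "Incremented version to {new_version}"',
        # 6. Tag the commit with the version.
        f"git tag {tag}",
        # 7. Build the release binaries (placeholder script).
        "./build-release.sh",
        # 8. Push the binaries to NuGet.org (placeholder script).
        "./publish-nuget.sh",
        # 9. Push master and the new tag.
        f"git push origin master {tag}",
    ]
```

Printing the result of, say, `release_plan(42, "patch", "3.51.1")` gives a checklist that could later be turned into a script, one step at a time.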

I'm sure you could even write a server-side script with a Web UI that could do this, if you wanted to, but I've always preferred doing a local build as part of the verification process.

I don't think I've written much more about this, other than the announcement post for ZeroToNine, as well as the documentation for it.

2017-08-09 20:09 UTC

Layers, Onions, Ports, Adapters: it's all the same

Tuesday, 03 December 2013 18:59:00 UTC

If you apply the Dependency Inversion Principle to Layered Architecture, you end up with Ports and Adapters.

One of my readers, Giorgio Sala, asks me:

In his book "Implementing DDD" mr Vernon talks a lot about the Ports and Adapter architecture as a next level step of the Layered architecture. I would like to know your thinking about it.

The short answer is that this is more or less the architecture I describe in my book, although in the book, I never explicitly call it out by that name.

Layered architecture #

In my book, I describe the common pitfalls of a typical layered architecture. For example, in chapter 2, I analyse a typical approach to layered architecture; it's an example of what not to do. Paraphrased from the book's figure 2.13, the erroneous implementation creates this dependency graph:

The arrows show the direction of dependencies; i.e. the User Interface library depends on the Domain library, which in turn depends on the Data Access library. This violates the Dependency Inversion Principle (DIP), because the Domain library depends on the Data Access library, and the DIP says that:

Abstractions should not depend upon details. Details should depend upon abstractions.

- Agile Principles, Patterns, and Practices in C#, p. 154

Later in chapter 2, and throughout the rest of my book, I demonstrate how to invert the dependencies. Paraphrased, figure 2.12 looks like this:

This is almost the same figure as the previous, but notice that the direction of dependency has changed, so that the Data Access library now depends on the Domain library, instead of the other way around. This is the DIP applied: the details (UI, Data Access) depend on the abstractions (the Domain Model).
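
As a minimal sketch of that inversion (in Python rather than the book's C#, and with made-up names), the Domain part below owns the abstraction and knows nothing about storage, while the Data Access part implements the abstraction and therefore depends on the Domain. `Product`, `ProductRepository`, `ProductService`, and `InMemoryProductRepository` are all hypothetical, not taken from the book.

```python
from dataclasses import dataclass
from typing import Protocol


# --- Domain library: owns the abstraction, imports nothing from Data Access ---
@dataclass(frozen=True)
class Product:
    name: str
    unit_price: float


class ProductRepository(Protocol):
    """The abstraction, defined by the Domain on its own terms."""
    def featured_products(self) -> list[Product]: ...


class ProductService:
    def __init__(self, repository: ProductRepository) -> None:
        self._repository = repository

    def discounted_featured(self, discount: float) -> list[Product]:
        # Business rule lives in the Domain; where products come from is a detail.
        return [Product(p.name, round(p.unit_price * (1 - discount), 2))
                for p in self._repository.featured_products()]


# --- Data Access library: a detail that depends on the Domain's types ---
class InMemoryProductRepository:
    """Stands in for a database-backed implementation."""
    def __init__(self, rows: list[tuple[str, float]]) -> None:
        self._rows = rows

    def featured_products(self) -> list[Product]:
        return [Product(name, price) for name, price in self._rows]
```

Note that `InMemoryProductRepository` refers to `Product`, a Domain type, whereas nothing in the Domain part refers to the repository implementation. That is the arrow pointing the other way.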

Onion layers #

The example from chapter 2 in my book is obviously simplified, with only three libraries involved. Imagine a generalized architecture following the DIP:

While there are many more libraries, notice that all dependencies still point inwards. If you're still thinking in terms of layers, you can draw concentric layers around the boxes:

Ports and adapters architecture diagram.

These concentric layers resemble the layers of an onion, so it's not surprising that Jeffrey Palermo calls this type of architecture Onion Architecture.

The DIP still applies, so dependencies can only go in one direction. However, it would seem that I've put the UI components (the orange boxes) and the Data Access components (the blue boxes) in the same layer. (Additionally, I've added some yellow boxes that might symbolise unit tests.) This may seem unfamiliar, but actually makes sense, because the components in the outer layer are all at the boundaries of the application. Some boundaries (such as UI, RESTful APIs, message systems, etc.) face outward (to the internet, extranets, etc.), while other boundaries (e.g. databases, file systems, dependent web services, etc.) face inward (to the OS, database servers, etc.).

As the diagram implies, components can depend on other components within the same layer, but does that mean that UI components can talk directly to Data Access components?

Hexagonal architecture #

While traditional Layered Architecture is no longer the latest fad, it doesn't mean that all of its principles are wrong. It's still not a good idea to allow UI components to depend directly on the Data Access layer; it would couple such components together, and you might accidentally bypass important business logic.

You have probably noticed that I've grouped the orange, yellow, and blue boxes into separate clusters. This is because I still want to apply the old rule that UI components must not depend on Data Access components, and vice versa. Therefore, I introduce bulkheads between these groups:

Although it may seem a bit accidental that I end up with exactly six sections (three of them empty), it does nicely introduce Alistair Cockburn's closely related concept of Hexagonal Architecture:

You may feel that I cheated a bit in order to make my diagram hexagonal, but that's okay, because there's really nothing inherently hexagonal about Hexagonal Architecture; it's not a particularly descriptive name. Instead, I prefer the alternative name Ports and Adapters.

Ports and Adapters #

The only thing still bothering me with the above diagram is that the dependency hierarchy is too deep (at least conceptually). When the diagram consisted of concentric circles, it had three (onion) layers. The hexagonal dependency graph above still has those intermediary (grey) components, but as I've previously attempted to explain, the flatter the dependency hierarchy, the better.

The last step, then, is to flatten the dependency hierarchy of the inner hexagon:

The components in the inner hexagon have few or no dependencies on each other, while components in the outer hexagon act as Adapters between the inner components, and the application boundaries: its ports.
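
A small Python sketch may help make this concrete. Everything in it is hypothetical (a toy reservation example with invented names): the inner part defines a port and a piece of domain logic, a UI-side adapter and a data-side adapter sit at the boundaries without knowing about each other, and a composition function wires them together.

```python
from typing import Callable, Protocol


# --- Inner hexagon: domain logic plus the port it defines ---
class ReservationStore(Protocol):
    def reserved_seats(self, date: str) -> int: ...
    def save(self, date: str, seats: int) -> None: ...


def make_reservation(store: ReservationStore, capacity: int,
                     date: str, seats: int) -> bool:
    """Accept the reservation only if it fits within the capacity."""
    if store.reserved_seats(date) + seats > capacity:
        return False
    store.save(date, seats)
    return True


# --- Outer hexagon: adapters at the application boundaries ---
class DictReservationStore:
    """Data-side adapter; a dict stands in for a database."""
    def __init__(self) -> None:
        self._seats: dict[str, int] = {}

    def reserved_seats(self, date: str) -> int:
        return self._seats.get(date, 0)

    def save(self, date: str, seats: int) -> None:
        self._seats[date] = self.reserved_seats(date) + seats


def http_adapter(reserve: Callable[[str, int], bool]) -> Callable[[dict], int]:
    """UI-side adapter: turns a request body into a domain call and a status code.

    It knows nothing about how reservations are stored.
    """
    def handle(body: dict) -> int:
        accepted = reserve(body["date"], int(body["quantity"]))
        return 200 if accepted else 409
    return handle


# --- Composition: the only place that knows about both adapters ---
def compose(capacity: int) -> Callable[[dict], int]:
    store = DictReservationStore()
    return http_adapter(
        lambda date, seats: make_reservation(store, capacity, date, seats))
```

The two adapters never reference each other, which is the bulkhead rule from the previous section; both depend only on what the inner part defines.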

Summary #

In my book, I never explicitly named the architecture I describe, but essentially, it is the Ports and Adapters architecture. There are other possible application architectures than the variations described here, and some of them still work well with Dependency Injection, but the main architectural emphasis in Dependency Injection in .NET is Ports and Adapters, because I judged it to be the least foreign for the majority of the book's readers.

The reason I never explicitly called attention to Ports and Adapters or Onion Architecture in my book is that I only became aware of these pattern names as I wrote the book. At that time, I didn't feel confident that what I did matched those patterns, but the more I've learned, the more I've become convinced that this was what I'd been doing all along. This just confirms that Ports and Adapters is a bona fide pattern, because one of the attributes of patterns is that they materialize independently in different environments, and are then subsequently discovered as patterns.

