ploeh blog danish software design
If you only have F# 3.1 installed on a machine, but need to use a compiled application that requires F# 3.0, here's what you can do.
This post uses a particular application, Zero29, as an example in order to explain a problem and one possible solution. However, the post isn't about Zero29, but rather about a particular F# DLL hell.
Currently, I'm repaving one of my machines, which is a good thing to do regularly, because it's a great remedy against the 'works on my machine' syndrome. This machine doesn't yet have a lot of software, but it does have Visual Studio 2013 and F# 3.1.
Working with a code base, I wanted to use Zero29 to increment the version number of the code, so first I executed:
$ packages/Zero22.214.171.124/tools/Zero29 -l
which promptly produced this error message:
Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly 'FSharp.Core, Version=4.3.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.
   at Ploeh.ZeroToNine.Program.main(String[] argv)
This is surprising, since I do have F# 3.1 (FSharp.Core 4.3.1.0) on my machine. Until the error message appeared, I had lived with the naïve assumption that when you install F# 3.1, it would automatically add redirects from FSharp.Core 4.3.0.0 to 4.3.1.0, or perhaps make sure that FSharp.Core 4.3.0.0 was also available. Apparently, I've become too used to Semantic Versioning, which is definitely not the versioning scheme used for F#.
Here's one way to address the issue.
Although Zero29 is my own (and contributors') creation, I didn't want to recompile it just to deal with this issue; it should also be usable for people with F# 3.0 on their machines.
Even though it's a compiled program, you can still add an application configuration file to it, so I created an XML file called Zero29.exe.config, placed it alongside Zero29.exe, and added this content:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="FSharp.Core"
                          publicKeyToken="b03f5f7f11d50a3a"
                          culture="neutral"/>
        <bindingRedirect oldVersion="4.3.0.0" newVersion="4.3.1.0"/>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
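If you need the same kind of redirect for more than one tool, the file can be generated rather than typed by hand. What follows is only a sketch in Python, not part of Zero29: the function name `binding_redirect_config` is made up for illustration, and the version numbers assume that F# 3.0 ships FSharp.Core 4.3.0.0 and F# 3.1 ships FSharp.Core 4.3.1.0.

```python
import xml.etree.ElementTree as ET

# Namespace used by the assemblyBinding element in .NET configuration files.
ASM_NS = "urn:schemas-microsoft-com:asm.v1"


def binding_redirect_config(assembly, public_key_token, old_version, new_version):
    """Return the text of an application configuration file that redirects
    one version of an assembly to another (hypothetical helper)."""
    return f"""<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="{ASM_NS}">
      <dependentAssembly>
        <assemblyIdentity name="{assembly}" publicKeyToken="{public_key_token}" culture="neutral"/>
        <bindingRedirect oldVersion="{old_version}" newVersion="{new_version}"/>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
"""


config_text = binding_redirect_config(
    "FSharp.Core", "b03f5f7f11d50a3a", "4.3.0.0", "4.3.1.0")

# Parsing the result is a cheap check that the generated file is well-formed.
ET.fromstring(config_text.encode("utf-8"))
print(config_text)
```

The generated text would then be saved next to the executable, under the executable's name with `.config` appended, as described above.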
This solved the problem, although I now have the derived problem that this new file isn't part of the Zero29 NuGet package, and I don't know if it's going to ruin my colleagues' ability to use Zero29 if I check it into source control...
Another option may be to add the redirect to machine.config, instead of an application-specific redirect, but I have no desire to manipulate my machine.config files if I can avoid it, so I didn't try that.
The NuGet Package Restore feature is a really bad idea; this post explains why.
One of the first things I do with a new installation of Visual Studio is to disable the NuGet Package Restore feature. There are many reasons for that, but it all boils down to this:
NuGet Package Restore introduces more problems than it solves.
Before I tell you about all those problems, I'll share the solution with you: check your NuGet packages into source control. Yes, it's that simple.
Storage implications #
If you're like most other people, you don't like that solution, because it feels inefficient. And so what? Let's look at some numbers.
- The AutoFixture repository is 28.6 MB, and that's a pretty big code base (181,479 lines of code).
- The Hyprlinkr repository is 32.2 MB.
- The Albedo repository is 8.85 MB.
- The ZeroToNine repository is 4.91 MB.
- The sample code repository for my new Pluralsight course is 69.9 MB.
- The repository for Grean's largest production application is 32.5 MB.
- Last year I helped one of my clients build a big, scalable REST API. We had several repositories, of which the largest one takes up 95.3 MB on my disk.
On my laptops I'm using Lenovo-supported SSDs, so they're fairly expensive drives. Looking up current prices, a rough estimate puts those disks at approximately 1 USD per GB.
On average, each of my repositories containing NuGet packages costs me four cents of disk drive space.
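The four-cent figure is easy to check. This Python sketch averages the repository sizes listed above and prices them at the rough estimate of 1 USD per GB. Treating 1 GB as 1024 MB is an assumption made here, and the dictionary keys are just shorthand labels for the repositories.

```python
# Repository sizes in MB, taken from the list above.
repo_sizes_mb = {
    "AutoFixture": 28.6,
    "Hyprlinkr": 32.2,
    "Albedo": 8.85,
    "ZeroToNine": 4.91,
    "Pluralsight sample code": 69.9,
    "Grean production application": 32.5,
    "Client REST API": 95.3,
}

USD_PER_GB = 1.0   # rough SSD price estimate from the text
MB_PER_GB = 1024   # assumption: binary gigabytes


def average_cost_usd(sizes_mb, usd_per_gb=USD_PER_GB):
    """Average disk cost, in USD, of storing one repository."""
    mean_mb = sum(sizes_mb) / len(sizes_mb)
    return mean_mb / MB_PER_GB * usd_per_gb


cost = average_cost_usd(list(repo_sizes_mb.values()))
print(f"{cost * 100:.1f} cents per repository")  # prints "3.8 cents per repository"
```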
Perhaps I could have saved some of this money with Package Restore...
Clone time #
Another problem that the Package Restore feature seems to address is the long time it takes to clone a repository, for example if you're on a shaky internet connection in a train. While it can be annoying to wait for a repository to clone, how often do you do that, compared to normal synchronization operations such as pull, push or fetch?
What should you be optimizing for? Cloning, which you do once in a while? Or fetch, pull, and push, which you do several times a day?
In most cases, the amount of time it takes to clone a repository is irrelevant.
To summarize so far: the problems that Package Restore solves are a couple of cents of disk cost, as well as making a rarely performed operation faster. From where I stand, it doesn't take a lot of problems before they outweigh the benefits - and there are plenty of problems with this feature.
The more moving parts you add to a system, the greater the risk of failure. If you use a Distributed Version Control System (DVCS) and keep all NuGet packages in the repository, you can work when you're off-line. With Package Restore, you've added a dependency on at least one package source.
- What happens if you have no network connection?
- What happens if your package source (e.g. NuGet.org) is down?
- What happens if you use multiple package sources (e.g. both NuGet.org and MyGet.org)?
This is a well-known trait of any distributed system: The system is only as strong as its weakest link. The more services you add, the higher the risk that something breaks.
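To put a rough number on that risk: if a build needs every one of its services to be up, and the services fail independently, the chance that the build can run is the product of the individual availabilities. The 99.9 % figures below are invented for illustration and are not measurements of NuGet.org, MyGet.org, or any other host. The independence assumption is a simplification too.

```python
from math import prod


def combined_availability(availabilities):
    """Probability that all services are up at the same time,
    assuming they fail independently of each other."""
    return prod(availabilities)


# Hypothetical availabilities: the source control host alone, and then
# the same host plus two package sources that Package Restore depends on.
source_control_only = combined_availability([0.999])
with_two_package_sources = combined_availability([0.999, 0.999, 0.999])

print(f"{source_control_only:.4f}")       # 0.9990
print(f"{with_two_package_sources:.4f}")  # 0.9970
```

Under these assumptions the combined figure is never better than the weakest link, and usually worse.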
Custom package sources #
NuGet itself is a nice system, and I often encourage organizations to adopt it for internal use. You may have reusable components that you want to share within your organization, but not with the whole world. In Grean, we have such components, and we use MyGet to host the packages. This is great, but if you use Package Restore, now you depend on multiple services (NuGet.org and MyGet.org) to be available at the same time.
While MyGet is a nice and well-behaved NuGet host, I've also worked with internal NuGet package sources, set up as an internal service in an organization. Some of these are not as well-behaved. In one case, 'old' packages were deleted from the package source, which had the consequence that when I later wanted to use an older version of the source code, I couldn't complete a Package Restore because the package with the desired version number was no longer available. There was simply no way to build that version of the code base!
One of the many nice things about a DVCS is that you can xcopy your repository and move it to another machine. You can also copy it and give it to someone else. You could, for example, zip it and hand it over to an external consultant. If you use Package Restore and internal package sources, the consultant will not be able to compile the code you gave him or her.
Perhaps you don't use external consultants, but maybe you set up a new developer machine once in a while. Perhaps you occasionally get a new colleague, who needs help with setting up the development environment. Particularly if you use custom package feeds, making it all work is yet another custom configuration step you need to remember.
Bandwidth cost #
As far as I've been able to tell, the purpose of Package Restore is efficiency. However, every time you compile with Package Restore enabled, you're using the network.
Consider a Build Server. Every time it makes a build, it should start with a clean slate. It can get the latest deltas from the shared source control repository, but it should start with a clean working folder. This means that every time it builds, it'll need to download all the NuGet packages via Package Restore. This not only wastes bandwidth, but takes time. In contrast, if you keep NuGet packages in the repository itself, the Build Server has everything it needs as soon as it has the latest version of the repository.
The same goes for your own development machine. Package Restore will make your compile process slower.
Finally, Package Restore simply doesn't work very well. Personally, I've wasted many hours troubleshooting problems that turned out to be related to Package Restore. Allow me to share one of these stories.
Recently, I encountered this sight when I opened a solution in Visual Studio:
My problem was that at first, I didn't understand what was wrong. Even though I store NuGet packages in my repositories, all of a sudden I got this error message. It turned out that this happened at the time when NuGet switched to enabling Package Restore by default, and I hadn't gotten around to disabling it again.
The strange thing was that everything compiled and worked just great, so why was I getting that error message?
After much digging around, it turned out that the ImpromptuInterface.FSharp package was missing a .nuspec file. You may notice that ImpromptuInterface.FSharp is also missing in the package list above. All binaries, as well as the .nupkg file, were in the repository, but the ImpromptuInterface.FSharp.1.2.13.nuspec was missing. I hadn't noticed for weeks, because I didn't need it, but NuGet complained.
After I added the appropriate .nuspec file, the error message went away.
The resolution to this problem turned out to be easy, and benign, but I wasted an hour or two troubleshooting. It didn't make me feel productive at all.
This story is just one among many run-ins I've had with NuGet Package Restore, before I decided to ditch it.
Just say no #
The Package Restore feature solves these problems:
- It saves a nickel per repository in storage costs.
- It saves time when you clone a new repository, which you shouldn't be doing that often.
On the other hand, it
- adds complexity
- makes it harder to use custom package sources
- couples your ability to compile to having a network connection
- makes it more difficult to copy a code base
- makes it more difficult to set up your development environment
- uses more bandwidth
- leads to slower build times
- just overall wastes your time
For me, the verdict is clear. The benefits of Package Restore don't warrant the disadvantages. Personally, I always disable the feature and instead check in all packages in my repositories. This never gives me any problems.
My new Pluralsight course, A Functional Architecture with F#, is now available.
Whenever I've talked to object-oriented developers about F#, a common reaction has been that it looks enticing, but that they don't see how they'd be able to build a 'normal' application with it. F# has gained a reputation for being a 'niche' language, good for scientific computation and financial calculations, but not useful for mainstream applications.
Not only is F# a Turing-complete, general purpose programming language, but it has many advantages to offer compared to, say, C#. That said, though, building a 'normal' application with F# will only make sense if you know how to work with the language, and define an architecture that takes advantage of all it has to offer. Therefore, I thought that it would be valuable to show one possible way to do this, through a comprehensive example.
If you don't already have a Pluralsight account, you can get a free trial of up to 200 minutes.
A fully RESTful API often looks inefficient from a client perspective, until you learn to change that perspective.
One of my readers, Filipe Ximenes, asks the following question of me:
"I read you post about avoiding hackable urls and found it very interesting. I'm currently studying about REST and I'm really interested on building true RESTful API's. One thing that is bothering me is how to access resources that are not in the API root. Eg: consider the following API flow:
root > users > user details > user messages
"Now consider that one client wants to retrieve all the messages from a user. Does it need to "walk" the whole API (from it's root to "user messages")? This does not seem very efficient to me. Am I missing something? What would be a better solution for this?"
This is a common question which isn't particularly tied to avoiding hackable URLs, but simply to the hypermedia nature of a level 3 RESTful API.
The short answer is that it's probably not particularly inefficient. There are several reasons for that.
HTTP caching #
One of the great advantages of RESTful design is that instead of abstracting HTTP away, it very explicitly leverages the protocol. HTTP has built-in caching, so even if an API forces a client to walk the API as in the question above, it could conceivably result in only a single HTTP request:
This cache could be anywhere between the client and the service. It could be a proxy server, a reverse proxy, or it could even be a local cache on the client machine; think of a browser's local cache. It could be a combination of all of those caches. Conceivably, if a local cache is involved, a client could walk the API as described above with only a single (or even no) network request involved, because most of the potential requests would be cache hits.
This is one of the many beautiful aspects of REST. By leveraging the HTTP protocol, you can use the internet as your caching infrastructure. Even if you want a greater degree of control, you can use off-the-shelf software for your caching purposes.
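To make the effect of a local cache concrete, here is a deliberately naive Python simulation. It's a sketch under loud assumptions: the `API` dictionary is a made-up stand-in for the service, where each resource simply names the next link to follow, and `CachingClient` never expires or revalidates anything, which a real HTTP cache would do based on response headers.

```python
class CachingClient:
    """Hypothetical client with a naive local cache (no expiry, no validation)."""

    def __init__(self, fetch):
        self._fetch = fetch          # function that performs the 'network' request
        self._cache = {}
        self.network_requests = 0

    def get(self, url):
        if url not in self._cache:   # cache miss: go to the network
            self.network_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


# Made-up API: each resource points to the next step in the flow
# root > users > user details > user messages.
API = {
    "/": "/users",
    "/users": "/users/1234",
    "/users/1234": "/users/1234/messages",
    "/users/1234/messages": None,
}


def walk(client, start="/"):
    """Follow links from the start URL and return the last URL visited."""
    url, last = start, None
    while url is not None:
        last = url
        url = client.get(url)
    return last


client = CachingClient(API.__getitem__)
walk(client)
print(client.network_requests)  # 4: the first walk hits the network every time
walk(client)
print(client.network_requests)  # still 4: the second walk is served from the cache
```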
Cool URLs #
As the RESTful Web Services Cookbook describes, URLs should be cool. This means that once you've given a URL to a client, you should honour requests for that URL in the future. This means that clients can 'bookmark' URLs if they like. That includes the final URL in the flow above.
Short-cut links #
Finally, an API can provide short-cut links to a client. Imagine, for example, that when you ask for a list of users, you get this:
<users xmlns:atom="http://www.w3.org/2005/Atom">
  <user>
    <links>
      <atom:link rel="user-details" href="/users/1234" />
      <atom:link rel="user-messages" href="/users/1234/messages" />
    </links>
    <name>Foo</name>
  </user>
  <user>
    <links>
      <atom:link rel="user-details" href="/users/5678" />
      <atom:link rel="user-messages" href="/users/5678/messages" />
    </links>
    <name>Bar</name>
  </user>
  <user>
    <links>
      <atom:link rel="user-details" href="/users/9876" />
      <atom:link rel="user-messages" href="/users/9876/messages" />
    </links>
    <name>Baz</name>
  </user>
</users>
As you can see in this example, a list of users can provide a short-cut to a user's messages, enabling a client to follow a more direct path:
root > users > user messages
The client would have to prioritize links of the relationship type user-messages over links of the user-details type.
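On the client side, that prioritization could look something like the following Python sketch. The function name `preferred_link` is made up for illustration, and the sample document is a shortened variant of the one above in which the second user deliberately has no user-messages link, to show the fallback to user-details.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Shortened variant of the users document; 'Bar' lacks a user-messages link
# on purpose (an assumption made for this example).
USERS_XML = """<users xmlns:atom="http://www.w3.org/2005/Atom">
  <user>
    <links>
      <atom:link rel="user-details" href="/users/1234" />
      <atom:link rel="user-messages" href="/users/1234/messages" />
    </links>
    <name>Foo</name>
  </user>
  <user>
    <links>
      <atom:link rel="user-details" href="/users/5678" />
    </links>
    <name>Bar</name>
  </user>
</users>"""


def preferred_link(user, preferred_rels=("user-messages", "user-details")):
    """Return (rel, href) for the first preferred relationship type the
    user element offers, or None if it offers none of them."""
    links = {link.get("rel"): link.get("href")
             for link in user.iter(f"{{{ATOM}}}link")}
    for rel in preferred_rels:
        if rel in links:
            return rel, links[rel]
    return None


users = ET.fromstring(USERS_XML)
choices = {u.findtext("name"): preferred_link(u) for u in users.findall("user")}
print(choices)
```

The client only knows relationship types, not URL templates, so it still follows links rather than constructing URLs itself.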
Efficiency is a common concern about HATEOAS systems, particularly because a client should always start at a published URL. Often, the only published URL is the root URL, which forces the client to walk the rest of the API. This seems inefficient, but doesn't have to be because of all the other built-in mechanisms that work to effectively counter what at first looks like an inefficiency.
Hyprlinkr 1.0.0 is released.
According to the definition of Semantic Versioning, Hyprlinkr has been in pre-release for more than a year. With the release of ASP.NET Web API 2, I thought it was a good occasion to look at a proper release version.
I've tested Hyprlinkr against Web API 2, and apart from some required assembly redirects, it passes all tests against Web API 2 as well as Web API 1. Being able to support both Web API 1 and 2 is important, I think, because not everyone will be able to migrate to Web API 2 right away.
Since Hyprlinkr is finally out of pre-release mode, it also means that no breaking changes will be introduced before Hyprlinkr 2, which isn't even on the drawing board yet. Since this constitutes a contract, I also trimmed down the API a bit before releasing Hyprlinkr 1.0.0, but all the essential methods are still available.
Introducing ZeroToNine, a tool for maintaining .NET Assembly versions across multiple files.
When working with Semantic Versioning in my .NET projects, I prefer to explicitly update the version information in all relevant AssemblyInfo files. However, doing that by hand is quite tedious when you have many AssemblyInfo files, so instead, I rely on an automated tool.
Zero29 -i minor
This increments the minor version in all AssemblyInfo files in all subdirectories beneath your present working directory.
This is great, because it enables me to do a complete pull of a pull request, build it and run all tests, assign a new version, and push it, without ever leaving the command-line. Since I already do all my Git work in Git Bash, modifying the AssemblyVersion files was the last step I needed to make available from the command line. The main logic is implemented in a library, so if you don't like command-line tools, but would like to build another tool based on ZeroToNine, you can do that too.
It's available via NuGet, and is written in F#.
When you use Semantic Versioning with Continuous Deployment, version numbers must be checked into source control systems by programmers.
If you aren't already using Semantic Versioning, you should. It makes it much easier to figure out how to version your releases. Even if you're 'just' building software for your internal organization, or a single customer, you should still care about versioning of the software you release. Instead of an ad-hoc versioning scheme, Semantic Versioning offers a set of easy-to-understand rules about when to increment which version number.
In short, you
- increment the patch version (e.g. from 2.3.4 to 2.3.5) when you only release bug fixes and the like
- increment the minor version (e.g. from 1.3.2 to 1.4.0) when you add new features
- increment the major version (e.g. from 3.2.9 to 4.0.0) when you introduce breaking changes
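The three rules are mechanical once the kind of change is known. Here's a small Python sketch of them. The `Version` type and the change labels "fix", "feature", and "breaking" are names invented for this example, not part of any Semantic Versioning tooling.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    def __str__(self):
        return f"{self.major}.{self.minor}.{self.patch}"


def bump(version, change):
    """Return the next version for a given kind of change.
    Lower-order numbers reset to zero when a higher-order number increments."""
    if change == "breaking":
        return Version(version.major + 1, 0, 0)
    if change == "feature":
        return Version(version.major, version.minor + 1, 0)
    if change == "fix":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown kind of change: {change!r}")


print(bump(Version(2, 3, 4), "fix"))       # 2.3.5
print(bump(Version(1, 3, 2), "feature"))   # 1.4.0
print(bump(Version(3, 2, 9), "breaking"))  # 4.0.0
```

Deciding which of the three labels applies is the part that can't be automated.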
Continuous Deployment #
While Semantic Versioning is great, it requires a bit of consideration when combined with Continuous Deployment. Every time you deploy a new version, you should increment the version number.
Continuous Delivery and Continuous Deployment rely on automation. A code check-in triggers an automated build, which is subsequently processed by a Deployment Pipeline, and potentially released to end-users. Each released (or releasable) build should have a unique version.
Traditionally, Build Servers have had the responsibility of incrementing version numbers - typically by incrementing a build number, like this:
Unfortunately, this versioning scheme is wrong if you combine Semantic Versioning with Continuous Deployment. Even if you throw away the fourth build number, you're left with a sequence like this:
- 3.7.11 (bug fix)
- 3.7.12 (partial new feature, hidden behind a Feature Toggle.)
- 3.7.13 (performance improvement)
- 3.7.14 (completed feature initiated in 3.7.12)
- 3.7.15 (breaking changes in public API)
With Semantic Versioning, the same sequence of releases might instead look like this:
- 3.7.11 (bug fix)
- 3.7.12 (partial new feature, hidden behind a Feature Toggle.)
- 3.7.13 (performance improvement)
- 3.8.0 (completed feature initiated in 3.7.12)
- 4.0.0 (breaking changes in public API)
Versioning is a programmer decision #
With Continuous Deployment, every time you integrate code (check in, merge, rebase, whatever), you produce a version of the software that will be deployed. This means that every time you integrate, something or somebody should assign a new version to the software.
The rules of Semantic Versioning require explicit decisions to be made. Only the development team understands what a particular commit contains. Is it a fix? Is it a new feature? Is it a breaking change? A Build Server doesn't know how to answer these questions, but you do.
A few years ago, I changed the delivery scheme for my open source project AutoFixture to use Semantic Versioning with Continuous Deployment. When I did that, I realised that I could no longer rely on a Build Server for controlling the version. Instead, I would have to explicitly control the versioning as part of the commit process.
Because AutoFixture is a .NET project, I decided to use the version assignment mechanism already present in the framework: The [AssemblyVersion] and [AssemblyFileVersion] attributes that you typically put in AssemblyInfo files.
The version control system used for AutoFixture is Git, so it works like this in practice:
- A programmer adds one or more commits to a branch.
- The programmer sends a pull request.
- I pull down the commits from the pull request.
- I increment all the version attributes in all the AssemblyInfo files, and commit that change.
- I push the commits to master.
- The Build Server picks up the new commits, and the Deployment Pipeline kicks in.
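The step where the version attributes are incremented is simple text manipulation. As an illustration only, here is a Python sketch that increments the minor version in the text of a C#-style AssemblyInfo file. It is not how any particular tool is implemented; the regular expression and the function name `increment_minor` are assumptions made for this example, and the sketch only handles purely numeric versions.

```python
import re

# Matches AssemblyVersion("1.2.3") and AssemblyFileVersion("1.2.3.4");
# an optional fourth number is preserved unchanged.
VERSION_ATTRIBUTE = re.compile(
    r'(Assembly(?:File)?Version\(")(\d+)\.(\d+)\.(\d+)((?:\.\d+)?"\))')


def increment_minor(assembly_info_text):
    """Increment the minor version and reset the patch version
    in every matching attribute."""
    def replace(match):
        major, minor = int(match.group(2)), int(match.group(3))
        return f"{match.group(1)}{major}.{minor + 1}.0{match.group(5)}"

    return VERSION_ATTRIBUTE.sub(replace, assembly_info_text)


before = ('[assembly: AssemblyVersion("3.7.13.0")]\n'
          '[assembly: AssemblyFileVersion("3.7.13.0")]\n')
after = increment_minor(before)
print(after)
```

Because the result is an ordinary change to files in the working tree, it can be committed together with the rest of the pull request.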
After more than two years of experience with this way of controlling software versions, I'm consistently using this approach for all my open source software, as well as the internal software we create in Grean.
If you want to use Continuous Deployment (or Delivery) with Semantic Versioning, the assignment of a new version number is a programmer decision. Only a human understands when a commit constitutes a bug fix, a new feature, or a breaking change. The new version number must be committed to the version control system, so that whoever or whatever compiles and/or releases the software will always use the same version number for the same version of the source code.
The version number is kept in the source control system, together with the source code. It's not the realm of a Build Server.
If you apply the Dependency Inversion Principle to Layered Architecture, you end up with Ports and Adapters.
One of my readers, Giorgio Sala, asks me:
In his book "Implementing DDD" mr Vernon talks a lot about the Ports and Adapter architecture as a next level step of the Layered architecture. I would like to know your thinking about it.
The short answer is that this is more or less the architecture I describe in my book, although in the book, I never explicitly call it out by that name.
Layered architecture #
In my book, I describe the common pitfalls of a typical layered architecture. For example, in chapter 2, I analyse a typical approach to layered architecture; it's an example of what not to do. Paraphrased from the book's figure 2.13, the erroneous implementation creates this dependency graph:
The arrows show the direction of dependencies; i.e. the User Interface library depends on the Domain library, which in turn depends on the Data Access library. This violates the Dependency Inversion Principle (DIP), because the Domain library depends on the Data Access library, and the DIP says that:
Abstractions should not depend upon details. Details should depend upon abstractions.
Later in chapter 2, and throughout the rest of my book, I demonstrate how to invert the dependencies. Paraphrased, figure 2.12 looks like this:
This is almost the same figure as the previous, but notice that the direction of dependency has changed, so that the Data Access library now depends on the Domain library, instead of the other way around. This is the DIP applied: the details (UI, Data Access) depend on the abstractions (the Domain Model).
Onion layers #
The example from chapter 2 in my book is obviously simplified, with only three libraries involved. Imagine a generalized architecture following the DIP:
While there are many more libraries, notice that all dependencies still point inwards. If you're still thinking in terms of layers, you can draw concentric layers around the boxes:
The DIP still applies, so dependencies can only go in one direction. However, it would seem that I've put the UI components (the orange boxes) and the Data Access components (the blue boxes) in the same layer. (Additionally, I've added some yellow boxes that might symbolise unit tests.) This may seem unfamiliar, but actually makes sense, because the components in the outer layer are all at the boundaries of the application. Some boundaries (such as UI, RESTful APIs, message systems, etc.) face outward (to the internet, extranets, etc.), while other boundaries (e.g. databases, file systems, dependent web services, etc.) face inward (to the OS, database servers, etc.).
As the diagram implies, components can depend on other components within the same layer, but does that mean that UI components can talk directly to Data Access components?
Hexagonal architecture #
While traditional Layered Architecture is no longer the latest fad, it doesn't mean that all of its principles are wrong. It's still not a good idea to allow UI components to depend directly on the Data Access layer; it would couple such components together, and you might accidentally bypass important business logic.
You have probably noticed that I've grouped the orange, yellow, and blue boxes into separate clusters. This is because I still want to apply the old rule that UI components must not depend on Data Access components, and vice versa. Therefore, I introduce bulkheads between these groups:
You may feel that I cheated a bit in order to make my diagram hexagonal, but that's okay, because there's really nothing inherently hexagonal about Hexagonal Architecture; it's not a particularly descriptive name. Instead, I prefer the alternative name Ports and Adapters.
Ports and Adapters #
The only thing still bothering me with the above diagram is that the dependency hierarchy is too deep (at least conceptually). When the diagram consisted of concentric circles, it had three (onion) layers. The hexagonal dependency graph above still has those intermediary (grey) components, but as I've previously attempted to explain, the flatter the dependency hierarchy, the better.
The last step, then, is to flatten the dependency hierarchy of the inner hexagon:
The components in the inner hexagon have few or no dependencies on each other, while components in the outer hexagon act as Adapters between the inner components, and the application boundaries: its ports.
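In code, the arrangement might look like the following Python sketch. Every name in it (`ReservationRepository`, `ReservationService`, `InMemoryReservationRepository`) is hypothetical and chosen only for illustration. The point is the direction of dependency: the inner component defines the abstraction it needs, and the adapter at the boundary implements it.

```python
from typing import Protocol


# Inner hexagon: the domain model defines the port it needs.
class ReservationRepository(Protocol):
    def count_reserved(self, date: str) -> int: ...
    def save(self, date: str, quantity: int) -> None: ...


class ReservationService:
    """Domain logic; depends only on the port, never on a concrete adapter."""

    def __init__(self, repository: ReservationRepository, capacity: int):
        self._repository = repository
        self._capacity = capacity

    def try_reserve(self, date: str, quantity: int) -> bool:
        if self._repository.count_reserved(date) + quantity > self._capacity:
            return False
        self._repository.save(date, quantity)
        return True


# Outer hexagon: an adapter that implements the port. A database-backed
# adapter would have the same shape; this one keeps data in memory.
class InMemoryReservationRepository:
    def __init__(self):
        self._reserved = {}

    def count_reserved(self, date: str) -> int:
        return self._reserved.get(date, 0)

    def save(self, date: str, quantity: int) -> None:
        self._reserved[date] = self.count_reserved(date) + quantity


service = ReservationService(InMemoryReservationRepository(), capacity=10)
print(service.try_reserve("2013-12-03", 6))  # True
print(service.try_reserve("2013-12-03", 5))  # False: would exceed capacity
```

A UI or REST adapter would call `ReservationService` in the same way, without knowing which repository adapter sits behind it.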
In my book, I never explicitly named the architecture I describe, but essentially, it is the Ports and Adapters architecture. There are other possible application architectures than the variations described here, and some of them still work well with Dependency Injection, but the main architectural emphasis in Dependency Injection in .NET is Ports and Adapters, because I judged it to be the least foreign for the majority of the book's readers.
The reason I never explicitly called attention to Ports and Adapters or Onion Architecture in my book is that I only became aware of these pattern names as I wrote the book. At that time, I didn't feel confident that what I did matched those patterns, but the more I've learned, the more I've become convinced that this was what I'd been doing all along. This just confirms that Ports and Adapters is a bona fide pattern, because one of the attributes of patterns is that they materialize independently in different environments, and are then subsequently discovered as patterns.
Unfortunately, I've had to cancel my speaking engagement at NDC London 2013.
Ever since I was accepted as a speaker at NDC London 2013, I've really looked forward to it. Unfortunately, due to serious illness in my family, I've decided to cancel my appearance at the conference. This has been a difficult decision to make, but now that I've made it, I can feel that it's the right decision, even though it pains me.
I hope to be able to return to NDC another time, both in Oslo and in London.
If you visit my Lanyrd profile, you will see that while I have removed myself from NDC London, I'm still scheduled to speak at the Warm Crocodile Developer Conference in January 2014. I still hope to be able to attend and speak here. Not only is it realistic to hope that my family situation is better in January, but because the conference is in my home town, it also means that it puts less of a strain on my family. This may change, though...
Albedo is a .NET library targeted at making Reflection programming more consistent, using a common set of abstractions and utilities.
It's a .NET library targeted at making Reflection programming more consistent, using a common set of abstractions and utilities. The project site may actually contain more details than you'd care to read, but here's a small code sample to either whet your appetite, or scare you away:
PropertyInfo pi = from v in new Properties<Version>()
                  select v.Minor;
var version = new Version(2, 7);
var visitor = new ValueCollectingVisitor(version);
var actual = new PropertyInfoElement(pi).Accept(visitor);
Assert.Equal(version.Minor, actual.Value.OfType<int>().First());
Albedo is available via NuGet.