Open-source software is powering more commercial business than ever. R and Python in particular are seeing increased use in actuarial departments. Both R and Python are open-source themselves, and many of their libraries are also open-source. But did you know that their communities tend to take different approaches?
In this article you’ll learn what open-source means, the important variations that exist within it, and why people and businesses might embrace it. I’ll also give you a quick overview of some of the principles that are important to understand when using and producing open-source software, especially within an actuarial context.
What is open-source software?
Most people reading this will probably understand that ‘open-source’ refers to the transparency of the source code that is used to build a piece of software. This contrasts with closed-source, where the human-readable source code used to build a piece of software is kept secret.
Source code is what humans write, but it isn’t what computers read. A piece of software called a compiler translates the human-readable source code into binary code so that the computer can understand it. In the closed-source business model, the source code is a trade secret, and only the binary code is distributed to customers.
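To make the distinction concrete, here is a minimal Python sketch. The variable names and the one-line premium formula are invented for illustration. Note also that Python translates source into bytecode for its own interpreter rather than into native machine code, so this is an analogy for what a traditional compiler does, not an exact match.

```python
import dis

# Human-readable source: a hypothetical one-line premium calculation.
source = "premium = claims * (1 + loading)"

# compile() translates the text into a code object containing bytecode.
code_obj = compile(source, "<example>", "exec")

print(source)            # what a human writes and reads
print(code_obj.co_code)  # the raw bytes the Python interpreter executes
dis.dis(code_obj)        # a disassembly of those bytes, for the curious

# The compiled form still does the same job as the source.
namespace = {"claims": 100.0, "loading": 0.25}
exec(code_obj, namespace)
print(namespace["premium"])
```

The raw bytes are all a closed-source vendor would need to ship for the program to run, which is why the original text can stay secret.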
In the open-source model, the source code is treated as part of the product and is shared with (or is otherwise available to) the customer when they receive the machine-readable code. Typically, this is accompanied by the author giving lots of legal rights to their users through their choice of license.
Why do people release open-source software?
Most open-source software ends up being free, and it is not difficult to see why – if anybody can take your code without friction, then it puts a decisive downward pressure on the price you can get away with charging. It is therefore also quite easy to see why open-source software is popular amongst users. It’s less clear why somebody would want to write open-source software.
Here are a few possible reasons:
- The software is a passion project for those involved, and they are not particularly interested in monetary reward
- They have a shrewd business strategy where open-source will improve the bottom line
- They have a philosophical commitment to the open-source approach
- It may be the only legal way to release their software
The Passion Project
Programmers can be an eccentric bunch, and many of them first acquire programming as a hobby, often during childhood. I consider myself to be one of these people, and I can vouch for the fact that eliminating difficult or repetitive problems with software can be a uniquely satisfying experience. In such cases open-source makes a lot of sense – you want to engineer the best solution possible and so arbitrary barriers to getting help or feedback look unappealing.
The Shrewd Business Strategy
There are a lot of ways that open-source can be compatible with a good business plan.
There certainly have been businesses that have charged money directly for open-source software, often through careful and pragmatic licensing. However, the much more common way for open-source software to be monetised is to offer the software for free and instead charge for additional services around the software. Having the lowest possible barrier to acquire the software also helps to spread the brand and maximise revenue from the additional services or other lines of business.
There are other reasons that businesses might consider opening their source code, though. As well as increasing revenues, open-source might lead to cost reduction, especially if the source is publicly available and becomes popular. For example, the business may have ambitions of acquiring free help and testing from a community of benevolent strangers. Over the longer term they may hope to reduce the costs of hiring and training by ensuring the general workforce of programmers can get familiar with the technology they use without being on the payroll.
Ultimately, if the code itself doesn’t contain any major innovations, or its design is easy to infer from the behaviour of the software, then the benefits of adopting an open-source model may exceed the costs. However, it is also worth knowing that the practicalities of the open-source model don’t always line up entirely with the theory.
In the earlier days of open-source, people tended to dream of community-driven development; in reality, I think it’s fair to say that most open-source projects have a very focussed group of core developers who do the vast majority of the work – just like a closed-source project. While source code is more readable than binary, it still isn’t trivial to understand a large application, and being able to materially contribute to a code base does tend to require a fair investment of time.
A philosophical divide: permissive vs copyleft licenses
This is where things get quite interesting. The staunchest advocates for open-source software do tend to have one thing in common: an emphasis on liberty around software. Whether or not the software is free in terms of cost, many open-source developers believe that their customers should be able to use the software as they see fit with minimal restriction. Closed-source is a barrier to this freedom of use, as the software can then only be used in the ways allowed by the original author. Contrast this with open-source, where a user can make their own modifications, or extract useful modules of code for use in their own projects and so on.
However, as in many communities, there is a sectarianism behind the scenes that divides two broad factions of open-source developers, which ultimately boils down to having fundamentally different ideas of freedom. At the risk of enraging both factions through excessive reductionism, I would characterise their approaches as ‘business friendly’ versus ‘end-user friendly’.
These two attitudes are largely reflected in the licenses that you would commonly find protecting open-source code.
One group takes a very laissez-faire attitude and concludes that freedom of use must include the freedom to return to a closed-source model. In other words, just because I choose to open my source code to you, does not mean you must open your source code to another (even if it was derived from mine). This approach is reflected legally in permissive licenses. I would characterise this attitude as more ‘business friendly’ as it does not enforce the adoption of the open-source model on users. The most popular examples of permissive licenses are the MIT, BSD and Apache licenses.
On the other side of this debate, we have a more restrictive approach to open-source, which concludes that we must strive for equality of freedom rather than locally maximal freedom. The licenses that fall under this banner take away some of the freedom of each user to ensure that users further downstream do not lose the benefits of open-source. In practical terms, these licenses say that users can do whatever they want with the code, except redistribute it under a different (incompatible) license. These licenses are known as copyleft licenses. The most popular examples are the GPL family of licenses.
It is worth pointing out that these are just two broad attitudes, and in fact there are many dozens of off-the-shelf licenses available to developers, all with subtle differences. It is always necessary to consult the specific license for legal details; it is never enough to simply know which faction it belongs to.
An interesting point to note is that a business would not necessarily choose a ‘business friendly’ license. After all, other businesses are their competitors – a strategic copyleft license will deny other businesses the chance to take your source and relicense it as closed-source.
Software supplier or user?
There are really two relationships you can have with a license, depending on whether you are a supplier or user of the software. If you are supplying the software, then your choice of license reflects what you want your users to be able to do with it. Those who are habitually on the supply side are likely familiar with everything in this article and more already.
The more likely situation for a reader of this article is that you are a user of open-source software.
In theory, the particular licenses underpinning the software you use do not matter as long as you remain purely a user. This is because restrictions almost invariably arise around distribution, not use. That means that internal use of open-source software is pretty safe.
If, however, the software is deployed outside of the business, then you will need to start paying close attention to details, including the licenses used in the software and any of its dependencies (which could get very complicated). Broadly speaking, using copyleft code anywhere in a program that you distribute will require you to release that entire program under the same copyleft terms.
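As a starting point for that kind of review, the following Python sketch lists the license each installed package declares and sorts it into a rough bucket. It relies on the standard library's `importlib.metadata`. The function names and keyword lists are my own hypothetical choices, the matching is naive substring matching, and package metadata is often missing or inaccurate, so treat the output as a prompt for reading the actual licenses rather than as a legal conclusion.

```python
from importlib import metadata

# Hypothetical and deliberately incomplete keyword lists (assumptions).
# Naive substring matching can misfire, e.g. "MIT" also appears in "LIMITED".
COPYLEFT_HINTS = ("AGPL", "LGPL", "GPL", "GNU")
PERMISSIVE_HINTS = ("MIT", "BSD", "APACHE")


def classify_license(text):
    """Return 'copyleft', 'permissive' or 'unknown' for a license string."""
    upper = (text or "").upper()
    # Check copyleft first so that mixed descriptions err on the cautious side.
    if any(hint in upper for hint in COPYLEFT_HINTS):
        return "copyleft"
    if any(hint in upper for hint in PERMISSIVE_HINTS):
        return "permissive"
    return "unknown"


def installed_license_report():
    """Map each installed distribution name to a rough license bucket."""
    report = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") or "unknown"
        # Prefer the structured "License ::" classifiers where they exist.
        classifiers = meta.get_all("Classifier") or []
        license_lines = [c for c in classifiers if c.startswith("License ::")]
        declared = (
            " ".join(license_lines)
            or meta.get("License-Expression")
            or meta.get("License")
            or ""
        )
        report[name] = classify_license(declared)
    return report


if __name__ == "__main__":
    for name, verdict in sorted(installed_license_report().items()):
        print(f"{name}: {verdict}")
```

Anything reported as copyleft or unknown is a candidate for closer reading before the software leaves the business. The report only covers the packages installed in the current environment, so each deployment environment would need its own check.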
Once you understand the broad rules, it can be easy to oversimplify the legality of the licenses. In fact, there are lots of edge cases that we can consider, some of which have never really been fully tested in court. Most of the legal disputes would arise with regard to (a) what constitutes a ‘derived work’ and (b) what constitutes ‘distribution’.
A situation that might immediately spring to mind is whether the license used for a programming language itself might impact the licensing of programs written in that language.
When we say “the programming language itself” we are really talking about a compiler for that language. A compiler accepts human-readable code as input and produces machine-readable code as output. So on first pass, a program written in language X doesn’t immediately look like a derived work of the compiler for language X; it is just an input. In theory then the program can be licensed however you like.
There are some caveats though. In practice many compilers will ship with standard libraries, and those libraries are commonly released under the same license as the compiler. It really is quite challenging to write a program without using any standard libraries.
However, the most popular copyleft licenses tend to make exceptions for the use of standard libraries, such that importing them into a new program does not make that program into a ‘derived work’.
What about libraries in general? As usual, the specific license matters. However, there is a popular copyleft license (the LGPL) which was specifically created to add caveats for libraries, and this is widely used. For permissive licenses, it usually doesn’t matter.
Software as a Service
Software that is not installed locally but instead run on a third-party machine has become rapidly more popular in recent years. This would include things like web apps and cloud services. There is an obvious question about whether providing software in this way counts as distribution and would therefore cause copyleft licenses to kick in.
As usual, different licenses say different things. The consensus seems to be that if the license is not specific then this does not count as distribution, and most licenses are not specific (the AGPL being the main one to look out for). Providing a program through a web app would therefore not strictly require the provision of the source code under most copyleft licenses. Remember though that consensus can mean very little when it comes to legal arguments.
R and Python are two great examples of open-source technologies that actuaries are beginning to embrace, but there are many more. Most examples of open-source that I have encountered in my career have been teams of actuaries using these programming languages internally for various odds and ends, though I am also aware of large projects that have put R at their centre.
Given our long tradition of knowledge sharing across company lines, it is interesting to consider how our profession might continue to interact with the open-source approach. Insurers certainly feel like solid candidates to make use of it – after all, code quality is rarely what insurers compete on. It may make sense for insurers or consultancies to start supplying open-source software in an attempt to improve the quality of software throughout the industry, and this may be something that various stakeholders (e.g. the IFoA and PRA) would be interested in supporting.