Structuring Actuarial Models
The challenge
As actuaries, much of our work involves the production, use and maintenance of cash flow models. Many such models will be built in MS Excel – the natural ally of the actuary. Excel models are generally easy to understand, easy to build, and easy to test. There are exceptions to this rule, of course, but the amount of Excel expertise required to build decent models is relatively low compared with models written purely in code.
Unfortunately, Excel suffers from scaling problems and can get very slow as models become more complex. In the actuarial world, it is common to want to model an unknown number of factors, such as funds held by a unit-linked policy, where the number of funds may change over time. Excel models that try to handle such multiplicities are rarely elegant.
A common alternative is to build models in VBA, VB or something similar. (While the focus of this article is on code-based models, many of the points made apply to models developed in proprietary actuarial software platforms such as Prophet and MoSes.) This approach is much more potent but suffers from several drawbacks which are sometimes prohibitive. For starters, there is rarely more than a functional knowledge of coding in actuarial teams, and this makes people rightfully worried about the risk of key person dependency. Furthermore, it is not easy to get into a model written purely in code and see what is going on.
Often, though, these drawbacks will be more than offset by the gains, and a model will be built using VBA or similar. Because the priority for the business will be to get the model working and tested in time for a deadline, the quality of the code is not usually the top priority – and perhaps rightly so. Unfortunately, this means that code-based models built by the typical actuarial team have the potential to be somewhat:
- Difficult to understand
- Difficult to peer review
- Difficult to explain
- Difficult to maintain
- Difficult to extend
- Difficult to test
This is not an attempt to pretentiously deride such models. It takes a considerable amount of time and knowledge to build a model that does not have any of the above limitations, and it takes even more time and knowledge to keep it that way over a long period and across multiple owners. Even coding enthusiasts and experts will make numerous poor decisions when building or maintaining a model. It would simply be uneconomical to devote the additional time required to make the code perfect.
Unfortunately, the result is that over time models tend to become more and more difficult to work with. The question is: what measures can realistically be taken to stave off the above characteristics?
There is a unique challenge here in that there is very little authoritative advice on what to do. There are plenty of conventions for writing good code, but the more specific advice tends also to be specific to the coding language and application being built. Whilst some mantras are quite universal (e.g. “A lot more time is spent reading code than writing code”), very little is tailored to an actuarial context, for obvious reasons. Outside of software development professionals, the emphasis tends to be more on getting answers than ensuring the code is immaculately organised.
What to aim for
So let’s try to formulate some rules to follow. First, let us distil some criteria by which we can assess code quality:
- The code must be easy to understand
- The code must be easy to test
- The code must be easy to maintain
- The code must maintain these qualities in the face of inevitable code changes
The fourth point is a tough one. It is a problem faced continuously by software developers who need to share code amongst multiple people and have it still be coherent.
Different approaches to models
Let us consider two approaches to organising the code of a model built in VBA, which is probably the most familiar coding platform for an actuary who does not already have a very good grip on these problems: a monolithic approach, where all the code resides in one subroutine, and a modular approach, whereby code is spread across multiple modules and each module is separated into multiple subroutines and functions.
You might reasonably suspect that the modular approach is superior to the monolithic approach in terms of meeting the criteria outlined before. There are indeed many arguments supporting this view. However, it’s not all sunshine and rainbows just because the code has been split across modules. Code that is only notionally modular, and is in fact just fragmented without thought, can be extremely difficult to work with. Not only is the code more difficult to follow, but if the code needs to be changed there are now many more choices to make:
- Which module should you make the change to?
- Which function or subroutine should you make the change to? Should you create a new function?
- How should you pass the inputs to your new function?
- How should you name the function?
- How should you test the function?
- How should you test that the new function hasn’t broken the existing code?
This essentially means there are now many ways to make the code worse and only a few to make it better or even just preserve the quality.
Hopefully you don’t need to be convinced that monolithic code is not the solution to these problems – it tends to just sweep them under the rug. That said, it is at least possible to read monolithically organised code from top to bottom and be confident that you have not missed anything. That’s not to say that monolithic code cannot be well-written. However, it tends to scale poorly with increasing complexity, and it is therefore very difficult to satisfy aim #4 in our initial list without considerable expertise available.
So, let’s take for granted that a modular approach has more potential to be fruitful, but also perhaps more potential to be very bad. It has already been suggested that modularity can cause problems because of too much freedom to choose – it is generally much easier to make code worse than it is to make it better. So, what is needed are some ground rules that help to filter out the bad choices – we suggest some towards the end of this article. These rules can then be followed relatively mindlessly, and only deviated from once they are understood well enough to properly justify making exceptions.
Pure functions
A good idea when building actuarial models is to write code as a set of pure functions. Pure functions do not have any side effects; they take inputs and give an output, and that is all. Furthermore, if you put the same arguments into a pure function twice you will always get the same result – they do not depend on the external environment. Excel’s worksheet functions are (with a few exceptions such as RAND and NOW) pure: each cell depends only on the cells it references. It is not a coincidence that any reasonably well-built Excel model is also usually self-explanatory. This is one of the hallmarks of good code – the need for comments is minimised.
Pure functions tend to be small and simple. This means that the solution will be well articulated in the code and hence will be easy to understand and test. Sticking to pure functions means avoiding global variables, and the passing of arguments “by reference”. These things make it generally difficult to follow the code logic, and more seriously make it impossible to know what happens throughout the lifetime of a variable without reading all the code.
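As a minimal illustration (the names and figures below are invented for the example), compare a routine that updates a global variable with a pure function that simply maps its inputs to an output:

    ' Impure: the result depends on, and changes, state outside the routine.
    Public running_total As Double

    Public Sub AddPremium(ByVal premium As Double)
        running_total = running_total + premium
    End Sub

    ' Pure: the output depends only on the arguments, and nothing else is touched.
    ' Declaring the arguments ByVal makes explicit that they cannot be modified.
    Public Function accumulated_value(ByVal opening_value As Double, _
                                      ByVal premium As Double, _
                                      ByVal interest_rate As Double) As Double
        accumulated_value = (opening_value + premium) * (1 + interest_rate)
    End Function

Calling accumulated_value(100, 10, 0.05) returns 115.5 every time, regardless of anything else the model has done, which is exactly what makes it easy to reason about and to test.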
Bringing it all together
There are some potential issues with using pure functions. One is that you can end up with a large number of trivial functions which are involved in a complex cascade of calls. This means testing the correct integration of the functions becomes tricky. Furthermore, the large number of functions organised in some sort of hierarchy can make it tedious to do something simple like adding a new argument. If the argument needs to be added somewhere far down the hierarchy, then every function involved in the call sequence also needs to include that argument. This can be error prone and involve a lot more typing than is strictly necessary.
The first issue can be solved by grouping the functions at an appropriate level, and performing a partial integration test on the group. There is a natural way of doing this that is somewhat analogous to the organisation of a model in Excel – use a separate module as you would a fresh worksheet in Excel. So, depending on the complexity of the model, you might create a new module for each distinct cash flow that is being modelled. The individual functions that help to calculate the cash flow can be integrated into one function which can then be tested to ensure that the different functions that it makes calls to have been integrated correctly.
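As a rough sketch of this idea (the module and function names are hypothetical), a Growth module might expose a single public function which integrates its private helpers; that public function then becomes the natural target for a partial integration test:

    ' Standard module: Growth
    ' One public entry point, integrating the private helpers below.
    Public Function grown_value(ByVal opening_value As Double, _
                                ByVal annual_rate As Double, _
                                ByVal months As Integer) As Double
        grown_value = opening_value * growth_factor(monthly_rate(annual_rate), months)
    End Function

    ' Private helpers: small, pure, and only visible within this module.
    Private Function monthly_rate(ByVal annual_rate As Double) As Double
        monthly_rate = (1 + annual_rate) ^ (1 / 12) - 1
    End Function

    Private Function growth_factor(ByVal rate As Double, ByVal months As Integer) As Double
        growth_factor = (1 + rate) ^ months
    End Function

A test of Growth.grown_value then checks that the helpers have been wired together correctly, without needing a separate test for every trivial private function.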
The problem of arguments may seem fundamental to the approach, but there are some elegant ways around it. One is to group data into structures or classes, and pass those into functions instead of passing a long list of variables. This generally saves on typing and means that adding a new argument to a function can be as simple as adding a new member to a class, and then just pulling it out in the function. This requires some basic knowledge of how to use classes and/or structures, but it is certainly worth the investment of time learning.
Using classes or structures has a minor drawback in that declaring a function as function(big_class) is not that helpful when reading the code. You don’t know immediately from looking at the declaration what data is going to be utilised in the function – only that it resides in big_class. A good solution might be to unpack the class one level up from where you intend to use the data, such that you have something like big_function(big_class) = little_function(big_class.item). That way it is clear in big_function that you are going to be using item from big_class in the call to little_function. In the declaration for little_function, you can then describe what variable is actually going to be used. There is some room for judgement here – the earlier you start unpacking your classes, the more work you must do when making a change. The later you start doing it, the less clear your functions become. For actuarial models, it is rarely a critical choice.
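A minimal sketch of both ideas (the class and member names are invented for illustration): a Policy class module groups the data that would otherwise be a long argument list, and the unpacking happens one level up, so that the lower-level function declares exactly what it uses:

    ' Class module: Policy
    Public premium As Double
    Public term_years As Integer
    Public amc_rate As Double

    ' Standard module
    ' The higher-level function accepts the whole class and unpacks only what it needs...
    Public Function annual_charge(p As Policy) As Double
        annual_charge = charge_on_premium(p.premium, p.amc_rate)
    End Function

    ' ...so this declaration makes clear exactly which pieces of data are used.
    Private Function charge_on_premium(ByVal premium As Double, ByVal amc_rate As Double) As Double
        charge_on_premium = premium * amc_rate
    End Function

Adding a new input then means adding a member to Policy and pulling it out where it is needed, rather than editing every declaration in the call chain.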
Ground rules – a step-by-step approach
So, with all that in mind, here are some steps to follow when creating a model from scratch in VBA (or something similar):
- Think about the inputs and outputs to the model. Determine multiplicities and nesting of data, e.g. several funds can belong to one policy, several policies can belong to one investor.
- Create classes to hold the data sensibly. For example, you might have an Investor class, a Policy class, and a Fund class. The Investor class would have a member which would be a collection of Policies, and the Policy class would have a member which would be a collection of Funds (a sketch of such classes follows this list).
- Write code which maps data from the data interface (e.g. a user form, an Excel sheet, an XML file) onto the classes.
- Sketch out the modules that will be needed in the model. For example, you might have a Premium module, a Withdrawal module, a Growth module, an AMC module, and a Fund module. (Bear in mind that VBA requires module and class names to be unique within a project, so a module cannot share its name with one of your classes.)
- In each module, create a public function with a sensible name. If the calculations are complex then write some dummy code inside, but try to mimic the calls to other functions as they would be in the end. Make sure to specify the fully qualified name of functions for clarity, i.e. variable = Module.Function() (a skeleton of this kind is sketched after this list).
- Create an Output class (or several Output classes if sensible) containing members which reflect all the outputs of the model.
- Map the appropriate functions to the output class. For example, one of the outputs might come from Fund.fund_value(arguments).
- Do an end-to-end test to ensure that the model accepts data from the specified interface and correctly maps functions to the output class.
- Fill in any dummy model code with the real code. Break down complex functions into simpler ones. Make use of the Private keyword when you want to restrict a function to that module. Try to aim for one Public function per module; needing more than one is a good sign that a new module is appropriate.
- Write some code that maps the output class to wherever the outputs will be going (e.g. a spreadsheet).
- Test the model again at multiple levels. You’ll probably only need to test the small Private functions by inspection or by using straightforward unit tests (an example follows this list); it is more important to ensure that the small functions are integrating correctly into the higher-level Public functions. Carry out full integration testing with test data. An independent tester can and should be utilised, but there is nothing stopping the coder from testing as well.
- Add some error handling if you know how to do it well. Don’t if you don’t.
- Go back through the code and add comments. Comments can be helpful (e.g. it is good practice to have a comment describing the purpose of each module, procedure, global variable etc.), but hopefully you won’t need many, as much of the code will be self-explanatory.
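To make the steps above more concrete, here is a sketch of the data and output classes (all names are illustrative; each block below would live in its own class module):

    ' Class module: Investor
    Public investor_name As String
    Public policies As Collection      ' holds Policy objects

    Private Sub Class_Initialize()
        Set policies = New Collection
    End Sub

    ' Class module: Policy
    Public premium As Double
    Public amc_rate As Double
    Public funds As Collection         ' holds Fund objects

    Private Sub Class_Initialize()
        Set funds = New Collection
    End Sub

    ' Class module: Fund
    Public fund_name As String
    Public unit_holding As Double
    Public unit_price As Double

    ' Class module: Output
    Public fund_values As Collection   ' e.g. one projected value per fund

    Private Sub Class_Initialize()
        Set fund_values = New Collection
    End Sub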
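And here is a skeleton of the orchestration, using the classes sketched above and the hypothetical Growth module from earlier; InputMapping.read_investor and OutputMapping.write_results are placeholder names for the mapping code described in the steps, and the growth assumptions are dummy values:

    ' Standard module: Main (hypothetical)
    Public Sub RunModel()
        ' Map the input data onto the classes.
        Dim inv As Investor
        Set inv = InputMapping.read_investor(ThisWorkbook.Worksheets("Inputs"))

        ' Create the output class.
        Dim result As New Output

        ' Call the public function of each cash flow module, using fully
        ' qualified names so it is obvious where each calculation lives.
        Dim pol As Policy
        Dim fnd As Fund
        For Each pol In inv.policies
            For Each fnd In pol.funds
                result.fund_values.Add Growth.grown_value(fnd.unit_holding * fnd.unit_price, 0.04, 12)
            Next fnd
        Next pol

        ' Map the output class to wherever the outputs are going.
        OutputMapping.write_results result, ThisWorkbook.Worksheets("Outputs")
    End Sub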
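Finally, a very simple unit test of the kind mentioned in the testing step, again using the hypothetical Growth module (Debug.Assert halts in the VBA editor if the condition is false):

    ' Standard module: Tests (hypothetical)
    Public Sub test_grown_value()
        ' Growing 100 at 5% a year for 12 months should give 105, within rounding.
        Debug.Assert Abs(Growth.grown_value(100, 0.05, 12) - 105) < 0.000001
    End Sub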
As a final thought, be sure to follow a consistent set of coding conventions throughout, and preferably use whole English words in variable and function names. If there are a lot of modules then consider adding a prefix such as “A_” or “B_” (in VBA) so that they appear in a sensible order in the editor.
You should end up with something which structurally resembles the schematic below:
The green borders indicate appropriate levels to perform testing. Testing should start with the simplest functions and grow to encompass more and more pieces.
The number and nature of the input and output classes will depend on the precise input and output requirements, and the number of modules will depend on the complexity of the model. Here, an output module orchestrates a projection loop consisting of four modules. Each of those modules contains one public function which returns a number, in general by initiating a cascade of calls to a set of private functions. The four modules arranged in a loop above could instead be implemented using recursion; this is not recommended, as recursion is not implemented efficiently in VBA.
We have developed a simple APR example model exhibiting the key characteristics discussed in this article, and would be very happy to share this – if you would like a copy please contact joseph.barnett@aprllp.com .
Hopefully this provides some solid ground upon which to base actuarial models that are written in code. There is a lot here for enthusiasts to get their teeth into. We advise those looking to take their model design to the next level to look into coding conventions, programming paradigms, testing, classes and error handling, and to introduce these techniques into their models to help meet the aims we have set out above.
Joseph Barnett
APR 2017