Inheriting a piece of actuarial code
Why do actuaries care about code?
When you research what being an actuary involves, you will find plenty of explanation of the risk management and mathematical sides of the work. What is rarely mentioned is how often actuaries need to code.
VBA (Visual Basic for Applications) is the most common form of coding actuaries will be exposed to, because it is integrated into Microsoft Excel, the actuary’s program of choice, and allows Excel processes to be automated so that they take only a fraction of the time. Actuaries may also be exposed to large-scale data manipulation, requiring the use of SQL and/or SAS. Insurance company modelling software will require coding to define its many variables. Other programs and languages such as MATLAB, R, JavaScript or C++ may also be encountered through the use of other bespoke applications.
What problems are there with actuarial code?
The problem is that actuaries usually come from a mathematical or statistical background. In their early years, actuarial students are bombarded with so much new information, both for the actuarial exams and about their place of work, that they often receive little to no formal training in coding (APR being an exception!). Poor practice can develop as people write inefficient code because they’re unaware of alternatives, or write code that is difficult to understand because of limited experience and a lack of best-practice enforcement.
Another major issue can be a lack of ownership. All too often tools are written by a ‘VBA expert’ in a team who eventually leaves. When the tool needs to be modified, either external consultants are brought in, who may have a very different style of coding, or someone less experienced within the team attempts to alter the code. Over time, if no one properly manages the code, it can become bloated and unwieldy, with sections of redundant lines that do nothing and convoluted code that takes several lines to do something that could be done in one. Compounding the issue, the code becomes harder to read, so it is harder to find out what each section is doing (and what it’s supposed to be doing, which may be different!), and therefore harder to improve it or add new features.
My coding project
As a consultant for APR, I often found myself having to change or rewrite code written by others. One project involved working with a set of about 30 code files, ranging from 100 to 3500 lines each, that were used in modelling with-profits assets and liabilities.
This code was inherited when a new manager took over a team, and he discovered that it had been very poorly written. The most recent documentation on this code was over 4 years old and was incomplete. The code had been written over several years by people who made the smallest possible change each time, altering how a variable was calculated while touching as little of the surrounding code as they could. This became a problem whenever the code they wanted to change interacted heavily with other code.
For example, a block of code calculated the value of premiums, which would then feed into the liabilities; however, a later development changed the process so that this premium information was fed through an input. Instead of removing the premium calculating code, someone had just changed the liability calculation to refer to the input rather than the premium variable, leaving all the premium calculation code, now redundant, as it was.
All this redundant and poorly written code was slowing down the model and making adding additional features more difficult, and so the code needed to be cleaned up.
My work was to rewrite all this code so that it performed exactly the same, but with fewer lines, no redundancy and greater readability. This is a very different process to writing a piece of code from scratch, particularly if there is a lack of documentation, meaning that the existing code is the only way to know what the code should be doing!
How to rewrite code you haven’t seen before
I approached the rewrite by learning to read the code ‘backwards’. That is, I started from the outputs that were required and traced them back to find out which lines of code fed into these outputs and which didn’t. A helpful property of code here is that it’s generally linear: code further down is performed after code further up. Using find functionality makes it easier to find the next or previous time a variable is used, letting you know if it’s a ‘dead end’ variable or not.
Code structure helps with this process. Properly structuring code does not change the number of lines, but it can make the code far more readable and easier to document and manage before you begin simplifying and modifying it. When reading poorly written code that wasn’t written by me, I sometimes found it became easier to manage simply by rearranging it, including using proper indentation and spacing. I would also always try to keep code related to a particular output in one place so that anyone else who needs to read the code (including me) would be able to view it all in one go, rather than having to scroll up and down to find all the component parts.
Once the code was well structured, I would start with a final export variable and look at what fed into it. I would then note what fed into that, and so on, until I had a full picture of one of the export variables. This code could then be analysed in isolation to see if there was any way of simplifying it and/or making it more readable. A good example is the following:
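The backwards trace described above can be sketched in a few lines. This is only an illustration, written in Python rather than the language of the modelling platform; the variable names and the DEPENDS_ON map are hypothetical, and in practice the map would be built by hand or with the editor’s find tool.

```python
from collections import deque

# Hypothetical dependency map: each calculated variable -> the variables its
# formula reads. These names are made up for illustration.
DEPENDS_ON = {
    "liabilities": {"premium_input", "discount_rate"},
    "premium_calc": {"policy_count", "premium_per_policy"},  # nothing reads this
    "net_assets": {"assets", "liabilities"},
}


def live_variables(outputs, depends_on):
    """Return every variable reached by walking backwards from the outputs."""
    live = set()
    queue = deque(outputs)
    while queue:
        name = queue.popleft()
        if name in live:
            continue
        live.add(name)
        # Follow the trail to whatever feeds this variable (inputs feed nothing).
        queue.extend(depends_on.get(name, ()))
    return live


def dead_variables(outputs, depends_on):
    """Return calculated variables that never feed into any required output."""
    return set(depends_on) - live_variables(outputs, depends_on)


print(sorted(dead_variables(["net_assets"], DEPENDS_ON)))  # ['premium_calc']
```

Anything reported as dead is only a candidate for removal; it is still worth checking whether it was meant to feed an output that has gone missing.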
Old code:
Capital_gains_tax = CGT_rate * (Assets - Assets_time_0)
Assets = Assets - Capital_gains_tax
New code:
Assets_post_CGT = Assets_pre_CGT - CGT_rate * (Assets_pre_CGT - Assets_time_0)
If the variable Capital_gains_tax is only required in this calculation, then it can be removed so that two lines become one line.
You can also see that to make the code more readable, I’ve renamed the ‘Assets’ variables. This means that it’s far clearer throughout the code what each variable actually represents. With the old code, ‘Assets’ represents at least two different things, and to work out exactly what it represents in a given line requires reading through the whole code. With better variable naming and using a new variable rather than adjusting an existing one, reading and making changes to the code is far simpler, since you can instantly see what a variable represents in a given line and so which variable needs to be changed for a development.
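To make the comparison concrete, here is a small Python sketch of the same refactoring (Python is used only for illustration; the original ran on a VB-based platform). The function names and the test figures are assumptions, not values from the model.

```python
def assets_after_cgt_old(assets, assets_time_0, cgt_rate):
    # Old style: 'assets' is overwritten, so the name means two different things.
    capital_gains_tax = cgt_rate * (assets - assets_time_0)
    assets = assets - capital_gains_tax
    return assets


def assets_after_cgt_new(assets_pre_cgt, assets_time_0, cgt_rate):
    # New style: one line, and each name represents exactly one quantity.
    assets_post_cgt = assets_pre_cgt - cgt_rate * (assets_pre_cgt - assets_time_0)
    return assets_post_cgt


# Illustrative figures only: (assets, assets at time 0, CGT rate).
cases = [(120.0, 100.0, 0.2), (100.0, 100.0, 0.2), (80.0, 100.0, 0.0)]
for assets, assets_time_0, cgt_rate in cases:
    old = assets_after_cgt_old(assets, assets_time_0, cgt_rate)
    new = assets_after_cgt_new(assets, assets_time_0, cgt_rate)
    # The rewrite must reproduce the old result, not merely approximate it.
    assert old == new
```

Both versions perform the same arithmetic in the same order, which is why an exact comparison is reasonable here.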
Know your coding platform
A key issue I came across was that due to the lack of good documentation, many in the team were unaware of exactly how their programming platform worked. For example, it was assumed that all variables had to be initialised, otherwise the code would result in an error. This resulted in code like the following:
Dim assets As Double
Dim liabilities As Double
assets = 0
liabilities = 0
However, the platform this code ran on was VB based, and I knew from my previous work in VB(A) that VB automatically initialises dimensioned variables, with numeric variables starting at zero. After doing some tests of my own I confirmed that the lines setting variables to zero were completely unnecessary, allowing me to remove hundreds of lines of code.
I also saw a lot that was similar to the following in the code:
Continuous_rate = ln(1 + Get_Spot_Rate(<inputs>))
Whoever wrote that was unaware that there was already a built-in function for getting a continuous rate:
Continuous_rate = Get_Continuous_rate(<inputs>)
So it helps to be aware of what the program you’re using is capable of and how it can be used.
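The same lesson applies in most languages. As a hedged illustration in Python (not the platform language), the conversion above can lean on the standard library’s math.log1p rather than a hand-written log(1 + x). The function name continuous_rate is hypothetical, and the sketch assumes the spot rate is an annually compounded effective rate.

```python
import math


def continuous_rate(spot_rate):
    """Convert a spot rate to a continuously compounded rate.

    Assumes spot_rate is an annual effective rate, so the result is ln(1 + i).
    """
    # math.log1p(x) computes ln(1 + x) and stays accurate for very small x.
    return math.log1p(spot_rate)


# Round trip: compounding continuously at the converted rate recovers 1 + i.
rate = continuous_rate(0.05)
assert abs(math.exp(rate) - 1.05) < 1e-12
```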
Documentation and comments
As mentioned earlier on, the documentation for this code was almost non-existent. This created two problems. The first was that it was very hard for me to work out what the code was supposed to be doing when I was trying to rewrite it. The second was that once I had worked out what the code was doing, I had no way of checking if that was what it was meant to be doing!
After rewriting the code, I wrote a code changes specification to explain the changes I made, and a code commentary document to explain my understanding of the code, which was limited more to what the code was doing than to why it was doing it. If I came across a section of code that was suspicious (because it was doing something not in line with my actuarial understanding), I would document it and raise it with someone else. However, my project was supposed to have the new code output exactly the same results as the old code, so changes would not be made until someone could review it.
Results
My work ended up reducing the number of lines of code in all files by about 75%, with the largest code file going from over 3500 lines of code to about 900, all while performing exactly the same calculations and providing exactly the same output. This should illustrate how bad code can get and be a lesson in how useful it is to review and reorganise old code, or better yet, write it better in the first place!
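Checking that a rewrite gives exactly the same output is easy to automate. The sketch below is a Python illustration under assumed conditions: it supposes each model run can be exported as a mapping from output name to number, and the name compare_outputs is hypothetical rather than anything used on the project.

```python
def compare_outputs(old, new, tolerance=0.0):
    """Return (name, old_value, new_value) for every output that differs.

    A value of None means the output is missing from that run.
    """
    differences = []
    for name in sorted(set(old) | set(new)):
        old_value = old.get(name)
        new_value = new.get(name)
        if old_value is None or new_value is None:
            # Present in one run only: always report it.
            differences.append((name, old_value, new_value))
        elif abs(old_value - new_value) > tolerance:
            differences.append((name, old_value, new_value))
    return differences


# Made-up figures for illustration.
old_run = {"assets": 116.0, "liabilities": 90.0}
new_run = {"assets": 116.0, "liabilities": 90.0}
assert compare_outputs(old_run, new_run) == []
```

With the default tolerance of zero, any difference at all is reported, which matches the aim of reproducing the old results exactly.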
Five top coding tips
- Use sensible variable names, and avoid using (non-looping and non-index) variables to represent more than one thing (e.g. using ‘assets’ to represent assets pre and post tax) within the same block of code.
- Comment and document your code! This is useful for other people reading your code later on, but also for yourself if you ever have to come back to code you wrote a long time ago!
- Structure your code so that it’s easy to read. This includes simple things like indentation, but also keeping related variables and their calculations in one place. Use subroutines, user-defined functions and classes to keep your code well-structured.
- Know what your platform is capable of and always be ready to learn something new. If you’re struggling to do something efficiently, it might be that there is a simple method you’re unaware of. Don’t be afraid to ask around or search online for help!
- If you ever need to modify someone else’s code, take the time to work out exactly how it works before making your change. If it’s a large piece of code, learn the big picture of the whole routine and your section in detail. This will help you to modify the code more efficiently and also identify sections that are now no longer needed in light of your change.
Martin Prutton
APR 2016