APR’s Code Clinic – Case Studies and Lessons Learned
Since its inception in the summer of 2020 as part of our technology initiative, the Code Clinic has taken on a number of challenges – multi-core compiled R code, Shiny web apps, Docker images on the cloud, new Excel features taken to the extreme, and lots of Python. We started by tackling small individual problems, took a detour into large team projects, and have now returned to small individual problems (such as recreating a ‘left join’ within Excel). This article summarises two of the larger Code Clinic projects we have been involved in, before reflecting on some lessons learned.
The Shiny App
After several weeks of small, bite-sized challenges, it was time to change up the Code Clinic and do something team-based spread over multiple weeks. Our tech director, Phil Creswell, had the idea to build a Generalised Linear Model (GLM) fitting app based loosely on some work he’d been doing for a client. The goal was to get a pretty complete data science pipeline set up in an R Shiny app. That is, we wanted to be able to load in a table of data, set up dependent and independent variables, select a link function for our GLM, and then get out a model with quality-of-fit statistics and visuals.
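To make the pipeline concrete, here is a minimal sketch of the modelling step at the heart of such an app. It uses R's built-in mtcars data in place of user-uploaded data, and the variable and link choices shown are illustrative stand-ins for what a user would select through the interface:

```r
# Stand-in for a user-uploaded table of data.
data <- mtcars

# Dependent variable, independent variables and link function,
# as they would be chosen through the app's UI.
model <- glm(mpg ~ wt + hp,
             data   = data,
             family = gaussian(link = "log"))

# Quality-of-fit statistics and visuals surfaced back to the user.
summary(model)
AIC(model)
plot(model)  # standard residual diagnostic plots
```

In the real app each of these hard-coded choices becomes an input control, with the fitted model and diagnostics rendered as reactive outputs.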
For those who don’t know, Shiny is a clever R package from RStudio (now Posit) which allows you to attach a web-based user interface to an R program. For a modest amount of effort it can turn a stack of loose R scripts into a clean interface that runs in a web browser – a lot more professional than running the scripts directly. However, there are a couple of catches.
One catch is that although there is a low barrier to entry for Shiny, a serious app can get quite confusing for the uninitiated. Shiny is based on the “reactive” programming model, which is rather different to what most people coming from procedural programming will be used to. We wanted a nice clean app with lots of interactive features, and this meant we had to really get comfortable with reactive programming.
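A minimal example shows the reactive model in action. Note there is no explicit event handler: the output simply declares which input it depends on, and Shiny re-runs it whenever that input changes.

```r
library(shiny)

ui <- fluidPage(
  numericInput("n", "Sample size", value = 100),
  plotOutput("hist")
)

server <- function(input, output) {
  # renderPlot() re-executes automatically whenever input$n changes --
  # the framework tracks the dependency for you.
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n))
  })
}

shinyApp(ui, server)
```

This is easy at small scale; the confusion sets in when dozens of reactive expressions depend on one another and you have to reason about the whole dependency graph rather than a linear script.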
Another catch is R itself: it has a very relaxed, dynamic type system, and its default behaviour is to simplify types wherever possible. For example, if you select a single row of a matrix, R will silently convert the result to a vector without asking you. This makes R code rather prone to subtle bugs, which is especially unpleasant in a reactive Shiny app.
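The classic example is matrix indexing, where the `drop = FALSE` argument is needed to suppress the simplification:

```r
m <- matrix(1:6, nrow = 2)

m[1, ]                # a 1-row selection silently becomes a plain vector
class(m[1, ])         # "integer" -- no longer a matrix

m[1, , drop = FALSE]  # keeps the matrix structure intact
class(m[1, , drop = FALSE])
```

Code that works on a ten-row table can therefore break the moment a user's filter leaves exactly one row – exactly the sort of edge case an interactive app will eventually hit.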
The way we chose to manage this, as well as our team-based approach, was to make use of GitHub. Some of us at the Code Clinic have seen clients use GitHub to manage projects, but this was an opportunity for us to play around and see how we might like to do things at APR given a blank sheet of paper. Our team made extensive use of branching and the issues log, which we found really helpful for turning our idea into a proof of concept.
All in all, this was a really interesting project. For only a few hours per week, it’s safe to say that we all learned a lot about a lot of different things – both on the tech side (R, Shiny, reactive programming, Git and GitHub), and on the data science side (challenges when selecting model variables, choosing a link function and linear model for a GLM, robust ways of testing model fit).
The Cloud Challenge
I think it’s fair to say that harnessing the Cloud has been our tech director’s white whale for some time. After establishing the popularity of the Code Clinic and getting some simpler projects under our belt, we decided to embark on one of our longest and most challenging projects.
For this project, we split into two teams. One team, led by me (Joe), would be team AWS. The other team, led by Ian, was team Azure. Phil issued 10 problems to be solved with the cloud, ranging from the very simple (set up an account with appropriate security measures in place) to the extremely niche and probably impossible (run an Excel spreadsheet calculation in parallel across cloud servers).
This was a hard challenge, and well outside of our comfort zone – the Cloud clearly being the domain of IT experts. Being an enthusiastic coder was not enough – we had to learn about firewalls, web sockets, computer hardware, Docker images, and much more. On top of that, most of the infrastructure of the Cloud assumes familiarity with Linux – not a safe assumption for the typical actuary!
Despite all of these challenges, both teams fared very well. We ultimately managed to produce results for all but the most complex challenges, though it is fair to point out that we had no independent assessment of the quality of our solutions. This challenge really made it clear that there are some things that are not best learned through trial-and-error. Indeed, there is a reason why Cloud providers have their own bespoke qualifications for Cloud engineers.
Having said that, I personally found it very impressive how much we managed to figure out on our own. This was exemplified in the solution to one challenge – namely to write a formula in a cell in Excel, and have it spin up a serverless Cloud function on AWS to compute the result and return it to the spreadsheet. Although this is clearly not a sensible use of Cloud resource, making it work required a reasonably thorough understanding of several aspects of the Cloud. As you might imagine, this is not the kind of task you can find a tutorial for, so we had to reason from first principles.
I have no doubt that a few of us will return to the Cloud at some point (I know I intend to), but it is also true that this challenge was very humbling. Most of our tech enthusiasts tend to be programming enthusiasts – they love to roll their sleeves up and solve a tricky problem with some code. This Cloud challenge involved almost no coding – just a lot of prerequisite knowledge in areas unfamiliar to most, and a lot of research on top. The teams should both be very proud of themselves for what they did, but this challenge was proof that the Code Clinic challenges can be seriously tough work and are not always about showing off flashy code.
Lessons Learned
One question we sometimes had to ask ourselves is ‘when do I try to solve the problem myself, rather than looking for an out-of-the-box tool?’. On one hand, you have more control when you devise your own solution, but it’s usually more work for what is often a lower-quality result than a pre-made solution created by experts. In most cases we have found that an out-of-the-box tool gets you most of the way there without fully solving the problem, and you usually don’t learn as much by using one. However, if you reach a point where you have exhausted your own skill and understanding of the technology available, you can find yourself in an equally frustrating situation. During our project automating the APR recruitment process, we used Power Automate (see our previous article on the subject here). Power Automate was a happy medium: although it is a Microsoft product, we still had enough control over it to adapt it to our needs.
Another interesting consideration when coding a solution is the choice of language: different languages are optimised for different tasks, and part of the skill is learning to take advantage of those optimisations. When moving between languages it can be tempting to carry over the techniques you learned in one, but in doing so you may miss the idiosyncrasies of the new language that weren’t present in the old one. For some of the smaller weekly coding challenges we often each coded a solution in a different language, and these differences became evident as everyone solved the problem in a different way, exploiting the nuances of their chosen language.
Something else I have noticed while learning to code is that, after the stage where you have to look up how to do everything, you reach a point where you can handle most tasks by adapting a limited set of familiar techniques. It’s easy to become complacent at this point and stop asking whether there might be a better way (for me, mainly because I was still in the mindset of ‘I’m just happy that it runs’). But when I presented my solutions, not only were more thoughtful ones presented by others, it was often pointed out to me that there were more efficient approaches I would never have come across on my own. So now, instead of settling for the fact that something works, I also ask myself ‘is this the best solution?’.
The Code Clinic does not always have concrete objectives; by its nature it is exploratory, and different people have been able to attend more or less regularly at different times. One thing is certain though – it has been a weekly staple for our tech enthusiasts to explore new things and learn from their peers.