IFoA Data Science Certification – Course Review
In early 2020, the IFoA launched their flagship Data Science course for actuaries. This has prompted actuaries to ask the big questions before committing to the course; questions such as:
- How long does the course take? How much commitment is it?
- What’s in it?
- When I’ve finished, will I be able to take part in a hacking montage on TV?
In this article, John Nicholls demonstrates outstanding dedication to content production by enrolling in the course and producing a review which answers every single one of the questions above. It will cover course content, length, assignments, and delivery, in such a way that those of you on the fence might be gently encouraged to fall off it.
How is the course structured?
Broadly, the course is split into six individual modules across ten weeks of learning. In theory, the first six weeks are spent working through the material in each module (one per week), and the final four weeks act as an opportunity to put the theoretical knowledge into practice.
The six modules are:
- Introduction to Data Science, Data Management and Processing: this section introduces the student to basic data structures (e.g. JSON), obtaining data (e.g. web-scraping), and cleaning data. For the latter, the student is introduced to cleaning tools such as OpenRefine.
- Data Analysis and Introduction to Machine Learning: this section introduces simple data summarization statistics, considers the spread/shape of data, and walks through the central limit theorem and statistical significance. It then briefly introduces Machine Learning (which is covered in more detail in section 4), talking through classification algorithms.
- Data Visualisation and Communication: this section focuses on taking the “insight” gained in section 2 and communicating it to various levels of audience. A number of visualisation types are looked at, and some advice on narrating a data story is provided. A focus on clear and concise communication is fostered.
- Further Analysis and Artificial Intelligence: this section introduces AI in full, moving through Natural Language Processing and Generation (NLP and NLG), data mining, and image analysis (with other techniques covered).
- Good Practice of Data Science and Responsible AI: in this section, topics covered include data privacy, transparency, fairness in machine learning, identifying and mitigating biases, and exploring ethical implications in domains such as our industry.
- Future Directions: this section finalises the course by summarising the considerations an actuary will need to bear in mind when applying AI/ML methods to their company, as well as where the profession might go in the future as Data Science becomes more prevalent.
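To give a flavour of the kind of theory covered in the second module, here is a minimal sketch of the central limit theorem by simulation. It is not taken from the course material: the choice of an exponential distribution, the sample sizes, and the function name `sample_means` are all my own assumptions for illustration.

```python
import random
import statistics

def sample_means(n_samples, sample_size, seed=0):
    """Draw n_samples samples from a skewed (exponential) distribution
    and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

means = sample_means(n_samples=2000, sample_size=50)
centre = statistics.fmean(means)
spread = statistics.stdev(means)

# Theory for an exponential with rate 1: the sample means should centre
# on 1.0 with a standard deviation of 1/sqrt(50), roughly 0.141.
print(f"mean of sample means: {centre:.3f}")
print(f"std of sample means:  {spread:.3f}")
```

Although each individual draw comes from a heavily skewed distribution, the sample means cluster symmetrically around the true mean, which is the behaviour the theorem describes.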
Looking back, I feel the course will be received largely depending on the prior knowledge of the student. Personally, I went into it with a good background in the more technical actuarial skills, as well as some coding and software skills; however, I had never learned much about Machine Learning and AI. As such, I found large parts of sections 1-3 very simple, as restatements of theory I would expect most actuarial students to know. On the other hand, some of section 2 and all of section 4 were more or less new to me, and there was certainly a learning curve in the assignments! Most of it was interesting, although the ethical considerations had a lot of overlap with the Actuaries’ Code and with concerns that have been floating around under GDPR.
That said, there wasn’t an incredible amount of depth on show; the general focus is on breadth and theory rather than depth or practice. There were some very good extension materials in sections 1-3, with introductions/tutorials on a number of different technologies, which were to be taken up at the student’s discretion. To get full value out of the course, I would recommend that the interested student be prepared to do all the extension material, mostly alongside the course content, as it will build the relevant skills.
How is the course delivered?
Students are provided with access to an online platform. Access begins at the start of the course and is revoked six months after its end date, which provides plenty of time to internalise the lessons taught. That extra time matters: even though ten weeks might seem like a long time, the course is done alongside a standard working life, and I often felt I was consuming the content to get the assignments done rather than taking my time to properly understand everything.
Each section was split into individual chapters, with chapters delivered in a number of possible ways:
- Pre-recorded video to watch (transcript to download)
- Interactive training for the student to click through each section
- Case study with a downloadable PDF to read, concerning a specific example of where the chapter might apply in real-life scenarios.
These chapters led to three exercises which made up the “marked” component of the course, and therefore led to the certification. Out of respect for the course providers, we will refrain from providing much information about the exercises themselves, but:
- Exercise 1 covers Sections 1 and 2
- Exercise 2 covers Section 3
- Exercise 3 covers Sections 4, 5 and 6
Each exercise explores a practical situation that requires some application of the concepts learned within the course; examples may include data cleaning, data visualisation and applying ML techniques to actuarial work.
Alongside these exercises, the course providers scheduled group tutorials (in which students could ask any questions they had about the material or exercises), and individual tutorials (in which the student went through the exercise just completed, and looked for improvement points). These were highly useful, but a couple of criticisms presented themselves:
- The group tutorials were sometimes scheduled at inconvenient times (e.g. 10AM on a working day). Despite efforts to mitigate this (such as recording them and making the recordings available to all students afterwards), it felt harsh on students with less flexible schedules.
- The individual tutorials were excellent and an opportunity to dive into specifics, but there were not quite enough of them; I received two (one for each of the first two exercises), and it felt limited with only a half-hour each time.
In spite of this, the course tutors, it must be noted, were exceptionally accommodating around scheduling issues. This could range from deadline extensions (such as I needed during the 10-week block) to individual tutorial rearrangements (one such was held on a Saturday morning for me). E-mailing the tutors was a very simple process (the online hub provides a client e-mail service), and responses often came within the day; I was very grateful to be given that level of support, and the feedback was really useful after each exercise.
What kind of commitment is it?
In a feat of pure miracle that would surely have started a religion or two had it been preserved by the artists of the day, I was able to seamlessly balance workload, family commitments, the Data Science course, and every Premier League football match of the era. All it cost was two deadline extensions, several late nights and the mental resilience to skip Bournemouth playing Crystal Palace on a number of separate occasions.
The point is, it certainly was a difficult balancing act. The official estimate is roughly 10 hours a week of effort over the 10-week period. I think that estimate is broadly accurate, but because the material is released in each of the first six weeks, and the exercises are due in weeks three, six, and ten, the workload is fairly uneven. There are spikes in which the workload may be unmanageable if your day job requires longer hours. An example might be if your work is in reporting and the busy period hits during a deadline week.
As a result, I would caution prospective students not to sign up for the course if they have clear competing commitments, such as exam preparation (fortunately, the IFoA seems not to organise the course during that time), predictable spikes in workload, or other significant time-sinks. There is, ultimately, a question over whether “manageable” is the aim: the quantity and quality of the extension materials make it more valuable to set aside the time to do everything, not just the course-mandated exercises, which do not cover as broad a spectrum of topics.
How difficult did you find it?
As stated above, the level of experience a prospective student has with pre-existing Data Cleaning techniques, or knowledge of Machine Learning or AI processes, can help a lot. What I found was that none of the course material itself was particularly challenging; conceptually, it’s easy to understand the subject matter, and there’s no distinctive learning curve (as noted above, sections 1 and 2 should be familiar to most actuarial students or STEM graduates).
The only exception to this is that I found the video method of delivering training more difficult to digest; the videos contained visual cues (e.g. graphs, tables) where, even though visibility was fine, it would have been helpful to have more interactivity or clearer detail. Having downloadable transcripts was helpful, if only because completing the exercises sometimes required going back over material: re-watching the full introduction of a 5-minute video can drag, whereas searching the transcript for keywords was more expedient.
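For anyone wanting to automate that keyword search, here is a minimal sketch of the idea. The function name `find_keyword` and the sample transcript text are invented for illustration; in practice you would read the downloaded transcript file into a string first.

```python
import re

def find_keyword(transcript, keyword, context=40):
    """Return a snippet of surrounding text for each case-insensitive
    match of keyword in the transcript."""
    snippets = []
    for match in re.finditer(re.escape(keyword), transcript, flags=re.IGNORECASE):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(transcript))
        snippets.append(transcript[start:end].strip())
    return snippets

# Invented transcript text, for illustration only
sample_transcript = (
    "In this video we look at outliers. "
    "An outlier is a point far from the rest of the data. "
    "Next we turn to missing values."
)

hits = find_keyword(sample_transcript, "outlier", context=20)
for snippet in hits:
    print("...", snippet, "...")
```

Note that this is a plain substring search, so "outlier" also matches "outliers"; that is usually what you want when skimming a transcript.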
Despite the above, the exercises were quite challenging. Unlike in, say, a workplace, there was comparatively little guidance given to students on what to do; the task was provided in simple wording, but how it would be achieved was open to interpretation. I felt this was really useful, as commands like “Clean this dataset” are very easy to achieve if they come with more specific steps. It was helpful to be asked to generate our own ideas and come up with our own solutions. In fact, in the individual tutorials, tutors would often point out something that had been overlooked (usually quite obscure but very useful), and I felt this added a lot of value. Additionally, the exercise feedback was very detailed on how the student did or did not achieve specific mark scheme points, and this almost always turned up unexpected insights.
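To illustrate how many decisions hide behind an instruction like “Clean this dataset”, here is a minimal sketch. It is not drawn from the course exercises: the records, the field names `name` and `age`, and the cleaning rules are all assumptions chosen for illustration, and a different analyst could reasonably choose different rules.

```python
import json

def clean_records(records):
    """Trim whitespace, normalise name casing, drop rows without a
    usable age, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        name = str(row.get("name", "")).strip().title()
        raw_age = str(row.get("age", "")).strip()
        if not name or not raw_age.isdigit():
            continue  # assumption: rows with a missing or invalid age are dropped
        key = (name, int(raw_age))
        if key in seen:
            continue  # assumption: same name and age means a duplicate
        seen.add(key)
        cleaned.append({"name": name, "age": int(raw_age)})
    return cleaned

# Invented data, for illustration only
raw_json = """[
    {"name": "  alice smith ", "age": "34"},
    {"name": "Alice Smith", "age": 34},
    {"name": "Bob Jones", "age": ""},
    {"name": "carol WHITE", "age": "51"}
]"""

cleaned = clean_records(json.loads(raw_json))
print(cleaned)
```

Each commented assumption is a judgement call (drop or impute missing values? what counts as a duplicate?), which is exactly the kind of choice the open-ended wording leaves to the student.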
Cut to the chase! I’ve only stuck with this article for so long because of your promise, now tell me! Hacking montage, yes or no?
*No guarantee is made that the course itself prepares for this; you may already be qualified and not know it. Additionally, please consult the Actuaries’ Code for any ethical implications of non-TV-montage hacking.
So, overall, would you recommend it?
If you’re new to Data Science and hold a definite interest in following up on it, you are likely the target audience, particularly if you have no existing experience of Machine Learning and AI. The course is fairly introductory and there’s very little hands-on ML/AI, so a more experienced user might find it redundant. The big thing that hasn’t been mentioned is the course cost, which is just north of £1,500. I think this is a large drawback: much of the material is freely available online, and the largest added value (apart from the tutorials) is in the case studies, which tie generic data science into actuarial work.
Therefore, taking the above into consideration, I would not recommend the IFoA Data Science course, and this recommendation is based largely on the pricing. However, if you’ve read this article and felt there was a lot of interesting subject matter, and you have a way of funding it (say, via employer scheme), it’s definitely a good introduction which places an actuarial context on the subject.