Talking with Pythons: How to Debug Failing Code
Have you ever been stuck in a conversation with a real stickler for grammar? Unless you speak perfectly, they’ll claim not to understand a word you are saying. The conversation often goes something like this:
“Can you pass me the salt?”
“Yes – I am physically able to pass you the salt…”
Sometimes, this is how coding can feel – if you make even the smallest of syntax errors, then the interpreter or compiler will spit out all kinds of dramatic error messages. This can seem petty and vindictive… but for coding to be reliable, it needs to be precise. As frustrating as this might be, things could potentially be a lot worse if the computer guessed – incorrectly – what you were trying to achieve. (In other words, if you don’t write code in exactly the correct way, then you’re not getting your salt.)
My Code Isn’t Running and I Don’t Know Why – Help!
When encountering an error while running code, Python will do its best to give information on why this has happened. That being said, some of the error messages can be somewhat unhelpful. The following are some common ones that may be particularly tricky to interpret:
- SyntaxError: unexpected EOF while parsing
When the code throws up an error at the EOF (end of file), this means that the file ended while the interpreter was still expecting more code to complete a statement. A common example of this is when you have forgotten to close a bracket! (From Python 3.10 onwards, the same mistake is usually reported more directly, for example as SyntaxError: ‘(’ was never closed.)
- An error occurring in a module – e.g. numpy.core._exceptions._UFuncNoLoopError: ufunc ‘add’ did not contain a loop with signature matching types (dtype(‘<U4’), dtype(‘<U4’)) -> None
Oh no – is NumPy broken? (Probably not!) Python’s error message will usually direct you to the last place it found an error – but this is not always where the error started. If the place where an error occurred is in a module, it’s not necessarily true that the fault is in the module code itself. Rather than digging into the module’s code, you may need to look at the parts of the code that you have written that call or interact with the module.
- NameError: name ‘gaurantee’ is not defined
To be able to use a variable, you must first define it; otherwise, you will get something called a ‘name error’. Often, this can arise due to misspelling variable names (I personally have set the record for alternative spellings of ‘guarantee’!). If you’re close enough to the actual name, recent versions of Python might offer a helpful ‘Did you mean…?’ suggestion. What’s more, some IDEs have built-in tools to help you catch these typos – for example, in VS Code, if you click on a variable then all references to it will light up.
- AttributeError: ‘DataFrame’ object has no attribute ‘sort’
I got this error when trying to call .sort() on a pandas DataFrame. The issue here is similar to the previous bullet point – nomenclature – and the best way to resolve it is to look at the documentation. Looking into the documentation for pandas, the method I was actually looking for was called sort_values. Functionality may also be deprecated or renamed between different versions of libraries – so check this too.
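To get a feel for these messages, it can help to trigger them on purpose. The following is a minimal sketch using only the standard library, so it swaps in stand-ins for the examples above: a plain list instead of a pandas DataFrame, and the json module instead of NumPy. The helper name error_name is made up for illustration.

```python
import json


def error_name(action):
    """Run action() and return the name of the exception it raises (or None)."""
    try:
        action()
    except Exception as exc:  # broad catch is for demonstration only
        return type(exc).__name__
    return None


# Unclosed bracket: the source text ends before the expression is complete.
unclosed = error_name(lambda: compile("print((1 + 2)", "<example>", "exec"))


def misspelt():
    guarantee = 1
    return gaurantee  # deliberate typo: this name was never defined


name_typo = error_name(misspelt)

# Calling a method that the object does not have (lists have .sort, not .sort_values).
wrong_method = error_name(lambda: [3, 1, 2].sort_values())

# The exception is raised inside the json module, but the fault is in the
# input we passed in: JSON requires double quotes around keys.
inside_module = error_name(lambda: json.loads("{'a': 1}"))

print(unclosed, name_typo, wrong_method, inside_module)
```

In the last case the traceback ends inside the json package, yet the fix belongs in the calling code, which is the same pattern as the NumPy example above.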
Ultimately, the interpreter is limited: it can tell you what the error is, and potentially when and where it occurred – but not why. Consider a divide by zero error – Python will flag that this has occurred, but it is up to you to investigate why this zero is there in the first place! Is this desired behaviour? Should it be possible in the model for this value to be zero?
If yes, we need to think about what that means for the logic of this calculation, and what the result should be if this value is zero; we might want to think about the ‘knock-on’ impact on downstream calculations. If no, then we should put measures in place to prevent this value from being zero in the first place – for example, through data validation or by applying a floor.
We need to look upstream too. If this zero value has arisen from the result of a previous calculation, then the error here could point to an error earlier in the program, or in the inputs provided by the user. Sometimes debugging requires us to examine things in very close detail, but at other times it calls us to consider the bigger picture: programmatic flow, how the components interact with each other…
It’s important to make sure we get to the root cause of a problem if we want to prevent errors from occurring again. The point at which the code gives an error message is not necessarily the point at which the code went wrong – it may just be that until now, the bug hasn’t prevented the code from running. In fact, there are plenty of bugs that don’t stop the code from running at all – just because code runs, doesn’t mean that it is working.
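As a rough illustration of looking upstream, here is a small sketch in which the divide by zero is only a symptom. The function names and the ‘readings’ data are invented for this example. The unchecked version fails at the division, while the checked version fails earlier with a message that points at the real cause: the input contained nothing usable.

```python
def mean_reading(readings):
    """Average the readings that are not None (no checks)."""
    valid = [r for r in readings if r is not None]
    # If every reading is None, len(valid) is 0 and this line raises
    # ZeroDivisionError, even though the real problem is the input data.
    return sum(valid) / len(valid)


def mean_reading_checked(readings):
    """Same calculation, but validate before dividing."""
    valid = [r for r in readings if r is not None]
    if not valid:
        # Fail at the point where things actually went wrong.
        raise ValueError("no valid readings supplied; check the input data")
    return sum(valid) / len(valid)


print(mean_reading_checked([1, None, 3]))
```

Both versions stop the program on bad input; the difference is that the second one tells you where to start looking.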
My Code Is Running, but it Doesn’t Do What I Want it to…
If you are anything like me when I started learning to code, you’ll be so happy that your code runs, you won’t even care if it is doing what it is meant to. Unfortunately, these ‘logic errors’ can be even harder to fix, as the code cannot intuit how you want it to run. These bugs often occur when the code is behaving ‘unexpectedly’ – so here are some quirky behaviours of Python that are worth being aware of:
- Many functions we work with have output values. Say we have a function called ‘bubble_sort’ that takes in a list of numbers, puts them in numerical order, and returns the sorted list:
> raw_list = [3,1,2]
> sorted_list = bubble_sort(raw_list)
Here we create a new variable called sorted_list, and assign to it the output value from our call of the bubble_sort function.
Some functions, however, do not return a value as an output. Often, these are methods whose purpose is to alter the underlying objects to which they are bound. One example of this is that list objects in Python come with a built-in .sort method:
> raw_list = [3,1,2]
> raw_list.sort()
If we examine the raw_list list object now, we will find that it is now equal to [1,2,3]. This is because calling the .sort method changes the list object by sorting it. If we instead tried to do something like we did in the first example:
> sorted_list = raw_list.sort()
we would (perhaps counterintuitively) discover that the sorted_list variable is None. The sort method has no output – it only changes raw_list directly.
- In Python, when you assign a variable, you are really just ‘giving a name’ to an object in memory. Crucially, this means that if you assign a variable name to an object, and then change the underlying object, other variables assigned to that object will also change. This is because there is only really one instance of that object, and you have just created multiple ways of referencing that object. Let’s examine the first example of sorting a list:
> raw_list = [3,2,1]
> sorted_list = raw_list
> sorted_list.sort()
Both the raw_list and sorted_list variables are sorted! As far as Python is concerned, these are the same object: anything you do to sorted_list also happens to raw_list, and vice versa.
If you want to avoid these unintended side-effects, you can circumvent this behaviour by using the .copy method. This will create a new object that is a copy of the raw_list, but that can be treated as a separate object:
> raw_list = [3,2,1]
> sorted_list = raw_list.copy()
> sorted_list.sort()
This time only sorted_list is sorted; raw_list is still [3,2,1].
If you are familiar with these idiosyncrasies, then you will be better equipped to understand the code that you are writing.
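The two behaviours above can be checked in a few lines. This is only a sketch; the variable names are chosen for illustration, and it also uses the built-in sorted() function, which is not mentioned above but returns a new sorted list instead of changing the original.

```python
# 1. list.sort() sorts in place and returns None.
raw_list = [3, 1, 2]
result = raw_list.sort()
print(raw_list, result)

# sorted() leaves its argument alone and returns a new list.
original = [3, 1, 2]
new_list = sorted(original)
print(original, new_list)

# 2. Plain assignment gives the same list object a second name.
first_name = [3, 2, 1]
second_name = first_name
second_name.sort()
print(first_name is second_name, first_name)

# .copy() creates a separate list, so sorting the copy leaves the source as it was.
source = [3, 2, 1]
duplicate = source.copy()
duplicate.sort()
print(source, duplicate)
```

One caveat: .copy() makes a shallow copy. For a list of numbers that is all you need, but if the list contains other mutable objects (such as nested lists), those inner objects are still shared between the original and the copy.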
The Best Offence Is a Good Defence
The best way to debug is simply to write code with no bugs in it… but unfortunately, that’s not humanly possible.
Instead, it’s good practice to anticipate where errors may occur and include some error handling. Consider logically what you want to happen in the event that a model encounters a particular error or unexpected value, and add in code that checks for and implements this. Should it skip to the next line? Should it replace the value? Should it halt entirely? For example, whenever you divide through by a number, it’s always good to check if that number is zero, so you don’t get hit with a divide by zero error. It’s easy to fall into the trap of thinking that there is no way that number could ever be zero, but unless you explicitly prevent this, you might be surprised – even if you think a state is ‘unreachable’, there’s no harm in building in a failsafe to allow for user error.
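As one possible sketch of this kind of failsafe, the function below divides pairs of numbers and lets the caller decide what happens when a denominator is zero: skip the pair, replace the result, or halt. The function name, the on_zero parameter, and its three options are all made up for this example; the right policy depends on your model.

```python
def ratios(pairs, on_zero="halt", replacement=0.0):
    """Divide each (numerator, denominator) pair, handling zeros explicitly.

    on_zero: "halt" raises an error, "skip" drops the pair,
             "replace" uses `replacement` as the result.
    """
    if on_zero not in ("halt", "skip", "replace"):
        raise ValueError(f"unknown on_zero policy: {on_zero!r}")

    results = []
    for numerator, denominator in pairs:
        if denominator == 0:
            if on_zero == "skip":
                continue  # move on to the next pair
            if on_zero == "replace":
                results.append(replacement)
                continue
            # "halt": stop with a message that says what went wrong
            raise ValueError(f"cannot divide {numerator} by zero")
        results.append(numerator / denominator)
    return results


data = [(1, 2), (3, 0), (4, 4)]
print(ratios(data, on_zero="skip"))
print(ratios(data, on_zero="replace"))
```

Making the policy an explicit argument forces the decision to be made on purpose, instead of leaving it to whatever happens when the error is first hit.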
One mistake people often make is a lack of planning. Planning is of paramount importance, particularly when you are trying to devise a novel solution. It can often help to plan out how your code might look, or even come up with a variety of solutions, and decide from there which works best. After all, it’s easier to rework a plan than to rebuild your code because what you wanted to do didn’t work. And you can never have too many ideas!