Beware dormant test code!

Just recently I came across a bug in my ctxLink Wireless Debug Probe. I was testing with a large project, something I had not done extensively before, and occasionally, while the code was being loaded for flashing into the MCU, the probe would crash. More careful exercising of the code revealed that the crash occurred on the fourth load command! When I ran the code under the VSCode debugger, I discovered the MCU was looping in the libOpenCM3 “blocking handler”, and the stack trace revealed a failure deep inside the Wi-Fi interface driver for the WINC1500 module.

Hours of searching and testing failed to reveal the issue. Since the VSCode debugger lacks some of the more sophisticated debugging techniques, such as data breakpoints, I switched to Visual Studio 2017 with the VisualGDB extension. Surprisingly, the fault changed: the exception was no longer generated, and the load simply ended prematurely. My first check was whether the optimization level was the same in the VSCode and VS2017 builds. It was not, and when the VS2017 build was configured to match the VSCode build, the exception returned.

Another several hours of fruitless debugging followed, and the situation was starting to look dire. The crowdfunding campaign was moving inexorably towards its launch date, and we had a serious bug!

After taking a break from the project, which often helps clear the brain, I returned to the issue. During one session I noticed that a test flag I had added to detect input buffer overflow had a strange value: instead of true or false, it held “6”. Finally, here was something the VisualGDB debugger could work with. I set a data-write breakpoint on the flag's address and ran my test. The breakpoint halted the MCU, and the stack trace revealed the culprit writing over the boolean flag:

    Counts[countIndex++] = len;

The line above was performing an unbounded write to the array “Counts”. This code had been added some time ago to chase a network packet issue in ctxLink. At the time, the issue only occurred at ctxLink startup, and the array was sized appropriately for that test. Unfortunately, once ctxLink was handling many, many packets, countIndex ran past the end of the array and the line began writing over other variables in memory.

This one line, added to chase a bug, is without doubt responsible for several strange and unpredictable problems seen with ctxLink. If there is one lesson to be learned here, it is this: once a piece of test code has served its purpose, DELETE IT!

Happy debugging, and please check out the Crowd Supply pre-launch page for ctxLink.