The featured picture of this blog post is by WikimediaImages on Pixabay.

One typical debugging activity is setting breakpoints and then running the program from breakpoint to breakpoint, inspecting the program’s internal state at each breakpoint. While this sounds simple, it gets complicated when one looks behind the curtain, which we will do in this blog post.

The general setting

Using a debugger like GDB, you can set breakpoints, continue execution, single-step, inspect variables, and change variable values. This is possible even if the program is running on a remote MCU such as an ATmega328P, provided you have a debugging probe or hardware debugger and a GDBserver acting as the software interface between the debugging probe and GDB. From the outside, this can look like the following picture.

Clicking to the left of a line number sets a breakpoint marked by a red dot. The line where execution is currently stopped is marked by a yellow triangle. The debugging panes on the left show some of the internal program state, and execution is controlled by the control panel in the upper left.
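Without a GUI, the same can be done directly at the GDB prompt. A hypothetical session might start like this (program.elf, loop, and counter are made-up names, and I assume here that the GDBserver listens on TCP port 2000; adapt all of this to your setup):

$ avr-gdb program.elf
(gdb) target remote :2000
(gdb) break loop
(gdb) continue
... running until the breakpoint is hit ...
(gdb) print counter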

So, what software layers are involved in setting and removing breakpoints and executing the program? If we abstract away from the GUI layer, there are

  • the symbolic debugger GDB,
  • the GDBserver, which receives commands from GDB expressed using the GDB RSP protocol,
  • the debug probe communicating with the GDBserver using, e.g., the EDBG protocol, and finally
  • the on-chip debugger (OCD) that communicates with the debug probe, using, e.g., the debugWIRE protocol.


In what follows, we will assume that we use the EDBG protocol and deal only with AVR MCUs with a debugWIRE interface, simplifying some things and complicating others.

Types of breakpoints

Before we discuss the interaction of the different software layers, we need to have a look at the various types of breakpoints. First, there is a distinction between instruction breakpoints and data breakpoints (also called watchpoints). The former causes a program to stop when a certain instruction is reached, and the latter stops a program when a certain data memory element is accessed. On debugWIRE MCUs, there is no provision for data breakpoints. And for this reason, we will ignore them here.

Instruction breakpoints can be hardware breakpoints or software breakpoints. A hardware breakpoint is implemented as a register that is compared to the actual program counter. If the PC is equal to the register value, execution is stopped. Usually, only a few such hardware breakpoints are available. On a debugWIRE device, there is just one. Software breakpoints are implemented by placing a particular trap instruction into the machine code. On AVRs, this is the BREAK instruction. There are pros and cons to each. Hardware breakpoints are faster to set and to clear because they do not involve reprogramming flash memory. Further, they do not lead to flash wear as software breakpoints do. However, as mentioned, there are usually only very few of them. And they can lead to skidding, i.e., the program stops a few instructions later because of the pipelining architecture of the MCU. This is not a problem in AVR MCUs, as long as we discuss only instruction breakpoints. 
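Conceptually, setting a software breakpoint boils down to saving the original instruction word and writing the BREAK opcode (0x9598) in its place; clearing it restores the saved word. Here is a minimal Python sketch, in which read_flash_word and write_flash_word are hypothetical stand-ins for the debugger's flash access routines:

BREAK_OPCODE = 0x9598  # opcode of the AVR BREAK instruction

saved_words = {}       # address -> original instruction word

def insert_software_breakpoint(address):
    # remember the original word so it can be restored later
    saved_words[address] = read_flash_word(address)   # hypothetical helper
    write_flash_word(address, BREAK_OPCODE)           # reprograms one flash page

def remove_software_breakpoint(address):
    # restoring the original word costs another flash reprogramming cycle
    write_flash_word(address, saved_words.pop(address))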

So, how severe is the flash wear problem? The data sheets state that for classic AVR MCUs, the guaranteed flash endurance is 10,000 write/erase cycles. For the more recent MCUs with UPDI interface, it is only 1,000 cycles!

Let’s assume an eager developer who reprograms her MCU every 10 minutes with an updated version of the program and who sets and clears five software breakpoints during each such episode. On average, this will probably cause 3 additional reprogramming operations on an individual flash page, which, together with the upload itself, makes 4 such operations every 10 minutes, or 192 operations on one eight-hour workday. So, she could hit the limit for the modern AVR MCUs after just one working week; the classic AVRs would last about 10 weeks. And this holds only if she does not set and clear breakpoints all the time, but is instead rather careful about doing so.
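As a back-of-the-envelope check, here is the arithmetic in a few lines of Python (the counts per episode are the rough estimates from above, not measurements):

episodes_per_day = 6 * 8          # one upload every 10 minutes, 8-hour day
writes_per_episode = 1 + 3        # the upload itself plus ~3 breakpoint-related writes
writes_per_day = episodes_per_day * writes_per_episode   # = 192

for name, endurance in (("modern AVR (UPDI)", 1_000), ("classic AVR", 10_000)):
    print(f"{name}: endurance limit reached after about "
          f"{endurance / writes_per_day:.0f} workdays")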

Breakpoint setting and clearing

Now, let us look at what happens when the user sets a breakpoint. Using a GUI, the user clicks to set or clear a breakpoint. GDB collects all these changes until the user requests to start or continue execution. Only then does GDB send all active breakpoints to the GDBserver, followed by a command to begin execution. After the program hits a breakpoint or is stopped asynchronously, GDB requests to clear all breakpoints again. This cycle repeats every time the user requests to start execution.
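At the wire level, GDB expresses these changes with the RSP packets Z0 (insert a software breakpoint) and z0 (remove one); Z1/z1 do the same for hardware breakpoints. Simplified, with packet framing and acknowledgments omitted and with the trailing ‘kind’ field assumed here to be 2, the exchange before a continue could look like this:

> Z0,100,2 ; insert software breakpoint at 0x100
< OK
> Z0,200,2 ; insert software breakpoint at 0x200
< OK
> vCont;c ; continue execution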

Interpreting the commands GDB sends to the GDBserver literally would result in massive flash reprogramming, even though the set of breakpoints does not change at all. Fortunately, the EDBG protocol specification states that “Breakpoints are only inserted to/removed from flash when the next flow control command is executed.” Flow control commands are “execution start/continue”, “single-step”, “asynchronous stop”, and “reset”. So, when GDB first clears all breakpoints and then reasserts the same set, and the GDBserver hands all these commands to the EDBG debugger, the EDBG debugger does not touch the flash at all, because the removals and additions cancel each other out.

Unfortunately, this describes the situation only approximately. GDB believes that to continue execution from a breakpoint, this breakpoint first has to be temporarily deleted (not re-asserted), then a single step has to be performed, and only then may the breakpoint be re-asserted and execution continued. Assuming two breakpoints, one at address 0x100 and the other at 0x200, where we hit 0x100 first, the sequence of commands sent to the GDBserver could be as follows:

> set breakpoint at 0x100
> set breakpoint at 0x200
> start execution
... breakpoint reached at 0x100
> remove breakpoint at 0x100
> remove breakpoint at 0x200
; user requests to continue execution
> set breakpoint at 0x200
> execute a single step
> set breakpoint at 0x100
> start execution

While handing all these commands down to the EDBG debugger leads to the correct behavior, it creates unnecessary flash wear. The reason is that it is not necessary to remove the breakpoint at 0x100 temporarily. The EDBG debuggers can continue from a software breakpoint even if the BREAK instruction is still in memory: they execute the replaced instruction offline in a special OCD register and continue from there. At least, this is the case for one-word instructions. For two-word instructions, they restore the original word, single-step, re-insert the BREAK instruction, and then continue.

Minimizing flash wear 

Passing the breakpoint set and clear commands directly from GDB to the EDBG debugger will obviously create two flash reprogramming events for each breakpoint hit, and these are entirely superfluous. Just imagine what this could mean in the context of our eager developer. In addition to setting five breakpoints per development cycle, each breakpoint hit incurs two additional reprogramming steps. This means we may easily have an order of magnitude more reprogramming events on an individual flash page. So, what can one do to avoid them?

For dw-gdbserver, I decided to do bookkeeping of all breakpoints, as in other GDBserver projects I have worked on (dw-link and avr_debug). One reason is that dw-gdbserver uses the hardware breakpoint as well and always assigns it to the most recently introduced breakpoint, which is often a temporary breakpoint inserted by GDB to step over a function call.

This means there are a breakpoint_set, a breakpoint_clear, and a breakpoint_update function. The first two are called when GDB issues a set or clear command, respectively. The last one is called immediately before execution is started, either by a ‘single-step’ or a ‘continue’ command, and it sets or clears breakpoints for the EDBG debugger.

The only change necessary to deal with the problem described above was to ‘protect’ a software breakpoint at the place where a single step is to be executed. Such a breakpoint will never be removed before the single-step, but will be reconsidered at the next ‘continue’ command.
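In outline, the bookkeeping could look like the Python sketch below. This is a simplified illustration rather than the actual dw-gdbserver code: the hardware breakpoint slot is left out, and insert_break and remove_break are hypothetical stand-ins for the EDBG-level operations that reprogram the flash.

class BreakpointManager:
    def __init__(self):
        self.requested = {}    # address -> True if currently requested by GDB
        self.in_flash = set()  # addresses where a BREAK is physically in flash

    def breakpoint_set(self, addr):    # reacts to GDB's set command
        self.requested[addr] = True

    def breakpoint_clear(self, addr):  # reacts to GDB's clear command
        self.requested[addr] = False

    def breakpoint_update(self, pc, single_step):
        # called immediately before a 'continue' or 'single-step' command
        for addr in list(self.requested):
            if self.requested[addr] and addr not in self.in_flash:
                insert_break(addr)     # hypothetical helper, reprograms flash
                self.in_flash.add(addr)
            elif not self.requested[addr] and addr in self.in_flash:
                if single_step and addr == pc:
                    continue           # protected: keep the BREAK for now
                remove_break(addr)     # hypothetical helper, reprograms flash
                self.in_flash.discard(addr)
        # forget cleared breakpoints that are no longer in flash
        self.requested = {a: on for a, on in self.requested.items()
                          if on or a in self.in_flash}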

In the command trace above, when the single-step command is received from GDB, the breakpoint_update function is called, and the breakpoint at 0x100 is protected. After the single step, the breakpoint at 0x100 is re-asserted by GDB and then reconsidered at the following execution command. So, it will never be physically removed and re-inserted.

In addition to that, the execution of all two-word instructions located at a breakpoint is simulated inside the GDBserver. The additional programming effort was not too daunting, since there are only four such instructions: JMP, CALL, LDS, and STS.
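Recognizing a two-word instruction from its first opcode word is straightforward. The following sketch shows the test; the opcode masks follow the AVR instruction set manual:

def is_two_word_instruction(opcode):
    # JMP:  1001 010k kkkk 110k,  CALL: 1001 010k kkkk 111k
    if (opcode & 0xFE0C) == 0x940C:
        return True
    # LDS:  1001 000d dddd 0000,  STS:  1001 001d dddd 0000
    if (opcode & 0xFC0F) == 0x9000:
        return True
    return False

Only when the word under a breakpoint passes this test does the GDBserver have to fetch the second instruction word and simulate the jump, call, or data transfer itself.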

An interesting question is how other open-source GDB servers for the AVR architecture deal with this problem. AVaRICE also uses a breakpoint bookkeeping solution. The problem described above is solved by not updating breakpoints before a single-step operation. This sounds OK, because it seems impossible to hit another breakpoint when single-stepping. However, the way AVaRICE deals with interrupts during single-stepping could lead to problems. If the user requests to remove a breakpoint in an interrupt service routine just before single-stepping, the continuation after being stopped in the interrupt dispatch table could hit that breakpoint, and AVaRICE and/or GDB may be confused because no BREAK instruction is expected at that point.

Bloom, another open-source GDB server, ignores the issue in its current version, 2.0.0, and tolerates two flash reprogramming operations at each breakpoint hit. As the author told me, this may change in the future.

Neither system supports the simulation of two-word instructions at breakpoints, so they incur two flash reprogramming operations on each breakpoint hit at such a location. Interestingly, as far as I can tell from observing the behavior, this also appears to be true for Microchip Studio and MPLAB X.

Summary

Although setting and removing breakpoints and executing a program seem very straightforward, things get complicated when one examines the machinery implementing them. This is particularly true if one wants to minimize flash wear. Microchip, by the way, recommends not shipping chips that have been used for debugging with debugWIRE to customers. Well, I never ship AVR chips to customers anyway. If you belong to the really paranoid, you can disable software breakpoints with the command ‘monitor breakpoint hardware’ when using dw-gdbserver. After that, you can use the single hardware breakpoint or single-step, but not both at the same time.
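In a GDB session, that is simply:

(gdb) monitor breakpoint hardware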
