Unused bits in a program counter should be zero. However, sometimes they are not. And in fact, since the bits are unused, their value does not matter, as long as nobody from the outside, such as a debugger, reads them. So, can such bits cause problems for AVR microcontrollers?
Unused and reserved bits
In embedded programming, unused and reserved bits in status and control registers are not an exception but the norm. Reserved bits are those that the chip vendor might have plans for in future chip generations. If a bit is reserved, the data sheet usually states what value should be written to it. In contrast, unused bits are simply “don’t cares.” You can write whatever you want to them, and the value read back is usually constant, very often zero.
Unused bits in a program counter are somewhat special for a number of reasons. First, in most cases, at least a few bits in the PC are unused because physical memory is usually smaller than the space addressable by the program counter. Second, direct manipulation of the program counter is usually impossible. For these reasons, it simply does not matter to a program running on the MCU which value these bits have. So, it may well be the case that a program that is supposed to start at address 0x0000 starts with a program counter set to 0x8000, provided bit 15 of the PC is unused. Nevertheless, everything works as if the MCU had started execution at address 0x0000.
While internally non-zero unused bits do not cause problems, the outside world will always assume that they are zero. For example, a compiler generating code for the MCU will generate code for jumping to address 0x0000 and not to address 0x8000.
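To make the idea concrete, here is a minimal sketch in host-side C of how the "real" program counter can be recovered from a raw value with non-zero unused bits. The function names and the assumption that the flash size in words is a power of two are mine and serve only as an illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask covering the PC bits that are actually
 * needed to address a flash of flash_words 16-bit words.
 * Assumes flash_words is a power of two and fits a 16-bit PC. */
static uint16_t pc_used_mask(uint32_t flash_words)
{
    assert(flash_words != 0 && flash_words <= 0x10000u);
    assert((flash_words & (flash_words - 1)) == 0);
    return (uint16_t)(flash_words - 1);
}

/* Hypothetical helper: drop the unused bits of a raw PC value. */
static uint16_t pc_canonical(uint16_t raw_pc, uint32_t flash_words)
{
    return (uint16_t)(raw_pc & pc_used_mask(flash_words));
}
```

With 32K words of flash, bit 15 is the only unused bit, so a raw PC of 0x8000 maps to 0x0000, which is the address the compiler and the debugger expect.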
Confusing a debugger
Unsuspecting debuggers, such as GDB, will run into problems when some unused bits in a program counter are non-zero. Like the compiler, GDB assumes that all unused bits are zero. Further, it makes a strong association between flash addresses and the source text. Hence, when GDB receives a flash address that is not associated with any source text, it will be lost. As a result, the debugging user experience leaves a lot to be desired.
So, how severe is this problem? Are there many MCUs that show this behavior? In my work on building the GDB server PyAvrOCD, I did systematic end-to-end tests on (almost) all the different MCUs that the GDB server is supposed to deal with. So far, I have found the following problematic chips:
- ATmega16, ATmega16A, and ATmega64A,
- ATmega329P and ATmega3250P,
- ATmega48 and ATmega88 (without an A or P suffix).
I had already noticed the latter two MCUs when building dw-link, a debugWIRE debug probe with an RSP interface. This was when I learned that one cannot generalize when it comes to the debugging interface of AVR chips. That something works for one chip of a family does not mean that it works for its close relatives. The case of the ATmega48 and ATmega88 is particularly extreme because the chips with an A suffix, which have the same chip signature, work without a hitch.
How bad can it get?
Interestingly, there are different levels of severity. The ATmega16, 16A, and 64A do not openly show that the program counter is distorted. Only when one examines return addresses stored on the stack does it become evident that some unused bits in the PC are non-zero. Nevertheless, this is something the debugger does not like. Stack backtraces are empty, and when using the next command, one will not be able to step over a function call, but will end up in the function.
While I have not tested the ATmega64, I strongly suspect that it will exhibit the same behavior because it has the same clause in the data sheet as its cousin with the A suffix:
If software reads the Program Counter from the Stack after a call or an interrupt, unused bits (bit 15) should be masked out.
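What the data sheet asks for can be sketched in a few lines of host-side C. This is only an illustration: the function name is hypothetical, and it assumes a two-byte return address that is stored with the high byte at the lower address, which is how I understand the AVR call mechanism to lay it out.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a return address (a word address)
 * from the two bytes found at SP+1 and SP+2, then clear the unused
 * PC bits. Assumption: high byte first, low byte second. */
static uint16_t return_address_from_stack(const uint8_t *bytes_at_sp_plus_1,
                                          uint16_t used_mask)
{
    uint16_t raw = (uint16_t)((bytes_at_sp_plus_1[0] << 8) |
                              bytes_at_sp_plus_1[1]);
    return (uint16_t)(raw & used_mask);
}
```

For a chip where only bit 15 is unused, the mask is 0x7FFF, so the stack bytes 0x80 0x5A yield the word address 0x005A instead of 0x805A.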
Next are ATmega329P and 3250P. They do not conceal that the PC contains non-zero unused bits, but report them whenever the state of the program counter is queried. The data sheet gives no hint of this. Non-zero unused PC bits will confuse GDB even more than distorted return addresses. The debugger will have no idea where it has landed after a breakpoint is reached.
Finally, we have the ATmega48 and ATmega88. They are a bit special because once they are in debugWIRE mode, using open source software in combination with Microchip debuggers is useless. The chips appear to be bricked. The only way to get them back is to either use MPLAB X or Microchip Studio together with a Microchip debugger/programmer, or to use the above-mentioned DIY debugger dw-link.
Mitigating the rogue bits
Since GDB chokes on non-zero unused PC bits, some countermeasures are in order.
Dealing with the Hotel California Syndrome
ATmega48 and 88 pose the biggest problem. Since these chips suffer from the Hotel California Syndrome (You can check out any time you like. But you can never leave!), it is best to catch these chips before they enter debugWIRE mode. However, because they have the same device signature as ATmega48A and ATmega88A, respectively, this cannot be decided based on the device signature.
A difference between those chips and the ones with the A suffix is that the former ones do not have the device signature stored in the signature row. Unfortunately, this is not something one can find out over the SPI programming interface. But you can write a short program that can be uploaded to the chip, which will, as a result of examining the signature row, change the lock bits. For the ATmega48, this program looks as follows.
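#include <avr/boot.h>

/* SIGRD is bit 5 of SPMCSR; define it if the device header does not. */
#ifndef SIGRD
#define SIGRD 5
#endif

int main(void)
{
    /* Without a stored device signature, byte 0 is not 0x1E. */
    if (boot_signature_byte_get(0) != 0x1E) {
        /* Program lock bit 0 as a marker that can be examined afterwards. */
        boot_lock_bits_set(_BV(0));
    }
    return 0;
}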
This program can be uploaded before debugging begins, and the result in the form of programmed lock bits can be examined. The program for the ATmega88 is a bit more complicated because the locking operation needs to be executed in the boot area. All in all, this method reliably prohibits these chips from entering debugWIRE mode.
Dealing with unconcealed rogue bits
If the MCU does not conceal the rogue bit, it is relatively easy to mask this bit so that the outside world (and the rest of PyAvrOCD) only sees the real flash address. Only one thing has to be taken into account: When setting hardware breakpoints, this bit has to be set, because it is significant when the hardware register is compared with the PC.
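The two directions of this translation can be sketched as follows in host-side C. The struct, the function names, and the concrete bit values used in the example are hypothetical and only illustrate the idea; they are not taken from PyAvrOCD.

```c
#include <stdint.h>

/* Hypothetical per-chip description of the PC quirk. */
typedef struct {
    uint16_t used_mask;   /* PC bits that address real flash words */
    uint16_t rogue_bits;  /* unused PC bits the chip reports as 1  */
} pc_quirk_t;

/* Direction MCU -> GDB: hide the rogue bits. */
static uint16_t pc_for_gdb(const pc_quirk_t *q, uint16_t raw_pc)
{
    return (uint16_t)(raw_pc & q->used_mask);
}

/* Direction GDB -> hardware breakpoint register: put the rogue bits
 * back, because the comparison with the PC includes them. */
static uint16_t pc_for_hw_breakpoint(const pc_quirk_t *q, uint16_t word_addr)
{
    return (uint16_t)((word_addr & q->used_mask) | q->rogue_bits);
}
```

Assuming, purely for illustration, a used mask of 0x3FFF and a rogue bit 15, a reported PC of 0x8123 is passed on as 0x0123, while a hardware breakpoint at 0x0123 is written to the chip as 0x8123.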
Dealing with rogue bits in return addresses
One could simply ignore the extra non-zero bits in return addresses that are fetched from the stack. The price is broken stack backtraces and defective single-step operations.
In the long run, however, one would, of course, like to handle these bits. The cure is quite obvious: One needs to mask out the unused bits when communicating with GDB. What is not obvious to the GDB server is when to do so. When GDB reads stack memory, the server cannot tell whether the requested bytes form a return address or are perhaps the values of local variables.
The cleanest solution is therefore to let GDB handle the masking. GDB knows when something is a return address and when it is an ordinary variable value. As a first step, I filed a bug report. Looking at the other bug reports for the AVR architecture, I noticed, however, that the activity in this area is somewhat muted. One particular bug report, which results in a broken stack backtrace, has been sitting there for 12 years. On top of that, it was easy to come up with a fix, which I did.
This whetted my appetite for more. After studying the target-dependent code for AVR and discussing things with one of the GDB maintainers, I came up with a solution that works as long as the GDB server provides a memory map to GDB. GDB can make use of such a memory map only if it has been compiled with the expat library. One can then mask out all unused bits in all cases in which an address points into the code space.
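The following host-side C fragment illustrates the principle only; it is not the actual GDB patch. It assumes the usual avr-gcc/GDB convention that data memory is mapped at offset 0x800000, so that everything below is code space, and that the flash size in bytes, as it could be taken from a memory map, is a power of two. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: avr-gcc/GDB map SRAM (and, above it, EEPROM) from here. */
#define AVR_DATA_OFFSET 0x800000u

/* Hypothetical helper: clear the unused bits of a byte address if and
 * only if it points into code space. flash_bytes would come from the
 * memory map and is assumed to be a power of two. */
static uint32_t normalize_address(uint32_t addr, uint32_t flash_bytes)
{
    assert(flash_bytes != 0 && (flash_bytes & (flash_bytes - 1)) == 0);
    if (addr >= AVR_DATA_OFFSET)
        return addr;                    /* data address: leave untouched */
    return addr & (flash_bytes - 1);    /* code address: drop unused bits */
}
```

With 64 KiB of flash, the word address 0x805A corresponds to the byte address 0x100B4, which is normalized to 0x00B4, while a data address such as 0x800100 passes through unchanged.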
Hopefully, the patch will be incorporated into the next GDB release. Until then, I will provide a patched version of the current 17.1 release. That is, from now on, we can forget again that there are AVR MCUs that have non-zero unused PC bits.
Summary
As we have seen, non-zero unused PC bits can be very annoying when you want to do embedded debugging. If the situation is not recognized by the debugger, debugging will feel very funny, to say the least. Fortunately, it is possible to mask out these bits on the server and the GDB side, so that we can completely forget about them.
The featured picture of this blogpost has been generated using ChatGPT.