The featured image of this post is by dooder –

Serial asynchronous communication is one of the most common forms of communication between two electronic devices. Let us see which Arduino libraries support it, and let us check how well they perform.

Asynchronous serial communication

Using asynchronous serial communication, one needs just two lines (plus ground) in order to connect two electronic systems. In fact, this was the way teletype systems were connected to each other in the early days. A couple of years later, teletypes were connected to computers in order to provide an I/O device for the operator. These days, this kind of communication is often used between different electronic devices. One line is for outgoing signals (TX), and the other one is for incoming signals (RX).

Connecting a (vintage) teletype with a (vintage) computer

With this setup, it is possible to send and receive simultaneously (this kind of communication is called full-duplex). One can have it even simpler and use only one line. However, then only one party can send at any time and the other has to listen (which is called half-duplex). The debugWIRE protocol, which is used for hardware debugging the smaller AVR MCUs, uses such a mechanism. It uses the RESET line in order to communicate between the hardware debugger and the MCU.

Transmitting one byte

The characteristic property of asynchronous communication is that there is no clock signal that gives an indication when the data on the line is supposed to be valid (as is the case with the synchronous I2C and SPI protocols). This means that the two communicating parties have to know which communication speed is used, and they have to stick to this speed when reading and writing data.

However, it is not only the speed, but also the format one has to agree upon. These days, one usually transmits a so-called frame by first sending a start bit (a logical zero), followed by the data byte (8 bits) without a parity bit, followed by a stop bit (a logical one). This is called the 8N1 format. Further, the usual convention is that the least significant bit is transmitted first (LSB first). If one records the transmission of one byte with a logic analyzer, it can look as follows.

Transmission of 0x55 using the 8N1 format at 115200 bps

The idle state is that the line is in the high state. The start bit (starting at approx. 12 µs) is always a zero bit. Then the bits of the data byte, in this case 0x55, are transmitted in a backward manner, that is, with the least significant bit first. After 8 bits have been transmitted, the transmission is ended by the stop bit, which always is a one-bit. After that, a new byte could be transmitted or the line can stay in the idle state.
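As a small illustration of the frame layout described above, the following Python sketch builds the sequence of line levels for one 8N1 frame. The function name `frame_8n1` is made up for this example; it models only the logical levels, not the timing.

```python
def frame_8n1(byte):
    """Return the line levels of one 8N1 frame as a list of 0/1 values."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    # Data bits are sent least significant bit first.
    data_bits = [(byte >> i) & 1 for i in range(8)]
    # Start bit (0), eight data bits, stop bit (1).
    return [0] + data_bits + [1]


print(frame_8n1(0x55))  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

For 0x55, the frame alternates between low and high on every bit, which makes it a convenient pattern for inspecting bit timing on a logic analyzer.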

On the receiving side, one waits for a falling edge that signals the beginning of the start bit. One then waits 1.5 bit times before sampling the first (least significant) data bit. After that, one waits exactly one bit time before each further sample, so that every bit is sampled in the middle of its bit time.
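The sampling scheme can be sketched in a few lines of Python. This is an idealised model under simplifying assumptions (perfectly sharp edges, exact detection of the falling edge, no interrupt latency), and the name `receive_byte` is hypothetical. The sender and the receiver may run at different bit rates.

```python
def receive_byte(levels, tx_bps, rx_bps):
    """Sample an idealised 8N1 frame.

    levels: the 10 line levels of the frame (start, 8 data bits, stop),
            as sent with bit rate tx_bps.
    rx_bps: the bit rate the receiver assumes.
    """
    tx_bit = 1.0 / tx_bps
    rx_bit = 1.0 / rx_bps
    value = 0
    for i in range(8):
        t = (1.5 + i) * rx_bit   # receiver's idea of the middle of data bit i
        index = int(t / tx_bit)  # which transmitted bit is on the line at time t
        level = levels[index] if index < len(levels) else 1  # line idles high
        value |= level << i      # data arrives least significant bit first
    return value


FRAME_0X55 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(hex(receive_byte(FRAME_0X55, 115200, 115200)))  # 0x55
```

Calling the function with a `tx_bps` that differs from `rx_bps` shows how the sampling points drift away from the middle of the bits towards the end of the frame.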

Potential timing problems

It can happen that the timing is somewhat off, though. The reason for that might be that the system clock is not accurate or because the universal asynchronous receiver and transmitter (UART for short) device cannot generate the correct rate from the system clock. For instance, when you run an AVR MCU at 16 MHz, then at 115200 bps, you can be either 3.5% too slow or 2.1% too fast (see WormFood’s AVR Baud Rate Calculator), where the Arduino core has decided to be 2.1% too fast.
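The two figures can be reproduced with the divider formula from the AVR data sheet: the USART divides the system clock by 16 × (UBRR + 1) in normal mode and by 8 × (UBRR + 1) in double-speed mode (U2X). The following Python sketch picks the nearest integer divider for each mode; the function name is made up for this example.

```python
def avr_baud_options(f_cpu, target_bps):
    """Return (u2x, ubrr, actual_bps, error_percent) for both USART modes."""
    options = []
    for u2x, divisor in ((0, 16), (1, 8)):
        # Nearest integer divider; UBRR holds the divider minus one.
        ubrr = max(0, round(f_cpu / (divisor * target_bps)) - 1)
        actual = f_cpu / (divisor * (ubrr + 1))
        error = (actual / target_bps - 1) * 100
        options.append((u2x, ubrr, actual, error))
    return options


for u2x, ubrr, actual, error in avr_baud_options(16_000_000, 115200):
    print(f"U2X={u2x} UBRR={ubrr:2d} actual={actual:9.1f} bps error={error:+.1f}%")
```

At 16 MHz and 115200 bps, this yields roughly -3.5% for normal mode and +2.1% for double-speed mode.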

And what about the system clock? Fortunately, the Arduino Uno uses a ceramic resonator, which should have an accuracy of 1000 ppm (= 0.1%) or better. And in fact, it does, as shown in the picture below.

Measuring the Arduino Uno system clock

If one uses the internal RC oscillator of an AVR MCU, however, one only has a guaranteed accuracy of ±10%. In all MCUs I have seen, though, the deviation was within ±2%. With user calibration, one can bring that down to ±1%.

So what are the consequences for asynchronous communication if one party is sending the bits faster or slower than the receiving party expects it? The good news is that one only has to consider one frame because after a frame has been received, timing is restarted with the next start bit. This means errors do not accumulate over multiple frames.

The next picture shows the transmission of 0x55 with three different speeds. The middle one is the correct one at 115200. The upper line shows what happens when the transmission speed is 5% slower, the lower line shows a 5% faster transmission.

What happens if the timing is 5% off?

As one can see, the error accumulates over time. Since the value of the bit is determined at the middle of the bit time, with a 5% deviation one can still determine the right value of the last bit (where the dashed line is), assuming that the middle line reflects the timing of the receiving side. However, it is apparently not the maximal possible deviation. So how much deviation will lead to the situation that we have an accumulated error of 50% in the middle of the eighth bit, i.e., after 8.5 bit times?

x\% \times 8.5 = 50\%

Solving the equation gives us x = 5.88. So any deviation smaller than 5.88% should be OK, at least in theory. In the tutorial on Clock Accuracy Requirements for UART Communications Protocol by Analog Devices, however, it is argued that in most scenarios one cannot ignore the rise and fall times of the signal. In “nasty” environments, only the middle 50% of the bit time can be assumed to be stable, while in “normal” scenarios it may be 75%. Further, it is assumed that one wants to verify that the stop bit is indeed a logical one, which means that the last sample is taken after 9.5 bit times. With these assumptions, the acceptable relative error reduces to 2.6% for the “nasty” environment and 3.9% for the “normal” environment.
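The arithmetic behind these limits can be written down in a few lines of Python. This is a reconstruction under the assumption that the sampling point may drift by at most half of the stable part of a bit until the last sample is taken. It reproduces the figures quoted above, but it is not necessarily the exact derivation used in the tutorial, and the function name is made up.

```python
def max_deviation_percent(stable_fraction, last_sample_bits):
    """Largest tolerable bit-rate deviation in percent.

    stable_fraction:  part of the bit time in which the level is reliable
                      (1.0 = whole bit, 0.75 = "normal", 0.5 = "nasty").
    last_sample_bits: time of the last sample in bit times after the
                      falling edge (8.5 = last data bit, 9.5 = stop bit).
    """
    return 100 * (stable_fraction / 2) / last_sample_bits


print(f"ideal:  {max_deviation_percent(1.0, 8.5):.2f}%")   # about 5.88%
print(f"normal: {max_deviation_percent(0.75, 9.5):.2f}%")  # about 3.95%
print(f"nasty:  {max_deviation_percent(0.5, 9.5):.2f}%")   # about 2.63%
```

Note that this budget covers the combined error of sender and receiver, so each side has to stay well below these values if the other side is off as well.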

Unfortunately, there are more sources for communication errors because of timing. One is the interrupt latency produced by interrupt services, e.g., the timer overflow interrupt that counts milliseconds. As we have seen in the blog post cited, this takes 6.625 µs, which is a quite substantial chunk when we communicate at 57600 bps, where the bit time is 17.36 µs. If one implements asynchronous communication in software, then one relies on interrupts, which in this case may be delayed by almost 7 µs! So, in such scenarios, it may be advisable to disable the timer overflow interrupt.

A final issue may be that when implementing asynchronous communication in software, receiving a byte needs to be done in an interrupt routine. That means that there is very little time to process the received byte, namely, just the bit time for the stop bit. And this may lead to a buffer overrun problem very quickly.

Serial communication libraries for the Arduino

When you use the Arduino Uno, the usual way to communicate asynchronously is to use the Serial object, which is an instance of the class HardwareSerial. The hardware UART does most of the work and only when a byte has been received or a byte can be sent, an interrupt is raised. The interrupt service routine for receiving data uses 5 µs, the interrupt routine for sending the next byte takes 8.75 µs in the worst case.

Since on the Uno, there is only one hardware UART, often one has to use a software UART. If one uses the smaller ATtinys which do not have any hardware UART at all, there is no way around using a software UART. The standard solution is the SoftwareSerial library. There are three main problems, though. First, it is necessary that the falling edge of the start bit is detected as accurately as possible. This is done using the pin-change interrupt on the receiving line. So other interrupts are counter-productive. They might lead to misinterpreting the received bit stream by sampling the bits too late. For example, if data is received with a bit rate of 57600, then the bit time is 17.4 µs. If the millis interrupt is raised just before the falling edge of a start bit, then the detection of the start bit is delayed for 6.6 µs, which is already one-third of the bit time. With a bit of other variations, one easily misinterprets the data stream.

Second, sending and receiving data needs accurate timing and for this reason interrupts are disabled during that time and no other things can go on. Since the receiving routine waits into the stop bit, only one-half of a bit time is available for processing a received byte. If too many bytes are received in short order, the receive buffer (of 64 bytes) might overflow.

Third, even slow bit rates can lead to problems. If one communicates with 1200 bps, for instance, then the bit time is 833 µs, and so interrupts will be blocked for at least 9.5 times the bit time, i.e., 7.9 ms. This implies that the millis interrupt, which is raised every millisecond, cannot be served on time.
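To get a feeling for these numbers at different bit rates, here is a small Python calculation. It assumes, as stated above, that interrupts are blocked for about 9.5 bit times per frame and that the millis interrupt routine takes 6.625 µs. The function names are made up for this example.

```python
MILLIS_ISR_US = 6.625  # duration of the timer overflow interrupt quoted above


def bit_time_us(bps):
    """Duration of one bit in microseconds."""
    return 1e6 / bps


def blocked_time_us(bps, frame_bits=9.5):
    """Approximate time interrupts stay disabled while handling one frame."""
    return frame_bits * bit_time_us(bps)


for bps in (1200, 9600, 57600, 115200):
    bt = bit_time_us(bps)
    print(f"{bps:>6} bps: bit time {bt:7.2f} us, "
          f"interrupts blocked ~{blocked_time_us(bps) / 1000:5.2f} ms, "
          f"millis ISR = {MILLIS_ISR_US / bt:4.0%} of a bit time")
```

Low bit rates block interrupts for a long time, while at high bit rates the latency of other interrupts becomes a large fraction of the bit time.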

There are at least two alternatives to SoftwareSerial. One is picoUART, a very minimalistic software UART, of which I use version 1.2.0. It uses only a minimal amount of code but is extremely accurate in timing. In contrast to SoftwareSerial, however, the input/output pins and the communication speed have to be fixed at compile time. Similarly to SoftwareSerial, almost the entire frame time is blocked for interrupts. Receiving data can either be done by polling, i.e., by actively waiting for new data, or by interrupts. In the latter case, there is only a one-byte receive buffer, which might easily lead to missing a data byte.

Our final candidate is AltSoftSerial, which uses a quite different methodology than the bit-banging technique of the last two libraries. It uses the input capture feature of Timer 1 on the ATmega328P for capturing the time when a signal edge occurs on the input line. This is done in an interrupt-driven way, which means that the interrupt latency imposed by this method is significantly shorter than 9.5 times the bit time. It is optimistically claimed that it is 2-3 µs. It turns out, though, that in the worst case, it can be 16 µs. This is still much better than what the other methods impose but may be prohibitive for higher bit rates. In addition, it is claimed that the library can tolerate an interrupt latency of almost one bit time. Together with its own 16 µs, this is definitely over-optimistic. The bytes to be transmitted are generated using the output compare feature of the same timer, again in an interrupt-driven way. So, compared with the two bit-banging methods, this library requires significantly fewer MCU cycles. There is a price to pay for that, of course: the input and output pins are fixed, and one cannot use the PWM functionality of the pins associated with Timer 1.

So which one is the best alternative? SoftwareSerial is the most flexible one. You can use any pins as input and output. And one can even set up more than one SoftwareSerial instance, but only one can be active at any time. picoUART is the one with the smallest memory footprint and with impressive timing accuracy. It seems like a good fit for the smaller ATtinys. Finally, AltSoftSerial relying on timers instead of delay loops is very accurate and consumes the least amount of compute cycles.

In the next section, we have a look at what communication speeds the libraries can reliably deal with and how much deviation in the bit rate they tolerate.

Stress testing the different serial libraries

How accurate is the timing when sending and how tolerant are the libraries when receiving data? In order to measure the accuracy of the bit rate when sending, I employed my Saleae logic analyzer to measure the generated bit rates. For stress testing the receive functionality, I used an FT232R board driven by a Python script.

Let us first have a look at the transmission bit rates.

Communication speed deviation when transmitting

There are a few interesting observations to make. First, even for the hardware UART, it is not always possible to generate a bit rate that is close to the nominal one. For 57600 and 115200 bps, the real bit rate is 2.2% too fast. Even worse, for 230400 bps, it is 3.6% too slow, which is problematic. The reason for these deviations has already been mentioned: the AVR baud rate generator cannot generate all rates. The next observation is that SoftwareSerial should probably not be used with bit rates above 115200 bps. Similarly, AltSoftSerial refuses to work when a bit rate higher than 125000 bps is requested. The clear winner appears to be picoUART.

So, how do the libraries fare when receiving data? I used the following sketch (a bit simplified) to test the performance of SoftwareSerial. For the other libraries, it looks similar. Notice that I do not use the available() method, but simply read and ignore the result when it is less than zero. That is the fastest way to read bytes coming over a stream.

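unsigned long baud=115200;

#include <SoftwareSerial.h>
SoftwareSerial UART =  SoftwareSerial(8, 9); // RX = pin 8, TX = pin 9
const int RTS=12; // RTS line, connected to CTS of the FT232R

void setup() {
  // TIMSK0 = 0; // uncomment to disable the millis interrupt
  UART.begin(baud);
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWrite(LED_BUILTIN, HIGH);
}

void loop() {
  byte expect = 0;
  int inp;

  while (1) {
    inp = UART.read(); // returns -1 if no byte is available
    if (inp >= 0) {
      if (inp != expect++) {
        digitalWrite(LED_BUILTIN, LOW);
        pinMode(RTS, OUTPUT); // pull RTS low to signal the error
        while (1);
      }
    }
  }
}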

The data to be received is generated by a Python script that drives an FT232R board. The FT232R can generate almost any bit rate you want. It selects the achievable bit rate closest to the requested one, where the achievable rates are those that give an integer result when dividing 24 million by the bit rate. Here is the (simplified) script. You call it with the following parameters:

  • <bps> – the base bit rate;
  • <stopbits> – number of stop bits, usually 1; can be set to 2 if communication should be slowed down;
  • <num> – number of bytes to send for each speed step;
  • <dir> – direction of change for the speed step, can be ‘+’ or ‘-’;
  • <startstep> – the start deviation, e.g., 3.9, meaning a deviation of 3.9%.

The script changes the bit rate systematically in steps of one permille (0.1%) and stops when the Arduino sketch pulls the CTS line down because it has read an unexpected byte.

#!/usr/bin/env python3
import serial
import sys
import time

serialport = '/dev/cu.usbserial-XXXXX'

def usage():
    print("usage:", sys.argv[0], "<bps> <stopbits> <num> <dir> <startstep>")
    sys.exit(1)

if len(sys.argv) != 6: usage()

bps = int(sys.argv[1])
addstep = bps/100                 # one percent of the base bit rate
stopbits = float(sys.argv[2])
maxwrite = int(sys.argv[3])
if sys.argv[4] == '+': direction = 1
else: direction = -1
step = float(sys.argv[5])*direction

outbyte = 0
while True:
    dev = bps+(step*addstep)
    print("bps:", bps, "  deviation:", "%4.1f" % (step,),
          "  is:",  int(dev))
    ser = serial.Serial(serialport, int(dev), stopbits=stopbits)
    i = 0
    while i < maxwrite:
        ser.write(bytes([outbyte]))   # send bytes 0, 1, 2, ..., 255, 0, ...
        i += 1
        outbyte = (outbyte + 1) % 256
    ser.flush()                       # wait until everything has been sent
    if ser.cts:                       # the Arduino signals an unexpected byte
        print("Error reported at deviation", "%4.1f" % (step,))
        sys.exit(0)
    time.sleep(0.3) # otherwise bytes get dropped
    ser.close()
    step += 0.1*direction

For some bit rates, the libraries gave an error when the millis interrupt was active, which is not a big surprise, given that the interrupt latency of 6.6 µs imposed by the timekeeping interrupt routine is close to one bit time at 115200 bps. In order to get a meaningful result nevertheless, I disabled the interrupt and marked that with ‘#’ in the result table below. Sometimes, the idle time between reading two bytes was too short. In these cases, I allowed for some extra time by using a sending format with 2 stop bits. This is marked by a star in the table.

For each library, I report the maximal negative and positive relative deviation that the library tolerated. I tested it with at least 10,000 bytes. For all bit rates above 10,000 bps, I tested it with 100,000 bytes. And for all bit rates higher or equal to 100,000 bps, I used 1 million bytes. The reported percentages are those that are tolerated, while the next higher (or lower) bit rate led to an error. Note that in particular with higher bit rates, the step to the next bit rate can be quite large (e.g., 0.5% with bit rates around 100,000 bps). Finally, one should note that I used the blocking variant of picoUART, which blocks interrupts once the read routine is called.
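The size of these steps follows from the divider rule mentioned earlier. The following Python sketch uses a simplified model, namely that every achievable rate is 24 million divided by an integer. The real chip has additional restrictions that are ignored here, and the function names are made up for this example.

```python
FT232R_BASE_HZ = 24_000_000  # divider base, as stated in the text above


def nearest_rate(requested_bps):
    """Achievable bit rate closest to the requested one (simplified model)."""
    divisor = max(1, round(FT232R_BASE_HZ / requested_bps))
    return FT232R_BASE_HZ / divisor


def step_percent(bps):
    """Relative distance to the next faster achievable rate, in percent."""
    divisor = round(FT232R_BASE_HZ / bps)
    if divisor < 2:
        raise ValueError("bit rate too high for this model")
    here = FT232R_BASE_HZ / divisor
    faster = FT232R_BASE_HZ / (divisor - 1)
    return (faster / here - 1) * 100


for bps in (1200, 9600, 100000, 230400):
    print(f"{bps:>6} bps -> nearest {nearest_rate(bps):10.1f} bps, "
          f"next step {step_percent(bps):.3f}%")
```

In this model, the step size is about 0.4% around 100,000 bps, and it grows roughly in proportion to the bit rate.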

Possible speed deviations when receiving data
('#' = no millis interrupt, '*' = 2 stop bits)

There are a number of interesting results in this table. First, in general, it is obvious that the hardware UART is the most robust, which is not a big surprise. There are two things to note, though. It is completely unclear to me why the hardware UART does not have a symmetric tolerance interval. So why does it at nominal 1200 bps tolerate a bit rate that is 5.9% slower, but not a bit rate that is 2.3% faster? I have no idea! Further, at 230400 bps, the hardware UART cannot receive bytes with the nominal speed at all! Two Arduino Unos could nevertheless communicate without a problem because they would have the same error.

Second, SoftwareSerial looks quite OK for everything up to 57600 bps. At 115200 bps, however, the millis interrupt needs to be disabled, one has to switch to two stop bits, and, moreover, the generated bit rate (which is 0.7% too slow) is outside of the tolerance interval of the receiver. So, if you let two systems communicate using SoftwareSerial at 115200 bps, it is very likely that you run into problems.

Third, picoUART appears to be the most robust solution, even at high bit rates. One should note, however, that I used the polling version that blocks all interrupts. With the interrupt-driven version, I suspect, there may be problems at higher bit rates, because the ISR needs more cycles and the millis interrupt might throw off the timing. Finally, it is also the least flexible solution, since you have to fix the bit rate and pins at compile time.

Fourth, AltSoftSerial is not as robust as I would have thought. At bit rates of 57600 bps and less, it is a definite plus that you have more compute cycles for the user program than with the other software solutions. However, at 57600 bps, one should probably disable the timer interrupt (and probably other interrupts as well) because the combined worst-case interrupt latency is very close to one bit time. 115200 bps is only sustainable if all interrupts are off and two stop bits are used.
