32-bit Windows Serial Port Bug

Abstract

A potential problem with the use of the serial port in some versions of Microsoft Windows is described. This might be a bug in the operating system but if it is a "feature" then it is such a large "gotcha" that the omission of its description is a documentation bug.

It appears that, in certain versions of Microsoft Windows, changing the size of the serial port receive buffer can cause an incorrect interpretation of the flow control buffer quantity settings.

Introduction

While writing some software recently I found that hardware handshaking appeared not to work on serial ports under Windows 98 and ME, even though operation was quite satisfactory under Windows NT 4.0. At first I assumed that the problem was in my code, but careful re-examination of the code and the Microsoft documentation, followed by a few hours' web searching, did not make the cause obvious.

When the systematic methods of solving the problem failed I tried a more irrational approach - making random changes to the code. This worked quite quickly which was nice but not very satisfactory until the real problem was tracked down. Following a hunch I wrote a program to test certain conditions and found that a fairly simple explanation would seem to account for what had appeared to be odd behaviour on the part of the operating system.

This simple explanation is given here in the hope that it will save somebody else the time I've wasted on this. The background information might also help to clarify some of the ambiguities in the Microsoft documentation and give an alternative view to some of the items found on the web which I believe to be highly misleading if not downright wrong.

Also, perhaps you can contribute corrections or further details to this discussion. If so, please send me any relevant information. Thank you.

Handshaking Background

"Handshaking" is the term Microsoft use for the use of hardware signals to control the flow of data into and out of the serial port. It is not an ideal term, as it implies a multistep, two-way exchange between the two ends of the communication.

There are protocols which use exactly the same serial port signal lines to perform what might more strictly be termed a handshake. For example, in half duplex operation it is common for the computer (or other DTE - data terminal equipment) to make a request to transmit by raising the RTS (request-to-send) line. The modem (DCE - data communication equipment) then sets the telephone line in the condition for transmission and, when it is ready, raises the CTS (clear-to-send) line in response. When transmission is completed, the DTE drops RTS, the DCE stops transmission on to the 'phone line and drops CTS to indicate that a new cycle may begin.

This is not the type of handshake being considered here.

Instead, we are talking about an output line running from the receiver of a communication stream back to the transmitter, which tells the transmitter when it may or may not send further characters. The receiver signals "do not send" when its input buffer is so close to full that, if many more characters were sent, some might be lost for lack of buffer space to store them.

Specifically, we are considering the handshaking signals output by the computer to control the flow of characters from a device which is sending to it. There are two basic methods of doing this: use of separate serial port lines for the purpose, called hardware handshaking, or sending separate characters in the opposite direction (typically XOFF and XON), called software handshaking. We'll concentrate on hardware handshaking.

Typically, one or both of two signal lines on the serial port are (ab)used to control the flow of data into the computer. They are RTS, request to send, and DTR, data terminal ready. Here I'll assume both are in use though I have no reason to suspect that the story would be much different if only one was used.

Win32 Serial Port API Background

Programming the serial port under Win32 is fairly simple, though a little long-winded.

First you open the port by making an appropriate CreateFile call. Using the resulting handle you make various calls to set up the communications port parameters, buffer sizes and timeout periods as required. After that I/O is performed by making ReadFile and WriteFile calls as required. At the end a CloseHandle call is required.

Many example programs found on the Web do not do a very thorough job of setting all the relevant parameters. This seems unwise as the Microsoft documentation does not say exactly what the defaults can be assumed to be and what parameters are reset to the defaults after other programs have messed with the port. Leaving the parameters as they're found could result in a program which usually works all right but fails if certain other programs have opened the port previously. This is quite unacceptable.

Therefore, I wrote the software mentioned above to try to set all of the parameters which could affect its operation. The sequence was:

GetCommState and GetCommTimeouts are used to obtain the existing parameters to ensure that any parameters missed by the software, including any which might be added in later versions of the API, are at least set to sensible, if not desirable, values.

A large number of parameters and structure fields are involved in these calls but those which are relevant are:

Function: SetCommState (structure: DCB)
    fDtrControl - Indicates how the DTR signal is to be used on output. Set to DTR_CONTROL_HANDSHAKE to indicate that this signal line is to be used to control the flow of characters to the computer.
    fInX - Set to zero to indicate that XON/XOFF flow control is not to be used for reception.
    fRtsControl - Set to RTS_CONTROL_HANDSHAKE to indicate that this signal line is to be used to control the flow of characters to the computer.
    XonLim - Set to the number of characters which can still be in the receive buffer when the computer signals the device to resume sending characters.
    XoffLim - Set to the amount of space in the receive buffer below which the computer will signal the device to stop sending characters.

Function: SetupComm (no structure)
    dwInQueue - The overall size of the receive buffer.

The idea is that normally the computer will assert the DTR and RTS signals to allow the attached device to send characters. Any characters sent are placed in the receive buffer and retrieved from there by the application program. On a fast computer the application program will probably process the characters quicker than they are sent and there is no need for flow control operation.

However, if the device sends characters faster than the application program can process them the receive buffer will slowly fill up. When the buffer fills to the point where there are dwInQueue - XoffLim characters present the computer will deassert the DTR and RTS signals telling the attached device to stop sending.

The reason that the computer does not wait until the receive buffer is completely full before signalling a stop is that there may be a few characters already "in-transit", in the sender and receiver UARTs. Therefore a little bit of buffer space must be kept in reserve for these characters.

When the application program has processed a few characters from the buffer there will now be room for more to be received. However, rather than immediately signalling that transmission can resume the computer waits until the application program has processed enough characters that the number in the receive buffer has fallen to XonLim. It then asserts DTR and RTS to tell the device to send some more characters.
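The stop and resume rule just described can be sketched as a small model. This is only an illustration of the arithmetic, not driver code: the flow_model structure and flow_update function are names I have made up, and the model assumes XoffLim is no larger than the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the receive-side flow control rule.
   None of these names exist in the Windows API. */
typedef struct {
    size_t in_queue;  /* receive buffer size (dwInQueue passed to SetupComm) */
    size_t xoff_lim;  /* DCB XoffLim: free space below which the sender is stopped */
    size_t xon_lim;   /* DCB XonLim: fill level at which the sender is released */
    bool   asserted;  /* true while DTR/RTS tell the device that it may send */
} flow_model;

/* Update the handshake state for a buffer currently holding `count`
   characters and return the new state of the DTR/RTS lines. */
bool flow_update(flow_model *m, size_t count)
{
    assert(m->xoff_lim <= m->in_queue);  /* assumption: the limit fits in the buffer */

    if (m->asserted && count >= m->in_queue - m->xoff_lim) {
        m->asserted = false;  /* nearly full: tell the device to stop sending */
    } else if (!m->asserted && count <= m->xon_lim) {
        m->asserted = true;   /* drained to XonLim: tell the device to resume */
    }
    return m->asserted;
}
```

With a 500 character buffer, XoffLim of 50 and XonLim of 100, the model drops the lines when the 450th character arrives and does not raise them again until the application has read the buffer down to 100 characters.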

The Bug

It appears that the serial port driver stores the number of characters which should be in the receive buffer before it sets the flow control signals to stop transmission. I'll call this the Xoff threshold even though we're talking about hardware handshaking. When SetCommState is called to apply the parameters in a DCB to a particular serial port, this threshold is computed by subtracting the DCB's XoffLim value from the current size of the port's buffer.

However, when SetupComm is called the port's buffer size is changed but the Xoff threshold is not adjusted.

In the extreme case, if the buffer is reduced to a size which is smaller than the previously computed Xoff threshold then a buffer overflow will occur before flow control stops the characters arriving.
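The behaviour I am inferring can be written down as a toy model. I have not seen the driver source, so this is a guess at the logic which happens to fit the observations; the port_model structure and the model_ functions are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the behaviour inferred for Windows 98 and ME.
   Only the arithmetic comes from the text; the names are invented. */
typedef struct {
    size_t buffer_size;     /* current receive buffer size */
    size_t xoff_threshold;  /* fill level at which DTR/RTS are deasserted */
} port_model;

/* Models SetCommState: the threshold is derived from the buffer size
   which is in force at the time of the call. */
void model_set_comm_state(port_model *p, size_t xoff_lim)
{
    assert(xoff_lim <= p->buffer_size);
    p->xoff_threshold = p->buffer_size - xoff_lim;
}

/* Models SetupComm as it appears to behave: the buffer is resized but
   the previously computed threshold is left untouched. */
void model_setup_comm(port_model *p, size_t in_queue)
{
    p->buffer_size = in_queue;
}

/* True if the buffer would fill before flow control could stop the sender. */
bool model_will_overrun(const port_model *p)
{
    return p->xoff_threshold > p->buffer_size;
}
```

For example, applying an XoffLim of 100 to a 500 character buffer gives a threshold of 400. Shrinking the buffer to 300 afterwards leaves the threshold at 400, so model_will_overrun reports true. Calling model_set_comm_state again after the resize brings the threshold down to 200.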

This description applies to Windows 98 and ME. In the cases I have tried on Windows NT 4.0 it appears that the driver ignores the call to SetupComm, keeping the buffer at 1024 characters. The specification of SetupComm allows this behaviour - it is not a bug.

Test Program

I wrote a little program called Win32 Serial Port Handshake Bug to check that the ports behave as described above. The source and executable are in test-program.zip.

This is a Windows 32-bit console application which is designed to work with a single serial port which has had a couple of signal lines looped back:

Loop       Signals (source -> dest.)   9 pin (source -> dest.)   25 pin (source -> dest.)
Data       TxD -> RxD                  3 -> 2                    2 -> 3
Handshake  DTR -> CTS                  4 -> 8                    20 -> 5
           RTS -> CTS                  7 -> 8                    4 -> 5

Only connect one of the DTR or the RTS signals to CTS, of course.

I used a break-out box to do the looping but it would not be difficult to solder up a connector to do it.

Win32 Serial Port Handshake Bug takes a single command line parameter - the number of the port to be tested, 1 for COM1, 2 for COM2, etc.

C:\Temp>"Win32 Serial Port Handshake Bug.exe" 2

The principle of operation of the program is to set up various conditions for the buffer size and XoffLim and then send characters through the loop-back data line while watching the loop-back handshake line for evidence that the receive buffer has reached the Xoff threshold. Data is sent at 9600 baud (so each character takes about 1/960th of a second to send) with a 1/100th of a second pause between characters, so there is plenty of time for characters to make their way through the UARTs and interrupt routines and for any handshake signal to propagate back.

As the program runs it writes a commentary of what it is doing to the console. This can be redirected to a file for posterity. While determining the Xoff threshold the program writes a dot to the console for every tenth character sent. Therefore, dots should appear about one per second.

Here is the output for Windows 98. The output for Windows 95 and for Windows ME was identical.

Rx buffer size set to 500.
XoffLim set to 50.
Should give 450 character space on all systems.
.............................................
CTS signal seen low after 450 characters sent.
XoffLim set to 100.
Should give 400 character space on all systems.
........................................
CTS signal seen low after 400 characters sent.
Rx buffer size set to 450.
Should give 350 character space but may leave it at 400 on some systems.
........................................
CTS signal seen low after 400 characters sent.
Rx buffer size set to 300.
Should give 200 character space but may overrun the buffer on some systems.
..............................
ClearCommError returned dwErrors = 00000001 after 301 characters.
CE_RXOVER: buffer overrun.

Because Windows NT leaves the buffer size at 1024 bytes in these circumstances its behaviour is somewhat different:

Rx buffer size set to 500.
XoffLim set to 50.
Should give 450 character space on all systems.
..................................................
...............................................
CTS signal seen low after 974 characters sent.
XoffLim set to 100.
Should give 400 character space on all systems.
..................................................
..........................................
CTS signal seen low after 924 characters sent.
Rx buffer size set to 450.
Should give 350 character space but may leave it at 400 on some systems.
..................................................
..........................................
CTS signal seen low after 924 characters sent.
Rx buffer size set to 300.
Should give 200 character space but may overrun the buffer on some systems.
..................................................
..........................................
CTS signal seen low after 924 characters sent.

Here's a page on the results of running this program on Windows 2000 and XP.

Workaround

The workaround to avoid this problem is quite simple, of course. Call SetupComm to set the buffer size before calling SetCommState to apply the contents of the DCB including setting the XoffLim value.
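As a rough sketch of that ordering, assuming a port handle which has already been opened with CreateFile: configure_flow_control and limits_fit are hypothetical names of mine, and the sanity check on the limits is my own precaution rather than anything the API requires. The Windows part is only compiled when building for Windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sanity check: XoffLim must leave room in the buffer and
   the resume level must lie below the stop level. */
bool limits_fit(size_t in_queue, size_t xon_lim, size_t xoff_lim)
{
    return xoff_lim < in_queue && xon_lim < in_queue - xoff_lim;
}

#ifdef _WIN32
#include <windows.h>

/* Set up DTR/RTS flow control on an already opened serial port handle. */
BOOL configure_flow_control(HANDLE port, DWORD in_queue, DWORD out_queue,
                            WORD xon_lim, WORD xoff_lim)
{
    DCB dcb;

    assert(port != INVALID_HANDLE_VALUE);
    if (!limits_fit(in_queue, xon_lim, xoff_lim))
        return FALSE;

    /* 1. Fix the buffer sizes first. */
    if (!SetupComm(port, in_queue, out_queue))
        return FALSE;

    /* 2. Start from the current settings so that no field is left undefined. */
    ZeroMemory(&dcb, sizeof dcb);
    dcb.DCBlength = sizeof dcb;
    if (!GetCommState(port, &dcb))
        return FALSE;

    dcb.fDtrControl = DTR_CONTROL_HANDSHAKE;
    dcb.fRtsControl = RTS_CONTROL_HANDSHAKE;
    dcb.fInX        = FALSE;
    dcb.XonLim      = xon_lim;
    dcb.XoffLim     = xoff_lim;

    /* 3. Apply the DCB last, so the threshold is worked out from the new size. */
    return SetCommState(port, &dcb);
}
#endif
```

If the buffer size has to change later on, the same reasoning suggests calling SetCommState again after the new SetupComm call.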