This article was published in:
Embedded Systems Programming, 7(11), November 1994, pp. 46-58.


Communication Protocols
for Embedded Systems

Bhargav P. Upender
barg@utrc.utc.com

Philip J. Koopman, Jr.
koopman@cmu.edu


The past few years have seen the beginning of a trend to dramatically increase the embedded electronics content of automobiles, elevators, building climate control systems, jet aircraft engines, and other traditionally electro-mechanically controlled systems. In many large systems this increasing electronics content is being accompanied by a proliferation of subsystems having separate CPUs.

The increase in the number of processors in a system is often driven by computation and I/O growth. In some development environments, the increase may also be driven by a need to ease system integration burdens among multiple design groups or to provide system flexibility through "smart sensors" and "smart actuators". But, whatever the reasons, once there is more than one CPU in a system there must be some means of communication to coordinate action.

While some high-end embedded systems communicate over a VME backplane or similar arrangement, the embedded systems we're working on use physically distributed CPUs and thus involve some sort of Local Area Network (LAN), also called a multiplexed network or a communication bus. At the heart of the LAN is the media access protocol, which arbitrates (picks the next transmitter for) access to the shared network medium (typically a wire, fiber, or RF frequency).

In this article, we first discuss the special considerations for networking real-time embedded systems. We then describe several media access protocols that demonstrate fundamentally different ways of accessing the shared medium: Connection Oriented Protocols, Polling, Time Division Multiple Access (TDMA), Token Ring, Token Bus, Binary Countdown, Carrier Sense Multiple Access with Collision Detection (CSMA/CD), and Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). For each protocol, we evaluate its strengths and weaknesses against the special considerations. We conclude with a protocol tradeoff chart to help you select a protocol that fits your needs. While no protocol is perfect for all purposes, we think that a variation of CSMA/CA offers the most versatility for many embedded systems [1].


SPECIAL CONSIDERATIONS FOR EMBEDDED APPLICATIONS

In practice, we have found that embedded real-time networks require high efficiency, deterministic latency, operational robustness, configuration flexibility, and low cost per node.

Because cost limits the network bandwidth available to many applications, protocol efficiency (message bits delivered compared to raw network bandwidth) is very important. The embedded systems we have studied are characterized by a predominance of short, periodic messages. So, an obvious optimization is to reduce overhead bits used for message packaging and routing (it is not unusual for 8 bits of data to be packed in a message that is 32 or even 64 bits long).
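To make the efficiency figure concrete, here is a minimal Python sketch that computes the fraction of a message's bits that carry data. It counts only packaging overhead (media access overhead is ignored), and the 8-, 32-, and 64-bit sizes are simply the examples from the text.

```python
def frame_efficiency(data_bits: int, frame_bits: int) -> float:
    """Fraction of a message's bits that carry application data.

    Counts only packaging and routing overhead (headers, addresses, check
    bits); arbitration overhead such as token passing is not included.
    """
    if not 0 < data_bits <= frame_bits:
        raise ValueError("need 0 < data_bits <= frame_bits")
    return data_bits / frame_bits


# The cases from the text: 8 data bits packed into 32- and 64-bit messages.
for frame in (32, 64):
    print(f"8 data bits in a {frame}-bit message: {frame_efficiency(8, frame):.1%}")
# 8 data bits in a 32-bit message: 25.0%
# 8 data bits in a 64-bit message: 12.5%
```

Even before any arbitration is counted, such messages devote only 12.5% to 25% of the raw bandwidth to data.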

Once message overhead has been reduced as much as possible, media access overhead must be reduced. For the most part, this means minimizing the network bandwidth consumed by arbitration (e.g., passing a token or resolving collisions). Because worst-case behavior is usually important, efficiency should be evaluated for both light and heavy traffic. For example, CSMA/CD (often used in workstation LANs) is highly efficient under light traffic but performs poorly under heavy load, while Token Bus protocols have the reverse properties.

Determinacy, or the ability to calculate worst-case response time, is important for meeting the real-time constraints of many embedded control applications. A prioritization capability is usually included to improve the determinacy of messages for time-critical tasks such as exception handling and high-speed loop control. Priorities can be assigned either by node number or by message type. Additionally, protocols can support local or global priority mechanisms. In local prioritization, each node gets a turn at the network in sequence and sends its highest priority queued message (thus potentially forcing a very high priority message to wait for other nodes to take their turns first). In global prioritization, the highest priority message in the entire system is always transmitted first. This mechanism, which depends fundamentally on the media access protocol, is highly desirable for many safety-critical applications.
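The difference between local and global prioritization can be sketched in a few lines of Python. This is an illustrative model rather than any particular protocol: messages are hypothetical (priority, label) pairs in which a larger number is more urgent, nodes take turns in node-number order under local prioritization, and messages of equal priority keep their queue order.

```python
from typing import Dict, List, Tuple

# Hypothetical message format: (priority, label); larger priority = more urgent.
Message = Tuple[int, str]


def local_order(queues: Dict[int, List[Message]]) -> List[str]:
    """Nodes take turns in node-number order; each sends its most urgent message."""
    pending = {node: sorted(msgs, key=lambda m: -m[0]) for node, msgs in queues.items()}
    sent: List[str] = []
    while any(pending.values()):
        for node in sorted(pending):          # one turn per node per round
            if pending[node]:
                sent.append(pending[node].pop(0)[1])
    return sent


def global_order(queues: Dict[int, List[Message]]) -> List[str]:
    """The most urgent message anywhere in the system is always sent next."""
    everything = [msg for msgs in queues.values() for msg in msgs]
    return [label for _, label in sorted(everything, key=lambda m: -m[0])]


queues = {
    1: [(1, "n1-status")],
    2: [(2, "n2-log")],
    3: [(9, "n3-alarm"), (1, "n3-status")],
}
print(local_order(queues))   # ['n1-status', 'n2-log', 'n3-alarm', 'n3-status']
print(global_order(queues))  # ['n3-alarm', 'n2-log', 'n1-status', 'n3-status']
```

The alarm from node 3 waits behind two less urgent messages under local prioritization, but goes first under global prioritization.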

Many applications require robust operation under extreme conditions. We call a protocol robust if it can quickly detect and recover from errors (e.g., duplicate or lost tokens), added nodes, and deleted nodes. In some systems it is also important to quickly recover from a reset or power glitch that forces a restart of the network.

Varied operating environments may dictate a media access protocol that supports multiple media as well as mixed topologies. For example, portions of a system may require expensive fiber in noisy environments, while other portions can tolerate low-cost twisted-pair wire in benign environments. Further, a bus topology may be optimal for wire, but a ring or star topology may be needed for fiber.

Finally, a vital consideration is the cost per node. In this article, the media access discussion progresses from very simple protocols to complex, high-performance ones. Simple protocols require fewer hardware and software resources and are therefore likely to be less expensive, which makes them good candidates for extremely cost-sensitive, high-volume applications. However, for applications expected to grow, more advanced protocols provide a stronger foundation. In general, costs are decreasing over time due to advances in IC manufacturing technology and the increasing availability of off-the-shelf protocol implementations. Consequently, we expect advanced yet cost-effective protocols to be used in many embedded applications.

MEDIA ACCESS PROTOCOLS

Now that we have a feel for the issues to deal with in embedded networks, let's examine the various commonly available media access protocols. While many variations and combinations are possible, we'll just discuss the plain versions of each protocol.

Connection Oriented Protocols

Before LANs became popular, connection-oriented protocols were heavily used to connect remote terminals to mainframes. These protocols support only two nodes per physical transmission medium, typically connected over serial lines via modems. Figure 1 shows an example of a four-processor network using this protocol. Communication between nodes that are not physically connected requires multiple transmissions through intermediate nodes. These protocols are deterministic between directly connected nodes, but latency can be high for indirectly connected nodes.

For an embedded system with modest communication requirements, this might be a cost-effective protocol (readily available hardware and software from a mature technology). In demanding applications, however, nodes that handle a lot of pass-through traffic can become swamped, which rules out low-cost nodes in a large system. Sometimes this type of protocol is combined with a more complex communication system to provide backward compatibility with older systems or to allow simple remote modem access to the system (e.g., BACnet). This type of protocol is used by the X.25 public network standard (network services offered by telephone companies) and by IBM's Systems Network Architecture (SNA) [2].

FIGURE 1

Figure 1: An example network using connection oriented protocols
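The cost of relaying through intermediate nodes can be sketched as a hop count. The Python below is a minimal illustration under assumptions that are not in the text: a hypothetical four-node chain A-B-C-D (the exact layout of Figure 1 is not given here), store-and-forward relaying in which every hop retransmits the whole message, a 9600 bit/s modem rate, and no queuing delay at the intermediate nodes.

```python
from collections import deque
from typing import Dict, List


def hop_count(links: Dict[str, List[str]], src: str, dst: str) -> int:
    """Breadth-first search for the fewest point-to-point links from src to dst."""
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            return dist[node]
        for nbr in links[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                frontier.append(nbr)
    raise ValueError(f"{dst} is unreachable from {src}")


def store_and_forward_latency(hops: int, message_bits: int, bit_rate: float) -> float:
    """Seconds to relay one message when each hop retransmits it in full (no queuing)."""
    return hops * message_bits / bit_rate


# Hypothetical chain A-B-C-D; only neighbors share a physical link.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
hops = hop_count(chain, "A", "D")
print(hops, store_and_forward_latency(hops, 64, 9600))  # 3 0.02
```

A message between the end nodes takes three full transmissions, and any traffic already queued at B or C would add to that.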

Polling

Polling is one of the more popular protocols for embedded systems because of its simplicity and determinacy. In this protocol, a centrally assigned master periodically polls (by sending a polling message) the slave nodes, giving them explicit permission to transmit on the network.

FIGURE 2

Figure 2: Master node sequentially polling slave nodes for information

Figure 2 shows the polling order (dotted lines) of a simple four-node bus network. Most of the protocol software resides in the master, and the communication work of the slave nodes is minimal (so network costs tend to be lower). This protocol is ideal for a centralized data acquisition system where peer-to-peer communication and global prioritization are not required. However, in the embedded systems we have worked with, the single point of failure presented by the master node (or the cost of installing redundant master hardware) is unacceptable. Additionally, the polling process consumes considerable bandwidth regardless of network load (poor efficiency). Polling protocols have been standardized by the military (MIL-STD-1553B) for aircraft subsystem communications. Some variants of this protocol allow inter-slave communication through the master, as well as improved robustness by using multiple masters (e.g., Profibus).
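The claim that polling consumes bandwidth regardless of load can be checked with a little arithmetic. This Python sketch assumes a hypothetical frame format in which every poll costs poll_bits, every reply costs reply_overhead_bits plus any data, and every slave answers each poll even when it has nothing to send.

```python
from typing import Dict, Tuple


def polling_cycle(pending_bits: Dict[str, int], poll_bits: int,
                  reply_overhead_bits: int) -> Tuple[int, int]:
    """Return (bits on the wire, data bits) for one full polling round.

    Every slave is polled once and always replies; a reply carries
    reply_overhead_bits of framing plus whatever data the slave has queued.
    """
    total = 0
    data = 0
    for bits in pending_bits.values():
        total += poll_bits + reply_overhead_bits + bits
        data += bits
    return total, data


light = polling_cycle({"s1": 0, "s2": 0, "s3": 8}, poll_bits=24, reply_overhead_bits=24)
heavy = polling_cycle({"s1": 8, "s2": 8, "s3": 8}, poll_bits=24, reply_overhead_bits=24)
print(light, heavy)  # (152, 8) (168, 24)
```

In both cases 144 bit times per round go to polls and reply framing; that overhead does not shrink when the network is quiet.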

Time Division Multiple Access (TDMA)

TDMA is heavily used in satellite communications [3], but it is applicable to local area networks as well. In this protocol, a network master broadcasts a frame sync signal before each round of messages to synchronize the clocks of all the nodes. After the sync, each node transmits during its uniquely allocated time slice, as in Figure 3. Performance is similar to polling, but with greater efficiency at heavy loads because individual polling messages are eliminated. Slave nodes cost more with TDMA than with polling, because each slave must have a stable time base to measure its slice. An additional weakness of TDMA is the need for fixed-length messages that fit into the time slices. In some TDMA variations, unused slices are truncated by tacit agreement among nodes. Time-based protocols have been popular in aerospace applications; for example, DATAC (Digital Autonomous Terminal Access Communications) is being used by NASA and Boeing.

FIGURE 3

Figure 3: Time Slices of TDMA protocols
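Because every slice has a fixed length, each node can work out when its slice opens from the sync signal alone. A minimal Python sketch, assuming slices are allocated in list order immediately after a sync lasting sync_bits bit times, with each slice slot_bits long (both hypothetical parameters):

```python
from typing import Dict, List


def tdma_schedule(nodes: List[str], sync_bits: int, slot_bits: int) -> Dict[str, int]:
    """Bit time, counted from the start of the frame sync, at which each slice opens."""
    return {node: sync_bits + i * slot_bits for i, node in enumerate(nodes)}


def tdma_frame_bits(num_nodes: int, sync_bits: int, slot_bits: int) -> int:
    """Length of one full round: the sync followed by every node's slice."""
    return sync_bits + num_nodes * slot_bits


print(tdma_schedule(["A", "B", "C", "D"], sync_bits=16, slot_bits=40))
# {'A': 16, 'B': 56, 'C': 96, 'D': 136}
print(tdma_frame_bits(4, sync_bits=16, slot_bits=40))  # 176
```

Unless a variant truncates unused slices, the round stays 176 bit times whether or not the nodes have anything to send, and each slave's clock must stay accurate enough not to drift into a neighbor's slice.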

Token Ring

In a Token Ring network, the nodes are connected in a ring using point-to-point links, as shown in Figure 4. A special token signal is passed from node to node around the ring. When a node has something to send, it stops the token circulation, sends its message all the way around the ring, and then passes the token on. Since the worst-case token waiting time can easily be calculated, this protocol is deterministic. Under light traffic, Token Ring has moderate token-passing overhead. However, the protocol provides efficient throughput under heavy traffic since idle token passing is minimized. A frequent implementation strategy is to have a one-bit delay at each node, so a token can visit all nodes in N+T bit times, where N is the number of nodes and T is the number of bits in the token. Global prioritization is accomplished by altering the priority field of the token as it visits the nodes; this field allows only nodes with higher priority messages to transmit. Initializing the token and detecting accidentally duplicated or lost tokens add complexity and cost to the protocol. Many users are also concerned that a break in the cable or a failed node can disable the entire network, so node bypass hardware and dual rings are used to address this concern at additional cost. Because the ring connections themselves are point-to-point, Token Ring is well suited to fiber optics, and many LANs and Wide Area Networks (WANs) are moving to this type of protocol. For example, FDDI (Fiber Distributed Data Interface) uses dual counter-rotating rings to achieve higher reliability than bus or star topologies.

FIGURE 4

Figure 4: Token passing in the Token Ring networks
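To illustrate why Token Ring is deterministic, the following Python sketch adds up a rough worst-case wait for the token. The assumptions are simplifications made for this example, not rules from any standard: a one-bit delay per node, negligible link propagation delay, at most one message of at most max_msg_bits per token visit, and each message circulating the whole ring before its sender releases the token.

```python
def token_ring_worst_wait(n_nodes: int, token_bits: int, max_msg_bits: int) -> int:
    """Rough worst-case bit times a node waits for the token (simplified model).

    Assumptions (illustrative only):
      * each node adds a one-bit delay; link propagation delay is ignored;
      * a node sends at most one message of at most max_msg_bits per visit;
      * a message travels the whole ring (max_msg_bits + n_nodes bit times)
        before its sender releases the token.
    The waiting node has just missed the token, so the other n_nodes - 1 nodes
    each send a maximum-length message before the token comes back.
    """
    token_rotation = n_nodes + token_bits                       # N + T
    others_traffic = (n_nodes - 1) * (max_msg_bits + n_nodes)
    return token_rotation + others_traffic


print(token_ring_worst_wait(10, 24, 64))  # 700
```

The exact figure matters less than the fact that every term is bounded, so a worst-case access time can be stated in advance.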

Token Bus

The operation of a Token Bus is very similar to that of a Token Ring: a token is passed from node to node in a virtual ring, as in Figure 5. The holder of the token has access to the network. Like Token Ring, Token Bus works well under heavy traffic with a high degree of determinacy. However, Token Bus broadcasts each message simultaneously to all nodes instead of passing it bit by bit along a physical ring. Because each node must receive the whole token before sending it on (the token cannot be pipelined through the connections), the minimum time for a token to traverse the logical ring of nodes is N*T bit times rather than the N+T bit times of Token Ring. This makes global prioritization of messages largely impractical.
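The gap between N+T and N*T grows quickly with network size. A small Python comparison, ignoring propagation delay and bus turnaround time; the 32-node, 24-bit token values are arbitrary:

```python
from typing import Dict


def idle_rotation_bits(n_nodes: int, token_bits: int) -> Dict[str, int]:
    """Minimum bit times for an idle token to visit every node once.

    Token Ring with a one-bit delay per node pipelines the token through the
    nodes (N + T); on a Token Bus each node retransmits the whole token in
    turn (N * T). Propagation delay and turnaround time are ignored.
    """
    return {"token_ring": n_nodes + token_bits, "token_bus": n_nodes * token_bits}


print(idle_rotation_bits(32, 24))  # {'token_ring': 56, 'token_bus': 768}
```

Each added node costs one more bit time per rotation on the ring, but T more bit times on the bus.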

Unlike a unidirectional Token Ring, a Token Bus is not necessarily disabled by a break in the cable or a failed node. A lengthy reconfiguration process, in which each node identifies its neighbors, is used to maintain the virtual ring when nodes are added to or deleted from the network. Because bus-like topologies are well suited to manufacturing plants, the Manufacturing Automation Protocol (MAP) adopted this protocol. Additionally, ARCnet (Attached Resource Computer Network) [4] uses this protocol for LAN connectivity and process control. Adaptive Networks' PLC-192 power line carrier chip uses a hybrid Token Bus protocol: under light traffic, nodes dynamically join and leave the logical ring; under heavy traffic, all nodes join the ring to maintain stability.

FIGURE 5

Figure 5: Token passing in Token Bus protocols
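The outcome of Token Bus reconfiguration is a successor table: each node knows which node receives the token next. This Python sketch computes such a table from a known set of active node IDs, assuming the token goes to the next-higher active ID and wraps around (ARCnet, for example, orders its logical ring this way). It does not model the message exchanges a real reconfiguration uses to discover which nodes are active.

```python
from typing import Dict, Iterable


def build_virtual_ring(active_ids: Iterable[int]) -> Dict[int, int]:
    """Map each active node ID to the successor it passes the token to.

    Assumed convention: pass to the next-higher active ID, wrapping from the
    highest ID back to the lowest.
    """
    ids = sorted(set(active_ids))
    if not ids:
        return {}
    return {node: ids[(i + 1) % len(ids)] for i, node in enumerate(ids)}


print(build_virtual_ring([7, 3, 12, 5]))  # {3: 5, 5: 7, 7: 12, 12: 3}

# If node 7 fails, only its predecessor (node 5) needs a new successor.
print(build_virtual_ring([3, 12, 5]))     # {3: 5, 5: 12, 12: 3}
```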

Binary Countdown

In Binary Countdown, also known as the Bit Dominance protocol, all nodes wait for an idle channel before transmitting a message. Competing nodes (transmitting simultaneously) resolve contention by broadcasting, bit by bit, a signal based on their unique node identification value. The transmission medium must have the property that one value (say, a "1") overrides the opposite value (a "0"). During this transmission, a node drops out of the competition if it detects a dominant signal opposite to its own, as shown in Figure 6. Thus, if a "1" signal is dominant, the highest numbered transmitting node wins the competition and gains ownership of the channel.

FIGURE 6

Figure 6: Arbitration in Binary Countdown Protocols

Global prioritization can be achieved by arbitrating over message ID values rather than node IDs. Since the arbitration is part of the message, this protocol has good throughput and high efficiency. Additionally, the protocol is more robust because node configuration (transmission order) is not required and inactive nodes are ignored. However, since all messages are prioritized, there is no simple way to guarantee equally fair access among all nodes under heavily loaded conditions. Also, some transmission techniques (such as current-mode transformer coupling, commonly used in high-noise environments) aren't compatible with the bit dominance requirement. Using this protocol, Bosch developed the Controller Area Network (CAN [5]) specification for automotive applications. The Society of Automotive Engineers standard SAE J1850 also uses this protocol.
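Bit-dominance arbitration is easy to simulate. This Python sketch assumes every contender starts at the same instant and sends its ID most significant bit first; the dominant parameter selects which value wins on the medium. With dominant=1 (the convention used above), the highest ID wins. CAN uses a dominant 0, so there the lowest identifier, which is the highest-priority message, wins.

```python
from typing import List


def binary_countdown(ids: List[int], width: int, dominant: int = 1) -> int:
    """Return the ID that wins bit-dominance arbitration.

    All contenders start together and send their IDs most significant bit
    first. The medium carries the dominant value if any contender sends it
    (a wired-OR when dominant=1, a wired-AND when dominant=0). A contender
    that sent the recessive value but sees the dominant one drops out.
    """
    if not ids or len(set(ids)) != len(ids):
        raise ValueError("need at least one contender and unique IDs")
    if any(i < 0 or i >= 1 << width for i in ids):
        raise ValueError(f"IDs must fit in {width} bits")
    contenders = list(ids)
    for bit in range(width - 1, -1, -1):
        sent = {c: (c >> bit) & 1 for c in contenders}
        # Level seen on the medium for this bit time.
        level = dominant if dominant in sent.values() else 1 - dominant
        contenders = [c for c in contenders if sent[c] == level]
    return contenders[0]


print(binary_countdown([5, 6, 3], width=4))              # 6: "1" dominant, highest wins
print(binary_countdown([5, 6, 3], width=4, dominant=0))  # 3: "0" dominant, lowest wins
```

Because the winner's bits are never corrupted, its message simply continues after arbitration while the losers wait for the next idle channel.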

Carrier Sense Multiple Access with Collision Detection (CSMA/CD)

CSMA/CD has been widely researched, with a large number of published variations [2]. In the simplest case, a node waits for the network to go idle before transmitting (as in Binary Countdown). If multiple stations transmit almost simultaneously (within a round-trip propagation delay on the network), the messages collide, as in Figure 7. The nodes must detect this collision and resolve it by waiting for a random time before retrying.

FIGURE 7

Figure 7: Collisions in CSMA/CD networks

The key advantage of this protocol is that, in principle, it supports an unlimited number of nodes that don't require pre-allocated slots or inclusion in token passing activities. Thus, CSMA/CD allows nodes to enter and leave the network without requiring network initialization and configuration. Under light traffic, overhead is very small. Under heavy traffic, however, the overhead is unbounded due to the high probability of repeated collisions. Consequently, this protocol has poor determinacy and, when heavily loaded, low efficiency. Furthermore, detecting collisions may require analog circuitry that adds to the system expense. In fact, if the network environment is very noisy or the wiring runs are long and of poor quality, collision detection may not work at all. The popular Ethernet protocol used in workstation LANs is based on this protocol.
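The text leaves the retry delay as simply a random time. One widely used choice, named here plainly as Ethernet's method rather than something every CSMA/CD variant requires, is truncated binary exponential backoff: after the n-th collision of a frame, a station waits a random number of slot times between 0 and 2^min(n,10) - 1, and it abandons the frame at its 16th collision. A minimal Python sketch:

```python
import random


def backoff_slots(collisions: int, rng: random.Random,
                  cap: int = 10, max_collisions: int = 16) -> int:
    """Slot times to wait after a frame's `collisions`-th consecutive collision.

    Truncated binary exponential backoff in the style of Ethernet: wait a
    uniformly random number of slots in 0 .. 2**min(collisions, cap) - 1.
    Reaching max_collisions abandons the frame (raised here as an exception).
    """
    if collisions < 1:
        raise ValueError("backoff applies only after a collision")
    if collisions >= max_collisions:
        raise RuntimeError("excessive collisions: frame abandoned")
    k = min(collisions, cap)
    return rng.randint(0, 2 ** k - 1)


rng = random.Random(1)  # seeded only to make runs repeatable
print([backoff_slots(n, rng) for n in range(1, 8)])
```

Since the waiting range doubles with each collision, a busy network spends more and more time backing off, and no fixed bound can be placed on when a given frame will get through: the determinacy problem noted above.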

Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA)

Many researchers have developed hybrid protocols that combine the light-traffic efficiency of CSMA/CD with the heavy-traffic efficiency of token-based protocols. The resulting protocols are often called CSMA/CA, or collision avoidance protocols. As in CSMA/CD, nodes transmit after detecting an idle channel. However, if two or more stations collide, a jam signal is sent on the network to notify all nodes of the collision, synchronize clocks, and start contention time slots. Each contention time slot, typically just over one network round-trip propagation delay long, is assigned to a particular station, and each station may initiate transmission during its own contention slot.

Figure 8 shows a slot progression for a three-node network. In this example, transmitters 2 and 3 collide and initiate a jam. Contention slots follow the jam signal. Since transmitter 1 has nothing to send, slot 1 goes idle. Transmitter 2 starts sending its message during slot 2. The other stations detect the message and stop the slot progression. After the end of the message, all nodes initiate new contention slots. However, to ensure fairness and determinacy, the slots are rotated (change positions) after each transmission. Additionally, priority slots (pslots) can precede each slot progression to support global prioritization for high-priority messages. The network returns to an idle state when all the slots go unused.

FIGURE 8

Figure 8: Slot progression in CSMA/CA protocols
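The slot rotation can be illustrated with a simplified Reservation CSMA model in Python. It is a sketch, not the protocol of [6]: it ignores jam signals, priority slots, and timing, gives each station exactly one slot, and assumes a particular rotation rule in which the slot after the most recent transmitter moves to the front (so the transmitter goes to the back).

```python
from typing import Dict, List


def reservation_csma(order: List[int], pending: Dict[int, int]) -> List[int]:
    """Return the sequence of transmitting stations until the network goes idle.

    Each slot progression scans the slots in order; the first station with a
    queued message transmits, which ends that progression. The slot order is
    then rotated (assumed rule: the winner moves to the end). A progression
    in which every slot goes unused returns the network to idle.
    """
    order = list(order)
    pending = dict(pending)
    sent: List[int] = []
    while True:
        for i, station in enumerate(order):
            if pending.get(station, 0) > 0:
                pending[station] -= 1
                sent.append(station)
                order = order[i + 1:] + order[: i + 1]  # rotate for fairness
                break
        else:
            return sent  # all slots unused: network idle


# Station 1 has nothing to send, station 2 has two messages, station 3 has one.
print(reservation_csma([1, 2, 3], {1: 0, 2: 2, 3: 1}))  # [2, 3, 2]
```

Station 3 gets its turn before station 2 can send again; without the rotation, station 2 would send both of its messages first.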

The contention slots in CSMA/CA protocols help avoid collisions. In general, there are two distinct variations of CSMA/CA. If the number of slots equals the number of stations, the protocol is called Reservation CSMA, or RCSMA. RCSMA works efficiently under all traffic conditions [6]. However, because of the one-to-one mapping of nodes to slots, RCSMA is not practical for a network with a large number of nodes. In the other variation, the number of slots is smaller than the number of stations, and slot assignments are randomly allocated to minimize collisions. Echelon's LON (Local Operating Network) [7] uses this latter variation and dynamically varies the number of slots based on predicted traffic. Unlike CSMA/CD, there are ways to eliminate the need for collision detection hardware, such as sending dummy messages that keep slots going in the absence of network traffic.
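The tradeoff in the randomized variation can be quantified under a simple model. Assume each of k contending nodes independently picks one of S slots uniformly at random, and the earliest pick transmits while the rest defer; a collision is avoided only if exactly one node holds the earliest chosen slot. The Python sketch below computes that probability exactly. It is a textbook-style model for illustration, not Echelon's actual algorithm.

```python
from fractions import Fraction


def p_no_collision(contenders: int, slots: int) -> Fraction:
    """Exact probability that one contender alone holds the earliest chosen slot.

    Model (an assumption for illustration): each node picks one of `slots`
    slots uniformly and independently; the earliest pick transmits. Sum over
    the slot s the winner picks (probability 1/slots) times the chance that
    every other node picks a strictly later slot, then multiply by the number
    of nodes that could be that unique winner.
    """
    if contenders < 1 or slots < 1:
        raise ValueError("need at least one contender and one slot")
    total = Fraction(0)
    for s in range(slots):
        p_later = Fraction(slots - 1 - s, slots)  # one other node picks a later slot
        total += Fraction(1, slots) * p_later ** (contenders - 1)
    return contenders * total


print(p_no_collision(4, 4), p_no_collision(4, 16))  # 9/16 225/256
```

Four contenders avoid a collision with probability 9/16 with 4 slots but 225/256 (about 88%) with 16 slots, while every extra slot adds idle time when traffic is light. That tension suggests why adjusting the slot count to predicted traffic is attractive.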


MEDIA ACCESS PROTOCOL SUMMARY AND TRADEOFFS

In the above discussions we have described the major media access protocols and noted clear differences among them. Table 1 summarizes some of their common traits and our assessment of their strengths and weaknesses for embedded real-time applications. When evaluating alternatives, the important points to consider are the special considerations discussed earlier: efficiency, determinacy, robustness, flexibility, and cost per node.

TABLE 1 (picture)

Table 1: Media Access Tradeoffs

For our embedded systems, we have found that CSMA/CA, and in particular Reservation CSMA, is a good choice. While your application will no doubt have characteristics that differ somewhat from ours, this article's discussion of the special considerations and of media access protocol strengths and weaknesses should allow you to select the best protocol for your needs. We believe the electronic content of embedded systems will continue to grow, and communication networks provide a strong foundation for supporting this growth.


REFERENCES

[1] Upender, B. P. and Koopman, P. J., Embedded Communication Protocol Options, Proceedings of the Embedded Systems Conference, San Jose, CA, October 3-5, 1993.

[2] Tanenbaum, A. S., Computer Networks, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1989.

[3] Stallings, W. S., Data and Computer Communications, 3rd ed., Macmillan, New York, 1991.

[4] Hoswell, K. S. and Thomas, G. M., ARCnet Factory LAN Primer, Contemporary Control Systems Inc., Second Printing, Downers Grove, Illinois, 1988.

[5] Bosch, CAN Specification, ver. 2.0, Robert Bosch GmbH, Stuttgart, 1991.

[6] Chen and Li, Reservation CSMA/CD: A Multiple Access Protocol for LANs, IEEE Journal on Selected Areas in Communications, February 1989.

[7] Enhanced Media Access Control with Echelon's LonTalk Protocol, LonWorks Engineering Bulletin, Echelon Corp., August 1991.


AUTHOR INFORMATION

Bhargav Upender is an associate research engineer at United Technologies Research Center. Currently, he is exploring novel architectures and supporting protocols for distributed embedded systems. He holds a BS in electrical engineering from the University of Connecticut and an MS in electrical engineering from Cornell University. He can be contacted via email at barg@utrc.utc.com.

Philip Koopman was a principal research engineer at United Technologies Research Center. He designed and evaluated architectures and communication protocols for a variety of embedded applications. He previously worked as an embedded CPU architect and a Navy submarine officer. Koopman holds a BS and MS in computer engineering from Rensselaer Polytechnic Institute and a Ph.D. in computer engineering from Carnegie Mellon University. He may now be reached via email at koopman@cmu.edu.

