Distributed Applications in Manufacturing Environments

The Architecture Journal

Peter Koen and Christian Strömsdörfer

Summary: This article examines some important characteristics of software architecture for modern manufacturing applications. This kind of software generally follows the same trends as other types of software, but often with important modifications. Atypical architectural paradigms are regularly applied in manufacturing systems to address the tough requirements of real time, safety, and determinism imposed by such environments.

Contents

Introduction
The Automation Pyramid
Highly Distributed, Highly Diverse
Tough Requirements
Real-World Parallelism
Security + Safety + Determinism = Dependability
Architectural Paradigms
The Future of Manufacturing Applications
Cloud Computing and Manufacturing

Introduction

Almost all items we use on a daily basis (cars, pencils, toothbrushes), and even much of the food we eat, are products of automated manufacturing processes. Although the processes are quite diverse (brewing beer and assembling a car body are quite different), the computing environments controlling them share many characteristics.

Although we refer to “manufacturing” in this article, we could also use the term “automation,” the latter encompassing non-manufacturing environments as well (such as power distribution or building automation). Everything in this article therefore also applies in those environments.

The Automation Pyramid

Software in the industry is often illustrated as a triangle (which, strangely enough, is called the “automation pyramid”; see Figure 1). The triangle shows three areas, the upper two of which are ruled by PC hardware. Although PCs link the world of factory devices with the office and server world of ERP applications, PCs running within the manufacturing application are distinct from those running outside of it. The manufacturing application works on objects of a raw physical nature (boiling liquids, explosive powders, heavy steel beams), whereas the objects the office and ERP applications handle are purely virtual.


Figure 1. The automation pyramid

The three levels of the automation pyramid are the following:

  • Information
  • Execution
  • Control

The base of the pyramid, and the largest group of devices in a manufacturing plant, consists of the controllers driving the machinery. They follow the rules and the guidance of an execution runtime, which controls how the individual steps of the process are performed. Software at the information level governs the manufacturing process as a whole: It defines the workflows and activities that need to be executed to produce the desired product. This, of course, involves algorithms that try to predict the state of the manufacturing machinery and the supply of resources in order to schedule execution times optimally. The execution of this plan needs to be monitored and evaluated as well. Commands therefore flow down from the information level through the execution level to the control level, while reporting data flows back up.
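
As a toy illustration of this flow (a sketch of ours, not part of any real control system; all names are hypothetical), commands descend through the levels while reports ascend:

    // A toy model (our illustration) of the pyramid's message flow:
    // commands descend level by level, reports ascend.
    #include <iostream>
    #include <string>
    #include <vector>

    enum class Level { Information, Execution, Control };

    struct Message { Level from, to; std::string text; };

    int main() {
        std::vector<Message> traffic = {
            // The information level plans; the execution level schedules
            // the steps; the controllers drive the machinery.
            {Level::Information, Level::Execution, "produce batch 42"},
            {Level::Execution,   Level::Control,   "run step 7: weld seam"},
            // Reporting data flows back up for monitoring and evaluation.
            {Level::Control,     Level::Execution, "step 7 done, 12.3 s"},
            {Level::Execution,   Level::Information, "batch 42: 1 of 9 steps done"},
        };
        for (const auto& m : traffic) std::cout << m.text << '\n';
    }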

The predictability of manufacturing processes and schedules is a tough problem that requires architectural paradigms which may be unfamiliar to developers who deal only with virtual, memory-based objects.

The software discussed in this article lives in the lower part of that triangle, where PCs meet devices, and where the well-known patterns and paradigms for writing successful office, database, or entertainment software no longer apply.

Highly Distributed, Highly Diverse

Most manufacturing tasks involve many steps executed by different devices. To produce a large number of items in a given amount of time, manufacturers employ hundreds or even thousands of devices to execute the necessary steps, very often in parallel. All the intelligent sensors, controllers, and workstations taking part in this effort collaborate and synchronize with each other to churn out the required product in a deterministic and reliable way. This is all one huge application, distributed over the factory floor.

The computing systems are rather varied, too. (See Figure 2.) On a factory floor or in a chemical plant, one would probably find all of the following systems:

  • Windows client (everything from Microsoft Windows 95 to Windows Vista)
  • Microsoft Windows Server (Windows Server 2003 mainly, Windows Server 2008 being introduced)
  • Microsoft Windows CE
  • Proprietary real-time OS
  • Embedded Linux

All of these systems must communicate effectively with each other in order to collaborate successfully. We therefore regularly meet a full zoo of diverse communication strategies and protocols, such as:

  • Windows protocols: DDE, DCOM, MSMQ
  • IPC protocols: Pipes, Shared Memory, RPC
  • Ethernet protocols: TCP, UDP
  • Internet protocols: HTTP, SOAP
  • Microsoft .NET Framework protocols: Remoting, ASP.NET, WCF
  • Proprietary real-time protocols


Figure 2. Distribution

This diversity of operating systems and means of communication may not look like the foundation of a fully deterministic and reliable system, but just the opposite is true. Because of the highly differentiated individual tasks that must be executed to create the final product, each step in this workflow uses the most effective strategy available. If data from a machine-tool controller is needed and that controller happens to be running Windows 95, DCOM is obviously a better choice than SOAP for accessing the data.

This last example also shows a common feature of factory floors: The software lives as long as the computer that hosts it. Machine tools are expensive and are not replaced just because there is a new embedded version of Microsoft Windows XP. If a piece of hardware dies, it is replaced with a compatible computer. This may mean that in a 10-year-old factory, a Windows 95 computer has to be replaced with a new one that basically has the same functionality, but is based on an OS like Windows CE or Linux. Unless a factory is brand-new, computers from various time periods are present in a single factory or plant and need to seamlessly interoperate.

Because of this diversity, manufacturing applications rely heavily on standards, especially in terms of communication. The variability in operating systems is decreasing somewhat in favor of Windows and Linux solutions, a trend that will certainly continue in the future. Communication, too, is clearly moving away from proprietary designs toward real-time Ethernet protocols and comparable solutions where applicable. Especially in situations where many different transportation and communication protocols have to work together in one place, the focus for future development is on technologies that can deal with many different protocols, formats, and standards. Windows Communication Foundation (WCF) is one technology that alleviates a lot of pain in plant repairs, upgrades, and future development. Using WCF, it is possible to integrate newer components into existing communication infrastructures without compromising the internal advances of the new component: A WCF protocol adapter simply translates the incoming and outgoing communication to the specific protocol used in the plant. This allows for lower upgrade and repair costs than before.
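
The adapter idea itself is independent of WCF (which is a .NET technology); the following sketch, with hypothetical names and a made-up frame format, shows the general shape of such a translation layer:

    // A language-neutral sketch of the protocol-adapter idea. All names
    // and the frame format are hypothetical, not from any real product.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    // What the new component speaks internally.
    struct ModernMessage {
        std::string topic;
        std::vector<uint8_t> body;
    };

    // Interface to whatever legacy protocol the plant actually uses
    // (DDE, DCOM, a proprietary fieldbus frame, ...).
    class PlantProtocol {
    public:
        virtual ~PlantProtocol() = default;
        virtual void sendFrame(const std::vector<uint8_t>& frame) = 0;
        virtual std::vector<uint8_t> receiveFrame() = 0;
    };

    // The adapter translates in both directions, so the new component
    // never needs to know which plant protocol sits behind it.
    class ProtocolAdapter {
    public:
        explicit ProtocolAdapter(PlantProtocol& plant) : plant_(plant) {}

        void send(const ModernMessage& msg) {
            std::vector<uint8_t> frame(msg.topic.begin(), msg.topic.end());
            frame.push_back(0);  // hypothetical topic/body separator
            frame.insert(frame.end(), msg.body.begin(), msg.body.end());
            plant_.sendFrame(frame);
        }

        ModernMessage receive() {
            std::vector<uint8_t> frame = plant_.receiveFrame();
            auto sep = std::find(frame.begin(), frame.end(), uint8_t{0});
            return {std::string(frame.begin(), sep),
                    std::vector<uint8_t>(sep == frame.end() ? sep : sep + 1,
                                         frame.end())};
        }

    private:
        PlantProtocol& plant_;
    };

    // In-memory loopback, standing in for a real fieldbus driver.
    class LoopbackProtocol : public PlantProtocol {
    public:
        void sendFrame(const std::vector<uint8_t>& frame) override { last_ = frame; }
        std::vector<uint8_t> receiveFrame() override { return last_; }
    private:
        std::vector<uint8_t> last_;
    };

    int main() {
        LoopbackProtocol plant;
        ProtocolAdapter adapter(plant);
        adapter.send({"temperature", {42}});
        ModernMessage echoed = adapter.receive();
        std::cout << echoed.topic << ": " << int(echoed.body.at(0)) << '\n';
    }

The new component only ever sees ModernMessage; replacing the plant protocol means replacing the PlantProtocol implementation, not the component.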

Tough Requirements

Modern manufacturing environments impose a set of tough real-time and concurrency requirements. And we are talking “real-time” here: time resolutions of 50 µs and less. Manufacturing became a mainly electrical-engineering discipline in the first half of the last century, and those requirements were already state of the art when software started to take over in the 1960s and 1970s. The expectations have only risen since. For a steel barrel that needs to be stopped, 10 ms is close to an eternity.

Traditionally, these tough requirements have been met by specialized hardware with proprietary operating systems, such as:

  • Programmable Logic Controllers (PLCs). A PLC manages the input and output signals of electrical devices and other PLCs attached to it.
  • Numerical Controllers (NCs). A machine-tool controller manages axes moving through space (which is described “numerically”).
  • Drive Controllers. A drive controller manages rotating axes (such as those of a motor).

These extreme requirements are not expected from PC systems yet, but they, too, must react to very different scenarios in due time. And usually there is no second try. A typical task executed by a PC on a factory floor is to gather messages coming from PLCs. If an error occurs in one of them, it regularly happens that all the other PLCs also start sending messages, like a bunch of confused animals. If the PC in charge misses one of those messages (for instance, because of a buffer overflow), the message is probably gone for good, the cause of the failure might never be detected, and the manufacturer must subsequently throw away all the beer that was brewed on that day.
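
A minimal sketch of one defensive measure (our illustration, not an implementation described here): a fixed-capacity buffer that, when a burst overruns it, at least records that messages were lost instead of failing silently:

    // A fixed-capacity ring buffer for PLC messages that makes overflow
    // visible instead of silently losing a message. A sketch, not a
    // production data structure.
    #include <array>
    #include <cstddef>
    #include <iostream>
    #include <string>

    template <std::size_t Capacity>
    class AlarmBuffer {
    public:
        bool push(const std::string& msg) {
            if (count_ == Capacity) {
                ++lost_;        // the message is gone, but at least we know
                return false;
            }
            slots_[(head_ + count_) % Capacity] = msg;
            ++count_;
            return true;
        }

        bool pop(std::string& out) {
            if (count_ == 0) return false;
            out = slots_[head_];
            head_ = (head_ + 1) % Capacity;
            --count_;
            return true;
        }

        std::size_t lost() const { return lost_; }

    private:
        std::array<std::string, Capacity> slots_{};
        std::size_t head_ = 0, count_ = 0, lost_ = 0;
    };

    int main() {
        AlarmBuffer<4> buffer;
        // Simulate a burst: one failing PLC triggers messages from all others.
        for (int plc = 0; plc < 10; ++plc)
            buffer.push("alarm from PLC " + std::to_string(plc));
        std::string msg;
        while (buffer.pop(msg)) std::cout << msg << '\n';
        std::cout << "messages lost in burst: " << buffer.lost() << '\n';
    }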

To summarize:

  • Hard real-time tasks (measured in microseconds) rely mainly on specialized hardware and software.
  • PC systems must still meet “soft” real-time requirements (microseconds become milliseconds).
  • In manufacturing applications, there is no undo and no redo.

Real-World Parallelism

Factories are rarely built to manufacture a single product synchronously. Rather, they try to make as many copies of the product concurrently as possible. This induces massive parallelism, with appropriate heavyweight synchronization. Today this is implemented as a kind of task parallelism, with each computing unit running as a synchronous system taking care of one of those tasks. All parallelization moves into the communication. (More on how this is embedded in the overall software architecture in the section on architectural paradigms.)

With the advent of multicore chips, however, the current patterns and paradigms need to be revisited. It is still unclear what parallel execution within one of the aforementioned controllers could mean or whether parallelism makes sense within the current execution patterns.

On PC systems, however, multicore architectures can readily be applied for the typical tasks of PCs in manufacturing applications such as data retrieval, data management, data visualization, or workflow control.

  • Parallelism in manufacturing applications appears today mainly in the form of asynchronous communication.
  • Multicore architectures have a yet-to-be-determined effect in real-time controller systems.

PCs in manufacturing, however, typically carry out tasks that can take advantage of multicore architectures right away.

Security + Safety + Determinism = Dependability

Although security and safety are extremely important almost everywhere, manufacturing always adds life-threatening risks such as exploding oil refineries or failing brakes in cars. If a computer virus sneaks into a chemical plant undetected, anything might happen, and raiding the company’s accounts would not be the worst of it.

It is better to use the term “dependability” here, because this is what it is all about. Both the manufacturer and the customers consuming the manufacturer’s products depend on the correct execution of the production. This comes down to the following features:

  • Security— Manufacturing applications must not be exposed to the outside world (i.e. the Internet) such that anyone could penetrate the system.
  • Safety— The application must be fail-safe; even in the unlikely event of a software or hardware error, execution must continue, e.g. on a redundant system.
  • Determinism— The execution of a manufacturing application must be deterministic, 24/7. This does not mean that fuzzy algorithms could not be applied somehow (for instance, to optimize manufacturing with a batch size of “1”), but even that must be reliably executed within a certain time period.

The security requirements make it more or less impossible to connect a factory to the Internet or even to the office. Contact is therefore made based on three principles:

  • The communication flow into and out of the factory floor is never direct, but relayed through proxies and gateways.
  • Communication is usually not permanent; special tasks such as backup of production data, rollout of software updates, and so on are applied in dedicated time spans.
  • Many tasks have to be executed physically inside the factory (by wiring a secure portable computer directly with the plant for maintenance purposes, for instance).

Architectural Paradigms

As already mentioned, manufacturing applications do not control virtual objects such as accounts or contracts, but real physical objects such as boiling liquids, explosive powders, or heavy steel beams.

The software controlling these objects cannot rely on transactional or load-balancing patterns, because there simply is no rollback and no room for postponing an action.

This results in the application of a rather different architectural paradigm in those environments alongside the more traditional event-driven mechanisms: using a cyclic execution model.

Cyclic execution means that the operating system executes the same sequence of computing blocks within a defined period of time over and over again. The amount of time may vary from sequence to sequence, but the compiler and the operating system guarantee that the total time for execution of all blocks always fits into the length of the cycle.

Events and data in the cyclic paradigm are “pulled” rather than “pushed,” but the “pull” does not happen whenever the software feels like it (the traditional “pull” paradigm in some XML parsers); it happens periodically. This is the main reason why the term “cyclic” is used instead of “pull” (or “polling”).
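
A minimal sketch of such a cycle, assuming a 10 ms period (on a desktop OS, neither the compiler nor the scheduler can actually guarantee the deadline, which is exactly why controllers use specialized systems):

    // A cyclic executor: the same sequence of blocks runs every cycle,
    // and an overrun of the cycle budget is detected explicitly.
    #include <chrono>
    #include <iostream>
    #include <thread>

    int main() {
        using namespace std::chrono;
        const auto cycle = milliseconds(10);   // assumed cycle length
        auto next = steady_clock::now();

        for (int i = 0; i < 100; ++i) {        // run 100 cycles, then stop
            // 1. Pull inputs: read every relevant variable, changed or not.
            // 2. Execute the fixed sequence of computing blocks.
            // 3. Write outputs.
            // (Block bodies omitted; together they must fit into one cycle.)

            next += cycle;
            if (steady_clock::now() > next) {
                // Overrun: on a real controller this is a hard fault.
                std::cerr << "cycle " << i << " overran its budget\n";
            }
            std::this_thread::sleep_until(next);  // wait for the next cycle
        }
    }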

The cyclic execution model reduces complexity:

  • In a cyclic system, the number of states the system can adopt corresponds directly to the product of the numbers of states of all variables. In an event-driven system, the sequence of changes has to be taken into account as well, which makes the number of states the system can adopt practically unpredictable. (A short calculation follows this list.)
  • Event-driven models operate on changes between variable states, whereas the cyclic model works on the states of the variables themselves. The rules of combinatorics teach us that the number of transitions from one state to another is much larger than the number of states themselves. This is particularly relevant in distributed systems, where the connection between nodes is potentially unreliable.
  • The difference between minimal and maximal load is considerably lower in a cyclic system: The minimal load is the computing time needed to run the cycle itself; the maximal load is 100 percent CPU time for a whole cycle. An event-driven system is more “efficient” in that it only does something if it needs to. On the other hand, with a lot of events happening more or less simultaneously, the maximal load is rather unpredictable.
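
To make the first two points concrete, here is a back-of-the-envelope count, under the simplifying assumption (ours) of n boolean variables:

    \[
    S = 2^{n}, \qquad T = S\,(S - 1) = 2^{n}\left(2^{n} - 1\right)
    \]

For n = 10 variables, a cyclic system has to consider at most S = 1024 states, whereas an event-driven system faces T = 1024 × 1023 = 1,047,552 ordered transitions, and the possible interleavings of event sequences grow faster still.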

Of course, cyclic execution has a number of drawbacks, too. It uses CPU power even if there is nothing to do. It also limits the duration of any task to the length of the cycle, which is more likely to range between 100 µs and 100 ms than to reach a second or more. No CPU can do a lot of processing within 100 µs.

A system cannot rely on cyclic execution only. In any environment, things might happen that cannot wait until the cyclic system gets around to reading the appropriate variables and finds out that something is just about to blow up. So in every cyclic system there is an event-driven daemon waiting for the really important stuff, ready to kill the cycle and take over. The problem here is not whether this kind of functionality is needed or to what degree, but rather how to marry these two approaches on a single computer in a way that allows both operating systems to work 100 percent reliably.
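
A sketch of this marriage in miniature (ours; a real system would use thread priorities or a separate real-time OS, which portable code cannot express): a cyclic worker keeps its fixed period while an event-driven daemon sleeps until the emergency fires and then stops the cycle:

    // Cyclic worker plus event-driven emergency daemon, side by side.
    #include <atomic>
    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    std::atomic<bool> stopCycle{false};
    std::mutex m;
    std::condition_variable emergency;
    bool emergencyRaised = false;

    void cyclicWorker() {
        using namespace std::chrono;
        auto next = steady_clock::now();
        while (!stopCycle.load()) {
            // ... read variables, execute blocks, write outputs ...
            next += milliseconds(10);
            std::this_thread::sleep_until(next);
        }
        std::cout << "cycle stopped by emergency daemon\n";
    }

    void emergencyDaemon() {
        // Sleeps until the event fires; no periodic pull involved.
        std::unique_lock<std::mutex> lock(m);
        emergency.wait(lock, [] { return emergencyRaised; });
        stopCycle = true;   // take over: halt the cycle
    }

    int main() {
        std::thread worker(cyclicWorker), daemon(emergencyDaemon);
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        {
            std::lock_guard<std::mutex> lock(m);
            emergencyRaised = true;    // something is about to blow up
        }
        emergency.notify_one();
        daemon.join();
        worker.join();
    }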

The question of how to deal with cyclic and event-driven patterns within a single system becomes highly important to PCs that are part of a distributed manufacturing application. Here lives a system dominated by an event-driven operating system, communicating with mainly cycle-driven systems. Depending on the job, things must be implemented in kernel or user mode.

  • Collecting data from devices on the factory floor is done in user mode by mimicking the appropriate cycles with timers and reading and writing the variables after each cycle.
  • Alarm messages, on the other hand, are often mapped to a specific interrupt, such that a kernel extension (not just a driver, but a sort of extended HAL) can acknowledge the message as soon as possible and perhaps wake a real-time operating system that sits beside Windows on the same computer.

Those are just two examples of how architects try to let the two paradigms collaborate. Neither works extraordinarily well, although the known solutions are all robust enough today for 24/7 factories. What makes them work, in the end, is probably the huge amount of testing and calibration of timers, threading, and so on that goes into the final products. This shows clearly that there is plenty of room for improvement.


Figure 3. Architectural paradigms—resource usage with cyclic versus event-driven architecture

A lot of hope for future improvements rests on the principles of Service-Oriented Architecture (SOA). By allowing cyclic and event-driven systems to be loosely coupled through messages, it is possible to route information through the system more efficiently, depending on the severity of the event.

Cyclic systems could then do what they are best at, delivering predictable resource usage and results, and simply use a specific time slot in the cycle to put relevant information into a message queue that would then be serviced by other systems (cyclic or event-driven). Queuing the necessary data would also lessen the chance of error that comes from pulling it from a specific location or variable.
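
A compact sketch of this handoff (our reading of the mechanics, not a described implementation): the cyclic side publishes at a fixed point in each cycle without waiting for the consumer, and the event-driven side services the queue at its own pace:

    // Cyclic producer feeding a queue serviced by an event-driven consumer.
    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    std::queue<std::string> outbox;
    std::mutex m;
    std::condition_variable ready;
    bool done = false;

    void cyclicProducer() {
        using namespace std::chrono;
        auto next = steady_clock::now();
        for (int cycle = 0; cycle < 5; ++cycle) {
            // ... deterministic work of the cycle ...
            {   // fixed slot in the cycle: publish, do not wait for anyone
                std::lock_guard<std::mutex> lock(m);
                outbox.push("cycle " + std::to_string(cycle) + " report");
            }
            ready.notify_one();
            next += milliseconds(10);
            std::this_thread::sleep_until(next);
        }
        { std::lock_guard<std::mutex> lock(m); done = true; }
        ready.notify_one();
    }

    void eventDrivenConsumer() {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            ready.wait(lock, [] { return !outbox.empty() || done; });
            while (!outbox.empty()) {
                std::cout << "servicing: " << outbox.front() << '\n';
                outbox.pop();
            }
            if (done) return;
        }
    }

    int main() {
        std::thread producer(cyclicProducer), consumer(eventDrivenConsumer);
        producer.join();
        consumer.join();
    }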

The Future of Manufacturing Applications

The architectural trends in manufacturing software can be summarized as follows:

  • Standard operating systems are applied wherever possible. The more configuration capabilities a particular system offers, the more it will be used.

This development is intensified by the current trends in virtualization. Because manufacturing environments typically involve the duality of device operation by a real-time operating system and visualization and manual control by a Windows system, the offer of having both available on a single computer is extremely intriguing for technical and financial reasons.

  • Standard communication protocols and buses are applied wherever possible. The transmission and processing time for a protocol remain most critical, however. Internet protocols based on HTTP and XML will therefore probably never be used widely.
  • The Internet is entering the world of factories very slowly, albeit mostly in the form of an intranet. This means in particular that Internet technologies are already very important, but as a pattern rather than in their original form. Software updates, for example, are rolled out as ClickOnce applications, but via intranets and disconnected from the Internet.

This certainly also applies to cloud computing. The offer to have a distributed environment look like a single entity is too tempting not to exercise. The mode, however, will differ from the typical cloud application: Factories will employ their own “local” cloud rather than “the” cloud.

Currently, there is a lot of discussion and rethinking going on in the manufacturing space. What will the future look like? Should the development favor more cyclic or event-driven execution?

Multicore CPUs add an interesting angle to this discussion. On the one hand, they would provide enough power to simply make everything cycle-based and have several cycles run in parallel. On the other hand, they would favor event-driven systems, because the workload can be easily distributed.

We think that the answer is somewhere in the middle. The most interesting scenarios are enabled through the implementation of SOA principles. By turning many parts of the solution into services, it becomes easier to host cyclic and event-driven modules on the same computer or on several devices, and to have them interact. Manufacturing systems are applying modern technology and architecture paradigms to automatically distribute load and move operations between the various nodes of the system, depending on the availability of resources. When you think about it, manufacturing is very similar to large-scale Internet applications: a system of many similar, yet subtly different, devices and subsystems that operate together to fulfill the needs of a business process. The user does not care about the details of one specific device or service. The highest priority, and the ultimate result, of a well-functioning manufacturing system is the orchestration of devices into a well-synchronized business-process execution engine that delivers a flawless product. That is not much different from the principles you can see today in cloud computing, Web services, infrastructure in the cloud, and Software as a Service applications.

Taking analogies from enterprise computing to other areas of manufacturing execution and control opens up a whole new set of technologies, such as virtualization and high-performance computing. Virtualization would allow moving, replicating, and scaling the solution in a fast way that simply isn’t possible today. For computation-intensive tasks, work items could be “outsourced” to Microsoft Windows Compute Cluster Server computation farms and would no longer affect the local system. Allowing for more cycles, or for handling more events, has a dramatic impact on the architectural paradigms in manufacturing, and no one can say with 100-percent certainty what the model of the future will be. On-demand computational resources could very well enable more use of event-driven architectures in manufacturing.

While these technical advances sound tempting, there are still a lot of problems that need to be solved. For example, there’s a need for real-time operating systems and Windows to coexist on the same device. This can’t be done with current virtualization technology, because there’s currently no implementation that supports guaranteed interrupt delivery for a real-time OS.

Cloud Computing and Manufacturing

When you compare the cloud offerings from multiple vendors to the needs in manufacturing, cloud computing looks like the perfect solution to many problems: integrated, claims-based authentication; “unlimited” data storage; communication services; message relaying; and, above all, economies of scale, the one factor that has changed manufacturing and software development like no other.

Unfortunately, there is one basic attribute of cloud services that makes the use of current cloud offerings in manufacturing pretty much impossible: the cloud itself.

Manufacturing is a highly sensitive process and any disruption could not only cost millions of dollars, but also be a serious risk to the lives of thousands of consumers. Just imagine what would happen if a hacker got access to the process that controls the recipes for production of food items. It is pretty much unthinkable to ever have a manufacturing environment connected to the cloud.

Still, plenty can be learned from cloud-based infrastructure and services. Let’s look at SQL Server Data Services (SSDS) as an example. SSDS is not simply a hosted SQL Server in the cloud. It is a completely new structured data-storage service, designed for simple entities that are grouped together in containers and accessed through a very simple query model. The big advantage of SSDS is that it hides the details of replication, backup, and every other maintenance task from the application using it. By providing geo-replication and building on Microsoft’s huge data centers, it can guarantee virtually unlimited data storage and practically indestructible data.
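
As a toy model of that data shape (our illustration; deliberately not the SSDS API): flexible entities grouped in containers, and a query that simply scans a container for matching property values:

    // Entities as free-form property bags, grouped in containers,
    // with a single, very simple query operation.
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    using Entity = std::map<std::string, std::string>;   // property -> value
    using Container = std::map<std::string, Entity>;     // id -> entity

    std::vector<std::string> query(const Container& c,
                                   const std::string& property,
                                   const std::string& value) {
        std::vector<std::string> ids;
        for (const auto& [id, entity] : c) {
            auto it = entity.find(property);
            if (it != entity.end() && it->second == value) ids.push_back(id);
        }
        return ids;
    }

    int main() {
        Container productionLog;   // one container per plant section, say
        productionLog["step-001"] = {{"device", "PLC-7"}, {"status", "ok"}};
        productionLog["step-002"] = {{"device", "PLC-9"}, {"status", "error"}};
        for (const auto& id : query(productionLog, "status", "error"))
            std::cout << id << " needs attention\n";
    }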

Hosting a system like this on premises in manufacturing plants would solve many problems we are facing today. Manufacturing processes generate a vast amount of data; every single step of the production process needs to be documented. This means we are looking at large numbers of very simple data entities. The data-collection engine cannot afford to miss any of the generated data; otherwise, the product might not be certified for sale. And one very interesting aspect of manufacturing systems is that once a device starts sending an unusually high amount of data, for example due to an error condition, this behavior quickly replicates across the whole system and suddenly all devices are sending large volumes of log data. This is a situation that is very difficult to handle with classical enterprise database servers.

The automated replication feature is also of great use: Many manufacturing systems span thousands of devices over literally thousands of square feet of factory floor, often even across buildings or locations. Getting data close to where it is needed, in time, is essential to guarantee a well-synchronized process.

While cloud computing is still in its infancy and almost all vendors are busy building up their first generation of services in the cloud, the applicability of those services as on-premises solutions is often neglected so that vendors can stay focused on one core competency.

About the authors

Christian Strömsdörfer is a senior software architect in a research department of Siemens Automation, where he is currently responsible for Microsoft technologies in automation systems. He has 18 years of experience as a software developer, architect, and project manager in various automation technologies (numerical, logical, and drive controls) and specializes in PC-based high-performance software. He received his M.A. in linguistics and mathematics from Munich University in 1992. He resides in Nuremberg, Germany, with his wife and three children.

Peter Koen is a Senior Technical Evangelist with the Global Partner Team at Microsoft. He works on new concepts, ideas, and architectures with global independent software vendors (ISVs). Siemens is his largest partner where he works with Industry Automation, Manufacturing Execution Systems, Building Technologies, Energy, Healthcare, and Corporate Technologies. Peter resides in Vienna, Austria and enjoys playing the piano as a way to establish his work/life balance.

 

This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.