Peter Koen and Christian Strömsdörfer
Summary: This article examines some important characteristics of software architecture for modern manufacturing applications. This kind of software generally follows the same trends as other types of software, but often with important modifications. Atypical architectural paradigms are regularly applied in manufacturing systems to address the tough requirements of real time, safety, and determinism imposed by such environments.
The Automation Pyramid
Highly Distributed, Highly Diverse
Security + Safety + Determinism = Dependability
The Future of Manufacturing Applications
Cloud Computing and Manufacturing
Almost all items we use on a daily basis (cars, pencils, toothbrushes), and even much of the food we eat, are products of automated manufacturing processes. Although the processes are quite diverse (brewing beer and assembling a car body are quite different), the computing environments controlling them share many characteristics.
Although we refer to “manufacturing” in this article, we could also use the term “automation,” the latter encompassing non-manufacturing environments as well (such as power distribution or building automation). Everything in this article therefore also applies in those environments.
Software in the industry is often illustrated in a triangle (which, strangely enough, is called the “automation pyramid”; see Figure 1). The triangle shows three areas, with the upper two levels typically executed on PC hardware. Although PCs link the world of factory devices with the office and server world of ERP applications, PCs running within the manufacturing application are distinct from those running outside. The manufacturing application works on objects of a raw physical nature (boiling liquids, explosive powders, heavy steel beams), whereas the objects the office and ERP applications handle are purely virtual.
Figure 1. The automation pyramid
The three levels of the automation pyramid are the following:
Control. The base of the pyramid, accounting for the largest number of devices in a manufacturing plant, consists of the controllers driving the machinery. They follow the rules and the guidance of an execution runtime.

Execution. The execution runtime controls how the individual steps of the process are performed.

Information. Software at the information level controls the manufacturing process as a whole. It defines the workflows and activities that must be executed to produce the desired product. This involves algorithms that try to predict the state of the manufacturing machinery and the supply of resources in order to schedule execution times optimally. The execution of this plan must, of course, be monitored and evaluated as well.

Commands thus flow down from the information level through the execution level to the control level, while reporting data flows back up.
The predictability of manufacturing processes and schedules is a tough problem that requires architectural paradigms which may be unfamiliar to developers who deal only with virtual, memory-based objects.
The software discussed in this article lives in the lower part of that triangle, where PCs meet devices, and where the well-known patterns and paradigms for writing successful office, database, or entertainment software no longer apply.
Most manufacturing tasks involve many steps executed by different devices. To produce a large number of items in a given amount of time, manufacturers employ hundreds or even thousands of devices to execute the necessary steps—very often, in parallel. All the intelligent sensors, controllers, and workstations taking part in this effort collaborate and synchronize with each other to churn out the required product in a deterministic and reliable way. This is all one huge application, distributed over the factory floor.
The computing systems are rather varied, too. (See Figure 2.) On a factory floor or in a chemical plant, one would probably find all of the following systems:
All of these systems must communicate effectively with each other in order to collaborate successfully. We therefore regularly meet a full zoo of diverse communication strategies and protocols, such as:
Figure 2. Distribution
This diversity of operating systems and means of communication may not look like the basis for a system that must be fully deterministic and reliable, but just the opposite is true. Because of the highly differentiated individual tasks that must be executed to create the final product, each step in the workflow uses the most effective strategy available. If data is needed from a machine-tool controller that happens to be running Windows 95, DCOM is obviously a better choice for accessing it than SOAP.
This last example also shows a common feature of factory floors: The software lives as long as the computer that hosts it. Machine tools are expensive and are not replaced just because there is a new embedded version of Microsoft Windows XP. If a piece of hardware dies, it is replaced with a compatible computer. This may mean that in a 10-year-old factory, a Windows 95 computer has to be replaced with a new one that basically has the same functionality, but is based on an OS like Windows CE or Linux. Unless a factory is brand-new, computers from various time periods are present in a single factory or plant and need to seamlessly interoperate.
Because of this diversity, manufacturing applications rely heavily on standards, especially in terms of communication. The variability in operating systems is decreasing somewhat in favor of Windows or Linux solutions, a trend that will certainly continue. Also, communication is clearly moving away from proprietary designs toward real-time Ethernet protocols and comparable solutions where applicable. Especially where many different transportation and communication protocols have to work together in one place, the focus of future development is on technologies that can deal with many different protocols, formats, and standards. Windows Communication Foundation (WCF) is one technology that alleviates a lot of the pain in plant repairs, upgrades, and future development. Using WCF, it is possible to integrate newer components into existing communication infrastructures without compromising the internal advances of the new component: a WCF protocol adapter simply translates the incoming and outgoing communication to the specific protocol used in the plant. This allows for lower upgrade and repair costs than before.
Modern manufacturing environments impose a set of tough real-time and concurrency requirements. And we are talking “real-time” here—time resolutions of 50 µs and less. Manufacturing became a mainly electrical-engineering discipline in the first half of the last century, and those requirements were already state of the art when software started to take over in the 1960s and 1970s. The expectations have only risen since. For a steel barrel that needs to be stopped, 10 ms is close to an eternity.
Traditionally, these tough requirements have been met by specialized hardware with proprietary operating systems, such as:
These extreme requirements are not expected from PC systems yet, but they, too, must react to very different scenarios in due time. And usually there is no second try. A typical task executed by a PC on a factory floor is to gather messages coming from PLCs. If an error occurs in one of them, it regularly happens that all the other PLCs also start sending messages, like a bunch of confused animals. If the PC in charge misses one of those messages (for instance, because of a buffer overflow), the message is probably gone for good, the cause of the failure might never be detected, and the manufacturer must subsequently throw away all the beer that was brewed on that day.
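The ingestion problem described above—absorbing a sudden burst of PLC messages without silently losing any—can be sketched as a fixed-size ring buffer. This is an illustrative sketch, not code from any real product; the names (`msg_ring`, `ring_put`, `ring_get`) and the capacity figures are assumptions chosen for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size ring buffer for incoming PLC messages.
 * Sized for a worst-case burst: when one PLC faults, all of them
 * may start reporting at once. */
#define MSG_CAPACITY 4096
#define MSG_SIZE     64

typedef struct {
    uint8_t  data[MSG_CAPACITY][MSG_SIZE];
    uint32_t head;     /* next slot to write */
    uint32_t tail;     /* next slot to read  */
    uint32_t dropped;  /* messages lost to overflow; must stay 0 */
} msg_ring;

/* Returns 1 on success, 0 if full (the message is counted as dropped,
 * never silently overwritten). */
int ring_put(msg_ring *r, const uint8_t *msg, size_t len)
{
    uint32_t next = (r->head + 1u) % MSG_CAPACITY;
    if (next == r->tail) {
        r->dropped++;
        return 0;
    }
    memcpy(r->data[r->head], msg, len < MSG_SIZE ? len : MSG_SIZE);
    r->head = next;
    return 1;
}

/* Returns 1 and copies one message out, or 0 if the buffer is empty. */
int ring_get(msg_ring *r, uint8_t *out)
{
    if (r->tail == r->head)
        return 0;
    memcpy(out, r->data[r->tail], MSG_SIZE);
    r->tail = (r->tail + 1u) % MSG_CAPACITY;
    return 1;
}
```

The key design point is sizing: the buffer must cover the worst-case burst of the entire PLC fleet, and an overflow is recorded rather than overwriting old data, so a lost message never goes undetected.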
Factories are rarely built to manufacture one item of a product at a time. Rather, they try to make as many copies of the product concurrently as possible. This induces massive parallelism, with the appropriate heavyweight synchronization. Today this is implemented as a kind of task parallelism, with each computing unit running as a synchronous system taking care of one of those tasks; all parallelization moves into the communication. (More on how this is embedded in the overall software architecture in the section on architectural paradigms.)
With the advent of multicore chips, however, the current patterns and paradigms need to be revisited. It is still unclear what parallel execution within one of the aforementioned controllers could mean or whether parallelism makes sense within the current execution patterns.
On PC systems, however, multicore architectures can readily be applied for the typical tasks of PCs in manufacturing applications such as data retrieval, data management, data visualization, or workflow control.
Although security and safety are extremely important almost everywhere, manufacturing adds life-threatening failure modes such as exploding oil refineries or failing brakes in cars. If a computer virus sneaks into a chemical plant undetected, anything might happen; plundering the company’s accounts would not even be the worst of it.
It is better to use the term “dependability” here, because this is what it is all about. Both the manufacturer and the customers consuming the manufacturer’s products depend on the correct execution of the production. This comes down to the following features:
The security requirements make it more or less impossible to connect a factory to the Internet or even to the office. Contact is therefore made based on three principles:
As already mentioned, manufacturing applications do not control virtual objects such as accounts or contracts, but real physical objects such as boiling liquids, explosive powders, or heavy steel beams.
The software controlling these objects cannot rely on transactional or load-balancing patterns, because there simply is no rollback and no room for postponing an action.
This results in the application of a rather different architectural paradigm in those environments alongside the more traditional event-driven mechanisms: using a cyclic execution model.
Cyclic execution means that the operating system executes the same sequence of computing blocks within a defined period of time over and over again. The amount of time may vary from sequence to sequence, but the compiler and the operating system guarantee that the total time for execution of all blocks always fits into the length of the cycle.
Events and data in the cyclic paradigm are “pulled” rather than “pushed,” but the “pull” does not happen whenever the software feels like it (which is the traditional “pull” paradigm in some XML parsers), but rather periodically. This is the main reason why the term “cyclic” is used instead of “pull” (or “polling”).
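A minimal cyclic-executive sketch in C can make the model concrete. The task names and the read/compute/write split are assumptions chosen for illustration; a real controller runtime would also enforce the cycle-time budget and block on a timer tick between cycles, both of which this sketch omits.

```c
#include <stddef.h>

/* Minimal cyclic-executive sketch. Task names and the read/compute/write
 * split are hypothetical; a real runtime would also enforce that the
 * whole table fits into the cycle-time budget. */

typedef void (*task_fn)(void);

static int sensor_value;      /* pulled once per cycle, never pushed */
static int actuator_command;

static void read_inputs(void)   { sensor_value = 100; /* poll the process image */ }
static void compute(void)       { actuator_command = sensor_value / 2; }
static void write_outputs(void) { /* drive the actuator with actuator_command */ }

/* The same sequence of computing blocks runs every cycle. */
static task_fn cycle_table[] = { read_inputs, compute, write_outputs };

/* Run n cycles and return the last actuator command. A real system would
 * block on a timer tick between cycles instead of looping freely. */
int run_cycles(int n)
{
    for (int i = 0; i < n; i++)
        for (size_t t = 0; t < sizeof cycle_table / sizeof cycle_table[0]; t++)
            cycle_table[t]();
    return actuator_command;
}
```

Note that inputs are read (“pulled”) at a fixed point in each cycle; nothing in this model reacts to events between those points.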
The cyclic execution model reduces complexity:
Of course, cyclic execution has a number of drawbacks, too. It uses CPU power even when there is nothing to do. It also limits the duration of any task to the length of the cycle, which is more likely to range between 100 µs and 100 ms than a second or more. No CPU can do a lot of processing within 100 µs.
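A common workaround for the cycle-length limit is to slice a long computation into a resumable state machine, so that each cycle executes exactly one bounded slice. The following sketch is hypothetical; the slice count and the stand-in workload are arbitrary.

```c
/* Hypothetical long-running computation sliced into a resumable state
 * machine: each cycle advances exactly one bounded slice, so the task
 * never exceeds the cycle-time budget. */

enum { SLICES = 8, WORK_PER_SLICE = 1000 };

typedef struct {
    int  slice;    /* next slice to execute */
    long result;   /* accumulated partial result */
    int  done;     /* set once all slices have run */
} long_task;

/* Called once per cycle from the cyclic schedule; does a bounded
 * amount of work and returns. */
void long_task_step(long_task *t)
{
    if (t->done)
        return;
    for (int i = 0; i < WORK_PER_SLICE; i++)
        t->result += 1;               /* stand-in for the real computation */
    if (++t->slice >= SLICES)
        t->done = 1;
}
```

The price of this pattern is latency: the full result is only available after SLICES cycles, which the schedule must account for.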
A system cannot rely on cyclic execution alone. In any environment, things might happen that cannot wait until the cyclic system gets around to reading the appropriate variables and discovers that something is just about to blow up. So in every cyclic system there is an event-driven daemon waiting for the really important events, ready to kill the cycle and take over. The problem here is not whether this kind of functionality is needed, or to what degree, but rather how to marry these two approaches on a single computer in a way that allows both operating systems to work 100 percent reliably.
The question of how to deal with cyclic and event-driven patterns within a single system becomes highly important to PCs that are part of a distributed manufacturing application. Here lives a system dominated by an event-driven operating system, communicating with mainly cycle-driven systems. Depending on the job, things must be implemented in kernel or user mode.
Those are just two examples of how architects try to make the two paradigms collaborate. Neither works extraordinarily well, although the known solutions are all robust enough today for 24/7 factories. What makes them work, in the end, is probably the huge amount of testing and calibration of timers, threading, and so on that goes into the final products. This shows clearly that there is plenty of room for improvement.
Figure 3. Architectural paradigms—resource usage with cyclic versus event-driven architecture
A lot of hope for future improvements rests on the principles of service-oriented architecture (SOA). By allowing cyclic and event-driven systems to be loosely coupled through messages, it becomes possible to route information through the system more efficiently, depending on the severity of the event.
Cyclic systems could then do what they are best at: delivering predictable resource usage and results, and simply using a specific time slot in the cycle to put relevant information into a message queue that would then be serviced by other systems (cyclic or event-driven). Queuing the necessary data would also lessen the chance of errors that come from pulling it out of a specific location or variable.
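Such severity-based routing between a cyclic producer and its consumers might be sketched as follows. All names, and the two-tier policy (alarms bypass the queue while routine data waits to be serviced), are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical severity-based routing between a cyclic producer and
 * event-driven consumers: alarms are handled immediately, routine
 * messages are queued and drained asynchronously. */

typedef enum { SEV_INFO, SEV_WARNING, SEV_ALARM } severity;

typedef struct { severity sev; int code; } message;

#define QUEUE_CAPACITY 64

message queue[QUEUE_CAPACITY];  /* routine messages awaiting service */
size_t  queue_len;
int     alarms_raised;

/* The event-driven side: invoked immediately for alarms. */
static void handle_alarm(const message *m)
{
    (void)m;
    alarms_raised++;
}

/* Called by the cyclic system at a fixed point in its cycle. */
void publish(message m)
{
    if (m.sev == SEV_ALARM)
        handle_alarm(&m);           /* cannot wait a full cycle */
    else if (queue_len < QUEUE_CAPACITY)
        queue[queue_len++] = m;     /* serviced later by another system */
}
```

The cyclic side spends a fixed, predictable amount of time calling publish() once per cycle; only the event-driven consumers ever do open-ended work.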
The architectural trends in manufacturing software can be summarized as follows:
This development is intensified by the current trends in virtualization. Because manufacturing environments typically involve the duality of device operation by a real-time operating system on the one hand and visualization and manual control by a Windows system on the other, the prospect of having both available on a single computer is extremely attractive for technical and financial reasons.
This certainly also applies to cloud computing. The prospect of having a distributed environment look like a single entity is too tempting to pass up. The mode, however, will differ from the typical cloud application: factories will employ their own “local” cloud rather than “the” cloud.
Currently, there is a lot of discussion and rethinking going on in the manufacturing space. What will the future look like? Should the development favor more cyclic or event-driven execution?
Multicore CPUs add an interesting angle to this discussion. On the one hand, they provide enough power to simply make everything cycle based and have several cycles run in parallel. On the other hand, they favor event-driven systems, because the workload can be easily distributed.
We think that the answer is somewhere in the middle. The most interesting scenarios are enabled through the implementation of SOA principles. By turning many parts of the solution into services, it is easier to host cyclic and event-driven modules on the same computer or on several devices, and have them interact. Manufacturing systems are applying modern technology and architectural paradigms to automatically distribute load and move operations between the various nodes of the system, depending on the availability of resources.

When you think about it, manufacturing is very similar to large-scale Internet applications. It is a system of many similar, but subtly different, devices and subsystems that operate together to fulfill the needs of a business process. The user does not care about the details of one specific device or service. The highest priority, and ultimate result, of a well-working manufacturing system is the synchronization and orchestration of devices into a business-process execution engine that produces a flawless product. This is not much different from the principles you can see today in cloud computing, web services, infrastructure in the cloud, and Software as a Service applications.
Taking analogies from enterprise computing to other areas of manufacturing execution and control opens up a whole new set of technologies, such as virtualization and high-performance computing. Virtualization would allow moving, replicating, and scaling the solution in a way that simply isn’t possible today. For computation-intensive tasks, work items could be “outsourced” to Microsoft Windows Compute Cluster Server computation farms and would no longer affect the local system. Being able to run more cycles, or to handle more events, has a dramatic impact on the architectural paradigms in manufacturing, and no one can say with 100-percent certainty what the model of the future will be. On-demand computational resources could very well enable more use of event-driven architectures in manufacturing.
While these technical advances sound tempting, there are still a lot of problems that need to be solved. For example, there’s a need for real-time operating systems and Windows to coexist on the same device. This can’t be done with current virtualization technology, because there’s currently no implementation that supports guaranteed interrupt delivery for a real-time OS.
When you compare the cloud offerings from multiple vendors to the needs in manufacturing, cloud computing looks like the perfect solution to many problems: integrated, claims-based authentication, “unlimited” data storage, communication services, message relaying, and the one factor that has changed manufacturing and software development like no other: economies of scale.
Unfortunately, there is one basic attribute of cloud services that makes the use of current cloud offerings in manufacturing pretty much impossible: the cloud itself.
Manufacturing is a highly sensitive process and any disruption could not only cost millions of dollars, but also be a serious risk to the lives of thousands of consumers. Just imagine what would happen if a hacker got access to the process that controls the recipes for production of food items. It is pretty much unthinkable to ever have a manufacturing environment connected to the cloud.
Still, plenty can be learned from cloud-based infrastructure and services. Let’s look at SQL Server Data Services (SSDS) as an example. SSDS is not simply a hosted SQL Server in the cloud. It is a completely new structured data-storage service, designed for simple entities that are grouped together in containers, with a very simple query mechanism that allows direct access to these entities. The big advantage of SSDS is that it hides the details of replication, backup, and every other maintenance task from the application using it. By providing geo-replication and using Microsoft’s huge data centers, it can guarantee virtually unlimited data storage and indestructible data.
Hosting a system like this on premises in manufacturing plants would solve many of the problems we face today. Manufacturing processes generate a vast amount of data, and every single step of the production process needs to be documented. This means we are looking at large numbers of very simple data entities. The data-collection engine cannot miss any of the generated data; otherwise, the product might not be certified for sale. One very interesting aspect of manufacturing systems is that once a device starts sending an unusually high amount of data, for example due to an error condition, this behavior quickly replicates across the whole system, and suddenly all devices are sending large volumes of log data. This is a situation that is very difficult to handle with classical enterprise database servers.
The automated replication feature is also of great use: Many manufacturing systems span thousands of devices over literally thousands of square feet of factory floor—many times, even across buildings or locations. Getting data close to where it is needed, in time, is essential to guarantee a well-synchronized process.
While cloud computing is still in its infancy and almost all vendors are busy building up their first generation of services in the cloud, the applicability of those services as on-premises solutions is often neglected in order to stay focused on one core competency.
Christian Strömsdörfer is a senior software architect in a research department of Siemens Automation, where he is currently responsible for Microsoft technologies in automation systems. He has 18 years of experience as a software developer, architect, and project manager in various automation technologies (numerical, logical, and drive controls) and specializes in PC-based high-performance software. He received his M.A. in linguistics and mathematics from Munich University in 1992. He resides in Nuremberg, Germany, with his wife and three children.
Peter Koen is a Senior Technical Evangelist with the Global Partner Team at Microsoft. He works on new concepts, ideas, and architectures with global independent software vendors (ISVs). Siemens is his largest partner where he works with Industry Automation, Manufacturing Execution Systems, Building Technologies, Energy, Healthcare, and Corporate Technologies. Peter resides in Vienna, Austria and enjoys playing the piano as a way to establish his work/life balance.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.