Distributed Applications in Manufacturing Environments

The Architecture Journal

Peter Koen and Christian Strömsdörfer

Summary: This article examines some important characteristics of software architecture for modern manufacturing applications. This kind of software generally follows the same trends as other types of software, but often with important modifications. Atypical architectural paradigms are regularly applied in manufacturing systems to address the tough requirements of real time, safety, and determinism imposed by such environments.

Contents

Introduction
The Automation Pyramid
Highly Distributed, Highly Diverse
Tough Requirements
Real-World Parallelism
Security + Safety + Determinism = Dependability
Architectural Paradigms
The Future of Manufacturing Applications
Cloud Computing and Manufacturing

Introduction

Almost all items we use on a daily basis (cars, pencils, toothbrushes), and even much of the food we eat, are products of automated manufacturing processes. Although the processes are quite diverse (brewing beer and assembling a car body are quite different), the computing environments controlling them share many characteristics.

Although we refer to “manufacturing” in this article, we could also use the term “automation,” the latter encompassing non-manufacturing environments as well (such as power distribution or building automation). Everything in this article therefore also applies in those environments.

The Automation Pyramid

Software in the industry is often illustrated in a triangle (which, strangely enough, is called the “automation pyramid”; see Figure 1). The triangle shows three areas, with the upper two running on PC hardware. Although PCs link the world of factory devices with the office and server world of ERP applications, PCs running within the manufacturing application are distinct from those running outside. The manufacturing application works on objects of a raw physical nature (boiling liquids, explosive powders, heavy steel beams), whereas the objects the office and ERP applications handle are purely virtual.


Figure 1. The automation pyramid

The three levels of the automation pyramid are the following:

  • Information
  • Execution
  • Control

The base of the pyramid, and the largest number of devices in a manufacturing plant, are the controllers driving the machinery. They follow the rules and the guidance of an execution runtime, which controls how the individual steps of the process are performed. Software at the information level controls the manufacturing process. It defines workflows and activities that need to be executed to produce the desired product. This, of course, involves algorithms that try to predict the state of the manufacturing machinery and the supply of resources to optimally schedule execution times. Obviously, the execution of this plan needs to be monitored and evaluated as well. This means that commands flow down from the information level through execution to the control level, while reporting data flows back up.

The predictability of manufacturing processes and schedules is a tough problem that requires architectural paradigms which may be unfamiliar to developers who deal only with virtual, memory-based objects.

The software discussed in this article lives in the lower part of that triangle, where PCs meet devices, and where the well-known patterns and paradigms for writing successful office, database, or entertainment software no longer apply.

Highly Distributed, Highly Diverse

Most manufacturing tasks involve many steps executed by different devices. To produce a large number of items in a given amount of time, manufacturers employ hundreds or even thousands of devices to execute the necessary steps, very often in parallel. All the intelligent sensors, controllers, and workstations taking part in this effort collaborate and synchronize with each other to churn out the required product in a deterministic and reliable way. This is all one huge application, distributed over the factory floor.

The computing systems are rather varied, too. (See Figure 2.) On a factory floor or in a chemical plant, one would probably find all of the following systems:

  • Windows client (everything from Microsoft Windows 95 to Windows Vista)
  • Microsoft Windows Server (Windows Server 2003 mainly, Windows Server 2008 being introduced)
  • Microsoft Windows CE
  • Proprietary real-time OS
  • Embedded Linux

All of these systems must communicate effectively with each other in order to collaborate successfully. We therefore regularly meet a full zoo of diverse communication strategies and protocols, such as:

  • Windows protocols: DDE, DCOM, MSMQ
  • IPC protocols: Pipes, Shared Memory, RPC
  • Ethernet protocols: TCP, UDP
  • Internet protocols: HTTP, SOAP
  • Microsoft .NET Framework protocols: Remoting, ASP.NET, WCF
  • Proprietary real-time protocols


Figure 2. Distribution

This diversity of operating systems and means of communication may not look like the makings of a system that must be fully deterministic and reliable, but it is exactly that. Because of the highly differentiated singular tasks that must be executed to create the final product, each step in this workflow uses the most effective strategy available. If data from a machine-tool controller is needed and it happens to be running Windows 95, DCOM is obviously a better choice for accessing the data than SOAP.

This last example also shows a common feature of factory floors: The software lives as long as the computer that hosts it. Machine tools are expensive and are not replaced just because there is a new embedded version of Microsoft Windows XP. If a piece of hardware dies, it is replaced with a compatible computer. This may mean that in a 10-year-old factory, a Windows 95 computer has to be replaced with a new one that basically has the same functionality, but is based on an OS like Windows CE or Linux. Unless a factory is brand-new, computers from various time periods are present in a single factory or plant and need to seamlessly interoperate.

Because of this diversity, manufacturing applications rely heavily on standards, especially in terms of communication. The variability in operating systems is decreasing somewhat in favor of Windows or Linux solutions, and this trend will certainly continue in the future. Also, communication is clearly moving away from proprietary designs toward real-time Ethernet protocols and comparable solutions where applicable. Especially in situations where many different transportation and communication protocols have to work together in one place, the focus for future development is on technologies that can deal with many different protocols, formats, and standards. Windows Communication Foundation (WCF) is one of the technologies that alleviates a lot of pain in plant repairs, upgrades, and future development. Using WCF, it is possible to integrate newer components into existing communication infrastructures without compromising the internal advances of the new component. A WCF protocol adapter simply translates the incoming and outgoing communication to the specific protocol used in the plant. This allows for lower upgrade and repair costs than before.
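The adapter idea described above can be sketched independently of WCF. The following Python fragment is not WCF code; it is a minimal illustration of the pattern, and every name in it (`ProtocolAdapter`, `LegacyTextAdapter`, `TemperatureSensor`, the `key=value;` wire format) is a hypothetical assumption made for the example. The component works with its own internal message and leaves the plant-specific wire format to whichever adapter it is given.

```python
import json
from abc import ABC, abstractmethod


class ProtocolAdapter(ABC):
    """Translates between a component's internal message and a plant wire format."""

    @abstractmethod
    def encode(self, message: dict) -> bytes: ...

    @abstractmethod
    def decode(self, payload: bytes) -> dict: ...


class JsonAdapter(ProtocolAdapter):
    """A newer format: UTF-8 encoded JSON with sorted keys."""

    def encode(self, message):
        return json.dumps(message, sort_keys=True).encode("utf-8")

    def decode(self, payload):
        return json.loads(payload.decode("utf-8"))


class LegacyTextAdapter(ProtocolAdapter):
    """Hypothetical legacy format: 'key=value;key=value' in ASCII."""

    def encode(self, message):
        pairs = (f"{key}={value}" for key, value in sorted(message.items()))
        return ";".join(pairs).encode("ascii")

    def decode(self, payload):
        items = (item.split("=", 1) for item in payload.decode("ascii").split(";") if item)
        return {key: value for key, value in items}


class TemperatureSensor:
    """A new component that knows nothing about the plant's wire format."""

    def __init__(self, adapter: ProtocolAdapter):
        self.adapter = adapter

    def report(self, celsius):
        return self.adapter.encode({"sensor": "T1", "celsius": str(celsius)})


legacy_frame = TemperatureSensor(LegacyTextAdapter()).report(72)
json_frame = TemperatureSensor(JsonAdapter()).report(72)
print(legacy_frame)
print(json_frame)
```

Swapping the adapter changes the bytes on the wire without touching the sensor class, which is the property that keeps upgrade and repair costs down.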

Tough Requirements

Modern manufacturing environments offer a set of tough real-time and concurrency requirements. And we are talking “real-time” here: time resolutions of 50 µs and less. Manufacturing became a mainly electrical engineering discipline in the first half of the last century, and those requirements were state of the art when software started to take over in the 1960s and 1970s. The expectations have only risen. For a steel barrel that needs to be stopped, 10 ms is close to an eternity.

Traditionally, these tough requirements have been met by specialized hardware with proprietary operating systems, such as:

  • Programmable Logic Controllers (PLCs). A PLC manages input and output signals of electrical devices and other PLCs attached to it.
  • Numerical Controllers (NCs). A numerical controller, also called a machine-tool controller, manages axes moving through space (which is described “numerically”).
  • Drive Controllers. A drive controller manages rotating axes (such as those of a motor).

These extreme requirements are not expected from PC systems yet, but they, too, must react to very different scenarios in due time. And usually there is no second try. A typical task executed by a PC on a factory floor is to gather messages coming from PLCs. If an error occurs in one of them, it regularly happens that all the other PLCs also start sending messages, like a bunch of confused animals. If the PC in charge misses one of those messages (for instance, because of a buffer overflow), the message is probably gone for good, the cause of the failure might never be detected, and the manufacturer must subsequently throw away all the beer that was brewed on that day.
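To make the buffer-overflow scenario concrete, here is a small Python sketch. It assumes, purely for illustration, that every PLC message carries a sequence number; the class `BoundedInbox` and the function `find_gaps` are hypothetical names, not part of any real product. The sketch shows how a burst overflows a fixed-size buffer and how the receiver can at least detect that a message went missing, even though the message itself cannot be recovered.

```python
from collections import deque


class BoundedInbox:
    """Fixed-capacity message buffer that counts what it has to drop."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0

    def offer(self, message):
        if len(self.items) >= self.capacity:
            self.dropped += 1      # overflow: the message is gone for good
            return False
        self.items.append(message)
        return True


def find_gaps(sequence_numbers):
    """Return the sequence numbers missing between the lowest and highest seen."""
    seen = set(sequence_numbers)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


inbox = BoundedInbox(capacity=4)
received = []

for seq in range(1, 6):            # burst: messages 1..5 arrive, 5 overflows
    inbox.offer(seq)
received += [inbox.items.popleft() for _ in range(2)]  # the PC processes 1 and 2
for seq in range(6, 9):            # 6 and 7 fit, 8 overflows
    inbox.offer(seq)
received += list(inbox.items)

print("dropped:", inbox.dropped)
print("detected gaps:", find_gaps(received))
```

Note that the loss of the last message in a burst (message 8 here) only becomes visible as a gap once a later message arrives, which is one reason the real systems are sized and tested so that the buffer never overflows in the first place.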

To summarize:

  • Hard real-time tasks (measured in microseconds) rely mainly on specialized hardware and software.
  • PC systems must still meet “soft” real-time requirements (microseconds become milliseconds).
  • In manufacturing applications, there is no undo and no redo.

Real-World Parallelism

Factories are rarely built to manufacture a single product synchronously. Rather, they try to make as many copies of the product concurrently as possible. This induces massive parallelism, with appropriate heavyweight synchronization. Today this is implemented as a kind of task parallelism, with each computing unit running as a synchronous system taking care of one of those tasks. All parallelization moves into the communication. (More on how this is embedded in the overall software architecture in the section on architectural paradigms.)

With the advent of multicore chips, however, the current patterns and paradigms need to be revisited. It is still unclear what parallel execution within one of the aforementioned controllers could mean or whether parallelism makes sense within the current execution patterns.

On PC systems, however, multicore architectures can readily be applied for the typical tasks of PCs in manufacturing applications such as data retrieval, data management, data visualization, or workflow control.

  • Parallelism in manufacturing applications appears today mainly in the form of asynchronous communication.
  • Multicore architectures have a yet-to-be-determined effect in real-time controller systems.

PCs in manufacturing, however, typically carry out tasks that can take advantage of multicore architectures right away.

Security + Safety + Determinism = Dependability

Although security and safety are extremely important almost everywhere, manufacturing always adds life-threatening features such as exploding oil refineries or failing brakes in cars. If a computer virus sneaks into a chemical plant undetected, anything might happen, and plundering the company’s accounts would not be the worst of it.

It is better to use the term “dependability” here, because this is what it is all about. Both the manufacturer and the customers consuming the manufacturer’s products depend on the correct execution of the production. This comes down to the following features:

  • Security—Manufacturing applications must not be exposed to the outside world (i.e. the Internet) such that anyone could penetrate the system.
  • Safety—The application must be fail-safe; even in the unlikely event of a software or hardware error, execution must continue, e.g. on a redundant system.
  • Determinism—The execution of a manufacturing application must be deterministic, 24/7. This does not mean that fuzzy algorithms could not be applied somehow (for instance, to optimize manufacturing with a batch size of “1”), but even that must be reliably executed within a certain time period.

The security requirements make it more or less impossible to connect a factory to the Internet or even to the office. Contact is therefore made based on three principles:

  • The communication flow into and out of the factory floor is never direct, but relayed through proxies and gateways.
  • Communication is usually not permanent; special tasks such as backup of production data, rollout of software updates, and so on are applied in dedicated time spans.
  • Many tasks have to be executed physically inside the factory (by wiring a secure portable computer directly with the plant for maintenance purposes, for instance).

Architectural Paradigms

As already mentioned, manufacturing applications do not control virtual objects such as accounts or contracts, but real physical objects such as boiling liquids, explosive powders, or heavy steel beams.

The software controlling these objects cannot rely on transactional or load-balancing patterns, because there simply is no rollback and no room for postponing an action.

This results in the application of a rather different architectural paradigm in those environments alongside the more traditional event-driven mechanisms: using a cyclic execution model.

Cyclic execution means that the operating system executes the same sequence of computing blocks within a defined period of time over and over again. The amount of time may vary from sequence to sequence, but the compiler and the operating system guarantee that the total time for execution of all blocks always fits into the length of the cycle.

Events and data in the cyclic paradigms are “pulled” rather than “pushed,” but the “pull” does not happen whenever the software feels like it (which is the traditional “pull” paradigm in some XML parsers), but rather periodically. This is the main reason why the term “cyclic” is used instead of “pull” (or “polling”).
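A rough Python sketch may help to picture the cyclic model. It is an illustration under stated assumptions, not how a real controller runtime is built: the names `Block`, `check_schedule`, and `run_cycles` are hypothetical, the worst-case execution times are simply declared by hand, and the guarantee that a real compiler and operating system provide is reduced here to a static sum check. Inputs are sampled once at the start of each cycle, and the same blocks then run in the same order.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Block:
    name: str
    wcet_us: int                      # assumed worst-case execution time
    run: Callable[[dict], None]       # operates on the shared variable image


def check_schedule(blocks: List[Block], cycle_us: int) -> int:
    """Reject a block sequence that cannot fit into one cycle; return the slack."""
    total = sum(block.wcet_us for block in blocks)
    if total > cycle_us:
        raise ValueError(f"blocks need {total} us but the cycle is {cycle_us} us")
    return cycle_us - total


def run_cycles(blocks, read_inputs, n_cycles):
    state = {}
    for _ in range(n_cycles):
        state.update(read_inputs())   # periodic "pull": sample inputs once per cycle
        for block in blocks:          # same sequence, every cycle
            block.run(state)
    return state


temperatures = iter([65.0, 69.0, 72.0])   # made-up sensor readings


def read_inputs():
    return {"temperature": next(temperatures)}


def control(state):
    state["heater_on"] = state["temperature"] < 70.0


def count(state):
    state["cycles"] = state.get("cycles", 0) + 1


blocks = [Block("control", 40, control), Block("count", 10, count)]
slack_us = check_schedule(blocks, cycle_us=100)
final_state = run_cycles(blocks, read_inputs, n_cycles=3)
print(slack_us, final_state)
```

The blocks never wait for an event; whatever changed in the plant is simply visible in the variable image at the start of the next cycle.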

The cyclic execution model reduces complexity:

  • In a cyclic system, the number of states the system can adopt corresponds directly to the product of all variable states. In an event-driven system, the sequence of changes and variables has to be taken into account as well, which makes the number of states the system can adopt completely unpredictable.
  • Event-driven models operate based on changes between variable states, whereas the cyclic model works on states of variables. The rules of combinatorial algebra teach us that the number of transitions from one state to another is much larger than the number of states themselves. This is particularly relevant in distributed systems, where the connection between nodes is potentially unreliable.
  • The difference between minimal and maximal load is considerably lower in a cyclic system: The minimal load is the computing time needed to run the cycle itself, the maximal load is 100 percent CPU time for a whole cycle. An event-driven system is more “efficient” in that it only does something if it needs to. On the other hand, with a lot of events happening more or less simultaneously, the maximal load is rather unpredictable.
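The counting argument in the first two points can be checked with a few lines of Python. The variable domains below are made up for the example, and the helper names are hypothetical. The number of states is the product of the number of values each variable can take, while the number of possible changes from one state to a different one grows roughly with the square of that number.

```python
from math import prod


def count_states(domain_sizes):
    """Number of distinct system states: the product of all variable domains."""
    return prod(domain_sizes)


def count_transitions(domain_sizes):
    """Number of ordered pairs (from_state, to_state) with from_state != to_state."""
    n = count_states(domain_sizes)
    return n * (n - 1)


# Hypothetical example: a valve (2 positions), a mode switch (3 modes),
# and a level sensor quantized to 4 values.
sizes = [2, 3, 4]
states = count_states(sizes)
transitions = count_transitions(sizes)
print(states, transitions)
```

A cyclic program has to be correct for each of the 24 states; an event-driven program reacting to changes has, in the worst case, 552 transitions to consider, before the order in which events arrive is even taken into account.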

Of course, cyclic execution has a number of drawbacks, too. It uses CPU power even if there is nothing to do. It also limits the duration of any task by the duration time of the cycle, which is more likely to range between 100 µs and 100 ms than a second or more. No CPU can do a lot of processing within 100 µs.

A system cannot rely on cyclic execution only. In any environment, things might happen that cannot wait until the cyclic system fancies to read the appropriate variables to find out something’s just about to blow up. So in every cyclic system there is an event-driven demon waiting for the really important stuff to kill the cycle and take over. The problem here is not whether this kind of functionality is needed or to what degree, but rather how to marry these two approaches on a single computer in a way that allows both operating systems to work 100 percent reliably.

The question of how to deal with cyclic and event-driven patterns within a single system becomes highly important to PCs that are part of a distributed manufacturing application. Here lives a system dominated by an event-driven operating system, communicating with mainly cycle-driven systems. Depending on the job, things must be implemented in kernel or user mode.

  • Collecting data from devices on the factory floor is done in user mode by mimicking the appropriate cycles with timers and reading and writing the variables after each cycle.
  • Alarm messages, on the other hand, are often mapped to a specific interrupt, such that a kernel extension (not just a driver, but a sort of extended HAL) can acknowledge the message as soon as possible and perhaps spawn a real-time operating system that sits beside Windows on the same computer.
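The first of these two approaches, mimicking a cycle with a timer in user mode, might look roughly like the following Python sketch. The function name `poll_cyclically` and the `SimulatedClock` helper are assumptions made for this example; the simulated clock only exists so that the example runs instantly and reproducibly, and a real collector would use the default `time.monotonic` and `time.sleep`. The loop schedules against absolute deadlines, so the time spent reading the variables does not accumulate as drift.

```python
import time


def poll_cyclically(read_variables, period_s, n_cycles,
                    clock=time.monotonic, sleep=time.sleep):
    """Read variables once per period, scheduling against absolute deadlines."""
    samples = []
    next_deadline = clock()
    for _ in range(n_cycles):
        samples.append(read_variables())
        next_deadline += period_s
        delay = next_deadline - clock()
        if delay > 0:
            sleep(delay)
        # If delay <= 0 the cycle overran; the next read starts immediately.
    return samples


class SimulatedClock:
    """Stand-in for a real clock so the example needs no actual waiting."""

    def __init__(self):
        self.now = 0.0
        self.sleeps = []

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.sleeps.append(seconds)
        self.now += seconds


sim = SimulatedClock()


def read_variables():
    sim.now += 0.125                  # pretend the read itself takes 125 ms
    return {"read_at": sim.now}


samples = poll_cyclically(read_variables, period_s=0.25, n_cycles=3,
                          clock=sim, sleep=sim.sleep)
print(samples, sim.sleeps)
```

On a general-purpose operating system the sleep call gives no hard guarantee about wake-up time, which is exactly why so much calibration of timers and threading goes into products built this way.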

Those are just two examples of how architects try to let the two paradigms collaborate. None of them works extraordinarily well, although the known solutions are all robust enough today for 24/7 factories. What makes them work, in the end, is probably the huge amount of testing and calibration of timers, threading, and so on that goes into the final products. This shows clearly that there is plenty of room for improvement.


Figure 3. Architectural paradigms: resource usage with cyclic versus event-driven architecture

A lot of hope for future improvements rests on the principles of Service-Oriented Architecture. By allowing cyclic and event-driven systems to be loosely coupled through messages, it is possible to more efficiently route information in the system depending on severity of the event.

Cyclic systems could then do what they are best at: delivering predictable resource usage and results, and simply using a specific time in the cycle to put relevant information into a message queue that then would be serviced by other systems (cyclic or event-driven). This approach would also lessen the chance of error from pulling data from a specific location/variable by queuing the necessary data.
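The loose coupling described here can be sketched with Python's standard `queue` and `threading` modules. This is a toy illustration on a single machine, not a proposal for a real plant bus: the function names, the message layout, the readings, and the alarm limit of 100.0 are all assumptions. The cyclic side writes one message at a fixed point in every cycle, and the event-driven side does nothing until a message arrives.

```python
import queue
import threading


def cyclic_producer(readings, outbox):
    """Runs one 'cycle' per reading and queues a message at the end of each."""
    for cycle, value in enumerate(readings):
        # ... the regular control work of this cycle would happen here ...
        outbox.put({"cycle": cycle, "value": value})   # fixed slot in the cycle
    outbox.put(None)                                   # sentinel: no more cycles


def event_driven_consumer(inbox, alarms, limit):
    """Sleeps until a message arrives, then reacts to it."""
    while True:
        message = inbox.get()          # blocks; uses no CPU while idle
        if message is None:
            break
        if message["value"] > limit:
            alarms.append(message["cycle"])


channel = queue.Queue()
alarms = []
consumer = threading.Thread(target=event_driven_consumer,
                            args=(channel, alarms, 100.0))
consumer.start()
cyclic_producer([95.0, 101.5, 99.0, 120.0], channel)
consumer.join()
print("cycles that raised an alarm:", alarms)
```

Because the consumer works on queued copies of the data, it never has to read a variable at exactly the right moment, which is the reduction in error potential mentioned above.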

The Future of Manufacturing Applications

The architectural trends in manufacturing software can be summarized as follows:

  • Standard operating systems are applied wherever possible. The more configuration capabilities a particular system offers, the more it will be used.

This development is intensified by the current trends in virtualization. Because manufacturing environments typically involve the duality of device operation by a real-time operating system and visualization and manual control by a Windows system, the offer of having both available on a single computer is extremely intriguing for technical and financial reasons.

  • Standard communication protocols and buses are applied wherever possible. The transmission and processing time for a protocol remain most critical, however. Internet protocols based on HTTP and XML will therefore probably never be used widely.
  • The Internet is entering the world of factories very slowly, albeit mostly in the form of an intranet. This means in particular that Internet technologies are already very important, but rather as a pattern than in their original form. Software updates are rolled out as “Click Once” applications, but via intranets and disconnected from the Internet.

This certainly also applies to cloud computing. The offer to have a distributed environment look like a single entity is too tempting not to exercise it. The mode, however, will differ from the typical cloud application. Factories employ their own “local” cloud, rather than “the” cloud.

Currently, there is a lot of discussion and rethinking going on in the manufacturing space. What will the future look like? Should the development favor more cyclic or event-driven execution?

Multicore CPUs add an interesting angle to this discussion. On the one hand, it would mean that there is enough power to simply make everything cycle based and have several cycles run in parallel. On the other hand, it would favor event-driven systems, because the workload can be easily distributed.

We think that the answer is somewhere in the middle. The most interesting scenarios are enabled through the implementation of SOA principles. By changing many parts of the solution into services, it is easier to host cyclic and event-driven modules on the same computer or on several devices, and have them interact. Manufacturing systems are applying modern technology and architecture paradigms to automatically distribute load and move operations between various nodes of the system depending on availability of resources. When you think about it, manufacturing is very similar to large-scale Internet applications. It’s a system of various, similar, but still slightly different devices and subsystems that operate together to fulfill the needs of a business process. The user doesn’t care about the details of one specific device or service. The highest priority and ultimate result of a well-working manufacturing system is the synchronization and orchestration of devices to build a well-synchronized business process execution engine that results in a flawless product. It’s not much different from the principles you can see today with cloud computing, web services, infrastructure in the cloud, and Software as a Service applications.

Taking analogies of enterprise computing to other areas of manufacturing execution and control opens up a whole new set of technologies, like virtualization and high-performance computing. Virtualization would allow moving, replicating, and scaling the solution in a fast way that simply isn’t possible today. For computation-intensive tasks, work items could be “outsourced” to Microsoft Windows Compute Cluster Server computation farms and wouldn’t affect the local system anymore. Allowing for more cycles, or for handling more events, has a dramatic impact on the architectural paradigms in manufacturing, and no one can say with 100-percent certainty what the model of the future will be. On-demand computational resources could very well enable more use of event-driven architectures in manufacturing.

While these technical advances sound tempting, there are still a lot of problems that need to be solved. For example, there’s a need for real-time operating systems and Windows to coexist on the same device. This can’t be done with current virtualization technology, because there’s currently no implementation that supports guaranteed interrupt delivery for a real-time OS.

Cloud Computing and Manufacturing

When you compare the cloud offerings from multiple vendors to the needs in manufacturing, cloud computing looks like the perfect solution to many problems: integrated, claim-based authentication, “unlimited” data storage, communication services, message relaying, and the one factor that has changed manufacturing and software development as no other factor could have done: economies of scale.

Unfortunately, there is one basic attribute of cloud services that makes the use of current cloud offerings in manufacturing pretty much impossible: the cloud itself.

Manufacturing is a highly sensitive process and any disruption could not only cost millions of dollars, but also be a serious risk to the lives of thousands of consumers. Just imagine what would happen if a hacker got access to the process that controls the recipes for production of food items. It is pretty much unthinkable to ever have a manufacturing environment connected to the cloud.

Still, plenty can be learned from cloud-based infrastructure and services. Let’s look at SQL Server Data Services as an example. SSDS is not simply a hosted SQL Server in the cloud. It’s a completely new structured data storage service in the cloud, designed for simple entities that are grouped together in containers and for a very simple query model that allows direct access to these entities. The big advantage of SSDS is that it is a cloud-based solution that hides the details of replication, backup, or any other maintenance task from the application using it. By providing geo replication and using the huge data centers from Microsoft, it can guarantee virtually unlimited data storage and indestructible data.

Hosting a system like this on premise in manufacturing plants would solve many problems we are facing today: Manufacturing processes are generating a vast amount of data. Every single step of the production process needs to be documented. This means we are looking at large amounts of very simple data entities. The data-collection engine cannot miss any of the generated data; otherwise, the product might not be certified for sale. And one very interesting aspect of manufacturing systems is that once a device starts sending an unusually high amount of data, for example due to an error condition, this behavior quickly replicates on the whole system and suddenly all devices are sending large volumes of log data. This is a situation that’s very difficult to solve with classical enterprise architecture database servers.

The automated replication feature is also of great use: Many manufacturing systems span thousands of devices on literally thousands of square feet of factory floors, often even across buildings or locations. Putting data close to where it is needed, in time, is essential to guarantee a well-synchronized process.

While cloud computing is still in its infancy and almost all vendors are busy building up their first generation of services in the cloud, the applicability of those services as an on-premise solution is often neglected to stay focused on one core competency.

About the authors

Christian Strömsdörfer is a senior software architect in a research department of Siemens Automation, where he is currently responsible for Microsoft technologies in automation systems. He has 18 years of experience as a software developer, architect, and project manager in various automation technologies (numerical, logical, and drive controls) and specializes in PC-based high-performance software. He received his M.A. in linguistics and mathematics from Munich University in 1992. He resides in Nuremberg, Germany, with his wife and three children.

Peter Koen is a Senior Technical Evangelist with the Global Partner Team at Microsoft. He works on new concepts, ideas, and architectures with global independent software vendors (ISVs). Siemens is his largest partner where he works with Industry Automation, Manufacturing Execution Systems, Building Technologies, Energy, Healthcare, and Corporate Technologies. Peter resides in Vienna, Austria and enjoys playing the piano as a way to establish his work/life balance.

 

This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.