Download the Supporting Documents
Think about baking a chocolate cake that you want to put chocolate chips on. Conventional wisdom says to put some icing on the cake and then sprinkle the chocolate chips on top. But if someone wipes off the icing, the chocolate chips go with it. We need to ensure that the users (malicious or otherwise) of our apps can’t wipe off the icing and take the chocolate chips along. So, how do we do this? We put the chips in the batter and bake them into the cake. Instead of sprinkling on security at the end of development, we bake it into the product.
In this article, I’ll show how to build out the project described in the first article in this series, moving from the planning phase to the design phase. In the first article, we conceptualized our system. Now we’re going to design how it works. The design phase has three critical components: performing an attack surface analysis, performing an attack surface reduction and performing a software architectural risk analysis (more commonly known at Microsoft as a threat model).
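In the first article, we started conceptualizing Contoso’s new case management system. We built out a few preliminary artifacts to facilitate designing the system, including the concept document, the risk assessment questionnaire and the risk register that this article refers back to.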
In this article, we’ll develop a few more artifacts in preparation for the implementation phase of the project. The first thing we need to do is ensure that we are designing our LOB Windows Store app to meet secure design principles.
In the design of any system, there are a few design principles that help you produce a more secure system. Here’s a list of those principles and how they apply to our case management system.
At some point, your system will fail. Whether an attacker tries to break your system or a user does something really unexpected, your application will eventually encounter an unpredictable situation. How you respond to this situation can result in threats to your application’s security. Fail safe is the concept that your application should fail closed. In other words, plan so that when the system fails, its confidentiality, integrity and availability are protected. As an example, assume that our case management system’s database server crashes. When the app tries to write a case to the database, it encounters a timeout and fails. Displaying the stack trace to the user with details about the server connection and call stack is an example of violating the principle of fail safe. Employing the fail-safe principle would entail displaying an error message such as “A connection error has occurred. Please contact the Help Desk at x999.”
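To make the fail-safe example concrete, here is a minimal sketch in Python (not the language of the Windows Store app itself). The `write_to_database` callable, the server name in the simulated timeout and the incident reference are assumptions made for illustration; only the user-facing message comes from the example above.

```python
import logging
import uuid

logger = logging.getLogger("case_management")

# Generic text from the example above; it reveals nothing about the server.
SAFE_MESSAGE = "A connection error has occurred. Please contact the Help Desk at x999."


def save_case(case, write_to_database):
    """Try to persist a case; on failure, fail closed.

    `write_to_database` is a hypothetical callable standing in for the
    real data-access layer. Returns (saved, message_for_user).
    """
    try:
        write_to_database(case)
        return True, "Case saved."
    except Exception:
        # Full details (stack trace, connection info) go to the log only.
        incident_id = uuid.uuid4().hex[:8]
        logger.exception("Case write failed (incident %s)", incident_id)
        # The user sees the generic message plus a reference for the Help Desk.
        return False, f"{SAFE_MESSAGE} (Reference: {incident_id})"


def failing_write(case):
    # Simulates the crashed database server; the host name is made up.
    raise TimeoutError("connection to db01.contoso.local:1433 timed out")


saved, message = save_case({"id": 1}, failing_write)
```

The design choice is that diagnostic detail and user-facing text travel on separate paths: the log receives everything, and the user receives only what is needed to ask for help.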
When the app first starts, does the user need to configure the system to be secure? The principle of secure by default says no, and it comes in two parts. First, if your application has security features such as a check box used to specify whether the system requires a strong password, you want to enable them by default. Out of the box, all security features should be configured to provide the best security posture possible for the product. Second, your application should enable only the minimum features required for the product to function. As an example, in our case management system we would want to be sure that a security feature such as requiring a password reset every 90 days is enabled by default. This ensures that users are required to perform this reset unless they specifically disable the feature. An example of including only the minimal features is separating the resource allocation features into their own module, which can be excluded during installation. Most users will not use this feature (only management team members who are distributing the work for case managers), so do not include it by default.
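Both halves of this principle can be expressed as default values. The following Python sketch is illustrative only; the class names, field names and module names are assumptions, not part of the case management design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SecuritySettings:
    # Security features are on unless an administrator turns them off.
    require_strong_password: bool = True
    password_reset_days: int = 90


@dataclass(frozen=True)
class InstallOptions:
    security: SecuritySettings = field(default_factory=SecuritySettings)
    # Optional modules are off unless explicitly requested at install time.
    enabled_modules: frozenset = frozenset({"case_management"})


# A plain installation gets the secure posture and the minimal feature set.
default_install = InstallOptions()

# The management team opts in to the resource allocation module.
supervisor_install = InstallOptions(
    enabled_modules=frozenset({"case_management", "resource_allocation"})
)
```

Because the secure values are the defaults, weakening the configuration requires a deliberate, visible choice.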
Have you ever seen a database connection string connecting as “sa”? This is an example of violating the principle of least privilege. Instead of providing just the specific functionality required for the application to work, this type of database connection has power to do anything . . . including dropping entire databases. Least privilege means just that: provide the least amount of privileges necessary to get the job done. For our case management system, this means permitting users to do only what’s necessary for their jobs and no more. We want to be sure that case managers have the right permissions, and that those permissions allow only what is necessary for case managers to do their work.
Psychological acceptability, a principle from The Security Development Lifecycle, by Michael Howard and Steve Lipner, is one that is often forgotten but can be summed up in one question: Is the product easy to use? If a product’s security places a burden on its users, they will reject the system. Security needs to be present without being overbearing. In our case management system, this principle would be present in features such as logging that occurs without user interaction, using failover partners in the connection strings to ensure database availability when the primary data store is unavailable, and providing a visual cue on required fields to identify those fields to the user.
There are many principles to keep in mind when you’re building secure software. If you have not already done so, read Writing Secure Code (by Michael Howard and David LeBlanc) for a more thorough listing of secure design principles. This book is not just a great resource for building secure software; it provides the foundational knowledge required in any software security discussion.
Think of how a burglar sees a house. The house has numerous points of entry, from doors to windows. Some might be locked, others unlocked. As the burglar is assessing where he can break into the house, he’s analyzing the attack surface of the house. Breaking into software is a similar process, where an attacker analyzes the application to find the flaws in its armor. Just as with a house, the first step to security in software is attack surface analysis (ASA), followed by attack surface reduction (ASR).
To secure our LOB Windows Store app, we first must assess the attack surface. For this we must return to our concept document to assess our application’s functionality and high-level design. While the attack surface of an application entails much more than I cover here (such as its services, protocols, interfaces to the system and so on), I’ll focus on the specific functionality defined in the concept document. When performing your own ASA, be sure to read between the lines and see whether there are impacts to your application that are not listed in the feature set.
Using the risk assessment questionnaire, we can answer questions such as “Who uses this application?” We know that this is an LOB application that will be used only by Contoso case management and remote medical users. We know that users will not be able to access the application from outside the network and that there will not be access for anonymous users. Now, we just need to identify who uses which features of the application.
One technique that I use is to list the features and users and then check which features apply to each user group. Figure 1 shows an example of how I compile this information.
Figure 1. Sample Feature/Role Matrix
Using this map, we can identify who needs access to which features. This map also sets up the next question in our ASA.
Will 80 percent of our customers use a feature? As we examine our features list, we see a few items that will not be used by our main user base (which is case managers in call centers). To begin, we can identify that the ability to adjust views based on different form factors does not apply to 80 percent of our users. The majority of users will be using a specific monitor configuration, so the ability to adjust to different form factors and orientations is not a core product feature. While this is an important feature that we need to deliver, it does not need to be enabled by default. By not including this feature by default, we reduce the attack surface of our product.
Another example in our concept document is the ability to “Assign resources to a case based on complexity and location.” This is functionality that would be specific to users in a supervisory role who would assign cases to staff members. Supervisors will not be the most common users, so again, this feature does not meet the needs of 80 percent of users. By disabling this functionality by default we reduce the potential for security issues related to this module for the majority of users. If a security vulnerability is discovered, we would need to patch only those systems affected and not every deployment of the case management system.
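The 80 percent question can be answered mechanically once the feature/role matrix exists. The sketch below is written in Python with made-up head counts and feature names; the real values would come from your own matrix.

```python
# Hypothetical head counts per role; not taken from the concept document.
USERS_PER_ROLE = {"case_manager": 170, "remote_medical": 20, "supervisor": 10}

# Hypothetical feature/role matrix: which roles use which feature.
FEATURE_ROLES = {
    "view_case": {"case_manager", "remote_medical", "supervisor"},
    "create_case": {"case_manager"},
    "assign_resources": {"supervisor"},
}

THRESHOLD = 0.80  # the 80 percent rule


def usage_share(feature):
    """Fraction of all users whose role uses the given feature."""
    total = sum(USERS_PER_ROLE.values())
    using = sum(USERS_PER_ROLE[role] for role in FEATURE_ROLES[feature])
    return using / total


def default_state(feature):
    """Enable by default only features used by at least 80 percent of users."""
    return "on" if usage_share(feature) >= THRESHOLD else "off"


defaults = {feature: default_state(feature) for feature in FEATURE_ROLES}
```

With these sample numbers, resource assignment reaches only the supervisors and is therefore off by default, which matches the reasoning above.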
If you remember from our planning documents, we specified that our system will interact with Microsoft HealthVault for medical record information and with Azure for geographic data. How do these services impact the attack surface? As an exercise, think about how using these services could affect the attack surface of our application. In "Fending Off Future Attacks by Reducing Attack Surface," Michael Howard lists a few other areas to consider for analyzing your attack surface. The following is an abbreviated list of points to consider from his article:
Now that we have analyzed our attack surface, we need to figure out how to handle it. Remember from the first article that we can manage risk in four different ways: accept it, mitigate it, redirect it or avoid it. Our ASR strategy will use these same techniques.
When reducing your attack surface, be mindful of what is psychologically acceptable. Return to the example of the house given earlier. We can reduce the attack surface by locking all the doors and windows. The windows can still be broken, so we can put bars on the windows and motion sensors with spotlights all around the house’s perimeter. We have a house with a reduced attack surface, but will the residents feel like they’re living in a prison? The balance between what is secure and what is acceptable to our users is difficult to maintain.
Fortunately, there are some strategies we can use to ensure that we reduce the attack surface without creating a poor experience for users or developers. To do this we will focus on economy of mechanism and least privilege.
Complex software is likely to have more bugs, the general concept being that more code equals more bugs. Economy of mechanism is a design principle that directs us to keep things simple and small. Detailed information about economy of mechanism can be found at the Build Security In site. Here, I’ll discuss how to leverage the Windows 8 contracts to achieve this secure design principle.
In the concept document, we outline a few features that our case management system will have, including:
Each of these items can be built into the case management system from scratch. The positive aspect of doing this is having full control over the functionality of these features, but as you can imagine, each requires substantial amounts of code to develop. By using the Windows 8 contracts for these features, we can reduce the complexity of our own code and leverage the security research of others at Microsoft who have performed their own security review of these features. We can simplify our application by using functionality built into Windows 8.
This approach can be a double-edged sword if you are not careful. The Windows Share, Search and Settings features are themselves complex. We transfer the risk out of our application, but if Windows is not up to the task of handling these operations securely, we might invite insecurity into our application. The lesson here is to make sure that if you’re using existing systems that they were developed with security in mind.
In the ASA we identified who should be able to do what in the application. In building the application we need to be sure that only users who need to do a task have permissions to do that task. As an example, case managers do not need to be able to assign other case managers to a case. This is a supervisor-level activity only, and only supervisors should have permissions to do it. There’s an inverse of this: supervisors should not be able to create a new case because this is the domain of case managers.
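A deny-by-default permission table is one simple way to enforce this split between roles. The Python sketch below uses assumed action names (`create_case`, `update_case`, `assign_case_manager`) purely for illustration.

```python
from enum import Enum


class Role(Enum):
    CASE_MANAGER = "case_manager"
    SUPERVISOR = "supervisor"


# Deny by default: a role can do only what is listed here.
PERMISSIONS = {
    Role.CASE_MANAGER: frozenset({"create_case", "update_case"}),
    Role.SUPERVISOR: frozenset({"assign_case_manager"}),
}


def is_allowed(role, action):
    """Return True only if the action is explicitly granted to the role."""
    return action in PERMISSIONS.get(role, frozenset())


def perform(role, action):
    """Run an action on behalf of a role, refusing anything not granted."""
    if not is_allowed(role, action):
        raise PermissionError(f"{role.value} may not {action}")
    return f"{action} performed"
```

Note that the supervisor role is not a superset of the case manager role. Each role receives only the actions its job requires, which is the inverse case described above.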
When thinking of least privilege, also consider how this principle manifests itself with data and privacy concerns. The case management system will deal with personally identifiable information and personal health records. This information should not be accessible to anyone other than the people directly working on the case. As we design the application, we want to employ security controls that allow only those case managers or remote medical staff members who need health or personal information to provide Contoso’s assistance services to see those details. Supervisors, for example, should not be allowed to access data on random cases without an authorization process.
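Least privilege for data can be sketched as a filter applied per record rather than per role. In the Python example below, the `Case` fields and user IDs are hypothetical, and the authorization process for supervisors is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    case_id: int
    summary: str          # non-sensitive description
    health_details: str   # personal health information
    assigned_staff: set = field(default_factory=set)


def view_case(case, user_id):
    """Return only the fields this user is entitled to see."""
    visible = {"case_id": case.case_id, "summary": case.summary}
    # Health details are released only to staff working on this case.
    if user_id in case.assigned_staff:
        visible["health_details"] = case.health_details
    return visible


case = Case(42, "Traveler needs medical evacuation", "Fractured femur", {"cm_alice"})
assigned_view = view_case(case, "cm_alice")
other_view = view_case(case, "sup_carol")
```

The sensitive field is added only after the assignment check passes, so forgetting a check in a new code path results in less data being shown, not more.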
Finally, we want to consider least privilege from a system-to-system perspective. What can the case management system do in the HealthVault system? As we reduce the possible attack surface for our application, we need to determine how we are allowing other systems to access our system’s data. Map out interactions and understand how the systems work together. This leads to an activity that is related to ASR: threat modeling.
In the Systems Development Lifecycle (SDLC), developers and architects build out data flow diagrams (DFDs) that track how data moves through the system. These data flows can be analyzed in the context of software security to determine whether data is exposed to risk. Assessing the risk to data is not the only approach to threat modeling; you can also build threat models that examine the risk to specific assets. These two approaches can work in concert: the first (which uses the SDL threat-modeling tool) helps you envision the system, and the second (which uses a tool referred to as TAM) helps you analyze your specific deployment configuration. Using both tools yields the most complete view of the security of your application.
As stated many times before, developers are not always experts in software security assurance. Using the SDL tool, you can take an artifact that developers are familiar with and let the tool tell you the security challenges you face. Using the SDL threat-modeling tool breaks down into four steps.
When starting a threat model, I always begin with the supporting information, which is the section of the threat model that describes the system we are modeling. In this case, we provide information about our case management system (a lot of which is directly from the concept document).
One of my favorite features of the SDL threat-modeling tool is its ability to produce a report for developers, management or anyone else. The system definition section provides the context around the application so that when you provide the report, readers will have the context for the threats, vulnerabilities and risks.
In the threat-modeling tool, shown in Figure 2, create a new threat model and click the Describe Environment button on the menu at the bottom left. Others do this as the third step, but I always start here, usually because once I begin with the analysis, I need to hand the report to another person for verification and to ensure that I’m not missing anything. Providing the context is critical for ensuring that you don’t make omissions.
Figure 2. The SDL Threat-Modeling Tool
Using the drawing tools, build out how data flows through your system. In this case, we have created a diagram that illustrates a case management data flow. This data flow would represent the processes around viewing medical case information.
Notice that a new diagram element is available for threat modeling that is not normally found in DFDs: the boundary. There are four types of boundaries: trust, machine, process and other. Boundaries are used to illustrate where data exists in the system and to group processes. For instance, accessing the Geo Data from Azure crosses a trust boundary (among others). This tells the architects of the system that our case management system is getting data from a source outside our trust. Normally, this signals the need for input validation. Draw out the DFD in as much detail as you would like. (See Figure 3 as an example.) Just remember that each shape requires analysis. Be mindful of your time when doing threat modeling because it can quickly grow out of control.
Figure 3. A Sample Data Flow Diagram
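Data that crosses a trust boundary, such as the geographic data mentioned above, should be validated before the application uses it. The Python sketch below assumes the external response has already been decoded into a dictionary; the field names `latitude` and `longitude` are hypothetical.

```python
def parse_geo_point(payload):
    """Validate a location received from across a trust boundary.

    `payload` is assumed to be a dict decoded from an external service's
    response; the field names are hypothetical.
    """
    if not isinstance(payload, dict):
        raise ValueError("geo payload must be an object")
    try:
        lat = float(payload["latitude"])
        lon = float(payload["longitude"])
    except (KeyError, TypeError, ValueError):
        raise ValueError("geo payload is missing numeric coordinates") from None
    # Range checks also reject NaN, because comparisons with NaN are False.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    return lat, lon


point = parse_geo_point({"latitude": "39.29", "longitude": -76.61})
```

Anything that fails validation is rejected outright rather than repaired, which keeps the behavior at the boundary predictable.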
This step is where you’ll spend most of your time. Using the STRIDE approach (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service and Elevation of Privilege), the threat-modeling tool examines each element to determine which parts of STRIDE apply. The tool then describes the individual threat (tampering being the one pictured in Figure 4) and remediation techniques.
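The idea of checking each element against STRIDE can be sketched in a few lines of Python. The mapping below is the commonly published STRIDE-per-element chart, and the tool’s own rules may differ in detail; the diagram element names are assumptions loosely based on the case management data flow.

```python
STRIDE = {
    "S": "Spoofing",
    "T": "Tampering",
    "R": "Repudiation",
    "I": "Information Disclosure",
    "D": "Denial of Service",
    "E": "Elevation of Privilege",
}

# Commonly published STRIDE-per-element chart (an approximation of what
# a threat-modeling tool applies, not its actual rule set).
APPLICABLE = {
    "external_entity": "SR",
    "process": "STRIDE",
    "data_flow": "TID",
    "data_store": "TID",  # Repudiation also applies when the store holds logs
}


def enumerate_threats(diagram):
    """Yield (element name, threat name) for each applicable STRIDE category."""
    for name, kind in diagram:
        for letter in APPLICABLE[kind]:
            yield name, STRIDE[letter]


# Hypothetical elements from a case management data flow diagram.
DIAGRAM = [
    ("Case Manager", "external_entity"),
    ("Case Management App", "process"),
    ("Case data request", "data_flow"),
    ("Contoso Data Center", "data_store"),
]

threats = list(enumerate_threats(DIAGRAM))
```

Even this four-element diagram produces more than a dozen threats to review, which is why each shape added to the DFD carries an analysis cost.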
Under the description, you can explain how the threat would impact your application. In this case we are providing content about tampering as it relates to the Contoso Data Center (which, if you remember, is where our medical case data is going to be stored). For impact, we can provide a quick description, such as “Tampering of medical case data could result in a HIPAA violation if we cannot determine who made the data alteration and that they had permission to do so.” Use this area to describe impact from a business perspective. Just as in the risk register exercise, avoid thinking like a technician; think like a businessperson and about how the threat impacts the business. In the end, you need time or money from the business to address the threat. Here is where you provide evidence for why it should be fixed.
You can also use this time to add to your risk register. The threat-modeling tool will provide a lot of risk ideas that can feed your risk register. Remember that risk assessment is an ongoing process. Threat modeling is just another part of the process, so return to your documents as needed.
Now, with the impact provided, write out a solution. How are you going to avoid the impact? Think back to our risk management strategies and consider them as possible solutions. For this particular risk we are going to enable auditing on the database tables so that all change operations record who made them and when. Auditing does not block a change by itself, but it ensures that tampering directly in the database can be detected and traced to an account, which addresses the impact described above. We could add another threat (see the link to the right in Figure 4) and describe how we prevent tampering in another way. You can add as many threats as you would like and even associate a bug in Team Foundation Server (TFS) to address the issue.
Figure 4. Building the Threat Model
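As a rough illustration of table-level auditing, the Python sketch below uses an in-memory SQLite database and a trigger that records every update. The table and column names are hypothetical, and because SQLite has no user accounts, the application supplies the `modified_by` value; a server database would typically rely on its own auditing features and login identity instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE medical_case (
    id INTEGER PRIMARY KEY,
    notes TEXT NOT NULL,
    modified_by TEXT NOT NULL
);
CREATE TABLE medical_case_audit (
    case_id INTEGER NOT NULL,
    old_notes TEXT,
    new_notes TEXT,
    changed_by TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Record who changed what, and when, on every update.
CREATE TRIGGER medical_case_update_audit
AFTER UPDATE ON medical_case
BEGIN
    INSERT INTO medical_case_audit (case_id, old_notes, new_notes, changed_by)
    VALUES (OLD.id, OLD.notes, NEW.notes, NEW.modified_by);
END;
""")

conn.execute("INSERT INTO medical_case VALUES (1, 'initial intake', 'cm_alice')")
conn.execute(
    "UPDATE medical_case SET notes = ?, modified_by = ? WHERE id = ?",
    ("revised intake", "cm_bob", 1),
)

audit_rows = conn.execute(
    "SELECT case_id, old_notes, new_notes, changed_by FROM medical_case_audit"
).fetchall()
```

Because the trigger runs inside the database, the audit row is written even when a change bypasses the application’s own data-access code.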
The SDL threat-modeling tool provides professional-looking reports that you can use with your team to discuss your threat model. Just remember that threat modeling is an iterative process and ties very closely to your risk management process. Keep the two processes in step with each other (risk management affects threat modeling and vice versa).
To delve deeper into using the SDL threat-modeling tool, download the supporting documentation for this article. There you will find the SDL threat model file for your examination, along with a sample report.
Another approach to threat modeling is to use the Threat Analysis and Modeling (TAM) tool. This tool is much more detailed than the SDL threat-modeling tool and focuses on assets rather than data flow. Using TAM requires a lot more setup than the SDL tool and also requires a deeper knowledge of the system that you’re building. I won’t go into detail about using TAM in this article, but there are some really great videos online that walk you through the application. After reviewing the tutorials, check out the supporting documentation for the case management system TAM file.
In this article we moved our planning into a design for our application. We focused on secure design principles and practices. In the next article, we’ll begin taking our designs from idea to implementation. So far, we have reduced our development costs significantly by proper planning and by identifying security issues in our design prior to putting our fingers to the keyboard. Now we just need to take our ideas and avoid the pitfalls of a less secure implementation.
Tim Kulp leads the development team at FrontierMEDEX in Baltimore, Maryland. You can find Tim on his blog at http://seccode.blogspot.com or the Twitter feed @seccode, where he talks code, security and the Baltimore foodie scene.