February 2019

Volume 34 Number 2

[Azure Confidential Computing]

Protect Your Data with Azure Confidential Computing

By Stefano Tempesta | February 2019

Security is a key driver for accelerating the adoption of cloud computing, but it’s also a major concern when you’re moving extremely sensitive intellectual property (IP) and data to a public cloud. There are common ways to secure data at rest and in transit, but threats may also occur when data is being processed in memory. Confidential computing adds new data security capabilities to your applications by using trusted execution environments (TEEs) and encryption mechanisms to protect your data while in use. TEEs, also known as enclaves, are hardware or software implementations that safeguard data being processed against access from outside the enclave. An enclave provides a protected container by securing a portion of the processor and memory, as shown in Figure 1. Only authorized code is permitted to run and access data, so code and data are protected against viewing and modification from outside of the TEE.

Figure 1 Code and Data Running in a Protected Enclave

A Matter of Trust

With the recent announcement of the public availability of confidential computing in Azure, Microsoft became the first cloud provider to offer protection for data in use (bit.ly/2OyyxaH and bit.ly/2BxQkpp). The official post on the Azure blog (bit.ly/2I692X1) describes it well: “… Azure confidential computing protects your data while it’s in use. It is the final piece to enable data protection through its lifecycle whether at rest, in transit, or in use. It is the cornerstone of our ‘Confidential Cloud’ vision, which aims to make data and code opaque to the cloud provider.”

The concept of “opaque data and code” is revolutionary. For the first time, you can have trust in the cloud that no one, including the cloud provider, can read your data. The data is encrypted at every stage, and only authorized applications have the key to decrypt it and access it. This is accomplished in two ways:

• Hardware: Thanks to a partnership with Intel, Azure can offer hardware-protected virtual machines that run on Intel Software Guard Extensions (SGX) technology. Intel SGX is a set of extensions to the Intel CPU architecture that aims to provide integrity and confidentiality guarantees to sensitive computation performed on a computer, where all the privileged software (kernel, hypervisor and so on) might potentially be compromised.

• Hypervisor: Virtualization-based security (VBS) is a software-based TEE that’s implemented by Hyper-V in Windows 10 and Windows Server 2016. Hyper-V prevents administrator code running on the computer or server, as well as local administrators and cloud service administrators, from viewing the contents of the enclave or modifying its execution.

The potential applications for confidential computing are really unlimited. Every time there’s a requirement for protecting sensitive data, trusted execution environments represent the building blocks on top of which you can enable new secure business scenarios and use cases. SQL Server Always Encrypted represents a typical application that provides increased data confidentiality and integrity (bit.ly/2zS7TPQ). Always Encrypted protects data in use from malicious insiders with administrative privilege and safeguards against hackers and malware that exploit bugs in the OS, application or hypervisor. With the use of confidential computing, SQL Always Encrypted protects sensitive data in use by providing in-place encryption while preserving SQL Server’s rich querying capabilities. This is an enhancement of the current Always Encrypted capability in SQL Server, which now ensures that sensitive data within a SQL database can be encrypted at all times without compromising the functionality of SQL queries. Always Encrypted accomplishes this by delegating computations on sensitive data to an enclave, where the data is safely decrypted and processed.

In addition to SQL Server, many industries and technologies can benefit from Azure Confidential Computing. In finance, for example, personal portfolio data and wealth management strategies would no longer be visible outside of a TEE. Healthcare organizations can collaborate by sharing their private patient data, like genomic sequences, to gain deeper insights from machine learning across multiple data sets, without the risk of data being leaked to other organizations. Organizations could share their datasets confidentially in order to combine multiple data sources to support secure multi-party machine learning scenarios. Machine learning services can obtain a higher accuracy of prediction by working on a larger trained model, but organizations can still preserve their own customer information (data is shared in encrypted format, visible only to the machine learning service). In oil and gas and IoT scenarios, sensitive seismic data that represents the core intellectual property of a corporation can be moved to the cloud for processing, but with the protection of encrypted-in-use technology.

Another significant application is the creation of a trusted distributed network among a set of untrusted participants. The Confidential Consortium Blockchain Framework enables highly scalable and confidential blockchain networks to reside in a public cloud infrastructure and to reap the broad benefits of Azure. Permissioned blockchain networks that rely on trusted nodes called validators to certify transactions benefit from the Azure confidential compute platform to better verify the chain of trust in a decentralized network. This simplifies consensus and, eventually, transaction processing for high throughput and confidentiality.

For this to happen, applications running in an enclave need:

1. A common cross-platform API that’s consistent across TEEs, both hardware and software-based, so that confidential application code is portable.

2. Attestation, which involves verifying the identity of code running in TEEs to establish trust with that code and determine whether to release protected data to it.

Let’s Get Started

To get started with confidential computing in Azure, you access the Azure Marketplace, deploy and configure a virtual machine, and install the Open Enclave SDK (openenclave.io/sdk). The Open Enclave SDK is an open source project for creating a single unified enclaving abstraction for developers to build TEE-based applications in the C and C++ languages. It supports an API set that lets you build an application once and deploy it on multiple platforms (Linux and Windows) and environments, from cloud to hybrid to edge.

During the deployment of a virtual machine, many of the basic VM deployment configurations are supported through the Confidential Computing VM Deployment workflow in the Azure Portal, including selecting the supported platform, creating a new resource group and VNet or joining existing ones, choosing the storage and disk type, enabling diagnostics, and setting other properties.

You can read how to install the Open Enclave SDK from the official GitHub repository at bit.ly/2AdKs4D.

Once the SDK is installed, you can start building your first application to run in an enclave. As illustrated in Figure 2, an enclave application partitions itself into two components: an untrusted component, called the host, and a trusted component, called the enclave. An enclave is a secure container whose memory is protected from access by outside entities, including the host OS, privileged users and even the hardware. All functionality that needs to be run in a TEE should be compiled into the enclave binary. A host is a normal user mode application that loads an enclave into its address space before starting to interact with the enclave.

Figure 2 The Enclave App Model

The Enclave Application

The sample application that I’m going to build prints a message in the enclave before calling back to the host to print a message from there, too. The host initially creates an enclave, and then it calls the enclave_message function in the enclave to print a message. This function then calls back to the host to print an acknowledgment message before returning to the enclave. Once the enclave function returns to the host, the host cleans up the enclave and the process terminates.

First, I define the functions that I want to call between the enclave and host. To do this, I create a functions.edl file, which holds the enclave and host function definitions:

enclave {
  trusted {
    public void enclave_message();
  };

  untrusted {
    void host_acknowledgment();
  };
};

The EDL file defines the functions that call into and out of enclaves, along with the parameters that are passed into these functions. The oeedger8r tool, available in the Open Enclave SDK, is used to generate the marshaling code necessary to call functions between the enclave and the host. Marshaling code from the host to the enclave is done for security purposes, in order to mitigate certain processor vulnerabilities (such as Spectre). You’ll find more information on using the Open Enclave oeedger8r tool at bit.ly/2BaQB2d.
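To make the idea of marshaling more concrete, here is a minimal, SDK-free sketch of what a generated wrapper conceptually does: it packs the arguments into a struct, hands that struct across the boundary, and lets the trusted side work only on its own private copy. This is a simplified model, not the code that oeedger8r actually emits. Every name in it (demo_result_t, add_args_t, dispatch_add, add) is hypothetical, and the enclave handle that the real wrappers take is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes standing in for oe_result_t. */
typedef enum
{
    DEMO_OK = 0,
    DEMO_INVALID_PARAMETER = 1
} demo_result_t;

/* Hypothetical argument block for a trusted function that adds two numbers.
 * A marshaler would define one such struct per function in the EDL file. */
typedef struct
{
    uint32_t a;   /* input */
    uint32_t b;   /* input */
    uint32_t sum; /* output, filled in by the trusted side */
} add_args_t;

/* "Trusted" implementation: it only ever sees a private copy of the arguments. */
void trusted_add(add_args_t* args)
{
    args->sum = args->a + args->b;
}

/* Trusted-side entry point: validate the untrusted buffer, copy it in,
 * run the function on the private copy, then copy the result back out. */
demo_result_t dispatch_add(void* untrusted_buf, size_t size)
{
    add_args_t local;

    if (untrusted_buf == NULL || size != sizeof(local))
        return DEMO_INVALID_PARAMETER;

    memcpy(&local, untrusted_buf, sizeof(local)); /* copy in */
    trusted_add(&local);
    memcpy(untrusted_buf, &local, sizeof(local)); /* copy out */
    return DEMO_OK;
}

/* Host-side wrapper: it returns the status of the call itself, so the
 * function's own result travels through an out-parameter. */
demo_result_t add(uint32_t a, uint32_t b, uint32_t* sum)
{
    add_args_t args = {a, b, 0};
    demo_result_t result;

    if (sum == NULL)
        return DEMO_INVALID_PARAMETER;

    result = dispatch_add(&args, sizeof(args));
    if (result == DEMO_OK)
        *sum = args.sum;
    return result;
}
```

A caller would write something like `uint32_t sum; if (add(2, 3, &sum) == DEMO_OK) { ... }`. The shape mirrors the sample in this article, where a function declared as void in the EDL file is invoked through a wrapper whose return value reports whether the transition succeeded.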

Let’s examine the two functions defined in the EDL file in more detail. The enclave_message function in Figure 3 is implemented inside the trusted enclave and is invoked by the untrusted host. For the host to be able to call this function, the host needs to call through the Open Enclave SDK to transition from the untrusted host into the trusted enclave. To help with this, the oeedger8r tool generates some marshaling code in the host directory with the same signature as the function in the enclave, with the addition of an enclave handle so the SDK knows which enclave will execute the code.

Figure 3 The Enclave Function

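#include <stdio.h>
// Trusted-side header generated by oeedger8r from functions.edl.
#include "functions_t.h"

void enclave_message()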
{
  // Print a message from the enclave.
  fprintf(stdout, "Hello from the enclave\n");

  // Call back into the host.
  oe_result_t result = host_acknowledgment();
  if (result != OE_OK)
  {
    fprintf(stderr, "Call to host failed: %u (%s)\n", result, 
      oe_result_str(result));
  }
}

Note that although enclave_message uses fprintf to print a message, fprintf depends on the kernel to write to the screen, so it can’t execute entirely within the enclave itself. Instead, the enclave version of fprintf marshals the call through to the host, which carries out the call on the enclave’s behalf.

The reverse is also true for functions defined in the untrusted host that the trusted enclave needs to call into. The untrusted host runs the host_acknowledgment function, shown in Figure 4, and the oeedger8r tool generates some marshaling code with the same signature as the function in the host.

Figure 4 The Host Application

#include <openenclave/host.h>
#include <stdio.h>
#include "functions_u.h"

void host_acknowledgment()
{
  fprintf(stdout, "Call from enclave acknowledged.\n");
}

int main(int argc, const char* argv[])
{
  oe_result_t result;
  oe_enclave_t* enclave = NULL;

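  // The path to the signed enclave image is required as the first argument.
  if (argc < 2)
  {
    fprintf(stderr, "Usage: %s ENCLAVE_PATH\n", argv[0]);
    return 1;
  }

  // Create the enclave.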
  result = oe_create_functions_enclave(argv[1], OE_ENCLAVE_TYPE_SGX, 
    OE_ENCLAVE_FLAG_DEBUG, NULL, 0, &enclave);
  if (result != OE_OK)
  {
    fprintf(stderr, "Create enclave failed: %u (%s)\n", result, 
      oe_result_str(result));
    return 1;
  }

  // Call into the enclave.
  result = enclave_message(enclave);
  if (result != OE_OK)
  {
    fprintf(stderr, "Call to enclave failed: %u (%s)\n", result, 
      oe_result_str(result));
  }

  // Clean up the enclave on both the success and the failure path.
  oe_terminate_enclave(enclave);

  return (result == OE_OK) ? 0 : 1;
}

The host process, a regular C-language executable with a standard main function that creates the enclave and calls into it, is what drives the enclave app. It’s responsible for managing the lifetime of the enclave and invoking enclave methods. A host, though running in a cloud service, should always be considered an untrusted component that’s never allowed to handle plain text data intended for the enclave.

It’s worth noting the inclusion of the untrusted functions_u.h header that’s generated during the build. This file is created by calling the SDK tool oeedger8r against the functions.edl file. I also include stdio.h for the fprintf function. Unlike the enclave implementation, which includes a special enclave version of the stdio library that marshals APIs to the host, the host isn’t protected, so it uses all the normal C libraries and functions. The oe_create_functions_enclave function called in main is also generated by oeedger8r. It creates an enclave for use in the host process and allocates the enclave address space. The code and data to protect are then loaded into the enclave at the allocated address. To execute the host application, which is called functionshost, you can simply invoke it from the command line as:

functionshost ./enc/functionsenc.signed

Instructions on how to build the application are available on the Open Enclave SDK repository (bit.ly/2CrTM6m). The first parameter of the host’s main function identifies the path to the signed enclave library file. For testing purposes, you can also run the application in simulation mode with the --simulate option:

functionshost ./enc/functionsenc.signed --simulate

The “Getting Started with Open Enclave in Simulator mode” guide describes how to set up the enclave simulator (bit.ly/2LvA2BT).
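The host in Figure 4 doesn’t show how the --simulate option is picked up, so here is a minimal sketch of one way a host could detect it. The helper name check_simulate_opt is hypothetical and isn’t part of the Open Enclave API; it uses only the standard C library.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if "--simulate" appears among argv[1..argc-1].
 * When found, the option is removed from argv and argc is decremented,
 * so the remaining arguments (such as the enclave path) can still be
 * read by position. */
bool check_simulate_opt(int* argc, const char* argv[])
{
    for (int i = 1; i < *argc; i++)
    {
        if (strcmp(argv[i], "--simulate") == 0)
        {
            /* Shift the following arguments down by one slot. */
            for (int j = i; j < *argc - 1; j++)
                argv[j] = argv[j + 1];

            /* Keep the argument list NULL-terminated. */
            argv[*argc - 1] = NULL;
            (*argc)--;
            return true;
        }
    }
    return false;
}
```

In a host like the one in Figure 4, the result could be used to add the SDK’s OE_ENCLAVE_FLAG_SIMULATE flag to the flags passed when creating the enclave. That wiring is an assumption on my part, because Figure 4 as listed passes only OE_ENCLAVE_FLAG_DEBUG.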

Attestation

Before an enclave can be trusted with confidential data, it needs to prove that it’s an enclave running in a valid TEE and that it has the correct identity and runtime properties to be trusted. This process of proving its identity and trustworthiness to a challenger is known as attestation.

Intel SGX supports CPU-based attestation, enabling a remote system to cryptographically verify that specific software has been loaded within an enclave. The process also bootstraps an end-to-end encrypted channel with the enclave for sharing data in a protected manner. During enclave creation, a secure hash, known as a measurement, defines the enclave’s initial state. The enclave may later retrieve a report signed by the processor that proves its identity and communicates a unique value (such as a public key) to another local enclave. By using a trusted quoting enclave, this mechanism can be leveraged to obtain an attestation known as a quote, which proves to a remote system that the report comes from an enclave running on a genuine SGX implementation. Ultimately, the processor manufacturer (for example, Intel) is the root of trust for attestation.

The enclave being attested first needs to generate a cryptographically strong proof of its identity that the host can verify. This is done by asking the SGX platform to generate a signed report through the oe_get_report function in the Open Enclave SDK, which is called as follows:

oe_result_t result = oe_get_report(OE_REPORT_OPTIONS_REMOTE_ATTESTATION, 
  reportDashHash, sizeof(reportDashHash), NULL, 0, quoteBuffer, quoteBufferSize);

You can toggle between local and remote forms of attestation by removing or adding the OE_REPORT_OPTIONS_REMOTE_ATTESTATION option, respectively. The local report can only be verified by another instance of this enclave on the same machine, whereas the remote report can be verified by the oe_verify_report method running on a different machine.

An important feature of oe_get_report is that you can pass in application-specific data to be signed into the report. Typically, you sign data (using the reportDashHash parameter) into the report by first hashing it before passing it to the oe_get_report call. This is useful for bootstrapping a secure communication channel between the enclave and the challenger.

Once the report is generated and passed to the challenger, the challenger can call oe_verify_report to validate that the report originated from a valid SGX platform. A local report is verified using the SGX report signing keys held by the platform, and a remote report is verified using the certificate chain issued by Intel only for valid SGX platforms. At this point, the challenger knows that the report originated from an enclave running in a valid SGX platform, and that the information in the report can be trusted:

oe_result_t result = oe_verify_report(quote, quoteSize, &parsedReport);
// Trust the parsed identity only if the verification call itself succeeded.
bool verified = (result == OE_OK) &&
  memcmp(parsedReport.identity.authorID, g_MRSigner, sizeof(g_MRSigner)) == 0;

Finally, it’s up to the enclave app to check that the identity and properties of the enclave reflected in the report match its expectations. The Open Enclave SDK exposes a generalized identity model to support this process across TEE types, defined in the oe_identity_t structure:

typedef struct _oe_identity
{
  uint32_t idVersion;
  uint32_t securityVersion;
  uint64_t attributes;
  uint8_t uniqueID[OE_UNIQUE_ID_SIZE];
  uint8_t authorID[OE_AUTHOR_ID_SIZE];
  uint8_t productID[OE_PRODUCT_ID_SIZE];
} oe_identity_t;

I normally test for productID and securityVersion to validate the expected enclave identity, as follows:

bool productVerified = parsedReport.identity.productID[0] == 1;
bool versionVerified = parsedReport.identity.securityVersion >= 1;

I’d also ensure that the identity of the enclave matches the expected value, by verifying the uniqueID value. Bear in mind that any patches to the enclave will change the uniqueID in the future.
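The individual checks discussed in this section can be gathered into a single policy function. The following is a sketch only. It works on a local stand-in struct (demo_identity_t) that mirrors the field names shown above rather than on the SDK’s own type, so that it compiles without the SDK. The array sizes, the debug attribute bit and the demo_policy_t type are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes and flag value, for illustration only. */
#define DEMO_ID_SIZE 32
#define DEMO_PRODUCT_ID_SIZE 16
#define DEMO_ATTR_DEBUG 0x1ULL

/* Local stand-in mirroring the identity fields used in this article. */
typedef struct
{
    uint32_t securityVersion;
    uint64_t attributes;
    uint8_t uniqueID[DEMO_ID_SIZE];
    uint8_t authorID[DEMO_ID_SIZE];
    uint8_t productID[DEMO_PRODUCT_ID_SIZE];
} demo_identity_t;

/* Hypothetical description of what the challenger is willing to trust. */
typedef struct
{
    uint8_t authorID[DEMO_ID_SIZE]; /* expected signer of the enclave */
    uint8_t productID0;             /* expected first byte of productID */
    uint32_t minSecurityVersion;    /* oldest acceptable security version */
    bool allowDebug;                /* accept debug enclaves (testing only) */
} demo_policy_t;

/* Returns true only if every check passes. Call it only after the
 * report has been successfully verified. */
bool identity_is_trusted(const demo_identity_t* id, const demo_policy_t* policy)
{
    if (id == NULL || policy == NULL)
        return false;

    /* The enclave must be signed by the expected author. */
    if (memcmp(id->authorID, policy->authorID, DEMO_ID_SIZE) != 0)
        return false;

    /* The enclave must be the expected product. */
    if (id->productID[0] != policy->productID0)
        return false;

    /* Reject enclaves older than the minimum security version. */
    if (id->securityVersion < policy->minSecurityVersion)
        return false;

    /* Reject debug enclaves unless the policy explicitly allows them. */
    if (!policy->allowDebug && (id->attributes & DEMO_ATTR_DEBUG) != 0)
        return false;

    return true;
}
```

Keeping the expectations in one policy value makes it easier to tighten them for production, for example by setting allowDebug to false and raising minSecurityVersion after a security patch. A uniqueID comparison could be added in the same way, with the caveat noted above that it changes with every patch to the enclave.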

Before You Go

Before the enclave can be run, the properties that define how the enclave should be loaded need to be specified. These properties, along with the signing key, define the enclave identity that’s used for attestation and sealing operations. In the Open Enclave SDK, these properties can be attached to the enclave as part of the signing process. To do this, you’ll need to use the oesign tool, which takes the following parameters:

oesign ENCLAVE CONFFILE KEYFILE

It’s worth mentioning that, for testing purposes, you can run an enclave in debug mode without signing it first. However, as a word of warning, please be aware that enclaves running in debug mode are not confidential, and you should make sure that debug mode is disabled before deploying an enclave to production. Details on how to build and sign an enclave, and how to enable debug mode, are available in the Open Enclave SDK documentation (bit.ly/2EH0eJ7).
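As an illustration of what the CONFFILE argument might contain, the SDK’s samples use a short list of key=value settings along the following lines. Treat this as a sketch: the values are illustrative rather than taken from this article’s sample, and the exact set of supported keys should be checked against the SDK documentation for the version you install.

```
# Enclave settings (illustrative values):
Debug=1
NumHeapPages=1024
NumStackPages=1024
NumTCS=1
ProductID=1
SecurityVersion=1
```

ProductID and SecurityVersion are the values that a challenger later sees in the productID and securityVersion fields of the enclave identity, and Debug is the setting to turn off before a production deployment.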

Finally, as noted earlier, the host process is what drives the enclave app. It’s responsible for managing the lifetime of the enclave and invoking enclave functions. Hosts, though they run in a cloud environment, should be considered untrusted components that are never allowed to handle clear text or binary data intended for the enclave. Compared with enclave functions, there are relatively few restrictions on building a host application. In general, you’re free to link your choice of additional libraries into the host application. In contrast, enclave functions have limited support for external libraries, for security reasons. As the Open Enclave SDK evolves, its support for additional libraries continues to improve. For information on supported libraries, please consult the Open Enclave SDK Web site (openenclave.io).

It’s Open Source!

As I mentioned, the Open Enclave SDK is open source! The intention is for it to be a non-vendor-specific solution that supports enclave applications on both the Linux and Windows platforms. The current implementation of Open Enclave is built on Intel SGX; other enclave architectures, such as solutions from AMD or ARM, will be added in the future.

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant the Open Enclave team the rights to use your contribution. If you would like to contribute to the Open Enclave SDK, please refer to the guidelines for contribution at bit.ly/2TbQ4DJ.


Stefano Tempesta is a Microsoft Regional Director, MVP on AI and Business Applications, and member of Blockchain Council. A regular speaker at international IT conferences, including Microsoft Ignite and Tech Summit, Tempesta’s interests extend to blockchain and AI-related technologies. He created “Blogchain Space” (blogchain.space), a blog about blockchain technologies, writes for MSDN Magazine and MS Dynamics World, and publishes machine learning experiments on the “Azure AI Gallery” (gallery.azure.ai).

