
[Mobile DevOps]

Driving Development with Active Monitoring of Apps and Services

By Kraig Brockschmidt, Thomas Dohmke, Alan Cameron Wills | February 2017

A perfect world for writing a mobile app would be one in which, first, you know exactly what your customers want the very instant those desires arise in their minds and, second, the code you write to fulfill those needs is delivered instantly to those customers. In short, there’s no gap whatsoever between customer needs, development and delivery.

Complete fantasy, right? If you’ve been around software development for, well, at least an hour, you know that reality comes nowhere close to this scenario. And yet every developer has probably encountered such a seamless loop, because it’s exactly what you experience when you write software for yourself. In fact, it’s probably what got you hooked on the magic of coding in the first place. You thought up an idea, wrote some code, pressed F5 and voila! Something amazing happened that you could control and shape and play with to your heart’s content … often very late into the night.

What probably happened next is that you started writing software for other people, which introduced various time lags into the process. Compared with pressing F5, the time it takes to get code changes delivered to external customers is a veritable eternity, hence source control, continuous integration, and continuous delivery to optimize the process, as discussed in the previous articles in this series. The same can be said about communication—compared with writing software for yourself, when you’re both customer and developer, the challenge of clearly understanding what customers need and what they experience is almost like traveling to a distant galaxy!

Thus, although you’ve done your best to deliver apps and services that meet customer needs, and tried to prevent defects from reaching those customers, the truth is that some defects will still make it through and some customer needs will remain unfulfilled. This is where monitoring, the last stage of the release pipeline illustrated in Figure 1, becomes the primary concern. Here, the focus shifts from delivery to listening, which means being aware of bugs and issues that affect customers, and to learning, which means discovering unfulfilled needs and validating all the assumptions that went into the creation of the release in the first place. What you hear and what you learn feeds back around to the beginning of the release pipeline to drive the next series of code changes.

Figure 1 Monitoring is Essential to Understand What Should Drive Subsequent Releases

In short, monitoring closes the loop between releases, ensuring a continuous stream of value being delivered to customers, and that it’s the right value. As such, it’s an essential practice in the continuous validation of performance, which is what DevOps is all about.

With the Microsoft DevOps stack for mobile, HockeyApp delivers monitoring for mobile apps, and Application Insights provides monitoring for Web applications and services. As we’ll see, both provide multiple ways to listen to what’s happening to the software out in the wild.

The Monitoring Culture

In Kraig’s last article (msdn.com/magazine/mt791797), he spoke about continuous deployment as a culture, because achieving that level of delivery optimization requires a deep commitment to code reviews and automated testing. It also means a commitment to monitoring:

  • Because no suite of tests is perfect, some defects will get through to production, so your team must actively monitor crash reports, telemetry, load imbalances, ratings, reviews, and direct customer feedback. This includes monitoring the app’s performance on each supported platform, and monitoring the health of your back-end services.
  • Your team must be committed to triaging and prioritizing issues and feeding them into the dev backlog so that corrections quickly get into subsequent releases.

Furthermore, monitoring is as much a matter for your test and staging environments as it is for production. Code you write to monitor the app and collect telemetry is just as prone to bugs as any other code, and requires time to debug and troubleshoot before release. Monitoring in test and staging environments also gives your team the chance to test your processes for handling crashes and feedback.

For example, custom telemetry that you collect from apps and services is typically designed to answer questions about customer usage and whether you’re meeting certain business goals (see Kraig’s post, “Instrumenting Your App for Telemetry and Analytics,” at bit.ly/2dZH5Vx and his Build 2014 presentation, “Understanding Customer Patterns,” at bit.ly/2d1p6fC). Defects in your telemetry code typically won’t affect customers directly, but they can easily invalidate all the data you collect. High-quality data is everything, so it’s essential to fully exercise telemetry collection in both test and staging environments (keeping that data separate from production data, of course) and to scrutinize it for any collection and tabulation errors.

Test and staging environments also help you test feedback mechanisms in the app. When using HockeyApp for distribution to pre-launch testers, encourage those testers to provide feedback through the app itself, because that’s how your production customers will be doing it.

Most important, though, is that monitoring your test and staging environments is how you practice being responsive to crashes, customer feedback, and issues revealed by telemetry. Put simply, monitoring is both listening and responding, because responsiveness proves you really are listening. Responsiveness, too, means acknowledging feedback and then acting on it by setting up a regular process to get appropriately prioritized issues into your backlog so your developers can get working on them.

Establishing this process isn’t just an afterthought—you should start well before your first release, right alongside setting up build servers and release pipelines. That way, you can practice and hone your process in your test and staging environments in preparation for your first release, because once you start making releases, monitoring must be continual.

A strong monitoring process can even, to some extent, make up for initial weaknesses in other forms of testing. It takes time and expertise to create really good automated tests, which could be challenging within the constraints of your release schedule or the availability of qualified candidates. Monitoring, on the other hand, requires less-specialized skills, and setting your team up for monitoring activities is generally quicker and simpler than writing a full suite of tests. Oftentimes, too, your early adopters are likely to be more forgiving about defects if they trust that you’re listening and working hard to address them. Good monitoring, then, gives you the opportunity to get your apps and services out to customers quickly, and to start collecting valuable telemetry and feedback, before necessarily having fleshed out your testing processes.

Monitoring with HockeyApp

The particular challenge of mobile apps (and desktop apps, too) is that they’re installed on millions of devices and device configurations that are out of your control. In contrast to a Web service, you can’t simply sign into the user’s mobile device and analyze a log file or attach a debugger. As a result, you need another system that can retrieve crash and usage information from those devices (respecting privacy, of course), and upload that data to a central service for analysis. That’s what HockeyApp is for.

In addition to providing valuable services for pre-launch distribution (as discussed last time), HockeyApp also helps you continuously monitor each release of your app throughout its lifecycle. You can collect sessions, events, crashes, and feedback starting with your first prototype, and then continue collecting information in your beta versions, release candidates, and finally the live app. This continuity gives you valuable data at every step of the process, and by monitoring from the earliest stages you’ll be able to catch defects that made it through other pre-release testing processes. Thus, you’ll be aware of defects as soon as they affect live customers, which helps you prioritize fixes in near-term app updates.

Obtaining information from within running apps means instrumenting the app with HockeySDK. Native SDKs are available for Android, iOS, Windows and macOS; additional SDKs support cross-platform technologies like Xamarin, Cordova, React Native and Unity.

Let’s look at a Xamarin app as an example. First, you’ll need a unique App ID from HockeyApp for each target platform. Start by creating a free account on hockeyapp.net, then click New App on the main dashboard. This invites you to upload an app package, or click “Create the app manually instead” to enter details directly. Either way, you’ll then see a page like the one in Figure 2 (for the MyDriving iOS app, aka.ms/iotsampleapp), where the App ID appears in the upper left.

Figure 2 Finding the App ID on the HockeyApp Portal

For Xamarin, install the HockeySDK.Xamarin NuGet package into each project in the Xamarin solution; for the Windows project, use HockeySDK.UWP instead. You’ll then initialize the SDK during startup using the App IDs from the portal. For iOS, place the following code within the FinishedLaunching method (AppDelegate.cs), replacing APP_ID_IOS with the value from the portal:

using HockeyApp;
// ...
// Configure and start the shared manager with the App ID from the portal
var manager = BITHockeyManager.SharedHockeyManager;
manager.Configure("APP_ID_IOS");
manager.StartManager();
// Identifies pre-release installations for in-app update checks
manager.Authenticator.AuthenticateInstallation();

This code automatically enables crash reporting, user and session tracking, and in-app updates for pre-release builds (those distributed through HockeyApp). To disable features, set properties of the BITHockeyManager class before calling StartManager. For example, the following code disables the in-app update feature:

var manager = BITHockeyManager.SharedHockeyManager;
manager.Configure("APP_ID_IOS");
manager.DisableUpdateManager = true;
manager.StartManager();
manager.Authenticator.AuthenticateInstallation();

On Android, put code like the following into OnCreate (MainActivity.cs), again using your App ID, and add a call to MetricsManager.Register because session tracking isn’t done automatically:

using HockeyApp.Android;
using HockeyApp.Android.Metrics;
// ...
// Crash reporting, user/session metrics, and in-app updates
// are registered individually on Android
CrashManager.Register(this, "APP_ID_ANDROID");
MetricsManager.Register(Application, "APP_ID_ANDROID");
UpdateManager.Register(this, "APP_ID_ANDROID");

For Windows with the HockeySDK.UWP NuGet package, it’s just one line in the App constructor:

HockeyClient.Current.Configure("APP_ID_WINDOWS");
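In context, within App.xaml.cs, that looks something like the following sketch (assuming a standard UWP project template):

using Microsoft.HockeyApp;
// ...
public App()
{
  // Configure HockeyApp first so crashes during startup are also captured
  HockeyClient.Current.Configure("APP_ID_WINDOWS");
  this.InitializeComponent();
}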

Complete documentation for the HockeyApp API can be found on support.hockeyapp.net/kb. For Xamarin.iOS and Xamarin.Android, specifically look at “How to Integrate HockeyApp with Xamarin,” at bit.ly/2elBqGp. For Windows, see “HockeyApp for Applications on Windows” at bit.ly/2eiduXf. You can always use the MyDriving app code as a reference, too.
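One practical note before moving on: as discussed earlier, telemetry from test and staging builds should be kept separate from production data. A simple way to arrange this (our suggestion; HockeyApp doesn’t require it) is to register separate apps on the portal and select the App ID by build configuration. Using the iOS snippet as an example:

#if DEBUG
// Hypothetical App ID registered for internal and staging builds
const string HockeyAppId = "APP_ID_IOS_STAGING";
#else
// Hypothetical App ID registered for the production release
const string HockeyAppId = "APP_ID_IOS_PROD";
#endif
// ...
manager.Configure(HockeyAppId);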

With the SDK initialized, let’s try out crash reporting on iOS. The simplest approach is to throw an exception in a button handler, or to use the test crash that’s built into HockeySDK:

crashButton.TouchUpInside += delegate {
  var manager = BITHockeyManager.SharedHockeyManager;
  // Forces a test crash; the report is sent when the app next launches
  manager.CrashManager.GenerateTestCrash();
};

After adding this code (and a button to the UI), build and deploy a Release version to a test device, run without debugging, tap the button to trigger a crash, and then launch the app again. A dialog should appear asking you to submit the crash report, and a few seconds later the crash will appear on the HockeyApp portal. There, you’ll see that the crash report shows only memory addresses, because the Release build doesn’t contain debug symbols. Don’t embed symbols in the release build itself, though, because that would allow hackers to easily disassemble your app. Instead, the bin folder of your Xamarin project contains .dSYM (for iOS) and .pdb (Windows/Android) files, which are standalone bundles of debug symbols. You can upload these symbol files to HockeyApp (called server-side “symbolication”) so that crash reports display full class, method and file names, as well as line numbers, as shown in Figure 3.

Figure 3 The HockeyApp Portal Showing Full Symbol Information for a Crash Report

The crash overview page shows how often a crash has happened, how many users were affected, and which OS version and device model those users were using. You can compare this to the number of unique users and sessions shown on the app overview page. Look especially at the indicator for the percentage of crash-free users, which is a good measure of app quality.

In addition to these standard metrics, you can get more insights into how your customers use your app by adding custom events. For example, the following line would record when a user starts playing a video:

HockeyApp.MetricsManager.TrackEvent("Video started");

You’ll use the name “Video started” when searching for events in the HockeyApp portal, so it’s important to give every event a meaningful name. It’s also important to define events at a level that will produce meaningful and actionable data; again, for general guidance refer to the “Instrumenting Your App for Telemetry and Analytics” blog post at bit.ly/2dZH5Vx. Note that HockeyApp limits you to 300 unique event names per app, but will log unlimited instances of each event name. A practical consequence is to keep variable data (such as file names) out of event names and attach it through properties instead, as shown next.

To attach more information to an event, use properties like this:

HockeyApp.MetricsManager.TrackEvent(
  "Video started",
  new Dictionary<string, string> { { "filename", "RickRoll.mp4" } },
  new Dictionary<string, double> { { "duration", 3.5 } }
);

Although these properties and measurements aren’t visible in HockeyApp, they can be explored with the Analytics feature in Microsoft Application Insights, as explained at bit.ly/2ekJwjD.

Another way to understand user behavior is to engage directly with customers through the HockeyApp feedback feature, which is important for pre-launch and post-launch customers alike.

When you make a new test release during the pre-launch phase, all your testers will receive an e-mail to which they can reply directly to report issues and suggestions. HockeyApp parses those responses and collects them for you on the portal. Then, for both pre- and post-launch customers, you can integrate a feedback view right in the app using the HockeyApp SDK. Here’s how to use the iOS shake event for this purpose:

public override void MotionEnded(UIEventSubtype motion, UIEvent evt)
{
  if (motion == UIEventSubtype.MotionShake)
  {
    var manager = BITHockeyManager.SharedHockeyManager;
    manager.FeedbackManager.ShowFeedbackComposeViewWithGeneratedScreenshot();
  }
}

The HockeyApp FeedbackManager automatically captures a screenshot of the view that’s visible when the shake gesture occurs. It then brings up a view in which the user can annotate the screenshot, attach more screenshots, and provide a text message to describe the problem or suggestion, as shown in Figure 4.

Figure 4 Annotating Screenshots on iOS

Submitting feedback opens a conversation between the user and your team, and once your team replies, the user gets an in-app notification to keep that conversation going. This is perhaps one of the most valuable features of HockeyApp—putting you in direct contact with your customers, which is hard to achieve otherwise. Those direct conversations are a wonderful source of specific, detailed data that can help validate (or invalidate) your interpretation of the broader data you obtain from telemetry events; each is a necessary complement to the other.

Like crash reports, user feedback should ultimately create new work items in your development backlog so your team can address issues in future releases. This closes the DevOps loop as shown back in Figure 1. In fact, you can connect HockeyApp directly to your backlog by clicking on the Manage App button, then clicking on Bug Tracker and selecting your backlog tool (such as Visual Studio Team Services). Once connected, HockeyApp automatically creates a new work item for every crash group and feedback thread.

Assuming you have a process in place to regularly triage and prioritize new work items, you can respond quickly to crashes and feedback from your customers by making the appropriate changes in your source code. If you’ve also set up an automated release pipeline as you’ve seen through this series of articles, those changes will quickly be built, tested and fed into release management so they quickly make their way to customers. And once that release is done, your continuous monitoring should reveal fewer crash reports, higher customer satisfaction and greater overall app performance in every way that matters to your business.

Monitoring Web Services with Application Insights

Application Insights is a service that gives you a clear view of how your Web apps and services are performing and what your users are doing with them. It notifies you as soon as any problem arises, and provides diagnostic data and tools. It can also analyze and visualize properties of custom events in association with HockeyApp, as described at bit.ly/2ekJwjD. With all that knowledge, you can plan and prioritize future development work with confidence. 

Application Insights works for Web apps running on a variety of platforms, including ASP.NET, J2EE, and Node.js, on both cloud and private servers. To instrument an app, start on the Platform Support page (bit.ly/2eydqQU), select your language/platform of choice, and follow the instructions. With a few simple steps, you can then monitor your Web app code, client code in Web pages and any other back-end services, bringing the key data from all components (including HockeyApp) together on a dashboard.
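In a .NET service, for example, the essential setup amounts to pointing the SDK at your Application Insights resource and creating a TelemetryClient. Here’s a minimal sketch; the key is a placeholder, and the Visual Studio project templates normally configure it for you in ApplicationInsights.config:

using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

// Placeholder; use the key from your own Application Insights resource
TelemetryConfiguration.Active.InstrumentationKey = "YOUR_INSTRUMENTATION_KEY";

// TelemetryClient is the entry point for custom events, metrics and traces
var telemetry = new TelemetryClient();
telemetry.TrackTrace("Service starting");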

A typical group of performance charts for a Web service, as shown in Figure 5, displays information like the following:

  • Release annotations (first chart): A blue dot marks the deployment of a new app version. In Figure 5, notice the slight increase in server response times, which suggests running some diagnostics to determine why.
  • Rate of HTTP requests to the Web service (also the first chart): The thin, lighter segment distinguishes user requests, test requests and requests from search engine bots. Click through for breakdowns by URL, client location and other dimensions.
  • Average response time across all requests (second chart): The unusual peak appears to be associated with a spate of exceptions.
  • Uncaught exceptions and failed requests (third chart): Shows requests with a response code of 400 or more; click through for individual failures and stack traces.
  • Failed calls to dependencies (fourth chart): Shows databases or external components accessed through REST APIs. The peak suggests that some app failures are caused by a failure in another component.
  • Availability (fifth chart): Web tests send synthetic requests from around the world to test global availability. The first gap appears to be associated with the sudden rise in exceptions; the second gap might be a network issue or server maintenance.

Figure 5 A Typical Group of Application Insights Performance Charts for a Web Service

You can also get charts for a variety of other metrics, such as user, session and page view counts; page load times, browser exceptions and AJAX calls; system performance counters for CPU, memory and other resource usage; and custom metrics that you code yourself.

Of course, you don’t want to stare at charts waiting for a failure, so Application Insights can send detailed e-mail notifications for unusual failure rates, availability drops, or any number of other metrics, including those you code yourself. Some alerts are configured automatically, such as smart detection of failures, which learns the normal patterns of request failures for your app, then triggers an alert if the rate goes outside that envelope. In any case, the e-mails contain links to appropriate reports, like that shown in Figure 6, through which you can diagnose the problem by clicking through the metric charts and looking at recent events. From a request log, too, you can navigate to the details of any dependency calls, exceptions or custom telemetry events that occurred while the request was being processed.

Figure 6 Clicking Through a Diagnostic Report to See Details

When you’ve identified a problem, you’ll want to get the issue into your development backlog right away. If you have an automated release pipeline set up for your services, you can make the necessary code changes in short order and quickly test and deploy those changes to your live servers.

To help with this, Application Insights lets you easily create a work item in Visual Studio Team Services directly from a detail view. Just look for the New Work Item + control. Instructions for configuring this connection are at bit.ly/2eneoAD.

You can also consume diagnostics from Application Insights (and HockeyApp!) directly within Visual Studio using the Developer Analytics Tools plug-in at bit.ly/2enBwlg (a plug-in is also available for Eclipse). For example, every method that’s called in the process of handling a Web request is annotated in the code editor with the number of requests and exceptions. There’s also an option to instrument the server side of a Web app that’s already live, giving you most of the telemetry without the need for access to the code.

For logging, Application Insights captures log traces from the most popular logging frameworks, such as log4j, NLog, log4net and System.Diagnostics.Trace (a configuration sketch appears after Figure 7). These events can be correlated with requests and other telemetry and, of course, you can include custom events and metrics with diagnostic or usage information, both in your back-end code and client code. For example, you can make the following call in a service to periodically monitor the length of an internal buffer, where telemetry is an Application Insights TelemetryClient object like the one shown in the earlier setup sketch:

telemetry.TrackMetric("Queue", queue.Length);

In the Application Insights portal, you can then create a chart to display it, as shown in Figure 7.

Figure 7 A Chart in Application Insights for a Custom Metric
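Capturing traces from those logging frameworks generally requires only installing the corresponding adapter package. As a sketch, assuming the Microsoft.ApplicationInsights.TraceListener NuGet package is installed (it registers a listener in the config file), existing System.Diagnostics calls then flow into Application Insights with no further code changes:

using System.Diagnostics;

// With the Application Insights trace listener registered, these calls
// show up as trace telemetry correlated with the active request
Trace.TraceInformation("Processing queue batch");
Trace.TraceWarning("Queue length approaching capacity");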

Other TelemetryClient calls can be used to count business or user events, such as winning a game or clicking a particular button. Events and metrics can be filtered and segmented by any additional details you send, and they give you insight into what users are doing with your app and how well it’s serving them.
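Such an event might look like the following sketch, where the event and property names are invented for illustration and telemetry is the same TelemetryClient as before:

using System.Collections.Generic;
// ...
// Count a business event, with a property available for later segmentation
telemetry.TrackEvent("GameWon",
  new Dictionary<string, string> { { "level", "3" }, { "difficulty", "hard" } });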

If you want to look beyond the regular charts and search facilities that Application Insights provides, you can use Analytics, a powerful query language that you can apply to all the stored telemetry. In Figure 8, you can see a query that determines the times of day our Web site is being used, along with the countries that have the most users.

Figure 8 An Analytics Query in Application Insights
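Queries along those lines might look like the following sketches. They’re illustrative only; Analytics uses its own query language, and the exact table and field names depend on the telemetry you collect:

// Request volume by hour of day
requests
| summarize requestCount = count() by bin(timestamp, 1h)
| order by timestamp asc

// Countries with the most distinct users
requests
| summarize userCount = dcount(user_Id) by client_CountryOrRegion
| top 10 by userCount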

Wrapping Up

We hope you’ve enjoyed this series of articles that, along with Justin Raczak’s article on Xamarin Test Cloud (msdn.com/magazine/mt790199), introduced you to the Microsoft DevOps stack for mobile apps and their associated back-end services. Referring back to Figure 1, we’ve covered all the stages of the release pipeline from the source repository through build, release management, distribution and monitoring. At the same time, we’ve only scratched the surface: There’s much more for you to explore as you deepen the sophistication of your build and release processes, especially where adding multiple levels of automated testing is concerned. The different types of testing that can take place along the release pipeline are again shown in Figure 1, and here are some links to get you started:

  • Unit testing is the process of exercising individual units of code (such as methods). See “Getting Started with Unit Testing for Cross-Platform Mobile Apps” at bit.ly/2ekXt2v, which is a Microsoft Virtual Academy course with Kraig Brockschmidt and Jonathan Carter.
  • Integration testing also uses unit testing tools but focuses on the interactions between units of code rather than individual units in isolation. Integration testing often uses mock versions of live servers and databases, or live data in a read-only manner. Running integration tests on real devices using a service like Xamarin Test Cloud is highly recommended.
  • For UI testing, see Justin’s aforementioned article on Xamarin Test Cloud; for Apache Cordova apps, see “UI Testing with Appium” (bit.ly/2dqTOkc) in the Visual Studio Tools for Apache Cordova documentation.
  • Security or penetration testing means simulating attacks on your apps and services through a variety of vectors, something that can’t be taken for granted. We recommend James Whittaker’s book, “How to Break Software Security” (amzn.to/2eCbpW4), to get started. This is also an area where you might consider hiring a specialized consultant.
  • Performance diagnostics are obtained using many of the profiling tools available in Visual Studio itself (see information at bit.ly/2en62c3), along with specialized tools like the Xamarin Profiler (xamarin.com/profiler).
  • Load or stress testing exercises your back-end services under a variety of conditions to evaluate how they accommodate variable levels of demand. Here, you can also evaluate how your services scale up and down with demand so you can efficiently manage hosting costs. For more, check out “Getting Started with Performance Testing” (bit.ly/2ey6Dqe) in the Visual Studio Team Services documentation.
  • Localization testing validates app performance in different locales, which generally means direct UI testing to make sure localized resources and formatting variations are correct. It’s also very effective to recruit pre-launch testers around the world because they can point out both technical and cultural issues.

With that, we again hope you’ve enjoyed this series, and wish you all the best in your mobile app development endeavors!


Kraig Brockschmidt works as a senior content developer for Microsoft and is focused on DevOps for mobile apps. He’s the author of “Programming Windows Store Apps with HTML, CSS and JavaScript” (two editions) from Microsoft Press and blogs on kraigbrockschmidt.com.

Thomas Dohmke is a co-founder of HockeyApp, which was acquired by Microsoft in late 2014. At Microsoft, Thomas is a Group Program Manager responsible for driving the product vision and managing a team of program managers for HockeyApp and Xamarin Test Cloud. Reach him via e-mail at thdohmke@microsoft.com or @ashtom on Twitter.

Alan Cameron Wills is a senior content developer at Microsoft, writing about Application Insights. He lives and works at Ceibwr Bay, Wales.

Thanks to the following Microsoft technical expert for reviewing this article: Simina Pasat

