Revised February 2008
Summary: This article shares ideas on how to keep your Web site steady, safe, and running at full capacity. (5 printed pages)
As an architect, you play a key role in defining a business-application framework; how you define the framework will make or break business continuity. Let me share with you some aspects of operational management, using the example of an Internet-banking application.
Millions of people make transactions over the Internet, through e-commerce shopping, credit-card payments, bank-account statements delivered by e-mail, and so on. Many companies have invested millions of dollars in hardware, security, and personnel to offer financial services over the Internet. I was a member of just such a team on an Internet-banking project for corporate customers of a financial corporation. We asked ourselves some key questions when embarking upon this kind of project.
How Many Users Will Access My Web Site?
This is perhaps the single most important question when starting such a project. Often, the business teams that request our services do not really know how many users will access their site. The question also goes beyond the number of users: it becomes a question of when users will want access. The answers to these questions are the starting point for building a capacity plan to host an application. In my case, the number of clients that would access our Web application was not a problem, because the Web site was a product directed toward a small number of huge corporate investors.
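The two answers (how many users, and when) can be turned into a first-pass concurrency estimate. The sketch below uses Little's Law, which says that the average number of sessions in progress equals the arrival rate multiplied by the average session length. The function name and all of the figures are hypothetical illustrations, not data from the project described here.

```python
def concurrent_sessions(arrivals_per_hour: float, avg_session_minutes: float) -> float:
    """Little's Law: average sessions in progress = arrival rate x average duration."""
    arrivals_per_minute = arrivals_per_hour / 60.0
    return arrivals_per_minute * avg_session_minutes


# Hypothetical figures: the same site at a quiet hour and at its busiest hour.
off_peak = concurrent_sessions(arrivals_per_hour=30, avg_session_minutes=10)
peak = concurrent_sessions(arrivals_per_hour=240, avg_session_minutes=10)

print(f"off-peak: {off_peak:.0f} concurrent sessions")
print(f"peak: {peak:.0f} concurrent sessions")
```

Sizing for the peak figure rather than the daily average is what makes the "when" part of the question matter.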
What Kind of Web Site/Application Will We Create: Mission-Critical or Institutional?
Depending on the profile of an application, you could suggest that some systems must have redundancy to guarantee business continuity. In my case, although the number of expected concurrent users was low, I had to guarantee that the service would be available any time that my customer wanted to access the site. If a customer did not get access after three or more tries, we knew that the customer would immediately call the business-support team to ask about the failure. If this event occurred frequently, I would run the risk of losing clients. No clients, no application. No application, no architects. No architects, no job!
In other words, we had to supply a system with failover capacity, mitigating as much as possible any potential loss of access, processing, or storage. On my site, I used clustered servers and load-balancing hardware. I had to identify the possible points of failure and apply redundancy to them.
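The effect of applying redundancy to a point of failure can be estimated with basic availability arithmetic. The following is a minimal sketch that assumes node failures are independent (real clusters only approximate this) and uses made-up availability figures; the function names are illustrative.

```python
import math


def parallel_availability(single: float, replicas: int) -> float:
    """Availability of identical redundant nodes when any one of them can serve.

    Assumes independent failures: the tier is down only if every replica is down.
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - single) ** replicas


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    return math.prod(components)


# Hypothetical figures, not measurements from the project in this article.
web_tier = parallel_availability(single=0.99, replicas=2)
balancers = parallel_availability(single=0.999, replicas=2)
site = serial_availability(balancers, web_tier)

print(f"web tier: {web_tier:.4%}")
print(f"whole site: {site:.4%}")
```

Under these assumptions, two servers that are each available 99 percent of the time give a tier that is available roughly 99.99 percent of the time. The serial calculation shows why every link in the chain needs attention: the site as a whole can never be more available than its weakest non-redundant component.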
How Secure Will My Application Have to Be?
Security is one of the most important aspects of any application. When you are talking about securing applications, the possibilities are endless. You must determine what is called the "paranoia scale." This might sound like a joke, but the "paranoia scale" is a measure of how secure your Web site must be, from the business team's perspective. Some people think that a secure Web site must be like an FBI Web site, but this is not always the case. In my experience, however, security was a key point for our Internet-banking Web site; hence, our paranoia scale was high. We had to offer security, so that our customers could feel safe while performing their financial transactions. Internet-banking Web sites are preferred "targets" for hackers, as you can imagine, so the security investment at financial corporations is large.
How Much Money Will We Have to Do This?
To implement hardware, failover, redundancy, security patterns, and high availability, you must know whether your budget will be sufficient for all of this. This point can be a limiting factor when you have to implement an application framework. High paranoia-scale values obviously require more money to implement.
By answering the preceding questions, you should be able to develop a good idea of how your environment will work. Some practices, however, are fundamental for any kind of project:
· Establish reference frameworks and architectural guidelines before you start a project.
· Build a stable and trustworthy version-control system for your applications. This will help you when a new application version that has been installed on your system crashes, and you have to roll back to an older version. This is possible only with a centralized version-control system.
· Implement bug tracking for your systems. Bug-tracking databases are very important for future projects, and will help you to identify infrastructure issues.
· Create a change-management system to track all version updates and infrastructure changes. It is important for documenting the change process, and it helps IT and business teams to formalize that process.
· Have monitoring tools for your application site, so that, when a failure alert is raised, the infrastructure team can take appropriate corrective action.
· Consider redundancy, failover, and scalability aspects. Keep in mind that, when a failure occurs, the surviving hardware will have to support more than its typical workload. For example, if you have three machines that are each working at 90 percent CPU load, and one of them crashes, the other two would each have to carry about 135 percent of their capacity if the load were spread evenly, so they cannot support the overload.
· Have a disaster-recovery site. If your application has a high-availability requirement, you could consider a disaster-recovery site, in which you can replicate all systems on a reduced hardware platform. In this way, you can provide business continuity.
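The three-machine example in the failover item above can be checked with a short calculation. This is a rough sketch that assumes identical servers and a load that redistributes evenly across the survivors; the function names are hypothetical.

```python
def load_after_failure(servers: int, utilization: float, failed: int = 1) -> float:
    """Per-server utilization once `failed` servers drop out of the pool.

    Assumes identical servers and an evenly redistributed load.
    """
    survivors = servers - failed
    if survivors <= 0:
        raise ValueError("no servers left to carry the load")
    return servers * utilization / survivors


def max_safe_utilization(servers: int, failed: int = 1, ceiling: float = 1.0) -> float:
    """Highest normal utilization that still fits on the survivors."""
    if servers - failed <= 0:
        raise ValueError("no servers left to carry the load")
    return ceiling * (servers - failed) / servers


after = load_after_failure(servers=3, utilization=0.90)
print(f"load per surviving server: {after:.0%}")
print(f"safe everyday utilization for 3 servers: {max_safe_utilization(3):.0%}")
```

Read the other way around, a three-server pool that must survive the loss of one machine should run at no more than about two-thirds utilization in normal operation, and lower still if you set the `ceiling` below 100 percent to leave room for traffic spikes.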
Nonfunctional characteristics are items that are not related directly to the business process (for instance, the number of concurrent users of an application). The following nonfunctional characteristics should be considered in an operational application environment: performance, availability, scalability, security, manageability, maintainability, flexibility, and portability.
Architects are responsible for balancing these nonfunctional characteristics, to provide a robust application environment. These characteristics can be distributed into three groups:
· System performance—Contains performance, availability, and short-term scalability. The driving force behind these characteristics is that the system is always there and always responds quickly.
· System control—Contains security, manageability, and maintainability. The driving force behind this group is that the system is always under control.
· System evolution—Contains flexibility, portability, and long-term scalability. The driving force here is to make it easier to change or add to the functionality of the system, or to migrate it between technologies and platforms as time goes on.
Balancing these groups is like setting up a Formula One race car: If you add more wing, and therefore more drag, you can go faster through the curves, but you give up top speed on the straights. In our world, this means that if you improve performance too much, you might suffer some loss in control. If you build a system with poor portability, you lose the flexibility that would allow the system to migrate to another platform, and so on.
Keep in mind, when a new application is deployed into your framework, that you must know all of the aspects that have been previously described. Items such as number of concurrent users, security aspects, and transaction load are key factors that can affect the entire environment.
Be aware of how much time it could take to re-create your site, in case of disaster. For example, if your site's reactivation time is 12 hours, does the business area accept this period of non-trade?
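To make the 12-hour example concrete, the sketch below converts an outage into a yearly availability figure and compares an estimated recovery time against what the business says it can accept. The accepted-outage value and the function names are assumptions for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def availability_from_downtime(downtime_hours_per_year: float) -> float:
    """Fraction of the year during which the site is up."""
    return 1.0 - downtime_hours_per_year / HOURS_PER_YEAR


def meets_recovery_target(estimated_recovery_hours: float,
                          accepted_outage_hours: float) -> bool:
    """True if the site can be rebuilt within the outage the business accepts."""
    return estimated_recovery_hours <= accepted_outage_hours


# One 12-hour rebuild in a year, as in the example above.
print(f"availability with one 12-hour outage: {availability_from_downtime(12):.3%}")

# Hypothetical business limit of 4 hours without trading.
print("target met:", meets_recovery_target(estimated_recovery_hours=12,
                                           accepted_outage_hours=4))
```

A single 12-hour rebuild in a year works out to roughly 99.86 percent availability. That can sound acceptable as a percentage while still being far longer than a trading desk would tolerate in one stretch, which is why the question should be put to the business in hours.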
· What type of business is this: critical or non-critical?
· Am I prepared to reconstruct my site, in case of disaster?
· Do I have efficient test and deployment-process management?
· How balanced are my nonfunctional characteristics?
Rogério Cruz is architect-in-charge of enterprise architecture for J2EE solutions in Europe's largest commercial-banking organization. He is responsible for the maintenance and support of all Java J2EE intranet and Internet platforms. His team is responsible for defining technical regulations and architecture guidelines, and he works with lines of business to deliver solutions, maintain efficiency, and contribute to business time-to-market. You can contact Rogério at email@example.com.
This article was published in Skyscrapr, an online resource provided by Microsoft. To learn more about architecture and the architectural perspective, please visit skyscrapr.net.