CBA Case Study (Part I)
August 28, 2006
Click ARCast: CBA Case Study (Part I) to listen to this ARCast.
Ron Jacobs: Well, this is Ron Jacobs and I am here at Commonwealth Bank of Australia, where I am joined by the team who is responsible for kind of putting this whole thing together. And, guys, I am delighted to chat with you today about the CommSee Project. And, just so that we can get this off the table to begin with, what does CommSee mean? What does that stand for anyway?
Stuart Johnson: The first part, "Comm," is "Commonwealth Bank" and the last part, "See," is "Service Excellence Everyday."
Ron Jacobs: Ah! OK. It's really a project that's focused on increasing your customer service aspect of the bank. [Stuart: Absolutely.] Alright! So, tell me a little bit about the history. I mean, where did this whole project begin? Where did it come from?
Stuart Johnson: OK. Back in 2004, we had an existing VB application that was servicing the premium part of the business. Back then, it was… We had to build an application separately to the rest of the bank in that we had a relationship model. Our business was focused on knowing more about customers and their total holdings with the bank. The previous products in the bank had focused more on an account basis. So, that's where our initial project was.
Back in 2004, the bank made a decision that that concept was going to be rolled out across the whole bank. So, the initial architecture was, like I said before, a VB application that was predominantly batch focused. So, we were getting feeds from back-end mainframe systems and putting them into our back-end database. We only had a small number of users, so that was OK. It didn't matter if the balance was a day out of date. We had one or two online services, but it was predominantly batch.
Once the decision was made to roll this out across the bank, the whole world changed for us. We had to switch to a more online, integrated application. We were going from a couple of thousand users to, now, up to 30,000 users. So… We then had to rethink how we were going to do the architecture, and that was early 2004.
Ron Jacobs: So, one of the things I think is interesting about this is the comment you made that previously the bank kind of thought of accounts [Stuart: Yeah.] and now you are switching to thinking about people who have accounts, who might have lots of different accounts through different parts of the business, and maybe these accounts live on different systems and need to be integrated together?
Stuart Johnson: And that was the reason the bank launched a project called "Which New Bank." CommSee was a part of that, and it was about a single view of the customer.
Ron Jacobs: Well, I think that's really important, and a very subtle distinction. Yet when you think about it from the customer's point of view, they don't want to come in and deal with somebody who only knows part of their story; they want to deal with their... all of their accounts. They think of their world in that way. Why shouldn't you?
You know, but let's move on to think a little bit more about the scope. You mentioned going from roughly 2,000 users to about 30,000 and, I imagine, lots of locations as well.
Stuart Johnson: We have over 1,100 branches all around Australia. And the original scope was actually less than 30,000… The original scope was to roll out, um… replace the branch network. Since that time, we have… It has now been rolled out across all the call centers. So, it's up to about 1,700 sites now. At any one time we have 18,000 connected users.
Ron Jacobs: Wow! So, guys, when you were architecting the system, there probably were lots of decisions you had to make up-front about how you are gonna connect all these users in these vast different locations and make it work with reasonable performance. What kind of things did you think about, early on?
Edward Gallimore: We had a very slow network, initially. A lot of our bank branches even had sort of a 64K link through to the branch, and there were probably five people working on a connection. So, it was very important that we had very efficient communication between the client and the server. We subsequently increased the size of the pipe, but we still had to ensure that we were very miserly about how we communicated up the wire.
Dan Green: [Ron: Yeah.] And I think that ties in also with the choice of a rich client application. It enabled us to think about how we could potentially cache data to help improve responsiveness. It also enabled us to think about certain techniques for designing the front end to allow the drip feeding of information, so that, you know, our users could see certain parts of a customer's information before other parts might have been available, whether that be because the network was running slowly at that time or because certain systems we were integrating with were slower than others. By designing the UI in a certain way, we could keep that perception of responsiveness even when networks might be tight or back-end systems might be slow.
Ron Jacobs: Now, you've kind of come to the first big fork in the decision tree of the architecture that a lot of people will face in an application like this, which is: should it be a Web application or a smart client application? And, to tell you the truth, a lot of people sort of default to a Web application, because it seems like it's simpler, no deployment and these kinds of things, yet you opted for a smart client. Why did you choose that?
Edward Gallimore: We control the desktop. So, the big downside of a smart client traditionally has been the deployment issue. But given that we do control the desktop, given that we have an environment where we can push out the client, that wasn't such an issue. So, we tended towards a smart client implementation, because it gave us the benefits of the caching and it gave us a much richer client experience.
Dan Green: It gave us the ability to do visualizations of data that can be very difficult to do on the Web. [Ron: Yeah.] That's something we definitely exploited in certain parts of the application. Then we are sending less data down the wire, because the GUI is already there. We are just sending small packets of data.
Edward Gallimore: It can also tend to be a nicer development model, as well, in terms of not having to consider what code goes in the browser and what code goes in the server, so…
Ron Jacobs: So, did you also have in mind... um... an offline functionality, so if for some reason the network was down that the branch could function, people could function offline? Does that work?
Edward Gallimore: It was considered. Some of the predecessor code, which was prior to CommSee, did perform in that way. But that was at a time when the network was very unreliable. The network to the bank branch is now comparatively reliable, so we don't need it there. The other place where it would have been advantageous was for the relationship manager. But it was deemed that the cost of developing an environment that worked in a disconnected manner wasn't justified relative to a GPRS sort of mobile-type solution.
Ron Jacobs: OK. Well, that's a reasonable thing to do. I mean, you did analysis, you said "Well, it would be nice but it's not worth the cost."
Dan Green: Yeah. And it certainly, I think, continues to be considered. There are parts of the application where the real-time nature is just inherent in the requirement. But for other parts of the application, where it might just be form entry onsite with a customer, if you are originating a loan or something like that, consideration is still being given to the potential of embedding offline capabilities in the application, you know, storing that data and forwarding it on to our back-end systems when they connect up to the network. We know we have that potential.
Ron Jacobs: OK. So, if we are thinking about the services that you guys decided to. Well, first off, you decided to use a service oriented architecture type of approach with this. Um... how did you think about creating the services, from a tech...? Let's just begin with the technology. What did you do?
Stuart Johnson: OK. So, originally in the old days when we had the VB model, we had some Web services at that time. But what we tended to build was more of an application that happened to expose Web services. When we were moving to the new model, we knew we had to support a lot more developers and a much more agile, flexible approach. So, we initially sat down and went through requirements about what we needed our service framework to do.
So, we kind of listed the requirements. Initially we thought we were gonna build this ourselves, because there wasn't anything in the marketplace at that time that we could go and buy. We happened to come across the work that was being done in the patterns group, EDRA, and we did an evaluation and we felt that it did about 80 percent or more of what we thought we needed.
We took that as a base and we built on top of that. And that's been largely… very successful and one of the contributors to the success of the project. Once we… What we… One of the reasons for the success is that once we had a stable framework for services, we haven't then built it out to become a bloated framework. It's kept… It has stayed largely as it is for the last two years.
Ron Jacobs: Now, you went with Web services as really the backbone of the way that the client is communicating with the back ends. A lot of people have said, "Ah! You know, Web services' performance is really going to be a concern." Did you find that to be the case or do they perform adequately for you?
Dan Green: One of the things people might talk about with regards to Web services and performance is the potential for XML bloat. So, we have certainly built some SOAP extensions, and through the various interception points that are exposed to you in the Web services pipeline, we've got some compression algorithms, for example, in there to help us out on that front. So, in terms of the performance impact of the XML bloat that Web services might result in, we haven't necessarily seen that. We have put it through the perf tests and we are getting the results that we need.
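To illustrate the compression point, here is a minimal Python sketch of compressing a repetitive XML response before it goes down the wire. It is not CommSee's SOAP extension code; the function names, the sample payload, and the choice of zlib are all assumptions made for illustration.

```python
import zlib


def compress_payload(xml_text: str) -> bytes:
    """Compress an XML message body before sending it (hypothetical helper)."""
    return zlib.compress(xml_text.encode("utf-8"), 9)


def decompress_payload(blob: bytes) -> str:
    """Reverse of compress_payload, run on the receiving side."""
    return zlib.decompress(blob).decode("utf-8")


# Hypothetical response body: repetitive markup, as XML tends to be.
sample_response = (
    "<Customers>"
    + "".join(
        f"<Customer><Id>{i}</Id><Title>Mr.</Title></Customer>" for i in range(200)
    )
    + "</Customers>"
)

raw_size = len(sample_response.encode("utf-8"))
wire_size = len(compress_payload(sample_response))
print(f"raw bytes: {raw_size}, compressed bytes: {wire_size}")
```

Because element names repeat on every record, markup-heavy payloads like this tend to compress well, which is one reason the XML overhead may matter less in practice than the raw message size suggests.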
Dan Green: Coming back to the discussion before we got into services, I think it is also interesting to look at how we actually partitioned the application up and internally we talk about the idea of having private services and public services and so our rich client tier talks to our application servers through a layer we call private Web service, private Web services, I should say. Because when you talk about Service Oriented Architecture and Web services, there are various tenets you want to follow. One of them, you know, is contract first and how do you design your contracts with those services.
Now, between our rich client and our application server, we're only going to have the same developers working on that functionality, so we want to enable them to be as productive as possible. And therefore we don't necessarily want to constrain them in terms of how they design those services to maybe support other consumers. They're going to be the only consumer of that service. So, private Web services are really designed to support a form or a screen. Or, in our terminology, a WinPart. And so, that flexibility is with the developer there. They don't have to get too enterprise focused in how they design those services. So, it's just like a normal n-tier app, as when we were using DCOM in the past or whatever. They wouldn't necessarily have to get too worked up about the contract that they come up with. It's the same thing now.
They can design it to support themselves. But then at the application server tier, that code is then going to talk to back-end systems or to a database, for example. And when it talks to back-end systems, that's when we plug in to the services Stuart was talking about before. Those have gone through this process of the service oriented architecture approach, making sure that the security is right at that level, making sure that the way the contracts are designed, the APIs, support an enterprise focus.
So, we actually have these different approaches. A rich client talks to an application server through a Web service that's generally the domain of a particular developer, and they can design that to suit themselves. There are some prescriptive areas that we force them to follow through frameworks and code generators. But then, when they want to talk to back-end systems, they go through these public services that use the EDRA framework, etc., that Stuart just spoke about.
Ron Jacobs: OK. Now we are joined by Jon, who is a... who has been working on the data tier of the environment here. So, Jon, tell me a little bit about what the data environment looks like in this application.
Jon Waldron: Um… Part of the whole problem we are trying to solve with respect to data is that it is being sourced from many back-end systems, pretty much all over the place, and that's the legacy of having an organization that's been around for a hundred years, with systems built in the sixties and all that sort of stuff. [Ron: So, we have some file cabinet somewhere… and that sort of thing.] Oh, we've got all sorts of things. [Laughter]
So, and there's multiple ways to deal with that problem, and the decision that we made was that we were pretty much going to leave the back-end systems where they are and augment them with our own database, rather than try and bite off a big chunk, reorganize all of that, and put it into a new, open-system, modern sort of database. Any of that, we will do iteratively as time goes by. But we were augmenting that with our own database. So, we've got a mainframe, which basically calls the customer information file along with various product systems, which, as Stuart and Ed were talking about before, are focused on an account-by-account basis: a product like a credit card system or a demand deposit account system, very much a silo-based kind of data model. And then, scattered around the bank elsewhere, you will find other systems of this sort as well.
So, we basically decided to leave those as they were, and we're going to use the Web services, the sort of Service Oriented Architecture, to just pull in the data as we need it and augment it with our own data. So, the database that we built as part of CommSee includes a bit of sort of glue-type data just to hold identifiers together, relationship information, and a lot of what you'd call customer soft data, things like your interactions and your tasks and a bit of workflow stuff. And then there's another component altogether, which the system is being used for: product origination as well. So, it's now storing a lot of the information for the new home loan applications and personal lending and credit card applications, etc.
So, it's become equally as important as any of the other back-end systems, the mainframes and anything else we have, and they're all kind of used together, so the whole thing needs to hang together for everything to work, but we have certainly got ways of making them more resilient and more redundant as well.
Ron Jacobs: So, you know, it's interesting that you mentioned, you have this problem whenever you bring legacy environments together, that they don't necessarily store something you need to store, and so you have to introduce this other database, like you mentioned, to bring it together. The other problem that we often see is the problem of, kind of, identifiers that link things together. Maybe a customer is referred to with one ID in one system but an entirely different ID in another system, so you have the... that was the glue that you were talking about.
Jon Waldron: That's exactly right. We've got at least, off of the top of my head I can think of, at least four or five different types of customer identifiers in other back-end systems and in our database we link all those together, so we try and keep it together, so we can find everything about the customer in one query.
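The "glue" data Jon describes can be pictured as a cross-reference between each back-end system's identifier and a single customer key. The sketch below is a hypothetical in-memory version in Python; the class name, system names, and identifier formats are invented for illustration and are not taken from the CommSee database.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class IdentifierLinks:
    """Hypothetical cross-reference of back-end identifiers to one customer key."""

    def __init__(self) -> None:
        self._to_customer: Dict[Tuple[str, str], str] = {}
        self._by_customer: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def link(self, customer_key: str, system: str, system_id: str) -> None:
        """Record that system_id in the given system belongs to customer_key."""
        self._to_customer[(system, system_id)] = customer_key
        self._by_customer[customer_key].append((system, system_id))

    def find_customer(self, system: str, system_id: str) -> Optional[str]:
        """Resolve any back-end identifier to the single customer key."""
        return self._to_customer.get((system, system_id))

    def identifiers_for(self, customer_key: str) -> List[Tuple[str, str]]:
        """List every back-end identifier known for one customer."""
        return list(self._by_customer.get(customer_key, []))


# Made-up system names and IDs, purely for illustration.
links = IdentifierLinks()
links.link("CUST-1", "cif", "000123")
links.link("CUST-1", "cards", "4111-A")
links.link("CUST-1", "home_loans", "HL-77")

print(links.find_customer("cards", "4111-A"))
print(links.identifiers_for("CUST-1"))
```

In a relational database the same idea would typically be a link table keyed on system and identifier, so that one lookup yields every identifier needed to query the other systems.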
Ron Jacobs: And did you have that... a lot of people have this problem where they are not even sure which customers are the same ones between the systems, you know, and the address looks the same but the name's slightly different and that sort of thing?
Jon Waldron: Yes. So, we had that happening all the time. First of all, from the mainframe every night we get a nightly feed of customers that have been merged together because they were found to actually be the same, even though they were reported as two different entities. So, we've got some batch process going on that cleans all that up. And then intraday as well, whenever a user comes across a client that really is in the system twice, for whatever reason, they can put an entry into a queue and say, "Overnight, could you go and clean this up, please?" and it all actually gets merged together overnight.
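The intraday queue plus overnight clean-up that Jon mentions could look roughly like the following Python sketch. The record layout, function names, and merge rule (fold the duplicate's accounts into the surviving record) are assumptions for illustration only; the real batch process is not described in that detail.

```python
from typing import Dict, List, Tuple

# Hypothetical customer records, with one person entered twice.
customer_records: Dict[str, dict] = {
    "CUST-1": {"name": "J. Smith", "accounts": ["SAV-1"]},
    "CUST-2": {"name": "John Smith", "accounts": ["CARD-9"]},
}

# Merge requests raised by users during the day: (record to keep, duplicate).
merge_queue: List[Tuple[str, str]] = []


def request_merge(keep_key: str, duplicate_key: str) -> None:
    """Called intraday when a user spots the same customer entered twice."""
    merge_queue.append((keep_key, duplicate_key))


def run_overnight_merge() -> int:
    """Drain the queue, folding each duplicate into the record being kept."""
    merged = 0
    while merge_queue:
        keep_key, duplicate_key = merge_queue.pop(0)
        if keep_key == duplicate_key:
            continue
        if keep_key not in customer_records or duplicate_key not in customer_records:
            # Already merged by an earlier request, or an unknown key: skip it.
            continue
        duplicate = customer_records.pop(duplicate_key)
        customer_records[keep_key]["accounts"].extend(duplicate["accounts"])
        merged += 1
    return merged


request_merge("CUST-1", "CUST-2")
merged_count = run_overnight_merge()
print(merged_count, customer_records)
```

Deferring the merge to a batch keeps the daytime request cheap for the user, at the cost of the duplicate remaining visible until the overnight run.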
Ron Jacobs: Ah! That's fantastic. Now, when we think about the Service Oriented Architecture in the constraints of the environment you guys have mentioned, you mentioned caching of data down at the client side, how did you think about kind of efficiently getting this data over to the client side only what's necessary, so that you are not clogging up these very restricted pipes with a lot of unnecessary push of data? How did you do that?
Edward Gallimore: The public services are built as generic services, which sort of abstractly bring back all client information, and the private services Dan was just talking about are built to basically service the GUI component. So, if we have a component that just shows name and address, there will be a private service that returns only name and address back to the client. That private service may, in turn, be calling a public service which brings back the entire client information, but it's filtered and constrained down before it goes across the wire.
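A minimal Python sketch of the filtering Ed describes: a screen-specific private service calls a generic public service and passes on only the fields its component needs. The function names, field names, and canned data are hypothetical.

```python
from typing import Dict, Tuple


def get_customer_public(customer_id: str) -> Dict[str, object]:
    """Stand-in for a generic public service returning the full client record."""
    return {
        "customer_id": customer_id,
        "name": "Pat Example",
        "address": "1 Sample St, Sydney",
        "accounts": ["SAV-1", "CARD-9"],
        "interactions": ["2006-08-01 branch visit"],
    }


# The only fields the (hypothetical) name-and-address component displays.
NAME_AND_ADDRESS_FIELDS: Tuple[str, ...] = ("name", "address")


def get_name_and_address_private(customer_id: str) -> Dict[str, object]:
    """Private service: runs on the application server, trims the response."""
    full_record = get_customer_public(customer_id)
    return {field: full_record[field] for field in NAME_AND_ADDRESS_FIELDS}


slim_response = get_name_and_address_private("CUST-1")
print(slim_response)
```

The full record only travels between the application server and the back end; the slim response is what crosses the constrained link to the branch.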
Ron Jacobs: OK. So, at the application server level, really the thin pipe, if you will, is between the application server and the client.
Dan Green: Absolutely. It is. The services are designed to make life easy for a developer in how they design things, but they are also optimized in that there isn't really redundant data being passed down the wire.
Jon Waldron: So, there is also a very close correlation between each of these private services, all the way down to the database, to an actual stored procedure in the database that returns the data. It's very much that what you see on the screen is what's been got from the database. We're not… the mid-tier aggregation perhaps brings in things from other data sources as well. But we only really get back the data we need to service just that particular component.
Ron Jacobs: OK. But I am curious though still about the caching aspect.
Dan Green: So. On that front, each of our private Web services, as we call them, can be flagged so that its response is cached. So, if you are retrieving just salutations, what titles there are, Mr., Mrs., etc., you don't have to retrieve that every single time. So, we have a Web service designed for certain reference data. We have a variety of services designed to support reference data and we flag those as being cached. And then on the client you get the response back from the Web service, and then within our infrastructure, if you will, the proxies, we call them service agents. The service agents have the smarts to short-circuit the request to the application server for those services that have been flagged to be cached. Obviously, as with any caching mechanism, the complexity lies in when you invalidate the cache, so we have smarts in there about that: do you set it to invalidate after a certain amount of time, or do you persist it to disk? Those are some of the challenges you face with caching.
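The short-circuiting behaviour Dan describes can be sketched as a client-side proxy that caches responses only for services flagged as cacheable and expires them after a set time. This Python sketch is an illustration under assumptions: the ServiceAgent class, the service names, and the time-based expiry policy are hypothetical, and it omits the disk persistence mentioned in the interview.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ServiceAgent:
    """Hypothetical client-side proxy that caches flagged service responses."""

    def __init__(
        self,
        call_server: Callable[[str], Any],
        cache_ttl_seconds: Dict[str, float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._call_server = call_server
        # Only services listed here are flagged as cacheable.
        self._cache_ttl_seconds = cache_ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def invoke(self, service_name: str) -> Any:
        ttl = self._cache_ttl_seconds.get(service_name)
        if ttl is not None:
            cached = self._cache.get(service_name)
            if cached is not None and self._clock() < cached[0]:
                return cached[1]  # short-circuit: no trip to the server
        response = self._call_server(service_name)
        if ttl is not None:
            self._cache[service_name] = (self._clock() + ttl, response)
        return response


server_calls = []


def fake_server(service_name: str) -> Any:
    """Stand-in for the application server; records every real call."""
    server_calls.append(service_name)
    if service_name == "GetTitles":
        return ["Mr.", "Mrs.", "Ms.", "Dr."]
    return {"balance": 100}


fake_now = [0.0]  # controllable clock so expiry can be demonstrated
agent = ServiceAgent(fake_server, {"GetTitles": 3600.0}, clock=lambda: fake_now[0])

agent.invoke("GetTitles")
agent.invoke("GetTitles")   # served from the cache
agent.invoke("GetBalance")
agent.invoke("GetBalance")  # customer data is not flagged, so never cached
fake_now[0] = 7200.0
agent.invoke("GetTitles")   # cache entry has expired, goes to the server again
print(server_calls)
```

Keeping the cacheable flag per service, rather than per call, matches the idea that reference data such as titles is safe to reuse while customer-specific data is always fetched fresh.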
Ron Jacobs: And do you use a database at the client side as well, or files or what?
Dan Green: We don't… What happens is, our service response cache is persisted to disk. The responses are encrypted and compressed.
Ron Jacobs: Yeah, OK, well, so that sounds fantastic and then now there is, then kind of account transaction sort of data. Are you caching that at all or for some period of time or not at all?
Dan Green: Generally not. No, if it's user-specific data we don't generally cache it, because the invalidation of the cache then can become extremely tricky. If you start caching user-specific data, or I should say customer-specific data, and the user interacting with the application were to change that data and then move to another screen or something like that, is that other screen going to use the cached version or is it going to go to the source and get… there's some complexity there. So, no, for user data we generally retrieve that.
Ron Jacobs: But the performance you're getting is good enough, so you're not worried about it.
Dan Green: Indeed and again, I mean, some of that performance that we are getting is because of the way our screens are designed to be partitioned up into areas. And so we can drip feed information down, so you might be waiting, we have our equivalent of an hourglass, it's a little fancier, but you can be waiting for a certain part of data in one quadrant of the screen while the others are completing. So, that perception of responsiveness is important there, as well.
Ron Jacobs: So, you guys have really, for the user interface, you've gone with a composite style interface, it's very similar to what the patterns and practices team built later, the Composite UI Application Block, which is actually based on some of the work you did. Which I love that model, right, because in... in... we both had a little interchange here that we gave some ideas for your framework that you built the services with and you guys likewise became the seed for an idea that we later put into this application block. But the idea of these little parts that kind of independently function, so that like you were saying we can feed this one and while this one is populating this one is doing something useful, so you don't end up with that kind of very serialized wait, now I can do something. Right? [Dan: That's right. Yeah.]
Edward Gallimore: The core tenet of the architecture is the WinPart. It's… what's called the SmartPart in the CAB architecture. And it does allow that sort of atomicity of the components. So, if you do have three or four WinParts on the screen, the WinParts can independently go away and get information. They can be independently responsible for saving back the information. If one of the WinParts throws an exception or errors in any way, it doesn't impact any of the other WinParts. So, as well as giving that parallelism of the data, it gives us a certain parallelism of development as well. So, developers can be independently working on WinParts on a single screen, and the WinParts can then potentially be redeployed on other screens.
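The independence Ed describes, where each part fetches its own data and a failure in one does not affect the others, can be sketched in Python with a thread pool. This is only an analogy for the idea, not the WinPart or CAB API; the loader functions and the result format are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def load_parts(loaders: Dict[str, Callable[[], object]]) -> Dict[str, Tuple[str, object]]:
    """Run every part's loader in parallel and capture failures per part."""
    results: Dict[str, Tuple[str, object]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(loaders))) as pool:
        futures = {name: pool.submit(loader) for name, loader in loaders.items()}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result())
            except Exception as error:
                # One part failing is recorded here and does not stop the rest.
                results[name] = ("error", str(error))
    return results


# Hypothetical loaders for three parts on one screen.
def load_summary() -> object:
    return {"name": "Pat Example"}


def load_accounts() -> object:
    raise RuntimeError("back-end timeout")


def load_tasks() -> object:
    return ["Call customer"]


screen = load_parts(
    {"summary": load_summary, "accounts": load_accounts, "tasks": load_tasks}
)
print(screen)
```

In a real UI each part would render as soon as its own result arrives rather than waiting for the whole dictionary, which is what gives the drip-fed, responsive feel discussed earlier.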
Dan Green: Yeah, that's an important point that Ed raises there. One of the things we discussed up front was the various dimensions of isolation that we wanted to achieve. We knew we were gonna be scaling up to a large development team, 150 or so devs. And there were discussions, therefore, around: do we come up with a common schema? Is a customer a customer everywhere? The problem that can bring for isolation is that someone wants to add an attribute to that object, or change or split something, and then you have got the coupling that you need to deal with.
So, what Ed was just talking about in terms of the WinParts achieving that isolation is important from the developer perspective as well. In the… the… We retrieve the data from the back-end systems through public Web services or through a database and aggregate that into a data transfer object (we happen to use a derivative of the DataSet) and then pass that up to our front end.
We… So, the scoping of these objects is really at a WinPart level. We haven't tried to build a one-size-fits-all domain model for our application, where you then get into those arguments across the development team whenever someone wants to change something and has to perform impact analysis on what that change might affect. So, the WinPart model has also affected the way we've designed our code base.
Ron Jacobs: So, recognizing that a project like this is never really DONE [laughs], but it is in production today...
Dan Green: And that's the key point. On that, in terms of methodology, we've stuck with a stringent three-month release cycle, which has been, I think, a critical part of the success. The business has recognized that the requirements are never always correct up front, so the sooner you can release something and determine whether you got that requirement right, and whether you need to tweak it or not, the better.
And we knew up front, as well, that we would be on that three-month cycle, so that speaks to our approach towards instrumentation, for example. We recognized that this was not going to be zero-defect software. That didn't make sense. It economically didn't make sense to the business to be waiting until we had zero defects before we released; I think it made sense to release on the three-month mark. Obviously, testing is critical and you want to have as few defects as possible. We are doing our best on that front.
But we knew we were gonna have defects in production, so we built an instrumentation piece, baked that in, and spent a fair bit of time on that, because we knew our developers were going to need road maps to fix problems when they occurred in production.
And accompanying that instrumentation, I guess, is also our model for how you update the application: how do you patch it when you do discover problems and determine they are critical, and how do you then fix them and roll the fixes out? It also speaks to a strategy we have, I guess, about piloting our application. We have this three-month cycle, and every time we do a release we pick a target group of our users and deploy the new version to those, while the existing users continue to use the old version. So, we had to bake certain code and techniques into the way we do certain things to help us with that backwards compatibility, the ability to run side by side, if you will.
Ron Jacobs: So, I am just curious, though, what has the result been? It's been working? Are users happy? Is management happy? What's happening?
Jon Waldron: It's working very, very well, and the general feedback has been very positive. I think, as Dan alluded to, it has required a little bit of a cultural shift from the whole organization, for software that's, as you said, never really finished. You keep releasing every quarter; it's not a big turnkey solution that you've got all finished now. With that in mind, I think it's been incredibly successful, users are getting used to that, and it's been doing very, very well.
Ron Jacobs: Well, guys it sounds terrific that it has been a great success, so thanks so much for joining me today. [All: Thanks, Ron. Thanks.]