Best Practices: Archiving Workspaces using Groove Server 2007 Data Bridge
Summary: Review the recommended best practices for configuring, monitoring, troubleshooting, and programming against the Groove Server 2007 Data Bridge (GDB). This article focuses on using the Groove Data Bridge to archive workspaces. (11 printed pages)
Jesse Howard, Microsoft Corporation
Applies to: Microsoft Office Groove 2007, Microsoft Office Groove 2007 Server, Groove Server 2007 Data Bridge
This article provides recommended practices for configuring, monitoring, troubleshooting, and programming against the Microsoft Office Groove Server 2007 Data Bridge (GDB). In a production environment, additional configuration, monitoring, and troubleshooting tasks and practices are advisable. This article focuses on using the Groove Data Bridge to archive workspaces. For more information about the minimum system requirements, prerequisites, and basic configuration tasks for the Groove Data Bridge, see Groove Server Data Bridge Functionality.
Archiving with Groove Data Bridge
Groove Data Bridge provides a central, server-based mechanism to manage Groove workspaces programmatically, using Groove Web services (GWS), and to back up Groove workspaces of which the Data Bridge is a member. While it may be possible, and even convenient, to use a Groove Data Bridge as both backup agent and data integration point, doing so may create contention for resources within the server. Creating Groove workspace archives consumes a large amount of Data Bridge resources, as is true of most event-driven data transaction models. Consequently, if a Data Bridge performs both workspace archiving and data integration, external transactions can become constrained during archive cycles, and archive cycles may take longer than expected during times of peak transaction load. It is recommended that any single Groove Data Bridge be used for workspace archiving or for data integration, but not both simultaneously.
Getting GDB into Workspaces
The first decision to make when planning to archive Groove workspaces with the Data Bridge is how to get the Groove Data Bridge into each workspace. One approach is to make the GDB identity contact available to the Groove user population, as described later in this article, and let users invite the GDB into each space as needed, which requires no custom code. The other approach is to add the GDB with custom code that uses Groove Web services.
Adding GDB into Workspaces Manually
Users can invite the GDB into their workspaces (as needed) as an easy, low-cost way to make the archive service available without managing code or involving SharePoint services (or other server-side applications). With a little training, users can gain the benefit of protecting important workspaces without using an additional system and without needing to follow a server-side process to create workspaces. Some of the drawbacks are:
Capacity planning. Without a central system to allocate workspaces across servers, system administrators need to monitor servers more closely, and rely on end-user processes to push workspaces onto additional servers, if needed.
Manual workspace archive retrieval. Users or administrators need to retrieve archives manually, as there is no service or application to provide them. The Data Bridge does not support an automated workspace restoration feature, so users need access to the file share on which the archives are stored, or users must request the archive files from an administrator.
Policy enforcement. If the archive process is an “opt-in” process, there is no way to enforce the inclusion of a Data Bridge in all workspaces, or in workspaces of a specified type. In situations where critical data is produced in Groove, it may be advisable to enforce a policy, which must be accomplished using a custom, external application. In addition, see Best Practices: Integrating Data using Groove Server 2007 Data Bridge.
Adding GDB into Workspaces from a Central Server
If the ad-hoc process of users inviting the Data Bridge into workspaces does not meet the needs of the business, you can write custom code to manage workspace creation centrally, either through a SharePoint list or by using another server-based application, through Groove Web services. This architecture is the one exception to the previous recommendation that you not use GDB both for workspace archiving and data integration.
Managing the lifecycle of Groove workspaces through GWS is not equivalent to custom data integration with a back-office data source in terms of server load and impact on capacity. Managing Groove workspaces centrally for archival purposes can solve a number of the problems with client-driven, opt-in workspace management: it provides more control over capacity planning and expansion, allows streamlined archive retrieval, and enforces a policy of archiving for all workspaces or for workspaces of specified types. This article addresses those aspects of customized solutions that impact configuration and management of the Data Bridge, and not the architecture or coding of such solutions.
Configuring the GDB Identity
The GDB Setup process guides the administrator through setting up an Identity for the data bridge. For more information, see Groove Server Data Bridge Functionality. This identity is the same as an identity for a Groove client: the identity owns all workspaces and sends and receives all identity-targeted messages (for example, invitations). One notable difference between GDB and a Groove client, however, is that the account for a data bridge is not managed; instead, the identity is managed directly. So when configuring your data bridge for use in a managed domain (a typical scenario), using a managed identity on the data bridge enables your GDB to query the domain manager’s contact directory and enable policies on the GDB. Moreover, since the identity sends and responds to invitations, users seeking to invite the GDB into any workspaces need the contact information for the identity, and not the account.
While it is possible to run multiple identities under a single server, in most cases this is not a good practice. Each identity incurs a small amount of overhead to maintain, so the decision to use multiple identities should not be taken without consideration of the potential impact on performance and capacity.
After the identity is set up, a few options need to be configured. In most cases, you want the GDB to be a manager in every workspace of which it is a member, and to accept all invitations automatically.
To accept invitations automatically
Select the GDB identity.
Click the Edit Properties menu from the toolbar.
Set the following option in the Invitation Processing window:
Under Acceptance Conditions click Only accept invitations from users in the same domain.
Using Only accept invitations from users in trusted domains allows the GDB to be used as a resource in cross-certified domains, as when a single enterprise has multiple domains.
Choose Manager as the minimum role.
The following sections describe a number of important topics and considerations when using the GDB to archive workspaces.
Working with Earlier Versions of Groove
By default, GDB is configured to work only with 2007 versions of workspaces. To allow GDB to participate in 3.x workspaces, set the identity property to allow 3.0 and later workspaces.
Groove Folder Synchronization
If GDB is configured to automatically accept invitations, it’s a good idea to configure the identity to store Groove Folder Synchronization (GFS) documents on a monitored, well-managed, and sizable disk volume.
New in GDB 2007 is the ability to manage the remote Web services key and port number from the administrative console. If you plan to use the GDB to create workspaces (as in a server-based process), you should set the GDB to use a specified remote request key, because the external application needs this key to communicate with GWS. Web services settings apply to the server, and are the same for all identities.
Forcing Relay Connections
While you can deploy the GDB on any part of the network, there are steps that you can take to reduce the number of peer connections from Groove clients to the GDB.
When you have a small number of clients (fewer than 100), peer connections are a faster way of transferring data to the GDB, and Groove prefers to transfer data that way. As the number of connections to the GDB increases, however, performance on the GDB may degrade, causing GWS timeouts and longer workspace archive times.
To prevent peer connections on the GDB, it is recommended that you disable native SSTP connections. This is done by opening the proxy settings tab for the server (not an identity) and checking Disable TCP connections to the server. After you do this, GDB no longer accepts SSTP connections on port 2492. One side effect, however, is that GDB now encapsulates all communications to the relay with HTTP over port 443, which is much slower than SSTP, unless a direct outbound route from the GDB on 443 is created. If GDB can reach all of the relays it needs to reach over a direct route using port 443, it uses SSTP over 443 without HTTP encapsulation.
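Before relying on the direct route described above, it can help to confirm from the GDB host that each relay accepts an outbound TCP connection on port 443. The following is a minimal sketch, not a Groove tool: the function names are illustrative, and the relay host names you pass in are your own. A successful connection only shows that the port is reachable; it does not prove that no proxy sits in the path, so treat the result as a first check.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_relays(relays, port=443, timeout=3.0):
    """Return the relay hosts that could not be reached on the given port."""
    return [host for host in relays if not can_reach(host, port, timeout)]
```

For example, calling unreachable_relays with the list of relay host names used by your domain returns the subset that still needs a route opened.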
You can also accomplish similar results strictly with routing rules, but it may be simpler to apply the setting on the server, and then open a single route.
Considering Previous Versions
The integration of workspace archiving with the Groove Data Bridge 2007 is much improved over archiving with the Data Bridge available with Groove V3.1.
The Data Bridge now incorporates this functionality as a standard feature so you are no longer required to add a separate component. Workspace archive is enabled for a given service by checking the input box on the Archive schedule pane that allows you to set various options.
It is also necessary to set the Scan new workspace property for the identity in order for the archive service to work properly.
The repeat interval is set to 24 hours by default; depending on the volume of workspaces to be archived and the sensitivity of the data, this should be adequate. If a complete archive cycle takes longer than 24 hours, it is likely that the GDB is overloaded (see the Monitoring section in this article for more information). In any event, the repeat interval should always be set to a number substantially greater than the complete GDB cycle time. Creating workspace archives creates load on the GDB, just as assimilating deltas and processing GWS calls creates load. For this reason, the number of workspaces a GDB can accommodate depends on total usage, including delta processing, archiving, GWS transactions, and general administration. It is recommended that any given GDB support either workspace archiving or GWS processing, and not both.
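The guidance that the repeat interval be substantially greater than the cycle time can be turned into a simple check. This sketch assumes that "substantially greater" means at least twice the measured cycle time; that factor is an illustrative assumption, not a value from the product documentation, and the function name is hypothetical.

```python
from datetime import timedelta


def interval_has_headroom(cycle_time, repeat_interval=timedelta(hours=24), factor=2.0):
    """Return True if repeat_interval is at least `factor` times cycle_time.

    The default factor of 2.0 is an assumed threshold; adjust it to match
    your own definition of "substantially greater".
    """
    return repeat_interval >= cycle_time * factor
```

With the default 24-hour interval, a measured cycle of 6 hours passes this check, while a cycle of 13 hours does not.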
The archive service adds a tool to each space it backs up, containing the following information:
Unique workspace identifier (for example, “n4c7xuhcs6zbeefvmfhjnxee9xhxmrkh7wwmkn2”)
The name of the identity backing up the workspaces
The message specified in the “Archive Message” setting on the Archive Schedule page.
It is recommended that you customize this to reflect the specific support process and SLA details for retrieving an archive if the location of the .GSA files is not open to the user population.
An option box (accessible only by workspace Managers) to enable or disable archiving on that space
A log table listing all existing logs for the workspace. The table contains the following:
Date of the log file
Complete path (as specified in the “Archive Location” setting on the Archive Schedule page)
The Media Set property (as specified in the “Archive Location” setting on the Archive Schedule page)
The size of the archive file
The Archive Comment (as specified in the “Archive Location” setting on the Archive Schedule page)
When configuring the archive service, you should customize the Archive Message and Archive Event Comment to reflect the specific policies governing workspace archive frequency and availability to the end user. Each workspace manager then has the contact information and SLA information available at all times.
The Archive File Retention Period setting determines how many copies of the workspace backup file (.GSA) to keep. For example, if you want to keep one calendar week’s worth of daily backups, set the period to 7. The volume that is specified as the backup location then contains one folder for each day on which backups were created.
Managing Archive Security and Passwords
To manage archive security, the best option for most enterprise settings is to choose Allow the archive agent to set the password for the archive and then specify a single password for all archives. This allows the system administrator to easily access any archive, while protecting the archives from inappropriate access. If users are allowed to retrieve archives themselves, also allow the workspace manager to control access to each individual archive. Other options include configuring the archive service to allow the manager to specify a password. However, if that password is lost, it is impossible to open the .GSA file. You can also allow the archive service to set a random password; however, this may make it unwieldy to restore archives on demand.
Using the Archive Master Log Workspace
After you configure the Archive Service, you must monitor the Data Bridge in a number of ways to ensure operational continuity. The Data Bridge provides a tool called the Archive Master Log workspace that enables administrators to monitor the status of the archive service remotely. This workspace, like any other Groove workspace, requires that you install the Groove client on the administrator’s computer, and that the administrator invite her account into the Archive Master Log workspace using the GDB console. The log workspace contains a tool that enables the administrator to see the status and recent history of the archive service, including the date and time that each workspace of which the GDB is a member was backed up. It is critical that the administrator use this workspace on a regular basis both to monitor individual archive backups and to calculate the total archive cycle time. Failure to archive a workspace is cause for concern, and you should investigate it. More importantly, backup cycle time is an indicator of the total load placed on the server during the archive period, and although the cycle time tends to increase over time, any sharp upward trend indicates that the resources of the Data Bridge are under stress. This can be caused by external factors, such as delta processing, or by the number of workspaces on the server. These topics are discussed in more detail later in this article.
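The two checks described above, spotting workspaces that were not archived and calculating the total cycle time, can be sketched as follows. This example assumes the administrator has transcribed the backup times shown in the Archive Master Log workspace into a plain dictionary; the data layout and the function name are hypothetical, and the sketch does not read the log workspace itself.

```python
from datetime import datetime, timedelta


def summarize_cycle(cycle_start, backups, expected):
    """Summarize one archive cycle.

    cycle_start: datetime at which the archive cycle began
    backups:     dict mapping workspace name -> datetime of its latest backup
    expected:    names of all workspaces the GDB should have archived

    Returns (cycle_time, missed): the time from cycle_start to the last
    backup in this cycle, and the sorted names of workspaces with no
    backup since cycle_start.
    """
    done = {ws: ts for ws, ts in backups.items() if ts >= cycle_start}
    missed = sorted(ws for ws in expected if ws not in done)
    cycle_time = max(done.values()) - cycle_start if done else timedelta(0)
    return cycle_time, missed
```

Recording the returned cycle time after each cycle gives the history needed to spot the upward trends that this section warns about.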
Monitoring the File System
As the number and usage of workspaces increases, more and more disk space is consumed. Depending on the configuration of storage for the GDB, you can store GDB data, GFS data, and archives on the same volume, different volumes on the same array, or on separate arrays. In any event, it is important to monitor each of these volumes to ensure that the data bridge is not running out of space.
The paths to monitor are as follows:
GDB_Data. The path set during the installation for the data bridge workspace and account data. You should store this data separately from all other data and monitor it carefully. If the data bridge runs out of storage for this data, the account or workspaces may be corrupted, the data bridge stops running, and may not be able to restart. It is recommended that you store this data on redundant, attached storage in its own volume.
GFS_Data. The path (configured as part of the identity) that GDB uses to store all GFS-synchronized file structures. If the GDB runs out of data storage, it may stop synchronizing some or all workspaces. It is recommended that this path point to a separate volume from other GDB-related data, and that you monitor it closely, as it may increase suddenly due to unanticipated workspace usage. An increase of this data corresponds to an increase in GDB workspace data, although not necessarily by a 1:1 ratio.
Groove Archive Data. The location set in the archive service configuration pane where Groove stores all of the archive files. During each archive cycle, a new folder is created under this path, and the archives for that period are stored in the folder. As a result, the frequency of archive cycles and the number of cycles retained affect the size of this file structure. When configuring your archive service, you should plan on a maximum of 2 GB per .GSA file, multiplied by the maximum number of workspaces anticipated, multiplied by the number of archive cycles retained. If GDB runs out of storage for this path, it stops archiving workspaces.
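The sizing rule for the archive path is a straightforward product, and it can be compared against the free space on the archive volume. The sketch below assumes that "2 GB" means 2 × 1024³ bytes; the function names are illustrative. The result is a worst-case upper bound, so real usage is likely to be lower.

```python
import shutil

GIB = 1024 ** 3  # assumption: "2 GB" is read as 2 * 1024**3 bytes


def archive_storage_upper_bound(max_workspaces, cycles_retained, max_gsa_bytes=2 * GIB):
    """Worst-case bytes: maximum .GSA size x workspaces x retained cycles."""
    return max_gsa_bytes * max_workspaces * cycles_retained


def volume_can_hold(path, required_bytes):
    """Return True if the volume containing `path` has enough free space now."""
    return shutil.disk_usage(path).free >= required_bytes


# Example: 1,000 workspaces with 7 retained cycles.
required = archive_storage_upper_bound(1000, 7)
print(f"Worst case: {required / GIB:.0f} GiB")
```

For 1,000 workspaces and 7 retained cycles the bound is 14,000 GiB. A scheduled job could call volume_can_hold with the archive path to raise an alert before the GDB runs out of space and stops archiving.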
Scaling and Capacity Planning
Planning any deployment of the GDB as an archive service requires capacity and scalability planning. For small deployments, where the number of workspaces is not expected to exceed 1000 workspaces, a single GDB should suffice; for larger deployments, however, you need to develop a more detailed plan. In general, the capacity of a GDB varies depending on the following factors:
Number of workspaces
Volume of data change (velocity of data) in each space
Number of members per workspace
Total size of each workspace
Rate of workspace creation
Rate of workspace deletion
As any of these factors increases, the amount of processor available for archiving decreases, and archive cycle time increases. These factors also influence each other so that, for example, an increase in the average velocity of data across all spaces decreases the total number of workspaces that the GDB can support.
Managing the Ramp
The primary concern of an administrator deploying GDB for archive purposes should be the rate at which new workspaces are created, or the ramp. If a large number of workspaces are created in a short period, it may be very difficult to track and accurately predict resource consumption by the GDB. Consequently, if a very steep ramp is anticipated, the deployment plan should include steps to add additional data bridges on very short notice.
Adding data bridges on short notice can be complicated if users are inviting the GDB into workspaces in an ad-hoc manner because the administrator must communicate the new GDB name and vCard to the users on short notice and rely on the users to respond properly. It is always much easier to manage the ramp through a central, server-based workspace creation process.
If a shallower ramp exists, it is possible to monitor GDB usage over time and predict the need for additional resources. Adding capacity may be less complicated, particularly if workspaces are being added to the GDB by user invitation.
Determining Available Capacity
As noted above, available capacity on the GDB is derived from a number of factors, many of which are not under the control of the administrator. Administrators must consider a number of factors, including average processor consumption, memory consumption, storage capacity, total archive cycle time, and delta processing capacity. The latter is particularly hard to measure, as there is no mechanism for monitoring delta processing directly. Instead, the administrator must rely on a combination of indirect measurements, mostly taken from system counters in performance monitor (perfmon), to judge resource consumption. Aspects of archive service capacity which you can measure directly are:
File system capacity, as noted previously.
Archive service cycle time. There is no good benchmark for an average archive service cycle; each implementation varies based on hardware, configuration, workspace usage, and size of workspaces. A more important determinant is the velocity of the cycle time, that is, the rate at which it increases or decreases. A sharp one-time increase in the cycle time may not indicate a trend toward exceeding capacity; a series of increases in cycle time indicates growing use of the archive service, and can indicate that additional capacity may be required.
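The distinction between a one-time spike and a series of increases in cycle time can be checked mechanically once cycle times are recorded. This sketch assumes the cycle times are kept as a list of minutes, oldest first, and that three consecutive increases count as a sustained trend; both the representation and the threshold are illustrative assumptions.

```python
def consecutive_increases(cycle_minutes):
    """Length of the run of strictly increasing values at the end of the list."""
    run = 0
    for prev, cur in zip(cycle_minutes, cycle_minutes[1:]):
        # Extend the run on an increase; reset it on any flat or falling step.
        run = run + 1 if cur > prev else 0
    return run


def sustained_growth(cycle_minutes, min_run=3):
    """Return True if the most recent cycles rose at least min_run times in a row."""
    return consecutive_increases(cycle_minutes) >= min_run
```

A history such as 60, 61, 59, 95, 60, 62 contains a single spike and is not flagged, whereas 60, 62, 66, 71, 80 shows four consecutive increases and is.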
Expanding capacity of the archive service entails adding additional GDB servers to the environment. GDB does not support clustering, nor does it support aliasing of identities or other external load balancing techniques, so adding additional GDB servers means that the archive solution must incorporate each new server into the workspace creation process as needed.
In scenarios where users are creating workspaces on their client, then inviting GDB to the workspaces, this generally means distributing the vCard of the new GDB identity to the users with instructions to use the new contact. The old spaces would need to retain the old GDB member.
In scenarios where workspaces are created centrally, by the GDB, the application initiating the workspace creation must call GWS on the new server, as opposed to the old.
In both scenarios, the addition of GDB servers adds complexity to the managing of the archives themselves, especially to the process of restoring archives.
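For the centrally managed scenario, the application that creates workspaces needs some rule for choosing which GDB server to call. The following is a sketch of one possible rule, picking the least-loaded server that still has room. The Bridge record, its fields, and the per-server workspace limit are all hypothetical, and the sketch does not make any GWS calls.

```python
from dataclasses import dataclass


@dataclass
class Bridge:
    name: str            # label for the GDB server (hypothetical)
    gws_url: str         # GWS endpoint the application would call (hypothetical)
    workspaces: int      # workspaces currently archived by this server
    max_workspaces: int  # locally chosen limit; assumed to be greater than zero


def pick_bridge(bridges):
    """Return the bridge with the lowest fill ratio, or None if all are full."""
    candidates = [b for b in bridges if b.workspaces < b.max_workspaces]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.workspaces / b.max_workspaces)
```

Recording which server was chosen for each workspace also helps with restoration later, because each GDB server stores its own archive files.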
Each GDB server stores the archive files it creates in a location specified in the Archive Service configuration tool. The data bridge does not support remote requests to restore archives from the client, so any deployment plan should include a process (and possibly an application) that allows users to request restoration of a given archive. One possible way to accomplish this is to have users send a message (e-mail or Groove message) to the administrator, requesting the archive. This can be cumbersome for both parties and may not scale in an enterprise. Another mechanism is to point the Archive Service at a shared location that users can access themselves, but this may present security or other administrative problems. A recommended approach is to create a custom web application that exposes the archive spaces to users while protecting the file share on which they reside, and allows users to download them directly or to request restoration, which can be done on the GDB using GWS, followed by an invitation to the restored workspace.
After users receive an archive, they can restore it. This creates an instance of the workspace that is current as of the time the archive was created; it does not overwrite or restore the existing instance of the workspace. Users must choose either copying the content they want to restore from the new instance to the initial instance of the workspace, or deleting the initial instance and inviting its members to the new instance of the workspace.
If workspace restoration uses an application and GWS on the Data Bridge, the restoration process should allow for the destruction of either the initial workspace or the restored workspace to prevent proliferation of “duplicate” workspaces.
In many cases, it may be preferable to manage workspace creation and user invitation from a central application. This requires custom application development. If you use an external application to manage workspace creation and invitation, it is recommended that the same application also manage archive restoration. If the application needs to support workspace lifecycle management and workspace data integration, it is recommended that each workspace contain two data bridges—one for archive and one for data transactions.
Removing Old Workspaces
Over time, workspaces on the data bridge age, and may no longer be actively used by the human participants. It is also possible that all human members delete the workspace locally, leaving the GDB as the only member. In such cases, the presence of the workspace on the GDB may not be creating much load except during the archive process. Although users are encouraged to manage their workspaces actively, and to delete old or inactive workspaces, it is likely that this will not always be the case. The GDB manager should periodically look for old or inactive workspaces and remove them. It is recommended that the last archive taken of the workspace be preserved in a separate, secure location.
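The periodic review for old or inactive workspaces can be reduced to a filter over last-activity dates. This sketch assumes the GDB manager has already collected the date of last activity for each workspace into a dictionary; how those dates are gathered is outside the sketch, and the 180-day idle threshold is an illustrative assumption.

```python
from datetime import datetime, timedelta


def stale_workspaces(last_activity, now, max_idle=timedelta(days=180)):
    """Return sorted names of workspaces idle for longer than max_idle.

    last_activity: dict mapping workspace name -> datetime of last activity
    now:           reference time for the comparison
    """
    return sorted(ws for ws, ts in last_activity.items() if now - ts > max_idle)
```

The returned list is a set of candidates for review; preserve the last archive of each workspace before removing it, as recommended above.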
Backing Up the Archive File System
Although the data bridge stores multiple cycles of workspace data, it may be desirable to back up the file system on which the archives are stored. It is recommended that the backup process begin 12 hours apart from the start of the workspace archive process as part of a cycle to avoid running both at the same time. You should monitor both processes closely, however, to avoid overlapping. Under no circumstances should a backup agent be run against the directory containing the GDB account and workspace data while the GrooveEIS.exe process is running. This may result in irretrievable account or workspace corruption. If the GDB account and archive data are stored on the same volume, the backup agent should be limited by the file structure, or even better, by file type (.GSA).
The same limitations apply to virus scans that open or lock files. Scans of this type should never scan Groove account or workspace data.
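Limiting a backup agent by file type can be sketched as a selection step that yields only .GSA files under the archive location. The function name is illustrative, and the sketch only builds the file list; it assumes the path passed in is the archive location, never the directory that holds GDB account and workspace data.

```python
from pathlib import Path


def gsa_files(archive_root):
    """Return all .GSA files under archive_root, sorted by path.

    Only files with the .GSA extension (in any letter case) are selected,
    so other file types stored on the same volume are never touched.
    """
    root = Path(archive_root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".gsa"
    )
```

The resulting list can be handed to whatever copy or backup tool is in use, which keeps the selection rule in one place.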
Groove workspaces cannot be easily archived from the file system, but you can archive Groove workspaces using the Groove Data Bridge. You can have members of a workspace manually request the Groove Data Bridge to archive a workspace or you can establish a central application that manages workspace creation and ensures that each workspace is archived. There are many scaling and capacity issues that you must consider when you are planning an archiving system.
For more information, see the following resources: