Planning an audit and monitoring service deployment

System integrators often rely on the number of audited systems to estimate hardware requirements and to shape the overall strategy for an audit and monitoring service deployment. For example, an environment with 100 audited systems may look like a small setup, leading to the incorrect conclusion that it is a small-scale deployment that does not need powerful hardware behind it. Once deployed, however, such assumptions can result in a system that scales poorly and delivers poor performance, both when capturing audit activity and when querying the audit data that has already been captured.

Below are a few factors that you must consider before making any deployment decisions.

SQL Server

Of all the components in the audit and monitoring service ecosystem, SQL Server is the most heavyweight and bears most of the workload. Using a properly equipped and optimally configured SQL Server is therefore very important. The version and edition of SQL Server in use (such as Express, Standard, or Enterprise) and the type of machine hosting it (virtual or physical) can noticeably affect overall performance. Conversely, a poorly configured SQL Server can perform very poorly no matter how powerful the underlying hardware is.
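
As a quick planning check, the Python sketch below confirms the edition and version of the SQL Server instance that will host the audit databases; the driver name and server name are placeholders for illustration, not values taken from this whitepaper.

    # Minimal sketch: report the edition and version of the target SQL Server.
    # Server name and authentication details are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sql-audit-01;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()
    row = cursor.execute(
        "SELECT SERVERPROPERTY('Edition'), SERVERPROPERTY('ProductVersion')"
    ).fetchone()
    print(f"Edition: {row[0]}, Version: {row[1]}")
    conn.close()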

Number of concurrently audited users

The number of audited systems is not always a good sizing metric on its own. For example, an environment may have only a handful of systems but a large number of users logging into them every day; a jumpbox scenario such as Citrix XenApp Server is a perfect example. When planning, size for the number of concurrently audited users, not just the total number of audited systems, as illustrated in the sketch below. User activity patterns and behaviors also play an important role in overall performance and storage requirements. For example, the captured audit data will be much smaller in an environment where logins are rare than in a network control environment where audited users log on and off throughout the day. The sizing guidelines given later in this whitepaper are based on workload simulations for exactly this reason.
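
The rough Python sketch below compares sizing by system count with sizing by concurrent users for a hypothetical jumpbox environment; the per-user data rate is an assumed figure for demonstration, not one taken from this whitepaper.

    # Illustrative sketch: size by concurrent users, not by system count.
    AUDITED_SYSTEMS = 5            # e.g. a small Citrix XenApp farm
    PEAK_CONCURRENT_USERS = 200    # users logged on at the same time
    MB_PER_USER_PER_DAY = 40       # assumed average captured data per user

    # Naive estimate that treats each system as roughly one active user.
    naive_estimate = AUDITED_SYSTEMS * MB_PER_USER_PER_DAY
    user_based_estimate = PEAK_CONCURRENT_USERS * MB_PER_USER_PER_DAY

    print(f"Sized by system count:     {naive_estimate} MB/day")
    print(f"Sized by concurrent users: {user_based_estimate} MB/day")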

What needs to be captured

What is being captured controls the overall workload on the various components. Capturing video is more expensive, in terms of disk usage and load on the collectors and SQL Server, than not capturing it. Similarly, capturing full interactive sessions always produces more audit data than capturing a handful of commands, putting the system under more pressure. Capturing large quantities of data has another side effect: it slows down database backups and other maintenance processes, which database administrators rarely appreciate.
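
The sketch below illustrates how much the capture scope alone can change the daily storage requirement; the per-session sizes are assumptions for demonstration only and should be replaced with measurements from your own environment.

    # Illustrative sketch: capture scope vs. daily storage.
    SESSIONS_PER_DAY = 1_000
    MB_PER_SESSION_COMMANDS_ONLY = 0.5   # command/summary capture only (assumed)
    MB_PER_SESSION_WITH_VIDEO = 15.0     # full session video capture (assumed)

    commands_only = SESSIONS_PER_DAY * MB_PER_SESSION_COMMANDS_ONLY
    with_video = SESSIONS_PER_DAY * MB_PER_SESSION_WITH_VIDEO

    print(f"Command-only capture: {commands_only:,.0f} MB/day")
    print(f"Video capture:        {with_video:,.0f} MB/day")
    print(f"Difference:           {with_video - commands_only:,.0f} MB/day")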

Who needs to be audited

Who is being audited is equally important. Under default settings, the audit and monitoring service audits everything and everybody, which may not be practical in many large environments. In production environments it is very common to see processes or scheduled tasks that periodically monitor UNIX/Linux or Windows systems for their health by remotely executing certain commands (system monitoring and management software such as BMC Patrol, which periodically runs the vmstat or iostat command on each UNIX/Linux system, is a good example). Activities like these needlessly generate thousands of audited sessions every day and in many cases place a tremendous load on the entire audit and monitoring service system.
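
As a rough illustration of the scale involved, the sketch below estimates how many audited sessions a monitoring tool alone can generate; the polling interval and system count are assumptions chosen only for demonstration.

    # Illustrative sketch: sessions generated by a monitoring tool alone.
    AUDITED_SYSTEMS = 100
    POLLS_PER_HOUR = 4             # e.g. an assumed health check every 15 minutes

    monitoring_sessions_per_day = AUDITED_SYSTEMS * POLLS_PER_HOUR * 24
    print(f"Audited sessions from monitoring alone: "
          f"{monitoring_sessions_per_day:,} per day")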

UNIX/Linux and Windows

The type of system being audited influences the amount of data captured from it and the overall CPU load on the collectors. For example, a Windows audited system almost always generates more data per day than a UNIX/Linux audited system with a comparable number of concurrent users. This also means that an environment with Windows audited systems will most likely be more demanding, in terms of hardware resources, than an environment with the same number of UNIX/Linux audited systems.
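
The sketch below estimates the daily capture volume for a hypothetical mixed estate; the per-system rates are assumptions chosen only to reflect the point above that Windows systems tend to generate more data than UNIX/Linux systems.

    # Illustrative sketch: mixed Windows / UNIX estate (assumed rates).
    WINDOWS_SYSTEMS, MB_PER_WINDOWS_SYSTEM = 60, 120
    UNIX_SYSTEMS, MB_PER_UNIX_SYSTEM = 40, 30

    daily_mb = (WINDOWS_SYSTEMS * MB_PER_WINDOWS_SYSTEM
                + UNIX_SYSTEMS * MB_PER_UNIX_SYSTEM)
    print(f"Estimated capture volume: {daily_mb:,} MB/day "
          f"(~{daily_mb * 30 / 1024:.1f} GB per 30 days)")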

Query performance

Query performance is a factor that is often overlooked. Capturing user activity and storing it in the database in a reasonable time is important, but so is being able to search those records within a predictable time frame, irrespective of the combined size and number of databases in the Centrify Audit & Monitoring Service system.
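
One simple way to keep an eye on this is to time a representative query as the databases grow. The sketch below does so in Python with pyodbc; the connection string is a placeholder and the table name in the query is hypothetical rather than part of the actual audit store schema, so substitute a query that matches your own reporting needs.

    # Minimal sketch: measure how long a representative audit query takes.
    import time
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sql-audit-01;DATABASE=AuditStore01;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    start = time.perf_counter()
    cursor.execute("SELECT COUNT(*) FROM dbo.SomeAuditTable")  # placeholder query
    row_count = cursor.fetchone()[0]
    elapsed = time.perf_counter() - start

    print(f"{row_count:,} rows counted in {elapsed:.2f} s")
    conn.close()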

Audit data retention policy

The audit data retention policy dictates how many days of data should be online and readily available for querying, and this number varies from one enterprise to another. Pay special attention to the data retention requirements in the target environment. A longer retention policy typically results in large databases, which also suffer from poor query performance if they are not well maintained. Conversely, rotating databases too frequently also results in poor query performance if you keep too many inactive databases attached to the audit store.
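
The sketch below relates an assumed retention policy and rotation interval to the amount of online storage and the number of databases attached to the audit store; all figures are illustrative.

    # Illustrative sketch: retention vs. online storage and attached databases.
    import math

    DAILY_CAPTURE_GB = 8           # assumed average daily capture volume
    RETENTION_DAYS = 90            # how long data must stay online
    ROTATION_DAYS = 30             # how often the active database is rotated

    online_storage_gb = DAILY_CAPTURE_GB * RETENTION_DAYS
    attached_databases = math.ceil(RETENTION_DAYS / ROTATION_DAYS)

    print(f"Online audit data: ~{online_storage_gb} GB")
    print(f"Databases attached to the audit store: ~{attached_databases}")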

System overheads

Keep in mind the overhead caused by the Centrify Audit & Monitoring Service system itself; a number of background jobs are carried out by its various components, including the audited systems themselves, the collectors, and the Audit Management Server. These include activities such as sending the audited system's heartbeat to the database (by way of a collector), sending the collector's heartbeat to the database, processing the active sessions list, processing and synchronizing audit role information with Active Directory group criteria, calculating the effective size of audited sessions, storing license usage information in Active Directory, and more.
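
As a rough, hypothetical illustration of how this background traffic accumulates, the sketch below estimates the number of heartbeat records written per day; the heartbeat interval used is an assumption for demonstration, not the product's actual default.

    # Rough sketch: daily heartbeat records from audited systems and collectors.
    AUDITED_SYSTEMS = 500
    COLLECTORS = 4
    HEARTBEAT_INTERVAL_MIN = 5     # assumed interval, not the product default

    heartbeats_per_day = (AUDITED_SYSTEMS + COLLECTORS) * (24 * 60 // HEARTBEAT_INTERVAL_MIN)
    print(f"Heartbeat records sent to the database per day: ~{heartbeats_per_day:,}")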

Latency

Geography and network topology play an important role because they introduce latency. For example, an environment may have only a handful of audited systems, but if they are not geographically co-located, you may see delays in getting the audited user activity to its final destination, the database server; the same can happen if audited systems are not connected to their collectors by a network link with reasonable bandwidth. A general rule of thumb is to use the audit store concept to group together audited systems, collectors, and databases that are connected by a high-speed network.
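
The sketch below is a back-of-the-envelope check of whether a WAN link can keep up with the audit traffic from a remote site; both the capture rate and the available bandwidth are assumptions for demonstration only.

    # Illustrative sketch: will a WAN link keep up with the capture rate?
    CAPTURE_RATE_MBIT_S = 6        # assumed average audit traffic from a remote site
    WAN_BANDWIDTH_MBIT_S = 4       # assumed usable bandwidth toward the collector/database

    if CAPTURE_RATE_MBIT_S > WAN_BANDWIDTH_MBIT_S:
        backlog_mb_per_hour = (CAPTURE_RATE_MBIT_S - WAN_BANDWIDTH_MBIT_S) * 3600 / 8
        print(f"Link cannot keep up; backlog grows by ~{backlog_mb_per_hour:,.0f} MB per hour")
    else:
        print("Link can keep up with the average capture rate")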