Identifying typical deployment issues

Scaling and performance issues in a Centrify Audit & Monitoring Service system are usually easy to identify, and they typically result from poor planning or deployment. The most common deployment issues are described below.

Large spool files on audited systems

A healthy audit and monitoring service system should be able to keep pace with users’ audited activity. When it cannot, either the audited activity is generating too much data (for example, when a user runs the cat command on a very large file) or the audit and monitoring service components (such as collectors and databases) cannot process and store the generated data fast enough. In such cases, you'll typically see large spool files on the audited systems that take a long time to despool completely.
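
As a rough illustration of how spool growth could be watched on an audited system, the following Python sketch totals the size of the files under a spool directory and compares it with a threshold. The directory path and the threshold are assumptions that you supply; they are not Centrify defaults, and the function names are hypothetical.

```python
import os

def spool_size_bytes(spool_dir):
    """Return the total size, in bytes, of all files under spool_dir."""
    total = 0
    for root, _dirs, files in os.walk(spool_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file disappeared (for example, it was despooled)
                # between listing the directory and reading its size.
                continue
    return total

def spool_is_large(spool_dir, threshold_bytes):
    """Return True when the spool directory exceeds the chosen threshold."""
    return spool_size_bytes(spool_dir) > threshold_bytes
```

Sampling this value at regular intervals shows whether the spool is shrinking (the system is catching up) or growing (the system is falling behind).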

Constant high CPU on collector/SQL Server

It's perfectly normal to see high CPU activity on collector and SQL Server machines during peak hours, because this is when data flows continuously from the audited systems to the collector and then to the database. However, sustained high CPU activity during off-peak hours (especially when it doesn’t correspond to the number of active users in the environment at that time) indicates that the audit and monitoring service system is backlogged.
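
The off-peak check described above can be sketched in Python as follows. The sample format (hour of day, CPU percentage), the 80 percent threshold, and the 90 percent fraction are all illustrative assumptions, not values recommended by Centrify or Microsoft.

```python
def offpeak_cpu_is_sustained(samples, peak_hours,
                             cpu_threshold=80.0, min_fraction=0.9):
    """Return True when CPU stays high outside peak hours.

    samples    -- iterable of (hour_of_day, cpu_percent) pairs
    peak_hours -- collection of hours (0-23) treated as peak time
    """
    # Keep only the samples taken outside peak hours.
    offpeak = [cpu for hour, cpu in samples if hour not in peak_hours]
    if not offpeak:
        return False
    # Count how many off-peak samples are at or above the threshold.
    busy = sum(1 for cpu in offpeak if cpu >= cpu_threshold)
    return busy / len(offpeak) >= min_fraction
```

For example, with peak hours defined as range(8, 18), samples of 95 and 92 percent taken at 02:00 and 03:00 would be flagged, whereas a single off-peak spike among otherwise idle samples would not.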

Low despool rate

The despool rate largely depends on the type of data being captured, the network speed and latency between the audited system and the collector, the network speed and latency between the collector and the database, and ultimately the performance of SQL Server itself. Because of these factors, there’s no ideal value or range for the despool rate. However, you should not see a despool rate that’s significantly lower than the rate of data capture, especially when there are no known issues related to network speed or SQL Server performance.
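
The relationship between the despool rate and the capture rate comes down to simple arithmetic: the spool shrinks only when data is despooled faster than it is captured. The Python sketch below estimates how long an existing spool would take to drain. The units (MB and MB per hour) and the function name are assumptions chosen for illustration.

```python
def hours_to_drain(spool_mb, despool_mb_per_hour, capture_mb_per_hour):
    """Estimate the hours needed to empty the spool.

    Returns None when the despool rate does not exceed the capture
    rate, because the spool then never drains.
    """
    net_rate = despool_mb_per_hour - capture_mb_per_hour
    if net_rate <= 0:
        return None
    return spool_mb / net_rate
```

For example, a 600 MB spool that is despooled at 500 MB per hour while new data arrives at 200 MB per hour drains in 600 / (500 - 200) = 2 hours. If the rates are reversed, the spool keeps growing and the function returns None.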

False “Agent disconnected” alerts

Each Agent periodically sends a heartbeat to the database (by way of the collector), and the Audit Manager console relies on this heartbeat to determine whether the Agent is connected. If there are deployment issues with the audit and monitoring service, the Agent heartbeat may not be registered even though the Agent is online, which raises a false alarm because the system is shown as disconnected in the Audit Manager console. Such contradictory information about the status of an audited system is typically indicative of underlying deployment issues.
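
To illustrate why an unregistered heartbeat produces a false alert, the following Python sketch models a generic heartbeat timeout check. It is not Centrify's implementation; the function name and the timeout value are hypothetical.

```python
from datetime import datetime, timedelta

def agent_status(last_registered_heartbeat, now, timeout):
    """Classify an agent from the last heartbeat stored in the database."""
    if now - last_registered_heartbeat <= timeout:
        return "connected"
    return "disconnected"

# Hypothetical scenario: the agent sent heartbeats at 12:00, 12:05, and
# 12:10, but a backlog meant only the 12:00 heartbeat was registered.
timeout = timedelta(minutes=6)
now = datetime(2024, 1, 1, 12, 11)
last_sent = datetime(2024, 1, 1, 12, 10)
last_registered = datetime(2024, 1, 1, 12, 0)

actual = agent_status(last_sent, now, timeout)          # what is true
reported = agent_status(last_registered, now, timeout)  # what the console sees
```

Here the agent is online, but the check can only see the stale 12:00 heartbeat, so it reports the agent as disconnected.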

Too many SQL Server tasks in queue

SQL Server has a limited pool of worker threads that it can use to perform its work. By default, the size of this pool depends on the CPU architecture (32-bit or 64-bit) and the total number of CPUs on the SQL Server. If SQL Server is given more tasks than it has workers available, the excess tasks wait in a work queue, consuming memory and degrading overall system performance. Consult the DBA to confirm whether the environment consistently shows many tasks in the work queue; this can indicate that the workload is too much for this SQL Server to handle. For more information, see the Microsoft article https://msdn.microsoft.com/en-us/library/ms177526(v=sql.105).aspx.
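
As a worked example of how the worker thread count depends on architecture and CPU count, the Python sketch below reproduces the defaults that Microsoft documents for the max worker threads option when it is left at its automatic setting (0). The function name is hypothetical, and the rule for more than 64 CPUs applies only to newer SQL Server releases, so verify the numbers against the documentation for your version.

```python
def default_max_worker_threads(logical_cpus, is_64bit=True):
    """Return the documented default worker thread count.

    Assumes 'max worker threads' is left at 0 (automatic).
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    # Base pool size and per-CPU increment differ by architecture.
    base, per_cpu = (512, 16) if is_64bit else (256, 8)
    if logical_cpus <= 4:
        return base
    # Newer 64-bit releases use a larger increment above 64 CPUs.
    if is_64bit and logical_cpus > 64:
        per_cpu = 32
    return base + (logical_cpus - 4) * per_cpu
```

For example, a 64-bit SQL Server with 8 logical CPUs gets 512 + (8 - 4) * 16 = 576 worker threads by default. Tasks that arrive when all of these workers are busy are the ones that wait in the queue.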