Using the Application Surveillance Component

Overview

The application surveillance component helps diagnose applications that occasionally respond slowly. It is most useful in complex setups, for example where remote calls span multiple servers, which is usually quite hard to diagnose.

Essentially, the Surveillance component keeps track of how long a task (such as the invocation of an EJB method) takes. If the time exceeds a configured number of seconds (default 60), it logs a warning. It can then either dump the JVM stack traces of all running threads on the server experiencing the problem, or dump JVM stack traces on all servers connected to PortalProtect.

Note that this feature should rarely be used in production, since dumping JVM information is an expensive operation on some Java virtual machines (generally the newer the JVM, the cheaper it is to do).
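
As a rough illustration of what a local dump (stack traces of all running threads) involves, the sketch below uses the standard Thread.getAllStackTraces() call. The class name ThreadDumpSketch and its method are hypothetical; how PortalProtect itself produces its dumps is not described in this section.

```java
import java.util.Map;

// Hypothetical helper, not part of PortalProtect: one way to dump all thread stacks.
class ThreadDumpSketch {

    // Builds a textual dump of every live thread in the current JVM.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

Collecting stack traces for every thread requires the JVM to inspect all of them, which is one reason the operation can be costly on a busy server.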

There are two ways of using the surveillance component. You can call the start() and stop() methods explicitly, or you can get a “Proxy” wrapper for an object, where every method call on that object causes start() and stop() to be invoked automatically, thus tracing all method calls on the object.

All task names passed to the start/stop methods show up in the statistics for the agent they are running under, so you can see how long they take to complete on average.
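
To make the idea of per-task timing statistics concrete, here is a minimal sketch of how start/stop pairs could feed an average duration per task name. It is an assumption-laden illustration only: the class TaskTimerSketch, its fields, and the callCount() and averageMillis() methods are hypothetical and do not reflect the actual Surveillance implementation. It also does not handle nested tasks on the same thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; the real Surveillance class is not shown in this document.
class TaskTimerSketch {
    // Per task name: [0] = number of completed calls, [1] = total elapsed nanoseconds.
    private static final Map<String, long[]> STATS = new ConcurrentHashMap<>();
    // Task currently running on this thread, and the time it started.
    private static final ThreadLocal<String> TASK = new ThreadLocal<>();
    private static final ThreadLocal<Long> STARTED = new ThreadLocal<>();

    static void start(String taskName) {
        TASK.set(taskName);
        STARTED.set(System.nanoTime());
    }

    static void stop() {
        String task = TASK.get();
        if (task == null) {
            return; // stop() called without a matching start()
        }
        long elapsed = System.nanoTime() - STARTED.get();
        // Replace the entry with a new array so readers never see a half-updated value.
        STATS.merge(task, new long[] {1, elapsed},
                (a, b) -> new long[] {a[0] + b[0], a[1] + b[1]});
        TASK.remove();
        STARTED.remove();
    }

    static long callCount(String taskName) {
        long[] s = STATS.get(taskName);
        return s == null ? 0 : s[0];
    }

    static double averageMillis(String taskName) {
        long[] s = STATS.get(taskName);
        return (s == null || s[0] == 0) ? 0.0 : (s[1] / 1_000_000.0) / s[0];
    }
}
```

Keeping the running task in a ThreadLocal is what lets stop() be called without arguments: each thread only ever closes the task it opened itself.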

Configuration

A couple of configuration entries enable this functionality. They are explained in more detail in the configuration reference section of this document, but essentially two properties can be set for each agent:

surveillance.warningseconds specifies the number of seconds that must pass before a task is considered to be hanging and an action can be taken.

surveillance.action.xxx, where xxx is a number between 1 and 512, specifies which action to take for which task. The value of the property has the format <taskname>=<action>, where taskname is the name of the task being executed: the same name as passed to the start() method or, if the proxy invocation method is used, the class name of the object passed to the getProxyFor() method. Wildcards can be used in the taskname, and the entries are matched in the order they are specified (surveillance.action.1 to surveillance.action.512). The action can be “none”, “dumpjvm” or “dumpjvmall”. “none” does nothing; “dumpjvm” attempts to log a dump of the local JVM (stack traces of all running threads); “dumpjvmall” broadcasts a command to all JVMs connected to the PortalProtect config server, asking them to dump their JVM, which makes it possible to diagnose problems affecting multiple servers.

Examples of configuration parameters could be:

surveillance.warningseconds=60
surveillance.action.1=ejb*=dumpjvmall
surveillance.action.2=prod*=dumpjvm
surveillance.action.512=*=none
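
The ordered, first-match-wins lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the class ActionResolverSketch and its methods are hypothetical, "*" is assumed to match any sequence of characters, matching is assumed to be case-sensitive, and "none" is assumed as the result when no entry matches. Property values are assumed to follow the <taskname>=<action> format.

```java
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical sketch of resolving surveillance.action.N entries in numeric order.
class ActionResolverSketch {

    // Converts a wildcard pattern to a regex, assuming '*' means "any characters".
    static Pattern toRegex(String wildcard) {
        String[] parts = wildcard.split("\\*", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(".*");
            }
            sb.append(Pattern.quote(parts[i])); // literal text between wildcards
        }
        return Pattern.compile(sb.toString());
    }

    // Returns the action of the lowest-numbered entry whose task pattern matches.
    static String resolve(Properties props, String taskName) {
        for (int i = 1; i <= 512; i++) {
            String value = props.getProperty("surveillance.action." + i);
            if (value == null) {
                continue; // gaps in the numbering are skipped
            }
            int eq = value.lastIndexOf('=');
            if (eq < 0) {
                continue; // not in <taskname>=<action> form
            }
            String pattern = value.substring(0, eq);
            String action = value.substring(eq + 1);
            if (toRegex(pattern).matcher(taskName).matches()) {
                return action;
            }
        }
        return "none"; // assumed default when nothing matches
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("surveillance.action.1", "ejb*=dumpjvmall");
        props.setProperty("surveillance.action.2", "prod*=dumpjvm");
        props.setProperty("surveillance.action.512", "*=none");
        System.out.println(resolve(props, "ejbOrderService"));
        System.out.println(resolve(props, "production batch"));
        System.out.println(resolve(props, "something else"));
    }
}
```

Because the first matching entry wins, a catch-all pattern such as * belongs at the highest number; placed at surveillance.action.1 it would shadow every more specific entry.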

Examples

Example of calling directly:

import dk.itp.portalprotect.surveillance.Surveillance;
...
try {
   Surveillance.start("calling EJB server", "method");
   someserver.doStuffThatMightBeSlow();
} finally {
   Surveillance.stop();
}


Note that it is important to use try{…} finally{} blocks to make sure that stop() is always called no matter what happens.

Example of calling via Proxy wrapper:

import dk.itp.portalprotect.surveillance.Surveillance;
...
Iservice service = getServiceFromSomewhere();
service = (Iservice)Surveillance.getProxyFor(service);
 
service.doStuffThatMightBeSlow();
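
For readers curious how such a wrapper can trace every call, the sketch below shows the general technique with the standard java.lang.reflect.Proxy class. It is not the implementation of getProxyFor(): ProxySketch, its wrap() method, the Greeter interface, and the recording start()/stop() stand-ins are all hypothetical names introduced for illustration. The task name is taken from the class name of the wrapped object, as described in the configuration section.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a tracing proxy; not the actual Surveillance code.
class ProxySketch {
    // Stand-ins for Surveillance.start()/stop(): they only record what happened.
    static final List<String> EVENTS = new ArrayList<>();

    static void start(String task, String method) {
        EVENTS.add("start " + task + " " + method);
    }

    static void stop() {
        EVENTS.add("stop");
    }

    // Example interface used only for this illustration.
    interface Greeter {
        String greet(String name);
    }

    // Wraps target so that every call through iface is bracketed by start()/stop().
    static <T> T wrap(T target, Class<T> iface) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, args) -> {
                    start(target.getClass().getName(), method.getName());
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the exception thrown by the target
                    } finally {
                        stop(); // always runs, mirroring the try/finally rule
                    }
                });
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        Greeter real = name -> "hello " + name;
        Greeter traced = wrap(real, Greeter.class);
        System.out.println(traced.greet("world"));
        System.out.println(EVENTS);
    }
}
```

Dynamic proxies of this kind only work through interfaces, which is consistent with the example above casting the result back to an interface type.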

© Ceptor ApS. All Rights Reserved.