måndag 28 maj 2012

Java Profiling with Visual VM (and a little Spring)

When aiming for performance improvements a profiler or similar tool is crucial. The profiler is used to identify the biggest bottle necks in you application by indicating where your program spends most time it. Trying to perform software optimization without a profiler is a safe way to waste your time and achieve far from optimal results.

One profiler for Java that I've tried out the last time is the profiler that is part of Visual VM which is a nice tool shipped with the standard JDK since 1.6. Visual VM can be used for a multitude of other things than profiling and if you haven't yet tried it you really should. It can be connected to both local and remote JVMs using jstat and JMX. I'm not going to dive into the details using Visual VM in general there are great resources for that out there.

What I want to show here is an example of a small profiling session. For this simple example a prime generator that writes it's results to disk will be the subject. I'll be using some Spring configuration and dependency injection since I wanted to try it out. That is completely irrelevant to the profiling though.

Step 1, a naive implementation

Below is the complete program including the spring configuration.
// PrimeMain.java
package primegenerator;

import org.springframework.context.support.ClassPathXmlApplicationContext;

public class PrimeMain {

public static void main(String[] args) {
   ClassPathXmlApplicationContext beanFactory = new ClassPathXmlApplicationContext(
                       "PrimeGeneratorApplication.xml");
   PrimeGenerator generator = (PrimeGenerator) beanFactory.getBean("generator");
   generator.generatePrimes(2000000);
   }
}
// FileDestination.java
package primegenerator;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class FileDestination implements IPrimeDestination {

   private final String fileName;

   public FileDestination(String fileName) {
      this.fileName = fileName;
   }

   @Override
   public void put(long prime) {
      Writer out = null;
      try {
         out = new OutputStreamWriter(new FileOutputStream(fileName, true));
         out.write(Long.toString(prime) + "\n");
      } catch (IOException e) {
         e.printStackTrace();
      } finally {
         try {
            if(out != null) {
               out.close();
            }
         } catch (IOException e) {
            e.printStackTrace();
         }
      }
   }
}
// IPrimeDestination.java
package primegenerator;

public interface IPrimeDestination {
   public void put(long prime);
}
// PrimeGenerator.java
package primegenerator;

import org.springframework.beans.factory.annotation.Autowired;

public class PrimeGenerator {

   @Autowired
   private IPrimeDestination destination;

   private boolean isPrime(long number) {
      for(long i = 2; i<Math.sqrt(number); i++) {
         if(number % i == 0) {
            return false;
         }
      }

      return true;
   }

   public void generatePrimes(long upTo) {
      long startTime = System.currentTimeMillis();

      for(long i = 3; i < upTo; i++) {
         if(isPrime(i)) {
            destination.put(i);
         }
      }

      System.out.println("Done generating primes, time = " +
            (System.currentTimeMillis() - startTime) + " ms");
   }
}
// PrimeMain.java
package primegenerator;

import org.springframework.context.support.ClassPathXmlApplicationContext;

public class PrimeMain {

   public static void main(String[] args) {
      ClassPathXmlApplicationContext beanFactory = new ClassPathXmlApplicationContext(
            "PrimeGeneratorApplication.xml");

      PrimeGenerator generator = (PrimeGenerator) beanFactory.getBean("generator");
      generator.generatePrimes(2000000);
   }
}
// ..\PrimeGeneratorApplication.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:aop="http://www.springframework.org/schema/aop"
      xmlns:context="http://www.springframework.org/schema/context"
      xsi:schemaLocation="http://www.springframework.org/schema/beans
                     http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
                     http://www.springframework.org/schema/context
                     http://www.springframework.org/schema/context/spring-context-3.0.xsd">

<context:component-scan base-package="primegenerator" />

<bean id="generator" class="primegenerator.PrimeGenerator" />
<bean id="destination" class="primegenerator.FileDestination">
   <constructor-arg value="primes.txt" />
</bean>

</beans>
Generating all primes under 200000 takes about 12500 milliseconds with above application. To profile the application run kit with an upper limit high enough to allow for some profiling. In the left window your java process will show up and you can choose to connect to it. Then, using the sampler, and selecting CPU will result in a screen like that below.


As seen most time is used handling the file to which the primes are written (~69%) while about 30 % of the time is actually used to generate the primes. More specifically it seems like most of the time is spent on closing the file (which will trigger a flush to disk).
To get the bast results from our optimizations we should hence focus on improving the file handling in the first stage.

Step 2, improved file handling

Let's change the file handling to only open the file when the program starts and close it when the program finishes. To do that we will update the FileDestination class as well as the spring configuration file and the main class to allow Spring to manage the life cycle of  FileDestination so that the file is opened before the first write and closed when the program exits.

Updates as follows.

// FileDestination.java
package primegenerator;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class FileDestination implements IPrimeDestination {

   private final String fileName;
   private Writer out;

   public FileDestination(String fileName) {
      this.fileName = fileName;
   }
   
   public void initBean() {
      out = null;
      try {
         out = new OutputStreamWriter(new FileOutputStream(fileName, true));
      } catch (FileNotFoundException e) {
         e.printStackTrace();
      }

      System.out.println("Init called");
   }
   
   public void destroyBean() {

      try {
         if(out != null) {
            out.close();
         }
      } catch (IOException e) {
         e.printStackTrace();
      }

      System.out.println("Destroy called");         
   }

   @Override
   public void put(long prime) {
      try {
         out.write(Long.toString(prime) + "\n");
      } catch (IOException e) {
         e.printStackTrace();
      }
   }
}
// PrimeMain.java
package primegenerator;

import org.springframework.context.support.ClassPathXmlApplicationContext;

public class PrimeMain {

   public static void main(String[] args) {
      ClassPathXmlApplicationContext beanFactory = new ClassPathXmlApplicationContext(
            "PrimeGeneratorApplication.xml");
      beanFactory.registerShutdownHook();
      
      PrimeGenerator generator = (PrimeGenerator) beanFactory.getBean("generator");
      generator.generatePrimes(200000);
   }
}
// ..\PrimeGeneratorApplication.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:aop="http://www.springframework.org/schema/aop"
      xmlns:context="http://www.springframework.org/schema/context"
      xsi:schemaLocation="http://www.springframework.org/schema/beans
                     http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
                     http://www.springframework.org/schema/context
                     http://www.springframework.org/schema/context/spring-context-3.0.xsd"
      default-init-method="initBean"
      default-destroy-method="destroyBean">

<context:component-scan base-package="primegenerator" />

<bean id="generator" class="primegenerator.PrimeGenerator" />
<bean id="destination" class="primegenerator.FileDestination">
   <constructor-arg value="primes.txt" />
</bean>

</beans>

Through the default-init-method and default-destroy-method in the application context we have now defined a method to be called on all beans when they have been created and initialized and one method to be called when the application context is destroyed. To make the application context aware of when the program is ended PrimeMain has been updated to register the application context with the JVM by calling the registerShutdownHook method.


With this update in place the time taken for all primes under 20000 is now down to 1870 ms. A dramatic speedup indeed! And somewhat surprising maybe since 1870 ms is far less than 30% (which was the time taken by the actual prime generation) of the time in Step 1. We may have reaped additional benefits from out improved I/O that were not obvious before. Examples of such may be improved caching, less context switching for I/O, etc.

Profiling the new application shows the following.



Indeed, the file handling is hardly noticeable now. If we want to improve the performance of this application further we should focus on the algorithm.

Step 3, improved algorithm

Lets switch that inefficient algorithm to a better one. This is a slightly tweaked version of the Sieve of Eratosthenes.

// PrimeGenerator.java
package primegenerator;

import org.springframework.beans.factory.annotation.Autowired;

public class PrimeGenerator {

   @Autowired
   private IPrimeDestination destination;

   public void generatePrimes(int upTo) {
      long startTime = System.currentTimeMillis();

      int N = upTo;

        // assume 2 and all odd integers >= 3 are prime
        boolean[] isPrime = new boolean[N + 1];
        isPrime[2] = true;
        for (int i = 3; i <= N; i+=2) {
            isPrime[i] = true;
        }

        // mark non-primes <= N using Sieve of Eratosthenes
        for (int i = 2; i*i <= N; i++) {

            // if i is prime, then mark multiples of i as nonprime
            // suffices to consider multiples i, i+1, ..., N/i
            if (isPrime[i]) {
                for (int j = i; i*j <= N; j++) {
                    isPrime[i*j] = false;
                }
            }
        }

        // Output primes
        for (int i = 2; i <= N; i++) {
            if (isPrime[i]) destination.put(i);
        }
        
      System.out.println("Done generating primes, time = " +
      (System.currentTimeMillis() - startTime) + " ms");
   }   
}

Now the runtime for all primes under 200000 is down to 120 ms! That's less than 1% of the original runtime.

The profiler shows us the following.


Now I/O is once again the dominating factor. If we want to further improve the performance we should therefor focus on improving that.

I will settle with the current improvements and hope that this was good illustration of how to work with optimizations. Measure -> Tweak -> Measure -> Tweak -> ... Until good enough. Don't be tempted to perform optimizations on code that have not been proven to be a bottle neck. Code optimizations will often result in code that is less clear and harder to maintain.