Friday, December 28, 2012

Five things that make Java painful (and some remedies)

I figured I'd present a list of the five things that disturb me the most about Java as a language. Just give me nice standardized solutions to the following issues and I promise not to ask for more... ;-)

I will also present some third party libraries that will remedy some of the pain that these problems cause.

No closures or lambdas

No one on earth has missed the fact that closures and lambdas are _the_ big thing these days (and have been for quite a while; even C++ has them in the latest version of the standard). Without them, many of the nice "functional" (although one can argue about how functional they are) constructs that are present in quite a few modern languages (and old ones, Lisp for example) are ugly or impossible to implement.
I'm really not asking for that much more than what is already in the language today in the form of anonymous classes. It's just that the code gets so damn bloated when using these anonymous classes that you want to rip your eyes out when you look at it a week later.
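To show what I mean, here is a minimal sketch of that bloat using a standard JDK API: sorting strings by length with an anonymous Comparator, where a lambda would express the comparison in one line.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Five lines of anonymous-class ceremony for what a lambda would
// express in one: sorting strings by length with a JDK Comparator.
public class ComparatorPain {
    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        words.add("banana");
        words.add("fig");
        words.add("apple");
        Collections.sort(words, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.length() - b.length();
            }
        });
        System.out.println(words); // prints [fig, apple, banana]
    }
}
```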
Hopefully Java 8 will include closures through project lambda, and hopefully it will be delivered on time in 2013 as planned, and hopefully it will be adopted as the industry-standard Java version really soon thereafter... Too many hopefullys in that sentence to be really trustworthy.

Remedy until then

Check out JPropel. It is a library that (among other things) can transform static functions into anonymous classes that implement a Function interface via a simple annotation. This allows you to pass them around as objects. It ain't closures, but it reduces some of the boilerplate needed to pass function objects around. This actually mimics the way lambdas are implemented in Scala quite closely.

Setters and getters

What a pain to write all those setters and getters. Ok, they can be generated by your IDE... But what a pain to read all those setters and getters! Some claim that getters, and definitely setters, are a design smell. I can agree in some situations, but in others they may actually be required, or at least useful. Still, I don't want to see all that noise; they should just be there. Again, check out Scala: getters can be automatically generated from the constructor parameters, or we could just use public members and later replace them with methods as needed without breaking the interface.
As far as I know there is no short term remedy planned for this in Java.

Possible remedy

Project Lombok has a couple of annotations that let you mark the classes and fields for which setters and/or getters should be generated. Based on these, Lombok will then generate the necessary code. Check out the @Data annotation, for example.
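For comparison, here is the kind of hand-written boilerplate that an annotation like @Data makes disappear (a hypothetical value class, not from any particular project):

```java
// A plain value class with hand-written accessors. With Lombok, roughly
// this whole class shrinks to the two fields plus a single annotation.
public class Person {
    private String name;
    private int age;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    @Override
    public String toString() {
        return "Person(name=" + name + ", age=" + age + ")";
    }

    public static void main(String[] args) {
        Person p = new Person();
        p.setName("Ada");
        p.setAge(36);
        System.out.println(p); // prints Person(name=Ada, age=36)
    }
}
```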


Painful declaration of variables

Declaring variables in Java is a pain, especially when generics are involved. Up until Java 7 it was (is) a real pain, with a previously unseen level of verbosity. Look at this:
final Map<String, List<Integer>> cpuLoad = new HashMap<String, List<Integer>>();

In Java 7 this was somewhat improved:

final Map<String, List<Integer>> cpuLoad = new HashMap<>();

So, what do I actually want? Pretty much what's in Scala today, actually: a more powerful type inference system. We all know the compiler knows the types. Why can't it just infer them for us and let us focus on other things?
The other thing that annoys me about Java reference declarations is the extra final keyword needed to make a variable immutable. I'm a big fan of immutable data: it makes the code easier to reason about and more maintainable. But even I cannot force myself to make the input parameters and local variables in a function final, because of all the clutter.
Also here I think Scala is a language to look at. Mutable variables are declared using the var keyword. Immutable "variables" are declared using the val keyword. The exact same number of letters!


A pain killer

For Java <= 1.6 at least, the Guava library removes some of the boilerplate in declaring generic collections. To my knowledge there is no ongoing effort to make the type inference system in Java any better than it currently is. Guava is worth investigating for other nice features as well!
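The trick Guava uses (in Maps.newHashMap and friends) is a generic static factory method, which lets the compiler infer the type arguments from the assignment target. Here is a self-contained, stdlib-only sketch of the same idea (the factory class here is my own, not Guava's):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A stdlib-only sketch of Guava's static-factory trick: the compiler
// infers <K, V> from the left-hand side, so the type arguments are
// written once instead of twice.
public class CollectionFactories {
    public static <K, V> Map<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    public static void main(String[] args) {
        // No repeated <String, List<Integer>> on the right-hand side.
        final Map<String, List<Integer>> cpuLoad = CollectionFactories.newHashMap();
        System.out.println(cpuLoad.isEmpty()); // prints true
    }
}
```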

No persistent collections

No persistent collections exist in the standard Java API. By persistent I mean collections that are immutable but support efficient "modification" by returning a copy of themselves with the changes applied. Since structural sharing is used, most of the data can often be shared between the old and new collections, so unnecessary copying is reduced.
The persistent collections bring with them all the good of immutable data (easier to reason about, thread safe, ...). They are used by default in JVM languages such as Clojure and Scala.
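To make the structural-sharing idea concrete, here is a minimal sketch of a persistent singly linked list (a hypothetical class, not taken from any of the libraries mentioned): "adding" an element allocates one node and shares the entire old list as its tail.

```java
// A minimal persistent list: immutable nodes, structural sharing.
// "Modification" returns a new list; the old one stays valid.
public final class PList<T> {
    private final T head;
    private final PList<T> tail;

    private PList(T head, PList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // The empty list is represented as null, for brevity of the sketch.
    public static <T> PList<T> empty() {
        return null;
    }

    // O(1): allocates a single node; the tail is shared, not copied.
    public static <T> PList<T> cons(T head, PList<T> tail) {
        return new PList<T>(head, tail);
    }

    public T head() { return head; }
    public PList<T> tail() { return tail; }

    public static void main(String[] args) {
        PList<Integer> xs = cons(1, PList.<Integer>empty());
        PList<Integer> ys = cons(2, xs); // xs is untouched and shared as ys's tail
        System.out.println(ys.head() + " " + ys.tail().head()); // prints 2 1
    }
}
```

Both versions of the list remain usable after the "modification", which is exactly what makes these collections trivially thread safe.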

Libraries do exist that provide this type of collection for Java as well, and this may not be considered a big deal by some. The fact that they are not part of the standard library will limit their use, though, and most of the Java code written today does not utilize these lovely collections (I dare say) even if, in most cases, it would be beneficial.

I don't know of any initiative to introduce a standardized alternative for persistent collections.


An option

The pcollections library is a nice player. Play with it!

No for comprehensions

The lack of for comprehensions (as implemented in Python or Scala, for example) is a shame. Although the for comprehension is just syntactic sugar built on higher order functions such as map and filter, it's a very sweet sugar that can lead to some beautiful solutions. It also makes it possible to introduce constructs close to those of Microsoft's .NET LINQ to dig for data in collections. I don't know if there will ever be anything similar in Java, but the introduction of lambdas and higher order functions will allow Java to move one step closer to it.


An alternative

JPropel (light) has a really nice alternative here as well. Check it out!

Concluding thoughts

So in the end I think this was mostly a rant showing some of my frustration with the (non-moving) state of Java. To me it seems like the biggest problem in the Java development process is that there's too much focus on keeping things backward compatible. Of course you want your old Java 2 code to compile and run on a Java 8 JVM, but do we really have to retrofit the massive API to make use of new features in the language? Can't it just rest in peace? The collections API, for example: how about leaving that behind (maintenance should be minimal given its maturity) and going for a new, more up to date collections API?
In cases where interworking between old and new collections (or other APIs) is actually needed, some adapter code could be introduced.

So, it seems like there are a bunch of other languages running on the JVM that already include most of the stuff I'm asking for. Why not use them? Well, maybe I would in my day to day job, but language mixes (no, I'm currently not on a green field project) often come with their own problems. Languages such as Scala also come with their own weak points (not always directly related to the language): IDE support that still lags behind Java's (although improving), slow compilation, some overly academic features, ...
Still, I would happily try out an alternative JVM language in a real world project! Given today's Java development pace it might just be the only way out.

Sunday, October 7, 2012

Using Sonar to monitor an AspectJ-based application

I started a small project a while ago to learn some more about AspectJ and aspect oriented programming in Java. Recently I decided to try to apply Sonar to it to get some nice quality measures such as code complexity, test coverage and static code analysis to detect potential bugs.

This is a small writeup of my experiences from that experiment.

Installation

Installation was quite straightforward. I fetched the latest version from SonarSource and followed the installation instructions. I've opted not to install Sonar as a background service yet, but instead start it manually whenever I need it. If I later find that I use it a lot, installing it as a service should be as simple as running a batch script.

To run Sonar you must install an SQL database in which the analysis results are stored. I opted for the community version of MySQL since there was an example on the SonarSource website for that particular DB. As I wanted to get started quickly, that was my choice. In hindsight I don't think it would have been any more complicated to use another DB.

Installation of MySQL was equally straightforward. I just ran the supplied installer with the default settings.

Getting Sonar working with MySQL was a simple two step procedure, described below. I will cover it only briefly, as the installation instructions provided on the Sonar website are very good.

Create DB and sonar user in MySQL

Doing this was as simple as running the following lines (found in the installation instructions on the Sonar web page) in the MySQL command shell (mysql -u root -p):
CREATE DATABASE sonar CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE USER 'sonar' IDENTIFIED BY 'sonar';
GRANT ALL ON sonar.* TO 'sonar'@'%' IDENTIFIED BY 'sonar';
GRANT ALL ON sonar.* TO 'sonar'@'localhost' IDENTIFIED BY 'sonar';
FLUSH PRIVILEGES;

Configure Sonar

The Sonar configuration file must be updated with parameters to connect to the DB, as well as port and address configuration. It is found under <SONAR INSTALLATION DIR>/conf/sonar.properties.

# Listen host/port and context path (for example / or /sonar). Default values are 0.0.0.0:9000/.
sonar.web.host:                           0.0.0.0
sonar.web.port:                           9000
sonar.web.context:                        /
.
.
.

#----- Credentials
# Permissions to create tables and indexes must be granted to JDBC user.
# The schema must be created first.
sonar.jdbc.username:                       sonar
sonar.jdbc.password:                       sonar
.
.
.
#----- MySQL 5.x/6.x
# Comment the embedded database and uncomment the following line to use MySQL
sonar.jdbc.url:jdbc:mysql://localhost:3306/sonar?useUnicode=true&characterEncoding=utf8&rewriteBatchedStatements=true

# Optional properties
sonar.jdbc.driverClassName:                com.mysql.jdbc.Driver

Setting up the project for Sonar analysis

My initial plan was to use the Sonar Runner, which is the recommended way to run a Sonar analysis according to the Sonar website. Doing this proved to be easy; I just followed the instructions found here. Running the analysis was equally easy, see the instructions here. After running the analysis, the results were browsable at the URL and port specified above (e.g. http://localhost:9000).
There was one caveat with the Sonar Runner, though. It doesn't execute any code and can hence not be used for execution of automated tests and test coverage analysis. For this, another method had to be used. The recommended approach is to use targets adjusted for your build system; both Maven plugins and Ant tasks are available to run the analysis. For this particular project I use Maven, which made the choice obvious.

Using a Maven plugin to analyze the project

I'll start by dropping the entire POM here and will then discuss the details of it.

<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://maven.apache.org/POM/4.0.0"
   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
   <modelVersion>4.0.0</modelVersion>
   <groupId>datalogger</groupId>
   <artifactId>DataLogger</artifactId>
   <packaging>jar</packaging>
   <version>1.0-SNAPSHOT</version>
   <name>DataLogger</name>
   <url>http://maven.apache.org</url>
   <build>
      <plugins>
         <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>2.7.2</version>
            <configuration>
               <test>**/*.java</test>
            </configuration>
         </plugin>
         <plugin>
            <groupId>org.jacoco</groupId>
            <artifactId>jacoco-maven-plugin</artifactId>
            <version>0.5.10.201208310627</version>
            <executions>
               <execution>
                  <id>jacoco-initialize</id>
                  <goals>
                     <goal>prepare-agent</goal>
                  </goals>
               </execution>
            </executions>
         </plugin>
         <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>aspectj-maven-plugin</artifactId>
            <version>1.4</version>
            <executions>
               <execution>
                  <goals>
                     <goal>compile</goal>
                     <!-- use this goal to weave all your main classes -->
                     <goal>test-compile</goal>
                     <!-- use this goal to weave all your test classes -->
                  </goals>
               </execution>
            </executions>
            <configuration>
               <complianceLevel>1.6</complianceLevel>
            </configuration>
         </plugin>
         <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>sonar-maven-plugin</artifactId>
            <version>2.0</version>
         </plugin>
         <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.7</version>
            <executions>
               <execution>
                  <phase>package</phase>
                  <goals>
                     <goal>shade</goal>
                  </goals>
                  <configuration>
                     <artifactSet>
                        <excludes>
                           <exclude>junit:junit</exclude>
                           <exclude>org.aspectj:aspectjrt</exclude>
                           <exclude>org.slf4j:slf4j-simple</exclude>
                        </excludes>
                     </artifactSet>
                  </configuration>
               </execution>
            </executions>
         </plugin>
      </plugins>
   </build>

   <properties>
      <coverage.reports.dir>${basedir}/target/coverage-reports</coverage.reports.dir>
      <sonar.core.codeCoveragePlugin>jacoco</sonar.core.codeCoveragePlugin>
      <sonar.dynamicAnalysis>reuseReports</sonar.dynamicAnalysis>
   </properties>
   <dependencies>
      <dependency>
         <groupId>org.aspectj</groupId>
         <artifactId>aspectjrt</artifactId>
         <version>1.6.11</version>
      </dependency>
      <dependency>
         <groupId>junit</groupId>
         <artifactId>junit</artifactId>
         <version>4.10</version>
      </dependency>
      <dependency>
         <groupId>com.google.code.gson</groupId>
         <artifactId>gson</artifactId>
         <version>1.7.1</version>
      </dependency>
      <dependency>
         <groupId>org.slf4j</groupId>
         <artifactId>slf4j-api</artifactId>
         <version>1.6.4</version>
      </dependency>
      <dependency>
         <groupId>org.slf4j</groupId>
         <artifactId>slf4j-simple</artifactId>
         <version>1.6.4</version>
      </dependency>
   </dependencies>
</project>

There are four plugins of interest here: the Surefire plugin, which is used to execute the test cases; the JaCoCo plugin, which is used to instrument the classes for coverage measurement (using a Java agent); the Sonar plugin, which performs the actual analysis; and the AspectJ plugin, which performs compile time weaving of my aspects.
I initially had some problems getting any coverage reports, since the code I wanted to test was actually the aspects and they weren't woven into the code the way I wanted. The problem was that I never managed to get the load time weaving that I initially used (via a Java agent) to work together with the code coverage agent of JaCoCo. Switching to compile time weaving using the AspectJ plugin did the trick.
Also, to get the AspectJ plugin to work properly I had to explicitly set the compliance level to 1.6.

With the above configuration and POM, I now have full blown quality measures for my project without having to set up and configure every tool individually.

Wednesday, June 27, 2012

Three flavors of Brainfuck in Clojure

I was inspired to implement a Brainfuck interpreter in Clojure after a workshop at work where we started doing the same in Haskell. I ended up doing a couple of implementations on different themes to explore their different characteristics.

Approach one

This was my first attempt. It features no global variables and a different approach to parsing the program: instead of managing the parsing of loops ourselves with some loop depth counter, the parsing is left to the built in function read-string, which takes a string and builds whatever Clojure data structures it can from it. Brainfuck is similar enough to a nested Clojure vector to be treated like one; we just have to make sure that some characters that are special in Clojure are escaped properly before applying read-string.
Overall the solution feels a bit hacky. The workings of parse-bf aren't obvious, although it's quite compact. exec-bf-internal, which is the brain of the interpreter, is quite large and contains quite a bit of duplication. Overall it was a nice try though. :-)


;; brainfuck.clj
(ns brainfuck
  (:use clojure.test)
  (:use midje.sweet))

; Dynamic meta data needed since 1.3 in order to override with dynamic binding.
; Only used for test really
(defn ^:dynamic read-input [] (.read *in*))

(defn str= [a b] (= (str a) (str b)))
(defn end? [elem] (nil? elem))
(defn loop? [elem] (sequential? elem))
(defn plus? [elem] (str= elem "+"))
(defn minus? [elem] (str= elem "-"))
(defn left? [elem] (str= elem "<"))
(defn right? [elem] (str= elem ">"))
(defn put? [elem] (str= elem "."))
(defn get? [elem] (str= elem ","))

(defn- exec-bf-internal [program the-memory the-mem-pos]
  (loop [remain program
         memory the-memory
         mem-pos the-mem-pos]
    (let [elem (first remain)]
      (cond
        (end? elem) {:mem memory :pos mem-pos}
        (loop? elem) (if (> (nth memory mem-pos) 0)
                       ; Enter loop body
                       (let [{:keys [mem pos]} (exec-bf-internal elem memory mem-pos)]
                         (recur remain mem pos))
                       ; Skip loop body
                       (recur (rest remain) memory mem-pos))                   
        (plus? elem) (recur (rest remain)
                            (assoc memory mem-pos (inc (nth memory mem-pos)))
                            mem-pos)
        (minus? elem) (recur (rest remain)
                             (assoc memory mem-pos (dec (nth memory mem-pos)))
                             mem-pos)
        (right? elem) (recur (rest remain)
                             memory
                             (inc mem-pos))
        (left? elem) (recur (rest remain)
                            memory
                            (dec mem-pos))
        (get? elem) (recur (rest remain)
                           (assoc memory mem-pos (int (read-input)))
                           mem-pos)
        (put? elem) (do
                      (print (char (nth memory mem-pos)))
                      (recur (rest remain)
                             memory
                             mem-pos))
        
        ; Not a symbol which we recognize, just ignore it to allow comments, etc.
        :default (recur (rest remain)
                        memory
                        mem-pos)))))

(defn- escape-conflicts [string]
  (clojure.string/replace string "," "\",\""))
    
(defn- parse-bf [program]
  ; Make use of the Clojure like syntax and create a vector (nested if loops exist)
  ; containing the program by using the built in read-string
  ; This turned out to be quite hacky as Clojure considers "," as air so they had to
  ; be escaped before transforming the program into a Clojure vector
  (read-string (escape-conflicts (clojure.string/join " " (seq (str "[" program "]"))))))

(defn exec-bf [program]
  (exec-bf-internal (parse-bf program) (vec (replicate 10000 0)) 0)
  1) ; Avoid returning the memory in the wrapping function

Approach two

This approach is more similar to most simple imperative solutions that I've seen. It uses a couple of global atoms to hold state such as memory and program position. This makes the overall program structure a bit cleaner, since there's no need to pass the entire state around. On the other hand, it makes the individual functions harder to test. As always, global variables become worse and worse as programs grow (compare with instance variables in large classes).
This implementation also keeps track of loop state on its own. It is also quite a bit slower than the first one.

(ns brainfuck2
  (:use clojure.test)
  (:use midje.sweet))

; Dynamic meta data needed since 1.3 in order to override with dynamic binding.
; Only used for test really
(defn ^:dynamic read-input [& args] (.read *in*))

(def ^:dynamic *program*)

(def memory (atom nil))
(def mem-pos (atom nil))
(def prog-pos (atom nil))

(defn- fetch-instruction []
  (nth *program* @prog-pos))

(defn- current-value []
  (let [pos @mem-pos
        val (nth @memory @mem-pos)]
    val))

(defn- enter-loop? []
  (not (zero? (current-value))))

(defn- exit-loop? []
  (zero? (current-value)))

(defn- doto-mem [op]
  (swap! memory assoc @mem-pos (op (current-value))))

(defn- doto-mem-pos [op]
  (swap! mem-pos op))

(defn- doto-prog-pos [op]
  (swap! prog-pos op))

(defn- put-char []
  (print (char (current-value))))

(defn- get-char []
  (doto-mem #(int (read-input %))))

; Used for searching block boundaries both forward and backwards
(defn- goto-block-bound [inc-fn dec-fn]
  (loop [level 1]
    (when (> level 0)
      (do
        (doto-prog-pos inc-fn)
        (recur (condp = (fetch-instruction)
                 \[ (inc-fn level)
                 \] (dec-fn level)
                 level))))))

(defn- goto-block-start []
  (goto-block-bound dec inc))

(defn- goto-block-end []
  (goto-block-bound inc dec))

(defn- execute-instruction []
  (case (fetch-instruction)
    \[ (when (not (enter-loop?))
         (goto-block-end))
    \] (when (not (exit-loop?))
           (goto-block-start))
    \+ (doto-mem inc)
    \- (doto-mem dec)
    \> (doto-mem-pos inc)
    \< (doto-mem-pos dec)
    \. (put-char)
    \, (get-char)
    :default)) ; Ignore all other symbols to allow comments, etc.

(defn- exec-bf-internal []
  (when (< @prog-pos (count *program*))
    (do 
      (execute-instruction)
      (doto-prog-pos inc)
      (recur))))

(defn- reset-state []
  (reset! memory (vec (replicate 10000 0)))
  (reset! mem-pos 0)
  (reset! prog-pos 0))

(defn exec-bf [program]
  (reset-state)
  (binding [*program* (seq program)] (exec-bf-internal)))

Approach three

The biggest difference between this one and the previous two is that this one is no longer an interpreter for Brainfuck but a compiler: in the first pass it compiles the program into a number of anonymous Clojure functions, which are then executed in the second pass.
This allows for optimizations such as the simple ones found in compile-arithm and compile-mem-step, where multiple consecutive arithmetic and memory stepping operations are merged into one. This is by far the fastest of the three implementations shown here.
;; brainfuck5.clj

(ns brainfuck5
  (:use clojure.test)
  (:use midje.sweet))

(defn ^:dynamic read-input [] (.read *in*))

(def ^:dynamic *program*)

(def prog-pos (atom nil))

(def memory (atom nil))
(def mem-pos (atom nil))

(defn- fetch-instruction []
  (get *program* @prog-pos))
  
(defn- step-program-counter []
  (swap! prog-pos inc))

(defn- current-value []
  (nth @memory @mem-pos))

(defn- doto-mem [op]
  (swap! memory assoc @mem-pos (op (current-value))))

(defn- doto-mem-pos [op]
  (swap! mem-pos op))

(defn- put-char []
  (print (char (current-value))))

(defn- get-char []
  (doto-mem (fn [_] (int (read-input)))))

(declare compile-block execute-block)
(defn- compile-loop []
  (let [loop-block (compile-block)]
    (fn []
      (when (not (zero? (current-value)))
        (do
          (execute-block loop-block)
          (recur))))))

(defn- compile-arithm [start]
  (loop [steps start]
    (case (fetch-instruction)
      \+ (do
           (step-program-counter)
           (recur (inc steps)))
      \- (do
           (step-program-counter)
           (recur (dec steps)))
      (fn [] (doto-mem #(+ % steps))))))

(defn- compile-mem-step [start]
  (loop [steps start]
    (case (fetch-instruction)
      \> (do
           (step-program-counter)
           (recur (inc steps)))
      \< (do
           (step-program-counter)
           (recur (dec steps)))
      (fn [] (doto-mem-pos #(+ % steps))))))

(defn- compile-put []
  #(put-char))

(defn- compile-get []
  #(get-char))

(defn- compile-block []
  (loop [compiled []]
    (if-let [inst (fetch-instruction)]
      (do
        (step-program-counter)
        (case inst
          \[ (recur (conj compiled (compile-loop)))
          \] compiled
          \+ (recur (conj compiled (compile-arithm 1)))
          \- (recur (conj compiled (compile-arithm -1)))
          \> (recur (conj compiled (compile-mem-step 1)))
          \< (recur (conj compiled (compile-mem-step -1)))
          \. (recur (conj compiled (compile-put)))
          \, (recur (conj compiled (compile-get)))
          (recur compiled))) ; Ignore all other symbols to allow comments, etc.
      compiled))) 

(defn- execute-block [block]
  (doseq [instruction block]
    (instruction)))

(defn- exec-bf-internal []
  (let [prog (compile-block)]
    (execute-block prog)
    1))

(defn- reset-state []
  (reset! prog-pos 0)
  (reset! memory (vec (replicate 10000 0)))
  (reset! mem-pos 0))

(defn exec-bf [program]
  (reset-state)
  (binding [*program* (into [] (seq program))] (exec-bf-internal)))

Some tests

These are some tests written in Midje to verify the functionality of the above programs.
;; brainfuck5.clj

; Some tests
(def a-str (str "+++++++++++++++++++++++++++++++++"
                "+++++++++++++++++++++++++++++++++"
                "+++++++++++++++++++++++++++++++"))

(def hello-world (str "++++++++++[>+++++++>++++++++++>+++>+<<<<-]"
                      ">++.>+.+++++++..+++.>++.<<+++some comments in the middle"
                      "++++++++++++."
                      ">.+++.------.--------.>+.>."))

(defn dummy-read []
  \a)

(fact (with-out-str (exec-bf (str a-str "."))) => "a")
(fact (with-out-str (exec-bf (str a-str "+-."))) => "a")
(fact (with-out-str (exec-bf (str a-str ">" a-str "++" ".<."))) => "ca")
(fact (with-out-str (exec-bf "+++++[[>+++++++++++++++++++<-]]>++.")) => "a")
(fact (with-out-str (exec-bf hello-world)) => "Hello World!\n")
(fact (binding [read-input dummy-read] (with-out-str (exec-bf ",+."))) => "b")

Monday, May 28, 2012

Java Profiling with Visual VM (and a little Spring)

When aiming for performance improvements, a profiler or similar tool is crucial. The profiler is used to identify the biggest bottlenecks in your application by indicating where your program spends most of its time. Trying to perform software optimization without a profiler is a safe way to waste your time and achieve far from optimal results.

One profiler for Java that I've tried out lately is the one that is part of Visual VM, a nice tool shipped with the standard JDK since 1.6. Visual VM can be used for a multitude of things other than profiling, and if you haven't tried it yet you really should. It can be connected to both local and remote JVMs using jstat and JMX. I'm not going to dive into the details of using Visual VM in general; there are great resources for that out there.

What I want to show here is an example of a small profiling session. For this simple example, a prime generator that writes its results to disk will be the subject. I'll be using some Spring configuration and dependency injection since I wanted to try it out; that is completely irrelevant to the profiling though.

Step 1, a naive implementation

Below is the complete program including the Spring configuration.
// PrimeMain.java
package primegenerator;

import org.springframework.context.support.ClassPathXmlApplicationContext;

public class PrimeMain {

   public static void main(String[] args) {
      ClassPathXmlApplicationContext beanFactory = new ClassPathXmlApplicationContext(
            "PrimeGeneratorApplication.xml");
      PrimeGenerator generator = (PrimeGenerator) beanFactory.getBean("generator");
      generator.generatePrimes(2000000);
   }
}
// FileDestination.java
package primegenerator;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class FileDestination implements IPrimeDestination {

   private final String fileName;

   public FileDestination(String fileName) {
      this.fileName = fileName;
   }

   @Override
   public void put(long prime) {
      Writer out = null;
      try {
         out = new OutputStreamWriter(new FileOutputStream(fileName, true));
         out.write(Long.toString(prime) + "\n");
      } catch (IOException e) {
         e.printStackTrace();
      } finally {
         try {
            if(out != null) {
               out.close();
            }
         } catch (IOException e) {
            e.printStackTrace();
         }
      }
   }
}
// IPrimeDestination.java
package primegenerator;

public interface IPrimeDestination {
   public void put(long prime);
}
// PrimeGenerator.java
package primegenerator;

import org.springframework.beans.factory.annotation.Autowired;

public class PrimeGenerator {

   @Autowired
   private IPrimeDestination destination;

   private boolean isPrime(long number) {
      for(long i = 2; i <= Math.sqrt(number); i++) { // <= so perfect squares are handled correctly
         if(number % i == 0) {
            return false;
         }
      }

      return true;
   }

   public void generatePrimes(long upTo) {
      long startTime = System.currentTimeMillis();

      for(long i = 3; i < upTo; i++) {
         if(isPrime(i)) {
            destination.put(i);
         }
      }

      System.out.println("Done generating primes, time = " +
            (System.currentTimeMillis() - startTime) + " ms");
   }
}
// ..\PrimeGeneratorApplication.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:aop="http://www.springframework.org/schema/aop"
      xmlns:context="http://www.springframework.org/schema/context"
      xsi:schemaLocation="http://www.springframework.org/schema/beans
                     http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
                     http://www.springframework.org/schema/context
                     http://www.springframework.org/schema/context/spring-context-3.0.xsd">

<context:component-scan base-package="primegenerator" />

<bean id="generator" class="primegenerator.PrimeGenerator" />
<bean id="destination" class="primegenerator.FileDestination">
   <constructor-arg value="primes.txt" />
</bean>

</beans>
Generating all primes under 2000000 takes about 12500 milliseconds with the above application. To profile the application, run it with an upper limit high enough to allow for some profiling. In the left window your Java process will show up and you can choose to connect to it. Then, using the sampler and selecting CPU, you will get a screen like the one below.


As can be seen, most of the time (~69%) is spent handling the file to which the primes are written, while only about 30% is spent actually generating the primes. More specifically, most of that time goes to closing the file (which triggers a flush to disk).
To get the best results from our optimizations we should hence focus on improving the file handling first.

Step 2, improved file handling

Let's change the file handling to open the file once when the program starts and close it when the program finishes. To do that we will update the FileDestination class as well as the Spring configuration file and the main class, letting Spring manage the life cycle of FileDestination so that the file is opened before the first write and closed when the program exits.

Updates as follows.

// FileDestination.java
package primegenerator;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class FileDestination implements IPrimeDestination {

   private final String fileName;
   private Writer out;

   public FileDestination(String fileName) {
      this.fileName = fileName;
   }
   
   public void initBean() {
      out = null;
      try {
         out = new OutputStreamWriter(new FileOutputStream(fileName, true));
      } catch (FileNotFoundException e) {
         e.printStackTrace();
      }

      System.out.println("Init called");
   }
   
   public void destroyBean() {

      try {
         if(out != null) {
            out.close();
         }
      } catch (IOException e) {
         e.printStackTrace();
      }

      System.out.println("Destroy called");         
   }

   @Override
   public void put(long prime) {
      try {
         out.write(Long.toString(prime) + "\n");
      } catch (IOException e) {
         e.printStackTrace();
      }
   }
}
// PrimeMain.java
package primegenerator;

import org.springframework.context.support.ClassPathXmlApplicationContext;

public class PrimeMain {

   public static void main(String[] args) {
      ClassPathXmlApplicationContext beanFactory = new ClassPathXmlApplicationContext(
            "PrimeGeneratorApplication.xml");
      beanFactory.registerShutdownHook();
      
      PrimeGenerator generator = (PrimeGenerator) beanFactory.getBean("generator");
      generator.generatePrimes(200000);
   }
}
// ..\PrimeGeneratorApplication.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:aop="http://www.springframework.org/schema/aop"
      xmlns:context="http://www.springframework.org/schema/context"
      xsi:schemaLocation="http://www.springframework.org/schema/beans
                     http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
                     http://www.springframework.org/schema/context
                     http://www.springframework.org/schema/context/spring-context-3.0.xsd"
      default-init-method="initBean"
      default-destroy-method="destroyBean">

<context:component-scan base-package="primegenerator" />

<bean id="generator" class="primegenerator.PrimeGenerator" />
<bean id="destination" class="primegenerator.FileDestination">
   <constructor-arg value="primes.txt" />
</bean>

</beans>

Through the default-init-method and default-destroy-method attributes in the application context we have now defined a method to be called on every bean once it has been created and initialized, and one to be called when the application context is destroyed. To make the application context aware of the program ending, PrimeMain has been updated to register a shutdown hook with the JVM by calling the registerShutdownHook method.


With this update in place the time taken to generate all primes under 200000 is down to 1870 ms. A dramatic speedup indeed! And perhaps somewhat surprising, since 1870 ms is far less than 30% (the share previously spent on the actual prime generation) of the time in Step 1. We may have reaped additional benefits from our improved I/O that were not obvious before, such as better caching and less context switching for I/O.
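Another small I/O improvement, not part of the original program but sketched here as an idea, would be to wrap the writer in a BufferedWriter so that many small writes are batched into fewer system calls:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class BufferedDestinationSketch {
   public static void main(String[] args) throws IOException {
      // try-with-resources closes (and thereby flushes) the writer when done
      try (BufferedWriter out = new BufferedWriter(new FileWriter("primes.txt"))) {
         for (long prime : new long[] { 2, 3, 5, 7, 11 }) {
            out.write(Long.toString(prime));
            out.newLine();
         }
      }
   }
}
```

In the actual FileDestination this would simply amount to wrapping the OutputStreamWriter created in initBean in a BufferedWriter.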

Profiling the new application shows the following.



Indeed, the file handling is hardly noticeable now. If we want to improve the performance of this application further we should focus on the algorithm.

Step 3, improved algorithm

Let's switch that inefficient algorithm for a better one. This is a slightly tweaked version of the Sieve of Eratosthenes.

// PrimeGenerator.java
package primegenerator;

import org.springframework.beans.factory.annotation.Autowired;

public class PrimeGenerator {

   @Autowired
   private IPrimeDestination destination;

   public void generatePrimes(int upTo) {
      long startTime = System.currentTimeMillis();

      int N = upTo;

      // assume 2 and all odd integers >= 3 are prime
      boolean[] isPrime = new boolean[N + 1];
      isPrime[2] = true;
      for (int i = 3; i <= N; i += 2) {
         isPrime[i] = true;
      }

      // mark non-primes <= N using the Sieve of Eratosthenes
      for (int i = 2; i * i <= N; i++) {
         // if i is prime, mark its multiples as non-prime;
         // it suffices to consider the multipliers j = i, i+1, ..., N/i
         if (isPrime[i]) {
            for (int j = i; i * j <= N; j++) {
               isPrime[i * j] = false;
            }
         }
      }

      // output the primes
      for (int i = 2; i <= N; i++) {
         if (isPrime[i]) destination.put(i);
      }

      System.out.println("Done generating primes, time = " +
            (System.currentTimeMillis() - startTime) + " ms");
   }
}

Now the runtime for all primes under 200000 is down to 120 ms! That's less than 1% of the original runtime.

The profiler shows us the following.


Now I/O is once again the dominating factor. If we want to improve the performance further we should therefore focus on that.

I will settle for the current improvements and hope that this was a good illustration of how to work with optimizations: measure -> tweak -> measure -> tweak -> ... until good enough. Don't be tempted to optimize code that has not been proven to be a bottleneck. Code optimizations often result in code that is less clear and harder to maintain.

tisdag 24 april 2012

Leiningen on Windows/Cygwin

I've started to experiment with Noir, a web framework for Clojure that seems really cool. In doing so I also made my first close encounter with Leiningen, which seems really convenient.
Since I'm using Windows (strange, huh?) I decided to run Leiningen through Cygwin (you also have the option of running a batch file, but I haven't tried that yet). In doing so I found a small problem with running Leiningen plugins.

Downloading and installing the plugin seemed to work as planned:

$ lein plugin install lein-noir 1.2.1
[INFO] Unable to find resource 'lein-noir:lein-noir:jar:1.2.1' in repository central (http://repo1.maven.org/maven2)
Copying 1 file to C:\cygwin\tmp\lein-db2bc8b7-7c28-4c97-a2f1-71caaa04d00c\lib
Including lein-noir-1.2.1.jar
Including clojure-1.2.1.jar
Created lein-noir-1.2.1.jar


When trying to run the plugin it doesn't seem to be installed however:

$ lein noir new first-site
That's not a task. Use "lein help" to list all tasks.

Seems like the plugin isn't installed after all... It turns out that Leiningen, Cygwin and Windows do not seem to have the same idea of where your home directory is located. All plugins should be installed under $HOME/.lein/plugins/.


What you need to do is locate where Leiningen stored your new plugin. For example by doing:
$ find . -name 'lein-noir-1.2.1.jar'

Once you know where it is you simply have to copy it to $HOME/.lein/plugins/:

$ cp <Location from previous step>/lein-noir-1.2.1.jar $HOME/.lein/plugins/



That's it! You should now be able to use your newly acquired plugin.

$ lein noir new first-site
Creating noir project:  first-site
Creating new dirs at:  D:\Development\noir\first-site
Adding files...
Project created!


söndag 8 april 2012

Project Euler


So, how are things going with my Clojure? Pretty good I think.

I've started out writing solutions to some of the problems (about 10 so far) presented at Project Euler (the site is currently down, that's how I got the time to log a post here :-)). I've also done a kata or two that I've previously done in other languages. There's also a new project taking shape in my head that I've started writing some code for, but I'll write more about that some other time.

I think that there are a number of aspects on learning a new language through Project Euler (PE) that may be of interest.

Project Euler is not designed to teach you programming

As stated at the start page at project Euler:
Project Euler exists to encourage, challenge, and develop the skills and enjoyment of anyone with an interest in the fascinating world of mathematics.
The primary objective is hence (from my interpretation) to dig into the world of mathematics rather than into the world of programming. You simply happen to do some programming along the way to solve the problems presented to you. This is usually done with small snippets of code that (so far) haven't exceeded ten lines.

Since the form of the problem is always the same (create a function that calculates a number answering a question) and the domain is always mathematics (in particular discrete mathematics, so far), many aspects of real world programming are left out. Examples include I/O handling, user interaction, database access and all the other stuff that's not hardcore algorithm development. In my experience that accounts for at least 90% of all programming time in my professional Telecom/ERPish life.

Furthermore, no feedback is given regarding quality aspects of your solution other than the correctness of the particular case it is designed to solve. Unless you really focus on finding weak spots and smells in the code you're producing, you're not likely to find them, since the solutions are usually not complicated, or large, enough to make the smell stink.

The size of the solutions is another problem if you were to use PE as your only source of programming experience. As I mentioned earlier, I haven't yet seen solutions in any language stretching beyond 20 lines of code. This means that you never get close to features of object oriented languages (and Clojure) such as polymorphism, which should be considered essential. You usually get away with one or two loop constructs, knowing a list type and a bit of array juggling, and are never really pushed beyond those features. One sign of this is the many (often short and elegant) solutions people present in x86 assembler. It shouldn't be possible to write a short and elegant solution in assembler to a programming problem that's meant to push your limits as a high level programmer! ;-)

I think that the sizing problem is a bigger issue for someone trying to learn many aspects of a mainstream OO language such as Java rather than for someone trying to learn a functional language such as Lisp/Clojure. My impression is that design in the small (sub-function level) and design in the large (class and package composition) has more in common in functional languages than it does in the mainstream OO languages.

How you still can learn a lot of programming from Project Euler
Given my above findings you might think that I think PE sucks if you want to learn programming. On the contrary! I think it's great! You just have to take some things into consideration when using it as a tool for learning programming. These are a few things that have worked out really well for me so far.

Allow yourself to rush to the solution

I usually feel a bit stressed to find the solution to a problem once I've read and understood it. As soon as I've opened a new problem I immediately want to start throwing code at it to find the solution and get that big green check that you get when entering the correct answer.

The way I'm trying to cope with this is to force myself to take at least a minute or two to contemplate the mathematics behind the problem. After that I allow myself to start hacking away until I reach green, knowing that I made it (with respect to the only criterion measured at PE).

But then stop, reflect and redo

Once I have a confirmed solution to a problem I force myself to go back and look it over to see if I can find a better (cleaner, faster, more idiomatic) or different (using another technique) solution.

Once I've done that I usually hit the PE forum for the problem at hand to see what others have come up with (in languages other than Clojure). That's usually a good source of inspiration for another solution or two.

When that's done I usually hit Clojure-Euler to browse solutions to the problem written in Clojure. That's a great way to pick up some really nice idiomatic Clojure and learn a couple of new functions or macros that you're not likely to have heard of before.

Do more programming than PE

Do not use PE as your only source of exercises when learning a new language. Start a toy project that complements the things you learn from PE.
An ideal such project would not require a lot of algorithmic thought; that you will get from PE for sure. Instead, try to focus on other areas (I/O handling, user interaction, some multi-threading, etc.). A chat client/server is one example of a program that fulfills a number of the above criteria.

Stop when you feel like it

There will probably come a time when you discover that you learn more mathematics than programming from the problems. Then you should stop and consider whether continued work on PE is really the best way forward.

Closing Notes

All in all, doing PE problems has been a great experience so far. Every new problem is like opening a small present. I know that I still have a lot more to do (about 360, I think :-)) and that some of the points I'm trying to make above may be proven wrong once I move up the ladder. I'll write a new post when I reach 100 solved problems (if I ever do)! :-)

onsdag 21 mars 2012

What's on my bedside and in my head

I'm always reading two or three development related books in parallel. I try to keep the number down and aim at never working through more than two books simultaneously. More than two causes a tempo that's too slow and a momentum that's too low, and that's usually what makes me quit books. In a way I guess it's self adjusting, but I usually want to finish the books I start.

Anyway...

Programming Pearls
I just finished reading the second edition of Programming Pearls by Jon Bentley. The first edition was written in 1985 and this edition in 1999. In other words, this is a real classic. :-)
The reason I got interested in the book was an anagram coding kata described by Dave Thomas. After doing the kata a few times with various approaches, I Googled the problem to see what alternative solutions there were. I think a post on Stack Overflow pointed me in the direction of this book, which presents a complete and very elegant solution in less than 20 lines of C code. In addition to being very elegant, it's also very fast.

When reading that solution I felt I wanted to see more code like that and figured there might be more of interest in the book. The book is focused on classic (but of course still very relevant) programming. By that I mean that it has a high focus on algorithms and data structures rather than program structure and code organization (a.k.a. Design Patterns et al.).
There are a lot of discussions covering various sorting and searching algorithms and suitable data structures for them. There are also about a zillion references to Donald Knuth's The Art of Computer Programming, which is clearly dear to the author.
Other topics covered in the book include program verification, back-of-the-envelope estimation and code tuning.
The verification parts were nice to read and (obviously) not fully aligned with today's TDD trends. Rather, the author leans on setting up invariants describing facts that must hold at various places in the code, allowing you to reason about its correctness. To me this looks like something not too far from what has been tried in formal program verification.
Back-of-the-envelope estimation is a practice used far too seldom when assessing whether the system we've built (or are about to build) is up to the task it's meant for. The book gives some good figures on which you can lean when doing these estimates, as well as some good techniques and advice.
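As a toy example of such an estimate (my own, not from the book): roughly how much output does a program that writes all primes below 200000 to a text file produce? There are 17984 primes below 200000, most of them six digits long:

```java
public class EnvelopeEstimate {
   public static void main(String[] args) {
      long primeCount = 17984;    // pi(200000), the number of primes below 200000
      long bytesPerLine = 6 + 1;  // roughly six digits plus a newline
      long totalBytes = primeCount * bytesPerLine;
      System.out.println(totalBytes / 1024 + " KiB"); // prints "122 KiB"
   }
}
```

So on the order of a hundred kilobytes; the estimate tells us the output size is trivial, and any I/O cost comes from how we write, not how much.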

Overall the book was a nice read and there were some really nice pearls in there, although I have to confess that I didn't find any of the others as nice as the anagram program.

Spring in Action
The other book I've been reading lately (and still am) is the third edition of Spring in Action by Craig Walls. I'm reading it more to grasp what the enterprise frameworks used in Java are all about than for Spring specific knowledge. I figured that since Spring is such a widespread and almighty framework, an introduction to it would give me the introduction to the "Enterprise World" that I needed.

Even though I'm not a huge fan of these heavy duty frameworks with a ton of XML, annotations and even some byte code manipulation, the book is a nice read. It serves my purposes quite well.
Spring is a topic far too wide for one book if you want to dig really deep, but this introduction will show you the basics and let you attain the knowledge needed to search for more information on your own.

What's next?
Now I will (finally!) allow myself to go Clojure in my reading but I'll tell you more about that in another post...

onsdag 22 februari 2012

(Re)Starting with Clojure

After attending JFokus last week I was inspired to do a lot of things. That's usually the way things go for me when I attend conferences.

One of the things I got really keen on doing was picking up Clojure again. I spent some time trying to pick up parts of the language about a year ago but not enough for it to really stick.

This time I went for the Eclipse plugin Counterclockwise. Eclipse is the IDE I currently use at work and I try to spend as much time in it as possible to get fluent. My heart tells me that I should use Emacs when programming LISP but Counterclockwise has worked out really well. It's quite minimal but functional. Just in line with LISP I guess.

After installing Counterclockwise and getting the REPL running I started looking for test frameworks for Clojure. I'm a TDD fan and feel most comfortable when able to write tests before writing the code to make them pass.

What I found (in addition to clojure.test, which is what ships with the Clojure distribution) was Midje, which seems to have some really nice features. I've just gotten it up and running, written my first tests and watched them both fail and succeed. I'll have to play around with it a bit more to get to know it better.

My plan now is to do a couple of katas to get my Clojure skills up to a level where I can actually produce a couple of lines of code without having to Google each and every function.

I also bought The Joy of Clojure, which was not available last time I spent time with Clojure. My expectations for the book are quite high as I've read some really positive comments on it.

Back to Clojure!



torsdag 16 februari 2012

Hi!

My plan for this blog is to write down my thoughts on various software issues and to document some of the software activities that I involve myself in.