onsdag 27 juni 2012

Three flavors of Brainfuck in Clojure

I was inspired to implement a Brainfuck interpretator in Clojure after a workshop at work were we started doing the same in Haskell. I ended up doing a couple of implementations on different themes to explore their different characteristics.

Approach one

This was my first attempt, it features no global variables and a different approach to parsing the program. Instead of managing the parsing of loops ourselves by some loop depth counter the parsing has been left to the built in function read-string. It takes a string and makes whatever Clojure data structures it can from that string. Since Brainfuck has some similarities to a nested Clojure vector, enough to treat it like one. We just have to make sure that some special characters in Clojure are escaped properly before applying read-string.
Overall the solution feels a bit hacky. The workings of the parse-bf isn't obvious although quite compact. exec-bf-internal which is the the brain of the interpreter is quite large and contains quite a bit of duplication. Overall it was a nice try though. :-)


;; brainfuck.clj
(ns brainfuck
  (:use clojure.test)
  (:use midje.sweet))

; Dynamic meta data needed since 1.3 in order to override with dynamic binding.
; Only used for test really
(defn ^:dynamic read-input [] (.read *in*))

(defn str= [a b] (= (str a) (str b)))
(defn end? [elem] (nil? elem))
(defn loop? [elem] (sequential? elem))
(defn plus? [elem] (str= elem "+"))
(defn minus? [elem] (str= elem "-"))
(defn left? [elem] (str= elem "<"))
(defn right? [elem] (str= elem ">"))
(defn put? [elem] (str= elem "."))
(defn get? [elem] (str= elem ","))

(defn- exec-bf-internal [program the-memory the-mem-pos]
  (loop [remain program
         memory the-memory
         mem-pos the-mem-pos]
    (let [elem (first remain)]
      (cond
        (end? elem) {:mem memory :pos mem-pos}
        (loop? elem) (if (> (nth memory mem-pos) 0)
                       ; Enter loop body
                       (let [{:keys [mem pos]} (exec-bf-internal elem memory mem-pos)]
                         (recur remain mem pos))
                       ; Skip loop body
                       (recur (rest remain) memory mem-pos))                   
        (plus? elem) (recur (rest remain)
                            (assoc memory mem-pos (inc (nth memory mem-pos)))
                            mem-pos)
        (minus? elem) (recur (rest remain)
                             (assoc memory mem-pos (dec (nth memory mem-pos)))
                             mem-pos)
        (right? elem) (recur (rest remain)
                             memory
                             (inc mem-pos))
        (left? elem) (recur (rest remain)
                            memory
                            (dec mem-pos))
        (get? elem) (recur (rest remain)
                           (assoc memory mem-pos (int (read-input)))
                           mem-pos)
        (put? elem) (do
                      (print (char (nth memory mem-pos)))
                      (recur (rest remain)
                             memory
                             mem-pos))
        
        ; Not a symbol which we recognize, just ignore it to allow comments, etc.
        :default (recur (rest remain)
                        memory
                        mem-pos)))))

(defn- escape-conflicts [string]
  (clojure.string/replace string "," "\",\""))
    
(defn- parse-bf [program]
  ; Make use of the Clojure like syntax and create a vector (nested if loops exist)
  ; containing the program by using the built in read-string
  ; This turned out to be quite hacky as Clojure considers "," as air so they had to
  ; be escaped before transforming the program into a Clojure vector
  (read-string (escape-conflicts (clojure.string/join " " (seq (str "[" program "]"))))))

(defn exec-bf [program]
  (exec-bf-internal (parse-bf program) (vec (replicate 10000 0)) 0)
  1) ; Avoid returning the memory in the wrapping function

Approach two

This approach is more similar to most simple imperative solutions that I've seen. It uses a couple of global atoms to hold state such as memory and program position. This makes the overall program structure a bit cleaner since there's no need to pass the entire state around. On the other hand it makes the individual functions harder to test. As always global variables become worse and worse as programs grow (compare to instance variables in large classes).
This implementation also keeps track of loop state on its own. It is also quite a bit slower than the first one.

(ns brainfuck2
  (:use clojure.test)
  (:use midje.sweet))

; Dynamic meta data needed since 1.3 in order to override with dynamic binding.
; Only used for test really
(defn ^:dynamic read-input [& args] (.read *in*))

(def ^:dynamic *program*)

(def memory (atom nil))
(def mem-pos (atom nil))
(def prog-pos (atom nil))

(defn- fetch-instruction []
  (nth *program* @prog-pos))

(defn- current-value []
  (let [pos @mem-pos
        val (nth @memory @mem-pos)]
    val))

(defn- enter-loop? []
  (not (zero? (current-value))))

(defn- exit-loop? []
  (zero? (current-value)))

(defn- doto-mem [op]
  (swap! memory assoc @mem-pos (op (current-value))))

(defn- doto-mem-pos [op]
  (swap! mem-pos op))

(defn- doto-prog-pos [op]
  (swap! prog-pos op))

(defn- put-char []
  (print (char (current-value))))

(defn- get-char []
  (doto-mem #(int (read-input %))))

; Used for searching block boundaries both forward and backwards
(defn- goto-block-bound [inc-fn dec-fn]
  (loop [level 1]
    (when (> level 0)
      (do
        (doto-prog-pos inc-fn)
        (recur (condp = (fetch-instruction)
                 \[ (inc-fn level)
                 \] (dec-fn level)
                 level))))))

(defn- goto-block-start []
  (goto-block-bound dec inc))

(defn- goto-block-end []
  (goto-block-bound inc dec))

(defn- execute-instruction []
  (case (fetch-instruction)
    \[ (when (not (enter-loop?))
         (goto-block-end))
    \] (when (not (exit-loop?))
           (goto-block-start))
    \+ (doto-mem inc)
    \- (doto-mem dec)
    \> (doto-mem-pos inc)
    \< (doto-mem-pos dec)
    \. (put-char)
    \, (get-char)
    :default)) ; Ignore all other symbols to allow comments, etc.

(defn- exec-bf-internal []
  (when (< @prog-pos (count *program*))
    (do 
      (execute-instruction)
      (doto-prog-pos inc)
      (recur))))

(defn- reset-state []
  (reset! memory (vec (replicate 10000 0)))
  (reset! mem-pos 0)
  (reset! prog-pos 0))

(defn exec-bf [program]
  (reset-state)
  (binding [*program* (seq program)] (exec-bf-internal)))

Approach 3

The biggest difference between this one and the previous is that this one is no longer an interpreter for Brainfuck but instead a compiler. In the first pass it compiles the program into a number of anonymous Clojure functions that are then executed in the second pass.
This allows for optimization's such as the simple ones found in compile-arithm and compile-mem-step where multiple consecutive arithmetic and memory stepping operations are merged into one. This is by far the fastest of the three implementations shown here.
;; brainfuck5.clj

(ns brainfuck5
  (:use clojure.test)
  (:use midje.sweet))

(defn ^:dynamic read-input [] (.read *in*))

(def ^:dynamic *program*)

;(def state (atom nil))
(def prog-pos (atom nil))

(def memory (atom nil))
(def mem-pos (atom nil))

(defn- fetch-instruction []
  (get *program* @prog-pos))
  
(defn- step-program-counter []
  (swap! prog-pos inc))

(defn- current-value []
  (nth @memory @mem-pos))

(defn- doto-mem [op]
  (swap! memory assoc @mem-pos (op (current-value))))

(defn- doto-mem-pos [op]
  (swap! mem-pos op))

(defn- put-char []
  (print (char (current-value))))

(defn- get-char []
  (doto-mem (fn [_] (int (read-input)))))

(def compile-block)
(def execute-block)
(defn- compile-loop []
  (let [loop-block (compile-block)]
    (fn []
      (when (not (zero? (current-value)))
        (do
          (execute-block loop-block)
          (recur))))))

(defn- compile-arithm [start]
  (loop [steps start]
    (case (fetch-instruction)
      \+ (do
           (step-program-counter)
           (recur (inc steps)))
      \- (do
           (step-program-counter)
           (recur (dec steps)))
      (fn [] (doto-mem #(+ % steps))))))

(defn- compile-mem-step [start]
  (loop [steps start]
    (case (fetch-instruction)
      \> (do
           (step-program-counter)
           (recur (inc steps)))
      \< (do
           (step-program-counter)
           (recur (dec steps)))
      (fn [] (doto-mem-pos #(+ % steps))))))

(defn- compile-put []
  #(put-char))

(defn- compile-get []
  #(get-char))

(defn- compile-block []
  (loop [compiled []]
    (if-let [inst (fetch-instruction)]
      (do
        (step-program-counter)
        (case inst
          \[ (recur (conj compiled (compile-loop)))
          \] compiled
          \+ (recur (conj compiled (compile-arithm 1)))
          \- (recur (conj compiled (compile-arithm -1)))
          \> (recur (conj compiled (compile-mem-step 1)))
          \< (recur (conj compiled (compile-mem-step -1)))
          \. (recur (conj compiled (compile-put)))
          \, (recur (conj compiled (compile-get)))
          (recur compiled))) ; Ignore all other symbols to allow comments, etc.
      compiled))) 

(defn- execute-block [block]
  (doseq [instruction block]
    (instruction)))

(defn- exec-bf-internal []
  (let [prog (compile-block)]
    (execute-block prog)
    1))

(defn- reset-state []
  (reset! prog-pos 0)
  (reset! memory (vec (replicate 10000 0)))
  (reset! mem-pos 0))

(defn exec-bf [program]
  (reset-state)
  (binding [*program* (into [] (seq program))] (exec-bf-internal)))

Some tests

These are some tests written in Midje to verify the functionality of the above programs.
;; brainfuck5.clj

; Some tests
(def a-str (str "+++++++++++++++++++++++++++++++++"
                "+++++++++++++++++++++++++++++++++"
                "+++++++++++++++++++++++++++++++"))

(def hello-world (str "++++++++++[>+++++++>++++++++++>+++>+<<<<-]"
                      ">++.>+.+++++++..+++.>++.<<+++some comments in the middle"
                      "++++++++++++."
                      ">.+++.------.--------.>+.>."))

(defn dummy-read []
  \a)

(fact (with-out-str (exec-bf (str a-str "."))) => "a")
(fact (with-out-str (exec-bf (str a-str "+-."))) => "a")
(fact (with-out-str (exec-bf (str a-str ">" a-str "++" ".<."))) => "ca")
(fact (with-out-str (exec-bf "+++++[[>+++++++++++++++++++<-]]>++.")) => "a")
(fact (with-out-str (exec-bf hello-world)) => "Hello World!\n")
(fact (binding [read-input dummy-read] (with-out-str (exec-bf ",+."))) => "b")