Monday, May 02, 2016

Excel calculations in F# - The Code, Part II

I promised a bit of a geek-out this time around; I hope you won’t be disappointed.

So far, we’ve seen how to take a general Excel expression of the form =Calculate(expression), extract the relevant bit of text from it using existing Excel interop libraries, and then put whatever result we get from the calculation back onto an Excel sheet.

All that’s left is the core of the matter: doing the calculation. The top-level code looks like this:

let rec CalcFormulaFromText = 
    StringToTokenList
    >> InfixToPostfix
    >> Calculate

At this stage it’s worth introducing some of the domain constructs that I created for the project. The first is that of a Token, which represents an individual unit that can be evaluated. I’ve used a discriminated union to model it as shown below.


The domain model, in brief

type Token = 
    | Operator of Operator
    | LeftBracket
    | RightBracket
    | ValList of float list
    | Val of float
    | Ref of Ref
    | Handle of string

where I’ve gone on to define the subtypes Ref and Operator as

type Ref = 
    { col : int
      row : int }

type Operator = 
    | Exponent = '^'
    | Divide = '/'
    | Multiply = '*'
    | Add = '+'
    | Subtract = '-'

A couple of questions probably cropped up as you scanned through these types:

  • Why are left and right brackets treated differently than other mathematical operators?
    • This is a particular quirk of the algorithm that I’m about to explain which is used in the InfixToPostfix function to convert our expression to Reverse Polish Notation. Basically, it strips brackets from the final form of the expression.
  • Why is Ref even included as a Token? After all, if it just points to another Excel cell, don’t we have the same problem of not being able to directly evaluate it?
    • A valid point. One of the first things I would do to refactor the solution would be to recurse over any references to other cells and replace the reference with the raw expression
  • What’s the obsession with float?
    • This is clearly unjustified, and was put purely for demo purposes. In reality I want to be able to deal with a more arbitrary type of value object, and would probably model things differently.

Parsing a string into a token list

Fundamentally though, I hope you understand the basic domain. We’ve got some operators, and some different kinds of stuff on which they can operate.

So how do we convert a plain old string to a list of these Token instances? I’d like to say that the code was simple, but it’s not. You can find the raw source in ExpressionParser.fs, and I’ll try to pick out some highlights of it here.

let readToken (ts : char list) = 
    match ts with
    | [] -> None
    | _ -> 
        let token = String.Concat(Array.ofList (List.rev ts))
        match token with
        | IsVal v -> Some(Val v)
        | IsRef r -> Some r
        | IsHandle h -> Some(Handle h)
        | _ -> None

let StringToTokenList(str : string) = 
    let rec handleOp op tail currentToken tokens = 
        let token = readToken currentToken
        match token with
        | None -> readChars tail (op :: tokens) List.empty
        | Some token -> readChars tail (op :: token :: tokens) List.empty

    and readChars chars tokens currentToken = 
        match chars with
        | [] -> 
            let token = readToken currentToken
            match token with
            | None -> tokens
            | Some token -> token :: tokens
        | h :: t -> 
            match h with
            | Op -> handleOp (Operator(asOperator h)) t currentToken tokens
            | Space -> readChars t tokens currentToken
            | LBracket -> handleOp LeftBracket t currentToken tokens
            | RBracket -> handleOp RightBracket t currentToken tokens
            | Other -> readChars t tokens (h :: currentToken)

    List.rev (readChars (Array.toList (str.ToCharArray())) List.empty List.empty)

These two functions, StringToTokenList and ReadToken, are the workhorses of the algorithm. The logic works roughly as follows:

  • Start with a string, represented by a list of characters.
  • Take the first character of the string.
    • If it’s an operator then add that to the output Token list
    • If it’s whitespace, ignore it and move onto the next character
    • Otherwise, put the character into a temporary local list and move onto the next.
  • Each time we add an operator to the list, we also check the temporary local list of characters and call readToken to figure out what it is. This function uses Active Patterns to see if we’ve got an int, float, etc.
  • Carry on doing this until the string is empty. At this point, figure out what it is we’ve got left.
  • After a final reverse, we end up with a list of tokens that correspond to our original string. Neat, huh?

The other cool thing is the way we set up our active patterns. Remember we’re treating all numbers as floating points, and so to see if the string we’ve got can be treated as one, we write

let (|IsVal|_|) str = 
    match Int32.TryParse(str) with
    | (true, int) -> Some(float int)
    | _ -> 
        match Double.TryParse(str) with
        | (true, float) -> Some(float)
        | _ -> None

Personally, I think this is a great way to figure out whether you’re dealing with a valid number, and much better than the C# equivalent.


Dijkstra’s shunting-yard

Now we’ve got a list of token items, how do we evaluate them? They’ve got all sorts of operators with different precedences, brackets, etc. You can probably guess the answer already: we can use shunting-yard algorithm to convert our Infix list to a Postfix (i.e. Reverse Polish Notation) list, at which stage the calculation becomes much easier (hint: we’ve done it before!!).

Without further ado, here’s my shunting-yard code:

type PostFixList = PostFixList of Token list

let InfixToPostfix ts = 
    let rec handleOperator operator tokens opStack outQueue = 
        match opStack with // add brackets to the Operator type?
        | [] -> readTokens tokens (Operator operator :: opStack) outQueue
        | Operator head :: tail -> 
            match associativity operator with
            | Left -> 
                if precedence operator <= precedence head then 
                    handleOperator operator tokens tail (outQueue @ [ Operator head ])
                else readTokens tokens (Operator operator :: opStack) outQueue
            | Right -> 
                if precedence operator < precedence head then 
                    handleOperator operator tokens tail (outQueue @ [ Operator head ])
                else readTokens tokens (Operator operator :: opStack) outQueue
        | _ -> readTokens tokens (Operator operator :: opStack) outQueue

    and transferOperators opStack outQueue = 
        match opStack with
        | [] -> outQueue
        | head :: tail -> 
            match head with
            | _ -> transferOperators tail (outQueue @ [ head ])

    and handleBrackets tokens opStack outQueue = 
        match opStack with
        | [] -> []
        | LeftBracket :: t -> readTokens tokens t outQueue
        | h :: t -> handleBrackets tokens t (outQueue @ [ h ])

    and readTokens tokens opStack outQueue = 
        match tokens with
        | head :: tail -> 
            match head with
            | Operator op -> handleOperator op tail opStack outQueue
            | LeftBracket -> readTokens tail (head :: opStack) outQueue
            | RightBracket -> handleBrackets tail opStack outQueue
            | _ -> readTokens tail opStack (outQueue @ [ head ])
        | [] -> transferOperators opStack outQueue

    PostFixList (readTokens ts List.empty List.empty)

One of the first things I did when coding this up was to tag my output type as a PostFixList to make absolutely sure I didn’t get it confused with my input list in the algorithm. This advanced type-safety was immeasurably helpful; I got the work done so much quicker for it, and with far fewer bugs.

The rest of the code is complicated but follows the logic of the Wikipedia article. it uses all the awesome bits of F#: mutual recursion, pattern matching, discriminated unions. I’ll admit it’s not readable but then again it’s not the kind of code that you write every day!


Finally, some actual calculation!

The final bit of the original top-level function does the final calculation. The evaluation code is pretty similar to the RPN calcualtor we did before:

let Eval(PostFixList ls) = 
    let solve items current = 
        match (current, items) with
        | Handle h, _ -> ValList(EvaluateHandleAsFloatList h) :: items
        | Ref r, _ -> (EvaluateReference r) :: items
        | Operator Operator.Add, y :: x :: t -> Add x y :: t
        | Operator Operator.Subtract, y :: x :: t -> Subtract x y :: t
        | Operator Operator.Multiply, y :: x :: t -> Multiply x y :: t
        | Operator Operator.Divide, y :: x :: t -> Divide x y :: t
        | Operator Operator.Exponent, y :: x :: t -> Exponent x y :: t
        | _ -> current :: items
    (ls |> List.fold solve []).Head

The one really great thing I managed to do here was to abstract addition, subtraction etc. from the main function. Rather than saying ‘when we have two things and the Add operator, do x + y, I wrote a custom function Add that is defined as let Add x y = ApplyInfixOperator x y (+), and then created an extensible way to add different types of values (int, float, list):

let rec ApplyInfixOperator x y op =         
    match (x, y) with
    | (Val xv, Val yv) -> Val(xv |> op <| yv)
    | (ValList xvs, Val yv) -> ValList(List.map (fun elt -> elt |> op <| yv) xvs)
    | (Val xv, ValList yvs) -> ValList(List.map (fun elt -> xv |> op <| elt) yvs)
    | (ValList xvs, ValList yvs) -> ValList(List.map2 (fun x y -> x |> op <| y) xvs yvs)
    | (Handle xh, Handle yh) -> ApplyInfixOperator (ValList(EvaluateHandleAsFloatList xh)) (ValList(EvaluateHandleAsFloatList yh)) op
    | (Handle xh, _) -> ApplyInfixOperator (ValList(EvaluateHandleAsFloatList xh)) y op
    | (_, Handle yh) -> ApplyInfixOperator x (ValList(EvaluateHandleAsFloatList yh)) op
    | _ -> raise InvalidInput

This relatively simple function is remarkably powerful: we have used it to do vector addition and multiplication, and can trivially extend it to matrix addition and multiplication, or whatever we like.

I think this might be the most important part of the whole project, as if I want to be able to add two instances of some internal type, all I need to do is add a branch to this function and we’re away.


Next Time

Whilst I haven’t mentioned it here, I tried to do a lot of the algorithm development using a test-driven methodology. Next time, we’ll look at some of the tests and more generally at F#’s support for testing frameworks. This should give a much more intuitive feel of what these Token lists and other domain concepts look like!

No comments:

Post a Comment