Wednesday, October 26, 2016

Back to the Start


This week, I’ve had the dubious pleasure of doing a code review of my first ever production F# code. The application behaviour needed to be changed, and so another team member has had to look at my F# code and change it.

As we all know, looking at your old code can often make you shudder, so when it’s some of the first real code you wrote in a particular language that feeling gets even stronger!

I thus went about reviewing the changes with three primary questions in mind:

  • Do the changes to the code follow the right ‘architecture’? (i.e. what one normally looks for in a code review)
  • Is there anything in the original version that should definitely be changed?
  • How do I feel now, reading the original version?

What did I learn?


Point-free problems


In my haste to be idio(ma)tic, I made a number of my main API functions point-free.

As the functions are called from C# projects, this has the unfortunate effect of making them compile down to instances of the FSharpFunc class, meaning that it we have a function f with a tacit parameter x, it must be called in C# using f.Invoke(x). Yuck!

Instead, if we explicitly write the parameter (so let f = (...) becomes let f x = (...)x, the function compiles to a static method and can be called using f(x). Much better!

This kind of hidden, gnarly situation is one that should have made me think during the original implementation, but I was clearly blindsided by my newly-acquired functional programming knowledge and keen to apply unnecessary techniques.


Option and Result aren’t obvious


My absolute favourite thing about F# is its ability to do domain modelling — I challenge you to watch this video and not fall a little bit in love with the ML type system.

Once you get the hang of it, things come very naturally: when things might fail, use Result rather than throwing exceptions. When they might be missing, use Option rather than null. When you want something that is a string but not all strings are valid, use a single-case DU.

If you haven’t seen this stuff before, then exceptions, null, and plain old strings will still be your go-to constructs.

This made me realise the importance of sharing around my knowledge, specifically when it comes to F# domain modelling. A large part of my upcoming F# workshop will focus on how to model complex domains with the F# type system, and if I can share this with the community I can definitely share it with my team-mates!


Custom operators confuse


Just by looking, do you know what the operators =, & and | do?

Boolean logic, right?

What about >>=, >=> and <*>?

Er, what?

If you’re a keen functional programmer you might know that the latter three are used in monadic composition and represent bind, Kleisli composition and apply. You might even be able to look at your function signature and know exactly which of the three you need to glue your functions together.

Or, you might not.

It’s a bit of a crime to assume too much theoretical knowledge on the part of your collaborators (present and potential future). If you understand something only through a lot of reading and research, don’t kid yourself that the next person to read your code will be so impassioned by the new concepts to bone up on them for hours on end.

Keep it simple! At the very least, for every custom operator you need to expose the underlying function that it represents.

Being more prudent, you shouldn’t be using these kind of operators in code that’s (by some measure) likely to be often modified. Sure, if it’s a core library with some well-known functions then use custom operators sparingly, but if your business logic is peppered with (>=>) you’re doing it wrong.


Help your Discriminated Unions


Some features of F# need a little work to carry them over to C#. The main one I’m thinking about is the discriminated union (of which I include Option<T> as a member).

Typically I end up writing a few helper functions to deal with using DUs in C#:

  • ToString and FromString functions (basically these) that allow you to use your custom DUs from C# code by writing a simple wrapper for your type.
  • Specific C# helpers for the Option type that wrap the F# IsSome and IsNone types, as well as something to create a new FSharpOption.
  • An OptionConverter based on this one that enables (de)serlialsation using Json.NET.

What became clear was that it’s not obvious that these even exist to another developer, let alone how and when to use them.

The solution: curate these methods, put them in a separate project and share them using a NuGet package.

Alternatively, I could use something like F# extras which no doubt do what I want (but also do a lot, lot more!)


Guard your patterns


When I looked over the code, I realised that I didn’t ever use a guard in one of my match expressions.

There were definitely times when I should have done — take a look at this line:

if input.AnalysisDate = DateTime.MinValue || input.AnalysisDate = (DateTime.FromOADate 0.0) then 
    Failure "A valid analysis date must be supplied"
else Success input

In my view, this can be rehashed as:

match input.AnalysisDate with
| x when x = DateTime.MinValue || x = (DateTime.FromOADate 0.0) 
    -> Failure "A valid analysis date must be supplied"
| _ -> Success input

At the time of writing there are 16 pattern types of which I have probably used six.

Advanced pattern matching features are very powerful, and I don’t think I’ve used them well enough in my F# work so far.


Get piping!


One of the elements of F# that took a while to come naturally was function pipelining.

This is pretty strange — I have a C# background and have used these techniques extensively under the guise of IEnumerable<T>.

For whatever reason, when writing F# code it was more natural to put g (f (x)) than x |> f |> g. If I’m honest the temptation to write the former still lingers, but when reading real code this gets ugly, quickly.

One point I’m still mulling over is using the double and triple pipe operators. If anything I think they clear up the code, but for many they might serve to confuse.

Over to you

Here’s a real function from my code that gets data from a web service asynchronously, written three ways:

No pipelining:

let getSchemeAssetAllocation handler schemeName analysisDate = 
     Async.RunSynchronously (getAssetAllocationAsync handler schemeName analysisDate)

Single pipe only:

let getSchemeAssetAllocation handler schemeName analysisDate = 
    getAssetAllocationAsync handler schemeName analysisDate |> Async.RunSynchronously`

Allowing the triple pipe:

let getSchemeAssetAllocation handler schemeName analysisDate = 
    (handler, schemeName, analysisDate)
    |||> getAssetAllocationAsync 
    |> Async.RunSynchronously

Which version would you pick: 1, 2 or 3?

I’m still not sure what I prefer.


Brackets aren’t banned


I like not writing curly braces. I shouldn’t have extended this to parentheses.

I remember reading in Real-World Functional Programming that the preferred F# style is to include brackets for single-argument functions (so writing f(x) rather than f x.

I can definitely see why now — looking at code written using the latter style is a little disconcerting, and simple lines like let validate = validateLinks handler looks much more confusing than they actually are. For whatever reason, writing as let validate = validateLinks(handler) makes it much clearer what’s going on — we are calling the validateLinks method and passing handler as a parameter.


Conclusions


I’ve learnt a huge amount from returned to old code, especially as it was written with very little real-world F# knowledge.

There are definitely some specific things I’d like to act on — writing a small common library to help working with Discriminated Unions, and sharing my thoughts on domain modelling with F# are two clear actions.

Other than that, my main hope is that when writing production F# code in the future, I do so in a manner than makes it clearer for others (and myself) to read and modify.

Thursday, October 06, 2016

Bottom-up Akka.NET using F#


Ever tried to take an existing application and convert it to use the Actor Model? Not easy, right!

It seems like there is no easy way to make ‘a bit’ of your app use Actors — but in this post I’m going to show you how.

We’ll being with a quick overview of the Actor Model, focus in on Akka and Akka.NET, then think about how we design actor systems.

I’ll then show you a slightly different way to work with Actors, one that leverages the Akka.NET F# API and helps you build Actor systems from the bottom-up.

Introduction to the Actor Model

To give a whirlwind overview, the Actor model is a way to do concurrency that treats ‘Actors’ as the unit of computation. The main alternative in the .NET world is multithreading, compared to which Actors are meant to be extremely lightweight and thus support more concurrent computations.


Akka and Akka.NET

One framework for creating Actor sytems in .NET is Akka.NET, a port of the Akka framework that runs on the JVM and it written in Scala. Though it is written in C#, there is also an F#-specific API that allows you to write more idiomatic functional code compared to calling their C# methods directly in F# code.

For a more hands-on introduction, I recommend the Akka.NET bootcamp (there is a C# version as well as one using F#). Written by the makers of Akka.NET, it it easy to follow and gives you an idea about how to create Actor systems.


Designing actor systems

As with all broad topics, there is no one true method for designing Actor systems, and indeed there is no one type of application that you would build using the Actor model. Overall, there are probably just two principles that we must adhere to:

  • Actors do not expose any state
  • Actors communicate through immutable messages

However, when we see examples using the Actor model, they most often take the form of whole-app, top-down systems where ‘everything is an actor’. Case in point: this post from the creators of Akka.NET.

This is a good way to design systems, but what if you don’t have the luxury of re-wiring your whole application yet still want to try and use actor-based concurrency?

In this post, I’ll show you how to start using the actor model in just part of an existing application


F# API and libraries

F# API

The actor computation expression


Using Actors, Bottom-up

It’s time to see some code!

I’ve based this project loosely on the content in Visualizing Stock Prices Using F# Charts. I will take a system that gets stock data from Yahoo finance and charts it on Windows Forms, and use Akka.NET actors to concurrently retrieve the data.

The idea behind that is that, were you to build this system for real, going to Yahoo for every bit of data would soon get pretty slow. Rather than use Threading and Tasks to process the data requests in an asycnhronous and potentially parallel manner, let’s see if we can harness the speed and low footprint of Akka.NET actors instead.


Data Retrieval

open FSharp.Data
open System

type Stocks = CsvProvider< AssumeMissingValues=true, IgnoreErrors=true, Sample="Date (Date),Open (float),High (float),Low (float),Close (float),Volume (int64),Adj Close (float)" >

let url = "http://ichart.finance.yahoo.com/table.csv?s="
let startDateString (date : DateTime) = sprintf "&a=%i&b=%i&c=%i" (date.Month - 1) date.Day date.Year
let endDateString (date : DateTime) = sprintf "&d=%i&e=%i&f=%i" (date.Month - 1) date.Day date.Year

let getStockPrices stock startDate endDate = 
    let fullUrl = url + stock + startDateString startDate + endDateString endDate
    Stocks.Load(fullUrl).Rows
    |> Seq.toList
    |> List.rev

Charting

open FSharp.Charting
open FSharp.Charting.ChartTypes
open System
open System.Windows.Forms

let defaultChart = createChart (new DateTime(2014, 01, 01)) (new DateTime(2015, 01, 01))

let getCharts (tickerPanel : Panel) mapfunction (list : string []) = 
    let sw = new System.Diagnostics.Stopwatch()
    sw.Start()
    let charts = mapfunction defaultChart list
    let chartControl = new ChartControl(Chart.Combine(charts).WithLegend(), Dock = DockStyle.Fill, Name = "Tickers")
    if tickerPanel.Controls.ContainsKey("Tickers") then tickerPanel.Controls.RemoveByKey("Tickers")
    tickerPanel.Controls.Add chartControl
    sw.Stop()
    MessageBox.Show(sprintf "Retrieved data in %d ms" sw.ElapsedMilliseconds) |> ignore

Running these either in sequence or using Task-based parallelism is then very simple:

let getChartsSync (tickerPanel : Panel) = getCharts tickerPanel Array.map
let getChartsTasks (tickerPanel : Panel) = getCharts tickerPanel Array.Parallel.map

Converting to actors

The first thing we do is define the messages that will be passed around our actor system. A DataMessage is one that will be passed around the top-level actor responsible for collecting data — it will either say ‘get me the data for these tickers when coming in from our application’ or ‘I have data for this ticker’ when going back out. The DrawChart message will tell an actor to get the data from Yahoo, and we have implemented a very basic caching strategy which means we need a way to clear the cache — here, just a simple message!


type DrawChart = 
    | GetDataBetweenDates of StartDate : DateTime * EndDate : DateTime
    | ClearCache 

type DataMessage = 
    | StockData of string * Stocks.Row list
    | GetData of string []

We next define the actor responsible for getting a single ticker’s data from Yahoo. This is the tickerActor. It is implemented as two mutually recursive actor computation expressions that correspond to a mini FSM implementation — it starts in the start doesNotHaveData, when it receives a message to get data it does so, passes the data back to the message sender, and moves to the hasData state. In this state, further requests for the same data can be serviced instantly, and a request to clear the cache puts us back as doesNotHaveData. You can also see how easy it would be to remove this caching feature — the commented out line where the actor kills itself after getting the data is all it would take!

let tickerActor (ticker : string) (mailbox : Actor<_>) = 
    let rec doesNotHaveData() = 
        actor { 
            let! message = mailbox.Receive()
            match message with
            | GetDataBetweenDates(startDate, endDate) -> 
                let stockData = StockData((ticker, getStockPrices ticker startDate endDate))
                mailbox.Sender() <! stockData
                //mailbox.Self <! (PoisonPill.Instance)
                return! hasData (stockData)
            | ClearCache -> return! doesNotHaveData()
        }

    and hasData (stockData : DataMessage) = 
        actor { 
            let! message = mailbox.Receive()
            match message with
            | GetDataBetweenDates(_) -> 
                mailbox.Sender() <! stockData
                return! hasData (stockData)
            | ClearCache -> return! doesNotHaveData()
        }

    doesNotHaveData()

Next, we define the actor that will take multiple ticker requests, dispatch each to a tickerActor, and wait for them all to come back. This is the gatheringActor. Again this uses mutual recursion, here we start in the waiting state until the application asks for tickers. We then get the address of the tickerActor instances responsible for getting that data (the ActorRef will just be the ticker name), creating a new actor if we don’t already have one for that ticker. The gatheringActor then changes state to gettingData, which starts off knowing how many sets of ticker data it is awaiting. Every time it gets some it decreases this value, when it’s waiting for no more it draws the ticker data onto the WinForms chart.

let gatheringActor (tickerPanel : Panel) (sw : Stopwatch) (system : ActorSystem) (mailbox : Actor<_>) = 
     let rec waiting (existingActorRefs : IActorRef Set) = 
           actor { 
               let! message = mailbox.Receive()
               match message with
               | GetData d -> 
                   sw.Restart()
                   let existingNames = existingActorRefs |> Set.map (fun (x : IActorRef) -> x.Path.Name)
                   let newActors = existingNames |> Set.difference (Set.ofArray d)

                   let newActorRefs = 
                       [ for item in newActors do
                             yield spawn system (item.ToString()) (tickerActor (item.ToString())) ]

                   let combinedActorRefs = existingActorRefs |> Set.union (Set.ofList newActorRefs)
                   let tell = fun dataActorRef -> dataActorRef <! (GetDataBetweenDates(new DateTime(2014, 01, 01), new DateTime(2015, 01, 01)))
                   Set.map tell combinedActorRefs |> ignore
                   return! gettingData (Set.count combinedActorRefs) combinedActorRefs []
               | _ -> return! waiting (existingActorRefs)
           }

       and gettingData (numberOfResultsToSee : int) (existingActorRefs : IActorRef Set) (soFar : (string * Stocks.Row list) list) = 
           actor { 
               let! message = mailbox.Receive()
               match message with
               | StockData(tickerName, data) when numberOfResultsToSee = 1 -> 
                   let finalData = ((tickerName, data) :: soFar)
                   createCharts tickerPanel finalData
                   sw.Stop()
                   MessageBox.Show(sprintf "Retrieved data in %d ms" sw.ElapsedMilliseconds) |> ignore
                   return! waiting existingActorRefs
               | StockData(tickerName, data) -> return! gettingData (numberOfResultsToSee - 1) existingActorRefs ((tickerName, data) :: soFar)
               | _ -> return! waiting existingActorRefs
           }

       waiting (Set.empty)

Finally, we create our actor system when initialising our Windows Form. Note that we need a bit of hocon that makes things play nicely with the UI thread.

let sw = new System.Diagnostics.Stopwatch()
let gatheringActor = spawn system "counters" (MyActors.gatheringActor tickerPanel sw system)
<hocon>
  <![CDATA[
      akka {
        actor{
          deployment{
            /counters{
              dispatcher = akka.actor.synchronized-dispatcher
            }
          }
        }
      }
  ]]>
</hocon>

Was it faster?

A little bit.

The sequential access was slowest of all, as expected, and took an average of 550 ms to retrieve 10 tickers.

The task-based method took an average of 185ms .

The first call to retrieve data from actors took 162ms, with subsequent requests around 30ms due to the caching implementation.

This isn’t a huge performance bonus, but with only 10 requests anything should work ok!


Summary

Well, that wasn’t so bad (?!).

I think it’s clear that there’s a lot more code involved in setting up an actor system, compared to using task-based parallelism.

Is it worth it? Hard to say. For a small application such as this, probably not. When building something that might scale to the point where it needs distributing over multiple machines, it’ll be a different story.

The main point to take away is that we didn’t have to start the process with actors in mind. We took an existing app, found the bit that could potentially benefit from the actor concurrency model, and converted only that section to use actors.