Thursday, October 06, 2016

Bottom-up Akka.NET using F#


Ever tried to take an existing application and convert it to use the Actor Model? Not easy, right?

It seems like there is no easy way to make ‘a bit’ of your app use Actors — but in this post I’m going to show you how.

We’ll begin with a quick overview of the Actor Model, focus in on Akka and Akka.NET, then think about how we design actor systems.

I’ll then show you a slightly different way to work with Actors, one that leverages the Akka.NET F# API and helps you build Actor systems from the bottom-up.

Introduction to the Actor Model

To give a whirlwind overview, the Actor model is a way to do concurrency that treats ‘Actors’ as the unit of computation. The main alternative in the .NET world is multithreading, compared to which Actors are meant to be extremely lightweight and thus support more concurrent computations.


Akka and Akka.NET

One framework for creating Actor systems in .NET is Akka.NET, a port of the Akka framework, which runs on the JVM and is written in Scala. Though Akka.NET is written in C#, there is also an F#-specific API that allows you to write more idiomatic functional code than calling the C# methods directly from F#.

For a more hands-on introduction, I recommend the Akka.NET bootcamp (there is a C# version as well as one using F#). Written by the makers of Akka.NET, it is easy to follow and gives you an idea of how to create Actor systems.


Designing actor systems

As with all broad topics, there is no one true method for designing Actor systems, and indeed there is no one type of application that you would build using the Actor model. Overall, there are probably just two principles that we must adhere to:

  • Actors do not expose any state
  • Actors communicate through immutable messages
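
These two principles can be sketched without any framework at all. The following is a minimal illustration, not Akka.NET code: it uses F#'s built-in MailboxProcessor as a stand-in for an actor, and the CounterMessage type and counter agent are hypothetical names invented for the example.

```fsharp
// Messages are immutable values; the reply channel carries the answer back.
type CounterMessage =
    | Increment of int
    | GetTotal of AsyncReplyChannel<int>

// The running total exists only as the argument to `loop`, so no caller can
// read or modify it except by posting a message.
let counter =
    MailboxProcessor.Start(fun inbox ->
        let rec loop total =
            async {
                let! message = inbox.Receive()
                match message with
                | Increment n -> return! loop (total + n)
                | GetTotal reply ->
                    reply.Reply total
                    return! loop total
            }
        loop 0)

counter.Post(Increment 2)
counter.Post(Increment 3)
let total = counter.PostAndReply(fun reply -> GetTotal reply)
printfn "total = %d" total   // total = 5
```

The state is never exposed: the only way to learn the total is to send a GetTotal message and wait for the reply.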

However, when we see examples using the Actor model, they most often take the form of whole-app, top-down systems where ‘everything is an actor’. Case in point: this post from the creators of Akka.NET.

This is a good way to design systems, but what if you don’t have the luxury of re-wiring your whole application yet still want to try and use actor-based concurrency?

In this post, I’ll show you how to start using the actor model in just part of an existing application.


F# API and libraries

F# API

The actor computation expression
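
As a rough sketch of the pattern the rest of this post relies on, the snippet below shows an actor written with the actor computation expression. It assumes the Akka.FSharp NuGet package and an F# script run under dotnet fsi; the system name and the greeter actor are hypothetical and exist only for illustration.

```fsharp
#r "nuget: Akka.FSharp"
open Akka.FSharp

// Assumption: default configuration is enough for a local, in-process sketch.
let system = System.create "sketch" (Configuration.defaultConfig ())

let greeter =
    spawn system "greeter" (fun (mailbox : Actor<string>) ->
        let rec loop () =
            actor {
                // let! waits until the next message arrives in the mailbox
                let! name = mailbox.Receive()
                mailbox.Sender() <! sprintf "Hello, %s" name
                // return! continues with the next state (here, the same one)
                return! loop ()
            }
        loop ())

// <? sends a message and returns an Async that completes with the reply.
let reply : string = greeter <? "F#" |> Async.RunSynchronously
printfn "%s" reply
```

Each recursive function plays the role of a state, so switching behaviour is just a matter of which function return! hands control to.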


Using Actors, Bottom-up

It’s time to see some code!

I’ve based this project loosely on the content in Visualizing Stock Prices Using F# Charts. I will take a system that gets stock data from Yahoo finance and charts it on Windows Forms, and use Akka.NET actors to concurrently retrieve the data.

The idea is that, were you to build this system for real, going to Yahoo for every bit of data would soon get pretty slow. Rather than use Threading and Tasks to process the data requests in an asynchronous and potentially parallel manner, let’s see if we can harness the speed and low footprint of Akka.NET actors instead.


Data Retrieval

open FSharp.Data
open System

type Stocks = CsvProvider< AssumeMissingValues=true, IgnoreErrors=true, Sample="Date (Date),Open (float),High (float),Low (float),Close (float),Volume (int64),Adj Close (float)" >

let url = "http://ichart.finance.yahoo.com/table.csv?s="
let startDateString (date : DateTime) = sprintf "&a=%i&b=%i&c=%i" (date.Month - 1) date.Day date.Year
let endDateString (date : DateTime) = sprintf "&d=%i&e=%i&f=%i" (date.Month - 1) date.Day date.Year

let getStockPrices stock startDate endDate = 
    let fullUrl = url + stock + startDateString startDate + endDateString endDate
    Stocks.Load(fullUrl).Rows
    |> Seq.toList
    |> List.rev
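
To make the URL construction concrete, here is a small sketch that builds the request string without downloading anything. The buildUrl helper and the MSFT ticker are hypothetical; the other definitions are copied from the code above.

```fsharp
open System

let url = "http://ichart.finance.yahoo.com/table.csv?s="
let startDateString (date : DateTime) = sprintf "&a=%i&b=%i&c=%i" (date.Month - 1) date.Day date.Year
let endDateString (date : DateTime) = sprintf "&d=%i&e=%i&f=%i" (date.Month - 1) date.Day date.Year

// Hypothetical helper: the same concatenation getStockPrices performs, minus the download.
let buildUrl stock startDate endDate =
    url + stock + startDateString startDate + endDateString endDate

let example = buildUrl "MSFT" (DateTime(2014, 1, 1)) (DateTime(2015, 1, 1))
printfn "%s" example
// http://ichart.finance.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2014&d=0&e=1&f=2015
```

Note the `date.Month - 1`: the query string carries zero-based months, so January appears as 0.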

Charting

open FSharp.Charting
open FSharp.Charting.ChartTypes
open System
open System.Windows.Forms

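// createChart is not shown in this post; as used here it takes a start date, an end date
// and a ticker, and returns the chart for that ticker.
let defaultChart = createChart (new DateTime(2014, 01, 01)) (new DateTime(2015, 01, 01))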

let getCharts (tickerPanel : Panel) mapfunction (list : string []) = 
    let sw = new System.Diagnostics.Stopwatch()
    sw.Start()
    let charts = mapfunction defaultChart list
    let chartControl = new ChartControl(Chart.Combine(charts).WithLegend(), Dock = DockStyle.Fill, Name = "Tickers")
    if tickerPanel.Controls.ContainsKey("Tickers") then tickerPanel.Controls.RemoveByKey("Tickers")
    tickerPanel.Controls.Add chartControl
    sw.Stop()
    MessageBox.Show(sprintf "Retrieved data in %d ms" sw.ElapsedMilliseconds) |> ignore

Running these either in sequence or using Task-based parallelism is then very simple:

let getChartsSync (tickerPanel : Panel) = getCharts tickerPanel Array.map
let getChartsTasks (tickerPanel : Panel) = getCharts tickerPanel Array.Parallel.map
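
The reason this is so simple is that Array.map and Array.Parallel.map have the same shape, so the mapping strategy can be passed in as an ordinary argument. The sketch below shows the idea in isolation; fakeFetch and timed are hypothetical stand-ins for the real download and for getCharts.

```fsharp
open System.Diagnostics
open System.Threading

// Hypothetical stand-in for a slow download: sleeps, then returns a label.
let fakeFetch (ticker : string) =
    Thread.Sleep 100
    sprintf "data for %s" ticker

// Mirrors getCharts: the mapping strategy is just a parameter.
let timed (mapfunction : (string -> string) -> string [] -> string []) tickers =
    let sw = Stopwatch.StartNew()
    let results = mapfunction fakeFetch tickers
    sw.Stop()
    results, sw.ElapsedMilliseconds

let tickers = [| "MSFT"; "AAPL"; "GOOG"; "AMZN" |]
let sequentialResults, sequentialMs = timed Array.map tickers
let parallelResults, parallelMs = timed Array.Parallel.map tickers
printfn "sequential: %d ms, parallel: %d ms" sequentialMs parallelMs
```

Both calls return the results in input order; only the elapsed time should differ, and by how much depends on the machine.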

Converting to actors

The first thing we do is define the messages that will be passed around our actor system. A DataMessage is handled by the top-level actor responsible for collecting data: coming in from our application it says ‘get me the data for these tickers’, and coming back from a ticker actor it says ‘I have data for this ticker’. A DrawChart message tells a ticker actor to get its data from Yahoo. Because we have implemented a very basic caching strategy, we also need a way to clear the cache, which here is just another simple message.


open System
open System.Diagnostics
open System.Windows.Forms
open Akka.Actor
open Akka.FSharp

type DrawChart = 
    | GetDataBetweenDates of StartDate : DateTime * EndDate : DateTime
    | ClearCache 

type DataMessage = 
    | StockData of string * Stocks.Row list
    | GetData of string []

We next define the actor responsible for getting a single ticker’s data from Yahoo. This is the tickerActor. It is implemented as two mutually recursive actor computation expressions that form a mini FSM (finite state machine). It starts in the doesNotHaveData state. When it receives a message to get data it does so, passes the data back to the message sender, and moves to the hasData state. In this state, further requests for the same data can be serviced instantly, and a request to clear the cache puts us back in doesNotHaveData. You can also see how easy it would be to remove this caching feature: uncommenting the line where the actor kills itself after getting the data is all it would take!

let tickerActor (ticker : string) (mailbox : Actor<_>) = 
    let rec doesNotHaveData() = 
        actor { 
            let! message = mailbox.Receive()
            match message with
            | GetDataBetweenDates(startDate, endDate) -> 
                let stockData = StockData((ticker, getStockPrices ticker startDate endDate))
                mailbox.Sender() <! stockData
                //mailbox.Self <! (PoisonPill.Instance)
                return! hasData (stockData)
            | ClearCache -> return! doesNotHaveData()
        }

    and hasData (stockData : DataMessage) = 
        actor { 
            let! message = mailbox.Receive()
            match message with
            | GetDataBetweenDates(_) -> 
                mailbox.Sender() <! stockData
                return! hasData (stockData)
            | ClearCache -> return! doesNotHaveData()
        }

    doesNotHaveData()
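
The state machine can also be looked at on its own, away from the actor plumbing. The sketch below models the two states as a pure transition function; Request, State, step and the counting fetch function are all hypothetical names for this illustration, with fetch standing in for getStockPrices.

```fsharp
type Request =
    | GetPrices
    | ClearCache

type State =
    | DoesNotHaveData
    | HasData of string   // the cached payload

// Returns the next state and the reply (if any) sent back to the requester.
let step (fetch : unit -> string) (state : State) (request : Request) =
    match state, request with
    | DoesNotHaveData, GetPrices ->
        let data = fetch ()
        HasData data, Some data
    | HasData data, GetPrices -> HasData data, Some data
    | _, ClearCache -> DoesNotHaveData, None

// Count how often the (pretend) download actually happens.
let fetchCount = ref 0
let fetch () =
    fetchCount.Value <- fetchCount.Value + 1
    sprintf "prices (fetch #%d)" fetchCount.Value

let folder (state, replies) request =
    let nextState, reply = step fetch state request
    nextState, reply :: replies

let finalState, reversedReplies =
    List.fold folder (DoesNotHaveData, []) [ GetPrices; GetPrices; ClearCache; GetPrices ]

printfn "downloads: %d" fetchCount.Value   // downloads: 2
```

Four requests trigger only two downloads: the second GetPrices is served from the cached state, and the one after ClearCache fetches again.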

Next, we define the actor that will take multiple ticker requests, dispatch each to a tickerActor, and wait for them all to come back. This is the gatheringActor. Again this uses mutual recursion: we start in the waiting state until the application asks for tickers. We then get the addresses of the tickerActor instances responsible for getting that data (each actor is simply named after its ticker), creating a new actor if we don’t already have one for that ticker. The gatheringActor then changes state to gettingData, which starts off knowing how many sets of ticker data it is awaiting. Every time a set arrives it decreases this value, and when there are none left to wait for it draws the ticker data onto the WinForms chart.

let gatheringActor (tickerPanel : Panel) (sw : Stopwatch) (system : ActorSystem) (mailbox : Actor<_>) = 
       let rec waiting (existingActorRefs : IActorRef Set) = 
           actor { 
               let! message = mailbox.Receive()
               match message with
               | GetData d -> 
                   sw.Restart()
                   let existingNames = existingActorRefs |> Set.map (fun (x : IActorRef) -> x.Path.Name)
                   let newActors = existingNames |> Set.difference (Set.ofArray d)

                   let newActorRefs = 
                       [ for item in newActors do
                             yield spawn system item (tickerActor item) ]

                   let combinedActorRefs = existingActorRefs |> Set.union (Set.ofList newActorRefs)
                   let tell = fun dataActorRef -> dataActorRef <! (GetDataBetweenDates(new DateTime(2014, 01, 01), new DateTime(2015, 01, 01)))
                   Set.iter tell combinedActorRefs
                   return! gettingData (Set.count combinedActorRefs) combinedActorRefs []
               | _ -> return! waiting (existingActorRefs)
           }

       and gettingData (numberOfResultsToSee : int) (existingActorRefs : IActorRef Set) (soFar : (string * Stocks.Row list) list) = 
           actor { 
               let! message = mailbox.Receive()
               match message with
               | StockData(tickerName, data) when numberOfResultsToSee = 1 -> 
                   let finalData = ((tickerName, data) :: soFar)
                   createCharts tickerPanel finalData
                   sw.Stop()
                   MessageBox.Show(sprintf "Retrieved data in %d ms" sw.ElapsedMilliseconds) |> ignore
                   return! waiting existingActorRefs
               | StockData(tickerName, data) -> return! gettingData (numberOfResultsToSee - 1) existingActorRefs ((tickerName, data) :: soFar)
               | _ -> return! waiting existingActorRefs
           }

       waiting (Set.empty)
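
The set arithmetic and the countdown are easier to follow with plain values. This is a small sketch using made-up ticker names; arrive is a hypothetical helper that mimics what gettingData does each time a result comes in.

```fsharp
// Actors we already have, and the tickers requested this time round.
let existingNames = set [ "MSFT"; "AAPL" ]
let requested = [| "AAPL"; "GOOG"; "AMZN" |]

// Same shape as in gatheringActor: the pipe puts existingNames in the second
// argument position, so this is (requested minus existing).
let newActors = existingNames |> Set.difference (Set.ofArray requested)
let combined = existingNames |> Set.union newActors

printfn "%A" (Set.toList newActors)   // ["AMZN"; "GOOG"]
printfn "%d" (Set.count combined)     // 4

// Each arriving result decrements the counter and is added to the results so far.
let arrive (remaining, soFar) result = remaining - 1, result :: soFar

let remaining, collected =
    [ "GOOG"; "MSFT"; "AAPL"; "AMZN" ]
    |> List.fold arrive (Set.count combined, [])
```

One thing this makes visible is that the count is taken over the combined set, so a previously spawned actor (MSFT here) is asked for data again even when the current request does not name it.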

Finally, we create our actor system when initialising our Windows Form. Note that we need a bit of HOCON configuration, shown after the F# code, to make the gathering actor play nicely with the UI thread.

let sw = new System.Diagnostics.Stopwatch()
let gatheringActor = spawn system "counters" (MyActors.gatheringActor tickerPanel sw system)
<hocon>
  <![CDATA[
      akka {
        actor{
          deployment{
            /counters{
              dispatcher = akka.actor.synchronized-dispatcher
            }
          }
        }
      }
  ]]>
</hocon>
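
The post does not show where the system value comes from. One way it might be created, assuming the Akka.FSharp NuGet package and an F# script, is to pass the same deployment settings inline rather than through the configuration file. The system name "stock-charts" is hypothetical, and the original project may well load its settings with Configuration.load() instead.

```fsharp
#r "nuget: Akka.FSharp"
open Akka.FSharp

// Same deployment settings as the <hocon> section, supplied as a string.
let config =
    Configuration.parse """
        akka {
          actor {
            deployment {
              /counters {
                dispatcher = akka.actor.synchronized-dispatcher
              }
            }
          }
        }"""

// Assumption: "stock-charts" is an arbitrary name chosen for this sketch.
let system = System.create "stock-charts" config
```

The /counters path matches the name passed to spawn, which is how the gathering actor ends up on the synchronized dispatcher and can touch WinForms controls safely.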

Was it faster?

A little bit.

The sequential access was slowest of all, as expected, and took an average of 550 ms to retrieve 10 tickers.

The task-based method took an average of 185 ms.

The first call to retrieve data from actors took 162 ms, with subsequent requests around 30 ms due to the caching implementation.

This isn’t a huge performance bonus, but with only 10 requests anything should work OK!


Summary

Well, that wasn’t so bad (?!).

I think it’s clear that there’s a lot more code involved in setting up an actor system, compared to using task-based parallelism.

Is it worth it? Hard to say. For a small application such as this, probably not. When building something that might scale to the point where it needs distributing over multiple machines, it’ll be a different story.

The main point to take away is that we didn’t have to start the process with actors in mind. We took an existing app, found the bit that could potentially benefit from the actor concurrency model, and converted only that section to use actors.

