Sunday, January 11, 2015

Jumping ship from Scala to Avail

I've been a very happy Scala user for the past 7 years. This JVM language has served me well.

I believe that Scala has truly advanced the field of programming languages, while still being pragmatic. And most certainly, Scala's type system has caught many of my programming errors during compile time (thanks!). But I want more. I want to be in control of a type system, not to be a victim of its (complex) implementation.

In order to gain control over Scala's type system, I've used scalaz and shapeless, which are both excellent examples of how far Scala's type system can be pushed. But using them felt like programming in another language.

For example, during the pursuit of my SPREAD project, I wanted to implement a strongly typed relational algebra. The only viable option was to use shapeless' HList implementation. I came a long way, but I didn't like the final result: it was too dependent on the warts of the Scala compiler (and the brilliance of Miles Sabin, the author of shapeless).

So what about Haskell's HList implementation? Don't get me wrong: Haskell is a beautiful language and its runtime is top-notch, but I just don't like lazy evaluation as a default, because I also want to control a language's runtime behaviour without polluting - otherwise beautiful functional - code with unnecessary strictness annotations. I've tried Haskell in the past, but only the programming Gods know what came of that.

Enter Avail.

Currently, Avail is hosted on the JVM. And alas, it is not production ready: it consumes too much memory, it is rather slow, and the toolchain leaves much to be desired. But the concepts are solid. Avail's type system is amazing.

Avail supports gradual typing: you start out with dynamic types (run time), and later optionally 'strengthen' them via semantic restrictions (compile time). Via strengthening, you can go as far as implementing full-blown dependent types (for instance, to track the sizes of matrices during compile time).
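
To get a feel for what such strengthening buys you, here is a rough analogue in plain Scala (not Avail; Nat, Vec and friends are hypothetical illustrations): by moving the length of a vector into its type, length mismatches become compile-time errors.

  sealed trait Nat
  sealed trait Zero extends Nat
  sealed trait Succ[N <: Nat] extends Nat

  // A vector whose length N is part of its (phantom) type.
  final case class Vec[N <: Nat, A](elems: List[A]) {
    // Prepending an element increments the type-level length.
    def ::(a: A): Vec[Succ[N], A] = Vec(a :: elems)
    // Zipping only compiles when both vectors have the same length N.
    def zip[B](that: Vec[N, B]): Vec[N, (A, B)] = Vec(elems zip that.elems)
  }
  object Vec { def empty[A]: Vec[Zero, A] = Vec(Nil) }

  val v = 1 :: 2 :: Vec.empty[Int]   // Vec[Succ[Succ[Zero]], Int]
  val w = 3 :: 4 :: Vec.empty[Int]   // Vec[Succ[Succ[Zero]], Int]
  v zip w                            // fine: both lengths are 2
  // (1 :: Vec.empty[Int]) zip v     // rejected during compile time

This Scala encoding is rigid, though. As the next paragraph notes, Avail lets you express such strengthening in plain code.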

Interestingly, Avail's strengthening of types can be expressed in 'plain' Avail code. That's the meta-circular beauty of Avail. You may wonder why Avail has such an advanced type system. Because:

Avail's strong type system enables efficient multiple dispatch during run time, while catching errors during compile time. 

I believe that to be the cornerstone of the Avail language. And yes, Avail will eventually be compiled efficiently (which is currently not the case). Here are some more Avail gems:

- unrestricted syntax
- a scalable module system
- universal serialisation (including running computations)
- and a lot more.

So long Scala, I've found a new love. I've decided to join the Avail team!

Friday, October 24, 2014

Eve: incidental complexity

The Eve project has caught my attention lately. The latest diary entry in particular was a very interesting read.

With Eve, it appears that the authors want to achieve goals similar to what I want with SPREAD. The difference is that I'm not too worried about actual users (yet). I'm more concerned about the implementation.

At its core, Eve is essentially a datalog engine that is based on a novel constraint propagation algorithm, enabling the efficient joining of datalog relations.

However, the authors seem to struggle to come up with a solution to store and re-use the provenance data (constraints) that is captured during joins. The central idea is that, via the clever re-use of provenance data, you can incrementally update a materialised join whenever the dependent relations are updated.

Think spreadsheets!

According to the Eve diary, it appears that the authors have tried, but abandoned, Self Adjusting Computation (SAC) as a possible solution. I don't know why they abandoned SAC, as I believe SAC can effectively be used to do incremental joins.

With SPREAD, it is rather easy to translate any algorithm to become Self Adjusting. To be precise, I think SPREAD is more closely related to 'Self-explaining Computation' than it is to SAC.

The trouble with traditional SAC is that it is rather cumbersome: standard algorithms have to be specially tailored to become Self Adjusting. Not so with SPREAD.

SPREAD is based on the pervasive use of (trace) provenance and forgetful memoization, combined with a single clever data structure, called the Treap.

At this point, I have to disappoint the reader: the remainder of this post is not going to demonstrate how SPREAD can be used to do efficient joins. But I will demonstrate a SPREAD solution for incremental sums (or any folding operation). Hopefully, that will convince the reader that SPREAD can be a viable solution to the problems the Eve project is facing.

The first thing to know about SPREAD is that any (computed) value holds a (back) reference to its origin. Here is the canonical example for a computed value 21:

((1 + 2) * (3 + 4)) ~> [([(1 + 2) ~> 3] * [(3 + 4) ~> 7]) ~> [(3 * 7) ~> 21]]
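
In Scala terms, one could picture this roughly as follows. This is a hypothetical sketch, not SPREAD's actual implementation: evaluation returns not just a number, but the number together with a (strong) back reference to the expression and the sub-traces it originated from.

  sealed trait Expr
  final case class Num(i: Int) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class Mul(l: Expr, r: Expr) extends Expr

  // A computed value plus its origin: the originating expression
  // and the traces of its operands.
  final case class Trace(value: Int, origin: Expr, sub: List[Trace])

  def eval(e: Expr): Trace = e match {
    case Num(i) => Trace(i, e, Nil)
    case Add(l, r) =>
      val (a, b) = (eval(l), eval(r))
      Trace(a.value + b.value, e, List(a, b))
    case Mul(l, r) =>
      val (a, b) = (eval(l), eval(r))
      Trace(a.value * b.value, e, List(a, b))
  }

  eval(Mul(Add(Num(1), Num(2)), Add(Num(3), Num(4)))).value   // 21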

Now consider the (sum) folding of a traditional list:

sum(0 1 2 3 4 5 6 7 8 9) => 45

In SPREAD, such a list is encoded as a uniquely represented, confluently persistent treap. To sum a treap, the following (Scala) algorithm can be evaluated:

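  // Sums a non-empty treap by recursing into its left and right subtreaps;
  // TreapSet is assumed to expose a node value and left/right subtreaps.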
  def sum(t: TreapSet[Int]): Int = {
    if (t.left.isEmpty) {
      if (t.right.isEmpty) t.value
      else t.value + sum(t.right)
    }
    else {
      if (t.right.isEmpty) sum(t.left) + t.value
      else sum(t.left) + t.value + sum(t.right)
    }
  }

The SPREAD trace (... ~> 45) of the above algorithm would look like this:

sum(0 1 2 3 4 5 6 7 8 9) ~> [((sum(0 1 2 3 4 5) + 6) + sum(7 8 9)) ~> [([(sum(0 1 2 3 4 5) + 6) ~> [([sum(0 1 2 3 4 5) ~> [(sum(0 1 2 3 4) + 5) ~> [([sum(0 1 2 3 4) ~> [(sum(0 1 2 3) + 4) ~> [([sum(0 1 2 3) ~> [(sum(0 1 2) + 3) ~> [([sum(0 1 2) ~> [(sum(0 1) + 2) ~> [([sum(0 1) ~> [(sum(0) + 1) ~> [([sum(0) ~> 0] + 1) ~> [(0 + 1) ~> 1]]]] + 2) ~> [(1 + 2) ~> 3]]]] + 3) ~> [(3 + 3) ~> 6]]]] + 4) ~> [(6 + 4) ~> 10]]]] + 5) ~> [(10 + 5) ~> 15]]]] + 6) ~> [(15 + 6) ~> 21]]] + [sum(7 8 9) ~> [(sum(7 8) + 9) ~> [([sum(7 8) ~> [(sum(7) + 8) ~> [([sum(7) ~> 7] + 8) ~> [(7 + 8) ~> 15]]]] + 9) ~> [(15 + 9) ~> 24]]]]) ~> [(21 + 24) ~> 45]]]

Adding the number 10 to the treap and summing it will result in a total of (... ~> 55). Here is the resulting trace:

sum(0 1 2 3 4 5 6 7 8 9 10) ~> [((sum(0 1 2 3 4 5) + 6) + sum(7 8 9 10)) ~> [([(sum(0 1 2 3 4 5) + 6) ~> [([sum(0 1 2 3 4 5) ~> [(sum(0 1 2 3 4) + 5) ~> [([sum(0 1 2 3 4) ~> [(sum(0 1 2 3) + 4) ~> [([sum(0 1 2 3) ~> [(sum(0 1 2) + 3) ~> [([sum(0 1 2) ~> [(sum(0 1) + 2) ~> [([sum(0 1) ~> [(sum(0) + 1) ~> [([sum(0) ~> 0] + 1) ~> [(0 + 1) ~> 1]]]] + 2) ~> [(1 + 2) ~> 3]]]] + 3) ~> [(3 + 3) ~> 6]]]] + 4) ~> [(6 + 4) ~> 10]]]] + 5) ~> [(10 + 5) ~> 15]]]] + 6) ~> [(15 + 6) ~> 21]]] + [sum(7 8 9 10) ~> [(sum(7 8 9) + 10) ~> [([sum(7 8 9) ~> [(sum(7 8) + 9) ~> [([sum(7 8) ~> [(sum(7) + 8) ~> [([sum(7) ~> 7] + 8) ~> [(7 + 8) ~> 15]]]] + 9) ~> [(15 + 9) ~> 24]]]] + 10) ~> [(24 + 10) ~> 34]]]]) ~> [(21 + 34) ~> 55]]]

Notice how many (sub)traces are equal between the first and the second run. That's a lot of similarity!
Notice also that it is solely due to the nice properties of the treap data structure that we are able to share so much structure. Moreover, this similarity can be exploited via memoization, requiring only log(n) incremental rework after an insertion, deletion, etc.
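
As a sketch, memoized summing might look like this (re-using the assumed TreapSet API from the snippet above; SPREAD's actual memoization is weaker, see below). Because the treap is uniquely represented, equal subtreaps such as sum(0 1 2 3 4 5) and sum(7 8) are answered straight from the cache on the second run.

  import scala.collection.mutable

  // Unique representation means equal subtreaps *are* the same treap, so
  // after inserting 10 only the treaps on its insertion path - O(log n) of
  // them - have to be recomputed.
  val sums = mutable.Map.empty[TreapSet[Int], Int]

  def sumMemo(t: TreapSet[Int]): Int =
    sums.getOrElseUpdate(t, {
      val l = if (t.left.isEmpty) 0 else sumMemo(t.left)
      val r = if (t.right.isEmpty) 0 else sumMemo(t.right)
      l + t.value + r
    })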

Memoization in SPREAD is implemented via a (very) weak hash table. As every value holds a (strong) reference to its origin (a trace), there is no need for the memoization table to hold strong references to traces.

Voila, I've just demonstrated the incremental sum of a list that is optimally efficient!

Friday, August 8, 2014

Forgetful memoization through computational traces

The SPREAD framework is based on pure functions. This is no coincidence, because spreadsheets (in their purest form) are also purely functional.

By definition, the result of a pure function should only depend on its arguments. It is this property of pure functions that gives us the capability to 'cache' the result of a function. In turn, cached results can later be re-used.

So how does a 'function' cache work?

When we are about to call a function, we first check whether we have already called the same function with exactly the same arguments. If not, we compute the result and add the function call (key) together with its result (value) to the cache. Otherwise, we just return the result from the cache.

Such a caching scheme is called 'memoization'. With memoization, naive algorithms can be turned from exponential into linear, with the fibonacci function as the canonical example.
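
As an illustration (a plain Scala sketch, unrelated to SPREAD's internals), here is the memoized fibonacci: the naive version recomputes the same calls exponentially often, while the cached version computes each fib(k) exactly once.

  import scala.collection.mutable

  val cache = mutable.Map.empty[Int, BigInt]

  // Naive fib is exponential; with the cache every fib(k) is computed
  // once, so fib(n) runs in linear time (in the number of additions).
  def fib(n: Int): BigInt =
    cache.getOrElseUpdate(n, if (n < 2) n else fib(n - 1) + fib(n - 2))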

But naive memoization has a problem. If we don't clean entries from the cache, the cache will accumulate all function calls and their results.

To combat such indefinite growth, one can apply various eviction schemes to clean the memoization cache. For instance, we can decide to evict cached items that are 'Least Recently Used' (LRU). Unfortunately, LRU complicates the static analysis of runtime complexity. Even worse, LRU can turn the improved linear algorithm back into an exponential one.

In this post I propose a new eviction scheme that is based on cached computational traces. I've coined it: "forgetful memoization through computational traces".

Remember that, in SPREAD, every value keeps a strong reference to its origin. Here is the canonical example:

((1 ++ 2) ** (3 ++ 4)) ~> [([(1 ++ 2) ~> 3] ** [(3 ++ 4) ~> 7]) ~> [(3 ** 7) ~> 21]]

So we have the value 21, which has a back reference to its origin. When 21 becomes unreachable, it would be a candidate for garbage collection, together with (the reference to) its origin. But if 21 is still reachable, we can potentially re-use parts of its computational trace.

Let's say we want to compute 3 ++ 4. Because 21 (indirectly) holds a reference to (3 ++ 4) ~> 7, we don't need to recompute: we just return the result 7.

But how do we gain access to (3 ++ 4) ~> 7? We achieve this through the use of a (very) weak reference map. In this map, both the key (the function call) and the value (the function result) are weakly referenced.

That means that a map entry will be evicted when either its key or its value is not strongly referenced. In essence, the weak map acts as an index on computational traces. By using weak references, computational traces will be evicted from the index when they are garbage collected.
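
On the JVM, a first approximation might look like this (a hypothetical sketch: java.util.WeakHashMap weakens the keys, and wrapping every value in a WeakReference weakens the values too).

  import java.lang.ref.WeakReference
  import java.util.WeakHashMap

  // Keys are weak (WeakHashMap) and values are weak (WeakReference):
  // an entry survives only as long as its trace is strongly reachable
  // from somewhere else.
  class WeakMemo[K, V >: Null <: AnyRef] {
    private val table = new WeakHashMap[K, WeakReference[V]]()

    def getOrCompute(key: K)(compute: => V): V = {
      val ref = table.get(key)
      val cached = if (ref == null) null else ref.get
      if (cached != null) cached
      else {
        val v = compute
        table.put(key, new WeakReference(v))
        v
      }
    }
  }

Note that a plain WeakHashMap would not suffice on its own: a trace (the value) strongly references the function call (the key) as its origin, so a strongly held value would keep its own entry alive forever. Hence both sides must be weak.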

Friday, August 1, 2014

To the basics

For almost a year, I've been struggling (and developing!) to get to the core of what SPREAD is all about. If I had to summarise it in one single sentence today, I would formulate it as follows:

Every interesting value should keep a reference to its origin

As an example, the addition of 1 and 2 is not very interesting, but instructive:

(1 ++ 2) ~> 3

In most languages, you destructively end up with just 3. But in SPREAD, you end up with 3 and a reference to its origin (1 ++ 2). Notice the usage of the ++ operator to differentiate it from the destructive addition operation +.

Here is another example, demonstrating the non-destructive multiplication operator **:

((1 ++ 2) ** (3 ++ 4)) ~> [([(1 ++ 2) ~> 3] ** [(3 ++ 4) ~> 7]) ~> [(3 ** 7) ~> 21]]

So the resulting value is 21, which refers back to its origin 3 ** 7, which in turn refers back to its origin, etc. In short, we capture the computational trace of (what we deem) an interesting value.

Interestingly, such traces enable incremental spreadsheet-like computation powers (hence the name SPREAD). But how exactly such SPREAD traces work will be the subject of a future post!

Wednesday, May 28, 2014

The Avail Programming Language

One month ago, the Avail programming language was released. And since then it has been the subject of my daydreams!

At its release date, Avail immediately struck me as a very novel language.

The main authors, Mark van Gulik and Todd Smith, have poured a lot of sweat into the first release. The (open source) code base shows that. Both Mark and Todd have a solid track record in industry. I guess that's why the code is so well done.

For me personally, Avail shares many of my ideas and requirements of what 'the next language' should be - and much more.

For starters, Avail's syntax is completely free form - infix, postfix, prefix notation - it just doesn't matter. Next to that, immutability is a core notion that has been carefully architected into Avail's type system.

Avail is so powerful that it could easily embed (syntactically and semantically) my programming language Enchilada - but as a typed version!

For the type purists, Avail's type system is probably the most interesting bit. The type system strikes a pragmatic balance between completely untyped programs and very advanced 'dependent types'. Most interestingly, type checking is fully controlled by the programmer!

I'm still delving into Avail's secret gems, and I have to say, there are many to find - if you care to search for them.

Anyway, here is my first Avail program that I want to share with you:
https://github.com/rapido/spread/blob/master/avail/Treap.avail

Now what about SPREAD? Don't worry, I'm still working on it. But my next step is to port the Scala prototype to Avail, just to see what happens!

Saturday, May 11, 2013

Online SPREAD interpreter version v0.2

I just finished a very rough SPREAD interpreter!

The interpreter probably contains many bugs, but the basics are there. Since my last post, I also changed some of the operators and symbols (they will also probably change slightly at a later stage).

Here is the stuff:

STRUCTURE:
 variants: {m1:a1,m2:a2,m3:a3}
 map:      [a={m1:a1},b={m2:a2},c={m3:a3}]
 sequence: a.b.c
 trace:    (t1 => t2 => t3)
 string:   "abc"

UNARY:
 reduce:   $
 wipe:     #
 turn:     %
 lift:     ^
 iota:     ~
 pack:     <
 unpack:   >

BINARY:
 add:      +
 subtract: -
 maximum:  |
 minimum:  &
 bind:     !
 concat:   ; (infix)

HIGHER ORDER:
 foreach:  @
 swap:     \
 fold:     .

Tuesday, April 30, 2013

Symbolic (string) computing with SPREAD

In my last post, I promised to cover SPREAD's unification algorithm in more detail.

Alas, this post breaks that promise, but I will cover another interesting property of the SPREAD language: its capability to compute with symbols, unbound labels and strings.
With the use of symbols it is easy to demonstrate SPREAD's arithmetical laws and operations on multisets and maps.

Any SPREAD expression may contain symbols, unbound labels and strings. When an operator has any one of them as an argument, such an operator evaluates (reduces) to itself.
Another way of saying this is that such an expression is in normal form. Our first example is a compound statement that contains one single symbol.

Basics
compound: 1 2 + a * => (1 2 + => 3) a * => 3 a *

Surprisingly, multiplicities in a multiset are also expressions and thus may contain symbols! Given the previous, we can express some interesting mathematical laws that hold for multimaps.

Interestingly, such laws are just valid SPREAD expressions that can be evaluated!

(Multi)Map laws

ladd1:   [k1={m1:v1}] [k1={m2:v1}] + => [k1={m1 m2 +:v1}]
ladd2:   [k1={m1:v1}] [k2={m2:v1}] + => [k1={m1:v1},k2={m2:v1}]
ladd3:   [k1={m1:v1}] [k1={m2:v2}] + => [k1={m1:v1,m2:v2}]

lmul1:   [k1={m1:v1}] [k1={m2:v1}] * => [k1={m1 m2 *:v1}]
lmul2:   [k1={m1:v1}] [k2={m2:v1}] * => 0
lmul3:   [k1={m1:v1}] [k1={m2:v2}] * => 0

lmax1:   [k1={m1:v1}] [k1={m2:v1}] | => [k1={m1 m2 |:v1}]
lmax2:   [k1={m1:v1}] [k2={m2:v1}] | => [k1={m1:v1},k2={m2:v1}]
lmax3:   [k1={m1:v1}] [k1={m2:v2}] | => [k1={m1:v1,m2:v2}]

lmin1:   [k1={m1:v1}] [k1={m2:v1}] & => [k1={m1 m2 &:v1}]
lmin2:   [k1={m1:v1}] [k2={m2:v1}] & => 0
lmin3:   [k1={m1:v1}] [k1={m2:v2}] & => 0

lbind1:  [k1'={m1:v1}] [k1={m2:v1}]  ! => [k1'{m2:v1}={m1:v1}]
lbind2:  [k1'={m1:v1}] [k2={m2:v1}]  ! => [k1'={m1:v1}]
lbind3:  [k1={m1:v1}]  [m1={mm2:v1}] ! => [k1={{mm2:v1}:v1}]
lbind4:  [k1={m1:v1}]  [m2={mm2:v1}] ! => [k1={m1:v1}]
lbind5:  [k1={m1:v1'}] [v1={m2:v11}] ! => [k1={m1:v1'{m2:v11}}]
lbind6:  [k1={m1:v1'}] [v2={m2:v11}] ! => [k1={m1:v1'}]

Strings
Multi-character strings are sequences of single-character strings. Here are some examples:

str:      [0="s",1="t",2="r"] == "s"."t"."r" == "str"
stadd1:   1 "a" + => 1 "a" +
stadd2:   "a" "b" + => "a" "b" +
stadd3:   "abcd" "ef" + => {"a","e"}.{"b","f"}."cd" =>
          {"abcd","afcd","ebcd","efcd"}

stcat1:   "hello ";"world" => "hello world"
stcat2:   "hello ";{"world","moon"} => {"hello world","hello moon"}
stcat3:   "hello ";1.2;" world" => "hello ".1.2." world"

Literate programming
Strings allow the literal combination of text and programs. Here is an example:

lit1:  "The addition of two numbers ".(1 2 +)." is easy" =>
       "The addition of two numbers ".(1 2 + => 3)." is easy"



(Edit: botched the multiset examples, so I removed them).