Friday, August 8, 2014

Forgetful memoization through computational traces

The SPREAD framework is based on pure functions. This is no coincidence because spreadsheets (in their purist form) are also purely functional.

By definition, the result of a pure function should only depend on its arguments. It is this property of pure functions that gives us the capability to 'cache' the result of a function. In turn, cached results can later be re-used.

So how does a 'function' cache work?

When we are about to call a function, we first check if we already have called the same function with exactly the same arguments. If not, we add the function call (key) and its result (value) to the cache. Otherwise, we just return the result from the cache.

Such caching scheme is called 'memoization'. With memoization, naive algorithms can be turned from exponential into lineair, with the fibonacci function as the canonical example.

But naive memoization has a problem. If we don't clean entries from the cache, the cache will accumulate all function calls and their results.

To combat such indefinite growth, one can apply various eviction schemes to clean the memoization cache. For instance, we can decide to evict cached items that are 'Least Recently Used' (LRU). Unfortunately,  LRU complicates the static analysis of runtime complexity. Even worse, LRU can turn the improved linear algorithm back into an exponential one.

In this post I propose a new 'eviction' scheme that is based on cached computational traces. I've coined this as: "forgetful memoization through computational traces".

Remember that, in SPREAD, every value keeps a strong reference to its origin. Here is the canonical example:

((1 ++ 2) ** (3 ++ 4)) ~> [([(1 ++ 2) ~> 3] ** [(3 ++ 4) ~> 7]) ~> [(3 ** 7) ~> 21]]

So we have the value 21 that has a back reference to its origin. When 21 becomes non-reachable, it would be a candidate for garbage collection, together (with the reference to) its origin. But if 21 is still reachable, we can potentially re-use parts of its computational trace.

Let's say that we would want to compute 3 ++ 4. Because 21 (indirectly) holds a reference to (3 ++ 4) ~> 7 we don't need to recompute: we just return the result 7.

But how do we gain access to (3 ++ 4) ~> 7?  We achieve this through the use of a (very) weak reference map. In this map, both the key (the function call) and the value (the function result) are weakly referenced.

That means that a map entry will be evicted when either its key or its value is not strongly referenced. In essence, the weak map acts as an index on computational traces. By using weak references, computational traces will be evicted from the index when they are garbage collected.

Friday, August 1, 2014

To the basics

For almost a year, I've been struggling (and developing!) to get to the core of what SPREAD is all about. If I have to summarise it into one single sentence today, I would formulate it as such:

Every interesting value should keep a reference to its origin

As an example, the addition of 1 and 2 is not very interesting, but instructive:

(1 ++ 2) ~> 3

In most languages, you destructively end up with just 3. But in SPREAD, you end up with 3 and a reference to its origin (1 ++ 2). Notice the usage of the ++ operator to differentiate it from the destructive addition operation +.

Here is a another example, demonstrating the non-destructive multiplication operator **:

((1 ++ 2) ** (3 ++ 4)) ~> [([(1 ++ 2) ~> 3] ** [(3 ++ 4) ~> 7]) ~> [(3 ** 7) ~> 21]]

So the resulting value is 21, which refers back to its origin 3 ** 7, which in turn references to its origin, etc. In short, we capture the computational trace of (what we deem) an interesting value.

Interestingly, such traces enable incremental spreadsheet-like computation powers (hence the name SPREAD). But how such SPREAD traces exactly work will be the subject of a future post!

Wednesday, May 28, 2014

The Avail Programming Language

One month ago, the Avail programming language was released. And since then it has been the subject of my day dreams!

At its release date, Avail immediately struck me as a very novel language.

The main authors,  Mark van Gulik and Todd Smith, have poured a lot of sweat into the first release. The (open source) code base shows that. Both Mark and Todd have a solid track record in industry. I guess that's why the code is so well done.

For me personally, Avail shares many of my ideas and requirements of what 'the next language' should be - and much more.

For a starters, Avail's syntax is completely free form - infix, postfix, prefix notation - it just doesn't matter. Next to that, immutability is a core notion that has been carefully architected into Avail's type system.

Avail is so powerful that it could easily embed (syntactically and semantically) my programming language Enchilada - but then a typed version!

For the type purists, Avail's type system is probably the most interesting bit. The type system strikes a pragmatic balance between completely untyped programs versus very advanced 'dependent types'. Most interestingly, type checking is fully controlled by the programmer!

I'm still delving into Avail's secret gems, and I have to say, there are many to find - if you care to search for them.

Anyway, here is my first avail program that I want to share with you:
https://github.com/rapido/spread/blob/master/avail/Treap.avail

Now what about SPREAD? Don't worry, I'm still working on it. But my next step is to port the scala prototype to Avail, just to see what happens!

Saturday, May 11, 2013

Online SPREAD interpreter version v0.2

I just finished a very rough SPREAD interpreter!

The interpreter probably contains many bugs, but the basics are there. Since my last post, I also changed some of the operators and symbols (they will also probably change slightly at a later stage).

Here is the stuff:

STRUCTURE:
 variants: {m1:a1,m2:a2,m3:a3}
 map:      [a={m1:a1},b={m2:a2},c={m3:a3}]
 sequence: a.b.c
 trace:    (t1 => t2 => t3)
 string:   "abc"

UNARY:
 reduce:   $
 wipe:     #
 turn:     %
 lift:     ^
 iota:     ~
 pack:     <
 unpack:   >

BINARY:
 add:      +
 subtract: -
 maximum:  |
 minimum:  &
 bind:     !
 concat:   ; (infix)

HIGHER ORDER:
 foreach:  @
 swap:     \
 fold:     .

Tuesday, April 30, 2013

Symbolic (string) computing with SPREAD

In my last post, I promised to cover SPREAD's unification algorithm in more detail.

Alas, this posts breaks that promise, but I'll do cover some other interesting property of the SPREAD language: its capability to compute with symbols, unbound labels and strings.
With the use of symbols it is easy to demonstrate SPREAD's arithmetical laws and operations on multisets and maps. 

Any SPREAD expression may contain symbols, unbound labels and strings. When an operator has either one of them as arguments, such operator evaluates (reduces) to itself.
Another way of saying is that such expression is in normal form. Our first example is a compound statement that contains one single symbol.

Basics
compound: 1 2 + a * => (1 2 + => 3) a * =>  3 a *

Surprisingly, multiplicities in a multiset are also expressions and thus may contain symbols! Here is an example: Given the previous, we can express some interesting mathematical laws that hold for multimaps.

Interestingly, such laws are just valid SPREAD expressions that can be evaluated!

(Multi)Map laws

ladd1:   [k1={m1:v1}] [k1={m2:v1}] + => [k1={m1 m2 +:v1}]
ladd2:   [k1={m1:v1}] [k2={m2:v1}] + => [k1={m1:v1}],k2={m2:v1}]
ladd3:   [k1={m1:v1}] [k1={m2:v2}] + => [k1={m1:v1,m2:v2}]

lmul1:   [k1={m1:v1}] [k1={m2:v1}] * => [k1={m1 m2 *:v1}]
lmul2:   [k1={m1:v1}] [k2={m2:v1}] * => 0
lmul3:   [k1={m1:v1}] [k1={m2:v2}] * => 0

lmax1:   [k1={m1:v1}] [k1={m2:v1}] | => [k1={m1 m2 |:v1}]
lmax2:   [k1={m1:v1}] [k2={m2:v1}] | => [k1={m1:v1}],k2={m2:v1}]
lmax3:   [k1={m1:v1}] [k1={m2:v2}] | => [k1={m1:v1,m2:v2}]

lmin1:   [k1={m1:v1}] [k1={m2:v1}] & => [k1={m1 m2 &:v1}]
lmin2:   [k1={m1:v1}] [k2={m2:v1}] & => 0
lmin3:   [k1={m1:v1}] [k1={m2:v2}] & => 0

lbind1:  [k1'={m1:v1}] [k1={m2:v1}]  ! => [k1'{m2:v1}={m1:v1}]
lbind2:  [k1'={m1:v1}] [k2={m2:v1}]  ! => [k1'={m1:v1}]
lbind3:  [k1={m1:v1}]  [m1={mm2:v1}] ! => [k1={{mm2:v1}:v1}]
lbind4:  [k1={m1:v1}]  [m2={mm2:v1}] ! => [k1={m1:v2}]
lbind5:  [k1={m1:v1'}] [v1={m2:v11}] ! => [k1={m1:v1'{m2:v11}]
lbind6:  [k1={m1:v1'}] [v2={m2:v11}] ! => [k1={m1:v1'}]

Strings
Multi character strings are sequences of single character strings . Here are some example:s

str:      [0="s",1="t",2="r"] == "s"."t"."r" == "str"
stadd1:   1 "a" + => 1 "a" +
stadd2:   "a" "b" + => "a" "b" +
stadd3:   "abcd" "ef" + => {"a","e"}.{"b","f"}."cd" =>
          {"abcd","afcd","ebcd",efcd"}

stcat1:   "hello ";"world" => "hello world"
stcat2:   "hello ";{"world","moon"} => {"hello world","hello moon"}
stcat3:   "hello ";1.2;" world" => "hello ".1.2." world"

Literate programming
Strings allow the literal combination of text and programs. Here is an example:

lit1:  "The addition of two numbers ".(1 2 +)." is easy" =>
       "The addition of two numbers: ".(1 2 + => 3)." is easy"



(Edit: botched the multiset examples, so I removed them).

Monday, April 29, 2013

What kind of programming language is SPREAD?

Good news!

The design of the SPREAD programming language has been completed, while the quick-and-dirty implementation is almost done. I've put a lot of effort into SPREAD's syntax and semantics, and the final result is  - to what I believe - a pretty unique language. You may disagree.

But for me the central question is: how should one classify SPREAD?

At first sight, I believe the array language J resembles SPREAD the most. Firstly, because of J's powerful array operators, and secondly, because of J's terseness and tacitness.

SPREAD: an array language?

But SPREAD has much more to offer than J. It has a more economical FORTH like postfix syntax; it also has SCHEME like symbolic manipulation and hygienic 'macros', together with PROLOG's non-determinism and unification (not resolution). Next to that, SPREAD has spreadsheet semantics, while remaining a pure functional language. So?

SPREAD: a concatenative language?
SPREAD: a symbolic macro language?
SPREAD: a non-deterministic language?
SPREAD: a functional reactive language?

To good to be true? May be not, because there is a catch:

SPREAD is also a total function language which means that self referential structures cannot be expressed directly in SPREAD (as in pure spreadsheets). Luckily, there is one - subtle - exception to this prohibition of self-referentiality:

A SPREAD number is an infinite map of maps of itself !

With some imagination, SPREAD numbers can be considered as a type of co-data. However, it's still an open question whether such numbers makes SPREAD Turing-complete. But more about that later. Still, my gut feeling says SPREAD is not Turing-complete, but that it is still powerful enough to do any meaningful computation.

SPREAD: A non-Turing complete language?

Least but not least, SPREAD keeps and judiciously memoizes all (re)computations. Moreover, traces of computation can be manipulated and queried as first class objects. This retro-active property alone is what - I believe - really differentiates SPREAD from other programming languages.

SPREAD: a retro-active programming language!

Now that I've more or less settled SPREAD's classification, are you now ready for this retro-active way of programming? If so, let me pique your interest some more by demonstrating real SPREAD programs:

Basic arithmetics
add:      1 2 + => 3 
multiply: 3 4 * => 12
maximum:  4 5 | => 5
minimum:  4 5 & => 4
negate:     4 - => _4
compound: 1 2 + 3 _1 + * => (1 2 + => 3) (3 _1 + => 2) * => 6

The first thing you'll notice is that each value has a backward 'trace' (the left of =>) that has lead to itself (unless a value is atomic). Traces are first-class in SPREAD and they enable SPREAD to be retro-active.

Alternatively, non-determinacy is achieved via the cartesian combination and operation over multisets. Next to multisets, SPREAD also has first class multimaps (multisets -> multisets) which - just like multisets - can be combined with the same standard arithmetical operators.

Multisets
add:      {1,2,3} {2,4} + => {3,4,2:5,6,7}
multiply: {1,2,3} {2,4} * => {2,2:4,6,8,12}
maximum:  {1,2,3} {2,4} | => {2:2,3,3:4}
minimum:  {1,2,3} {2,4} & => {2:1,3:2,3}
negate:   {1,2,_2:3} -  => {_1,_2,_2:_3}
compound: {1,2} 3 + 5 * => ({1,2} 3 + => {4,5}) 5 * => {20,24}

(Multi)Maps
add:      [a=1,b={2:2}] [b={3:2,3},c=4] + => [a=1,b={5:2,3},c=4]
multiply: [a=1,b={2:2}] [b={3:2,3},c=4] * => [b={6:2}]
maximum:  [a=1,b={2:2}] [b={3:2,3},c=4] | => [a=1,b={3:2,3},c=4]
minimum:  [a=1,b={2:2}] [b={3:2,3},c=4] & => [b={2:2}]
negate:   [b={3:2,3},c=4] - => [b={_3:2,_1:3},c={_1:4}]
alts:     {[a=1],[b=1}] [a=2] + => {[a={1,2}],[a=2,b=1]}

Still with me? Good! There is more to come.

Binding Labels
SPREAD defines a label type and a target type, both of which are expressions. A label and its corresponding target can be alternatively interpreted as 'this label is bound to that target'. It is possible for a label to be bound to an empty target which we call an "unbound label". It's the (re)binding of labels with other targets what is the basis of SPREAD's retro-active (and spreadsheet like) computations. More about that later.

bind:  [a=x'1 4 +,b=y' 5 *] [x=2,y=3] ! => [a=x'2 4 +,b=y'3 5 *]

Evaluate
Unevaluated expressions in a map can be evaluated ($). This operation allows the precise control over what and when to compute something.

eval: [a=x'2 4 +] $ => [a=((x'2 => 2) 4 + => 6)]

Wipe
Optionally, computational traces in a map can be wiped (#) to yield the actual, and most recent value. Effectively, wiping also detaches labels from their targets.

wipe: [a=((x'2 => 2) 4 + => 6),b=(y'2,2)] # => [a=6,b=2]

Sequences
SPREAD acknowledges the fact that a sequence is just a special kind of map with consecutive domain values as integers (starting from 0). Syntactically, SPREAD separates range values with dots, ignoring corresponding domain values. For example:

a.b.c

.. is just efficient syntactic sugar for:

[0=a,1=b,2=c]

add:      1.2.3 4.2 + => {1,4}.{2:2}.3 
multiply: 1.2.3 4.2 * => [1=2]
maximum:  1.2.3 4.2 | => 4.2.3
minimum:  1.2.3 4.2 & => 1.2
negate:   1.2.3 -     => {_1:1}.{_1:2}.{_1:3}

Sequences (which are ultimately maps) can be concatenated with the aid of the special purpose infix (;) operator. Rephrased as SPREAD's ironical rule:

Every SPREAD operator has postfix syntax, except the concatenation operator, which has infix syntax

Concatenation
concat1:  1.2.3;4.5.6 => 1.2.3.4.5.6
concat2:  0;[a=1,b=2,c=3] => 1.2.3
concat3:  [a=b,c=d];[f=h,i=k] => b.d.h.k

The arithmetical operations on multisets and (multi)maps is what makes SPREAD a 'basic' array language. But to make 'arrays' work harder, SPREAD also defines three higher-order operators: fold, runfold and foreach. To emphasise, higher-order operators are different from ordinary operators because they take other operators as an argument.

Folding
The fold operator (%) takes two arguments: a map argument and a binary operator argument. When a fold is evaluated, it folds the binary operator argument over all the range values of the argument map, in sequence:

sum:     1.2.3.4 +% => 10
product: 1.2.3.4 *% => 24
max:     1.2.3.4 |% => 4
min:     1.2.3.4 &% => 1
bind:    [a=b'].[b=c'].[c=2] !% => [a=b'(c'2)]

Run-folding
The run-fold operator (@) takes two arguments: a map argument and a binary operator argument. When a run-fold is evaluated, it run-folds the binary operator argument over all the range values of the map argument, in sequence:

sum:     1.2.3.4 +@ => 1.3.6.10
product: 1.2.3.4 *@ => 1.2.6.24
max:     1.2.3.4 |@ => 1.2.3.4
min:     1.2.3.4 &@ => 1.1.1.1
bind:    [a=b'].[b=c'].[c=2] !@ => [a=b'].[a=b'(c')].[a=b'(c'2)]

For-each
The for-each operator (/) takes two arguments: a map argument and an unary operator argument. When for-each is evaluated, it applies the unary operator argument over all  the range values of the map argument, in sequence:

add1:    1.2.3.4 (1 +)/ => 2.3.4.5
mul2:    1.2.3.4 (2 *)/ => 2.4.6.8
bind:    [a=b'].[b=c'].[c=2] !/ => [a=b'(c'2)]

Retro-active
With higher-order operators we can do a lot of fancy stuff. But SPREAD's real fanciness is solely due to its retro-activity - achieved via the re-binding of labels which triggers consecutive automatic evaluation. Here are an example:

react1:  [a=(((b'1 => 1) 2 + => 3) (3 4 + =>7) * => 21)] [b=2] ! # =>
         [a=(((b'2 => 2) 2 + => 4) (3 4 + =>7) * => 28)] # =>
         [a=28]

Back(s)lash
A trace can be recast as a sequence via the back(s)lash(\) operator. This allows an interesting importunity to manipulate traces of computations.

back1:   1 2 + 3 _1 + * \ => (1 2 + => 3) (3 _1 + => 2) * \ => 6 \ =>
         (1 2 + 3 _1 + * \).((1 2 + => 3) (3 _1 + => 2) * \).6

Turn
Turning(~) a map turns the domain and range of a map. This may be of use to be able to fold over the domain, for example:

turn1:   [a=b,c=d] ~ => [b=a,d=c]
turn2:   [a=b,c=b] ~ => [b={a,c}]
turn3:   [2=3,4=5] ~ +% => [3=2,5=4] +% => 6 

WEIRD, BUT IMPORTANT LAST BITS!

Numbers
A SPREAD number is syntactic sugar for an infinite recursive map, defined as follows:

num1:  4 == [0={4:0}]

Examples:

num2:  [0={2:0}] [0={3:0}] + == 2 3 + => 5 == [0={5=0}]  
num3:  [a=1] 3 + => [0={3:0},a=1]

Unification
Finally, the prolog style unification (?) of two maps is SPREAD's most versatile operator. BUT it is also most difficult to implement correctly and efficiently. Here are some preliminary examples, as I've not completely settled for its semantics:

uni1:  1.2.3.4 1.2.3.4 ?   => [0]
uni2:  1.2.3.4 1.x'.3.x' ? => 0
uni3:  1.2.3.4 1.x'.3.y' ? => [[x=3,y=4]]
uni4:  x'.2.3 1.y'.z' ? => [[x=1,y=2,z=3]]
uni5:  1.2.3 x';y' ? => 
       [[x=0,y=1.2.3],[x=[0=1],y=[1=2,2=3]],[x=1.2,y=[2=3]],[x=1.2.3,y=0]]

What's left?
Unification needs to be further developed. This will be the main topic of my next post. Next to that, there are some additional operators that need a definition (I've still got some ASCII characters left!)

Hopefully you will give me some suggestions to improve the SPREAD language! Care to comment on this post? Please feel free to do so.

(EDIT: The for-each examples were botched - they are now fixed)

Copyright 2013 - Robbert van Dalen

Friday, April 19, 2013

Implementation v0.1

I've more or less settled for SPREADs semantics so from now on it's implementation time!

This is my implementation schedule in stages:
  1. Quick and dirty implementation (will be horribly exponentially inefficient in many cases)
    • Try to implement various standard algorithms
    • Missing any convenient operators to implement standard algorithms?
      • Add convenient operators and re-iterate
  2. 'Efficient' implementation (to prove linear or log-linear efficiency in many cases)
    • Test efficiency (in terms of O complexity)
    • Test non-efficiencies (in terms of constant factors)
  3. 'Super efficient' implementation
    • Production quality performance
    • Production quality robustness
I hope to finalise Stage 1 within one month (deadline end of May). 
Stage 2 will take me at least 1/2 a year (deadline end of 2013) only to consider Stage 3 if there is genuine interest in the SPREAD language.

Anyway, I've already done a sketchy implementation that more or less works.