Wednesday, May 28, 2014

The Avail Programming Language

One month ago, the Avail programming language was released. And since then it has been the subject of my day dreams!

At its release date, Avail immediately struck me as a very novel language.

The main authors,  Mark van Gulik and Todd Smith, have poured a lot of sweat into the first release. The (open source) code base shows that. Both Mark and Todd have a solid track record in industry. I guess that's why the code is so well done.

For me personally, Avail shares many of my ideas and requirements of what 'the next language' should be - and much more.

For a starters, Avail's syntax is completely free form - infix, postfix, prefix notation - it just doesn't matter. Next to that, immutability is a core notion that has been carefully architected into Avail's type system.

Avail is so powerful that it could easily embed (syntactically and semantically) my programming language Enchilada - but then a typed version!

For the type purists, Avail's type system is probably the most interesting bit. The type system strikes a pragmatic balance between completely untyped programs versus very advanced 'dependent types'. Most interestingly, type checking is fully controlled by the programmer!

I'm still delving into Avail's secret gems, and I have to say, there are many to find - if you care to search for them.

Anyway, here is my first avail program that I want to share with you:

Now what about SPREAD? Don't worry, I'm still working on it. But my next step is to port the scala prototype to Avail, just to see what happens!

Saturday, May 11, 2013

Online SPREAD interpreter version v0.2

I just finished a very rough SPREAD interpreter!

The interpreter probably contains many bugs, but the basics are there. Since my last post, I also changed some of the operators and symbols (they will also probably change slightly at a later stage).

Here is the stuff:

 variants: {m1:a1,m2:a2,m3:a3}
 map:      [a={m1:a1},b={m2:a2},c={m3:a3}]
 sequence: a.b.c
 trace:    (t1 => t2 => t3)
 string:   "abc"

 reduce:   $
 wipe:     #
 turn:     %
 lift:     ^
 iota:     ~
 pack:     <
 unpack:   >

 add:      +
 subtract: -
 maximum:  |
 minimum:  &
 bind:     !
 concat:   ; (infix)

 foreach:  @
 swap:     \
 fold:     .

Tuesday, April 30, 2013

Symbolic (string) computing with SPREAD

In my last post, I promised to cover SPREAD's unification algorithm in more detail.

Alas, this posts breaks that promise, but I'll do cover some other interesting property of the SPREAD language: its capability to compute with symbols, unbound labels and strings.
With the use of symbols it is easy to demonstrate SPREAD's arithmetical laws and operations on multisets and maps. 

Any SPREAD expression may contain symbols, unbound labels and strings. When an operator has either one of them as arguments, such operator evaluates (reduces) to itself.
Another way of saying is that such expression is in normal form. Our first example is a compound statement that contains one single symbol.

compound: 1 2 + a * => (1 2 + => 3) a * =>  3 a *

Surprisingly, multiplicities in a multiset are also expressions and thus may contain symbols! Here is an example: Given the previous, we can express some interesting mathematical laws that hold for multimaps.

Interestingly, such laws are just valid SPREAD expressions that can be evaluated!

(Multi)Map laws

ladd1:   [k1={m1:v1}] [k1={m2:v1}] + => [k1={m1 m2 +:v1}]
ladd2:   [k1={m1:v1}] [k2={m2:v1}] + => [k1={m1:v1}],k2={m2:v1}]
ladd3:   [k1={m1:v1}] [k1={m2:v2}] + => [k1={m1:v1,m2:v2}]

lmul1:   [k1={m1:v1}] [k1={m2:v1}] * => [k1={m1 m2 *:v1}]
lmul2:   [k1={m1:v1}] [k2={m2:v1}] * => 0
lmul3:   [k1={m1:v1}] [k1={m2:v2}] * => 0

lmax1:   [k1={m1:v1}] [k1={m2:v1}] | => [k1={m1 m2 |:v1}]
lmax2:   [k1={m1:v1}] [k2={m2:v1}] | => [k1={m1:v1}],k2={m2:v1}]
lmax3:   [k1={m1:v1}] [k1={m2:v2}] | => [k1={m1:v1,m2:v2}]

lmin1:   [k1={m1:v1}] [k1={m2:v1}] & => [k1={m1 m2 &:v1}]
lmin2:   [k1={m1:v1}] [k2={m2:v1}] & => 0
lmin3:   [k1={m1:v1}] [k1={m2:v2}] & => 0

lbind1:  [k1'={m1:v1}] [k1={m2:v1}]  ! => [k1'{m2:v1}={m1:v1}]
lbind2:  [k1'={m1:v1}] [k2={m2:v1}]  ! => [k1'={m1:v1}]
lbind3:  [k1={m1:v1}]  [m1={mm2:v1}] ! => [k1={{mm2:v1}:v1}]
lbind4:  [k1={m1:v1}]  [m2={mm2:v1}] ! => [k1={m1:v2}]
lbind5:  [k1={m1:v1'}] [v1={m2:v11}] ! => [k1={m1:v1'{m2:v11}]
lbind6:  [k1={m1:v1'}] [v2={m2:v11}] ! => [k1={m1:v1'}]

Multi character strings are sequences of single character strings . Here are some example:s

str:      [0="s",1="t",2="r"] == "s"."t"."r" == "str"
stadd1:   1 "a" + => 1 "a" +
stadd2:   "a" "b" + => "a" "b" +
stadd3:   "abcd" "ef" + => {"a","e"}.{"b","f"}."cd" =>

stcat1:   "hello ";"world" => "hello world"
stcat2:   "hello ";{"world","moon"} => {"hello world","hello moon"}
stcat3:   "hello ";1.2;" world" => "hello ".1.2." world"

Literate programming
Strings allow the literal combination of text and programs. Here is an example:

lit1:  "The addition of two numbers ".(1 2 +)." is easy" =>
       "The addition of two numbers: ".(1 2 + => 3)." is easy"

(Edit: botched the multiset examples, so I removed them).

Monday, April 29, 2013

What kind of programming language is SPREAD?

Good news!

The design of the SPREAD programming language has been completed, while the quick-and-dirty implementation is almost done. I've put a lot of effort into SPREAD's syntax and semantics, and the final result is  - to what I believe - a pretty unique language. You may disagree.

But for me the central question is: how should one classify SPREAD?

At first sight, I believe the array language J resembles SPREAD the most. Firstly, because of J's powerful array operators, and secondly, because of J's terseness and tacitness.

SPREAD: an array language?

But SPREAD has much more to offer than J. It has a more economical FORTH like postfix syntax; it also has SCHEME like symbolic manipulation and hygienic 'macros', together with PROLOG's non-determinism and unification (not resolution). Next to that, SPREAD has spreadsheet semantics, while remaining a pure functional language. So?

SPREAD: a concatenative language?
SPREAD: a symbolic macro language?
SPREAD: a non-deterministic language?
SPREAD: a functional reactive language?

To good to be true? May be not, because there is a catch:

SPREAD is also a total function language which means that self referential structures cannot be expressed directly in SPREAD (as in pure spreadsheets). Luckily, there is one - subtle - exception to this prohibition of self-referentiality:

A SPREAD number is an infinite map of maps of itself !

With some imagination, SPREAD numbers can be considered as a type of co-data. However, it's still an open question whether such numbers makes SPREAD Turing-complete. But more about that later. Still, my gut feeling says SPREAD is not Turing-complete, but that it is still powerful enough to do any meaningful computation.

SPREAD: A non-Turing complete language?

Least but not least, SPREAD keeps and judiciously memoizes all (re)computations. Moreover, traces of computation can be manipulated and queried as first class objects. This retro-active property alone is what - I believe - really differentiates SPREAD from other programming languages.

SPREAD: a retro-active programming language!

Now that I've more or less settled SPREAD's classification, are you now ready for this retro-active way of programming? If so, let me pique your interest some more by demonstrating real SPREAD programs:

Basic arithmetics
add:      1 2 + => 3 
multiply: 3 4 * => 12
maximum:  4 5 | => 5
minimum:  4 5 & => 4
negate:     4 - => _4
compound: 1 2 + 3 _1 + * => (1 2 + => 3) (3 _1 + => 2) * => 6

The first thing you'll notice is that each value has a backward 'trace' (the left of =>) that has lead to itself (unless a value is atomic). Traces are first-class in SPREAD and they enable SPREAD to be retro-active.

Alternatively, non-determinacy is achieved via the cartesian combination and operation over multisets. Next to multisets, SPREAD also has first class multimaps (multisets -> multisets) which - just like multisets - can be combined with the same standard arithmetical operators.

add:      {1,2,3} {2,4} + => {3,4,2:5,6,7}
multiply: {1,2,3} {2,4} * => {2,2:4,6,8,12}
maximum:  {1,2,3} {2,4} | => {2:2,3,3:4}
minimum:  {1,2,3} {2,4} & => {2:1,3:2,3}
negate:   {1,2,_2:3} -  => {_1,_2,_2:_3}
compound: {1,2} 3 + 5 * => ({1,2} 3 + => {4,5}) 5 * => {20,24}

add:      [a=1,b={2:2}] [b={3:2,3},c=4] + => [a=1,b={5:2,3},c=4]
multiply: [a=1,b={2:2}] [b={3:2,3},c=4] * => [b={6:2}]
maximum:  [a=1,b={2:2}] [b={3:2,3},c=4] | => [a=1,b={3:2,3},c=4]
minimum:  [a=1,b={2:2}] [b={3:2,3},c=4] & => [b={2:2}]
negate:   [b={3:2,3},c=4] - => [b={_3:2,_1:3},c={_1:4}]
alts:     {[a=1],[b=1}] [a=2] + => {[a={1,2}],[a=2,b=1]}

Still with me? Good! There is more to come.

Binding Labels
SPREAD defines a label type and a target type, both of which are expressions. A label and its corresponding target can be alternatively interpreted as 'this label is bound to that target'. It is possible for a label to be bound to an empty target which we call an "unbound label". It's the (re)binding of labels with other targets what is the basis of SPREAD's retro-active (and spreadsheet like) computations. More about that later.

bind:  [a=x'1 4 +,b=y' 5 *] [x=2,y=3] ! => [a=x'2 4 +,b=y'3 5 *]

Unevaluated expressions in a map can be evaluated ($). This operation allows the precise control over what and when to compute something.

eval: [a=x'2 4 +] $ => [a=((x'2 => 2) 4 + => 6)]

Optionally, computational traces in a map can be wiped (#) to yield the actual, and most recent value. Effectively, wiping also detaches labels from their targets.

wipe: [a=((x'2 => 2) 4 + => 6),b=(y'2,2)] # => [a=6,b=2]

SPREAD acknowledges the fact that a sequence is just a special kind of map with consecutive domain values as integers (starting from 0). Syntactically, SPREAD separates range values with dots, ignoring corresponding domain values. For example:


.. is just efficient syntactic sugar for:


add:      1.2.3 4.2 + => {1,4}.{2:2}.3 
multiply: 1.2.3 4.2 * => [1=2]
maximum:  1.2.3 4.2 | => 4.2.3
minimum:  1.2.3 4.2 & => 1.2
negate:   1.2.3 -     => {_1:1}.{_1:2}.{_1:3}

Sequences (which are ultimately maps) can be concatenated with the aid of the special purpose infix (;) operator. Rephrased as SPREAD's ironical rule:

Every SPREAD operator has postfix syntax, except the concatenation operator, which has infix syntax

concat1:  1.2.3;4.5.6 =>
concat2:  0;[a=1,b=2,c=3] => 1.2.3
concat3:  [a=b,c=d];[f=h,i=k] => b.d.h.k

The arithmetical operations on multisets and (multi)maps is what makes SPREAD a 'basic' array language. But to make 'arrays' work harder, SPREAD also defines three higher-order operators: fold, runfold and foreach. To emphasise, higher-order operators are different from ordinary operators because they take other operators as an argument.

The fold operator (%) takes two arguments: a map argument and a binary operator argument. When a fold is evaluated, it folds the binary operator argument over all the range values of the argument map, in sequence:

sum: +% => 10
product: *% => 24
max: |% => 4
min: &% => 1
bind:    [a=b'].[b=c'].[c=2] !% => [a=b'(c'2)]

The run-fold operator (@) takes two arguments: a map argument and a binary operator argument. When a run-fold is evaluated, it run-folds the binary operator argument over all the range values of the map argument, in sequence:

sum: +@ =>
product: *@ =>
max: |@ =>
min: &@ =>
bind:    [a=b'].[b=c'].[c=2] !@ => [a=b'].[a=b'(c')].[a=b'(c'2)]

The for-each operator (/) takes two arguments: a map argument and an unary operator argument. When for-each is evaluated, it applies the unary operator argument over all  the range values of the map argument, in sequence:

add1: (1 +)/ =>
mul2: (2 *)/ =>
bind:    [a=b'].[b=c'].[c=2] !/ => [a=b'(c'2)]

With higher-order operators we can do a lot of fancy stuff. But SPREAD's real fanciness is solely due to its retro-activity - achieved via the re-binding of labels which triggers consecutive automatic evaluation. Here are an example:

react1:  [a=(((b'1 => 1) 2 + => 3) (3 4 + =>7) * => 21)] [b=2] ! # =>
         [a=(((b'2 => 2) 2 + => 4) (3 4 + =>7) * => 28)] # =>

A trace can be recast as a sequence via the back(s)lash(\) operator. This allows an interesting importunity to manipulate traces of computations.

back1:   1 2 + 3 _1 + * \ => (1 2 + => 3) (3 _1 + => 2) * \ => 6 \ =>
         (1 2 + 3 _1 + * \).((1 2 + => 3) (3 _1 + => 2) * \).6

Turning(~) a map turns the domain and range of a map. This may be of use to be able to fold over the domain, for example:

turn1:   [a=b,c=d] ~ => [b=a,d=c]
turn2:   [a=b,c=b] ~ => [b={a,c}]
turn3:   [2=3,4=5] ~ +% => [3=2,5=4] +% => 6 


A SPREAD number is syntactic sugar for an infinite recursive map, defined as follows:

num1:  4 == [0={4:0}]


num2:  [0={2:0}] [0={3:0}] + == 2 3 + => 5 == [0={5=0}]  
num3:  [a=1] 3 + => [0={3:0},a=1]

Finally, the prolog style unification (?) of two maps is SPREAD's most versatile operator. BUT it is also most difficult to implement correctly and efficiently. Here are some preliminary examples, as I've not completely settled for its semantics:

uni1: ?   => [0]
uni2: 1.x'.3.x' ? => 0
uni3: 1.x'.3.y' ? => [[x=3,y=4]]
uni4:  x'.2.3 1.y'.z' ? => [[x=1,y=2,z=3]]
uni5:  1.2.3 x';y' ? => 

What's left?
Unification needs to be further developed. This will be the main topic of my next post. Next to that, there are some additional operators that need a definition (I've still got some ASCII characters left!)

Hopefully you will give me some suggestions to improve the SPREAD language! Care to comment on this post? Please feel free to do so.

(EDIT: The for-each examples were botched - they are now fixed)

Copyright 2013 - Robbert van Dalen

Friday, April 19, 2013

Implementation v0.1

I've more or less settled for SPREADs semantics so from now on it's implementation time!

This is my implementation schedule in stages:
  1. Quick and dirty implementation (will be horribly exponentially inefficient in many cases)
    • Try to implement various standard algorithms
    • Missing any convenient operators to implement standard algorithms?
      • Add convenient operators and re-iterate
  2. 'Efficient' implementation (to prove linear or log-linear efficiency in many cases)
    • Test efficiency (in terms of O complexity)
    • Test non-efficiencies (in terms of constant factors)
  3. 'Super efficient' implementation
    • Production quality performance
    • Production quality robustness
I hope to finalise Stage 1 within one month (deadline end of May). 
Stage 2 will take me at least 1/2 a year (deadline end of 2013) only to consider Stage 3 if there is genuine interest in the SPREAD language.

Anyway, I've already done a sketchy implementation that more or less works.

    Saturday, April 13, 2013

    Programming with variants

    I'm moving towards a more generic incremental computational model, away from the spreadsheet model.

    These are the main concepts:
    1. expressions variate {}
    2. expressions can be labeled with other expressions '
    3. labeled expressions can be manipulated via maps of expressions []
    4. evaluated expressions refer to their un-evaluated versions @
    5. variants and maps nest
    6. labels have (level) scope .
    Here are some examples:

    x'{1,2,3} 2 +               =>  {3,4,5}@x'{1,2,3} 2 +

    x'{1,2,3} y'{2,3} +         =>  {3,2:4,2:5,6}@x'{1,2,3} y'{2,3} +

    [y=a' b' *] [a={1,2},b=2] ! =>  [y={2,4}@a'{1,2} b'{2} *]

    [a=1,a=x'{3,4} 2 +] [x=4] ! =>  [a={1,5,2:5}@{1,x'{3,2:4} 2 +}]

    [x',.x',[.x']] [x=2] !      =>  [x'2,.x',[.x'2]]

    Friday, March 29, 2013

    Chemical spreadsheets

    Spreadsheet cells contain atomic values or formulas. Formulas refer to other cells and are evaluated by following cell references recursively, down to the atomic values.

    In this post I consider a different spreadsheet model, which I call the 'chemical spreadsheet model'.

    In this model, references that sit in formulas will be bound with their corresponding cell values. Next to that, bound cells are consumed by formulas and consequently removed from the enclosing spreadsheet.

    Consider the following spreadsheet example with four cells:

    A3=A1 + A2
    A4=A1 * 4

    In the standard model the result would be:

    A3=5@(A1'2 + A2'3)
    A4=8@(A1'2 * 4)

    But in the chemical model we have two possible alternative spreadsheets/evaluations:

    1) A3=5@(A1'2 + A2'3)
       A4=A1 * 4

    2) A3=A1 + A2'3
       A4=8@(A1'2 * 4)

    So either A3 or A4 consumes A1, while A3 always consumes A2. Two alternatives may collapse into one however, if we allow multiple equal definitions of A1:

    A1=2  __
    A1=2  __\ (may be abbreviated with 2:A1=2)
    A3=A1 + A2
    A4=A1 * 4

    .. reducing to:

    1) A3=5@(A1'2 + A2'3)
       A4=8@(A1'2 * 4)

    2) A3=5@(A1'2 + A2'3)
       A4=8@(A1'2 * 4)

    Conversely, different definitions of A1 gives us another possible set of alternatives:

    A3=A1 + A2
    A4=A1 * 4

    .. reduces to:

    1) A3=5@(A1'2 + A2'3)
       A4=8@(A1'4 * 4)

    2) A3=5@(A1'4 + A2'3)
       A4=8@(A1'2 * 4)

    Occurrences and alternatives

    Each spreadsheet cell is annotated with a fractional occurrence. Most importantly, fractions allow us to collapse alternative spreadsheets into single consolidated ones.

    For example, the following two alternative spreadsheets:

    1) A3=5@(A1'2 + A2'3)
       A4=8@(A1'4 * 4)

    2) A3=7@(A1'4 + A2'3)
       A4=8@(A1'2 * 4)

    .. are consolidated into:

    1/2:A3=5@(A1'2 + A2'3)
    1/2:A3=7@(A1'4 + A2'3)
      1:A4=8@(A1'2 * 4)

    Fractions are also considered during the binding and consumption of cells, for example:

        A4=A1 * 4

    .. reduces to:

    1/2:A4=8@(A1'2 * 4)
    1/2:A4=A1 * 4

    Unbinding with negative cell occurrences 

    Cells also can have negative occurrence (indicated by prefix _). Such cells unbind previously bound references. Here is an elaborate example:

     1/2:A3=5@(A1'2 + A2'3)
     1/2:A3=7@(A1'4 + A2'3)
       1:A4=8@(A1'2 * 4)

    .. reducing to:

    1/4:A3=5@(A1'2 + A2'3)
    1/4:A3=A1'2 + A2
    1/4:A3=7@(A1'4 + A2'3)
    1/4:A3=A1'4 + A2
      1:A4=8@(A1'2 * 4)

    Chemical spreadsheets appear to be novel, I couldn't find anything related on the web (except for using spreadsheets to run chemical experiments). Anyway, this will be the way forward to finally implement the SPREAD language.