Wednesday 22 February 2012

Computing in Libretto (1 of 5). Sequences and Paths

The basic feature of Libretto is that it handles single values and value collections in a uniform way. Actually Libretto works with a ‘generalized’ value, which may contain many concrete values. This mechanism is based on the concept of a linear (flat) sequence. Sequence-handling operations are the core of Libretto’s operational semantics.


Sequences and the Dot Operator

Syntactically a sequence is a linear list of values enclosed in parentheses and separated by comma:
(1,2,3)
(“abc”, 1, Person(“john”))
()
The sequence (), which does not contain elements, is called empty. The key feature of sequences is that they behave as linear lists (a sequence can not contain another sequence as an element):
((1,2),(3,4)) == (1,2,3,4)
(1, (2,3), (4), (), (5,6)) == (1,2,3,4,5,6)
((),()) == ()
Sequences can be assigned to variables and fields. For instance, variables typed by the asterisk can contain an arbitrary number of elements (see section Block Variables):
{
  fix x* = (1, 2)
  fix y* = (3, 4)
  fix z* = (x,(x,y),y)  
  z //  1 2 1 2 3 4 3 4
}
Sequences play the crucial role in Libretto because they represent the computing context. For instance, let us define the factorial function:
def fact = if (this == 0) 1 else this * fact(this - 1) 
With the dot operator we can compute the factorial of a single value:
5.fact  //  120
Here the dot operator applies fact to a single integer. But the dot can work in the context of an arbitrary sequence, for instance:
(1, 2, 3, 4, 5).fact  //  1 2 6 24 120
(1, 2, 3, 4, 5).fact.($ + 1000)  //  1001 1002 1006 1024 1120
In the first expression the dot operator iterates over the values of the initial context sequence – by successively passing its values to fact. In the second expression the initial sequence (1,2,3,4,5) is converted at step 2 into (1,2,6,24,120), and at the last step it is converted into (1001,1002,1006,1024,1120). The symbol $ is used for getting the current context value of a step (see section Current Context Value).

A path step, which produces the sequence with more than one value is called a choice point.

For instance, in the second example all steps are choice points, because all of them result in sequences containing five values.

More examples:
def plus1 = this + 1

{
  var x* = (1,2)
  var y* = (3,4)

  (x, y).plus1  //  2 3 4 5
  (x.plus1, y.plus1)  //  2 3 4 5
}

def f12 = (1,2)
(f12, f12.plus1.plus1, f12.plus1.plus1.plus1.plus1) // 1 2 3 4 5 6
A single value v and a singleton sequence (v) are not distinguished in Libretto:
5 == (5)  //  true
5.fact  //  120
(5).fact  //  120
Singleton sequences can be used for ordinary value grouping. For instance,
(2+3)*4  //  20
Arithmetic operations in Libretto can work with arbitrary sequences:
(2+3, 20+30)*4  //  20  200
The empty context sequence is also allowed:
().fact  //  ()
Note that the dot operator working with flat sequences has similar semantics as flatMap in Scala. It is also close to the notion of a monad actively used in Haskell. Sequences and the dot operator are the basis for the Libretto iterative mechanism. This mechanism has the following advantages:

  1. It allows handling a single value and an arbitrary sequence of values in a uniform way. Moreover, it is sufficient to introduce handling rules for a single value, and the dot generalizes them on an arbitrary number of context elements.
  2. There is no need for introducing a special constant like null or similar constructs like the class Option in Scala or Maybe in Haskell. This role is played by the empty sequence ().
  3. The influence of heavy recursive definitions is limited in Libretto, although the functional style is important for the language.

For instance, in Libretto the definition of the function
def plus1 = this + 1
takes into account all three situations: when the context is empty, when it is single, and when it is numerous:

  • for the empty sequence (an analogue of null):

().plus1  //  ()

  • for a single value:

5.plus1  //  6

  • and for a value sequence:

(1, 2, 3, 4, 5).plus1  //  2 3 4 5 6

Current Context Value

Similarly to ‘this’ in methods, the symbol $ provides access to the current context value at each path step. For instance the function plus1 above can be defined directly:
(1, 2, 3, 4, 5). ($ + 1)  //  2 3 4 5 6
Arguments also depend on the context, for instance:
def f(x) = x + this
(1,2,3,4,5).f($)  //  2 4 6 8 10
In the last expression each context value of the choice point (1,2,3,4,5) is passed to the function f as both the context and the argument.

Sequence Indexing

We can access elements of a sequence by their index in the sequence. The indices start from 0:
(“abc”, “def”, “ghi”, “jkl”)(0)  //  “abc”
(“abc”, “def”, “ghi”, “jkl”)(2)  //  “ghi”
(“abc”, “def”, “ghi”, “jkl”)(-1)  //  “jkl”
(“abc”, “def”, “ghi”, “jkl”)(-2)  //  “ghi”
The programmer can also work with ranges:
(“abc”, “def”, “ghi”, “jkl”)(1..2)  //  “def”  “ghi”
(“abc”, “def”, “ghi”, “jkl”)(2..1)  //  “ghi”  “def”
(“abc”, “def”, “ghi”, “jkl”)(-3..-1)  //  “def”  “ghi” “jkl”
(“abc”, “def”, “ghi”, “jkl”)(-1..-2)  //  “jkl”  “ghi”
(“abc”, “def”, “ghi”, “jkl”)(2..)  //  “ghi”  “jkl”
(“abc”, “def”, “ghi”, “jkl”)(-2..)  //  “ghi”  “def” “abc”
As an example, let us consider the naive definition of the quick sort function:
def* qsort = if (this) {
  fix pivot = this(0)
  (this(1..)?[$ < pivot].qsort, pivot, this(1..)?[$ >= pivot].qsort)
}

(4,2,3,5,1).qsort  //  1 2 3 4 5
The function qsort is defined as a collection function. The else block is omitted in its definition – the default value () is used instead.

Indexing works also on fields, for instance,
class C(var n*)
object c extends C(1,2,3,4,5)
c.n(3)  //  4
Only collection fields marked by the asterisk can be indexed:
class C(var m, var n*)
object c extends C(0,1,2,3,4,5)
c.m  // 0
c.n(3)  //  4
c.m(3)  //  ERROR: method do is not defined for Int
For m, since it is not a collection field, Libretto tries to interpret the only value of m as a function, and apply it to the argument in parentheses – but fails. The next example deals with an anonymous function
class C(var m, var n*)
object c extends C(%{this+1}, %{this+1})
c.m(0)  //  1
c.n(0)  // %{this+1} 
c.n(0)(0)  //  1
The situation is similar for parameter variables:
def f(x*) = (x(0), x(0)(0))
f(%{this+1})  //  %{this+1}  1
If an argument should be applied to each function of a sequence, the programmer can use the following technique:
class C(var fun*)
def plus1 = this + 1
def mult5 = this * 5
def sub10 = this - 10

object c extends C(%plus1, %mult2, %sub10)
c.fun.$(5)  //  6  25  -5
Note that the expression c.fun(5) is equal to (), since the field c.fun contains only three objects:
(%plus1, %mult5, %sub10)(2)  //  sub10
(%plus1, %mult5, %sub10).$(2)  //  3  10  -8 

Dot Operator Generalization

In Libretto the dot operator applies fields and methods to objects in a way similar to that in C, Java and Scala:
class C(n)
C(5).n  //  5
But, unlike these languages, e1.e2 is valid in Libretto for any pair of expressions e1 and e2. Thus, the dot operator is generalized in Libretto to all kinds of expressions and values. In particular, any expression can be applied to a sequence and a sequence can be applied to any expression, for instance:
def plus5 = this + 5
def mult2 = this * 2

(1,2,3,4,5). plus5  //  6 7 8 9 10
2.(plus5, mult2)  //  7 10
In practice, this capability significantly improves code expressiveness and compactness.

The table below shows the results of the dot operator application e1.e2 dependently on the kind of the expression e2.

e2 Result interpretation Example
a field The field value(s) of the context object
 
class C(fix n: Int)
C(5).n   // 5
a class name Filtered context values – only class members are allowed through
(1,“a”,2,“b”).String
  //  (“a”, “b”)
an object or a constant An object or a constant itself – independently on the context values
class A
object a extends A
(1,2,3).a  //  a a a
a.5  //  5
a variable The value of the variable – independently on the context values
{
  var x = 5
  (1,2,3).x   //  5 5 5
}
a call of an element function A sequence accumulating the results of the function application to each element of the context sequence (the number of calls is equal to the number of context values)
def plus1 = this + 1

(1,2).plus1  // 2 3 

().plus1  //  ()
a call of a collection function The result of the function application to the whole context sequence (the function is called once)
def Any* middle =
  this(this.size div 2)

(1,2,3).middle  //  2

().middle  //  0
a sequence A sequence accumulating the results of computation of the sequence e2 for each context value
2.($+5, $*2)
  //  (7,4)
(2,3).($+5, $*2)
  // 7 4 8 6
a block The value of the last expression of the block (each expression of the block is computed in the context of the block)
 
2.{fix x=$*5; x-$}
  //  8
a predicate Filtered context values – only those are allowed through, in the context of which the predicate expression is not equal to ()
(3,4,5,1) ?[$ > 3]
  //  (4,5)
a sorter The context sequence reordered in accordance with the conditions of the sorter
(3,1,2,4) ^(-$)
  //  (4,3,2,1)
Note that the special functions (like dot_, next_ and nextSeq_) can change the semantics of the dot operator.

Sequences as Field Values

By default, a field can have at most one object as its value (one value or the empty sequence ()). But if the field-type description contains the asterisk or the indicator of the maximum number of elements (such a field is called a collection field) then a multielement sequence can be assigned to this field. In the class
class C(var k: Int, var m: Int*, var n: Int(3))
the fields k, m and n are defined, such that

  • k can contain at most one (zero or one) integer value
  • m can contain an arbitrary integer sequence
  • n can contain at most 3 (0 to 3) integer values.

For instance,
object c extends C((), (1,2,3,4,5), 1, 2)
def plus1 = this + 1
c.m.plus1  //  2 3 4 5 6
There are some indexing examples on collection fields:
class C(var m*)
object c extends C(4,5)
object f extends C(%{this+1}, %{this*100})
c.m(1)  // 5
f.m(1)  //  %{this*100}
f.m(1)(10)  //  1000
f.m(10)  //  ()
f.m.$(10)  //  11 1000
The declaration var m* is a shorthand for var m:Any*. In the penultimate expression the element with a non-existent index 10 is being looked for. Sequences in immutable fields can not be modified:
class C(fix m*)
var c = C(1,2,3)
f.m = (3,4,5)  //  ERROR: Attempt to reassign fixed field f
A sequence in a field is treated as a generalized single value. Hence, its change is considered as a modification of the field. Sequences stored in var fields can be modified with the assignment operators (see section Block Assignment Operators).

Open and Closed Sequences

A closed sequence is a finite sequence, which can be handled as a whole (e.g. by collection functions). Otherwise a sequence is open. All explicitly defined sequences are closed, for instance, (1,2,3,4,5). This is an example of an open sequence:
class Nats {
  def dot_(path) {
    var n = 1
    while (true) {
      n.path!
      n = n + 1
    }
  }
}
The class Nats uses the special method dot_ for generating an infinite sequence of natural numbers starting from 1. For instance:
Nats().print()  //  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ........
The exit from this loop must be provided within the path following the choice point Nats, e.g.,
Nats(). if ($ < 5) print else break  //  1 2 3 4
In this example the operator break halts the iteration (see section Operators break and continue). Another example of an open sequence is a function keyboard returning a pressed-key sequence:
keyboard  //  A a Ctrl-C Shift-F1 ....
It is well known that this sequence can be arbitrarily long…

Paths and Steps

An expression of the form
S1. S2. … . Sm
is called a path. The steps Si of the path are separated by the dot operator. In general, a step Si is a chain of expressions:
E1 E2 … En
where Ei is either a basic expression (object/field name, variable, number or string), a block, a function/constructor call, a sequence, a predicate, a sorter, or a path assignment. For instance, the step
(5, 3, 4, 1, 2) ?[$>3] ^($)  //  4 5
is a chain of three expressions: a sequence, a predicate, and a sorter. Let a step Si be E1 E2 … En. Then E1 is evaluated

  • in the external context, if i == 1,
  • in the context of the preceding step value, if i > 1.

In the example above the first expression (5, 3, 4, 1, 2) is followed by the predicate ?[$>3]. The evaluation of the predicate results in (5, 4). Then this sequence is ordered by the sorter ^($).

In the definition
def f = (this + 1) ?[$ > 3] ^($)
the body of f consists of one-step path (this + 1) ?[$ > 3] ^($). Its first expression (this + 1) is evaluated in the context of f. In particular, we have
(5, 3, 4, 1, 2).f  //  4 5 6
The following path consists of several steps:
(5, 3, 4, 1, 2). ($ * 2) ?[$ > 4] ^($). ($ * 2)  //  12 16 20
The second step ($ * 2) ?[$ > 4] ^($) contains three expressions: multiplying by 2, a predicate and a sorter. Now, let us consider how expressions are computed within one step. The operational semantics of a step E1 E2 … En is based on the following rule

The expressions Ei, i>1 can not change the context sequence produced by the evaluation of E1 except for the cases when

  • Ei is a predicate, which can remove values from the context sequence, or
  • Ei is a sorter, which can reorder the context sequence.
Here are examples of a predicate and a sorter as the second expressions of a step:
(4,3,1,2) ?[$ div 2 == 0]  //  4 2
(3,4,1,2) ^(desc $)  //  4 3 2 1
All other expressions can not modify the context within a step:
def f = this+1
(4,3,1,2) f  //  4 3 1 2
(4,3,1,2) {$ * 100} // 4 3 1 2
(4,3,1,2) {$ * 100} ^($)  //  1 2 3 4
Both, the function and the block are computed in these examples, but the result of this computation is ignored. Compare with almost the same expressions:
def f = this+1
(4,3,1,2).f  //  5 4 2 3
(4,3,1,2).{$ * 100} // 400 300 100 200
(4,3,1,2).{$ * 100} ^($)  //  100 200 300 400
The only difference between these examples is the appearance of the dot operator, which makes f and {$ * 100} the first expressions of their steps. Now these expressions have an impact on the value of the whole path.

This scheme is very useful. The programmer can safely insert the calls of functions without the risk of spoiling its value. In particular, a debugging function can be defined
def debugPrint = {print(“value:” + this); “printed”}
and safely inserted in a step, although debugPrint returns its own value “printed”:
(1,2,3,4,5) debugPrint. ($+1)
  //  value:1 value:2 value:3 value:4 value:5
  //  2 3 4 5 6
The blocks within steps can be used for modifying objects without affecting the context:
class C(var n)
(C(1), C(2), C(3)) {n = 100; “changed”}.n  //  100 100 100
The block modifies the value of n, but does affect the context, although it has its own value “changed”. Compare it with
class C(var n)
(C(1), C(2), C(3)).{n = 100; “changed”}.n  
                //  ERROR: field n not defined for String
In the last expression the value of the block does matter.

The programming technique based on paths and steps is convenient for the ‘pipeline’ programming style. Assume that we have some instances of class Foo, on which a method description is defined. The task is to concatenate all non-empty descriptions separated by the new-line symbol. Empty descriptions are represented by the empty string “”. A solution in Libretto looks as follows:
fooList.description?[not isEmpty].join("\n")
Here fooList is a sequence of instances of Foo, and isEmpty is the predefined method checking the emptiness of a string. The collection method join concatenates the sequence of strings and separates them with "\n".

Empty Sequences in Paths

The empty sequence plays an important role in the operational semantics of paths. In the description concatenation task from section Paths and Steps above an empty description was represented by the empty string “”. But in Libretto it is more natural to use in such cases the empty sequence (). Then we have a very compact solution of the task:
fooList.description.join("\n")
This query works correctly due to the semantics of the empty sequence, which is defined as follows:

  1. If an evaluation of a path step results in the empty sequence, then it is considered as failure, and backtracking occurs to the nearest choice point, where the next context value is selected.
  2. If an evaluation of the path is finished successfully, then its result is added to the resulting sequence of the path, and then backtracking occurs to the nearest choice point – for the selection of another context value.
  3. If there are no choice points left in the path, then its computation stops, and the resulting sequence of the path is returned.

In particular this means that if the method description returns the empty sequence, it adds nothing to the resulting sequence concatenated by join("\n"), so there is no need for additional checks. Items 1-3 determine the depth-first-search algorithm. The following example demonstrates it in action:
def even = if (this mod 2 == 0) this
(1,2,3,4). even. (10 * $, 100 * $)  //  20 200 40 400
In the last expression, the second step equals () for odd context numbers, and thus the third step (10 * $, 100 * $) is unreachable for them. In this path two choice points are engaged – at the first and third steps. The ‘search’ process is:
1 -> ()
2 -> 2 -> 2*10
          2*100
3 -> ()
4 -> 4 -> 4*10
          4*100

Empty Sequence Handling

If we need to avoid backtracking while processing the empty sequence, the collection functions and local traps can be applied.

Collection functions (see section Collection Functions) are applied as follows: if a collection function call occurs after a step resulting in the empty sequence, then the collection function is called instead of backtracking with the empty sequence as its context:
def thisMult5 = this + 5
def* sizeMult5 = size + 5

().thisMult5  //  ()
().sizeMult5  //  5

(1,2,3,4) ?[$>100]. thisMult5  //  ()
(1,2,3,4) ?[$>100]. sizeMult5  //  5
In this example the element function thisPlus5 is not called, but the collection function sizeMult5 is called once in the context of the empty sequence.

Libretto also provides empty sequence traps ?(exp) (see section The Empty Context as the Weakest Exception). For instance, in the last example we can catch the empty sequence before the call of thisMult5, and replace it by -1:
(1,2,3,4) ?[$>100] ?(-1). thisMult5  //  4 4 4 4 4

Operator as

The path assignment operator as stores the current context value in a variable. For instance,
(1, 2) as x. (3, 4). ($ * x)  //  3 4 6 8
One more example:
fix class Person(name: String, hasChild: Person*)

object ANN extends Person(“Ann”)
object TOM extends Person(“Tom”)
object PAUL extends Person(“Paul”, ANN, TOM)
object JOHN extends Person(“Johh”, PAUL)
object MARIE extends Person(“Marie”, PAUL)

var persons = (ANN, TOM, PAUL, JOHN, MARIE)

persons as p. hasChild. hasChild. 
                «{p.name} is the grandparent of {name}»!

   // John is the grandparent of Ann
   // John is the grandparent of Tom
   // Marie is the grandparent of Ann
   // Marie is the grandparent of Tom
The scope of a variable initialized by as is to the end of the path. If there is an entity with the same name in the context of the path, then this entity is overshadowed by the variable.

Operator index

The path operator index assigns the index of the current context value to a variable:
(100, 200, 300) index i. i  //  0 1 2
(100, 200, 300) index i. if (i mod 2 == 0) $  //  100 300
Here the index value is assigned to the variable i. The last query selects the elements with even indices. The scope of a variable initialized by index is to the end of the path. If there is an entity with the same name in the context of the path, then this entity is overshadowed by the variable.

Operators break and continue

The operators break and continue control iteration processes in paths. For instance, the query
(1,2,4,0,6,8) ?[$ mod 2 == 0]  //  2 4 0 6 8
selects all even numbers from the context sequence. To collect all even numbers occurring before the first occurrence of 0, we can use break. It stops the computation of the path and returns the values collected before break is applied:
(1,2,4,0,6,8) ?[$ mod 2 == 0]. if ($ == 0) break else $  //  2 4
The operator break can have an argument – a value, which is added to the resulting sequence of the path. For instance, in order to put 0 in the resulting sequence we define
(1,2,4,0,6,8)?[$ mod 2 == 0]. if ($ == 0) break 0 else $ //  2 4 0
The query to “find the first John in the sequence of persons” can be defined as:
persons. if (name == “John”) break $
The operator continue performs deep backtracking within the path, for instance,
persons as p. hasChild. hasChild. if (name == “Ann”) continue p age
This query is looking for the age of granddaughters named Ann (at most one Ann for each grandparent). The first argument of continue must be a variable connected with an operator as at some choice point. The expression continue p can be interpreted as the instruction to select the next value for p and continue the computation with it. The second argument (age) is optional. It is interpreted as the output of the computation branch, in which continue is called.

Predicates

Predicates can filter the elements of the context sequence by allowing through only those values, which satisfy the condition of the predicate. Predicates have the form ?[Exp], where Exp is an expression called the predicate condition. For instance:
(1, 2, 3, 4, 5) ?[$ > 3]  //  4 5
A predicate is true on a context value, if its condition is not equal to (). So, the empty sequence plays the role of false. For instance,
class Cls(fix exclamation: String*)

object O1 extends Cls()
object O2 extends Cls(“hello”)
object O3 extends Cls(“goodbye”, “bye”)

(O1, O2, O3) ?[exclamation]  //  O2 O3

(O1, O2, O3) ?[“goodye” in exclamation]  //  O3
The first predicate selects the context objects with a non-empty value of the field exclamation. The second predicate selects the objects with exclamation containing “goodbye”.

Boolean combinations also can be used in predicates:
(1, 3, 6, 4, 5, 2) index x ?[x mod 2 == 0 and not ($ mod 3 == 0)]
  //  1 5
In this example, those elements are getting through, which have an even index, and do not contain 3.

Sequence Ordering

Libretto supports a special sorting operator, which can order the context sequence. For instance,
(4,1,3,2,5) ^(asc $)  //  1 2 3 4 5
(4,1,3,2,5) ^(desc $)  //  5 4 3 2 1
The sorting operation is denoted by ^. The context sequence is ordered in accordance with the values in the parentheses, which must be comparable to each other. The keywords asc and desc indicate the ascending or descending direction of ordering. Ascending order is default:
(4,1,3,2,5) ^($)  //  1 2 3 4 5
Several criteria can be involved in ordering, for instance,
fix class Cls(name: String, num: Int)

object A1 extends Cls(“A”, 1)
object B1 extends Cls(“B”, 1)
object B2 extends Cls(“B”, 2)

(B1, A1, B2) ^(name, desc num)  //  A1 B2 B1
Here the objects of the class Cls are sorted by name (in ascending order), and those with the same name are sorted by num (in descending order).

The operator ^ can order the instances of any class, which inherits from the underdetermined class Comparable:
class Comparable {
  def compare(snd: Comparable) 
}
This allows the programmer to introduce her/his own ordering methods. The value of the function compare must be

  • < 0, if the context value is greater than the argument snd,
  • 0, if they are equal,
  • > 0, if the context value is less than snd.

If either the context value or snd does not determined, then () is returned. Now we can redefine the above example in the following way:
class Cls(fix name: String, fix num: Int) extends Comparable {
  def compare(snd: Cls) {
    var n = name.compare(snd.name)
    if (n == 0) n = -num.compare(snd.num)
    n
  }
}
object A1 extends Cls(“A”, 1)
object B1 extends Cls(“B”, 1)
object B2 extends Cls(“B”, 2)

(B1, A1, B2) ^($)  //  A1 B2 B1
The operation ^ is a collection method, which modifies the context sequence as a whole. Thus, it can work only with closed sequences (see section Open and Closed Sequences).

No comments:

Post a Comment