Thursday 23 February 2012

Computing in Libretto (2 of 5). Expressions

In this post I describe the basic types of Libretto expressions.


Assignment Operators

Libretto offers several assignment operators. They provide flexible sequence processing in fields and variables. There are two groups of assignment operators:

  • block (collection-wise) operators (=, +=, .= and --), which handle sequences as collections;
  • path (element-wise) operators (as and index), which handle sequences element by element.

Block Assignment Operators

Libretto supports several block assignment operators:

  • destructive assignment =
  • adding a value to the end of a sequence or after an element +=
  • adding a value to the beginning of a sequence or before an element .=
  • object deletion --

Here is an example:
class Cls(var n: Int)
object c extends Cls()

c.n = 3
c.n .= 2
c.n += 4
c.n .= 1
c.n += (5, 6)
c.n  //  1 2 3 4 5 6
c.n--  //  ()

Assignments, Indexes and Predicates

A path can be at the left-hand side of a block assignment. For instance, the index operator in the left-hand side indicates that the assignment works with selected objects of a sequence:
{
  var seq* = (1,2,3,4,5)
  seq(3) = 0  //  seq == (1,2,3,0,5)
  seq(3) += “b”  //  seq == (1,2,3,0,“b”,5)
  seq(3) .= “a”  //  seq == (1,2,3,“a”,0,“b”,5)
}
Here modifications occur around the third element of the sequence.

A modification of a sequence stored in a variable or a field is interpreted as a modification of this variable/field. This means that the modified variable must be mutable:
{
  fix seq* = (1,2,3,4,5)
  seq(3) = 0  //  ERROR: Attempt to reassign fixed variable seq
}
Predicates occurring in the left-hand side of a block assignment can select elements for handling:
{
  var seq* = (1,2,3,4,5)
  seq ?[$ mod 2 == 0] = 10    //  seq == (1,10,3,10,5)
  seq ?[$ mod 2 == 0] += “b”  //  seq == (1,10,“b”,3,10,“b”,5)
  seq ?[$ mod 2 == 0] .= “a”  //  seq == (1,“a”,10,“b”,3,“a”,10,“b”,5)
  seq ?[$ mod 2 == 0] --      //  seq == (1,“a”,“b”,3,“a”,“b”,5)
}
The left-hand side can also contain a path assignment operator, e.g. as. Usually it is used to parametrize the assigned value:
{
  var yesno = (“yes”, “no”, “maybe yes”)
  yesno as x ?[substring(“yes”)] = x + “, madam!”
  yesno.print()  //  “yes, madam!”   “no”  “maybe yes, madam!” 
}
Indices, path assignments and predicates can be combined:
class Lyrics(var lines: String*)
object HeyJude extends Lyrics(
   “Hey Jude, don't be afraid”,
   “You were made to go out and get her”,
   “The minute you let her under your skin”,
   “Then you begin to make it better”
)

HeyJude.lines as ln ?[not substring(“her”)] = ln + “:)”
HeyJude.lines
  //  Hey Jude, don't be afraid:)
  //  You were made to go out and get her
  //  The minute you let her under your skin
  //  Then you begin to make it better:)
In this example, the smileys are added to those strings, which do not contain the word “her”. The chains of predicates and indices are also allowed:
HeyJude.lines as ln ?[not substring(“her”)](-1) = ln + “:)”
Here the smiley is added only to the last appropriate string. The following assignment sets the age of a John’s grandson Paul:
persons?[name == “John”].hasChild.hasChild?[name == “Paul”].age = 16

Assignment in Paths

Paths can also contain assignments – both the block assignment operators, and special path operators as and index.

In paths, the block operators =, +=, .= and -- can be applied only in blocks {...} within path steps. For instance, they can be used for collecting the result:
{
  var x*  //  () by default
  var sum = 0

  (1,2,3,4,5). {x += $ * 100; sum = sum + $}
  
  x  //  100 200 300 400 500
  sum  //  15
}
A block variable occurring in several steps of a path must be declared outside the path (x and sum above). A block variable can be defined inside a block within a path step, but then its scope is limited to this block.
 {
  var x*    
  (1,2,3,4,5). {var x = 1000; print($ + 1000)}. {x += $ * 100}
  // 1001 1002 1003 1004 1005
  x  //  100 200 300 400 500
}
Here the externally defined variable x accumulates an integer sequence. This x is not accessible within the block in the second step, due to the locally defined x.

The path assignment operator as can store the current context value:
(1,2,3) as x. (“a”, “b”, “c”) as y. “{y}{x}”! 
        //  a1 b1 c1 a2 b2 c2 a2 b3 c3
Finally, the index assignment operator stores the index of the current context value:
(0,1,2,3,4,5) index i. ($ * i)  //  0 1 4 9 16 25
For all types of the assignment the following rule is applied

An assignment operator does not change (is transparent for) the context value.

In other words, the assignment operator can not replace the context value with a new one:
class C(var n)
object CC extends C(5) 
CC. {n = 0}  //  CC
CC as c index i  //  CC
The scope of local variables introduced by as and index is to the end of the path, where they are defined:
{ 
  (1,2,3) as y. (y + 1)  //  2 3 4
  y  //  ERROR: uninstantiated variable y
}

Block Operator

A block operator is enclosed in curly brackets, for instance,
{ fix x = 5; x + 4 }  //  9
A block operator is a sequence of expressions separated by the semicolon. The value of a block is the value of its rightmost expression. Local variables declared in a block are accessible only within this block. Block can be a subexpression of other expressions:
{fix x = (5, 10); x + 4} - 3  //  6  11
The context of each expression in a block is defined as the context of the whole block:
“haha”. {fix x = $ + “---”; x + $}  //  “haha---haha”
Depending on whether a block is the first expression in a path step, the block can either modify the context value or leaves it unchanged (see section Paths and Steps):
5.{$ + 2}  //  7
5 {$ + 2}  //  5
In the first case, {$ + 2} is the first expression of the second step, and in the last case it is the second expression of the first step. A block can be used inside a path step as an object modifier, which does not change the context:
class Person(var name: String)
object john extends Person(“John”)
john {name = “Johnny”; 115}. name   //  “Johnny”
Here the block {name = “Johnny”; 115} is performed in the context of the object john. Since john and the block belong to the same step, john stays as the context value, although the block value is equal to 115.

Such an ‘in-between’ use of blocks is convenient for ‘pipeline’ object processing, for instance,
class Person {
  fix name: String
  var hasChild: Person*
}
object JOHN extends Person {
  name = “John”
  hasChild = Person() {name=“Ann”; hasChild = Person() {name=“Paul”}}
  hasChild += Person() {name = “Marie”}
}

{
  fix persons* = (JOHN, JOHN.trans(%hasChild))
  persons as parent. hasChild. “{parent.name} :: {name}”!
    //  Johh :: Ann   John :: Marie   Ann :: Paul
}
In this example, a chain of relatives is built simultaneously with the creation of person objects. Note that # should not be applied here, because the structure of the objects is not affected by the modifying blocks. Only field values are changed, but it is not a metaprogramming operation.

A block can be used as a function body, for instance,
def Int* sum = {var s = 0; this. {s = s + $}; s} 
If a function body is a block then the equation symbol can be omitted:
def Int* sum {var s = 0; this. {s = s + $}; s}

Operator if

The conditional operator in Libretto has the standard semantics, but can work with context sequences:
(1, 2, 3, 4, 5). (if ($ mod 2 = 0) “even” else “odd”)
  //  “odd” “even” “odd” “even” “odd”
All three components of a conditional expression (the condition, the if-branch and the else-branch) are computed in the context of the whole conditional expression. else-branch can be omitted – its default value is ():
if (3 < 2) “hello!”  //  ()
if is a keyword of Libretto. The priority of if is higher than the priority of the dot operator, so if a conditional expression occurs in a path, it must be enclosed in parentheses:
(1,2,3,4,5) index i. (if (i mod 2 == 0) 0 else 100). ($ * 2) 
    //  0 200 0 200 0
A bit of syntactic sugar allows us to omit parentheses, if a conditional expression is the rightmost step of a path:
(1,2,3,4,5) index i. if (i mod 2 == 0) 0 else 100  //  0 100 0 100 0
As in case of predicates, the condition of an if-expression is false if it is equal to the empty sequence ():
class Person(fix name: String, var hasChild: Person*)

object ANN extends Person(“Ann”)
object TOM extends Person(“Tom”, ANN)

(ANN, TOM). 
  if (hasChild) “{name} has children”! 
  else “{name} does not have children”!

  // “Ann does not have children”
  // “Tom has children”
In this example the conditional expression depends on whether the field hasChild is empty or not.

Note that a predicate exp1?[exp2] is a shorthand for
exp1. if (exp2) $

Operator while

Libretto does not need a foreach-operator, because its role is played by the dot operator. Only a while-operator is defined in it. For instance, we can give the following definition of the factorial:
def fact(n) {
  var f = 1
  while (n > 1) {f = f*n; n = n-1}
  f
} 

fact(5) // 120
The while operator is an element function, which is easily defined in Libretto:
def while(%cond, body: Function) = 
    if (cond!) {body!; while(cond, body)} else this
The familiar form of a while expression is possible due to syntactic sugar introduced in section The Rightmost Argument is a Function. The while operator is defined in such a way that it does not change the context value:
{
  var n = 5
  var f = 1
  “abc”. (while (n > 1) {f = f * n; n = n-1})  //  “abc”
}

Operator match and Pattern Matching

Libretto supports a general mechanism of pattern matching mostly adopted from Scala and based on partial functions (see section Partial Functions).

The element method match, defined as an infix operator, provides the control over pattern matching. For instance,
(1, 2, 3, “a”) match {
  case 1 => “one”
  case 2 => “two”
  case n if Int => “integer” + n
}

  // “one” “two” “integer3”
The evaluation of the match operator consists in the successive comparison of its first argument with the case conditions, and if some condition matches, then the corresponding option is selected. In the example above the context values match with integers 1 and 2. A constant (a literal or an object) can match only with itself or a variable. A variable matches with any single value. The check if Int is evaluated in the context of the compared value.

For those context values, which do not match with any alternative (“a” in the example above) match returns ().

The underscore _ (called a placeholder) matches with any non-empty value:
(1, “a”, 2.2) match {
  case _ if Int => “integer”
  case _ if String => “string”
  case _ => “unknown”
}

  // “integer”  “string”  “unknown”
The last alternative in this example is an analogue of default in Java.

Structure Matching

For constants, pattern matching is a simple operation: two elements match if and only if they coincide. For arbitrary expressions the situation is more complicated, because matching depends on the object structure. In Libretto a general mechanism of pattern matching is implemented, which works on functions and classes. Let, for instance, f be a function, and we need to match an object obj against the expression f(x), where x is a variable. They match if there exists such val that f(val) == obj. Libretto allows the programmer to implement methods for solving such ‘equations’. Let, for instance,
def twice(x) = x * 2
and we need to match 42 against the pattern twice(x). Clearly, x must be 21. To get it, the following definition is added
def %twice undo(x) = if (x mod 2 == 0} x div 2
The external method undo, which is defined in the context of the function object %twice, implements the function inverse to twice (see section Method undo). The function undo

  • is always defined as unary;
  • is evaluated in the context of the function object;
  • has a decomposed object as its only argument.

The method undo is used for matching. Its result is compared with the argument(s) of a pattern (x in the pattern twice(x)):
twice(21) match {
  case twice(x) => print(x)
}
  // 21
The first occurrence of twice(21) is interpreted as the call of twice (that is, %twice.do(21)), and is equal to 42. The second occurrence of twice(21) within the case statement is interpreted as %twice.undo(42). As the result we have 21, which matches with the variable x.

The function twice and its inverse function undo can be defined as internal methods:
object twice extends Function {
  def do(x) = x * 2
  def undo(x) = if (x mod 2 == 0} x div 2 
}
Actually, twice should not be necessarily an instance of Function:

  • For interpreting an object as a function, it is sufficient to have a method do defined on it.
  • For interpreting an object as a pattern, it is sufficient to have a method undo defined on it.

The following pattern matching rules are applied in Libretto:

  1. A variable (e.g. x) matches with any single value, which is assigned to this variable. The empty value () does not match with any variable. A variable with the asterisk (e.g. x*) matches with any sequence including the empty one.
  2. The placeholder _ matches with any non-empty value but does not match with (). Any sequence including singletons and () matches with _*.
  3. An expression of the form @prop matches with the value of the field prop in the external context of matching.
  4. Constant literals like 1 or “abc” match with themselves, variables and _.
  5. Named constants, like the named objects, match with themselves, variables and _.
  6. If a pattern x.f(y1, ..., yn), where f is an object, on which undo is defined, is compared with some value a, then %f.undo(a) is evaluated, and the resulting sequence is compared element by element with (x, y1, ..., yn).
  7. If there is no context x in the previous rule, then the result of the evaluation of %f.undo(a) is compared element by element with (y1, ..., yn).
  8. If a pattern has the form P* as x, then those elements of the sequence compared with P* as x are collected in x, which match with P. The empty sequence () is also allowed to be collected in x. An expression x* is syntactic sugar for _* as x.
  9. If the last argument of a pattern is marked by the asterisk (x.f(P1, ..., Pn*) or f(P1, ..., Pn*)), where P1, …, Pn* are patterns, then this pattern can be compared with sequences of arbitrary length not less than n-1. Then the first n-1 values are compared with P1, …, Pn-1, respectively, and the rest is compared with Pn*.

Pattern matching in Libretto can work with context expressions. For instance, let us redefine twice as a nullary function depending on the context:
def Int twice = this * 2
def %twice undo(x) = if (x mod 2 == 0} x div 2
Now:
21.twice match {
  case x.twice => print(x)
}
  // 21
This example is based on rule 6 of pattern matching.

A pattern P as x is equivalent to P with the difference that it assigns to x a value matched with P:
21.twice match {
  case x.twice as y => print(x+y)
}
  // 63
If a sequence is compared with P as x, then x can be assigned only to the first matching value of the sequence. If we need to collect all elements matching with P, it is necessary to use P* as x.
fix class (val1)-->(val2)
fix class C(s: --> *) 

C(“a”-->1, “b”-->2, “a”-->3) match {
  case C((“a”-->_)* as y) => print(y)
} 
  // S(“a”,1)  S(“a”, 3)
Pattern matching can work with arbitrary objects, on which undo is defined.

Immutable structures (see section Immutable Structures) are specially adjusted for efficient work with pattern matching. For instance, define an immutable structure
fix class Person(name: String, hasChild: Person*)
The declaration ‘fix class‘ does not only declare all fields of the class as immutable (fix), but also adds a definition of undo. Now:
{
  fix persons = (
    Person(“Paul”, Person(“John”, Person(“Tim”)), Person(“Ann”)),
    Person(“Tom”, Person(“Marie”), Person(“John”))
  )

  persons match {
    case Person(_, Person(“John”, _*)* as ch) => ch.hasChild.print(name)
    case Person(name, _) => print(“{name} is a wrong person”)
  }
    //  “Tim”   “Tom is a wrong person”
}
In this example Person(“John”, _*) and similar expressions serve as patterns for matching. In particular, if in this example an object matches with Person(“John”, _*), it is assigned to the variable ch. Since the pattern contains the asterisk
Person(...)* as ch
all elements of the sequence matching with this pattern are collected in ch. If no element matches with Person(...)* as ch, then ch is assigned to (). In order to match only non-empty sequences, we can use the pattern Person(...)* as ch if ch. One more example:
fix class A(n)
fix class B(n)
fix class C(x*)

C(A(1), B(2), A(3), B(4)) match {
  case C(B(_)* as c) => c.n
  case _ => “yo!”
}
  // 2  4

C(A(1), B(2), A(3), B(4)) match {
  case C(B(_) as c) => c.n
  case _ => “yo!”
}
  // 2
In the first query, pattern matching collects all instances of B in c. This result is obtained in accordance with matching rules 7 and 8. In the second query, since there is no asterisk, the pattern B(_) matches with the first appropriate element.

To interpret a name as a field name in a pattern (matching rule 3) the prefix @ must be used (without @ a name is considered as a local variable):
class C(fix n) {
  def eqn(x) = x match {
    case @n => “bingo”
    case _ => “wrong”
  }
}
In this example pattern matching works in the context of instances of C having the field n. In order to get an access to the value of this field, @n should be used.

External variables can be used in the if-block of a case expression:
def f(x) = this match {
  case z if z == x => “equal”
  case _ => “unequal”
}

1.f(1)  //  “equal”
In this example an access to the external variable x is performed. Note that the version
def f(x) = this match {
  case x => “equal”
  case _ => “unequal”
}
fails, because the variable x in the pattern is interpreted as local for the case expression.

If the body of a function consists only of a case block, then its more compact form is allowed. The definition
def f(x) {
  case z if z == x => “equal”
  case _ => “unequal”
}
is equivalent to
def f(x) = this match {
  case z if z == x => “equal”
  case _ => “unequal”
}
Immutable structures together with matching are convenient for processing structured data in the functional style – the role of immutable structures is close to that of algebraic types in functional programming. As an example, let us make a duck typing implementation of a list based on pattern matching:
fix class (hd) | (tl)
fix object NIL

def head {
  case (h | _) => h
  case NIL => ()
  case _ => error(“not a list”)
}
def tail {
  case (_ | t) => t
  case NIL => ()
  case _ => error(“not a list”)
}
To compare, introduce a definition based on inheritance:
class MyList
fix class (head) | (tail) extends MyList
fix object NIL extends MyList

def MyList head {
  case (h | _) => h
  case NIL => ()
}
def MyList tail {
  case (_ | t) => t
  case NIL => ()
}
Both versions are correct. In the duck typing version, a list is an arbitrary object, on which methods head and tail are defined. In the inheritance version, a list is an instance of the class MyList.

Now the concatenation function can be defined:
def conc(snd) {
  case NIL => snd
  case (h | t) => (h | t.conc(snd)) 
}
For pretty-printing of lists, the method toString can be redefined on the class |:
fix class (head) | (tail) {
  override toString {
    var elem = this
    var str
    while (elem) {
      elem match {
        case (hd | el) => {
          str = (if (str) str + “, ” else “<”) + hd
          elem = el
        }
        case _ => elem = ()
      }
    } 
    str + “>”
  }
}
Now,
((1 | NIL) | (2 | NIL)).conc(3 | NIL).toString
  //  <<1>, 2, 3> 

Assignment with Matching

The assignment operator can be combined with pattern matching. For instance,
fix class Person(name: String, hasChild: Person*)
object JOHN extends Person(“John”, Person(“Paul”))

{
  fix Person(name, ch) = JOHN
  name  //  John
  ch.name  //  Paul
}
Here pattern matching determines the values, which are assigned to the variables. Each variable – name and ch – is declared as immutable (the prefix fix sets this).

The assignment with pattern matching is applicable to any object, on which undo is defined:
object twice extends Function {
  def do(x) = x * 2
  def undo(z) = if (z mod 2 == 0} z div 2
}

{
  fix m = twice(21)
  m  //  42

  fix twice(n) = m
  n  //  21
}
The left-hand side of an assignment can contain a sequence of variables, for instance,
{
  var (x,y,z) = (1,2,3)
  x  //  1
  y  //  2
  z  //  3
}
If the number of variables does not correspond to the number of values, an exception occurs:
{
  var (x, y, z) = (1,2,3,4,5)
    // ERROR: Failed to match 
}
The last variable in a sequence can be marked by the asterisk, and then it can collect an arbitrary number of values:
{
  var (x, y, z*) = (1,2,3,4,5)
  x  //  1
  y  //  2
  z  //  3 4 5
}
If the number of elements in the right-hand side of the assignment is greater than the number of variables (as in the example above), then the rightmost variable is assigned to the tail of the sequence. The situation when the rightmost variable gets nothing is also correct:
{
  fix (x, y*) = 2
  x  //  2
  y  //  ()
}
Recall that the single value 2 is not distinguishable in Libretto from the singleton sequence (2).

The path assignment also can be combined with pattern matching. For instance:
fix class Person(name: String, age: Int)
var persons = (Person(“John”, 42), Person(“Tom”, 21))
Now:
persons as Person(nm, ag). “{nm} is {ag} years old”!
  //  John is 42 years old
  //  Paul is 21 years old
The variables nm and ag are immutable, they are initialized each time the operator as is performed. The scope of these variables is to the right end of the query.

For both assignment operators = and as the following holds: if the left- and right-hand sides of the assignment do not match, then an exception occurs.

Omitting Semicolons

The semicolon is used in Libretto for separating expressions in blocks. But in most cases semicolons can be omitted, because the Libretto parser is able to restore them.

The general rule here is as follows. When a program code is split into tokens, a semicolon is added to the end of a non-empty string, if the last token in this string is:

  • an identifier;
  • a literal (e.g. a string or a number);
  • one of keywords return, case;
  • a postfix operator (e.g. --) or a delimiter ‘)’, ‘]’, ‘}

The exception is: a semicolon is not put if the following string starts with the keyword else or with the opening curly bracket ‘{’.

No comments:

Post a Comment