Home              GitHub Twitter StackOverflow Slack Mail


A Cluster-Oriented Virtual Machine Architecture

The mm-ADT virtual machine is programmed using a low-level bytecode language that is specifically designed to integrate a heterogeneous collection of processing engines and storage systems. An mm-ADT program is a semi-cyclic graph of primitive functional operations called instructions. These instructions can be executed naturally by any processor based on the reactive pattern, the actor model, or any asynchronous or batch message-passing architecture. Processors drive swarms of traverser-monads over data structures stored in systems ranging from databases with random-access data paths to distributed file systems with linear data layouts optimized for sequential-access processing. Data structures are sorted into model-ADTs using mm-ADT’s type system which is capable of expressing canonical-types, refinement-types, dependent-types, and ultimately “Turing types” that leverage looping and branching instructions in support of universal, general-purpose, cluster-oriented, distributed computing.

The Structures of Obj

The base structure of mm-ADT is obj—object. Diagrammed below are the 7 fundamental, built-in types. Custom derived types can be defined by combining and constraining the built-in types. mm-ADT provides a collection of generally useful derived types such as short, long, varchar, complex, pair, etc. Users will typically define domain-specific types such as person or company.

Every obj is a carrier in an algebraic structure. The set of operators that each type draws from are denoted by the symbols *, +, /, -, &, |, and !. Along with operators, a type can specify a 0 and/or 1 identity element. The meaning of these symbols is overloaded in that each type specifies its own operator semantics. The default algebraic structures for the built-in types are provided in the table below.

type algebraic structure operators identities
bool commutative ring with unity *,+,-,&,|,! 0=>false,1=>true
int commutative ring with unity *,+,- 0=>0,1=>1
real field *,+, /, - 0=>0.0,1=>1.0
str synthetic group +,- 0=>''
list synthetic group +,- 0=>[]
rec synthetic group +,- 0=>[:]
inst ring with unity *,+,-,&,| 0=>[none],1=>[id]

An algebraic structure defines a set of axioms whose entailments yield theorems. In mm-ADT, theorems are used by the virtual machine to perform type checking, type inference, and program optimization. Furthermore, via the constructs of model-ADTs and embeddings, different algebraic structures (which ultimately use the same fundamental types as carriers) can be defined and mixed within the same computation. Models specify isolated algebraic environments and rules for moving between them. In the lexicon of category theory, models are categories and embeddings are functors. Thus, the “default algebraic structures” specified in the table above are understood to be a collection of models denoted mm—the core model-ADT from which all other models are ultimately derived.

mmadt> true | false             // bool or
mmadt> true + true              // bool exclusive or
mmadt> 'mar' + 'ko'             // str concatenation
mmadt> ['a':1,'b':2] - ['a':1]  // rec retraction
mmadt> 'hey' + 42               // int is first embedded in str

The Processes of Inst

An mm-ADT object is a static structure. A process is required to transform objects. All processes are described by the inst ring, where * is serial instruction composition (compose) and + is parallel instruction composition (branch). Thus, inst is unique in that it bridges the static world of structure and the dynamic world of process to yield the phenomena of computing.

The mm-ADT virtual machine’s instruction set architecture is defined by the elements of inst. The table below itemizes the 7 base instructions that all other instructions can be written in terms of. Note that there also exists a set of machine instructions for configuring the virtual machine (and its associated components) and thus, exist outside the described theortical framework.

instruction signature description
[map] X -> Y Transform an element of X to an element of Y.
[filter] X -> X{?} Transform an element of X to itself or nothing.
[flatmap] X -> Y{*} Transform an element of X to zero or more elements of Y.
[reduce] X{*} -> X Transform zero or more elements of X to an element of X.
[barrier] X{*} -> Y{*} Transform zero or more elements of X to zero or more elements of Y.
[initial] X{0} -> X Transform nothing to an element of X.
[terminal] X -> X{0} Transform an element of X to nothing.

The mmlang assembly language (distributed with mm-ADT) provides some human-friendly syntactic sugar which are compiled to inst—the algebraic structure used by the mm-ADT VM to compute. For instance, the first three expressions below ultimately compile to the final expression. The second expression best demonstrates an embedding. It takes the str 'mar' to the inst [start,'mar'], where * in the inst ring then enacts a serial composition with [plus,'ko'].

mmadt> 'mar' + 'ko'
mmadt> 'mar' * [plus,'ko']
mmadt> [start,'mar'] * [plus,'ko']
mmadt> [start,'mar'][plus,'ko']

The Embeddings of Model

The mm-ADT virtual machine was designed for cluster computing, where structures are distributed and processes are parallelized across any number of physical machines. The systems that store structures are databases and the engines that process instructions are stream pipelines. The database industry classifies its databases by their abstract data type (ADT). Examples database ADTs include relational, graph, key/value, and document. mm-ADT provides a set of model-ADT embeddings that define popular database ADTs in terms of mm-ADT objects.

A Key/Value Store Example

A model-ADT is stored on the virtual machine using [model] (a machine instruction). A key/value model-ADT is define below. This model defines 4 types: a key (k), a value (v), a key/value pair (kv), and the store as a whole (kvstore). A key is any obj, a value is any obj, and a key/value pair is a list with the first element being a key and the second element being a value. Finally, kvstore (which is not coincidentally also the name of the model) is understood as the “root” object of the database and is defined as zero or more key/value pairs.

 [define,k,       obj]
 [define,v,       obj]
 [define,kv,      [k;v]]
 [define,kvstore, kv{*}]]

A model contains both types and the instructions that map between them. The example above is elaborated upon below with the addition of 4 instruction rewrite rules. If a program tries to remove an element of a key/value pair, then an error is thrown. Similarly, if a program tries to change a key/value pair’s key, an error is thrown. Next, the stream of zero or more key/value pairs typed as kvstore must never contain duplicate keys and thus, a [dedup] on keys yields a rewrite to a no-op which simply removes the instruction from the program. Finally, when a new key/value pair is written to the store, a pair with an equivalent key is searched for and updated. If no such key/value pair exists, then the provided key/value pair is added to the stream (i.e. the database).

 [define,k,       obj]
 [define,v,       obj]
 [define,kv,      [k;v]
  -> [drop,0|1]           => [error]
  -> [put,0,k]            => [error]]
 [define,kvstore, kv{*}
  -> [dedup,[get,0]]      => 
  -> [plus,[k~a;v~b]]     => [coalesce,

The kvstore model-ADT defines an abstract data type that must ultimately be physically implemented by a database capable of encoding the requisite structures and their respective denotational semantics. The Apache Ignite project team develops a distributed key/value database called Ignite. In order for the mm-ADT VM to work over Ignite, a model-ADT must be defined that maps the constructs of Ignite to those of kvstore and thus, mm-ADT. The ignite model is defined below, where the set of possible keys and values are constrained. The ignite type extends kvstore. If Ignite is configured to sort its key/value pairs in ascending order by key, then a no-op occurs. If the number of key/value pairs is requested via [count], each node in the cluster will linearly iterate its entire key/value pair partition to compute a distributed count reduction. However, if this count can be computed more efficiently by Ignite (e.g. in less than O(n)), then [count] is rewritten to a [map] instruction that transforms the key/value stream (as a whole) to a single int. This int is dereferenced using the instructions on the right hand side of the maps from-token (<=), where [=] is a machine instruction that manages the connection between the VM and its integrated components (e.g. the Ignite cluster). An [eval] machine instruction issues an Ignite-specific remote procedure call (RPC) that offloads the count calculation from the VM’s integrated processors to Ignite. Finally, the last rewrite rule below leverages database indices in an analogous manner to the aforementioned [count] rewrite.

 [define,k,      kvstore.k  & (int|str)]
 [define,v,      kvstore.v  & (bool|int|real|str|list)]
 [define,kv,     kvstore.kv & [k;v]]
 [define,ignite, kvstore  <= [eval,'connect',['ip'   :'',
                                              'port' :10800,
  -> [order,[gt,[get,0]]] =>
  -> [count]              => [map, int   <=[=ignite][eval,'store-size']]
  -> [is,[get,0][eq,k~a]] => [map, kv{?} <=[=ignite][eval,'idx-query',a]]]]


mm-ADT The mm-ADT virtual machine can integrate any number of heterogeneous processing engines and storage systems into a universally configurable data processing platform. This amalgamation is made possible via the construct below which specifies a type (set) and the instructions necessary to manifest its instances (elements) within the mm-ADT address space. Its power comes from the fact that there is no fundamental distinction between a type and an instance in mm-ADT. An instance is simply a type that defines itself. While obj and inst may not always be explicitly coupled via <=, it is always there (e.g. via type inferencing). Expressions of this form represent the boundary between structure and process.

obj <= inst