Parallel Computing

From MemCP

Almost 99% of all newly invented programming languages are imperative. But imperative languages have one drawback: they are hard to parallelize.

Drawbacks of Imperative Programming Languages

Imperative programming languages have one major drawback: state. In an imperative language, commands are executed that change the contents of variables or complex objects in memory. An optimizing compiler that tries to find parallelizable parts of the code on its own has to keep track of the data dependencies and the arbitrary side effects of every command and function call.
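
To make the dependency problem concrete, here is a minimal Go sketch (the function names runningTotal and squares are made up for this illustration). The first loop threads mutable state through every iteration, so its order is fixed. The second loop has independent iterations and can be spread across goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// runningTotal mutates shared state: every iteration reads the value
// written by the previous one, so the iterations cannot be reordered.
func runningTotal(xs []int) []int {
	out := make([]int, len(xs))
	total := 0 // shared, mutable state
	for i, x := range xs {
		total += x
		out[i] = total
	}
	return out
}

// squares has no dependency between iterations: each goroutine reads
// only its own x and writes only out[i], so all of them may run at once.
func squares(xs []int) []int {
	out := make([]int, len(xs))
	var wg sync.WaitGroup
	for i, x := range xs {
		wg.Add(1)
		go func(i, x int) {
			defer wg.Done()
			out[i] = x * x
		}(i, x)
	}
	wg.Wait()
	return out
}

func main() {
	xs := []int{1, 2, 3, 4}
	fmt.Println(runningTotal(xs)) // [1 3 6 10]
	fmt.Println(squares(xs))      // [1 4 9 16]
}
```

A compiler would have to prove that a loop looks like the second function before running it in parallel, which is exactly the analysis that side effects make hard.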

Possibly the simplest solution to this problem is to tell the compiler exactly which loops are parallelizable. This, however, forces the developer to write nearly side-effect-free code anyway. So we decided to go the pure way: to design a programming language that does not allow side effects.

The Functional World

A "pure" functional programming language is a language in which every function computes its result solely from its inputs. This is a solid basis for highly parallel map-reduce algorithms like the ones we need in our clusterable in-memory database.
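
The following Go sketch shows why purity helps. It is not MemCP code: mapReduce and its parameters are hypothetical names chosen for this example. Because mapFn and reduceFn depend only on their arguments, the input can be cut into chunks, each chunk can be processed by its own goroutine, and the partial results can be folded afterwards. The sketch assumes that reduceFn is associative and that neutral is its neutral element.

```go
package main

import (
	"fmt"
	"sync"
)

// mapReduce applies the pure function mapFn to every element and folds
// the results with reduceFn. The input is split into `parts` chunks that
// are processed concurrently; the partial results are folded serially.
func mapReduce(xs []int, parts int, mapFn func(int) int,
	reduceFn func(int, int) int, neutral int) int {
	if parts < 1 {
		parts = 1
	}
	partial := make([]int, parts)
	chunk := (len(xs) + parts - 1) / parts // ceiling division
	var wg sync.WaitGroup
	for p := 0; p < parts; p++ {
		lo, hi := p*chunk, (p+1)*chunk
		if lo > len(xs) {
			lo = len(xs)
		}
		if hi > len(xs) {
			hi = len(xs)
		}
		wg.Add(1)
		go func(p int, part []int) {
			defer wg.Done()
			acc := neutral
			for _, x := range part {
				acc = reduceFn(acc, mapFn(x))
			}
			partial[p] = acc // each goroutine writes only its own slot
		}(p, xs[lo:hi])
	}
	wg.Wait()
	result := neutral
	for _, v := range partial {
		result = reduceFn(result, v)
	}
	return result
}

func main() {
	xs := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	square := func(x int) int { return x * x }
	add := func(a, b int) int { return a + b }
	fmt.Println(mapReduce(xs, 3, square, add, 0)) // 385
}
```

The result does not depend on how many chunks are used, which is what allows the same map and reduce functions to be distributed over any number of workers.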

We took the Scheme interpreter written in Go by Pieter Kelchtermans and made the following changes:

  • We removed the set! instruction because it is the only function that causes global side effects. All other functions are local to the current environment, and as long as you don’t change the environment, pieces of code can run in parallel without affecting each other
  • We made begin open its own environment, so self-recursion can be done by defining a function inside a begin block (!begin is the scopeless version)
  • We fixed if
  • We added strings as a native datatype, along with the concat function, which concatenates all of its arguments into one string
  • We added a serialization mechanism that turns values back into valid Scheme code, so they can be fully recovered

A sample session in the Scheme console:

carli@launix-MS-7C51:~/projekte/memcp/server-node-golang$ make
go run *.go
> 45
==> 45
> (+ 1 2)
==> 3
> (define currified_add (lambda (a) (lambda (b) (+ a b))))
==> "ok"
> ((currified_add 4) 5)
==> 9
> (define add_1 (currified_add 1))
==> "ok"
> (add_1 6)
==> 7
> (add_1 (add_1 3))
==> 5
> (define name "Peter") 
==> "ok"
> (concat "Hello " name)
==> "Hello Peter"
> 
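
The serialization idea from the list above can be sketched in Go as follows. This is an illustration under assumptions, not the actual MemCP serializer: the Symbol type, the serialize function and the representation of lists as []any are invented for this example, and string quoting via strconv.Quote follows Go's escaping rules, which only coincide with Scheme's for simple strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Symbol is a hypothetical type for Scheme identifiers such as + or lambda.
type Symbol string

// serialize turns an in-memory value back into Scheme source text.
// Supported in this sketch: numbers, strings, symbols and nested lists.
func serialize(v any) string {
	switch x := v.(type) {
	case float64:
		return strconv.FormatFloat(x, 'g', -1, 64)
	case string:
		return strconv.Quote(x) // assumption: Go-style escaping
	case Symbol:
		return string(x)
	case []any:
		parts := make([]string, len(x))
		for i, e := range x {
			parts[i] = serialize(e)
		}
		return "(" + strings.Join(parts, " ") + ")"
	default:
		panic(fmt.Sprintf("cannot serialize %T", v))
	}
}

func main() {
	// In-memory form of (lambda (a) (lambda (b) (+ a b)))
	curried := []any{Symbol("lambda"), []any{Symbol("a")},
		[]any{Symbol("lambda"), []any{Symbol("b")},
			[]any{Symbol("+"), Symbol("a"), Symbol("b")}}}
	fmt.Println(serialize(curried))
	// In-memory form of (concat "Hello " name)
	fmt.Println(serialize([]any{Symbol("concat"), "Hello ", Symbol("name")}))
}
```

Because the output is ordinary source text, another node can feed it to its own interpreter to rebuild the value.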

MemCP functions that support parallelism

The following functions support parallelism:

  • scan runs filter, map and reduce in parallel for each shard; reduce2 runs serially
  • scan_order runs filter and the sorting in parallel, and map and reduce serially
  • parallel evaluates each given parameter in parallel and continues once all jobs are done
  • newsession provides a thread-safe key-value store for sharing context across threads
  • once and mutex help to synchronize control flow
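
The division of work described for scan can be pictured with the following Go sketch. It only mirrors the split stated in the list (filter, map and reduce per shard in parallel, then a serial reduce2 over the shard results). The function signature, the int row type and the parameter names are assumptions made for this example and do not reflect the real MemCP API.

```go
package main

import (
	"fmt"
	"sync"
)

// scan is a hypothetical stand-in for the per-shard pattern:
// every shard is filtered, mapped and reduced in its own goroutine,
// then reduce2 combines the per-shard results one after another.
func scan(shards [][]int, filter func(int) bool, mapFn func(int) int,
	reduce func(acc, v int) int, neutral int,
	reduce2 func(acc, shardResult int) int) int {
	perShard := make([]int, len(shards))
	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard []int) {
			defer wg.Done()
			acc := neutral
			for _, row := range shard {
				if filter(row) {
					acc = reduce(acc, mapFn(row))
				}
			}
			perShard[i] = acc // one slot per shard, no shared writes
		}(i, shard)
	}
	wg.Wait()
	// Serial phase: combine the shard results in shard order.
	result := neutral
	for _, v := range perShard {
		result = reduce2(result, v)
	}
	return result
}

func main() {
	shards := [][]int{{1, 2, 3}, {4, 5, 6}, {7, 8}}
	isEven := func(x int) bool { return x%2 == 0 }
	times10 := func(x int) int { return x * 10 }
	add := func(a, b int) int { return a + b }
	fmt.Println(scan(shards, isEven, times10, add, 0, add)) // 200
}
```

Keeping the second reduction serial means it may rely on the order of the shard results, while the expensive per-row work still runs on all shards at once.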

You can read the manual for each function by typing, for example, (help "scan") in the Scheme console.

Conclusion

What did we achieve?

  • We chose Scheme as our language
  • We stripped away the parts of Scheme that make it unsafe for parallel computing
  • We added some useful functions to Scheme to fit our needs (string processing, parallelization primitives…)
  • We implemented a serialization function that recreates Scheme code from in-memory objects, so that they can be loaded on other machines
  • Now we can start implementing our highly parallel map-reduce algorithms, which take map and reduce lambda functions and execute them in parallel