Parallel Computing
Almost 99% of all newly invented programming languages are imperative. But imperative languages have one drawback: they are hard to parallelize.
Drawbacks of Imperative Programming Languages
Imperative programming languages do have one major drawback: state. In an imperative language, commands are executed that change the contents of variables or of complex objects in memory. An optimizing compiler that tries to find parallelizable parts of the code on its own therefore has to keep track of the data dependencies and the arbitrary side effects of every command and function call.
Perhaps the simplest solution to this problem is to tell the compiler exactly which loops are parallelizable. This, however, forces the developer to write nearly side-effect-free code. So we decided to go the pure way: to design a programming language that does not allow side effects.
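Why marking a loop as parallelizable requires side-effect-free code can be illustrated with a minimal Go sketch (our own illustration, not code from MemCP). The helper parallelFor is hypothetical: it runs every loop iteration in its own goroutine, which is only correct when the iterations are independent. An element-wise loop qualifies; a running sum, where iteration i reads the result of iteration i-1, does not.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelFor is a hypothetical helper: it runs body(i) for every i in [0, n)
// concurrently and waits for all goroutines to finish. It is only safe when
// body(i) does not touch state that another iteration writes.
func parallelFor(n int, body func(i int)) {
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer wg.Done()
			body(i)
		}(i)
	}
	wg.Wait()
}

func main() {
	in := []int{1, 2, 3, 4, 5}

	// Independent iterations: squares[i] depends only on in[i],
	// so the loop can safely run in parallel.
	squares := make([]int, len(in))
	parallelFor(len(in), func(i int) {
		squares[i] = in[i] * in[i]
	})

	// Loop-carried dependency: prefix[i] needs prefix[i-1], so this
	// loop must stay serial (running it with parallelFor would race).
	prefix := make([]int, len(in))
	for i := range in {
		if i == 0 {
			prefix[i] = in[i]
		} else {
			prefix[i] = prefix[i-1] + in[i]
		}
	}

	fmt.Println(squares) // [1 4 9 16 25]
	fmt.Println(prefix)  // [1 3 6 10 15]
}
```

Each goroutine writes a distinct element of squares, which Go treats as a distinct variable, so no lock is needed in the parallel loop.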
The Functional World
A "pure" functional programming language is one in which every function computes its result solely from its inputs. This is a solid basis for highly parallel map-reduce algorithms such as the ones we need for our clusterable in-memory database.
We took the Scheme interpreter written in Go by Pieter Kelchtermans and added some extra features:
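Because a pure function's result depends only on its arguments, it can be applied to many elements at once without locks and still give the same result as a serial loop. A minimal Go sketch of that idea, assuming a hypothetical generic helper parallelMap (not part of MemCP) that splits the input into chunks, one goroutine per chunk; it needs Go 1.18 or later for generics:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parallelMap applies the pure function f to every element of in, using up
// to workers goroutines. Each goroutine writes only its own index range of
// out, so no locking is needed. (Hypothetical helper, for illustration only.)
func parallelMap[T, U any](in []T, workers int, f func(T) U) []U {
	out := make([]U, len(in))
	if workers < 1 {
		workers = 1
	}
	chunk := (len(in) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < len(in); start += chunk {
		end := start + chunk
		if end > len(in) {
			end = len(in)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = f(in[i])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	names := []string{"peter", "anna", "carl"}
	// strings.ToUpper is pure: same input, same output, no side effects.
	fmt.Println(parallelMap(names, 2, strings.ToUpper)) // [PETER ANNA CARL]
}
```

If f read or modified shared mutable state, the result could depend on goroutine scheduling; purity is what makes the chunking invisible to the caller.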
- We removed the set! instruction because it is the only function that causes global side effects. All other functions are local to the current environment, and as long as you don't change the environment, every piece of code can run in parallel without affecting the others.
- We made begin open its own environment, so self-recursion can be done by defining a function inside a begin block (!begin is the scopeless version).
- We fixed if.
- We added strings as a native datatype, together with the concat function, which concatenates all given strings into one string.
- We added a serialization mechanism that fully recovers values and turns them back into valid Scheme code.
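The reasoning behind removing set! can be sketched with a simplified environment model in Go. This is an assumption about how such an interpreter is commonly structured, not MemCP's actual code, and the type Env and its methods are hypothetical: each environment holds its own bindings plus a pointer to its enclosing environment. define only writes to the current frame, while a set!-style operation walks outward and mutates a frame that other, concurrently running code may share.

```go
package main

import "fmt"

// Env is a hypothetical, simplified Scheme environment: local bindings plus
// a link to the enclosing (outer) environment.
type Env struct {
	vars  map[string]any
	outer *Env
}

func NewEnv(outer *Env) *Env {
	return &Env{vars: map[string]any{}, outer: outer}
}

// Define binds a name in this frame only; outer frames are never touched.
func (e *Env) Define(name string, v any) { e.vars[name] = v }

// Lookup searches this frame first, then the enclosing frames.
func (e *Env) Lookup(name string) (any, bool) {
	for env := e; env != nil; env = env.outer {
		if v, ok := env.vars[name]; ok {
			return v, true
		}
	}
	return nil, false
}

// Set mimics set!: it mutates whichever frame already binds the name, which
// may be an outer frame shared with other code. This is the kind of global
// side effect that was removed from the language.
func (e *Env) Set(name string, v any) bool {
	for env := e; env != nil; env = env.outer {
		if _, ok := env.vars[name]; ok {
			env.vars[name] = v
			return true
		}
	}
	return false
}

func main() {
	global := NewEnv(nil)
	global.Define("x", 1)

	// A scoped begin block gets a fresh child environment.
	block := NewEnv(global)
	block.Define("x", 2) // shadows the outer x; global is untouched

	inner, _ := block.Lookup("x")
	outer, _ := global.Lookup("x")
	fmt.Println(inner, outer) // 2 1

	// A set!-style write from any other child reaches the shared global frame.
	other := NewEnv(global)
	other.Set("x", 42)
	fmt.Println(global.vars["x"]) // 42
}
```

Without Set, a child environment can only add bindings to its own frame, so two pieces of code running in sibling environments cannot observe each other's writes.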
carli@launix-MS-7C51:~/projekte/memcp/server-node-golang$ make
go run *.go
> 45
==> 45
> (+ 1 2)
==> 3
> (define currified_add (lambda (a) (lambda (b) (+ a b))))
==> "ok"
> ((currified_add 4) 5)
==> 9
> (define add_1 (currified_add 1))
==> "ok"
> (add_1 6)
==> 7
> (add_1 (add_1 3))
==> 5
> (define name "Peter")
==> "ok"
> (concat "Hello " name)
==> "Hello Peter"
>
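The serialization mechanism from the feature list turns in-memory values back into Scheme source that can be evaluated again, for example on another machine. A minimal Go sketch of the idea, assuming a hypothetical value representation (float64 for numbers, string, bool, nil for the empty list, []any for lists); the function serialize is our own illustration, not MemCP's implementation, and it leaves out closures, which would additionally need their parameter list, body and captured environment written out:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// serialize turns a (simplified, hypothetical) interpreter value back into
// Scheme source that evaluates to an equal value.
func serialize(v any) string {
	switch x := v.(type) {
	case nil:
		return "'()"
	case bool:
		if x {
			return "#t"
		}
		return "#f"
	case float64:
		// Shortest representation that round-trips, e.g. 45 or 1.5.
		return strconv.FormatFloat(x, 'g', -1, 64)
	case string:
		// Escape backslashes first, then double quotes.
		s := strings.ReplaceAll(x, `\`, `\\`)
		s = strings.ReplaceAll(s, `"`, `\"`)
		return `"` + s + `"`
	case []any:
		parts := make([]string, len(x))
		for i, item := range x {
			parts[i] = serialize(item)
		}
		// (list ...) rebuilds the list when the code is evaluated again.
		return "(list " + strings.Join(parts, " ") + ")"
	default:
		// Not valid Scheme: marks values this sketch cannot handle.
		return fmt.Sprintf("#<unserializable %T>", v)
	}
}

func main() {
	fmt.Println(serialize([]any{45.0, `Hello "Peter"`, true, nil}))
	// (list 45 "Hello \"Peter\"" #t '())
}
```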
MemCP functions that support parallelism
The following functions support parallelism:
- scan runs filter, map and reduce in parallel for each shard; reduce2 runs serially
- scan_order runs filter as well as the sorting in parallel, and map and reduce serially
- parallel evaluates each given parameter in parallel and continues once all jobs are done
- newsession is a thread-safe key-value store for sharing context across threads
- once and mutex help to synchronize control flow
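The division of labour in scan can be illustrated with a Go sketch. The function scan below, its parameter order, the neutral starting value and the representation of a shard as a []int are assumptions for illustration, not MemCP's actual signature: filter, map and reduce run in one goroutine per shard, and the per-shard partial results are then combined serially with reduce2 in shard order.

```go
package main

import (
	"fmt"
	"sync"
)

// scan is a hypothetical sketch: for every shard it filters rows, maps them
// and folds them with reduce (starting from neutral), one goroutine per shard.
// The per-shard results are then combined serially with reduce2.
func scan(shards [][]int,
	filter func(int) bool,
	mapFn func(int) int,
	reduce func(acc, v int) int,
	neutral int,
	reduce2 func(acc, partial int) int) int {

	partials := make([]int, len(shards))
	var wg sync.WaitGroup
	for s, shard := range shards {
		wg.Add(1)
		go func(s int, shard []int) {
			defer wg.Done()
			acc := neutral
			for _, row := range shard {
				if filter(row) {
					acc = reduce(acc, mapFn(row))
				}
			}
			partials[s] = acc // each goroutine writes only its own slot
		}(s, shard)
	}
	wg.Wait()

	// Serial phase: combine the shard results in a fixed order.
	result := neutral
	for _, p := range partials {
		result = reduce2(result, p)
	}
	return result
}

func main() {
	shards := [][]int{{1, 2, 3}, {4, 5, 6}, {7, 8, 9, 10}}
	add := func(a, b int) int { return a + b }
	// Sum of squares of the even numbers: 4 + 16 + 36 + 64 + 100 = 220.
	total := scan(shards,
		func(x int) bool { return x%2 == 0 },
		func(x int) int { return x * x },
		add, 0, add)
	fmt.Println(total) // 220
}
```

In this sketch the only data shared between goroutines is the partials slice, and each goroutine writes a distinct element of it, so waiting on the WaitGroup is the only synchronization needed.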
You can read the manual by typing (help "scan") in the Scheme console.
Conclusion
What did we achieve?
- We chose Scheme as our language
- We stripped away the parts of Scheme that make it unsafe for parallel computing
- We added some useful functions to Scheme to fit our needs (string processing, parallelization primitives…)
- We implemented a serialization function that recreates Scheme code from in-memory objects, so that they can be loaded on other machines
- Now we can start implementing our highly parallel map-reduce algorithms, which take map and reduce lambda functions, execute them in parallel and let us enjoy the highly parallel result