Cluster Monitor

The cluster monitor lets you scale out MemCP over multiple nodes. Here is what you can achieve:
* '''Multiple users''': each user has their own database
* '''One big user''': big tables are spread over multiple nodes
* A mixture of both variants
Each database and each shard (the piece where a part of a table's data is stored) can be placed in one of three modes (modeled in the sketch below):
* '''COLD''': only in the storage backend
* '''SHARED''': read-only on multiple nodes (e.g. the access table)
* '''WRITE''': exclusively on one node
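Conceptually, the three modes form a small per-resource state machine. The following Go sketch is only a mental model; the type and field names are ours, not MemCP's actual code:
<syntaxhighlight lang="go">
package cluster

// Placement describes where a database or shard currently lives.
// The names mirror this page; the Go types are illustrative only.
type Placement int

const (
	Cold   Placement = iota // only in the storage backend, claimed by nobody
	Shared                  // read-only copies on one or more nodes
	Write                   // exclusively writable on a single node
)

// Resource is one conceptual entry of the distributed key map
// described under "How it works" below.
type Resource struct {
	ID      string    // database or shard identifier
	Mode    Placement // current placement mode
	Owner   string    // node holding the WRITE claim, if Mode == Write
	Readers []string  // nodes holding SHARED claims, if Mode == Shared
}
</syntaxhighlight>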
Cluster monitors can coordinate all kinds of storage (a sketch of a common backend abstraction follows the list):
* [[File System]]
* [[S3 Buckets]]
* [[Ceph/Rados]]
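All of these backends can sit behind a single read/write abstraction. A hedged sketch of what such an interface could look like in Go; the method set is an assumption, not MemCP's actual API:
<syntaxhighlight lang="go">
package cluster

import "io"

// StorageBackend abstracts the shared storage that cluster monitors
// coordinate access to (file system, S3, Ceph/Rados, ...).
// This interface is illustrative; MemCP's real abstraction may differ.
type StorageBackend interface {
	// ReadShard streams the persisted bytes of one shard.
	ReadShard(shardID string) (io.ReadCloser, error)
	// WriteShard replaces the persisted bytes of one shard.
	WriteShard(shardID string, data io.Reader) error
	// ListShards enumerates all shards of a database.
	ListShards(database string) ([]string, error)
}
</syntaxhighlight>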
== How to set up ==
TODO: describe how to edit the settings to connect to other nodes and authenticate (add at least a common secret and optionally a list of nodes in the cluster)
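Since the settings format is still marked TODO above, the following Go sketch only illustrates one plausible shape for such a cluster section: a shared secret plus an optional static node list. Every key and type name here is hypothetical.
<syntaxhighlight lang="go">
package cluster

import "encoding/json"

// ClusterSettings is a hypothetical shape for the cluster part of the
// settings file; the real key names are not yet documented.
type ClusterSettings struct {
	Secret string   `json:"secret"` // common secret shared by all nodes
	Nodes  []string `json:"nodes"`  // optional static list of peers
}

// example is a made-up settings snippet in the hypothetical format.
var example = []byte(`{
	"secret": "change-me",
	"nodes": ["10.0.0.1:3307", "10.0.0.2:3307"]
}`)

func parse() (ClusterSettings, error) {
	var s ClusterSettings
	err := json.Unmarshal(example, &s)
	return s, err
}
</syntaxhighlight>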
== How it works ==
Each node in the network knows all other nodes.
In addition, a distributed key map tracks which resource is claimed by whom (a minimal sketch of the claim handshake follows this list):
* COLD items are not tracked in the map at all, to save space
* SHARED and WRITE items are tracked together with their owners
* when a node claims WRITE, all SHARED owners must confirm that they give up their SHARED status
* when a node claims SHARED, the WRITE owner must confirm that they give up their WRITE status
* once your claim is confirmed, you can use the storage backend to read/write the data
* when a node doesn't own a database, the database is claimed SHARED
* when a node doesn't own a shard, the shard's owner is asked to perform the computation
* when a shard is owned by no one but the shard list is very big, other nodes are asked to claim the shards and perform the computation
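To make the handshake concrete, here is a minimal Go sketch that models the key map as an in-memory table. The real system would run this over the network and handle timeouts and node failure; all names below are illustrative:
<syntaxhighlight lang="go">
package cluster

import (
	"errors"
	"sync"
)

// claimMap is a stand-in for the distributed key map: COLD resources are
// simply absent, SHARED and WRITE resources are tracked with their owners.
type claimMap struct {
	mu      sync.Mutex
	readers map[string][]string // resource ID -> nodes holding SHARED
	writer  map[string]string   // resource ID -> node holding WRITE
}

func newClaimMap() *claimMap {
	return &claimMap{readers: map[string][]string{}, writer: map[string]string{}}
}

// confirmRelease stands in for asking a remote node to give up its claim.
func confirmRelease(node, rid string) bool { return true }

// ClaimWrite upgrades rid to WRITE for node: every SHARED holder must
// first confirm that it gives up its claim.
func (m *claimMap) ClaimWrite(node, rid string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if w, ok := m.writer[rid]; ok && w != node {
		return errors.New("already write-locked by " + w)
	}
	for _, r := range m.readers[rid] {
		if r != node && !confirmRelease(r, rid) {
			return errors.New("reader " + r + " refused to release")
		}
	}
	delete(m.readers, rid)
	m.writer[rid] = node
	return nil
}

// ClaimShared adds node as a SHARED reader of rid: the WRITE owner, if
// any, must first confirm that it gives up its WRITE claim.
func (m *claimMap) ClaimShared(node, rid string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if w, ok := m.writer[rid]; ok && w != node {
		if !confirmRelease(w, rid) {
			return errors.New("writer " + w + " refused to release")
		}
		delete(m.writer, rid)
	}
	m.readers[rid] = append(m.readers[rid], node)
	return nil
}
</syntaxhighlight>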
== ZooKeeper internals ==
The claims live in ZooKeeper under the following znodes (a usage sketch follows the list):
* /memcp/resources/<rid> -> general information about the resource
* /memcp/resources/<rid>/owner -> exists only while a node holds the WRITE lock
* /memcp/resources/<rid>/readers/<reader-id> -> one entry per node holding a SHARED claim
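Here is a sketch of how the write lock could be taken with the github.com/go-zookeeper/zk client. The znode layout is from this page; everything else (function name, connection handling) is an assumption:
<syntaxhighlight lang="go">
package cluster

import (
	"time"

	"github.com/go-zookeeper/zk"
)

// claimWriteZK takes the write lock on a resource by creating the
// /memcp/resources/<rid>/owner znode as an ephemeral node: it vanishes
// automatically if the owning node dies, releasing the lock.
func claimWriteZK(servers []string, rid, nodeID string) error {
	conn, _, err := zk.Connect(servers, time.Second)
	if err != nil {
		return err
	}
	defer conn.Close()
	acl := zk.WorldACL(zk.PermAll)
	// Ensure the persistent parent path exists.
	for _, p := range []string{"/memcp", "/memcp/resources", "/memcp/resources/" + rid} {
		if _, err := conn.Create(p, nil, 0, acl); err != nil && err != zk.ErrNodeExists {
			return err
		}
	}
	// Ephemeral owner node: creation fails if another writer exists.
	_, err = conn.Create("/memcp/resources/"+rid+"/owner", []byte(nodeID), zk.FlagEphemeral, acl)
	return err
}
</syntaxhighlight>
A SHARED claim would analogously create an ephemeral znode under /memcp/resources/<rid>/readers/<reader-id>, so a dead reader disappears from the map by itself.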
