The goal for this post is to build a clustered virtual appliance offering Elasticsearch as a service that can be consumed and controlled by a host machine. The artifacts used in this article can be downloaded from Github.

Background

Backend capacity scaling in the face of increasing front-end demand has generally been addressed by replacing weaker servers with more powerful ones, CPU/RAM/disk wise: so-called 'Vertical Scaling'. This is as opposed to 'Horizontal Scaling', where more servers are simply added to the mix to handle the extra demand. Intuitively, the latter model is appealing as it sounds like less work! In the traditional RDBMS-centric applications, however, there was no choice, and vertical scaling actually made sense because it is difficult to do joins across large distributed data tables. But vertical scaling has its limits and, more importantly, becomes very expensive well before hitting those limits. NoSQL databases that skimp on relations (the 'R' of RDBMS) to allow for simpler horizontal scaling have become the go-to datastores nowadays for applications that need to scale large, as in Facebook/Google large.

Applications running on distributed storage and CPU have to deal with their own issues. These include keeping a CPU busy on the data that is 'local' to it, making sure that cluster members are aware of one another and know who has what piece of the data, and perhaps electing a leader/master as needed for coordination, writes, etc. The implementation details vary across systems. The reader is referred to Hadoop: The Definitive Guide, where Tom White goes over these scale issues in depth. We are not going to delve into all that here; our goals for this post are more pragmatic. First, develop a means to run a virtual cluster of a few nodes ('guests'), where the guests for now are carved out of my laptop by Virtualbox (later we will extend the same means to run services on a cluster of nodes provided by AWS). Second, install a distributed data store on this cluster of guests.

A simple way to simulate a distributed storage and compute environment is with Virtualbox as the provider of VMs ('Virtual Machines') and Vagrant as the front-end scripting engine to configure, start, and stop those VMs.

We use Oracle's Virtualbox as the provider of guest virtual hosts. Virtualbox is free to use and runs very well on my Linux laptop (Ubuntu 15.04 64-bit on an 8-core i7 at 2.2 GHz with 16 GB RAM). It also has extensive documentation on how to control the various aspects of the hosts to be created. There are prebuilt images of any number of open source Linux distributions that you can simply drop in for the guest OS. Virtualbox also offers a variety of networking options (sometimes daunting, as I found out) to expand or limit the accessibility and capability of the guests. For our purposes, we prefer a 'host-only', 'private' network that meets the following criteria: the guests and the host should be able to talk to each other, and the guests should be able to form a cluster and work together to enable a service.
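To make the host-only, private network criteria concrete, here is a minimal Python sketch that plans static addresses for a few guests on one private subnet. Because the host and every guest share that subnet, they can all reach one another. It assumes a host-only subnet of 192.168.56.0/24, a range Virtualbox commonly uses for host-only adapters, so adjust it to your setup. It also assumes the first usable address belongs to the host side of the adapter. The function name and the guest names `node1`, `node2`, ... are hypothetical and not taken from the post's artifacts.

```python
import ipaddress


def plan_host_only_network(subnet, n_guests, prefix="node"):
    """Assign static IPs on a private subnet (hypothetical helper, a sketch).

    The first usable address is reserved for the host's end of the
    host-only adapter; guests get the next n_guests addresses in order.
    """
    net = ipaddress.ip_network(subnet)
    if not net.is_private:
        # A host-only network should never use a publicly routable range.
        raise ValueError(f"{subnet} is not a private range")
    hosts = list(net.hosts())  # usable addresses, no network/broadcast
    if n_guests + 1 > len(hosts):
        raise ValueError("subnet too small for the host plus all guests")
    host_ip = str(hosts[0])
    guests = {f"{prefix}{i + 1}": str(hosts[i + 1]) for i in range(n_guests)}
    return host_ip, guests


if __name__ == "__main__":
    host_ip, guests = plan_host_only_network("192.168.56.0/24", 3)
    print("host:", host_ip)
    for name, ip in guests.items():
        print(name, ip)
```

Keeping the addresses static rather than DHCP-assigned means each guest can be told up front where its peers live, which helps when the data store on top needs a fixed list of cluster members.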
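Since Vagrant acts as the front-end scripting engine and is driven from the command line, a host-side script can start, inspect, and stop the guests by shelling out to it. The sketch below assumes `vagrant` is on the PATH and is run from the directory holding the Vagrantfile. `up`, `status`, `halt`, and `destroy` (with `--force`) are standard Vagrant subcommands. The helper names, the friendly action names, and the `dry_run` switch are hypothetical.

```python
import shlex
import subprocess

# Map friendly action names onto standard Vagrant subcommands.
ACTIONS = {"start": "up", "status": "status", "stop": "halt", "remove": "destroy"}


def vagrant_command(action, machine=None):
    """Build the argv list for a Vagrant lifecycle action (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    argv = ["vagrant", ACTIONS[action]]
    if action == "remove":
        argv.append("--force")  # skip the interactive confirmation prompt
    if machine:
        argv.append(machine)  # limit the action to one guest, e.g. "node1"
    return argv


def run(action, machine=None, dry_run=True):
    """Print the command when dry_run is set; otherwise execute it."""
    argv = vagrant_command(action, machine)
    if dry_run:
        print(shlex.join(argv))
        return 0
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    run("start")           # vagrant up
    run("stop", "node2")   # vagrant halt node2
```

Building the argument list separately from running it keeps the command construction easy to check without touching any VMs.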