CNM Bureau Farm
Revision as of 18:47, 10 August 2023

CNM Bureau Farm (formerly known as CNM EndUser Farm; hereinafter, the Farm) is the CNM farm that hosts CNM Social, CNM Talk, and CNM Venture. For the purposes of this wikipage, those three end-user applications are collectively called #The Apps and are described in the #The Apps section.

Technically, the Farm is a collection of commercial off-the-shelf (COTS) software. The Farm's end-users (hereinafter, the Patrons) work with #The Apps, which are installed in #The Cluster, which is, in turn, installed on #The Infrastructure that consists of #The Bridges and hardware, which includes #Bare-metal servers and the #Backup box.

To eliminate a single point of failure, the Farm is built on not just one, but three #Bare-metal servers. Each of those hardware servers with all of the software installed on top of it (hereinafter, the Node) is self-sufficient to host #The Apps. #High availability (HA) tools orchestrate coordination between the Nodes.


The Apps

For the purposes of this wikipage, the Apps refer to those end-user applications with which the Patrons interact. The Apps can be deployed utilizing two models:

  1. Using containers; they already contain operating systems tailored specifically to the needs of the App.
  2. In virtual machines (VM) and without containers. In that model, the App is installed on the operating system of its VM.

HumHub

CNM Social, which is the end-user instance of CNM HumHub.

Odoo

CNM Venture, which is the end-user instance of CNM Odoo.

Jitsi

CNM Talk, which is the end-user instance of CNM Jitsi.

The Cluster

For the purposes of this wikipage, the Cluster refers to all of the software between #The Apps and #The Infrastructure.

Cluster components

The Cluster consists of three Nodes that #High availability (HA) tools orchestrate.

Monitoring

Monitoring features are to be identified. Previously, various candidates offered three options:
  1. A stack of Prometheus + node-exporter + Grafana
  2. Prometheus to monitor VMs, InfluxDB to monitor PVE nodes, and Grafana for dashboards
  3. Grafana + InfluxDB + Telegraf, as well as Zabbix; to monitor websites, use UptimeRobot
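Whichever stack is chosen, the checks it automates reduce to probing a service and classifying the result. A minimal sketch follows; the thresholds are illustrative assumptions, not settings of any of the candidate stacks:

```python
# Hypothetical probe classifier; thresholds are illustrative assumptions,
# not settings of any of the candidate stacks named above.

def classify(http_status: int, response_seconds: float,
             timeout_seconds: float = 5.0) -> str:
    """Classify one probe result as 'up', 'degraded', or 'down'."""
    if http_status >= 500 or response_seconds >= timeout_seconds:
        return "down"
    if http_status >= 400 or response_seconds >= timeout_seconds / 2:
        return "degraded"
    return "up"

print(classify(200, 0.3))  # up
print(classify(200, 4.0))  # degraded
print(classify(502, 0.2))  # down
```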

Recovery

High availability (HA)

High availability (HA) of the Farm assumes higher uptime in comparison with a similar system without HA measures. For a while, the Farm was built on only one Node and, when it failed, services of #The Apps were no longer available until the failure was fixed.

HA principles

Principally, HA tools are based on:

HA limitations

Generally speaking, HA comes with significant costs, and so does HA of the Farm. At the very least, running three Nodes is more expensive than running one. The cost cannot exceed the benefit, so high availability cannot be equated with failure tolerance.

HA management

To manage redundant resources, #The Cluster:
  • Monitors its resources to identify whether they are operational or failed, as described in the #Monitoring section.
  • Fences those resources that are identified as failed. As a result, non-operational resources are withdrawn from the list of available resources.
  • Restores those resources that are fenced. #Recovery supports that feature by constantly creating snapshots and reserve copies of the Farm and its parts in order to make them available for restoring when needed.
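The monitor, fence, and restore steps above can be sketched as a cycle. The resource model and snapshot label below are illustrative assumptions made for the example, not ProxmoxVE objects:

```python
# Illustrative sketch of the monitor -> fence -> restore cycle; Resource
# and the snapshot label are assumptions, not ProxmoxVE objects.

class Resource:
    def __init__(self, name: str):
        self.name = name
        self.operational = True   # set by monitoring
        self.fenced = False

def fence_failed(resources):
    """Fence failed resources and return the remaining available ones."""
    for r in resources:
        if not r.operational:
            r.fenced = True
    return [r for r in resources if not r.fenced]

def restore(resource, snapshot: str) -> str:
    """Bring a fenced resource back from a stored snapshot."""
    resource.operational = True
    resource.fenced = False
    return f"{resource.name} restored from {snapshot}"

nodes = [Resource("node1"), Resource("node2"), Resource("node3")]
nodes[1].operational = False            # monitoring flags a failure
available = fence_failed(nodes)         # node2 is withdrawn
print([r.name for r in available])      # ['node1', 'node3']
print(restore(nodes[1], "snapshot-a"))  # node2 restored from snapshot-a
```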

Virtual environment (VE)

CNMCyber Team uses virtualization to divide the excess hardware resources of #Bare-metal servers into smaller containers and virtual machines (VMs), which are created in virtual environments (VEs).

As its software for VEs, the Farm utilizes CNM ProxmoxVE. Every instance of CNM ProxmoxVE is installed on #Node OS, which requires "physical" #Bare-metal servers. The Farm's CNM ProxmoxVE also utilizes #The Storage as its storage.

COTS for VE

CNMCyber Team has tried OpenStack and VirtualBox as its virtualization tools. The trials suggested that OpenStack required more hardware resources and VirtualBox didn't allow for required sophistication in comparison with ProxmoxVE, which has been chosen as COTS for the Farm's virtualization.

HA for the Apps

When one of #The Apps fails, its work is continued by its sister application installed on the second Node. If that application also fails, the work is continued by the sister application installed on the third Node. If the third application fails, the Farm can no longer provide the Patrons with the Farm's services in full.
To ensure that, the Farm utilizes tools that come with ProxmoxVE. Every virtual machine (VM) or container is kept on at least two Nodes. When the operational resource, a VM or container, fails, CNM ProxmoxVE activates another resource and creates a third resource as a reserve. As a result, the VM or container "migrates" from one Node to another.
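The replica policy described above can be sketched in plain logic; this mirrors the behavior as described, not ProxmoxVE's actual API:

```python
# Sketch of the "keep every VM/container on at least two Nodes" policy;
# plain logic only, not ProxmoxVE's actual API.

def migrate(replicas, all_nodes, failed_node):
    """Return the replica set after failed_node drops out: the standby
    replica stays active and a healthy spare becomes the new reserve."""
    survivors = [n for n in replicas if n != failed_node]
    spare = next(n for n in all_nodes
                 if n != failed_node and n not in survivors)
    return survivors + [spare]

nodes = ["node1", "node2", "node3"]
replicas = ["node1", "node2"]             # the VM is kept on two Nodes
print(migrate(replicas, nodes, "node1"))  # ['node2', 'node3']
```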

VE provisioning

The interaction between CNM ProxmoxVE instances and #The Infrastructure is carried out by the Debian operating system (OS) that comes in the same COTS "box" as ProxmoxVE and is specifically configured for that interaction.

The Storage

For the purposes of this wikipage, the Storage refers to the storage platform or, simply, storage that supports #The Cluster and differs from the #Backup box.

To make objects, blocks, and files immediately available for #The Apps' operations, the Farm uses CNM Ceph. This common distributed cluster foundation orchestrates storage spaces of the individual Nodes.

COTS for the Storage

CNMCyber Team has tried OpenZFS and RAID as the Farm's storage. Initially, the original cluster developer proposed using Ceph. Later, the team substituted one node with another that had a larger hard disk, but no SSD or NVMe drives; as a result, the Farm's storage collapsed. The substituted node was disconnected (today, it serves as hardware for CNM Lab Farm), a new bare-metal server was purchased (today, it is the #Node 3 hardware), and Ceph was restored.
As COTS, ProxmoxVE comes with OpenZFS. CNMCyber Team has deployed the combination of both in its CNM Lab Farm.

Deployment model

At the Farm, CNM Ceph is deployed on every Node. Each of the #Bare-metal servers features two hard disks. Physically, CNM ProxmoxVE is installed on one disk of each Node; CNM Ceph uses the three "second" disks. Since every disk is 512 GB, the Farm's raw CNM Ceph capacity is about 3 × 512 GB = 1,536 GB.
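The capacity arithmetic, plus the usable figure if Ceph's common three-replica default applies (an assumption; this page does not state the Farm's pool replication setting):

```python
# Raw capacity as stated above; the usable figure assumes Ceph's common
# three-replica default, which this page does not confirm for the Farm.

nodes = 3
disk_gb = 512
raw_gb = nodes * disk_gb
print(raw_gb)  # 1536

replication_factor = 3               # assumption
print(raw_gb // replication_factor)  # 512 GB usable under 3 replicas
```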
While experimenting with OpenZFS and RAID, CNMCyber Team also tried another model, in which the second disks served as reserve copies of the first ones.

HA for the Storage

When one DBMS fails, its work is continued by its sister DBMS installed on the second Node. When that DBMS also fails, the work is continued by the sister DBMS installed on the third Node. If the third DBMS fails, the Farm can no longer provide #The Apps with the data they require to work properly.
To ensure that, the Farm utilizes #The Storage. Every object, block, or file is kept on at least two Nodes. When any stored resource fails, #The Storage activates another resource and creates a third resource as a reserve. As a result, any stored resource "migrates" from one Node to another.

Storage provisioning

Web architecture

For the purposes of this wikipage, "web architecture" refers to the Farm's outline of DNS records and IP addresses.

Channels and networks

The Farm's communication channels are built on the #Bare-metal servers and #The Bridges. Currently, the Farm uses three communication channels, each of which serves one of the networks as follows:
  1. WAN (wide area network), which is the Farm's public network that uses external, public IPv4 addresses to integrate #The Gateway into the Internet. The public network is described in the #The Gateway section of this wikipage.
  2. LAN (local area network), which is the Farm's private network that uses internal, private IPv6 addresses to integrate #The Gateway and the Nodes into one network cluster. This network cluster is described in the #Virtual environment (VE) section of this wikipage.
  3. SAN (storage area network), which is the Farm's private network that uses internal, private IPv6 addresses to integrate the storage spaces of the Nodes into one storage cluster. This storage cluster is described in the #The Storage section of this wikipage.
The Farm's usage of IP addresses is best described in the #IP addresses section.

DNS zone

To locate the Farm's public resources in the Internet, the following DNS records are created in the Farm's DNS zone:
Field                 Type         Data                  Comment (not a part of the records)
pm1.bskol.com         AAAA record  2a01:4f8:10a:439b::2  Node 1
pm2.bskol.com         AAAA record  2a01:4f8:10a:1791::2  Node 2
pm3.bskol.com         AAAA record  2a01:4f8:10b:cdb::2   Node 3
pf.bskol.com          A record     88.99.71.85           CNM pfSense
npm1.bskol.com        A record     88.99.218.172         Node 1 Nginx
npm2.bskol.com        A record     88.99.71.85           Node 2 Nginx
npm3.bskol.com        A record     94.130.8.161          Node 3 Nginx
talk.cnmcyber.com     AAAA record  2a01:4f8:fff0:53::2   CNM Talk (CNM Jitsi)
venture.cnmcyber.com  AAAA record  2a01:4f8:fff0:53::3   CNM Venture (CNM Odoo)
social.cnmcyber.com   AAAA record  2a01:4f8:fff0:53::4   CNM Social (CNM HumHub)
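A zone like this can be sanity-checked mechanically: an A record must carry an IPv4 address, and an AAAA record an IPv6 one. A sketch using a few records sampled from the table:

```python
# An A record must hold an IPv4 address and an AAAA record an IPv6 one;
# this checks a few records sampled from the zone above.

import ipaddress

records = [
    ("pm1.bskol.com", "AAAA", "2a01:4f8:10a:439b::2"),
    ("pf.bskol.com", "A", "88.99.71.85"),
    ("npm3.bskol.com", "A", "94.130.8.161"),
]

def record_ok(rtype: str, data: str) -> bool:
    version = ipaddress.ip_address(data).version
    return (rtype == "A" and version == 4) or \
           (rtype == "AAAA" and version == 6)

for name, rtype, data in records:
    print(name, record_ok(rtype, data))  # each prints True
```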

IP addresses

To locate its resources in the #Channels and networks, the Farm uses three types of IP addresses:
  1. To access the #Virtual environment (VE) of various Nodes from the outside world, the Farm features public IPv6 addresses. One address is assigned to each Node. Since there are three Nodes, three addresses of that type are created.
  2. For the internal network of the three Nodes, which is assembled on the #Internal Bridge, private IP addresses are used. This network is not accessible from the Internet and is not included in the Farm's DNS zone. For instance, #The Storage utilizes this network to synchronize its data. For this network, a subnet of the "/24" type is selected.
  3. For the external network of the three Nodes, which is assembled on the #External Bridge, the Farm features public IPv4 addresses. They are handled by #Web intermediaries.
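The three address types can be illustrated with Python's ipaddress module. The concrete internal prefix below is an assumption, since only its "/24" size is stated above; the public addresses are taken from the DNS zone:

```python
# The internal prefix is an assumption; only its "/24" size is stated
# above. The public addresses are taken from the Farm's DNS zone.

import ipaddress

public_v6 = ipaddress.ip_address("2a01:4f8:10a:439b::2")  # per-Node public IPv6
internal = ipaddress.ip_network("192.168.1.0/24")         # assumed internal range
public_v4 = ipaddress.ip_address("88.99.71.85")           # handled externally

print(public_v6.version)       # 6
print(internal.num_addresses)  # 256 addresses in a /24
print(internal.is_private)     # True
print(public_v4.is_global)     # True
```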

The Gateway

For the purposes of this wikipage, the Gateway refers to the composition of software, such as a load balancer or reverse proxy, that is built on the #External Bridge. The Gateway is the hub for both the Farm's wide area network (WAN) and local area network (LAN). To power the Gateway, CNM pfSense is deployed.

COTS for Gateway

CNMCyber Team has tried iptables as a firewall, as well as Fail2ban. Fail2ban operates by monitoring log files (e.g., /var/log/auth.log, /var/log/apache/access.log) for selected entries and running scripts based on them. Most commonly, it is used to block selected IP addresses that may belong to hosts trying to breach the system's security. It can ban any host IP address that makes too many login attempts or performs any other unwanted action within a time frame defined by the administrator. Fail2ban supports both IPv4 and IPv6.
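Fail2ban's core rule, banning an IP that exceeds a number of failed attempts within an administrator-defined time frame, can be sketched as follows. The thresholds and addresses are illustrative, not the Farm's configuration:

```python
# Thresholds and addresses are illustrative, not the Farm's settings.

from collections import defaultdict

def banned_ips(failures, max_attempts=5, window_seconds=600):
    """failures: list of (timestamp_seconds, ip) for failed logins.
    Ban an IP with max_attempts or more failures inside any window."""
    per_ip = defaultdict(list)
    for ts, ip in failures:
        per_ip[ip].append(ts)
    bans = set()
    for ip, times in per_ip.items():
        times.sort()
        for end in times:
            in_window = [t for t in times if end - window_seconds < t <= end]
            if len(in_window) >= max_attempts:
                bans.add(ip)
    return bans

log = [(t, "203.0.113.9") for t in range(0, 300, 60)]  # 5 failures in 5 min
log += [(0, "198.51.100.7"), (10000, "198.51.100.7")]  # 2 spread-out failures
print(banned_ips(log))  # {'203.0.113.9'}
```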

Gateway functions

#The Gateway can be compared to an executive secretary, who (a) takes external clients' requests, (b) serves as a gatekeeper, checking the validity of those requests, (c) when a request is valid, selects the internal resource to dispatch it to, (d) dispatches the request to the selected resource, (e) gets the internal response, and (f) returns it to the client in the outside world.
Thus, #The Gateway (a) receives requests from the world outside of the Farm, (b) serves as a firewall, checking the validity of those requests, (c) when a request is valid, selects the Node to dispatch it to, (d) dispatches the request to the selected Node, (e) gets the internal response, and (f) returns it to the outside world.
#The Gateway is responsible for dispatching external requests to those and only those internal resources that the Farm's #Monitoring has identified as operational. To be more accessible to its clients, #The Gateway utilizes public IPv4 addresses.
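The six-step flow can be sketched as a single function. The validation and Node-selection logic below are placeholders; the real Gateway delegates them to pfSense and the Monitoring data:

```python
# Placeholder validation and Node selection; the real Gateway delegates
# these to pfSense and the Monitoring data described above.

def handle(request: dict, nodes_status: dict) -> str:
    # (a) receive, (b) act as a firewall
    if not request.get("valid", False):
        return "rejected"
    # (c) select an operational Node, per Monitoring
    operational = [n for n, up in nodes_status.items() if up]
    if not operational:
        return "no backend available"
    node = operational[0]
    # (d) dispatch, (e) collect the internal response, (f) return it
    return f"response from {node}"

status = {"node1": False, "node2": True, "node3": True}
print(handle({"valid": True}, status))   # response from node2
print(handle({"valid": False}, status))  # rejected
```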

Gateway components

#The Gateway includes #Firewall and router, #Load balancer, and #Webserver.

Firewall and router

CNM pfSense plays the roles of firewall, reverse proxy, and the platform to which the #Load balancer and #Webserver are attached.

Load balancer

As a load balancer, CNM pfSense uses a version of HAProxy that is specifically configured as a pfSense add-on. As of the summer of 2023, no full HAProxy Manager exists in the Farm, and a round-robin model is activated for load balancing.
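A round-robin model hands each new request to the next backend in turn, wrapping around; itertools.cycle expresses the idea in a few lines:

```python
# Round robin: each new request goes to the next backend in turn.

from itertools import cycle

backends = cycle(["node1", "node2", "node3"])
picks = [next(backends) for _ in range(5)]
print(picks)  # ['node1', 'node2', 'node3', 'node1', 'node2']
```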

Webserver

As its webserver, pfSense utilizes lighttpd. Prior to the deployment of CNM pfSense, CNMCyber Team utilized two web servers to communicate with the outside world via HTTP: Nginx handled requests initially, and Apache HTTP Server handled those requests that hadn't been handled by Nginx.

Accesses

End-user access

The Patrons access #The Apps and #The Apps only. Those users cannot access #The Infrastructure or #The Cluster.
The Patrons access #The Apps via those IPv4 addresses that are associated with the particular application. Opplet.net provides the Patrons with access either automatically or manually, via bureaucrats or other power-users.

Power-user access

Power-users of the Farm (hereinafter, the Admins) are those users who are authorized to access more resources of the Farm than a regular Patron.
  1. Hardware-level admin. Administrative access to #Bare-metal servers and #Backup box is carried out without any IP addresses, through the administrative panel and administrative consoles that #Service provider grants to CNMCyber Customer. The customer grants hardware-level admin access personally.
  2. VE-level admin. Administrative access to #Virtual environment (VE), #The Gateway, and #Emergency tools is carried out through IPv6 addresses linked to those tools. Access credentials are classified and securely stored in CNM Lab.
  3. App-level admin. Administrative access to #The Apps is carried out through the IPv4 addresses associated with the particular application. At the moment, those accesses are provided manually by other Admins.

The Infrastructure

For the purposes of this wikipage, the Infrastructure refers to the software and hardware that CNMCyber Team rents from the #Service provider. The rented hardware consists of #Bare-metal servers and the #Backup box. The rented software refers to #The Bridges.

Service provider

Hetzner has been serving as CNMCyber Team's Internet service provider (ISP) and lessor of #The Infrastructure since 2016. Offers from other potential providers, specifically Contabo and DigitalOcean, have been periodically reviewed, but no one else has offered a better quality-to-price ratio on a long-term basis.

Choosing the hardware

Due to the lower cost, #Bare-metal servers were purchased via the #Service provider's auction -- https://www.hetzner.com/sb?hdd_from=500&hdd_to=1000 -- based on the following assumptions:
  • Number: ProxmoxVE normally requires three nodes. The third node is needed to provide quorum; however, it shall not necessarily run applications. At the same time, Ceph requires at least three nodes.
  • Hard drives:
    1. The hard drive storage capacity for any Node shall be at least 512 GB.
    2. Because Ceph is selected to power #The Storage, any hard drive of the Farm shall be both SSD and NVMe.
  • Processors and memory:
    1. The memory capacity for two Nodes of the Farm shall be at least 32 GB. Requirements for the third Node may be lower because of ProxmoxVE's characteristics.
    2. Those servers that deploy Intel Xeon E3-1275v5 processors are preferable over those that deploy Intel Core i7-7700 ones.
  • Location: At least two Nodes shall be located in the same data center. Although the #Service provider does not charge for internal traffic, this circumstance increases the speed of the whole Farm. If no nodes are available in the same data center, they shall be looked for in the same geographic location.
The hardware characteristics of the chosen Nodes are presented in #Bare-metal servers.
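The quorum reasoning behind the three-node requirement is simple majority arithmetic: a cluster keeps operating only while a strict majority of Nodes is up, so three Nodes tolerate one failure while two tolerate none:

```python
# A strict majority of Nodes must stay up for the cluster to operate.

def quorum(total_nodes: int) -> int:
    return total_nodes // 2 + 1

for n in (1, 2, 3):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {n - quorum(n)} failure(s)")
# 3 nodes: quorum 2, tolerates 1 failure(s)
```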

The Bridges

The network of each Node uses a bridge based on the model selected by default in Network Configuration.

Hetzner vSwitches (hereinafter, the Bridges) serve as bridges for #Communication channels to connect the Nodes in networks and switch from one Node to another one. The #Service provider provides CNMCyber Team with the Bridges; the team can order up to 5 connectors to be connected to one Node.

The Farm cannot support high availability of the Bridges. Resiliency of the Bridges is the courtesy of their owner, #Service provider.

The Farm utilizes two Bridges; they come with the lease of the Nodes.

Internal Bridge

The Internal Bridge serves as the hub for the node and storage networks. It is located on an internal, private IPv6 address to provide for data transfer between the Nodes and their storage spaces.

External Bridge

The External Bridge serves as the hub for the public network, the Internet. It is located on an external, public IPv4 address to provide for data transfer between the Farm's publicly available resources and other Internet resources.

Backup box

A backup box is deployed on a 1 TB, unlimited-traffic BX-11 storage box that has been rented for that purpose.

Basic features

  • 10 concurrent connections
  • 100 sub-accounts
  • 10 snapshots
  • 10 automated snapshots
  • FTP, FTPS, SFTP, SCP
  • Samba/CIFS
  • BorgBackup, Restic, Rclone
  • rsync via SSH
  • HTTPS, WebDAV
  • Usable as a network drive

COTS for backups

Initially, Proxmox Backup Server was used. However, it consumed the active storage. As a result, no additional COTS was deployed beyond Hetzner tools. The storage box was attached to the Cluster as storage, and backups go to that storage directly from the Cluster.

Description

#Service provider's description: Storage Boxes provide you with safe and convenient online storage for your data. Score a Storage Box from one of Hetzner Online's German or Finnish data centers! With Hetzner Online Storage Boxes, you can access your data on the go wherever you have internet access. Storage Boxes can be used like an additional storage drive that you can conveniently access from your home PC, your smartphone, or your tablet. Hetzner Online Storage Boxes are available with various standard protocols which all support a wide array of apps. We have an assortment of diverse packages, so you can choose the storage capacity that best fits your individual needs. And upgrading or downgrading your choice at any time is hassle-free!

Bare-metal servers

The #Virtual environment (VE) is deployed on three bare-metal servers. As the result of #Choosing the hardware, #Node 1 hardware, #Node 2 hardware, and #Node 3 hardware have been rented for that purpose.

Node 1 hardware

1 x Dedicated Root Server "Server Auction"
  • Intel Xeon E3-1275v5
  • 2x SSD M.2 NVMe 512 GB
  • 4x RAM 16384 MB DDR4 ECC
  • NIC 1 Gbit Intel I219-LM
  • Location: FSN1-DC1
  • Rescue system (English)
  • 1 x Primary IPv4

Node 2 hardware

1 x Dedicated Root Server "Server Auction"
  • Intel Xeon E3-1275v5
  • 2x SSD M.2 NVMe 512 GB
  • 4x RAM 16384 MB DDR4 ECC
  • NIC 1 Gbit Intel I219-LM
  • Location: FSN1-DC1
  • Rescue system (English)
  • 1 x Primary IPv4

Node 3 hardware

1 x Dedicated Root Server "Server Auction"
  • Intel Xeon E3-1275v5
  • 2x SSD M.2 NVMe 512 GB
  • 4x RAM 16384 MB DDR4 ECC
  • NIC 1 Gbit Intel I219-LM
  • Location: FSN1-DC1
  • Rescue system (English)

See also

Related lectures

Special terms

On this wiki page, the following terms are used:
  • Admin. A power-user of the Farm, or any user who is authorized to access more resources of the Farm than a regular Patron.
  • Bridge. Either of the two Hetzner vSwitches that the Farm utilizes.
  • Farm. CNM Bureau Farm, which this wikipage describes.
  • Node. One of the Farm's hardware servers with all of the software installed on top of it.
  • Patron. An end-user of the Farm.

Useful recommendations