Difference between revisions of "CNM Bureau Farm"

From CNM Wiki

Revision as of 22:57, 20 August 2023

CNM Bureau Farm (formerly known as CNM EndUser Farm; hereinafter, the Farm) is the CNM farm that hosts CNM Social, CNM Talk, and CNM Venture. For the purposes of this very wikipage, those three end-user applications are collectively called #The Apps and briefly described in #The Apps section.

The Farm's end-users, who are called #The Patrons, work with #The Apps, which are installed in #The Cluster, which is, in turn, hosted by #The Infrastructure. The Infrastructure consists of #The Bridges and #The Metal, which is the only hardware of the Farm.

Technically, the Farm is a collection of commercial off-the-shelf (COTS) software (hereinafter, the COTS). In plain English, the COTS is software that is already available on the market. Not a single line of programming code has been written specifically for the Farm. Only time-tested, market-proven solutions have been used. #The Apps use HumHub, Jitsi, and Odoo instances. In #The Cluster, Ceph stands behind #The Storage, ProxmoxVE stands behind #The Habitat, and pfSense stands behind #The Gateway.

To mitigate a single point of failure (SPOF), the Farm is built on not just one, but three bare-metal servers. Each of those hardware servers, together with all of the software installed on top of it (hereinafter, the Node), is self-sufficient to host #The Apps. Various solutions such as #The uptime tools coordinate the different Nodes.


Addressing the needs

Development and sustenance of the Farm address two business needs of CNMCyber Team (hereinafter, the Team). For the Team, the Farm shall serve as both #Tech side of the Apps and #Worksite.

Tech side of the Apps

The Team needs to provide #The Patrons with the services of #The Apps 24 hours a day, 7 days a week. The Farm shall power #The Apps technologically; their content is outside of the Farm's scope.

Worksite

The Team needs to provide those incumbents of CNMCyber jobs who work with the Farm's technologies with their worksite.

Farm users

For the purposes of this very wikipage, a farm user refers to any user of the Farm, including both #The Patrons and #The Admins.

The Admins

For the purposes of this very wikipage, the Admins are power-users of the Farm. Literally, they are those members of the Team who are authorized to access more resources of the Farm than #The Patrons.
  1. Hardware-level admin. Administrative access to #The Metal is carried out without any IP addresses, through the administrative panel and administrative consoles that #Service provider grants to CNMCyber Customer. The customer grants hardware-level admin access personally; those admins access the hardware via #UI for the Metal.
  2. Cluster-level admin. Administrative access to #The Habitat, #The Storage, and #The Gateway is carried out through IPv6 addresses linked to those tools. Access credentials are classified and securely stored in CNM Lab. Those admins access the hardware via #UI for the Habitat.
  3. App-level admin. Administrative access to #The Apps is carried out through the IPv4 addresses associated with the particular application. At the moment, those accesses are provided by other Admins manually. Those admins access the hardware via #UIs for the Apps.
Those incumbents of CNMCyber jobs who work with the Farm's technologies are a part of the Admins. The Admins' access administration is a part of identity and access management (IAM).

The Patrons

For the purposes of this very wikipage, the Patrons are end-users of the Farm. They access #The Apps via #App-user UIs that are located at those IPv4 addresses that are associated with the particular application. Opplet.net provides the Patrons with access automatically or, by #The Admins, manually. The Patrons can access #The Apps and #The Apps only. Those users cannot access #The Infrastructure and/or #The Cluster. Only #The Admins can.

User interfaces (UI)

For the purposes of this very wikipage, any user interface (UI) refers to the software features of the COTS that allow a COTS instance and #The Admins to interact. The #UI of the Habitat, #UI of the Storage, #UI of the Gateway, #UI for backups sections of this very wikipage describe those UIs that the Cluster offers.

Admin UIs

Administrative access to #The Metal is carried out without any IP addresses, through the administrative panel and administrative consoles that #Service provider grants to CNMCyber Customer. The customer grants hardware-level admin access personally.

       VE-level admin. Administrative access to #The Habitat and #The Gateway is carried out through IPv6 addresses linked to those tools. Access credentials are classified and securely stored in CNM Lab.
       App-level admin. Administrative access to #The Apps is carried out through the IPv4 addresses associated with the particular application. At the moment, those accesses are provided by other Admins manually.

Patron UIs

The Patrons

   For the purposes of this very wikipage, the Patrons are end-users of the Farm. They access #The Apps via those IPv4 addresses that are associated with the particular application. Opplet.net provides the Patrons with access automatically or, by #The Admins, manually. The Patrons can access #The Apps and #The Apps only. Those users cannot access #The Infrastructure and/or #The Cluster. Only #The Admins can.

The Apps

For the purposes of this very wikipage, the Apps refer to those end-user applications with which #The Patrons interact.

HumHub

CNM Social, which is the end-user instance of CNM HumHub.

Jitsi

CNM Talk, which is the end-user instance of CNM Jitsi.

Odoo

CNM Venture, which is the end-user instance of CNM Odoo.

The Cluster

For the purposes of this very wikipage, the Cluster refers to all of the software between #The Apps and #The Infrastructure.

Cluster components

#The Cluster consists of three Nodes and their management tools. The following components compose #The Cluster:
  1. #The Storage that is powered by CNM Ceph to provide #The Habitat with stored objects, blocks, and files. Storage spaces of the three Nodes create one distributed storage foundation.
  2. #The Habitat that is powered by CNM ProxmoxVE to make containers and virtual machines (VMs) available to #The Apps, so #The Apps can function properly. Each of the three Nodes features its own environment; #The uptime tools orchestrate them all.
  3. #The Gateway that is powered by CNM pfSense to create a gateway between #The Habitat and the outside world. There is only one gateway; if it fails, the Farm fails.

Choice of COTS

While building the Farm generally and #Cluster components specifically, the Team utilized only open-source COTS that is free of charge. Other considerations for the choice are stated in the #COTS for the Habitat, #COTS for the Storage, #COTS for the Gateway, #COTS for backups sections of this very wikipage.

Cluster provisioning

Provisioning of #The Cluster occurs in the following sequence:
  1. Since #The Habitat is installed on the top of #The Infrastructure, the #Habitat provisions shall be accommodated first.
  2. Since #The Storage is a part of #The Habitat, #Storage provisions shall be accommodated second.
  3. Since #The Gateway is installed in #The Habitat, #Gateway provisions shall be accommodated third.

Cluster monitoring

Monitoring features are to be identified. Previously, various candidates offered three options:
  1. A stack of Prometheus, node-exporter, and Grafana.
  2. Prometheus to monitor VMs, InfluxDB to monitor PVE nodes, and Grafana for dashboards.
  3. Grafana, InfluxDB, and Telegraf, as well as Zabbix. To monitor websites, UptimeRobot.

Cluster recovery

High availability (HA)

Generally speaking, high availability (HA) of any system assumes its higher uptime in comparison with a similar system without the higher uptime ability. HA of the Farm assumes its higher uptime in comparison with a similar farm built on one Node. Before #The uptime tools were deployed, the Farm functioned on only one Node and, when that Node failed, services of #The Apps were no longer available until the failure was fixed. Now, as long as at least one of the three Nodes is operational, the Farm is operational.

The uptime tools

Both #The Habitat and #The Storage feature advanced tools for #High availability (HA).
  • The CNM ProxmoxVE instances. With regard to the Farm's applications, when any application fails, its work is continued by its sister application installed on the second Node. If that application fails too, its work is continued by the sister application installed on the third Node. If the third application fails, the Farm can no longer provide #The Patrons with the Farm's services in full. To ensure that continuity, the Farm utilizes tools that come with ProxmoxVE. Every virtual machine (VM) or container is kept on at least two Nodes. When the operational resource, VM or container, fails on one instance, the second CNM ProxmoxVE instance activates its own resource and requests the third instance to create the third resource as a reserve. As a result, the VM or container "migrates" from one Node to another Node.
  • The CNM Ceph instance. Because of the distributed nature of Ceph, #High availability (HA) is a native feature of #The Storage. When one DBMS fails, its work is continued by its sister DBMS installed on the second Node. When that DBMS fails too, its work is continued by the sister DBMS installed on the third Node. If the third DBMS fails, the Farm can no longer provide #The Apps with the data they require to work properly.

Uptime limitations

Generally speaking, HA comes with significant costs. So do #The uptime tools. At the very least, running three Nodes is more expensive than running one. The cost cannot exceed the benefit, so high availability (HA) cannot be equal to failure tolerance.

Uptime management

To manage redundant resources, #The uptime tools:
  • Monitor their resources to identify whether they are operational or failed, as described in the #Monitoring section of this very wikipage.
  • Fence those resources that are identified as failed. As a result, non-operational resources are withdrawn from the list of available resources.
  • Restore those resources that are fenced. The #Recovery supports that feature by constantly creating snapshots and reserve copies of the Farm and its parts in order to make them available for restoring when needed.
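The monitor, fence, and restore cycle listed above can be illustrated with a minimal sketch. It is not how ProxmoxVE implements the cycle; the `Resource` class, its fields, and the helper names are hypothetical and exist only to show the order of the three steps.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A hypothetical VM or container tracked by the uptime tools."""
    name: str
    healthy: bool = True
    fenced: bool = False


def find_failed(resources):
    """Monitor: return resources that have failed but are not fenced yet."""
    return [r for r in resources if not r.healthy and not r.fenced]


def fence(resource):
    """Fence: withdraw a failed resource from the list of available ones."""
    resource.fenced = True


def restore(resource):
    """Restore: bring a fenced resource back (e.g. from a snapshot)."""
    resource.healthy = True
    resource.fenced = False


def available(resources):
    """Names of resources that may currently receive work."""
    return [r.name for r in resources if r.healthy and not r.fenced]


resources = [Resource("vm-node1"), Resource("vm-node2", healthy=False), Resource("vm-node3")]
for failed in find_failed(resources):
    fence(failed)
after_fencing = available(resources)

for r in resources:
    if r.fenced:
        restore(r)
after_restore = available(resources)
```

In this sketch a fenced resource never appears in `available()` until it has been restored, which mirrors the "withdrawn from the list" wording above.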

Uptime principles

Principally, #High availability (HA) of #The Cluster is based on:

The Habitat

For the purposes of this very wikipage, the Habitat refers to the virtual environment (VE) of #The Cluster or, allegorically, to the habitat where #The Apps "live".

COTS for the Habitat

As the COTS for the Habitat, the Team utilizes CNM ProxmoxVE. For a while, the Team has also tried OpenStack and VirtualBox as its virtualization tools. The trials suggested that OpenStack required more hardware resources and VirtualBox didn't allow for required sophistication in comparison with ProxmoxVE, which has been chosen as the COTS for the Farm's virtualization.

Habitat features

The Team uses virtualization to divide the excess hardware resources of the Nodes' servers into smaller containers and virtual machines (VMs), which are created in the Habitat to host #The Apps, #The Gateway, and other applications.

Habitat functions

#The Apps, #The Gateway, and other applications can be deployed utilizing two models:
  1. Using containers; they already contain operating systems tailored specifically to the needs of the App.
  2. In virtual machines (VM) and without containers. In that model, the App is installed on the operating system of its VM.

Habitat provisions

Every instance of CNM ProxmoxVE requires one "physical" bare-metal server. The interaction between CNM ProxmoxVE instances and #The Infrastructure is carried out by Debian operating system (OS) that comes in the same COTS "box" as ProxmoxVE and is specifically configured for that interaction. The Farm's CNM ProxmoxVE also hosts #The Storage as its storage.

UI of the Habitat

#User interfaces (UI)

The Storage

For the purposes of this very wikipage, the Storage refers to the storage platform or database management system (DBMS) that provides #The Apps with the storage they need to operate. Thus, the Storage supports #The Habitat's non-emergency operations and differs from the #Backup box that comes into play in emergencies.

COTS for the Storage

As the COTS for the Storage, the Team utilizes CNM Ceph. Any ProxmoxVE instance requires some storage to operate.
Before deploying #The uptime tools, the Team used RAID to make the doubled hard disks redundant. So, the ProxmoxVE instance was installed on one disk and replicated to the other disk automatically. ProxmoxVE is flexible and allows for better usage of hard disks. ProxmoxVE can be configured to host many storage-type COTS such as ZFS, NFS, GlusterFS, and so on.
Initially, the cluster developer proposed using Ceph. Later, the Team substituted one node with another that had a larger hard disk, but without SSD and NVMe; as a result, the Farm's storage collapsed. The substituted node was disconnected (today, it serves as hardware for CNM Lab Farm), a new bare-metal server was purchased (today, it is the #Node 3 hardware), and Ceph was restored.
As the COTS, ProxmoxVE comes with OpenZFS. The Team has deployed the combination of both in its CNM Lab Farm.

Storage features

Storage functions

To make objects, blocks, and files immediately available for #The Apps' operations, the Farm uses a common distributed cluster foundation that orchestrates storage spaces of the individual Nodes.

Storage provisions

Since #The Storage is installed on the top of #The Habitat, the Storage provisioning entails configuring a CNM ProxmoxVE instance to work with a CNM Ceph instance.
At the Farm, CNM Ceph is deployed at every Node. Each of the Node's servers features doubled hard disks. Physically, CNM ProxmoxVE is installed on one disk of each Node; CNM Ceph uses three "second" disks. So, the Farm features three instances of CNM ProxmoxVE and one instance of CNM Ceph.
While experimenting with OpenZFS and RAID, the Team also tried another model. The second disks then served as reserve copies of the first ones. Since every disk is just 512 GB, that model cut the Farm's capacity in half, since both #The Apps and their storage needed to fit into the 512 GB limit together.
In the current model, #The Apps don't have to share their 512 GB with the storage. On the other hand, the Farm's raw CNM Ceph capacity is about 3 * 512 GB = 1,536 GB.
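The capacity arithmetic above can be written out as a short sketch. The disk size and Node count come from this wikipage; the replica count is an assumption (three is the usual Ceph default for replicated pools, but the Farm's actual pool settings are not stated here), so the usable figure is illustrative only.

```python
DISK_GB = 512   # size of each disk, per this wikipage
NODES = 3       # one "second" disk per Node is given to Ceph


def ceph_raw_capacity_gb(nodes: int = NODES, disk_gb: int = DISK_GB) -> int:
    """Raw capacity: the sum of the disks handed to Ceph."""
    return nodes * disk_gb


def usable_capacity_gb(raw_gb: int, replicas: int) -> float:
    """Usable capacity of a replicated pool (replicas is an assumed setting)."""
    return raw_gb / replicas


raw = ceph_raw_capacity_gb()
usable = usable_capacity_gb(raw, replicas=3)  # assumption: 3 replicas
print(f"raw: {raw} GB, usable with 3 replicas: {usable:.0f} GB")
```

The point of the second function is that raw capacity and usable capacity differ once replication is taken into account.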

UI of the Storage

#User interfaces (UI)

The Gateway

For the purposes of this very wikipage, the Gateway refers to the composition of software, such as a load balancer or reverse proxy, that is built on the #External Bridge. The Gateway is the hub for both the Farm's wide area network (WAN) and local area network (LAN). To power the Gateway, CNM pfSense is deployed.

COTS for the Gateway

As the COTS for the Gateway, the Team utilizes CNM pfSense. For a while, the Team also tried iptables as a firewall together with Fail2ban. Fail2ban operates by monitoring log files (e.g. /var/log/auth.log, /var/log/apache/access.log, etc.) for selected entries and running scripts based on them. Most commonly, it is used to block selected IP addresses that may belong to hosts that are trying to breach the system's security. It can ban any host IP address that makes too many login attempts or performs any other unwanted action within a time frame defined by the administrator. It supports both IPv4 and IPv6.
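The "too many attempts within a time frame" rule described above can be sketched as a sliding-window counter. This is not Fail2ban's code or configuration; the `AttemptTracker` class, its parameters, and the sample addresses (taken from documentation-only ranges) are hypothetical.

```python
from collections import defaultdict, deque


class AttemptTracker:
    """Ban an address after max_attempts failures within window_seconds."""

    def __init__(self, max_attempts: int, window_seconds: float):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # address -> recent failure times
        self.banned = set()

    def record_failure(self, address: str, timestamp: float) -> bool:
        """Register one failed attempt; return True if the address is banned."""
        recent = self._failures[address]
        recent.append(timestamp)
        # Drop failures that fell out of the time window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            self.banned.add(address)
        return address in self.banned


tracker = AttemptTracker(max_attempts=3, window_seconds=60)
burst = [tracker.record_failure("203.0.113.5", t) for t in (0, 10, 20)]
slow = [tracker.record_failure("2001:db8::1", t) for t in (0, 100, 200)]
```

Three failures within a minute trigger a ban, while the same number of failures spread over several minutes does not. Addresses are treated as opaque strings, so IPv4 and IPv6 are handled alike.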

Gateway features

Gateway functions

FreeBSD, HA, VPN, LDAP, backups, CARP VIP

#The Gateway can be compared to an executive secretary, who (a) takes external client's requests, (b) serves as a gatekeeper, while checking validity of those requests, (c) when the request is valid, selects to which internal resource to dispatch it, (d) dispatches those requests to the selected resource, (e) gets internal responses, and (f) returns them back to the client in the outside world.
Thus, #The Gateway:
  1. Constantly monitors the state of the internal resources of the Farm.
  2. Receives requests from the world outside of the Farm.
  3. Serves as a firewall, while checking validity of those requests.
  4. When the request is valid, selects to which Node to dispatch it. #The Gateway is responsible for dispatching external requests to those and only to those internal resources that the Farm's #Monitoring has identified as operational.
  5. Dispatches those requests to the selected Node.
  6. Collects internal responses.
  7. Returns those responses to the outside world.
To be more accessible to its clients, #The Gateway utilizes public IPv4 addresses.

Gateway provisions

#The Gateway is deployed in a virtual machine (VM) of #The Habitat.

UI of the Gateway

#User interfaces (UI)

Gateway components

#The Gateway includes #Firewall and router, #Load balancer, and #Webserver.

Firewall and router

CNM pfSense plays roles of firewall, reverse proxy, and platform to which #Load balancer and #Webserver are attached.

Load balancer

As a load balancer, CNM pfSense uses a select version of HAProxy that is specifically configured as a pfSense add-on. As of summer 2023, no full HAProxy Manager exists in the Farm, and a round-robin model is activated for load balancing.
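The round-robin model mentioned above can be illustrated with a minimal sketch that also skips Nodes reported as non-operational. It is not HAProxy's implementation; the `RoundRobinBalancer` class and the node names are hypothetical.

```python
class RoundRobinBalancer:
    """Hand out nodes in rotation, skipping those that are not operational."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # index of the node to try first on the next pick

    def pick(self, operational):
        """Return the next operational node, or raise if none is left."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in operational:
                return node
        raise RuntimeError("no operational node available")


all_up = {"node1", "node2", "node3"}
balancer = RoundRobinBalancer(["node1", "node2", "node3"])
healthy_picks = [balancer.pick(all_up) for _ in range(4)]

degraded = RoundRobinBalancer(["node1", "node2", "node3"])
degraded_picks = [degraded.pick({"node1", "node3"}) for _ in range(3)]
```

With all three nodes up, requests rotate evenly; when one node is marked as failed, the rotation continues over the remaining two.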

Webserver

As its webserver, pfSense utilizes lighttpd. Prior to the deployment of CNM pfSense, the Team utilized two web servers to communicate with the outside world via HTTP. Nginx handled requests initially and Apache HTTP Server handled those requests that hadn't been handled by Nginx.

The Bridges

Hetzner vSwitches (hereinafter, the Bridges) serve as bridges for #Communication channels to connect the Nodes in networks and switch from one Node to another one. #The Cluster utilizes two Bridges, which are #Internal Bridge and #External Bridge.

The #Service provider provides the Team with the Bridges; they come with the lease of #The Metal. The Team can order up to 5 connectors to be connected to one Node.

The Farm cannot support high availability of the Bridges. Resiliency of the Bridges is the courtesy of their owner, #Service provider.

External Bridge

External Bridge serves as the hub for the public network, the Internet. It is located on an external, public IPv4 address to provide for data transfer between the Farm's publicly-available resources and other Internet resources.

The network of each Node uses a bridge according to the model that is selected by default in Network Configuration.

Internal Bridge

Internal Bridge serves as the hub for node and storage networks. It is located on an internal, private IPv6 address to provide for data transfer between the Nodes and their storage spaces.

The network of each Node uses a bridge according to the model that is selected by default in Network Configuration.

Web architecture

For the purposes of this wikipage, "web architecture" refers to the Farm's outline of DNS records and IP addresses.

Channels and networks

The Farm's communication channels are built on the #Bare-metal servers and #The Bridges. Currently, the Farm uses three communication channels, each of which serves one of the networks as follows:
  1. WAN (wide area network), which is the Farm's public network that uses external, public IPv4 addresses to integrate #The Gateway into the Internet. The public network is described in #The Gateway section of this wikipage.
  2. LAN (local area network), which is the Farm's private network that uses internal, private IPv6 addresses to integrate #The Gateway and the Nodes into one network cluster. This network cluster is described in #The Habitat section of this very wikipage.
  3. SAN (storage area network), which is the Farm's private network that uses internal, private IPv6 addresses to integrate storage spaces of the Nodes into one storage cluster. This storage cluster is described in #The Storage section of this wikipage.
The Farm's usage of IP addresses is best described in the #IP addresses section.

DNS zone

To locate the Farm's public resources in the Internet, the following DNS records are created in the Farm's DNS zone:
Field                  Type          Data                    Comment (not a part of the records)
pm1.bskol.com          AAAA record   2a01:4f8:10a:439b::2    Node 1
pm2.bskol.com          AAAA record   2a01:4f8:10a:1791::2    Node 2
pm3.bskol.com          AAAA record   2a01:4f8:10b:cdb::2     Node 3
pf.bskol.com           A record      88.99.71.85             CNM pfSense
talk.cnmcyber.com      A record      2a01:4f8:fff0:53::2     CNM Talk (CNM Jitsi)
venture.cnmcyber.com   A record      2a01:4f8:fff0:53::3     CNM Venture (CNM Odoo)
social.cnmcyber.com    A record      2a01:4f8:fff0:53::4     CNM Social (CNM HumHub)
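Since an A record carries an IPv4 address and an AAAA record carries an IPv6 address, the listing above can be sanity-checked with a short sketch. The records are copied as listed; the helper names are hypothetical. The three cnmcyber.com rows pair the A type with IPv6-form data, and this wikipage does not say whether the type or the address is the value to correct.

```python
import ipaddress

# (name, record type, data) copied from the DNS zone listing above.
RECORDS = [
    ("pm1.bskol.com", "AAAA", "2a01:4f8:10a:439b::2"),
    ("pm2.bskol.com", "AAAA", "2a01:4f8:10a:1791::2"),
    ("pm3.bskol.com", "AAAA", "2a01:4f8:10b:cdb::2"),
    ("pf.bskol.com", "A", "88.99.71.85"),
    ("talk.cnmcyber.com", "A", "2a01:4f8:fff0:53::2"),
    ("venture.cnmcyber.com", "A", "2a01:4f8:fff0:53::3"),
    ("social.cnmcyber.com", "A", "2a01:4f8:fff0:53::4"),
]


def expected_type(address: str) -> str:
    """Return the record type that matches the address family."""
    return "A" if ipaddress.ip_address(address).version == 4 else "AAAA"


def mismatched(records):
    """Names whose listed type does not match the family of their data."""
    return [name for name, rtype, data in records if rtype != expected_type(data)]


for name in mismatched(RECORDS):
    print(f"{name}: record type and address family disagree")
```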

IP addresses

To locate its resources in the #Communication channels, the Farm uses three types of IP addresses:
  1. To access #The Habitat of various Nodes from the outside world, the Farm features public IPv6 addresses. One address is assigned to each Node. Since there are three Nodes, three addresses of that type are created.
  2. For an internal network of three Nodes, which is assembled on the #Internal Bridge, private IP addresses are used. This network is not accessible from the Internet and is not included in the Farm's DNS zone. For instance, #The Storage utilizes this network to synchronize its data. For this network, a "/24" address range is selected.
  3. For an external network of three Nodes, which is assembled on the #External Bridge, the Farm features public IPv4 addresses. They are handled by #Web intermediaries.
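To show what a "/24" range provides for the internal network, here is a small sketch. The range 10.0.0.0/24 is purely hypothetical, chosen from private IPv4 space for illustration; the Farm's actual internal range is not published on this wikipage.

```python
import ipaddress

# Hypothetical private range; the Farm's real internal range is not stated here.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/24")


def node_addresses(network, count):
    """Return the first `count` usable host addresses of the network."""
    hosts = network.hosts()
    return [str(next(hosts)) for _ in range(count)]


usable_hosts = INTERNAL_NET.num_addresses - 2  # minus network and broadcast addresses
nodes = node_addresses(INTERNAL_NET, 3)
print(f"{INTERNAL_NET}: {usable_hosts} usable hosts, Nodes at {nodes}")
```

A /24 leaves far more room than three Nodes need, so additional Nodes could join the same internal network without renumbering.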

SSL certificates

The Infrastructure

For the purposes of this very wikipage, the Infrastructure refers to those software and hardware that the Team rents from the #Service provider. The rented hardware consists of #Bare-metal servers and #Backup box. The rented software refers to #The Bridges.

Service provider

Hetzner has been serving as the Team's Internet service provider (ISP) and lessor of #The Infrastructure since 2016. Offers from other potential providers, specifically Contabo and DigitalOcean, have been periodically reviewed, but no one else has offered a better quality-to-price ratio on a long-term basis.

Choosing the metal

Due to the lower cost, #Bare-metal servers were purchased via #Service provider's auction -- https://www.hetzner.com/sb?hdd_from=500&hdd_to=1000 -- based on the following assumptions:
  • Number: ProxmoxVE normally requires three nodes. The third node is needed to provide quorum; however, it shall not necessarily run applications. At the same time, Ceph's non-emergency operations require three nodes at least.
  • Hard drives:
    1. The hard drive storage capacity for any Node shall be 512 GB at least.
    2. Because Ceph is selected to power #The Storage, any hard-drive of the Farm shall be both SSD and NVMe.
  • Processors:
    1. The memory (RAM) of two Nodes of the Farm shall be 32 GB at least. Memory requirements for the third Node may be lower because of ProxmoxVE's characteristics.
    2. Those servers that deploy Intel Xeon E3-1275v5 processors are preferable over those servers that deploy Intel Core i7-7700 ones.
  • Location: At least two Nodes shall be located in the same data center. Although the #Service provider does not charge for internal traffic, this circumstance increases the speed of the whole Farm. If no nodes are available in the same data center, they shall be looked for in the same geographic location.
The hardware characteristics of the chosen Nodes are presented in the #Bare-metal servers section below.
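The disk and processor assumptions above can be expressed as a simple filter over auction offers. This is only a sketch: the `Offer` fields and the sample offers are hypothetical and do not reflect the #Service provider's actual auction data or API.

```python
from dataclasses import dataclass

PREFERRED_CPU = "Intel Xeon E3-1275v5"
MIN_DISK_GB = 512


@dataclass(frozen=True)
class Offer:
    """A hypothetical server-auction offer."""
    cpu: str
    disk_gb: int
    disk_type: str
    datacenter: str


def is_eligible(offer: Offer) -> bool:
    """Disk criteria: at least 512 GB and SSD NVMe (required for Ceph)."""
    return offer.disk_gb >= MIN_DISK_GB and offer.disk_type == "SSD NVMe"


def shortlist(offers):
    """Eligible offers, with the preferred processor listed first."""
    eligible = [o for o in offers if is_eligible(o)]
    return sorted(eligible, key=lambda o: o.cpu != PREFERRED_CPU)


offers = [
    Offer("Intel Core i7-7700", 512, "SSD NVMe", "FSN1-DC1"),
    Offer("Intel Xeon E3-1275v5", 1000, "HDD", "FSN1-DC1"),
    Offer("Intel Xeon E3-1275v5", 512, "SSD NVMe", "FSN1-DC1"),
]
ranked = shortlist(offers)
```

The HDD offer is dropped despite its larger capacity, which matches the experience described in the #COTS for the Storage section, where a node without SSD and NVMe disks broke the storage.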

The Metal

#The Habitat is mostly deployed on three bare-metal servers. As the result of #Choosing the metal, #Node 1 hardware, #Node 2 hardware, #Node 3 hardware, and #Storagebox hardware have been rented for that purpose.

Node 1 hardware

1 x Dedicated Root Server "Server Auction"
  • Intel Xeon E3-1275v5
  • 2x SSD M.2 NVMe 512 GB
  • 4x RAM 16384 MB DDR4 ECC
  • NIC 1 Gbit Intel I219-LM
  • Location: FSN1-DC1
  • Rescue system (English)
  • 1 x Primary IPv4

Node 2 hardware

1 x Dedicated Root Server "Server Auction"
  • Intel Xeon E3-1275v5
  • 2x SSD M.2 NVMe 512 GB
  • 4x RAM 16384 MB DDR4 ECC
  • NIC 1 Gbit Intel I219-LM
  • Location: FSN1-DC1
  • Rescue system (English)
  • 1 x Primary IPv4

Node 3 hardware

1 x Dedicated Root Server "Server Auction"
  • Intel Xeon E3-1275v5
  • 2x SSD M.2 NVMe 512 GB
  • 4x RAM 16384 MB DDR4 ECC
  • NIC 1 Gbit Intel I219-LM
  • Location: FSN1-DC1
  • Rescue system (English)

Storagebox hardware

Backup box

A backup box is deployed on a 1 TB, unlimited traffic storage box BX-11 that has been rented for that purpose.

COTS for backups

The Team utilizes no additional COTS beyond CNM ProxmoxVE for backups. Initially, Proxmox Backup Server was used. However, it consumed the active storage. As a result, the storage box was attached directly to #The Habitat, and backups go directly between that storage and #The Habitat.

Box features

10 concurrent connections, 100 sub-accounts, 10 snapshots, 10 automated snapshots, FTP, FTPS, SFTP, SCP, Samba/CIFS, BorgBackup, Restic, Rclone, rsync via SSH, HTTPS, WebDAV, Usable as network drive

Box functions

#Service provider's description: Storage Boxes provide you with safe and convenient online storage for your data. Score a Storage Box from one of Hetzner Online's German or Finnish data centers! With Hetzner Online Storage Boxes, you can access your data on the go wherever you have internet access. Storage Boxes can be used like an additional storage drive that you can conveniently access from your home PC, your smartphone, or your tablet. Hetzner Online Storage Boxes are available with various standard protocols which all support a wide array of apps. We have an assortment of diverse packages, so you can choose the storage capacity that best fits your individual needs. And upgrading or downgrading your choice at any time is hassle-free!

Box provisions

UI for backups

#User interfaces (UI)

See also

Related lectures

Special terms

On this very wikipage, the following abbreviations and terms are commonly used:

Useful recommendations