Difference between revisions of "CNM Farms"

Latest revision as of 23:27, 14 April 2024

CNM Farms (formerly known as CNM Servers, sometimes referred to as CNM Platform; hereinafter, the Farms) is the combination of computing servers and container engines that host Opplet, as well as those parts of CNM Lab Farm that are provisioned to run serverless systems. The Farms provide both Worldopp Middleware and CNM apps with functionality.

Application servers

Currently, each of the four existing application servers that support Opplet is built on one DigitalOcean droplet. Each of these servers features 2 GB of memory, a 50 GB disk, and Ubuntu 18.04.2 x64.

All of those servers are used for compute. However, a need for one or more testing servers, tentatively called fellow servers, has been identified. Some control servers may also be considered.

Campus Farm

Main wikipage: CNM Campus Farm

The CNM Campus Farm supports...

Fed Farm

Main wikipage: CNM Fed Farm

The CNM Fed Farm supports one instance of WorldOpp Middleware (Opplet; shall be located at https://cabin.friendsofcnm.org).

Bureau Farm

Main wikipage: CNM Bureau Farm

The CNM Bureau Farm is a production server that shall support:

One instance of Educaship Moodle (Moodle; currently, located at https://cert.friendsofcnm.org).
One instance of CNM Mailware (currently, Roundcube; currently, not located at any URL).
One instance of Educaship GitLab (Redmine and SVN linked to Bitbucket's file storage; currently, located at https://lab.friendsofcnm.org).
One instance of CNM Linkupware (SuiteCRM; currently, located at https://linkup.friendsofcnm.org).
Two instances of Educaship MediaWiki set up for two languages (MediaWiki; currently, located at https://wiki.friendsofcnm.org).
Several instances of Educaship WordPress (WordPress; currently, located at https://worldopp.org).
One instance of Educaship AVideo (YouPHPTube; currently, located at https://tube.friendsofcnm.org).
One instance of Educaship HumHub (HumHub; shall be located at https://social.friendsofcnm.org).

Lab Farm

Main wikipage: CNM Lab Farm

The CNM Lab Farm is used for learning and testing. It shall support all the applications installed on the EndUser Farm and, in addition, one HumHub instance.

Support servers

Database server

No database server is currently deployed.

File server

No file server is currently deployed. CNM Lab utilizes Bitbucket to satisfy its file storage needs.

Mail server

Several CNM apps currently use Sendmail as their mail server.

Web server

All web servers of Opplet are currently built on Apache HTTP Server.

Development

The development of the Platform can be divided into two parts -- the historical endeavors and further projects.

Historical endeavors

Main wikipage: CNM Cloud Project

Historically, the WorldOpp Fellow Staff has undertaken the endeavors to develop the Platform under the CNM Cloud Project and, as of July of 2019, Romanof has completed the overwhelming majority of that work.

Further projects

Further projects target the prospective cloud service model of Opplet, which shall offer the services of its ecosystem of servers and its cloud OS, OpenStack. This infrastructure enables the operation of CNM apps.

Architecture

The Infrastructure supports cloud functions as follows:

Development and production. If there is more than one, these cloud servers shall be located in the same data center.
Backup
Testing
Demonstration.

The architectures of the separate servers have not been determined yet. If we cannot think of anything better, we can take https://ru.hetzner.com/hosting/produkte_vserver/private-cloud as a basis.

Requirements -- first draft

The Infrastructure:

Utilizes OpenStack as the cloud operating system;
Utilizes OpenStack Keystone based on LDAP as identity provider;
Shall enable both Opplet and CNM apps;
Most likely, will use hetzner.de dedicated servers at the beginning.

Backup and recovery

The backup work model and the ability to restore an operational state, including backups for all of its own data and development, are core requirements. Topics to be addressed are:

Restarts:
1. Which parts shall be restarted, and why?
2. How shall restarts be initiated -- by request and automatically?
3. How often?
4. What are the dependencies of a restart?
5. What are the prerequisites of a restart?
6. What are the potential challenges or limitations faced when initiating a restart?
7. What is the plan if a restart fails?
8. What is the duration of a restart?
9. What is the outage?
Outside docker-registries with the appropriate repositories:
1. Where is the docker-registry located?
2. Why was the docker registry created?
3. Explain what is included in the repositories.
4. Why are the repositories necessary?
Push launches:
1. A push procedure shall be planned with a corresponding increase in the number of subversions.
2. What monthly frequency is desirable?
3. Why is there a corresponding increase in the number of subversions per push?
4. What are the expected challenges faced (if any)?
5. What is done in the initial process to alleviate these challenges?
6. What are the unexpected challenges faced?
7. How are the unexpected challenges (if any) resolved?
8. What would you do differently if you know what the challenges are going to be?
Tested recovery:
1. Why is recovery testing necessary?
2. When is recovery tested?
3. How do you test recovery?
4. List the steps that shall be taken when testing recovery.
5. List the steps for initiating the backup.
6. List expected or unexpected results.
7. How will lessons learned be managed?
Backup:
1. What to back up, and how?
2. Where to store the backup data?
3. What backup frequency is desired? Why is the selected backup frequency optimal?
4. What are the steps for initiating a non-planned backup?

Restarts

It shall be possible to restart every part of the cluster on request. The detailed instructions for these restarts shall be a part of the Infrastructure.
Automatic restarts shall be executed if a service gets stuck.

-> What are the dependencies of a restart?

If we are using the provider network, then we can restart the controller and compute nodes individually or both at once. However, the virtual machines running on the host will get restarted. In the case of self-service networks, the communication between the virtual machines can be interrupted.

-> What are the prerequisites of a restart?

If frequent restarts are needed, OpenStack can be set up in high-availability mode so that the VMs won't go down. This can be done using a storage server such as Ceph.

-> What are the potential challenges or limitations faced when initiating a restart?

An improper restart can damage services such as the database or the network.

-> What is the plan if a restart fails?

If we have set up HA, then the loss will be negligible. Otherwise, we have to evacuate the virtual machines from the server.

-> What is the duration of a restart?

Restart of server can take 5 minutes min. Restart of server can take 1 minute.

-> What is the outage?

An outage can be a power, network, or resource outage. That can make the resources

=================

=> Outside docker-registries with the appropriate repositories:

=================

-> Where is the docker-registry located?

We can use a public or a private registry.

-> Why was the docker registry created?

It is a repository that contains Docker images, so that we can pull them on request.

-> Explain what is included in the repositories.

Docker repositories contain only Docker images. It is a web-based panel where we can request a Docker image.

-> Why are the repositories necessary?

Public repositories serve to collect all the images so that they can be shared with everyone and anyone can use them according to their requirements. Private repositories are for private use; they are only available within the organization.

=================

=> Push launches:

=================

A push procedure shall be planned with a corresponding increase in the number of subversions.

-> What monthly frequency is desirable?

-> Why is there a corresponding increase in the number of subversions per push?

-> What are the expected challenges faced (if any)?

-> What is done in the initial process to alleviate these challenges?

-> What are the unexpected challenges faced?

-> How are the unexpected challenges (if any) resolved?

-> What would you do differently if you know what the challenges are going to be?

I am sorry, I don't have much experience with Git or SVN. But usually the changes need to be pushed after the code is final, even if it is just a subpart.

====================

=> Tested recovery:

=======================

-> Why is recovery testing necessary?

Recovery testing is necessary to ensure that we are taking a proper, complete, and working backup of the data or instance.

-> When is recovery tested?

It can only be tested after the setup is complete and the data/application is running on the backup.

-> How do you test recovery?

In the case of OpenStack, we can use the snapshot. In the case of a website, we can test it with the web data and database of the website.

-> List the steps to initiating the backup.

There is a Cinder backup service which will keep a copy of the Cinder drive. We can use image snapshots to create backups and integrate this with a shell script, which I have done recently.

-> List expected or unexpected results.

-> How will lessons learned be managed?

=============================

=> Backup

=======================

-> What and how to back up?

There is a Cinder backup service which will keep a copy of the Cinder drive. We can use image snapshots to create backups and integrate this with a shell script, which I have done recently.

-> Where to store the backup data?

We can store images on a separate drive by mounting it on /var/lib/glance/images. That way we can have separate storage.

-> What is a frequency desired? Why is the backup frequency selected as optimal?

Creating a whole backup of the Cinder drive can increase storage cost, whereas using image snapshots for backup might reduce the storage needed, but the drives attached separately will not be backed up. The frequency of backups generally depends upon the environment. In most cases, the backup policies are set to 7 days, so that we can go back and revert the changes made up to 7 days ago.
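
The 7-day policy mentioned above can be illustrated with a small sketch. It only decides which backups fall inside the retention window; the function name `split_by_retention` and the sample dates are hypothetical and not part of any OpenStack tooling.

```python
from datetime import date, timedelta


def split_by_retention(backup_dates, today, retention_days=7):
    """Split backup dates into (keep, prune) for a fixed retention window.

    A backup is kept if it was taken no more than `retention_days` ago.
    """
    cutoff = today - timedelta(days=retention_days)
    keep = sorted(d for d in backup_dates if d >= cutoff)
    prune = sorted(d for d in backup_dates if d < cutoff)
    return keep, prune


# Hypothetical example: backups taken 0, 3, 7, 8 and 30 days ago.
today = date(2024, 4, 14)
backups = [today - timedelta(days=n) for n in (0, 3, 7, 8, 30)]
keep, prune = split_by_retention(backups, today)
print(len(keep), "kept;", len(prune), "to prune")  # 3 kept; 2 to prune
```

The actual deletion of old snapshots or Cinder backups would be a separate step and is deliberately left out here.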

-> What are the steps to initiating a non-planned backup?

We can do a manual backup of the instances using snapshots, or we can manually copy the files or take manual database dumps.

If you want to run Docker together with orchestration tools or applications like Kubernetes, I would highly recommend against it, as it can make the OpenStack network slow. Keeping them separate would be a good choice.

I will give you detailed server specifications. I will prepare them; I just wanted to know whether you want separate storage servers for Ceph. That would give you the chance to increase your storage without any issue and would provide high availability, but it would require a faster switch.

We can use the OpenStack Magnum service to deploy containers.

Server specs

If you are using Ceph storage, then the specs are these:

======

For controller

======

8 GB RAM, 4 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB

======

For compute

======

Min 32 GB RAM and 12 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB (for DEV)

Min 32 GB RAM and 12 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB, and 1 additional SSD or NVMe for Ceph cache (for PROD)

You can increase or decrease the number of servers by replicating the same specs.

======

For storage server

======

Min 16 GB RAM and 12 CPUs; RAID 1 on 2 hard drives or SSDs of minimum capacity (only for the OS); 1 NVMe or SSD for Ceph journals; and 5 x 2 TB hard drives for data storage. For Ceph we need at least 2 servers, and for HA we need 3 (recommended).

For a storage server, the RAM requirement is 1 GB of RAM per 1 TB of Ceph storage, and 1 CPU per Ceph OSD.
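
As a rough check, the rule of thumb above can be applied to the proposed storage server (5 x 2 TB data drives). This is only a sketch: it assumes one OSD per data drive, the function name is hypothetical, and it ignores the OS and any other overhead.

```python
def ceph_storage_node_minimums(osd_sizes_tb):
    """Apply the rule of thumb: 1 GB RAM per 1 TB of storage, 1 CPU per OSD.

    `osd_sizes_tb` lists the capacity in TB of each data drive,
    assuming one Ceph OSD per drive.
    """
    return {
        "ram_gb": sum(osd_sizes_tb),  # 1 GB of RAM per TB of Ceph storage
        "cpus": len(osd_sizes_tb),    # 1 CPU per OSD
    }


# The storage server proposed above: five 2 TB hard drives for data.
needs = ceph_storage_node_minimums([2, 2, 2, 2, 2])
print(needs)  # {'ram_gb': 10, 'cpus': 5}
```

Under these assumptions, the proposed minimum of 16 GB RAM and 12 CPUs covers the rule of thumb with some headroom.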

====================================================================================================================================

As for the network, we need at least 10 Gbps LAN cards: 2 cards for each storage server, and 2 10 Gbps LAN cards plus a 1 Gbps card for compute and controller.

Executive roles

System Administrator

Duties

Configuring servers, fault-tolerant solutions, infrastructure elements;
Installing servers and services, and upgrading existing ones;
Monitoring system performance
Creating file systems
Create a backup and restore policy
Organization of remote access;
Monitoring of network communication
Update the system as soon as a new version of the OS and application software is released
Implementing policies to use a computer system and network
Information Security;
Administration of users (setting up and maintaining an account). Monitoring networks to ensure security and availability for specific users.
User support;
Troubleshoot problems reported by users.
Configuring user security policies.
Managing passwords and identity
Documentation in the form of an internal wiki
Routing protocols and configuration of the routing table.
Configurations for authentication and authorization of directory services.
Writing server software;
The system administrator sets tasks for writing the necessary modules to programmers, and introduces rules for working with software for the whole company.

Administrator of particular application

Develop the rule policy inside the application
Maintain the application and its version control; notify the System Administrator about stable updates and application issues

Admin Tasks

Ongoing tasks for which a monthly fee will apply:
- Monitor the servers for any problem and quickly respond to fix it.
- Make sure that the periodic backups are done and in complete health.
- Maintain the health of all the services that run on the servers (e.g., Apache, Postfix, MySQL).
- Keep the servers fully updated with the latest patches and updates to prevent security problems and maintain performance.
- Keep track of performance and improve it with the latest tweaks.
- If needed, suggest hardware/resource upgrades.
- Maintain the firewall rules.

Additional tasks that will be billed by the hour:
- Development team requests
- Installation of new software / scripts
- Installation or configuration of new servers
- Weekly meeting

Tasks ideas

- do daily routine checkups on logs
- update the server and upgrade it manually
- make a manual backup if necessary
- check the firewall settings
- restrict access to the required ports only
- restrict ssh to allowed users only
- install an anti-malware script if necessary
- delete logs after reviewing them, mostly if they eat a lot of hard disk space

If it's a newly created server,
- create a sudo user, even though I have access to the root password;
- install the dependencies and libraries required for the server's intended use, for example as a mail server or a web server;
- before installing any extension, dependency, or even the main script or program, update the server, as always.
If it's an existing server,
- check the backup of the server.
- after root access has been given to me, create a sudo user and log in using that username.
- check which processes are running on the server.
- check the installed packages on the server.
- update the server and check which packages need to be upgraded.
- check and monitor the ssh and http logs.
- check the disk allocation.
- delete old auth and http logs.
- check if ufw is installed and active.
- check if the ports are filtered.
- check in the ssh config whether the allowed users are defined.
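
The last check in the list above can be sketched as a small script. This is a minimal sketch, assuming the restriction is expressed with the `AllowUsers` directive of `sshd_config`; the function name and the sample configuration are hypothetical, and `Match` blocks and `Include` files are not handled.

```python
def allowed_ssh_users(sshd_config_text):
    """Return the user patterns listed in AllowUsers directives.

    An empty result means no AllowUsers restriction was found
    in the given configuration text.
    """
    users = []
    for raw in sshd_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0].lower() == "allowusers":  # keywords are case-insensitive
            users.extend(parts[1:])
    return users


# Hypothetical sample configuration.
sample = """
# Authentication
PermitRootLogin no
AllowUsers deploy admin
"""
print(allowed_ssh_users(sample))  # ['deploy', 'admin']
```

On a real server, the text would be read from the ssh daemon's configuration file, which typically requires elevated privileges.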

Optimize the servers to run faster and more smoothly through nginx, Apache, PHP, and MariaDB tuning. Use a multiserver plugin for the video site to speed up the website's performance.

Requirements -- second draft

These requirements are a set of detailed requirements for the Infrastructure. This document has been drafted to help cloud developers come up with a detailed implementation of an enterprise private cloud, with the objective of hosting the BigBlueButton, Moodle, MediaWiki, Odoo, and Redmine end-user software applications. Each of these applications shall serve more than 100 users with a peak load capacity of 20 simultaneous users. The requirements have been drafted keeping in mind that users of the above applications would require uninterrupted, very high availability.

The target audience for this document is cloud architects, DevOps engineers and system administrators.

Block storage

Block Storage Objectives

A capacity requirement of 2 TB
A peak data transfer speed of 110 Mbps
A peak data transfer rate of 15-20 Mbps with about 20 parallel transfers
3x Data redundancy
Super scalability
Uniform load distribution
Shall be consumed by VMs and Containers
Shall be consumed by applications
Shall be consumed by OpenStack
An acceptable latency

Storage Requirements

A distributed striped GlusterFS volume shall be implemented for data redundancy, load balancing, and scalability
GlusterFS shall be implemented for serving block storage
The system should be implemented with at least three stripes.
A minimum of three bare-metal servers shall be used to implement GlusterFS.
Each node shall have at least 3 SSD drives: two with a capacity of 1 TB each and one of 125 GB.
The hardware configuration for each node shall be a minimum of 4 GB RAM, 2 or more CPU cores, and 2 network adapters of 1 GbE or faster.
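
The capacity objective and the node layout above can be cross-checked with simple arithmetic. This is a sketch under stated assumptions: the 3x redundancy is taken to mean three full copies of the data, only the 1 TB drives hold data (the 125 GB drive is assumed to be for the OS), and the function names are hypothetical.

```python
def raw_capacity_needed_tb(usable_tb, copies):
    """Raw disk space needed to hold `copies` full copies of the data."""
    return usable_tb * copies


def raw_capacity_provided_tb(nodes, data_drives_per_node, drive_tb):
    """Raw disk space offered by the data drives of all nodes."""
    return nodes * data_drives_per_node * drive_tb


needed = raw_capacity_needed_tb(usable_tb=2, copies=3)
provided = raw_capacity_provided_tb(nodes=3, data_drives_per_node=2, drive_tb=1)
print(needed, provided)  # 6 6
```

Under these assumptions, the minimum layout of three nodes exactly matches the 2 TB objective and leaves no spare capacity for growth.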

Object storage

Object Storage Objectives

A capacity requirement of 3 TB
A peak data transfer speed of 110 Mbps
A peak data transfer rate of 15-20 Mbps with about 20 parallel transfers
2x data redundancy
Super scalability
Shall host Storage images, Data backups and System snapshots
Shall host Docker Registry

Object Storage Requirements

A distributed OpenStack Swift object volume shall be implemented for data redundancy, load balancing, and scalability
Swift shall be implemented for serving object storage
A minimum of three bare-metal servers shall be used to implement Swift.
Each node shall have at least 3 SATA/SCSI drives: two with a capacity of 1 TB each and one of 125 GB.
The hardware configuration for each node shall be a minimum of 4 GB RAM, 2 or more CPU cores, and 2 network adapters of 1 GbE or faster.
The management network shall be separate from the data network used for replication
The OpenStack controller cluster shall be used to authenticate and authorise the usage of object storage.
It shall be possible to keep the data and management networks separate

BigBlueButton

BigBlueButton Specific Objectives

This cluster should be built using bare-metal servers.
The streaming data that comes in and goes out should directly terminate on this application
A minimum internet data streaming capacity of 80 Mbps is required.

BigBlueButton Requirements

A two node cluster shall be implemented to host BigBlueButton
The hardware configuration for each node shall be a minimum of 8 GB RAM, 4 or more CPU cores, and 2 network adapters of 1 GbE or faster.
BigBlueButton should be implemented specifically on the Ubuntu 16.04 64-bit OS
A minimum of 500 GB of GlusterFS storage shall be made available to the cluster
80Mbps symmetrical data transfer rate shall be made available to the cluster.

Container/Magnum Cluster Requirements

A two-node cluster shall be implemented to host Docker containers
The hardware configuration for each node shall be a minimum of 32 GB RAM, 4 or more CPU cores, and 2 network adapters of 1 GbE or faster.
The best possible OS shall be chosen to implement this cluster, to accommodate the maximum number of applications.
A minimum of ten applications shall run simultaneously.
A minimum of 300G of GlusterFS storage shall be reserved for use by Containers
A minimum of 300G of Swift object storage shall be reserved for use by docker registry
All applications other than BigBlueButton shall be hosted using these Containers

OpenStack Objectives

Should be able to act as single point infrastructure management console.
A single authentication, authorisation agent for all the applications in the cloud.
Should be able to make all the clusters and nodes work in concert.

Controller Cluster

A two node cluster shall be implemented to host OpenStack controller.
The hardware configuration for each node shall be a minimum of 8 GB RAM, 2 or more CPU cores, and 2 network adapters of 1 GbE or faster.
A local storage of 256 GB SSD shall be used for OS and logs
This cluster shall host OpenStack dashboard, neutron and authentication agent and any other dependencies.

Compute Cluster

A two node cluster shall be implemented to host OpenStack Nova.
The hardware configuration for each node shall be a minimum of 32 GB RAM, 4 or more CPU cores, and 2 network adapters of 1 GbE or faster.
A local storage of 256 GB SSD shall be used for OS and logs
This cluster shall host OpenStack Nova and Ironic (Bitfrost ??) services along with any other necessary dependencies.

Cinder Cluster

A two node cluster shall be implemented to host OpenStack Cinder.
The hardware configuration for each node shall be a minimum of 4 GB RAM, 2 or more CPU cores, and 2 network adapters of 1 GbE or faster.
A local storage of 256 GB SSD shall be used for OS and logs
This cluster shall host OpenStack Cinder and any other necessary dependencies.

Senlin Cluster

TBD

Security Requirements

TBD

Internet and Floating IPs

A pack of at least 10-12 real IPv4 addresses would be required
Each user-facing application shall use 1-2 IP addresses as per application and design requirements.
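
The IPv4 estimate can be sanity-checked against the five end-user applications named in this draft. This is only a sketch: the function name is hypothetical, and it assumes that no other component (for example, the OpenStack dashboard) needs a public address.

```python
APPS = ["BigBlueButton", "Moodle", "MediaWiki", "Odoo", "Redmine"]


def floating_ip_range(app_count, per_app_min=1, per_app_max=2):
    """Return the (minimum, maximum) number of public IPv4 addresses needed."""
    return app_count * per_app_min, app_count * per_app_max


low, high = floating_ip_range(len(APPS))
print(low, high)  # 5 10
```

Under these assumptions, even the upper bound fits within a pack of 10 addresses, and a pack of 12 would leave a small reserve.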

Networking Requirements

GlusterFS and Swift cluster shall have two separate networks
VLAN tagging if necessary
Number of networking
TBD

Monitoring and troubleshooting Requirements

What monitoring tools
Where to install
TBD

Development

The RFB has been posted and the following responses are collected so far:

I will try to develop a proof of concept (PoC) based on the available and known requirements. This will help to understand the other requirements and then to develop a full scale private cloud.
We would document a clear set of functional and non functional requirements clearly capturing the essence of what is needed. After that, we will document the architecture and design, implement and test followed lastly by acceptance and handover to you, the customer. However, in order to develop an accurate schedule and budget, we would need to run a due diligence exercise. This due diligence would include detailed analysis and documentation of your functional and non-functional requirements for the private cloud project. We have material and a methodology we use. The due diligence would take approximately 3 days and we would then hand over to you the following deliverables:
- A detailed requirements specification
- A project plan
- A work package breakdown with effort and roles required
- A risk register and mitigation plan.
What are the performance and storage expectations of the applications that you want to run?
On a very high level, these are the steps I'd follow based on the current information.
- meeting with you to understand your specific needs in detail
- assuming this is a private cloud deployment, we need to understand the size of the applications you'll need deployed in that cloud
- with this you can determine the amount of physical servers needed depending on the number of instances, and virtual resources needed (calculation based on physical hardware resources)
- analyze scalability and performance needs
- decompose in stories/requirements that can be estimated and implemented by the team of your choice.
Putting requirements together for your private cloud would require following inputs:
- Are you a startup, or do you want to migrate to the private cloud?
- What is the objective that you would like to achieve?
- What is the size of the private cloud that you envisage?

@@ Line 1: / Line 1: @@
-[[CNM Servers]] (hereinafter, the ''Servers'') is the combination of [[computer server]]s that serve [[CNM Cloud]].
+[[CNM Farms]] (formerly known as [[CNM Servers]], sometimes referred as [[CNM Platform]]; hereinafter, the ''Farms'') is The combination of [[computing server]]s and [[container engine]]s that host [[Opplet]], as well as those parts of [[CNM Lab Farm]] that are provisioned to run [[serverless]] systems. The ''Farms'' provide both [[Worldopp Middleware]] and [[CNM app]]s with functionality.
 ==Application servers==
-Currently, every of three existing [[application server]]s that support [[CNM Cloud]] is build on one droplet of [[DigitalOcean]]. Every of these ''Servers'' features 2 GB Memory, 50 GB Disk, and Ubuntu 18.04.2 x64.
+Currently, every of four existing [[application server]]s that support [[Opplet]] is build on one droplet of [[DigitalOcean]]. Every of these ''Platform'' features 2 GB Memory, 50 GB Disk, and Ubuntu 18.04.2 x64.
-All of those ''Servers'' are for compute. However, some need for one or more testing servers, tentatively called a ''fellow server'', is identified. Some control servers may also be considered.
+All of those ''Platform'' are for compute. However, some need for one or more testing servers, tentatively called a ''fellow server'', is identified. Some control servers may also be considered.
-===Federated Server===
+===Campus Farm===
-:''Main wikipage: [[WorldOpp Federated Server]]''
+:''Main wikipage: [[CNM Campus Farm]]''
-:The [[WorldOpp Federated Server|Federated Server]] supports one instance of [[CNM Middleware]] ([[Opplet]]; shall be located at https://cabin.friendsofcnm.org).
+:The [[CNM Campus Farm]] supports...
-===Fellow Server===
+===Fed Farm===
-:''Main wikipage: [[CNM Fellow Server]]''
+:''Main wikipage: [[CNM Fed Farm]]''
+:The [[CNM Fed Farm]] supports one instance of [[WorldOpp Middleware]] ([[Opplet]]; shall be located at https://cabin.friendsofcnm.org).
-:The Fellow Server is a production server that shall support:
+===Bureau Farm===
-:#One instance of [[CNM Certware]] ([[Moodle]]; currently, located at https://cert.friendsofcnm.org).
+:''Main wikipage: [[CNM Bureau Farm]]''
+:The [[CNM Bureau Farm]] is a production server that shall support:
+:#One instance of [[Educaship Moodle]] ([[Moodle]]; currently, located at https://cert.friendsofcnm.org).
 :#One instance of [[CNM Mailware]] (currently, [[Roundcube]]; currently, not located at any [[URL]]).
-:#One instance of [[CNM Labware]] ([[Redmine]] and [[Apache Subversion|SVN]] linked to [[Bitbucket]]'s file storage; currently, located at https://lab.friendsofcnm.org).
+:#One instance of [[Educaship GitLab]] ([[Redmine]] and [[Apache Subversion|SVN]] linked to [[Bitbucket]]'s file storage; currently, located at https://lab.friendsofcnm.org).
 :#One instance of [[CNM Linkupware]] ([[SuiteCRM]]; currently, located at https://linkup.friendsofcnm.org).
-:#Two instances of [[CNM Wikiware]] setup for two languages ([[MediaWiki]]; currently, located at https://wiki.friendsofcnm.org).
+:#Two instances of [[Educaship MediaWiki]] setup for two languages ([[MediaWiki]]; currently, located at https://wiki.friendsofcnm.org).
-:#Several instances of [[CNM Pageware]] ([[WordPress]]; currently, located at https://worldopp.org).
+:#Several instances of [[Educaship WordPress]] ([[WordPress]]; currently, located at https://worldopp.org).
-:#One instance of [[CNM Videoware]] ([[YouPHPTube]]; currently, located at https://video.friendsofcnm.org).
+:#One instance of [[Educaship AVideo]] ([[YouPHPTube]]; currently, located at https://tube.friendsofcnm.org).
-:#One instance of [[CNM Socialware]] ([[HumHub]]; shall be located at https://social.friendsofcnm.org).
+:#One instance of [[Educaship HumHub]] ([[HumHub]]; shall be located at https://social.friendsofcnm.org).
-===Next Server===
+===Lab Farm===
-:''Main wikipage: [[CNM Next Server]]''
+:''Main wikipage: [[CNM Lab Farm]]''
-:The [[CNM Next Server]] is used for learning and testing. It shall support all the applications installed at the [[#Fellow Server|Fellow Server]] and, in addition, one [[Humhub]] instance.
+:The [[CNM Lab Farm]] is used for learning and testing. It shall support all the applications installed at the [[#EndUser Farm|EndUser Farm]] and, in addition, one [[Humhub]] instance.
 ==Support servers==
@@ Line 40: / Line 44: @@
 ===Web server===
-:All [[web server]]s of [[CNM Cloud]] are currently built on [[Apache HTTP Server]]s.
+:All [[web server]]s of [[Opplet]] are currently built on [[Apache HTTP Server]]s.
 ==Development==
-The development of the ''Servers'' can be divided in two parts -- the historical endeavors and further projects.
+The development of the ''Platform'' can be divided in two parts -- the historical endeavors and further projects.
 ===Historical endeavors===
 :''Main wikipage: [[CNM Cloud Project]]''
-:Historically, the [[WorldOpp Fellow Staff]] has undertaken the endeavors to develop the ''Servers'' under the [[CNM Cloud Project]] and, as of July of 2019, Romanof has completed the overwhelming majority of that work.
+:Historically, the [[WorldOpp Fellow Staff]] has undertaken the endeavors to develop the ''Platform'' under the [[CNM Cloud Project]] and, as of July of 2019, Romanof has completed the overwhelming majority of that work.
 ===Further projects===
-:''Main wikipage: [[CNM Servers (development)]]''
-:Further projects are drafted at the [[CNM Servers (development)]] wikipage. [[CNM Servers (development)]] is the promising cloud service model of [[CNM Cloud]] that shall offer services of its ecosystem of servers and cloud OS, which is OpenStack. This infrastructure enables operations of [[CNM app]]s.
+:The promising cloud service model of [[Opplet]] that shall offer services of its ecosystem of servers and cloud OS, which is OpenStack. This infrastructure enables operations of [[CNM app]]s.
+==See also==
+===Related lectures===
+:*[[What CNM Farms Are]].
+a draft for the promising [[CNM Farms]] that both:
+*Enable [[Opplet]] of the [[Opplet]]; consequently, [[Opplet]] enables the [[end-user application]]s of [[CNMCyber]]; and
+*Utilize a bundle of servers and [[OpenStack]] as the [[cloud operating system]].
+==Architecture==
+The ''Infrastructure'' supports cloud functions as follows:
+#Development and production. If more than one, this cloud servers shall be located in the same data center.
+#Backup
+#[[Testing]]
+#Demonstration.
+Architectures of separate servers have not determined yet. If we can not think of anything better, we can take https://ru.hetzner.com/hosting/produkte_vserver/private-cloud
+==Requirements -- first draft==
+The ''Infrastructure'':
+#Utilizes [[OpenStack]] as the [[cloud operating system]];
+#Utilizes [[OpenStack Keystone]] based on [[LDAP]] as [[identity provider]];
+#Shall enable both [[Opplet]] and [[CNM app]]s;
+#Most likely, will use [[hetzner.de]] dedicated servers at the beginning.
+===Backup and recovery===
+::The backup work model and the ability to restore an operational state, including backups for all of its own data and development, are core requirements. Topics to be addressed are:
+::*'''Restarts''':
+::*#What parts and why shall be restarted;
+::*#How -- by request and automatically -- restarts shall be initiated
+::*#How often?
+::*#What are the dependencies of a restart?
+::*#What are the prerequisites of a restart?
+::*#Why are potential challenges or limitations faced when initiating a restart?
+::*#What is the plan if a restart fails?
+::*#What is the duration of a restart?
+::*#What is the outage?
+::*'''Outside docker-registries with the appropriate repositories''':
+::*#Where is the docker-registry located?
+::*#Why was the docker registry created?
+::*#Explain what is included in the repositories.
+::*#Why are the repositories necessary?
+::*'''Push launches''':
+::*#A push procedure shall be planned with a corresponding increase in the number of subversions.
+::*#What is a monthly frequency desirable?
+::*#Why is there a corresponding increase in the number of subversions per push?
+::*#What are the expected challenges faced (if any)?
+::*#What is done in the initial process to alleviate these challenges?
+::*#What are the unexpected challenges faced?
+::*#How are the unexpected challenges (if any) resolved?
+::*#What would you do differently if you know what the challenges are going to be?
+::*'''Tested recovery''':
+::*#Why is recovery testing necessary?
+::*#When is recovery tested?
+::*#How do you test recovery?
+::*#List the steps that shall be taken when testing recovery.
+::*#List the steps to initiate the backup.
+::*#List expected or unexpected results.
+::*#How will lessons learned be managed?
+::*'''Backup'''
+::*#What to back up, and how?
+::*#Where to store the backup data?
+::*#What frequency is desired? Why is the selected backup frequency optimal?
+::*#What are the steps to initiate a non-planned backup?
+===Restarts===
+::#All the parts in the cluster shall be able to be restarted if the restart is initiated by request. The detailed instructions for these restarts shall be a part of the ''Infrastructure''.
+::#Automatic restarts shall be executed if the service gets stuck.
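The automatic-restart requirement above can be illustrated with a small supervision loop. This is only a sketch under assumptions: the names `supervise`, `is_healthy`, `restart`, and `max_failures` are hypothetical, and the health probe and restart action are passed in as callables because the requirements do not say how a "stuck" service is detected.

```python
from typing import Callable


def supervise(is_healthy: Callable[[], bool],
              restart: Callable[[], None],
              checks: int,
              max_failures: int = 3) -> int:
    """Run `checks` health probes and restart after repeated failures.

    A restart is triggered once `max_failures` consecutive probes fail.
    Returns the number of restarts that were triggered.
    """
    consecutive_failures = 0
    restarts = 0
    for _ in range(checks):
        if is_healthy():
            consecutive_failures = 0  # a healthy probe clears the streak
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_failures:
            restart()                 # hypothetical restart action
            restarts += 1
            consecutive_failures = 0  # start counting again after a restart
    return restarts
```

In a real deployment the loop would sleep between probes and the restart callable would invoke whatever mechanism the ''Infrastructure'' documents. Requiring several consecutive failures is one way to avoid restarting on a single transient error.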
+-> What are the dependencies of a restart?
+ If we are using the provider network, then we can restart the controller and compute nodes individually or both at once, but the virtual machines running on the host will get restarted. In the case of self-service networks, the communication between the virtual machines can be interrupted.
+-> What are the prerequisites of a restart?
+If you need frequent restarts, you can set up OpenStack in high-availability mode so that the VMs won't go down. We can do that using a storage server such as Ceph.
+-> What are the potential challenges or limitations faced when initiating a restart?
+  An improper restart can damage services such as the database or the network.
+-> What is the plan if a restart fails?
+  If we have set up HA, then the loss will be negligible. Otherwise, we have to evacuate the virtual machines from the server.
+-> What is the duration of a restart?
+ Restart of server can take 5 minutes min. Restart of server can take 1 minute.
+-> What is the outage?
+ An outage can be a power, network, or resource outage. That can make the resources
+===Outside docker-registries with the appropriate repositories===
+-> Where is the docker-registry located?
+ We can use a public or a private registry.
+-> Why was the docker registry created?
+ It is a repository that contains Docker images, so that we can pull them on request.
+-> Explain what is included in the repositories.
+ Docker repositories contain only Docker images. It is a web-based panel where we can request a Docker image.
+-> Why are the repositories necessary?
+The purpose of public repositories is to collect all the images so that they can be shared with everyone, and anyone can use them according to their requirements.
+Private repositories are for private use; they are available only within the organization.
+===Push launches===
+A push procedure shall be planned with a corresponding increase in the number of subversions.
+-> What monthly frequency is desirable?
+->  Why is there a corresponding increase in the number of subversions per push?
+->  What are the expected challenges faced (if any)?
+->  What is done in the initial process to alleviate these challenges?
+->  What are the unexpected challenges faced?
+-> How are the unexpected challenges (if any) resolved?
+-> What would you do differently if you know what the challenges are going to be?
+----------------------------------
+I am sorry, I don't have much experience with Git or SVN. Usually, though, changes need to be pushed after the code is final, even if it is just a subpart.
+----------------------------------
+===Tested recovery===
+-> Why is recovery testing necessary?
+  Recovery testing is necessary to ensure that we are taking a proper, complete, and working backup of the data or instance.
+-> When is recovery tested?
+ It can only be tested after the setup is complete and the data or application is running on the backup.
+-> How do you test recovery?
+ In the case of OpenStack, we can use a snapshot. In the case of a website, we can test it with the web data and the database of the website.
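One way to make the website case concrete is to compare a checksum of the original data with a checksum of the restored copy. The sketch below is an assumption, not a procedure stated in the answers above: the helper names `sha256_of` and `restore_matches` are hypothetical, and it only applies to files that are not expected to change between backup and restore (for example, a database dump file).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

A matching checksum shows that the copy is intact. It does not show that the application starts from it, so a full recovery test would still bring the restored instance up.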
+-> List the steps to initiating the backup.
+There is the Cinder backup service, which keeps a copy of a Cinder drive. We can also use image snapshots to create backups and integrate them with a shell script, which I have done recently.
+-> List expected or unexpected results.
+-> How lessons learned will be managed?
+===Backup===
+-> What and how to backup?
+There is the Cinder backup service, which keeps a copy of a Cinder drive. We can also use image snapshots to create backups and integrate them with a shell script, which I have done recently.
+-> Where to store the backup data?
+We can store images on a separate drive by mounting it on /var/lib/glance/images, so that we have separate storage.
+-> What is a frequency desired? Why is the backup frequency selected as optimal?
+Creating a whole backup of a Cinder drive can increase storage cost, whereas using image snapshots for backup might reduce the storage needed, but separately attached drives will not be backed up. The frequency of backups generally depends on the environment. In most cases, backup policies are set to 7 days, so that we can go back and revert changes made up to 7 days ago.
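The 7-day policy mentioned above can be sketched as a retention filter that picks the backups old enough to delete. The function name `expired_backups` and its parameters are hypothetical. How backups are listed and deleted depends on the tooling in use and is left out.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def expired_backups(taken_at: Iterable[datetime],
                    now: datetime,
                    retention_days: int = 7) -> List[datetime]:
    """Return the timestamps of backups older than the retention window.

    A backup taken exactly `retention_days` ago is still kept.
    """
    cutoff = now - timedelta(days=retention_days)
    return sorted(stamp for stamp in taken_at if stamp < cutoff)
```

For example, with `now` set to 14 April and backups taken on 1, 6, 7, 8, and 14 April, the backups from 1 and 6 April fall outside the window and would be candidates for deletion.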
+-> What are the steps to initiating a non-planned backup?
+We can do a manual backup of the instances using snapshots, or we can manually copy the files or take manual database dumps.
+If you want to run Docker with orchestration tools such as Kubernetes, I would highly recommend against it, as it can make the OpenStack network slow. Keeping them separate would be a good choice.
+ I will give you detailed server specifications. I am preparing them; I just wanted to know whether you want separate storage servers for Ceph. That will give you the chance to increase your storage without any issue and will provide a high-availability feature, but it will require a faster switch.
+We can use the OpenStack Magnum service to deploy containers.
+===Server specs===
+If you are using Ceph storage, then the specs are as follows:
+====For controller====
+GB RAM 4 CPU RAID 1 500SSD x2 or 250GB x2
+====For compute====
+Min 32 GB RAM and 12 CPUs, RAID 1 with 500 GB SSD x2 or 250 GB x2 (for DEV)
+Min 32 GB RAM and 12 CPUs, RAID 1 with 500 GB SSD x2 or 250 GB x2, and 1 additional SSD or NVMe for Ceph cache (for PROD)
+You can increase or decrease the number of servers by replicating the same specs.
+====For storage server====
+Min 16 GB RAM and 12 CPUs, RAID 1 with minimum-[[capacity]] hard drives or SSDs x2 (only for the OS), 1 NVMe or SSD for Ceph journals, and 2 TB x5 hard drives for data storage.
+For Ceph we need at least 2 servers, and for HA we need 3 (recommended).
+For a storage server, the RAM requirement is 1 GB of RAM per 1 TB of Ceph storage and 1 CPU per Ceph OSD.
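The sizing rule above (1 GB of RAM per 1 TB of Ceph storage, 1 CPU per OSD) can be checked with a short calculation. This is a sketch. The names `StorageNodeSizing` and `size_storage_node` are hypothetical, and it assumes one OSD per data drive, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageNodeSizing:
    ram_gb: float  # RAM needed for Ceph on this node
    cpus: int      # CPUs needed for the OSDs on this node


def size_storage_node(data_drives: int, drive_tb: float) -> StorageNodeSizing:
    """Apply the 1 GB RAM per 1 TB and 1 CPU per OSD rule to one node."""
    osds = data_drives                 # assumption: one OSD per data drive
    ram_gb = data_drives * drive_tb    # 1 GB of RAM per TB of storage
    return StorageNodeSizing(ram_gb=ram_gb, cpus=osds)
```

For the five 2 TB data drives listed above, `size_storage_node(5, 2.0)` gives 10 GB of RAM and 5 CPUs, which fits within the stated minimum of 16 GB RAM and 12 CPUs per storage server.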
+As for the network, we need at least 10 Gbps LAN cards: 2 cards for the storage server, and two 10 Gbps LAN cards plus one 1 Gbps card for compute and controller.
+==Executive roles==
+===System Administrator===
+::Duties
+::#Configuring servers, fault-tolerant solutions, infrastructure elements;
+::#Installing servers and services, and upgrading existing ones;
+::#Monitoring system performance
+::#Creating file systems
+::#Create a backup and restore policy
+::#Organization of remote access;
+::#Monitoring of network communication
+::#Update the system as soon as a new version of the OS and application software is released
+::#Implementing policies to use a computer system and network
+::#Information Security;
+::#Administration of users (setting up and maintaining an account). Monitoring networks to ensure security and availability for specific users.
+::#User support;
+::#Troubleshoot problems reported by users.
+::#Configuring user security policies.
+::#Managing passwords and identity
+::#Documentation in the form of an internal wiki
+::#Routing protocols and configuration of the routing table.
+::#Configurations for authentication and authorization of directory services.
+::#Writing server software;
+::#The system administrator sets tasks for writing the necessary modules to programmers, and introduces rules for working with software for the whole company.
+===Administrator of particular application===
+::#Develop rule policy inside application
+::#Maintain application, version control, notify System Administrator about stable updates, issues of application
+===Admin Tasks===
+Ongoing tasks for which a monthly fee will apply:
+- Monitor the servers for any problem and quickly respond to fix it.
+- Make sure that the periodic backups are done and in complete health.
+- Maintain the health of all the services that run on the servers (e.g., Apache, Postfix, MySQL)
+- Keep the servers fully updated with the latest patches and updates to prevent security problems and maintain performance
+- Keep track of performance and improve it with the latest tweaks.
+- If needed suggest hardware/resource upgrades.
+- Maintain the firewall rules
+Additional tasks that will be billed by hour:
+- Development team requests
+- Installation of new software / scripts
+- Installation or configuration of new servers
+- Weekly meeting
+===Tasks ideas===
+daily routine checkups on logs -- update the server and upgrade it manually -- make manual backup if necessary -- check firewall setting -- set a restriction for supposedly required ports only
+set restriction on ssh for allowed users only -- install anti malware script if necessary -- delete logs after reviewing mostly if it eats a lot of hard disk space
+#If it's a newly created server:
+#*Create a sudo user, even though I have access to the root password;
+#*Install the dependencies and libraries required for the server's intended use;
+#*For example, whether it is a mail server or a web server, the server should always be updated before any extension, dependency, or even the main script or program is installed.
+#If it's an existing server:
+#*Check the backup of the server.
+#*After root access has been given to me, create a sudo user and log in using that username.
+#*Check what processes are running on the server.
+#*Check the installed packages on the server.
+#*Update the server and check which packages need to be upgraded.
+#*Check and monitor the SSH and HTTP logs.
+#*Check the disk allocation.
+#*Delete old auth and HTTP logs.
+#*Check whether ufw is installed and active.
+#*Check whether the ports are filtered.
+#*Check whether the allowed users are defined in the SSH config.
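The last check in the list, whether allowed users are defined in the SSH config, could be automated along these lines. This is a simplified sketch. The function name `allowed_ssh_users` is hypothetical, it works on the text of an `sshd_config` file passed in as a string, and it ignores `Match` blocks and included files.

```python
from typing import List


def allowed_ssh_users(sshd_config_text: str) -> List[str]:
    """Return the user patterns named by AllowUsers directives, in order."""
    users: List[str] = []
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        parts = line.split()
        if parts[0].lower() == "allowusers":  # keywords are case-insensitive
            users.extend(parts[1:])
    return users
```

An empty result means no `AllowUsers` directive was found in the top-level configuration, which is the situation the checklist item is meant to catch.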
+Optimize the servers to run faster and more smoothly by tuning nginx, Apache, PHP, and MariaDB. Use a multiserver plugin for the video site to speed up the website's performance.
+==Requirements -- second draft==
+These requirements are a set of detailed requirements for the ''Infrastructure''. This document has been drafted to help cloud developers come up with a detailed implementation of an [[enterprise private cloud]] whose objective is to host the [[BigBlueButton]], [[Moodle]], [[MediaWiki]], [[Odoo]], and [[Redmine]] [[end-user software application]]s. Each of these applications shall serve more than 100 users with a peak load [[capacity]] of 20 simultaneous users. The requirements have been drafted keeping in mind that users of the above applications would require uninterrupted, very high availability.
+The target audience for this document is [[cloud architect]]s, [[DevOps engineer]]s and [[system administrator]]s.
+===Block storage===
+====Block Storage Objectives====
+#[[capacity]] requirement of 2T
+#A peak speed of 110[[Mbps]] of data transfer
+#A peak speed of 15-20[[Mbps]] of data transfer rate with about 20 parallel transfers
+#3x Data redundancy
+#Super scalability
+#Uniform load distribution
+#Shall be consumed by [[VM]]s and [[Container]]s
+#Shall be consumed by applications
+#Shall be consumed by [[OpenStack]]
+#An acceptable latency
+====Storage Requirements====
+#Distributed striped [[GlusterFS]] volume shall be implemented for data redundancy, load balancing and scalability
+#[[GlusterFS]] shall be implemented for serving block storage
+#The system should be implemented with at least three stripes.
+#A minimum of three bare metal servers shall be used to implement [[GlusterFS]].
+#Each node shall have at least 3 [[SSD]] drives: two with a [[capacity]] of 1 TB each and one of 125 GB.
+#Hardware configuration for each node shall be a minimum of 4GB [[RAM]], 2 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
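The capacity objective can be cross-checked against the node layout with simple arithmetic. The sketch below assumes that the 3x redundancy is realised as three full copies of the data and that only the two 1 TB drives per node hold data. Filesystem overhead is ignored. The function name `usable_capacity_tb` is hypothetical.

```python
def usable_capacity_tb(nodes: int, data_drives_per_node: int,
                       drive_tb: float, replicas: int) -> float:
    """Raw capacity of all data drives divided by the number of copies kept."""
    raw_tb = nodes * data_drives_per_node * drive_tb
    return raw_tb / replicas
```

With three nodes, two 1 TB data drives each, and three copies, `usable_capacity_tb(3, 2, 1.0, 3)` gives 2.0 TB, which matches the 2T block storage objective. The same layout with two copies gives 3.0 TB.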
+===Object storage===
+====Object Storage Objectives====
+#[[capacity]] requirement of 3T
+#A peak speed of 110[[Mbps]] of data transfer
+#A peak speed of 15-20[[Mbps]] of data transfer rate with about 20 parallel transfers
+#2x data redundancy
+#Super scalability
+#Shall host [[Storage image]]s, [[Data backup]]s and [[System snapshot]]s
+#Shall host [[Docker Registry]]
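The two throughput objectives can be combined into an aggregate figure and compared with the network adapters. This is a sketch that takes the stated units at face value. The names `aggregate_mbps` and `fits_links` are hypothetical, and the 80% usable fraction of a link is an assumption, not a figure from the requirements.

```python
def aggregate_mbps(parallel_transfers: int, per_transfer_mbps: float) -> float:
    """Total rate when every parallel transfer runs at the given speed."""
    return parallel_transfers * per_transfer_mbps


def fits_links(required_mbps: float, link_mbps: float,
               links: int = 1, usable_fraction: float = 0.8) -> bool:
    """True when the required rate fits within the usable share of the links."""
    return required_mbps <= link_mbps * links * usable_fraction
```

Twenty transfers at 15 to 20 Mbps add up to 300 to 400 Mbps in total, which is above the 110 Mbps single-transfer peak but within the nominal 1000 Mbps of one gigabit adapter.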
+====Object Storage Requirements====
+#Distributed [[OpenStack]] swift object volume shall be implemented for data redundancy, load balancing and scalability
+#[[Swift]] shall be implemented for serving object storage
+#A minimum of three bare metal servers shall be used to implement [[Swift]].
+#Each node shall have at least 3 [[SATA]]/[[SCSI]] drives: two with a [[capacity]] of 1 TB each and one of 125 GB.
+#Hardware configuration for each node shall be a minimum of 4GB [[RAM]], 2 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
+#The management network shall be separate from data for replications
+#The [[OpenStack]] controller cluster shall be used to authenticate and authorise the usage of object storage.
+#Possibility of keeping the data and management networks separate
+===BigBlueButton===
+====BigBlueButton Specific Objectives====
+#This cluster should be built using bare metal servers.
+#The streaming data that comes in and goes out should directly terminate on this application
+#A minimum bandwidth of 80[[Mbps]] internet data streaming [[capacity]] is required.
+====BigBlueButton Requirements====
+#A two node cluster shall be implemented to host [[BigBlueButton]]
+#Hardware configuration for each node shall be a minimum of 8GB [[RAM]], 4 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
+#[[BigBlueButton]] should be implemented specifically on the [[Ubuntu]] 16.04 64-bit [[OS]]
+#A minimum of 500G of [[GlusterFS]] storage shall be made available to the cluster
+#80[[Mbps]] symmetrical data transfer rate shall be made available to the cluster.
+===Container/Magnum Cluster Requirements===
+#A two node cluster shall be implemented to host [[Docker]]s
+#Hardware configuration for each node shall be a minimum of 32GB [[RAM]], 4 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
+#The best possible [[OS]] shall be chosen to implement this cluster, to accommodate the maximum number of applications.
+#A minimum of ten applications shall be able to run simultaneously.
+#A minimum of 300G of [[GlusterFS]] storage shall be reserved for use by [[Container]]s
+#A minimum of 300G of [[Swift]] object storage shall be reserved for use by docker registry
+#All applications other than [[BigBlueButton]] shall be hosted using these [[Container]]s
+===OpenStack Objectives===
+#Should be able to act as a single-point infrastructure management console.
+#A single authentication and authorisation agent for all the applications in the cloud.
+#Should be able to make all the clusters and nodes work in concert.
+====Controller Cluster====
+#A two node cluster shall be implemented to host [[OpenStack]] controller.
+#Hardware configuration for each node shall be a minimum of 8GB [[RAM]], 2 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
+#A local storage of 256 GB [[SSD]] shall be used for [[OS]] and logs
+#This cluster shall host [[OpenStack]] dashboard, neutron and authentication agent and any other dependencies.
+====Compute Cluster====
+#A two node cluster shall be implemented to host [[OpenStack Nova]].
+#Hardware configuration for each node shall be a minimum of 32GB [[RAM]], 4 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
+#A local storage of 256 GB [[SSD]] shall be used for [[OS]] and logs
+#This cluster shall host [[OpenStack Nova]] and Ironic ([[Bitfrost]] ??) services along with any other necessary dependencies.
+====Cinder Cluster====
+#A two node cluster shall be implemented to host [[OpenStack Cinder]].
+#Hardware configuration for each node shall be a minimum of 4GB [[RAM]], 2 or more [[CPU]] Cores, 1+ GBE x 2 [[Network adapter]]s.
+#A local storage of 256 GB [[SSD]] shall be used for [[OS]] and logs
+#This cluster shall host [[OpenStack Cinder]] and any other necessary dependencies.
+====Senlin Cluster====
+#TBD
+===Security Requirements===
+#TBD
+===Internet and Floating IPs===
+#A pack of at least 10-12 real [[IPv4]] addresses would be required
+#Each user-facing application shall use 1-2 [[IP]] addresses as per application and design requirements.
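The address budget can be checked against the five applications named at the start of this draft. This is a sketch. The function name `ip_demand` is hypothetical, and it assumes that only those five applications need public addresses.

```python
from typing import Sequence, Tuple


def ip_demand(apps: Sequence[str],
              min_per_app: int = 1,
              max_per_app: int = 2) -> Tuple[int, int]:
    """Return the lowest and highest number of addresses the apps may need."""
    return len(apps) * min_per_app, len(apps) * max_per_app


APPS = ["BigBlueButton", "Moodle", "MediaWiki", "Odoo", "Redmine"]
low, high = ip_demand(APPS)
```

Five applications at 1-2 addresses each need between 5 and 10 addresses, so a pack of 10-12 covers the upper bound and leaves little or no headroom for infrastructure endpoints.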
+===Networking Requirements===
+#[[GlusterFS]] and [[Swift]] cluster shall have two separate networks
+#[[VLAN]] tagging if necessary
+#Number of networking
+#TBD
+===Monitoring and trouble Requirements===
+#What monitoring tools
+#Where to install
+#TBD
+==Development==
+The [[RFB]] has been posted and the following responses are collected so far:
+#I will try to develop a [[proof of concept]] (PoC) based on the available and known requirements. This will help to understand the other requirements and then to develop a full scale private cloud.
+#We would document a clear set of functional and non functional requirements clearly capturing the essence of what is needed. After that, we will document the architecture and design, implement and test followed lastly by acceptance and handover to you, the customer. However, in order to develop an accurate schedule and budget, we would need to run a [[due diligence]] exercise. This due diligence would include detailed analysis and documentation of your functional and non-functional requirements for the private cloud project. We have material and a methodology we use. The due diligence would take approximately 3 days and we would then hand over to you the following deliverables:
+#*A detailed requirements specification
+#*A project plan
+#*A work package breakdown with effort and roles required
+#*A risk register and mitigation plan.
+#What are the performance and storage expectations of the applications that you want to run?
+#On a very high level, these are the steps I'd follow based on the current information.
+#*meeting with you to understand your specific needs in detail
+#*assuming this is a private cloud deployment, we need to understand the size of the applications you'll need deployed in that cloud
+#*with this you can determine the amount of physical servers needed depending on the number of instances, and virtual resources needed (calculation based on physical hardware resources)
+#*analyze scalability and performance needs
+#*decompose in stories/requirements that can be estimated and implemented by the team of your choice.
+#Putting requirements together for your private cloud would require following inputs:
+#*Are you a startup, or do you want to migrate to the private cloud?
+#*What is the objective that you would like to achieve?
+#*What is the size of the private cloud that you envisage?
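The fourth response mentions deriving the number of physical servers from the virtual resources needed. A minimal sketch of that calculation follows. The function name `servers_needed` and the CPU overcommit parameter are assumptions, and the workload figures in the usage note are hypothetical.

```python
import math


def servers_needed(total_vcpus: int, total_ram_gb: float,
                   cores_per_server: int, ram_gb_per_server: float,
                   cpu_overcommit: float = 1.0) -> int:
    """Servers required so that both CPU and RAM demand are covered."""
    by_cpu = math.ceil(total_vcpus / (cores_per_server * cpu_overcommit))
    by_ram = math.ceil(total_ram_gb / ram_gb_per_server)
    return max(by_cpu, by_ram, 1)  # at least one server is always needed
```

For a hypothetical workload of 20 vCPUs and 80 GB of RAM on servers with 12 cores and 32 GB each, CPU alone would need 2 servers and RAM would need 3, so the result is 3. The larger of the two figures decides because a server that runs out of either resource cannot host more instances.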
+==See also==
+*[[Cloud lexicon]], a listing of links about computer terms
+[[Category: CNM Cyber Orientation]][[Category: Articles]][[Category:CNM Cloud products]]
