Difference between revisions of "Talk:CNM Farms"

From CNM Wiki
 
(Blanked the page)
(Tag: Blanking)
 
[[CNM Servers (development)]] is a draft for the promising [[CNM Farms]] that both:
 
*Enable [[Opplet]] of the [[CNM Cloud]]; consequently, [[Opplet]] enables the [[end-user application]]s of [[CNMCyber]]; and
 
*Utilize a bundle of servers and [[OpenStack]] as the [[cloud operating system]].
 
  
 
==Architecture==
 
The ''Infrastructure'' supports cloud functions as follows:
 
#Development and production. If there is more than one server, these cloud servers shall be located in the same data center.
 
#Backup
 
#[[Testing]]
 
#Demonstration.
 
 
The architectures of the separate servers have not been determined yet. If we cannot think of anything better, we can use https://ru.hetzner.com/hosting/produkte_vserver/private-cloud
 
 
==Requirements -- first draft==
 
The ''Infrastructure'':
 
#Utilizes [[OpenStack]] as the [[cloud operating system]];
 
#Utilizes [[OpenStack Keystone]] based on [[LDAP]] as the [[identity provider]];
 
#Shall enable both [[Opplet]] and [[CNM app]]s;
 
#Most likely, will use [[hetzner.de]] dedicated servers at the beginning.
 
 
===Backup and recovery===
 
::The backup work model and the ability to restore an operational state, including backups for all of its own data and development, are core requirements. Topics to be addressed are:
 
::*'''Restarts''':
 
::*#What parts and why shall be restarted;
 
::*#How -- by request and automatically -- restarts shall be initiated
 
::*#How often?
 
::*#What are the dependencies of a restart?
 
::*#What are the prerequisites of a restart?
 
::*#What are the potential challenges or limitations faced when initiating a restart?
 
::*#What is the plan if a restart fails?
 
::*#What is the duration of a restart?
 
::*#What is the outage?
 
::*'''Outside docker-registries with the appropriate repositories''':
 
::*#Where is the docker-registry located?
 
::*#Why was the docker registry created?
 
::*#Explain what is included in the repositories.
 
::*#Why are the repositories necessary?
 
::*'''Push launches''':
 
::*#A push procedure shall be planned with a corresponding increase in the number of subversions.
 
::*#Why is a monthly frequency desirable?
 
::*#Why is there a corresponding increase in the number of subversions per push?
 
::*#What are the expected challenges faced (if any)?
 
::*#What is done in the initial process to alleviate these challenges?
 
::*#What are the unexpected challenges faced?
 
::*#How are the unexpected challenges (if any) resolved?
 
::*#What would you do differently if you know what the challenges are going to be?
 
::*'''Tested recovery''':
 
::*#Why is recovery testing necessary?
 
::*#When is recovery tested?
 
::*#How do you test recovery?
 
::*#List the steps that shall be taken when testing recovery.

::*#List the steps for initiating the backup.

::*#List expected or unexpected results.

::*#How will lessons learned be managed?
 
::*'''Backup'''
 
::*#What to back up, and how?

::*#Where to store the backup data?

::*#What backup frequency is desired? Why is that frequency considered optimal?

::*#What are the steps for initiating a non-planned backup?
 
 
===Restarts===
 
::#All the parts in the cluster shall be able to be restarted if the restart is initiated by request. The detailed instructions for these restarts shall be a part of the ''Infrastructure''.
 
::#Automatic restarts shall be executed if the service gets stuck.
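
The automatic-restart requirement can be sketched as a small watchdog. This is only an illustration in Python, assuming that the services are systemd units and that a stuck service shows up as not active; a hung process that still reports as active would need a separate health probe. The names <code>ensure_running</code> and <code>Runner</code> are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical type: a function that runs a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(cmd, check=False).returncode


def ensure_running(service: str, runner: Runner = run) -> str:
    """Restart `service` if systemd does not report it as active.

    Returns "ok", "restarted" or "failed".
    """
    # `systemctl is-active --quiet` exits with 0 only when the unit is active.
    if runner(["systemctl", "is-active", "--quiet", service]) == 0:
        return "ok"
    # The unit is not active, so attempt one restart and report the outcome.
    if runner(["systemctl", "restart", service]) == 0:
        return "restarted"
    return "failed"
```

A call such as <code>ensure_running("nova-compute")</code> could be scheduled from a cron job or a systemd timer; the unit name is only an example. Passing the runner as a parameter keeps the logic testable without touching a real host.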
 
 
-> What are the dependencies of a restart?
 
If we are using the provider network, we can restart the controller and compute nodes individually or both at once, but the virtual machines running on the host will be restarted. In the case of self-service networks, communication between the virtual machines can be interrupted.
 
 
-> What are the prerequisites of a restart?
 
 
If frequent restarts are needed, OpenStack can be set up in high-availability mode so that the VMs won't go down. We can do that using a storage server such as Ceph.
 
 
-> What are the potential challenges or limitations faced when initiating a restart?

  An improper restart can damage services such as the database or the network.
 
 
-> What is the plan if a restart fails?
 
  If we have set up HA, the loss will be negligible. Otherwise, we have to evacuate the virtual machines from the server.
 
 
-> What is the duration of a restart?
 
A restart of a server can take at least 5 minutes. A restart of a service can take about 1 minute.
 
 
-> What is the outage?
 
An outage can be a power, network or resource outage. That can make the resources unavailable.
 
=============================
 
 
=> Outside docker-registries with the appropriate repositories:
 
=============================
 
-> Where is the docker-registry located?
 
We can use a public or a private registry.


-> Why was the docker registry created?

It is a repository that contains Docker images, so that we can pull them on request.


-> Explain what is included in the repositories.

Docker repositories contain only Docker images. The registry is a web-based panel where we can request a Docker image.


-> Why are the repositories necessary?

Public repositories exist to collect images so that they can be shared with everyone, and anyone can use them according to their requirements.

Private repositories are for private use, that is, they are only available within the organization.
 
=============================
 
 
 
=> Push launches:
 
=============================
 
A push procedure shall be planned with a corresponding increase in the number of subversions.
 
-> Why is a monthly frequency desirable?
 
->  Why is there a corresponding increase in the number of subversions per push?
 
->  What are the expected challenges faced (if any)?
 
->  What is done in the initial process to alleviate these challenges?
 
->  What are the unexpected challenges faced?
 
-> How are the unexpected challenges (if any) resolved?
 
-> What would you do differently if you know what the challenges are going to be?
 
 
----------------------------------
 
I am sorry, I don't have much experience with git or SVN. Usually, though, changes need to be pushed once the code is final, even if it is just a subpart.
 
----------------------------------
 
 
================================
 
 
 
=> Tested recovery:
 
===================================
 
-> Why is recovery testing necessary?
 
  Recovery testing is necessary to ensure that we are taking proper, complete and working backups of the data or instance.


-> When is recovery tested?

It can only be tested after the setup is complete and the data/application is running on the backup.


-> How do you test recovery?

In the case of OpenStack, we can use a snapshot. In the case of a website, we can test it with the web data and database of the website.
 
 
 
-> List the steps to initiating the backup.
 
There is the Cinder backup service, which keeps a copy of a Cinder volume. We can also use image snapshots to create backups and drive them from a shell script, which I have done recently.
 
 
-> List expected or unexpected results.
 
 
 
-> How will lessons learned be managed?
 
 
=========================================
 
 
=> Backup
 
===================================
 
-> What and how to backup?
 
There is the Cinder backup service, which keeps a copy of a Cinder volume. We can also use image snapshots to create backups and drive them from a shell script, which I have done recently.
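
A minimal sketch of how such a script might assemble the two kinds of backup, written in Python rather than shell. It assumes the standard <code>openstack</code> command-line client is installed and authenticated. The function names and the naming scheme for backups are hypothetical, and the commands are only built here, not executed.

```python
from datetime import date
from typing import List


def snapshot_command(server: str, day: date) -> List[str]:
    """Build the CLI call that snapshots an instance into an image."""
    # Hypothetical naming scheme: <server>-backup-YYYY-MM-DD
    name = f"{server}-backup-{day.isoformat()}"
    return ["openstack", "server", "image", "create", "--name", name, server]


def volume_backup_command(volume: str, day: date) -> List[str]:
    """Build the CLI call that asks the Cinder backup service to copy a volume."""
    name = f"{volume}-backup-{day.isoformat()}"
    return ["openstack", "volume", "backup", "create", "--name", name, volume]
```

Each returned list can be handed to <code>subprocess.run</code> once the server and volume names are known. Building the command separately from running it makes a dry run easy to print and review.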
 
 
-> Where to store the backup data?
 
We can store images on a separate drive by mounting it on /var/lib/glance/images, so that we have separate storage.
 
 
-> What backup frequency is desired? Why is that frequency considered optimal?

Creating a full backup of each Cinder volume can increase storage cost, whereas using image snapshots for backup may reduce the storage needed, but separately attached volumes will not be backed up. The frequency of backups generally depends on the environment. In most cases the backup policy is set to 7 days, so that we can go back and revert changes made up to 7 days ago.
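
The 7-day policy can be illustrated with a short retention check. This is a sketch in Python under the assumption that each backup is identified by a name and a creation time; the function name <code>expired_backups</code> and the example data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expired_backups(
    backups: Dict[str, datetime], now: datetime, keep_days: int = 7
) -> List[str]:
    """Return the names of backups created more than `keep_days` ago."""
    cutoff = now - timedelta(days=keep_days)
    # A backup taken exactly at the cutoff is still kept.
    return sorted(name for name, created in backups.items() if created < cutoff)


example = {
    "web01-backup-2023-11-01": datetime(2023, 11, 1),
    "web01-backup-2023-11-10": datetime(2023, 11, 10),
}
print(expired_backups(example, now=datetime(2023, 11, 15)))
# prints ['web01-backup-2023-11-01']
```

Only the names are returned, so the caller decides whether to delete the expired backups or move them to cheaper storage.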
 
 
-> What are the steps to initiating a non-planned backup?
 
We can back up the instances manually using snapshots, or we can manually copy the files or take manual database dumps.


If you want to run Docker together with orchestration tools such as Kubernetes, I would strongly recommend against it, as it can make the OpenStack network slow. Keeping them separate would be a good choice.


I will give you detailed server specifications. I am preparing them and just wanted to know whether you want separate storage servers for Ceph. That will give you the chance to increase your storage without any issue and will provide a high-availability feature, but it will require a faster switch.


We can use the OpenStack Magnum service to deploy containers.
 
 
===Server specs===
 
If you are using Ceph storage, the specs are as follows:
 
==================
 
For controller
 
==================
 
8 GB RAM, 4 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB
 
 
==================
 
For compute
 
==================
 
Min 32 GB RAM and 12 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB (for DEV)

Min 32 GB RAM and 12 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB, and 1 additional SSD or NVMe for the Ceph cache (for PROD)


You can increase or decrease the number of servers by replicating the same specs.
 
 
==================
 
For storage server
 
==================
 
Min 16 GB RAM and 12 CPUs, RAID 1 on 2 x minimum-[[capacity]] hard drives or SSDs (only for the OS), 1 NVMe or SSD for Ceph journals, and 5 x 2 TB hard drives for data storage.

For Ceph we need at least 2 servers; for HA we need 3 (recommended).


For a storage server, the RAM requirement is 1 GB of RAM per 1 TB of Ceph storage, and 1 CPU per Ceph OSD.
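
As a worked example of this rule of thumb, the sketch below computes the minimum RAM and CPU count for one storage server. It is written in Python, assumes one OSD per data drive, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def storage_node_minimums(osd_sizes_tb: Sequence[float]) -> Dict[str, int]:
    """Apply the rule of thumb: 1 GB RAM per 1 TB of storage, 1 CPU per OSD."""
    return {
        "ram_gb": math.ceil(sum(osd_sizes_tb)),  # round partial terabytes up
        "cpus": len(osd_sizes_tb),  # assumption: one OSD per data drive
    }


# Five 2 TB data drives, as in the storage server spec above.
print(storage_node_minimums([2, 2, 2, 2, 2]))
# prints {'ram_gb': 10, 'cpus': 5}
```

For five 2 TB drives the rule gives 10 GB of RAM and 5 CPUs, which sits below the stated minimum of 16 GB RAM and 12 CPUs and so leaves headroom for the operating system.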
 
 
================================================================================================================================================
 
As for the network, we need at least 10 Gbps LAN cards: two cards for each storage server, and two 10 Gbps LAN cards plus one 1 Gbps card for each compute and controller server.
 
 
==Executive roles==
 
===System Administrator===
 
::Duties
 
::#Configuring servers, fault-tolerant solutions, infrastructure elements;
 
::#Installing servers and services, and upgrading existing ones;
 
::#Monitoring system performance
 
::#Creating file systems
 
::#Create a backup and restore policy
 
::#Organization of remote access;
 
::#Monitoring of network communication
 
::#Update the system as soon as a new version of the OS and application software is released
 
::#Implementing policies to use a computer system and network
 
::#Information Security;
 
::#Administration of users (setting up and maintaining an account). Monitoring networks to ensure security and availability for specific users.
 
::#User support;
 
::#Troubleshoot problems reported by users.
 
::#Configuring user security policies.
 
::#Managing passwords and identity
 
::#Documentation in the form of an internal wiki
 
::#Routing protocols and configuration of the routing table.
 
::#Configurations for authentication and authorization of directory services.
 
::#Writing server software;
 
::#The system administrator assigns programmers the tasks of writing the necessary modules, and introduces rules for working with software for the whole company.
 
 
===Administrator of particular application===
 
::#Develop the rule policy inside the application

::#Maintain the application and its version control, and notify the System Administrator about stable updates and application issues
 
 
===Admin Tasks===
 
 
Ongoing tasks for which a monthly fee will apply:
 
- Monitor the servers for any problem and quickly respond to fix it.
 
- Make sure that the periodic backups are done and in complete health.
 
- Maintain the health of all the services that run on the servers (e.g. Apache, Postfix, MySQL)
 
- Keep the servers fully updated with the latest patches and updates to prevent security problems and maintain performance
 
- Keep track of performance and improve it with the latest tweaks.
 
- If needed suggest hardware/resource upgrades.
 
- Maintain the firewall rules
 
 
Additional tasks that will be billed by the hour:

- Development team requests

- Installation of new software / scripts

- Installation or configuration of new servers
 
- Weekly meeting
 
 
===Tasks ideas===
 
daily routine checkups on logs -- update the server and upgrade it manually -- make a manual backup if necessary -- check the firewall settings -- restrict access to the required ports only

restrict ssh to allowed users only -- install an anti-malware script if necessary -- delete logs after reviewing them, especially if they take up a lot of hard disk space
 
 
#If it's a newly created server:

#*create a sudo user, although I have access to the root password;

#*install the dependencies and libraries required for the server's intended use,

#*for example a mail server or a web server; before installing any extension, dependency, or the main script or program, the server should always be updated.

#If it's an existing server:

#* check the backup of the server.

#* after root access has been given to me, create a sudo user and log in using that username.

#* check which processes are running on the server.

#* check the installed packages on the server.

#* update the server and check which packages need to be upgraded.

#* check and monitor the ssh and http logs.

#* check the disk allocation.

#* delete old auth and http logs.

#* check whether ufw is installed and active.

#* check whether the ports are filtered.

#* check the ssh config to see whether the allowed users are defined.
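
The last check in the list can be sketched as a small parser. This is an illustrative Python snippet, not a complete audit: it assumes the configuration text has already been read (typically from /etc/ssh/sshd_config), that the keyword and its values are separated by whitespace, and it ignores <code>Match</code> blocks and included files. The function name is hypothetical.

```python
from typing import List


def allowed_ssh_users(sshd_config_text: str) -> List[str]:
    """Collect the user patterns listed on AllowUsers lines."""
    users: List[str] = []
    for line in sshd_config_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        # sshd_config keywords are case-insensitive.
        if parts[0].lower() == "allowusers":
            users.extend(parts[1:])
    return users


sample = """
# Example configuration (made up for illustration)
PermitRootLogin no
AllowUsers deploy admin
"""
print(allowed_ssh_users(sample))
# prints ['deploy', 'admin']
```

An empty result means that no AllowUsers directive was found, which is the case the checklist asks the administrator to look for.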
 
 
Optimize the servers to run faster and more smoothly by tuning nginx, Apache, PHP and MariaDB. Use a multiserver plugin for the video site to speed up the website's performance.
 
 
==Requirements -- second draft==
 
 
These requirements are a set of detailed requirements for the ''Infrastructure''. This document has been drafted to help cloud developers come up with a detailed implementation of an [[enterprise private cloud]] whose objective is to host the [[BigBlueButton]], [[Moodle]], [[MediaWiki]], [[Odoo]], and [[Redmine]] [[end-user software application]]s. Each of these applications shall serve more than 100 users with a peak load [[capacity]] of 20 simultaneous users. The requirements have been drafted keeping in mind that users of the above applications require uninterrupted, very high availability.
 
 
The target audience for this document is [[cloud architect]]s, [[DevOps engineer]]s and [[system administrator]]s.
 
 
===Block storage===
 
====Block Storage Objectives====
 
#[[capacity]] requirement of 2T
 
#A peak speed of 110[[Mbps]] of data transfer
 
#A peak speed of 15-20[[Mbps]] of data transfer rate with about 20 parallel transfers
 
#3x Data redundancy
 
#Super scalability
 
#Uniform load distribution
 
#Shall be consumed by [[VM]]s and [[Container]]s
 
#Shall be consumed by applications
 
#Shall be consumed by [[OpenStack]]
 
#An acceptable latency
 
 
====Storage Requirements====
 
#A distributed striped [[GlusterFS]] volume shall be implemented for data redundancy, load balancing and scalability

#[[GlusterFS]] shall be implemented for serving block storage

#The system shall be implemented with at least three stripes.

#A minimum of three bare metal servers shall be used to implement [[GlusterFS]].

#Each node shall have at least 3 [[SSD]] drives: two with a [[capacity]] of 1 TB each and one of 125 GB.

#The hardware configuration for each node shall be a minimum of 4 GB [[RAM]], 2 or more [[CPU]] cores, and 2 [[Network adapter]]s of 1 [[GBE]] or faster.
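
The drive counts above can be checked against the capacity objective with a short calculation. The sketch below is in Python and rests on two assumptions that are not stated in the requirements: the 3x redundancy is implemented as three full copies of the data, and only the two 1 TB drives per node hold data (the 125 GB drive is assumed to be for the operating system). The function name is hypothetical.

```python
def usable_capacity_tb(
    nodes: int, data_drives_per_node: int, drive_tb: float, copies: int
) -> float:
    """Usable capacity when every piece of data is stored `copies` times."""
    raw_tb = nodes * data_drives_per_node * drive_tb
    return raw_tb / copies


# Three nodes, two 1 TB data drives each, three copies of the data.
print(usable_capacity_tb(nodes=3, data_drives_per_node=2, drive_tb=1.0, copies=3))
# prints 2.0
```

Under these assumptions the minimum hardware yields 6 TB raw and 2 TB usable, which matches the 2T block storage objective exactly and leaves no spare capacity.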
 
 
===Object storage===
 
====Object Storage Objectives====
 
#[[capacity]] requirement of 3T
 
#A peak speed of 110[[Mbps]] of data transfer
 
#A peak speed of 15-20[[Mbps]] of data transfer rate with about 20 parallel transfers
 
#2x data redundancy
 
#Super scalability
 
#Shall host [[Storage image]]s, [[Data backup]]s and [[System snapshot]]s
 
#Shall host [[Docker Registry]]
 
 
====Object Storage Requirements====
 
#Distributed [[OpenStack]] swift object volume shall be implemented for data redundancy, load balancing and scalability
 
#[[Swift]] shall be implemented for serving object storage

#A minimum of three bare metal servers shall be used to implement [[Swift]].

#Each node shall have at least 3 [[SATA]]/[[SCSI]] drives: two with a [[capacity]] of 1 TB each and one of 125 GB.

#The hardware configuration for each node shall be a minimum of 4 GB [[RAM]], 2 or more [[CPU]] cores, and 2 [[Network adapter]]s of 1 [[GBE]] or faster.

#The management network shall be separate from the data network used for replication

#The [[OpenStack]] controller cluster shall be used to authenticate and authorise the usage of object storage.

#Possibility of using separate data and management networks
 
 
===BigBlueButton===
 
====BigBlueButton Specific Objectives====
 
#This cluster should be built using bare metal servers.
 
#The streaming data that comes in and goes out should directly terminate on this application
 
#A minimum bandwidth of 80[[Mbps]] internet data streaming [[capacity]] is required.
 
 
====BigBlueButton Requirements====
 
#A two node cluster shall be implemented to host [[BigBlueButton]]
 
#The hardware configuration for each node shall be a minimum of 8 GB [[RAM]], 4 or more [[CPU]] cores, and 2 [[Network adapter]]s of 1 [[GBE]] or faster.

#[[BigBlueButton]] should be implemented specifically on the [[Ubuntu]] 16.04 64-bit [[OS]]

#A minimum of 500G of [[GlusterFS]] storage shall be made available to the cluster
 
#80[[Mbps]] symmetrical data transfer rate shall be made available to the cluster.
 
 
===Container/Magnum Cluster Requirements===
 
#A two node cluster shall be implemented to host [[Docker]] containers

#The hardware configuration for each node shall be a minimum of 32 GB [[RAM]], 4 or more [[CPU]] cores, and 2 [[Network adapter]]s of 1 [[GBE]] or faster.

#The best possible [[OS]] shall be chosen to implement this cluster, to accommodate the maximum number of applications.

#A minimum of ten applications shall be able to run simultaneously.
 
#A minimum of 300G of [[GlusterFS]] storage shall be reserved for use by [[Container]]s
 
#A minimum of 300G of [[Swift]] object storage shall be reserved for use by docker registry
 
#All applications other than [[BigBlueButton]] shall be hosted using these [[Container]]s
 
 
===OpenStack Objectives===
 
#Should be able to act as a single-point infrastructure management console.

#Should provide a single authentication and authorisation agent for all the applications in the cloud.

#Should be able to make all the clusters and nodes work in concert.
 
 
====Controller Cluster====
 
#A two node cluster shall be implemented to host [[OpenStack]] controller.
 
#The hardware configuration for each node shall be a minimum of 8 GB [[RAM]], 2 or more [[CPU]] cores, and 2 [[Network adapter]]s of 1 [[GBE]] or faster.
 
#A local storage of 256 GB [[SSD]] shall be used for [[OS]] and logs
 
#This cluster shall host [[OpenStack]] dashboard, neutron and authentication agent and any other dependencies.
 
 
====Compute Cluster====
 
#A two node cluster shall be implemented to host [[OpenStack Nova]].
 
#The hardware configuration for each node shall be a minimum of 32 GB [[RAM]], 4 or more [[CPU]] cores, and 2 [[Network adapter]]s of 1 [[GBE]] or faster.

#A local storage of 256 GB [[SSD]] shall be used for [[OS]] and logs

#This cluster shall host [[OpenStack Nova]] and Ironic ([[Bifrost]] ??) services along with any other necessary dependencies.
 
 
====Cinder Cluster====
 
#A two node cluster shall be implemented to host [[OpenStack Cinder]].
 
#The hardware configuration for each node shall be a minimum of 4 GB [[RAM]], 2 or more [[CPU]] cores, and 2 [[Network adapter]]s of 1 [[GBE]] or faster.
 
#A local storage of 256 GB [[SSD]] shall be used for [[OS]] and logs
 
#This cluster shall host [[OpenStack Cinder]] and any other necessary dependencies.
 
 
====Senlin Cluster====
 
#TBD
 
 
===Security Requirements===
 
#TBD
 
 
===Internet and Floating IPs===
 
#At a minimum, a pack of 10-12 real [[IPv4]] addresses would be required

#Each user-facing application shall use 1-2 [[IP]] addresses as per application and design requirements.
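
A quick sanity check of the address budget, as a Python sketch. It assumes that the user-facing applications are the five end-user applications named in this draft ([[BigBlueButton]], [[Moodle]], [[MediaWiki]], [[Odoo]] and [[Redmine]]); the function name is hypothetical.

```python
from typing import Tuple


def floating_ip_range(
    app_count: int, min_per_app: int = 1, max_per_app: int = 2
) -> Tuple[int, int]:
    """Return the smallest and largest number of addresses the apps may need."""
    return app_count * min_per_app, app_count * max_per_app


print(floating_ip_range(5))
# prints (5, 10)
```

Five applications need between 5 and 10 addresses, so a pack of 10-12 covers the worst case and leaves at most two addresses for anything else, such as a management console.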
 
 
===Networking Requirements===
 
#The [[GlusterFS]] and [[Swift]] clusters shall have two separate networks

#[[VLAN]] tagging if necessary

#Number of networks
 
#TBD
 
 
===Monitoring and troubleshooting requirements===
 
#What monitoring tools
 
#Where to install
 
#TBD
 
 
==Development==
 
The [[RFB]] has been posted and the following responses are collected so far:
 
#I will try to develop a [[proof of concept]] (PoC) based on the available and known requirements. This will help to understand the other requirements and then to develop a full scale private cloud.
 
#We would document a clear set of functional and non functional requirements clearly capturing the essence of what is needed. After that, we will document the architecture and design, implement and test followed lastly by acceptance and handover to you, the customer. However, in order to develop an accurate schedule and budget, we would need to run a [[due diligence]] exercise. This due diligence would include detailed analysis and documentation of your functional and non-functional requirements for the private cloud project. We have material and a methodology we use. The due diligence would take approximately 3 days and we would then hand over to you the following deliverables:
 
#*A detailed requirements specification
 
#*A project plan
 
#*A work package breakdown with effort and roles required
 
#*A risk register and mitigation plan.
 
#What are the performance and storage expectations of the applications that you want to run?
 
#On a very high level, these are the steps I'd follow based on the current information.
 
#*meeting with you to understand your specific needs in detail
 
#*assuming this is a private cloud deployment, we need to understand the size of the applications you'll need deployed in that cloud
 
#*with this you can determine the amount of physical servers needed depending on the number of instances, and virtual resources needed (calculation based on physical hardware resources)
 
#*analyze scalability and performance needs
 
#*decompose in stories/requirements that can be estimated and implemented by the team of your choice.
 
#Putting requirements together for your private cloud would require following inputs:
 
#*Are you a startup, or do you want to migrate to a private cloud?
 
#*What is the objective that you would like to achieve?
 
#*What is the size of the private cloud that you envisage?
 
 
==See also==
 
*[[Cloud lexicon]], a listing of links about computer terms
 

Latest revision as of 03:27, 15 November 2023