Installation Review
Note: This document is continuously updated and provides guidance for reviewing the installed environment. Always review the product documentation first: docs.openshift.com.
This document complements the official "Installing a cluster on any platform" documentation by covering specific configurations and components to review after the cluster has been installed.
It also supports the "OPCT - Installation Checklist" user document.
Compute
- Minimum requirements for compute nodes: User Documentation -> Prerequisites
Load Balancers
Review the Load Balancer requirements: Load balancing requirements for user-provisioned infrastructure
Review the Load Balancer size
The Load Balancer used by the API must support a throughput higher than 100Mbps.
Reference:
Review the private Load Balancer
A basic OpenShift installation with support for external Load Balancers deploys three Load Balancers: a public and a private one for the control plane services (Kubernetes API and Machine Config Server), and a public one for ingress.
The DNS record api-int.<cluster>.<domain> must point to the DNS name or IP address of the private Load Balancer. Internal services use this record to reach the API.
Reference: User-provisioned DNS requirements.
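A quick way to confirm the records is to resolve them from a host that uses the same DNS resolvers as the cluster nodes. The following is a minimal sketch, assuming Python 3; the helper names are hypothetical and the cluster name and base domain are placeholders you supply.

```python
import socket


def api_hostnames(cluster: str, domain: str) -> dict:
    """Build the API DNS names expected for a cluster named <cluster>.<domain>."""
    return {
        "public": f"api.{cluster}.{domain}",
        "internal": f"api-int.{cluster}.{domain}",
    }


def resolve(hostname: str) -> list:
    """Return the sorted unique IP addresses for hostname, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, 6443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, resolve(api_hostnames("mycluster", "example.com")["internal"]) should return only the address(es) of the private Load Balancer.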
Review Health Check configurations
The kube-apiserver implements graceful termination, which requires the Load Balancer health check to probe an HTTP path instead of only checking that the TCP port is open.
Service | Protocol | Port | Path | Threshold | Interval (s) | Timeout (s) |
---|---|---|---|---|---|---|
Kubernetes API Server | HTTPS* | 6443 | /readyz | 2 | 10 | 10 |
Machine Config Server | HTTPS* | 22623 | /healthz | 2 | 10 | 10 |
Ingress | TCP | 80 | - | 2 | 10 | 10 |
Ingress | TCP | 443 | - | 2 | 10 | 10 |
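To check the control plane rows of the table by hand, you can send the same requests a Load Balancer health check would send to each backend node. This is a minimal sketch, assuming Python 3 and network access to the control plane nodes; the helper names are hypothetical. Certificate verification is disabled here only because the endpoints typically present certificates signed by the cluster's internal CA.

```python
import http.client
import ssl
import urllib.request

# Control plane health checks from the table: (service, port, path).
HEALTH_CHECKS = [
    ("Kubernetes API Server", 6443, "/readyz"),
    ("Machine Config Server", 22623, "/healthz"),
]


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if an HTTPS GET to url answers HTTP 200 within timeout seconds."""
    context = ssl.create_default_context()
    context.check_hostname = False  # diagnostic probe only: skip certificate checks
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, TLS errors and HTTPError.
        return False


def probe_backend(host: str, timeout: float = 10.0) -> dict:
    """Probe every control plane health check path on one backend node."""
    return {
        service: probe(f"https://{host}:{port}{path}", timeout)
        for service, port, path in HEALTH_CHECKS
    }
```

For example, probe_backend("10.0.0.10") (a hypothetical node address) returns a mapping of each service to True or False.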
Reminder for the API Load Balancer Health Check:
"The load balancer must be configured to take a maximum of 30 seconds from the time the API server turns off the /readyz endpoint to the removal of the API server instance from the pool. Within the time frame after /readyz returns an error or becomes healthy, the endpoint must have been removed or added. Probing every 5 or 10 seconds, with two successful requests to become healthy and three to become unhealthy, are well-tested values." Load balancing requirements for user-provisioned infrastructure.
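The arithmetic behind the 30-second limit can be sketched with a simplified model, assuming probes are sent at a fixed interval and a backend changes state only after the configured number of consecutive contradicting results. Real Load Balancers may differ in detail, and the function names are hypothetical.

```python
def detection_seconds(interval: float, threshold: int, timeout: float = 0.0) -> float:
    """Worst-case seconds for the Load Balancer to change a backend's state.

    Assumes the endpoint changes state just after a probe was sent, so `threshold`
    later probes spaced `interval` seconds apart must observe the new state.
    `timeout` (assumed <= interval) is added when the last probe hangs instead
    of failing fast.
    """
    return threshold * interval + timeout


def simulate(results, healthy_threshold=2, unhealthy_threshold=3, healthy=True):
    """Replay probe results (True = success) and return the backend state after each probe."""
    streak = 0
    states = []
    for ok in results:
        # Count consecutive probes that contradict the current state.
        streak = streak + 1 if ok != healthy else 0
        needed = unhealthy_threshold if healthy else healthy_threshold
        if streak >= needed:
            healthy = not healthy
            streak = 0
        states.append(healthy)
    return states
```

With the values from the quote, probing every 10 seconds with three failures to become unhealthy gives a worst case of 30 seconds, right at the limit, while probing every 5 seconds gives 15 seconds.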
Review Hairpin Traffic
Hairpin traffic occurs when traffic originating from a backend node is load-balanced back to that same node. If your load balancer drops this type of traffic, you must provide a workaround.
On integrated cloud providers whose load balancers do not support hairpin traffic, OpenShift deploys a static pod that redirects traffic destined for the load balancer VIP back to the kube-apiserver on the local node.
For reference:
These are not recommendations; any solution you provide will not be supported by Red Hat.
- Static pods to redirect hairpin traffic for Azure
- Static pods to redirect hairpin traffic for AlibabaCloud
Steps to reproduce hairpin traffic to a node:
- Deploy a sample pod.
- Create a Service of type NodePort that selects the pod.
- Create a load balancer with a listener on any port (for example, 80).
- Create the backend/target group pointing to the node port.
- Add the node where the pod is running to the load balancer's backend/target group.
- From inside the pod, try to reach the load balancer IP/DNS name.
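The first two steps can be sketched as Kubernetes manifests generated from Python. This is a minimal sketch, not a supported procedure: the names, labels, node port 30080, and the container image are hypothetical choices, and any image that serves HTTP on port 8080 works.

```python
import json

# Hypothetical names used only for this reproduction.
APP_LABELS = {"app": "hairpin-test"}
NODE_PORT = 30080  # must fall inside the cluster's NodePort range (30000-32767 by default)


def sample_pod() -> dict:
    """Pod serving HTTP on port 8080 (the image is an assumption)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "hairpin-test", "labels": APP_LABELS},
        "spec": {
            "containers": [{
                "name": "web",
                "image": "registry.access.redhat.com/ubi9/python-311",  # assumption
                "command": ["python3", "-m", "http.server", "8080"],
                "ports": [{"containerPort": 8080}],
            }]
        },
    }


def node_port_service() -> dict:
    """NodePort Service selecting the sample pod."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "hairpin-test"},
        "spec": {
            "type": "NodePort",
            "selector": APP_LABELS,
            "ports": [{"port": 8080, "targetPort": 8080, "nodePort": NODE_PORT}],
        },
    }


def manifest_list() -> dict:
    """Wrap both objects in a v1 List so they can be applied in one command."""
    return {"apiVersion": "v1", "kind": "List", "items": [sample_pod(), node_port_service()]}


if __name__ == "__main__":
    print(json.dumps(manifest_list(), indent=2))
```

Saved as, for example, hairpin_manifests.py, the output can be applied with python3 hairpin_manifests.py | oc apply -f -. After pointing the load balancer's target group at NODE_PORT on the pod's node (oc get pod hairpin-test -o wide shows it), run oc exec hairpin-test -- python3 -c "import urllib.request as u; print(u.urlopen('http://<lb-address>/', timeout=5).status)". A timeout indicates that hairpin traffic is being dropped.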
Components
etcd
Review etcd's disk speed requirements:
- etcd: Hardware recommendations
- OpenShift Docs: Planning your environment according to object maximums
- OpenShift KCS: Backend Performance Requirements for OpenShift etcd
- IBM: Using Fio to Tell Whether Your Storage is Fast Enough for Etcd
Review disk performance with etcd-fio
The KCS "How to Use 'fio' to Check Etcd Disk Performance in OCP" explains how to check whether the disk used by etcd meets the expected performance on OpenShift.
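The criterion used by the etcd hardware recommendations and the IBM article above is that the 99th percentile of fdatasync latency on the WAL disk stays below 10ms. As a minimal sketch, assuming you already have individual fdatasync latency samples in milliseconds (collecting them is outside this sketch), the check reduces to a nearest-rank percentile. The helper names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # Integer arithmetic first avoids floating-point rounding in the rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def wal_fsync_ok(fsync_latencies_ms, limit_ms=10.0):
    """True if the 99th percentile fdatasync latency is below limit_ms."""
    return percentile(fsync_latencies_ms, 99) < limit_ms
```

A single slow outlier in 100 samples does not fail the check, but 10 slow samples out of 100 do, which is why a percentile is used instead of the maximum.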
Review etcd logs: etcd slow requests
This section explains how to check etcd slow requests in the etcd pod logs to understand how etcd performs while the e2e tests run.
The command opct adm parse-etcd-logs reads the logs, aggregates the requests, and displays the results in buckets of 100ms increments up to 1s.
opct adm parse-etcd-logs helps troubleshoot slow requests in the cluster and informs decisions such as changing the flavor of the block device used by the control plane, increasing IOPS, or changing the instance flavor.
See the command opct adm parse-etcd-logs for more information.
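To illustrate the kind of aggregation described above (this is not the opct implementation), here is a minimal Python sketch. It assumes etcd writes JSON log lines in which slow requests carry "msg":"apply request took too long" and a single-unit Go-style duration in the "took" field, as recent etcd releases do; the function names are hypothetical.

```python
import json
import re
from collections import Counter

# Single-unit Go-style duration as found in the "took" field, e.g. "231.39ms".
_DURATION = re.compile(r"^([0-9]*\.?[0-9]+)(ns|us|µs|ms|s)$")
_TO_MS = {"ns": 1e-6, "us": 1e-3, "µs": 1e-3, "ms": 1.0, "s": 1000.0}


def took_ms(line: str):
    """Return the "took" duration in ms from one JSON log line, or None."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict) or entry.get("msg") != "apply request took too long":
        return None
    match = _DURATION.match(str(entry.get("took", "")))
    if not match:
        return None  # compound durations such as "1m2s" are skipped in this sketch
    value, unit = match.groups()
    return float(value) * _TO_MS[unit]


def bucketize(lines, step_ms=100, max_ms=1000):
    """Count slow requests in step_ms-wide buckets, with one overflow bucket at max_ms."""
    counts = Counter()
    for line in lines:
        ms = took_ms(line)
        if ms is None:
            continue
        if ms >= max_ms:
            counts[f">={max_ms}ms"] += 1
        else:
            low = int(ms // step_ms) * step_ms
            counts[f"{low}-{low + step_ms}ms"] += 1
    return counts
```

For example, after saving the logs with oc logs -n openshift-etcd <etcd-pod> -c etcd > etcd.log, bucketize(open("etcd.log")) returns the count per 100ms bucket.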
Mount /var/lib/etcd on a separate disk
One way to improve etcd performance is to use a dedicated block device.
You can mount /var/lib/etcd on a dedicated disk by following the documentation:
Image Registry
Make sure you can access the image registry and push and pull images to and from it; otherwise, the e2e tests will be reported as failed.
Check the OpenShift documentation to validate it:
You can also deploy the OpenShift sample projects that create PVCs and BuildConfigs, which build images and push them to the image registry. For example:
oc new-app nodejs-postgresql-persistent
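Before testing push and pull, it can help to confirm that the registry API answers at all. Per the registry HTTP API v2, GET /v2/ returns 200 when authorized or 401 when authentication is required; both mean the registry is up. This is a minimal sketch that only checks reachability and does not replace the push/pull validation. It assumes Python 3 and an exposed registry route; the host name, for example default-route-openshift-image-registry.apps.<cluster>.<domain> when the default route is enabled, and the helper names are assumptions.

```python
import http.client
import ssl
import urllib.error
import urllib.request


def registry_status_ok(status: int) -> bool:
    """Per the registry HTTP API v2, /v2/ answers 200 (authorized) or 401 (auth required)."""
    return status in (200, 401)


def check_registry(host: str, timeout: float = 10.0) -> bool:
    """Return True if https://<host>/v2/ responds like a registry API endpoint."""
    context = ssl.create_default_context()
    context.check_hostname = False  # diagnostic only: skip certificate checks
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(f"https://{host}/v2/", timeout=timeout, context=context) as response:
            return registry_status_ok(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 401, which still proves the registry API is up.
        return registry_status_ok(err.code)
    except (OSError, http.client.HTTPException):
        return False
```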