Installation Review

Note: This document is updated frequently and provides guidance for reviewing the installed environment. Always review the product documentation first: docs.openshift.com.

This document complements the official "Installing a cluster on any platform" page by covering specific configurations and components to review after the cluster has been installed.

This document is also a helper for the "OPCT - Installation Checklist" user document.

Compute
Load Balancers
Components
- etcd
- Image Registry

Compute

Minimum requirements for Compute nodes: see User Documentation -> Prerequisites (opct-user-guide#prerequisites).

Load Balancers

Review the Load Balancer requirements: Load balancing requirements for user-provisioned infrastructure

Review the Load Balancer size

The Load Balancer used by the API must support throughput higher than 100 Mbps.

Reference Load Balancer types and sizes by cloud provider:

AWS: NLB (Network Load Balancer)
Alibaba: slb.s2.small
Azure: Standard

Review the private Load Balancer

A basic OpenShift installation that uses external Load Balancers deploys three of them: one public and one private for the control plane services (Kubernetes API and Machine Config Server), and one public for ingress.

The DNS record api-int.<cluster>.<domain> must point to the DNS name or IP address of the private Load Balancer. Internal services use this record to reach the control plane.
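
As a quick check of the record described above, the following minimal Python sketch builds the expected record names and tries to resolve them. The function names and the sample cluster and domain values are hypothetical. The api.<cluster>.<domain> record is not mentioned in this section; it is included on the assumption that the cluster follows the standard user-provisioned DNS requirements.

```python
import socket


def control_plane_records(cluster: str, domain: str) -> dict:
    """Build the control plane DNS record names for a cluster.

    Only api-int is described in this document; the api record is an
    assumption based on the user-provisioned DNS requirements.
    """
    base = f"{cluster}.{domain}"
    return {
        "api": f"api.{base}",          # public API record (assumption)
        "api-int": f"api-int.{base}",  # internal record, private Load Balancer
    }


def resolve(name: str) -> list:
    """Return the sorted unique addresses for a name, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

For example, calling resolve(control_plane_records("mycluster", "example.com")["api-int"]) from a node or pod inside the cluster network should return the private Load Balancer address; an empty list suggests the record is missing or not visible from that network.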

Reference: User-provisioned DNS requirements.

Review Health Check configurations

The kube-apiserver has a graceful termination mechanism that requires the Load Balancer health check to probe the HTTP path listed below.

Service	Protocol	Port	Path	Threshold	Interval (s)	Timeout (s)
Kubernetes API Server	HTTPS*	6443	/readyz	2	10	10
Machine Config Server	HTTPS*	22623	/healthz	2	10	10
Ingress	TCP	80	-	2	10	10
Ingress	TCP	443	-	2	10	10

Reminder for the API Load Balancer Health Check:

"The load balancer must be configured to take a maximum of 30 seconds from the time the API server turns off the /readyz endpoint to the removal of the API server instance from the pool. Within the time frame after /readyz returns an error or becomes healthy, the endpoint must have been removed or added. Probing every 5 or 10 seconds, with two successful requests to become healthy and three to become unhealthy, are well-tested values." Load balancing requirements for user-provisioned infrastructure.
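
The 30-second limit quoted above can be checked against a candidate health check configuration with a small calculation. The sketch below uses a simplified model that is an assumption, not a statement of how any particular load balancer behaves: the first failing probe may land up to one full interval after /readyz starts failing, and each additional failure needed to reach the unhealthy threshold adds one more interval. Probe timeouts and provider-specific deregistration delays are ignored. The function names are hypothetical.

```python
def worst_case_removal_seconds(interval_s: float, unhealthy_threshold: int) -> float:
    """Approximate worst-case time from /readyz failing to backend removal.

    Simplified model (assumption): one interval per failed probe needed,
    counting the wait for the first failing probe as a full interval.
    """
    if interval_s <= 0 or unhealthy_threshold < 1:
        raise ValueError("interval must be positive and threshold at least 1")
    return interval_s * unhealthy_threshold


def within_limit(interval_s: float, unhealthy_threshold: int, limit_s: float = 30.0) -> bool:
    """Check the configuration against the 30-second maximum from the quote above."""
    return worst_case_removal_seconds(interval_s, unhealthy_threshold) <= limit_s


# Values mentioned in the quote: probing every 5 or 10 seconds, three failures.
print(within_limit(5, 3))
print(within_limit(10, 3))
```

Under this model, a 10-second interval with three failures sits exactly at the limit, so a load balancer that adds its own delay before removing the instance may need a shorter interval.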

Review Hairpin Traffic

Hairpin traffic is when a backend node's traffic is load-balanced to itself. If this type of network traffic is dropped because your load balancer does not allow hairpin traffic, you need to provide a solution.

On integrated cloud platforms that do not support hairpin traffic, OpenShift deploys a static pod that redirects traffic destined for the load balancer VIP back to the kube-apiserver on the node.

For Reference:

This is not a recommendation; any solution you provide will not be supported by Red Hat.

Steps to reproduce hairpin traffic to a node:

- Deploy one sample pod.
- Add a service with a node port.
- Create the load balancer with a listener on any port (for example, 80).
- Create the backend/target group pointing to the node port.
- Add the node on which the pod is running to the load balancer backend/target group.
- From inside the pod, try to reach the load balancer IP/DNS.
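
The last step can be sketched as a plain TCP connection test run from inside the sample pod. This is a minimal sketch that assumes the pod image ships a Python interpreter; the function name and the address in the usage note are hypothetical placeholders.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_reach("lb.example.com", 80) run from the pod on the registered node returns False (typically after the timeout) when the load balancer drops hairpin traffic. Because the load balancer may also pick a different backend node, repeating the test several times, or registering only the pod's node, gives a more reliable signal.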

Components

etcd

Review etcd's disk speed requirements:

Review disk performance with etcd-fio

The KCS "How to Use 'fio' to Check Etcd Disk Performance in OCP" is a guide to check if the disk used by etcd has the expected performance on OpenShift.

Review etcd logs: etcd slow requests

This section explains how to check etcd slow requests in the etcd pod logs to understand how etcd performs while the e2e tests run.

The command opct adm parse-etcd-logs reads the logs, aggregates the requests and displays results in buckets of 100ms increments up to 1s.
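
To illustrate the kind of aggregation described above, the following Python sketch groups request durations into 100 ms buckets, with everything at or above 1 s in a single overflow bucket. It is not the implementation used by opct adm parse-etcd-logs: the function name, the input format (durations already extracted as milliseconds), and the exact bucket boundaries are assumptions made for illustration.

```python
from collections import Counter


def bucket_slow_requests(durations_ms, width_ms=100, cap_ms=1000):
    """Count durations per bucket, keyed by the bucket's lower bound in ms.

    Durations at or above cap_ms share one overflow bucket keyed by cap_ms.
    """
    buckets = Counter()
    for ms in durations_ms:
        if ms < 0:
            raise ValueError("durations must be non-negative")
        lower = min(int(ms // width_ms) * width_ms, cap_ms)
        buckets[lower] += 1
    return dict(sorted(buckets.items()))


# Hypothetical sample durations in milliseconds.
sample = [120, 180.5, 250, 999, 1000, 1500]
print(bucket_slow_requests(sample))
```

A distribution that concentrates in the higher buckets points to slower storage or an overloaded control plane, which is the kind of signal the real command is meant to surface.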

opct adm parse-etcd-logs is a utility that helps troubleshoot slow requests in the cluster and supports decisions such as changing the flavor of the block device used by the control plane, increasing IOPS, or changing the flavor of the instances.

See the command opct adm parse-etcd-logs for more information.

Mount /var/lib/etcd on a separate disk

One way to improve etcd performance is to use a dedicated block device.

You can mount /var/lib/etcd by following the documentation:

Image Registry

Make sure you can access the registry and can push and pull images; otherwise, the e2e tests will be reported as failed.

Please check the OpenShift documentation to validate it:

You can also explore the OpenShift sample projects that create PVCs and BuildConfigs (which result in images being built and pushed to the image registry). For example:

oc new-app nodejs-postgresql-persistent