All posts by Steven Netting

From Zero to OpenShift in 30 Minutes

Discover how to leverage the power of kcli and libvirt to rapidly deploy a full OpenShift cluster in under 30 minutes, cutting through the complexity often associated with OpenShift installations.  

Prerequisites

Server with 8+ cores, minimum of 64GB RAM (96+ for >1 worker node)
Fast IO
– dedicated NVMe libvirt storage or
– NVMe LVMCache fronting HDD (surprisingly effective!)
OS installed (tested with CentOS Stream 8)
Packages libvirt + git installed
Pull-secret (store in openshift_pull.json) obtained from https://cloud.redhat.com/openshift/install/pull-secret
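
A quick sanity check of the above on the target host might look like this (a rough sketch; the thresholds simply mirror the list above):

nproc                                      # want 8+
free -g | awk '/^Mem:/ {print $2 " GB"}'   # want 64+ (96+ for more than one worker)
rpm -q libvirt git                         # or: sudo dnf install -y libvirt git
ls -l openshift_pull.json                  # pull secret saved from cloud.redhat.com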

Install kcli

[steve@shift ~]$ git clone https://github.com/karmab/kcli.git
[steve@shift ~]$ cd kcli; ./install.sh

Configure parameters.yml


(see https://kcli.readthedocs.io/en/latest/#deploying-kubernetes-openshift-clusters)

example:-

[steve@shift ~]$ cat parameters.yml
cluster: shift413
domain: shift.local
version: stable
tag: '4.13'
ctlplanes: 3
workers: 3
ctlplane_memory: 16384
worker_memory: 16384
ctlplane_numcpus: 8
worker_numcpus: 4

Note 1: To deploy Single Node OpenShift (SNO), set ctlplanes to 1 and workers to 0.
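
For reference, a minimal SNO variant of parameters.yml might look like the following (cluster name and sizing are illustrative; the memory/CPU figures just reuse the example above):

cat > parameters.yml <<'EOF'
cluster: sno413
domain: shift.local
version: stable
tag: '4.13'
ctlplanes: 1
workers: 0
ctlplane_memory: 16384
ctlplane_numcpus: 8
EOF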

Note 2: Even a fast Xeon with NVMe storage may have difficulty deploying more than 3 workers before the installer times out.
An RFE exists to make the timeout configurable, see:

https://access.redhat.com/solutions/6379571
https://issues.redhat.com/browse/RFE-2512

Deploy cluster

[steve@shift ~]$ kcli create kube openshift --paramfile parameters.yml $cluster

Note: openshift_pull.json and parameters.yml should be in your current working directory; adjust the command above as required.
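
To confirm VMs are actually being created while the deploy runs, the hypervisor can be checked directly (the kcli listing subcommand is from memory, so verify against kcli -h if it complains):

kcli list vm
virsh list --all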

Monitor Progress

If you wish to monitor progress, find the IP of the bootstrap node:-

[steve@shift ~]$ virsh net-dhcp-leases default
 Expiry Time           MAC address         Protocol   IP address           Hostname            Client ID or DUID
---------------------------------------------------------------------------------------------------------------------
 2023-07-19 15:48:02   52:54:00:08:41:71   ipv4       192.168.122.103/24   ocp413-ctlplane-0   01:52:54:00:08:41:71
 2023-07-19 15:48:02   52:54:00:10:2a:9d   ipv4       192.168.122.100/24   ocp413-ctlplane-1   01:52:54:00:10:2a:9d
 2023-07-19 15:46:30   52:54:00:2b:98:2a   ipv4       192.168.122.211/24   ocp413-bootstrap    01:52:54:00:2b:98:2a
 2023-07-19 15:48:03   52:54:00:aa:d7:02   ipv4       192.168.122.48/24    ocp413-ctlplane-2   01:52:54:00:aa:d7:02

then ssh to the bootstrap node as the core user and follow the on-login instructions:-

[steve@shift ~]# ssh core@192.168.122.211
journalctl -b -f -u release-image.service -u bootkube.service
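
Alternatively, once the API server is answering, progress can also be followed from the host itself using the kubeconfig the installer writes (path as reported in the output below):

export KUBECONFIG=/root/.kcli/clusters/ocp413/auth/kubeconfig
watch oc get clusteroperators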
Once the cluster is deployed you'll see the following message:-

INFO Waiting up to 40m0s (until 3:42PM) for the cluster at https://api.ocp413.lab.local:6443 to initialize...
INFO Checking to see if there is a route at openshift-console/console...
INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/.kcli/clusters/ocp413/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp413.lab.local
INFO Login to the console with user: "kubeadmin", and password: "qTT5W-F5Cjz-BIPx2-KWXQx"
INFO Time elapsed: 16m18s                        
Deleting ocp413-bootstrap

Note: Whilst the above credentials can be retrieved later, it's worth noting them down now; I save them to a text file on the host.
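
If you do lose them, the kubeconfig path is the one printed by the installer, and a kubeadmin-password file normally sits alongside it (assumed here to follow the standard openshift-install auth layout):

ls /root/.kcli/clusters/ocp413/auth/
cat /root/.kcli/clusters/ocp413/auth/kubeadmin-password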

Confirm Status

[root@shift ~]# export KUBECONFIG=/root/.kcli/clusters/ocp413/auth/kubeconfig
[root@lab ~]# oc status
In project default on server https://api.ocp413.lab.local:6443
svc/openshift - kubernetes.default.svc.cluster.local
svc/kubernetes - 172.30.0.1:443 -> 6443
View details with 'oc describe <resource>/<name>' or list resources with 'oc get all'.
[root@shift ~]# oc get nodes
NAME                          STATUS   ROLES                  AGE   VERSION
ocp413-ctlplane-0.lab.local   Ready    control-plane,master   68m   v1.26.5+7d22122
ocp413-ctlplane-1.lab.local   Ready    control-plane,master   68m   v1.26.5+7d22122
ocp413-ctlplane-2.lab.local   Ready    control-plane,master   68m   v1.26.5+7d22122
ocp413-worker-0.lab.local     Ready    worker                 51m   v1.26.5+7d22122
ocp413-worker-1.lab.local     Ready    worker                 51m   v1.26.5+7d22122
ocp413-worker-2.lab.local     Ready    worker                 52m   v1.26.5+7d22122
[root@shift ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.4    True        False         42m     Cluster version is 4.13.4

And finally, log in via https://console-openshift-console.apps.ocp413.lab.local/

Note: If the cluster is not installed on your workstation, it may be easier to install a browser on the server and forward X connections, rather than maintaining a local hosts file or modifying local DNS to resolve the cluster's hostnames:

ssh -X user@server 
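
The hosts-file route works too, if preferred; something like the following, with the placeholder VIPs replaced by the addresses kcli assigned (they may be the same address, and the console/oauth hostnames below assume the default route names):

cat >> /etc/hosts <<'EOF'
<api_vip>      api.ocp413.lab.local
<ingress_vip>  console-openshift-console.apps.ocp413.lab.local
<ingress_vip>  oauth-openshift.apps.ocp413.lab.local
EOF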

Success \o/

For detailed kcli documentation see: https://kcli.readthedocs.io/en/latest/

OpenShift: How to determine the latest version in an update channel

There are a few ways to find the latest release in a given channel:

  1. Visit the releases page at https://console.redhat.com/openshift/releases
  2. Visit the Red Hat OpenShift Container Update Graph at https://access.redhat.com/labs/ocpupgradegraph/update_channel
  3. Use the CLI (curl & jq):-
curl -s https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.11 | jq -r '.nodes[].version' | sort -V | tail -n1

Also, to check available upgrade edges:-

curl -s -XGET "https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.11" --header 'Accept:application/json' |jq '. as $graph | $graph.nodes | map(.version == "4.10.36") | index(true) as $orig | $graph.edges | map(select(.[0] == $orig)[1]) | map($graph.nodes[.]) | .[].version'
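
If a running cluster is already to hand, roughly the same information is available without jq, since oc queries the update service for the cluster's current channel:

oc adm upgrade    # shows the current version, channel and recommended updates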

Further examples can be found at https://access.redhat.com/solutions/4583231

Stable Diffusion on CPU

Example image: CPU Rendered Cat, 512×512

AKA “Stable Diffusion without a GPU” 🙂

Currently, the ‘Use CPU if no CUDA device detected’ [1] pull request has not been merged. Following the instructions at [2] and jumping down the dependency rabbit hole, I finally have Stable Diffusion running on an old dual-Xeon server.

[1] https://github.com/CompVis/stable-diffusion/pull/132
[2] https://github.com/CompVis/stable-diffusion/issues/219

Server specs:-
Fujitsu R290 Xeon Workstation
Dual Intel(R) Xeon(R) CPU E5-2670 @ 2.60GHz
96 GB RAM
SSD storage

Sample command line:-

time python3.8 scripts/txt2img.py --prompt "AI image creation using CPU" --plms --ckpt sd-v1-4.ckpt --H 768 --W 512 --skip_grid --n_samples 1 --n_rows 1 --ddim_steps 50

Initial tests show the following:-

Resolution     Steps   Peak RAM (GB)   Time (minutes)
768 x 512      50      ~10             15
1024 x 768     50      ~30             24
1280 x 1024    50      ~65             66
1536 x 1280    50      OOM             N/A

Resolution, Peak RAM and Time to Render

Notes:
1) Typically only 18 of the 32 cores were active, regardless of render size (see the thread-count sketch below).
2) As expected, the calculation is entirely CPU bound.
3) For an unknown reason, even with --n_samples and --n_rows set to 1, two images were still created (the table above shows the halved, single-image time).
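
Regarding note 1, it may be worth experimenting with the OpenMP/MKL thread-count variables before launching, since PyTorch sizes its intra-op thread pool from them (untested on this particular box, so treat it as an experiment):

export OMP_NUM_THREADS=32
export MKL_NUM_THREADS=32
time python3.8 scripts/txt2img.py --prompt "AI image creation using CPU" --plms --ckpt sd-v1-4.ckpt --H 768 --W 512 --skip_grid --n_samples 1 --n_rows 1 --ddim_steps 50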

Another CPU Rendered Cat 512×512

Conclusion:

It works. We gain resolution at the huge expense of memory and time.

AmigaOS4.1 (PPC) under FS-UAE and QEMU

I recently purchased AmigaOS 4.1 with a plan to familiarise myself with the OS via emulation before purchasing the Freescale QorIQ P1022 e500v2 ‘Tabor’ motherboard. In particular, I wanted to investigate the ssh and X display options, including AmiCygnix.

OS4.1 running under FS-UAE & QEMU, showing config and network status

However, despite being familiar with OS3.1 and FS-UAE I still managed to hit a few gotchas with the OS4 install and configuration.

Installation of the QEMU module was straightforward using the download and instructions from: https://fs-uae.net/download#plugins. In my case this was version 3.8.2qemu2.2.0, installed in ~/Documents/FS-UAE/Plugins/QEMU-UAE/Linux/x86-64/ (your path may vary).

I then tried multiple FS-UAE configurations in order to get the emulated machine to boot with PPC, RTG and network support. A few options clash, resulting in a purple screen on boot. Rather than work through the process from scratch, it's easier to simply list my working config here:-

[fs-uae]
accelerator = cyberstorm-ppc
amiga_model = A4000/OS4
gfx_card_flash_file = /home/snetting/Documents/FS-UAE/Kickstarts/picasso_iv_flash.rom
graphics_card = picasso-iv
graphics_memory = 131072
hard_drive_0 = /home/snetting/Amiga/SteveOS41.hdf
kickstart_file = Kickstart v3.1 rev 40.70 (1993)(Commodore)(A4000).rom
network_card = a2065
zorro_iii_memory = 524288

I used FS-UAE (and FS-UAE-Launcher) version 2.8.3.

Things to note:

  1. See http://eab.abime.net/showthread.php?t=75195 for install advice regarding disk partitioning and FS type. This is important!
  2. Shared folders (between host OS and Emulation) are *not* currently supported when using PPC under FS-UAE. Post install, many additional packages were required, including network drivers which resulted in a catch-22 situation. I worked around this by installing a 3.1.4 instance and mounting both the OS4 and ‘shared’ drives here, copying the required files over then booting back into the OS4 PPC environment.
  3. For networking, the UAE bsdsocket.library emulation should be disabled but the A2065 network card enabled. The correct driver from Aminet is: http://aminet.net/package/driver/net/Ethernet
  4. The latest updates to OS4.1 (final) enable Zorro III RAM to be used in addition to accelerator RAM; essential for AmiCygnix. Once OS4.1 is installed and network configured, use the included update tool to pull OS4.1 FE updates.

The documentation at http://eab.abime.net/showthread.php?t=75195 is definitely useful as a reference but don’t rely on it; it’s dated (2014) and not necessarily accurate.

Whilst I’ve written this from memory, I’ll happily recreate my install from scratch if anyone has any specific questions or issues.

Good luck!

ROMs are available from Cloanto: https://www.amigaforever.com/
OS4.1 and updates from Hyperion: https://www.amigaos.net/

Atari ST and Amiga Desktop Wallpapers

I couldn't find any good quality 1920×1080 ('full HD') desktop wallpapers featuring either Atari ST GEM or Commodore Amiga Workbench 1.3. So I assembled them myself from parts of various images found on Google, scaled with the correct aspect ratio maintained, tidied up to fill the full resolution, and saved without JPEG compression artifacts. Here we are:-

Atari GEM Desktop, 1920×1080 PNG
Commodore Amiga Workbench 1.3 + Boing Ball, 1920×1080 PNG

You’re welcome 🙂

Building qtel (Echolink client) under Fedora Linux

Given both my previous bad experience building qtel (the Linux EchoLink client) and recent forum discussions around similar difficulties, I thought I'd identify, resolve and document the issues.

I’m not sure what’s changed but the process is now very simple (Fedora 28):-

git clone https://github.com/sm0svx/svxlink.git
cd svxlink/
cd src
sudo dnf install cmake libsigc++20-devel qt-devel popt-devel libgcrypt-devel gsm-devel tcl-devel
cmake .
make
cp bin/qtel DESTINATION_PATH_OF_CHOICE

Depending on the libraries already installed, additional packages may be required, as indicated by failures during the 'cmake' stage.
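
If cmake does complain about a missing header or library, dnf can usually map the file back to a package; the header path here is purely illustrative:

sudo dnf provides '*/sigc++.h'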

GlusterFS / VG Metrics in Prometheus (OCP)

We had a requirement to gather LVM (VG) metrics via Prometheus, to alert when GlusterFS is running low on 'brick' storage space. Within OpenShift 3.9 the only storage metrics available appear to relate to mounted filesystems. A 'heketi exporter module' exists, but this only reports space within allocated blocks; there doesn't appear to be any way to pull metrics from the underlying storage.

We solved this by using a Prometheus pushgateway. Metrics are pushed from Gluster hosts using curl (via cron) and then pulled using a standard Prometheus scrape configuration (via prometheus configmap in OCP). Alerts are then pushed via alertmanager and eventually Cloudforms.

Import the pushgateway image:

oc import-image openshift/prom-pushgateway --from=docker.io/prom/pushgateway --confirm
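
A minimal sketch of the next step (resource names are illustrative; oc new-app searches the current and openshift projects for a matching imagestream):

oc new-app prom-pushgateway --name=pushgateway
oc expose svc/pushgateway --hostname=pushgateway-route.example.com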

With the pushgateway pod running and a route exposed, add a scrape config to the prometheus configmap:-


- job_name: openshift-pushgateway
  scrape_interval: 30s
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - pushgateway-route.example.com

On GlusterFS hosts we then gather metrics in whatever way we like and push to the gateway via curl. Example:-

echo "VG_Gluster 42" | curl --data-binary @- http://pushgateway-route.example.com/metrics/job/pv_mon/pv/vg"

The metrics are then visible via the Prometheus UI / Grafana, with alerts delivered via Alertmanager and CFME respectively.

Gnome 3 (Fedora 27) Screen Lock Timeout

Gnome 3 doesn’t appear to offer any GUI control over the screen lock timeout.

So, to get current values:-

[snetting@lapper ~]$ gsettings get org.gnome.desktop.session idle-delay
uint32 300
[snetting@lapper ~]$ gsettings get org.gnome.desktop.screensaver lock-delay
uint32 0

And to set:-

[snetting@lapper ~]$ gsettings set org.gnome.desktop.session idle-delay 600

idle-delay is the time taken to blank the screen.
lock-delay is an additional delay before locking.
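
Two related keys that may also be of interest (gsettings list-keys org.gnome.desktop.screensaver shows the full set):

gsettings set org.gnome.desktop.screensaver lock-enabled false   # disable locking entirely
gsettings set org.gnome.desktop.session idle-delay 0             # never blank the screen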

OSEv3 Node Utilisation

A quick and dirty script to query all nodes for utilisation data:-


#!/bin/bash

# Print a header, then one line per node: name, state, pod count and allocated resources
printf "%-12s %-25s %-4s %-15s %-20s %-18s %-8s \n" "NODE" "STATE" "PODS" "CPU Req" "CPU Lim" "Memory Req" "Memory Lim"

# 'grep user' filters on a site-specific node label; adjust to suit your environment
oc get nodes --show-labels | grep user | while read NODE STAT stuff
do
  # Short node name, state and the number of pods scheduled on the node
  printf "%-12s %-25s %-5s " $(echo $NODE | cut -f1 -d. ) $STAT $(oadm manage-node --list-pods $NODE 2> /dev/null | sed '/^NAME.*/d' | wc -l)
  # CPU/memory requests and limits as reported by 'oc describe node'
  printf "%-7s %-7s %-7s %-12s %-12s %-5s %-12s %-5s\n" $(oc describe node $NODE | grep -a2 "CPU Requests" | tail -1)
done

Docker Panic, CMD and Hashbang

I noticed an issue when rebuilding a Dockerfile and running the resulting image:-

panic: standard_init_linux.go:178: exec user process caused "exec format error" [recovered]
	panic: standard_init_linux.go:178: exec user process caused "exec format error"

goroutine 1 [running, locked to thread]:
panic(0x6f3080, 0xc4201393b0)

After much digging, I identified that when specifying a script as the CMD in a Dockerfile, the script now requires a proper hashbang (aka shebang) or the above panic results.

#!/bin/bash
rm -rf /run/httpd/* /tmp/httpd*
exec /usr/sbin/apachectl -DFOREGROUND

Rebuilding the Docker image with the --no-cache option ensures the updated file is included.
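
For completeness, the rebuild and test might look like this (image name and port mapping are illustrative; the container serves httpd on port 80 by default):

docker build --no-cache -t httpd-test .
docker run --rm -p 8080:80 httpd-test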