Tag Archives: OpenShift

From the Ham Shack to the Edge: Delivering Real Value with Local LLMs

What happens when you combine an NVIDIA RTX 3060, an open-weight 14-billion parameter LLM, and a global network of amateur radio operators? You get a surprisingly perfect example of edge computing.

By day, I help enterprises scale complex architectures as a Red Hat OpenShift Technical Account Manager. But like many in our industry, my passion for problem-solving does not stop when I log off. A colleague recently suggested I stop keeping my side projects to myself and share what I build in my spare time. So, let me show you how I am using small, locally hosted AI to solve real-world problems right here in my amateur radio shack.

The Context: Open Source Digital Radio

Amateur Radio is currently transitioning from analogue to digital. Much of the digital voice landscape has historically been dominated by locked-down proprietary codecs. FreeDV changes that; It is a suite of digital voice modes built entirely on open-source software. No secrets, no proprietary lock-in; just a community free to experiment.

The Problem: Finding the Signal in the Noise

When using FreeDV, operators rely on live data APIs to see who is transmitting. While standard tools are great, I wanted a way to instantly identify opportunities to communicate with rare, distant stations (known as ‘DX’ in the community) without manually parsing through an hour of rolling logs.

The Solution: FreeDV Reporter+

I built a free, web-based tool called FreeDV Reporter+. It connects to the FreeDV Socket.IO API to map live activity, but the real magic is the AI integration.

Every hour, the application analyses the live data stream, condenses the logs, and generates a human-readable summary of the best opportunities on the bands.

Under the Hood: Lean AI on Consumer Hardware

The application is a lightweight Python Flask application, containerised then deployed using Podman.

Instead of relying on expensive cloud APIs, the AI summarisation runs locally on a consumer-grade NVIDIA RTX 3060 GPU, using the qwen3-14b model via the OpenWebUI API. This is a perfect example of how a relatively small (14b) parameter model can provide real, tangible value by assessing live data right at the source.

Scaling from the Shack to the Enterprise Edge

Processing data near the physical location of the user is the fundamental definition of Edge computing. While my setup is a grassroots project, the architectural principles apply directly to modern enterprise challenges.

If an organisation wanted to scale an architecture like FreeDV Reporter+ across thousands of locations (perhaps telecom base stations, logistics networks or electrical substations), they would need to deploy and manage these AI applications consistently. This is exactly where Red Hat OpenShift comes in.

OpenShift extends Kubernetes to the edge, allowing teams to manage remote deployments with the same operational consistency as their core data centres. By utilising Red Hat OpenShift AI, teams can serve and monitor right-sized models securely at the edge, reducing latency and preserving bandwidth.

Crucially, OpenShift AI incorporates vLLM to manage the costs and performance of inferencing. vLLM is an open-source framework that drastically increases inference throughput and optimises memory usage. When deploying models to constrained edge environments where compute resources are strictly limited, this level of efficiency is absolutely vital to keeping operations lean and responsive.

Furthermore, with tools like RHEL AI and InstructLab, developers can easily fine-tune models for more specialist purposes. You get enterprise-grade accuracy for specific tasks without the massive compute costs associated with general-purpose behemoths.

The Takeaway

You do not need massive data centres to extract real value from AI. By strategically deploying small, fine-tuned LLMs at the edge, you can deliver real-time insights securely and efficiently. Whether you are hunting for rare radio signals across the globe or optimising a production line, the edge is where AI truly gets to work.

If you are a radio amateur, I would love for you to try out FreeDV Reporter+. And if you are interested in running robust AI workloads at the edge, let us connect and talk about OpenShift!

What I Learnt Giving an LLM Agent Control of My Crypto Wallet

UkkoTrader Dashboard Header showing Net Loss from early trades

In my role as a Senior OpenShift Technical Account Manager at Red Hat, I focus on mission-critical stability; helping organisations navigate the shift from cloud-native architectures to AI-ready operations. But there is a distinct difference between advising on a scalable MLOps workflow and trusting a local LLM to trade your own capital in a volatile market.

Would you trust an AI agent with your bank account? I did; and it was a masterclass in ‘Boom or Bust’ logic.

Continue reading What I Learnt Giving an LLM Agent Control of My Crypto Wallet

From Zero to Openshift in 30 Minutes

Discover how to leverage the power of kcli and libvirt to rapidly deploy a full OpenShift cluster in under 30 minutes, cutting through the complexity often associated with OpenShift installations.  

Prerequisites

Server with 8+ cores, minimum of 64GB RAM (96+ for >1 worker node)
Fast IO
– dedicated NVMe libvirt storage or
– NVMe LVMCache fronting HDD (surprisingly effective!)
OS installed (tested with CentOS Stream 8)
Packages libvirt + git installed
Pull-secret (store in openshift_pull.json) obtained from https://cloud.redhat.com/openshift/install/pull-secret

Install kcli

[steve@shift ~]$ git clone https://github.com/karmab/kcli.git
[steve@shift ~]$ cd kcli; ./install.sh

Configure parameters.yml


(see https://kcli.readthedocs.io/en/latest/#deploying-kubernetes-openshift-clusters)

example:-

[steve@shift ~]$ cat parameters.yml
cluster: shift413
domain: shift.local
version: stable
tag: '4.13'
ctlplanes: 3
workers:3
ctlplane_memory:16384
worker_memory:16384
ctlplane_numcpus: 8
worker_numcpus: 4

Note 1: To deploy Single Node Openshift (SNO) set ctlplanes to 1 and workers to 0.

Note 2: Even a fast Xeon with NVMe storage may have difficulty deploying more than 3 workers before the installer times out.
An RFE exists to make the timeout configurable, see:

https://access.redhat.com/solutions/6379571
https://issues.redhat.com/browse/RFE-2512

Deploy cluster

[steve@shift ~]$ kcli create kube openshift --paramfile parameters.yml $cluster

Note: openshift_pull.json and parameters.yml should be in your current working directory, or adjust above as required

Monitor Progress

If you wish to monitor progress, find IP of bootsrap node:-

[steve@shift ~]$ virsh net-dhcp-leases default
 Expiry Time           MAC address         Protocol   IP address           Hostname            Client ID or DUID
---------------------------------------------------------------------------------------------------------------------
 2023-07-19 15:48:02   52:54:00:08:41:71   ipv4       192.168.122.103/24   ocp413-ctlplane-0   01:52:54:00:08:41:71
 2023-07-19 15:48:02   52:54:00:10:2a:9d   ipv4       192.168.122.100/24   ocp413-ctlplane-1   01:52:54:00:10:2a:9d
 2023-07-19 15:46:30   52:54:00:2b:98:2a   ipv4       192.168.122.211/24   ocp413-bootstrap    01:52:54:00:2b:98:2a
 2023-07-19 15:48:03   52:54:00:aa:d7:02   ipv4       192.168.122.48/24    ocp413-ctlplane-2   01:52:54:00:aa:d7:02

then ssh to bootstrap node as core user and follow instructions:-

[steve@shift ~]# ssh core@192.168.122.231
journalctl -b -f -u release-image.service -u bootkube.service
Once cluster is deployed you'll receive the following message:-

INFO Waiting up to 40m0s (until 3:42PM) for the cluster at https://api.ocp413.lab.local:6443 to initialize...
INFO Checking to see if there is a route at openshift-console/console...
INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/.kcli/clusters/ocp413/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp413.lab.local
INFO Login to the console with user: "kubeadmin", and password: "qTT5W-F5Cjz-BIPx2-KWXQx"
INFO Time elapsed: 16m18s                        
Deleting ocp413-bootstrap

Note: Whilst the above credentials can be found later, it’s worthwhile making a note of the above.  I save to a text file on the host.

Confirm Status

[root@shift ~]# export KUBECONFIG=/root/.kcli/clusters/ocp413/auth/kubeconfig
[root@lab ~]# oc status
In project default on server https://api.ocp413.lab.local:6443
svc/openshift - kubernetes.default.svc.cluster.local
svc/kubernetes - 172.30.0.1:443 -> 6443
View details with 'oc describe <resource>/<name>' or list resources with 'oc get all'.
[root@shift ~]# oc get nodes
NAME                          STATUS   ROLES                  AGE   VERSION
ocp413-ctlplane-0.lab.local   Ready    control-plane,master   68m   v1.26.5+7d22122
ocp413-ctlplane-1.lab.local   Ready    control-plane,master   68m   v1.26.5+7d22122
ocp413-ctlplane-2.lab.local   Ready    control-plane,master   68m   v1.26.5+7d22122
ocp413-worker-0.lab.local     Ready    worker                 51m   v1.26.5+7d22122
ocp413-worker-1.lab.local     Ready    worker                 51m   v1.26.5+7d22122
ocp413-worker-2.lab.local     Ready    worker                 52m   v1.26.5+7d22122
[root@shift ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.4    True        False         42m     Cluster version is 4.13.4

And logging in via https://console-openshift-console.apps.ocp413.lab.local/ 

Note: If the cluster is not installed on your workstation, it’s may be easier to install a browser on the server then forward X connections, rather than maintaining a local hosts file or modifying local DNS to catch and resolve local cluster queries:

ssh -X user@server 

Success \o/

For detailed kcli documentation see: https://kcli.readthedocs.io/en/latest/

Openshift: Recovery from Head Gear (or Node) Failure

This is another question that has been raised several times recently. Perhaps a node vanishes and is unrecoverable, how do we recover from the loss of a head gear? Is it possible to promote a normal gear to head status?

The simple answer appears to be … no.

The solution here is to run backups of /var/lib/openshift on all nodes.

In the case of node failure a fresh node can be built, added to the district, /var/lib/openshift restored from backup then a ‘oo-admin-regenerate-gear-metadata’ executed. This (as the name suggests) recreates metadata associated with all gears on the node. This includes gear entries in passwd/group files, cgroup rules and limits.conf.

OpenShift: Testing of Resource Limits, CGroups

Recently I’ve had two customers asking the same question.

How can we put sufficient load on a gear or node in order to demonstrate:-
a) cgroup limits
b) guaranteed resource allocation
c) ‘worst case scenario’ performance expectations

This is perhaps a reasonable question but very difficult to answer. Most of the limits in OSE are imposed by cgroups, mostly with clearly defined limits (as defined in the nodes /etc/openshift/resource_limits.conf). The two obvious exceptions are disk space (using quota) and CPU.

Whilst CPU is implemented by cgroups, this is defined in terms of shares; You can’t guarantee a gear x cpu cycles, only allocate a share and always in competition with other gears. However, by default a gear will only use one CPU core.

When trying to create a cartridge to demonstrate behavior under load, I quickly realised the openshift-watchman process is quick to throttle misbehaving gears. If during testing you see unexpected behaviour, remember to test with and without watchman running!

I took the DIY cartridge as an example and modified the start hook to start a ‘stress’ process. Environment variables can be set using rhc to specify number of CPU, VM, IO and HD threads. This cartridge does not create network load.

http://www.track3.org.uk/~steve/openshift/openshift-snetting-cartridge-stress-0.0.1-1.el6.x86_64.rpm

Collection and analysis of load/io data is left to the user.

Creating of a ‘stress’ application:-


[steve@broker ~]$ rhc app create snstress stress
Using snetting-stress-0.1 (StressTest 0.1) for 'stress'

Application Options
-------------------
Domain: steve
Cartridges: snetting-stress-0.1
Gear Size: default
Scaling: no

Creating application 'snstress' ... done

Disclaimer: Experimental cartridge to stress test a gear (CPU/IO).
Use top/iotop/vmstat/sar to demonstrate cgroup limits and watchman throttling.
STRESS_CPU_THREADS=1
STRESS_IO_THREADS=0
STRESS_VM_THREADS=0
STRESS_HD_THREADS=0
Note: To override these values use 'rhc env-set' and restart gear
See http://tinyurl.com/procgrr for Resource Management Guide
Stress testing started.

Waiting for your DNS name to be available ... done

Initialized empty Git repository in /home/steve/snstress/.git/

Your application 'snstress' is now available.

URL: http://snstress-steve.example.com/
SSH to: 55647297e3c9c34266000137@snstress-steve.example.com
Git remote: ssh://55647297e3c9c34266000137@snstress-steve.example.com/~/git/snstress.git/
Cloned to: /home/steve/snstress

Run 'rhc show-app snstress' for more details about your app.

‘top’ running on the target node (one core at 100% user):-


top - 14:19:49 up 5:50, 1 user, load average: 0.76, 0.26, 0.11
Tasks: 139 total, 3 running, 135 sleeping, 0 stopped, 1 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1019812k total, 474820k used, 544992k free, 75184k buffers
Swap: 835580k total, 440k used, 835140k free, 80596k cached

Using rhc we stop the application, define some variables (add IO worker threads) and restart:-


[steve@broker ~]$ rhc app stop snstress
RESULT:
snstress stopped
[steve@broker ~]$ rhc app-env STRESS_IO_THREADS=1 --app snstress
Setting environment variable(s) ... done
[steve@broker ~]$ rhc app-env STRESS_VM_THREADS=1 --app snstress
Setting environment variable(s) ... done
[steve@broker ~]$ rhc app-env STRESS_HD_THREADS=1 --app snstress
Setting environment variable(s) ... done
[steve@broker ~]$ rhc app start snstress
RESULT:
snstress started

Check node ‘top’ again (note multiple threads):-


top - 14:23:20 up 5:54, 1 user, load average: 0.53, 0.40, 0.20
Tasks: 142 total, 4 running, 137 sleeping, 0 stopped, 1 zombie
Cpu0 : 1.3%us, 0.3%sy, 0.0%ni, 97.7%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.7%us, 11.9%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 2.6%us, 7.3%sy, 0.0%ni, 86.8%id, 2.6%wa, 0.0%hi, 0.0%si, 0.7%st
Cpu3 : 6.6%us, 0.3%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 1019812k total, 636048k used, 383764k free, 64732k buffers
Swap: 835580k total, 692k used, 834888k free, 68716k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20637 4325 20 0 262m 198m 176 R 12.0 19.9 0:04.35 stress
20635 4325 20 0 6516 192 100 R 9.6 0.0 0:04.33 stress
20636 4325 20 0 6516 188 96 R 8.0 0.0 0:02.42 stress

Not what’s expected?


[root@node1 ~]# service openshift-watchman status
Watchman is running

Hmmm…


[root@node1 node]# tail -f /var/log/messages
May 26 15:33:55 node1 watchman[7672]: Throttler: throttle => 55647297e3c9c34266000137 (99.99)

… demonstrating watchman is doing its job! But, let’s stop watchman and let the abuse begin…


[root@node1 ~]# service openshift-watchman stop
Stopping Watchman

Top (notice high IO Wait)…


top - 14:26:46 up 5:57, 1 user, load average: 0.70, 0.41, 0.22
Tasks: 142 total, 4 running, 137 sleeping, 0 stopped, 1 zombie
Cpu0 : 0.0%us, 5.4%sy, 0.0%ni, 23.7%id, 69.5%wa, 0.3%hi, 0.3%si, 0.7%st
Cpu1 : 0.3%us, 6.0%sy, 0.0%ni, 64.2%id, 27.8%wa, 0.0%hi, 0.7%si, 1.0%st
Cpu2 : 12.2%us, 0.0%sy, 0.0%ni, 87.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.7%st
Cpu3 : 0.7%us, 11.3%sy, 0.0%ni, 76.4%id, 10.6%wa, 0.0%hi, 0.7%si, 0.3%st

Mem: 1019812k total, 910040k used, 109772k free, 66360k buffers
Swap: 835580k total, 692k used, 834888k free, 339780k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22182 4325 20 0 6516 192 100 R 12.3 0.0 0:00.70 stress
22184 4325 20 0 262m 226m 176 R 10.6 22.7 0:00.60 stress
22185 4325 20 0 7464 1264 152 R 7.3 0.1 0:00.53 stress

Further analysis can be done using vmstat, iotop, sar or your tool of preference.

If IO stops after a few seconds it’s also worth tailing your application log:-


[steve@broker ~]$ rhc tail snstress
[2015-05-26 14:25:34] INFO going to shutdown ...
[2015-05-26 14:25:34] INFO WEBrick::HTTPServer#start done.
stress: info: [21775] dispatching hogs: 1 cpu, 1 io, 1 vm, 1 hdd
[2015-05-26 14:25:35] INFO WEBrick 1.3.1
[2015-05-26 14:25:35] INFO ruby 1.8.7 (2013-06-27) [x86_64-linux]
[2015-05-26 14:25:36] INFO WEBrick::HTTPServer#start: pid=21773 port=8080
stress: FAIL: [21780] (591) write failed: Disk quota exceeded
stress: FAIL: [21775] (394) <-- worker 21780 returned error 1 stress: WARN: [21775] (396) now reaping child worker processes stress: FAIL: [21775] (400) kill error: No such process

I hope someone, somewhere, finds this useful :o)

OSE 2.x Support Node (MongoDB) Firewall

This is effectively a ‘reverse firewall’;  allow everything *except* connections to MongoDB.  A connection to Mongo without authentication can do little more than query the MongoDB db.version() however some still consider this a security risk.

#!/bin/bash -x
#
# Script to firewall Openshift Support (Mongo) Nodes
# 21/04/15 snetting

IPTABLES=/sbin/iptables

# Add all brokers and support nodes here (use FQDNs)
OSE_HOSTS="broker1.domain
broker2.domain
supportnode1.domain
supportnode2.domain"

# Convert to IPs and add localhost
MONGO_IPS=$(dig $OSE_HOSTS +short)
MONGO_IPS="$(echo $MONGO_IPS | tr ' ' ','),127.0.0.1"

# Add iptables ACCEPT rules
$IPTABLES -A INPUT -p tcp -s $MONGO_IPS --destination-port 27017 -j ACCEPT

# Add iptables REJECT (port 27017)
$IPTABLES -A INPUT -p tcp --destination-port 27017 -j REJECT