Category Archives: linux

Fedora 42 Meets CUDA 12.9: The Quest to Build vllm (InstructLab)

Over the past couple of weeks I’ve been wrestling with building vllm (with CUDA support) under Fedora 42. Here’s the short version of what went wrong:-

  1. Python version confusion
    • My virtualenv was pointing at Python 3.11 but CMake kept complaining it couldn’t find “python3.11.”
    • Fix: explicitly passed -DPYTHON_EXECUTABLE=$(which python) to CMake, which got past the Python lookup errors.
  2. CUDA toolkit headers/libs not found
    • Although Fedora’s CUDA 12.9 RPMs were installed, CMake couldn’t locate CUDA_INCLUDE_DIRS or CUDA_CUDART_LIBRARY.
    • Fix: set CUDA_HOME=/usr/local/cuda-12.9 and passed -DCUDA_TOOLKIT_ROOT_DIR & -DCUDA_SDK_ROOT_DIR to CMake.
  3. cuDNN import errors
    • Pip’s PyTorch import of libcudnn.so.9 failed during the vllm build.
    • Fix: reinstalled torch via the official PyTorch cu121 wheel index so that all the nvidia-cudnn-cu12 wheels were in place.
  4. GCC / Clang version mismatches
    • CUDA 12.9’s nvcc choked on GCC 15 (“unsupported GNU version”) and later on Clang 20.
    • Tried installing gcc-14 and symlinking it into PATH, exporting CC=/usr/bin/gcc-14 / CXX=/usr/bin/g++-14, and even passing -DCMAKE_CUDA_HOST_COMPILER, but CMake's CUDA-ID test was still failing on the Fedora header mismatch.
    • Ultimately we switched to Clang 20 with --allow-unsupported-compiler, which let us get past the version "block" (the flags we converged on are collected in a sketch after this list).
  5. Math header noexcept conflicts
    • CMake’s nvcc identification build then ran into four “expected a declaration” errors in CUDA’s math_functions.h, caused by mismatched noexcept(true) on sinpi/cospi vs system headers.
    • I patched those lines (removing or adding, I forget which, the trailing noexcept(true)) so cudafe++ could preprocess happily.
  6. Missing NVToolsExt library
    • After all that, CMake could find CUDA and compile, but then hit: The link interface of target "torch::nvtoolsext" contains: CUDA::nvToolsExt but the target was not found.
    • Looking under /usr/local/cuda-12.9, there was no libnvToolsExt.so* at all; only the NVTX-3 interop helper (libnvtx3interop.so*) lived under the extracted toolkit tree.
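For reference, here is roughly where the environment and CMake variables from steps 1-4 ended up, collected in one place. Treat this as a sketch rather than the verified invocation: the real build is driven by pip/setup.py, and the paths and compiler choices are specific to my machine.

# Assumptions: CUDA RPMs under /usr/local/cuda-12.9, Clang 20 as the host compiler
export CUDA_HOME=/usr/local/cuda-12.9
export PATH=$CUDA_HOME/bin:$PATH
export CC=/usr/bin/clang
export CXX=/usr/bin/clang++
cmake -S . -B build \
  -DPYTHON_EXECUTABLE=$(which python) \
  -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME \
  -DCUDA_SDK_ROOT_DIR=$CUDA_HOME \
  -DCMAKE_CUDA_HOST_COMPILER=$CXX \
  -DCMAKE_CUDA_FLAGS=--allow-unsupported-compiler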

Current hurdle
I still don’t have the core NVTX library (libnvToolsExt.so.*) in /usr/local/cuda-12.9/…/lib, so the CMake target CUDA::nvToolsExt remains unavailable. The library appears to be missing from both the Fedora cuda-nvtx package and the NVIDIA toolkit download/runfile, which seems to be a known issue with recent CUDA releases.
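A quick way to confirm what the toolkit actually provides (on this box it returned only the nvtx3 interop helper, matching the above):

find /usr/local/cuda-12.9 /usr/lib64 -name 'libnvToolsExt*' -o -name 'libnvtx3*' 2>/dev/null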

Work continues and a full process will be documented, once successful.

Stable Diffusion on cpu

CPU Rendered Cat 512×512

AKA “Stable Diffusion without a GPU” 🙂

Currently, the ‘Use CPU if no CUDA device detected’ [1] pull request has not been merged. Following the instructions at [2] and jumping down the dependency rabbit hole, I finally have Stable Diffusion running on an old dual Xeon server.

[1] https://github.com/CompVis/stable-diffusion/pull/132
[2] https://github.com/CompVis/stable-diffusion/issues/219

Server specs:-
Fujitsu R290 Xeon Workstation
Dual Intel(R) Xeon(R) CPU E5-2670 @ 2.60GHz
96 GB RAM
SSD storage

Sample command line:-

time python3.8 scripts/txt2img.py --prompt "AI image creation using CPU" --plms --ckpt sd-v1-4.ckpt --H 768 --W 512 --skip_grid --n_samples 1 --n_rows 1 --ddim_steps 50

Initial tests show the following:-

Resolution     Steps   RAM (GB)   Time (minutes)
768 x 512      50      ~10        15
1024 x 768     50      ~30        24
1280 x 1024    50      ~65        66
1536 x 1280    50      OOM        N/A

Resolution, Peak RAM and Time to Render

Notes:
1) Typically only 18 (of the 32 logical cores) were active, regardless of render size; see the thread-count sketch after these notes.
2) As expected, the calculation is entirely CPU bound.
3) For an unknown reason, even with --n_samples and --n_rows set to 1, two images were still created (the time was halved for a single image in the table above). If memory serves, txt2img.py’s --n_iter option defaults to 2, which would explain it.
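If you want to experiment with pushing the core usage higher: the renderer is PyTorch-based, so the OpenMP/MKL thread counts can be pinned before running the same txt2img.py command as above. I haven’t benchmarked whether this actually improves the times here; treat it as a sketch.

export OMP_NUM_THREADS=32   # logical cores on this server
export MKL_NUM_THREADS=32
# then re-run the txt2img.py command shown earlier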

Another CPU Rendered Cat 512×512

Conclusion:

It works. We gain resolution at the huge expense of memory and time.

AmigaOS4.1 (PPC) under FS-UAE and QEMU

I recently purchased AmigaOS 4.1 with a plan to familiarise myself with the OS via emulation before purchasing the Freescale QorIQ P1022 e500v2 ‘Tabor’ motherboard. In particular, I wanted to investigate the ssh and X display options, including AmiCygnix.

OS4.1 running under FS-UAE & QEMU, showing config and network status

However, despite being familiar with OS3.1 and FS-UAE, I still managed to hit a few gotchas with the OS4 install and configuration.

Installation of the QEMU plugin was simple using the download and instructions from https://fs-uae.net/download#plugins. In my case this was version 3.8.2qemu2.2.0, installed in ~/Documents/FS-UAE/Plugins/QEMU-UAE/Linux/x86-64/ (your path may vary).

I then tried multiple FS-UAE configurations in order to get the emulated machine to boot with PPC, RTG and network support. A few options clash, resulting in a purple screen on boot. Rather than work through the process from scratch, it’s easier to simply list my config here:-

[fs-uae]
accelerator = cyberstorm-ppc
amiga_model = A4000/OS4
gfx_card_flash_file = /home/snetting/Documents/FS-UAE/Kickstarts/picasso_iv_flash.rom
graphics_card = picasso-iv
graphics_memory = 131072
hard_drive_0 = /home/snetting/Amiga/SteveOS41.hdf
kickstart_file = Kickstart v3.1 rev 40.70 (1993)(Commodore)(A4000).rom
network_card = a2065
zorro_iii_memory = 524288

I used FS-UAE (and FS-UAE-Launcher) version 2.8.3.

Things to note:

  1. See http://eab.abime.net/showthread.php?t=75195 for install advice regarding disk partitioning and FS type. This is important!
  2. Shared folders (between host OS and emulation) are *not* currently supported when using PPC under FS-UAE. Post install, many additional packages were required, including network drivers, which resulted in a catch-22 situation. I worked around this by installing a 3.1.4 instance and mounting both the OS4 and ‘shared’ drives there, copying the required files over, then booting back into the OS4 PPC environment (a minimal config sketch follows this list).
  3. For networking, UAE’s bsdsocket.library emulation should be disabled but the A2065 network card enabled. The correct driver from Aminet is: http://aminet.net/package/driver/net/Ethernet
  4. The latest updates to OS4.1 (final) enable Zorro III RAM to be used in addition to accelerator RAM; essential for AmiCygnix. Once OS4.1 is installed and network configured, use the included update tool to pull OS4.1 FE updates.
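For the workaround in point 2, the 3.1.4 ‘bridge’ instance can be as minimal as something like the config below. This is a sketch from memory: the Shared directory path is an assumption, and it relies on FS-UAE mounting a plain host directory as a hard drive under a 68k config.

[fs-uae]
amiga_model = A4000
kickstart_file = Kickstart v3.1 rev 40.70 (1993)(Commodore)(A4000).rom
hard_drive_0 = /home/snetting/Amiga/SteveOS41.hdf
hard_drive_1 = /home/snetting/Amiga/Shared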

The documentation at http://eab.abime.net/showthread.php?t=75195 is definitely useful as a reference but don’t rely on it; it’s dated (2014) and not necessarily accurate.

Whilst I’ve written this from memory, I’ll happily recreate my install from scratch if anyone has any specific questions or issues.

Good luck!

ROMs are available from Cloanto: https://www.amigaforever.com/
OS4.1 and updates from Hyperion: https://www.amigaos.net/

Building qtel (Echolink client) under Fedora Linux

Given both my previous bad experience building qtel (the Linux EchoLink client) and recent forum discussions around similar difficulties, I thought I’d identify, resolve and document the issues.

I’m not sure what’s changed but the process is now very simple (Fedora 28):-

git clone https://github.com/sm0svx/svxlink.git
cd svxlink/
cd src
sudo dnf install cmake libsigc++20-devel qt-devel popt-devel libgcrypt-devel gsm-devel tcl-devel
cmake .
make
cp bin/qtel DESTINATION_PATH_OF_CHOICE

Depending on the libraries already installed, additional packages may be required, as indicated by failures during the ‘cmake’ stage.

GlusterFS / VG Metrics in Prometheus (OCP)

We had a requirement to gather LVM (VG) metrics via Prometheus to alert when GlusterFS is running low on ‘brick’ storage space. Currently, within OpenShift 3.9 the only metrics seem to relate to mounted filesystems. A ‘heketi exporter module’ exists but this only reports space within allocated blocks. There doesn’t appear to be any method to pull metrics from the underlying storage.

We solved this by using a Prometheus pushgateway. Metrics are pushed from Gluster hosts using curl (via cron) and then pulled using a standard Prometheus scrape configuration (via prometheus configmap in OCP). Alerts are then pushed via alertmanager and eventually Cloudforms.

Import the pushgateway image:

oc import-image openshift/prom-pushgateway --from=docker.io/prom/pushgateway --confirm
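One way to then create the pod and expose a route from that image stream; the service name below is just an example:

oc new-app openshift/prom-pushgateway --name=pushgateway
oc expose service pushgateway --hostname=pushgateway-route.example.com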

With the pod created and the route exposed, add the following scrape config to the prometheus configmap:-


- job_name: openshift-pushgateway
  scrape_interval: 30s
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: http
  static_configs:
    - targets:
        - pushgateway-route.example.com

On GlusterFS hosts we then gather metrics in whatever way we like and push to the gateway via curl. Example:-

echo "VG_Gluster 42" | curl --data-binary @- http://pushgateway-route.example.com/metrics/job/pv_mon/pv/vg"

The metrics are then visible via prometheus UI / Grafana and alerts via alertmanager and CFME respectively.

Dummy java loop/sleep for test of init scripts

A dummy Java executable (actually a jar) was required to develop init scripts without access to the client’s application. The process of creating a Java ‘sleep’ application and wrapping it in a jar complete with manifest was not obvious to me. Thread.sleep also didn’t work as I expected, requiring an additional handler for InterruptedException. Not to mention the manifest needing trailing new lines before being syntactically correct (and no error reported when it is incorrectly parsed, except ‘no main manifest attribute’ when attempting to run). Why Java, WHY?

The following tgz contains the compiled jar plus the source, manifest and instructions to build / compile it should the wait time (default 100 seconds) need to be modified.

WaitLoop.tgz (source and executable tgz)
WaitLoop (Github Project)

The .jar can be executed with:-

java -jar WaitLoop.jar
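For anyone recreating the jar by hand rather than from the tgz, the build boils down to something like the steps below. The class name is an assumption based on the project name; the key gotcha is that the manifest must end in a newline, otherwise jar silently drops the Main-Class attribute.

printf 'Main-Class: WaitLoop\n' > manifest.txt
javac WaitLoop.java
jar cfm WaitLoop.jar manifest.txt WaitLoop.class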

X forwarding over ssh and sudo

This has bugged me for years, with random success depending on sudo, su -, etc.

The proper solution:-

steve@studio:~$ ssh -X 192.168.0.201
Last login: Fri Feb 10 21:54:11 2017 from 192.168.0.247
[steve@fleabox ~]$ xauth list
fleabox.track3.org.uk/unix:12  MIT-MAGIC-COOKIE-1  b4339e07fb0e4febdde6128fc56419e4
[steve@fleabox ~]$ sudo su -
[sudo] password for steve: 
[root@fleabox ~]# xauth add fleabox.track3.org.uk/unix:12  MIT-MAGIC-COOKIE-1  b4339e07fb0e4febdde6128fc56419e4
[root@fleabox ~]# virt-manager &
[1] 7168

Success!

Docker and Apache QuickStart

mkdir -p dockerfile/C7httpd; cd dockerfile/C7httpd
vi Dockerfile

FROM centos:7
MAINTAINER "Steve Netting" steve@netting.org.uk
ENV container docker
RUN yum -y --setopt=tsflags=nodocs update && \
yum -y --setopt=tsflags=nodocs install httpd && \
yum clean all

EXPOSE 80

ADD run-httpd.sh /run-httpd.sh
RUN chmod -v +x /run-httpd.sh

CMD ["/run-httpd.sh"]

vi run-httpd.sh

#!/bin/bash

# Make sure we're not confused by old, incompletely-shutdown httpd
# context after restarting the container. httpd won't start correctly
# if it thinks it is already running.
rm -rf /run/httpd/* /tmp/httpd*

exec /usr/sbin/apachectl -DFOREGROUND

docker build -t steve/httpd:latest .
docker images
docker run -d -p 8082:80 steve/httpd:latest
docker ps
curl http://localhost:8082

... If you can read this page it means that this site is working properly ...

To start an interactive shell inside the running container (the name ‘romantic_noyce’ comes from ‘docker ps’):
docker exec -i -t romantic_noyce /bin/bash
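To serve your own content instead of the default test page, one option (paths assumed) is to bind-mount a directory over the CentOS httpd document root:

docker run -d -p 8082:80 -v $(pwd)/html:/var/www/html:ro steve/httpd:latest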

Retrieving SCSI/LUN IDs from Linux /dev/sd*

Useful to identify disks present before/after a rescan of LUNs:-

for x in /dev/sd*; do echo -n "$x:"; sudo /sbin/scsi_id -g "$x"; done > file.txt

The output is then in the format:-

/dev/sdt:360a9800044316f6f543f4646506b334d

Capture the output before and after a rescan of the SCSI bus, then use something like this to reveal the newly added LUNs (SCSI IDs):-

sdiff prescan.txt postscan.txt | grep ">"  | awk '{ print $2 }' | cut -d":" -f2 | sort -n | uniq
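For completeness, the rescan itself can be triggered with something like this (one common method; adjust to whichever HBA hosts apply):

for h in /sys/class/scsi_host/host*/scan; do echo "- - -" | sudo tee "$h" > /dev/null; done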