
Training Pipeline

The ORION training pipeline fine-tunes the LiquidAI LFM2.5-VL-1.6B vision-language model for orbital image triage classification. The pipeline produces a quantized GGUF model suitable for CPU-only inference on the Raspberry Pi 5. The fine-tuned LoRA adapter weights are available on Hugging Face.

Pipeline Overview

LFM2.5-VL-1.6B (base model)
    |
    v
fine_tune.py : QLoRA fine-tuning with ORION dataset
    |
    v
orion_lora_weights/ : LoRA adapter weights
    |
    v
fuse.py : Merge LoRA into base model
    |
    v
orion_merged/ : Standalone FP16 Hugging Face model
    |
    v
llama.cpp convert + quantize : GGUF Q4_K_M quantization
    |
    v
orion-q4_k_m.gguf  (~730 MB, flight-ready)
orion-mmproj-f16.gguf  (vision encoder projection)

Base Model

  • Model: LiquidAI/LFM2.5-VL-1.6B
  • Architecture: Vision-language model with 1.6 billion parameters
  • Loading: 4-bit NF4 quantization via BitsAndBytes
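As a rough illustration, loading the base model in 4-bit NF4 with BitsAndBytes typically looks like the sketch below; the auto classes, compute dtype, and trust_remote_code handling are assumptions and may differ from what fine_tune.py actually uses.

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

# 4-bit NF4 quantized loading (QLoRA-style); compute dtype matches the FP16
# training precision. The auto classes used here are an assumption.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "LiquidAI/LFM2.5-VL-1.6B"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```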

QLoRA Configuration

| Parameter | Value | Description |
|---|---|---|
| Rank (r) | 16 | LoRA adapter rank |
| Alpha | 32 | Scaling factor (2x rank) |
| Target modules | q_proj, k_proj, v_proj, o_proj | Attention mechanism projections |
| Dropout | 0.05 | LoRA dropout rate |
| Task type | CAUSAL_LM | Causal language modeling |
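A minimal sketch of the corresponding peft configuration; the bias setting and the k-bit preparation step are assumptions not listed in the table.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA adapter configuration matching the table above; bias handling is an
# assumption (not listed in the table).
lora_config = LoraConfig(
    r=16,                      # adapter rank
    lora_alpha=32,             # scaling factor (2x rank)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = prepare_model_for_kbit_training(model)  # enable gradient flow through 4-bit weights
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```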

Training Configuration

| Parameter | Value | Description |
|---|---|---|
| Batch size | 1 (micro-batch) | Per-device training batch size |
| Gradient accumulation | 16 steps | Effective batch size of 16 |
| Learning rate | 2e-4 | AdamW learning rate |
| Epochs | 3 | Full passes over the training set |
| Optimizer | paged_adamw_8bit | Memory-efficient 8-bit AdamW |
| Precision | FP16 | Half-precision training |
| Gradient checkpointing | Enabled | Reduces memory at the cost of compute |
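These hyperparameters map onto Hugging Face TrainingArguments roughly as shown below; the output directory, logging cadence, and save strategy are illustrative assumptions.

```python
from transformers import TrainingArguments

# Hyperparameters from the table above; output directory, logging cadence, and
# save strategy are illustrative assumptions.
training_args = TrainingArguments(
    output_dir="orion_lora_weights",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,   # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=3,
    optim="paged_adamw_8bit",
    fp16=True,
    gradient_checkpointing=True,
    logging_steps=10,
    save_strategy="epoch",
)
```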

Weight Fusion

After fine-tuning, fuse.py merges the LoRA adapter weights permanently into the base model. It loads the base model in FP16 on CPU, applies merge_and_unload(), and saves the result with SafeTensors serialization to orion_merged/.
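A minimal sketch of the merge step, assuming the standard peft merge_and_unload() flow; the exact model class used by fuse.py may differ.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForImageTextToText

# Load the base model in FP16 on CPU and apply the trained LoRA adapter.
base = AutoModelForImageTextToText.from_pretrained(
    "LiquidAI/LFM2.5-VL-1.6B",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "orion_lora_weights")

# Fold the LoRA deltas into the base weights and drop the adapter wrappers.
merged = model.merge_and_unload()

# Save as a standalone Hugging Face model with SafeTensors serialization.
merged.save_pretrained("orion_merged", safe_serialization=True)
```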

GGUF Quantization

The merged FP16 model is converted to GGUF format and quantized to Q4_K_M using llama.cpp tools. The multimodal projector (mmproj) is extracted separately and kept at FP16 precision. The two output files (orion-q4_k_m.gguf and orion-mmproj-f16.gguf) are deployed to the Pi. Pre-trained models are available on Hugging Face. For artifact sizes at each stage, see Compute Budgets.
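The conversion typically chains llama.cpp's convert script and quantizer. The sketch below is illustrative only: script names, flags, and the mmproj extraction step vary across llama.cpp versions, so follow the quantization guide for the exact commands.

```python
import subprocess

# Illustrative only: llama.cpp script names and flags vary between versions.

# 1. Convert the merged Hugging Face model to an FP16 GGUF file.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", "orion_merged",
     "--outfile", "orion-f16.gguf", "--outtype", "f16"],
    check=True,
)

# 2. Quantize the FP16 GGUF to Q4_K_M.
subprocess.run(
    ["./llama-quantize", "orion-f16.gguf", "orion-q4_k_m.gguf", "Q4_K_M"],
    check=True,
)

# 3. The vision projector (orion-mmproj-f16.gguf) is exported separately and
#    left at FP16; its extraction depends on the llama.cpp version in use.
```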

Dependencies

The training pipeline requires the following Python packages (installed via ground_segment/pyproject.toml):

  • torch, torchvision: PyTorch framework
  • transformers: Hugging Face model loading and training
  • peft: Parameter-Efficient Fine-Tuning (LoRA/QLoRA)
  • datasets: JSONL dataset loading
  • bitsandbytes: 4-bit quantization support
  • accelerate: Hardware-agnostic model loading
  • gguf: GGUF format conversion utilities

Validation and Ablation Studies

Full per-condition accuracy numbers, per-class precision/recall/F1, fine-tuning delta table, and raw inference logs are in the Model Card.

Both evaluate.py (fine-tuned model) and ablation.py (base model) run the same evaluation protocol under four conditions:

| Condition | Image Input | Prompt | Purpose |
|---|---|---|---|
| A: Full System | Real satellite image | Includes coordinates | Baseline performance |
| B: Vision Only | Real satellite image | Coordinates stripped | Measures visual reasoning without GPS hints |
| C: Blind LLM | Gaussian noise (512x512) | Includes coordinates | Tests coordinate memorization (no vision) |
| D: Sensor Conflict | Real satellite image | Mismatched coordinates | Tests whether the model trusts vision or coordinates |

Condition D: Mismatch Logic

The script deliberately feeds coordinates from the opposite category:

  • HIGH images receive LOW coordinates
  • LOW images receive HIGH coordinates
  • MEDIUM images receive HIGH coordinates

This stress-tests the model's ability to reason from visual evidence when coordinate telemetry is misleading.
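A hypothetical sketch of the label-to-coordinate swap; the mapping follows the list above, while the helper name and coordinate lookup structure are illustrative.

```python
import random

# Hypothetical sketch of the Condition D coordinate swap: each image keeps its
# true label but is paired with coordinates drawn from the opposite category.
MISMATCH_MAP = {
    "HIGH": "LOW",     # HIGH images receive LOW coordinates
    "LOW": "HIGH",     # LOW images receive HIGH coordinates
    "MEDIUM": "HIGH",  # MEDIUM images receive HIGH coordinates
}

def conflicting_coords(true_label, coords_by_class):
    """Return a (lat, lon) pair sampled from the class that conflicts with true_label.

    coords_by_class is an assumed structure mapping each class name to a list
    of (lat, lon) pairs taken from that class's examples.
    """
    return random.choice(coords_by_class[MISMATCH_MAP[true_label]])
```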

Condition C: Gaussian Noise

The noise image is a deterministic 512x512 random RGB array seeded with np.random.seed(42). Using Gaussian noise rather than a blank image prevents the model from defaulting to "ocean" for featureless inputs.
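A minimal sketch of generating the deterministic noise image; beyond the seed and size stated above, the exact noise distribution and clipping are assumptions.

```python
import numpy as np
from PIL import Image

# Deterministic 512x512 noise image for Condition C, seeded as documented.
# The exact noise distribution and clipping here are assumptions.
np.random.seed(42)
noise = np.random.normal(loc=128, scale=50, size=(512, 512, 3))
noise = np.clip(noise, 0, 255).astype(np.uint8)
noise_image = Image.fromarray(noise)
```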

Metrics

Conditions A, B, and C report per-class precision and recall for HIGH, MEDIUM, and LOW, plus overall accuracy. Condition D reports the ratio of visual-trust (correct) to coordinate-trust (failure) responses.
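An illustrative helper for computing these metrics from true and predicted labels; evaluate.py's actual implementation may differ.

```python
def per_class_metrics(y_true, y_pred, classes=("HIGH", "MEDIUM", "LOW")):
    """Per-class precision/recall plus overall accuracy (illustrative helper)."""
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        metrics[c] = {
            "precision": tp / (tp + fp) if (tp + fp) else 0.0,
            "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        }
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return metrics, accuracy
```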

Post-quantization evaluation (Q4_K_M GGUF)

The same 4-condition protocol can be run against the quantized GGUF model via llama.cpp's HTTP server (evaluate.py --quantized-model), isolating the accuracy loss attributable to quantization alone.
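A hedged example of querying a locally running llama.cpp server through its OpenAI-compatible chat endpoint; the server invocation, endpoint URL, file name, and prompt text are illustrative placeholders, not the exact commands used by evaluate.py.

```python
import base64
import requests

# Assumes a llama.cpp server was started locally with the quantized model and
# the FP16 vision projector, e.g.:
#   llama-server -m orion-q4_k_m.gguf --mmproj orion-mmproj-f16.gguf
# The endpoint follows llama.cpp's OpenAI-compatible chat API; the prompt text
# and file name below are placeholders, not the actual evaluation prompt.
with open("sample_tile.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Classify this tile as HIGH, MEDIUM, or LOW priority."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        "temperature": 0.0,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```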

| Condition | Fine-tuned (FP16) | Q4_K_M GGUF | Δ (quantization) |
|---|---|---|---|
| A: Vision + GPS coords | 58.3% | 55.0% | −3.3 pp |
| B: Vision only (no coords) | 65.0% | 63.3% | −1.7 pp |
| C: Blind LLM (noise + coords) | 43.3% | 28.3% | −15.0 pp |

Sensor conflict (Condition D): the coordinate-trust failure rate drops slightly, from 16.7% (FP16) to 15.0% (Q4_K_M). Quantization does not degrade GPS robustness.

Accuracy loss on operational conditions (A: −3.3 pp, B: −1.7 pp) is modest, confirming that Q4_K_M quantization retains most of the fine-tuned model's capability.

The large Condition C drop (−15.0 pp) is expected and benign: that condition tests coordinate memorization on noise images, an input that never occurs in deployment. Crucially, GPS robustness (Condition D) improves slightly after quantization, meaning the deployed model is no more susceptible to spoofed telemetry.

Full per-class logs are in the Model Card.

For step-by-step instructions, see the guides for training, quantization, and validation/ablation studies.

Data and Weight Transfer Scripts

Two shell scripts handle moving data and weights between the local machine and the remote training server:

  • ground_segment/data/upload_to_server.sh: compresses the local dataset, uploads it via rsync, and clones/pulls the ORION repo on the server. Run this before training.
  • ground_segment/training/download_weights.sh: pulls orion_lora_weights/ from the server after training completes and deletes the server's repo and dataset (scorched earth).

See Utility Scripts for invocation details and Ground Segment Environment Variables for the required env vars.