Training Pipeline¶
The ORION training pipeline fine-tunes the LiquidAI LFM2.5-VL-1.6B vision-language model for orbital image triage classification. The pipeline produces a quantized GGUF model suitable for CPU-only inference on the Raspberry Pi 5. The fine-tuned LoRA adapter weights are available on Hugging Face.
Pipeline Overview¶
```
LFM2.5-VL-1.6B (base model)
        |
        v
fine_tune.py : QLoRA fine-tuning with ORION dataset
        |
        v
orion_lora_weights/ : LoRA adapter weights
        |
        v
fuse.py : merge LoRA into base model
        |
        v
orion_merged/ : standalone FP16 Hugging Face model
        |
        v
llama.cpp convert + quantize : GGUF Q4_K_M quantization
        |
        v
orion-q4_k_m.gguf (~730 MB, flight-ready)
orion-mmproj-f16.gguf (vision encoder projection)
```
Base Model¶
- Model: LiquidAI/LFM2.5-VL-1.6B
- Architecture: vision-language model with 1.6 billion parameters
- Loading: 4-bit quantization via BitsAndBytes (NF4)
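Loading the base model in 4-bit can be sketched with a `BitsAndBytesConfig`. This is a minimal illustration, not the actual `fine_tune.py` source: the model class (`AutoModelForImageTextToText`) and the compute dtype are assumptions; the pipeline only specifies "4-bit via BitsAndBytes (NF4)".

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

MODEL_ID = "LiquidAI/LFM2.5-VL-1.6B"

# NF4 4-bit quantization, matching the "BitsAndBytes (NF4)" description above.
# The FP16 compute dtype is an assumption consistent with the FP16 training row
# in the Training Configuration table.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)
```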
QLoRA Configuration¶
| Parameter | Value | Description |
|---|---|---|
| Rank (r) | 16 | LoRA adapter rank |
| Alpha | 32 | Scaling factor (2x rank) |
| Target modules | q_proj, k_proj, v_proj, o_proj | Attention mechanism projections |
| Dropout | 0.05 | LoRA dropout rate |
| Task type | CAUSAL_LM | Causal language modeling |
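The table above maps directly onto a peft `LoraConfig`. A minimal sketch, assuming the standard peft API rather than reproducing `fine_tune.py` verbatim:

```python
from peft import LoraConfig, get_peft_model

# Values copied from the QLoRA Configuration table above.
lora_config = LoraConfig(
    r=16,                                                # LoRA adapter rank
    lora_alpha=32,                                       # scaling factor (2x rank)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# `model` here would be the 4-bit base model loaded earlier.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```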
Training Configuration¶
| Parameter | Value | Description |
|---|---|---|
| Batch size | 1 (micro-batch) | Per-device training batch size |
| Gradient accumulation | 16 steps | Effective batch size of 16 |
| Learning rate | 2e-4 | AdamW learning rate |
| Epochs | 3 | Full passes over the training set |
| Optimizer | paged_adamw_8bit | Memory-efficient 8-bit AdamW |
| Precision | FP16 | Half-precision training |
| Gradient checkpointing | Enabled | Reduces memory at the cost of compute |
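As a sketch, the training rows above translate into Hugging Face `TrainingArguments` roughly as follows; `output_dir` and `logging_steps` are illustrative assumptions not stated in the table:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="orion_lora_weights",    # assumption: matches the pipeline diagram
    per_device_train_batch_size=1,      # micro-batch of 1
    gradient_accumulation_steps=16,     # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=3,
    optim="paged_adamw_8bit",           # memory-efficient 8-bit AdamW
    fp16=True,                          # half-precision training
    gradient_checkpointing=True,        # trades compute for memory
    logging_steps=10,                   # assumption: illustrative only
)
```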
Weight Fusion¶
After fine-tuning, fuse.py merges the LoRA adapter weights permanently into the base model. It loads the base model in FP16 on CPU, applies merge_and_unload(), and saves the result with SafeTensors serialization to orion_merged/.
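The merge step can be sketched with the standard peft API. This mirrors the description above (FP16 on CPU, `merge_and_unload()`, SafeTensors output) but is an assumed reconstruction, not the actual `fuse.py` source:

```python
import torch
from transformers import AutoModelForImageTextToText
from peft import PeftModel

# Load the base model in FP16 on CPU (model class is an assumption).
base = AutoModelForImageTextToText.from_pretrained(
    "LiquidAI/LFM2.5-VL-1.6B",
    torch_dtype=torch.float16,
    device_map="cpu",
)

# Attach the trained adapter, then bake the LoRA deltas into the base weights.
model = PeftModel.from_pretrained(base, "orion_lora_weights")
merged = model.merge_and_unload()

# SafeTensors serialization, as described above.
merged.save_pretrained("orion_merged", safe_serialization=True)
```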
GGUF Quantization¶
The merged FP16 model is converted to GGUF format and quantized to Q4_K_M using llama.cpp tools. The multimodal projector (mmproj) is extracted separately and kept at FP16 precision. The two output files (orion-q4_k_m.gguf and orion-mmproj-f16.gguf) are deployed to the Pi. Pre-trained models are available on Hugging Face. For artifact sizes at each stage, see Compute Budgets.
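The conversion and quantization steps look roughly like the following with current llama.cpp tooling; exact script names and flags vary between llama.cpp releases, so treat this as a sketch rather than the project's actual commands:

```shell
# Convert the merged HF model to GGUF at FP16
# (convert_hf_to_gguf.py ships with llama.cpp).
python convert_hf_to_gguf.py orion_merged/ --outfile orion-f16.gguf --outtype f16

# Extract the multimodal projector at FP16
# (flag support depends on the llama.cpp version).
python convert_hf_to_gguf.py orion_merged/ --mmproj --outfile orion-mmproj-f16.gguf

# Quantize the language model weights to Q4_K_M.
./llama-quantize orion-f16.gguf orion-q4_k_m.gguf Q4_K_M
```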
Dependencies¶
The training pipeline requires the following Python packages (installed via ground_segment/pyproject.toml):
- torch, torchvision: PyTorch framework
- transformers: Hugging Face model loading and training
- peft: Parameter-Efficient Fine-Tuning (LoRA/QLoRA)
- datasets: JSONL dataset loading
- bitsandbytes: 4-bit quantization support
- accelerate: hardware-agnostic model loading
- gguf: GGUF format conversion utilities
Validation and Ablation Studies¶
Full per-condition accuracy numbers, per-class precision/recall/F1, fine-tuning delta table, and raw inference logs are in the Model Card.
evaluate.py (fine-tuned model) and ablation.py (base model) both run the same evaluation protocol under four conditions:
| Condition | Image Input | Prompt | Purpose |
|---|---|---|---|
| A: Full System | Real satellite image | Includes coordinates | Baseline performance |
| B: Vision Only | Real satellite image | Coordinates stripped | Measures visual reasoning without GPS hints |
| C: Blind LLM | Gaussian noise (512x512) | Includes coordinates | Tests coordinate memorization (no vision) |
| D: Sensor Conflict | Real satellite image | Mismatched coordinates | Tests whether model trusts vision or coordinates |
Condition D: Mismatch Logic¶
The script deliberately feeds coordinates from the opposite category:
- HIGH images receive LOW coordinates
- LOW images receive HIGH coordinates
- MEDIUM images receive HIGH coordinates
This stress-tests the model's ability to reason from visual evidence when coordinate telemetry is misleading.
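The mismatch rule above can be expressed as a simple lookup; the function name here is hypothetical, chosen for illustration:

```python
# Condition D: each ground-truth class is paired with coordinates
# from the opposite category (per the mapping described above).
MISMATCH = {
    "HIGH": "LOW",     # HIGH images receive LOW coordinates
    "LOW": "HIGH",     # LOW images receive HIGH coordinates
    "MEDIUM": "HIGH",  # MEDIUM images receive HIGH coordinates
}

def conflicting_coordinate_class(true_label: str) -> str:
    """Return the coordinate category to inject for a sensor-conflict sample."""
    return MISMATCH[true_label]
```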
Condition C: Gaussian Noise¶
The noise image is a deterministic 512x512 random RGB array seeded with np.random.seed(42). Using Gaussian noise rather than a blank image prevents the model from defaulting to "ocean" for featureless inputs.
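A minimal sketch of the noise generator follows. The doc fixes only the seed (42) and the 512x512 RGB shape; the Gaussian mean and standard deviation below are illustrative assumptions:

```python
import numpy as np

def make_noise_image(seed: int = 42, size: int = 512) -> np.ndarray:
    """Deterministic Gaussian-noise RGB array for Condition C (blind-LLM) runs.

    Mean and std are illustrative assumptions; the pipeline specifies only
    np.random.seed(42) and a 512x512 RGB image.
    """
    np.random.seed(seed)  # same seed -> identical noise on every run
    noise = np.random.normal(loc=127.5, scale=50.0, size=(size, size, 3))
    return np.clip(noise, 0, 255).astype(np.uint8)
```

Because the seed is fixed, every evaluation run sees the exact same noise image, keeping Condition C results reproducible.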
Metrics¶
For conditions A, B, and C: per-class recall and precision for HIGH, MEDIUM, and LOW, plus overall accuracy. For condition D: ratio of visual-trust (correct) versus coordinate-trust (failure).
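The per-class metrics reduce to straightforward counting; a minimal sketch (the function name is hypothetical, not from evaluate.py):

```python
def per_class_metrics(y_true, y_pred, classes=("HIGH", "MEDIUM", "LOW")):
    """Per-class precision/recall plus overall accuracy for conditions A, B, C."""
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        pred_c = sum(1 for p in y_pred if p == c)   # predicted as class c
        true_c = sum(1 for t in y_true if t == c)   # actually class c
        metrics[c] = {
            "precision": tp / pred_c if pred_c else 0.0,
            "recall": tp / true_c if true_c else 0.0,
        }
    metrics["accuracy"] = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return metrics
```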
Post-quantization evaluation (Q4_K_M GGUF)¶
The same 4-condition protocol can be run against the quantized GGUF model via llama.cpp's HTTP server (evaluate.py --quantized-model). This isolates accuracy loss from quantization.
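Serving the quantized model for this evaluation looks roughly like the following; flag names follow recent llama.cpp releases and the request body is a placeholder, so adjust both to your build:

```shell
# Serve the Q4_K_M model plus its FP16 vision projector over HTTP.
./llama-server -m orion-q4_k_m.gguf --mmproj orion-mmproj-f16.gguf --port 8080

# evaluate.py --quantized-model then targets this endpoint, e.g. the
# OpenAI-compatible chat completions route (illustrative request only):
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "classify this image"}]}'
```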
| Condition | Fine-tuned (FP16) | Q4_K_M GGUF | Δ (quantization) |
|---|---|---|---|
| A: Vision + GPS coords | 58.3% | 55.0% | −3.3 pp |
| B: Vision only (no coords) | 65.0% | 63.3% | −1.7 pp |
| C: Blind LLM (noise+coords) | 43.3% | 28.3% | −15.0 pp |
Sensor conflict (Condition D): coordinate-trust failure improves slightly from 16.7% (FP16) to 15.0% (Q4_K_M). Quantization does not degrade GPS robustness.
Accuracy loss on operational conditions (A: −3.3 pp, B: −1.7 pp) is modest, confirming that Q4_K_M quantization retains most of the fine-tuned model's capability.
The large Condition C drop (−15.0 pp) is expected and benign: that condition tests coordinate memorization with noise images, a scenario that never occurs in deployment. Crucially, GPS robustness (Condition D) improves slightly after quantization, meaning the deployed model is no more susceptible to spoofed telemetry than the FP16 model.
Full per-class logs are in the Model Card.
For step-by-step instructions, see the guides for training, quantization, and validation/ablation studies.
Data and Weight Transfer Scripts¶
Two shell scripts handle moving data and weights between the local machine and the remote training server:
- ground_segment/data/upload_to_server.sh: compresses the local dataset, uploads it via rsync, and clones/pulls the ORION repo on the server. Run this before training.
- ground_segment/training/download_weights.sh: pulls orion_lora_weights/ from the server after training completes and deletes the server's repo and dataset (scorched earth).
See Utility Scripts for invocation details and Ground Segment Environment Variables for the required env vars.