# Ground Segment Budgets

Resource requirements for training, quantization, and dataset generation.
## Training Hardware
fine_tune.py is CUDA-only: the training stack uses bitsandbytes (4-bit NF4 quantization plus the paged_adamw_8bit optimizer), which supports neither Apple Silicon, AMD GPUs, nor CPU-only execution.
| Resource | Requirement | Notes |
|---|---|---|
| GPU | NVIDIA CUDA-capable, 8+ GB VRAM | QLoRA loads the base model in 4-bit (NF4) |
| CUDA toolkit | 12.x | Matched to the GPU driver |
| System RAM | 8+ GB recommended | For data loading and preprocessing |
| Training time per epoch | 830s | 3 epochs total, ~480 training samples |
| Total training time | ~2492s | Depends on GPU |
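As a sanity check, the timings above are self-consistent; this is plain arithmetic over the table's figures, with the ~2 s residual presumably setup and checkpoint-save overhead:

```python
# Figures from the table above.
EPOCH_SECONDS = 830   # measured wall-clock per epoch
EPOCHS = 3
SAMPLES = 480         # approximate training-set size

loop_seconds = EPOCH_SECONDS * EPOCHS
print(loop_seconds)   # 2490 of the ~2492s total; the rest is setup/save overhead
print(round(loop_seconds / (EPOCHS * SAMPLES), 2))  # ~1.73 s per sample per pass
```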
Reference hardware (verified working):
| Component | Spec |
|---|---|
| GPU | NVIDIA GeForce RTX 4070 Ti, 12 GB VRAM |
| Driver | 535.261.03 |
| CUDA | 12.2 |
| OS | Linux |
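Comparable numbers can be read off any CUDA host. A small illustrative sketch (not part of this repo) that shells out to nvidia-smi when present; the query flags are standard nvidia-smi options, and the fallback line mirrors the reference box above:

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader"]

def parse_gpu_line(line):
    """Split one CSV line from nvidia-smi into (name, driver, vram)."""
    name, driver, vram = (field.strip() for field in line.split(", "))
    return name, driver, vram

if shutil.which("nvidia-smi"):
    out = subprocess.run(QUERY, capture_output=True, text=True).stdout
    for line in out.strip().splitlines():
        print(parse_gpu_line(line))
else:
    # Example line matching the reference hardware above.
    print(parse_gpu_line("NVIDIA GeForce RTX 4070 Ti, 535.261.03, 12282 MiB"))
```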
## Model Artifact Sizes
| Stage | File | Size | Notes |
|---|---|---|---|
| Base model | LiquidAI/LFM2.5-VL-1.6B | ~3.2 GB | Downloaded from Hugging Face |
| LoRA adapter weights | orion_lora_weights/ | ~50 MB | r=16, 4 target modules |
| Merged FP16 model | orion_merged/ | ~3.2 GB | Full standalone checkpoint |
| FP16 GGUF | orion-f16.gguf | ~3.2 GB | Intermediate conversion |
| Q4_K_M GGUF | orion-q4_k_m.gguf | ~730 MB | Deployed to Pi |
| Vision projector | orion-mmproj-f16.gguf | ~814 MB | FP16, deployed to Pi |
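Two figures worth deriving from this table (simple arithmetic on the sizes above): the FP16-to-Q4_K_M compression ratio, and the combined footprint of the two files actually shipped to the Pi:

```python
# Sizes from the artifact table above, in GB.
FP16_GB = 3.2      # merged FP16 checkpoint / F16 GGUF
Q4_GB = 0.730      # Q4_K_M GGUF
MMPROJ_GB = 0.814  # FP16 vision projector

print(f"compression: {FP16_GB / Q4_GB:.1f}x")          # FP16 -> Q4_K_M
print(f"on-Pi footprint: {Q4_GB + MMPROJ_GB:.2f} GB")  # files deployed to the Pi
```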
## Quantization Compute
| Step | RAM required | Time | Notes |
|---|---|---|---|
| HF to GGUF conversion | ~8 GB | 17.5s | Full FP16 model loaded into RAM |
| mmproj extraction | ~4 GB | 14s | Vision encoder only |
| Q4_K_M quantization | ~4 GB | 18.6s | Reads FP16 GGUF, writes Q4 |
| Total disk (all stages) | ~11 GB | | Base + merged + F16 GGUF + Q4 GGUF + mmproj |
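Summing the stages confirms the disk total and yields an end-to-end pipeline time (~50 s) that the table does not state directly but that follows from its figures:

```python
# Per-stage figures from the quantization table above.
STAGE_SECONDS = {"hf_to_gguf": 17.5, "mmproj": 14.0, "q4_k_m": 18.6}
STAGE_DISK_GB = {"base": 3.2, "merged": 3.2, "f16_gguf": 3.2,
                 "q4_gguf": 0.730, "mmproj": 0.814}

print(round(sum(STAGE_SECONDS.values()), 1))  # total pipeline time, seconds
print(round(sum(STAGE_DISK_GB.values()), 2))  # total disk across stages, GB (~11 GB)
```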
## Weight Fusion Compute
| Resource | Requirement | Notes |
|---|---|---|
| RAM | ~8 GB | Full FP16 model loaded on CPU (no GPU required) |
| Time | 98.21s | merge_and_unload() + SafeTensors save |
| Disk | ~3.2 GB output | Merged model saved to orion_merged/ |
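merge_and_unload() folds each adapter pair into its base weight, W' = W + (alpha/r)·B·A. A dependency-free toy sketch of that fold; shapes and values here are illustrative, not the actual model dimensions:

```python
def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA adapter into a base weight: W' = W + (alpha/r) * B @ A.

    W: d_out x d_in, B: d_out x r, A: r x d_in (plain nested lists).
    """
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    merged = [row[:] for row in W]  # copy so the base weight is untouched
    for i in range(d_out):
        for j in range(d_in):
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged

# Toy 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
print(merge_lora(W, A, B, alpha=2, r=1))  # [[2.0, 1.0], [2.0, 3.0]]
```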
## Validation / Ablation Studies
ablation.py (base model) and evaluate.py (fine-tuned) are device-agnostic: they run on CUDA, MPS (Apple Silicon), or CPU at FP16. CPU-only inference works but is roughly 50-100x slower than GPU.
| Resource | Requirement | Notes |
|---|---|---|
| GPU VRAM | ~4 GB (FP16, no quantization in eval) | Or run on CPU/MPS without a VRAM budget |
| Test samples | 60 (deterministic IID carve from 360 pool) | 4 conditions x 60 = 240 inferences total |
| Val samples | 60 (deterministic IID carve from 360 pool) | Same shape as test, used during training |
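A deterministic IID carve is typically just a seeded shuffle of the pool. The sketch below reproduces the 240/60/60 split sizes; the function name and seed are assumptions for illustration, not this repo's code:

```python
import random

def carve(pool_ids, n_val=60, n_test=60, seed=42):
    """Deterministically split a pool into (train, val, test) ID lists."""
    ids = sorted(pool_ids)            # canonical order first, so the carve is reproducible
    random.Random(seed).shuffle(ids)  # fixed seed => identical split every run
    return ids[n_val + n_test:], ids[:n_val], ids[n_val:n_val + n_test]

train, val, test = carve(range(360))
print(len(train), len(val), len(test))  # 240 60 60
```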
## Dataset
| Item | Size | Notes |
|---|---|---|
| Target images | ~36 MB | 360 PNG images at ~100 KB each |
| train_dataset.jsonl | ~1 MB | ~480 records (240 targets x 2 with coordinate dropout) |
| val_dataset.jsonl | ~200 KB | 60 records |
| test_dataset.jsonl | ~200 KB | 60 records |
| Total dataset | ~37 MB | Images + JSONL |
| Generation time | ~3 min | 360 images from SimSat at ~2 req/s |
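The generation time follows directly from the request rate (arithmetic on the table's figures):

```python
IMAGES = 360     # target images to generate
REQ_PER_S = 2    # approximate SimSat request rate

seconds = IMAGES / REQ_PER_S
print(f"{seconds:.0f} s (~{seconds / 60:.0f} min)")  # 180 s (~3 min)
```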
## Data Transfer (Remote Server)
| Operation | Data transferred | Method |
|---|---|---|
| Dataset upload | ~31 MB (compressed) | upload_to_server.sh via rsync |
| LoRA weights download | ~50 MB | download_weights.sh via rsync |