Training¶
For design details (QLoRA configuration, LoRA adapter parameters, training hyperparameters, and data collation internals), see the Training Pipeline architecture page.
Hardware Requirements¶
- A GPU with CUDA support is required. The training pipeline uses 4-bit quantization via bitsandbytes and FP16 math (see the configuration sketch below).
- Minimum ~8 GB VRAM (the micro-batch size is 1 with gradient accumulation).
- For detailed GPU, RAM, and disk requirements, see Compute Budgets.
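The 4-bit setup above corresponds to a bitsandbytes quantization config passed when the base model is loaded. The snippet below is a minimal sketch of such a config, not the exact one in fine_tune.py; the NF4 and double-quantization settings are typical QLoRA defaults and are assumptions here:

import torch
from transformers import BitsAndBytesConfig

# Typical QLoRA 4-bit settings; the exact values in fine_tune.py may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantization via bitsandbytes
    bnb_4bit_quant_type="nf4",             # NF4 is the common QLoRA choice (assumption)
    bnb_4bit_use_double_quant=True,        # double quantization trims a bit more VRAM
    bnb_4bit_compute_dtype=torch.float16,  # FP16 math, matching the pipeline above
)
# Passed to the loader, e.g. Model.from_pretrained(..., quantization_config=bnb_config)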
Prerequisites¶
Install the project dependencies from the ground_segment directory:
cd ground_segment
pip install -e .
This installs: torch, torchvision, transformers, accelerate, peft, datasets, bitsandbytes, and other required packages.
Ensure you have already generated the dataset (see the Data Generation guide).
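Before launching a run, a quick environment check can save time. The snippet below is a small sanity test (not part of the project scripts) that confirms a CUDA GPU is visible and the key packages import:

import torch
import transformers
import peft
import bitsandbytes

# Training requires a CUDA GPU; fail fast if PyTorch cannot see one.
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
print("GPU:", torch.cuda.get_device_name(0))
print("transformers:", transformers.__version__, "| peft:", peft.__version__)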
Step 1: Fine-Tuning¶
Before running, update the TRAIN_FILE path in fine_tune.py to point to your generated dataset, and update the image path prefix in the VLMDataCollator.__call__ method.
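For illustration only, the two edits might look like the lines below. The names TRAIN_FILE and VLMDataCollator come from fine_tune.py; every path and the IMAGE_PREFIX name are placeholders, so check the actual collator code for the real variable:

# fine_tune.py -- placeholder values, substitute your own paths
TRAIN_FILE = "/path/to/your/generated_dataset.jsonl"

# Inside VLMDataCollator.__call__, the image path prefix should point at the
# directory containing your generated images (hypothetical variable name):
IMAGE_PREFIX = "/path/to/your/images"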
cd ground_segment/training
python fine_tune.py
Training logs are printed every 5 steps. Checkpoints are saved at the end of each epoch. The final LoRA adapter weights and processor are saved to orion_lora_weights/.
Fine-Tuning Output¶
After training completes, the output directory contains:
orion_lora_weights/
adapter_config.json
adapter_model.safetensors
tokenizer_config.json
preprocessor_config.json
...
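To smoke-test the adapter, it can be loaded back onto the base model with peft. A minimal sketch, assuming a generic vision-to-sequence model class; BASE_MODEL is a placeholder for whichever base model fine_tune.py trains on:

import torch
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import PeftModel

BASE_MODEL = "org/base-vlm-id"  # placeholder: the base model used for training

# Load the base model in FP16, then attach the trained LoRA adapter.
base = AutoModelForVision2Seq.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "orion_lora_weights")
processor = AutoProcessor.from_pretrained("orion_lora_weights")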
Step 2: Merging LoRA Weights (Fuse)¶
After training, merge the LoRA adapters into the base model to produce a standalone FP16 model. This is required before quantization to GGUF format.
cd ground_segment/training
python fuse.py
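Conceptually, the merge corresponds to peft's merge_and_unload: the LoRA deltas are baked into the base weights and the adapter wrappers are dropped. The sketch below shows the general pattern, not the exact contents of fuse.py (BASE_MODEL and the model class are the same placeholders as above):

import torch
from transformers import AutoModelForVision2Seq
from peft import PeftModel

BASE_MODEL = "org/base-vlm-id"  # placeholder: same base model used for training

# Reload the base in FP16, attach the adapter, merge, and save standalone.
base = AutoModelForVision2Seq.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "orion_lora_weights")
merged = model.merge_and_unload()  # folds LoRA weights into the base layers
merged.save_pretrained("orion_merged")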
Fuse Output¶
orion_merged/
config.json
model.safetensors
tokenizer_config.json
preprocessor_config.json
...
This merged model is a complete, self-contained Hugging Face model that can be loaded without any LoRA dependencies. It is the input for the quantization step (see the Quantization guide).
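Because no adapter is involved, the merged model can be verified by loading it directly from the output directory; a quick check, using the same placeholder model class as above:

from transformers import AutoModelForVision2Seq, AutoProcessor

# Loads straight from the merged directory; no peft or adapter files needed.
model = AutoModelForVision2Seq.from_pretrained("orion_merged")
processor = AutoProcessor.from_pretrained("orion_merged")
print(type(model).__name__, "loaded OK")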