🚀 MokingbirdFT v3.0

System Status

Operating System ✓ OK

Windows 11 Pro

Build 22621.963

CUDA Version ✓ OK

12.1

Drivers: 535.104.05

PyTorch Version ✓ OK

2.2.0

CUDA 12.1 support enabled

CPU ✓ OK

AMD Ryzen 9 7950X

16 cores / 32 threads @ 4.5 GHz

GPU ✓ OK

NVIDIA RTX 4090

24 GB VRAM | Compute 8.9

23.8 GB available (99% free)

RAM ✓ OK

64 GB DDR5

5600 MHz

42 GB available (65% free)

Storage ✓ OK

2 TB NVMe SSD

Read: 7000 MB/s | Write: 5000 MB/s

1.2 TB available (60% free)

Python Environment ✓ OK

Python 3.10.12

Virtual environment: mokingbird_env

Recommendations

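✅ Excellent VRAM! You can fine-tune 7B-13B models with LoRA or QLoRA. Full fine-tuning of a 7B model typically needs more than 24 GB unless memory-saving techniques such as 8-bit optimizers are used.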

✅ NVMe SSD detected. Fast checkpoint saving and data loading.

💡 Suggested: Enable Unsloth for 2x faster training on your RTX 4090.

💡 Consider enabling Flash Attention 2 for memory efficiency.
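
VRAM guidance of this kind follows from simple per-parameter arithmetic. The sketch below is a rough rule of thumb, not MokingbirdFT's actual estimator. It assumes bf16 weights and gradients, AdamW with an fp32 master copy and two fp32 moment buffers for full fine-tuning (about 16 bytes per parameter), frozen bf16 base weights for LoRA, and frozen 4-bit base weights for QLoRA. Activations, adapter state, and framework overhead are ignored, so real usage is higher. The function name and constants are illustrative.

```python
GIB = 1024 ** 3

# Assumed bytes per parameter for weights, gradients and optimizer state.
BYTES_PER_PARAM = {
    # bf16 weights + bf16 grads + fp32 master copy + two fp32 Adam moments
    "full": 2 + 2 + 4 + 4 + 4,
    # frozen bf16 base weights; the small adapter is ignored here
    "lora": 2,
    # frozen 4-bit base weights; the small adapter is ignored here
    "qlora": 0.5,
}


def weight_and_optimizer_gib(n_params: float, method: str) -> float:
    """Rough lower bound on VRAM in GiB, excluding activations and overhead."""
    return n_params * BYTES_PER_PARAM[method] / GIB


if __name__ == "__main__":
    for method in ("full", "lora", "qlora"):
        print(f"7B {method:>5}: {weight_and_optimizer_gib(7e9, method):6.1f} GiB")
```

Under these assumptions, full fine-tuning of a 7B model needs on the order of 100 GiB before activations are counted, while the frozen base weights for LoRA and QLoRA stay well under 24 GB.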

Dataset Selection

Data Source

Model Selection

Llama 3.1 8B Instruct

8B params | Instruct | 128K context

Meta's flagship model optimized for instruction following and chat applications.

Min VRAM: 8 GB

Recommended: 16 GB

Architecture: Llama

License: Llama 3.1

Qwen 2.5 7B Instruct

7B params | Instruct | 32K context

Alibaba's multilingual model with strong performance across languages.

Min VRAM: 8 GB

Recommended: 14 GB

Architecture: Qwen

License: Apache 2.0

Mistral 7B v0.3

7B params | Base | 32K context

Mistral AI's efficient 7B base model with grouped-query attention and a 32K context window.

Min VRAM: 8 GB

Recommended: 14 GB

Architecture: Mistral

License: Apache 2.0

Phi 3 Mini 4K

3.8B params | Instruct | 4K context

Microsoft's small but powerful model optimized for efficiency.

Min VRAM: 4 GB

Recommended: 8 GB

Architecture: Phi

License: MIT

Gemma 2 9B Instruct

9B params | Instruct | 8K context

Google's open model with strong performance and safety features.

Min VRAM: 10 GB

Recommended: 18 GB

Architecture: Gemma

License: Gemma

DeepSeek Coder 6.7B

6.7B params | Code | 16K context

Specialized code generation model with strong programming capabilities.

Min VRAM: 8 GB

Recommended: 14 GB

Architecture: DeepSeek

License: MIT

Framework Selection

Selection Mode

Recommended Framework

⭐ #1

Auto-selected based on your setup

Unsloth | 95% confidence

Speed

2x faster

Memory

-60% VRAM

Compatibility

Excellent

✓ Optimized for your RTX 4090 GPU

✓ Native support for LoRA/QLoRA

✓ Automatic Flash Attention 2

✓ Best for Llama 3.1 architecture

Supervised Fine-Tuning (SFT)

Technique

Technique Configuration

LoRA Rank (r)

LoRA Alpha

Target Modules (hold Ctrl/Cmd to select multiple)

LoRA Dropout
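
To make the effect of these settings concrete, the sketch below counts the trainable parameters a LoRA adapter adds for a given rank and set of target modules, and computes the alpha/rank scaling applied to the adapter output. It is an illustration, not MokingbirdFT's code. The projection shapes are assumptions taken from the published Llama 3.1 8B configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of 128 dimensions), and the helper names are hypothetical. LoRA dropout adds no parameters; it only applies dropout to the adapter input during training.

```python
# Assumed (in_features, out_features) of each projection in one decoder layer.
LLAMA_31_8B_MODULES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),  # grouped-query attention: 8 KV heads x 128 dims
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32


def lora_trainable_params(rank: int, target_modules: list[str]) -> int:
    """Each adapted module adds A (rank x in) and B (out x rank)."""
    per_layer = sum(
        rank * (LLAMA_31_8B_MODULES[name][0] + LLAMA_31_8B_MODULES[name][1])
        for name in target_modules
    )
    return per_layer * NUM_LAYERS


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the adapter output B @ A @ x."""
    return alpha / rank


if __name__ == "__main__":
    attention_only = ["q_proj", "k_proj", "v_proj", "o_proj"]
    all_modules = list(LLAMA_31_8B_MODULES)
    for rank in (8, 16, 64):
        print(rank, lora_trainable_params(rank, attention_only),
              lora_trainable_params(rank, all_modules))
```

Parameter count grows linearly with rank, so rank 64 trains four times as many adapter weights as rank 16 over the same target modules.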

Reinforcement Learning (RL)

Technique

🎯 Smart Config Matcher

Configuration Mode

User Preferences

Optimization Goal

Use Case Domain

Time Budget (hours)

Recommended Configurations

⭐ #1

📋 Pre-optimized Template

Llama 3.1 8B - Medical (Quality-optimized) | 95% confidence

Technique

LoRA (rank 64)

VRAM Usage

22.5 / 24 GB

Training Time

~3.5 hours

Expected Quality

0.92 accuracy

✓ Perfect fit for medical domain

✓ Tested on 100+ medical datasets

✓ Leaves 1.5GB VRAM margin for safety

✓ Higher rank captures domain-specific nuances

🤖 AI Recommender

AI-Optimized Config (Speed Focus) | 87% confidence

Technique

QLoRA (rank 16)

VRAM Usage

18.2 / 24 GB

Training Time

~2.1 hours

Expected Quality

0.88 accuracy

✓ 40% shorter training time than option #1

✓ Good balance of speed vs quality

✓ Larger VRAM margin (5.8GB)

✓ AI predicts stable convergence

⚡ Rule-Based + AI

Memory-Optimized Config | 82% confidence

Technique

QLoRA (rank 8)

VRAM Usage

14.5 / 24 GB

Training Time

~2.8 hours

Expected Quality

0.85 accuracy

✓ Minimal VRAM usage (39% free)

✓ Safe for multi-tasking during training

✓ Good baseline for quick experiments
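
The trade-offs between the three recommended configurations can be checked with a few lines of code. The sketch below is a hypothetical selector, not the Smart Config Matcher's actual logic. It uses the figures listed above, drops candidates that exceed the time budget or the 24 GB of VRAM, and then ranks the rest by the chosen optimization goal. The class, function, and goal names are assumptions.

```python
from dataclasses import dataclass

TOTAL_VRAM_GB = 24.0


@dataclass
class Candidate:
    source: str
    technique: str
    vram_gb: float
    hours: float
    accuracy: float


CANDIDATES = [
    Candidate("Pre-optimized Template", "LoRA (rank 64)", 22.5, 3.5, 0.92),
    Candidate("AI Recommender", "QLoRA (rank 16)", 18.2, 2.1, 0.88),
    Candidate("Rule-Based + AI", "QLoRA (rank 8)", 14.5, 2.8, 0.85),
]

# Lower key value wins for every goal.
GOAL_KEYS = {
    "quality": lambda c: -c.accuracy,
    "speed": lambda c: c.hours,
    "memory": lambda c: c.vram_gb,
}


def vram_margin_gb(candidate: Candidate) -> float:
    """Free VRAM left on the card while this configuration trains."""
    return round(TOTAL_VRAM_GB - candidate.vram_gb, 1)


def pick(candidates, goal: str, time_budget_hours: float):
    """Return the best feasible candidate for the goal, or None."""
    feasible = [
        c for c in candidates
        if c.hours <= time_budget_hours and c.vram_gb <= TOTAL_VRAM_GB
    ]
    return min(feasible, key=GOAL_KEYS[goal]) if feasible else None


if __name__ == "__main__":
    for c in CANDIDATES:
        print(c.technique, vram_margin_gb(c), "GB margin")
    print(pick(CANDIDATES, "quality", time_budget_hours=3.0))
```

With a 3-hour budget the rank 64 template is no longer feasible, so a quality goal falls back to the QLoRA rank 16 configuration.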

📈 Training Progress

Starting training...

🔴 LIVE

Step 0 / 1,250

Loss

1.1352

Learning Rate

2.0e-4

Grad Norm

0.041

Model

Llama 3.1 8B

Method

LoRA

Elapsed: 0m 0s

ETA: --

Speed: 0.00 steps/s

Tokens: 0
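
The ETA and speed fields can be derived from the step counter and the elapsed time. The sketch below shows one plausible way to compute them, including the "--" placeholder shown before the first step completes. It is an assumption about how such a panel might work, not MokingbirdFT's implementation, and the function names are hypothetical.

```python
def format_duration(seconds: float) -> str:
    """Format seconds as '3h 9m' or '40m 0s'."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes}m" if hours else f"{minutes}m {secs}s"


def steps_per_second(step: int, elapsed_seconds: float) -> float:
    """Average throughput so far; 0.0 before any time has elapsed."""
    return step / elapsed_seconds if elapsed_seconds > 0 else 0.0


def eta(step: int, total_steps: int, elapsed_seconds: float) -> str:
    """Estimate remaining time from the average time per completed step."""
    if step <= 0 or elapsed_seconds <= 0:
        return "--"  # nothing to extrapolate from yet
    remaining_seconds = (total_steps - step) * elapsed_seconds / step
    return format_duration(remaining_seconds)


if __name__ == "__main__":
    print(eta(0, 1250, 0))       # before training starts
    print(eta(250, 1250, 600))   # 250 steps done after 10 minutes
```

An average over the whole run is the simplest estimator. A moving average over recent steps would react faster to changes in throughput.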

Training Configuration

Model

Llama 3.1 8B

Dataset

medical_qa_5k

Method

LoRA (rank 64)

Framework

Unsloth

Estimated Resources

VRAM Usage

22.5 / 24 GB

Training Time

~3.5 hours

Expected Quality

0.92 accuracy

Est. Cost (Cloud)

$2.50

✓ Ready to Train

All prerequisites met. Click "Start Training" to begin fine-tuning your model. Training will take approximately 3.5 hours, with an expected accuracy of 0.92.

Training Loss

Gradient Norm

GPU Monitor

Live

⚡ Utilization

23%

💾 VRAM

66.22 / 179.06 GB

Temperature

68°C

Power Draw

285W

Clock Speed

2.1 GHz

Fan Speed

65%
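
Readings like these can be collected from the command line with nvidia-smi. The sketch below is one possible approach, not necessarily how MokingbirdFT gathers its data. It assumes nvidia-smi is on the PATH and queries it in CSV mode. With the nounits option, memory is reported in MiB, power in watts, and clocks in MHz. The helper names are hypothetical.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`.
FIELDS = [
    "utilization.gpu",
    "memory.used",
    "memory.total",
    "temperature.gpu",
    "power.draw",
    "clocks.sm",
    "fan.speed",
]


def _to_float(value: str):
    """Convert a CSV cell to float; unsupported readings become None."""
    try:
        return float(value)
    except ValueError:
        return None  # e.g. "[N/A]" on cards without a fan sensor


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV row produced with --format=csv,noheader,nounits."""
    cells = [cell.strip() for cell in line.split(",")]
    return dict(zip(FIELDS, (_to_float(cell) for cell in cells)))


def query_gpu(index: int = 0) -> dict:
    """Run nvidia-smi for one GPU and return its current readings."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={index}",
            f"--query-gpu={','.join(FIELDS)}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_line(result.stdout.splitlines()[0])
```

Calling query_gpu() once per second is usually enough for a live panel. The parser is kept separate so it can be tested without a GPU.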

Post-Processing & Deployment

1️⃣ Merge LoRA Weights

Merge adapter weights into base model for standalone deployment.

2️⃣ Convert to GGUF

Convert to llama.cpp format for CPU/GPU inference.

3️⃣ Deploy to Ollama

Create Modelfile and import to Ollama for easy local serving.

4️⃣ Deploy to vLLM

High-throughput serving with OpenAI-compatible API.

5️⃣ Convert to ONNX

Export to ONNX format for cross-platform deployment.

6️⃣ Push to HuggingFace

Upload model to HuggingFace Hub for sharing.
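
Step 1 above, merging LoRA weights, comes down to one matrix update per adapted layer: W_merged = W + (alpha / rank) * B A. The sketch below shows that arithmetic on plain Python lists so it runs without any dependencies. It is an illustration with hypothetical function names, not MokingbirdFT's merge code. In practice a library routine such as PEFT's merge_and_unload performs this on the real tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Fold a LoRA adapter into the base weight.

    w:      out x in   base weight
    lora_a: rank x in  down-projection A
    lora_b: out x rank up-projection B
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # out x in low-rank update
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 2.0]]        # rank 1, in 2
    b = [[1.0], [3.0]]      # out 2, rank 1
    print(merge_lora(base, a, b, alpha=2, rank=1))
```

After merging, the model produces the same outputs as base plus adapter but no longer needs the adapter at inference time, which is what the GGUF, Ollama, and vLLM steps expect.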

Ready

Model: None

Dataset: None

v3.0.0 | GPU: RTX 4090 (24GB)
