PyTorch Lightning separates research code from engineering boilerplate: distributed training, logging, and checkpointing come built in, and the same code runs unchanged across CPUs, GPUs, and TPUs.
Module Structure
A LightningModule organizes model code: implement training_step to run the forward pass and compute the loss, and return optimizers from configure_optimizers. The Trainer then drives the training loop; see the sketch after the list below.
- Define models as LightningModule subclasses
- Implement training_step for training logic
- Use Trainer for training configuration
- Leverage callbacks for custom behavior
- Log metrics with self.log; Lightning routes them to the configured logger
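A minimal sketch of this structure, assuming the pytorch_lightning package and a simple classification setup; the class name LitClassifier, the layer sizes, the PrintEpoch callback, and my_dataloader are illustrative placeholders, not part of the library:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):
    """Model, loss, optimizer, and logging in one class."""

    def __init__(self, in_dim=28 * 28, n_classes=10):
        super().__init__()
        self.layer = torch.nn.Linear(in_dim, n_classes)

    def training_step(self, batch, batch_idx):
        # Forward pass and loss; Lightning handles backward() and the optimizer step.
        x, y = batch
        loss = F.cross_entropy(self.layer(x.view(x.size(0), -1)), y)
        self.log("train_loss", loss)  # sent to the configured logger
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


class PrintEpoch(pl.Callback):
    """Custom behavior via a callback hook instead of editing the loop."""

    def on_train_epoch_end(self, trainer, pl_module):
        print(f"finished epoch {trainer.current_epoch}")


# The Trainer owns the loop; checkpointing is enabled by default.
trainer = pl.Trainer(max_epochs=5, callbacks=[PrintEpoch()])
# trainer.fit(LitClassifier(), train_dataloaders=my_dataloader)  # my_dataloader: your DataLoader
```

Because the loss, optimizer, and logging live inside the module, changing how training runs is a matter of Trainer configuration rather than research code.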
Scaling
The Trainer scales to multiple GPUs and nodes automatically, enables mixed precision with a single flag, supports gradient accumulation to simulate larger batches, and includes built-in profilers.
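A sketch of how these options combine on the Trainer; the specific values (4 GPUs, 2 nodes, 4-step accumulation) are placeholders, and the precision argument's spelling varies across Lightning versions:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,                  # GPUs per node
    num_nodes=2,                # multi-node training via distributed data parallel
    strategy="ddp",
    precision="16-mixed",       # mixed precision with one flag (precision=16 on older versions)
    accumulate_grad_batches=4,  # accumulate gradients to simulate a 4x larger batch
    profiler="simple",          # built-in performance profiling report
)
```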