Reference
==========

Pruning
--------

* `Learning both Weights and Connections for Efficient Neural Networks `_
* `Pruning Filters for Efficient ConvNets `_
* `Pruning Convolutional Neural Networks for Resource Efficient Inference `_
* `Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning `_
* `SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot `_
* `A Simple and Effective Pruning Approach for Large Language Models `_

Quantization
-------------

* `Quantizing deep convolutional networks for efficient inference: A whitepaper `_
* `Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference `_
* `AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration `_
* `Quantization Mimic: Towards Very Tiny CNN for Object Detection `_

.. Distillation
.. -------------