References
Pruning
Learning both Weights and Connections for Efficient Neural Networks
Pruning Convolutional Neural Networks for Resource Efficient Inference
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
A Simple and Effective Pruning Approach for Large Language Models
Quantization
Quantizing deep convolutional networks for efficient inference: A whitepaper
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Quantization Mimic: Towards Very Tiny CNN for Object Detection