sconce 0.0.2

Reference

Pruning

  • Learning both Weights and Connections for Efficient Neural Networks

  • Pruning Filters for Efficient ConvNets

  • Pruning Convolutional Neural Networks for Resource Efficient Inference

  • Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

  • SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

  • A Simple and Effective Pruning Approach for Large Language Models

Quantization

  • Quantizing deep convolutional networks for efficient inference: A whitepaper

  • Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

  • AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

  • Quantization Mimic: Towards Very Tiny CNN for Object Detection


© Copyright 2023, Sathyaprakash.
