Features

class sconce.sconce

Bases: object

CWP_Pruning()

Applies channel pruning to the model using the specified channel pruning ratio. Returns the pruned model.

GMP_Pruning(model=None, prune_dict=None)

Applies Granular-Magnitude Pruning (GMP) to the model’s convolutional and fully connected weights. The pruning is performed based on the per-layer sparsity levels specified in the sparsity_dict attribute. The resulting pruning masks are stored in the masks attribute.

GMP_apply()

Applies the pruning masks produced by GMP_Pruning to the model’s parameters.

This function iterates over the model’s named parameters and applies the corresponding mask if it exists in the masks dictionary. The mask is applied by element-wise multiplication with the parameter tensor.

Args:

self (object): The sconce object.

Returns:

None
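
The masking step itself is simple; a minimal sketch of how such masks can be applied (apply_masks_sketch and its arguments are illustrative, not part of the sconce API):

    import torch

    @torch.no_grad()
    def apply_masks_sketch(model: torch.nn.Module, masks: dict) -> None:
        # Element-wise multiply each parameter by its binary mask so that
        # pruned positions stay zero after weight updates.
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name].to(param.device, param.dtype))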

apply_channel_sorting()

Applies channel sorting to the model’s convolutional and batch normalization layers. Returns a copy of the model with sorted channels.

Returns: model (torch.nn.Module): A copy of the model with sorted channels.

channel_prune(model: Module, prune_ratio: List | float) Module

Applies channel pruning to each conv layer in the backbone. Note that prune_ratio can be either a single float, specifying a uniform pruning rate for all layers, or a list of per-layer pruning rates.
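
A hypothetical usage sketch (the toy model and ratios are illustrative; the exact number of per-layer entries expected may differ, e.g. the final conv layer may be excluded):

    import torch.nn as nn
    from sconce import sconce

    sconces = sconce()
    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(),
                          nn.Conv2d(16, 32, 3), nn.ReLU(),
                          nn.Conv2d(32, 10, 3))

    # Uniform rate: prune 30% of channels in every prunable conv layer.
    pruned = sconces.channel_prune(model, prune_ratio=0.3)

    # Per-layer rates: one entry per prunable conv layer.
    pruned = sconces.channel_prune(model, prune_ratio=[0.2, 0.4])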

channel_prune_layerwise(model: Module, prune_ratio: List | float, i_layer) Module

Applies channel pruning to the backbone one layer at a time, targeting the conv layer selected by i_layer. As with channel_prune, prune_ratio can be a single float (uniform rate) or a list of per-layer pruning rates.

compare_models(model_list, model_tags=None)

Compares the performance of the given PyTorch models, typically an original dense model and its pruned, fine-tuned counterpart. Prints a table of metrics including latency, MACs, and model size for each model, together with the reduction ratios.

Args:

model_list: A list of PyTorch models to compare, e.g. the original dense model and the pruned, fine-tuned model.

model_tags: Optional labels for the models in the printed table.

Returns: None

compress(verbose=True) None

Compresses the neural network model using either Granular-Magnitude Pruning (GMP) or Channel-Wise Pruning (CWP). For GMP, the sensitivity of each layer is scanned first and fine-grained pruning is then applied; for CWP, channel pruning is applied directly. After pruning, the model is fine-tuned with a Stochastic Gradient Descent (SGD) optimizer and a cosine-annealing learning-rate scheduler. The original dense model and the pruned, fine-tuned model are saved to separate files. Finally, the validation accuracy and the size of the pruned model are printed.

Args:

verbose (bool): If True, prints the validation accuracy and the size of the pruned model. Default is True.

Returns:

None
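
A sketch of the attribute-driven workflow around compress(); the attribute names follow typical sconce usage but should be checked against the current release:

    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, TensorDataset
    from sconce import sconce

    # Toy model and synthetic data, purely for illustration.
    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128),
                          nn.ReLU(), nn.Linear(128, 10))
    data = TensorDataset(torch.randn(256, 1, 28, 28),
                         torch.randint(0, 10, (256,)))
    loader = DataLoader(data, batch_size=64)

    sconces = sconce()
    sconces.model = model
    sconces.criterion = nn.CrossEntropyLoss()
    sconces.optimizer = optim.Adam(model.parameters(), lr=1e-4)
    sconces.scheduler = optim.lr_scheduler.CosineAnnealingLR(sconces.optimizer, T_max=200)
    sconces.dataloader = {"train": loader, "test": loader}
    sconces.epochs = 2
    sconces.prune_mode = "GMP"          # or "CWP" for channel-wise pruning
    sconces.compress()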

evaluate(model=None, device=None, Tqdm=True, verbose=False)

Evaluates the model on the test dataset and returns the accuracy.

Args:

model (optional): The model to evaluate; if None, the sconce object’s configured model is used.

device (optional): The device to run evaluation on.

Tqdm (bool): If True, displays a progress bar over the test set.

verbose (bool): If True, prints the test accuracy.

Returns:

float: The test accuracy as a percentage.

evaluate_model(model, test_loader, device, criterion=None)
find_instance(obj, object_of_importance=(torch.nn.Conv2d, torch.nn.Linear), sparsity=None)
fine_grained_prune(tensor: Tensor, sparsity: float) Tensor

Magnitude-based pruning for a single tensor.

Parameters:
  • tensor – torch.(cuda.)Tensor, weight of conv/fc layer

  • sparsity – float, pruning sparsity, defined as sparsity = #zeros / #elements = 1 - #nonzeros / #elements

Returns:

torch.(cuda.)Tensor, the binary mask used to zero out the pruned elements
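
A minimal sketch of the magnitude-pruning logic described above (the 1 = kept / 0 = pruned mask convention here is an assumption):

    import torch

    @torch.no_grad()
    def fine_grained_prune_sketch(tensor: torch.Tensor, sparsity: float) -> torch.Tensor:
        # Zero out the smallest-magnitude elements so that
        # sparsity = #zeros / #elements, and return the binary mask.
        sparsity = min(max(sparsity, 0.0), 1.0)
        num_zeros = round(sparsity * tensor.numel())
        if num_zeros == 0:
            return torch.ones_like(tensor)
        threshold = tensor.abs().flatten().kthvalue(num_zeros).values
        mask = (tensor.abs() > threshold).to(tensor.dtype)  # 1 = kept, 0 = pruned
        tensor.mul_(mask)
        return mask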

forward_pass_snn(data, mem_out_rec=None)

Perform a forward pass through the spiking neural network (SNN).

Args:

data: Input data for the SNN.

mem_out_rec: Optional tensor to record the membrane potential at each time step.

Returns:

If mem_out_rec is not None, returns a tuple containing the spike outputs and membrane potentials as tensors. Otherwise, returns only the spike outputs as a tensor.
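
A generic sketch of how a time-stepped SNN forward pass accumulates spikes; the num_steps parameter and the assumption that net returns (spikes, membrane) per step are illustrative:

    import torch

    def forward_pass_snn_sketch(net, data, num_steps=25, mem_out_rec=None):
        # Present the same input at every time step; stateful spiking
        # neurons carry membrane potential across steps and emit spikes.
        spk_rec, mem_rec = [], []
        for _ in range(num_steps):
            spk_out, mem_out = net(data)   # assumes net returns (spikes, membrane)
            spk_rec.append(spk_out)
            if mem_out_rec is not None:
                mem_rec.append(mem_out)
        if mem_out_rec is not None:
            return torch.stack(spk_rec), torch.stack(mem_rec)
        return torch.stack(spk_rec)       # shape: (num_steps, batch, ...)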

get_input_channel_importance(weight)

Computes the importance of each input channel in a weight tensor.

Args:

weight (torch.Tensor): The weight tensor to compute channel importance for.

Returns:

torch.Tensor: A tensor containing the importance of each input channel.
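
A sketch of one common importance measure, the L2 norm of the weight slices that consume each input channel (the choice of norm is an assumption):

    import torch

    def input_channel_importance_sketch(weight: torch.Tensor) -> torch.Tensor:
        # For a conv weight of shape (out_ch, in_ch, kH, kW), reduce over
        # every dimension except dim 1 to get one score per input channel.
        dims = tuple(d for d in range(weight.dim()) if d != 1)
        return weight.pow(2).sum(dim=dims).sqrt()

Channels with the smallest scores are the natural candidates for removal during channel pruning.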

get_model_macs(model, inputs) int

Calculates the number of multiply-accumulate operations (MACs) required to run the given model with the given inputs.

Args:

model: The model to profile.

inputs: The inputs to the model.

Returns:

The number of MACs required to run the model with the given inputs.
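
MAC counting is usually delegated to a profiler; a sketch using the torchprofile package (that sconce uses this exact backend is an assumption):

    import torch
    from torchprofile import profile_macs
    from torchvision.models import resnet18

    model = resnet18()
    dummy_input = torch.randn(1, 3, 224, 224)
    macs = profile_macs(model, dummy_input)   # total multiply-accumulates
    print(f"{macs / 1e6:.1f} M MACs")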

get_model_size(model: Module, data_width=32, count_nonzero_only=False) int

Calculates the model size in bits.

Parameters:
  • data_width – number of bits per element

  • count_nonzero_only – if True, count only non-zero weights
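
The arithmetic is a straightforward element count; a minimal sketch:

    import torch
    from torch import nn

    def model_size_bits_sketch(model: nn.Module, data_width: int = 32,
                               count_nonzero_only: bool = False) -> int:
        # Size in bits = (#counted elements) x (bits per element); counting
        # only non-zeros reflects the effective size of a pruned model.
        num_elements = 0
        for param in model.parameters():
            num_elements += int(param.count_nonzero()) if count_nonzero_only else param.numel()
        return num_elements * data_width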

get_model_size_weights(mdl)

Calculates the size of the model’s weights in megabytes.

Args:

mdl (torch.nn.Module): The model whose weights size needs to be calculated.

Returns:

float: The size of the model’s weights in megabytes.

get_model_sparsity(model: Module) float

Calculate the sparsity of the given PyTorch model.

Sparsity is defined as the ratio of the number of zero-valued weights to the total number of weights in the model. This function iterates over all parameters in the model and counts the number of non-zero values and the total number of values.

Args:

model (nn.Module): The PyTorch model to calculate sparsity for.

Returns:

float: The sparsity of the model, computed as sparsity = #zeros / #elements = 1 - #nonzeros / #elements.
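
A minimal sketch of this computation:

    import torch
    from torch import nn

    def model_sparsity_sketch(model: nn.Module) -> float:
        # sparsity = #zeros / #elements = 1 - #nonzeros / #elements
        total = sum(p.numel() for p in model.parameters())
        nonzero = sum(int(p.count_nonzero()) for p in model.parameters())
        return 1.0 - nonzero / total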

get_num_channels_to_keep(channels: int, prune_ratio: float) int

A function to calculate the number of channels to PRESERVE after pruning. Note that preserve_rate = 1 - prune_ratio.

get_num_parameters(model: Module, count_nonzero_only=False) int

Calculates the total number of parameters in a given PyTorch model.

Parameters:
  • model (nn.Module) – The PyTorch model.

  • count_nonzero_only (bool, optional) – If True, only counts the number of non-zero parameters. If False, counts all parameters. Defaults to False.

get_sparsity(tensor: Tensor) float

Calculates the sparsity of the given tensor:

sparsity = #zeros / #elements = 1 - #nonzeros / #elements

load_torchscript_model(model_filepath, device)
measure_inference_latency(model, device, input_data, num_samples=100, num_warmups=10)
measure_latency(model, dummy_input, n_warmup=20, n_test=100)

Measures the average latency of a given PyTorch model by running it on a dummy input multiple times.

Args:

model (nn.Module): The PyTorch model to measure the latency of.

dummy_input (torch.Tensor): A dummy input to the model.

n_warmup (int, optional): The number of warm-up iterations to run before measuring the latency. Defaults to 20.

n_test (int, optional): The number of timed iterations used to measure the latency. Defaults to 100.

Returns:

float: The average latency of the model in milliseconds.
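
A sketch of the warm-up-then-time pattern (the CUDA synchronization calls are an addition needed for correct GPU timing):

    import time
    import torch

    @torch.inference_mode()
    def measure_latency_sketch(model, dummy_input, n_warmup=20, n_test=100):
        model.eval()
        for _ in range(n_warmup):            # warm-up: caches, lazy init, autotuning
            model(dummy_input)
        if dummy_input.is_cuda:
            torch.cuda.synchronize()         # don't time still-queued kernels
        start = time.time()
        for _ in range(n_test):
            model(dummy_input)
        if dummy_input.is_cuda:
            torch.cuda.synchronize()
        return (time.time() - start) / n_test * 1000.0   # average ms per run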

plot_weight_distribution(bins=256, count_nonzero_only=False)

Plots the weight distribution of the model’s named parameters.

Args:

bins (int): Number of bins to use in the histogram. Default is 256.

count_nonzero_only (bool): If True, only non-zero weights will be plotted. Default is False.

Returns:

None
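
A simplified sketch with matplotlib (overlaying all layers on one axis; the real method may draw per-layer subplots instead):

    import matplotlib.pyplot as plt
    import torch

    def plot_weight_distribution_sketch(model, bins=256, count_nonzero_only=False):
        fig, ax = plt.subplots()
        for name, param in model.named_parameters():
            if param.dim() > 1:                       # skip biases / 1-D params
                w = param.detach().flatten().cpu()
                if count_nonzero_only:
                    w = w[w != 0]
                ax.hist(w.numpy(), bins=bins, alpha=0.5, label=name, density=True)
        ax.set_xlabel("weight value")
        ax.set_ylabel("density")
        ax.legend()
        plt.show()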

print_model_size(mdl)
qat()
save_torchscript_model(model, model_dir, model_filename)
sensitivity_scan(dense_model_accuracy, scan_step=0.05, scan_start=0.1, scan_end=1.0, verbose=True)

Scans the sensitivity of the model to weight pruning by gradually increasing the sparsity of each layer’s weights and measuring the resulting accuracy. Returns a dictionary mapping layer names to the sparsity values that resulted in the highest accuracy for each layer.

Parameters:
  • dense_model_accuracy – the accuracy of the original dense model

  • scan_step – the step size for the sparsity scan

  • scan_start – the starting sparsity for the scan

  • scan_end – the ending sparsity for the scan

  • verbose – whether to print progress information during the scan

Returns:

a dictionary mapping layer names to the sparsity values that resulted in the highest accuracy for each layer
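
A sketch of the scan loop: temporarily prune each layer at increasing sparsity, evaluate, restore, and keep the best-scoring level (the evaluate callback and the rule for skipping 1-D parameters are assumptions):

    import numpy as np
    import torch

    @torch.no_grad()
    def sensitivity_scan_sketch(model, evaluate, scan_start=0.1, scan_end=1.0,
                                scan_step=0.05):
        best = {}
        levels = np.arange(scan_start, scan_end, scan_step)
        for name, param in model.named_parameters():
            if param.dim() <= 1:                 # skip biases / norm parameters
                continue
            dense = param.detach().clone()
            accuracies = []
            for s in levels:
                k = round(float(s) * param.numel())
                if k > 0:                        # zero the k smallest-magnitude weights
                    threshold = param.abs().flatten().kthvalue(k).values
                    param.masked_fill_(param.abs() <= threshold, 0.0)
                accuracies.append(evaluate(model))
                param.copy_(dense)               # restore the dense weights
            best[name] = float(levels[int(np.argmax(accuracies))])
        return best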

train(model=None) None

Trains the model for a specified number of epochs using the specified dataloader and optimizer. If fine-tuning is enabled, the number of epochs is set to num_finetune_epochs. The function also saves the model state after each epoch if the validation accuracy improves.

venum(sparstiy)
venum_CWP_Pruning(original_dense_model, sparsity_dict)

Applies channel pruning to the model using the specified channel pruning ratio. Returns the pruned model.

venum_apply(sparsity_dict)
venum_evaluate(Tqdm=True, verbose=False)

Evaluates the model on the test dataset and returns the accuracy.

Args:

Tqdm (bool): If True, displays a progress bar over the test set.

verbose (bool): If True, prints the test accuracy.

Returns:

float: The test accuracy as a percentage.

venum_prune(W, X, s, in_channel=0, kernel_size=0, cnn=False)