THE EXPRESSIVE POWER OF LOW-RANK ADAPTATION

23.04.25

Methodology Used

The paper theoretically analyzes the expressive power of Low-Rank Adaptation (LoRA) by investigating its ability to adapt a frozen pre-trained model (f₀) to accurately represent a target model (f̄). The core methodology involves:

  1. Linear Model Analysis: Starting with a simplified scenario in which both the frozen and target models are linear. This serves as a warm-up for characterizing the minimal achievable approximation error and the smallest LoRA-rank required for exact representation under non-singularity assumptions. The proof for the linear case decomposes the difference between the adapted and target models and relates it to the best low-rank approximation of the error matrix (a numerical sketch of this construction is given after this list).

  2. Fully Connected Neural Network (FNN) Analysis: Extending the analysis to ReLU FNNs. This requires handling the non-linearity of the ReLU activations and the presence of bias terms. The techniques used include:

    • Linearization: Eliminating the non-linearity in the initial layers by using sufficiently large biases so that the ReLUs are always active (see the small numerical check after this list).

    • Weight Matrix Alignment: Using the results from the linear case to align the weight matrices and bias vectors of the adapted model with the target model.

    • Model Partition: For multi-layer FNNs, partitioning the layers of the adapted model to approximate corresponding layers or groups of layers in the target model. Both uniform and general partitions are considered.

  3. Transformer Network (TFN) Analysis: Extending the analysis to TFNs, which introduce additional non-linearities through the softmax attention and the ReLU activations. The approach segments the sequence of transformer blocks around these non-linearities and matches intermediate outputs. The analysis focuses on adding LoRA adapters primarily to the self-attention layers.

  4. Empirical Validation: Conducting experiments on both synthetic settings (linear models, FNNs, TFNs) and real data (the GLUE benchmark) to validate the theoretical findings on how the approximation error depends on the LoRA-rank, the model depth, and the distance between the frozen and target models. The theoretically constructed LoRA adapters are also compared with adapters optimized via gradient descent (a toy version of this comparison is sketched at the end of this section).
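
As a concrete illustration of the linear warm-up (item 1), the sketch below computes the best rank-r LoRA update B A for a frozen weight matrix and a target via a truncated SVD of the error matrix. By the Eckart-Young theorem this attains the minimal Frobenius-norm approximation error, which drops to zero once r reaches the rank of the error matrix. The dimensions and matrices are arbitrary placeholders, not taken from the paper.

```python
import numpy as np

def best_rank_r_adapter(W_frozen, W_target, r):
    """Best rank-r LoRA update for a linear model, via truncated SVD.

    Returns B (d_out x r) and A (r x d_in) such that W_frozen + B @ A is the
    closest matrix to W_target, in Frobenius norm, among all rank-<=r
    corrections (Eckart-Young theorem).
    """
    E = W_target - W_frozen                    # error matrix the adapter must absorb
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    B = U[:, :r] * s[:r]                       # fold singular values into the left factor
    A = Vt[:r, :]
    residual = np.sqrt(np.sum(s[r:] ** 2))     # minimal achievable error ||E - B A||_F
    return B, A, residual

rng = np.random.default_rng(0)
d = 8
W0 = rng.standard_normal((d, d))               # frozen linear model
Wbar = rng.standard_normal((d, d))              # target linear model

for r in (1, 2, 4, 8):
    _, _, err = best_rank_r_adapter(W0, Wbar, r)
    print(f"rank {r}: best achievable error {err:.4f}")  # hits 0 once r >= rank(Wbar - W0)
```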
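
The linearization trick from the FNN analysis (item 2) can be checked numerically in the same spirit: on a bounded input domain, a large enough bias keeps every ReLU pre-activation positive, so the layer reduces to an affine map and the linear-case alignment above applies to it. The bound below is a simple hand-picked one for inputs in [-1, 1]^d, not the paper's construction.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(1)
d, n = 6, 100
W0 = rng.standard_normal((d, d))            # frozen weight of one FNN layer
X = rng.uniform(-1.0, 1.0, size=(d, n))     # inputs from a bounded domain

# Pick a bias large enough that every pre-activation is positive, so the ReLU
# is always active and the layer acts as the affine map x -> W0 @ x + c.
c = np.abs(W0) @ np.ones(d) + 1e-6          # entrywise bound on |W0 @ x| for x in [-1, 1]^d
H = relu(W0 @ X + c[:, None])               # "linearized" layer output

# Undoing the constant shift recovers the purely linear map, so the
# linear-case LoRA construction can then align this layer with the target.
print("layer is effectively linear:", np.allclose(H - c[:, None], W0 @ X))
```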

The theoretical proofs often rely on constructing specific LoRA adapters that achieve the desired approximation or exact representation under certain non-singularity assumptions about the weight matrices.
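
As a toy counterpart to the empirical comparison in item 4, the following sketch builds a rank-r adapter for a linear model in closed form (truncated SVD of the error matrix, mirroring the construction above) and compares it with low-rank factors fitted by gradient-based optimization on synthetic data. The dimensions, the Adam optimizer, and the hyperparameters are illustrative choices, not the paper's setup.

```python
import torch

torch.manual_seed(0)
d, r, n = 8, 2, 512
W0, Wbar = torch.randn(d, d), torch.randn(d, d)   # frozen and target linear models
X = torch.randn(n, d)                             # synthetic inputs

# Closed-form construction: truncated SVD of the error matrix, as in the linear analysis.
U, S, Vh = torch.linalg.svd(Wbar - W0)
B_svd, A_svd = U[:, :r] * S[:r], Vh[:r, :]

# Baseline: rank-r factors fitted by gradient-based optimization on the same task.
B = torch.nn.Parameter(torch.zeros(d, r))
A = torch.nn.Parameter(0.01 * torch.randn(r, d))
opt = torch.optim.Adam([B, A], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((X @ (W0 + B @ A).T - X @ Wbar.T) ** 2).mean()
    loss.backward()
    opt.step()

frob = lambda M: torch.linalg.matrix_norm(M).item()        # Frobenius norm by default
print("constructed adapter error:", frob(W0 + B_svd @ A_svd - Wbar))
print("trained adapter error:    ", frob((W0 + B @ A).detach() - Wbar))
```

Since no rank-r correction can have a lower Frobenius error than the truncated-SVD one, the trained factors can at best match the constructed adapter on this metric.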

New Things Introduced / Novelty

The paper presents the first theoretical analysis of the expressive power of Low-Rank Adaptation (LoRA). Key novel contributions include:

Key Takeaways and Results

Comparison with State of the Art (SOTA) and How It Improves on It

The paper compares LoRA primarily with:

The paper improves on previous theoretical work by being the first to analyze LoRA's expressive power for widely used architectures such as FNNs and TFNs, whereas earlier analyses were limited to simpler linear models or to other adaptation techniques.

Drawbacks That Are Discussed in the Paper

Improvements That Can Be Made

Based on the drawbacks and future work discussed in the paper: