This is an annotated Python implementation of Oblivious Sketching for Logistic Regression from ICML 2021. The paper is also available on arXiv.
The proposed data-oblivious method computes a data sketch in input-sparsity time, reducing a dataset of size \(n \times d\) to a sketch consisting of \(\mathrm{poly}(\mu d \log n)\) weighted points (vectors), where
- \(n\) is the number of observations in the dataset,
- \(d\) is the dimension of the dataset,
- and \(\mu\) is the complexity of compressing the dataset.
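The size reduction described above can be illustrated with a generic CountSketch, the standard input-sparsity-time sketch: each row is hashed to one bucket with a random sign, so the sketch is built in one pass over the nonzeros. This is only a hedged illustration of input-sparsity-time sketching in general, not the paper's specific construction (which outputs weighted points):

```python
import numpy as np

def countsketch(A, m, rng):
    """Generic CountSketch: hash each row of A into one of m buckets
    with a random sign. Building the m x d sketch takes time
    proportional to nnz(A), i.e. input-sparsity time.
    (Illustrative only; not the paper's sketch construction.)"""
    n, d = A.shape
    buckets = rng.integers(0, m, size=n)       # random bucket per row
    signs = rng.choice([-1.0, 1.0], size=n)    # random sign per row
    SA = np.zeros((m, d))
    for i in range(n):                         # single pass over the rows
        SA[buckets[i]] += signs[i] * A[i]
    return SA

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
SA = countsketch(A, 50, rng)                   # 1000 x 5 reduced to 50 x 5
```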
The code is adapted from the original authors' implementation and follows the labml.ai annotation style.
The structure of the implementation is as follows:
- Class `Sketch` generates the sketch blocks (the unweighted sketch matrix \(A''\) and the weights \(w'\) in the proof of Theorem 1 of the paper).
- Function `get_reduced_matrix_and_weights` generates the complete sketch (both the sketch blocks \(A''\) and the uniform sampling block \(T\)). It creates a `Sketch` object as an intermediate step and outputs, among other results, the unweighted sketch matrix \(B\) and the corresponding weights \(w\).
- Function `optimize` takes the results of `get_reduced_matrix_and_weights` and solves for the logistic regression coefficients by leveraging the Ky Fan argument.
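To make the overall sketch-and-solve flow concrete, here is a toy end-to-end example built only from the uniform sampling block idea: sample \(m\) rows, weight each by \(n/m\) so the weighted loss is an unbiased estimate of the full loss, and minimize the weighted logistic loss on the small matrix. All function names and signatures here are hypothetical simplifications, and plain gradient descent stands in for the paper's Ky Fan-based solver:

```python
import numpy as np

def uniform_sampling_sketch(Z, m, rng):
    """Toy analogue of the uniform sampling block T (hypothetical
    simplification): sample m rows of Z uniformly with replacement
    and weight each by n/m for unbiasedness."""
    n = Z.shape[0]
    idx = rng.integers(0, n, size=m)
    return Z[idx], np.full(m, n / m)

def weighted_logistic_loss_grad(beta, Z, w):
    """Weighted logistic loss and gradient. Rows of Z are the
    label-scaled points y_i * x_i, so loss_i = log(1 + exp(-z_i @ beta))."""
    margins = Z @ beta
    loss = w @ np.log1p(np.exp(-margins))
    grad = -Z.T @ (w / (1.0 + np.exp(margins)))
    return loss, grad

rng = np.random.default_rng(1)
n, d = 5000, 3
X = rng.standard_normal((n, d))
beta_true = np.array([1.0, -2.0, 0.5])
# Labels in {-1, +1} drawn from the logistic model.
y = np.where(rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true)), 1.0, -1.0)
Z = y[:, None] * X

B, w = uniform_sampling_sketch(Z, 500, rng)    # 5000 rows reduced to 500

beta = np.zeros(d)
for _ in range(2000):                          # gradient descent on the sketch
    _, g = weighted_logistic_loss_grad(beta, B, w)
    beta -= 2e-4 * g
```

Because the weighted sketch loss approximates the full loss, the coefficients recovered from the 500-row sketch land close to `beta_true`.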