Thursday, May 23, 2019
Modern biomedical science is defined by noisy high-dimensional data, whether from microscopes (electron, light-sheet, confocal), sequencing (RNA-seq, ATAC-seq, Hi-C), or sensors (physiology, EEG). We present a general framework for denoising high-dimensional measurements that can be applied to any of these domains, and that requires no prior on the signal, no estimate of the noise, and no clean training data. The only assumption is that the noise exhibits statistical independence across different dimensions of the measurement, while the true signal exhibits some correlation. For a broad class of functions ("J-invariant"), it is then possible to estimate the performance of a denoiser from noisy data alone. This allows us to calibrate J-invariant versions of any parameterized denoising algorithm, from the single hyperparameter of a median filter to the millions of weights of a deep neural network. We demonstrate this on natural image and microscopy data, where we exploit noise independence between pixels, and on single-cell gene expression data, where we exploit independence between detections of individual molecules. This framework generalizes recent work on training neural nets from noisy images and on cross-validation for matrix factorization.
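
The calibration idea can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the helper name `donut_mean`, the synthetic sinusoidal image, the Gaussian noise level `sigma`, and the choice of a mean filter that skips the center pixel as a stand-in for a J-invariant denoiser. Because each output pixel ignores the noisy value at that same pixel, and the noise is independent across pixels with zero mean, the loss computed against the noisy image should equal the loss against the clean image plus the noise variance, in expectation, so both should favor roughly the same filter radius.

```python
import numpy as np


def donut_mean(x, radius):
    """Mean over a (2r+1) x (2r+1) window that skips the center pixel.

    Hypothetical helper: output pixel (i, j) never reads input pixel (i, j),
    so the filter is J-invariant for the partition into single pixels.
    """
    h, w = x.shape
    padded = np.pad(x.astype(float), radius)   # zero padding around the image
    valid = np.pad(np.ones((h, w)), radius)    # 1 inside the image, 0 in the padding
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            if dy == radius and dx == radius:
                continue  # exclude the center pixel itself
            total += padded[dy:dy + h, dx:dx + w]
            count += valid[dy:dy + h, dx:dx + w]
    return total / count  # average over neighbors that lie inside the image


# Synthetic data (an assumption): a smooth signal plus independent Gaussian noise.
rng = np.random.default_rng(0)
sigma = 0.5
grid = np.arange(128)
clean = np.outer(np.sin(2 * np.pi * grid / 32), np.cos(2 * np.pi * grid / 32))
noisy = clean + rng.normal(0.0, sigma, clean.shape)

radii = [1, 2, 3, 4, 5]
# Self-supervised loss: compares the denoised image with the noisy input only.
self_losses = [np.mean((donut_mean(noisy, r) - noisy) ** 2) for r in radii]
# Ground-truth loss: uses the clean image, which is unavailable in practice.
true_losses = [np.mean((donut_mean(noisy, r) - clean) ** 2) for r in radii]

for r, s, t in zip(radii, self_losses, true_losses):
    print(f"radius={r}  self-supervised={s:.4f}  ground-truth={t:.4f}")

best_radius = radii[int(np.argmin(self_losses))]
print("radius chosen from noisy data alone:", best_radius)
```

In this sketch the two loss columns should differ by approximately `sigma ** 2` at every radius, up to sampling fluctuation, which is why the hyperparameter can be chosen without access to `clean`. An ordinary mean filter that includes the center pixel would not have this property, since its self-supervised loss would reward simply copying the noisy input.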