DiffPhase: Generative Diffusion-based STFT Phase Retrieval
This page accompanies the DiffPhase paper [1], presented at ICASSP 2023. It provides examples of speech reconstructed from known (clean) STFT magnitudes without any prior knowledge of the phase spectrum. The following eight utterances were randomly selected from the WSJ0 test set. For each example, we provide the reconstructed signals generated by:
- Two variants of the proposed DiffPhase approach [1] (DiffPhase and DiffPhase-small), after 15 and 30 reverse diffusion steps
- Two variants of DeGLI (Deep Griffin-Lim Iteration [2]): the original model and a larger model (DeGLI-large), both at 15 and 30 iterations
- The original Griffin-Lim algorithm (GLA) [3] at 50 and 200 iterations
Audio examples: Phase retrieval with known clean magnitudes
Method | Example 1 | Example 2 | Example 3 | Example 4 | Example 5 | Example 6 | Example 7 | Example 8 |
---|---|---|---|---|---|---|---|---|
Reference | ||||||||
Zero Phase | ||||||||
DiffPhase, N=15 | ||||||||
DiffPhase, N=30 | ||||||||
DiffPhase-small, N=15 | ||||||||
DiffPhase-small, N=30 | ||||||||
DeGLI, N=15 | ||||||||
DeGLI, N=30 | ||||||||
DeGLI-large, N=15 | ||||||||
DeGLI-large, N=30 | ||||||||
GLA, N=50 | ||||||||
GLA, N=200 | ||||||||
References
[1] T. Peer, S. Welker, and T. Gerkmann, “DiffPhase: Generative Diffusion-based STFT Phase Retrieval,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Rhodes Island, Greece, Jun. 2023.
[2] Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, “Deep Griffin–Lim Iteration: Trainable Iterative Phase Reconstruction Using Neural Network,” IEEE J. Sel. Top. Signal Process., vol. 15, no. 1, pp. 37–50, Jan. 2021.
[3] D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236–243, Apr. 1984.