🎓 Education

BSc in Automotive Engineering

Hanyang University (Mar. 2011 - Aug. 2016)

Ph.D. in Digital Contents & Information Studies

Seoul National University (Sep. 2016 - Feb. 2022)

Thesis Award

Title: “A Controllable Generation of Signals from Self-Supervised Representations”

🎊 Experience

Machine Learning Engineer @ ElevenLabs

Mar. 2024 - present

Co-founder & Lead of Research Team @ Supertone (Acquired by Hybe)

Mar. 2020 - Feb. 2024

AI Research Scientist @ Naver CLOVA

Summer 2018

📖 Publications

First author:

(May 2023, ICLR)

NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis

(December 2021, NeurIPS)

Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

(June 2021, ICASSP)

Real-time Denoising and Dereverberation with Tiny Recurrent U-Net

(May 2020, arXiv)

Phase-aware Single-stage Denoising and Dereverberation with U-Net

(April 2020, ICLR)

From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech

(November 2019, ISMIR)

Audio-query Based Source Separation

(April 2019, arXiv)

Phase-aware Speech Enhancement with Deep Complex U-Net

(December 2017, NeurIPS ML4Audio Workshop, oral presentation)

Singing Voice Separation using Generative Adversarial Networks

Second author:

(2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing)

Differentiable Artificial Reverberation

(November 2020, ISMIR)

Exploring Aligned Lyrics-informed Singing Voice Separation

(May 2020, ICASSP)

Disentangling Timbre and Singing Style with Multi-Singer Singing Synthesis System

(September 2019, Interspeech, best student paper award)

Adversarially Trained End-to-end Korean Singing Voice Synthesis System

Corresponding author:

(October 2023, WASPAA)

AECSQI: Referenceless Acoustic Echo Cancellation Measures using Speech Quality and Intelligibility Improvement

Yet Another Generative Model for Room Impulse Response Estimation

(June 2023, ICASSP)

Towards Trustworthy Phoneme Boundary Detection With Autoregressive Model and Improved Evaluation Metric

Others:

(2023, IEEE International Solid-State Circuits Conference (ISSCC))

A 0.81mm² 740µW Real-Time Speech Enhancement Processor Using Multiplier-Less PE Arrays for Hearing Aids in 28nm CMOS

(June 2021, ICASSP)

Room Adaptive Conditioning Method for Sound Event Classification in Reverberant Environments

🏆 Awards

CES 2022 Innovation Awards Honoree: Software & Mobile Apps

CES 2022

I developed a controllable voice conversion technology that can convert arbitrary speech and singing voices into any target voice in real time.

ICASSP 2021, Deep Noise Suppression Challenge, 3rd Prize

ICASSP 2021, Microsoft

As the lead of team SNU/Supertone, I developed a speech enhancement algorithm that won 3rd prize in the ICASSP 2021 Deep Noise Suppression (DNS) challenge among 16 participating teams from universities and companies, while using about 10 times fewer parameters than the first- and second-place entries.

Interspeech 2019, Best student paper award

Interspeech 2019

We won the best student paper award at Interspeech 2019 for proposing a neural singing voice synthesis system.

Digital Health Hackathon 2018, First Prize

Samsung Advanced Institute for Health Sciences & Technology, Nov 2018

We won first prize in the hackathon by developing a deep learning-based hearing-aid framework and proposing a future concept for digital hearing aids.

📝 Academic Service

Conference Session Chair

Interspeech: 2022

Reviewer

NeurIPS: 2020, 2021 (outstanding reviewer), 2022, 2023

ICML: 2021, 2022

ICLR: 2020, 2021, 2022

ISMIR: 2021, 2022

Interspeech: 2022, 2023

JSTSP: 2018

IEEE Signal Processing Letters: 2021

Volunteer

ICASSP: 2018

👨🏻‍🏫 Lectures

International Conference Tutorial T3(M): Designing Controllable Synthesis System for Musical Signals @ ISMIR 2022

Controllable Synthesis of Speech and Singing Voice with Neural Networks @ Korea Advanced Institute of Science and Technology (KAIST, GCT634, Musical Applications of Machine Learning), Nov. 29, 2021

Deep Learning Based Speech Enhancement @ Korea Institute for Advanced Study (KIAS), Dec. 12, 2019

Conditional Generative Model for Audio @ Soundly (Acoustic Speech Audio Language Processing Community in Korea), Nov. 30, 2019

Contact

📨 Email: [email protected]

🔗 LinkedIn: https://kr.linkedin.com/in/형석-최-79533010a/en

Research Interests

Controllable & interpretable generative models for speech & singing synthesis

Self-supervised representation learning methods for speech

Multi-modal representation learning methods

Universal source separation

Speech enhancement

Conversational text-to-speech with natural prosody