Open Access System for Information Sharing


Conference
Full metadata record
DC Field | Value | Language
dc.contributor.author | Seongyeon Park | -
dc.contributor.author | Bohyung Kim | -
dc.contributor.author | Oh, Tae-Hyun | -
dc.date.accessioned | 2023-04-27T04:50:39Z | -
dc.date.available | 2023-04-27T04:50:39Z | -
dc.date.created | 2023-04-27 | -
dc.date.issued | 2023-08-20 | -
dc.identifier.uri | https://oasis.postech.ac.kr/handle/2014.oak/117585 | -
dc.description.abstract | Recently, zero-shot TTS and VC methods have gained attention for their practical ability to generate voices unseen during training. Among these methods, zero-shot modifications of the VITS model have shown superior performance while inheriting useful properties from VITS. However, the performance of VITS and VITS-based zero-shot models varies dramatically depending on how the losses are balanced. This can be problematic, as it requires a burdensome procedure of tuning loss-balance hyper-parameters to find the optimal balance. In this work, we propose a novel framework that finds this optimum without search, by inducing the decoder of VITS-based models to its full reconstruction ability. With our framework, we show superior performance compared to baselines in zero-shot TTS and VC, achieving state-of-the-art performance. Furthermore, we show the robustness of our framework in various settings. We provide an explanation for the results in the discussion. | -
dc.language | English | -
dc.publisher | INTERSPEECH | -
dc.relation.isPartOf | INTERSPEECH Conference | -
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | -
dc.title | Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis | -
dc.type | Conference | -
dc.type.rims | CONF | -
dc.identifier.bibliographicCitation | INTERSPEECH Conference | -
dc.citation.conferenceDate | 2023-08-20 | -
dc.citation.conferencePlace | IE | -
dc.citation.title | INTERSPEECH Conference | -
dc.contributor.affiliatedAuthor | Oh, Tae-Hyun | -
dc.identifier.scopusid | 2-s2.0-85171567016 | -
dc.description.journalClass | 1 | -
dc.description.journalClass | 1 | -


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
