Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher

Official implementation of Learn2Sing 2.0. For full details, see our paper submitted to Interspeech 2022 via this link.

Authors: Heyang Xue, Xinsheng Wang, Yongmao Zhang, Lei Xie, Pengcheng Zhu, Mengxiao Bi.

Abstract

Demo page: link.

Building a high-quality singing corpus for a person who is not good at singing is non-trivial, which makes it challenging to create a singing voice synthesizer for that person. Learn2Sing is dedicated to synthesizing the singing voice of a speaker without his or her singing data by learning from data recorded by others, i.e., the singing teacher. Inspired by the fact that pitch is the key style factor distinguishing singing from speaking voice, the proposed Learn2Sing 2.0 first generates a preliminary acoustic feature with pitch values averaged at the phone level, which allows training on the different styles, i.e., speaking and singing, to share the same conditions except for the speaker information. Then, conditioned on the specific style, a diffusion decoder, accelerated by a fast sampling algorithm at inference, gradually restores the final acoustic feature. During training, to avoid confusing the information carried by the speaker embedding and the style embedding, mutual information is used to constrain the learning of both embeddings. Experiments show that the proposed approach can synthesize a high-quality singing voice for a target speaker without singing data in 10 decoding steps.
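
As a rough illustration of the "averaged pitch value in the phone level" idea, the sketch below replaces each frame's F0 with the mean F0 of the phone it belongs to. This is not taken from the repository: the function name, the use of 0 to mark unvoiced frames, and the choice to average only voiced frames are all assumptions made for the example.

```python
def phone_level_average_pitch(f0, durations):
    """Return frame-level pitch where every frame of a phone holds that phone's mean F0.

    f0:        list of frame-level pitch values; 0 marks an unvoiced frame (assumed convention)
    durations: number of frames per phone; must sum to len(f0)
    """
    if sum(durations) != len(f0):
        raise ValueError("durations must sum to the number of frames")

    averaged = []
    start = 0
    for dur in durations:
        segment = f0[start:start + dur]
        # Assumption: only voiced frames contribute to the mean.
        voiced = [value for value in segment if value > 0]
        mean = sum(voiced) / len(voiced) if voiced else 0.0
        # Every frame of the phone, voiced or not, receives the phone-level mean.
        averaged.extend([mean] * dur)
        start += dur
    return averaged


# Two phones of three frames each.
frame_f0 = [200, 220, 0, 300, 310, 320]
print(phone_level_average_pitch(frame_f0, [3, 3]))
```

Because the fine pitch contour is flattened within each phone, speaking and singing data can be described by the same kind of coarse condition, which is the property the abstract relies on.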

Training and inference:

  • Before using this implementation, make the following changes:
  1. Replace the phoneset and pitchset in text/symbols.py with your own sets.

  2. Provide the path to your data in config.json. The testdata folder contains example files that demonstrate the format.

  • Training

      bash run.sh
    
  • Inference

      bash syn.sh outputs target_speaker_id 0 decoding_steps cuda True
    
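For the first setup step, a symbol file typically defines the phone and pitch inventories and maps each symbol to an integer ID. The sketch below is hypothetical: the variable names, the padding symbol, and the example symbols are assumptions for illustration and may not match the layout of text/symbols.py in this repository, so check the file itself before editing.

```python
# Hypothetical symbol tables; replace the entries with your own inventory.
PAD = "<pad>"  # assumed padding symbol, placed first so its ID is 0

phone_set = [PAD, "sil", "n", "i", "h", "ao"]
pitch_set = [PAD, "rest", "C4", "D4", "E4"]

# Lookup tables from symbol to integer ID.
phone_to_id = {symbol: index for index, symbol in enumerate(phone_set)}
pitch_to_id = {symbol: index for index, symbol in enumerate(pitch_set)}


def encode(phones, pitches):
    """Map aligned phone and pitch symbols to two lists of integer IDs."""
    if len(phones) != len(pitches):
        raise ValueError("phones and pitches must have the same length")
    phone_ids = [phone_to_id[p] for p in phones]
    pitch_ids = [pitch_to_id[p] for p in pitches]
    return phone_ids, pitch_ids


print(encode(["n", "i", "h", "ao"], ["C4", "C4", "E4", "E4"]))
```

An unknown symbol raises a KeyError here, which is a quick way to catch labels in your data that are missing from the sets you defined.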

Acknowledgements:

  • The diffusion decoder is adapted from GradTTS;
  • Estimation of mutual information is modified from VQMIVC;
  • Vadim Popov performed a code review of the fast sampling algorithm part.
Comments
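  • Roughly how long does training take?

    Hello, and first of all, thank you for sharing!

    I processed my own data and training is now running. My configuration is: "memory_efficient_training": false, batch_size=4, sample_rate=24000, with mels matched to hifigan and no f0. Everything else uses the default configuration.

    It can already synthesize audio, but the quality is not good yet. In this case, roughly how long does the learn2sing model need to train, i.e., up to which M_X.pth, before it reaches a reasonably good result?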

    opened by Liujingxiu23 12
  • Regarding English dataset

    Hi, great work here! I wanted to know whether this repo will work on an English speaking dataset, and whether there are English examples for reference to judge the quality. If it does work on an English dataset, what exactly should I do? For "Replace the phoneset and pitchset in text/symbols.py with your own set", what would that look like for English? Also, "Provide the path to the data in config.json" is clear, but what would the format be?

    Thanks in advance!

    opened by dutchsing009 1
Owner
HeyangXue1997
Speech synthesis/Singing voice synthesis/machine learning @ aslp, nwpu, Xi'an, Shaanxi, China