{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7bd11665",
   "metadata": {},
   "source": [
    "## Getting Started with Mayavoz\n",
    "\n",
    "#### Contents:\n",
    "- [How to do inference using a pretrained model](#inference)\n",
    "- [How to train your custom model](#basictrain)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3c589bb",
   "metadata": {},
   "source": [
    "### Install Mayavoz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b68e053",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q mayavoz"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87ee497f",
   "metadata": {},
   "source": [
    "<div id=\"inference\"></div>\n",
    "\n",
    "### Pretrained Model\n",
    "\n",
    "To start using a pretrained model, select any of the available recipes from [here]().\n",
    "For this exercise I am selecting [mayavoz/waveunet]().\n",
    "\n",
    "Mayavoz supports multiple input and output formats. Input for inference can be in any of the formats below:\n",
    "- audio file path\n",
    "- numpy audio data\n",
    "- torch tensor audio data\n",
    "\n",
    "Mayavoz auto-detects the input format and runs inference for you.\n",
    "\n",
    "At the moment, mayavoz only accepts a single audio input at a time."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd514ff4",
   "metadata": {},
   "source": [
    "**Load model**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67698871",
   "metadata": {},
   "outputs": [],
   "source": [
    "from mayavoz import Mayamodel\n",
    "\n",
    "model = Mayamodel.from_pretrained(\"mayavoz/waveunet\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7fd4cbe",
   "metadata": {},
   "source": [
    "**Inference using file path**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d7996c16",
   "metadata": {},
   "outputs": [],
   "source": [
    "file = \"myvoice.wav\"\n",
    "audio = model.enhance(file)\n",
    "audio.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ee20a83",
   "metadata": {},
   "source": [
    "**Inference using torch tensor**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1a1c718",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "audio_tensor = torch.rand(1, 1, 32000)  ## random audio data\n",
    "audio = model.enhance(audio_tensor)\n",
    "audio.shape"
   ]
  },
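  {
   "cell_type": "markdown",
   "id": "a0b1c2d3",
   "metadata": {},
   "source": [
    "**Inference using numpy array**\n",
    "\n",
    "A minimal sketch along the same lines, assuming `model.enhance` accepts a numpy array directly, as the list of supported input formats above suggests:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4f5a6b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "audio_numpy = np.random.rand(1, 32000)  ## random audio data (assumes numpy input is supported)\n",
    "audio = model.enhance(audio_numpy)\n",
    "audio.shape"
   ]
  },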
  {
   "cell_type": "markdown",
   "id": "2ac27920",
   "metadata": {},
   "source": [
    "- If you want to save the output, just pass `save_output=True`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e0313f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "audio = model.enhance(\"myvoice.wav\", save_output=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25077720",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import Audio\n",
    "\n",
    "SAMPLING_RATE = 16000  ## sampling rate used by the model\n",
    "Audio(\"myvoice_cleaned.wav\", rate=SAMPLING_RATE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3170bb0b",
   "metadata": {},
   "source": [
    "<div id=\"basictrain\"></div>\n",
    "\n",
    "\n",
    "## Training your own custom Model\n",
    "\n",
    "There are two ways of doing this:\n",
    "\n",
    "* [Using the mayavoz framework](#code)\n",
    "* [Using the mayavoz command line tool](#cli)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a44fc314",
   "metadata": {},
   "source": [
    "<div id=\"code\"></div>\n",
    "\n",
    "**Using the Mayavoz framework** [Basic]\n",
    "- Prepare the dataloader\n",
    "- Import your preferred model\n",
    "- Train"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbc14b36",
   "metadata": {},
   "source": [
    "`Files` is a dataclass that helps you organise your train/test file paths."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c8c2b12",
   "metadata": {},
   "outputs": [],
   "source": [
    "from mayavoz.utils import Files\n",
    "\n",
    "name = \"dataset_name\"\n",
    "root_dir = \"root_directory_of_your_dataset\"\n",
    "files = Files(train_clean=\"train_cleanfiles_foldername\",\n",
    "              train_noisy=\"noisy_train_foldername\",\n",
    "              test_clean=\"clean_test_foldername\",\n",
    "              test_noisy=\"noisy_test_foldername\")\n",
    "duration = 4.0\n",
    "stride = None\n",
    "sampling_rate = 16000"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07ef8721",
   "metadata": {},
   "source": [
    "Now, there are two types of `matching_function`:\n",
    "- `one_to_one`: each clean file has exactly one corresponding noisy file, as in the VCTK datasets.\n",
    "- `one_to_many`: each clean file has multiple corresponding noisy files, as in the DNS dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b0fdc62",
   "metadata": {},
   "outputs": [],
   "source": [
    "matching_function = \"one_to_one\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff0cfe60",
   "metadata": {},
   "outputs": [],
   "source": [
    "from mayavoz.dataset import MayaDataset\n",
    "\n",
    "dataset = MayaDataset(\n",
    "    name=name,\n",
    "    root_dir=root_dir,\n",
    "    files=files,\n",
    "    duration=duration,\n",
    "    stride=stride,\n",
    "    sampling_rate=sampling_rate\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acfdc655",
   "metadata": {},
   "outputs": [],
   "source": [
    "from mayavoz.models import Demucs\n",
    "\n",
    "model = Demucs(dataset=dataset, loss=\"mae\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4fabe46d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pytorch_lightning as pl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20d98ed0",
   "metadata": {},
   "outputs": [],
   "source": [
    "trainer = pl.Trainer(max_epochs=1)\n",
    "trainer.fit(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28bc697b",
   "metadata": {},
   "source": [
    "**Mayavoz models and datasets are highly customizable**; see [here]() for advanced usage."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df01aa1e",
   "metadata": {},
   "source": [
    "<div id=\"cli\"></div>\n",
    "\n",
    "\n",
    "## Mayavoz CLI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2bbf2747",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install \"mayavoz[cli]\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4447dd07",
   "metadata": {},
   "source": [
    "### TL;DR\n",
    "Running the following command trains the mayavoz Demucs model on the DNS-2020 dataset.\n",
    "\n",
    "```bash\n",
    "mayavoz-train \\\n",
    "    model=Demucs \\\n",
    "    Demucs.sampling_rate=16000 \\\n",
    "    dataset=DNS-2020 \\\n",
    "    DNS-2020.name=\"dns-2020\" \\\n",
    "    DNS-2020.root_dir=\"your_root_dir\" \\\n",
    "    DNS-2020.train_clean=\"\" \\\n",
    "    DNS-2020.train_noisy=\"\" \\\n",
    "    DNS-2020.test_clean=\"\" \\\n",
    "    DNS-2020.test_noisy=\"\" \\\n",
    "    DNS-2020.sampling_rate=16000 \\\n",
    "    DNS-2020.duration=2.0 \\\n",
    "    trainer=default \\\n",
    "    default.max_epochs=1\n",
    "```\n",
    "\n",
    "This is more or less equivalent to the code below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9278742a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pytorch_lightning import Trainer\n",
    "\n",
    "from mayavoz.utils import Files\n",
    "from mayavoz.data import MayaDataset\n",
    "from mayavoz.models import Demucs\n",
    "\n",
    "files = Files(\n",
    "    train_clean=\"\",\n",
    "    train_noisy=\"\",\n",
    "    test_clean=\"\",\n",
    "    test_noisy=\"\"\n",
    ")\n",
    "dataset = MayaDataset(\n",
    "    name=\"dns-2020\",\n",
    "    root_dir=\"your_root_dir\",\n",
    "    files=files,\n",
    "    sampling_rate=16000,\n",
    "    duration=2.0)\n",
    "model = Demucs(dataset=dataset, sampling_rate=16000)\n",
    "trainer = Trainer(max_epochs=1)\n",
    "trainer.fit(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb26692c",
   "metadata": {},
   "source": [
    "### Hydra-based configuration\n",
    "`mayavoz-train` relies on Hydra to configure the training process. Adding the `--cfg job` option to the previous command prints the actual configuration used for training:\n",
    "\n",
    "```bash\n",
    "mayavoz-train --cfg job \\\n",
    "    model=Demucs \\\n",
    "    Demucs.sampling_rate=16000 \\\n",
    "    dataset=DNS-2020\n",
    "```\n",
    "\n",
    "```yaml\n",
    "_target_: mayavoz.models.demucs.Demucs\n",
    "num_channels: 1\n",
    "resample: 4\n",
    "sampling_rate: 16000\n",
    "\n",
    "encoder_decoder:\n",
    "  depth: 4\n",
    "  initial_output_channels: 64\n",
    "\n",
    "[...]\n",
    "```\n",
    "\n",
    "To change the sampling rate, you can run:\n",
    "\n",
    "```bash\n",
    "mayavoz-train \\\n",
    "    model=Demucs model.sampling_rate=16000 \\\n",
    "    dataset=DNS-2020\n",
    "```"
   ]
  },
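  {
   "cell_type": "markdown",
   "id": "c8d9e0f1",
   "metadata": {},
   "source": [
    "Since `mayavoz-train` is a Hydra application, you can usually list the available configuration groups and overridable options (this assumes the standard Hydra `--help` flag is exposed by the CLI):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2c3d4e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "! mayavoz-train --help"
   ]
  },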
  {
   "cell_type": "markdown",
   "id": "93555860",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "mayavoz",
   "language": "python",
   "name": "mayavoz"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}