{ "cells": [ { "cell_type": "markdown", "id": "7bd11665", "metadata": {}, "source": [ "## Getting Started with Mayavoz\n", "\n", "#### Contents:\n", "- [How to run inference with a pretrained model](#inference)\n", "- [How to train your own custom model](#basictrain)" ] }, { "cell_type": "markdown", "id": "d3c589bb", "metadata": {}, "source": [ "### Install Mayavoz" ] }, { "cell_type": "code", "execution_count": null, "id": "5b68e053", "metadata": {}, "outputs": [], "source": [ "! pip install -q mayavoz" ] }, { "cell_type": "markdown", "id": "87ee497f", "metadata": {}, "source": [ "
<a id=\"inference\"></a>\n", "\n", "### Pretrained Model\n", "\n", "To start using a pretrained model, select any of the available recipes from [here](). \n", "For this exercise I am selecting [mayavoz/dccrn]()\n", "\n", "- Mayavoz supports multiple input and output formats. Input for inference can be any of the formats below:\n", " - audio file path\n", " - numpy audio data\n", " - torch tensor audio data\n", " \n", "It auto-detects the input format and runs inference for you.\n", " \n", "At the moment, mayavoz only accepts a single audio input." ] }, { "cell_type": "markdown", "id": "bd514ff4", "metadata": {}, "source": [ "**Load model**" ] }, { "cell_type": "code", "execution_count": 3, "id": "67698871", "metadata": {}, "outputs": [], "source": [ "from mayavoz.models import Mayamodel\n", "model = Mayamodel.from_pretrained(\"shahules786/mayavoz-dccrn-valentini-28spk\")\n" ] }, { "cell_type": "markdown", "id": "c7fd4cbe", "metadata": {}, "source": [ "**Inference using a file path**" ] }, { "cell_type": "code", "execution_count": 7, "id": "d7996c16", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([1, 1, 36414])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "audio = model.enhance(\"my_voice.wav\")\n", "audio.shape" ] }, { "cell_type": "markdown", "id": "8ee20a83", "metadata": {}, "source": [ "**Inference using a numpy ndarray**\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "e1a1c718", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(36414,)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import torch\n", "from librosa import load\n", "my_voice, sr = load(\"my_voice.wav\", sr=16000)\n", "my_voice.shape" ] }, { "cell_type": "code", "execution_count": 9, "id": "0cbef6c0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 1, 36414)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "audio = 
model.enhance(my_voice,sampling_rate=sr)\n", "audio.shape" ] }, { "cell_type": "markdown", "id": "a22fc10f", "metadata": {}, "source": [ "**Inference using torch tensor**\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "a884a935", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([1, 1, 36414])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_voice = torch.from_numpy(my_voice)\n", "audio = model.enhance(my_voice,sampling_rate=sr)\n", "audio.shape" ] }, { "cell_type": "markdown", "id": "2ac27920", "metadata": {}, "source": [ "- if you want to save the output, just pass `save_output=True`" ] }, { "cell_type": "code", "execution_count": 11, "id": "9e0313f7", "metadata": {}, "outputs": [], "source": [ "audio = model.enhance(\"my_voice.wav\",save_output=True)" ] }, { "cell_type": "code", "execution_count": 12, "id": "25077720", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import Audio\n", "SAMPLING_RATE = 16000\n", "Audio(\"cleaned_my_voice.wav\",rate=SAMPLING_RATE)" ] }, { "cell_type": "markdown", "id": "3170bb0b", "metadata": {}, "source": [ "
<a id=\"basictrain\"></a>\n", "\n", "\n", "## Training your own custom Model\n", "\n", "There are two ways to do this:\n", "\n", "* [Using the mayavoz framework](#code)\n", "* [Using the mayavoz command line tool](#cli)\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "a44fc314", "metadata": {}, "source": [ "
<a id=\"code\"></a>\n", "\n", "**Using the Mayavoz framework** [Basic]\n", "- Prepare the dataloader\n", "- Import your preferred model\n", "- Train" ] }, { "cell_type": "markdown", "id": "dbc14b36", "metadata": {}, "source": [ "`Files` is a dataclass that helps you organise your train/test file paths" ] }, { "cell_type": "code", "execution_count": 8, "id": "2c8c2b12", "metadata": {}, "outputs": [], "source": [ "from mayavoz.utils import Files\n", "\n", "name = \"valentini\"\n", "root_dir = \"/Users/shahules/Myprojects/enhancer/datasets/vctk\"\n", "files = Files(train_clean=\"clean_testset_wav\",\n", " train_noisy=\"noisy_testset_wav\",\n", " test_clean=\"clean_testset_wav\",\n", " test_noisy=\"noisy_testset_wav\")\n", "duration = 4.0\n", "stride = None\n", "sampling_rate = 16000" ] }, { "cell_type": "markdown", "id": "07ef8721", "metadata": {}, "source": [ "Now there are two types of `matching_function`:\n", "- `one_to_one` : each clean file has exactly one corresponding noisy file. For example, the Valentini dataset\n", "- `one_to_many` : one clean file can have multiple corresponding noisy files. For example, the DNS dataset."
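] }, { "cell_type": "markdown", "id": "f3a9d2e1", "metadata": {}, "source": [ "As a rough sketch (a hypothetical helper, not mayavoz's actual implementation), `one_to_one` matching can be pictured as pairing each clean file with the noisy file of the same name:\n", "\n", "```python\n", "from pathlib import Path\n", "\n", "def one_to_one_pairs(clean_dir, noisy_dir):\n", "    # index noisy files by file name, then match clean files against that index\n", "    noisy = {p.name: p for p in Path(noisy_dir).glob(\"*.wav\")}\n", "    return [(p, noisy[p.name]) for p in sorted(Path(clean_dir).glob(\"*.wav\")) if p.name in noisy]\n", "```\n", "\n", "`one_to_many` would instead map each clean file to every noisy file derived from it."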
] }, { "cell_type": "code", "execution_count": 9, "id": "4b0fdc62", "metadata": {}, "outputs": [], "source": [ "matching_function = \"one_to_one\"\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "ff0cfe60", "metadata": {}, "outputs": [], "source": [ "from mayavoz.data import MayaDataset\n", "dataset = MayaDataset(\n", " name=name,\n", " root_dir=root_dir,\n", " files=files,\n", " duration=duration,\n", " stride=stride,\n", " sampling_rate=sampling_rate,\n", " matching_function=matching_function,\n", " min_valid_minutes=5.0,\n", " )\n" ] }, { "cell_type": "code", "execution_count": 11, "id": "acfdc655", "metadata": {}, "outputs": [], "source": [ "from mayavoz.models import Demucs\n", "model = Demucs(dataset=dataset, loss=\"mae\")\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "4fabe46d", "metadata": {}, "outputs": [], "source": [ "import pytorch_lightning as pl" ] }, { "cell_type": "code", "execution_count": 13, "id": "20d98ed0", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: False, used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Selected fp257 for valid\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", " | Name | Type | Params\n", "----------------------------------------\n", "0 | _loss | LossWrapper | 0 \n", "1 | encoder | ModuleList | 4.7 M \n", "2 | decoder | ModuleList | 4.7 M \n", "3 | de_lstm | DemucsLSTM | 24.8 M\n", "----------------------------------------\n", "34.2 M Trainable params\n", "0 Non-trainable params\n", "34.2 M Total params\n", "136.866 Total estimated model params size (MB)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Total train duration 27.4 minutes\n", "Total validation duration 29.733333333333334 minutes\n", "Total test duration 57.2 minutes\n", "Epoch 0: 48%|▍| 13/27 [15:18<16:29, 70.66s/it, loss=0.0265, 
v_num=2]\n" ] } ], "source": [ "trainer = pl.Trainer()\n", "trainer.fit(model)" ] }, { "cell_type": "markdown", "id": "b7c3f9aa", "metadata": {}, "source": [ "<a id=\"cli\"></a>\n", "\n", "\n", "## Mayavoz CLI" ] }, { "cell_type": "code", "execution_count": null, "id": "2bbf2747", "metadata": {}, "outputs": [], "source": [ "! pip install \"mayavoz[cli]\"" ] }, { "cell_type": "markdown", "id": "4447dd07", "metadata": {}, "source": [ "### TL;DR\n", "Calling the following command trains the mayavoz Demucs model on the VCTK dataset.\n", "\n", "```bash\n", "mayavoz-train \\\n", " model=Demucs \\\n", " model.sampling_rate=16000 \\\n", " dataset=VCTK dataset.root_dir=\"your_root_directory\" \\\n", " trainer=fastrun_dev\n", "\n", "```\n", "\n", "This is more or less equivalent to the code below." ] }, { "cell_type": "code", "execution_count": null, "id": "9278742a", "metadata": {}, "outputs": [], "source": [ "from mayavoz.data import MayaDataset\n", "from mayavoz.models import Demucs\n", "from pytorch_lightning import Trainer\n", "\n", "dataset = MayaDataset(\n", " name='vctk',\n", " root_dir=\"your_root_directory\",\n", " )\n", "model = Demucs(dataset=dataset, sampling_rate=16000)\n", "trainer = Trainer()\n", "trainer.fit(model)" ] }, { "cell_type": "markdown", "id": "15737128", "metadata": {}, "source": [ "For example, if you want to add or change the `stride` of the dataset:\n", "\n", "```bash\n", "mayavoz-train \\\n", " model=Demucs \\\n", " model.sampling_rate=16000 \\\n", " dataset=VCTK dataset.root_dir=\"your_root_directory\" dataset.stride=1\n", "\n", "```" ] }, { "cell_type": "markdown", "id": "eb26692c", "metadata": {}, "source": [ "#### Hydra-based configuration\n", "`mayavoz-train` relies on Hydra to configure the training process. 
Adding the `--cfg job` option to the previous command will show the actual configuration used for training:\n", "\n", "```bash\n", "mayavoz-train --cfg job \\\n", " model=Demucs \\\n", " model.sampling_rate=16000 \\\n", " dataset=DNS-2020\n", "\n", "```\n", "\n", "```yaml\n", "_target_: mayavoz.models.demucs.Demucs\n", "num_channels: 1\n", "resample: 4\n", "sampling_rate: 16000\n", "\n", "encoder_decoder:\n", " depth: 4\n", " initial_output_channels: 64\n", " \n", "[...]\n", "```\n", "\n", "To change the `sampling_rate`, you can override it from the command line:\n", "\n", "```bash\n", "mayavoz-train \\\n", " model=Demucs model.sampling_rate=16000 \\\n", " dataset=DNS-2020\n", "\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "enhancer", "language": "python", "name": "enhancer" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "vscode": { "interpreter": { "hash": "aa065deb7c1aa0a1a524e1ebced87b297febfedb61bf47eab2415d34995331a2" } } }, "nbformat": 4, "nbformat_minor": 5 }