{ "cells": [ { "cell_type": "markdown", "id": "7bd11665", "metadata": {}, "source": [ "## Getting Started with Mayavoz\n", "\n", "#### Contents:\n", "- [How to do inference using pretrained model](#inference)\n", "- [How to train your custom model](#basictrain)" ] }, { "cell_type": "markdown", "id": "d3c589bb", "metadata": {}, "source": [ "### Install Mayavoz" ] }, { "cell_type": "code", "execution_count": null, "id": "5b68e053", "metadata": {}, "outputs": [], "source": [ "! pip install -q mayavoz " ] }, { "cell_type": "markdown", "id": "87ee497f", "metadata": {}, "source": [ "
\n", "\n", "### Pretrained Model\n", "\n", "To start using pretrained model,select any of the available recipes from [here](). \n", "For this exercice I am selecting [mayavoz/waveunet]()\n", "\n", "- Mayavoz supports multiple input and output format. Input for inference can be in any of the below format\n", " - audio file path\n", " - numpy audio data\n", " - torch tensor audio data\n", " \n", "It auto-detects the input format and does inference for you.\n", " \n", "At the moment mayavoz only accepts single audio input" ] }, { "cell_type": "markdown", "id": "bd514ff4", "metadata": {}, "source": [ "**Load model**" ] }, { "cell_type": "code", "execution_count": null, "id": "67698871", "metadata": {}, "outputs": [], "source": [ "\n", "from mayavoz import Mayamodel\n", "model = Mayamodel.from_pretrained(\"mayavoz/waveunet\")\n" ] }, { "cell_type": "markdown", "id": "c7fd4cbe", "metadata": {}, "source": [ "**Inference using file path**" ] }, { "cell_type": "code", "execution_count": null, "id": "d7996c16", "metadata": {}, "outputs": [], "source": [ "file = \"myvoice.wav\"\n", "audio = model.enhance(\"myvoice.wav\")\n", "audio.shape" ] }, { "cell_type": "markdown", "id": "8ee20a83", "metadata": {}, "source": [ "**Inference using torch tensor**\n" ] }, { "cell_type": "code", "execution_count": null, "id": "e1a1c718", "metadata": {}, "outputs": [], "source": [ "audio_tensor = torch.rand(1,1,32000) ## random audio data\n", "audio = model.enhance(audio_tensor)\n", "audio.shape" ] }, { "cell_type": "markdown", "id": "2ac27920", "metadata": {}, "source": [ "- if you want to save the output, just pass `save_output=True`" ] }, { "cell_type": "code", "execution_count": null, "id": "9e0313f7", "metadata": {}, "outputs": [], "source": [ "audio = model.enhance(\"myvoice.wav\",save_output=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "25077720", "metadata": {}, "outputs": [], "source": [ "from Ipython.audio import Audio\n", "\n", "Audio(\"myvoice_cleaned.wav\",rate=SAMPLING_RATE)" ] }, { "cell_type": "markdown", "id": "3170bb0b", "metadata": {}, "source": [ "\n", "\n", "\n", "## Training your own custom Model\n", "\n", "There are two ways of doing this\n", "\n", "* [Using mayavoz framework ](#code)\n", "* [Using mayavoz command line tool ](#cli)\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "a44fc314", "metadata": {}, "source": [ "\n", "\n", "**Using Mayavoz framwork** [Basic]\n", "- Prepapare dataloader\n", "- import preferred model\n", "- Train" ] }, { "cell_type": "markdown", "id": "dbc14b36", "metadata": {}, "source": [ "Files is dataclass that helps your to organise your train/test file paths" ] }, { "cell_type": "code", "execution_count": null, "id": "2c8c2b12", "metadata": {}, "outputs": [], "source": [ "from mayavoz.utils import Files\n", "\n", "name = \"dataset_name\"\n", "root_dir = \"root_directory_of_your_dataset\"\n", "files = Files(train_clean=\"train_cleanfiles_foldername\",\n", " train_noisy=\"noisy_train_foldername\",\n", " test_clean=\"clean_test_foldername\",\n", " test_noisy=\"noisy_test_foldername\")\n", "duration = 4.0 \n", "stride = None\n", "sampling_rate = 16000" ] }, { "cell_type": "markdown", "id": "07ef8721", "metadata": {}, "source": [ "Now there are two types of `matching_function`\n", "- `one_to_one` : In this one clean file will only have one corresponding noisy file. For example VCTK datasets\n", "- `one_to_many` : In this one clean file will only have one corresponding noisy file. For example DNS dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "4b0fdc62", "metadata": {}, "outputs": [], "source": [ "mapping_function = \"one_to_one\"\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ff0cfe60", "metadata": {}, "outputs": [], "source": [ "from mayavoz.dataset import MayaDataset\n", "dataset = MayaDataset(\n", " name=name,\n", " root_dir=root_dir,\n", " files=files,\n", " duration=duration,\n", " stride=stride,\n", " sampling_rate=sampling_rate\n", " )\n" ] }, { "cell_type": "code", "execution_count": null, "id": "acfdc655", "metadata": {}, "outputs": [], "source": [ "from mayavoz.models import Demucs\n", "model = Demucs(dataset=dataset, loss=\"mae\")\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "4fabe46d", "metadata": {}, "outputs": [], "source": [ "import pytorch_lightning as pl" ] }, { "cell_type": "code", "execution_count": null, "id": "20d98ed0", "metadata": {}, "outputs": [], "source": [ "trainer = pl.Trainer(model)\n", "trainer.fit(max_epochs=1)" ] }, { "cell_type": "markdown", "id": "28bc697b", "metadata": {}, "source": [ "**mayavoz model and dataset are highly customazibale**, see [here]() for advanced usage" ] }, { "cell_type": "markdown", "id": "df01aa1e", "metadata": {}, "source": [ "\n", "\n", "\n", "## Mayavoz CLI" ] }, { "cell_type": "code", "execution_count": null, "id": "2bbf2747", "metadata": {}, "outputs": [], "source": [ "! pip install mayavoz[cli]" ] }, { "cell_type": "markdown", "id": "4447dd07", "metadata": {}, "source": [ "### TL;DR\n", "Calling the following command would train mayavoz Demucs model on DNS-2020 dataset.\n", "\n", "```bash\n", "mayavoz-train \\\n", " model=Demucs \\\n", " Demucs.sampling_rate=16000 \\\n", " dataset=DNS-2020 \\\n", " DNS-2020.name = \"dns-2020\" \\\n", " DNS-2020.root_dir=\"your_root_dir\" \\\n", " DNS-2020.train_clean=\"\" \\\n", " DNS-2020.train_noisy=\"\" \\\n", " DNS-2020.test_clean=\"\" \\\n", " DNS-2020.test_noisy=\"\" \\\n", " DNS-2020.sampling_rate=16000 \\\n", " DNS-2020.duration=2.0 \\\n", " traine=default \\ \n", " default.max_epochs=1 \\\n", "\n", "```\n", "\n", "This is more or less equaivalent to below code" ] }, { "cell_type": "code", "execution_count": null, "id": "9278742a", "metadata": {}, "outputs": [], "source": [ "from mayavoz.utils import Files\n", "from mayavoz.data import MayaDataset\n", "from mayavoz.models import Demucs\n", "\n", "files = Files(\n", " train_clean=\"\",\n", " train_noisy=\"\",\n", " test_clean=\"\",\n", " test_noisy=\"\"\n", ")\n", "dataset = MayaDataset(\n", " name='dns-2020'\n", " root_dir=\"your_root_dir\",\n", " files=files,\n", " sampling_rate=16000,\n", " duration=2.0)\n", "model = Demucs(dataset=dataset,sampling_rate=16000)\n", "trainer = Trainer(max_epochs=1)\n", "trainer.fit(model)" ] }, { "cell_type": "markdown", "id": "eb26692c", "metadata": {}, "source": [ "Hydra-based configuration\n", "mayavoz-train relies on Hydra to configure the training process. Adding --cfg job option to the previous command will let you know about the actual configuration used for training:\n", "\n", "```bash\n", "mayavoz-train --cfg job \\\n", " model=Demucs \\\n", " Demucs.sampling_rate=16000 \\\n", " dataset=DNS-2020\n", "\n", "```\n", "\n", "```yaml\n", "_target_: mayavoz.models.demucs.Demucs\n", "num_channels: 1\n", "resample: 4\n", "sampling_rate : 16000\n", "\n", "encoder_decoder:\n", " depth: 4\n", " initial_output_channels: 64\n", " \n", "[...]\n", "```\n", "\n", "To change the sampling_rate, you can \n", "\n", "```bash\n", "mayavoz-train \\\n", " model=Demucs model.sampling_rate=16000 \\\n", " dataset=DNS-2020\n", "\n", "```" ] }, { "cell_type": "markdown", "id": "93555860", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "mayavoz", "language": "python", "name": "mayavoz" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 5 }