diff --git a/notebooks/Getting_started.ipynb b/notebooks/Getting_started.ipynb new file mode 100644 index 0000000..05067cd --- /dev/null +++ b/notebooks/Getting_started.ipynb @@ -0,0 +1,315 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7bd11665", + "metadata": {}, + "source": [ + "## Getting Started with Mayavoz\n", + "\n", + "#### Contents:\n", + "- [How to run inference using a pretrained model](#inference)\n", + "- [How to train your custom model](#basictrain)" + ] + }, + { + "cell_type": "markdown", + "id": "d3c589bb", + "metadata": {}, + "source": [ + "### Install Mayavoz" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b68e053", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q mayavoz" + ] + }, + { + "cell_type": "markdown", + "id": "87ee497f", + "metadata": {}, + "source": [ + "
\n", + "\n", + "### Pretrained Model\n", + "\n", + "To start using a pretrained model, select any of the available recipes from [here]().\n", + "For this exercise I am selecting [mayavoz/waveunet]()\n", + "\n", + "- Mayavoz supports multiple input and output formats. Input for inference can be in any of the formats below:\n", + "  - audio file path\n", + "  - numpy audio data\n", + "  - torch tensor audio data\n", + "\n", + "It auto-detects the input format and runs inference for you.\n", + "\n", + "At the moment mayavoz only accepts a single audio input." + ] + }, + { + "cell_type": "markdown", + "id": "bd514ff4", + "metadata": {}, + "source": [ + "**Load model**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "67698871", + "metadata": {}, + "outputs": [], + "source": [ + "import torch  # needed for the torch tensor example further down\n", + "from mayavoz import Mayamodel\n", + "model = Mayamodel.from_pretrained(\"mayavoz/waveunet\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "c7fd4cbe", + "metadata": {}, + "source": [ + "**Inference using file path**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d7996c16", + "metadata": {}, + "outputs": [], + "source": [ + "file = \"myvoice.wav\"\n", + "audio = model.enhance(file)\n", + "audio.shape" + ] + }, + { + "cell_type": "markdown", + "id": "8ee20a83", + "metadata": {}, + "source": [ + "**Inference using torch tensor**\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1a1c718", + "metadata": {}, + "outputs": [], + "source": [ + "audio_tensor = torch.rand(1, 1, 32000)  # random audio data\n", + "audio = model.enhance(audio_tensor)\n", + "audio.shape" + ] + }, + { + "cell_type": "markdown", + "id": "2ac27920", + "metadata": {}, + "source": [ + "- If you want to save the output, just pass `save_output=True`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9e0313f7", + "metadata": {}, + "outputs": [], + "source": [ + "audio = model.enhance(\"myvoice.wav\", save_output=True)" + ]
}, + { + "cell_type": "code", + "execution_count": null, + "id": "25077720", + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import Audio\n", + "\n", + "Audio(\"myvoice_cleaned.wav\")  # plays the saved file; no explicit rate is needed for a file path" + ] + }, + { + "cell_type": "markdown", + "id": "3170bb0b", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "## Training your own custom Model\n", + "\n", + "There are two ways of doing this:\n", + "\n", + "* [Using the mayavoz framework](#code)\n", + "* [Using the mayavoz command line tool](#cli)\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "a44fc314", + "metadata": {}, + "source": [ + "\n", + "\n", + "**Using Mayavoz framework** [Basic]\n", + "- Prepare dataloader\n", + "- Import preferred model\n", + "- Train" + ] + }, + { + "cell_type": "markdown", + "id": "dbc14b36", + "metadata": {}, + "source": [ + "`Files` is a dataclass that helps you organise your train/test file paths." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2c8c2b12", + "metadata": {}, + "outputs": [], + "source": [ + "from mayavoz.utils import Files\n", + "\n", + "name = \"dataset_name\"\n", + "root_dir = \"root_directory_of_your_dataset\"\n", + "files = Files(train_clean=\"train_cleanfiles_foldername\",\n", + "              train_noisy=\"noisy_train_foldername\",\n", + "              test_clean=\"clean_test_foldername\",\n", + "              test_noisy=\"noisy_test_foldername\")\n", + "duration = 4.0\n", + "stride = None\n", + "sampling_rate = 16000" + ] + }, + { + "cell_type": "markdown", + "id": "07ef8721", + "metadata": {}, + "source": [ + "There are two types of `matching_function`:\n", + "- `one_to_one`: each clean file has exactly one corresponding noisy file. For example, the VCTK dataset.\n", + "- `one_to_many`: each clean file can have multiple corresponding noisy files. For example, the DNS dataset."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b0fdc62", + "metadata": {}, + "outputs": [], + "source": [ + "matching_function = \"one_to_one\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ff0cfe60", + "metadata": {}, + "outputs": [], + "source": [ + "from mayavoz.dataset import MayaDataset\n", + "dataset = MayaDataset(\n", + "    name=name,\n", + "    root_dir=root_dir,\n", + "    files=files,\n", + "    duration=duration,\n", + "    stride=stride,\n", + "    sampling_rate=sampling_rate\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "acfdc655", + "metadata": {}, + "outputs": [], + "source": [ + "from mayavoz.models import Demucs\n", + "model = Demucs(dataset=dataset, loss=\"mae\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4fabe46d", + "metadata": {}, + "outputs": [], + "source": [ + "import pytorch_lightning as pl" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20d98ed0", + "metadata": {}, + "outputs": [], + "source": [ + "trainer = pl.Trainer(max_epochs=1)  # training options go to the Trainer\n", + "trainer.fit(model)  # the model is passed to fit()" + ] + }, + { + "cell_type": "markdown", + "id": "28bc697b", + "metadata": {}, + "source": [ + "**Mayavoz models and datasets are highly customizable**; see [here]() for advanced usage." + ] + }, + { + "cell_type": "markdown", + "id": "df01aa1e", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "### Mayavoz CLI" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "enhancer", + "language": "python", + "name": "enhancer" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}