getting started

This commit is contained in:
shahules786 2022-11-08 12:45:09 +05:30
parent e0fbf55dca
commit ef06786d8c
1 changed files with 315 additions and 0 deletions

{
"cells": [
{
"cell_type": "markdown",
"id": "7bd11665",
"metadata": {},
"source": [
"## Getting Started with Mayavoz\n",
"\n",
"#### Contents:\n",
"- [How to do inference using pretrained model](#inference)\n",
"- [How to train your custom model](#basictrain)"
]
},
{
"cell_type": "markdown",
"id": "d3c589bb",
"metadata": {},
"source": [
"### Install Mayavoz"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b68e053",
"metadata": {},
"outputs": [],
"source": [
"! pip install -q mayavoz "
]
},
{
"cell_type": "markdown",
"id": "87ee497f",
"metadata": {},
"source": [
"<div id=\"inference\"></div>\n",
"\n",
"### Pretrained Model\n",
"\n",
"To start using a pretrained model, select any of the available recipes from [here](). \n",
"For this exercise, I am selecting [mayavoz/waveunet]().\n",
"\n",
"- Mayavoz supports multiple input and output formats. Input for inference can be in any of the below formats:\n",
" - audio file path\n",
" - numpy audio data\n",
" - torch tensor audio data\n",
" \n",
"It auto-detects the input format and does inference for you.\n",
" \n",
"At the moment, mayavoz only accepts a single audio input at a time."
]
},
{
"cell_type": "markdown",
"id": "bd514ff4",
"metadata": {},
"source": [
"**Load model**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67698871",
"metadata": {},
"outputs": [],
"source": [
"\n",
"from mayavoz import Mayamodel\n",
"model = Mayamodel.from_pretrained(\"mayavoz/waveunet\")\n"
]
},
{
"cell_type": "markdown",
"id": "c7fd4cbe",
"metadata": {},
"source": [
"**Inference using file path**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7996c16",
"metadata": {},
"outputs": [],
"source": [
"file = \"myvoice.wav\"\n",
"audio = model.enhance(file)\n",
"audio.shape"
]
},
{
"cell_type": "markdown",
"id": "8ee20a83",
"metadata": {},
"source": [
"**Inference using torch tensor**\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a1c718",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"audio_tensor = torch.rand(1, 1, 32000)  ## random audio data\n",
"audio = model.enhance(audio_tensor)\n",
"audio.shape"
]
},
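{
"cell_type": "markdown",
"id": "1a2b3c4d",
"metadata": {},
"source": [
"**Inference using numpy data**\n",
"\n",
"A quick sketch of the numpy input path mentioned above; the `(1, 32000)` shape is an assumption for illustration, mirroring the tensor example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e6f7a8b",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"audio_np = np.random.rand(1, 32000)  ## random numpy audio data\n",
"audio = model.enhance(audio_np)\n",
"audio.shape"
]
},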
{
"cell_type": "markdown",
"id": "2ac27920",
"metadata": {},
"source": [
"- if you want to save the output, just pass `save_output=True`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e0313f7",
"metadata": {},
"outputs": [],
"source": [
"audio = model.enhance(\"myvoice.wav\",save_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25077720",
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import Audio\n",
"\n",
"SAMPLING_RATE = 16000  ## assumed; matches the 16 kHz sampling rate used later in this notebook\n",
"Audio(\"myvoice_cleaned.wav\", rate=SAMPLING_RATE)"
]
},
{
"cell_type": "markdown",
"id": "3170bb0b",
"metadata": {},
"source": [
"<div id=\"basictrain\"></div>\n",
"\n",
"\n",
"## Training your own custom Model\n",
"\n",
"There are two ways of doing this\n",
"\n",
"* [Using mayavoz framework ](#code)\n",
"* [Using mayavoz command line tool ](#cli)\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "a44fc314",
"metadata": {},
"source": [
"<div id=\"code\"></div>\n",
"\n",
"**Using Mayavoz framework** [Basic]\n",
"- Prepare dataloader\n",
"- import preferred model\n",
"- Train"
]
},
{
"cell_type": "markdown",
"id": "dbc14b36",
"metadata": {},
"source": [
"`Files` is a dataclass that helps you organise your train/test file paths."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c8c2b12",
"metadata": {},
"outputs": [],
"source": [
"from mayavoz.utils import Files\n",
"\n",
"name = \"dataset_name\"\n",
"root_dir = \"root_directory_of_your_dataset\"\n",
"files = Files(train_clean=\"train_cleanfiles_foldername\",\n",
" train_noisy=\"noisy_train_foldername\",\n",
" test_clean=\"clean_test_foldername\",\n",
" test_noisy=\"noisy_test_foldername\")\n",
"duration = 4.0 \n",
"stride = None\n",
"sampling_rate = 16000"
]
},
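{
"cell_type": "markdown",
"id": "9c0d1e2f",
"metadata": {},
"source": [
"As a rough illustration (not mayavoz internals): with `duration=4.0` and `sampling_rate=16000`, each training example is a chunk of `4.0 * 16000 = 64000` samples, and `stride` controls the hop between consecutive chunks, here assumed to default to non-overlapping chunks when `None`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f4a5b6c",
"metadata": {},
"outputs": [],
"source": [
"chunk_size = int(duration * sampling_rate)  ## 64000 samples per training chunk\n",
"hop = int((stride or duration) * sampling_rate)  ## assuming stride=None means non-overlapping\n",
"total_samples = 10 * sampling_rate  ## e.g. a 10-second file\n",
"num_chunks = 1 + max(0, (total_samples - chunk_size) // hop)  ## full chunks per file\n",
"num_chunks"
]
},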
{
"cell_type": "markdown",
"id": "07ef8721",
"metadata": {},
"source": [
"Now there are two types of `matching_function`:\n",
"- `one_to_one` : each clean file has exactly one corresponding noisy file. For example, the VCTK dataset.\n",
"- `one_to_many` : each clean file has multiple corresponding noisy files. For example, the DNS dataset."
]
},
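{
"cell_type": "markdown",
"id": "7d8e9f0a",
"metadata": {},
"source": [
"The two pairing schemes can be sketched in plain Python (purely illustrative, with hypothetical file names; this is not mayavoz's actual implementation):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c2d3e4",
"metadata": {},
"outputs": [],
"source": [
"clean_files = [\"clean_001.wav\", \"clean_002.wav\"]  ## hypothetical file names\n",
"noisy_files = [\"noisy_001.wav\", \"noisy_002.wav\"]\n",
"one_to_one = list(zip(sorted(clean_files), sorted(noisy_files)))  ## one noisy file per clean file\n",
"one_to_many = {\"clean_001.wav\": [\"noisy_001_a.wav\", \"noisy_001_b.wav\"]}  ## several noisy versions per clean file\n",
"one_to_one"
]
},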
{
"cell_type": "code",
"execution_count": null,
"id": "4b0fdc62",
"metadata": {},
"outputs": [],
"source": [
"matching_function = \"one_to_one\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff0cfe60",
"metadata": {},
"outputs": [],
"source": [
"from mayavoz.dataset import MayaDataset\n",
"dataset = MayaDataset(\n",
" name=name,\n",
" root_dir=root_dir,\n",
" files=files,\n",
" duration=duration,\n",
" stride=stride,\n",
" sampling_rate=sampling_rate\n",
" )\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "acfdc655",
"metadata": {},
"outputs": [],
"source": [
"from mayavoz.models import Demucs\n",
"model = Demucs(dataset=dataset, loss=\"mae\")\n"
]
},
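{
"cell_type": "markdown",
"id": "f5a6b7c8",
"metadata": {},
"source": [
"`loss=\"mae\"` is the mean absolute error (L1 loss) between the enhanced and clean waveforms. A minimal sketch with torch, using random tensors in place of real audio:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e1f2a3b",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn.functional as F\n",
"\n",
"clean = torch.rand(1, 1, 32000)  ## stand-in for a clean waveform\n",
"enhanced = torch.rand(1, 1, 32000)  ## stand-in for the model output\n",
"mae = F.l1_loss(enhanced, clean)  ## mean absolute error over all samples\n",
"mae"
]
},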
{
"cell_type": "code",
"execution_count": 2,
"id": "4fabe46d",
"metadata": {},
"outputs": [],
"source": [
"import pytorch_lightning as pl"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20d98ed0",
"metadata": {},
"outputs": [],
"source": [
"trainer = pl.Trainer(max_epochs=1)\n",
"trainer.fit(model)"
]
},
{
"cell_type": "markdown",
"id": "28bc697b",
"metadata": {},
"source": [
"**mayavoz models and datasets are highly customizable**; see [here]() for advanced usage"
]
},
{
"cell_type": "markdown",
"id": "df01aa1e",
"metadata": {},
"source": [
"<div id=\"cli\"></div>\n",
"\n",
"\n",
"### Mayavoz CLI"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "enhancer",
"language": "python",
"name": "enhancer"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}