{
"cells": [
{
"cell_type": "markdown",
"id": "7bd11665",
"metadata": {},
"source": [
"## Getting Started with Mayavoz\n",
"\n",
"#### Contents:\n",
"- [How to do inference using a pretrained model](#inference)\n",
"- [How to train your own custom model](#basictrain)"
]
},
{
"cell_type": "markdown",
"id": "d3c589bb",
"metadata": {},
"source": [
"### Install Mayavoz"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b68e053",
"metadata": {},
"outputs": [],
"source": [
"! pip install -q mayavoz"
]
},
{
"cell_type": "markdown",
"id": "87ee497f",
"metadata": {},
"source": [
"<div id=\"inference\"></div>\n",
"\n",
"### Pretrained Model\n",
"\n",
"To start using a pretrained model, select any of the available recipes from [here]().\n",
"For this exercise I am selecting [mayavoz/waveunet]().\n",
"\n",
"- Mayavoz supports multiple input and output formats. Input for inference can be in any of the formats below:\n",
"    - audio file path\n",
"    - numpy audio data\n",
"    - torch tensor audio data\n",
"\n",
"It auto-detects the input format and runs inference for you.\n",
"\n",
"At the moment, mayavoz only accepts a single audio input."
]
},
{
"cell_type": "markdown",
"id": "bd514ff4",
"metadata": {},
"source": [
"**Load model**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67698871",
"metadata": {},
"outputs": [],
"source": [
"from mayavoz import Mayamodel\n",
"\n",
"model = Mayamodel.from_pretrained(\"mayavoz/waveunet\")"
]
},
{
"cell_type": "markdown",
"id": "c7fd4cbe",
"metadata": {},
"source": [
"**Inference using file path**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7996c16",
"metadata": {},
"outputs": [],
"source": [
"file = \"myvoice.wav\"\n",
"audio = model.enhance(file)\n",
"audio.shape"
]
},
{
"cell_type": "markdown",
"id": "8ee20a83",
"metadata": {},
"source": [
"**Inference using torch tensor**\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a1c718",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"audio_tensor = torch.rand(1, 1, 32000)  ## random audio data\n",
"audio = model.enhance(audio_tensor)\n",
"audio.shape"
]
},
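{
"cell_type": "markdown",
"id": "numpy-inference-md",
"metadata": {},
"source": [
"**Inference using numpy array**\n",
"\n",
"A minimal sketch of the third supported input format: the random array below stands in for a real recording, and the `(1, 32000)` shape is an assumption, not part of the mayavoz API."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "numpy-inference-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"audio_numpy = np.random.rand(1, 32000)  ## random data standing in for real audio; shape is an assumption\n",
"audio = model.enhance(audio_numpy)  ## input format is auto-detected\n",
"audio.shape"
]
},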
{
"cell_type": "markdown",
"id": "2ac27920",
"metadata": {},
"source": [
"- If you want to save the output, just pass `save_output=True`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e0313f7",
"metadata": {},
"outputs": [],
"source": [
"audio = model.enhance(\"myvoice.wav\", save_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25077720",
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import Audio\n",
"\n",
"SAMPLING_RATE = 16000  ## assumed to match the recipe's sampling rate\n",
"Audio(\"myvoice_cleaned.wav\", rate=SAMPLING_RATE)"
]
},
{
"cell_type": "markdown",
"id": "3170bb0b",
"metadata": {},
"source": [
"<div id=\"basictrain\"></div>\n",
"\n",
"\n",
"## Training your own custom Model\n",
"\n",
"There are two ways of doing this:\n",
"\n",
"* [Using the mayavoz framework](#code)\n",
"* [Using the mayavoz command line tool](#cli)\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "a44fc314",
"metadata": {},
"source": [
"<div id=\"code\"></div>\n",
"\n",
"**Using the Mayavoz framework** [Basic]\n",
"- Prepare the dataloader\n",
"- Import your preferred model\n",
"- Train"
]
},
{
"cell_type": "markdown",
"id": "dbc14b36",
"metadata": {},
"source": [
"`Files` is a dataclass that helps you organise your train/test file paths."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c8c2b12",
"metadata": {},
"outputs": [],
"source": [
"from mayavoz.utils import Files\n",
"\n",
"name = \"dataset_name\"\n",
"root_dir = \"root_directory_of_your_dataset\"\n",
"files = Files(train_clean=\"train_cleanfiles_foldername\",\n",
"              train_noisy=\"noisy_train_foldername\",\n",
"              test_clean=\"clean_test_foldername\",\n",
"              test_noisy=\"noisy_test_foldername\")\n",
"duration = 4.0\n",
"stride = None\n",
"sampling_rate = 16000"
]
},
{
"cell_type": "markdown",
"id": "07ef8721",
"metadata": {},
"source": [
"Now there are two types of `matching_function`:\n",
"- `one_to_one` : each clean file has exactly one corresponding noisy file, as in the VCTK datasets.\n",
"- `one_to_many` : each clean file can have multiple corresponding noisy files, as in the DNS dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b0fdc62",
"metadata": {},
"outputs": [],
"source": [
"matching_function = \"one_to_one\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff0cfe60",
"metadata": {},
"outputs": [],
"source": [
"from mayavoz.dataset import MayaDataset\n",
"\n",
"dataset = MayaDataset(\n",
"    name=name,\n",
"    root_dir=root_dir,\n",
"    files=files,\n",
"    duration=duration,\n",
"    stride=stride,\n",
"    sampling_rate=sampling_rate\n",
")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "acfdc655",
"metadata": {},
"outputs": [],
"source": [
"from mayavoz.models import Demucs\n",
"\n",
"model = Demucs(dataset=dataset, loss=\"mae\")\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4fabe46d",
"metadata": {},
"outputs": [],
"source": [
"import pytorch_lightning as pl"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20d98ed0",
"metadata": {},
"outputs": [],
"source": [
"trainer = pl.Trainer(max_epochs=1)\n",
"trainer.fit(model)"
]
},
{
"cell_type": "markdown",
"id": "28bc697b",
"metadata": {},
"source": [
"**Mayavoz models and datasets are highly customizable**; see [here]() for advanced usage."
]
},
{
"cell_type": "markdown",
"id": "df01aa1e",
"metadata": {},
"source": [
"<div id=\"cli\"></div>\n",
"\n",
"\n",
"### Mayavoz CLI"
]
},
],
"metadata": {
"kernelspec": {
"display_name": "enhancer",
"language": "python",
"name": "enhancer"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}