{ "cells": [ { "cell_type": "markdown", "id": "7bd11665", "metadata": {}, "source": [ "## Getting Started with Mayavoz\n", "\n", "#### Contents:\n", "- [How to do inference using pretrained model](#inference)\n", "- [How to train your custom model](#basictrain)" ] }, { "cell_type": "markdown", "id": "d3c589bb", "metadata": {}, "source": [ "### Install Mayavoz" ] }, { "cell_type": "code", "execution_count": null, "id": "5b68e053", "metadata": {}, "outputs": [], "source": [ "! pip install -q mayavoz " ] }, { "cell_type": "markdown", "id": "87ee497f", "metadata": {}, "source": [ "
\n", "\n", "### Pretrained Model\n", "\n", "To start using pretrained model,select any of the available recipes from [here](). \n", "For this exercice I am selecting [mayavoz/waveunet]()\n", "\n", "- Mayavoz supports multiple input and output format. Input for inference can be in any of the below format\n", " - audio file path\n", " - numpy audio data\n", " - torch tensor audio data\n", " \n", "It auto-detects the input format and does inference for you.\n", " \n", "At the moment mayavoz only accepts single audio input" ] }, { "cell_type": "markdown", "id": "bd514ff4", "metadata": {}, "source": [ "**Load model**" ] }, { "cell_type": "code", "execution_count": 3, "id": "67698871", "metadata": {}, "outputs": [], "source": [ "\n", "from mayavoz.models import Mayamodel\n", "model = Mayamodel.from_pretrained(\"shahules786/mayavoz-dccrn-valentini-28spk\")\n" ] }, { "cell_type": "markdown", "id": "c7fd4cbe", "metadata": {}, "source": [ "**Inference using file path**" ] }, { "cell_type": "code", "execution_count": 7, "id": "d7996c16", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([1, 1, 36414])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "audio = model.enhance(\"my_voice.wav\")\n", "audio.shape" ] }, { "cell_type": "markdown", "id": "8ee20a83", "metadata": {}, "source": [ "**Inference using numpy ndarray**\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "e1a1c718", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(36414,)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import torch\n", "from librosa import load\n", "my_voice,sr = load(\"my_voice.wav\",sr=16000)\n", "my_voice.shape" ] }, { "cell_type": "code", "execution_count": 9, "id": "0cbef6c0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 1, 36414)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "audio = 
model.enhance(my_voice, sampling_rate=sr)\n", "audio.shape" ] }, { "cell_type": "markdown", "id": "a22fc10f", "metadata": {}, "source": [ "**Inference using torch tensor**\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "a884a935", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([1, 1, 36414])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_voice = torch.from_numpy(my_voice)\n", "audio = model.enhance(my_voice, sampling_rate=sr)\n", "audio.shape" ] }, { "cell_type": "markdown", "id": "2ac27920", "metadata": {}, "source": [ "- If you want to save the enhanced audio, just pass `save_output=True`." ] }, { "cell_type": "code", "execution_count": 11, "id": "9e0313f7", "metadata": {}, "outputs": [], "source": [ "audio = model.enhance(\"my_voice.wav\", save_output=True)" ] }, { "cell_type": "code", "execution_count": 12, "id": "25077720", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "