Merge pull request #46 from izam-mohammed/main

Updated conv.py
updated conv.py with changing imag variable
2023-02-08 16:40:35 +05:30 · 2023-01-24 07:12:45 +05:30 · 2023-01-24 06:57:04 +05:30 · 2023-01-11 23:03:27 +05:30 · 2023-01-11 12:52:13 +05:30 · 2023-01-11 12:51:32 +05:30
107 changed files with 3584 additions and 782 deletions
--- a/.flake8
+++ b/.flake8
@ -1,5 +1,5 @@
 [flake8]
-per-file-ignores = __init__.py:F401
+per-file-ignores = "mayavoz/model/__init__.py:F401"
 ignore = E203, E266, E501, W503
 # line length is intentionally set to 80 here because black uses Bugbear
 # See https://github.com/psf/black/blob/master/README.md#line-length for more details
--- a/.gitattributes
+++ b/.gitattributes
@ -0,0 +1 @@
 notebooks/** linguist-vendored
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@ -1,13 +1,13 @@
 # This workflow will install Python dependencies, run tests and lint with a variety of Python versions
 # For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
-name: Enhancer
+name: mayavoz
 on:
  push:
-    branches: [ dev ]
+    branches: [ main ]
  pull_request:
-    branches: [ dev ]
+    branches: [ main ]
 jobs:
  build:
    runs-on: ubuntu-latest
@ -40,12 +40,12 @@ jobs:
        sudo apt-get install libsndfile1
        pip install -r requirements.txt
        pip install black pytest-cov
-    - name: Install enhancer
+    - name: Install mayavoz
      run: |
          pip install -e .[dev,testing]
    - name: Run black
      run:
-        black --check .  --exclude enhancer/version.py
+        black --check .  --exclude mayavoz/version.py
    - name: Test with pytest
      run:
-        pytest tests --cov=enhancer/
+        pytest tests --cov=mayavoz/
--- a/.gitignore
+++ b/.gitignore
@ -1,5 +1,10 @@
 #local
 cleaned_my_voice.wav
 lightning_logs/
 my_voice.wav
 pretrained/
 *.ckpt
 *_local.yaml
 cli/train_config/dataset/Vctk_local.yaml
 .DS_Store
 outputs/
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@ -23,6 +23,7 @@ repos:
      hooks:
      - id: flake8
        args: ['--ignore=E203,E501,F811,E712,W503']
        exclude: __init__.py
    # Formatting, Whitespace, etc
    - repo: https://github.com/pre-commit/pre-commit-hooks
@ -40,5 +41,4 @@ repos:
      - id: end-of-file-fixer
      - id: requirements-txt-fixer
      - id: mixed-line-ending
        exclude: noisyspeech_synthesizer.cfg
        args: ['--fix=no']
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@ -0,0 +1,46 @@
 # Contributing
 Hi there 👋
 If you're reading this I hope that you're looking forward to adding value to Mayavoz. This document will help you to get started with your journey.
 ## How to get your code in Mayavoz
 1. We use git and GitHub.
 2. Fork the mayavoz repository (https://github.com/shahules786/mayavoz) on GitHub under your own account. (This creates a copy of mayavoz under your account, and GitHub knows where it came from, and we typically call this “upstream”.)
 3. Clone your own mayavoz repository: `git clone https://github.com/<your-account>/mayavoz` (This downloads the git repository to your machine, git knows where it came from, and calls it “origin”.)
 4. Create a branch for each specific feature you are developing: `git checkout -b your-branch-name`
 5. Make and commit changes: `git add files-you-changed ...` followed by `git commit -m "Short message about what you did"`
 6. Push the branch to your GitHub repository: `git push origin your-branch-name`
 7. Navigate to GitHub, and create a pull request from your branch to the upstream repository mayavoz/mayavoz, to the “develop” branch.
 8. The Pull Request (PR) appears on the upstream repository. Discuss your contribution there. If you push more changes to your branch on GitHub (on your repository), they are added to the PR.
 9. When the reviewer is satisfied that the code improves repository quality, they can merge.
 Note that CI tests will be run when you create a PR. If you want to be sure that your code will not fail these tests, we have set up pre-commit hooks that you can install.
 **If you're worried about things not being perfect with your code, we will work together and make it perfect. So, make your move!**
 ## Formatting
 We use [black](https://black.readthedocs.io/en/stable/) and [flake8](https://flake8.pycqa.org/en/latest/) for code formatting and linting. Please run both before submitting the PR.
 ## Testing
 We adopt unit testing using [pytest](https://docs.pytest.org/en/latest/contents.html)
 Please make sure that adding your new component does not decrease test coverage.
 ## Other tools
 The use of [pre-commit](https://pre-commit.com/) is recommended to enforce requirements such as code formatting.
 ## How to start contributing to Mayavoz?
 1. Check out issues marked as `good first issue`, and let us know you're interested in working on one by commenting under it.
 2. For others, I would suggest exploring mayavoz. One way to do this is to use it to train your own model. This way you might end up finding a new unreported bug or getting an idea to improve Mayavoz.
--- a/20
+++ b/20
@ -0,0 +1,20 @@
 MIT License
 Copyright (c) 2022 Shahul Es
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
--- a/MANIFEST.in
+++ b/MANIFEST.in
@ -0,0 +1,4 @@
 recursive-include mayavoz *.py
 recursive-include mayavoz *.yaml
 global-exclude *.pyc
 global-exclude __pycache__
--- a/README.md
+++ b/README.md
@ -2,24 +2,52 @@
  <img src="https://user-images.githubusercontent.com/25312635/195514652-e4526cd1-1177-48e9-a80d-c8bfdb95d35f.png" />
 </p>
-mayavoz is a Pytorch-based opensource toolkit for speech enhancement. It is designed to save time for audio researchers. Is provides easy to use pretrained audio enhancement models and facilitates highly customisable model training.
+![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/shahules786/mayavoz/ci.yaml?branch=main)
 ![GitHub](https://img.shields.io/github/license/shahules786/enhancer)
 ![GitHub issues](https://img.shields.io/github/issues/shahules786/enhancer?logo=GitHub)
 ![GitHub Repo stars](https://img.shields.io/github/stars/shahules786/enhancer?style=social)
-| **[Quick Start]()** | **[Installation]()** | **[Tutorials]()** | **[Available Recipes]()**
+mayavoz is a PyTorch-based open-source toolkit for speech enhancement. It is designed to save time for audio practitioners & researchers. It provides easy-to-use pretrained speech enhancement models and facilitates highly customisable model training.
 | **[Quick Start](#quick-start-fire)** | **[Installation](#installation)** | **[Tutorials](https://github.com/shahules786/enhancer/tree/main/notebooks)** | **[Available Recipes](#recipes)** | **[Demo](#demo)**
 ## Key features :key:
-* Various pretrained models nicely integrated with huggingface 	:hugs: that users can select and use without any hastle.
+* Various pretrained models nicely integrated with [huggingface hub](https://huggingface.co/docs/hub/index) :hugs: that users can select and use without any hassle.
-* :package: Ability to train and validation your own custom speech enhancement models with just under 10 lines of code!
+* :package: Ability to train and validate your own custom speech enhancement models with just under 10 lines of code!
 * :magic_wand: A command line tool that facilitates training of highly customisable speech enhancement models from the terminal itself!
-* :zap: Supports multi-gpu training integrated with Pytorch Lightning.
+* :zap: Supports multi-gpu training integrated with [Pytorch Lightning](https://pytorchlightning.ai/).
 * :shield: Data augmentations integrated using [torch-audiomentations](https://github.com/asteroid-team/torch-audiomentations)
 ## Demo
 Noisy speech followed by enhanced version.
 https://user-images.githubusercontent.com/25312635/203756185-737557f4-6e21-4146-aa2c-95da69d0de4c.mp4
 ## Quick Start :fire:
 ``` python
-from mayavoz import Mayamodel
+from mayavoz.models import Mayamodel
-model = Mayamodel.from_pretrained("mayavoz/waveunet")
+model = Mayamodel.from_pretrained("shahules786/mayavoz-waveunet-valentini-28spk")
-model("noisy_audio.wav")
+model.enhance("noisy_audio.wav")
 ```
 ## Recipes
 | Model     | Dataset      | STOI    | PESQ  | URL                           |
 | :---:     |  :---:       | :---:   | :---: | :---:                         |
 | WaveUnet  | Valentini-28spk   | 0.836   | 2.78  |  shahules786/mayavoz-waveunet-valentini-28spk      |
 | Demucs    | Valentini-28spk   | 0.961   | 2.56  |  shahules786/mayavoz-demucs-valentini-28spk       |
 | DCCRN     | Valentini-28spk   | 0.724   | 2.55  |  shahules786/mayavoz-dccrn-valentini-28spk         |
 | Demucs     | MS-SNSD-20hrs  | 0.56 | 1.26  | shahules786/mayavoz-demucs-ms-snsd-20       |
 Test scores are based on respective test set associated with train dataset.
 **See [tutorials](/notebooks/) to train your custom model**
 ## Installation
 Only Python 3.8+ is officially supported (though it might work with Python 3.7)
@ -41,3 +69,10 @@ git clone url
 cd mayavoz
 pip install -e .
 ```
 ## Support
 For commercial enquiries and scientific consulting, please [contact me](https://shahules786.github.io/).
 ### Acknowledgements
 Sincere gratitude to [AMPLYFI](https://amplyfi.com/) for supporting this project.
--- a/audiolib.py
+++ b/audiolib.py
@ -1,76 +0,0 @@
 # -*- coding: utf-8 -*-
 """
 Created on Wed Jun 26 15:54:05 2019
@author: chkarada
 """
 import os
 import numpy as np
 import soundfile as sf
 # Function to read audio
 def audioread(path, norm=True, start=0, stop=None):
    path = os.path.abspath(path)
    if not os.path.exists(path):
        raise ValueError("[{}] does not exist!".format(path))
    try:
        x, sr = sf.read(path, start=start, stop=stop)
    except RuntimeError:  # fix for sph pcm-embedded shortened v2
        print("WARNING: Audio type not supported")
    if len(x.shape) == 1:  # mono
        if norm:
            rms = (x**2).mean() ** 0.5
            scalar = 10 ** (-25 / 20) / (rms)
            x = x * scalar
        return x, sr
    else:  # multi-channel
        x = x.T
        x = x.sum(axis=0) / x.shape[0]
        if norm:
            rms = (x**2).mean() ** 0.5
            scalar = 10 ** (-25 / 20) / (rms)
            x = x * scalar
        return x, sr
 # Funtion to write audio
 def audiowrite(data, fs, destpath, norm=False):
    if norm:
        eps = 0.0
        rms = (data**2).mean() ** 0.5
        scalar = 10 ** (-25 / 10) / (rms + eps)
        data = data * scalar
        if max(abs(data)) >= 1:
            data = data / max(abs(data), eps)
    destpath = os.path.abspath(destpath)
    destdir = os.path.dirname(destpath)
    if not os.path.exists(destdir):
        os.makedirs(destdir)
    sf.write(destpath, data, fs)
    return
 # Function to mix clean speech and noise at various SNR levels
 def snr_mixer(clean, noise, snr):
    # Normalizing to -25 dB FS
    rmsclean = (clean**2).mean() ** 0.5
    scalarclean = 10 ** (-25 / 20) / rmsclean
    clean = clean * scalarclean
    rmsclean = (clean**2).mean() ** 0.5
    rmsnoise = (noise**2).mean() ** 0.5
    scalarnoise = 10 ** (-25 / 20) / rmsnoise
    noise = noise * scalarnoise
    rmsnoise = (noise**2).mean() ** 0.5
    # Set the noise level for a given SNR
    noisescalar = np.sqrt(rmsclean / (10 ** (snr / 20)) / rmsnoise)
    noisenewlevel = noise * noisescalar
    noisyspeech = clean + noisenewlevel
    return clean, noisenewlevel, noisyspeech
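Review note on the removed `snr_mixer`: after both signals are normalised to the same RMS, its noise gain reduces to `sqrt(10 ** (-snr / 20))`, so the achieved SNR is half the requested value in dB. The sketch below shows the conventional amplitude-ratio form, SNR = 20·log10(rms_clean / rms_noise), in plain Python. It is an illustration, not the deleted implementation: the names `rms` and `mix_at_snr` are hypothetical, and it skips the -25 dBFS normalisation step.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean/noise RMS ratio equals `snr_db`, then mix.

    Hypothetical helper: uses the amplitude definition of SNR,
    with no square root applied to the gain.
    """
    noise_gain = rms(clean) / (10 ** (snr_db / 20)) / rms(noise)
    scaled_noise = [n * noise_gain for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return scaled_noise, noisy


# Toy signals, chosen only so the arithmetic is easy to follow.
clean = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, 0.1, -0.1, -0.1]

scaled_noise, noisy = mix_at_snr(clean, noise, snr_db=20)
achieved_snr_db = 20 * math.log10(rms(clean) / rms(scaled_noise))
```

Measuring `achieved_snr_db` after mixing, as done on the last line, is a cheap check that the gain formula matches the SNR definition in use.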
--- a/enhancer/cli/train_config/dataset/DNS-2020.yaml
+++ b/enhancer/cli/train_config/dataset/DNS-2020.yaml
@ -1,11 +0,0 @@
 _target_: enhancer.data.dataset.EnhancerDataset
 root_dir : /Users/shahules/Myprojects/enhancer/datasets/vctk_test
 name : dns-2020
 duration : 1.0
 sampling_rate: 8000
 batch_size: 32
 files:
  train_clean : clean_test_wav
  test_clean : clean_test_wav
  train_noisy : clean_test_wav
  test_noisy : clean_test_wav
--- a/enhancer/cli/train_config/dataset/Vctk_local.yaml
+++ b/enhancer/cli/train_config/dataset/Vctk_local.yaml
@ -1,13 +0,0 @@
 _target_: enhancer.data.dataset.EnhancerDataset
 name : vctk
 root_dir : /Users/shahules/Myprojects/enhancer/datasets/vctk
 duration : 1.0
 sampling_rate: 16000
 batch_size: 64
 num_workers : 0
 files:
  train_clean : clean_testset_wav
  test_clean : clean_testset_wav
  train_noisy : noisy_testset_wav
  test_noisy : noisy_testset_wav
--- a/enhancer/cli/train_config/mlflow/experiment.yaml
+++ b/enhancer/cli/train_config/mlflow/experiment.yaml
@ -1,2 +0,0 @@
 experiment_name : shahules/enhancer
 run_name : baseline
--- a/enhancer/data/init.py
+++ b/enhancer/data/init.py
@ -1 +0,0 @@
 from enhancer.data.dataset import EnhancerDataset
--- a/enhancer/data/dataset.py
+++ b/enhancer/data/dataset.py
@ -1,263 +0,0 @@
 import math
 import multiprocessing
 import os
 from typing import Optional
 import pytorch_lightning as pl
 import torch.nn.functional as F
 from sklearn.model_selection import train_test_split
 from torch.utils.data import DataLoader, Dataset, IterableDataset
 from enhancer.data.fileprocessor import Fileprocessor
 from enhancer.utils import check_files
 from enhancer.utils.config import Files
 from enhancer.utils.io import Audio
 from enhancer.utils.random import create_unique_rng
 class TrainDataset(IterableDataset):
    def __init__(self, dataset):
        self.dataset = dataset
    def __iter__(self):
        return self.dataset.train__iter__()
    def __len__(self):
        return self.dataset.train__len__()
 class ValidDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
    def __getitem__(self, idx):
        return self.dataset.val__getitem__(idx)
    def __len__(self):
        return self.dataset.val__len__()
 class TestDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
    def __getitem__(self, idx):
        return self.dataset.test__getitem__(idx)
    def __len__(self):
        return self.dataset.test__len__()
 class TaskDataset(pl.LightningDataModule):
    def __init__(
        self,
        name: str,
        root_dir: str,
        files: Files,
        valid_size: float = 0.20,
        duration: float = 1.0,
        sampling_rate: int = 48000,
        matching_function=None,
        batch_size=32,
        num_workers: Optional[int] = None,
    ):
        super().__init__()
        self.name = name
        self.files, self.root_dir = check_files(root_dir, files)
        self.duration = duration
        self.sampling_rate = sampling_rate
        self.batch_size = batch_size
        self.matching_function = matching_function
        self._validation = []
        if num_workers is None:
            num_workers = multiprocessing.cpu_count() // 2
        self.num_workers = num_workers
        if valid_size > 0.0:
            self.valid_size = valid_size
        else:
            raise ValueError("valid_size must be greater than 0")
    def setup(self, stage: Optional[str] = None):
        """
        prepare train/validation/test data splits
        """
        if stage in ("fit", None):
            train_clean = os.path.join(self.root_dir, self.files.train_clean)
            train_noisy = os.path.join(self.root_dir, self.files.train_noisy)
            fp = Fileprocessor.from_name(
                self.name, train_clean, train_noisy, self.matching_function
            )
            train_data = fp.prepare_matching_dict()
            self.train_data, self.val_data = train_test_split(
                train_data, test_size=0.20, shuffle=True, random_state=42
            )
            self._validation = self.prepare_mapstype(self.val_data)
            test_clean = os.path.join(self.root_dir, self.files.test_clean)
            test_noisy = os.path.join(self.root_dir, self.files.test_noisy)
            fp = Fileprocessor.from_name(
                self.name, test_clean, test_noisy, self.matching_function
            )
            test_data = fp.prepare_matching_dict()
            self._test = self.prepare_mapstype(test_data)
    def prepare_mapstype(self, data):
        metadata = []
        for item in data:
            clean, noisy, total_dur = item.values()
            if total_dur < self.duration:
                continue
            num_segments = round(total_dur / self.duration)
            for index in range(num_segments):
                start_time = index * self.duration
                metadata.append(({"clean": clean, "noisy": noisy}, start_time))
        return metadata
    def train_dataloader(self):
        return DataLoader(
            TrainDataset(self),
            batch_size=self.batch_size,
            num_workers=self.num_workers,
        )
    def val_dataloader(self):
        return DataLoader(
            ValidDataset(self),
            batch_size=self.batch_size,
            num_workers=self.num_workers,
        )
    def test_dataloader(self):
        return DataLoader(
            TestDataset(self),
            batch_size=self.batch_size,
            num_workers=self.num_workers,
        )
 class EnhancerDataset(TaskDataset):
    """
    Dataset object for creating clean-noisy speech enhancement datasets
    paramters:
    name : str
        name of the dataset
    root_dir : str
        root directory of the dataset containing clean/noisy folders
    files : Files
        dataclass containing train_clean, train_noisy, test_clean, test_noisy
        folder names (refer enhancer.utils.Files dataclass)
    duration : float
        expected audio duration of single audio sample for training
    sampling_rate : int
        desired sampling rate
    batch_size : int
        batch size of each batch
    num_workers : int
        num workers to be used while training
    matching_function : str
        maching functions - (one_to_one,one_to_many). Default set to None.
        use one_to_one mapping for datasets with one noisy file for each clean file
        use one_to_many mapping for multiple noisy files for each clean file
    """
    def __init__(
        self,
        name: str,
        root_dir: str,
        files: Files,
        valid_size=0.2,
        duration=1.0,
        sampling_rate=48000,
        matching_function=None,
        batch_size=32,
        num_workers: Optional[int] = None,
    ):
        super().__init__(
            name=name,
            root_dir=root_dir,
            files=files,
            valid_size=valid_size,
            sampling_rate=sampling_rate,
            duration=duration,
            matching_function=matching_function,
            batch_size=batch_size,
            num_workers=num_workers,
        )
        self.sampling_rate = sampling_rate
        self.files = files
        self.duration = max(1.0, duration)
        self.audio = Audio(self.sampling_rate, mono=True, return_tensor=True)
    def setup(self, stage: Optional[str] = None):
        super().setup(stage=stage)
    def train__iter__(self):
        rng = create_unique_rng(self.model.current_epoch)
        while True:
            file_dict, *_ = rng.choices(
                self.train_data,
                k=1,
                weights=[file["duration"] for file in self.train_data],
            )
            file_duration = file_dict["duration"]
            start_time = round(rng.uniform(0, file_duration - self.duration), 2)
            data = self.prepare_segment(file_dict, start_time)
            yield data
    def val__getitem__(self, idx):
        return self.prepare_segment(*self._validation[idx])
    def test__getitem__(self, idx):
        return self.prepare_segment(*self._test[idx])
    def prepare_segment(self, file_dict: dict, start_time: float):
        clean_segment = self.audio(
            file_dict["clean"], offset=start_time, duration=self.duration
        )
        noisy_segment = self.audio(
            file_dict["noisy"], offset=start_time, duration=self.duration
        )
        clean_segment = F.pad(
            clean_segment,
            (
                0,
                int(
                    self.duration * self.sampling_rate - clean_segment.shape[-1]
                ),
            ),
        )
        noisy_segment = F.pad(
            noisy_segment,
            (
                0,
                int(
                    self.duration * self.sampling_rate - noisy_segment.shape[-1]
                ),
            ),
        )
        return {"clean": clean_segment, "noisy": noisy_segment}
    def train__len__(self):
        return math.ceil(
            sum([file["duration"] for file in self.train_data]) / self.duration
        )
    def val__len__(self):
        return len(self._validation)
    def test__len__(self):
        return len(self._test)
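Review note on the removed `prepare_mapstype`: the way validation and test files are cut into fixed-length segments can be sketched in isolation. This is a minimal sketch that mirrors the logic shown above; `segment_starts` is a hypothetical helper name, not part of the package.

```python
def segment_starts(total_dur, duration):
    """Start times (seconds) of fixed-length segments for one file.

    Mirrors the removed logic: files shorter than `duration` are skipped,
    and the segment count is round(total_dur / duration), so the last
    segment may run past the end of the file and need padding.
    """
    if total_dur < duration:
        return []
    num_segments = round(total_dur / duration)
    return [index * duration for index in range(num_segments)]


long_file = segment_starts(2.6, 1.0)   # three segments, last one is padded
short_tail = segment_starts(2.4, 1.0)  # trailing 0.4 s is dropped
too_short = segment_starts(0.5, 1.0)   # file is skipped entirely
```

Because the count is rounded rather than floored, a tail longer than half a segment yields an extra, partly empty segment, which is why `prepare_segment` pads both clean and noisy audio to the full length.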
--- a/enhancer/models/init.py
+++ b/enhancer/models/init.py
@ -1,3 +0,0 @@
 from enhancer.models.demucs import Demucs
 from enhancer.models.model import Model
 from enhancer.models.waveunet import WaveUnet
--- a/enhancer/utils/init.py
+++ b/enhancer/utils/init.py
@ -1,3 +0,0 @@
 from enhancer.utils.config import Files
 from enhancer.utils.io import Audio
 from enhancer.utils.utils import check_files
--- a/environment.yml
+++ b/environment.yml
@ -1,4 +1,4 @@
-name: enhancer
+name: mayavoz
 dependencies:
  - pip=21.0.1
--- a/hpc_entrypoint.sh
+++ b/hpc_entrypoint.sh
@ -1,52 +0,0 @@
 #!/bin/bash
 set -e
 echo '----------------------------------------------------'
 echo ' SLURM_CLUSTER_NAME = '$SLURM_CLUSTER_NAME
 echo '    SLURMD_NODENAME = '$SLURMD_NODENAME
 echo '        SLURM_JOBID = '$SLURM_JOBID
 echo '     SLURM_JOB_USER = '$SLURM_JOB_USER
 echo '    SLURM_PARTITION = '$SLURM_JOB_PARTITION
 echo '  SLURM_JOB_ACCOUNT = '$SLURM_JOB_ACCOUNT
 echo '----------------------------------------------------'
 #TeamCity Output
 cat << EOF
 ##teamcity[buildNumber '$SLURM_JOBID']
 EOF
 echo "Load HPC modules"
 module load anaconda
 echo "Activate Environment"
 source activate enhancer
 export TRANSFORMERS_OFFLINE=True
 export PYTHONPATH=${PYTHONPATH}:/scratch/c.sistc3/enhancer
 export HYDRA_FULL_ERROR=1
 echo $PYTHONPATH
 source ~/mlflow_settings.sh
 echo "Making temp dir"
 mkdir temp
 pwd
 # echo "files"
 # rm -rf  /scratch/c.sistc3/MS-SNSD/DNS30/CleanSpeech_training
 # rm -rf /scratch/c.sistc3/MS-SNSD/DNS30/NoisySpeech_training
 # rm -rf /scratch/c.sistc3/MS-SNSD/DNS30/NoisySpeech_testing
 # rm -rf /scratch/c.sistc3/MS-SNSD/DNS30/CleanSpeech_testing
 # cp -r /scratch/c.sistc3/MS-SNSD/DNS30/NoisySpeech_testing /scratch/c.sistc3/MS-SNSD/DNS15/
 # cp -r /scratch/c.sistc3/MS-SNSD/DNS30/CleanSpeech_testing /scratch/c.sistc3/MS-SNSD/DNS15/
 # rm -rf /scratch/c.sistc3/MS-SNSD/DNS20
 # mkdir  /scratch/c.sistc3/MS-SNSD/DNS20
 python noisyspeech_synthesizer.py
 mv ./CleanSpeech_testing/ /scratch/c.sistc3/MS-SNSD/DNS20
 mv ./NoisySpeech_testing/ /scratch/c.sistc3/MS-SNSD/DNS20
 ls /scratch/c.sistc3/MS-SNSD/DNS20
 #python enhancer/cli/train.py
--- a/enhancer/init.py
+++ b/enhancer/init.py
@ -1 +1,2 @@
 __import__("pkg_resources").declare_namespace(__name__)
 from mayavoz.models import Mayamodel
--- a/mayavoz/cli/train.py
+++ b/mayavoz/cli/train.py
@ -0,0 +1,120 @@
 import os
 from types import MethodType
 import hydra
 from hydra.utils import instantiate
 from omegaconf import DictConfig, OmegaConf
 from pytorch_lightning.callbacks import (
    EarlyStopping,
    LearningRateMonitor,
    ModelCheckpoint,
 )
 from pytorch_lightning.loggers import MLFlowLogger
 from torch.optim.lr_scheduler import ReduceLROnPlateau
 # from torch_audiomentations import Compose, Shift
 os.environ["HYDRA_FULL_ERROR"] = "1"
 JOB_ID = os.environ.get("SLURM_JOBID", "0")
@hydra.main(config_path="train_config", config_name="config")
 def train(config: DictConfig):
    OmegaConf.save(config, "config.yaml")
    callbacks = []
    logger = MLFlowLogger(
        experiment_name=config.mlflow.experiment_name,
        run_name=config.mlflow.run_name,
        tags={"JOB_ID": JOB_ID},
    )
    parameters = config.hyperparameters
    # apply_augmentations = Compose(
    #     [
    #         Shift(min_shift=0.5, max_shift=1.0, shift_unit="seconds", p=0.5),
    #     ]
    # )
    dataset = instantiate(config.dataset, augmentations=None)
    model = instantiate(
        config.model,
        dataset=dataset,
        lr=parameters.get("lr"),
        loss=parameters.get("loss"),
        metric=parameters.get("metric"),
    )
    direction = model.valid_monitor
    checkpoint = ModelCheckpoint(
        dirpath="./model",
        filename=f"model_{JOB_ID}",
        monitor="valid_loss",
        verbose=False,
        mode=direction,
        every_n_epochs=1,
    )
    callbacks.append(checkpoint)
    callbacks.append(LearningRateMonitor(logging_interval="epoch"))
    if parameters.get("Early_stop", False):
        early_stopping = EarlyStopping(
            monitor="valid_loss",
            mode=direction,
            min_delta=0.0,
            patience=parameters.get("EarlyStopping_patience", 10),
            strict=True,
            verbose=False,
        )
        callbacks.append(early_stopping)
    def configure_optimizers(self):
        optimizer = instantiate(
            config.optimizer,
            lr=parameters.get("lr"),
            params=self.parameters(),
        )
        scheduler = ReduceLROnPlateau(
            optimizer=optimizer,
            mode=direction,
            factor=parameters.get("ReduceLr_factor", 0.1),
            verbose=True,
            min_lr=parameters.get("min_lr", 1e-6),
            patience=parameters.get("ReduceLr_patience", 3),
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": scheduler,
            "monitor": f'valid_{parameters.get("ReduceLr_monitor", "loss")}',
        }
    model.configure_optimizers = MethodType(configure_optimizers, model)
    trainer = instantiate(config.trainer, logger=logger, callbacks=callbacks)
    trainer.fit(model)
    trainer.test(model)
    logger.experiment.log_artifact(
        logger.run_id, f"{trainer.default_root_dir}/config.yaml"
    )
    saved_location = os.path.join(
        trainer.default_root_dir, "model", f"model_{JOB_ID}.ckpt"
    )
    if os.path.isfile(saved_location):
        logger.experiment.log_artifact(logger.run_id, saved_location)
        logger.experiment.log_param(
            logger.run_id,
            "num_train_steps_per_epoch",
            dataset.train__len__() / dataset.batch_size,
        )
        logger.experiment.log_param(
            logger.run_id,
            "num_valid_steps_per_epoch",
            dataset.val__len__() / dataset.batch_size,
        )
 if __name__ == "__main__":
    train()
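Review note on `train.py`: the script replaces `configure_optimizers` on a single model instance with `types.MethodType`, so the closure can read `config` and `parameters` from the enclosing function. The standard-library sketch below shows the same binding pattern without PyTorch. `TinyModel` and `make_configure_optimizers` are hypothetical stand-ins, and the returned dictionaries are placeholders rather than real optimizers.

```python
from types import MethodType


class TinyModel:
    """Stand-in for a LightningModule; only carries a learning rate."""

    def __init__(self, lr):
        self.lr = lr

    def configure_optimizers(self):
        return {"optimizer": "default", "lr": self.lr}


def make_configure_optimizers(factor):
    """Build a replacement method that closes over `factor`."""

    def configure_optimizers(self):
        # `self` is the instance the function gets bound to below.
        return {"optimizer": "custom", "lr": self.lr, "factor": factor}

    return configure_optimizers


model = TinyModel(lr=0.0003)
other = TinyModel(lr=0.1)

# Bind the function to this one instance; the class and `other` are untouched.
model.configure_optimizers = MethodType(make_configure_optimizers(0.2), model)

result = model.configure_optimizers()
```

The binding only affects the patched instance, so a model created elsewhere (for example one loaded from a checkpoint) keeps the class-level method.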
--- a/enhancer/cli/train_config/config.yaml
+++ b/enhancer/cli/train_config/config.yaml
@ -3,5 +3,5 @@ defaults:
  - dataset : Vctk
  - optimizer : Adam
  - hyperparameters : default
-  - trainer : fastrun_dev
+  - trainer : default
  - mlflow : experiment
--- a/mayavoz/cli/train_config/dataset/MS-SNSD.yaml
+++ b/mayavoz/cli/train_config/dataset/MS-SNSD.yaml
@ -0,0 +1,12 @@
 _target_: mayavoz.data.dataset.MayaDataset
 name : MS-SNSD
 root_dir : /Users/shahules/Myprojects/MS-SNSD
 duration : 2.0
 sampling_rate: 16000
 batch_size: 32
 min_valid_minutes: 15
 files:
  train_clean : CleanSpeech_training
  test_clean : CleanSpeech_training
  train_noisy : NoisySpeech_training
  test_noisy : NoisySpeech_training
--- a/mayavoz/cli/train_config/dataset/Valentini.yaml
+++ b/mayavoz/cli/train_config/dataset/Valentini.yaml
@ -0,0 +1,13 @@
 _target_: mayavoz.data.dataset.MayaDataset
 name : Valentini
 root_dir : /scratch/c.sistc3/DS_10283_2791
 duration : 4.5
 stride : 2
 sampling_rate: 16000
 batch_size: 32
 min_valid_minutes : 15
 files:
  train_clean : clean_trainset_28spk_wav
  test_clean : clean_testset_wav
  train_noisy : noisy_trainset_28spk_wav
  test_noisy : noisy_testset_wav
--- a/mayavoz/cli/train_config/hyperparameters/default.yaml
+++ b/mayavoz/cli/train_config/hyperparameters/default.yaml
@ -0,0 +1,7 @@
 loss : mae
 metric : [stoi,pesq,si-sdr]
 lr : 0.0003
 ReduceLr_patience : 5
 ReduceLr_factor : 0.2
 min_lr : 0.000001
 EarlyStopping_factor : 10
--- a/mayavoz/cli/train_config/mlflow/experiment.yaml
+++ b/mayavoz/cli/train_config/mlflow/experiment.yaml
@ -0,0 +1,2 @@
 experiment_name : shahules/mayavoz
 run_name : Demucs + Vtck with stride + augmentations
--- a/mayavoz/cli/train_config/model/DCCRN.yaml
+++ b/mayavoz/cli/train_config/model/DCCRN.yaml
@ -0,0 +1,25 @@
 _target_: mayavoz.models.dccrn.DCCRN
 num_channels: 1
 sampling_rate : 16000
 complex_lstm : True
 complex_norm : True
 complex_relu : True
 masking_mode : True
 encoder_decoder:
  initial_output_channels : 32
  depth : 6
  kernel_size : 5
  growth_factor : 2
  stride : 2
  padding : 2
  output_padding : 1
 lstm:
  num_layers : 2
  hidden_size : 256
 stft:
  window_len : 400
  hop_size : 100
  nfft : 512
--- a/enhancer/cli/train_config/model/Demucs.yaml
+++ b/enhancer/cli/train_config/model/Demucs.yaml
@ -1,11 +1,11 @@
-_target_: enhancer.models.demucs.Demucs
+_target_: mayavoz.models.demucs.Demucs
 num_channels: 1
-resample: 2
+resample: 4
 sampling_rate : 16000
 encoder_decoder:
-  depth: 5
+  depth: 4
-  initial_output_channels: 32
+  initial_output_channels: 64
  kernel_size: 8
  stride: 4
  growth_factor: 2
--- a/enhancer/cli/train_config/model/WaveUnet.yaml
+++ b/enhancer/cli/train_config/model/WaveUnet.yaml
@ -1,5 +1,5 @@
-_target_: enhancer.models.waveunet.WaveUnet
+_target_: mayavoz.models.waveunet.WaveUnet
 num_channels : 1
-depth : 12
+depth : 9
 initial_output_channels: 24
 sampling_rate : 16000
--- a/enhancer/cli/train_config/optimizer/Adam.yaml
+++ b/enhancer/cli/train_config/optimizer/Adam.yaml
--- a/mayavoz/cli/train_config/trainer/default.yaml
+++ b/mayavoz/cli/train_config/trainer/default.yaml
@ -0,0 +1,46 @@
 _target_: pytorch_lightning.Trainer
 accelerator: gpu
 accumulate_grad_batches: 1
 amp_backend: native
 auto_lr_find: True
 auto_scale_batch_size: False
 auto_select_gpus: True
 benchmark: False
 check_val_every_n_epoch: 1
 detect_anomaly: False
 deterministic: False
 devices: 2
 enable_checkpointing: True
 enable_model_summary: True
 enable_progress_bar: True
 fast_dev_run: False
 gpus: null
 gradient_clip_val: 0
 gradient_clip_algorithm: norm
 ipus: null
 limit_predict_batches: 1.0
 limit_test_batches: 1.0
 limit_train_batches: 1.0
 limit_val_batches: 1.0
 log_every_n_steps: 50
 max_epochs: 200
 max_steps: -1
 max_time: null
 min_epochs: 1
 min_steps: null
 move_metrics_to_cpu: False
 multiple_trainloader_mode: max_size_cycle
 num_nodes: 1
 num_processes: 1
 num_sanity_val_steps: 2
 overfit_batches: 0.0
 precision: 32
 profiler: null
 reload_dataloaders_every_n_epochs: 0
 replace_sampler_ddp: True
 strategy: ddp
 sync_batchnorm: False
 tpu_cores: null
 track_grad_norm: -1
 val_check_interval: 1.0
 weights_save_path: null
--- a/enhancer/cli/train_config/trainer/fastrun_dev.yaml
+++ b/enhancer/cli/train_config/trainer/fastrun_dev.yaml
--- a/mayavoz/data/init.py
+++ b/mayavoz/data/init.py
@ -0,0 +1 @@
 from mayavoz.data.dataset import MayaDataset
--- a/mayavoz/data/dataset.py
+++ b/mayavoz/data/dataset.py
@ -0,0 +1,393 @@
 import math
 import multiprocessing
 import os
 import sys
 import warnings
 from pathlib import Path
 from typing import Optional
 import numpy as np
 import pytorch_lightning as pl
 import torch
 import torch.nn.functional as F
 from torch.utils.data import DataLoader, Dataset, RandomSampler
 from torch_audiomentations import Compose
 from mayavoz.data.fileprocessor import Fileprocessor
 from mayavoz.utils import check_files
 from mayavoz.utils.config import Files
 from mayavoz.utils.io import Audio
 from mayavoz.utils.random import create_unique_rng
 LARGE_NUM = 2147483647
 class TrainDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
    def __getitem__(self, idx):
        return self.dataset.train__getitem__(idx)
    def __len__(self):
        return self.dataset.train__len__()
 class ValidDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
    def __getitem__(self, idx):
        return self.dataset.val__getitem__(idx)
    def __len__(self):
        return self.dataset.val__len__()
 class TestDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
    def __getitem__(self, idx):
        return self.dataset.test__getitem__(idx)
    def __len__(self):
        return self.dataset.test__len__()
 class TaskDataset(pl.LightningDataModule):
    def __init__(
        self,
        name: str,
        root_dir: str,
        files: Files,
        min_valid_minutes: float = 0.20,
        duration: float = 1.0,
        stride=None,
        sampling_rate: int = 48000,
        matching_function=None,
        batch_size=32,
        num_workers: Optional[int] = None,
        augmentations: Optional[Compose] = None,
    ):
        super().__init__()
        self.name = name
        self.files, self.root_dir = check_files(root_dir, files)
        self.duration = duration
        self.stride = stride or duration
        self.sampling_rate = sampling_rate
        self.batch_size = batch_size
        self.matching_function = matching_function
        self._validation = []
        if num_workers is None:
            num_workers = multiprocessing.cpu_count() // 2
        if (
            num_workers > 0
            and sys.platform == "darwin"
            and sys.version_info[0] >= 3
            and sys.version_info[1] >= 8
        ):
            warnings.warn(
                "num_workers > 0 is not supported with macOS and Python 3.8+: "
                "setting num_workers = 0."
            )
            num_workers = 0
        self.num_workers = num_workers
        if min_valid_minutes > 0.0:
            self.min_valid_minutes = min_valid_minutes
        else:
            raise ValueError("min_valid_minutes must be greater than 0")
        self.augmentations = augmentations
    def setup(self, stage: Optional[str] = None):
        """
        prepare train/validation/test data splits
        """
        if stage in ("fit", None):
            train_clean = os.path.join(self.root_dir, self.files.train_clean)
            train_noisy = os.path.join(self.root_dir, self.files.train_noisy)
            fp = Fileprocessor.from_name(
                self.name, train_clean, train_noisy, self.matching_function
            )
            train_data = fp.prepare_matching_dict()
            train_data, self.val_data = self.train_valid_split(
                train_data,
                min_valid_minutes=self.min_valid_minutes,
                random_state=42,
            )
            self.train_data = self.prepare_traindata(train_data)
            self._validation = self.prepare_mapstype(self.val_data)
            test_clean = os.path.join(self.root_dir, self.files.test_clean)
            test_noisy = os.path.join(self.root_dir, self.files.test_noisy)
            fp = Fileprocessor.from_name(
                self.name, test_clean, test_noisy, self.matching_function
            )
            test_data = fp.prepare_matching_dict()
            self._test = self.prepare_mapstype(test_data)
    def train_valid_split(
        self, data, min_valid_minutes: float = 20, random_state: int = 42
    ):
        min_valid_minutes *= 60
        valid_sec_now = 0.0
        valid_indices = []
        all_speakers = np.unique(
            [Path(file["clean"]).name.split("_")[0] for file in data]
        )
        possible_indices = list(range(0, len(all_speakers)))
        rng = create_unique_rng(len(all_speakers))
        while valid_sec_now <= min_valid_minutes:
            speaker_index = rng.choice(possible_indices)
            possible_indices.remove(speaker_index)
            speaker_name = all_speakers[speaker_index]
            print(f"Selected {speaker_name} for valid")
            file_indices = [
                i
                for i, file in enumerate(data)
                if speaker_name == Path(file["clean"]).name.split("_")[0]
            ]
            for i in file_indices:
                valid_indices.append(i)
                valid_sec_now += data[i]["duration"]
        train_data = [
            item for i, item in enumerate(data) if i not in valid_indices
        ]
        valid_data = [item for i, item in enumerate(data) if i in valid_indices]
        return train_data, valid_data
    def prepare_traindata(self, data):
        train_data = []
        for item in data:
            clean, noisy, total_dur = item.values()
            num_segments = self.get_num_segments(
                total_dur, self.duration, self.stride
            )
            samples_metadata = ({"clean": clean, "noisy": noisy}, num_segments)
            train_data.append(samples_metadata)
        return train_data
    @staticmethod
    def get_num_segments(file_duration, duration, stride):
        if file_duration < duration:
            num_segments = 1
        else:
            num_segments = math.ceil((file_duration - duration) / stride) + 1
        return num_segments
    def prepare_mapstype(self, data):
        metadata = []
        for item in data:
            clean, noisy, total_dur = item.values()
            if total_dur < self.duration:
                metadata.append(({"clean": clean, "noisy": noisy}, 0.0))
            else:
                num_segments = self.get_num_segments(
                    total_dur, self.duration, self.duration
                )
                for index in range(num_segments):
                    start_time = index * self.duration
                    metadata.append(
                        ({"clean": clean, "noisy": noisy}, start_time)
                    )
        return metadata
    def train_collatefn(self, batch):
        output = {"clean": [], "noisy": []}
        for item in batch:
            output["clean"].append(item["clean"])
            output["noisy"].append(item["noisy"])
        output["clean"] = torch.stack(output["clean"], dim=0)
        output["noisy"] = torch.stack(output["noisy"], dim=0)
        if self.augmentations is not None:
            noise = output["noisy"] - output["clean"]
            output["clean"] = self.augmentations(
                output["clean"], sample_rate=self.sampling_rate
            )
            self.augmentations.freeze_parameters()
            output["noisy"] = (
                self.augmentations(noise, sample_rate=self.sampling_rate)
                + output["clean"]
            )
        return output
    @property
    def generator(self):
        generator = torch.Generator()
        if hasattr(self, "model"):
            seed = self.model.current_epoch + LARGE_NUM
        else:
            seed = LARGE_NUM
        return generator.manual_seed(seed)
    def train_dataloader(self):
        dataset = TrainDataset(self)
        sampler = RandomSampler(dataset, generator=self.generator)
        return DataLoader(
            dataset,
            batch_size=self.batch_size,
            num_workers=self.num_workers,
            sampler=sampler,
            collate_fn=self.train_collatefn,
        )
    def val_dataloader(self):
        return DataLoader(
            ValidDataset(self),
            batch_size=self.batch_size,
            num_workers=self.num_workers,
        )
    def test_dataloader(self):
        return DataLoader(
            TestDataset(self),
            batch_size=self.batch_size,
            num_workers=self.num_workers,
        )
 class MayaDataset(TaskDataset):
    """
    Dataset object for creating clean-noisy speech enhancement datasets
    parameters:
    name : str
        name of the dataset
    root_dir : str
        root directory of the dataset containing clean/noisy folders
    files : Files
        dataclass containing train_clean, train_noisy, test_clean, test_noisy
        folder names (refer mayavoz.utils.Files dataclass)
    min_valid_minutes : float
        minimum duration of the validation split, in minutes;
        speakers are drawn at random from the train data until their combined duration exceeds min_valid_minutes.
    duration : float
        expected audio duration of single audio sample for training
    sampling_rate : int
        desired sampling rate
    batch_size : int
        batch size of each batch
    num_workers : int
        num workers to be used while training
    matching_function : str
        matching functions - (one_to_one,one_to_many). Default set to None.
        use one_to_one mapping for datasets with one noisy file for each clean file
        use one_to_many mapping for multiple noisy files for each clean file
    """
    def __init__(
        self,
        name: str,
        root_dir: str,
        files: Files,
        min_valid_minutes=5.0,
        duration=1.0,
        stride=None,
        sampling_rate=48000,
        matching_function=None,
        batch_size=32,
        num_workers: Optional[int] = None,
        augmentations: Optional[Compose] = None,
    ):
        super().__init__(
            name=name,
            root_dir=root_dir,
            files=files,
            min_valid_minutes=min_valid_minutes,
            sampling_rate=sampling_rate,
            duration=duration,
            matching_function=matching_function,
            batch_size=batch_size,
            num_workers=num_workers,
            augmentations=augmentations,
        )
        self.sampling_rate = sampling_rate
        self.files = files
        self.duration = max(1.0, duration)
        self.audio = Audio(self.sampling_rate, mono=True, return_tensor=True)
        self.stride = stride or duration
    def setup(self, stage: Optional[str] = None):
        super().setup(stage=stage)
    def train__getitem__(self, idx):
        for filedict, num_samples in self.train_data:
            if idx >= num_samples:
                idx -= num_samples
                continue
            else:
                start = 0
                if self.duration is not None:
                    start = idx * self.stride
                return self.prepare_segment(filedict, start)
    def val__getitem__(self, idx):
        return self.prepare_segment(*self._validation[idx])
    def test__getitem__(self, idx):
        return self.prepare_segment(*self._test[idx])
    def prepare_segment(self, file_dict: dict, start_time: float):
        clean_segment = self.audio(
            file_dict["clean"], offset=start_time, duration=self.duration
        )
        noisy_segment = self.audio(
            file_dict["noisy"], offset=start_time, duration=self.duration
        )
        clean_segment = F.pad(
            clean_segment,
            (
                0,
                int(
                    self.duration * self.sampling_rate - clean_segment.shape[-1]
                ),
            ),
        )
        noisy_segment = F.pad(
            noisy_segment,
            (
                0,
                int(
                    self.duration * self.sampling_rate - noisy_segment.shape[-1]
                ),
            ),
        )
        return {
            "clean": clean_segment,
            "noisy": noisy_segment,
        }
    def train__len__(self):
        _, num_examples = list(zip(*self.train_data))
        return sum(num_examples)
    def val__len__(self):
        return len(self._validation)
    def test__len__(self):
        return len(self._test)
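Review note on the windowing in `mayavoz/data/dataset.py`: a minimal standalone sketch of how `get_num_segments` turns a file duration into window start times. The helper names `num_segments` and `segment_starts` are illustrative and not part of the package; only the formula is taken from the diff.

```python
import math


def num_segments(file_duration: float, duration: float, stride: float) -> int:
    """Count fixed-length windows, using the same formula as get_num_segments."""
    if file_duration < duration:
        # A file shorter than one window still yields a single (padded) segment.
        return 1
    return math.ceil((file_duration - duration) / stride) + 1


def segment_starts(file_duration: float, duration: float, stride: float) -> list:
    """Start offsets in seconds, following `start = idx * stride` in train__getitem__."""
    count = num_segments(file_duration, duration, stride)
    return [index * stride for index in range(count)]


if __name__ == "__main__":
    # A 3.5 s file cut into 1.0 s windows with a 0.5 s stride.
    print(segment_starts(3.5, 1.0, 0.5))
```

Because of the `ceil`, the last window can run past the end of the file; `prepare_segment` pads such a window with zeros to the full length.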
--- a/enhancer/data/fileprocessor.py
+++ b/enhancer/data/fileprocessor.py
@ -62,25 +62,24 @@ class ProcessorFunctions:
        ]
        for clean_file in clean_filenames:
            noisy_filenames = glob.glob(
-                os.path.join(noisy_path, f"*_{clean_file}.wav")
+                os.path.join(noisy_path, f"*_{clean_file}")
            )
            for noisy_file in noisy_filenames:
-                sr_clean, clean_file = wavfile.read(
+                sr_clean, clean_wav = wavfile.read(
                    os.path.join(clean_path, clean_file)
                )
-                sr_noisy, noisy_file = wavfile.read(noisy_file)
+                sr_noisy, noisy_wav = wavfile.read(noisy_file)
-                if (clean_file.shape[-1] == noisy_file.shape[-1]) and (
+                if (clean_wav.shape[-1] == noisy_wav.shape[-1]) and (
                    sr_clean == sr_noisy
                ):
                    matching_wavfiles.append(
                        {
                            "clean": os.path.join(clean_path, clean_file),
                            "noisy": noisy_file,
-                            "duration": clean_file.shape[-1] / sr_clean,
+                            "duration": clean_wav.shape[-1] / sr_clean,
                        }
                    )
        return matching_wavfiles
@ -94,9 +93,9 @@ class Fileprocessor:
    def from_name(cls, name: str, clean_dir, noisy_dir, matching_function=None):
        if matching_function is None:
-            if name.lower() == "vctk":
+            if name.lower() in ("vctk", "valentini"):
                return cls(clean_dir, noisy_dir, ProcessorFunctions.one_to_one)
-            elif name.lower() == "dns-2020":
+            elif name.lower() == "ms-snsd":
                return cls(clean_dir, noisy_dir, ProcessorFunctions.one_to_many)
            else:
                raise ValueError(
--- a/enhancer/inference.py
+++ b/enhancer/inference.py
@ -8,7 +8,7 @@ from librosa import load as load_audio
 from scipy.io import wavfile
 from scipy.signal import get_window
-from enhancer.utils import Audio
+from mayavoz.utils import Audio
 class Inference:
@ -95,6 +95,7 @@ class Inference:
    ):
        """
        stitch batched waveform into single waveform. (Overlap-add)
        inspired from https://github.com/asteroid-team/asteroid
        arguments:
            data: batched waveform
            window_size : window_size used to batch waveform
--- a/enhancer/loss.py
+++ b/enhancer/loss.py
@ -1,8 +1,9 @@
-import logging
+import warnings
 import numpy as np
 import torch
 import torch.nn as nn
 from torchmetrics import ScaleInvariantSignalNoiseRatio
 from torchmetrics.audio.pesq import PerceptualEvaluationSpeechQuality
 from torchmetrics.audio.stoi import ShortTimeObjectiveIntelligibility
@ -65,8 +66,8 @@ class Si_SDR:
            raise TypeError(
                "Invalid reduction, valid options are sum, mean, None"
            )
-        self.higher_better = False
+        self.higher_better = True
-        self.name = "Si-SDR"
+        self.name = "si-sdr"
    def __call__(self, prediction: torch.Tensor, target: torch.Tensor):
@ -122,18 +123,18 @@ class Pesq:
        self.sr = sr
        self.name = "pesq"
        self.mode = mode
-        self.pesq = PerceptualEvaluationSpeechQuality(fs=sr, mode=mode)
+        self.pesq = PerceptualEvaluationSpeechQuality(
            fs=self.sr, mode=self.mode
        )
    def __call__(self, prediction: torch.Tensor, target: torch.Tensor):
        pesq_values = []
        for pred, target_ in zip(prediction, target):
            try:
-                pesq_values.append(
+                pesq_values.append(self.pesq(pred.squeeze(), target_.squeeze()))
                    self.pesq(pred.squeeze(), target_.squeeze()).item()
                )
            except Exception as e:
-                logging.warning(f"{e} error occured while calculating PESQ")
+                warnings.warn(f"{e} error occurred while calculating PESQ")
        return torch.tensor(np.mean(pesq_values))
@ -182,10 +183,34 @@ class LossWrapper(nn.Module):
        return loss
 class Si_snr(nn.Module):
    """
    SI-SNR
    """
    def __init__(self, **kwargs):
        super().__init__()
        self.loss_fun = ScaleInvariantSignalNoiseRatio(**kwargs)
        self.higher_better = False
        self.name = "si_snr"
    def forward(self, prediction: torch.Tensor, target: torch.Tensor):
        if prediction.size() != target.size() or target.ndim < 3:
            raise TypeError(
                f"""Inputs must be of the same shape (batch_size,channels,samples)
                    got {prediction.size()} and {target.size()} instead"""
            )
        return -1 * self.loss_fun(prediction, target)
 LOSS_MAP = {
    "mae": mean_absolute_error,
    "mse": mean_squared_error,
    "si-sdr": Si_SDR,
    "pesq": Pesq,
    "stoi": Stoi,
    "si-snr": Si_snr,
 }
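Review note on `Si_snr`: a rough pure-Python sketch of the quantity this loss negates, for a single one-dimensional signal. It assumes the usual scale-invariant definition with mean removal; the function name `si_snr` and the `eps` guard are illustrative, and the real computation is delegated to torchmetrics in the diff above.

```python
import math


def si_snr(prediction, target, eps=1e-8):
    """Scale-invariant SNR in dB for two equal-length sequences of floats."""
    n = len(target)
    # Remove the mean of each signal (an assumption of this sketch).
    p_mean = sum(prediction) / n
    t_mean = sum(target) / n
    p = [x - p_mean for x in prediction]
    t = [x - t_mean for x in target]
    # Project the prediction onto the target to get the "signal" part.
    scale = sum(a * b for a, b in zip(p, t)) / (sum(b * b for b in t) + eps)
    s_target = [scale * b for b in t]
    # Whatever is left of the prediction is treated as noise.
    noise = [a - b for a, b in zip(p, s_target)]
    signal_energy = sum(x * x for x in s_target)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    pred = [0.1, 0.4, -0.2, 0.3]
    target = [0.0, 0.5, -0.3, 0.2]
    # Rescaling the prediction should leave the value essentially unchanged.
    print(si_snr(pred, target), si_snr([2.0 * x for x in pred], target))
```

A higher SI-SNR means a better estimate, which is why `forward` returns the negated value when it is used as a training loss.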
--- a/mayavoz/models/__init__.py
+++ b/mayavoz/models/__init__.py
@ -0,0 +1,3 @@
 from mayavoz.models.demucs import Demucs
 from mayavoz.models.model import Mayamodel
 from mayavoz.models.waveunet import WaveUnet
--- a/mayavoz/models/complexnn/__init__.py
+++ b/mayavoz/models/complexnn/__init__.py
@ -0,0 +1,5 @@
 from mayavoz.models.complexnn.conv import ComplexConv2d  # noqa
 from mayavoz.models.complexnn.conv import ComplexConvTranspose2d  # noqa
 from mayavoz.models.complexnn.rnn import ComplexLSTM  # noqa
 from mayavoz.models.complexnn.utils import ComplexBatchNorm2D  # noqa
 from mayavoz.models.complexnn.utils import ComplexRelu  # noqa
--- a/mayavoz/models/complexnn/conv.py
+++ b/mayavoz/models/complexnn/conv.py
@ -0,0 +1,136 @@
 from typing import Tuple
 import torch
 import torch.nn.functional as F
 from torch import nn
 def init_weights(nnet):
    nn.init.xavier_normal_(nnet.weight.data)
    nn.init.constant_(nnet.bias, 0.0)
    return nnet
 class ComplexConv2d(nn.Module):
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: Tuple[int, int] = (1, 1),
        stride: Tuple[int, int] = (1, 1),
        padding: Tuple[int, int] = (0, 0),
        groups: int = 1,
        dilation: int = 1,
    ):
        """
        Complex Conv2d (non-causal)
        """
        super().__init__()
        self.in_channels = in_channels // 2
        self.out_channels = out_channels // 2
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.groups = groups
        self.dilation = dilation
        self.real_conv = nn.Conv2d(
            self.in_channels,
            self.out_channels,
            kernel_size=self.kernel_size,
            stride=self.stride,
            padding=(self.padding[0], 0),
            groups=self.groups,
            dilation=self.dilation,
        )
        self.imag_conv = nn.Conv2d(
            self.in_channels,
            self.out_channels,
            kernel_size=self.kernel_size,
            stride=self.stride,
            padding=(self.padding[0], 0),
            groups=self.groups,
            dilation=self.dilation,
        )
        self.imag_conv = init_weights(self.imag_conv)
        self.real_conv = init_weights(self.real_conv)
    def forward(self, input):
        """
        the complex (real/imag) axis must always be dim 1
        """
        input = F.pad(input, [self.padding[1], 0, 0, 0])
        real, imag = torch.chunk(input, 2, 1)
        real_real = self.real_conv(real)
        real_imag = self.imag_conv(real)
        imag_imag = self.imag_conv(imag)
        imag_real = self.real_conv(imag)
        real = real_real - imag_imag
        imag = real_imag + imag_real
        out = torch.cat([real, imag], 1)
        return out
 class ComplexConvTranspose2d(nn.Module):
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: Tuple[int, int] = (1, 1),
        stride: Tuple[int, int] = (1, 1),
        padding: Tuple[int, int] = (0, 0),
        output_padding: Tuple[int, int] = (0, 0),
        groups: int = 1,
    ):
        super().__init__()
        self.in_channels = in_channels // 2
        self.out_channels = out_channels // 2
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.groups = groups
        self.output_padding = output_padding
        self.real_conv = nn.ConvTranspose2d(
            self.in_channels,
            self.out_channels,
            kernel_size=self.kernel_size,
            stride=self.stride,
            padding=self.padding,
            output_padding=self.output_padding,
            groups=self.groups,
        )
        self.imag_conv = nn.ConvTranspose2d(
            self.in_channels,
            self.out_channels,
            kernel_size=self.kernel_size,
            stride=self.stride,
            padding=self.padding,
            output_padding=self.output_padding,
            groups=self.groups,
        )
        self.real_conv = init_weights(self.real_conv)
        self.imag_conv = init_weights(self.imag_conv)
    def forward(self, input):
        real, imag = torch.chunk(input, 2, 1)
        real_real = self.real_conv(real)
        real_imag = self.imag_conv(real)
        imag_imag = self.imag_conv(imag)
        imag_real = self.real_conv(imag)
        real = real_real - imag_imag
        imag = real_imag + imag_real
        out = torch.cat([real, imag], 1)
        return out
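Review note on the complex layers: the four-convolution pattern in `conv.py` is the real-valued expansion of a complex product, (a + ib)(c + id) = (ac - bd) + i(ad + bc). The scalar sketch below checks that expansion against Python's built-in complex type. The function name `complex_mul_parts` is illustrative only; in the layers, each product is replaced by a convolution with the real or imaginary kernel.

```python
def complex_mul_parts(x_real, x_imag, w_real, w_imag):
    """Multiply (x_real + i*x_imag) by (w_real + i*w_imag) using real arithmetic only."""
    real_real = x_real * w_real  # plays the role of real_conv(real)
    imag_imag = x_imag * w_imag  # plays the role of imag_conv(imag)
    real_imag = x_real * w_imag  # plays the role of imag_conv(real)
    imag_real = x_imag * w_real  # plays the role of real_conv(imag)
    return real_real - imag_imag, real_imag + imag_real


if __name__ == "__main__":
    real, imag = complex_mul_parts(1.5, -2.0, 0.25, 3.0)
    expected = complex(1.5, -2.0) * complex(0.25, 3.0)
    print((real, imag), expected)
```

The imaginary part is a sum of the two cross terms; only the real part carries a minus sign.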
--- a/mayavoz/models/complexnn/rnn.py
+++ b/mayavoz/models/complexnn/rnn.py
@ -0,0 +1,68 @@
 from typing import List, Optional
 import torch
 from torch import nn
 class ComplexLSTM(nn.Module):
    def __init__(
        self,
        input_size: int,
        hidden_size: int,
        num_layers: int = 1,
        projection_size: Optional[int] = None,
        bidirectional: bool = False,
    ):
        super().__init__()
        self.input_size = input_size // 2
        self.hidden_size = hidden_size // 2
        self.num_layers = num_layers
        self.real_lstm = nn.LSTM(
            self.input_size,
            self.hidden_size,
            self.num_layers,
            bidirectional=bidirectional,
            batch_first=False,
        )
        self.imag_lstm = nn.LSTM(
            self.input_size,
            self.hidden_size,
            self.num_layers,
            bidirectional=bidirectional,
            batch_first=False,
        )
        bidirectional = 2 if bidirectional else 1
        if projection_size is not None:
            self.projection_size = projection_size // 2
            self.real_linear = nn.Linear(
                self.hidden_size * bidirectional, self.projection_size
            )
            self.imag_linear = nn.Linear(
                self.hidden_size * bidirectional, self.projection_size
            )
        else:
            self.projection_size = None
    def forward(self, input):
        if isinstance(input, List):
            real, imag = input
        else:
            real, imag = torch.chunk(input, 2, 1)
        real_real = self.real_lstm(real)[0]
        real_imag = self.imag_lstm(real)[0]
        imag_imag = self.imag_lstm(imag)[0]
        imag_real = self.real_lstm(imag)[0]
        real = real_real - imag_imag
        imag = imag_real + real_imag
        if self.projection_size is not None:
            real = self.real_linear(real)
            imag = self.imag_linear(imag)
        return [real, imag]
--- a/mayavoz/models/complexnn/utils.py
+++ b/mayavoz/models/complexnn/utils.py
@ -0,0 +1,199 @@
 import torch
 from torch import nn
 class ComplexBatchNorm2D(nn.Module):
    def __init__(
        self,
        num_features: int,
        eps: float = 1e-5,
        momentum: float = 0.1,
        affine: bool = True,
        track_running_stats: bool = True,
    ):
        """
        Complex batch normalization 2D
        https://arxiv.org/abs/1705.09792
        """
        super().__init__()
        self.num_features = num_features // 2
        self.affine = affine
        self.momentum = momentum
        self.track_running_stats = track_running_stats
        self.eps = eps
        if self.affine:
            self.Wrr = nn.parameter.Parameter(torch.Tensor(self.num_features))
            self.Wri = nn.parameter.Parameter(torch.Tensor(self.num_features))
            self.Wii = nn.parameter.Parameter(torch.Tensor(self.num_features))
            self.Br = nn.parameter.Parameter(torch.Tensor(self.num_features))
            self.Bi = nn.parameter.Parameter(torch.Tensor(self.num_features))
        else:
            self.register_parameter("Wrr", None)
            self.register_parameter("Wri", None)
            self.register_parameter("Wii", None)
            self.register_parameter("Br", None)
            self.register_parameter("Bi", None)
        if self.track_running_stats:
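            # Each statistic gets its own tensor; registering one shared tensor
            # under several names would alias their in-place updates.
            self.register_buffer("Mean_real", torch.zeros(self.num_features))
            self.register_buffer("Mean_imag", torch.zeros(self.num_features))
            self.register_buffer("Var_rr", torch.ones(self.num_features))
            self.register_buffer("Var_ri", torch.zeros(self.num_features))
            self.register_buffer("Var_ii", torch.ones(self.num_features))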
            self.register_buffer(
                "num_batches_tracked", torch.tensor(0, dtype=torch.long)
            )
        else:
            self.register_parameter("Mean_real", None)
            self.register_parameter("Mean_imag", None)
            self.register_parameter("Var_rr", None)
            self.register_parameter("Var_ri", None)
            self.register_parameter("Var_ii", None)
            self.register_parameter("num_batches_tracked", None)
        self.reset_parameters()
    def reset_parameters(self):
        if self.affine:
            self.Wrr.data.fill_(1)
            self.Wii.data.fill_(1)
            self.Wri.data.uniform_(-0.9, 0.9)
            self.Br.data.fill_(0)
            self.Bi.data.fill_(0)
        self.reset_running_stats()
    def reset_running_stats(self):
        if self.track_running_stats:
            self.Mean_real.zero_()
            self.Mean_imag.zero_()
            self.Var_rr.fill_(1)
            self.Var_ri.zero_()
            self.Var_ii.fill_(1)
            self.num_batches_tracked.zero_()
    def extra_repr(self):
        return "{num_features}, eps={eps}, momentum={momentum}, affine={affine}, track_running_stats={track_running_stats}".format(
            **self.__dict__
        )
    def forward(self, input):
        real, imag = torch.chunk(input, 2, 1)
        exp_avg_factor = 0.0
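        if self.training and self.track_running_stats:
            self.num_batches_tracked += 1
            if self.momentum is None:
                exp_avg_factor = 1 / self.num_batches_tracked
            else:
                exp_avg_factor = self.momentum
        # Use batch statistics while training, and always when no running
        # statistics are kept (the running buffers are None in that case).
        training = self.training or not self.track_running_stats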
        redux = [i for i in reversed(range(real.dim())) if i != 1]
        vdim = [1] * real.dim()
        vdim[1] = real.size(1)
        if training:
            batch_mean_real, batch_mean_imag = real, imag
            for dim in redux:
                batch_mean_real = batch_mean_real.mean(dim, keepdim=True)
                batch_mean_imag = batch_mean_imag.mean(dim, keepdim=True)
            if self.track_running_stats:
                self.Mean_real.lerp_(batch_mean_real.squeeze(), exp_avg_factor)
                self.Mean_imag.lerp_(batch_mean_imag.squeeze(), exp_avg_factor)
        else:
            batch_mean_real = self.Mean_real.view(vdim)
            batch_mean_imag = self.Mean_imag.view(vdim)
        real = real - batch_mean_real
        imag = imag - batch_mean_imag
        if training:
            batch_var_rr = real * real
            batch_var_ri = real * imag
            batch_var_ii = imag * imag
            for dim in redux:
                batch_var_rr = batch_var_rr.mean(dim, keepdim=True)
                batch_var_ri = batch_var_ri.mean(dim, keepdim=True)
                batch_var_ii = batch_var_ii.mean(dim, keepdim=True)
            if self.track_running_stats:
                self.Var_rr.lerp_(batch_var_rr.squeeze(), exp_avg_factor)
                self.Var_ri.lerp_(batch_var_ri.squeeze(), exp_avg_factor)
                self.Var_ii.lerp_(batch_var_ii.squeeze(), exp_avg_factor)
        else:
            batch_var_rr = self.Var_rr.view(vdim)
            batch_var_ii = self.Var_ii.view(vdim)
            batch_var_ri = self.Var_ri.view(vdim)
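        # Out-of-place add: in eval mode these are views of the running
        # buffers, and an in-place += would modify the stored statistics.
        batch_var_rr = batch_var_rr + self.eps
        batch_var_ii = batch_var_ii + self.eps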
        # Covariance matrix
        # | batch_var_rr    batch_var_ri |
        # | batch_var_ir    batch_var_ii |  here batch_var_ir == batch_var_ri
        # Inverse square root of cov matrix by combining below two formulas
        # https://en.wikipedia.org/wiki/Square_root_of_a_2_by_2_matrix
        # https://mathworld.wolfram.com/MatrixInverse.html
        tau = batch_var_rr + batch_var_ii
        # s is the square root of the determinant, not the determinant itself
        delta = batch_var_rr * batch_var_ii - batch_var_ri * batch_var_ri
        s = delta.sqrt()
        t = (tau + 2 * s).sqrt()
        rst = (s * t).reciprocal()
        Urr = (batch_var_ii + s) * rst
        Uri = -batch_var_ri * rst
        Uii = (batch_var_rr + s) * rst
        if self.affine:
            Wrr, Wri, Wii = (
                self.Wrr.view(vdim),
                self.Wri.view(vdim),
                self.Wii.view(vdim),
            )
            Zrr = (Wrr * Urr) + (Wri * Uri)
            Zri = (Wrr * Uri) + (Wri * Uii)
            Zir = (Wii * Uri) + (Wri * Urr)
            Zii = (Wri * Uri) + (Wii * Uii)
        else:
            Zrr, Zri, Zir, Zii = Urr, Uri, Uri, Uii
        yr = (Zrr * real) + (Zri * imag)
        yi = (Zir * real) + (Zii * imag)
        if self.affine:
            yr = yr + self.Br.view(vdim)
            yi = yi + self.Bi.view(vdim)
        outputs = torch.cat([yr, yi], 1)
        return outputs
 class ComplexRelu(nn.Module):
    def __init__(self):
        super().__init__()
        self.real_relu = nn.PReLU()
        self.imag_relu = nn.PReLU()
    def forward(self, input):
        real, imag = torch.chunk(input, 2, 1)
        real = self.real_relu(real)
        imag = self.imag_relu(imag)
        return torch.cat([real, imag], dim=1)
 def complex_cat(inputs, axis=1):
    real, imag = [], []
    for data in inputs:
        real_data, imag_data = torch.chunk(data, 2, axis)
        real.append(real_data)
        imag.append(imag_data)
    real = torch.cat(real, axis)
    imag = torch.cat(imag, axis)
    return torch.cat([real, imag], axis)
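Review note on `ComplexBatchNorm2D`: a scalar sketch of the closed-form inverse square root of a symmetric positive-definite 2x2 matrix, following the two formulas linked in the code comments. Here `s` is the square root of the determinant and `t = sqrt(trace + 2s)`. The helper names are illustrative and not part of the package.

```python
import math


def inv_sqrt_2x2(vrr: float, vri: float, vii: float):
    """Return (urr, uri, uii) of U = V^(-1/2) for V = [[vrr, vri], [vri, vii]]."""
    tau = vrr + vii                       # trace of V
    s = math.sqrt(vrr * vii - vri * vri)  # square root of the determinant
    t = math.sqrt(tau + 2.0 * s)
    rst = 1.0 / (s * t)
    return (vii + s) * rst, -vri * rst, (vrr + s) * rst


def whitening_residual(vrr: float, vri: float, vii: float):
    """Entries of U @ U @ V, which should form the identity matrix."""
    urr, uri, uii = inv_sqrt_2x2(vrr, vri, vii)
    # U @ U is symmetric: [[a, b], [b, d]]
    a = urr * urr + uri * uri
    b = urr * uri + uri * uii
    d = uri * uri + uii * uii
    return (
        a * vrr + b * vri,
        a * vri + b * vii,
        b * vrr + d * vri,
        b * vri + d * vii,
    )


if __name__ == "__main__":
    print(whitening_residual(2.0, 0.5, 1.0))
```

Multiplying the centered (real, imag) pair by `U` whitens it, so the result has approximately unit variance and no real/imaginary correlation before the affine weights are applied.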
--- a/mayavoz/models/dccrn.py
+++ b/mayavoz/models/dccrn.py
@ -0,0 +1,338 @@
 import warnings
 from typing import Any, List, Optional, Tuple, Union
 import torch
 import torch.nn.functional as F
 from torch import nn
 from mayavoz.data import MayaDataset
 from mayavoz.models import Mayamodel
 from mayavoz.models.complexnn import (
    ComplexBatchNorm2D,
    ComplexConv2d,
    ComplexConvTranspose2d,
    ComplexLSTM,
    ComplexRelu,
 )
 from mayavoz.models.complexnn.utils import complex_cat
 from mayavoz.utils.transforms import ConviSTFT, ConvSTFT
 from mayavoz.utils.utils import merge_dict
 class DCCRN_ENCODER(nn.Module):
    def __init__(
        self,
        in_channels: int,
        out_channel: int,
        kernel_size: Tuple[int, int],
        complex_norm: bool = True,
        complex_relu: bool = True,
        stride: Tuple[int, int] = (2, 1),
        padding: Tuple[int, int] = (2, 1),
    ):
        super().__init__()
        batchnorm = ComplexBatchNorm2D if complex_norm else nn.BatchNorm2d
        activation = ComplexRelu() if complex_relu else nn.PReLU()
        self.encoder = nn.Sequential(
            ComplexConv2d(
                in_channels,
                out_channel,
                kernel_size=kernel_size,
                stride=stride,
                padding=padding,
            ),
            batchnorm(out_channel),
            activation,
        )
    def forward(self, waveform):
        return self.encoder(waveform)
 class DCCRN_DECODER(nn.Module):
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: Tuple[int, int],
        layer: int = 0,
        complex_norm: bool = True,
        complex_relu: bool = True,
        stride: Tuple[int, int] = (2, 1),
        padding: Tuple[int, int] = (2, 0),
        output_padding: Tuple[int, int] = (1, 0),
    ):
        super().__init__()
        batchnorm = ComplexBatchNorm2D if complex_norm else nn.BatchNorm2d
        activation = ComplexRelu() if complex_relu else nn.PReLU()
        if layer != 0:
            self.decoder = nn.Sequential(
                ComplexConvTranspose2d(
                    in_channels,
                    out_channels,
                    kernel_size=kernel_size,
                    stride=stride,
                    padding=padding,
                    output_padding=output_padding,
                ),
                batchnorm(out_channels),
                activation,
            )
        else:
            self.decoder = nn.Sequential(
                ComplexConvTranspose2d(
                    in_channels,
                    out_channels,
                    kernel_size=kernel_size,
                    stride=stride,
                    padding=padding,
                    output_padding=output_padding,
                )
            )
    def forward(self, waveform):
        return self.decoder(waveform)
 class DCCRN(Mayamodel):
    STFT_DEFAULTS = {
        "window_len": 400,
        "hop_size": 100,
        "nfft": 512,
        "window": "hamming",
    }
    ED_DEFAULTS = {
        "initial_output_channels": 32,
        "depth": 6,
        "kernel_size": 5,
        "growth_factor": 2,
        "stride": 2,
        "padding": 2,
        "output_padding": 1,
    }
    LSTM_DEFAULTS = {
        "num_layers": 2,
        "hidden_size": 256,
    }
    def __init__(
        self,
        stft: Optional[dict] = None,
        encoder_decoder: Optional[dict] = None,
        lstm: Optional[dict] = None,
        complex_lstm: bool = True,
        complex_norm: bool = True,
        complex_relu: bool = True,
        masking_mode: str = "E",
        num_channels: int = 1,
        sampling_rate=16000,
        lr: float = 1e-3,
        dataset: Optional[MayaDataset] = None,
        duration: Optional[float] = None,
        loss: Union[str, List, Any] = "mse",
        metric: Union[str, List] = "mse",
    ):
        duration = (
            dataset.duration if isinstance(dataset, MayaDataset) else duration
        )
        if dataset is not None:
            if sampling_rate != dataset.sampling_rate:
                warnings.warn(
                    f"model sampling rate {sampling_rate} should match dataset sampling rate {dataset.sampling_rate}"
                )
                sampling_rate = dataset.sampling_rate
        super().__init__(
            num_channels=num_channels,
            sampling_rate=sampling_rate,
            lr=lr,
            dataset=dataset,
            duration=duration,
            loss=loss,
            metric=metric,
        )
        encoder_decoder = merge_dict(self.ED_DEFAULTS, encoder_decoder)
        lstm = merge_dict(self.LSTM_DEFAULTS, lstm)
        stft = merge_dict(self.STFT_DEFAULTS, stft)
        self.save_hyperparameters(
            "encoder_decoder",
            "lstm",
            "stft",
            "complex_lstm",
            "complex_norm",
            "masking_mode",
        )
        self.complex_lstm = complex_lstm
        self.complex_norm = complex_norm
        self.masking_mode = masking_mode
        self.stft = ConvSTFT(
            stft["window_len"], stft["hop_size"], stft["nfft"], stft["window"]
        )
        self.istft = ConviSTFT(
            stft["window_len"], stft["hop_size"], stft["nfft"], stft["window"]
        )
        self.encoder = nn.ModuleList()
        self.decoder = nn.ModuleList()
        num_channels *= 2
        hidden_size = encoder_decoder["initial_output_channels"]
        growth_factor = encoder_decoder["growth_factor"]
        for layer in range(encoder_decoder["depth"]):
            encoder_ = DCCRN_ENCODER(
                num_channels,
                hidden_size,
                kernel_size=(encoder_decoder["kernel_size"], 2),
                stride=(encoder_decoder["stride"], 1),
                padding=(encoder_decoder["padding"], 1),
                complex_norm=complex_norm,
                complex_relu=complex_relu,
            )
            self.encoder.append(encoder_)
            decoder_ = DCCRN_DECODER(
                hidden_size + hidden_size,
                num_channels,
                layer=layer,
                kernel_size=(encoder_decoder["kernel_size"], 2),
                stride=(encoder_decoder["stride"], 1),
                padding=(encoder_decoder["padding"], 0),
                output_padding=(encoder_decoder["output_padding"], 0),
                complex_norm=complex_norm,
                complex_relu=complex_relu,
            )
            self.decoder.insert(0, decoder_)
            if layer < encoder_decoder["depth"] - 3:
                num_channels = hidden_size
                hidden_size *= growth_factor
            else:
                num_channels = hidden_size
        kernel_size = hidden_size / 2
        hidden_size = stft["nfft"] / 2 ** (encoder_decoder["depth"])
        if self.complex_lstm:
            lstms = []
            for layer in range(lstm["num_layers"]):
                if layer == 0:
                    input_size = int(hidden_size * kernel_size)
                else:
                    input_size = lstm["hidden_size"]
                if layer == lstm["num_layers"] - 1:
                    projection_size = int(hidden_size * kernel_size)
                else:
                    projection_size = None
                kwargs = {
                    "input_size": input_size,
                    "hidden_size": lstm["hidden_size"],
                    "num_layers": 1,
                }
                lstms.append(
                    ComplexLSTM(projection_size=projection_size, **kwargs)
                )
            self.lstm = nn.Sequential(*lstms)
        else:
            # Real-valued fallback: nn.LSTM returns (output, state), so the
            # projection is kept as a separate layer rather than nn.Sequential.
            self.lstm = nn.LSTM(
                input_size=int(hidden_size * kernel_size),
                hidden_size=lstm["hidden_size"],
                num_layers=lstm["num_layers"],
                dropout=0.0,
                batch_first=False,
            )
            self.lstm_projection = nn.Linear(
                lstm["hidden_size"], int(hidden_size * kernel_size)
            )
    def forward(self, waveform):
        if waveform.dim() == 2:
            waveform = waveform.unsqueeze(1)
        if waveform.size(1) != self.hparams.num_channels:
            raise ValueError(
                f"Number of input channels initialized is {self.hparams.num_channels} but got {waveform.size(1)} channels"
            )
        waveform_stft = self.stft(waveform)
        real = waveform_stft[:, : self.stft.nfft // 2 + 1]
        imag = waveform_stft[:, self.stft.nfft // 2 + 1 :]
        mag_spec = torch.sqrt(real**2 + imag**2 + 1e-9)
        phase_spec = torch.atan2(imag, real)
        complex_spec = torch.stack([mag_spec, phase_spec], 1)[:, :, 1:]
        encoder_outputs = []
        out = complex_spec
        for _, encoder in enumerate(self.encoder):
            out = encoder(out)
            encoder_outputs.append(out)
        B, C, D, T = out.size()
        out = out.permute(3, 0, 1, 2)
        if self.complex_lstm:
            lstm_real = out[:, :, : C // 2]
            lstm_imag = out[:, :, C // 2 :]
            lstm_real = lstm_real.reshape(T, B, C // 2 * D)
            lstm_imag = lstm_imag.reshape(T, B, C // 2 * D)
            lstm_real, lstm_imag = self.lstm([lstm_real, lstm_imag])
            lstm_real = lstm_real.reshape(T, B, C // 2, D)
            lstm_imag = lstm_imag.reshape(T, B, C // 2, D)
            out = torch.cat([lstm_real, lstm_imag], 2)
        else:
            out = out.reshape(T, B, C * D)
            out, _ = self.lstm(out)
            out = self.lstm_projection(out)
            # Restore (T, B, C, D) so the permute below yields (B, C, D, T).
            out = out.reshape(T, B, C, D)
        out = out.permute(1, 2, 3, 0)
        for layer, decoder in enumerate(self.decoder):
            skip_connection = encoder_outputs.pop(-1)
            out = complex_cat([skip_connection, out])
            out = decoder(out)
            out = out[..., 1:]
        mask_real, mask_imag = out[:, 0], out[:, 1]
        mask_real = F.pad(mask_real, [0, 0, 1, 0])
        mask_imag = F.pad(mask_imag, [0, 0, 1, 0])
        if self.masking_mode == "E":
            mask_mag = torch.sqrt(mask_real**2 + mask_imag**2)
            real_phase = mask_real / (mask_mag + 1e-8)
            imag_phase = mask_imag / (mask_mag + 1e-8)
            mask_phase = torch.atan2(imag_phase, real_phase)
            mask_mag = torch.tanh(mask_mag)
            est_mag = mask_mag * mag_spec
            # multiplying complex numbers in polar form adds their phases
            est_phase = phase_spec + mask_phase
            # mag * (cos(theta) + i sin(theta))
            real = est_mag * torch.cos(est_phase)
            imag = est_mag * torch.sin(est_phase)
        elif self.masking_mode == "C":
            # evaluate both products before rebinding real
            real, imag = (
                real * mask_real - imag * mask_imag,
                real * mask_imag + imag * mask_real,
            )
        else:
            real = real * mask_real
            imag = imag * mask_imag
        spec = torch.cat([real, imag], 1)
        wav = self.istft(spec)
        wav = wav.clamp_(-1, 1)
        return wav
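Note on `masking_mode`: the three modes correspond to the masking variants usually described for DCCRN. The scalar sketch below shows them on a single time-frequency bin using only the standard library. It assumes the polar form for mode "E" (magnitudes multiply, phases add, with a tanh-bounded mask magnitude); `apply_mask` is a hypothetical helper for illustration, not a mayavoz function.

```python
import cmath
import math


def apply_mask(spec: complex, mask: complex, mode: str = "E") -> complex:
    """Apply a complex ratio mask to one spectrogram bin."""
    if mode == "E":
        # Polar form: bounded mask magnitude scales |spec|, phases add.
        mag = math.tanh(abs(mask)) * abs(spec)
        phase = cmath.phase(spec) + cmath.phase(mask)
        return cmath.rect(mag, phase)
    if mode == "C":
        # Rectangular form: ordinary complex multiplication.
        return spec * mask
    # Any other mode: real and imaginary parts are masked independently.
    return complex(spec.real * mask.real, spec.imag * mask.imag)


spec = complex(3.0, 4.0)
mask = complex(0.0, 1.0)
print(apply_mask(spec, mask, "C"))  # (-4+3j)
```

With a mask of magnitude 1, mode "E" gives the mode "C" result scaled by `tanh(1)`, since only the magnitude handling differs between the two.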
--- a/enhancer/models/demucs.py
+++ b/enhancer/models/demucs.py
@ -1,14 +1,14 @@
 import logging
 import math
 import warnings
 from typing import List, Optional, Union
 import torch.nn.functional as F
 from torch import nn
-from enhancer.data.dataset import EnhancerDataset
+from mayavoz.data.dataset import MayaDataset
-from enhancer.models.model import Model
+from mayavoz.models.model import Mayamodel
-from enhancer.utils.io import Audio as audio
+from mayavoz.utils.io import Audio as audio
-from enhancer.utils.utils import merge_dict
+from mayavoz.utils.utils import merge_dict
 class DemucsLSTM(nn.Module):
@ -88,7 +88,7 @@ class DemucsDecoder(nn.Module):
        return out
-class Demucs(Model):
+class Demucs(Mayamodel):
    """
    Demucs model from https://arxiv.org/pdf/1911.13254.pdf
    parameters:
@ -102,8 +102,8 @@ class Demucs(Model):
            sampling rate of input audio
        lr : float, defaults to 1e-3
            learning rate used for training
-        dataset: EnhancerDataset, optional
+        dataset: MayaDataset, optional
-            EnhancerDataset object containing train/validation data for training
+            MayaDataset object containing train/validation data for training
        duration : float, optional
            chunk duration in seconds
        loss : string or List of strings
@ -133,17 +133,20 @@ class Demucs(Model):
        num_channels: int = 1,
        resample: int = 4,
        sampling_rate=16000,
        normalize=True,
        lr: float = 1e-3,
-        dataset: Optional[EnhancerDataset] = None,
+        dataset: Optional[MayaDataset] = None,
        duration: Optional[float] = None,
        loss: Union[str, List] = "mse",
        metric: Union[str, List] = "mse",
        floor=1e-3,
    ):
        duration = (
-            dataset.duration if isinstance(dataset, EnhancerDataset) else None
+            dataset.duration if isinstance(dataset, MayaDataset) else duration
        )
        if dataset is not None:
            if sampling_rate != dataset.sampling_rate:
-                logging.warning(
+                warnings.warn(
                    f"model sampling rate {sampling_rate} should match dataset sampling rate {dataset.sampling_rate}"
                )
                sampling_rate = dataset.sampling_rate
@ -161,6 +164,8 @@ class Demucs(Model):
        lstm = merge_dict(self.LSTM_DEFAULTS, lstm)
        self.save_hyperparameters("encoder_decoder", "lstm", "resample")
        hidden = encoder_decoder["initial_output_channels"]
        self.normalize = normalize
        self.floor = floor
        self.encoder = nn.ModuleList()
        self.decoder = nn.ModuleList()
@ -200,11 +205,16 @@ class Demucs(Model):
        if waveform.dim() == 2:
            waveform = waveform.unsqueeze(1)
-        if waveform.size(1) != 1:
+        if waveform.size(1) != self.hparams.num_channels:
-            raise TypeError(
+            raise ValueError(
-                f"Demucs can only process mono channel audio, input has {waveform.size(1)} channels"
+                f"Number of input channels initialized is {self.hparams.num_channels} but got {waveform.size(1)} channels"
            )
-
+        if self.normalize:
            waveform = waveform.mean(dim=1, keepdim=True)
            std = waveform.std(dim=-1, keepdim=True)
            waveform = waveform / (self.floor + std)
        else:
            std = 1
        length = waveform.shape[-1]
        x = F.pad(waveform, (0, self.get_padding_length(length) - length))
        if self.hparams.resample > 1:
@ -237,7 +247,7 @@ class Demucs(Model):
            )
        out = x[..., :length]
-        return out
+        return std * out
    def get_padding_length(self, input_length):
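Note on the `normalize` flag: the input is divided by `floor + std` before the network runs, and the output is multiplied by `std` afterwards. A small sketch of that round trip on a plain list follows, assuming an identity function in place of the network; `normalize` and `enhance` are hypothetical names used only for this illustration.

```python
import statistics


def normalize(waveform, floor=1e-3):
    """Scale a waveform by its standard deviation, guarded by a floor."""
    std = statistics.stdev(waveform)  # sample std, like torch.Tensor.std by default
    return [x / (floor + std) for x in waveform], std


def enhance(waveform, model, floor=1e-3):
    """Normalize, run the model, then restore the original scale."""
    scaled, std = normalize(waveform, floor)
    return [std * y for y in model(scaled)]


identity = lambda samples: samples  # stand-in for the network
noisy = [0.0, 0.5, -0.5, 1.0]
restored = enhance(noisy, identity)
```

Because the division uses `floor + std` and the multiplication uses `std`, the round trip is not exact: each sample comes back scaled by `std / (floor + std)`, which is close to 1 unless the signal is nearly silent.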
--- a/enhancer/models/model.py
+++ b/enhancer/models/model.py
@ -2,7 +2,7 @@ import os
 from collections import defaultdict
 from importlib import import_module
 from pathlib import Path
-from typing import List, Optional, Text, Union
+from typing import Any, List, Optional, Text, Union
 from urllib.parse import urlparse
 import numpy as np
@ -10,19 +10,24 @@ import pytorch_lightning as pl
 import torch
 from huggingface_hub import cached_download, hf_hub_url
 from pytorch_lightning.utilities.cloud_io import load as pl_load
 from torch import nn
 from torch.optim import Adam
-from enhancer.data.dataset import EnhancerDataset
+from mayavoz.data.dataset import MayaDataset
-from enhancer.inference import Inference
+from mayavoz.inference import Inference
-from enhancer.loss import LOSS_MAP, LossWrapper
+from mayavoz.loss import LOSS_MAP, LossWrapper
-from enhancer.version import __version__
+from mayavoz.version import __version__
-CACHE_DIR = ""
-HF_TORCH_WEIGHTS = ""
+CACHE_DIR = os.getenv(
+    "ENHANCER_CACHE",
+    os.path.expanduser("~/.cache/torch/mayavoz"),
+)
+HF_TORCH_WEIGHTS = "pytorch_model.ckpt"
 DEFAULT_DEVICE = "cpu"
 SAVE_NAME = "mayavoz"
-class Model(pl.LightningModule):
+class Mayamodel(pl.LightningModule):
    """
    Base class for all models
    parameters:
@ -32,11 +37,11 @@ class Model(pl.LightningModule):
            audio sampling rate
        lr: float, optional
            learning rate for model training
-        dataset: EnhancerDataset, optional
+        dataset: MayaDataset, optional
-            Enhancer dataset used for training/validation
+            mayavoz dataset used for training/validation
        duration: float, optional
            duration used for training/inference
-        loss : string or List of strings, default to "mse"
+        loss : string or List of strings or custom loss (nn.Module), default to "mse"
            loss functions to be used. Available ("mse","mae","Si-SDR")
    """
@ -46,15 +51,13 @@ class Model(pl.LightningModule):
        num_channels: int = 1,
        sampling_rate: int = 16000,
        lr: float = 1e-3,
-        dataset: Optional[EnhancerDataset] = None,
+        dataset: Optional[MayaDataset] = None,
        duration: Optional[float] = None,
        loss: Union[str, List] = "mse",
-        metric: Union[str, List] = "mse",
+        metric: Union[str, List, Any] = "mse",
    ):
        super().__init__()
-        assert (
-            num_channels == 1
-        ), "Enhancer only support for mono channel models"
+        assert num_channels == 1, "mayavoz only support for mono channel models"
        self.dataset = dataset
        self.save_hyperparameters(
            "num_channels", "sampling_rate", "lr", "loss", "metric", "duration"
@ -86,10 +89,11 @@ class Model(pl.LightningModule):
    @metric.setter
    def metric(self, metric):
        self._metric = []
-        if isinstance(metric, str):
+        if isinstance(metric, (str, nn.Module)):
            metric = [metric]
        for func in metric:
            if isinstance(func, str):
                if func in LOSS_MAP.keys():
                    if func in ("pesq", "stoi"):
                        self._metric.append(
@ -97,9 +101,13 @@ class Model(pl.LightningModule):
                        )
                    else:
                        self._metric.append(LOSS_MAP[func]())
                else:
-                raise ValueError(f"Invalid metrics {func}")
+                    raise ValueError(f"Invalid metrics {func}")
            elif isinstance(func, nn.Module):
                self._metric.append(func)
            else:
                raise ValueError("Invalid metrics")
    @property
    def dataset(self):
@ -113,22 +121,29 @@ class Model(pl.LightningModule):
        if stage == "fit":
            torch.cuda.empty_cache()
            self.dataset.setup(stage)
            self.dataset.model = self
            print(
                "Total train duration",
-                self.dataset.train_dataloader().dataset.__len__() / 60,
+                self.dataset.train_dataloader().dataset.__len__()
                * self.dataset.duration
                / 60,
                "minutes",
            )
            print(
                "Total validation duration",
-                self.dataset.val_dataloader().dataset.__len__() / 60,
+                self.dataset.val_dataloader().dataset.__len__()
                * self.dataset.duration
                / 60,
                "minutes",
            )
            print(
                "Total test duration",
-                self.dataset.test_dataloader().dataset.__len__() / 60,
+                self.dataset.test_dataloader().dataset.__len__()
                * self.dataset.duration
                / 60,
                "minutes",
            )
            self.dataset.model = self
    def train_dataloader(self):
        return self.dataset.train_dataloader()
@ -219,8 +234,8 @@ class Model(pl.LightningModule):
    def on_save_checkpoint(self, checkpoint):
-        checkpoint["enhancer"] = {
+        checkpoint[SAVE_NAME] = {
-            "version": {"enhancer": __version__, "pytorch": torch.__version__},
+            "version": {SAVE_NAME: __version__, "pytorch": torch.__version__},
            "architecture": {
                "module": self.__class__.__module__,
                "class": self.__class__.__name__,
@ -273,8 +288,8 @@ class Model(pl.LightningModule):
        Returns
        -------
-        model : Model
+        model : Mayamodel
-            Model
+            Mayamodel
        See also
        --------
@ -303,7 +318,7 @@ class Model(pl.LightningModule):
            )
            model_path_pl = cached_download(
                url=url,
-                library_name="enhancer",
+                library_name="mayavoz",
                library_version=__version__,
                cache_dir=cached_dir,
                use_auth_token=use_auth_token,
@ -313,8 +328,8 @@ class Model(pl.LightningModule):
            map_location = torch.device(DEFAULT_DEVICE)
        loaded_checkpoint = pl_load(model_path_pl, map_location)
-        module_name = loaded_checkpoint["enhancer"]["architecture"]["module"]
+        module_name = loaded_checkpoint[SAVE_NAME]["architecture"]["module"]
-        class_name = loaded_checkpoint["enhancer"]["architecture"]["class"]
+        class_name = loaded_checkpoint[SAVE_NAME]["architecture"]["class"]
        module = import_module(module_name)
        Klass = getattr(module, class_name)
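Note on the `metric` setter in this file: it accepts a name, a custom module, or a list mixing both, and should reject anything else. The sketch below mirrors that resolution logic with plain callables so it runs without torch. `METRIC_MAP` is a hypothetical stand-in for `LOSS_MAP` (which holds `nn.Module` classes in mayavoz), and `resolve_metrics` is an illustrative name.

```python
from typing import Callable, Dict, List, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical stand-in for mayavoz.loss.LOSS_MAP.
METRIC_MAP: Dict[str, Metric] = {"mse": mse, "mae": mae}


def resolve_metrics(metric) -> List[Metric]:
    # A single name or a single callable is wrapped in a list first.
    if isinstance(metric, str) or callable(metric):
        metric = [metric]
    resolved = []
    for func in metric:
        if isinstance(func, str):
            if func not in METRIC_MAP:
                # Unknown names must raise, not fail silently.
                raise ValueError(f"Invalid metrics {func}")
            resolved.append(METRIC_MAP[func])
        elif callable(func):
            resolved.append(func)
        else:
            raise ValueError("Invalid metrics")
    return resolved


metrics = resolve_metrics(["mse", mae])
print([m([1.0, 2.0], [1.0, 4.0]) for m in metrics])  # [2.0, 1.0]
```

Raising on an unknown name matters here: constructing the exception without `raise` would leave the metric list silently shorter than requested.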
--- a/enhancer/models/waveunet.py
+++ b/enhancer/models/waveunet.py
@ -1,12 +1,12 @@
-import logging
+import warnings
 from typing import List, Optional, Union
 import torch
 import torch.nn as nn
 import torch.nn.functional as F
-from enhancer.data.dataset import EnhancerDataset
+from mayavoz.data.dataset import MayaDataset
-from enhancer.models.model import Model
+from mayavoz.models.model import Mayamodel
 class WavenetDecoder(nn.Module):
@ -66,7 +66,7 @@ class WavenetEncoder(nn.Module):
        return self.encoder(waveform)
-class WaveUnet(Model):
+class WaveUnet(Mayamodel):
    """
    Wave-U-Net model from  https://arxiv.org/pdf/1811.11307.pdf
    parameters:
@ -80,8 +80,8 @@ class WaveUnet(Model):
            sampling rate of input audio
        lr : float, defaults to 1e-3
            learning rate used for training
-        dataset: EnhancerDataset, optional
+        dataset: MayaDataset, optional
-            EnhancerDataset object containing train/validation data for training
+            MayaDataset object containing train/validation data for training
        duration : float, optional
            chunk duration in seconds
        loss : string or List of strings
@ -97,17 +97,17 @@ class WaveUnet(Model):
        initial_output_channels: int = 24,
        sampling_rate: int = 16000,
        lr: float = 1e-3,
-        dataset: Optional[EnhancerDataset] = None,
+        dataset: Optional[MayaDataset] = None,
        duration: Optional[float] = None,
        loss: Union[str, List] = "mse",
        metric: Union[str, List] = "mse",
    ):
        duration = (
-            dataset.duration if isinstance(dataset, EnhancerDataset) else None
+            dataset.duration if isinstance(dataset, MayaDataset) else duration
        )
        if dataset is not None:
            if sampling_rate != dataset.sampling_rate:
-                logging.warning(
+                warnings.warn(
                    f"model sampling rate {sampling_rate} should match dataset sampling rate {dataset.sampling_rate}"
                )
                sampling_rate = dataset.sampling_rate
--- a/mayavoz/utils/__init__.py
+++ b/mayavoz/utils/__init__.py
@ -0,0 +1,3 @@
 from mayavoz.utils.config import Files
 from mayavoz.utils.io import Audio
 from mayavoz.utils.utils import check_files
--- a/enhancer/utils/config.py
+++ b/enhancer/utils/config.py
--- a/enhancer/utils/io.py
+++ b/enhancer/utils/io.py
@ -70,7 +70,7 @@ class Audio:
        if sampling_rate:
            audio = self.__class__.resample_audio(
-                audio, self.sampling_rate, sampling_rate
+                audio, sampling_rate, self.sampling_rate
            )
        if self.return_tensor:
            return torch.tensor(audio)
--- a/enhancer/utils/random.py
+++ b/enhancer/utils/random.py
--- a/mayavoz/utils/transforms.py
+++ b/mayavoz/utils/transforms.py
@ -0,0 +1,93 @@
 from typing import Optional
 import numpy as np
 import torch
 import torch.nn.functional as F
 from scipy.signal import get_window
 from torch import nn
 class ConvFFT(nn.Module):
    def __init__(
        self,
        window_len: int,
        nfft: Optional[int] = None,
        window: str = "hamming",
    ):
        super().__init__()
        self.window_len = window_len
        self.nfft = nfft if nfft else int(2 ** np.ceil(np.log2(window_len)))
        self.window = torch.from_numpy(
            get_window(window, window_len, fftbins=True).astype("float32")
        )
    def init_kernel(self, inverse=False):
        fourier_basis = np.fft.rfft(np.eye(self.nfft))[: self.window_len]
        real, imag = np.real(fourier_basis), np.imag(fourier_basis)
        kernel = np.concatenate([real, imag], 1).T
        if inverse:
            kernel = np.linalg.pinv(kernel).T
        kernel = torch.from_numpy(kernel.astype("float32")).unsqueeze(1)
        kernel *= self.window
        return kernel
 class ConvSTFT(ConvFFT):
    def __init__(
        self,
        window_len: int,
        hop_size: Optional[int] = None,
        nfft: Optional[int] = None,
        window: str = "hamming",
    ):
        super().__init__(window_len=window_len, nfft=nfft, window=window)
        self.hop_size = hop_size if hop_size else window_len // 2
        self.register_buffer("weight", self.init_kernel())
    def forward(self, input):
        if input.dim() < 2:
            raise ValueError(
                f"Expected signal with 2 or 3 dimensions, got {input.dim()}"
            )
        elif input.dim() == 2:
            input = input.unsqueeze(1)
        else:
            pass
        input = F.pad(
            input,
            (self.window_len - self.hop_size, self.window_len - self.hop_size),
        )
        output = F.conv1d(input, self.weight, stride=self.hop_size)
        return output
 class ConviSTFT(ConvFFT):
    def __init__(
        self,
        window_len: int,
        hop_size: Optional[int] = None,
        nfft: Optional[int] = None,
        window: str = "hamming",
    ):
        super().__init__(window_len=window_len, nfft=nfft, window=window)
        self.hop_size = hop_size if hop_size else window_len // 2
        self.register_buffer("weight", self.init_kernel(True))
        self.register_buffer("enframe", torch.eye(window_len).unsqueeze(1))
    def forward(self, input, phase=None):
        if phase is not None:
            real = input * torch.cos(phase)
            imag = input * torch.sin(phase)
            input = torch.cat([real, imag], 1)
        out = F.conv_transpose1d(input, self.weight, stride=self.hop_size)
        coeff = self.window.unsqueeze(1).repeat(1, 1, input.size(-1)) ** 2
        coeff = coeff.to(input.device)
        coeff = F.conv_transpose1d(coeff, self.enframe, stride=self.hop_size)
        out = out / (coeff + 1e-8)
        pad = self.window_len - self.hop_size
        out = out[..., pad:-pad]
        return out
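Note on `ConvSTFT` output shapes: the forward pass pads both ends by `window_len - hop_size` and then applies a strided 1-D convolution whose kernel spans one window, so the output size follows from the usual convolution arithmetic. The sketch below works that out for the DCCRN defaults. `stft_output_shape` is a hypothetical helper, and the channel count assumes the kernel stacks `nfft // 2 + 1` real rows on top of the same number of imaginary rows, as `init_kernel` does.

```python
def stft_output_shape(num_samples, window_len=400, hop_size=100, nfft=512):
    """Return (channels, frames) produced by a conv-based STFT."""
    pad = window_len - hop_size                    # padding added on each side
    padded = num_samples + 2 * pad
    frames = (padded - window_len) // hop_size + 1  # strided conv output length
    channels = 2 * (nfft // 2 + 1)                  # real bins followed by imaginary bins
    return channels, frames


print(stft_output_shape(16000))  # (514, 163)
```

For one second of 16 kHz audio this gives 514 channels, which `DCCRN.forward` splits at `nfft // 2 + 1 = 257` into real and imaginary halves.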
--- a/enhancer/utils/utils.py
+++ b/enhancer/utils/utils.py
@ -1,7 +1,7 @@
 import os
 from typing import Optional
-from enhancer.utils.config import Files
+from mayavoz.utils.config import Files
 def check_files(root_dir: str, files: Files):
--- a/enhancer/version.py
+++ b/enhancer/version.py
--- a/noisyspeech_synthesizer.cfg
+++ b/noisyspeech_synthesizer.cfg
@ -1,30 +0,0 @@
 # Configuration for generating Noisy Speech Dataset
 # - sampling_rate: Specify the sampling rate. Default is 16 kHz
 # - audioformat: default is .wav
 # - audio_length: Minimum Length of each audio clip (noisy and clean speech) in seconds that will be generated by augmenting utterances.
 # - silence_length: Duration of silence introduced between clean speech utterances.
 # - total_hours: Total number of hours of data required. Units are in hours.
 # - snr_lower: Lower bound for SNR required (default: 0 dB)
 # - snr_upper: Upper bound for SNR required (default: 40 dB)
 # - total_snrlevels: Number of SNR levels required (default: 5, which means there are 5 levels between snr_lower and snr_upper)
 # - noise_dir: Default is None. But specify the noise directory path if noise files are not in the source directory
 # - Speech_dir: Default is None. But specify the speech directory path if speech files are not in the source directory
 # - noise_types_excluded: Noise files starting with the following tags to be excluded in the noise list. Example: noise_types_excluded: Babble, AirConditioner
 #                         Specify 'None' if no noise files to be excluded.
 [noisy_speech]
 sampling_rate: 16000
 audioformat: *.wav
 audio_length: 10
 silence_length: 0.2
 total_hours: 1
 snr_lower: 0
 snr_upper: 40
 total_snrlevels: 2
 naming: test
 noise_dir: /scratch/c.sistc3/MS-SNSD/noise_test
 speech_dir: /scratch/c.sistc3/MS-SNSD/clean_test
 noise_types_excluded: None
--- a/noisyspeech_synthesizer.py
+++ b/noisyspeech_synthesizer.py
@ -1,155 +0,0 @@
 """
@author: chkarada
 """
 import argparse
 import configparser as CP
 import glob
 import os
 import numpy as np
 from audiolib import audioread, audiowrite, snr_mixer
 def main(cfg):
    snr_lower = float(cfg["snr_lower"])
    snr_upper = float(cfg["snr_upper"])
    total_snrlevels = int(cfg["total_snrlevels"])
    clean_dir = os.path.join(os.path.dirname(__file__), "clean_train")
    if cfg["speech_dir"] != "None":
        clean_dir = cfg["speech_dir"]
    if not os.path.exists(clean_dir):
        assert False, "Clean speech data is required"
    noise_dir = os.path.join(os.path.dirname(__file__), "noise_train")
    if cfg["noise_dir"] != "None":
        noise_dir = cfg["noise_dir"]
    if not os.path.exists(noise_dir):
        assert False, "Noise data is required"
    name = cfg["naming"]
    fs = float(cfg["sampling_rate"])
    audioformat = cfg["audioformat"]
    total_hours = float(cfg["total_hours"])
    audio_length = float(cfg["audio_length"])
    silence_length = float(cfg["silence_length"])
    noisyspeech_dir = os.path.join(
        os.path.dirname(__file__), f"NoisySpeech_{name}ing"
    )
    if not os.path.exists(noisyspeech_dir):
        os.makedirs(noisyspeech_dir)
    clean_proc_dir = os.path.join(
        os.path.dirname(__file__), f"CleanSpeech_{name}ing"
    )
    if not os.path.exists(clean_proc_dir):
        os.makedirs(clean_proc_dir)
    noise_proc_dir = os.path.join(
        os.path.dirname(__file__), f"NoisySpeech_{name}ing"
    )
    if not os.path.exists(noise_proc_dir):
        os.makedirs(noise_proc_dir)
    total_secs = total_hours * 60 * 60
    total_samples = int(total_secs * fs)
    audio_length = int(audio_length * fs)
    SNR = np.linspace(snr_lower, snr_upper, total_snrlevels)
    cleanfilenames = glob.glob(os.path.join(clean_dir, audioformat))
    if cfg["noise_types_excluded"] == "None":
        noisefilenames = glob.glob(os.path.join(noise_dir, audioformat))
    else:
        filestoexclude = cfg["noise_types_excluded"].split(",")
        noisefilenames = glob.glob(os.path.join(noise_dir, audioformat))
        for i in range(len(filestoexclude)):
            noisefilenames = [
                fn
                for fn in noisefilenames
                if not os.path.basename(fn).startswith(filestoexclude[i])
            ]
    filecounter = 0
    num_samples = 0
    while num_samples < total_samples:
        idx_s = np.random.randint(0, np.size(cleanfilenames))
        clean, fs = audioread(cleanfilenames[idx_s])
        if len(clean) > audio_length:
            clean = clean
        else:
            while len(clean) <= audio_length:
                idx_s = idx_s + 1
                if idx_s >= np.size(cleanfilenames) - 1:
                    idx_s = np.random.randint(0, np.size(cleanfilenames))
                newclean, fs = audioread(cleanfilenames[idx_s])
                cleanconcat = np.append(
                    clean, np.zeros(int(fs * silence_length))
                )
                clean = np.append(cleanconcat, newclean)
        idx_n = np.random.randint(0, np.size(noisefilenames))
        noise, fs = audioread(noisefilenames[idx_n])
        if len(noise) >= len(clean):
            noise = noise[0 : len(clean)]
        else:
            while len(noise) <= len(clean):
                idx_n = idx_n + 1
                if idx_n >= np.size(noisefilenames) - 1:
                    idx_n = np.random.randint(0, np.size(noisefilenames))
                newnoise, fs = audioread(noisefilenames[idx_n])
                noiseconcat = np.append(
                    noise, np.zeros(int(fs * silence_length))
                )
                noise = np.append(noiseconcat, newnoise)
        noise = noise[0 : len(clean)]
        filecounter = filecounter + 1
        for i in range(np.size(SNR)):
            clean_snr, noise_snr, noisy_snr = snr_mixer(
                clean=clean, noise=noise, snr=SNR[i]
            )
            noisyfilename = (
                "noisy"
                + str(filecounter)
                + "_SNRdb_"
                + str(SNR[i])
                + "_clnsp"
                + str(filecounter)
                + ".wav"
            )
            cleanfilename = "clnsp" + str(filecounter) + ".wav"
            noisefilename = (
                "noisy" + str(filecounter) + "_SNRdb_" + str(SNR[i]) + ".wav"
            )
            noisypath = os.path.join(noisyspeech_dir, noisyfilename)
            cleanpath = os.path.join(clean_proc_dir, cleanfilename)
            noisepath = os.path.join(noise_proc_dir, noisefilename)
            audiowrite(noisy_snr, fs, noisypath, norm=False)
            audiowrite(clean_snr, fs, cleanpath, norm=False)
            audiowrite(noise_snr, fs, noisepath, norm=False)
            num_samples = num_samples + len(noisy_snr)
 if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Configurations: read noisyspeech_synthesizer.cfg
    parser.add_argument(
        "--cfg",
        default="noisyspeech_synthesizer.cfg",
        help="Read noisyspeech_synthesizer.cfg for all the details",
    )
    parser.add_argument("--cfg_str", type=str, default="noisy_speech")
    args = parser.parse_args()
    cfgpath = os.path.join(os.path.dirname(__file__), args.cfg)
    assert os.path.exists(cfgpath), f"No configuration file as [{cfgpath}]"
    cfg = CP.ConfigParser()
    cfg._interpolation = CP.ExtendedInterpolation()
    cfg.read(cfgpath)
    main(cfg._sections[args.cfg_str])
--- a/notebooks/Custom_model_training.ipynb
+++ b/notebooks/Custom_model_training.ipynb
@ -0,0 +1,338 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ccd61d5c",
   "metadata": {},
   "source": [
    "## Custom model training using mayavoz [advanced]\n",
    "\n",
    "In this tutorial, we will cover advanced usage and customization for training your own speech enhancement model. \n",
    "\n",
    " - [Data preparation using MayaDataset](#dataprep)\n",
    " - [Model customization](#modelcustom)\n",
    " - [Callbacks & LR schedulers](#callbacks)\n",
    " - [Model training & testing](#train)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "726c320f",
   "metadata": {},
   "source": [
    "- **install mayavoz**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c987c799",
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install -q mayavoz"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ff9857b",
   "metadata": {},
   "source": [
    "<div id=\"dataprep\"></div>\n",
    "\n",
    "### Data preparation\n",
    "\n",
    "`Files` is a dataclass that wraps and holds the train/test paths together. There is usually one folder each for clean and noisy data. These paths must be relative to a `root_dir` where all these directories reside. For example:\n",
    "\n",
    "```\n",
    "- VCTK/\n",
    "    |__ clean_train_wav/\n",
    "    |__ noisy_train_wav/\n",
    "    |__ clean_test_wav/\n",
    "    |__ noisy_test_wav/\n",
    "    \n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "64cbc0c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from mayavoz.utils import Files\n",
    "files = Files(train_clean=\"clean_train_wav\",\n",
    "              train_noisy=\"noisy_train_wav\",\n",
    "              test_clean=\"clean_test_wav\",\n",
    "              test_noisy=\"noisy_test_wav\")\n",
    "root_dir = \"VCTK\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d324bd1",
   "metadata": {},
   "source": [
    "- `name`: name of the dataset.\n",
    "- `duration`: controls the duration of each audio instance fed into your model.\n",
    "- `stride`: if set, the step by which the sliding window moves over each audio file.\n",
    "- `sampling_rate`: desired sampling rate for the audio.\n",
    "- `batch_size`: model batch size.\n",
    "- `min_valid_minutes`: minimum amount of validation data, in minutes. The validation set is automatically selected from the training set. (exclusive users).\n",
    "- `matching_function`: there are two types of matching functions.\n",
    "    - `one_to_one`: each clean file has exactly one corresponding noisy file. For example, the Valentini dataset.\n",
    "    - `one_to_many`: each clean file can have several corresponding noisy files. For example, the MS-SNSD dataset.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6834941d",
   "metadata": {},
   "outputs": [],
   "source": [
    "name = \"vctk\"\n",
    "duration = 4.5\n",
    "stride = 2.0\n",
    "sampling_rate = 16000\n",
    "min_valid_minutes = 20.0\n",
    "batch_size = 32\n",
    "matching_function = \"one_to_one\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d08c6bf8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from mayavoz.data.dataset import MayaDataset\n",
    "dataset = MayaDataset(\n",
    "            name=name,\n",
    "            root_dir=root_dir,\n",
    "            files=files,\n",
    "            duration=duration,\n",
    "            stride=stride,\n",
    "            sampling_rate=sampling_rate,\n",
    "            batch_size=batch_size,\n",
    "            min_valid_minutes=min_valid_minutes,\n",
    "            matching_function=matching_function\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b315bde",
   "metadata": {},
   "source": [
    "Now your custom dataloader is ready!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01548fe5",
   "metadata": {},
   "source": [
    "<div id=\"modelcustom\"></div>\n",
    "\n",
    "### Model Customization\n",
    "Now, this is very easy. \n",
    "\n",
    "- Import the preferred model from `mayavoz.models`. Currently 3 models are implemented.\n",
    "   - `WaveUnet`\n",
    "   - `Demucs`\n",
    "   - `DCCRN`\n",
    "- Each model's hyperparameters, such as `depth`, `kernel_size`, `stride` etc., can be controlled by you. Just check the model's parameters and pass them as required.\n",
    "- `sampling_rate`: sampling rate (should be equal to dataset sampling rate)\n",
    "- `dataset`: mayavoz dataset object as prepared earlier.\n",
    "- `loss` : model loss. Multiple loss functions are available.\n",
    "\n",
    "        \n",
    "        \n",
    "You can pass one (as a string) or more (as a list of strings) of these loss functions as per your requirements. For example, the model will automatically calculate the loss as the average of `mae` and `mse` if you pass loss as `[\"mae\",\"mse\"]`. Available loss functions are `mse`, `mae`, `si-snr`.\n",
    "\n",
    "mayavoz can accept **custom loss functions**. It should be of the form:\n",
    "```\n",
    "import torch.nn as nn\n",
    "\n",
    "class your_custom_loss(nn.Module):\n",
    "    def __init__(self,**kwargs):\n",
    "        super().__init__()  ## initialise nn.Module\n",
    "        self.higher_better = False  ## loss minimization direction\n",
    "        self.name = \"your_loss_name\" ## loss name logging \n",
    "        ...\n",
    "    def forward(self,prediction, target):\n",
    "        loss = ....\n",
    "        return loss\n",
    "        \n",
    "```\n",
    "\n",
    "- metrics : validation metrics. Available options are `mae`, `mse`, `si-sdr`, `pesq`, `stoi`. One or more can be used.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b36b457c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from mayavoz.models import Demucs\n",
    "model = Demucs(\n",
    "        sampling_rate=16000,\n",
    "        dataset=dataset,\n",
    "        loss=[\"mae\"],\n",
    "        metric=[\"stoi\",\"pesq\"])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1523d638",
   "metadata": {},
   "source": [
    "<div id=\"callbacks\"></div>\n",
    "\n",
    "### Learning rate schedulers and callbacks\n",
    "Here I am using `ReduceLROnPlateau`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8de6931c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from types import MethodType\n",
    "\n",
    "from torch.optim import Adam\n",
    "from torch.optim.lr_scheduler import ReduceLROnPlateau\n",
    "\n",
    "direction = model.valid_monitor  # \"min\" or \"max\"\n",
    "\n",
    "def configure_optimizers(self):\n",
    "    # Adam and the values below are example settings; tune them for your task\n",
    "    optimizer = Adam(self.parameters(), lr=3e-4)\n",
    "    scheduler = ReduceLROnPlateau(\n",
    "        optimizer=optimizer,\n",
    "        mode=direction,\n",
    "        factor=0.1,\n",
    "        verbose=True,\n",
    "        min_lr=1e-6,\n",
    "        patience=3,\n",
    "    )\n",
    "    return {\n",
    "        \"optimizer\": optimizer,\n",
    "        \"lr_scheduler\": scheduler,\n",
    "        \"monitor\": \"valid_loss\",\n",
    "    }\n",
    "\n",
    "\n",
    "model.configure_optimizers = MethodType(configure_optimizers, model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f7b5af5",
   "metadata": {},
   "source": [
    "You can use any number of callbacks and pass them directly to the PyTorch Lightning trainer. Here I am using only `ModelCheckpoint`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f6b62a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pytorch_lightning.callbacks import ModelCheckpoint\n",
    "\n",
    "callbacks = []\n",
    "direction = model.valid_monitor ## min or max \n",
    "checkpoint = ModelCheckpoint(\n",
    "        dirpath=\"./model\",\n",
    "        filename=\"model_filename\",\n",
    "        monitor=\"valid_loss\",\n",
    "        verbose=False,\n",
    "        mode=direction,\n",
    "        every_n_epochs=1,\n",
    "    )\n",
    "callbacks.append(checkpoint)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3534445",
   "metadata": {},
   "source": [
    "<div id=\"train\"></div>\n",
    "\n",
    "\n",
    "### Train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3dc0348b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pytorch_lightning as pl\n",
    "trainer = pl.Trainer(max_epochs=1,callbacks=callbacks,accelerator=\"gpu\")\n",
    "trainer.fit(model)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56dcfec1",
   "metadata": {},
   "source": [
    "- Test your model against the test dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63851feb",
   "metadata": {},
   "outputs": [],
   "source": [
    "trainer.test(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d3f5350",
   "metadata": {},
   "source": [
    "**Hurray! You have your speech enhancement model trained and tested.**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10d630e8",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "enhancer",
   "language": "python",
   "name": "enhancer"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
 }
--- a/notebooks/Getting_started.ipynb
+++ b/notebooks/Getting_started.ipynb
--- a/pyproject.toml
+++ b/pyproject.toml
@ -2,6 +2,7 @@
 line-length = 80
 target-version = ['py38']
 exclude = '''
 (
  /(
      \.eggs         # exclude a few common directories in the
@ -9,9 +10,6 @@ exclude = '''
    | \.mypy_cache
    | \.tox
    | \.venv
    | noisyspeech_synthesizer.py
    | noisyspeech_synthesizer.cfg
  )/
 )
 '''
--- a/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train.py
+++ b/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train.py
@ -0,0 +1,120 @@
 import os
 from types import MethodType
 import hydra
 from hydra.utils import instantiate
 from omegaconf import DictConfig, OmegaConf
 from pytorch_lightning.callbacks import (
    EarlyStopping,
    LearningRateMonitor,
    ModelCheckpoint,
 )
 from pytorch_lightning.loggers import MLFlowLogger
 from torch.optim.lr_scheduler import ReduceLROnPlateau
 # from torch_audiomentations import Compose, Shift
 os.environ["HYDRA_FULL_ERROR"] = "1"
 JOB_ID = os.environ.get("SLURM_JOBID", "0")
@hydra.main(config_path="train_config", config_name="config")
 def train(config: DictConfig):
    OmegaConf.save(config, "config.yaml")
    callbacks = []
    logger = MLFlowLogger(
        experiment_name=config.mlflow.experiment_name,
        run_name=config.mlflow.run_name,
        tags={"JOB_ID": JOB_ID},
    )
    parameters = config.hyperparameters
    # apply_augmentations = Compose(
    #     [
    #         Shift(min_shift=0.5, max_shift=1.0, shift_unit="seconds", p=0.5),
    #     ]
    # )
    dataset = instantiate(config.dataset, augmentations=None)
    model = instantiate(
        config.model,
        dataset=dataset,
        lr=parameters.get("lr"),
        loss=parameters.get("loss"),
        metric=parameters.get("metric"),
    )
    direction = model.valid_monitor
    checkpoint = ModelCheckpoint(
        dirpath="./model",
        filename=f"model_{JOB_ID}",
        monitor="valid_loss",
        verbose=False,
        mode=direction,
        every_n_epochs=1,
    )
    callbacks.append(checkpoint)
    callbacks.append(LearningRateMonitor(logging_interval="epoch"))
    if parameters.get("Early_stop", False):
        early_stopping = EarlyStopping(
            monitor="valid_loss",
            mode=direction,
            min_delta=0.0,
            patience=parameters.get("EarlyStopping_patience", 10),
            strict=True,
            verbose=False,
        )
        callbacks.append(early_stopping)
    def configure_optimizers(self):
        optimizer = instantiate(
            config.optimizer,
            lr=parameters.get("lr"),
            params=self.parameters(),
        )
        scheduler = ReduceLROnPlateau(
            optimizer=optimizer,
            mode=direction,
            factor=parameters.get("ReduceLr_factor", 0.1),
            verbose=True,
            min_lr=parameters.get("min_lr", 1e-6),
            patience=parameters.get("ReduceLr_patience", 3),
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": scheduler,
            "monitor": f'valid_{parameters.get("ReduceLr_monitor", "loss")}',
        }
    model.configure_optimizers = MethodType(configure_optimizers, model)
    trainer = instantiate(config.trainer, logger=logger, callbacks=callbacks)
    trainer.fit(model)
    trainer.test(model)
    logger.experiment.log_artifact(
        logger.run_id, f"{trainer.default_root_dir}/config.yaml"
    )
    saved_location = os.path.join(
        trainer.default_root_dir, "model", f"model_{JOB_ID}.ckpt"
    )
    if os.path.isfile(saved_location):
        logger.experiment.log_artifact(logger.run_id, saved_location)
        logger.experiment.log_param(
            logger.run_id,
            "num_train_steps_per_epoch",
            dataset.train__len__() / dataset.batch_size,
        )
        logger.experiment.log_param(
            logger.run_id,
            "num_valid_steps_per_epoch",
            dataset.val__len__() / dataset.batch_size,
        )
 if __name__ == "__main__":
    train()
--- a/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/config.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/config.yaml
@ -0,0 +1,7 @@
 defaults:
  - model : DCCRN
  - dataset : MS-SNSD
  - optimizer : Adam
  - hyperparameters : default
  - trainer : default
  - mlflow : experiment
--- a/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/dataset/MS-SNSD.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/dataset/MS-SNSD.yaml
@ -0,0 +1,13 @@
 _target_: mayavoz.data.dataset.MayaDataset
 name : MS-SNSD
 root_dir : /Users/shahules/Myprojects/MS-SNSD
 duration : 1.5
 stride : 1
 sampling_rate: 16000
 batch_size: 32
 min_valid_minutes: 25
 files:
  train_clean : CleanSpeech_training
  test_clean : CleanSpeech_training
  train_noisy : NoisySpeech_training
  test_noisy : NoisySpeech_training
--- a/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/hyperparameters/default.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/hyperparameters/default.yaml
@ -1,7 +1,7 @@
-loss : mse
+loss : si-snr
-metric : [stoi,pesq,si-sdr]
+metric : [stoi,pesq]
 lr : 0.001
 ReduceLr_patience : 10
 ReduceLr_factor : 0.5
-min_lr : 0.00
+min_lr : 0.000001
 EarlyStopping_factor : 10
--- a/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/mlflow/experiment.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/mlflow/experiment.yaml
@ -0,0 +1,2 @@
 experiment_name : shahules/mayavoz
 run_name : Demucs + Vtck with stride + augmentations
--- a/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/model/DCCRN.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/model/DCCRN.yaml
@ -0,0 +1,25 @@
 _target_: mayavoz.models.dccrn.DCCRN
 num_channels: 1
 sampling_rate : 16000
 complex_lstm : True
 complex_norm : True
 complex_relu : True
 masking_mode : True
 encoder_decoder:
  initial_output_channels : 32
  depth : 6
  kernel_size : 5
  growth_factor : 2
  stride : 2
  padding : 2
  output_padding : 1
 lstm:
  num_layers : 2
  hidden_size : 256
 stft:
  window_len : 400
  hop_size : 100
  nfft : 512
--- a/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/optimizer/Adam.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/optimizer/Adam.yaml
@ -0,0 +1,6 @@
 _target_: torch.optim.Adam
 lr: 1e-3
 betas: [0.9, 0.999]
 eps: 1e-08
 weight_decay: 0
 amsgrad: False
--- a/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/trainer/default.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/trainer/default.yaml
@ -0,0 +1,46 @@
 _target_: pytorch_lightning.Trainer
 accelerator: gpu
 accumulate_grad_batches: 1
 amp_backend: native
 auto_lr_find: True
 auto_scale_batch_size: False
 auto_select_gpus: True
 benchmark: False
 check_val_every_n_epoch: 1
 detect_anomaly: False
 deterministic: False
 devices: 2
 enable_checkpointing: True
 enable_model_summary: True
 enable_progress_bar: True
 fast_dev_run: False
 gpus: null
 gradient_clip_val: 0
 gradient_clip_algorithm: norm
 ipus: null
 limit_predict_batches: 1.0
 limit_test_batches: 1.0
 limit_train_batches: 1.0
 limit_val_batches: 1.0
 log_every_n_steps: 50
 max_epochs: 200
 max_steps: -1
 max_time: null
 min_epochs: 1
 min_steps: null
 move_metrics_to_cpu: False
 multiple_trainloader_mode: max_size_cycle
 num_nodes: 1
 num_processes: 1
 num_sanity_val_steps: 2
 overfit_batches: 0.0
 precision: 32
 profiler: null
 reload_dataloaders_every_n_epochs: 0
 replace_sampler_ddp: True
 strategy: ddp
 sync_batchnorm: False
 tpu_cores: null
 track_grad_norm: -1
 val_check_interval: 1.0
 weights_save_path: null
--- a/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/trainer/fastrun_dev.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/DCCRN/train_config/trainer/fastrun_dev.yaml
@ -0,0 +1,2 @@
 _target_: pytorch_lightning.Trainer
 fast_dev_run: True
--- a/recipes/MS-SNSD/MS-SNSD-20/Demucs/train.py
+++ b/recipes/MS-SNSD/MS-SNSD-20/Demucs/train.py
@ -0,0 +1,120 @@
 import os
 from types import MethodType
 import hydra
 from hydra.utils import instantiate
 from omegaconf import DictConfig, OmegaConf
 from pytorch_lightning.callbacks import (
    EarlyStopping,
    LearningRateMonitor,
    ModelCheckpoint,
 )
 from pytorch_lightning.loggers import MLFlowLogger
 from torch.optim.lr_scheduler import ReduceLROnPlateau
 # from torch_audiomentations import Compose, Shift
 os.environ["HYDRA_FULL_ERROR"] = "1"
 JOB_ID = os.environ.get("SLURM_JOBID", "0")
@hydra.main(config_path="train_config", config_name="config")
 def train(config: DictConfig):
    OmegaConf.save(config, "config.yaml")
    callbacks = []
    logger = MLFlowLogger(
        experiment_name=config.mlflow.experiment_name,
        run_name=config.mlflow.run_name,
        tags={"JOB_ID": JOB_ID},
    )
    parameters = config.hyperparameters
    # apply_augmentations = Compose(
    #     [
    #         Shift(min_shift=0.5, max_shift=1.0, shift_unit="seconds", p=0.5),
    #     ]
    # )
    dataset = instantiate(config.dataset, augmentations=None)
    model = instantiate(
        config.model,
        dataset=dataset,
        lr=parameters.get("lr"),
        loss=parameters.get("loss"),
        metric=parameters.get("metric"),
    )
    direction = model.valid_monitor
    checkpoint = ModelCheckpoint(
        dirpath="./model",
        filename=f"model_{JOB_ID}",
        monitor="valid_loss",
        verbose=False,
        mode=direction,
        every_n_epochs=1,
    )
    callbacks.append(checkpoint)
    callbacks.append(LearningRateMonitor(logging_interval="epoch"))
    if parameters.get("Early_stop", False):
        early_stopping = EarlyStopping(
            monitor="valid_loss",
            mode=direction,
            min_delta=0.0,
            patience=parameters.get("EarlyStopping_patience", 10),
            strict=True,
            verbose=False,
        )
        callbacks.append(early_stopping)
    def configure_optimizers(self):
        optimizer = instantiate(
            config.optimizer,
            lr=parameters.get("lr"),
            params=self.parameters(),
        )
        scheduler = ReduceLROnPlateau(
            optimizer=optimizer,
            mode=direction,
            factor=parameters.get("ReduceLr_factor", 0.1),
            verbose=True,
            min_lr=parameters.get("min_lr", 1e-6),
            patience=parameters.get("ReduceLr_patience", 3),
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": scheduler,
            "monitor": f'valid_{parameters.get("ReduceLr_monitor", "loss")}',
        }
    model.configure_optimizers = MethodType(configure_optimizers, model)
    trainer = instantiate(config.trainer, logger=logger, callbacks=callbacks)
    trainer.fit(model)
    trainer.test(model)
    logger.experiment.log_artifact(
        logger.run_id, f"{trainer.default_root_dir}/config.yaml"
    )
    saved_location = os.path.join(
        trainer.default_root_dir, "model", f"model_{JOB_ID}.ckpt"
    )
    if os.path.isfile(saved_location):
        logger.experiment.log_artifact(logger.run_id, saved_location)
        logger.experiment.log_param(
            logger.run_id,
            "num_train_steps_per_epoch",
            dataset.train__len__() / dataset.batch_size,
        )
        logger.experiment.log_param(
            logger.run_id,
            "num_valid_steps_per_epoch",
            dataset.val__len__() / dataset.batch_size,
        )
 if __name__ == "__main__":
    train()
--- a/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/config.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/config.yaml
@ -0,0 +1,7 @@
 defaults:
  - model : Demucs
  - dataset : MS-SNSD
  - optimizer : Adam
  - hyperparameters : default
  - trainer : default
  - mlflow : experiment
--- a/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/dataset/MS-SNSD.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/dataset/MS-SNSD.yaml
@ -0,0 +1,13 @@
 _target_: mayavoz.data.dataset.MayaDataset
 name : MS-SNSD
 root_dir : /Users/shahules/Myprojects/MS-SNSD
 duration : 5
 stride : 1
 sampling_rate: 16000
 batch_size: 32
 min_valid_minutes: 25
 files:
  train_clean : CleanSpeech_training
  test_clean : CleanSpeech_training
  train_noisy : NoisySpeech_training
  test_noisy : NoisySpeech_training
--- a/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/hyperparameters/default.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/hyperparameters/default.yaml
@ -0,0 +1,7 @@
 loss : mae
 metric : [stoi,pesq]
 lr : 0.0003
 ReduceLr_patience : 10
 ReduceLr_factor : 0.5
 min_lr : 0.000001
 EarlyStopping_factor : 10
--- a/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/mlflow/experiment.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/mlflow/experiment.yaml
@ -0,0 +1,2 @@
 experiment_name : shahules/mayavoz
 run_name : demucs-ms-snsd
--- a/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/model/Demucs.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/model/Demucs.yaml
@ -0,0 +1,16 @@
 _target_: mayavoz.models.demucs.Demucs
 num_channels: 1
 resample: 4
 sampling_rate : 16000
 encoder_decoder:
  depth: 4
  initial_output_channels: 64
  kernel_size: 8
  stride: 4
  growth_factor: 2
  glu: True
 lstm:
  bidirectional: False
  num_layers: 2
--- a/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/optimizer/Adam.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/optimizer/Adam.yaml
@ -0,0 +1,6 @@
 _target_: torch.optim.Adam
 lr: 1e-3
 betas: [0.9, 0.999]
 eps: 1e-08
 weight_decay: 0
 amsgrad: False
--- a/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/trainer/default.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/trainer/default.yaml
@ -0,0 +1,46 @@
 _target_: pytorch_lightning.Trainer
 accelerator: gpu
 accumulate_grad_batches: 1
 amp_backend: native
 auto_lr_find: True
 auto_scale_batch_size: False
 auto_select_gpus: True
 benchmark: False
 check_val_every_n_epoch: 1
 detect_anomaly: False
 deterministic: False
 devices: 2
 enable_checkpointing: True
 enable_model_summary: True
 enable_progress_bar: True
 fast_dev_run: False
 gpus: null
 gradient_clip_val: 0
 gradient_clip_algorithm: norm
 ipus: null
 limit_predict_batches: 1.0
 limit_test_batches: 1.0
 limit_train_batches: 1.0
 limit_val_batches: 1.0
 log_every_n_steps: 50
 max_epochs: 200
 max_steps: -1
 max_time: null
 min_epochs: 1
 min_steps: null
 move_metrics_to_cpu: False
 multiple_trainloader_mode: max_size_cycle
 num_nodes: 1
 num_processes: 1
 num_sanity_val_steps: 2
 overfit_batches: 0.0
 precision: 32
 profiler: null
 reload_dataloaders_every_n_epochs: 0
 replace_sampler_ddp: True
 strategy: ddp
 sync_batchnorm: False
 tpu_cores: null
 track_grad_norm: -1
 val_check_interval: 1.0
 weights_save_path: null
--- a/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/trainer/fastrun_dev.yaml
+++ b/recipes/MS-SNSD/MS-SNSD-20/Demucs/train_config/trainer/fastrun_dev.yaml
@ -0,0 +1,2 @@
 _target_: pytorch_lightning.Trainer
 fast_dev_run: True
--- a/recipes/MS-SNSD/MS-SNSD-20/README.md
+++ b/recipes/MS-SNSD/MS-SNSD-20/README.md
@ -0,0 +1,17 @@
 ### Microsoft Scalable Noisy Speech Dataset (MS-SNSD)
 MS-SNSD is a speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
 ### Dataset download & setup
 - Follow the steps in the official repo [here](https://github.com/microsoft/MS-SNSD) to download and set up the dataset.
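
 To make the role of the SNR level concrete, here is a minimal sketch of mixing one clean clip with one noise clip at a target SNR. It is a simplified illustration and not the `snr_mixer` used by the official repo; the helper names `rms` and `mix_at_snr` are hypothetical, and both signals are assumed to be plain lists of samples of equal length.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean/noise RMS ratio equals `snr_db`, then add it to `clean`."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; SNR is undefined")
    # SNR(dB) = 20 * log10(rms_clean / rms_noise), solved for the target noise RMS
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    scaled_noise = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


if __name__ == "__main__":
    # Synthetic stand-ins for real audio: a sine "speech" clip and an alternating "noise" clip
    clean = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
    noise = [0.3 if t % 2 == 0 else -0.3 for t in range(100)]
    for snr_db in (0, 20, 40):
        _, scaled = mix_at_snr(clean, noise, snr_db)
        measured = 20 * math.log10(rms(clean) / rms(scaled))
        print(f"target {snr_db} dB, measured {measured:.2f} dB")
```

 Each clean clip can be mixed at several SNR levels, which is why one clean file may correspond to many noisy files in this dataset.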
 **References**
 ```BibTex
@article{reddy2019scalable,
  title={A Scalable Noisy Speech Dataset and Online Subjective Test Framework},
  author={Reddy, Chandan KA and Beyrami, Ebrahim and Pool, Jamie and Cutler, Ross and Srinivasan, Sriram and Gehrke, Johannes},
  journal={Proc. Interspeech 2019},
  pages={1816--1820},
  year={2019}
 }
 ```
--- a/recipes/Valentini-dataset/28spk/Demucs/train.py
+++ b/recipes/Valentini-dataset/28spk/Demucs/train.py
@ -4,10 +4,16 @@ from types import MethodType
 import hydra
 from hydra.utils import instantiate
 from omegaconf import DictConfig, OmegaConf
-from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
+from pytorch_lightning.callbacks import (
    EarlyStopping,
    LearningRateMonitor,
    ModelCheckpoint,
 )
 from pytorch_lightning.loggers import MLFlowLogger
 from torch.optim.lr_scheduler import ReduceLROnPlateau
 # from torch_audiomentations import Compose, Shift
 os.environ["HYDRA_FULL_ERROR"] = "1"
 JOB_ID = os.environ.get("SLURM_JOBID", "0")
@ -25,8 +31,13 @@ def main(config: DictConfig):
    )
    parameters = config.hyperparameters
    # apply_augmentations = Compose(
    #     [
    #         Shift(min_shift=0.5, max_shift=1.0, shift_unit="seconds", p=0.5),
    #     ]
    # )
-    dataset = instantiate(config.dataset)
+    dataset = instantiate(config.dataset, augmentations=None)
    model = instantiate(
        config.model,
        dataset=dataset,
@ -45,6 +56,8 @@ def main(config: DictConfig):
        every_n_epochs=1,
    )
    callbacks.append(checkpoint)
    callbacks.append(LearningRateMonitor(logging_interval="epoch"))
    if parameters.get("Early_stop", False):
        early_stopping = EarlyStopping(
            monitor="val_loss",
@ -56,11 +69,11 @@ def main(config: DictConfig):
        )
        callbacks.append(early_stopping)
-    def configure_optimizer(self):
+    def configure_optimizers(self):
        optimizer = instantiate(
            config.optimizer,
            lr=parameters.get("lr"),
-            parameters=self.parameters(),
+            params=self.parameters(),
        )
        scheduler = ReduceLROnPlateau(
            optimizer=optimizer,
@ -70,9 +83,13 @@ def main(config: DictConfig):
            min_lr=parameters.get("min_lr", 1e-6),
            patience=parameters.get("ReduceLr_patience", 3),
        )
-        return {"optimizer": optimizer, "lr_scheduler": scheduler}
+        return {
            "optimizer": optimizer,
            "lr_scheduler": scheduler,
            "monitor": f'valid_{parameters.get("ReduceLr_monitor", "loss")}',
        }
-    model.configure_parameters = MethodType(configure_optimizer, model)
+    model.configure_optimizers = MethodType(configure_optimizers, model)
    trainer = instantiate(config.trainer, logger=logger, callbacks=callbacks)
    trainer.fit(model)
--- a/recipes/Valentini-dataset/28spk/Demucs/train_config/config.yaml
+++ b/recipes/Valentini-dataset/28spk/Demucs/train_config/config.yaml
@ -0,0 +1,7 @@
 defaults:
  - model : Demucs
  - dataset : Vctk
  - optimizer : Adam
  - hyperparameters : default
  - trainer : default
  - mlflow : experiment
--- a/recipes/Valentini-dataset/28spk/Demucs/train_config/dataset/Vctk.yaml
+++ b/recipes/Valentini-dataset/28spk/Demucs/train_config/dataset/Vctk.yaml
@ -0,0 +1,13 @@
 _target_: mayavoz.data.dataset.MayaDataset
 name : vctk
 root_dir : /scratch/c.sistc3/DS_10283_2791
 duration : 4.5
 stride : 0.5
 sampling_rate: 16000
 batch_size: 32
 min_valid_minutes : 25
 files:
  train_clean : clean_trainset_28spk_wav
  test_clean : clean_testset_wav
  train_noisy : noisy_trainset_28spk_wav
  test_noisy : noisy_testset_wav
--- a/recipes/Valentini-dataset/28spk/Demucs/train_config/hyperparameters/default.yaml
+++ b/recipes/Valentini-dataset/28spk/Demucs/train_config/hyperparameters/default.yaml
@ -0,0 +1,8 @@
 loss : mae
 metric : [stoi,pesq,si-sdr]
 lr : 0.0003
 Early_stop : False
 ReduceLr_patience : 10
 ReduceLr_factor : 0.1
 min_lr : 0.000001
 EarlyStopping_factor : 10
--- a/recipes/Valentini-dataset/28spk/Demucs/train_config/mlflow/experiment.yaml
+++ b/recipes/Valentini-dataset/28spk/Demucs/train_config/mlflow/experiment.yaml
@ -0,0 +1,2 @@
 experiment_name : shahules/mayavoz
 run_name : baseline
--- a/recipes/Valentini-dataset/28spk/Demucs/train_config/model/Demucs.yaml
+++ b/recipes/Valentini-dataset/28spk/Demucs/train_config/model/Demucs.yaml
@ -0,0 +1,16 @@
 _target_: mayavoz.models.demucs.Demucs
 num_channels: 1
 resample: 4
 sampling_rate : 16000
 encoder_decoder:
  depth: 4
  initial_output_channels: 64
  kernel_size: 8
  stride: 4
  growth_factor: 2
  glu: True
 lstm:
  bidirectional: True
  num_layers: 2
--- a/recipes/Valentini-dataset/28spk/Demucs/train_config/optimizer/Adam.yaml
+++ b/recipes/Valentini-dataset/28spk/Demucs/train_config/optimizer/Adam.yaml
@ -0,0 +1,6 @@
 _target_: torch.optim.Adam
 lr: 1e-3
 betas: [0.9, 0.999]
 eps: 1e-08
 weight_decay: 0
 amsgrad: False
--- a/recipes/Valentini-dataset/28spk/Demucs/train_config/trainer/default.yaml
+++ b/recipes/Valentini-dataset/28spk/Demucs/train_config/trainer/default.yaml
@ -2,14 +2,14 @@ _target_: pytorch_lightning.Trainer
 accelerator: gpu
 accumulate_grad_batches: 1
 amp_backend: native
-auto_lr_find: False
+auto_lr_find: True
 auto_scale_batch_size: False
 auto_select_gpus: True
 benchmark: False
 check_val_every_n_epoch: 1
 detect_anomaly: False
 deterministic: False
-devices: 2
+devices: 1
 enable_checkpointing: True
 enable_model_summary: True
 enable_progress_bar: True
@ -22,8 +22,9 @@ limit_predict_batches: 1.0
 limit_test_batches: 1.0
 limit_train_batches: 1.0
 limit_val_batches: 1.0
-log_every_n_steps: 100
+log_every_n_steps: 50
-max_epochs: 250
+max_epochs: 200
 max_steps: -1
 max_time: null
 min_epochs: 1
 min_steps: null
--- a/recipes/Valentini-dataset/28spk/WaveUnet/train.py
+++ b/recipes/Valentini-dataset/28spk/WaveUnet/train.py
@ -0,0 +1,120 @@
 import os
 from types import MethodType
 import hydra
 from hydra.utils import instantiate
 from omegaconf import DictConfig, OmegaConf
 from pytorch_lightning.callbacks import (
    EarlyStopping,
    LearningRateMonitor,
    ModelCheckpoint,
 )
 from pytorch_lightning.loggers import MLFlowLogger
 from torch.optim.lr_scheduler import ReduceLROnPlateau
 # from torch_audiomentations import Compose, Shift
 os.environ["HYDRA_FULL_ERROR"] = "1"
 JOB_ID = os.environ.get("SLURM_JOBID", "0")
@hydra.main(config_path="train_config", config_name="config")
 def main(config: DictConfig):
    OmegaConf.save(config, "config_log.yaml")
    callbacks = []
    logger = MLFlowLogger(
        experiment_name=config.mlflow.experiment_name,
        run_name=config.mlflow.run_name,
        tags={"JOB_ID": JOB_ID},
    )
    parameters = config.hyperparameters
    # apply_augmentations = Compose(
    #     [
    #         Shift(min_shift=0.5, max_shift=1.0, shift_unit="seconds", p=0.5),
    #     ]
    # )
    dataset = instantiate(config.dataset, augmentations=None)
    model = instantiate(
        config.model,
        dataset=dataset,
        lr=parameters.get("lr"),
        loss=parameters.get("loss"),
        metric=parameters.get("metric"),
    )
    direction = model.valid_monitor
    checkpoint = ModelCheckpoint(
        dirpath="./model",
        filename=f"model_{JOB_ID}",
        monitor="valid_loss",
        verbose=False,
        mode=direction,
        every_n_epochs=1,
    )
    callbacks.append(checkpoint)
    callbacks.append(LearningRateMonitor(logging_interval="epoch"))
    if parameters.get("Early_stop", False):
        early_stopping = EarlyStopping(
             monitor="valid_loss",  # same metric key as ModelCheckpoint above
            mode=direction,
            min_delta=0.0,
            patience=parameters.get("EarlyStopping_patience", 10),
            strict=True,
            verbose=False,
        )
        callbacks.append(early_stopping)
    def configure_optimizers(self):
        optimizer = instantiate(
            config.optimizer,
            lr=parameters.get("lr"),
            params=self.parameters(),
        )
        scheduler = ReduceLROnPlateau(
            optimizer=optimizer,
            mode=direction,
            factor=parameters.get("ReduceLr_factor", 0.1),
            verbose=True,
            min_lr=parameters.get("min_lr", 1e-6),
            patience=parameters.get("ReduceLr_patience", 3),
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": scheduler,
            "monitor": f'valid_{parameters.get("ReduceLr_monitor", "loss")}',
        }
    model.configure_optimizers = MethodType(configure_optimizers, model)
    trainer = instantiate(config.trainer, logger=logger, callbacks=callbacks)
    trainer.fit(model)
    trainer.test(model)
    logger.experiment.log_artifact(
        logger.run_id, f"{trainer.default_root_dir}/config_log.yaml"
    )
    saved_location = os.path.join(
        trainer.default_root_dir, "model", f"model_{JOB_ID}.ckpt"
    )
    if os.path.isfile(saved_location):
        logger.experiment.log_artifact(logger.run_id, saved_location)
        logger.experiment.log_param(
            logger.run_id,
            "num_train_steps_per_epoch",
            dataset.train__len__() / dataset.batch_size,
        )
        logger.experiment.log_param(
            logger.run_id,
            "num_valid_steps_per_epoch",
            dataset.val__len__() / dataset.batch_size,
        )
 if __name__ == "__main__":
    main()
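The `train.py` recipe above attaches `configure_optimizers` to the model instance with `types.MethodType`, which lets the function close over `config` and `parameters` from `main`. Below is a minimal sketch of that binding pattern, independent of Lightning. `TinyModel`, `build`, and the `settings` dict are hypothetical stand-ins for illustration and are not part of mayavoz.

```python
from types import MethodType


class TinyModel:
    """Hypothetical stand-in for a LightningModule (not part of mayavoz)."""

    def __init__(self):
        self.weights = [0.1, 0.2]

    def configure_optimizers(self):
        return {"optimizer": "default"}


def build(settings):
    model = TinyModel()

    # Defined inside build() so it can close over `settings`,
    # the same way train.py closes over `config` and `parameters`.
    def configure_optimizers(self):
        return {
            "optimizer": settings["optimizer"],
            "lr": settings["lr"],
            "num_params": len(self.weights),
        }

    # Bind the function to this one instance; the class is left untouched.
    model.configure_optimizers = MethodType(configure_optimizers, model)
    return model


patched = build({"optimizer": "adam", "lr": 0.003})
untouched = TinyModel()
```

Only the patched instance picks up the override; other instances of the class keep the original method, which is presumably why the recipe can swap optimizers per run without subclassing the model.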
--- a/recipes/Valentini-dataset/28spk/WaveUnet/train_config/config.yaml
+++ b/recipes/Valentini-dataset/28spk/WaveUnet/train_config/config.yaml
@ -0,0 +1,7 @@
 defaults:
  - model : WaveUnet
  - dataset : Vctk
  - optimizer : Adam
  - hyperparameters : default
  - trainer : default
  - mlflow : experiment
--- a/recipes/Valentini-dataset/28spk/WaveUnet/train_config/dataset/Vctk.yaml
+++ b/recipes/Valentini-dataset/28spk/WaveUnet/train_config/dataset/Vctk.yaml
@ -1,11 +1,11 @@
-_target_: enhancer.data.dataset.EnhancerDataset
+_target_: mayavoz.data.dataset.MayaDataset
 name : vctk
 root_dir : /scratch/c.sistc3/DS_10283_2791
-duration : 1.5
+duration : 2
 stride : 1
 sampling_rate: 16000
-batch_size: 256
+batch_size: 128
-valid_size : 0.05
+valid_minutes : 25
 files:
  train_clean : clean_trainset_28spk_wav
  test_clean : clean_testset_wav
--- a/recipes/Valentini-dataset/28spk/WaveUnet/train_config/hyperparameters/default.yaml
+++ b/recipes/Valentini-dataset/28spk/WaveUnet/train_config/hyperparameters/default.yaml
@ -0,0 +1,8 @@
 loss : mae
 metric : [stoi,pesq,si-sdr]
 lr : 0.003
 ReduceLr_patience : 10
 ReduceLr_factor : 0.1
 min_lr : 0.000001
 EarlyStopping_factor : 10
 Early_stop : False
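One detail worth checking in review: `train.py` reads `EarlyStopping_patience`, while this file defines `EarlyStopping_factor`. Because the lookup uses `.get` with a default, a key that does not match is silently ignored. The sketch below shows the effect with a plain dict standing in for the hyperparameters config (an assumption; the recipe itself receives a Hydra `DictConfig`). The value 25 is made up so the fallback is visible.

```python
# Plain dict standing in for the hyperparameters config above (an assumption;
# the recipe itself receives a Hydra/OmegaConf DictConfig).
hyperparameters = {
    "lr": 0.003,
    "ReduceLr_patience": 10,
    "EarlyStopping_factor": 25,  # hypothetical value, changed from 10 for illustration
    "Early_stop": False,
}

# train.py looks up this key, which the YAML does not define, so the default wins.
patience = hyperparameters.get("EarlyStopping_patience", 10)

# This key does exist, so the configured value overrides the default of 3.
reduce_lr_patience = hyperparameters.get("ReduceLr_patience", 3)
```

With the shipped value of 10 the mismatch happens to be harmless, since the fallback is also 10, but changing `EarlyStopping_factor` in the YAML would have no effect on training.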
--- a/recipes/Valentini-dataset/28spk/WaveUnet/train_config/mlflow/experiment.yaml
+++ b/recipes/Valentini-dataset/28spk/WaveUnet/train_config/mlflow/experiment.yaml
@ -0,0 +1,2 @@
 experiment_name : shahules/mayavoz
 run_name : baseline
--- a/recipes/Valentini-dataset/28spk/WaveUnet/train_config/model/WaveUnet.yaml
+++ b/recipes/Valentini-dataset/28spk/WaveUnet/train_config/model/WaveUnet.yaml
@ -0,0 +1,5 @@
 _target_: mayavoz.models.waveunet.WaveUnet
 num_channels : 1
 depth : 9
 initial_output_channels: 24
 sampling_rate : 16000
--- a/recipes/Valentini-dataset/28spk/WaveUnet/train_config/optimizer/Adam.yaml
+++ b/recipes/Valentini-dataset/28spk/WaveUnet/train_config/optimizer/Adam.yaml
@ -0,0 +1,6 @@
 _target_: torch.optim.Adam
 lr: 1e-3
 betas: [0.9, 0.999]
 eps: 1e-08
 weight_decay: 0
 amsgrad: False
--- a/recipes/Valentini-dataset/28spk/WaveUnet/train_config/trainer/default.yaml
+++ b/recipes/Valentini-dataset/28spk/WaveUnet/train_config/trainer/default.yaml
@ -0,0 +1,46 @@
 _target_: pytorch_lightning.Trainer
 accelerator: gpu
 accumulate_grad_batches: 1
 amp_backend: native
 auto_lr_find: True
 auto_scale_batch_size: False
 auto_select_gpus: True
 benchmark: False
 check_val_every_n_epoch: 1
 detect_anomaly: False
 deterministic: False
 devices: 2
 enable_checkpointing: True
 enable_model_summary: True
 enable_progress_bar: True
 fast_dev_run: False
 gpus: null
 gradient_clip_val: 0
 gradient_clip_algorithm: norm
 ipus: null
 limit_predict_batches: 1.0
 limit_test_batches: 1.0
 limit_train_batches: 1.0
 limit_val_batches: 1.0
 log_every_n_steps: 50
 max_epochs: 200
 max_steps: -1
 max_time: null
 min_epochs: 1
 min_steps: null
 move_metrics_to_cpu: False
 multiple_trainloader_mode: max_size_cycle
 num_nodes: 1
 num_processes: 1
 num_sanity_val_steps: 2
 overfit_batches: 0.0
 precision: 32
 profiler: null
 reload_dataloaders_every_n_epochs: 0
 replace_sampler_ddp: True
 strategy: ddp
 sync_batchnorm: False
 tpu_cores: null
 track_grad_norm: -1
 val_check_interval: 1.0
 weights_save_path: null
--- a/recipes/Valentini-dataset/28spk/WaveUnet/train_config/trainer/fastrun_dev.yaml
+++ b/recipes/Valentini-dataset/28spk/WaveUnet/train_config/trainer/fastrun_dev.yaml
@ -0,0 +1,2 @@
 _target_: pytorch_lightning.Trainer
 fast_dev_run: True
--- a/recipes/Valentini-dataset/README.md
+++ b/recipes/Valentini-dataset/README.md
@ -0,0 +1,12 @@
 ## Valentini dataset
 Clean and noisy parallel speech database, designed to train and test speech enhancement methods that operate at 48 kHz. A more detailed description can be found in the papers associated with the database; see the [official page](https://datashare.ed.ac.uk/handle/10283/2791).
 **References**
 ```BibTex
@misc{
 title={Noisy speech database for training speech enhancement algorithms and TTS models},
 author={Valentini-Botinhao, Cassia}, year={2017},
 doi={https://doi.org/10.7488/ds/2117},
 }
 ```
--- a/requirements.txt
+++ b/requirements.txt
@ -1,19 +1,19 @@
-# torch>=1.12.1
+boto3>=1.24.86
-# torchaudio>=0.12.1
+huggingface-hub>=0.10.0
-# tqdm>=4.64.1
+hydra-core>=1.2.0
-configparser
+joblib>=1.2.0
-# boto3>=1.24.86
+librosa>=0.9.2
-# huggingface-hub>=0.10.0
+mlflow>=1.28.0
 # hydra-core>=1.2.0
 # joblib>=1.2.0
 # librosa>=0.9.2
 # mlflow>=1.29.0
 numpy>=1.23.3
-# pesq==0.0.4
+pesq==0.0.4
-# protobuf>=3.19.6
+protobuf>=3.19.6
-# pystoi==0.3.3
+pystoi==0.3.3
-# pytest-lazy-fixture>=0.6.3
+pytest-lazy-fixture>=0.6.3
-# pytorch-lightning>=1.7.7
+pytorch-lightning>=1.7.7
-# scikit-learn>=1.1.2
+scikit-learn>=1.1.2
 scipy>=1.9.1
 soundfile>=0.11.0
 torch>=1.12.1
 torch-audiomentations==0.11.0
 torchaudio>=0.12.1
 tqdm>=4.64.1
--- a/setup.cfg
+++ b/setup.cfg
@ -3,7 +3,7 @@
 # http://setuptools.readthedocs.io/en/latest/setuptools.html#configuring-setup-using-setup-cfg-files
 [metadata]
-name = enhancer
+name = mayavoz
 description = Deep learning for speech enhancement
 author = Shahul Ess
 author-email = shahules786@gmail.com
@ -53,7 +53,7 @@ cli =
 [options.entry_points]
 console_scripts =
-    enhancer-train=enhancer.cli.train:train
+    mayavoz-train=mayavoz.cli.train:train
 [test]
 # py.test options when running `python setup.py test`
@ -66,7 +66,7 @@ extras = True
 # e.g. --cov-report html (or xml) for html/xml output or --junitxml junit.xml
 # in order to write a coverage file that can be read by Jenkins.
 addopts =
-    --cov enhancer --cov-report term-missing
+    --cov mayavoz --cov-report term-missing
    --verbose
 norecursedirs =
    dist
@ -98,3 +98,7 @@ exclude =
    build
    dist
    .eggs
 [options.data_files]
 . = requirements.txt
 _ = version.txt
--- a/setup.py
+++ b/setup.py
@ -33,15 +33,15 @@ elif sha != "Unknown":
    version += "+" + sha[:7]
 print("-- Building version " + version)
-version_path = ROOT_DIR / "enhancer" / "version.py"
+version_path = ROOT_DIR / "mayavoz" / "version.py"
 with open(version_path, "w") as f:
    f.write("__version__ = '{}'\n".format(version))
 if __name__ == "__main__":
    setup(
-        name="enhancer",
+        name="mayavoz",
-        namespace_packages=["enhancer"],
+        namespace_packages=["mayavoz"],
        version=version,
        packages=find_packages(),
        install_requires=requirements,
--- a/setup.sh
+++ b/setup.sh
@ -1,13 +0,0 @@
 #!/bin/bash
 set -e
 echo "Loading Anaconda Module"
 module load anaconda
 echo "Creating Virtual Environment"
 conda env create -f environment.yml ||  conda env update -f environment.yml
 source activate enhancer
 echo "copying files"
 # cp /scratch/$USER/TIMIT/.* /deep-transcriber
--- a/tests/loss_function_test.py
+++ b/tests/loss_function_test.py
@ -1,7 +1,7 @@
 import pytest
 import torch
-from enhancer.loss import mean_absolute_error, mean_squared_error
+from mayavoz.loss import mean_absolute_error, mean_squared_error
 loss_functions = [mean_absolute_error(), mean_squared_error()]
Author	SHA1	Message	Date
Shahul ES	7133670252	Merge pull request #46 from izam-mohammed/main Updated conv.py	2023-02-08 16:40:35 +05:30
Izam Mohammed	40031ab084	updated conv.py with changing imag variable	2023-01-24 07:12:45 +05:30
Izam Mohammed	379800d3f6	Merge branch 'shahules786:main' into main	2023-01-24 06:57:04 +05:30
Shahul ES	f21fa24f0e	Merge pull request #43 from izam-mohammed/izam-dev-1 Corrected an error in CONTRIBUTING.md file	2023-01-11 23:03:27 +05:30
Izam Mohammed	afa89749ad	Merge pull request #1 from izam-mohammed/izam-dev-1 Corrected an error in CONTRIBUTING.md file	2023-01-11 12:52:13 +05:30
Izam Mohammed	e6fb143c8f	Corrected an error in CONTRIBUTING.md file	2023-01-11 12:51:32 +05:30
Shahul ES	cd7c008d34	update Readme	2022-12-19 16:15:40 +05:30
Shahul ES	287df5bff4	Merge pull request #35 from shahules786/dev add contribution guidelines	2022-12-07 16:21:01 +05:30
shahules786	b56cdf877a	add contribution guidelines	2022-12-07 16:14:30 +05:30
Shahul ES	915574bd30	Merge pull request #31 from shahules786/dev update readme	2022-12-02 12:48:37 +05:30
shahules786	c0cdb9e6e9	update readme	2022-12-02 12:33:28 +05:30
Shahul ES	d57ef2c10a	Merge pull request #30 from shahules786/dev Update documentations	2022-12-01 10:24:38 +05:30
shahules786	0faa06027f	overlap-add	2022-12-01 10:16:10 +05:30
shahules786	c88a87e109	readme	2022-12-01 10:15:39 +05:30
Shahul ES	fe82b398ee	rename owner	2022-12-01 09:37:43 +05:30
Shahul ES	a47b93b699	Merge pull request #28 from shahules786/dev Update Readme	2022-11-24 16:18:45 +05:30
shahules786	763ea60a52	Merge branch 'dev' of https://github.com/shahules786/enhancer into dev	2022-11-24 15:57:19 +05:30
shahules786	0bca3f9949	test pretrained	2022-11-24 15:56:42 +05:30
Shahul ES	31ab30be04	Update README.md	2022-11-24 15:51:18 +05:30
Shahul ES	dd0b060e09	Update ci.yaml	2022-11-24 15:41:09 +05:30
Shahul ES	1d2c5eee55	Update Readme	2022-11-24 15:38:16 +05:30
Shahul ES	f2111321bf	Merge pull request #27 from shahules786/dev Rename dataset	2022-11-24 12:30:23 +05:30
shahules786	25139d7d3f	add MS-SNSD recipes	2022-11-24 12:15:19 +05:30
shahules786	b343ea3610	rmv cli	2022-11-24 11:57:47 +05:30
shahules786	249c535921	rmv cli	2022-11-24 11:57:04 +05:30
shahules786	2de2c715ed	rename dataset	2022-11-24 11:54:27 +05:30
shahules786	612c022d24	tests	2022-11-24 11:54:09 +05:30
shahules786	8b3bc67529	readme	2022-11-24 11:53:55 +05:30
shahules786	9187a940e7	recipes	2022-11-24 11:10:50 +05:30
shahules786	18c95cf219	cli	2022-11-24 11:10:29 +05:30
shahules786	502cad0984	notebooks	2022-11-24 11:10:05 +05:30
Shahul ES	dd27de7467	Update .gitattributes	2022-11-23 19:19:43 +05:30
Shahul ES	d8d61a231b	Merge pull request #26 from shahules786/dev Testpypi	2022-11-23 19:18:28 +05:30
shahules786	80320bbf92	downgrade mlflow	2022-11-23 17:44:28 +05:30
shahules786	ceb69a09c3	gitattr	2022-11-23 17:32:13 +05:30
shahules786	60b654a065	update readme	2022-11-23 17:27:43 +05:30
shahules786	22a1f27e63	Merge branch 'main' of https://github.com/shahules786/enhancer into dev	2022-11-23 17:10:51 +05:30
shahules786	65f1924593	setup	2022-11-23 17:10:20 +05:30
shahules786	9525d2491f	include files	2022-11-23 17:09:49 +05:30
Shahul ES	f94bd22eb6	Merge pull request #25 from shahules786/dev dev install	2022-11-21 22:04:25 +05:30
shahules786	f1fe1a803a	notebooks	2022-11-21 21:43:28 +05:30
Shahul ES	386931d09b	Update README.md	2022-11-15 22:06:48 +05:30
shahules786	9927542713	notebooks	2022-11-15 22:05:39 +05:30
Shahul ES	da85de13ad	Merge pull request #24 from shahules786/dev Minor improvements/bug fixes	2022-11-15 22:03:45 +05:30
shahules786	9ee809a047	rename to train	2022-11-15 21:51:45 +05:30
shahules786	7afe928ee1	relative imports	2022-11-15 21:51:28 +05:30
shahules786	434b44ddc9	minor fixes	2022-11-15 21:51:06 +05:30
shahules786	191c6a7499	add warnings	2022-11-15 21:50:24 +05:30
shahules786	b99ef95719	train config	2022-11-15 21:45:07 +05:30
shahules786	2bfca78caa	fix duration	2022-11-15 21:42:02 +05:30
shahules786	003bab91f9	tests	2022-11-15 21:39:47 +05:30
shahules786	d9b817f650	gitignore	2022-11-15 21:39:35 +05:30
shahules786	90fbfbce73	examples	2022-11-15 21:39:18 +05:30
Shahul ES	7c7db84c39	update readme	2022-11-15 15:11:23 +05:30
Shahul ES	a4f0fda6a5	Merge pull request #23 from shahules786/dev rename package	2022-11-15 15:08:06 +05:30
shahules786	8bc63becce	rename dataset	2022-11-15 14:33:27 +05:30
shahules786	bfd53937c2	rename to mayamodel	2022-11-15 14:29:04 +05:30
shahules786	ba63c54399	ci-cd	2022-11-14 16:31:51 +05:30
shahules786	12cde1b0ab	change save name	2022-11-14 16:30:14 +05:30
shahules786	f8a44f823a	fix typo	2022-11-14 16:19:57 +05:30
shahules786	7838e744a9	rename package	2022-11-14 11:37:26 +05:30
shahules786	1abc450ef8	Merge branch 'dev' of https://github.com/shahules786/enhancer into dev	2022-11-14 10:50:05 +05:30
shahules786	4a2865ff03	negate si-snr	2022-11-14 10:48:31 +05:30
Shahul ES	0e664ed371	Update readme	2022-11-10 19:36:02 +05:30
Shahul ES	cb6f9c20ed	update readme	2022-11-10 19:30:59 +05:30
Shahul ES	8e4c12b98d	update readme	2022-11-10 17:29:01 +05:30
Shahul ES	a0e38c5e5c	Merge pull request #22 from shahules786/dev Dev	2022-11-10 17:27:31 +05:30
Shahul ES	ebba5952e5	Merge pull request #20 from shahules786/dev-recipe recipes and tutorials	2022-11-10 17:09:27 +05:30
shahules786	69c7a0100c	recipes	2022-11-10 16:54:53 +05:30
shahules786	470ec74bcb	add license	2022-11-10 16:27:55 +05:30
shahules786	a2e083b315	add cache	2022-11-10 16:01:06 +05:30
shahules786	252d380acc	rmv badge:	2022-11-10 13:58:39 +05:30
shahules786	4eff036c1c	add cli tutorials	2022-11-10 12:03:59 +05:30
shahules786	d2a7e3c730	add badges	2022-11-10 11:15:32 +05:30
shahules786	3b8551640f	add sheilds	2022-11-10 10:52:36 +05:30
Shahul ES	1d366d6096	Merge pull request #21 from shahules786/dev Merge changes to main	2022-11-10 10:43:12 +05:30
shahules786	d90db16bce	remove hawk files	2022-11-10 10:35:50 +05:30
shahules786	e941235ec0	mv coeff to device	2022-11-10 10:34:48 +05:30
shahules786	27ddf0bec9	advanced tutorial	2022-11-09 13:32:27 +05:30
shahules786	bc13fc03bf	readme	2022-11-08 17:09:56 +05:30
shahules786	7a502671e2	check github copy	2022-11-08 17:01:27 +05:30
shahules786	6384915e17	recipes	2022-11-08 16:48:20 +05:30
shahules786	94ab778c0b	update links	2022-11-08 13:00:11 +05:30
shahules786	ef06786d8c	getting started	2022-11-08 12:45:09 +05:30
shahules786	e0fbf55dca	add recipes table	2022-11-07 20:03:21 +05:30
shahules786	ed210a8c60	mv coeff to input device	2022-11-07 16:00:47 +05:30
shahules786	3cbd0ba7cc	mv coeff to device	2022-11-07 13:00:34 +05:30
shahules786	82308750dc	add direction si-snr	2022-11-07 12:28:25 +05:30
shahules786	234e1a89de	Merge branch 'dev' of https://github.com/shahules786/enhancer into dev	2022-11-07 12:01:36 +05:30
shahules786	47cfc84295	add si-snr	2022-11-07 12:01:20 +05:30
Shahul ES	4adb388a34	Merge pull request #18 from shahules786/dev-dccrn DCCRNET implementation	2022-11-07 11:38:57 +05:30
shahules786	6626ad75e7	fix tests	2022-11-07 11:34:21 +05:30
shahules786	6573bc4c5e	ensure num_channels	2022-11-07 11:33:00 +05:30
shahules786	77699ce7f9	fix tests	2022-11-07 11:15:30 +05:30
shahules786	1a4102cc53	dccrn	2022-11-07 10:53:08 +05:30
shahules786	40e8722014	fix o/p shape	2022-11-07 10:52:35 +05:30
shahules786	15c1d1ad94	fix batchnorm eval() mode	2022-11-07 10:52:11 +05:30
shahules786	511d2141d4	DCCRN implementation	2022-11-07 10:26:51 +05:30
shahules786	fc33bd83b6	transforms test	2022-11-07 10:25:54 +05:30
shahules786	c1d5e56ec0	transforms test	2022-11-07 10:25:27 +05:30
shahules786	d7f3847917	add complex-cat	2022-11-07 10:24:47 +05:30
shahules786	60fc4607d0	init projection_size as None	2022-11-07 10:24:18 +05:30
shahules786	c21f05e307	fix padding & init	2022-11-07 10:23:46 +05:30
shahules786	70d17f6586	add imports	2022-11-05 16:59:04 +05:30
shahules786	2e4a3cd254	add imports	2022-11-05 16:58:50 +05:30
shahules786	4388820921	add imports	2022-11-05 16:58:16 +05:30
shahules786	e2e413f8f3	rmv	2022-11-05 16:55:23 +05:30
shahules786	a3b20d5ddb	fix imports	2022-11-05 16:40:19 +05:30
shahules786	b98599f21e	rename module	2022-11-05 16:36:27 +05:30
shahules786	981763207a	init dccrn	2022-11-05 16:35:57 +05:30
shahules786	d3e052c5f3	complex batchnorm 2d test	2022-11-03 16:06:14 +05:30
shahules786	da1b986d31	complex batchnorm 2d	2022-11-03 16:05:55 +05:30
shahules786	e932dc6c75	batchnorm	2022-11-03 11:37:58 +05:30
Shahul ES	a082474034	Merge pull request #19 from shahules786/dev-loss Support custom loss functions	2022-11-03 09:53:25 +05:30
shahules786	b857754626	add documentation	2022-11-02 18:00:05 +05:30
shahules786	7e298b811f	rmv typo	2022-11-02 17:57:44 +05:30
shahules786	2f85f48d69	add support for custom loss	2022-11-02 17:57:30 +05:30
shahules786	b1144e7b81	tests complexnn	2022-11-01 10:35:49 +05:30
shahules786	0b50a573e8	complex lstm	2022-11-01 10:35:30 +05:30
shahules786	7abd266ab2	test complexnn	2022-10-31 11:43:50 +05:30
shahules786	26cccc6772	complex tranposed conv	2022-10-31 11:43:32 +05:30
shahules786	6f6e7f7ad8	init	2022-10-29 13:20:04 +05:30
shahules786	cf1e5c07a9	test transforms	2022-10-29 11:35:35 +05:30
shahules786	c18a85b5c8	stft	2022-10-29 11:34:51 +05:30
shahules786	7f3dcf39c5	rmv padding_mode	2022-10-29 10:39:32 +05:30
shahules786	6f1acf0423	Revert "add random sampler" This reverts commit `aa52d1ed93`.	2022-10-29 10:33:59 +05:30
shahules786	ad208ca0a0	add padding	2022-10-29 09:41:56 +05:30
shahules786	aa52d1ed93	add random sampler	2022-10-28 13:06:49 +05:30
shahules786	fb2543e81e	fix typo	2022-10-27 16:18:31 +05:30
Shahul ES	a1445b0a95	Merge pull request #17 from shahules786/dev-datafix foolproof iteration	2022-10-27 15:21:54 +05:30
shahules786	e1963ff001	split validation criterion	2022-10-27 15:19:02 +05:30
shahules786	085a85d9ae	fourier transforms using cnn	2022-10-27 11:32:50 +05:30
shahules786	47bbee2c32	rmv augmentations	2022-10-26 21:47:29 +05:30
shahules786	c51dea6885	revert to torchmetric pesq	2022-10-26 21:46:19 +05:30
shahules786	1edc10e9f5	time shift	2022-10-26 12:01:19 +05:30
shahules786	ee40259a8d	fix iterator	2022-10-26 12:00:57 +05:30
shahules786	f07c8741ba	fix resampling	2022-10-26 11:59:58 +05:30
shahules786	24a06ba9be	rename loss	2022-10-26 10:27:23 +05:30
shahules786	04782ba6e9	fix optimizer scheduler	2022-10-26 10:26:27 +05:30
shahules786	23da02d47d	dccrn	2022-10-26 09:36:55 +05:30
shahules786	485a74fc4e	convt stft	2022-10-26 09:36:28 +05:30
shahules786	58de41598e	change matrix	2022-10-25 15:10:36 +05:30
shahules786	4acad6ede8	fix augmentation	2022-10-25 15:10:13 +05:30
shahules786	b070613b64	config"	2022-10-25 12:48:37 +05:30
shahules786	d1bafb3dc6	add augmentations	2022-10-25 12:43:54 +05:30
shahules786	cdffe5c485	DEMUCS w/o stride	2022-10-25 10:57:07 +05:30
shahules786	03d0dc57fc	add torch audiomentations	2022-10-24 22:13:19 +05:30
shahules786	542ab23d8a	add torch-augmentations	2022-10-24 21:50:30 +05:30
shahules786	5dc5fd8f90	default stride None	2022-10-24 21:15:25 +05:30
shahules786	75ebef2462	Waveunet w/o stride	2022-10-24 10:01:54 +05:30
shahules786	101ee563cb	decrease precision	2022-10-23 19:30:46 +05:30
shahules786	97b4a61d9c	half BS	2022-10-23 19:07:53 +05:30
shahules786	460366bd8b	min conf acc ablation study	2022-10-23 17:15:17 +05:30
shahules786	3128fed71e	params	2022-10-23 12:38:20 +05:30
shahules786	fc41de1530	VCTK + DEMUCS	2022-10-23 12:36:43 +05:30
shahules786	ea5c78798a	model assigment'	2022-10-23 12:33:38 +05:30
shahules786	40e2d6e0b0	change to mapstyle	2022-10-23 12:32:58 +05:30
shahules786	02192e5567	to cpu	2022-10-22 12:00:30 +05:30
shahules786	6eb905c1bb	rmv print statements	2022-10-22 12:00:18 +05:30
shahules786	9f658424a6	rmv slicing	2022-10-22 11:18:32 +05:30
shahules786	5f1ed8c725	iterable dataset	2022-10-22 11:17:37 +05:30
shahules786	05e40f84b6	replace pesq	2022-10-22 11:17:22 +05:30
shahules786	9b15534812	print len	2022-10-22 11:05:39 +05:30
shahules786	6314d210c3	debug git commit -m debug '	2022-10-22 11:05:19 +05:30
shahules786	7fa54fc414	debug	2022-10-22 10:30:27 +05:30
shahules786	c4a27686da	debug	2022-10-22 09:57:27 +05:30
shahules786	8457e1cbe2	debug num_workers	2022-10-21 23:23:37 +05:30
shahules786	cd9ffc1a68	fix randomization	2022-10-21 23:22:56 +05:30
shahules786	a75f3c32a3	num_workers	2022-10-21 19:23:59 +05:30
shahules786	a7fb27bb0f	debug	2022-10-21 17:17:02 +05:30
shahules786	20c12556ff	debug	2022-10-21 16:25:24 +05:30
shahules786	9c7a650130	div by batchsize in __len__	2022-10-21 11:37:26 +05:30
shahules786	5d7ea582c9	debug	2022-10-21 11:18:24 +05:30
shahules786	0d3bfd3412	debug	2022-10-21 11:13:17 +05:30
shahules786	178a4523ef	fix worker init fn	2022-10-21 09:48:28 +05:30
shahules786	ba10719520	add arg	2022-10-20 21:03:38 +05:30
shahules786	f2561d7cf7	config	2022-10-20 09:53:27 +05:30
shahules786	c5824cb34a	gitignore	2022-10-20 09:53:06 +05:30
shahules786	a6a2e4a4ae	add batch info	2022-10-20 09:50:04 +05:30
shahules786	2ad49faa67	debug iterative dataset	2022-10-20 09:49:27 +05:30
shahules786	e4f13946e8	fix demucs output	2022-10-19 12:38:29 +05:30
shahules786	edb7f020f7	stride waveform	2022-10-18 15:23:07 +05:30
shahules786	415ed8e3d0	normalize input	2022-10-18 15:22:34 +05:30
shahules786	e118c31f18	specify valid size in mins	2022-10-17 13:10:22 +05:30
shahules786	dab7e73d53	DNS 2020	2022-10-16 11:14:13 +05:30
shahules786	d99fd0eb61	fix duration estimation	2022-10-16 11:13:44 +05:30
shahules786	0910d9ac84	fix dns loader	2022-10-15 12:23:51 +05:30
@ -1,2 +0,0 @@
 experiment_name : shahules/enhancer
 run_name : baseline
@ -1 +0,0 @@
 from enhancer.data.dataset import EnhancerDataset
@ -1 +1,2 @@
 __import__("pkg_resources").declare_namespace(__name__)
+from mayavoz.models import Mayamodel
@ -0,0 +1,2 @@
 experiment_name : shahules/mayavoz
 run_name : Demucs + Vtck with stride + augmentations
@ -0,0 +1 @@
 from mayavoz.data.dataset import MayaDataset
@ -0,0 +1,2 @@
 _target_: pytorch_lightning.Trainer
 fast_dev_run: True
@ -0,0 +1,2 @@
 experiment_name : shahules/mayavoz
 run_name : demucs-ms-snsd