Dialpad, Inc.

Speech Recognition Engineer Dec 2019 - Present

  • Architected and built a next-generation speech recognition product end-to-end, from R&D to production, that is toolkit-agnostic and outperforms HMM-hybrid ASR models
  • Led R&D on streaming end-to-end ASR for conversational, telephony, and videoconferencing speech under low-latency and multi-accent conditions
  • Benchmarked toolkits including Kaldi, k2, ESPnet, NeMo, and WeNet to inform the architecture of the next-generation ASR system
  • Trained end-to-end ASR models using CTC, Attention-based Encoder-Decoder (AED), and Transducer approaches with Transformer and Conformer architectures, and benchmarked them against hybrid ASR models and external ASR services
  • Developed interfaces for shallow fusion of multi-level (sub-word and word) RNNLMs and n-gram LMs
  • Developed methods to bias the models toward a list of keywords, resulting in a 7% absolute word error rate reduction (WERR)
  • Automated the data preparation pipeline for training ASR models, reducing experiment turnaround time and increasing team productivity
  • Developed pronunciation-assisted sub-word models using fast_align, GIZA++, and Pynini, resulting in a 3% absolute WERR compared to BPE sub-words
  • Applied post-training quantization to ASR models, reducing the real-time factor (RTF) by 50% and on-disk model size by 75%
  • Implemented ASR inference in ONNX Runtime, reducing latency by 3x
  • Developed performance-monitoring techniques for end-to-end ASR models based on RNN-AED and CTC confidence scores, and studied their efficacy in semi-supervised and self-supervised learning
  • Improved endpoint detection for hybrid models, achieving a 4% relative WERR
  • Developed a web app for internal users to query production calls and visualize hypotheses using wavesurfer.js

Open-source contributions:

  • MUCS 2021: MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages
    • Won third prize in the challenge
    • Contributed as part of a team to multilingual and low-resource ASR for Indian languages: benchmarked and open-sourced various end-to-end methods and studied the effects of channel distortions on language identification

Observe AI

Machine Learning Intern - ASR May 2019 - Aug 2019

  • Developed a feature extraction pipeline using tf.signal and tf.data
  • Implemented keyword-spotting (KWS) methods from the literature, including Deep-KWS and CTC-based KWS
  • Developed methods to convert a custom PyTorch model to TensorFlow
  • Deployed the KWS model using TensorFlow Serving with an RTF of 0.05 on a GPU


Research Scholar Jan 2017 - Dec 2019

  • Developed end-to-end methods for multilingual and code-switching scenarios in Indian languages
  • Developed joint ASR and KWS systems using joint phoneme-grapheme recognition
  • Developed a more accurate and faster training method by jointly training the alignment and ASR models
  • Mentored MTech and iMTech students on their projects and theses, and delivered tutorials and talks on ASR
  • Developed a remote hardware laboratory to study control systems using embedded programming and web technologies
  • Developed HCI visualizations for a humanoid robot using Unity and C#

I was also involved in different labs and activities including:

Invited Talks

  • IIIT-B Samvaad talk Bengaluru, IN | Dec 2020

    • Multi-task learning in end-to-end attention-based automatic speech recognition (MS Thesis)
    • Open challenges in multi-task learning for ASR
  • IIIT-B Guest Lecture Series - Deep Learning for ASR Bengaluru, IN | Sep 2020 - Dec 2020

    • Discussions on RNN-CTC, RNN-AED, RNN-T, and Transformer models for ASR
    • Discussions on model quantization and weight sparsity in RNN-T models under low-compute and low-latency constraints
    • Unpacking and analyzing the Pixel Recorder app to show how TFLite models are packaged with custom TensorFlow ops
  • Artificial Intelligence: A Way Forward Bengaluru, IN | Sep 2019

    • Faculty development program at Dayananda Sagar College of Arts, Science and Commerce, Bangalore
    • Discussions on the use of AI in speech and language technology
  • TCS Think Labs Bengaluru, IN | Feb 2019

    • Motivation and introduction to end-to-end ASR
    • Discussions on RNNs, CTC, attention, and LM fusion
  • IIIT-B AI Reading Group Bengaluru, IN | Nov 2018

    • Discussions on various attention models in end-to-end ASR
    • Semi-supervised learning with end-to-end ASR models
  • BMSCE AI Workshop Bengaluru, IN | Sep 2018

    • Artificial Intelligence and Deep Neural Networks Workshop for undergraduate students at BMS College of Engineering, Bangalore
    • Code examples and tutorials in TensorFlow Keras

Sonus Networks

SVT Engineer Aug 2015 - Jan 2017

  • Worked on the Sustaining SVT team for the real-time communication products Sonus Insight (EMS) and SBC
  • Developed automated test frameworks in Python, Perl, and Java on Linux
  • Worked with CentOS, Red Hat Enterprise Linux, and Solaris to develop and test the products
  • Developed tools that cut tasks requiring many hours of team effort down to a couple of minutes