Dialpad, Inc.
Senior Speech Recognition Engineer Dec 2019 - Present
- Architected and built a next-gen speech recognition product end-to-end, from R&D to production, that is toolkit-agnostic and outperforms HMM hybrid ASR models
- Led the R&D on streaming end-to-end ASR for conversational, telephony, and videoconferencing speech under low-latency and multi-accent scenarios
- Benchmarked various toolkits including Kaldi, K2, ESPnet, NeMo, and WeNet to architect the next-gen ASR system
- Trained various end-to-end ASR architectures (CTC, Attention-based Encoder-Decoder (AED), Transducer, Transformer, and Conformer models) and benchmarked them against hybrid ASR models and external ASR services
- Developed interfaces for the shallow fusion of multi-level (sub-word and word) RNNLMs and n-gram LMs
- Developed methods to bias the models towards a list of keywords, resulting in an absolute WERR of 7%
- Automated the data preparation pipeline for training ASR models, reducing experiment turnaround time and increasing the team's productivity
- Developed pronunciation-assisted sub-word models using fast-align, GIZA++, and Pynini, resulting in an absolute WERR of 3% compared to BPE sub-words
- Applied post-training quantization to ASR models, achieving 50% faster RTF and 75% smaller models on disk
- Implemented ASR inference in ONNX Runtime, reducing latency by 3x
- Developed performance monitoring techniques for end-to-end ASR models based on RNN-AED and CTC confidence scores, and studied their efficacy in semi-supervised and self-supervised learning
- Developed better endpoint detection for hybrid models and achieved 4% relative WERR
- Developed a web app for internal users to query production calls and visualize hypotheses using wavesurfer.js
Open-source contributions:
- MUCS 2021: MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages
- Third Prize in the challenge
- Team contributions to multilingual and low-resource ASR for Indian Languages. Benchmarking and open-sourcing various end-to-end methods and studying effects of channel distortions on language identification
- Code available here
Observe AI
Machine Learning Intern - ASR May 2019 - Aug 2019
- Developed a feature extraction pipeline using tf.signal and tf.data
- Implemented keyword-spotting (KWS) approaches from different papers: Deep-KWS and CTC KWS
- Developed methods to convert a custom PyTorch model to TensorFlow
- Deployed the KWS model using TensorFlow Serving with an RTF of 0.05 on GPU
IIIT-Bangalore
Research Scholar Jan 2017 - Dec 2019
- Developed end-to-end methods for multilingual and code-switching scenarios in Indian Languages
- Developed joint ASR and KWS systems using joint phoneme-grapheme recognition
- Developed a more accurate and faster training method by jointly training the alignment and ASR models
- Mentored MTech and iMTech students in their projects and thesis work and delivered various tutorials and talks around ASR
- Developed a remote hardware laboratory to study control systems using embedded programming and web technologies
- Developed HCI visualizations for a humanoid using Unity and C#
I was also involved in different labs and activities including:
- E-Health Research Center (EHRC)
- Developed rehabilitation robotics applications
- Machine Intelligence and Robotics Center (MINRO)
- Developed multilingual applications of speech and language technologies in the domain of e-governance
- Intel AI Academy Student Ambassador
- Built a small-footprint ASR application (keyword spotting, wake-word detection) on the “edge” using Intel’s Neural Compute Stick 2 (NCS2) and OpenVINO
- Graduate Teaching Assistant
- Deep Learning for Automatic Speech Recognition
- Automatic Speech Recognition
- Introduction to Robotics
Invited Talks
IIIT-B Samvaad talk Bengaluru, IN | Dec 2020
- Multi-task learning in end-to-end attention-based automatic speech recognition (MS Thesis)
- Open challenges in multi-task learning for ASR
IIIT-B Guest Lecture Series - Deep Learning for ASR Bengaluru, IN | Sep 2020 - Dec 2020
- Discussions on RNN-CTC, RNN-AED, RNN-T, and Transformer models for ASR
- Discussions on model quantization and weight sparsity in RNN-T models for low computational resource and latency constraints
- Unpacking and analyzing the Pixel Recorder app to showcase how TFLite models are packaged with custom TensorFlow ops
Artificial Intelligence : A Way Forward Bengaluru, IN | Sep 2019
- Faculty development program at Dayananda Sagar College of Arts, Science and Commerce, Bangalore
- Discussions on the use of AI in speech and language technology
TCS Think Labs Bengaluru, IN | Feb 2019
- Motivation and introduction to end-to-end ASR
- Discussion of RNN, CTC, attention, and LM fusion
IIIT-B AI Reading Group Bengaluru, IN | Nov 2018
- Discussions on various attention models in end-to-end ASR
- Semi-supervised learning with end-to-end ASR models
BMSCE AI Workshop Bengaluru, IN | Sep 2018
- Artificial Intelligence and Deep Neural Networks Workshop for undergraduate students at BMS College of Engineering, Bangalore
- Code examples and tutorials in TensorFlow Keras
Sonus Networks
SVT Engineer Aug 2015 - Jan 2017
- Worked as part of Sustaining SVT on the real-time communication products Sonus Insight (EMS) and SBC
- Developed automated test frameworks in Python, Perl, and Java on Linux
- Worked with CentOS, Red Hat Enterprise Linux, and Solaris to develop and test the products
- Developed tools that reduced team effort from many hours to a couple of minutes