Work - Shree

Dialpad, Inc.

Senior Speech Recognition Engineer Dec 2019 - Present

Architected and built a next-gen Speech Recognition product end-to-end, from R&D to production, that is toolkit-agnostic and outperforms HMM hybrid ASR models
Led the R&D on streaming end-to-end ASR for conversational, telephony, and videoconferencing speech under low-latency and multi-accent scenarios
Benchmarked various toolkits including Kaldi, K2, ESPnet, NeMo, and WeNet to architect the next-gen ASR system
Trained various end-to-end ASR architectures, including CTC, Attention-based Encoder-Decoder (AED), Transducer, Transformer, and Conformer models, and benchmarked them against hybrid ASR models and external ASR services
Developed interfaces for the shallow fusion of multi-level (sub-word and word) RNNLMs and n-gram LMs
Developed methods to bias the models towards a list of keywords, resulting in an absolute WERR of 7%
Automated the data preparation pipeline for training ASR models, reducing the turnaround time for experiments and increasing the team's productivity
Developed pronunciation-assisted sub-word models using fast-align, GIZA++, and Pynini, resulting in an absolute WERR of 3% compared to BPE sub-words
Applied post-training quantization to ASR models, achieving 50% faster RTF and 75% smaller models on disk
Implemented the ASR inference in ONNX runtime, reducing the latency by 3x
Developed performance monitoring techniques for end-to-end ASR models based on RNN-AED and CTC confidence scores, and studied their efficacy in semi-supervised and self-supervised learning
Developed better endpoint detection for hybrid models and achieved 4% relative WERR
Developed a web app for internal users to query production calls and visualize hypotheses using wavesurfer.js

Open-source contributions:

MUCS 2021: MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages
- Third Prize in the challenge
- Team contributions to multilingual and low-resource ASR for Indian Languages. Benchmarking and open-sourcing various end-to-end methods and studying effects of channel distortions on language identification
- Code available here

Observe AI

Machine Learning Intern - ASR May 2019 - Aug 2019

Developed a feature extraction pipeline using tf.signal and tf.data
Implemented keyword-spotting (KWS) methods from several papers, including Deep-KWS and CTC KWS
Developed methods to convert a custom PyTorch model to TensorFlow
Deployed the KWS model using TensorFlow Serving with an RTF of 0.05 on GPU

IIIT-Bangalore

Research Scholar Jan 2017 - Dec 2019

Developed end-to-end methods for multilingual and code-switching scenarios in Indian Languages
Developed joint ASR and KWS systems using joint phoneme-grapheme recognition
Developed a more accurate and faster training method by jointly training the alignment and ASR models
Mentored MTech and iMTech students on their projects and thesis work, and delivered various tutorials and talks on ASR
Developed a remote hardware laboratory to study control systems using embedded programming and web technologies
Developed HCI visualizations for a humanoid using Unity and C#

I was also involved in different labs and activities including:

E-Health Research Center (EHRC)
- Developed rehabilitation robotics applications
Machine Intelligence and Robotics Center (MINRO)
- Developed multilingual applications of Speech and Language Technologies in the domain of e-governance
Intel AI Academy Student Ambassador
- Built a small-footprint ASR application (keyword spotting, wake-word detection) on the “edge” using Intel’s Neural Compute Stick 2 (NCS2) and OpenVINO
Graduate Teaching Assistant
- Deep Learning for Automatic Speech Recognition
- Automatic Speech Recognition
- Introduction to Robotics

Invited Talks

IIIT-B Samvaad talk Bengaluru, IN | Dec 2020
- Multi-task learning in end-to-end attention-based automatic speech recognition (MS Thesis)
- Open challenges in multi-task learning for ASR
IIIT-B Guest Lecture Series - Deep Learning for ASR Bengaluru, IN | Sep 2020 - Dec 2020
- Discussions on RNN-CTC, RNN-AED, RNN-T, and Transformer models for ASR
- Discussions on model quantization and weight sparsity in RNN-T models for low computational resource and latency constraints
- Unpacking and analyzing the Pixel Recorder app to showcase how TFLite models are packed with custom TensorFlow ops
Artificial Intelligence: A Way Forward Bengaluru, IN | Sep 2019
- Faculty development program at Dayananda Sagar College of Arts, Science and Commerce, Bangalore
- Discussions on the use of AI in speech and language technology
TCS Think Labs Bengaluru, IN | Feb 2019
- Motivation and introduction to end-to-end ASR
- Discussion of RNNs, CTC, attention, and LM fusion
IIIT-B AI Reading Group Bengaluru, IN | Nov 2018
- Discussions on various attention models in end-to-end ASR
- Semi-supervised learning with end-to-end ASR models
BMSCE AI Workshop Bengaluru, IN | Sep 2018
- Artificial Intelligence and Deep Neural Networks Workshop for undergraduate students at BMS College of Engineering, Bangalore
- Code examples and tutorials in TensorFlow Keras

Sonus Networks

SVT Engineer Aug 2015 - Jan 2017

Worked on the Sustaining SVT team for the real-time communication products Sonus Insight (EMS) and SBC
Developed automated test frameworks in Python, Perl, and Java on Linux
Worked with CentOS, Red Hat Enterprise Linux, and Solaris to develop and test the products
Developed tools that reduced team effort from many hours to a couple of minutes