Don't underestimate your own edge
It is May 27th 2024. Nvidia’s stock is at $1064.69.
This is a list of stuff I read or saw in 2023 and really liked. I’m going to say a sentence or more about each and maybe include a quote. I’m hoping somebody who stumbles across this finds something they really like that they otherwise wouldn’t have found, or that someone who knows me discovers an interest in common we didn’t know we had!
The new ASR toolkit k2/icefall gets great results while training models quickly. This is an explanation of how it does that: by calculating the transducer loss efficiently, it uses much less memory. Code is also shown.
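To make the memory problem concrete: a naive transducer implementation materializes a logit tensor of shape (batch, frames, labels + 1, vocab). A back-of-the-envelope sketch, with made-up but typical sizes:

```python
# Rough illustration of why the naive transducer loss is memory-hungry.
# These sizes are hypothetical but in a realistic range for ASR training.
B, T, U, V = 32, 500, 100, 5000  # batch, encoder frames, label length, vocab

# Naive implementations materialize logits of shape (B, T, U + 1, V).
num_floats = B * T * (U + 1) * V
gigabytes = num_floats * 4 / 1e9  # float32
print(f"{gigabytes:.1f} GB just for the logits")  # ~32 GB
```

As I understand it, the pruned loss only evaluates the expensive joiner on a narrow band of that (frames, labels) grid, which is where the savings come from.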
First off, I understand the need for a tool that keeps teammates from bickering with each other, and if I joined a team using black I would follow its rules.
Contrastive learning has become very popular; see here for a good overview of recent papers.
Premature abstraction is something most people are aware of, but I think a more common mistake is the “do everything” abstraction, as in the sketch below.
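A hypothetical illustration (all names made up):

```python
# A "do everything" abstraction: one entry point grows a flag for every
# caller's needs until nobody can predict what a given call actually does.
def process(data, normalize=True, as_json=False, retries=3,
            cache=None, legacy_mode=False, validate="strict"):
    ...

# Narrower pieces compose better and are easier to reason about:
def normalize(data): ...
def to_json(data): ...
def validate_strict(data): ...
```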
I used to be quite skeptical of E2E ASR. I thought that yes, the approach was interesting and worth investigating, but it felt like it was putting too much responsibility on the shoulders of a single system (the neural network) with no priors attached. It did not feel like there was an advantage to it other than simplicity (which by itself will not help performance).
2024 Update: The title seems blindingly obvious in light of the current trend of training on trillions of words. Still, I think it’s worth showing concretely, for those not aware, just how surprisingly long the tail of language is.
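If you want to see the tail yourself, a quick sketch (the file path is a placeholder, any large text file will do):

```python
from collections import Counter

# Count word frequencies in any large text file (path is a placeholder).
with open("corpus.txt") as f:
    counts = Counter(f.read().split())

total_types = len(counts)
singletons = sum(1 for c in counts.values() if c == 1)
# On most natural corpora a large fraction of word types occur exactly once.
print(f"{singletons / total_types:.0%} of word types appear only once")
```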
BPE is a remarkably effective algorithm for finding a set of subwords. Just count pairs of tokens, merge the most frequent one, and repeat until you have the desired number of subwords. Why does this work, and why would just picking the k most frequent n-grams not?
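For concreteness, here is a minimal sketch of the merge loop, following the classic algorithm rather than any particular library:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """words: dict mapping whitespace-split word -> corpus frequency."""
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = {tuple(w): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge everywhere the best pair occurs.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# Toy corpus: frequent character sequences like "es" and "est" get merged first.
print(bpe_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 4))
```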
This post is about the Python-based tool (“texterrors”) I created for getting error metrics (relevant for ASR). It is split into two parts: first a refresher on standard WER calculation and an illustration of how this can be suboptimal when you are interested in analysing errors, then an introduction to the approach I use, which fixes the problems mentioned. You can skip to the second part by clicking here.
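For reference, standard WER is just word-level Levenshtein distance divided by the reference length; a minimal version (not the texterrors implementation) might look like:

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) / len(ref)."""
    r, h = ref.split(), hyp.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / len(r)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words ≈ 0.33
```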
Here I’m going to describe methods for using kaldi for decoding when you want to do something a bit custom. I will use an OpenFST wrapper, and scripts built on it, which can be found here.
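This isn't the wrapper from the post, but with OpenFST's own Python wrapper (pywrapfst) the generic pattern of custom decoding, composing an input with a decoding graph and taking the best path, looks roughly like this (file names are placeholders):

```python
# Not the wrapper used in the post; this is OpenFST's bundled Python
# wrapper (pywrapfst), showing the generic compose-and-best-path pattern.
import pywrapfst as fst

graph = fst.Fst.read("HCLG.fst")      # decoding graph (placeholder path)
inp = fst.Fst.read("utterance.fst")   # e.g. a lattice or linear acceptor

# Composition requires the arcs to be sorted on the matching side.
inp.arcsort(sort_type="olabel")
best = fst.shortestpath(fst.compose(inp, graph))
print(best.text())
```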
This is about the .ark and .scp files that are used with kaldi and have spread to other toolkits like ESPNet. It’s not complicated to understand them, but I’ve noticed that a surprising number of people who use them don’t. This is meant to be a concise summary of what they are.
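The key fact: the .ark file is the archive holding the actual data, while the .scp file is a plain-text index into it, where each line is a key followed by an archive path and (when written with offsets) a byte offset. A toy sketch of reading one (file names are placeholders):

```python
# An .scp file is just a text index: "key path/to/archive.ark:byte_offset".
# The .ark holds the data; the offset says where that key's record starts.
# (File names here are placeholders.)
with open("feats.scp") as f:
    for line in f:
        key, rxspec = line.strip().split(None, 1)
        path, offset = rxspec.rsplit(":", 1)
        print(key, "->", path, "at byte", int(offset))
```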