Don't underestimate your own edge

It is May 27th 2024. Nvidia’s stock is at $1064.69.

Favourites of 2023

This is a list of stuff I read or saw in 2023 and really liked. I’m going to say a sentence or more about each and maybe include a quote. I’m hoping somebody who stumbles across this finds something they really like that they otherwise wouldn’t have found, or that someone who knows me discovers an interest in common we didn’t know we had!

How k2 calculates the transducer loss quickly

The new ASR toolkit k2/icefall gets great results while training models quickly. This post explains how it does that: by calculating the transducer loss efficiently, it uses much less memory. Code is also shown.

Why I don't like the black code formatter

First off, I understand the need for a tool to keep teammates from bickering with each other, and if I joined a team using black I would follow their rules.

Why the Temperature Matters for Contrastive Loss

Contrastive learning has become very popular recently; see here for a good overview of recent papers.

The do everything abstraction

Premature abstraction is something most people are aware of, but I think a more common mistake is the “do everything” abstraction.

Changing My Mind On E2E ASR

I used to be quite skeptical of E2E ASR. I thought that yes, the approach was interesting and worth investigating, but it felt like it was putting too much responsibility on the shoulders of a single system (the neural network) with no priors attached. It did not feel like there was an advantage to it other than simplicity (which by itself will not help performance).

Why you need (at least) a billion words to get a good language model

2024 Update: The title seems blindingly obvious in light of the current trend of training on trillions of words. Still, I think it’s worth showing practically, for those not aware, just how surprisingly long the tail of language is.

Deriving BPE from scratch

BPE is a remarkably effective algorithm for finding a set of subwords. Just count adjacent pairs of tokens, merge the most frequent pair, and repeat until you have the desired number of subwords. Why does this work, and why would just picking the k most frequent ngrams not?
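To make the loop concrete, here is a minimal sketch of that procedure in Python. It is my own toy illustration rather than the post’s code, and the function name and arguments are made up.

```python
# A minimal sketch of the BPE merge loop: start from characters and repeatedly
# merge the most frequent adjacent pair of symbols.
from collections import Counter

def learn_bpe(words, num_merges):
    # Represent each word as a tuple of symbols, initially single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair of symbols, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the merge everywhere it occurs.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges

print(learn_bpe(["low", "lower", "lowest", "low"], num_merges=3))
```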

On WER in ASR

This post is about the python-based tool (“texterrors”) I created for getting error metrics (relevant for ASR). It is split into two parts: first a refresher on standard WER calculation and an illustration of how this can be suboptimal when you are interested in analysing errors, then an introduction to the approach I use, which fixes the problems mentioned. You can skip to the second part by clicking here.
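As a quick reminder of the standard calculation the first part refers to, here is a toy word-level WER via edit distance in Python. This is my own sketch for illustration, not how texterrors computes things, and the function name is made up.

```python
# Standard WER: edit distance over words divided by the number of reference words.
def wer(ref, hyp):
    ref, hyp = ref.split(), hyp.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 ref words ≈ 0.33
```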

Doing non-standard stuff with kaldi decoding

Here I’m going to describe methods for decoding with kaldi when you want to do something a bit custom. I will use an OpenFST wrapper and scripts built on it, which can be found here.

First post: Ark and scp files in kaldi

This is about the .ark and .scp files that are used with kaldi and have spread to other toolkits like ESPNet. They’re not complicated to understand, but I’ve noticed a surprising number of people who use them don’t. This is supposed to be a concise summary of what they are.
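As a rough preview (my own sketch, not from the post): an .scp file is a plain-text index mapping utterance ids to locations inside .ark archives, and the .ark file holds the actual data. The snippet below uses the third-party kaldiio package and made-up file names.

```python
# An .scp line looks roughly like:
#   utt1 /path/to/feats.ark:13
# i.e. "utterance id" followed by "archive path:byte offset".
import kaldiio

feats = kaldiio.load_scp("feats.scp")  # lazy, dict-like: data is read on access
for utt_id in feats:
    mat = feats[utt_id]  # numpy array for this utterance
    print(utt_id, mat.shape)
    break
```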