This is a list of things I read or watched in 2023 and really liked. I’ll say a sentence or more about each, and maybe include a quote. I’m hoping somebody who stumbles across this finds something they would otherwise have missed, or that someone who knows me discovers an interest we have in common!
The new ASR toolkit k2/icefall gets great results while training models quickly. This post explains how: it computes the transducer loss efficiently, which greatly reduces memory use. Code is shown along the way.
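To see why the transducer loss is a memory problem in the first place, here is a back-of-the-envelope sketch. The joint network in a transducer produces logits over every combination of encoder frame and label position, a (batch, T, U+1, vocab) tensor. The function and the example sizes below are illustrative assumptions, not numbers from k2/icefall itself.

```python
# Back-of-the-envelope memory cost of the naive transducer loss.
# The joint network output has shape (batch, T, U+1, vocab): every
# encoder frame t paired with every label position u gets a full
# distribution over the vocabulary. Sizes below are assumptions
# chosen for illustration.

def joint_logits_gib(batch, frames, label_len, vocab, bytes_per_float=4):
    """Memory for the (B, T, U+1, V) logits tensor in GiB."""
    return batch * frames * (label_len + 1) * vocab * bytes_per_float / 2**30

# e.g. 8 utterances, 500 encoder frames, 100 labels, 5000 BPE units
mem = joint_logits_gib(8, 500, 100, 5000)
print(f"{mem:.1f} GiB")  # → roughly 7.5 GiB for a single batch
```

Gradients and intermediate activations multiply this further, which is why avoiding the full materialized lattice pays off so much.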
This post gives a high-level overview of fairseq, speechbrain and k2. We will go over the structure of each code base, what the training loop looks like, and the procedure for training a model, and I will point out things I liked or disliked.
First off, I understand the need for a tool that keeps teammates from bickering with each other, and if I joined a team using black I would follow its rules.
I used to be quite skeptical of E2E ASR. I thought that yes, the approach was interesting and worth investigating, but it felt like it was putting too much responsibility on the shoulders of a single system (the neural network) with no priors attached. It did not feel like there was an advantage to it other than simplicity (which by itself will not help performance).
I have a German text corpus with nearly 90 million words. Seems like enough to train a decent language model, no? Let’s see. The first thing to realize is that just covering relatively normal words requires a vocabulary of several hundred thousand entries. Let’s see what happens when I count all the words and check what sits at the nth position.
199989 krisenbewältigungen 2
199999 gendersensitiv 2
200002 umgehbar 2
200005 widersinnigen 2
200016 ausmehrungen 2
The words I’m showing here are legitimate. Good, we have them in our vocabulary. But (!) their counts are very low. The thing to realize is that we will never be able to learn good models for these words, because they appear so infrequently in the training corpus. Note that counts of 2 start from roughly the 160 000th word!
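A ranked frequency list like the one above is straightforward to produce. The sketch below uses a lowercase-and-whitespace tokenization, which is an assumption on my part, not necessarily what produced the numbers shown.

```python
# A sketch of how a ranked word-frequency list like the one above can
# be produced. Tokenization (lowercase + whitespace split) is an
# assumption; for the real numbers you would stream the 90M-word corpus.
from collections import Counter

def ranked_counts(lines):
    """Return (word, count) pairs sorted from most to least frequent."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts.most_common()

# Tiny demo corpus (hypothetical):
sample = ["die Katze schläft", "die Sonne scheint", "die Katze spielt"]
for rank, (word, count) in enumerate(ranked_counts(sample)):
    print(rank, word, count)
```

On a real corpus you would then index into the ranked list around position 200 000 to reproduce the tail shown above.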
BPE is a remarkably effective algorithm for finding a set of subwords. Just count pairs of tokens, merge the most frequent one, repeat until you have the desired number of subwords. Why does this work, and why would just picking the k most frequent ngrams not?
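The loop described above fits in a few lines. This is a toy sketch of the merge procedure, not the post's implementation; the corpus counts are the classic small example, and ties are broken arbitrarily.

```python
# Toy BPE learner: count adjacent symbol pairs, merge the most
# frequent pair everywhere, repeat.
from collections import Counter

def learn_bpe(words, num_merges):
    """words: dict mapping a word (tuple of symbols) to its corpus count.
    Returns the list of merges performed, most frequent first."""
    words = dict(words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair, weighted by word frequency.
        pairs = Counter()
        for symbols, count in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the winning pair fused into one symbol.
        merged = {}
        for symbols, count in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + count
        words = merged
    return merges

corpus = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
          ("n", "e", "w", "e", "s", "t"): 6, ("w", "i", "d", "e", "s", "t"): 3}
print(learn_bpe(corpus, 3))  # → [('e', 's'), ('es', 't'), ('l', 'o')]
```

Note that merging greedily by pair frequency is not the same as picking the k most frequent n-grams up front: each merge changes the symbol inventory, so later counts are computed over the already-merged corpus.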
This post is about the Python-based tool (“texterrors”) I created for computing error metrics relevant to ASR. It is split into two parts:
First, a refresher on standard WER calculation and an illustration of how it can be suboptimal when you want to analyse errors. Then, an introduction to the approach I use, which fixes those problems. You can skip to the second part by clicking here.
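As a companion to the refresher: standard WER is just the word-level Levenshtein distance between reference and hypothesis divided by the reference length. The sketch below is a generic textbook implementation, not the texterrors code.

```python
# Standard WER: word-level edit distance / reference length.
# Generic dynamic-programming sketch, not the texterrors implementation.

def wer(ref, hyp):
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # delete all of r[:i]
    for j in range(len(h) + 1):
        dp[0][j] = j  # insert all of h[:j]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)

# One substitution (sat→sit) and one deletion (the) over 6 reference words:
print(wer("the cat sat on the mat", "the cat sit on mat"))  # → 0.333…
```

The DP gives the minimum edit distance, but the alignment it implies is not unique, which is exactly where the error analysis problems discussed in the first part come from.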
Here I’m going to describe ways of using Kaldi for decoding when you want to do something a bit custom. I will use an OpenFST wrapper, and scripts built on it, which can be found here.