Context: We want to learn an appropriate function f provided samples from a dataset Dn={(X,Y)}n.
Turns out, you can do better than the naive Bayes update,
$$P(f \mid D_n) = \frac{P(D_n \mid f)\, P(f)}{P(D_n)}.$$
Tempered Bayes
Introduce an inverse temperature, β, to get the tempered Bayes update [1]:
$$P_\beta(f \mid D_n) = \frac{P(D_n \mid f)^\beta\, P(f)}{P_\beta(D_n)}.$$
At first glance, this looks unphysical. Surely $P(A \mid B)^\beta P(B) = P(A, B)$ only when $\beta = 1$?
If you're one for handwaving, you might just accept that this is a convenient way to vary between putting more weight on the prior (β < 1) and more weight on the data (β > 1). In any case, the tempered posterior is proper (integrates to one) as long as the untempered posterior is [2].
If you're feeling more thorough, think about the information. Introducing an inverse temperature simply scales the number of bits contained in the distribution: $P(X, Y \mid f)^\beta = \exp\{-\beta I(X, Y \mid f)\}$, where $I(X, Y \mid f) = -\log P(X, Y \mid f)$.
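To make the prior-versus-data tradeoff concrete, here is a minimal sketch using a coin-flip model, which is an illustrative assumption and not part of the discussion above. With a Beta(a, b) prior and a Bernoulli likelihood raised to the power β, the tempered posterior is again a Beta distribution, with the observed counts scaled by β. The function names and the numbers are hypothetical.

```python
def tempered_beta_posterior(a, b, heads, tails, beta):
    """Beta parameters of the tempered posterior for a coin-flip model.

    likelihood**beta * prior is proportional to
    theta**(a - 1 + beta * heads) * (1 - theta)**(b - 1 + beta * tails),
    which is a Beta(a + beta * heads, b + beta * tails) density.
    """
    return a + beta * heads, b + beta * tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


prior = (2.0, 2.0)   # illustrative prior, mean 0.5
heads, tails = 9, 1  # illustrative data, empirical frequency 0.9

# beta = 0 ignores the data, beta = 1 is the ordinary Bayes update,
# and beta > 1 trusts the data more than the prior.
for beta in (0.0, 0.5, 1.0, 4.0):
    a_post, b_post = tempered_beta_posterior(*prior, heads, tails, beta)
    print(f"beta={beta}: posterior mean = {beta_mean(a_post, b_post):.3f}")
```

As β grows, the posterior mean moves from the prior mean toward the empirical frequency, which is the "more weight on the data" reading of the inverse temperature.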
TODO: Check out Grünwald's Safe Bayes papers
Generalized Bayes
If you're feeling even bolder, you might replace the likelihood with a general loss term, ℓβ,n(f), which measures performance on your dataset Dn,
$$P_\beta(f \mid D_n) = \frac{\ell_{\beta,n}(f)\, P(f)}{Z_n},$$
where we write the normalizing constant or partition function as Zn to emphasize that it isn't really an "evidence" anymore.
The most natural choice for ℓβ,n is the Gibbs measure:
Machine learning has gotten sloppy over the years.
It used to be that we thought carefully about the theoretical underpinnings of our models and proceeded accordingly.
We used the L2 loss in regression because, when doing maximum likelihood estimation, the L2 loss follows from the assumption that our samples are distributed according to i.i.d. Gaussian noise around some underlying deterministic function, fw. If we have a likelihood defined as
We'd squeeze in some weight decay because, when performing maximum a posteriori estimation, it was equivalent to having a Gaussian prior over our weights, $\varphi(w) \sim \mathcal{N}(0, \lambda^{-1})$. For the same likelihood as above,
Nowadays, you just choose a loss function and twiddle with the settings until it works. Granted, this shift away from Bayesian-grounded techniques has given us a lot of flexibility. And it actually works (unlike much of the Bayesian project which turns out to just be disgustingly intractable).
But when you're a theorist trying to catch up to the empirical results, it turns out the Bayesian frame is rather useful. So we want a principled way of recovering probability distributions from arbitrary choices of loss function. Fortunately, this is possible.
The trick is simple: multiply your loss, rn(w), by some parameter β, whose units are such that βrn(w) is measured in bits. Now, negate and exponentiate and out pop a set of probabilities:
$$p(D_n \mid w) = \frac{e^{-\beta r_n(w)}}{Z_n}.$$
We put a probability distribution over the function space we want to model, and we can extract probabilities over outputs for given inputs.
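As a sanity check on this recipe, the sketch below verifies numerically that it recovers the familiar Gaussian likelihood when the loss is the mean squared error. Everything specific here is an assumption made for illustration: the linear model f_w(x) = w·x, the noise scale `sigma`, taking r_n(w) to be the mean (rather than the sum) of squared errors, and the choice β = n / (2σ²). Natural logarithms are used, so the exponent is measured in nats rather than bits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (assumed): y = 1.3 * x plus Gaussian noise of scale sigma.
sigma = 0.5
n = 20
x = rng.uniform(-1.0, 1.0, size=n)
y = 1.3 * x + sigma * rng.normal(size=n)


def mse(w):
    """Empirical loss r_n(w): mean squared error of f_w(x) = w * x."""
    return np.mean((y - w * x) ** 2)


def gaussian_log_likelihood(w):
    """log p(D_n | w) under i.i.d. N(0, sigma^2) noise around f_w."""
    resid = y - w * x
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2))


def gibbs_log_likelihood(w):
    """log of exp(-beta * r_n(w)) / Z_n with beta = n / (2 * sigma^2)."""
    beta = n / (2 * sigma**2)
    log_Z = 0.5 * n * np.log(2 * np.pi * sigma**2)  # Gaussian normalizer
    return -beta * mse(w) - log_Z


for w in (0.0, 1.0, 2.0):
    print(w, gaussian_log_likelihood(w), gibbs_log_likelihood(w))
```

The two columns agree for every w, so in this special case the exponentiated loss is exactly the Gaussian likelihood; for other losses the same construction defines a likelihood rather than recovering a known one.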
So I was thinking about the dynamics of training, and looking for a way to model some phenomenon I had observed.
Initially, I had dismissed Brownian motion/random walk as an option because the trend was clearly linear — not square root. But then I was asked what Brownian motion looked like in higher dimensions and I panicked.
I had assumed that Brownian motion was universal — that dimensionality didn't enter into the equation (for average distance from the origin as a function of time). But now I was doubting myself.
So I did a quick derivation, which I'm sharing because it gave me a new way to interpret the chi-squared distribution. Here's the set-up: we have a walker on a D-dimensional (square) lattice. At every timestep, the walker chooses uniformly among the 2D available directions (two per dimension) to take one step.
This means that every dimension is sampled on average once every D timesteps, so we can treat a D-dimensional walker after N timesteps as a collection of D independent 1-dimensional walkers after N/D timesteps.
If the 1-dimensional walker's probability distribution after N steps is just a Gaussian centered at 0 with standard deviation $\sqrt{N}$, then the probability for the D-dimensional walker is the product of D univariate Gaussians centered at 0, each with standard deviation $\sqrt{N/D}$.
But that means that the expected distance traveled by the D-dimensional walker, $\sqrt{E[|X|^2]}$, is just the square root of the mean of the chi-squared distribution,
$$Y = \sum_{i=1}^{D} X_i^2,$$
scaled by a factor $\sqrt{N/D}$. (This is not the mean of the chi distribution, which we'd want if we were taking the square root before the expectation.)
In this context, D is the number of degrees of freedom. The mean of the chi-squared distribution with D degrees of freedom is D, so after multiplying it by our variance factor N/D and taking the square root, we get
$$\sqrt{E[|X|^2]} = \sqrt{D \cdot \frac{N}{D}} = \sqrt{N}.$$
It's independent of lattice dimensionality. Just as I originally thought before starting to doubt myself before going down this rabbit hole before reestablishing my confidence.
If that's not enough for you, here are the empirical results.
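A quick simulation along the following lines can be used to check the claim numerically. This is a sketch rather than the code behind the original results; the function name, the number of walkers, and the number of steps are arbitrary choices.

```python
import numpy as np


def rms_distance(dim, n_steps, n_walkers=2000, seed=0):
    """RMS distance from the origin after n_steps on a dim-dimensional lattice."""
    rng = np.random.default_rng(seed)
    # Each step picks one axis uniformly, then a direction (+1 or -1) along it,
    # which is the same as choosing uniformly among the 2 * dim directions.
    axes = rng.integers(0, dim, size=(n_walkers, n_steps))
    signs = rng.choice([-1, 1], size=(n_walkers, n_steps))
    positions = np.zeros((n_walkers, dim))
    for d in range(dim):
        # Sum the signed steps that were taken along axis d.
        positions[:, d] = np.sum(signs * (axes == d), axis=1)
    return np.sqrt(np.mean(np.sum(positions**2, axis=1)))


n_steps = 400
for dim in (1, 2, 3, 10):
    rms = rms_distance(dim, n_steps)
    print(f"D={dim}: RMS distance = {rms:.2f}, sqrt(N) = {np.sqrt(n_steps):.2f}")
```

Up to sampling noise, the RMS distance should stay close to the square root of the number of steps for every D.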
I started the year set on committing to the path of entrepreneurship. I ended it as an AI safety researcher. What can I say? Priorities change.
One day you may be sold on the earning-to-give route (or — if you're feeling cynical — the social status accompanying entrepreneurship and philanthropy). The next day, you're sold on maybe preventing powerful AI from causing the demise of humanity (or the social status accompanying AI safety research within the EA bubble).
Social status dynamics aside, it's a better fit. Working on Health Curious (the company I founded) made me feel like my brain was shrinking. I just wasn't built to spend my days writing React apps.
Meanwhile, I've always been fascinated by AI (and have always contorted my physics degrees into excuses to study NNs). Research keeps my curiosity levels far better satiated. I also didn't have anything near this healthy a support network while working on my company.
I'm happier, more focused, and working far more productively. If anything, my life has gotten much easier and better since it's gotten less balanced. Having the one overarching priority of "solve alignment" makes taking any kind of decision much easier.
I still think founding some kind of a research organization might very well be in my future. I like working with people and working on big-picture strategy. There's a big premium on that kind of thing in technical AI safety (considering we're a bunch of nerds).
Goals for 2022
Overall, I'd say my goals for 2022 had about a 50% success rate. It's that low mainly because they were bad goals that didn't suit the person I ended up becoming.
Or maybe that's just my coping strategy.
🛑 No more scrolling (YouTube, Reddit, Porn, etc.):
Complete failure. I even ended up joining a new platform (Twitter).
🚪 Screen time:
I'd call this a success. My screen time for my phone is about an hour. For my computer, it's atrocious, often upwards of 8-10 hours. But hey, it's my job, so I accept it as the price of admission.
⏲ Self-monitoring:
Altogether a success. My main innovation was getting on Linear and building an integration with Toggl, so that my tasks are automatically time-tracked. This meant I didn't have to do much thinking to log my time, which is the best way to make sure it actually gets logged.
There's definitely room for improvement: I'm not actually doing anything with the information. I think the most natural way to address this would be to build a little dashboard for all my sources of data.
The other main room for improvement is that there's always more I could track: I didn't manage to track additional media consumption beyond books.
📚 Books (1 book per week):
I didn't get anywhere close to my goal of 50 books this year, at least as logged on Goodreads (it says 22). In terms of total volume, however, I think I far exceeded last year. The discrepancy consists in, e.g., Worm being listed as a single book (at 6,680 pages, about 1.5 times the Harry Potter series), and most of my reading consisting of textbooks and papers.
The bigger failure is that I didn't meet most of the particular categorical targets I set (in terms of, e.g., reading X books of a specific language, X books by Y author).
Just goes to show that optimizing the wrong metric is stupid.
🗃 PKM:
Bit of a failure. My personal knowledge management is a mess in need of a thorough cleaning.
✍️ Writing:
I missed a few of the targets, but I'm overall happy with what I've published.
The main new thing I'm trying to do is publish my own notes on a given subject.
🗣 Languages:
I learned Portuguese to a pretty high level, but I've given up on German (for the time being), and now accept that I will lose my bet with my roommate on reading Faust in the original German by my 25th birthday. I've also seriously fallen behind on the Mandarin.
The bigger problem is that I've had trouble keeping my Anki habit active. For a period of about 6 years, I did Anki pretty much every day, and I need to get back to that commitment, not just for languages, but for everything in my brain.
Oh, and I didn't meet any of the specific "create X flashcards" targets. They were too ambitious.
🏃 Moving
I stopped diligently trying to close my Apple Watch rings. Bad Jesse. And I've been bad about walking. But overall, my fitness has been pretty good. It really peaked in Brasília when I was doing pilates every day and the pilates instructor was this demon sent from the seventh circle of hell to torture us with her core wrath.
As for the other subgoals, I can manage a 15s or so handstand, but the 30s is not quite there yet. And I've stopped doing my Kegel exercises — the non-ejaculatory orgasm will have to wait.
🍽 Fasting
I love food too much to make myself not eat for a full day every month. So I'm going to stick to the 16/8 that has served me well for years.
🌏 Diet:
I've become progressively more and more vegetarian, and I think it's about time to make the full plunge.
👓 Myopia:
Nope. Didn't make progress here. But that's also because I stopped putting much effort into this.
👥 Relationships:
This is where I've had the most success. I've found a network of people doing the same things I'm doing, and I'm now doing the programs that will get me where I need to be. I have a research mentor and plenty of other guidance to help me along the way.
💰 Money:
Between the SERI MATS stipends and the FTX regrant (which I may ultimately have to repay), I'm doing well. My partner and I are financially and locationally independent.
My takeaways for next year are to set fewer goals and to allow myself more freedom within the goals (e.g., don't try to prescribe a list of the exact authors I'm going to read). I ended up constraining myself more than was useful, and setting goals for things that weren't actually priorities. Lessons learned.
My priority is solving alignment, and for now that means seeking out (1) a position with some financial padding so I can continue doing research and (2) mentorship so I can keep getting better at it. So either: a position at a research organization like Anthropic, OpenAI, DeepMind, etc. or a PhD position (with one of a handful of advisors doing actually relevant research).¹
The Subgoals
The main way I'm going to get a research position is — no surprises — to do research. I'll be at SERI MATS for the next two months with explicitly this purpose.
In particular, I'll be doing research on path dependence and theory of deep learning under the guidance of Evan Hubinger. I'm aiming to publish (in conferences) two or three papers out of this work because you have to play the signaling game just a bit if you hope to succeed.
Afterwards, there's an option for an extension (~6 months). If I decide to stick it out with the industry route, I'm aiming to obtain a position by the end of SERI MATS. If I decide on the academia route (or if two months turns out to be a crazy, unrealistic timeline), I'll go with the extension.
My lesson from last year was to set fewer goals and offer myself more freedom within each goal (e.g., to avoid a reading list of exactly these and these authors).
Writing
I want to publish impactful research (or at least have content that I could publish if I decided it was worth it to go through the process of submission).
Since it's hard to measure impact (at least on a one-year timescale), let's stick to setting targets for the observables (and live with the true goal in mind)...
📢 Publish(able) 3 papers. This seems a pretty conservative target considering I have 2 already in the works.
📚 Launch Textbook on AI Safety. This has taken a seat on the back-burner for the last month, but it seems pretty important and valuable. I'm going to throw out the sections on "Foundations" and "Machine Learning", and work on the thing it's actually about.
📝 Publish notes at least once a month. Something I want to get in the rhythm of is publishing high-quality notes. Let's be real, writing little blog articles is fun, but it's not the best thing I can be doing provided I can get my writing fill in other ways, such as publishing notes. Which is what I'll be doing.
Reading List
I'm going to throw out my specific "# of books" goals from previous years, though I will set some specific goals in terms of reading textbooks.
First, though, a definition of "reading textbooks."
Reading textbooks means skipping the content that doesn't matter, skimming the content that seems possibly somewhat relevant, and investing in the content that seems important (with multiple readings and problem sets). It doesn't mean actually reading end-to-end.
Currently in progress
Artificial Intelligence by Russell and Norvig
Reinforcement Learning by Sutton and Barto
The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
Algebraic Geometry and Statistical Learning Theory by Watanabe
Pattern Recognition and Machine Learning by Bishop
An Introduction to Kolmogorov Complexity and Its Applications by Li and Vitányi
Scaling and Renormalization in Statistical Physics by Cardy
New
Mathematical Theory of Bayesian Statistics by Watanabe
Probability Theory: The Logic of Science by Jaynes
Introduction to the Theory of Computation by Sipser
Stretch/Undecided
Category Theory by Awodey
Topology by Munkres
Radically Elementary Probability Theory by Nelson
Stretch Goals
🇨🇳 Learn Mandarin. I like learning languages, and hobbies seem healthy even when the world is ending. It also seems valuable to make myself a future asset if world governments ever get their shit together to figure out AI policy.
👓 Myopia. Actually reduce my diopters by 0.5 in both eyes.
🏃 Moving. I'd like to balance out my exercise regime a bit more. Right now, I'm going to hot pilates/yoga several times a week which seems to get me what I need in terms of mobility/flexibility/core/cardio. I'd like to get some actual strength training into my regimen.
💰 Money. I'd love to generate a bit of passive income. Obvious routes are selling content (some notes or lecture series) or some kind of (AI-driven) service.
Footnotes
¹ I'm avoiding the independent research route because I think the value of a strong group of peers and mentors is too high to be missed.