Collaborative proof development is the next thing you need (time travel would be great too)
I've been reducing certain old proofs of mine to one-liners, and I wrote down my thoughts along the way.
Token-to-token prediction means generating the next token from a sequence of previous tokens. The method is highly granular, which allows mathematical expressions, code, or text to be generated flexibly, character by character or token by token.
It turned out that this method allows machines to talk with humans about a wide range of topics (let’s keep in mind that AIs evolve differently from humans and will, for a while, not be able to behave exactly like humans, which is fine). Humans, on the other hand, can now use natural language rather than search, and get answers (and resources) rather than, mostly, just resources. That’s a new phenomenon (compare the new tech to the existing chatbots in terms of traction and go-to-market). It’s one of many small steps, but it’s also a huge one at the same time (builders appreciate builders; it’s tough, and fun, to build (coding is fun!), and one needs many kind people along the way).
How many more steps do we need to solve the RH? To get to 80% on ARC? To solve 80% of IMO problems? To get to the level where an AI could be considered a teacher or a friend? To hire an AI to be your child’s language or math teacher? To have AIs assist in M&A proceedings (involving hundreds, if not thousands, of pages)? I don’t know (my estimates tend to be rather optimistic — builders are wired this way; my kids are already learning from (and with) AIs). I’m experimenting with approximate formalization and deep reasoning.
Still, it’s not only the tech. Teaching a new habit at scale is hard. But the network effects are definitely there — and when one person starts to do (what is considered to be) magic, more people will want it.
Of course, not everything is solved. It’s just the beginning, but the market has already validated a new way of doing things (to a good extent). More and more people are learning how to interact with AIs. No matter what we call these AIs (the name actually matters, but it’s hard to predict the actual go-to-market accurately), they’ll be more comfortable communicating with humans than anything we have seen so far. Human behavior is changing. An intelligence boost is happening. And much more than that.
Still, deep insight is hard. If you ask the oracle for a strategy recommendation, you may not get anywhere close to what you need. Let’s keep in mind that most of the human experience is not available in text form. The good news is that perhaps we can get to that (or similar) data.
Just like the real numbers can be seen as the completion of the rationals (the reals fill in the "gaps" between rational numbers, ensuring that every Cauchy sequence of rationals converges to a real number), so too idea generation needs some elasticity in the idea space. Math learning could therefore be a promising area of research that can help language models get even better (it could add more versatility to the NN structure).
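This is essentially how the reals are set up in Lean’s Mathlib; a minimal sketch, assuming the current Mathlib names (Real.mk and Real.mk_eq):

import Mathlib.Data.Real.Basic

-- a real number can be built from a Cauchy sequence of rationals ...
#check (Real.mk : CauSeq ℚ abs → ℝ)
-- ... and two such sequences yield the same real exactly when they are equivalent
#check @Real.mk_eq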
Just as we encounter contextual understanding challenges with token-level predictions, we face similar issues in mathematics. However, these challenges present differently from a data perspective. Mathematics deals with "tokens" such as problems, tactics, proofs, and lemmas, which can then be used to structure information and to quantify entities in space (as in physics) and in other languages. For instance, we can use spatial information about a bike and its distance from surrounding objects to infer the cyclist's emotions. We can also learn the range-vs-range concept in card games, or return-oriented programming in memory access games.
Math learning seems to be a promising area. I like both the spatial aspect (imagine coming up with complex numbers by watching a video) and the low-level math assembly aspect that can be used to learn how to deal with all information (just as viewing number theory through the lens of category theory enriches our understanding by highlighting the interconnectedness and structure underlying mathematical objects and their relationships, so too language models could climb (well, it’s much more than climbing) the proof ladder and get deeper insights into reality).
Let’s then have a look at low-level predictions in math. The prediction process can involve generating a sequence of steps for solving math problems, constructing proofs, or applying tactics based on a given set of premises or a problem statement. By applying a successor (a category theory term) while quantifying (over anything), a new structure is inherently learnt (behind the scenes). Eventually, generation (producing conjectures, proofs, problem statements, and explanations in human-readable form), atomic collaborative learning (smaller pieces, copilot style: breaking complex problems down into manageable steps and guiding users through the reasoning process), and recommendations (we already have them in VSCode, but they could be much better with language models) could all become possible.
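To make the “atomic” part concrete, here is a toy sketch in plain Lean 4 (no Mathlib needed): each tactic line is the kind of small, local step a copilot-style assistant could predict and suggest one at a time.

example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a + 0 = c := by
  rw [Nat.add_zero]  -- step 1: reduce a + 0 to a
  rw [h1]            -- step 2: rewrite a to b using the first hypothesis
  exact h2           -- step 3: the remaining goal is exactly the second hypothesis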
In order to get there, atomicity and collaborative environments may be necessary (so that the models can get enough instruction-related input). That could mean “Kaggle for Math” and “Math as Code” (see infrastructure as code): a democratization of math and of new mathematical structures (machine math assistants could help explore uncharted territories of mathematics, identify new problems and conjectures, and even contribute to proof development). Think about the terms “real”, “rational” and “irrational” in math — there must be a better way. In a way, complex numbers seem more “rational” and “irrational” numbers more “exciting”. Even the natural numbers are interpreted differently depending on whether one views them through set theory or category theory: in set theory they are often described as sets, while in category theory (and type theory) they are approached as inductive types. It gets even nicer when one looks from a programmer’s perspective: one needs a recursor, two constructors, and a type (a toy sketch of that view follows after the examples below). Check out these examples from the Lean Prover website — it feels like math is slowly getting close to engineering:
theorem exists_hilbertBasis (𝕜 : Type u_2) [IsROrC 𝕜] (E : Type u_3) [NormedAddCommGroup E] [InnerProductSpace 𝕜 E] [cplt : CompleteSpace E] :
∃ (w : Set E) (b : HilbertBasis (↑w) 𝕜 E), (fun (i : ↑w) => (LinearIsometryEquiv.symm b.repr) (lp.single 2 i 1)) = Subtype.val
theorem IsCompact.exists_mapClusterPt {X : Type u} [TopologicalSpace X] {s : Set X} {ι : Type u_2} (hs : IsCompact s) {f : Filter ι} [Filter.NeBot f] {u : ι → X} (hf : Filter.map u f ≤ Filter.principal s) :
∃ x ∈ s, MapClusterPt x f u
structure FirstOrder.Language.ElementaryEmbedding (L : FirstOrder.Language) (M : Type u_1) (N : Type u_2) [FirstOrder.Language.Structure L M] [FirstOrder.Language.Structure L N] :
Type (max u_1 u_2)
An elementary embedding of first-order structures is an embedding that commutes with the realizations of formulas.
toFun : M → N
map_formula' : ∀ ⦃n : ℕ⦄ (φ : FirstOrder.Language.Formula L (Fin n)) (x : Fin n → M), FirstOrder.Language.Formula.Realize φ (↑self ∘ x) ↔ FirstOrder.Language.Formula.Realize φ x
theorem Nat.Partrec.merge' {f : ℕ →. ℕ} {g : ℕ →. ℕ} (hf : Nat.Partrec f) (hg : Nat.Partrec g) :
∃ (h : ℕ →. ℕ), Nat.Partrec h ∧ ∀ (a : ℕ), (∀ x ∈ h a, x ∈ f a ∨ x ∈ g a) ∧ ((h a).Dom ↔ (f a).Dom ∨ (g a).Dom)
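By the way, the programmer’s view mentioned above (a recursor, two constructors, and a type) really is that small. A toy sketch, with MyNat standing in as a hypothetical replacement for the built-in Nat (the recursor MyNat.rec is generated automatically from the declaration):

inductive MyNat : Type
  | zero : MyNat            -- first constructor
  | succ : MyNat → MyNat    -- second constructor

#check @MyNat.rec           -- the recursor, generated for free

-- everything else (here: addition) is defined by recursion on those two cases
def MyNat.add (m : MyNat) : MyNat → MyNat
  | .zero => m
  | .succ n => .succ (MyNat.add m n)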
Now think about forking a template with 20 incomplete proofs, numerous tactics, and lots of sorry’s (a Lean term). It feels like more and more of these gaps will eventually be completed by AIs. Languages are everywhere, and apparently (this level of) prediction has already got us far; the next level (not just proof-ladder climbing) will get us way farther.
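What such a template might look like, as a toy sketch (gap_one, gap_two and uses_both are made-up names): the statements are pinned down, the gaps are marked with sorry, and anyone (human or AI) can fork the file and fill them in.

theorem gap_one (n : Nat) : n ≤ n + 1 := by
  sorry

theorem gap_two (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by
  sorry

-- later results can already build on the unfinished ones
theorem uses_both (n : Nat) : n ≤ n + 1 + 1 :=
  gap_two n (n + 1) (gap_one n)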
Just asked ChatGPT about Nat.Partrec.merge', and got this:
This theorem states the existence of a partial recursive function h that combines the outputs of two given partial recursive functions f and g. The theorem guarantees that for any natural number a, the function h(a) will yield values that are either in f(a) or g(a), and the domain of h(a) coincides with the domain of f(a) or g(a). Essentially, it describes a way to merge the results of two partial recursive functions into a single one, ensuring that the new function's domain of definition reflects the union of the original functions' domains. This theorem is significant in the theory of computation, particularly in the context of computable functions and their properties.
Then I tried to use it in combination with Language.ElementaryEmbedding, and got this recommendation:
To combine the theorem about partial recursive functions with the concept of an elementary embedding in first-order structures, consider developing a framework where computational properties (like those described by the partial recursive functions f and g) are embedded into a larger logical structure (such as M and N in the elementary embedding). This approach could allow for the exploration of how computational processes relate to logical structures, examining the preservation of computational properties under embeddings that respect the logical operations and formulas of the structures. This could lead to insights into the computability of logical models and the logical properties of computational systems.
But then playing around in VSCode was still a lot of engineering-related pain (fun). The same thing applies to conjectures. Getting them to connect with the rest of the math universe is still quite tricky.
That was fun! I hope you enjoyed it.