Deep Reasoning Through Flexible Universal Language
Let me tell you a story of this little creature.
Look at this piece of code from the Equational Theories project (Terence Tao, Pietro Monticone, Shreyas Srinivas) below. Still unsolved (well, it’s been only a couple of days). So short, yet insightful. Hard to prove and formalize.
import Mathlib.Tactic
class Magma (α : Type _) where
op : α → α → α
infix:65 " ◇ " => Magma.op
abbrev Equation1485 (G: Type _) [Magma G] := ∀ x y z : G, x = (y ◇ x) ◇ (x ◇ (z ◇ y))
abbrev Equation151 (G: Type _) [Magma G] := ∀ x : G, x = (x ◇ x) ◇ (x ◇ x)
theorem Equation1485_implies_Equation151 (G: Type _) [Magma G] (h: Equation1485 G) : Equation151 G := by
theorem Equation1485_not_implies_Equation151 : ∃ (G: Type) (_: Magma G), Equation1485 G ∧ ¬ Equation151 G := by
1485 is x = (y ◇ x) ◇ (x ◇ (z ◇ y)), and 151 is x = (x ◇ x) ◇ (x ◇ x). 1485 is implied by 168 (which is x = (y ◇ x) ◇ (x ◇ z)). And 168 is implied by 2 (x = y). 1485 is special in that its magma operation to be determined by the graph homomorphism condition.
Did some z3, some Lean, some Sage, and so on. But this search is really hard. Claims, heuristics, optimization problems, and solvers. Looking for insights.
One could instead enable agents — create an environment (turn goals, hypotheses, and so on, into states, tactics into actions, and dream up a reward structure that involves greedy magma extensions). Or build a transformation device (transformer, cross-coder, etc.) from scratch.
I have been building a dataset for fine tuning one of the language models (even ran some fine tuning jobs), and learning.
Eventually, Mario Carneiro (after a longer conversation with Terence Tao) posted this
import Mathlib.Order.Chain
import Mathlib.Data.Set.Countable
theorem Exists.classicalRecOn_eq {α : Sort*} {p : α → Prop} (h : ∃ a, p a)
{C : Sort*} (H : ∀ a, p a → C) : ∃ a pa, h.classicalRecOn H = H a pa :=
⟨_, _, rfl⟩
theorem exists_greedy_chain {α β : Type*} [Preorder α] [Countable β]
(task : β → Set α) (H : ∀ a b, ∃ a', a ≤ a' ∧ a' ∈ task b) (a : α) :
∃ c, IsChain (· ≤ ·) c ∧ a ∈ c ∧ (∀ x ∈ c, a ≤ x) ∧ ∀ b, ∃ a ∈ c, a ∈ task b := by
have ⟨f, hf⟩ := exists_surjective_nat (Option β)
let T := Σ a', {c // a' ∈ c ∧ IsChain (· ≤ ·) c ∧ ∀ x ∈ c, a ≤ x ∧ x ≤ a'}
let G : Nat → T → T := fun n ⟨a', c, ac, hc, hl⟩ =>
match f n with
| none => ⟨a', c, ac, hc, hl⟩
| some b => by
refine (H a' b).classicalRecOn fun a'' ⟨a'a'', _⟩ => ?_
refine ⟨a'', insert a'' c, by simp, IsChain.insert hc ?_, ?_⟩
· exact fun d dc _ => .inr (le_trans (hl _ dc).2 a'a'')
· refine Set.forall_mem_insert.2 ⟨⟨le_trans (hl _ ac).1 a'a'', le_rfl⟩, ?_⟩
exact fun d dc => (hl _ dc).imp_right (le_trans · a'a'')
let F : Nat → Σ a', {c // a' ∈ c ∧ IsChain (· ≤ ·) c ∧ ∀ x ∈ c, a ≤ x ∧ x ≤ a'} :=
Nat.rec ⟨a, {a}, rfl, Set.subsingleton_singleton.isChain, by simp⟩ G
have hF i : F (i+1) = G i (F i) := rfl
have : Monotone (fun i => (F i).2.1) := monotone_nat_of_le_succ fun n => by
rw [hF]; obtain ⟨a', c, ac, hc, hl⟩ := F n
simp only [G]; split <;> simp only [le_refl]
exact let ⟨a, ha, eq⟩ := Exists.classicalRecOn_eq _ _; eq ▸ by cases ha; simp
refine ⟨⋃ i, (F i).2.1, ?_, ?_, ?_, fun b => ?_⟩
· rintro a ⟨_, ⟨i, rfl⟩, hi⟩ b ⟨_, ⟨j, rfl⟩, hj⟩ ab; simp at hi hj ⊢
wlog hle : i ≤ j generalizing a b i j
· exact (this j hj i hi ab.symm ((le_total ..).resolve_left hle)).symm
exact (F j).2.2.2.1 (this hle hi) hj ab
· refine ⟨_, ⟨0, rfl⟩, rfl⟩
· rintro a' ⟨_, ⟨i, rfl⟩, h⟩; exact ((F i).2.2.2.2 _ h).1
· clear_value F
have ⟨i, hi⟩ := hf (some b)
specialize hF i; simp only [G] at hF
revert hF; obtain ⟨a', c, ac, hc, hl⟩ := F i; simp [hi]
refine let ⟨a, ha, eq⟩ := Exists.classicalRecOn_eq _ _; eq ▸ ?_
obtain ⟨h1, h2⟩ := ha; simp; intro hF
refine ⟨a, ⟨i+1, ?_⟩, h2⟩
rw [hF]; simp
and
namespace Greedy
variable (G) in
def ExtBase := G × Nat
instance [Countable G] : Countable (ExtBase G) := inferInstanceAs (Countable (_ × _))
variable (G) in
abbrev PreExtension := Finset (ExtBase G × ExtBase G)
def PreExtension.induced (E : PreExtension G) (x y : ExtBase G) : Set (ExtBase G) :=
{z | IsGood x.1 z.1 y.1 ∧ (x, z) ∈ E ∧ (z, y) ∈ E}
theorem PreExtension.induced_mono {E E' : PreExtension G} (H : E ≤ E') {x y : ExtBase G} :
E.induced x y ⊆ E'.induced x y :=
fun _ ⟨h1, h2, h3⟩ => ⟨h1, H h2, H h3⟩
structure PreExtension.OK (E : PreExtension G) : Prop where
path x y : (x, y) ∈ E → Path x.1 y.1
consistent x y : Set.Subsingleton (E.induced x y)
variable (G) in
abbrev Extension := {E : PreExtension G // E.OK}
variable (G) in
structure GreedyArgs where
[ct : Countable G]
e₀ : Extension G
H E x y : ∃ E' : Extension G, E ≤ E' ∧ (E'.1.induced x y).Nonempty
variable (A : GreedyArgs G)
private theorem exists_extension :
∃ op : ExtBase G → ExtBase G → ExtBase G, ∃ E : ExtBase G → ExtBase G → Prop,
(∀ a b c, c = op a b ↔ IsGood a.1 c.1 b.1 ∧ E a c ∧ E c b) ∧
(∀ a b, (a, b) ∈ A.e₀.1 → E a b) ∧
(∀ a b, E a b → Path a.1 b.1) := by
obtain ⟨e₀, H⟩ := A
have ⟨c, hc, h1, _, h3⟩ := exists_greedy_chain
(task := fun x : ExtBase G × ExtBase G => {e : Extension G | (e.1.induced x.1 x.2).Nonempty})
(fun a ⟨b1, b2⟩ => H a b1 b2) e₀
simp only [Subtype.exists, Prod.forall] at h3
choose f hf1 hf2 op hop using h3
refine ⟨op, fun a b => ∃ e ∈ c, (a, b) ∈ e.1, ?_, fun a b H => ⟨_, h1, H⟩, ?_⟩
· refine fun a b c => ⟨fun H => ?_, fun ⟨h1, ⟨i, hi, h2⟩, ⟨j, hj, h3⟩⟩ => ?_⟩
· exact let ⟨h1, h2, h3⟩ := hop a b; H ▸ ⟨h1, ⟨_, hf2 _ _, h2⟩, ⟨_, hf2 _ _, h3⟩⟩
· have ⟨k, hk, ik, jk⟩ := hc.directedOn _ hi _ hj
have ⟨l, _, kl, fl⟩ := hc.directedOn _ hk _ (hf2 a b)
exact l.2.2 _ _ ⟨h1, le_trans ik kl h2, le_trans jk kl h3⟩ ((f ..).induced_mono fl (hop a b))
· exact fun a b ⟨i, _, hi⟩ => i.2.1 a b hi
def GreedyMagma (_ : GreedyArgs G) := ExtBase G
noncomputable instance : Magma (GreedyMagma A) where
op := (exists_extension A).choose
noncomputable def GreedyArgs.edge : GreedyMagma A → GreedyMagma A → Prop :=
(exists_extension A).choose_spec.choose
theorem GreedyArgs.induced :
∀ {a b c}, c = a ◇ b ↔ IsGood a.1 c.1 b.1 ∧ A.edge a c ∧ A.edge c b :=
@(exists_extension A).choose_spec.choose_spec.1
theorem GreedyArgs.base : ∀ {a b}, (a, b) ∈ A.e₀.1 → A.edge a b :=
@(exists_extension A).choose_spec.choose_spec.2.1
theorem GreedyArgs.path : ∀ {a b}, A.edge a b → Path a.1 b.1 :=
@(exists_extension A).choose_spec.choose_spec.2.2
noncomputable instance : WeakCentralGroupoid (GreedyMagma A) where
eqn _ _ _ := by
refine .symm <| A.induced.2 ⟨?_, ?_⟩
· exact isGood_five (A.induced.1 rfl).1 (A.induced.1 rfl).1 (A.path (A.induced.1 rfl).2.2)
· exact ⟨(A.induced.1 rfl).2.2, (A.induced.1 rfl).2.1⟩
end Greedy
and
open RelaxedWeakCentralGroupoid Greedy
@[equational_result]
theorem not_3457 : ∃ (G : Type) (_ : Magma G), Facts G [1485] [3457] := by
have ⟨x,y,z,w, h1, h2, h3⟩ : ∃ x y z w, IsGood x z x ∧ IsGood z w y ∧ ¬IsGood x z w :=
⟨.C true, .B false, .C false, .C false, .mk' ⟨⟩, ⟨.cc, .cb, nofun⟩, (·.2.2 ⟨nofun⟩)⟩
classical
let e : Extension G0 := by
refine let S := {((x,0),(z,2)),((z,2),(x,0)),((z,2),(w,3)),((w,3),(y,1))}; ⟨S, ?_, ?_⟩
· simp [or_imp, forall_and, isGood_path h1, isGood_path h2, S]
· intro a b
let f : ℕ → ExtBase G0 | 1 => (w, 3) | 2 => (x, 0) | _ => (z, 2)
refine Set.subsingleton_of_forall_eq (a := f b.2) fun u ⟨hu1, hu2, hu3⟩ => ?_
simp only [Finset.mem_insert, Finset.mem_singleton, S] at hu2 hu3
obtain ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ := hu3 <;> rfl
refine ⟨GreedyMagma e, inferInstance, WeakCentralGroupoid.eqn1485, fun h => ?_⟩
have := h (x, 0) (y, 1)
rw [← (e.induced' (x,0) (x,0) (z,2)).2 ⟨h1, e.base (by simp), e.base (by simp)⟩,
← (e.induced' (z,2) (y,1) (w,3)).2 ⟨h2, e.base (by simp), e.base (by simp)⟩] at this
exact h3 (e.induced.1 this).1
@[equational_result]
theorem not_3511 : ∃ (G : Type) (_ : Magma G), Facts G [1485] [3511] := by
have ⟨x,y,z,w, h1, h2, h3⟩ : ∃ x y z w, IsGood x z y ∧ IsGood z w x ∧ ¬IsGood x z w :=
⟨.C true, .B true, .C true, .C false, ⟨.cc, .cb, nofun⟩, .mk' ⟨⟩, (·.2.2 ⟨nofun⟩)⟩
classical
let e : Extension G0 := by
refine let S := {((x,0),(z,2)),((z,2),(y,1)),((z,2),(w,3)),((w,3),(x,0))}; ⟨S, ?_, ?_⟩
· simp [or_imp, forall_and, isGood_path h1, isGood_path h2, S]
· intro a b
let f : ℕ → ExtBase G0 | 0 => (w, 3) | 2 => (x, 0) | _ => (z, 2)
refine Set.subsingleton_of_forall_eq (a := f b.2) fun u ⟨hu1, hu2, hu3⟩ => ?_
simp only [Finset.mem_insert, Finset.mem_singleton, S] at hu2 hu3
obtain ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ := hu3 <;> rfl
refine ⟨GreedyMagma e, inferInstance, WeakCentralGroupoid.eqn1485, fun h => ?_⟩
have := h (x, 0) (y, 1)
rw [← (e.induced' (x,0) (y,1) (z,2)).2 ⟨h1, e.base (by simp), e.base (by simp)⟩,
← (e.induced' (z,2) (x,0) (w,3)).2 ⟨h2, e.base (by simp), e.base (by simp)⟩] at this
exact h3 (e.induced.1 this).1
@[equational_result]
theorem not_2087_2124 : ∃ (G : Type) (_ : Magma G), Facts G [1485] [2087, 2124] := by
have ⟨x,y,z,w,v, h1, h2, h3, h4, h5⟩ : ∃ x y z w v,
IsGood y z x ∧ IsGood y z y ∧ IsGood z w x ∧ IsGood x v x ∧ ¬IsGood w x v :=
⟨.C true, .A, .B false, .C true, .C false,
⟨.ab, .bc, nofun⟩, ⟨.ab, .ba, nofun⟩, ⟨.bc, .cc, nofun⟩, .mk' ⟨⟩, (·.2.2 ⟨nofun⟩)⟩
classical
let e : Extension G0 := by
refine let S := {
((y,1),(z,2)),((z,2),(x,0)),((z,2),(y,1)),((z,2),(w,3)),
((w,3),(x,0)),((x,0),(v,4)),((v,4),(x,0))}; ⟨S, ?_, ?_⟩
· simp [or_imp, forall_and, isGood_path h1, isGood_path h2, isGood_path h3, isGood_path h4, S]
· intro a b
let f : ℕ → ℕ → ExtBase G0
| 2, 0 => (w, 3)
| 2, 2 => (y, 1)
| 1, _ => (z, 2)
| 0, _ => (v, 4)
| _, _ => (x, 0)
refine Set.subsingleton_of_forall_eq (a := f a.2 b.2) fun u ⟨hu1, hu2, hu3⟩ => ?_
simp only [Finset.mem_insert, Finset.mem_singleton, S] at hu2 hu3
refine (or_iff_left_of_imp (b := a = (z, 2)) ?_).1 ?_
· rintro rfl
obtain ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ := hu2 <;>
obtain ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ := hu3 <;> rfl
· obtain ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ | ⟨⟨⟩⟩ := hu2 <;>
first | right; rfl | left; rfl
refine ⟨GreedyMagma e, inferInstance, WeakCentralGroupoid.eqn1485, fun h => ?_, fun h => ?_⟩
· have := h (x, 0) (y, 1)
rw [← (e.induced' (y,1) (x,0) (z,2)).2 ⟨h1, e.base (by simp), e.base (by simp)⟩,
← (e.induced' (z,2) (x,0) (w,3)).2 ⟨h3, e.base (by simp), e.base (by simp)⟩,
← (e.induced' (x,0) (x,0) (v,4)).2 ⟨h4, e.base (by simp), e.base (by simp)⟩] at this
exact h5 (e.induced.1 this).1
· have := h (x, 0) (y, 1)
rw [← (e.induced' (y,1) (y,1) (z,2)).2 ⟨h2, e.base (by simp), e.base (by simp)⟩,
← (e.induced' (z,2) (x,0) (w,3)).2 ⟨h3, e.base (by simp), e.base (by simp)⟩,
← (e.induced' (x,0) (x,0) (v,4)).2 ⟨h4, e.base (by simp), e.base (by simp)⟩] at this
exact h5 (e.induced.1 this).1
Which I found fascinating.
It does not yet solve 1485. But it gave us the formalized greedy, relaxed wcg (weak central grupoid). It gives us a potentially very powerful tool (or a step toward a very powerful tool).
Long story short, I think RL could profit from this kind of insight (this is also an inspiration from prof. Tao). And not only RL. A flexible universal language could be created. Thinking deeper through CoT/RL is definitely helpful. But I can imagine math could be the key driver to thinking really, really deeply. Math and, therefore, physics too.
But it’ll somehow, somewhat counterintuitively, start from math. Even if the initial data seems to be coming from other directions. The (unexpectedly groundbreaking) depth will come from truly deep insights.
Reasoning based on the flexible universal language will be a function of depth of our math. And by math I mean not only the theorems and tools for the AI, but also the stories that make the connection between math and the rest of the flexible universal language.
What we used to call deep years ago is not that deep anymore. It’s, again, a matter of scale.
(You can skip this paragraph) Btw. the same logic applies to extracting information from numeric data for numbers 0-1000, and numbers 1,000,000-1,001,000. The latter has a different distribution of primes, and tells us a slightly different story.
Scale has always been part of the equation.
That being said, it’s going to be a flywheel: such insights will improve the depth of reasoning, and reasoning will help us formalize more insight (which will make reasoning even better). And perhaps run autonomous curriculum agents that introduce perturbations into concept creation and start formalizing theories and structures themselves. The ability to collaborate with agents will raise the bar too.
But why this kind of insight? Why are those insights so special?
When I first looked at the data, and took the network analysis approach, I got this.
This small (and beautiful) creature clearly indicates many layers of structures, because —for a concept (here represented as a graph) of this size to be created— there must exist a way to create many layers of smaller concepts. And that way would be very special too, because it would encode fundamental implications as well as (implicitly) some sort of intuition (and apparently not only intuition in some cases) that certain heuristics could work. And implications are everywhere.
Moreover, implications at sufficiently large scales for a sufficiently large number of parameters don’t feel like implications. One comes up with partial explanations and then tries to unify — because getting right to the deeper concept is just too bright to see.
It also means that, with much more data, we’d discover that this thing would look completely differently at a larger scale (again, this is a similar to the Cosmic Distance Ladder concept from Terence Tao).
This graph is in some ways a watermark on top of a maze solving bot. And the watermark is created backwards — the size of the problem defines the quality of the bot that solved it.
The depth of the truth — due to better information throughput — is then defined mostly by extreme resilience, originality, clarity, and the ability to embrace both humans and AIs.