Rendered at 13:46:43 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
himata4113 1 days ago [-]
talos is already in use by https://github.com/siderolabs/talos, was confused for a second when I saw talos and wasm for a second, got excited about native wasm pod support.
Looks very interesting! We have done a lot with WasmCert-Isabelle (and there's also WasmRef-Isabelle and their 2023 paper, and the earlier WasmCert-Coq); other than being in Lean instead of Isabelle/HOL or Coq, how would you compare the approach you used? E.g. are you able to do an "in-place Store" like WasmRef-Isabelle, and can you represent memories and tables as plain vectors of bytes/refs in memory, can you grow them in-place, etc.? Or any other optimizations/lessons learned?
I'm also curious -- are you just implementing the Wasm binary and text parsing, validation algorithm, and execution semantics in Lean from scratch by reading the English prose in the spec document, and then checking it against the spec tests and the SpecTec description? Or do you have some sort of automated (classical or LLMy) transformation happening? (One could imagine directly transforming the SpecTec, or the OCaml reference interpreter, into Lean... but it sounds like you're not doing that? I think one needs to be a little careful here because e.g. at this point some of the English prose and reference interpreter implementation, and I think maybe some of the tests, are autogenerated from the SpecTec.) Which parts (if any) are outside the scope of the formalization? E.g. for WasmCert-Isabelle, I believe the binary and definitely the text parsing, and I think some of the arithmetic ops, are not covered.
How are you modeling the explicit sources of nondeterminism in the Wasm execution semantics? E.g. NaN representation, {memory., table.}grow, host calls, stack exhaustion, relaxed SIMD instructions, etc., and that's all before we get to the threads proposal? Because if the goal is to prove programs correct, one risk is that I prove my program correct against your Wasm interpreter (which maybe makes certain choices that aren't determined by the spec), and then I run it against another fully-conforming interpreter in the wild and it behaves incorrectly.
mfornet 13 hours ago [-]
There is a lot to unpack in your comment, thanks for commenting.
We are heavily using LLMs and agents for writing and verifying all the code. We have some safeguard inplace, such as not breaking the wasm testsuite, and being able to run wasm code and produce the correct results (even if that is not the goal of this interpreter). There is an ongoing effort (not by us) about creating a formal WASM spec in lean, generated from SpecTec, when that lands our plan is to prove that our interpreter follow the specification.
> E.g. are you able to do an "in-place Store" like WasmRef-Isabelle, and can you represent memories and tables as plain vectors of bytes/refs in memory, can you grow them in-place, etc.? Or any other optimizations/lessons learned?
Even if the interpreter can be use to run wasm code, we don't really care about efficiency, it is not intended to run WASM, but instead to be able to verify code, so we are intentionally not building an optimized interpreter, or rather we are optimizing toward ease of demonstration.
> How are you modeling the explicit sources of nondeterminism in the Wasm execution semantics? E.g. NaN representation, {memory., table.}grow, host calls, stack exhaustion, relaxed SIMD instructions, etc.,
Going over your list, some functions that might fail, like {memory., table.}grow and stack exhaustion can't fail on our interpreter, so that transition is not represented here. We need to revisit this hypothesis, but part of the reasoning is that we all properties we prove about the WASM bytecode, are properties held by the original code, and usually (for example in lean) you can prove that a function has some property for all values of a natural number, even if you can't really run the function for all values due to some kind of "system" failure (stack overflow).
NaN, we certainly need to revisit floats, we are delegaing its behaviour to Lean floats. Again part of why we think this might work is because we are not planning to run code with this interpreter, but just to write proofs.
Host calls is properly modeled and being developed as we speak, since we really care about being able to proofs properties of a code that runs on some particular host. When you write a statement about some code, you do it on the presence of some particular host. I'm exploring the possibility of formalizing NEAR smart contracts, which are WASM binaries.
jacobjwalters 1 days ago [-]
What is the program logic used here? The num_integer verification example seems to be hardcoding addresses in the spec; what if I want to reason about larger programs that dynamically allocate, where the addresses may not be known statically? How can I make sure these do not overlap? And since this is a shallow embedding into lean, what’s the approach for verifying properties of non-terminating programs?
mfornet 1 days ago [-]
> what if I want to reason about larger programs that dynamically allocate, where the addresses may not be known statically? How can I make sure these do not overlap?
We are actively working on this, as it is a pre-condition :P to reason about the simplest of useful programs. The idea is to develop an API around separation logic that allows you to reason about logic that manipulate non-overlapping regions of memory.
It won't be relevant if address are not known statically since API theorems will be parametrized over non-relevant constants such as addresses, function indices, etc...
> And since this is a shallow embedding into lean, what’s the approach for verifying properties of non-terminating programs?
To use the interpreter there is the concept of fuel, which we explicitly hide from the reasoning layer. Using fuel you can write statements of the form, this function returns out of fuel for any value of fuel passed to the interpreter, which is equivalent to prove that your program doesn't terminate.
jacobjwalters 20 hours ago [-]
Is the plan to build a new separation logic framework, or use e.g. iris-lean or splean as a base?
And even if fuel isn’t exposed in the program logic, I’d imagine you’d still want step indexing to allow reasoning around cyclic heap structures. My experience with getting Claude (even fable) to do step indexed logical relations/SL proofs autonomously is that it struggles hard even on toy examples, particularly around the index shuffling. Do you have a setup in mind that can scale to larger programs?
Fun project in any case! I look forward to seeing how it develops :)
mfornet 14 hours ago [-]
> Is the plan to build a new separation logic framework, or use e.g. iris-lean or splean as a base?
We plan to build our own, though we are currently evaluating what is the current state of iris-lean. Thanks for pointing to splean, I wasn't aware of it before.
> I’d imagine you’d still want step indexing to allow reasoning around cyclic heap structures
We have some APIs around loops and function calls that hides the "fuel-detail" and instead asks you to provide a well-founded relation that is used to prove that the behavior of the program is correct for "some" fuel.
The GCD example make use of this primitive to provide the invariants of the loop plus the argument that loop will terminate.
> Fun project in any case! I look forward to seeing how it develops :)
Feel free to join the telegram dev chat (link in github)
quietusmuris 2 days ago [-]
Interesting. Do I have to write specs in Lean against the Wasm semantics or can you annotate Rust directly?
mfornet 1 days ago [-]
Both.
You can write "annotate" your rust code using asserts. On the wasm side asserts are converted to trap instructions, so the Lean spec will simply be: For every input this code never traps.
Part of our focus is making sure that specs are both easy to write and read, since they are human facing. Eventually you could imagine how writing code will mostly be writing specs, and both the code and the proofs will be handled by AI agents. In this scenario it is very important that humans can easily audit and modify the specs.
quietusmuris 23 hours ago [-]
Doesn't that put the Rust compiler (and its assert lowering) in the trusted base? How do you know the asserts you wrote are the traps you're reasoning about?
mfornet 14 hours ago [-]
> Doesn't that put the Rust compiler (and its assert lowering) in the trusted base?
Yes, but I would argue the are already in the trusted base before this project, we are not removing that. We want instead remove "your code" from the trusted base, and just keep the compiler and the specs.
> How do you know the asserts you wrote are the traps you're reasoning about?
You just do. The asserts and the specs have a similar role, they are both consumer facing, and consumers need to make sure they are correct and cover what it is intended.
IshKebab 22 hours ago [-]
How do you actually prove it though? I understand if it's fully automated SMT-style proof, but doesn't Lean require tediously explicit proofs? If it doesn't prove automatically do you have to write out Lean helper proofs about the compiled WASM?
mfornet 14 hours ago [-]
AI has been great so far filling in most of the proofs, and I'm trying to avoid SMT-style proofs early on, to make sure we have a solid API that can be scaled to arbitrary complex code without increasing too much the cost of the verification. I'm sure nonetheless that SMT solvers will play a role going forward in filling up some proofs.
The reason AI has been so good at filling most of the proofs in my opinion, is that proofs are actually "simple", but very tedious and long. Part of our work right now is make sure that the tedious part can be solved mechanically as easy and efficient as possible, so both human or AI can focus on the interesting parts.
lukerj00 2 days ago [-]
I’m on the Cajal team - not OP, but happy to answer questions.
The core bet is that Wasm is a good verification target (close to compiled artifacts, many languages target it), and Lean is the right place to do verification.
Super interested in hearing from people working with Lean, compilers or other Wasm verification frameworks (eg Iris-Wasm).
jsmorph 22 hours ago [-]
Cool. I've been working on a compiler for a subset of Lean that targets WASM. The compiler is implemented in Lean.
Initially we considered formalizing rust code, aeneas is a very promising project that would unlock a lot of features right way by transpiling to lean. However, we didn't want to lock ourselves to rust, so we decided to use a lower level target such that we could verify code from "any" language.
We considered LLVM-IR, and RISC-V.
Ultimately WASM felt like the right decisions. More importantly WASM spec is very well done in details, and it is written with formal verification in mind early on, and there are are plans from to include Lean as one of the targets for generating the spec automatically from SpecTec. Once this exist, we will formalize that our interpreter is correct under the definition generated from the official Wasm-Lean-Spec so we remove it from the Trusted-Base going forward.
lukerj00 23 hours ago [-]
More on this - LLVM-IR has no official formal semantics and it's riddled with UB. RISC-V has a formal model in Sail, but it's an ISA so you throw away the structured control flow and types which we want for proving.
Wasm has different levels we can validate against - starting with W3C test suite, then later full verification against the SpecTec-generated Lean semantics so that we can drop our interpreter from the trusted base.
snthpy 20 hours ago [-]
Ha ha my brain read that as "Lean interpretor in WASM".
Actually i think it's the "for" in the title that's at fault. The repo title is quite clear, but the submission title is ambiguous to me.
BobbyTables2 21 hours ago [-]
Only thing left is to make a Kanban out of it…
oulipo2 1 days ago [-]
Interesting, have you also looked at other formal methods, like Abstract Interpretation?
23 hours ago [-]
dev-kdrainc 24 hours ago [-]
[flagged]
Poi5eN 1 days ago [-]
[flagged]
CurryFurry 1 days ago [-]
[flagged]
johnsonjo 1 days ago [-]
Lean is a programming language [1]
> Lean is an open-source programming language and proof assistant that enables correct, maintainable, and formally verified code
I wish these comments were banned. They come up every time someone names a project with a name that was also used by one guy for his forgotten Lisp dialect in the 70s.
I'm also curious -- are you just implementing the Wasm binary and text parsing, validation algorithm, and execution semantics in Lean from scratch by reading the English prose in the spec document, and then checking it against the spec tests and the SpecTec description? Or do you have some sort of automated (classical or LLMy) transformation happening? (One could imagine directly transforming the SpecTec, or the OCaml reference interpreter, into Lean... but it sounds like you're not doing that? I think one needs to be a little careful here because e.g. at this point some of the English prose and reference interpreter implementation, and I think maybe some of the tests, are autogenerated from the SpecTec.) Which parts (if any) are outside the scope of the formalization? E.g. for WasmCert-Isabelle, I believe the binary and definitely the text parsing, and I think some of the arithmetic ops, are not covered.
How are you modeling the explicit sources of nondeterminism in the Wasm execution semantics? E.g. NaN representation, {memory., table.}grow, host calls, stack exhaustion, relaxed SIMD instructions, etc., and that's all before we get to the threads proposal? Because if the goal is to prove programs correct, one risk is that I prove my program correct against your Wasm interpreter (which maybe makes certain choices that aren't determined by the spec), and then I run it against another fully-conforming interpreter in the wild and it behaves incorrectly.
We are heavily using LLMs and agents for writing and verifying all the code. We have some safeguard inplace, such as not breaking the wasm testsuite, and being able to run wasm code and produce the correct results (even if that is not the goal of this interpreter). There is an ongoing effort (not by us) about creating a formal WASM spec in lean, generated from SpecTec, when that lands our plan is to prove that our interpreter follow the specification.
> E.g. are you able to do an "in-place Store" like WasmRef-Isabelle, and can you represent memories and tables as plain vectors of bytes/refs in memory, can you grow them in-place, etc.? Or any other optimizations/lessons learned?
Even if the interpreter can be use to run wasm code, we don't really care about efficiency, it is not intended to run WASM, but instead to be able to verify code, so we are intentionally not building an optimized interpreter, or rather we are optimizing toward ease of demonstration.
> How are you modeling the explicit sources of nondeterminism in the Wasm execution semantics? E.g. NaN representation, {memory., table.}grow, host calls, stack exhaustion, relaxed SIMD instructions, etc.,
Going over your list, some functions that might fail, like {memory., table.}grow and stack exhaustion can't fail on our interpreter, so that transition is not represented here. We need to revisit this hypothesis, but part of the reasoning is that we all properties we prove about the WASM bytecode, are properties held by the original code, and usually (for example in lean) you can prove that a function has some property for all values of a natural number, even if you can't really run the function for all values due to some kind of "system" failure (stack overflow).
NaN, we certainly need to revisit floats, we are delegaing its behaviour to Lean floats. Again part of why we think this might work is because we are not planning to run code with this interpreter, but just to write proofs.
Host calls is properly modeled and being developed as we speak, since we really care about being able to proofs properties of a code that runs on some particular host. When you write a statement about some code, you do it on the presence of some particular host. I'm exploring the possibility of formalizing NEAR smart contracts, which are WASM binaries.
We are actively working on this, as it is a pre-condition :P to reason about the simplest of useful programs. The idea is to develop an API around separation logic that allows you to reason about logic that manipulate non-overlapping regions of memory.
It won't be relevant if address are not known statically since API theorems will be parametrized over non-relevant constants such as addresses, function indices, etc...
> And since this is a shallow embedding into lean, what’s the approach for verifying properties of non-terminating programs?
To use the interpreter there is the concept of fuel, which we explicitly hide from the reasoning layer. Using fuel you can write statements of the form, this function returns out of fuel for any value of fuel passed to the interpreter, which is equivalent to prove that your program doesn't terminate.
And even if fuel isn’t exposed in the program logic, I’d imagine you’d still want step indexing to allow reasoning around cyclic heap structures. My experience with getting Claude (even fable) to do step indexed logical relations/SL proofs autonomously is that it struggles hard even on toy examples, particularly around the index shuffling. Do you have a setup in mind that can scale to larger programs?
Fun project in any case! I look forward to seeing how it develops :)
We plan to build our own, though we are currently evaluating what is the current state of iris-lean. Thanks for pointing to splean, I wasn't aware of it before.
> I’d imagine you’d still want step indexing to allow reasoning around cyclic heap structures
We have some APIs around loops and function calls that hides the "fuel-detail" and instead asks you to provide a well-founded relation that is used to prove that the behavior of the program is correct for "some" fuel.
The GCD example make use of this primitive to provide the invariants of the loop plus the argument that loop will terminate.
> Fun project in any case! I look forward to seeing how it develops :)
Feel free to join the telegram dev chat (link in github)
You can write "annotate" your rust code using asserts. On the wasm side asserts are converted to trap instructions, so the Lean spec will simply be: For every input this code never traps.
Part of our focus is making sure that specs are both easy to write and read, since they are human facing. Eventually you could imagine how writing code will mostly be writing specs, and both the code and the proofs will be handled by AI agents. In this scenario it is very important that humans can easily audit and modify the specs.
Yes, but I would argue the are already in the trusted base before this project, we are not removing that. We want instead remove "your code" from the trusted base, and just keep the compiler and the specs.
> How do you know the asserts you wrote are the traps you're reasoning about?
You just do. The asserts and the specs have a similar role, they are both consumer facing, and consumers need to make sure they are correct and cover what it is intended.
The reason AI has been so good at filling most of the proofs in my opinion, is that proofs are actually "simple", but very tedious and long. Part of our work right now is make sure that the tedious part can be solved mechanically as easy and efficient as possible, so both human or AI can focus on the interesting parts.
The core bet is that Wasm is a good verification target (close to compiled artifacts, many languages target it), and Lean is the right place to do verification.
Super interested in hearing from people working with Lean, compilers or other Wasm verification frameworks (eg Iris-Wasm).
https://github.com/jsmorph/leanexe
I think I managed to use Talos to prove the WAT generated from an example LeanExe program is correct. ?
https://gist.github.com/jsmorph/275a15dc21af037e1d02a1b433be...
Fun.
We considered LLVM-IR, and RISC-V.
Ultimately WASM felt like the right decisions. More importantly WASM spec is very well done in details, and it is written with formal verification in mind early on, and there are are plans from to include Lean as one of the targets for generating the spec automatically from SpecTec. Once this exist, we will formalize that our interpreter is correct under the definition generated from the official Wasm-Lean-Spec so we remove it from the Trusted-Base going forward.
Wasm has different levels we can validate against - starting with W3C test suite, then later full verification against the SpecTec-generated Lean semantics so that we can drop our interpreter from the trusted base.
Actually i think it's the "for" in the title that's at fault. The repo title is quite clear, but the submission title is ambiguous to me.
> Lean is an open-source programming language and proof assistant that enables correct, maintainable, and formally verified code
[1]: https://lean-lang.org/
https://en.wikipedia.org/wiki/Lynx_(disambiguation)
I wish these comments were banned. They come up every time someone names a project with a name that was also used by one guy for his forgotten Lisp dialect in the 70s.