Journey Through Formal Verifications
Prerequisites: You should know some functional programming — I give an introduction to it in the dependently typed languages chapter.
Glossary and Resources
Resources
Model Checking: Temporal Logics, Automata, SAT, $ \mu $-Calculus, concurrency
Foundations
Outline
- Glossary: Introduce terms, reference later
- Paradoxes: Puzzles that get us thinking about inconsistency and related notions
- Incompleteness, Undecidability, Inconsistency: Defining three key terms
- Computability and the Typed Lambda Calculus: Defining computability, another key term, define lambda calculus up to simply typed.
- Propositions as Types: The First Correspondence — Between Gentzen's natural deduction and Church's lambda calculus. This allows us to start generalizing.
- Church Encodings: It might be unclear what one can actually do in the typed lambda calculus, so let's show how to encode genuinely useful things (arithmetic, booleans, lists, ADTs).
- Lambda Cube: From STLC to CiC
- Propositions as Types: More Incarnations —
- System F and the Polymorphic Lambda Calculus
- Modal Logic and Monads
- HoTT: Topology and type theory?!!
Uncompiled
- Russell's paradox. This is a paradox expressible (encodable) in naive set theory and formal logic.
Addressing his own paradox, Russell built his theory of types, which appeared in his and Whitehead's Principia Mathematica. This is the first type theory.
The theory of types presents a hierarchy of types, which is still seen in Coq's universes.
But we can't just abandon set theory and start using the theory of types (TODO: Why)! So, Zermelo-Fraenkel set theory with the axiom of choice (ZFC) was also developed in response to the foundational crisis.
Simultaneously, the axiomatic method became a de facto standard: the proof of a theorem must follow from explicit axioms and previously proven theorems.
- Hilbert's program.
As a solution to the foundational crisis, Hilbert sought to ground all existing theories in a finite, complete set of axioms, and to provide a proof that these axioms were consistent. Ultimately, the consistency of all of mathematics would be reduced to that of basic arithmetic.
In other words, Hilbert sought a foundation that was:
- Complete: all true mathematical statements can be proven in the foundation,
- Consistent: no contradiction can be derived in the foundation,
- Decidable: there is an algorithm for deciding the truth or falsity of any mathematical statement.
(TODO: Gödel incompleteness, then Turing and Church take on defining computability and algorithms)
Although Gödel stirred things up, Hilbert's program was not a complete loss. Modern work on SAT solvers and decision procedures is rooted in it (and in earlier logical foundations).
Uncompiled
Paradoxes
Incompleteness_Undecidability_Inconsistency
- Why must non-terminating (non-total) functions be rejected in Coq? Here is an example showing what would go wrong if Coq allowed non-terminating recursive functions:
Fixpoint loop_false (n : nat) : False := loop_false n.
As a consistent logic, Coq cannot allow this: applying `loop_false` would produce a proof of `False`. How would one define the evaluation function for a small imperative language with a `while` construct in Coq, then? One way is to not define a function (which ought to be total) in the first place, but a relation: we define an inductive proposition whose return type is `Prop`. The cost of this is that we now need to construct proofs for evaluations.
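To make this concrete, here is a minimal sketch of such an evaluation relation (a toy syntax of my own: the state is a single nat and the loop condition reads the state; this is not from any real development):

(* A toy imperative language: the state is a single nat. *)
Inductive cmd : Type :=
  | CSkip  : cmd
  | CSeq   : cmd -> cmd -> cmd
  | CWhile : (nat -> bool) -> cmd -> cmd.

(* Big-step evaluation as a relation: "running c in state s ends in s'".
   No Fixpoint is needed, so a diverging while is harmless: it simply
   relates to no final state. *)
Inductive eval : cmd -> nat -> nat -> Prop :=
  | ESkip : forall s,
      eval CSkip s s
  | ESeq : forall c1 c2 s s' s'',
      eval c1 s s' -> eval c2 s' s'' -> eval (CSeq c1 c2) s s''
  | EWhileFalse : forall b c s,
      b s = false -> eval (CWhile b c) s s
  | EWhileTrue : forall b c s s' s'',
      b s = true ->
      eval c s s' ->
      eval (CWhile b c) s' s'' ->
      eval (CWhile b c) s s''.

To claim anything about running a program, we now exhibit an `eval` derivation instead of computing.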
Computability_And_Lambda_Calculus
Propositions As Types
Curry noticed that every proof in propositional logic corresponds (bidirectionally) to a function (term) in the simply typed lambda calculus. For example, consider

λx:a->b. λy:a. x y

Since this is a proper function (term) in the simply typed lambda calculus, it witnesses the truth of some statement in propositional logic; intuitively, we know this is

(a->b) -> a -> b

A false statement in propositional logic corresponds to a badly-typed lambda expression. No well-typed lambda term should be able to witness something false!
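To see the correspondence in a language you can run, here is the same term transcribed into Haskell (a sketch; Haskell's type system is only an approximation of the simply typed lambda calculus, and `modusPonens` is my name, not a standard one):

-- The term λx:a->b. λy:a. x y, written as a Haskell function.
-- Its inferred type is exactly the tautology (a -> b) -> a -> b.
modusPonens :: (a -> b) -> a -> b
modusPonens x y = x y

-- By contrast, a "proof" of the false statement  a -> b  cannot be
-- written: in \x -> ..., there is nothing of type b to return.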
What happens if we upgrade and complicate the propositional logic? Consider predicate logic, which is propositional logic with quantifiers.
Well, since propositional logic is practically 1-to-1 with the simply typed lambda calculus, we will need new constructs to handle the quantifiers. Howard and De Bruijn introduced the dependent product and the dependent pair for the universal and existential quantifier, respectively.
∀x:Nat. ∃y:Nat. x < y
-- corresponds to
Π(x: Nat) → Σ(y : Nat) × (x < y)
Here is a crucial question: logic is more intuitive than the type of some lambda term, so are there situations where it serves us better to think in terms of type theory rather than logic?
Yes. Type theory is constructive by nature; logic need not be. In Coq, this means one can extract and compute with `Type` (constructive type theory) but not with `Prop` (logical propositions). For example, given a proof of `A \/ B`, which is defined as an inductive `Prop`, you do not have access to whether an `A` or a `B` was constructed to witness `A \/ B`. To obtain the witness, we have to use `sum` (`A + B`), which is the `Type` way of doing things.
As a further example, if you wish to obtain a witness, you must use a dependent pair; if you do not, a regular existential proposition is fine.
Definition foo' (B: Type) (P: B -> Prop) := exists b: B, P b.
Definition bar' B P (H: foo' B P) : B.
  unfold foo' in H.
  destruct H. (* Error: cannot destruct a Prop to build a Type *)
Abort.

Definition foo'' (B: Type) (P: B -> Prop) := {b: B | P b}.
Definition bar'' B P (H: foo'' B P) : B.
  unfold foo'' in H.
  destruct H. (* Just fine *)
  exact x.
Defined.

(* Or, we could just stay in Prop land; the only difference here is B: Prop *)
Definition foo''' (B: Prop) (P: B -> Prop) := exists b: B, P b.
Definition bar''' B P (H: foo''' B P) : B.
  unfold foo''' in H.
  destruct H. (* Just fine: a Prop may be eliminated into a Prop goal *)
  exact x.
Qed.
Resources
https://en.wikipedia.org/wiki/Dependent_type
Church_Encodings
Lambda_Cube
Propositions_As_Types_More
System_F
Modal_Logic_And_Monads
HoTT
Dependently_Typed_Languages
The goal is to eventually synthesize all of these languages so that I can present a feature and show its realization in multiple languages. But for now, for my own note-taking purposes, they will have to be separate.
Reminder: Make a list of capabilities for each language and think critically about what is accomplishable/unaccomplishable in each.
TODO: Introduction to Functional Programming.
Idris
Though I have long been aware of Idris's existence, I didn't see real Idris code until I saw the following tweet in Dec 2024:

This immediately captured my interest. I have spent time thinking about pipeline designs.
Coq
Lean
Dependent Types
Lean thinks of dependent types as types that depend on parameters (not necessarily values). For example, in
inductive Vector (α : Type u) : Nat → Type u where
| nil : Vector α 0
| cons : (a : α) → {n : Nat} → Vector α n → Vector α (n + 1)
`Vector α n` depends on two parameters: the type of the elements in the vector (a type) and the length of the vector (a value).
The Lean book conceptualizes dependent products as dependent functions and adopts the "indexed family of types" view of dependent products. A dependent function has type `(a : α) → β a`, where `β` is a family of types over `α`; that is, there is a type `β a` for each value `a : α`.
Indexed family
This is fundamentally different from type constructors, which have the form `(α : Type) → β α`; with type constructors, values aren't mentioned. An "indexed family", then, is the following: given `α : Type` and `β : α → Type`, think of `β` as a family of types over `α`, that is, one type `β a` for each value `a : α`. For example, in

def cons (α : Type) (a : α) (as : List α) : List α :=
  List.cons a as

the types of the later arguments (`α`, `List α`) vary with the first argument `α`, so the type of `cons` is a dependent function type.
Agda
Isabelle_HOL
FStar
SMT_Solvers
Unfiltered notes below:
1. A History of Satisfiability
1.1 Encountering the definition of Satisfiability
The history of satisfiability is best understood through its roots in logic.
In logic, syntax and semantics are distinct.
Syntax is associated with proof theory: syntactic derivation, axioms of specified grammatical forms, and inference rules.
Semantics is associated with model theory: satisfiability with respect to worlds (structures).
In logic, there are two key terms: validity and consistency. They are interdefinable if we make use of $\neg$: the argument from $p_1, \dots, p_n$ to $q$ is valid if and only if the set $\{p_1, \dots, p_n, \neg q\}$ is inconsistent.
In proof theory, validity is called derivability. Consistency is just called consistency.
Derivability: $q$ is derivable from $p_1, \dots, p_n$ (in symbols, $p_1, \dots, p_n \vdash q$) if and only if $q$ can be obtained from $p_1, \dots, p_n$ by syntactic transformations using the inference rules of the system.
In model theory, validity is just called validity. Consistency is called satisfiability. $ A \vDash p $ means that $p$ is true in the structure $ A $.
Satisfiability: $\{p_1, \dots, p_n\}$ is satisfiable if and only if for some $A$: $A \vDash p_1, \dots, A \vDash p_n$.
The two branches of logic are systematically related. For most languages, the syntactic and semantic concepts coincide exactly ($\vdash$ corresponds to $\vDash$); such an axiom system is said to be complete.
A solid example of proof theory vs. model theory: in propositional logic, $p \land q \vdash p$ holds syntactically by the $\land$-elimination inference rule, while $p \land q \vDash p$ holds semantically because every valuation that makes $p \land q$ true also makes $p$ true.
1.2 Syllogism -> Boolean
Boole married algebra to logic, ending the two-millennium reign of Aristotelian syllogisms.
TODO: Start with syllogism, quickly mention stoics, then go into bool and Venn. The story is that Boolean rendered syllogism unneeded and that Boole introduced Algebra into logic.
Aristotle invented syllogistic (categorical) logic. The building blocks of the logic are categorical propositions: subject-predicate sentences appearing in one of four forms: every $S$ is $P$, no $S$ is $P$, some $S$ is $P$, and some $S$ is not $P$.
In modern notation, a syllogism has the form $(p \land q) \to r$ where $p, q, r$ are all categorical propositions: $p$ is the major premise, $q$ is the minor premise, and $r$ is the conclusion. But back in Aristotle's time, there was no notion of $\to$ and $\land$ yet, so a syllogism was written out as three sentences in natural language.
To show the validity of a syllogism, we need to be able to derive $r$ from $(p, q)$, using the following rules:
$$ \begin{align*} \text{Axiom 1:} && (\text{every }X\text{ is }Y \land \text{every }Y\text{ is }Z) &\to \text{every }X\text{ is }Z \\ \text{Axiom 2:} && (\text{every }X\text{ is }Y \land \text{no }Y\text{ is }Z) &\to \text{no }X\text{ is }Z \end{align*} $$
and
$$ \begin{align*}
\text{Rule 1:} && \text{From } (p \land q) \to r &\text{ infer } (\lnot r \land q) \to \lnot p \\ \text{Rule 2:} && \text{From } (p \land q) \to r &\text{ infer } (q \land p) \to r \\ \text{Rule 3:} && \text{From no }X\text{ is }Y &\text{ infer no }Y\text{ is }X \\ \text{Rule 4:} && \text{From } (p \land q) \to \text{no }X\text{ is }Y &\text{ infer } (p \land q) \to \text{some }X\text{ is not }Y \end{align*} $$
The axioms are valid syllogisms. The inference rules derive one syllogism from another. A prototypical example of a derivation follows:
- All men are mortal
- Socrates is a man
- Socrates is mortal
First, let's convert these to categorical propositions:
- Every man (Y) is mortal (Z)
- Every thing identical to Socrates (X) is a man (Y)
- Every thing identical to Socrates (X) is mortal (Z)
The categorical expressions are less natural, but this is what it takes to lift singular terms into categorical proposition form. After conversion, our starting form almost matches Axiom 1 already. We have $$(\text{every }Y\text{ is }Z \land \text{every }X\text{ is }Y) \to \text{every }X\text{ is }Z$$ Applying Rule 2, we get
$$(\text{every }X\text{ is }Y \land \text{every }Y\text{ is }Z) \to \text{every }X\text{ is }Z$$
and hence our syllogism is valid by Axiom 1.
Syllogistic logic is intimately related to natural language — categorical propositions deal with subjects and predicates by default. This logic is less abstract than first-order logic but aligns a bit more with our natural reasoning.
The Stoics (300s-200s BC) generalized categorical propositions to the propositions most are familiar with: sentences which are either true or false (evaluate to a boolean). Though, in modern times, a proposition's truth might not even be decidable!
The Stoics introduced inference rules that have remained part of the logical lore ever since:
$$ \begin{align*} p,\ p \to q &\vDash q && \text{(modus ponens)} \\ \neg q,\ p \to q &\vDash \neg p && \text{(modus tollens)} \\ \neg p,\ p \lor q &\vDash q && \text{(disjunctive syllogism)} \\ p \to q,\ q \to r &\vDash p \to r && \text{(hypothetical syllogism)} \end{align*} $$ But again, the syntax above is the modern interpretation of the Stoic rules; back then, $\to$ and $\neg$ had not yet been introduced.
So who introduced these symbols? Who brought algebra into logic?
It was Boole, Jevons, and Venn.
Boolean algebra was invented by Boole and subsequently improved upon by Jevons and Venn. A structure $\langle B, \lor, \land, \neg, 0, 1\rangle$ is a Boolean algebra if and only if $\lor$ and $\land$ are binary operations satisfying the rules below.
$$ \begin{align*} x \land y &= y \land x; & x \lor \neg x &= 1; \\ x \lor y &= y \lor x; & 1 \land x &= x; \\ x \lor (y \land z) &= (x \lor y) \land (x \lor z); & 0 \lor x &= x; \\ x \land (y \lor z) &= (x \land y) \lor (x \land z); & x \land \neg x &= 0 \end{align*} $$
A Boolean algebra is a kind of structure, not a single instance, so we can exhibit examples of it.
The prototypical example is the two-element Boolean algebra:
| $\land$ | $0$ | $1$ |  | $\lor$ | $0$ | $1$ |  | $a$ | $0$ | $1$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $0$ | $0$ | $0$ |  | $0$ | $0$ | $1$ |  | $\neg a$ | $1$ | $0$ |
| $1$ | $0$ | $1$ |  | $1$ | $1$ | $1$ |  |  |  |  |
More interestingly, the power set of any nonempty set $S$ forms a Boolean algebra. If you have taken discrete mathematics, then swapping $\land, \lor$ with $\cap$ and $\cup$ in the equations above will give you set theory statements you have surely seen.
So, the two-element Boolean algebra is a Boolean algebra, and any powerset is a Boolean algebra. Since the two-element Boolean algebra is used for logic, we see some connection between logic and set theory! There is no clearly defined set-theoretic semantics for logic yet, but we are on our way.
But we may use set theory to provide the semantics for syllogistic logic already! Because syllogisms always come as 3 categorical propositions, we can use Venn diagrams to prove things (see this)! Previously, we proved validity (derivability, in proof theory) using axioms and inference rules, all of which were syntactic. However, with set theory and Venn diagrams, the semantic validity is visual!
Usually, we don't get such nice visual semantics, but with syllogisms involving precisely 3 terms, we can always use Venn diagrams. In general, proving is done with inference rules (syntactic), as they are mechanizable and efficient. The underlying semantic model contains the spirit of the logic: it gives meaning to our logic and guides us in designing syntactic rules. Such semantics could be set-theoretic (as for syllogistic logic), category-theoretic (as used in type theory), or operational and denotational (as used in programming languages).
If you are curious about the structure itself, here are some algebraic, order-theoretic, and category-theoretic statements we can ascribe to it —
- A Boolean algebra gives rise to a Boolean ring (and vice versa). The rules of Boolean algebra are close to, but not trivially, those of a ring: taking $\lor$ and $\land$ as addition ($+$) and multiplication ($\circ$) won't cut it. Consider the additive-inverse rule of rings, $x + (-x) = 0$: it would be interpreted as $x \lor \neg x = 0$, which is not true. However, if we define exclusive or, $x \veebar y = (x \lor y) \land \neg(x \land y)$, and use it as the addition, a ring follows (see the sketch after this list).
- A Boolean algebra is a complemented distributive lattice. Terms like filters and ultrafilters are also thrown around here.
- Boolean algebras and boolean homomorphisms form a category. $\mathcal{P}(1)$ is initial.
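As a quick sanity check of the xor-as-addition claim, here is a sketch in Haskell (exploiting the fact that GHC's Data.Bits has a Bool instance; the names are mine):

import Data.Bits (xor)

bs :: [Bool]
bs = [False, True]

-- Additive inverse: x + x = 0, i.e. every element is its own inverse.
selfInverse :: Bool
selfInverse = all (\x -> (x `xor` x) == False) bs

-- Multiplication is conjunction; check x(y + z) = xy + xz exhaustively.
distributes :: Bool
distributes = and [ (x && (y `xor` z)) == ((x && y) `xor` (x && z))
                  | x <- bs, y <- bs, z <- bs ]

Both evaluate to True over the two-element Boolean algebra.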
Resources
Handbook of satisfiability
Propositional logic -> valuations; predicate logic / FOL -> structures.
Both of these are interpretations of our language/signature. They need not satisfy any theory. For example, a structure for the signature of groups need not be a group; a model of a group is a structure for the signature of groups that satisfies the theory of groups.
Writing down the theory is where you actually assign constraints.
TODO: Add in Colin's notes
SMT Theory and Implementation
SAT
SAT is an NP-complete problem.
The input to a SAT solver is a propositional formula in CNF (conjunctive normal form). A CNF is a conjunction of one or more clauses, where a clause is a disjunction of literals. In other words: one outer, multi-arg AND of many ORs.
For example, here's the CNF for the constraints on a single sudoku cell:
(and (or x1 x2 x3 ...)
(or (not x1) (not x2))
(or (not x1) (not x3))
(or (not x1) (not x4))
...
(or (not x8) (not x9))
)
The cell needs to take one of the values in [1..9], and it cannot simultaneously take more than one value.
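As a sketch of how such clauses get generated programmatically, here is a small Haskell helper (DIMACS-style integer literals; `cellCNF` is a hypothetical name, not any particular solver's API):

import Data.List (tails)

type Lit    = Int    -- positive = variable, negative = its negation
type Clause = [Lit]  -- a clause is a disjunction of literals

-- CNF for one cell with candidate variables x1..x9: at least one value
-- holds (one wide clause), and no two hold at once (pairwise clauses).
cellCNF :: [Int] -> [Clause]
cellCNF vars =
  vars : [ [-xi, -xj] | (xi : rest) <- tails vars, xj <- rest ]

-- cellCNF [1..9] yields the wide clause plus 36 pairwise exclusions.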
Looking at the sudoku
Interfacing Solvers with SAT
Keywords
(TODO: Some of these should go into a SAT fundamentals page)
Atomic formula: An atomic formula is the smallest predicate grounded in a theory; atomic formulas do not contain logical connectives. For example, an atomic formula in difference arithmetic is $t - s \le c$. The whole formula corresponds to a single boolean variable in SAT.
Literal: Either an atomic formula or a negated atomic formula.
Theory Solver: A procedure that SMT solvers use for deciding the satisfiability of conjunctions of literals.
CNF (Conjunctive Normal Form)
Decide, Propagate, Backtrack
Theory Lemma
Example Theory Solver: Difference Arithmetic
In difference arithmetic, literals are of the form $t - s \le c$, where $t, s$ are variables and $c$ is a constant.
Difference arithmetic is a fragment of linear arithmetic whose restricted structure admits a faster decision procedure.
More precisely, we can search for negative cycles in a graph where $t - s \le c$ corresponds to an edge from node $s$ to $t$ with weight $c$.
A conjunction of difference literals is satisfiable if and only if this graph has no negative cycle, so solving a conjunction of difference-arithmetic literals reduces to negative-cycle detection (e.g., via Bellman-Ford).
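Here is a minimal sketch of that decision procedure in Haskell (a toy representation, not a production theory solver): every literal $t - s \le c$ becomes an edge from $s$ to $t$ with weight $c$, all nodes start at distance 0 (an implicit super-source), and if any edge can still be relaxed after $|V|$ rounds of Bellman-Ford relaxation, there is a negative cycle and the conjunction is unsatisfiable.

import qualified Data.Map.Strict as M

type Var  = String
type Edge = (Var, Var, Int)  -- (s, t, c) encodes the literal t - s <= c

diffSat :: [Edge] -> Bool
diffSat edges = all settled edges
  where
    dist0 = M.fromList [ (v, 0) | (s, t, _) <- edges, v <- [s, t] ]
    n     = M.size dist0
    relax d (s, t, c) =
      let dt = d M.! s + c
      in if dt < d M.! t then M.insert t dt d else d
    relaxAll d = foldl relax d edges
    final = iterate relaxAll dist0 !! n           -- n rounds of relaxation
    settled (s, t, c) = final M.! s + c >= final M.! t

When `diffSat` returns True, the final distances themselves form a model: assigning each variable its distance satisfies every literal $t - s \le c$.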
Resources
Model_Checking
Full_Applications
Verifying_Rust_With_Verus
Verus Fundamentals
Verus is a tool for verifying the correctness of code written in Rust. Developers write specifications of what their code should do, and Verus statically checks that the executable Rust code will always satisfy the specifications for all possible executions of the code. Rather than adding run-time checks, Verus instead relies on powerful solvers to prove the code is correct.
Even if you do not plan to verify Rust, studying Verus can help you understand the pitfalls of SMT solvers. Further, I think the future of formal methods will have more tools like Verus. You do everything in your production code's language and you do a seamless lift + spec + prove when needed, in a formal language that looks familiar.
Note, the seamless-lifting dream is not yet achieved: Verus only supports a subset of Rust. (TODO: Are we just lacking engineering, or are there theoretical limitations?)
Executable vs. Specification
Here is a common Verus workflow for verifying cryptography:
- Wrap the production crypto code in a verus block,
- Write a Verus spec for each of the components,
- Write a Verus proof of the equivalence between the production crypto functions and their Verus specs.
But this begs the question: what's so correct about the Verus specs that we wish to verify our production code against them? Doesn't Verus just extend Rust syntax, and so how does a spec read any differently? (TODO: Write a spec for salsa20 to help answer this better.)
Spec, proof, exec
int vs usize
Seq, Set, Map
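To ground the spec/exec distinction before going further, here is a minimal sketch (hedged: `max_spec`/`max_exec` are made-up names, and the syntax follows the style of the Verus tutorial, which may drift between versions):

use vstd::prelude::*;

verus! {

// Ghost math: spec functions use mathematical ints and are never compiled.
spec fn max_spec(a: int, b: int) -> int {
    if a >= b { a } else { b }
}

// Executable Rust, statically checked against the spec.
fn max_exec(a: u64, b: u64) -> (r: u64)
    ensures r == max_spec(a as int, b as int),
{
    if a >= b { a } else { b }
}

} // verus!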
Dealing with Recursion
Say `f_spec` is recursive but the executable `f` is imperative. How do we verify that `f` adheres to `f_spec`?
Hoare logic; loop invariants.
Induction
You probably want to write an `f_induct` proof that establishes the inductive case. After that, the main proof won't be too far away; see the sketch below.
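Here is a sketch of the whole pattern, modeled on the triangle-numbers example from the Verus guide (lightly adapted; syntax may drift between versions). The recursive spec plays the role of `f_spec`, the loop plays `f`, and the induction lemma plays `f_induct`, discharging the overflow bound:

use vstd::prelude::*;

verus! {

// f_spec: a recursive specification.
spec fn triangle(n: nat) -> nat
    decreases n,
{
    if n == 0 { 0 } else { n + triangle((n - 1) as nat) }
}

// f_induct: a proof by induction, used for the overflow bound below.
proof fn triangle_is_monotonic(i: nat, j: nat)
    requires i <= j,
    ensures triangle(i) <= triangle(j),
    decreases j,
{
    if i < j {
        triangle_is_monotonic(i, (j - 1) as nat);
    }
}

// f: the imperative executable, tied back to f_spec by a loop invariant.
fn loop_triangle(n: u32) -> (sum: u32)
    requires triangle(n as nat) < 0x1_0000_0000,
    ensures sum == triangle(n as nat),
{
    let mut sum: u32 = 0;
    let mut idx: u32 = 0;
    while idx < n
        invariant
            idx <= n,
            sum == triangle(idx as nat),
            triangle(n as nat) < 0x1_0000_0000,
        decreases n - idx,
    {
        idx = idx + 1;
        proof { triangle_is_monotonic(idx as nat, n as nat); }
        sum = sum + idx;
    }
    sum
}

} // verus!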
Limitations: How to Fail Something True
The core reason for verification failures is undecidability: there is no algorithm that can decide the truth of arbitrary formulas. For example, proving theorems with quantifiers is undecidable in general. We rely on Z3's pattern-based instantiation of quantifiers ("triggers") to use and prove such formulas.
Matching loop
Strategies and Process of Proof Engineering
assert + assume
Verified Binary Search
Evaluation: What was unnatural?
As this is a tour through formal methods, we should critically evaluate our tool. What felt unnatural? What did you think was automatable and inferable but is not? What felt unergonomic? Perform this evaluation with total creativity: don't reject your criticisms just because they can't be implemented theory-wise.
I now present my list.
First, why can't everything just be automated?
Assert + assume is unergonomic.
Recursion and induction felt unergonomic.
Borrow_Checker_Linear_Types
Vest: Verified Binary Parsers in Verus
Analyzing anything verified begins with one key question: What is verified? In other words, what is the actual trusted base of your verified system?
In Vest, the exact thing that is verified is the round-trip property of parsers and serializers. That is, given bytes `b`, `b = (serialize . parse) b`, and given structured data `m`, `m = (parse . serialize) m`.
All binary parsers and their corresponding serializers built by Vest enjoy round-trip. But what does that mean? Here's what it could mean: the user writes a parser-serializer pair using Vest combinators and then invokes some Vest proof function to see if round-trip holds. If not, maybe the user gets to see counterexamples.
Vest supports this. But, ideally, verification stays in the verification repo as much as possible. If you write a verified crypto library, you expect the user to be able to just use your verified cryptography. So, the user should not have to import Verus to invoke some Vest proof function.
Instead, in the more idiomatic workflow, you write a Vest protocol specification, and Vest will generate from it the round-tripping parsers and serializers!
This DSL (domain-specific language) -> generation approach is employed by many formal systems. So let's explore how Vest internally verifies that, for any correctly specified DSL, the parsers and serializers generated from it enjoy round-trip.
(TODO: Mention other parser properties like totality and completeness. TODO: Do those pertain to Vest?)
Verified_Cryptography
Cryptol (For Haskell Knowers)
The Cryptol language is a size-polymorphic dependently-typed programming language with support for polymorphic recursive functions. It has a small syntax tuned for applied cryptography, a lightweight module system, a Read-Eval-Print Loop (REPL) top-level, and a rich set of built-in tools for performing high-assurance (literate) programming. Cryptol performs fairly advanced type inference, though as with most mainstream strongly typed functional languages, types can be manually specified as well.
Cryptol is implemented in Haskell. Cryptol is very similar to Haskell: From syntax to type system and language features to even the REPL. The key unique feature of Cryptol is size polymorphism.
Size-polymorphism
The following are types of selected Cryptol functions. You can use :t
in the Cryptol REPL to find out for yourself.
take : {front, back, a} [front + back]a -> [front]a
drop : {front, back, a} (fin front) => [front + back]a -> [back]a
split : {parts, each, a} (fin each) => [parts * each]a -> [parts][each]a
groupBy : {each, parts, a} (fin each) => [each * parts]a -> [parts][each]a
join : {parts, each, a} (fin each) => [parts][each]a -> [parts * each]a
transpose : {rows, cols, a} [rows][cols]a -> [cols][rows]a
head : {n, a} [1 + n]a -> a
tail : {n, a} [1 + n]a -> [n]a
Here's how we read these types. `[front + back]a` can be thought of as a list (Cryptol calls it a sequence) consisting of `front + back` many `a`s. Note that because the number of elements is directly encoded within the brackets, the item type, `a`, comes after the brackets. There is one implicit case: a standalone `[n]` means `[n]Bit`. For example,
0xabc : [12]
[1..12 : [Integer]] : [12]Integer
[1..12 : [4]] : [12][4]
[[(y,x) | x <- [1..3 : Integer]] | y <- [1..3 : Integer]] : [3][3](Integer, Integer)
`0xabc` is a 12-bit word (since each hexadecimal digit requires 4 bits), so we type `0xabc : [12]Bit`, but as a shorthand, this is just `0xabc : [12]`.
With this in mind, let's read the type of `join` out loud:

join : {parts, each, a} (fin each) => [parts][each]a -> [parts * each]a

`join` takes in a 2D array of some `a`. This 2D array has `parts` rows, each of length `each`; `parts` and `each` are inferrable lengths. `(fin each)` is a typeclass constraint forcing `each` to be finite (we can ignore the why). `join` outputs a 1D array of the same `a`, of length `parts * each`.
Compare this to the typical join
type in Haskell,
-- typical Haskell join
join :: [[a]] -> [a]
Cryptol's `join` is a lot more powerful than Haskell's `join`! Though, in the next section, I show you how one might get Cryptol's `join` in Haskell.
The power of Cryptol's `join` is that arithmetic constraints can be imposed at the type level. Cryptol calls this "size polymorphism". More generally, when types can depend on values, as is the case here, a language is "dependently typed". Cryptol offers limited dependent types, allowing types to depend only on arithmetic values. This seems to be the sweet spot for a cryptography DSL.
Size-polymorphism: Implementation
To encode size polymorphism in Haskell is to wonder how to encode dependent types in Haskell in general. Dependent types allow us to vary (depend) types on values. If you have experience with modern dependently typed languages such as those in Chapter 4, this will be smooth sailing. Though, you will see that dependent types are not ergonomic in Haskell.
If you have Haskell experience but no type-level Haskell experience, this could be a good intro to the topic! Just note, Haskell is not made for dependent types. Try Idris if you want the full Haskell + dependent types experience.
Here's how we encode a list that can depend on concrete length, in Haskell:
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
data Nat = Z | S Nat
data List (n :: Nat) a where
Nil :: List 'Z a
Cons :: a -> List n a -> List ('S n) a
twoItems :: List ('S ('S 'Z)) String
twoItems = Cons "hi" (Cons "yo" Nil)
Usually, `data Nat = Z | S Nat` is read as "type constructor `Nat` introduces value constructors `Z` and `S`". But with `-XDataKinds` enabled, it is additionally interpreted as "kind `Nat` introduces type constructors `'Z` and `'S`". The promoted type constructors are referred to as `'Z` and `'S`; the tick helps the compiler distinguish kind-type land from type-value land.
The key: since `'Z` and `'S` are type constructors, and `n` is a type variable of kind `Nat`, `List 'Z a` and `List ('S n) a` are valid types. Hence, we can now write `twoItems :: List ('S ('S 'Z)) String` — a concrete length is in this type!
Note, the promoted type constructors have no value constructors. This is good: we are in kind-type land and not type-value land.
Finally, `Nat` is defined inductively so that it accounts for all natural numbers, and the same induction works at the type level: type families let us define addition and multiplication on type-level `Nat`s, which is exactly what a Cryptol-style `join` needs. A sketch follows.
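Here is one way to carry that out (a sketch; `Plus`, `Mult`, `appendL`, and `joinL` are my names, and the `Nat`/`List` definitions are repeated so the block stands alone):

{-# LANGUAGE DataKinds, GADTs, KindSignatures,
             TypeFamilies, UndecidableInstances #-}

data Nat = Z | S Nat

data List (n :: Nat) a where
  Nil  :: List 'Z a
  Cons :: a -> List n a -> List ('S n) a

-- Type-level addition and multiplication, by induction on the first argument.
type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus 'Z     m = m
  Plus ('S n) m = 'S (Plus n m)

type family Mult (n :: Nat) (m :: Nat) :: Nat where
  Mult 'Z     m = 'Z
  Mult ('S n) m = Plus m (Mult n m)

appendL :: List n a -> List m a -> List (Plus n m) a
appendL Nil         ys = ys
appendL (Cons x xs) ys = Cons x (appendL xs ys)

-- Cryptol's  join : [parts][each]a -> [parts * each]a
joinL :: List parts (List each a) -> List (Mult parts each) a
joinL Nil           = Nil
joinL (Cons xs xss) = appendL xs (joinL xss)

The compiler now checks the length arithmetic for us, just as Cryptol does; the ergonomic cost is that every arithmetic fact must be stated as a type family.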
More on Arithmetic Types and Constraints
Cryptol interfaces with bit sequences and numbers through words and integers and the operations defined on them. Words have `n`-bit types (i.e., `[n]`); integers have type `Integer`.
Cryptol is unsigned when operating on words and signed when operating on integers. For example,
Cryptol> (-1: Integer) <= 0
True
Cryptol> (-1: [1]) <= 0
False
Cryptol> -1 == 0xff
True -- because -1 is inferred as [8]
Cryptol> 0xff <= 0
False -- Of course not
Overflow/underflow happens, but it is expected because Cryptol uses modular arithmetic. The types suggest this —
(+): {a} (Ring a) => a -> a -> a
(-): {a} (Ring a) => a -> a -> a
(*): {a} (Ring a) => a -> a -> a
(/.) : {a} (Field a) => a -> a -> a -- division
How is it suggested? Well, `(+)` deals only with a single `a`. If we apply `0xabc : [12]` to `(+)`, then we must apply another `[12]` to get a final `[12]`. The only reasonable way for the final `[12]` to be a proper sum of some sort is if additions wrap around (overflow).
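For instance, a tiny REPL check of the wrap-around (output shown in the usual hex printing of words):

Cryptol> 0xff + 1 : [8]
0x00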
As a side note, `Ring` and `Field` are algebraic structures. The set of integers forms a `Ring` because we can define `(+)` and `(*)` obeying a set of rules. However, the set of integers does not form a `Field`, which requires every nonzero element to have a multiplicative inverse. The prototypical example of a `Field` is the set of integers modulo $m$ with $m$ prime.
The last piece worth mentioning is arithmetic type inference. I illustrate it with examples instead of describing rules.
Cryptol> 1 + 1
2
Cryptol> 1 + 1 : [_]
-- Using '1' for type wildcard (_)
0
Cryptol> 1 + 1 : [2]
2
Cryptol> [1: [_] ...]
-- Using '1' for type wildcard (_)
[1, 0, 1, 0, 1, ...]
Cryptol> [0: [_] ...]
-- Using '0' for type wildcard (_)
[0, 0, 0, 0, 0, ...]
Other Concepts
Corecursion
Cryptol recommends using corecursion (also known as recurrence, unfolds, or anamorphisms) as opposed to explicit recursion.
Let's understand through an example. The task: create a stream seeded with four initial values, whose subsequent elements are appended to the stream and computed by xor-ing the current stream element (the 1st) with two elements further into the stream (the 2nd and the 4th).
With explicit recursion, this looks something like:
stream = helper [0x3F, 0xE2, 0x65, 0xCA]
where
helper xs =
let n = length xs
a = xs ! (n-4) -- three generations ago
b = xs ! (n-3) -- two generations ago
c = xs ! (n-1) -- current element
next = a `xor` b `xor` c
in helper (xs ++ [next])
Behold. Corecursion.
stream = [0x3F, 0xE2, 0x65, 0xCA] # new
where new = [ a ^ b ^ c | a <- stream
| b <- drop 1 stream
| c <- drop 3 stream]
Soak in the beauty for a bit, then think carefully about how laziness enables this code. (A Haskell transcription follows the quote below.) Beware of showing this to your imperative friends.
Those who were seen dancing were thought to be insane by those who could not hear the music.
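If the laziness point is not clicking, here is a direct Haskell transcription (a sketch; ordinary lazy lists play the role of Cryptol's infinite sequences):

import Data.Bits (xor)
import Data.Word (Word8)

-- The list is defined in terms of itself; laziness makes this fine,
-- because each new element only looks at elements already produced.
stream :: [Word8]
stream = [0x3F, 0xE2, 0x65, 0xCA] ++ new
  where
    new = zipWith3 (\a b c -> a `xor` b `xor` c)
                   stream (drop 1 stream) (drop 3 stream)

-- take 8 stream yields the four seeds plus four generated elements.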
More generally,
ys = [i] # [ f (x, y) | x <- xs
| y <- ys
]
where `xs` is some input sequence, `i` is the result corresponding to the empty sequence, and `f` computes the next element from the previous result and the next input.
Using Type in Variable Context
The following code is legal:
length : {n, a, b} (fin n, Literal n b) => [n]a -> b
length _ = `n
The surprising thing is that we can take a type variable, `n`, and use it in the value definition (demoted via the backtick). Dependent types (precisely, size polymorphism) allow this.
Arithmetic Type Constraint
Continuing the discussion on size-polymorphism, the following is also legal:
myWidth : {bits,len,elem} (fin len, fin bits, bits >= width len) => [len] elem -> [bits]
Look at the arithmetic constraint `bits >= width len`. (TODO: A better example?)
`myWidth` has two typeclass constraints, `fin len` and `fin bits`. It also has a value-level arithmetic constraint, `bits >= width len`.
At last, a quick discussion of other typeclasses:
- Logic: includes the operators `&&`, `||`, `^`, and `~`. Cryptol types made of bits are instances of `Logic`.
- Zero: includes the special constant `zero` and the shifting operators `<<` and `>>`. All Cryptol built-in types are instances of `Zero` (`zero` is the bottom value).
- Cmp: includes `<`, `>`, `<=`, `>=`, `min`, and `max`. Requires `Eq`.
- SignedCmp: `<$`, `>$`, `<=$`, and `>=$`. These interpret bitvectors as signed two's-complement numbers.
Classic Ciphers
Cryptol and SAW
Cryptol is very Haskell-like, from the syntax to the type system, and even to the REPL. Since Cryptol is a crypto DSL, it caters to the low level, providing types such as bit widths. Let's play with the REPL:
# prints with base=16 by default
Cryptol> 0b101
0x5
Cryptol> :t 0b101 # :t or :type
0b101 : [3]
Cryptol> :t 0xabc
0xabc : [12]
Cryptol> :t 100
100 : {a} (Literal 100 a) => a
Cryptol> 100 : [6]
[error] at <interactive>:31:1--31:4:
• Unsolvable constraint:
6 >= 7
arising from
use of literal or demoted expression
at <interactive>:31:1--31:4
Cryptol> 100 : [7]
0x64
`[n]` is a type that tells us the number of bits with which a word is represented. Numbers such as `0b101`, `0xabc`, and `100` are all words. `0xabc : [12]` because each hexadecimal digit requires 4 bits to store. Prompting the REPL with `100 : [6]` (an explicitly annotated type) gives an unsolvable constraint because the decimal 100 can't be represented in 6 bits.
The only unexpected output is `:t 100`. What the hell is `100 : {a} (Literal 100 a) => a`? Essentially, this is an uninferred type. The reason Cryptol cannot infer `100 : [7]` is that Cryptol does not perform real arithmetic during type inference; since a decimal digit requires anywhere from 1 to 4 bits to represent, Cryptol does not give 100 a concrete type. Conversely, the number of bits used to represent binary and hexadecimal literals is evident at the syntax level.