Journey Through Formal Verifications

Prerequisites: You should know some functional programming; I give an introduction to it in the dependently typed languages chapter.

Glossary and Resources

Resources

Model Checking: Temporal Logics, Automata, SAT, $ \mu $-Calculus, concurrency

Foundations

Outline

  • Glossary: Introduce terms, reference later
  • Paradoxes: Puzzles that get us thinking about inconsistency and related notions
  • Incompleteness, Undecidability, Inconsistency: Defining three key terms
  • Computability and the Typed Lambda Calculus: Defining computability, another key term, and building the lambda calculus up to the simply typed version.
  • Propositions as Types: The First Correspondence — Between Gentzen's natural deduction and Church's lambda calculus. This allows us to start generalizing.
  • Church Encodings: It might be unclear what one can do in the typed lambda calculus, so let's show how to encode actually useful things (arithmetic, booleans, lists, ADTs).
  • Lambda Cube: From STLC to CiC
  • Propositions as Types: More Incarnations —
    • System F and the Polymorphic Lambda Calculus
    • Modal Logic and Monads
  • HoTT: Topology and type theory?!!

Uncompiled

  1. Russell's paradox. This is a paradox expressible (encodable) in naive set theory and formal logic.

Addressing his own paradox, Russell built his theory of types, which appeared in his and Whitehead's Principia Mathematica. This is the first type theory.

The theory of types presents a hierarchy of types, which is still seen in Coq's universe hierarchy.

But we can't just abandon set theory and start using the theory of types (TODO: Why)! So Zermelo-Fraenkel set theory with the axiom of choice (ZFC) was also developed in response to the foundational crisis.

Simultaneously, the axiomatic method became the de facto standard: the proof of a theorem must follow from explicit axioms and previously proven theorems.

  1. Hilbert's program.

As a solution to the foundational crisis, Hilbert sought to ground all existing theories in a finite, complete set of axioms, and to provide a proof that these axioms were consistent. Ultimately, the consistency of all of mathematics would be reduced to that of basic arithmetic.

In other words, Hilbert sought a foundation that offered:

  1. Completeness: all true mathematical statements can be proven in the foundation,
  2. Consistency: a proof that no contradiction can be derived in the foundation,
  3. Decidability: an algorithm for deciding the truth or falsity of any mathematical statement.

(TODO: Gödel's incompleteness, then Turing's and Church's takes on defining computability and algorithms)

Although Gödel stirred things up, Hilbert's program was not a complete loss. Modern work on SAT solvers and decision procedures is rooted in this program (and in earlier logical foundations).


Uncompiled

Paradoxes

Incompleteness_Undecidability_Inconsistency

  • Why must non-terminating (non-total) functions be rejected in Coq? Here is an example showing what would go wrong if Coq allowed non-terminating recursive functions:
Fixpoint loop_false (n : nat) : False := loop_false n.

In a consistent logic this cannot be allowed: loop_false would be a proof of False. How, then, would one define the evaluation function for a small imperative language with a while construct in Coq? One way is to not define a function (which must be total) in the first place, but a relation: an inductive proposition whose codomain is Prop. The cost is that we now have to construct proofs of evaluation instead of simply computing.
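For intuition, here is a sketch of what the relational approach looks like, written as ordinary big-step inference rules (using a hypothetical judgment $(c, \sigma) \Downarrow \sigma'$, read "command $c$ takes state $\sigma$ to state $\sigma'$"); each rule would become one constructor of the inductive proposition:

$$ \frac{b\ \text{is false in}\ \sigma}{(\texttt{while } b \texttt{ do } c,\ \sigma) \Downarrow \sigma} \qquad \frac{b\ \text{is true in}\ \sigma \qquad (c,\ \sigma) \Downarrow \sigma' \qquad (\texttt{while } b \texttt{ do } c,\ \sigma') \Downarrow \sigma''}{(\texttt{while } b \texttt{ do } c,\ \sigma) \Downarrow \sigma''} $$

A non-terminating loop simply has no derivation, so no totality requirement is violated.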

Computability_And_Lambda_Calculus

Propositions As Types

Curry noticed that every proof in propositional logic corresponded (bidirectionally) to a function (term) in the (simply) typed lambda calculus. For example, consider

λx:a->b. λy:a. x y

Since this is a well-typed function (term) in the simply typed lambda calculus, it witnesses the truth of some statement in propositional logic. Intuitively, that statement is

(a->b) -> a -> b

An unprovable statement in propositional logic corresponds to a type that no well-typed lambda term inhabits. No typed lambda term should be able to witness something false!
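To see the correspondence concretely, here is the natural deduction derivation that this term mirrors (a standard presentation, independent of any particular tool). Reading bottom-up, the two arrow-introduction steps are the two lambda abstractions, discharging the assumptions labelled $x$ and $y$, and the arrow-elimination step is the application x y:

$$ \dfrac{\dfrac{\dfrac{[a \to b]^{x} \qquad [a]^{y}}{b}}{a \to b}}{(a \to b) \to a \to b} $$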

What happens if we upgrade and complicate the propositional logic? Consider predicate logic, which is propositional logic extended with quantifiers.

Well, since propositional logic is practically 1-to-1 with the simply typed lambda calculus, we will need new constructs to handle the quantifiers. Howard and De Bruijn introduced dependent products and dependent pairs for the universal and existential quantifiers, respectively.

∀x:Nat. ∃y:Nat. x < y

-- corresponds to

Π(x: Nat) → Σ(y : Nat) × (x < y)
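Concretely (a standard reading, not specific to any one proof assistant), an inhabitant of this type is a function that, given any x : Nat, returns a dependent pair: a witness y : Nat together with a proof that x < y. For instance,

$$ x \;\mapsto\; \big(x + 1,\ \text{a proof of } x < x + 1\big) $$

inhabits it, and thereby proves the original statement.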

Here is a crucial question: logical notation is more intuitive than the type of some lambda term, so are there situations where it serves us better to think in terms of type theory rather than logic?

Yes. Type theory is constructive by nature; logic need not be.

In Coq, this means one can extract and compute with Type (constructive type theory) but not with Prop (logical propositions). For example, given a proof of A \/ B, which is defined as an inductive Prop, you do not have access to whether an A or a B was constructed to witness A \/ B. To keep the witness available for computation, we have to use a disjunction that lives in Type instead, such as the sum {A} + {B} (sumbool) or a boolean.

As a further example, if you wish to extract a witness, you must use a dependent pair (a sigma type); if you do not, a regular existential proposition is fine.

Definition foo' (B: Type) (P: B -> Prop) := exists b: B, P b.
Definition bar' (B: Type) (P: B -> Prop) (H: foo' B P) : B.
  unfold foo' in H.
  destruct H. (* Error here: an inductive Prop cannot be eliminated to build a value in Type *)

Definition foo'' (B: Type) (P: B -> Prop) := {b: B | P b}.
Definition bar'' (B: Type) (P: B -> Prop) (H: foo'' B P) : B.
  unfold foo'' in H.
  destruct H. (* Just fine *)
  exact x.
Qed.

(* Or, we could just stay in Prop land; the only difference here is B: Prop. *)
Definition foo' (B: Prop) (P: B -> Prop) := exists b: B, P b.
Definition bar' (B: Prop) (P: B -> Prop) (H: foo' B P) : B.
  unfold foo' in H.
  destruct H. (* Just fine: eliminating a Prop to prove another Prop is allowed *)
  exact x.
Qed.

Resources

https://en.wikipedia.org/wiki/Dependent_type

Church_Encodings

Lambda_Cube

Propositions_As_Types_More

System_F

Modal_Logic_And_Monads

HoTT

Dependently_Typed_Languages

The goal is to eventually synthesize all of these languages so that I can present a feature and show its realization in multiple languages. But for now, for my own note-taking purposes, they will have to stay separate.

Reminder: Make a list of capabilities for each language and think critically about what is accomplishable/unaccomplishable in each.

TODO: Introduction to Functional Programming.

Idris

Though I have long been aware of Idris's existence, I didn't see real Idris code until I saw the following tweet in Dec 2024:

screenshot of idris tweet

This immediately captured my interest. I have spent time thinking about pipeline designs.

Coq

Lean

Dependent Types

Lean thinks of dependent types as types that depend on parameters (which need not be values). For example, in

inductive Vector (α : Type u) : Nat → Type u where
  | nil  : Vector α 0
  | cons : (a : α) → {n : Nat} → Vector α n → Vector α (n + 1)

, Vector α n depends on two parameters: the type of the elements in the vector (a type) and the length of the vector (a value).

The Lean book conceptualizes dependent products as dependent functions and adopts the "indexed family of types" view. A dependent function has type (a : α) → β a, where β is a family of types over α; that is, there is a type β a for each value a : α.

Indexed family

An indexed family of types is the data of a type α : Type together with β : α → Type: β assigns a type β a to each value a : α. This is fundamentally different from an ordinary type constructor, which has the form (α : Type) → β α and never mentions values.

For example, consider

def cons (α : Type) (a : α) (as : List α) : List α :=
  List.cons a as

The type of cons is (α : Type) → α → List α → List α: the types of the later arguments and of the result depend on the first argument α, so cons is already a dependent function, but one that depends only on a type. An indexed family in the fuller sense has β : α → Type with a ranging over values, as with Vector above: take β := Vector α : Nat → Type u, so there is a type β n (vectors of length n) for each n : Nat.

Agda

Isabelle_HOL

FStar

SMT_Solvers

Unfiltered notes below:

1. A History of Satisfiability

1.1 Encountering the definition of Satisfiability

The history of satisfiability is best understood through its roots in logic.

In logic, syntax and semantics are distinct.

Syntax is the domain of proof theory: syntactic derivation, axioms of specified grammatical forms, and inference rules.

Semantics is the domain of model theory: satisfiability with respect to worlds (structures).

In logic, there are two key terms: validity and consistency. They are interdefinable if we make use of $\neg$: the argument from $ p_1,\dots,p_n $ to $ q $ is valid if and only if the set $ \set{p_1, \dots, p_n, \neg q} $ is inconsistent. For instance, the argument from $p$ and $p \to q$ to $q$ is valid precisely because $\set{p,\ p \to q,\ \neg q}$ cannot all be true at once.

In proof theory, validity is called derivability. Consistency is just called consistency.

Derivability: $ q $ is derivable from $ p_1, \dots, p_n $ (in symbols, $ p_1, \dots, p_n \vdash q $) if and only if you can syntactically transform $ p_1, \dots, p_n $ into $ q $ using the inference rules of the system.

In model theory, validity is just called validity. Consistency is called satisfiability. $ A \vDash p $ means that $p$ is true in the structure $ A $.

Satisfiability: $ \set{p_1, \dots, p_n} $ is satisfiable if and only if for some $ A, A \vDash p_1, \dots, A \vDash p_n $.

The two branches of logic are systematically related. For many logics, the syntactic and semantic notions coincide exactly ($\vdash$ matches $\vDash$); such an axiom system is said to be sound and complete.

A solid example of proof theory vs model theory

1.2 Syllogism -> Boolean

Boole married algebra to logic, ending the two-millennium reign of Aristotelian syllogisms.

TODO: Start with syllogism, quickly mention stoics, then go into bool and Venn. The story is that Boolean rendered syllogism unneeded and that Boole introduced Algebra into logic.

Aristotle invented syllogistic, or categorical, logic. The building blocks of the logic are categorical propositions: subject-predicate sentences appearing in one of four forms: every $S$ is $P$, no $S$ is $P$, some $S$ is $P$, and some $S$ is not $P$.

In modern interpretation, a syllogism has the syntax $(p, q) \to r$ where $p, q, r$ are all categorical propositions. $p$ is the major premise, $q$ is the minor premise, and $r$ is the conclusion. But back in Aristotle's time there was no notion of $\to$ or $\land$ yet, so a syllogism was written out as three sentences in natural language.

To show the validity of a syllogism, we need to be able to derive $r$ from $(p, q)$, using the following rules:

$$ \begin{align*} \text{Axiom 1:} & & (\text{every }X\text{ is }Y \land \text{every }Y\text{ is }Z) & \to \text{every }X\text{ is }Z \\ \text{Axiom 2:} & & (\text{every }X\text{ is }Y \land \text{no }Y\text{ is }Z) & \to \text{no }X\text{ is }Z \end{align*} $$

and

$$ \begin{align*} \text{Rule 1:} & \text{ From } & (p \land q) \to r & \text{ infer } & (\lnot r \land q) \to \lnot p \\ \text{Rule 2:} & \text{ From } & (p \land q) \to r & \text{ infer } & (q \land p) \to r \\ \text{Rule 3:} & \text{ From } & \text{no }X\text{ is }Y & \text{ infer } & \text{no }Y\text{ is }X \\ \text{Rule 4:} & \text{ From } & (p \land q) \to \text{ no }X\text{ is }Y & \text{ infer } & (p \land q) \to \text{ some }X\text{ is not }Y \end{align*} $$

The axioms are valid syllogisms. The inference rules derive one syllogism from another. A prototypical example of a derivation follows:

  1. All men are mortal
  2. Socrates is a man
  3. Socrates is mortal

First, let's convert these to categorical propositions:

  1. Every man (Y) is mortal (Z)
  2. Every thing identical to Socrates (X) is a man (Y)
  3. Every thing identical to Socrates (X) is mortal (Z)

The categorical expressions are less natural, but this is what it takes to lift singular terms into categorical proposition form. After conversion, the syllogism we must derive is

$$(\text{every }Y\text{ is }Z \land \text{every }X\text{ is }Y) \to \text{every }X\text{ is }Z$$

which almost matches Axiom 1,

$$(\text{every }X\text{ is }Y \land \text{every }Y\text{ is }Z) \to \text{every }X\text{ is }Z$$

except that the premises are swapped. Applying Rule 2 to Axiom 1 swaps them, and hence our syllogism is valid.

Syllogistic logic is intimately tied to natural language: categorical propositions deal with subjects and predicates by default. This logic is less abstract than first-order logic but aligns a bit more with our natural reasoning.

The Stoics (300s-200s BC) generalized categorical propositions to the propositions most of us are familiar with: a sentence that is either true or false (evaluates to a boolean). Though in modern times a proposition's truth might not even be decidable!

The Stoics introduced inference rules that have remained part of the logical lore ever since:

$$ \begin{align*} p, p \to q &\vDash q && \text{(modus ponens)} \\ \neg q, p \to q &\vDash \neg p && \text{(modus tollens)} \\ \neg p, p \lor q &\vDash q && \text{(disjunctive syllogism)} \\ p \to q, q \to r &\vDash p \to r && \text{(hypothetical syllogism)} \end{align*} $$ But again, the syntax above is the modern interpretation of the Stoic rules; back then, $\to$ and $\neg$ had not been introduced.

So who introduced these symbols? Who brought algebra into logic?

It was Boole, Jevons, and Venn.

Boolean algebra was invented by Boole and subsequently improved upon by Jevons and Venn. A structure $\langle B, \lor, \land, \neg, 0, 1\rangle$ is a Boolean algebra if and only if $\lor$ and $\land$ are binary operations satisfying the rules below.

$$ \begin{align*} x \land y &= y \land x; & x \lor \neg x &= 1; \\ x \lor y &= y \lor x; & 1 \land x &= x; \\ x \lor (y \land z) &= (x \lor y) \land (x \lor z); & 0 \lor x &= x; \\ x \land (y \lor z) &= (x \land y) \lor (x \land z); & x \land \neg x &= 0 \end{align*} $$

"Boolean algebra" names an axiomatically defined structure, not a single instance, so we can look for examples of it.

The prototypical example is the two-element Boolean algebra:

$$ \begin{array}{c|cc} \land & 0 & 1 \\ \hline 0 & 0 & 0 \\ 1 & 0 & 1 \end{array} \qquad \begin{array}{c|cc} \lor & 0 & 1 \\ \hline 0 & 0 & 1 \\ 1 & 1 & 1 \end{array} \qquad \begin{array}{c|cc} a & 0 & 1 \\ \hline \neg a & 1 & 0 \end{array} $$

More interestingly, the power set of any nonempty set $S$ forms a Boolean algebra. If you have taken discrete mathematics, then swapping $\land, \lor$ with $\cap$ and $\cup$ in the equations above will give you set theory statements you have surely seen.

So the two-element Boolean algebra is a Boolean algebra, and so is the power set of any nonempty set. Since the two-element Boolean algebra is used for logic, we see some connection between logic and set theory! There is no clearly defined set-theoretic semantics for logic yet, but we are on our way.

But we may already use set theory to provide the semantics for syllogistic logic! Because syllogisms always consist of three categorical propositions over three terms, we can use Venn diagrams to prove things (see this)! Previously, we proved validity (derivability, in proof-theory terms) using axioms and inference rules, all of which were syntactic. With set theory and Venn diagrams, semantic validity becomes visual!

Usually, we don't get such nice visual semantics, but with syllogisms involving precisely three terms, we can always use Venn diagrams. In general, proving is done with inference rules (syntactic), as they are mechanized and efficient. The underlying semantic model contains the spirit of the logic: it gives meaning to the logic and guides us in designing the syntactic rules. Such semantics could be set-theoretic (as for syllogistic logic), category-theoretic (used in type theory), or operational and denotational (used for programming languages).

If you are curious about the structure itself, here are some algebraic, order-theoretic, and category-theoretic statements we can ascribe to it —

  1. A Boolean algebra gives rise to a Boolean ring (and vice versa). The Boolean algebra laws are close to, but do not trivially form, a ring: taking $\lor$ and $\land$ as addition ($+$) and multiplication ($\circ$) won't cut it. Consider the additive inverse rule of rings, $x + (-x) = 0$: interpreted as $x \lor \neg x = 0$, it is false. However, if we define exclusive or, $x \veebar y = (x \lor y) \land \neg(x \land y)$, and use it as the addition, a ring follows (see the quick check after this list).
  2. A Boolean algebra is a complemented distributive lattice. Terms like filters and ultrafilters are also thrown around here.
  3. Boolean algebras and boolean homomorphisms form a category. $\mathcal{P}(1)$ is initial.
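As a quick check of the first item: with $\veebar$ as addition, every element is its own additive inverse, since

$$ x \veebar x = (x \lor x) \land \neg(x \land x) = x \land \neg x = 0. $$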

Resources

Handbook of satisfiability

Syllogism and Venn Diagrams

Propositional logic -> valuations
Predicate logic / FOL -> structures

Both of these are interpretations of our language/signature. They need not satisfy any theory. For example, a structure for the signature of groups need not be a group. But a model of the theory of groups is a structure for the signature of groups that satisfies the theory of groups.

Writing down the theory is where you actually impose constraints.

TODO: Add in Colin's notes

SMT Theory and Implementation

SAT

SAT is an NP-complete problem.

The input to a SAT solver is a propositional formula in CNF (conjunctive normal form). A CNF is a conjunction of one or more clauses, where a clause is a disjunction of literals. In other words: one outer AND of many ORs.

For example, here's the CNF for the constraints on a single sudoku cell:

(and (or x1 x2 x3 ...)
     (or (not x1) (not x2))
     (or (not x1) (not x3))
     (or (not x1) (not x4))
	 ...
     (or (not x8) (not x9))

)

The cell needs to take one of the values in [1..9], and it cannot take more than one value simultaneously.
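Here is a minimal sketch of how such clauses can be generated, in Haskell (my own illustrative encoding, not tied to any particular solver). A positive integer stands for a variable and a negative integer for its negation, DIMACS-style; "exactly one" is one at-least-one clause plus pairwise at-most-one clauses.

import Data.List (tails)

-- A clause is a disjunction of literals; a CNF is a conjunction of clauses.
type Lit    = Int
type Clause = [Lit]
type CNF    = [Clause]

-- "Exactly one of the given literals is true": one at-least-one clause,
-- plus an at-most-one clause for every pair of literals.
exactlyOne :: [Lit] -> CNF
exactlyOne xs = xs : [ [-x, -y] | (x : ys) <- tails xs, y <- ys ]

-- Constraints on a single sudoku cell, with variables 1..9 standing for
-- "this cell holds the value 1..9".
main :: IO ()
main = mapM_ print (exactlyOne [1 .. 9])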

Looking at the sudoku


Interfacing Solvers with SAT

Keywords

(TODO: Some of these should go into a SAT fundamentals page)

Atomic formula: An atomic formula is the smallest predicate grounded in a theory; it contains no logical connectives. For example, an atomic formula in difference arithmetic is $t - s \le c$. The whole formula corresponds to a single boolean variable on the SAT side.

Literal: Either an atomic formula or a negated atomic formula.

Theory Solver: A procedure that SMT solvers use for deciding the satisfiability of conjunctions of literals.

CNF (Conjunctive Normal Form)

Decide, Propagate, Backtrack

Theory Lemma

Example Theory Solver: Difference Arithmetic

In difference arithmetic, literals are of the form $t - s \le c$, where $t, s$ are variables and $c$ is a constant.

Difference arithmetic is a fragment of linear arithmetic whose decision procedure can be sped up thanks to its restricted structure.

More precisely, we build a graph in which each literal $t - s \le c$ becomes an edge from node $s$ to node $t$ with weight $c$. A conjunction of difference literals is satisfiable exactly when this graph has no negative cycle, so the theory solver reduces to negative-cycle detection (for example, with Bellman-Ford).
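Here is a minimal sketch of that check in Haskell (my own illustrative code, not taken from any SMT solver): start every node at distance 0, run |V| - 1 rounds of Bellman-Ford relaxation, and report unsatisfiable if any edge can still be relaxed afterwards.

import qualified Data.Map.Strict as M

-- A difference literal  t - s <= c, viewed as an edge  s --c--> t.
data Edge v = Edge { eFrom :: v, eTo :: v, eWeight :: Int }

-- A conjunction of difference literals is satisfiable iff the constraint
-- graph has no negative cycle (Bellman-Ford with all distances starting
-- at 0, which plays the role of a virtual 0-weight source).
diffSat :: Ord v => [Edge v] -> Bool
diffSat es = not (any canRelax es)
  where
    nodes = M.fromList [ (x, 0 :: Int) | e <- es, x <- [eFrom e, eTo e] ]
    relaxOne d (Edge s t c)
      | d M.! s + c < d M.! t = M.insert t (d M.! s + c) d
      | otherwise             = d
    relaxAll d = foldl relaxOne d es
    final = iterate relaxAll nodes !! max 0 (M.size nodes - 1)
    canRelax (Edge s t c) = final M.! s + c < final M.! t

-- x - y <= -1, y - z <= -1, z - x <= -1: the edges form a cycle of total
-- weight -3, so this prints False (unsatisfiable).
main :: IO ()
main = print (diffSat [Edge "y" "x" (-1), Edge "z" "y" (-1), Edge "x" "z" (-1)])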

Solving (TODO: what?) in difference arithmetic

Resources

Model_Checking

Full_Applications

Verifying_Rust_With_Verus

Verus Fundamentals

Verus is a tool for verifying the correctness of code written in Rust. Developers write specifications of what their code should do, and Verus statically checks that the executable Rust code will always satisfy the specifications for all possible executions of the code. Rather than adding run-time checks, Verus instead relies on powerful solvers to prove the code is correct.

Even if you do not plan to verify Rust, studying Verus can help you understand the pitfalls of SMT solvers. Further, I think the future of formal methods will have more tools like Verus. You do everything in your production code's language and you do a seamless lift + spec + prove when needed, in a formal language that looks familiar.

Note: the seamless lifting dream is not yet achieved: Verus only supports a subset of Rust. (TODO: Are we just lacking engineering, or are there theoretical limitations?)

Executable vs. Specification

Here is a common Verus workflow for verifying cryptography:

  1. Wrap production crypto code in verus block,
  2. Write a Verus spec for each of the components,
  3. Write a Verus proof for the equivalence between the production crypto functions and the Verus spec for them.

But this raises the question: what makes the Verus specs so trustworthy that we want to verify our production code against them? Doesn't Verus just extend Rust syntax, and if so, how are the specs written any differently?

(TODO: Write spec for salsa20 to help answer this better)

Spec, proof, exec

int vs usize

Seq, Set, Map

Dealing with Recursion

Say f_spec is recursive but the executable f is imperative. How to verify that f adheres to f_spec?

Hoare logic; loop invariants.
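As a sketch of the usual shape (generic Hoare-logic notation, not Verus syntax): suppose f_spec(a, n) is the recursive sum of the first n elements of a sequence a, and the executable f is a loop that accumulates into acc while incrementing i. The loop invariant ties the loop state back to the spec:

$$ 0 \le i \le n \ \land\ \mathit{acc} = \sum_{k < i} a[k] $$

The invariant holds on entry ($i = 0$, $\mathit{acc} = 0$), is preserved by each iteration, and on exit ($i = n$) gives $\mathit{acc} = \sum_{k < n} a[k]$, which is exactly f_spec(a, n).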

Induction

You probably want to write an f_induct lemma that establishes the inductive case; after that, the main proof won't be far away.

Limitations: How to Fail Something True

The core reason for verification failures is undecidability: there is no algorithm that can prove arbitrary true formulas. For example, reasoning about formulas with quantifiers is undecidable in general. We rely on Z3's pattern-based instantiation of quantifiers ("triggers") to use and prove such formulas.

Matching loop

Strategies and Process of Proof Engineering

assert + assume

Evaluation: What was unnatural?

As this is a tour through formal methods, we should critically evaluate our tool. What felt unnatural? What did you think was automatable and inferable but is not? What felt unergonomic? Perform this evaluation with total creativity: don't reject your criticisms just because they can't be implemented, theory-wise.

I now present my list.

First, why can't everything just be automated?

Assert + assume is unergonomic.

Recursion and induction felt unergonomic.

Borrow_Checker_Linear_Types

Vest: Verified Binary Parsers in Verus

Analyzing anything verified begins with one key question: What is verified? In other words, what is the actual trusted base of your verified system?

In Vest, the exact thing that is verified is the round-trip property of parsers and serializers. That is, given bytes b, b = (serialize . parse) b, and given structured data m, m = (parse . serialize) m.
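To make the property concrete, here is a toy illustration in Haskell with a hypothetical one-byte-length-prefixed format (not Vest's API, and stated as testable properties rather than the machine-checked theorems Vest provides):

import Data.Word (Word8)

-- A toy message format, purely for illustration: a message is a list of
-- bytes (assumed shorter than 256), serialized with a one-byte length prefix.
type Message = [Word8]

serialize :: Message -> [Word8]
serialize m = fromIntegral (length m) : m

parse :: [Word8] -> Maybe Message
parse (n : rest) | fromIntegral n == length rest = Just rest
parse _ = Nothing

-- The two round-trip directions.
roundTripFromMessage :: Message -> Bool
roundTripFromMessage m = parse (serialize m) == Just m

roundTripFromBytes :: [Word8] -> Bool
roundTripFromBytes b = maybe True (\m -> serialize m == b) (parse b)

main :: IO ()
main = print (roundTripFromMessage [1, 2, 3], roundTripFromBytes [3, 1, 2, 3])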

All binary parsers and their corresponding serializers built with Vest enjoy round-trip. But what does that mean? Here's what it could mean: the user writes a parser-serializer pair using Vest combinators and then invokes some Vest proof function to see if round-trip holds. If not, maybe the user gets to see counterexamples.

Vest supports this. But, ideally, verification stays in the verification repo as much as possible. If you write a verified crypto library, you expect the user to be able to just use your verified cryptography. So, the user should not have to import Verus to invoke some Vest proof function.

Instead, in the more idiomatic workflow, you write a Vest protocol specification, and Vest generates from it a round-trip-verified parser and serializer! This DSL (domain-specific language) -> generation approach is employed by many formal systems.

So let's explore how Vest internally verifies that, for any well-formed DSL specification, the parser and serializer generated from it enjoy round-trip.

(TODO: Mention other parser properties like totality and completeness. TODO: Do those pertain to Vest?)

Verified_Cryptography

Cryptol (For Haskell Knowers)

The Cryptol language is a size-polymorphic dependently-typed programming language with support for polymorphic recursive functions. It has a small syntax tuned for applied cryptography, a lightweight module system, a Read-Eval-Print loop (REPL) top-level, and a rich set of built-in tools for performing high-assurance (literate) programming. Cryptol performs fairly advanced type inference, though as with most mainstream strongly typed functional languages, types can be manually specified as well.

Cryptol is implemented in Haskell. Cryptol is very similar to Haskell: From syntax to type system and language features to even the REPL. The key unique feature of Cryptol is size polymorphism.

Size-polymorphism

The following are types of selected Cryptol functions. You can use :t in the Cryptol REPL to find out for yourself.

take : {front, back, a} [front + back]a -> [front]a
drop : {front, back, a} (fin front) => [front + back]a -> [back]a
split : {parts, each, a} (fin each) => [parts * each]a -> [parts][each]a
groupBy : {each, parts, a} (fin each) => [each * parts]a -> [parts][each]a
join : {parts, each, a} (fin each) => [parts][each]a -> [parts * each]a
transpose : {rows, cols, a} [rows][cols]a -> [cols][rows]a
head : {n, a} [1 + n]a -> a
tail : {n, a} [1 + n]a -> [n]a

Here's how we read these types. [front + back]a can be thought of as a list (Cryptol calls it a sequence) consisting of front + back elements of type a. Note that because the number of elements is encoded inside the brackets, the element type a comes after the brackets. There is one implicit case: if you see a standalone [n], it means [n]Bit. For example,

0xabc : [12]
[1..12 : [Integer]] : [12]Integer
[1..12 : [4]] : [12][4]
[[(y,x) | x <- [1..3 : Integer]] | y <- [1..3 : Integer]] : [3][3](Integer, Integer)

0xabc is a 12-bit word (since each hexadecimal digit requires 4 bits), so we type it 0xabc : [12]Bit; as a shorthand, this is just 0xabc : [12].

With this in mind, let's read the type of join out loud:

join : {parts, each, a} (fin each) => [parts][each]a -> [parts * each]a

join takes in a 2D array of some a. This 2D array has parts rows, each of length each. parts and each are length values that Cryptol can infer. (fin each) is a typeclass constraint forcing each to be finite (we can ignore why).

join outputs a 1D array of the same a. This 1D array has length parts * each.

Compare this to the typical join type in Haskell,

-- typical Haskell join
join :: [[a]] -> [a]

Cryptol's join is a lot more powerful than Haskell's join! Though, in the next section, I show you how one might get Cryptol's join in Haskell.

The power of Cryptol's join is that arithmetic constraints can be imposed at the type level. Cryptol calls this "size polymorphism". More generally, when types can depend on values, as is the case here, a language is "dependently typed". Cryptol offers limited dependent types, allowing types to depend only on arithmetic values. This seems to be the sweet spot for a cryptography DSL.

Size-polymorphism: Implementation

To encode size polymorphism in Haskell is to ask how to encode dependent types in Haskell more generally. Dependent types allow types to vary with (depend on) values. If you have experience with modern dependently typed languages such as those in Chapter 4, this will be smooth sailing. Though, you will see that dependent types are not ergonomic in Haskell.

If you have Haskell experience but no type-level Haskell experience, this could be a good intro to the topic! Just note, Haskell is not made for dependent types. Try Idris if you want the full Haskell + dependent types experience.

Here's how we encode a list whose type tracks a concrete length in Haskell:

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

data List (n :: Nat) a where
   Nil :: List 'Z a 
   Cons :: a -> List n a -> List ('S n) a

twoItems :: List ('S ('S 'Z)) String
twoItems = Cons "hi" (Cons "yo" Nil)

Usually, data Nat = Z | S Nat is read as "type constructor Nat introduces value constructors Z and S". With -XDataKinds enabled, the declaration is additionally promoted: we also get a kind Nat with type-level constructors Z and S.

The promoted constructors are written 'Z and 'S. The tick helps the compiler (and the reader) distinguish kind-type land from type-value land.

The key: since 'Z and 'S are type constructors, and n is a type variable of kind Nat, List 'Z a and List ('S n) a are valid types. Hence we can write twoItems :: List ('S ('S 'Z)) String: a concrete length lives in this type!

Note that our promoted type constructors have no value constructors. This is good; we are in kind-type land and not type-value land.

Finally, because Nat is defined inductively (so every natural number is representable), we can define type-level arithmetic over it by the same recursion we would use at the value level, and use it to give append/join their Cryptol-style lengths; a sketch follows.
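Here is a minimal sketch of that step (Plus and append are my own illustrative names, not from any library; a Mult family and a Cryptol-style join would follow the same pattern). The Nat and List definitions are repeated so the snippet is self-contained:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

data Nat = Z | S Nat

data List (n :: Nat) a where
  Nil  :: List 'Z a
  Cons :: a -> List n a -> List ('S n) a

-- Type-level addition, defined by recursion on the first argument,
-- mirroring the value-level definition of (+).
type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus 'Z     m = m
  Plus ('S n) m = 'S (Plus n m)

-- The result length is the type-level sum of the input lengths, which is
-- the shape of Cryptol's size-polymorphic signatures (compare join's type).
append :: List n a -> List m a -> List (Plus n m) a
append Nil         ys = ys
append (Cons x xs) ys = Cons x (append xs ys)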

More on Arithmetic Types and Constraints

Cryptol interfaces with bit sequences and numbers through words and integers and operations defined on them. Words have n-bit types (i.e., [n]); integers have type Integer. Cryptol is unsigned when operating on words and signed when operating on integers. For example,

Cryptol> (-1: Integer) <= 0
True
Cryptol> (-1: [1]) <= 0
False
Cryptol> -1 == 0xff
True -- because -1 is inferred as [8]
Cryptol> 0xff <= 0
False -- Of course not

Overflow and underflow happen, but they are expected because Cryptol word arithmetic is modular. The types suggest this:

(+): {a} (Ring a) => a -> a -> a
(-): {a} (Ring a) => a -> a -> a
(*): {a} (Ring a) => a -> a -> a
(/.) : {a} (Field a) => a -> a -> a -- division

How is it suggested? Well, (+) deals only with a single type a. If we apply 0xabc : [12] to (+), then we must supply another [12] and get a final [12]. The only reasonable way for the final [12] to be a proper sum of some sort is if additions wrap around (overflow). For example, in [12], 0xfff + 0x001 = 0x000.

As a side note, Ring and Field are algebraic structures. The set of integers forms a Ring because we can define (+) and (*) satisfying a certain set of rules. However, the integers do not form a Field, which requires every nonzero element to have a multiplicative inverse. The prototypical example of a Field is the set of integers modulo $ m $ with $ m $ prime.

The last piece worth mentioning is arithmetic type inference. I illustrate it with examples instead of describing the rules.

Cryptol> 1 + 1
2
Cryptol> 1 + 1 : [_]
-- Using '1' for type wildcard (_)
0
Cryptol> 1 + 1 : [2]
0x2
Cryptol> [1: [_] ...]
-- Using '1' for type wildcard (_)
[1, 0, 1, 0, 1, ...]
Cryptol> [0: [_] ...]
-- Using '0' for type wildcard (_)
[0, 0, 0, 0, 0, ...]

Other Concepts

Corecursion

Cryptol recommends using corecursion (also known as recurrences, unfolds, or anamorphisms) as opposed to explicit recursion.

Let's understand it through an example. The task: create a stream seeded with four initial values, where each subsequent element, appended to the stream, is computed by xor-ing an earlier stream element (the 1st) with two elements further along the stream (the 2nd and the 4th).

With explicit recursion, this looks something like:

stream = helper [0x3F, 0xE2, 0x65, 0xCA]
  where
    helper xs =
      let n = length xs
          a = xs @ (n-4) -- three generations ago
          b = xs @ (n-3) -- two generations ago
          c = xs @ (n-1) -- current element
          next = a `xor` b `xor` c
      in helper (xs ++ [next])

Behold: corecursion.

stream = [0x3F, 0xE2, 0x65, 0xCA] # new
  where new = [ a ^ b ^ c | a <- stream 
                          | b <- drop 1 stream 
                          | c <- drop 3 stream]

Soak in the beauty for a bit, then think carefully about how laziness enables this code. Beware of showing this to your imperative friends.

Those who were seen dancing were thought to be insane by those who could not hear the music.

More generally,

ys = [i] # [ f (x, y) | x <- xs
                      | y <- ys
]

where xs is some input sequence, i is the result corresponding to the empty sequence, and f is a transformer to compute the next element, using the previous result and the next input.
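For a concrete instance of the pattern, here is the same idea in Haskell, which shares the laziness that makes it work (scanSum is a hypothetical name): the stream of running sums is defined in terms of itself exactly like ys above.

-- Each output is the previous output combined with the next input:
-- ys !! 0 is the result for the empty prefix, ys !! (k+1) folds in xs !! k.
scanSum :: [Int] -> [Int]
scanSum xs = ys
  where ys = 0 : zipWith (+) xs ys

main :: IO ()
main = print (scanSum [1, 2, 3, 4])   -- [0, 1, 3, 6, 10]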

Using Type in Variable Context

The following code is legal:

length : {n, a, b} (fin n, Literal n b) => [n]a -> b
length _ = `n

The surprising thing is that we can take a type variable, n, and use it in the value definition (the backtick in `n demotes the type-level size to a value). Dependent types (precisely, size polymorphism) allow this.

Arithmetic Type Constraint

Continuing the discussion on size-polymorphism, the following is also legal:

myWidth : {bits,len,elem} (fin len, fin bits, bits >= width len) => [len] elem -> [bits]

Look at the arithmetic constraint bits >= width len. TODO: A better example?

myWidth has two typeclass constraints, fin len and fin bits. It also has an arithmetic constraint, bits >= width len, relating the two sizes.

At last, a quick discussion on other typeclasses:

  • Logic: Includes operators &&, ||, ^, and ~. Cryptol types made of bits are instances of Logic.
  • Zero: Includes special constant zero and shifting operators << and >>. All Cryptol built-in types are instances of class Zero (zero is the bottom value).
  • Cmp: Includes <, >, <=, >=, min, max; requires Eq.
  • SignedCmp: <$, >$, <=$, and >=$. These interpret bitvectors as signed two's-complement numbers.

Classic Ciphers

Cryptol and SAW

Cryptol is very Haskell-like, from the syntax to the type system, and even to the REPL. Since Cryptol is a crypto DSL, it caters to the low level, providing features such as bit-width types. Let's play with the REPL:

# prints with base=16 by default
Cryptol> 0b101
0x5
Cryptol> :t 0b101 # :t or :type
0b101 : [3]
Cryptol> :t 0xabc
0xabc : [12]
Cryptol> :t 100
100 : {a} (Literal 100 a) => a
Cryptol> 100 : [6]

[error] at <interactive>:31:1--31:4:
  • Unsolvable constraint:
      6 >= 7
        arising from
        use of literal or demoted expression
        at <interactive>:31:1--31:4
Cryptol> 100 : [7]
0x64

The type [n] tells us the number of bits a word is represented with. Numbers such as 0b101, 0xabc, and 100 are all words. 0xabc : [12] because each hexadecimal digit takes 4 bits to store. Prompting the REPL with 100 : [6] (an explicitly annotated type) gives an unsolvable constraint because the decimal 100 cannot be represented in 6 bits (it needs at least 7).

The only unexpected output is :t 100. What is 100 : {a} (Literal 100 a) => a? Essentially, it is a polymorphic type: 100 can be used at any type a that can represent the literal 100, which is what the Literal 100 a constraint says. Unlike binary and hexadecimal literals, whose bit width is evident from the number of digits written, a decimal literal does not pin down a width at the syntax level, so Cryptol does not give 100 a concrete type.


Cryptol_And_Saw

Verified_Hardware_With_SystemVerilog