Introduction

It’s just syntax, said once a trained mathematician, and this has a grain of truth. Different higher-order theorem provers do not differ too much in their definitional mechanisms, only in concrete syntax. Still, if you want to master them, syntax is essential.

Here I would like to gather the most important surface syntax of the HOL4 theorem prover.

A Script file

HOL4 proof scripts are stored in XXXXScript.sml files. This file shows the basic Syntax:

open HolKernel bossLib

val _ = new_theory "trivial"

Theorem reflexivity:
  x = x
Proof
  simp []
QED

val _ = export_theory();

With open we can refer to existing HOL4 libraries. For an extensive documentation of libraries, see the HOL Reference Page.

There are two main modes of using HOL4, the interactive mode that is helped with the hol-vim, emacs hol mode or the Visual Studio Code extension, and the batch mode, that is, running your script files through Holmake. In the interactive mode you should load these libraries by hand, in batch mode it is done automatically. To do so interactively, one must use the keyword load, but the user interfaces derive the component list from the open declaration. The interactive mode is a read-eval-print-loop of the underlying SML compiler. The user interfaces are using this interactive loop.

The command new_theory defines the name of the theory and starts the session. The file name’s prefix should coincide with the string given here, otherwise you get an error message. So here my file is trivialScript.sml.

The file ends with export_theory, which closes the session started with new_theory. Running Holmake will create a file that can be used from other script files, the same way as we imported system libraries with “open”.

Between these there is a theorem and its proof in a Theorem-Proof-QED block. These keywords should start at the beginning of the line. We should give a name to the theorem, afterwards we can refer to it by this name. After theorem, we can give the goal we would like to prove, and after Proof, one can write the tactic that is meant to prove the goal.

Definitions

The following user-friendly syntax elements are mentioned in the Description index as special syntactic forms for scripts .

Datatypes

A datatype is an algebraic type, familiar from functional languages. For a binary tree, we can write

Datatype `bintree = Leaf | Node bintree 'a bintree`

in HOL4, which is the equivalent of

datatype 'a bintree = Leaf | Node of 'a bintree * 'a * 'a bintree

in Standard ML. As we can see, the HOL4 definition is more terse. Be aware of the use of backticks for the Datatype definition in HOL4. An alternative HOL4 syntax is

Datatype: btree = LEAF | NODE btree 'a btree
End

As before, the Datatype and `End´ keywords should start their lines.

Recursive Functions

Defining a recursive function has this syntax:

Definition fibonacci_def:
  (fib 0 = 0) ∧
  (fib 1 = 1) ∧
  (fib n = fib (n-2) + fib (n-1))
End

As usual, the order of patterns matter, the first has precedence over the others, and so on. Also usual is the position of Definition and End at the start of the line.

When the system cannot prove automatically that the function definition always terminates, so the function is total, we need to submit a termination proof. Its syntax involves the keyword Termination at the beginning of a line following the function definition:

Definition qsort_def:
  (qsort ord [] = []) /\
  (qsort ord (h::t) =
    qsort ord (FILTER (\x. ord x h) t) ++ [h] ++
    qsort ord (FILTER (\x. ~(ord x h)) t))
Termination
  WF_REL_TAC `measure (LENGTH o SND)` THEN cheat
End

Inductive Relations

For defining inductive relations the syntax is:

Inductive Even:
  even 0 ∧
  ∀n. even n ⇒ even (n+2)
End

Records

We can define records as

Datatype: person = <| employed : bool ; age : num ; name : string |>
End

and for a value let’s have

<| age := 21; employed := F; name := "Layabout" |>

For field access, person_age is the syntax. For field update, bob with employed updated_by f is written, where f is the function that is used for a new value of the field employed.

Case expressions

Case expressions are for pattern matching:

case n of
  0 => "none"
| 1 => "one"
| 2 => "two"
| _ => "many"

Type abbreviations

Compound types can be abbreviated with the new syntax:

Type set = “:’a -> bool”

This is an input-only syntax if used this way. However, if you use the attribute pp, which is for pretty-printing, then HOL4 will print the type abbreviation in its output properly:

Type set[pp] = “:’a -> bool”

Triviality

For locally important theorems, we can use an attribute local for Theorem definitions:

Theorem fermat[local]:

These are not exported, only used locally. The user-friendly syntax for this is

Triviality fermat:

with the usual Proof and QED.

Overloading

For overloading, there is the syntax

Overload "%%" = “CEILING_MOD”;

Types

HOL4 gives us its type grammar with the command type_grammar:

> type_grammar();
val it =
   Rules:
     (50)   TY  ::=  TY -> TY [fun] (R-associative)
     (60)   TY  ::=  TY + TY [sum] (R-associative)
     (70)   TY  ::=  TY # TY [prod] (R-associative)
            TY  ::=  bool | (TY, TY) fun | ind | TY itself | TY list | num |
                     one | TY option | (TY, TY) prod | TY recspace | TY set |
                     (TY, TY) sum | unit
            TY  ::=  TY[TY] (array type)
   Type abbreviations:
     bool             = min$bool
     (α, β) fun       = (α, β) min$fun
     ind              = min$ind
     α itself         = α bool$itself
     α list           = α list$list
     num              = num$num
     one              = one$one                [not printed]
     α option         = α option$option
     (α, β) prod      = (α, β) pair$prod
     α recspace       = α ind_type$recspace
     α set            = (α, min$bool) min$fun  [not printed]
     (α, β) sum       = (α, β) sum$sum
     unit             = one$one               : type_grammar.grammar

We start by covering the non-standard notation. At the beginning we see the notation for function, sum and record types, with record type being the somewhat unusual hash mark, not the standard star in SML. All these are right-associative (i.e. A op B op C means A op (B op C)). The ind type is the one for individuals, as we know it in formal logic. For any type, α itself is a one-element type whose element represents the type. Its role is technical in building more complex types like the fixed width word type. The type num is the type of natural numbers. The type one is the same as the unit type with the only element (). Type α recspace is again a technical type used in the type definition package, holding finitely branching recursive types built from subsets of α. For an expert description see an old mail of John Harrison.

Type variables

For input syntax, type variables are possibly a prime followed by an alphanumeric string: 'a 'foo 'A11. A prime followed by a single lower case latin letter is translated to a lowercase greek letter in output. Lambda is omitted, because of its special status in higher order logic, namely, the lambda operator. The input syntax 'l is translated to μ, and so on, until 'y. Directly entering lowercase greek letters is also possible. Caveat: do not use the leading prime for them.

Type constraint

You might need a leading space before a type constraint so the system would not try to interpret the leading colon as part of the previous symbol:

$= :bool->bool->bool

Lexing

HOL4 variable names should be identifiers:

alphanumerics with leading letter and possibly underscores inside
symbolic ASCII and Unicode identifiers: not letter and not digit Unicode characters are regarded as symbolic

Dollar

The dollar has a number of uses: single $ is a low-precedence right-associative function application operator borrowed from Haskell: FACT $ SUC $ 2 + 3 = FACT(SUC(2+3)) . Starting with $ only $-only symbolic identifiers are possible: $, $$, $$$ … . We can get rid of infix and other special syntax status of identifiers with a prefixing dollar: $< . Without $ HOL4 gives an error message, but this works:

> type_of ``$<``;
val it = “:num -> num -> bool”: hol_type

This is useful when we want to use the function as an argument. An alternative syntax for this is using parentheses: (<).

Last, $ is used as a separator between theory name and identifier, forming a long identifier: bool$COND .

Non-aggregeting characters

Characters in this class, unless there is a symbolic identifier defined with them, are lexed separately: parentheses, curly braces and square brackets: () {} [], tilde, dot, comma, semicolon and hyphen: ~ . , ; - . So, (( +; and -+ are treated as two characters. Non-aggregating Unicode characters are:

¬ U+00AC ⟨ U+27E8
⌈ U+2308 ⟩ U+27E9
⌉ U+2309 ⦃ U+2983
⌊ U+230A ⦄ U+2984
⌋ U+230B ⦇ U+2987
⟦ U+27E6 ⦈ U+2988
⟧ U+27E7

Whitespace

Space, Carriage Return, Line Feed, Tab and Form Feed are separators.

Numbers

A string of digits is a number. The underscore is a pseudo-digit to give spacing if needed: 3_000_000 . There is postfix notation for the num and the int type: 3n, 3i . For hexadecimal and binary numbers there are prefixal forms 0xAF12 and 0b1110 . For hexadecimals, besides A-F, lowercase a-f can be used as well.

Comments

Comments are written with standard SML delimiters: (* This is a comment *)

Parsing

In functional programming, interesting data structures are built using the constructors of data types. But, of course, it would be cumbersome to write

APP (APP (APP (IDENT "COND", IDENT "P"), IDENT "Q"), IDENT "R")

instead of

if P then Q else R

Overloading

HOL4 maintains an overload map from strings to lists of terms for parsing, and another one in the opposite direction for prettyprinting: from lists of terms to identifiers . This allows to use + for natural numbers, integers and words. Naturally, the use of overloaded terms might require type constraints.

Functions with special syntax

Functions mostly use curried postfix syntax for arguments: f x y z for a three-argument function f. But there is a possibility to use binder, prefix, suffix, infix, closefix and mixfix syntax.

A binder binds a variable. Technically, due to Church, this is done on applying a binder constant to a lambda abstraction. So ∀x.M is ∀(\x.M) . Consecutive application of the same binder can be shorthanded with only one instance of the binder: ∀x y z.M .

Infix operator op works as A op B. It can be left-associative, right-associative or non-associative.

For prefix operations, standard function application is a good example: f x .

Relation closures are good examples of suffix operators: R^+ for transitive closure of R.

Closefix operators are for delimiters: we can use [| x |] for a denotation semantics of x .

For mixfix syntax, the if-then-else clause and the [ t1 / t2 ] t3 notation for term substitution is a good example.

Quotation and antiquotation

Quotations are of type ‘a frag list and have the concrete syntax delimited with backticks: `f x = 3`. Parse.Term is of type term quotation -> term and it creates a term of a quotation.

A direct way of constructing such a term is to use double quotations: ``f x = 3`` . If the expression is meant to be a type, then it should begin with a colon: ``:’a -> bool``.

An advantage of quotations is the free use of newlines and backslash characters.

The escape character in quotations is the caret: ^ . The ^(t) syntax is for antiquotations: its aim is to embed ML type or term expressions. If the expression is an ML identifier, then the parentheses can be ommitted altogether.

Term grammar

So HOL4 has a parser that translates the surface syntax into primitive constructors. Similarly to type_grammar, function term_grammar gives us the current state of the grammar used in parsing. I inject explanations to the output of the command to make it understandable.

> term_grammar();
val it =
   (0)    TM  ::= "OLEAST" <..binders..> "." TM
                | "LEAST" <..binders..> "." TM | "some" <..binders..> "." TM
                | "∃!" <..binders..> "." TM [?!] | "?!" <..binders..> "." TM
                | "∃" <..binders..> "." TM [?] | "?" <..binders..> "." TM
                | "∀" <..binders..> "." TM [!] | "!" <..binders..> "." TM
                | "@" <..binders..> "." TM | "λ" <..binders..> "." TM
                | "\" <..binders..> "." TM

First, binders. If c is a binder, c x. t translates to $c \x.t, that is, the standard syntax version of the binder applied to the lambda abstraction λx.t . Also, c x1 x2 .. xN . t translates to c x1. c x2. .. c xN . t .

The LEAST binder works on functions defined for natural numbers, giving back the least number that satisfies its argument predicate: ∀P. $LEAST P = WHILE ($¬ ◦ P) SUC 0 . The while loop starts from 0 and goes until a successor satisfies predicate P. The binder OLEAST is an option valued binder that gives SOME n when LEAST would define a concrete value and NONE otherwise: ∀P. $OLEAST P = if ∃n. P n then SOME (LEAST n. P n) else NONE .

To explain “some”, take binder @, which is a higher order version of Hilbert’s choice operator ε, an element that satisfies a predicate P, given there is such an element: ∃x.P x ⇒ P(@x.P x) . The binder some (not to mistake for the SOME constructor of the option type) is ∀P. $some P = if ∃x. P x then SOME (@x. P x) else NONE, it tells us whether the choice operator would return a real value satisfying the relation.

Then comes the uniquely exists quantor ∃! and its ASCII counterpart ?!, the exists quantor ∃ and its ASCII version ?, the universal quantor ∀ and ASCII !, the lambda operator λ and its ASCII equivalent \ .

   (1)    TM  ::= "case" TM "of" TM "|" TM "|" TM
                    [case magic$default - %case%]
                | "case" TM "of" "|" TM  [case magic$default - %case%]
                | "case" TM "of" TM "|" TM  [case magic$default - %case%]
                | "case" TM "of" TM  [case magic$default - %case%]

The case construction. Here is an example that explains itself:

case n of
  0 => "none"
| 1 => "one"
| 2 => "two"
| _ => "many"

Writing a vertical bar in front of the first case 0 is also valid syntax.

   (4)    TM  ::= TM "::" TM (restricted quantification operator)
   (5)    TM  ::= TM TM  (binder argument concatenation)
   (8)    TM  ::= "let" LTM< _ letnil, _ letcons,;> "in" TM  [ _ let]
   (9)    TM  ::= TM "and" TM  [ _ and]   (L-associative)
   (12)   TM  ::= TM "=>" TM  [case magic$default - %arrow%]
                    (non-associative)

Restricted quantification is giving a bounding set for the values of the bound variable: (\ x :: P. M) = M[y/x] only if y ∈ P. Binder argument concatenation is the case I described above: we can write only one binder with a list of bound variables. The let construct is let v = M in N. The left-associative “and” construct is the one we can use in a let expression: let v = M and u = P in N. The arrow => is the mapping of a value to a pattern in a case construction, see the previous HOL4 example.

   (50)   TM  ::= TM "," TM     (R-associative)
   (70)   TM  ::= "if" TM "then" TM "else" TM  [COND]
   (80)   TM  ::= TM ":-" TM     (non-associative)
   (100)  TM  ::= TM "↦" TM  [  combinpp.leftarrow]
                | TM "|->" TM  [  combinpp.leftarrow] | TM "⇎" TM  [<=/=>]
                | TM "<=/=>" TM   | TM "⇔" TM  [<=>] | TM "<=>" TM
                    (non-associative)
   (200)  TM  ::= TM "⇒" TM  [==>] | TM "==>" TM     (R-associative)

The right-associative comma is the pair constructor. There’s no need for an explanation of the if-then-else construct that translates to the COND internal primitive, as seen above. An input syntax for the turnstile logical symbol is :- . It is used only in low-level definitions. The maplet symbol ↦ and its ASCII equivalent |-> is the finite map type operator. For logical equivalence and non-equivalence, there is ⇔ and ⇎, and their ASCII counterpart <=> and <=/=> . For logical implication there is ⇒ and the ASCII ==> .

   (300)  TM  ::= TM "∨" TM  [\/] | TM "\/" TM     (R-associative)
   (310)  TM  ::= TM ":>" TM     (L-associative)
   (320)  TM  ::= TM "=+" TM  [UPDATE]   (non-associative)
   (400)  TM  ::= TM "∧" TM  [/\] | TM "/\" TM     (R-associative)
   (425)  TM  ::= TM "refines" TM   | TM "partitions" TM
                | TM "equiv_on" TM   | TM "∉" TM  [NOTIN] | TM "NOTIN" TM
                | TM "∈" TM  [IN] | TM "IN" TM
                    (non-associative)

There is ∨ for logical or and its ASCII equivalent \/ . The :> operator is reverse function application: x :> f is f x. And it lets us write x :> g :> f for f(g(x)) . The =+ symbol is function update: (a =+ b) f updates f with value b at argument a . For logical and there is ∧ and its ASCII counterpart /\ . For two set of sets, A refines B if for every element set of A there is an element of B so that the former is a subset of the latter: for a ∈ A there is b ∈ B so that a ⊆ b . The infix equiv_on operator tells that a relation R is an equivalence relation on the underlying set S, that is, R is reflexive, symmetric and transitive. The expression X partitions Y holds when X is a set of sets, the partition of Y according to some equivalence relation R. For set membership and non-membership there is ∈ and ∉ and their ASCII notation IN and NOTIN .

   (450)  TM  ::= TM "≼" TM  [<<=] | TM "<<=" TM   | TM "PERMUTES" TM
                | TM "HAS_SIZE" TM   | TM "⊂" TM  [PSUBSET]
                | TM "PSUBSET" TM   | TM "⊆" TM  [SUBSET] | TM "SUBSET" TM
                | TM "≥" TM  [>=] | TM ">=" TM   | TM "≤" TM  [<=]
                | TM "<=" TM   | TM ">" TM   | TM "<" TM
                | TM "⊆ᵣ" TM  [RSUBSET] | TM "RSUBSET" TM   | TM "≠" TM
                | TM "<>" TM   | TM "=" TM
                    (non-associative)

The unicode symbol ≼ and its ASCII version <<= is for the list prefix partial order isPREFIX . The infix operator PERMUTES tells that PERMUTES f dom if f is a bijection on dom. For HAS_SIZE, A HAS_SIZE n if A is a finite set and has n elements. The notation for subset and proper subset is SUBSET and PSUBSET, and the usual symbols. For usual ordering relations HOL4 has the usual symbols. The operator RSUBSET is a subset relation for relations. Equality and non-equality is again the usual.

   (460)  TM  ::= TM "$" TM  [_ fnapp] | TM "with" TM  [record update]
                | TM ":=" TM  [record field update]
                | TM "updated_by" TM  [functional record update]
                    (R-associative)

The dollar symbol is a low-precedence function application symbol: FACT $ SUC $ 2 + 3 = FACT(SUC(2+3)), spares a few parentheses. This is a Haskell notation. Other than solo dollar, dollar immediately followed by a possibly symbolic identifier returns a normal function that can be used as a function argument: ASCII tilde, ~ is negation, $~ is a usable name for a function argument. When the identifier is infix, dollar removes its infix status and the combined identifier can be used as a prefix function name.

The keyword with and symbolic identifier := can be used in a record field update: rec with fieldname := new_value . Using updated_by allows us to apply a function to a record field: bob with employed updated_by $~ , this inverts the boolean field employed. And, it is right-associative.

   (480)  TM  ::= TM "⧺" TM  [++] | TM "++" TM   | TM "+++" TM
                    (L-associative)
   (490)  TM  ::= TM "::" TM  [CONS] | TM "INSERT" TM
                | TM "###" TM  [PAIR_REL] | TM "LEX" TM   | TM "##" TM
                    (R-associative)

The ASCII double plus, ++ operator and its Unicode variant ⧺ is list append. The triple plus +++ is an operator on relations: its type is :(α -> β -> bool) -> (γ -> δ -> bool) -> α + γ -> β + δ -> bool, so it creates a componentwise sum type for the two relations.

The double colon :: is the cons operation as suggested: extending a list with an element on the left. The keyword INSERT is set insertion: 1 INSERT {1;3} = {1;3} . The triple hash mark ### operates on two relations and gives back a relation that is of componentwise products of the original relations’ base types: :α # γ -> β # δ -> bool , from :α -> β -> bool and :γ -> δ -> bool .

   (500)  TM  ::= TM "⊔" TM  [disjUNION] | TM "<+>" TM  [disjUNION]
                | TM "DELETE" TM   | TM "DIFF" TM   | TM "∪" TM  [UNION]
                | TM "UNION" TM   | TM "<*>" TM  [APPLICATIVE_FAPPLY]
                | TM "−" TM  [-] | TM "-" TM   | TM "+" TM
                | TM "∪ᵣ" TM  [RUNION] | TM "RUNION" TM
                    (L-associative)
   (600)  TM  ::= TM "∩" TM  [INTER] | TM "INTER" TM   | TM "\\" TM
                | TM "DIV" TM   | TM "*" TM   | TM "∩ᵣ" TM  [RINTER]
                | TM "RINTER" TM
                    (L-associative)
   (601)  TM  ::= TM "×" TM  [CROSS] | TM "CROSS" TM   | TM "⊗" TM  [*,]
                | TM "*," TM
                    (R-associative)

The <+> and its Unicode counterpart ⊔ is disjoint union for relations: for types :α -> bool and :β -> bool it gets :α + β -> bool. The infix operator DELETE deletes an element from a set. The next one, DIFF is set difference. The meaning of UNION and its Unicode variant ∪ is obviously set union. The infix operator <*>, nicknamed APPLICATIVE_FAPPLY applies all functions in a function list to a list of arguments and concatenates the resulting lists: [(λx. x + 2); (λx. x + 4)] <*> [1; 2; 3] = [3; 4; 5; 5; 6; 7] .

The Unicode minus sign seems to be just an alias to standard ASCII minus sign - . The ASCII plus + is addition for natural numbers, of type num. The infix operator RUNION or its Unicode equivalent ∪ᵣ lifts set union operation to relations, and so does RINTER and ∩ᵣ for intersection. The infix INTER and its Unicode counterpart ∩ is obviously set intersection. The infix \\ is an operator on finite maps, it removes the right argument from the domain of the finite map on the left.

The infix DIV is obviously natural number division, and ASCII star is natural number multiplication. The infix CROSS and its Unicode notation × creates a cross product of two sets, on other words, a set of pairs: :α # β -> bool . Remember that hash mark # is for the product type. The infix ASCII *, operator on natural numbers and its Unicode variant ⊗ allows us to encode two natural numbers in one through triangular numbers. It is possible then to decode the first and second number from this encoded value.

   (650)  TM  ::= TM "%%" TM   | TM "MOD" TM     (L-associative)
   (700)  TM  ::= TM "**" TM   | TM "EXP" TM     (R-associative)

The symbolic identifier %% looks like a modulo operation before MOD but it is in fact a derived operation: CEILING_MOD m n = (n - m MOD n) MOD n . MOD is the usual modulo operation. The standard EXP and its symbolic counterpart ** is natural number exponentiation.

   (800)  TM  ::= TM "∘ᵣ" TM  [O] | TM "O" TM   | TM "∘" TM  [o]
                | TM "o" TM
                    (R-associative)

Capital O and its symbolic counterpart ∘ᵣ is relation compióosition, while small o and Unicode ∘ is function composition.

   (899)  TM  ::= TM ":" TY  (type annotation)
   (900)  TM  ::= "&" TM   | "-" TM  [numeric_negate] | "¬" TM   | "~" TM
   (2000) TM  ::= TM TM  (function application)   (L-associative)

With a weak precedence, colon : is for type annotation. Ampersand, &, is for converting from num to int, provided integerTheory is loaded. It is also of internal use for nat_elim__magic. The standard minus sign is numeric negation. The ASCII tilde ~ and its Unicode counterpart ¬ is for boolean negation. Function application, writing terms consecutively with only whitespace has a high precedence to allow one-argument function applications to write without parentheses.

   (2100) TM  ::= TM "³"   | TM "²"   | TM "ᵀ"  [relinv] | TM "^="  [EQC]
                | TM "꙳"  [RTC] | TM "^*"  [RTC] | TM "⁺"  [TC]
                | TM "^+"  [TC]
                | TM "⦇" LTM<  combinpp.nil,  combinpp.cons,;> "⦈"
                    [  combinpp.toplevelupd]
                | TM "(|" LTM<  combinpp.nil,  combinpp.cons,;> "|)"
                    [  combinpp.toplevelupd]

The superscripts ()³ and ()² are obviously cubic and quadratic powers. The superscript T, Unicode ()ᵀ is the inverse of a relation. The following are closures of a relation of type A x A , for it is meaningful if we can compose the relation with itself. Equality closure is ^=, reflexive transitive closure is ^* and Unicode ꙳, and transitive closure is ^+ or Unicode ()⁺ .

The delimiters (| and |) and their Unicode equivalent ⦇ and ⦈ are for function update: (\x.x) (| 2 |-> 3 |) creates a function that is the identity function on natural numbers, except that for argument 2 it gives back 3. And so does the Unicode notation (λx. x)⦇2 ↦ 3⦈ .

   (2200) TM  ::= "𝕌" TM   | "univ" TM
   (2500) TM  ::= TM "." TM  [record field selection]   (L-associative)
          TM  ::= "[" LTM<NIL,CONS,;> "]"  [ListForm<NIL,CONS>]
                | "{" LTM<EMPTY,INSERT,;> "}"  [ListForm<EMPTY,INSERT>]
                | "{" TM "|" TM "|" TM "}"  [gspec2 special]
                | "{" TM "|" TM "}"  [gspec special] | "(" ")"  [one]
                | "<|" LTM< _ record nil, _ record cons,;> "|>"
                    [ListForm< _ record nil, _ record cons>]
                | "(" TM ")"  [just parentheses, no term produced]

The identifier univ and Unicode 𝕌 is for the universal set of a type. Its definition is ⊢ 𝕌(:α) = (λx.T), in other words, any element of the type is an element of the universal set. The usual dot notation is for record field selection, rec.fieldname . Square brackets with semicolons are for lists: type_of [1;2;3]; returns “:num list”. Curly braces with semicolons are for concrete sets: type_of {1;2;3}; returns “:num -> bool”. Sets are modelled with their characteristic function on their base type. Curly braces with vertical bars allow us to write set comprehensions: ⊢ ∀s t. s DIFF t = {x | x ∈ s ∧ x ∉ t} . The usual hermite () is the only element of type one and the isomorphic type unit.

Record specifications and values can be written with <| and semicolons and |> : for a specification let’s have

Datatype: person = <| employed : bool ; age : num ; name : string |>
End

and for a value let’s have

<| age := 21; employed := F; name := "Layabout" |>

The following output does not convey new information about symbolic identifiers so I trimmed it:

   Known constants:
       _ fakeconst4.case,S10.case magic,7.default ! ## %% & () * ** *, + ++
     +++ , - /\ 0 :- :> < <<= <= <=/=> <=> <> = ==> > >= ? ?! @ ABS_DIFF

The next section gives some help on assigning meaning to symbols. I omit the already discussed symbolic identifiers and alphabetic identifiers that are easy to find in the source code.

First, there are some spooky identifiers that wouldn’t print:

   Overloading:
       <won't print>    ->  λ(x :α). list$CONS x (list$NIL :α list)
                            λ(h :α) (l :α list).
                                bool$~ (bool$IN h (list$LIST_TO_SET l))
                            λ(x :α itself). 𝕌(:α)

A properly constructed grep command gives us the empty string definitions for these:

$ grep 'overload_on.*""' -R HOL/src
HOL/src/pred_set/src/pred_setScript.sml:val _ = overload_on ("", “\x:'a itself. UNIV : 'a set”)
HOL/src/list/src/listScript.sml:val _ = overload_on ("", “\h:'a l:'a list. ~(h IN LIST_TO_SET l)”)
HOL/src/list/src/listScript.sml:val _ = overload_on("", “\x:'a. [x]”)

The first thing in the term_grammar output that won’t print is the [x] notation for having a one-element list, see the third line in grep output.

The second non-printing grammar element in the term grammar is the second in grep output. I give context to it in HOL/src/list/src/listScript.sml :

val _ = overload_on ("", “\h:'a l:'a list. ~(h IN LIST_TO_SET l)”)
  (* last over load here causes the term ~(h IN LIST_TO_SET l) to not print
     using overloads.  In particular, this prevents the existing overload for
     NOTIN from firing in this type instance, and allows ~MEM a l to print
     because the pretty-printer will traverse into the negated term (printing
     the ~), and then the MEM overload will "fire".
  *)

So, ~MEM is expressing that an element does not appear in a list, converting the list to a set internally.

Digging the code for the third we meet a technicality in HOL/src/pred_set/src/pred_setScript.sml, the first hit in grep output:

val _ = overload_on ("univ", ``\x:'a itself. UNIV : 'a set``)
val _ = set_fixity "univ" (Prefix 2200)

val _ = overload_on (UnicodeChars.universal_set, ``\x:'a itself. UNIV: 'a set``)
val _ = set_fixity UnicodeChars.universal_set (Prefix 2200)
(* the overloads above are only for parsing; printing for this is handled
   with a user-printer.  (Otherwise the fact that the x is not bound in the
   abstraction produces ARB terms.)  To turn printing off, we overload the
   same pattern to "" *)
val _ = overload_on ("", “\x:'a itself. UNIV : 'a set”)

so non-printing has an internal technical reason here: that is, pretty printing.

     \\                 ->  arithmetic$CEILING_DIV
                            (relation$RDOM_DELETE :(α -> β -> bool) ->
                                                   α -> α -> β -> bool)

The symbolic identifier \\ overloads for two purposes: for the arithmetic CEILING_DIV: CEILING_DIV m n = (m + (n - 1)) DIV n and RDOM_DELETE, which removes an element from the domain, i.e. the left supporting set of a relation.

     tri⁻¹              ->  numpair$invtri

The Unicode superscript -1 is the inverse of the triangle number function defined in HOL/src/num/extra_theories/numpairScript.sml .

     Π                  ->  (pred_set$PROD_IMAGE :(α -> num) ->
                                                  (α -> bool) -> num)
     ∅                  ->  (pred_set$EMPTY :α -> bool)
     ∅ᵣ                 ->  (relation$EMPTY_REL :α -> α -> bool)
     ∑                  ->  (pred_set$SUM_IMAGE :(α -> num) ->
                                                 (α -> bool) -> num)

Capital Pi is the usual product on natural numbers, whose first argument is a number-valued function on domain α that is corresponding to the formula, and the second argument is the characteristic function on the same domain α representing the set the production goes over.

Unicode ∅ and ∅ᵣ is for the empty set and the empty relation, respectively.

Capital Sigma is similar in its arguments to Pi, but this is for summing natural numbers.

Unicode

With a properly configured terminal connection, vim and font installation, the editor will display the above Unicode symbols properly. But there is a question: how to enter these Unicode symbols.

In hol-vim, these symbols can be entered through digraphs. See the character combinations for them in holabs.vim in HOL4’s github repository. To enter mathematical symbols with digraphs, we need to push CTRL-K and then the two-letter digraph. Here is a full list of mathematical Unicode characters in hol-vim and their corresponding digraphs:

∧ AN conjunction
∨ OR disjunction
¬ NO -, negation
⇒ => implication
≤ =< less or equal
≥ >= greater or equal
⇔ == bi-implication, if and only if
≠ != not equal
∀ FA universal quantor, for all
∃ TE existential quantor, exists
λ l* the lambda operator
∈ (- element of
+ (+ not an element, this did not work,
     use ^Vu2209 instead for ∉, HOL4 will understand it
∩ (U meet of sets
∪ U) union of sets
⊆ (_ subset (might be equal)
⊂ (C proper subset (cannot be equal)