# What would you want from an automatic mathematical research assistant?

This page is part of the FMathL - Formal mathematical language web site at http://www.mat.univie.ac.at/~neum/FMathL.html

Between January to March 2009, I asked a number of people I know, and also people on various mailing lists and newsgroups the following questions (or variants thereof):

What would be the kind of computer system that could take the role of a general-purpose mathematical assistant to practicing mathematicians? Which features should it have in order that you would like to use it? What is necessary or desirable to make such a mathematical assistant a routine tool?

How should it improve upon tools such as LaTeX (type-setting, no contents checking), Mathematica (symbolic math, no proofs), or Matlab (computations, no rigor) that are much used by mathematicians today? How must it differ from the current generation of proof assistants (rigorous but difficult to use) that are hardly used by mathematicians, and not at all used by them for their daily work?

How much (and which) mathematics should such a system know and/or make accessible to the user, and at which level of formality?

On this page, I collect the sufficiently significant replies I got, sometimes slightly shortened.

Arnold Neumaier

## My own views on the problem

Part of my own vision can be found at An automatic mathematical assistant - a vision

The 30 year old paper

gives a widely known account of the social process of the role of proofs in mathematics that comes close to my own impressions. But they draw very pessimistic conclusions on what this means for formalizing mathematics that I do not share at all.

The responses below, together with the Quotes on formalized mathematics and automatic mathematical assistants, collected from the Web, possibly give a representative picture of the situation now, 30 years after this paper.

## Replies to my query

### Richard Fateman

These questions were asked in the design of Macsyma, and before that, by Marvin Minsky in early MIT Artificial Intelligence Lab memos.

The goal of a mathematical assistant was explicitly mentioned in

''...one can imagine a computer system which serves the mathematician as a tireless, capable, knowledgeable servant, co-worker, and occasionally, mentor. The system would know and be able to apply all of the straightforward techniques of mathematical analysis. In addition, it would be a storehouse of the knowledge accumulated about many specific problem areas, such as treatments of differential equations or series. In some areas, such as symbolic integration, it would apply complex and tedious algorithms to produce results considered to be in the domain of unstructured problem-solving only a few years ago. If such a system can be constructed, its impact on applied mathematics would be substantial. Books would still be used, but only for tutorial exposition. It would be possible for the casual mathematician at a time-shared computer terminal to bring to bear on his problem a wider and more current range of methods and information. It seems reasonable to expect that a mathematician's thinking and productivity would be stimulated when he could quickly work out the consequences of his ideas. The way would be opened for the discovery of new problem-solving techniques.''

Although the above indicated paper was written by William A. Martin and Richard Fateman, those words are Bill's.

### Marc Nardmann

To get a large-scale formalisation project started, I think the following approach would be useful.

Each formalised statement S should have a list (which is optional, i.e. possibly empty) of "preliminaries". The person who wants to create a formalised proof of S will certainly want to use a lot of standard facts in the proof. Such standard facts might be the Sobolev imbedding theorem in PDE theory, the Poincare duality theorem in topology, the Atiyah-Singer index theorem in geometry, or many much simpler and less prominent lemmas. If a database of formalised mathematics were created in a perfectly organised way, of course all these standard facts would already be in the database before our mathematician tries to formalise the proof of S. But if we insist this to be the case, the formalisation project will be very inefficient; many people who are in principle willing to formalise their favourite theorems would not do so before all the main auxiliary theorems are already in the database, thereby slowing down the project considerably and unnecessarily. Thus our mathematician should be allowed to give a list of formalised "preliminaries" for the proof of S: statements which (s)he claims to be correctly formalised versions of facts whose unformalised proofs can be found in the literature, and for which (s)he provides references and, if possible, standard names (e.g. "Sobolev imbedding theorem").

A formalised proof of a statement S equipped with a list of preliminaries would by definition be a formalised proof that the preliminaries imply S. If the list of preliminaries is nonempty, this would be called a "relative proof of S". If the list is empty, it would be called an "absolute proof of S", which is just a usual formalised proof of S.

When sometime later a preliminary of S has been proved, it can be removed from the list of S (either by hand or by a robot which checks the database regularly). Once the list is empty, the relative proof of S has become an absolute proof. (In order to avoid errors in this process, the proof checker should be applied each time after a preliminary has been removed from a relative proof.)

Rankings of the most frequently used preliminaries could be published to show potential formalisers where their help is needed most urgently.

On the social side, it might be helpful to organise the whole formalisation project in a wiki-like style. I expect that many undergraduate and graduate students would be quite motivated to formalise proofs they have just learned, or used or invented in their own research; especially if the the formalisers are given credit in the database.

As the project becomes larger, certainly more powerful tools will be developed to assist people in the formalisation process, on all levels on the open-ended scale of artificial intelligence. Of course these tools will contain bugs, but that's no problem because the same simple proof checker will detect each error in the end.

### anonymous (name known to me)

Marc Nardmann's comment has similarities with a comment I made earlier, so I'm pleased that someone else also thinks in those terms.

Importing only the definitions and theorems one actually needs into one's local environment (without the supporting proofs and other theorems) is an important key to keeping the tool lightweight -- so that users don't need to download more of the master database(s) than they actually need, and don't necessarily need an active Internet connection to remote database(s) when using the tool.

Dynamic (on-demand) loading of locally stored theorems is probably a important feature too. (It improves startup time.) I have in mind pre-compiled images of theorems (a binary block in exactly the right format for the tool to use as-is without needing to re-interpret it). So dynamic loading of an existing theorem boils down to little more than memory-mapping a file somewhere, much like dynamically libraries.

The various others comments about the ultimate supremacy of social processes are also relevant here:- we'd like to efficiently import into our local work those previous results that we feel confident about... regardless of whether they've been formally verified.

### Randy Pollack

much tool development needs doing before large scale textbook mathematics is feasibly formalizable: large scale libraries, search, cooperative working (like a WIKI). Much to do in automation, decision procedures, ... Even syntax notations in existing tools are inadequate for really large scale formalization.

There are projects on formalizing specific mathematics, such as Flyspeck, clearly inspiring and useful for the long term goal of formal mathematics. Equally, much tool development needs doing before large scale textbook mathematics is feasibly formalizable: large scale libraries, search, cooperative working (like a WIKI). Much to do in automation, decision procedures, ... Even syntax notations in existing tools are inadequate for really large scale formalization. Foundations: as Larry said, HOL *suffices* to formalize immense chunks of mathematics, but that doesn't mean that it is the right choice. HOL (as we know it) could use more polymorphism. HOL suffices, but is not convenient, when you want to do induction over the height of derivations of some inductively defined relation. Modularity in HOL is (at best) second class. Binding: nominal Isabelle is a great tool, but I wouldn't say that formalization of binding is a done deal. And so forth. Many of these suggestions offer possibilities for collaboration, such as Flyspeck.

### Ian Parker

Mathematics has been described as a "language". This leads me onto the type of assistant I would want. My assistant would have the whole of mathematics encoded onto it. Isn't that a lot? Yes and no. In fact proofs of virtually everything are available somewhere on the Web. This effectively transforms our "proof" problem into a linguistics problem. What we need is a spider that goes round looking at proofs, and builds up a set of theorums. A theorem is, of couse, simply a sentence in our mathematical language.

I also have an idea about how such a language might me written. If you have an online Arabic-English dictionary you look up words using a hashing algorithm. Hashing algorithms, as we all know look up in a constant time independently of the amount of information in them. Could theorems be placed in a hashing algotithm?

I would envisage the assistant as sitting on the Web. It would in fact be a large number of programs ranging in rigor from the extremely rigorous, to simply the use of results which are sustected but not rogorously proved.

For example theorems in Number Theory dependent on the Riemann hypothesis (not yet proved) would therefore form part of the system.

If you use manifold http://www.ercim.org/publication/Ercim_News/enw35/arbab.html you can bing toether routines written in Java, Fortran, C++ etc. If you have a mathematical description (Trivial for things like Calculus and Matrix theory) Manifold will write programs for you very fast.

### My follow-up questions

How would you propose to cope with the problem that the same mathematical result may exist in many variants differing slightly in notation, phrasing, exact asssumptions?

How would you find the theorem that ''every number not leaving 7 as remainder when divided by 8 is the sum of three primes'' in such a dictionary? [actually intended was ''is a sum of three squares?'']

The problem is: would everyone else contributing to the dictionary have written it that way so that a search would locate it if you pose this as the question?

And would you rely on that names always have the same meaning for every author?

What if someone labels the statement 2+2=4 the ''Riemann hypothesis'', and then gives a proof for it?

No two theories created by different people are fully equivalent, even when they are about the same subject. How do you recognize which theorems are equivalent, and to which extent?

Searching for keywords will not be good enough if you look for a lemma that might help you in your attempted proof of something but have no idea how someone else ould have called such a lemma.

1) In the case of |n|8 = 7 (i.e., 8 divides n-7)

The example you gave me is quite interesting. |n|8 = 7 is quite irrelevant. Without it we have straight Goldbach. Associated with |n|2 = 0 - n = p1 + p2 is n = p1 + p2 + p3 (any n). To retrieve that result though we need to be able to eliminate the condition |n|8 = 7 which is not needed. To retrieve abstractly therefore we need to be able to generalize any theorem we come across and find out whether there is a theorem which is true without some of the restrictions imposed to prove this instance of the theorem. We need

• a) Te internal representation of Mizar.
• b) A generalization algorithm that searches for the theorem and more general forms of the theorem. In other words it takes out |n|8 = 7 and looks for a theorem.

2) The Riemann hypothesis. I think it would follow that if you had named theorems/lemmas that people's use of terminology could be checked off against what was there. This is yet another example of the advantages of being able to retrieve on both a verbal description and a formal description.

The next few remarks deal with replies to my personal contribution.

More generally I feel we are edging towards a more general type of linguistics/AI scenario. Whereas I feel that it would be premature to make too much of it at this stage I do have the following observations.

If we verbalize R = A^4 for example we get the fourth power. Google has translated ''four times the temperature of a surface.''

The fact is that in the Arabic text (Arabic does not seem to distinguish between Cardinals and Ordinals) the Stefan Boltzmann is stated correctly in symbols. The Arabic is only in fact a statement about the equation. Would it not be far better to make a statement in either German or English independently.

I looked too at your vocabulary and your lecture notes. All the words will translate into English pretty well one for one. I sometimes wonder whether you could construct a context free language.

To some extent, of course, we have already done this in that we have Mathematics!

Mathematical language is much simpler than general language, hence much easier to parse and to process semantically. And the differences in language are fairly small so that we consider a multilingual approach, using the Grammatical framework by Aarne Ranta. Still, it is not easy.

We are currently investigating the structure of the ALA.sentences, i.e., the ALA text after formulas were replaced by single tokens. Much of it looks context-free, but the number of production rules to achieve this is in our current version (which only parses a subset and ignores inflection) already many hundreds, and there are ambiguities that cannot be resolved on the context-free level. This means that we need a general LR parser. But the GLR parser from GNU bison performs very poorly...

### Freek Wiedijk

You want my personal opinion? What is needed to realize an updated version of the QED project is

• A proof assistant that one can sell to mathematicians Mizar currently is the closest to that, and that shows it's not there yet, as Mizar certainly is not a good system (no automation, no binders, no partial operations, some bad design choices for the logic). I include a small note I wrote that describes how _I_ see such a system. Note that in my community most people very much disagree with that vision. Everyone wants something else. (Which I guess is good, because it means many things will be tried.)
• A good infrastructure of the kind that the Linux or Wikipedia people have. And a _small_ group that has a clear vision of what the library of formalized math built in the project should become, to keep things tidy. So the community contributing should be big, but the group ("cabal"?) keeping things in hand should be small. On the other hand it should not be _too_ small a group, to prevent ideosyncrasies from creeping in.
• There should _not_ be a culture of "we only need to formalize mathematically trivial statements". _Everything should be formalized, even the boring stuff, everything from a handful of axioms. It's the only way. Else everything will be quite incorrect rather soon.

### Josef Urban

The first sentence at http://www-unix.mcs.anl.gov/qed/ is:

"The aim of the QED project is to build a single, distributed, computerized repository that rigorously represents all important, established mathematical knowledge."

I think this is a target of multiple projects today - one of them Mizar, but e.g. all the "semantic wiki for formal math" recent suggestions are also driven by this goal (and by the huge practical success of Wikipedia, showing that such things are really possible in a couple of years if started and coordinated well).

The other point is that no project is sofar nowhere near achieving the goal. I think that the "goal" should rather be defined as a process that gradually tries to involve as many mathematicians/students as possible, and continuously tries to make formalization easier and easier for newcomers. That is the other attractive and probably very productive thing about the original idea of Wiki: it is never "finished" and "stable", and tries to be very friendly to fresh solutions. I fear that without such supportive climate and humble awareness of the huge existing bottlenecks that are waiting to be solved, no project will succeed.

So I think that in some sense the world is still waiting for the proper effort towards implementation of QED. We know that it is possible, but it is both a technical problem, and a human resource management problem. Especially solving the latter is in my opinion the crucial thing. The only way how a "technical solution" could be more important than the "human solution" is if very good tools and AI are developed, which will practically turn the "normal" mathematical papers into "formal" mathematical papers. E.g. TeX is an example of a technical solution that changed the world in a similar sense, and you never know where AI (for automated formalization of "normal" math) is :-) .

### Freek Wiedijk replied to this:

That is the other attractive and probably very productive thing about the original idea of Wiki: it is never "finished" and "stable", and tries to be very friendly to fresh solutions.

I'm not sure I agree with this, not if you are talking about Wikipedia.

To me that seems very stable both in its _style_ and in its look and feel. The only thing that is being updated all the time is the content. And even then you have a whole crowd of Wikipedians breathing down your neck (is that an expression in English?), "stabilizing" what you do.

If you take this approach to formal math, it would mean that the style and look and feel of formal mathematics would have to be very stable.

Also, the great strength of Wiki is its simplicity. Wiki markup for example is very simple, no? I wish we had this kind of simplicity in the world of formal math...

### Josef Urban, upon seeing the FMathL site

This is certainly an interesting and ambitious project. It is very good that there are fresh ideas and people determined to move the field forward. I think that one quite established, though not as much known project which is close to your plans is also the SAD/ForTheL language, checker, and project.

Hans de Nivelle (in Wroclaw) is also having a bit similar plans, i.e. building a new proof assistant suitable for mathematicians (as some people put it: something like Mizar, but amenable to further development).

I like your plan to work with natural language, and to try to formalize your book. This is in my opinion a very promising area. I have had for several years myself plans to try automated formalization of the book Compendium of Continuous Lattices, which was to a large extent already formalized in Mizar, and thus it might be possible to "learn" the formalization by combined machine learning and automated reasoning methods. Klaus Grue (creator of Logiweb) told me that he has another suitable (very detailed) book.

But one thing that I do not like, is that your book is in German. It will prevent your work for becoming studied by wider audience (the field is pretty small already now). This reminds me a bit of the work of Herman Helbig who has done a lot of interesting work on semantic processing of German (I think they even process the German Wikipedia now), but other researchers (like me) would really like to test his systems and methods on English. But it is well understandable, that processing your own book is much easier for you.

One needs to start with the standard undergraduate material if one wants to support the general mathematician. The PI's ''Analysis und Lineare Algebra'' book contains everything is a uniform style, and its meanings and intentions are completely known to the PI. That this book is in German is an asset, since as a byproduct of the project, there will be both a German and an English version of the book.

The book just serves as a first test case. All techniques developed to parse and process the book and its contents will be completely general, so that they will apply later (with minor adaptations) to any mathematics book in any language.

There will be a semiautomatic tool for the creation of the lexicon, then to be annotated by the specialist in an interactive fashion, Grammatical framework (GF) support for inflection in multiple languages, semiautomatic tools for the creation of the language-specific mathematical grammar, and automatic translation modules from the internal representation of a mathematical text into various target languages.

### anonymous (name known to me)

In the December 2008 issue of the Notices of the AMS journal there were three papers on formal proof tools. In one of those papers it was explained that translating a single page of a low level university student's math text into the system and getting it to be verified would take about one days work for someone who was skilled in the use of the tools, and that perhaps in a few decades the tools would perhaps be improved to the point where it might only take an hour to accomplish such a task.

Doctors have long resisted changing their thinking from using only their experience and intuition to using more formal methods, despite evidence of how many lives this would save. Mathematicians have in general, I think, mostly done the same.

I do not consider myself a mathematician, much to my regret. But I would dearly love to have an assistant who would peer over my shoulder and when it saw me go from step d to step e, but could not find a way to justify that leap would then tap me on the shoulder and say "I am skeptical." If it were able to make few enough mistakes that I did not immediately think this was another annoying misunderstanding on its part then I would immediately start looking for one more silly mistake that I had made.

I would hope that by restricting the assistant to only having to check the last step each time this might make the task easier than the "theorem prover" projects, or even the "go find us interesting new mathematics and let us know when you have found it" projects.

### My comments on the Notices issue

The papers in the December 2008 issue of the Notices of the American Mathematical Society, which was almost entirely devoted to describing the state of the art on formal proofs, show how far away the state of the art on formal proofs is from the dream of formalizing all of mathematics.

The whole region of formally proved mathematics is currently a shallow desert with a few peaks, the work of a few persistent people with lots of patience. Nothing that could easily be used by the ordinary practicing mathematician.

Also, there remain important questions. Am I better off trusting the correctness of the 4-color theorem because it was verified in COQ by Georges Gonthier, rather than trusting the more informal versions that had existed before? I have neither time nor energy going through the details of either proof. But if I had, the traditional proof would probably teach me more...

### Tim Chow replied to this:

That's certainly true; however, the recent progress has been substantial. It was not long ago that the specialists could not have formalized the prime number theorem or the four color theorem with any reasonable amount of effort. My gut feeling is that some kind of threshold has been crossed, and that within the next decade or two, formalizing theorems will not be much harder than writing (say) Mathematica or Maple code is now.

But again, you are right; we're not there yet.

The 4-color theorem is a relatively favorable case for the partisan of formal proofs. If you had chosen the prime number theorem, then it would be clear that formalizing the proof of the prime number theorem does not add to anyone's confidence in its correctness. The point of formalizing it, from the point of view of the QED project, is that it can then be used as a building block for more complicated proofs whose correctness is harder to check.

Also, it's obvious that going through a human-readable proof *teaches* you more. No one believes that a machine-checkable proof will win any awards for excellence in exposition. One of the main points of having a machine check correctness is so that you don't have to bother with that tedious process yourself, but can focus on the main conceptual ideas without worrying about silly bugs in the argument.

But back to the 4-color theorem. The Appel-Haken-Koch proof certainly left a lot to be desired as far as correctness was concerned. It was certainly believable that a proof along those lines would work, but as for the details...well, their paper contained a large microfiche supplement with lots of little tiny hand-drawn diagrams that you had to check by hand. Very few people had the stomach for that. Those that did bother to check, found that there were lots of errors. The errors were fixable, to be sure, but could we be sure that we'd caught all of them? As far as I know, nobody independently went through the entire Appel-Haken-Koch proof to verify it.

When Robertson, Sanders, Seymour, and Thomas produced their second generation proof, they gave up on trying to check the AHK proof and simply used the main ideas (with some ideas of their own) to generate their own proof. The RSST proof does, in my view, eliminate all the doubts about the correctness of the four-color theorem, but it does so by encoding the critical part of the proof in machine-checkable form! No more tedious hand-verification of hand-drawn diagrams---just a relatively short human-readable part and a straightforward machine-checkable part.

Given what RSST did, Gonthier's effort does not substantially increase our confidence in the correctness of the result, which was already plenty high. But as I said, this doesn't mean that machine-checkability did nothing to increase our confidence, since it was the machine-checkability of (part of) the RSST proof that gave us that confidence in the first place.

Note also that the process of formalizing the proof in Coq led to new conceptual insights into the mathematics of the situation, as you will see if you read Gonthier's account of the process carefully. The fact that the process of formalization can generate creative new mathematical ideas is often overlooked in these dicussions.

Hales's Flyspeck project is another example where the standard process of checking correctness broke down (the referees refused to certify 100% correctness because they ran out of energy before they were done) and indeed motivated the entire project. Although it is possible that not everyone will be convinced even when the Flyspeck project is complete, it is clear that Flyspeck will *increase* the confidence that the mathematical community as a whole will have in the correctness of Hales's proof. Those who are skeptical of the HOL Light proof will surely also be skeptical of all those CPLEX runs, whereas there are certainly those who will be more confident of an HOL Light proof than of the CPLEX runs.

[and, in a related mail:]

Mizar was mentioned in Wiedijk's article in the AMS Notices issue. Also mentioned in that article were HOL Light (Hales's choice for Flyspeck), ProofPower, Isabelle (Avigad's choice for the prime number theorem), and Coq (Gonthier's choice for the four-color theorem).

I'm not sure that this proliferation of choices is a good thing. In principle the different proof assistants should be intertranslatable, meaning that if you construct a proof in one system, it should be a purely mechanical procedure to translate it into one of the other systems. In practice, matters are not so simple. Mizar, for example, is based on (a slight strengthening of) ZFC and first-order logic, whereas HOL Light is based on higher-order logic and is built on top of OCaml. Transferring a theorem from one to the other while ensuring that the results are still readable and easily used by humans is not a trivial task.

At this stage of development, however, the community is still gaining experience, and it's important to be able to experiment with a variety of options to see what works best. Eventually, though, if "ordinary mathematicians" are to formalize their own theorems, a standard system needs to prevail, and it needs to be much more user-friendly than the existing systems are. Below I quote two paragraphs from Wiedijk's article to give you some sense of where things stand now.

''The main advantage of HOL Light is its elegant architecture, which makes it a very powerful and reliable system. A proof of the correctness of the 394 line HOL Light "logical core" even has been formalized. On the other hand HOL has the disadvantage that it sometimes cannot express abstract mathematics---mostly when it involves algebraic structures---in an attractive way. It *can* essentially express all abstract mathematics though. Another disadvantage of HOL is that the proof parts of the HOL scripts are unreadable. They can only be understood by executing them on the computer.''

''Mizar on the other hand allows one to write abstract mathematics very elegantly, and its scripts are almost readable like ordinary mathematics. Also Mizar has by *far* the largest library of already formalized mathematics (currently it is over 2 million lines). However, Mizar has the disadvantage that it is not possible for a user to automate recurring proof patterns, and the proof automation provided by the system itself is rather basic. Also, in Mizar it is difficult to express the formulas of calculus in a recognizable style. It is not possible to "bind" variables, which causes expressions for constructions like sums, limits, derivatives, and integrals to look unnatural.''

### David Ullrich

Reading that issue of the Notices a while ago blew me away - things are still somewhat primitive but they're much farther along than I had any idea.

About human readability: Are you aware of any work on attractive interfaces?

I've sometimes speculated that an electronic format for informal proofs that allowed the reader to easily show or hide details might be a good thing. Writing a proof one has a problem deciding whether or not to explain this or that point; if you don't then some readers will be stuck, while if you do then other readers will find the proof harder to follow, the forest being obscured by trivial trees. The sort of thing I have in mind would solve this problem if it was done right.

Seems like something similar could be interesting or useful in the context of formal proofs. Maybe something like a traditional collapsible directory tree in a windowing system; clicking on a node shows the proof of something, subnodes contain proofs of the lemmas used in the proof.

Or some such - I'm certainly not proposing that it should be done in exactly this or that way, just curious whether anyone's been doing anything along these lines.

### Tim Chow replied to this

About human readability: Are you aware of any work on attractive interfaces?

I am not aware of such work, although I'm just an interested bystander so I might just be ignorant. What you suggest is certainly a good idea, but my feeling is that it may be a case of "premature optimization." That is, there are much more fundamental user-friendliness issues that need to be addressed first.

Let me illustrate by telling you of my first experience with HOL Light. First, since HOL Light runs on top of OCaml, I either had to install OCaml first or get access to a machine with OCaml on it. Once I had OCaml, installing HOL Light was not hard, but then I discovered that in addition to the "core" code, one really needed to load up a package of "basic theorems" in order to do much useful. One either has to load up this package every time one starts an HOL Light session---the loading time is surprisingly long, and will only get worse as the library gets bigger over time---or else install some "checkpointing" software to save the state of the program. The checkpointing software is a separate package, which I couldn't get to work.

But never mind all that---it's par for the course nowadays for any computer software. The main problem was that HOL Light proofs are very hard to read. Here's a sample, again from Wiedijk's article:

   let CONG_MINUS1_SQUARE = prove
('2 <= p ==> ((p - 1) * (p - 1) == 1) (mod p)',
SIMP_TAC[LE_EXISTS; LEFT_IMP_EXISTS_THM] THEN
REPEAT STRIP_TAC THEN
REWRITE_TAC[cong; nat_mod;
ARITH_RULE '(2 + x) - 1 = x + 1'] THEN
MAP_EVERY EXISTS_TAC ['0'; 'd:num'] THEN
ARITH_TAC);;

The difficulty here is not just learning the language and syntax; all those "_TAC" commands refer to "proof tactics," which are strategies for proving things. One really needs to learn a new way of thinking about proofs, and cannot simply rely on one's pre-existing mathematical expertise.

Mizar is better in this regard, because it's structured more like ordinary mathematics. Sample from Wiedijk's article:

   theorem Th11:
i gcd m = 1 & i is_quadratic_residue_mod m &
i,j are_congruent_mod m
proof
assume
A1: i gcd m = 1 &
i,j are_congruent_mod m;
then consider x being Integer such that
A2: (x2 - i) mod m = 0 by Def2;
m divides (i - j) by A1,INT_2:19;
then
A3: (i - j) mod m = 0 by Lm1;
(x2 - j) mod m
= ((x2 - i) + (i - j)) mod m
.= (((x2 - i) mod m) + ((i - j) mod m))
mod m by INT_3:14
.= 0 by A2,A3,INT_3:13;
hence thesis by Def2;
end;

Here you can guess at most of what's being asserted here without even studying the Mizar manual. It's easier to imagine implementing your suggestion of an interface that presents more detail or less detail according to the user's needs. But constructing proofs in Mizar turns out to be more labor-intensive for the human, because it has a much more limited capacity for "filling in obvious steps" automatically or creating templates when more-or-less the same argument, with minor changes, keeps coming up.

### Henryk Trappmann

Nobody in this discussion mentioned the Mizar project yet. I didnt get into the details but I would say it is exactly what you are asking for, a formalization of mathematics made for computer verification. They built up a database for definitions and proofs and the project is still very active, see Mizar.

### Mike Sussman

In the old old days when secretaries used to type mathematics for you, the best of them could tell when a mathematical expression was wrong without understanding the math. Their intuition was based, I believe, on knowledge of the way the mathematical symbols were put together and also on preceeding mathematical expressions in that (or previous) papers. Symmetry notions also played some part. I am describing something I might call "reasonableness checking" that would work like spell-checking. Add "reasonableness checking" to Latex and you would save a lot of people a lot of time double-checking their own work.

### Igor Khavkine

As to your question, in most generic terms, I think a computer based mathematical assistant's basic function is to reduce human error and allow humans to make more efficient use of their time. To some degree, already available mathematical software already does that to some degree. I think all the domains of application that you mentioned (typesetting, symbolic manipulation, computation, proof assistance) are important. I personally make regular use of all but the last one. I would wish all of them to be done better and to be easier to use, as there's quite a bit of room for improvement. I would also add mathematical visualization (from quantitative graphs to conceptual diagrams) to that list.

To get back to the question of proof assistants. I've not yet managed or even made a serious effort to make use of any of the existing ones. I do hope that they become easier to use in the future. I see two important applications of them, perhaps even in my own work.

One is formal verification of algorithms. It would be really nice if computer algebra systems like Mathematica or its many cousins came with a checkable certificate of correctness for any particular routine. If proof assistants make creating such certificates very easy, it might become common software engineering practice for implementing mathematical and other kinds of algorithms.

The other application I have in mind is actual assistance in writing proofs. If proofs are easily formalized and may be manipulated at a symbolic or higher level, then common informal steps in proofs (such as statements including "similarly" or "mutatis mutandis") could be supplemented by short snippets of code confirming their validity, thus eliminating possibly subtle logical pitfalls.

Of course, there could be many more useful applications that I haven't thought about or am not aware of. Incidentally, building large databases of formalized and formally verified mathematical facts is a necessary step in the development of better and easier to use proof assistants.

### John Halleck

My opinion is biased, I'm the author of a theorem prover. And what I investigate are non-classical logic systems.

So what I'd want in a prover is a bit different from most.

I'd want the ability to recognise a specific system when it finds it. (I.E. to say that the theorems proven include the complete axiom set for a ring or a field for a specific set of operators... Or to say that what it has been given defines an X, and the usual text for that is Y's book) If I start with a base for (as an example) group theory, add on some other restrictions for what I'm doing, and the assistant internally has generated the derived restriction that the group commutes, this would be something that I would really like to have volunteered.

As a silly example, if it can establish that the rules given describe rationals of LeRoy Eide's sort, a parenthetical comment that
Rational Representations
is a good reference is appropriate. [Likely the person inputing the axioms for different structures and putting in the names also knows a good reference... so this ought not to be that hard.]

Proving that I've stumbled back into something mainline is of obvious interest to me, independent of what the assistant was told to be looking for when it found it. This is not a sexy avenue of research or development, but I think this is a social effort that a real assistant is good at but programs are not.

Other than that I assume my tastes in assistant programs are probably the usual...

• Ability to decide something is interesting and dump the relevant information to a file without the dead ends that led to it.
• Ability to mark some introduced lemma or theorem as a bad idea, and remove everything generated for it that isn't part of something I've marked interesting.

### Hans Feichtinger

• 1) I strongly believe that one should use modern information technology tools also for the daily work of mathematicians. The use of public data bases (such as Zentralblatt, Math.Reviews) or preprint servers (such as ESI or ARXIV) are obvious, probably also Scholar GOOGLE, GOOGLE, or other database systems by various groups of publishers, the ISI Science Citation Index, etc.
• 2) In addition most of us use math. software (MATLAB, MATHEMATICA, MAPLE, DERIVE) to perform demonstrations, experiments etc.
• 3) All of us use LATEX in order to write articles, scripts, reviews, course material.
• 4) In addition we have to administer projects, run journals, fulfil administrative tasks, examine students, organize conferences, be a member of a selection committee, etc.

CERTAINLY an integration of these tools is of great use and importance. Using the internet, databases (such as MYSQL) browsers, etc. can/should help to have information available universally (i.e. "in the internet").

One more aspect is the time aspect. Ideally we would like to find the information needed to continue our work in the appropriate format at the time when we need it, on the other hand we are very busy most of the time to absorb/read and MANAGE (copy, sort, store, comment, archive) the material that we get. For a while I have grouped (paper) material according to authors, then according to topics, or projects, or co-authors for a joint paper to be written. Database structures are flexible enough to have "virtual" copies accessible under the different regimes.

At the moment we have at our NuHAG, Numerical Harmonic Analysis Group web site a variety of auxiliary databases, some of them relevant for the project and group management (employment, visitors, etc.), but some of them equally useful for individual researchers, such as the BIBTEX database, which contains as "kernel" a collection of 6500 BIBTEX items

   (see www.nuhag.eu >> DB+tools >>  BIBTEX, or MATLAB-tools, TEX etc.)


WE ALSO have a LATEX collection of macros, which ensures that we are using "all the same set of macros.."..

### Lawrence Paulson

Even if we restrict ourselves to the algebra and analysis typically taught to mathematics undergraduates in the first couple of years, I think the great majority of this material has never been formalised.

### Alasdair Urquhart

There is an article by the computer scientists De Millo, Lipton and Perlis that I think is very apposite to this thread. The authors argue eloquently against ease of formal verification being a criterion of language design. The article is called "Social Processes and Proofs of Theorems and Programs." It is available online, e.g., at
R.A. de Millo, R.J. Lipton and A.J. Perlis, Social processes and proofs of theorems and programs Communications of the ACM 22 (1979), 271-280.
The authors sum up their main thesis as follows:

''We believe that, in the end, it is a social process that determines whether mathematicians feel confident about a theorem--and we believe that, because no comparable social process can take place among program verifiers, program verification is bound to fail.

I recommend the article highly.

### Iesse Alama replied to this

I also highly recommend that article to get a feel for what was going on in the debate about program verification in the 70's. For a thorough discussion of that debate, I also recommend

DA MacKenzie and PN Edwards,
Mechanizing Proof: Computing, Risk, and Trust
MIT Press, 2001

### My excerpts from this book

Sources of knowledge, whether of the properties of the natural world or of those of artifacts, can usefully be classified into three broad categories:

• induction - we learn the properties by observation, experiment, and (especially in the case of artifacts) testing and use;
• authority - people whom we trust tell us what the properties are; and
• deduction - we infer the properties from other beliefs, for example by deducing them from theories or models. (p.2)

Reliance upon experts is an inevitable aspect of high modernity. Particular experts, including those regarded by the scientific community as the appropriate experts, may be disbelieved (a sense that the public authority of science has declined may well be one factor fueling the passions of the science wars), but, without turning one' s back entirely on high modernity, that distrust cannot be generalized to rejection of cognitive authority of all kinds. Disbelief in one set of experts' knowledge claims is often belief in those of others, in those of environmental groups such as Greenpeace, for example, rather than in those of biotechnology companies such as Monsanto. (p.7)

DeMillo, Lipton, and Perlis claimed that ''proofs'' of computer programs performed by automated theorem provers were quite different from proofs within mathematics. A real proof, they asserted, was not a chain of formal logical inferences of the kind that a computer could perform, but an argument that a human being could understand, check, accept, reject, modify, and use in other contexts. A computer-produced verification, thousands of lines long and readable only by its originator, could not be subject to these social processes and in consequence was not a genuine proof. To practitioners who had felt criticized by advocates of the mathematicization of computing, the attack by DeMillo, Lipton, and Perlis was welcome puncturing of what they took to be a theorists' bubble. To Edsger W. Dijkstra, in contrast, the attack was ''a political pamphlet from the Middle Ages.'' Dijkstra was computer science's great rationalist, a man who describes himself as ''a happy victim of the Enlightenment.'' As the years went by, indeed, Dijkstra' s views came to differ even more sharply from those of DeMillo, Lipton, and Perlis. He was no longer prepared to accept proof as ordinarily conducted by mathematicians as the model for computer scientists to emulate. Even mathematicians carried the taint of the Middle Ages: they were a ''guild.'' They ''would rather continue to believe that the Dream of Leibniz is an unrealistic illusion.'' Computer science, in contrast, showed the way to proofs more rigorous, more formal, than those of ordinary mathematics. (p.17f)

### Joel Malard

It is not clear to me that a general-purpose mathematical assistant would be much used. Mathematics is at the core a creative process, tools that are too much general purpose tend to become cumbersome, another way to put it may be that even though chalk sticks, pens and brushes are by and large standardized, yet one general drawing tool still does not suffice. Systems like LaTeX are specialized tools that supplement working mathematicians in areas they cannot avoid and typically are not interested in and therefore are happy to delegate.

Here are some areas where I found myself and several of my colleagues lacking:

Temporal thinking: including lexical rules for dealing with ambiguity.
Non-continuous thinking: if it has measure zero it will turn up on the next coin toss.
Combinatorial thinking: tools to generate counter examples.
Multidisciplinary thinking: tools to get across jargon.
Computational thinking: whatever comes free comes with no guarantee, meaning a free O(n2) algorithm will typically not beat an O(n log(n)) algorithm, at least on the long run.

What I find most useful are tools that help me to get to the data (in a wide sense), view it or communicate it, LaTeX, and Matlab fall in that category.

### Jean-Pierre Merlet

Here are my thought about this exciting problem, especially in view of its use for improving the efficiency of interval analysis.

You know well that in AI we are using a lot of general purpose mathematical theorems. I will focus on the Kantorovitch theorem that is very efficient to certify that we have found a single solution of a square system of equations. This theorem is general (just require that the equations are C^2) but does not well take into account the structure of the system. An on-going project we have with the INRIA Marelle team is to use Coq to redo the demonstration of the Kantorovitch theorem, taking into account the structure of the system, allowing to get a stronger version of the theorem (namely an increased size of the ball including a single solution compared to the one given by the general version of the theorem).

So my view of a mathematical assistant for IA is as follow:

• it will adapt the demonstration of all the theorems we are using in AI to produce versions that are specific to the system at hand and to implement these versions in some programming language
• using the previous results it will build a solver that is completely specific to the system at hand.

### Wolfgang Mückenheim

quotes Herman Weyl:

''... classical logic was abstracted from the mathematics of finite sets and their subsets .... Forgetful of this limited origin, one afterwards mistook that logic for something above and prior to all mathematics, and finally applied it, without justification, to the mathematics of infinite sets. This is the fall and original sin of [transfinite] set theory ....''

### Ian Grant

Doron Zeilberger has been claiming something like this for years. He writes Maple programs which output 'papers'. See '' A generalization of the Central Limit Theorem'' for an example. This formulae was not conceived by him before it was proved. This is from The Automatic Central Limit Theorems Generator (and Much More!)

Of course Maple is not a proof assistant except insofar as it computes, but Zeilberger's idea, as I understand him, is that 'symbol crunching' produces theorems in the symbolic domain (i.e. purely syntactically) which would not be accessible to 'naked brain humans' operating in either the domain of discourse: real analysis, say, or just pushing symbols themselves. The paper "Real" Analysis is a Degenerate Case of Discrete Analysis is interesting and provocative.

### Freek Wiedijk on the compatibility of different foundations

There are four systems that are really interesting for mathematics, AFAIC.

 1a. HOL4, HOL Light, ProofPower (all the same system, really)
1b. Isabelle/HOL (very close to the previous three)
2.  Coq
3.  Mizar

The first two items (or four systems, depending how you count) in this list are the same, foundationally, and you can easily transfer results from one to the other. Of course the definitions will differ between libraries, so some "glue proof" will be needed to make a result from one system useful in one of the others. But apart from that, it's all the same. And there already is code to actually _do_ this kind of conversion, and more is on the way. (Ask for details if you're interested.)

Mizar is much stronger and much weaker than the rest. Without its axioms (Mizar _as a system_ is not related to set theory; just its _library_ is based on set theory), Mizar is just unsorted first order logic, and therefore its statements can easily be imported in all the other systems. _With_ its axioms it's much stronger than each of the other systems, so in that case results from all the others can be imported in Mizar. (This has not been implemented, but it certainly is possible.)

Coq is a strange system. Without axioms it's intuitionistic, and therefore not stronger than any of them (whatever your interpretation for the logical connectives you won't get the choice operator that you need in the other systems.) The other way around Coq is weaker than Mizar, but not weaker than the HOL. (I'm not 100% sure of what I'm saying about Coq here, really. I'm not a logician! Coq is very much a logician's system. Much more than the others.)

But if you add a few obvious axioms to Coq, you can easily import HOL results to Coq. See for instance my .

Summarizing, it's quite easy. The inclusion order is:

  Mizar [3] without the axioms <
the HOLs [1] <
Coq [2] with a few axioms <
Mizar [3] with the axioms

Only Coq without axioms is making things complicated here, and I am not interested in constructive mathematics. So I would not be worrying about that if I were you.

### Herwig Hauser

Of course, the ideal would be to have semantic access to theorems, i.e. to have a language where you can formulate the statement of a conjectured result and the software checks a datebase for this result (without sticking to the specific formulation).

A much more modest typesetting tool we developed some time ago (inside pdf) were multihyperlinks, called lollipops:

When you come to a technical term in reading online a math text, say "singularity", you may place the mouse at this spot and four lollipops pop up, of different colour, each for a different additional info on the item (definition, example, comment, literature). Upon clicking on one of them, a window with the respective text opens. Of course, this is standard for web-brousers, but seems not to exist yet in mathematical typesetting.

### Johannes Schoissengeier

An und fuer sich bin ich mit Mathematica immer ganz gut ausgekommen. Aber vor ein paar Jahren stand ich vor dem Problem, Polynomidentitaeten zu beweisen (oder zu widerlegen), also die Frage zu loesen, ob ein gegebenes Polynom p(X_1,...,X_n) (etwa mit ganzen Koeffizienten) das Nullpolynom ist. Das kann Mathematica nur fuer kleine n, aber nicht fuer variables n. Ich war damals sogar am Linzer Buchberger-Institut und dort konnten mir die Experten auch nicht helfen. Um ein ganz einfaches Beispiel zu geben:

(X_1+...+X_n)2=X_12+...+X_n2+2\sum\limits_{i
kann man zwar fuer n=1,2,3,..100 mit Mathematica nachrechnen, aber
nicht fuer variables n. Warum eigentlich nicht?

Some of My Other Pages

FMathL - Formal mathematical language

my