------------------------- A theoretical physics FAQ ------------------------- ------------------------- The present document is no longer maintained. It is an old ASCII version (from January 9, 2010) of the theoretical physics FAQ at http://wwwi.mat.univie.ac.at/~neum/physfaq/physics-faq.html. See this site for the newest, reorganized and clickable version of the FAQ. ------------------------- Consider everything, and keep the good. (St. Paul, 1 Thess. 5:21) This document (a simple ASCII file) contains answers to some more or less frequently asked questions from theoretical physics. Currently, the FAQ contains 148 topics, grouped into 20 chapters, and filling over 11000 lines of text (about half a megabyte), corresponding to a book of about 220 pages. Starting in 2004, the topics were edited from my answers to postings to the moderated newsgroup sci.physics.research (or, for some, translated from postings to the unmoderated German newsgroup de.sci.physik). If you like the FAQ and/or found it useful, please link to it from your home page to make it more widely known. If you spot errors or have suggestions for improvements, please write me (at Arnold.Neumaier@univie.ac.at). If you have questions, please post them to the moderated newsgroup sci.physics.research (http://www.lns.cornell.edu/spr)! If you found this FAQ useful you are likely to benefit also from reading our book Arnold Neumaier and Dennis Westra, Classical and Quantum Mechanics via Lie algebras, http://www.mat.univie.ac.at/~neum/papers/physpapers.html#QML http://de.arxiv.org/abs/0810.1019 Of course, the FAQ refers only to a tiny part of theoretical physics, namely to what I happened to discuss on sci.physics.research. The answers are only as good as my understanding of the matter. This doesn't mean that they are poor but probably that they are not perfect. Many topics are discussed quite in detail, but this is not a book, so don't expect completeness or comprehensiveness in any sense. 
On topics where the physics community has not yet reached a consensus, my point of view is of course only one of the possibilities, and not always the mainstream view, although I tend to discuss that view, too. In any case, I try to be accurate, consistent, and intelligible. Happy Reading! Arnold Neumaier University of Vienna http://www.mat.univie.ac.at/~neum/ I like to see people grow ----------------- Table of Contents ----------------- The 21 topics in the initial version, posted there on April 28, 2004, have grown to 88 by January 1, 2005, to 116 by January 4, 2006, to 128 by January 3, 2007, to 140 by January 3, 2008, to 147 by January 30, 2009, and are likely to grow further. (A * indicates addition of a new topic, or large modification of an old one, since January 30, 2009. Minor changes or additions to old topics are not indicated.) The various topics can usually be read independently of each other; they are arranged into groups of loosely related topics. To read a particular entry, grep for its label, e.g., S2e. The labels may change with time as answers to further questions will be added and old answers regrouped. So, to quote part of the FAQ, refer to the title of a section and not only to its label. Abbreviations: QM = quantum mechanics, QFT = quantum field theory, QED = quantum electrodynamics, CCR = canonical commutation relations, s.p.r. = sci.physics.research (newsgroup). Strings like quant-ph/0303047 or arXiv:0810.1019 refer to electronic documents in the e-Print archive at http://xxx.lanl.gov and mirror sites. p_0 and \p are the time and space part of a 4-vector p; the Minkowski inner product is always taken to be p^2=p_0^2-\p^2. Chapter 1 (20 sections) S1a. What are bras and kets? S1b. Projective geometry and quantum mechanics S1c. What is the meaning of the entries of a density matrix? S1d. Postulates for the formal core of quantum mechanics S1e. Open quantum systems S1f. Interaction with a heat bath S1g. Quantum-classical mechanics S1h. 
Can all quantum states be realized in nature? S1i. Modes and wave functions of laser beams S1j. Classical and quantum tunneling S1k. Quantization in non-Cartesian coordinates S1l. Second quantization S1m. When is an object macroscopic? S1n. The role of the ergodic hypothesis S1o. Does quantum mechanics apply to single systems? *S1p. Dissipative dynamics and Lagrangians *S1q. How can QM be stochastic while the Schroedinger equation is not? *S1r. Measurement theory for real numbers *S1s. The classical limit of quantum mechanics *S1t. The classical limit via coherent states Chapter 2 (10 sections) S2a. Lie groups and Lie algebras S2b. The Galilei group as contraction of the Poincare group S2c. Representations of the Poincare group, spin and gauge invariance S2d. Forms of relativistic dynamics S2e. Is there a multiparticle relativistic quantum mechanics? S2f. What is a photon? S2g. Particle positions and the position operator S2h. Localization and position operators *S2i. Position operators in relativistic quantum field theory S2j. Coherent states of light as ensembles Chapter 3 (6 sections) S3a. What are 'bare' and 'dressed' particles? S3b. How meaningful are single Feynman diagrams? S3c. How real are 'virtual particles'? S3d. What is the meaning of 'on-shell' and 'off-shell'? S3e. Virtual particles and Coulomb interaction S3f. Are virtual particles and decaying particles the same? Chapter 4 (10 sections) S4a. How do atoms and molecules look like? S4b. Why are observable densities state-dependent? S4c. Are electrons pointlike/structureless? S4d. How much information is in a particle? S4e. Entropy and missing information S4f. How real is the wave function? S4g. How real are Feynman's paths? S4h. Can particles go backward in time? S4i. What about particles faster than light (tachyons)? S4j. Do free particles exist? Chapter 5 (9 sections) S5a. QM pictures and representations S5b. Inequivalent representations of the CCR/CAR S5c. Why does QFT look so different from QM? 
S5d. Why is QFT based on a classical action? S5e. Why does the action only contain first derivatives? S5f. Why normal ordering? S5g. Why locality and causal commutation relations? S5h. Creation operators and rigged Hilbert space S5i. Why Feynman diagrams? Chapter 6 (8 sections) S6a. Nonperturbative computations in quantum field theory S6b. The formal functional integral approach to QFT S6c. Functional integrals, Wightman functions, and rigorous QFT S6d. Is there a rigorous interacting QFT in 4 dimensions? S6e. Constructive field theory S6f. The classical limit of relativistic QFT S6g. What are interpolating fields? S6h. Hilbert space and Hamiltonian in relativistic quantum field theory *S6i. 2-dimensional quantum field theory Chapter 7 (3 sections) S7a. What is the mass gap? S7b. Why can a bound state of massless quarks be heavy? S7c. Bound states in relativistic quantum field theory Chapter 8 (9 sections) S8a. Why renormalization? S8b. Renormalization without infinities I S8c. Renormalization without infinities II S8d. Renormalization and coarse graining S8e. Renormalization scale and experimental energy scale S8f. Dimensional regularization S8g. Nonrelativistic quantum field theory S8h. Nonrenormalizable theories as effective theories S8i. What about infrared divergences? Chapter 9 (6 sections) S9a. Summing divergent series S9b. Is QED consistent? S9c. What about relativistic QFT at finite times? S9d. Perturbation theory and instantaneous forces S9e. QED and relativistic quantum chemistry S9f. Are protons described by QED? Chapter 10 (13 sections) S10a. How are matrices and tensors related? S10b. Is quantum mechanics compatible with general relativity? S10c. Difficulties in quantizing gravity S10d. Renormalization in quantum gravity S10e. Hadamard states and their Hilbert spaces S10f. Why do gravitons have spin 2? S10g. What is the tetrad formalism? S10h. Energy in general relativity S10i. What happened to the aether? S10j. What is time? S10k. 
Time in quantum mechanics S10l. Diffeomorphism invariant classical mechanics S10m. The concept of ''Now'' Chapter 11 (7 sections) S11a. A concise formulation of the measurement problem of QM S11b. The double slit experiment S11c. The Stern-Gerlach experiment S11d. The minimal interpretation S11e. The preferred basis problem S11f. Master equation and pointer variables S11g. Does decoherence solve the measurement problem? Chapter 12 (6 sections) S12a. Which interpretation of quantum mechanics is most consistent? S12b. Which textbook of quantum mechanics is best for foundations? S12c. What is the role of quantum logic? S12d. Stochastic quantum mechanics S12e. Is there a relativistic measurement theory? S12f. Quantum mechanics and dice Chapter 13 (10 sections) S13a. Random numbers and other random objects S13b. What is the meaning of probabilities? S13c. What about the subjective interpretation of probabilities? S13d. Are probabilities limits of relative frequencies? S13e. How meaningful are probabilities of single events? S13f. Objective probabilities S13g. How probable are realizations of stochastic processes? S13h. How do probabilities apply in practice? S13i. Incomplete knowledge and statistics S13j. Priors and entropy in probability theory Chapter 14 (4 sections) S14a. Theoretical challenges close to experimental data S14b. Does the standard model predict chemistry? S14c. Is the result of a measurement a real number? S14d. Why use complex numbers in physics? Chapter 15 (5 sections) S15a. How precise can physical language be? S15b. Why bother about rigor in physics? S15c. Justifying the foundations of a theory S15d. Foundations, theory and experiment S15e. Theoretical physics as a formal model of reality Chapter 16 (12 sections) S16a. On progress in science S16b. How different are physical sciences and social sciences S16c. Can good theories be falsified? S16d. What, then, distinguishes a good theory? S16e. When is a theory preferred to another one? S16f. 
What is a fact? S16g. Physics and experience S16h. Modeling reality S16i. What is a system (e.g., an ideal gas)? S16j. When is a theory confirmed? S16k. What is real? S16l. How many angels fit onto the tip of a needle? Chapter 17 (8 sections) S17a. How to get information from sci.physics.research S17b. How to get your work published S17c. How to respond to critical referee's reports S17d. How to sell your revolutionary idea S17e. Useful background, online lecture notes, etc. S17f. Stories about physicists S17g. Other physics FAQs *S17h. Naming in science Chapter 18 (5 sections) S18a. What is the meaning of 'self-consistent'? S18b. What is a vector? S18c. Learning quantum mechanics at age 14 S18d. Research at age 16 S18e. Are there indefinite Hilbert spaces? Chapter 19 (1 section) S19a. God and physics Chapter 20 (1 section) S20a. Acknowledgments

Since March 1, 2005, there is also a related FAQ in German language, Ein Theoretische Physik FAQ http://www.mat.univie.ac.at/~neum/physik-faq.txt where I describe some more topics which I have not translated. (Among other topics, it discusses a new interpretation of quantum mechanics, which I call the 'consistent experiment interpretation'. It gives a new meaning to the foundations of physics, less paradoxical than the conventional interpretations. I expect to have an English version of it soon.)

----------------------------
S1a. What are bras and kets?
----------------------------

In the language of linear algebra, kets |psi> are just column vectors psi (for systems with finitely many levels only; each component gives the amplitude for the corresponding level), and the corresponding bras <psi| are the conjugate transposed row vectors psi^*. The inner product of a bra <phi| and a ket |psi>, the bra(c)ket, is therefore

  <phi|psi> = phi^*psi = sum_k phi_k^* psi_k.

For the basis bra <k|, which has a 1 in position k and zeros elsewhere, <k|psi> = psi_k. In infinite dimensions, the sum becomes an integral, and we get

  <phi|psi> = integral dx phi(x)^* psi(x)

and, for the basis bra <x|, <x|psi> = psi(x).
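For the finite-dimensional case, the formulas above can be tried out numerically. The following is only a small sketch, assuming NumPy and a hypothetical 3-level system; the vectors psi and phi and the helper name braket are made up for illustration.

```python
import numpy as np

# Kets as column vectors of a hypothetical 3-level system (arbitrary values).
psi = np.array([1.0 + 1.0j, 0.5j, -2.0])
phi = np.array([0.5, 1.0 - 1.0j, 1.0j])

def braket(phi, psi):
    """<phi|psi> = sum_k conj(phi_k) * psi_k."""
    return np.vdot(phi, psi)  # np.vdot conjugates its first argument

# The basis bra <k| has a 1 in position k, so <k|psi> = psi_k.
k = 1
e_k = np.zeros(3)
e_k[k] = 1.0
component = braket(e_k, psi)
```

Swapping the two arguments of braket gives the complex conjugate, which is the finite-dimensional version of <psi|phi> = <phi|psi>^*.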
Actually, in infinite dimensions, one needs functional analysis in place of linear algebra to get a concise definition; kets are smooth functions from some nice function space, and bras are linear functionals on that space, i.e., elements of its dual space. The dual space is larger and also contains distributions. (For those who want to be fully rigorous: kets belong to a so-called nuclear space H_inf, for example the space of Schwartz functions; its completion H under the norm induced by the inner product gives the conventional Hilbert space, and together with the dual H_inf^* = H_-inf, these define a Gelfand triple or rigged Hilbert space, two names for the same concept.)

Physicists are less picky, however, and allow kets also to be less smooth functions and even distributions, so that every bra has a corresponding ket. Thus they use the ket |x> although this is not a function but a delta distribution centered at x. This allows them to write not only psi(x) = <x|psi>, but also psi(x)^* = <x|psi>^* = <psi|x>. The price to be paid is that inner products are no longer well-defined in general; for example, <x|x> is infinite. They say, |x> is not normalizable and mean that it is not in the Hilbert space of well-behaved pure states.

Caution: Physicists often use different bases which may cause confusing notation. For example, while <x|y> = 0 if x and y are distinct positions, and <p|q> = 0 if p and q are distinct momenta, the inner product <p|x> of a momentum bra and a position ket (or vice versa) is never zero. (Exercise: Verify this by computing explicit formulas for <x|p> and <p|x>!) Thus, unlike in mathematics, the formulas are not invariant under substitution of letters for the variables!

About the pitfalls when not using the required care, I recommend reading
  F. Gieres, Mathematical surprises and Dirac's formalism in quantum mechanics, Rep. Prog. Phys. 63 (2000) 1893-1931. quant-ph/9907069
and
  G. Bonneau, J. Faraut, G. Valent, Self-adjoint extensions of operators and the teaching of quantum mechanics, Amer. J. Phys. 69 (2001) 322-331.
quant-ph/0103153

----------------------------------------------
S1b. Projective geometry and quantum mechanics
----------------------------------------------

Projective geometry means that one works with rays instead of vectors to designate points in a geometry.

Think of the 2-dimensional affine plane. The points are represented by vectors in R^2. On the other hand, by moving an affine plane lying on the floor a little upwards into the air (the same amount at every point), one may think of each point as being represented by the ray from an origin on the floor to the point on the plane. (Actually, instead of the ray one should consider the whole line; strictly speaking, a ray is only a half-line. But in quantum physics, one customarily calls the 1-dimensional subspaces rays. Since the coefficient field is complex, the rays are actually rotated complex number planes.) Similarly, lines are now 2-spaces through the origin. This gives projective geometry (or homogeneous coordinates, which is the same in more algebraic terms).

But now one also has some additional points, corresponding to rays parallel to the affine plane. These points form the 'line at infinity' = the 2-space through the origin parallel to the affine plane.

A slightly closer look reveals that the geometry has become more complete: Now not only do any two points have a unique connecting line, but any two lines also have a unique intersection - what were parallels before are now lines intersecting 'at infinity'. Imagine two long, straight rails of a railway track...

This can be extended to higher dimensions. n-dimensional affine geometry can be represented by rays through 0 in (n+1)-dimensional space, and can be completed there to a projective geometry, in which the vector subspaces are the geometrical objects.

In Hilbert space one can no longer count dimensions, but otherwise everything is similar.
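The railway picture can be checked with homogeneous coordinates. The following is a minimal sketch for the real affine plane, assuming NumPy; the helper names point, line_through and meet are made up for illustration. A point (x, y) is lifted to the ray through (x, y, 1), a line is stored as the normal vector of its 2-space through the origin, and both joining and intersecting reduce to a cross product.

```python
import numpy as np

def point(x, y):
    """Affine point (x, y) lifted to the ray through (x, y, 1)."""
    return np.array([x, y, 1.0])

def line_through(p, q):
    """2-space through the origin spanned by the rays p and q, given by its normal."""
    return np.cross(p, q)

def meet(l, m):
    """Ray common to two 2-spaces, i.e., the intersection point of two lines."""
    return np.cross(l, m)

rail_a = line_through(point(0, 0), point(1, 0))    # the line y = 0
rail_b = line_through(point(0, 1), point(1, 1))    # the line y = 1, parallel to rail_a
crossing = line_through(point(0, 0), point(1, 1))  # the line y = x

at_infinity = meet(rail_a, rail_b)  # third coordinate 0: a ray parallel to the plane
finite = meet(rail_a, crossing)     # third coordinate nonzero: an ordinary point
```

The two parallel rails meet in a nonzero vector whose third coordinate vanishes, i.e., in a point on the line at infinity, while the non-parallel pair meets in an ordinary point that is recovered by rescaling the third coordinate to 1.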
Since, in quantum mechanics, state vectors are only defined up to a phase (even when normalized), they correspond uniquely to rays = 1-dimensional subspaces in Hilbert space. Hence quantum mechanics is intrinsically projective.

------------------------------------------------------------
S1c. What is the meaning of the entries of a density matrix?
------------------------------------------------------------

Density matrices are a convenient way of describing states of quantum systems in contact with an environment. (State vectors = wave functions are appropriate only for isolated systems at zero absolute temperature, though they can be used in an approximate way in thermally isolated contexts. But contact with an environment means positive temperature.) If the quantum system has only a finite number n of levels, the density matrix is an n x n matrix; otherwise it is a linear operator on Hilbert space (but nevertheless called a matrix).

The real use for density matrices is to compute expectations <f> = trace (rho f) for quantities f of interest. Indeed, rho is just a collection of numbers enabling one to calculate these expectations. The fact that the constant 1 must have expectation 1 leads to the restriction that sum_k rho_kk = trace rho = 1. Apart from that, rho must be a Hermitian, positive semidefinite matrix, to satisfy the requirements of statistics. (See quant-ph/0303047 for details.) For small systems, all such density matrices can indeed be approximately realized in practice.

Since the diagonal entries of a positive semidefinite matrix are always nonnegative, the p_k:=rho_kk are nonnegative numbers summing to 1 and thus look like probabilities. What the components mean depends on the basis used. In particular, if the basis consists of eigenstates of a Hamiltonian, and the eigenvalues E_k are all nondegenerate, a diagonal element rho_kk can be interpreted as the probability that upon measuring the energy of the system one will find the value E_k.
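These properties are easy to check on a small example. The following is only a sketch, assuming NumPy and a hypothetical 3-level system written in the eigenbasis of its Hamiltonian; the energies, the mixing weights and the helper name expectation are made up for illustration.

```python
import numpy as np

# Hypothetical 3-level system in the eigenbasis of H (arbitrary energies).
E = np.array([0.0, 1.0, 2.5])
H = np.diag(E)

# A density matrix built as a mixture of two normalized state vectors.
v1 = np.array([1.0, 1.0j, 0.0]) / np.sqrt(2.0)
v2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
rho = 0.7 * np.outer(v1, v1.conj()) + 0.3 * np.outer(v2, v2.conj())

def expectation(rho, f):
    """<f> = trace(rho f)."""
    return np.trace(rho @ f).real

p = np.diag(rho).real              # p_k = rho_kk
mean_energy = expectation(rho, H)  # agrees with sum_k p_k E_k in this basis
```

Here rho is Hermitian and positive semidefinite with trace 1, its diagonal entries are nonnegative and sum to 1, and the mean energy computed from the trace formula agrees with the average of the E_k weighted by the p_k, although rho has nonzero off-diagonal entries.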
Indeed, if the basis used consists of eigenstates |k> of H, with H|k>=E_k|k>, then the density matrix rho has entries rho_jk = <j|rho|k>. If one now calculates the expectation of a function f(H) of the Hamiltonian, the equation f(H)|k>=f(E_k)|k> implies that

  <f(H)> = trace (rho f(H)) = sum_k <k|rho f(H)|k> = sum_k <k|rho f(E_k)|k> = sum_k f(E_k) <k|rho|k> = sum_k rho_kk f(E_k).

If we average the results f(E) of a number of measurements of the energy, where the energy E_k is measured with probability p_k, we get

  <f(E)> = sum_k p_k f(E_k).

Thus, to match the expectations no matter which function we are averaging, we need to take p_k=rho_kk. This gives the claimed probability interpretation of the diagonal entries.

Off-diagonal elements have no simple interpretation. Usually one does not look at off-diagonal elements at all, but they are important in intermediate steps of calculations.

Close to absolute zero temperature, and assuming the absence of degeneracy (but also in certain other, well prepared nearly isolated systems), quantum states have the property that all columns of the density matrix are nearly parallel to a wave function psi that is conventionally normalized to have norm 1, psi^*psi=1. (In Dirac language, this says <psi|psi>=1; see the FAQ entry for bras and kets.) This vector psi, which is clearly determined only up to a complex number of absolute value 1, is called the wave vector (or, in infinite dimensions, the wave function) of the state.

Idealizing this situation, one describes such quantum systems by states in which all columns of the density matrix are exactly parallel to some nonzero wave vector psi. (Such matrices are called rank 1 matrices; the wave vector, also referred to as a wave function, is defined only up to a phase factor.) Then the k-th column is a multiple c_k psi of psi. The fact that rho is Hermitian forces each row to be a multiple of psi^*. But this implies that c_k is a multiple of psi^*_k, so that rho is a multiple of psi psi^*.
Since psi is normalized, the multiplication factor is just the trace, and since the trace is 1 we find rho = psi psi^* for any rank 1 density matrix.

If we now calculate the probability of measuring the energy E_k, we find

  p_k = rho_kk = <k|rho|k> = <k|psi psi^*|k> = <k|psi><psi|k>,

and since <psi|k> is just the complex conjugate of <k|psi>, we end up with

  p_k = |<k|psi>|^2.

This is Born's squared amplitude formula for calculating probabilities.

Thus one sees that the traditional wave vector calculus is just a special case of the density matrix calculus, appropriate (only) for the study of tiny, well-prepared nearly isolated systems and for systems close to zero absolute temperature. For the study of ordinary matter under ordinary conditions, one needs to represent states by density matrices.

Everything that is done with wave vectors can also be done with density matrices, or equivalently with the associated expectation mapping. Indeed, everything becomes simpler that way, much closer to classical mechanics, and much less weird-looking. See quant-ph/0303047 for an exposition of the foundations of quantum mechanics (including the probability interpretation, uncertainty relations, nonlocality, and Bell's theorem) in terms of expectations.

--------------------------------------------------------
S1d. Postulates for the formal core of quantum mechanics
--------------------------------------------------------

Quantum mechanics consists of a formal core that is universally agreed upon (basically being a piece of mathematics with a few meager pointers on how to match it with experimental reality) and an interpretational halo that remains highly disputed even after 80 years of modern quantum mechanics. The latter is the subject of the foundations of quantum mechanics; it is addressed elsewhere in this FAQ. Here I focus on the formal side.

As in any axiomatic setting (necessary for a formal discipline), there are a number of different but equivalent sets of axioms or postulates that can be used to define formal quantum mechanics.
Since they are equivalent, their choice is a matter of convenience.

My choice presented here is the formulation which gives most direct access to statistical mechanics, which is the main tool for real life applications of quantum mechanics. The relativistic case is outside the scope of the present axioms. Thus the following describes nonrelativistic quantum statistical mechanics in the Schroedinger picture. (The traditional starting point is instead the special case of this setting where all states are assumed to be pure.)

There are six basic axioms:

A1. A generic system (e.g., a 'hydrogen molecule') is defined by specifying a Hilbert space K whose elements are called state vectors and a (densely defined, self-adjoint) Hermitian linear operator H called the _Hamiltonian_ or the _energy_.

A2. A particular system (e.g., 'the ion in the ion trap on this particular desk') is characterized by its _state_ rho(t) at every time t in R (the set of real numbers). Here rho(t) is a Hermitian, positive semidefinite (trace class) linear operator on K satisfying at all times the condition

  trace rho(t) = 1. (normalization)

A state is called _pure_ at time t if rho(t) maps K to a 1-dimensional subspace, and _mixed_ otherwise.

A3. A system is called _closed_ in a time interval [t1,t2] if it satisfies the evolution equation

  d/dt rho(t) = i/hbar [rho(t),H] for t in [t1,t2],

and _open_ otherwise. (hbar is the reduced Planck constant, and is often set to 1.) If nothing else is apparent from the context, a system is assumed to be closed.

A4. Besides the energy H, certain other (densely defined, self-adjoint) Hermitian operators (or vectors of such operators) are distinguished as _observables_. (E.g., the observables for an N-particle system conventionally include for each particle a several 3-dimensional vectors: the _position_ x^a, _momentum_ p^a, _orbital_angular_momentum_ L^a and the _spin_vector_ (or Bloch vector) sigma^a of the particle with label a.
If u is a 3-vector of unit length then u dot p^a, u dot L^a and u dot sigma^a define the momentum, orbital angular momentum, and spin of particle a in direction u.)

A5. For any particular system, one associates to every vector X of observables with commuting components a time-dependent monotone linear functional <.>_t defining the _expectation_

  <f(X)>_t := trace rho(t) f(X)

of bounded continuous functions f(X) at time t. This is equivalent to a multivariate probability measure dmu_t(X) (on a suitable sigma algebra over the spectrum spec(X) of X) defined by

  integral dmu_t(X) f(X) := trace rho(t) f(X) = <f(X)>_t.

A6. Quantum mechanical predictions amount to predicting properties (typically expectations or conditional probabilities) of the measures defined in axiom A5 given reasonable assumptions about the states (e.g., ground state, equilibrium state, etc.).

Axiom A6 specifies that the formal content of the theory is covered exactly by what can be deduced from axioms A1-A5 without anything else added (except for restrictions defining the specific nature of the state), and hence says that Axioms A1-A5 are complete.

The description of a particular closed system is therefore given by the specification of a particular Hilbert space in A1, the specification of the observable quantities in A4, and the specification of conditions singling out a particular class of states (in A6). Everything else is determined by the theory and hence is (in principle) predicted by the theory.

The description of an open system involves, in addition, the specification of the details of the dynamical law. (For the basics, see the entry 'Open quantum systems' in this FAQ.)

In addition to these formal axioms one needs a rudimentary interpretation relating the formal part to experiments. The following _minimal_interpretation_ seems to be universally accepted.

MI.
Upon measuring at times t_l (l=1,...,n) a vector X of observables with commuting components, for a large collection of independent identical (particular) systems closed for times t < [...] -> 0, all terms exp(-E_k/T)/Z(T) become 0 or 1, with 1 only for the k corresponding to the states with least energy. Thus, if the ground state psi_1 is unique,

  lim_{T->0} rho(T) = psi_1 psi_1^*.

This implies that for low enough temperatures, the equilibrium state is approximately pure. The larger the gap to the second smallest energy level, the better is the approximation at a given nonzero temperature. In particular (reinstating the Boltzmann constant kbar), if the energy gap exceeds a small multiple of E^* := kbar T the approximation is good.

States of simple enough systems with a few levels only can often be prepared in nearly pure states, by realizing a source governed by a Hamiltonian in which the first excited state has a much larger energy than the ground state. Dissipation then brings the system into equilibrium, and as seen above, the resulting equilibrium state is nearly pure.

To see how the more traditional setting in terms of the Schroedinger equation arises, we consider the case of a closed system in a pure state rho(t) at some time t. If psi(t) is a unit vector in the range of the pure state rho(t) then psi(t), called the _state_vector_ of the system, is determined up to a phase, and one easily verifies that

  rho(t) = psi(t)psi(t)^*.

Remarkably, under the dynamics for a closed system specified in the above axioms, this property persists with time (only) if the system is closed, and the state vector satisfies the Schroedinger equation

  i hbar d/dt psi(t) = H psi(t).

Thus the state remains pure at all times. Moreover, if X is a vector of observables with commuting components and the spectrum of X is discrete, then the measure from axiom A5 is discrete,

  integral dmu(X) f(X) = sum_k p_k f(X_k)

with nonnegative numbers p_k summing to 1, commonly called _probabilities_.
Moreover, associated with the p_k are eigenspaces K_k such that X psi = X_k psi for psi in K_k, and K is the direct sum of the K_k. Therefore, every state vector psi can be uniquely decomposed into a sum psi = sum_k psi_k with psi_k in K_k. psi_k is called the _projection_ of psi to the eigenspace K_k. A short calculation using axiom A5 now reveals that for a pure state rho(t)=psi(t)psi(t)^*, the probabilities p_k are given by the so-called _Born_rule_ p_k = |psi_k(t)|^2, (*) where psi_k(t) is the projection of psi(t) to the eigenspace K_k. Deriving the Born rule (*) from axioms A1-A5 makes it completely natural, while the traditional approach starting with (*) makes it an irreducible rule full of mystery and only justifiable by its agreement with experiment. ------------------------- S1e. Open quantum systems ------------------------- Open quantum systems are usually modelled in a stochastic way to account for the unpredictability of the measurement process. (Note that a measurement is any non-negligible interaction with the environment, whether or not it is observed by something deserving the name 'detector' or 'observer'). In the simplest setting in which states can be assumed to be pure and measurements occur at definite, a priori known times and have a negligible duration, an open quantum system is a discrete stochastic process with values psi(t) in the Hilbert space of state vectors, normalized to norm 1. Between two consecutive measurements, the system is assumed to be closed. Thus between two consecutive measurements at times t' and t''>t', the normalized state psi(t) evolves according to the Schroedinger equation i hbar psidot = H psi, so that psi(t''-0)= P psi(t'+0), P = exp (i/hbar (t'-t'')H). (1) (In the interaction picture, H=0 and psi remains constant between measurements.) 
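The evolution (1) between two measurements can be illustrated for a finite-dimensional system. The following is only a sketch, assuming NumPy, units with hbar = 1, and a hypothetical two-level Hamiltonian; the helper name propagator and all numerical values are made up for illustration. The exponential is formed from the eigendecomposition of the Hermitian matrix H.

```python
import numpy as np

hbar = 1.0  # units with hbar = 1 (an assumption of this sketch)

def propagator(H, t1, t2):
    """P = exp(i/hbar (t1 - t2) H) for Hermitian H, via its eigendecomposition."""
    E, V = np.linalg.eigh(H)
    phases = np.exp(1j * (t1 - t2) * E / hbar)
    return (V * phases) @ V.conj().T  # V diag(phases) V^*

# Hypothetical two-level Hamiltonian and initial state (arbitrary values).
H = np.array([[1.0, 0.5], [0.5, -1.0]])
psi_start = np.array([1.0, 0.0], dtype=complex)

P = propagator(H, t1=0.0, t2=2.0)
psi_end = P @ psi_start  # psi(t''-0) = P psi(t'+0)
```

Since H is Hermitian, P is unitary, so the norm of the state vector is preserved between measurements; an eigenvector of H with eigenvalue E only picks up the phase exp(-i E (t''-t')/hbar).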
A measurement at time t is assumed to happen in infinitesimal time and replaces psi(t-0), independent of other measurements, with probability p_s by

  psi(t+0) = P_s psi(t-0)/sqrt(p_s) if p_s>0, (2)

where the P_s are linear operators determined by the experimental arrangement, satisfying the relation

  sum_s P_s^*P_s = 1, (3)

and

  p_s = |P_s psi(t-0)|^2 (4)

guarantees that psi(t+0) remains normalized. Clearly the p_s are nonnegative and by (3), they sum up to 1 (since psi(t-0) is normalized). (For measurements with more than countably many possible outcomes, one must replace the probabilities by probability densities and the sums by integrals.) Thus this is a well-defined stochastic process.

A von-Neumann measurement of a self-adjoint linear operator A corresponds to the special case where P_s is an orthogonal projector to the eigenspace corresponding to the eigenvalue a_s of A (respectively, to the set of eigenvalues corresponding to the s-th interval in a partition of the continuous spectrum of A).

If the measurement at different times has the same (or different) nature, the P_s at these times are the same (or different). It is possible to introduce 'empty measurements' at arbitrary intermediate times with a trivial sum over a singleton s, where P_s=1.

For continuous measurements (where the open system cannot be considered closed at all but a discrete number of times), one needs to take a continuum limit of the above description. Depending on how one takes the limit, one gets quantum diffusion processes or quantum jump processes. In this case, the density matrix for the associated deterministic expectation evolves according to a Lindblad dynamics.

Realistic measurements (i.e., those taking into account the unavoidable uncertainty) are not modelled by von-Neumann measurements, but rather by positive operator valued measures, short POVMs.
These are well explained in
  http://en.wikipedia.org/wiki/POVM
For more on real measurement processes (as opposed to the von-Neumann measurement caricature treated in typical textbooks of quantum mechanics), see, e.g.,
  V.B. Braginsky and F.Ya. Khalili, Quantum measurement, Cambridge Univ. Press, Cambridge 1992.

---------------------------------
S1f. Interaction with a heat bath
---------------------------------

Quantum mechanics in the presence of a heat bath requires the use of density matrices. Instead of the usual von-Neumann equation

  rhodot = rho \lp H

(for \lp see the section on 'Quantum-classical mechanics'), the dynamics of the density matrix is given by a dissipative version of it,

  rhodot = rho \lp H + L(rho),

usually associated with the name of Lindblad. Here L(rho) is a linear operator responsible for dissipation of energy to the heat bath; it is not a simple commutator but can have a rather complex form.

To get the Lindblad dynamics from a Hamiltonian description of system plus bath, one uses the projection operator formalism. The clearest treatment I know of is in
  H. Grabert, Projection Operator Techniques in Nonequilibrium Statistical Mechanics, Springer Tracts in Modern Physics, 1982.
The final equations for the Lindblad dynamics are (5.4.48/49) in Grabert's book.

--------------------------------
S1g. Quantum-classical mechanics
--------------------------------

Quantum mechanics and classical mechanics are very close relatives. There are analogous objects for everything of relevance in classical and quantum statistical mechanics.
Observable f:
   classical - real phase space function f(x,p)
   quantum - Hermitian linear operator or sesquilinear form f

Lie product f \lp g:
   read \lp as 'Lie', and visualize it as inverted, stylized L;
   Macro for LaTeX: \def\lp{\mbox{\Large$\,_\urcorner\,$}}
   classical: f \lp g = {g,f} in terms of the Poisson bracket
   quantum: f \lp g = i/hbar [f,g] in terms of the commutator
   The Lie product is bilinear in the arguments and satisfies
      f \lp g = - g \lp f
      f \lp gh = (f \lp g)h + g(f \lp h)   (Leibniz)
      f \lp (g \lp h) = (f \lp g) \lp h + g \lp (f \lp h)   (Jacobi)

Invariant measure:
   classical - integral f := integral dxdp f(x,p)
   quantum - integral f := trace f

Integrability: integral |f| finite
   quantum integrable <==> f trace class

Partial integration formula: integral f \lp g = 0.

Dynamics: df/dt = X_H f := H \lp f with Hermitian H
   canonical transformations = mappings exp(tX_H) with Hermitian H
   Liouville's theorem says that integral f = integral exp(tX_H)f
   The infinitesimal form of this is the partial integration formula.

State rho:
   classical - real integrable phase space function rho(x,p)>=0
   quantum - Hermitian positive semidefinite trace class operator rho
   both normalized to integral rho = 1.
   expectation of f in state rho: <f> = integral rho f

--------------------------------------------------
S1h. Can all quantum states be realized in Nature?
--------------------------------------------------

No. Many mathematically conceivable states do not exist in Nature, for example, that of water at an absolute temperature of zero. Quantum mechanics does not demand that all states are realizable.

For a number of tiny systems with a few levels, all states are realizable with reasonable precision. However, the larger the system the fewer states are realized. For very large systems such as human beings or galaxy clusters, the number of states realized at a given time is even so small that it can be approximately counted!

--------------------------------------------
S1i. Modes and wave functions of laser beams
--------------------------------------------

The physical state described by a typical laser beam is a state with an indeterminate number of photons, since it is usually not an eigenstate of the photon number operator. This essentially means that in a beam, a certain number of photons cannot be meaningfully asserted; instead, one has a meaningful photon density, referred to as the beam intensity.

Thus the traditional N-particle picture does not apply. Instead one has to work in a suitable Fock space. The Maxwell-Fock space is obtained by 'second quantization' of the mode space H_photon, consisting of all mode functions, i.e., solutions A(x) of the free Maxwell equations, describing a classical background electromagnetic field in vacuum. H_photon may be thought of as the single photon Hilbert space, in analogy to the single electron Hilbert space of solutions of the Dirac equation. (However, following up on this analogy and calling A(x) a wave function leads to confusion later on, and is best avoided.)

Actually, because of gauge invariance, the situation is slightly more complicated, and best described in momentum space. The Maxwell equations reduce in Lorentz gauge, partial dot A(x) = 0, to partial^2 A(x)=0, whence the Fourier transform of A(x) has the form delta(p^2) Ahat(p), and Ahat(p) must satisfy the transversality condition p dot Ahat(p) = 0. By gauge invariance, only the coset of Ahat(p) obtained by adding arbitrary multiples of p has a physical meaning, reflecting the transversal nature of the free electromagnetic field. This coset construction is needed to turn the space of modes into a Hilbert space H_photon with invariant inner product
   <A|B> = integral Ahat(p) dot Bhat(p) Dp,
where
   Dp = d\p/p_0 = dp_1 dp_2 dp_3/p_0
is the Lorentz invariant measure on the photon mass shell, 0 < p_0 = |\p| = sqrt(p_1^2+p_2^2+p_3^2) (negative frequencies are discarded to get an irreducible representation of the Poincare group).
Indeed, without the coset construction, the inner product is only positive semidefinite, hence gives only a pre-Hilbert space. Each (sufficiently nice) mode function A(x) gives rise to a coherent state ||A>> in the Maxwell-Fock space, to an associated annihilation operator a(A) = integral Ahat(p) a(p) Dp, where a(p) is the QED annihilation operator for a photon with momentum p, and to the corresponding creation operator a^*(A) = a(A)^*. The annihilation and creation operators a(A) and a^*(A) produce a single-mode Fock subspace consisting of all |A,psi>, where psi is the unnormalized wave function of a harmonic oscillator; |psi|^2 is the intensity of the beam. The coherent state itself corresponds to the normalized vacuum state of the harmonic oscillator, ||A>> = |A,vac>. If psi is a Hermite polynomial H_k, |A,psi> is an eigenstate of the photon number operator with eigenvalue k, and one has a k-photon state. The Maxwell-Fock space is the closure of the space spanned by all the |A,psi> together (and indeed, already the closure of the space spanned by all ||A>>). This space is the pure electromagnetic field sector of QED, describing a physical vacuum, i.e., a region of the universe where matter is absent though radiation may be present. In optics experiments, laser beams are often idealized by ignoring their extension perpendicular to the transmission direction. Then each beam can be described by some |A,psi>. In particular, for a monochromatic beam, A is a plane wave, A(x)=A_0 exp(-i p dot x). Of course, this matches the original approximation that we have a beam only with a grain of salt, since a plane wave is not normalized. A coherent pair of laser beams obtained by splitting is described by a superposition |A_1,psi_1> + |A_2,psi_2> of the two beams. Beams of thermal light (such as that from the sun) and pairs of beams created by independent sources, cannot be described by wave functions alone, but need a density formulation. 
A single light beam is then described (in the same idealization) by a mode A and a density matrix rho in a single-mode Fock space, while k light beams are described by k modes A and a density matrix rho in a k-mode Fock space. In many treatments, the modes are left implicit, so that one works only in the k-mode Fock space. This simplifies the presentation, but hides the connection to the more fundamental QED picture. For a thorough study of the latter, see the bible on quantum optics, L. Mandel and E. Wolf, Optical Coherence and Quantum Optics, Cambridge University Press, 1995. ------------------------------------ S1j. Classical and quantum tunneling ------------------------------------ Consider a particle in an external potential. Assume the potential is everywhere finite, locally constant and positive near the origin, and decays to zero far away. There is no force, when the motion is deterministic and classical. In practice, however, the classical, deterministic setting is an approximation only, and the particle makes random motions. Thus it moves away from the origin and will sooner or later reach the nonconstant part of the potential. With low probability p, it will even escape over any barrier; roughly, log p is proportional to the negative barrier height. For details, you might wish to consult my paper A. Neumaier, Molecular modeling of proteins and mathematical prediction of protein structure, SIAM Rev. 39 (1997), 407-460. http://www.mat.univie.ac.at/~neum/papers/physpapers.html#protein and the references there. Quantum mechanically, there is always a probability of escaping to infinity, without assuming any approximations. This is called tunneling. In both cases, once the particle is in the infinite region, the probability that it returns is zero. Thus a positive potential drives a particle in the long run off to infinity (though, in case of a high barrier, one has to wait a long time). 
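The waiting time can be made quantitative only under extra assumptions that are not part of the argument above. A minimal sketch, assuming a one-dimensional barrier of height \Delta V, thermal noise at a temperature T in the classical case (Arrhenius form, with an attempt frequency \nu treated here as a free parameter), and the semiclassical WKB approximation in the quantum case (a, b are the assumed turning points at energy E):

```latex
% Classical, thermally activated escape over the barrier
% (assumption: the random motion is thermal noise at temperature T)
r_{\rm escape} \approx \nu \, \exp\!\Big(-\frac{\Delta V}{k_B T}\Big),
\qquad
\log p \sim -\frac{\Delta V}{k_B T}

% Quantum tunneling through the classically forbidden region a < x < b
% (assumption: WKB approximation at energy E below the barrier top)
p_{\rm tunnel} \approx
\exp\!\Big(-\frac{2}{\hbar}\int_a^b \sqrt{2m\,\big(V(x)-E\big)}\;dx\Big)
```

In both expressions the escape probability is exponentially small in the size of the barrier, which is why one may have to wait a long time.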
In particular, in the classical case one also has a form of (stochastic) tunneling. Thus it is justified to refer to a potential such as the above as repelling. However, no one would object if you call a potential repelling _only_ in the neighborhood of a strict local minimizer, i.e., close to a metastable state. Of course, a golf ball sitting on top of a flat hill will not move down the hill; because of friction it remains in a metastable state. Thus the above is an idealization. But most of physics is idealized, and the language is also somewhat idealized (and, as actually used by people, not even completely precise). ---------------------------------------------- S1k. Quantization in non-Cartesian coordinates ---------------------------------------------- Textbook quantization rules assume (often silently, without warning) Cartesian coordinates. The rules derived there are based on canonical commutation rules and are invalid for systems described in other coordinate systems. In particular, a Hamiltonian alone does not have a physical meaning since it can be quite arbitrarily transformed by coordinate transformations. The Hamiltonian needs to be combined with the correct Poisson bracket to yield the correct dynamical equations. Only if the classical Poisson bracket satisfies the canonical commutation rules, the quantum mechanics is obtained by imposing canonical commutation rules on the commutators. The standard quantization procedure assumes that the symplectic form underlying the Hamiltonian description has the standard form p dq - q dp. Under a coordinate transformation, the symplectic form changes into something nonstandard, and naive quantization gives wrong results. To get correct results, one has to take account of the correct symplectic structure, more precisely of the Poisson bracket defined by it. This is most naturally done in a differential geometric setting, in terms of symplectic manifolds and Poisson manifolds. 
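The standard textbook illustration of how the naive rules fail outside Cartesian coordinates is the free particle in the plane, written in polar coordinates. The following worked comparison is a sketch added for illustration; the labels H_naive and H_correct are not from the text above.

```latex
% Free particle in the plane; Cartesian Hamiltonian H = (p_x^2 + p_y^2)/2m.
% The same classical Hamiltonian in polar coordinates (r, \varphi):
H = \frac{1}{2m}\Big(p_r^2 + \frac{p_\varphi^2}{r^2}\Big)

% Naive substitution p_r \to -i\hbar\,\partial_r,\;
% p_\varphi \to -i\hbar\,\partial_\varphi gives
H_{\rm naive} = -\frac{\hbar^2}{2m}
  \Big(\partial_r^2 + \frac{1}{r^2}\,\partial_\varphi^2\Big)

% Transforming the Cartesian operator -(\hbar^2/2m)(\partial_x^2+\partial_y^2)
% to polar coordinates gives instead
H_{\rm correct} = -\frac{\hbar^2}{2m}
  \Big(\partial_r^2 + \frac{1}{r}\,\partial_r
       + \frac{1}{r^2}\,\partial_\varphi^2\Big)
```

The two operators differ by the first order term (1/r) \partial_r, so the naive recipe produces a different (and wrong) spectrum and dynamics although the classical Hamiltonians agree.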
To proceed, one must quantize a symplectic (or a Poisson) manifold together with a Hamiltonian defined on it. This combination is invariant under coordinate transformations and hence has a coordinate-independent geometric meaning.

How to quantize Hamiltonians on a symplectic (or a Poisson) manifold is the subject of geometric quantization, about which there is a significant literature.

------------------------
S1l. Second quantization
------------------------

Second quantization is a way of writing the quantum mechanics of indistinguishable particles in such a way that it makes statistical mechanics calculations easy and makes everything look like field theory.

One starts with a distinguished vacuum state |vac> and a family of annihilation operators a(x) with their adjoints, the creation operators a^*(x), satisfying the canonical commutation relations (CCR)
   [a(x),a(y)]=[a^*(x),a^*(y)]=0,   [a(x),a^*(y)]=delta(x-y).
(This is for Bosons; for Fermions one has instead canonical anticommutation relations, CAR, and everything below gets additional minus signs in certain places.)

A pure (permutation symmetric) N-particle state with wave function psi(x_1:N) is written in 2nd quantization as
   psi = integral dx_1:N psi(x_1:N) a^*(x_1:N) |vac>,
hence the corresponding density matrix rho = psi psi^* takes the form
   rho = integral dx_1:N dy_1:N rho(x_1:N,y_1:N),
where rho(x_1:N,y_1:N) is the rank one operator
   psi(x_1:N)psi^*(y_1:N) a^*(x_1:N)|vac><vac|a(y_1:N).
For a 1-particle operator f with kernel f(x,y), the expectation
   <f> = integral dx dy f(x,y) <y|Rho|x>
defines the 1-particle density matrix Rho. The form of f in second quantization is
   f = integral dx dy f(x,y) a^*(x) a(y)
(exercise: check that it has indeed the desired action on an N-particle state!), hence one has
   <f> = integral dx dy f(x,y) <a^*(x) a(y)>,
and comparison with the definition of Rho gives the formula
   <x|Rho|y> = <a^*(y) a(x)> = trace a(x) rho a^*(y),
which can therefore be viewed as the definition of the 1-particle density matrix in second quantization.
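For a finite number of modes (discrete indices i, j in place of x, y), the definition of the 1-particle density matrix as the expectation <a^*(j) a(i)> can be checked numerically. The following is a minimal sketch in plain Python; the representation of Fock states as dictionaries, the function names, and the normalization of the state by 1/sqrt(N!) are assumptions made for this illustration only.

```python
import math
from collections import defaultdict

# A Fock state is a dict {occupation tuple: amplitude} over finitely many
# modes. This representation and all names below are illustrative choices.

def annihilate(state, i):
    """Apply the annihilation operator a(i) to a Fock state."""
    out = defaultdict(complex)
    for occ, amp in state.items():
        if occ[i] > 0:
            new = list(occ)
            new[i] -= 1
            out[tuple(new)] += math.sqrt(occ[i]) * amp
    return dict(out)

def create(state, i):
    """Apply the creation operator a^*(i) to a Fock state."""
    out = defaultdict(complex)
    for occ, amp in state.items():
        new = list(occ)
        new[i] += 1
        out[tuple(new)] += math.sqrt(new[i]) * amp
    return dict(out)

def inner(s1, s2):
    """Inner product <s1|s2> of two Fock states."""
    return sum(amp.conjugate() * s2.get(occ, 0.0) for occ, amp in s1.items())

def one_particle_density(state, n_modes):
    """Rho[i][j] = <a^*(j) a(i)> = <a(j) state | a(i) state>."""
    lowered = [annihilate(state, i) for i in range(n_modes)]
    return [[inner(lowered[j], lowered[i]) for j in range(n_modes)]
            for i in range(n_modes)]

def condensate(phi, n_particles):
    """Normalized state with all particles in the unit-norm orbital phi."""
    state = {tuple([0] * len(phi)): 1.0 + 0.0j}
    for _ in range(n_particles):
        new = defaultdict(complex)
        for i, c in enumerate(phi):
            for occ, amp in create(state, i).items():
                new[occ] += c * amp
        state = dict(new)
    norm = math.sqrt(math.factorial(n_particles))
    return {occ: amp / norm for occ, amp in state.items()}

phi = (0.6, 0.8)                     # a real orbital on two modes
psi = condensate(phi, 2)             # two bosons in the same orbital
Rho = one_particle_density(psi, 2)
trace = sum(Rho[i][i] for i in range(2)).real
```

With this normalization the trace of Rho equals the particle number, and for all particles in one orbital phi one expects Rho[i][j] = N phi[i] phi[j]^*.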
Authors who fear integrals write instead similar formulas with sums in place of integrals and discrete indices in place of the x,y. Also, one can do the same in momentum space rather than position space, which amounts to a change of basis but generally leads to computationally more tractable formulations.

-----------------------------------
S1m. When is an object macroscopic?
-----------------------------------

One says that thermodynamics and statistical mechanics apply to macroscopic objects. But when is an object macroscopic?

Thermodynamics and statistical mechanics are approximate, asymptotic descriptions valid for 'sufficiently large' objects. The approximations improve as the object gets larger.

One can place the threshold anywhere; if one puts it too low, the approximate description will be poor; if one puts it too high, it won't apply to the system of interest.

Thus the loose language accommodates the freedom in modeling the user has when choosing the description level and the accuracy level. It is subjective only in the same sense as the choice of a system of interest is. What is interesting for one person or investigation may be different from what is interesting for another person or investigation; nevertheless, both may employ objective tools.

The mathematical meaning underlying this loose language is called the thermodynamic limit. It makes the term 'macroscopic' precise in a similar way as the mathematical notion of a limit N->inf makes the term 'N sufficiently large' precise.

If one accepts the vague terminology to avoid talking always about limits, one can give the following definition (which reflects the subjectivity in the qualification about the modeling accuracy): In statistical mechanics, all macroscopic observables are ensemble averages.
Thus, formally, a "macroscopic observable" is the expectation of a space-time dependent field operator which remains constant within the modeling accuracy under changes in space and time smaller than the modeling accuracy.

---------------------------------------
S1n. The role of the ergodic hypothesis
---------------------------------------

Statistical mechanics textbooks often invoke the so-called ergodic hypothesis (assuming that every phase space trajectory comes arbitrarily close to every phase space point with the same values of all conserved variables as the initial point of the trajectory) to derive thermodynamics from the foundations.

However, textbook statistical mechanics gives only a gross simplification of the power of thermodynamics. The ergodic hypothesis is not needed to make thermodynamics valid.

Indeed, the ergodic hypothesis is invalid in many cases - namely always when the system needs additional variables to be thermodynamically described. This is the case for fluids near the critical point, for finite objects at their surfaces, for systems with interfaces, for metastable states, for molecular systems in the absence of chemical reactions (here the number of molecules of each species is conserved), etc.

But this does not invalidate thermodynamics - the latter only requires that a sufficiently large set of macroscopic variables (in the above sense) is included in the list of thermodynamic variables. Indeed, traditional thermodynamics accounts for molecules, surface tension, metastability, etc., without any change to the formalism.

Probably the ergodic hypothesis, restricted to a limited piece of a submanifold of the phase space with fixed values of the macroscopic variables (whether conserved or not), is ''roughly'' equivalent to the completeness of the set of distinguished macroscopic observables, in the sense that every other macroscopic observable can be defined in terms of the distinguished ones. But ...

1. It is the latter property (only) which can be checked experimentally: Completeness holds if and only if the properties of the system under study are indeed predictable by the thermodynamics of the distinguished observables. Experiment (or experience), together with simplicity of the description, decides in _all_ practical situations what is the set of distinguished observables. Indeed, we refine a model whenever we discover significant deviations from the thermodynamical behavior of a previous simpler model. Thus thermodynamics takes the form of a setting for describing material properties to which any successful description has to conform by axiomatic decree.

2. The ergodic hypothesis can be proved only for extremely simple systems. In particular, these systems must conform to classical mechanics - there is no simple quantum version of ergodic dynamics. Moreover, there are many classical systems which are chaotic only in part of their phase space - they are probably not ergodic, as the number of conserved quantities depends on where in the phase space one is.

3. Thermodynamics applies also for nearly conserved quantities, where the ergodic argument becomes vague; conversely, near ergodicity (up to the model accuracy) is enough to make a thermodynamic description valid. In particular, thermodynamics applies near a critical point where there cannot be an ergodic argument since there is no extra conserved quantity but an order parameter is needed to give a correct description. (At which distance from the critical point should one ignore the order parameter? Ergodic arguments have nothing to say here.)

4. There are studies about the nonergodic behavior of supercooled liquids, e.g., Phys. Rev. A 43, 1103 - 1106 (1991).

Thus I think it is best to ignore the ergodic hypothesis as a means for explaining statistical mechanics, except in some simple model cases.
It should have no deeper relevance than the hard sphere model of a monatomic gas (which has been shown to be ergodic, I believe). ---------------------------------------------------- S1o. Does quantum mechanics apply to single systems? ---------------------------------------------------- It is clear phenomenologically that statistical mechanics (and hence quantum mechanics) applies to single systems like a particular cup of tea, irrespective of what the discussions about the foundations of physics say (see many other entries in this FAQ). Thus statistical mechanics and quantum mechanics do not only apply - as is often claimed - to large ensembles of independently and identically prepared systems; when the system is large enough (i.e., macroscopic), a _single_ system is enough. (For smaller single systems, see the entry ''How do atoms and molecules look like?'' in the present FAQ.) In classical statistical mechanics, the traditional bridge between the ensemble view and thermodynamics (which clearly applies to single systems) is the ergodic hypothesis. But there is not enough time in the universe to explore more than an extremely tiny region of the about 10^25-dimensional phase space of the cup of tea to explain the success of the thermodynamical description by ergodicity. In quantum mechanics, the situation is even worse - usually it is not even attempted here to bridge the gap. The best treatment I know of the foundational problems involved in classical statistical mechanics is in the book L. Sklar, Physics and Chance, Cambridge Univ. Press, Cambridge 1993. but it does not present a solution. Other sources are not better in this respect. My own solution is the ''thermal interpretation'' of physics, discussed to some extent in Chapter 7 of the book Arnold Neumaier and Dennis Westra, Classical and Quantum Mechanics via Lie algebras, Cambridge University Press, to appear (2009?). 
http://www.mat.univie.ac.at/~neum/papers/physpapers.html#QML arXiv:0810.1019 and in my recent slides A. Neumaier, Classical and quantum field aspects of light, http://www.mat.univie.ac.at/~neum/papers/physpapers.html#lightslides and A. Neumaier, Optical models for quantum mechanics, http://www.mat.univie.ac.at/~neum/papers/physpapers.html#optslides and explored in more detail in my German Ein Theoretische Physik FAQ http://www.mat.univie.ac.at/~neum/physik-faq.txt under the name ''consistent experiment interpretation'' The key idea is that mathematical expectation has two different interpretations in physics, one as average over a large number of cases, and the other as a means of defining observables. That the two interpretations have the same mathematical properties is the reason they have been confused in the past. The thermal interpretation separates them neatly and thus gets rid of most of the confusing aspects of the foundations of physics. ----------------------------------------- S1p. Dissipative dynamics and Lagrangians ----------------------------------------- Any system of ordinary differential equations can be brought into an artificial Lagrangian form, by first rewriting it in first order form F(q,q')=0 doubling the degrees of freedom by introducing conjugate variables p, and then considering the Lagrangian L(p,q)= p^T F(q,q'). In particular, this provides a Lagrangian formulation of dissipative systems, such as the damped harmonic oscillator m q'' + c q' + k q = 0 (m,c,k >0) Unfortunately, the Hamiltonian in such a formulation has nothing to do with the physical energy E = (m q'^2 + k q^2)/2 The same holds for various other representations for the damped harmonic oscillator found in the literature. Lagrangians for the damped harmonic oscillator go back to H. Bateman, Phys. Rev. 38, 815-819 (1931); the treatise P.M. Morse and H. 
Feshbach, Methods of Theoretical Physics, McGraw-Hill, Boston 1953 discusses the procedure in Chapter 3 in terms of 'mirror images' = additional dynamical variables needed to absorb the missing energy, and remarks on p 313: ''The introduction of the mirror image ... is probably too artificial a procedure to expect to obtain much of physical significance from it.'' And indeed, the book doesn't make use of it anywhere.

Having a formal Lagrangian or Hamiltonian is no virtue in itself. In particular, for a _quantum_ system, the Hamiltonian _must_ be the energy. Playing around with alternative Lagrangians and Hamiltonians may be amusing, but does not produce relevant physics.

Since dissipative equations (like the diffusion equation or the damped harmonic oscillator) describe open systems (where energy is lost to an unspecified environment), they cannot be described by a Schroedinger equation.

Classically, dissipative systems are described by stochastic differential equations (and their equivalent deterministic Fokker-Planck equations) or master equations; the diffusion equation is the particular case of a Fokker-Planck equation for Brownian motion.

Quantum mechanically, dissipative systems are described by stochastic Schroedinger equations or, corresponding to the Fokker-Planck level, by quantum Liouville equations with Lindblad terms. This gives correct physics in a dissipative environment.

Many quantum optical systems are directly modeled on the Lindblad level, where the terms have an understandable and experimentally verifiable meaning independent of any underlying more microscopic model. An important recent example is that of photons on demand, M. Keller, B Lange, K Hayasaka, W Lange and H Walther, A calcium ion in a cavity as a controlled single-photon source, New Journal of Physics 6 (2004), 95.
There is no trace of a Lagrangian in the modeling, and indeed, a useful Lagrangian formulation does not exist - unless one extends the dynamics and explicitly includes the environment.

Of course, in theory, a dissipative system is thought to be a contracted version of a bigger conservative system which includes the environment, and in simple situations, this theoretical view can indeed be substantiated.

If one models the dissipative environment explicitly, one gets a bigger conservative system, not a dissipative system. Of course, this conservative system has a Hamiltonian or Lagrangian description, but it does not describe the dissipative system alone. When one contracts it to the degrees of freedom of the original system, one gets an integro-differential equation with memory, which is no longer described by a physically meaningful Hamiltonian or Lagrangian framework.

The reduced dynamics takes the exact form
   m x''(t) + k x(t) = int_0^t G(s) x(t-s) ds + F(t)
with functions F(t) (the noise caused by the environment) and G(s) (the memory kernel) that depend on the state of the environment. If the interaction is of the usual, dissipative nature then both F(t) and G(s) are highly oscillatory, even for intervals short compared to the inverse frequency T of the oscillator. But the short time averages of the memory kernel have an exponentially decaying bound on their size and become negligible after some relaxation time tau << T. Thus it suffices in a good approximation to take the integral from s=0 to s=tau only. This allows us to expand x(t-s) in a second order Taylor expansion (valid since s<=tau<<T), which turns the integral into a linear combination of x(t), x'(t) and x''(t). Collecting terms gives a second order differential equation with a damping coefficient c>0, recovering the traditional equation for the damped harmonic oscillator, including a stochastic force term. (Its size can be related to the damping coefficient and the temperature of the environment, a relation known as the fluctuation-dissipation theorem.)
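The Taylor expansion step can be written out as follows. This is a sketch under the stated approximation (integral truncated at s=tau, kernel replaced by its short time average); the moment symbols G_0, G_1, G_2 and the names m_eff, k_eff, c are introduced here for illustration and are not taken from the references.

```latex
% Second order Taylor expansion, valid for 0 <= s <= \tau << T:
x(t-s) \approx x(t) - s\,x'(t) + \tfrac{1}{2}s^2\,x''(t)

% Moments of the (short time averaged) memory kernel:
G_n := \int_0^\tau s^n\,G(s)\,ds, \qquad n = 0,1,2

% Inserting into  m x'' + k x = \int_0^\tau G(s)\,x(t-s)\,ds + F(t):
m\,x''(t) + k\,x(t) \approx G_0\,x(t) - G_1\,x'(t)
                           + \tfrac{1}{2}G_2\,x''(t) + F(t)

% Collecting terms:
m_{\rm eff}\,x''(t) + c\,x'(t) + k_{\rm eff}\,x(t) = F(t),
\qquad
m_{\rm eff} = m - \tfrac{1}{2}G_2,\quad
c = G_1,\quad
k_{\rm eff} = k - G_0
```

The result has the form of the damped harmonic oscillator driven by the noise F(t) provided that c = G_1 > 0.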
A thorough discussion of the reduction of microscopic conservative large systems to dissipative subsystems of interest is given in H Grabert, Projection Operator Techniques in Nonequilibrium Statistical Mechanics, Springer Tracts in Modern Physics, 1982 at a much more general level that also applies for many other dissipative systems.

There are cases where one needs to model the memory to capture the essence of the reduced dynamics. But in many cases, a simpler, memory-free description is possible and adequate. One can remove the memory by employing a Markov approximation, and gets again a differential equation, which defines the Lindblad (or, classically, the Fokker-Planck) dynamics. Again, this is no longer described by a Hamiltonian or Lagrangian framework.

In the extended formulation with explicit environment or with memory, already a simple damped harmonic oscillator becomes a huge and unwieldy dynamical system which is no longer equivalent to the damped harmonic oscillator, but includes unwanted environment terms or memory terms.

In cases where one really needs to model the memory, the system therefore is no longer a damped harmonic oscillator. The latter is described by a simple linear constant coefficient second order differential equation for a single function, and has no memory. Its analysis is very simple, and compared to that any more detailed description is unwieldy.

In practice, the dissipative formulation therefore stands by itself (apart from lip service paid to a hypothesized more fundamental conservative description).

The situation is similar to that in fluid dynamics. In theory, the Navier-Stokes equations (which are dissipative) should be derivable from a Lagrangian. Indeed, such derivations have been given, but only for very simple model problems such as an ideal gas. However, there is no microscopic derivation of the Navier-Stokes equations in the practically interesting case of water at room temperature...
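As a small numerical illustration of the damped harmonic oscillator m q'' + c q' + k q = 0 discussed in this section, the following sketch integrates the equation and tracks the physical energy E = (m q'^2 + k q^2)/2. The parameter values, the choice of a Runge-Kutta integrator, and the function name are assumptions made for this example. It checks that E is not conserved and that the loss equals the time integral of c q'^2, the energy handed over to the unspecified environment.

```python
def simulate(m=1.0, c=0.2, k=1.0, q0=1.0, v0=0.0, dt=1e-3, steps=20000):
    """Integrate m q'' + c q' + k q = 0 with the classical RK4 scheme.

    Returns the list of energies E = (m q'^2 + k q^2)/2 after each step
    and the accumulated dissipated work, the time integral of c q'^2.
    All default parameter values are arbitrary illustrative choices.
    """
    def acc(q, v):
        # acceleration q'' obtained from the equation of motion
        return -(c * v + k * q) / m

    q, v = q0, v0
    energies = [0.5 * (m * v * v + k * q * q)]
    dissipated = 0.0
    for _ in range(steps):
        k1q, k1v = v, acc(q, v)
        k2q, k2v = v + 0.5 * dt * k1v, acc(q + 0.5 * dt * k1q, v + 0.5 * dt * k1v)
        k3q, k3v = v + 0.5 * dt * k2v, acc(q + 0.5 * dt * k2q, v + 0.5 * dt * k2v)
        k4q, k4v = v + dt * k3v, acc(q + dt * k3q, v + dt * k3v)
        v_old = v
        q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        # trapezoidal rule for the dissipated power c q'^2
        dissipated += 0.5 * dt * c * (v_old * v_old + v * v)
        energies.append(0.5 * (m * v * v + k * q * q))
    return energies, dissipated

energies, dissipated = simulate()
energy_loss = energies[0] - energies[-1]
```

Here energy_loss and dissipated agree up to the discretization error, and the energy never increases along the trajectory, in line with dE/dt = - c q'^2.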
---------------------------------------------------------------------
S1q. How can QM be stochastic while the Schroedinger equation is not?
---------------------------------------------------------------------

The Schroedinger equation is a deterministic wave equation. But when we set up an experiment to measure either position or momentum, we get uncertain, stochastic outcomes. So - is quantum mechanics deterministic or stochastic?

One has to be careful in the interpretation of the foundations...

Fortunately, the same apparent paradox already occurs in classical physics; hence the paradox cannot have anything to do with the peculiarities of quantum mechanics. Indeed, a Fokker-Planck equation is a deterministic partial differential equation. But when measuring a process modelled by it - such as the position of a grain of pollen in Brownian motion -, we get only probabilistic results. Now Fokker-Planck equations are essentially equivalent to classical stochastic differential equations. So - do they describe a deterministic or a stochastic process?

The point resolving the issue is that, both in stochastic differential equations and in quantum mechanics, probabilities satisfy deterministic equations, while the quantities observed to deduce the probabilities do not.

Thus, in both cases, probabilities are deterministic ''observables'' while the position of a grain of pollen in classical mechanics, or position and momentum in quantum mechanics, are not.

----------------------------------------
S1r. Measurement theory for real numbers
----------------------------------------

The standard textbook measurement theory says that the possible measurement results in measuring an observable given by a Hermitian operator A are its possible eigenvalues, with a probability density depending on the state of the system. This is part of the content of Born's rule, and counts as one of the cornerstones of the interpretation of quantum mechanics.
But Born's rule gives only a very idealized account of measurement theory, and gives no sufficient explanation for what is going on in many nontrivial measurements. The spectrum of the Hamiltonian of the electron of a hydrogen atom has a discrete part, catering for its bound states. According to the idealized textbook measurement theory, a measurement of the energy of a bound state should produce an infinitely accurate value agreeing with one of the values in the (QED-corrected) Balmer (etc.) series. But this is ridiculous. Repeated preparation and measurement of the position of the ``same'' spectral lines (which provide these energy measurements, relative to an appropriate zero of the energy) yields different results, from which the energies themselves can be obtained only to a certain accuracy. Thus Born's rule does not account for the interpretation of a measurement of the energy of an electron. For similar reasons, measurements of particle masses or resonance energies do not reveal the exact values (which they should according to Born's rule) but only approximations whose quality depends a lot on the way the measurement is done (an aspect that does not figure at all in Born's rule). Measurements such as that of a particle lifetime or the integral cross section of a particular reaction do not even have a natural associated operator of which the measurement result would be an eigenvalue. The idealized textbook measurement theory based on Born's rule is appropriate only for the measurement of spin and related variables that result in recording decisions of finite information content. Thus the measurement process as described by von Neumann (and copied from there to numerous textbooks) is an unrealistic idealization compared with many (and probably most) real measurements. 
The latter are usually much better described by suitable POVMs (positive operator valued measures) rather than by Born's rule, which corresponds to PVMs (projection-valued measures), a special case of POVMs in which the positive operators are in fact projections.

See Sections 7.3-7.5 of the book A. Neumaier and D. Westra, Classical and Quantum Mechanics via Lie algebras, arXiv:0810.1019 for a realistic account of measurement theory not dependent on Born's rule. The latter is derived there as a special case, together with the conditions under which it is applicable.

---------------------------------------------
S1s. The classical limit of quantum mechanics
---------------------------------------------

Classical mechanics is often seen as the formal limit hbar-->0 of quantum mechanics. Strictly speaking, this cannot be true since hbar is a constant of nature, which is often even set to one to have convenient units.

The classical limit really is the limit of large quantum numbers M (typically of mass, number of particles, or size of angular momentum), when attention is limited to quantities whose uncertainties are small compared to their expectations. In these situations, the effect is similar to taking the limit hbar --> 0. In these cases the relative uncertainties scale with sqrt(hbar/M), which becomes small if either hbar is made formally tiny or if M is large.

Indeed, a quantum system is essentially classical if its relevant quantities have uncertainties that are small compared to their expectations.

The relation between classical and quantum mechanics is most easily seen if -- as in statistical mechanics -- quantum mechanics is presented in terms of mixed states, which correspond to density matrices. (Almost all quantum mechanics applied to real systems not in the ground state needs density matrices, since pure states are very difficult to create and propagate unless a system is in the ground state.
Pure states describe only an idealized version of quantum reality, which in statistical mechanics appears as the approximation in the cold limit T-->0.) Density matrices are intrinsically quantum mechanical. Nevertheless they exhibit very close analogies to classical densities. Therefore everyone interested in the relations between classical and quantum mechanics is well-advised to look at both theories in the statistical mechanics version, where the analogies are obvious, and the transition from quantum to classical takes the form of a simple approximation. QM in the statistical mechanics version is almost as intuitive as classical statistical mechanics. The only somewhat nonintuitive part is in both cases how to interpret probability. (This is already a severe problem in classical statistical mechanics, as the book by Lawrence Sklar, Physics and Chance, explains in detail.) A density matrix describes the stochastic behavior of a quantum system in the same way as a density function describes the stochastic behavior of a classical system. In both cases, if the system is nice enough that the stochastic uncertainties (square roots of variances) in the quantities of interest are much smaller than the quantities themselves, one can form a deterministic approximation. This deterministic approximation is given by a classical dynamical system for the (expectations of the) quantities of interest. Thus, in a sense, classical variables are simply expectations of relevant quantum variables with small uncertainty. Then (and only then) is a deterministic approximation adequate. The small uncertainty makes these variables approximately predictable in each individual event, and hence classical. Classicality therefore develops whenever the uncertainties of the quantities of interest become small compared to their expectations.
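The shrinking of relative uncertainties can be illustrated by a small numerical sketch. It rests on an assumption not made in the text: the quantity of interest is a sum over N independent, identically prepared subsystems, so that means and variances simply add. The function name and the two-valued example data are hypothetical and chosen for illustration only.

```python
import math

def relative_uncertainty(n_copies, mean, variance):
    """Relative uncertainty (standard deviation / |expectation|) of a sum of
    n_copies independent quantities with identical single-copy mean and variance.
    Independence is an assumption of this sketch."""
    total_mean = n_copies * mean                 # expectations add
    total_std = math.sqrt(n_copies * variance)   # variances add for independent copies
    return total_std / abs(total_mean)

# Hypothetical single-system data: a quantity taking the values 0 and 1,
# with probability p_one for the value 1.
p_one = 0.3
single_mean = p_one
single_variance = p_one * (1.0 - p_one)

for n in (1, 100, 10_000, 1_000_000):
    print(n, relative_uncertainty(n, single_mean, single_variance))
```

The printed ratio falls off like 1/sqrt(N): for a macroscopic number of subsystems the sum is sharply determined by its expectation, which is the situation in which the quantity behaves classically.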
Of course, there is significant interest in quantum systems where this does not happen, since these are decidedly non-classical, but quantum theory gets its strange, counterintuitive features only when one concentrates exclusively on these systems. For more details, see, e.g., Sections 7.3-7.5 of A. Neumaier and D. Westra, Classical and Quantum Mechanics via Lie algebras http://de.arxiv.org/abs/0810.1019 -------------------------------------------- S1t. The classical limit via coherent states -------------------------------------------- One method for producing classical mechanics from a quantum theory is by looking at coherent states of the quantum theory. The standard (Glauber) coherent states have a localized probability distribution in classical phase space, whose center follows the classical equations of motion when the Hamiltonian is quadratic in positions and momenta. (For nonquadratic Hamiltonians, this only holds approximately over short times. For example, for the 2-body problem with a 1/r^2 force (i.e., a 1/r potential), Glauber coherent states are not preserved by the dynamics. In this particular case, there are, however, alternative SO(2,4)-based coherent states that are preserved by the dynamics, smeared over Kepler-like orbits. The reason is that the Kepler 2-body problem -- and its quantum version, the hydrogen atom -- are superintegrable systems with the large dynamical symmetry group SO(2,4).) In general, roughly, coherent states form a nice orbit of unit vectors of a Hilbert space H under a dynamical symmetry group G with a triangular decomposition, such that the linear combinations of coherent states are dense in H, and the inner product phi^*psi of coherent states phi and psi can be calculated explicitly in terms of the highest weight representation theory of G.
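For the standard (Glauber) case, the explicit inner product can be checked numerically. The sketch below assumes the usual single-mode normalization |alpha> = exp(-|alpha|^2/2) sum_n alpha^n/sqrt(n!) |n>, which is not spelled out in this FAQ; the function names are hypothetical. It compares the closed form with a truncated number-state sum, and also evaluates the modulus of an integer power of the overlap.

```python
import cmath
import math

def coherent_overlap(alpha, beta):
    """Closed-form inner product <alpha|beta> of two normalized Glauber
    coherent states (single mode, standard normalization assumed)."""
    alpha, beta = complex(alpha), complex(beta)
    return cmath.exp(-abs(alpha) ** 2 / 2 - abs(beta) ** 2 / 2
                     + alpha.conjugate() * beta)

def coherent_overlap_series(alpha, beta, n_max=60):
    """Same inner product from the number-state expansion, truncated at n_max."""
    alpha, beta = complex(alpha), complex(beta)
    z = alpha.conjugate() * beta
    total = 0j
    term = 1 + 0j                    # holds z**n / n!
    for n in range(n_max):
        total += term
        term *= z / (n + 1)
    return cmath.exp(-abs(alpha) ** 2 / 2 - abs(beta) ** 2 / 2) * total

def power_overlap_modulus(alpha, beta, n):
    """Modulus of the n-th power of the overlap, exp(-n |alpha-beta|^2 / 2)."""
    return abs(coherent_overlap(alpha, beta)) ** n

a, b = 1 + 1j, 0.5 - 0.2j
print(coherent_overlap(a, b), coherent_overlap_series(a, b))
print([power_overlap_modulus(a, b, n) for n in (1, 10, 100)])
```

The modulus of the overlap depends only on the phase-space distance |alpha - beta|, and its powers decay rapidly for distinct labels, so high powers make differently labelled coherent states nearly orthogonal.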
The diagonal of the N-th tensor power of H (coding systems with N-fold quantum numbers) has coherent states phi_N (labelled by the same classical phase space as the original coherent states, and corresponding to the N-fold highest weight) with inner product phi_N^*psi_N=(phi^*psi)^N, and for N --> inf, one gets a good classical limit. For the Heisenberg group, phi^*psi is a 1/hbar-th power, and the N-th power corresponds to replacing hbar by hbar/N. Thus one gets the standard classical limit. Basic literature on relations between coherent states and the classical limit, based on irreducible unitary representations of Lie groups includes the book A. M. Perelomov, Generalized Coherent States and Their Applications, Springer-Verlag, Berlin, 1986. and the paper L. Yaffe, Large N limits as classical mechanics, Rev. Mod. Phys. 54, 407--435 (1982) Both references assume that the Lie group is finite-dimensional and semisimple. This excludes the Heisenberg group, in terms of which the standard (Glauber) coherent states are usually defined. However, the Heisenberg group has a triangular decomposition, and this suffices to apply Perelomov's theory in spirit. The online book Arnold Neumaier, Dennis Westra, Classical and Quantum Mechanics via Lie algebras, http://lanl.arxiv.org/abs/0810.1019 contains a general discussion of the relations between classical mechanics and quantum mechanics, and discusses in Chapter 16 the concept of a triangular decomposition of Lie algebras and a summary of the associated representation theory (though in its present version not the general relation to coherent states). For other relevant approaches to a rigorous classical limit, see the online sources http://www.projecteuclid.org/Dienst/Repository/1.0/Disseminate/euclid.cmp/1103859040/body/pdf http://www.univie.ac.at/nuhag-php/bibtex/open_files/si80_SIMON!!!.pdf http://arxiv.org/abs/quant-ph/9504016 http://arxiv.org/pdf/math-ph/9807027 -------------------------------- S2a.
Lie groups and Lie algebras -------------------------------- Lie groups can be illustrated by continuous rigid motion of a ball with painted patterns on it in 3-dimensional space. The Lie group ISO(3) consists of all rigid transformations. A rigid transformation is essentially the act of picking up the ball and placing it somewhere else, ignoring the detailed motion in between and the location where one started. Special transformations are for example a translation in northern direction by 1 meter, or a rotation by a quarter turn around the vertical axis at some particular point (think of a ball with a string attached). 'Rigid' means that the distances between marked points on the ball remain the same; the mathematician talks about 'preserving distances', and the distances are therefore labeled 'invariants'. One can repeat the same transformation several times, or perform two different transformations one after the other, and get another transformation; this is called the product of these transformations. For example, the product of a translation by 1 meter and another one by 2 meters in the same direction gives one of 1+2=3 meters in the same direction. In this case, the distances add, but if one combines rotations about different axes the result is no longer intuitive. To make this more tractable for calculations, one needs to take some kind of logarithms of transformations - these behave again additively and make up the corresponding Lie algebra iso(3) [same letters but in lower case]. The elements of the Lie algebra can be visualized as very small, or 'infinitesimal', motions. General Lie groups and Lie algebras extend these notions to more general manifolds. A manifold is just a higher-dimensional version of space, and transformations are generalized motions preserving invariants that are important in the manifold. The transformations preserving these invariants are also called 'symmetries', and the Lie group consisting of all symmetries is called a 'symmetry group'.
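The remark that products of rotations about different axes are not intuitive can be made concrete with a minimal sketch. It represents rotations by 3x3 matrices (a standard choice, but not one made in the text); all function names are hypothetical.

```python
import math

def rot_x(angle):
    """Rotation by `angle` (radians) about the x-axis, as a 3x3 matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(angle):
    """Rotation by `angle` (radians) about the z-axis, as a 3x3 matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def product(a, b):
    """Matrix product a*b: first apply b, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def close(a, b, tol=1e-12):
    """True if two 3x3 matrices agree entrywise up to tol."""
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(3) for j in range(3))

quarter = math.pi / 2

# Rotations about the same axis combine additively, like translations.
print(close(product(rot_z(0.3), rot_z(0.5)), rot_z(0.8)))

# Quarter turns about different axes: the order matters.
print(close(product(rot_z(quarter), rot_x(quarter)),
            product(rot_x(quarter), rot_z(quarter))))
```

The first check prints True and the second False: the angles add only when the axes agree, which is why one passes to the additive 'logarithms' when calculating in the group of rigid motions.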
The elements of the corresponding Lie algebra are 'infinitesimal symmetries'. For example, physical laws are invariant under rotations and translations, and hence under all rigid motions. But not only these: If one includes time explicitly, the resulting 4-dimensional space has more invariant motions or ''symmetries''. The Lie group of all these symmetry transformations is called the Poincar'e group, and plays a basic role in the theory of relativity. The transformations are now about space-time frames in uniform motion. Apart from translations and rotations there are symmetries called 'boosts' that accelerate a frame in a certain direction, and combinations obtained by taking products. All infinitesimal symmetries together make up a Lie algebra, called the Poincar'e algebra. Much more on Lie groups and Lie algebras from the perspective of classical and quantum physics can be found in: Arnold Neumaier and Dennis Westra, Classical and Quantum Mechanics via Lie algebras, Cambridge University Press, to appear (2009?). http://www.mat.univie.ac.at/~neum/papers/physpapers.html#QML arXiv:0810.1019 ----------------------------------------------------------- S2b. The Galilei group as contraction of the Poincare group ----------------------------------------------------------- The group of symmetries of special relativity is the Poincare group. However, before Einstein invented the theory of relativity, physics was believed to follow Newton's laws, and these have a different group of symmetries - the Galilei group, and its infinitesimal symmetries form the Galilei algebra. Now Newton's physics is just a special case of the theory of relativity in which all motions are very slow compared to the speed of light. Physicists speak of the 'nonrelativistic limit'. Thus one would expect that the Galilei group is a kind of nonrelativistic limit of the Poincar'e group. This notion has been made precise by Inonu.
He looked at the Poincar'e algebra and 'contracted' it in an ingenious way to the Galilei algebra. The construction could then be lifted to the corresponding groups. Not only that, it turned out to be a general machinery applicable to all Lie algebras and Lie groups, and therefore has found many applications far beyond that for which it was originally developed. --------------------------------------------------------------------- S2c. Representations of the Poincare group, spin and gauge invariance --------------------------------------------------------------------- Whatever deserves the name ''particle'' must move like a single, indivisible object. The Poincare group must act on the description of this single object; so the state space of the object carries a unitary representation of the Poincare group. This splits into a direct sum or direct integral of irreducible reps. But splitting means divisibility; so in the indivisible case, we have an irreducible representation. Thus particles are described by irreducible unitary reps of the Poincare group. Additional parameters characterizing the irreducible representation of an internal symmetry group = gauge On the other hand, not all irreducible unitary reps of the Poincare group qualify. Associated with the rep must be a consistent and causal free field theory. As explained in Volume 1 of Weinberg's book on quantum field theory, this restricts the rep further to those with positive mass, or massless reps with quantized helicity. Weinberg's book on QFT argues for gauge invariance from causality + masslessness. He discusses massless fields in Chapter 5, and observes (probably there, or in the beginning of Chapter 8 on quantum electrodynamics) roughly the following: Since massless spin 1 fields have only two degrees of freedom, the 4-vector one can make from them does not transform correctly but only up to a gauge transformation making up for the missing longitudinal degree of freedom. 
Since sufficiently long range elementary fields (less than exponential decay) are necessarily massless, they must either have spin <=1/2 or have gauge behavior. To couple such gauge fields to matter currents, the latter must be conserved, which means (given the known conservation laws) that the gauge fields either have spin 1 (coupling to a conserved vector current), or spin 2 (coupling to the energy-momentum tensor). [Actually, he does not discuss this for Fermion fields, so spin 3/2 (gravitinos) is perhaps another special case.] Spin 1 leads to standard gauge theories, while spin 2 leads to general covariance (and gravitons) which, in this context, is best viewed also as a kind of gauge invariance. There are some assumptions in the derivation, which one can find out by reading Weinberg's papers Phys.Rev. 133 (1964), B1318-B1322 any spin (massive) Phys.Rev. 134 (1964), B882-B896 any spin II (massless) Phys.Rev. 135 (1964), B1049-B1056 grav. mass = inertial mass Phys.Rev. 138 (1965), B988-B1002 derivation of Einstein Phys.Rev. 140 (1965), B516-B524 infrared gravitons Phys.Rev. 181 (1969), 1893-1899 any spin III (general reps.) on 'Feynman rules for any spin' and some related questions, which contain a lot of important information about applying the irreducible representations of the Poincare group for higher spin to field theories, and their relation to gauge theories and general relativity. A perhaps more understandable version of part of the material is in D.N. Williams, The Dirac Algebra for Any Spin, Unpublished Manuscript (2003) http://www-personal.umich.edu/~williams/papers/diracalgebra.pdf Note that there are plenty of interactions that can be constructed using the representation theory of the Lorentz group (and Weinberg's constructions), and there are plenty of (compound) particles with spin >2. See the tables of the particle data group, e.g., Delta(2950) (randomly chosen from http://pdg.lbl.gov/2003/bxxxpdf.html ). R.L. Ingraham, Prog. Theor. Phys. 
51 (1974), 249-261, http://ptp.ipap.jp/link?PTP/51/249/ constructs covariant propagators and complete vertices for spin J bosons with conserved currents for all J. See also H Shi-Zhong et al., Eur. Phys. J. C 42 (2005), 375-389 http://www.springerlink.com/content/ww61351722118853/ ----------------------------------- S2d. Forms of relativistic dynamics ----------------------------------- Relativistic multiparticle mechanics is an intricate subject, and there are no-go theorems that imply that the most plausible possibilities cannot be realized. However, these no-go theorems depend on assumptions that, when questioned, allow meaningful solutions. The no-go theorems thus show that one needs to be careful not to introduce plausible but inappropriate intuition into the formal framework. To pose the problem, one needs to distinguish between kinematical and dynamical quantities in the theory. Kinematics answers the question "What are the general form and properties of objects that are subject to the dynamics?" Thus it tells one about conceivable solutions, mapping out the properties of the considered representation of the phase space (or what remains of it in the quantum case). Thus kinematics is geometric in nature. But kinematics does not know of equations of motion, and hence can only tell general (kinematical) features of solutions. In contrast, dynamics is based on an equation of motion (or an associated variational principle) and answers the question 'What characterizes the actual solution?', given appropriate initial or boundary conditions. Although the actual solution may not be available in closed form, one can discuss its detailed properties and devise numerical approximation schemes. The difference between kinematical and dynamical is one of convention, and has nothing to do with the physics. By choosing the representation, i.e., the geometric setting, one chooses what is kinematical; everything else is dynamical.
Since something which is up to the choice of the person describing an experiment can never be distinguished experimentally, the physics is unaffected. However, the formulas look very different in different descriptions, and - just as in choosing coordinate systems - choosing a form adapted to a problem may make a huge difference for actual computations. Dirac distinguishes in his seminal paper Rev. Mod. Phys. 21 (1949), 392-399 three natural forms of relativistic dynamics, the instant form, the point form, and the front form. They are distinguished by what they consider to be kinematical quantities and what are the dynamical quantities. The familiar form of dynamics is the instant form, which treats space (hence spatial translations and rotations) as kinematical and time (and hence time translation and Lorentz boosts) as dynamical. This is the dynamics from the point of view of a hypothetical observer (let us call it an 'instant observer') who has knowledge about all information at some time t (the present), and asks how this information changes as time proceeds. Because of causality (the finite bound of c on the speed of material motion and communication), the resulting differential equations should be symmetric hyperbolic differential equations for which the initial-value problem is well-posed. Because of Lorentz invariance, the time axis can be any axis along a timelike 4-vector, and (in special relativity) space is the 3-space orthogonal to it. For a real observer, the natural timelike vector is the momentum 4-vector of the material system defining its reference frame (e.g., the solar system). While very close to the Newtonian view of reality, it involves an element of fiction in that no real observer can get all the information needed as initial data. Indeed, causality implies that it is impossible for a physical observer to know the present anywhere except at its own position.
A second, natural form of relativistic dynamics is, according to Dirac, the point form. This is the form of dynamics in which a particular space-time point x=0 (the here and now) in Minkowski space is distinguished, and the kinematical object replacing space is, for fixed L, a hyperboloid x^2=L^2 (and x_0<0) in the past of the here and now. The Lorentz transformations, as symmetries of the hyperboloid, are now kinematical and take the role that space translations and rotations had in the instant form. On the other hand, _all_ space and time translations are now dynamical, since they affect the position of the here-and-now. This is the form of dynamics which is manifestly Lorentz invariant, and in which space and time appear on equal footing. An observer in the here and now (let us call it a 'point observer') can - in principle, classically - have arbitrarily accurate information about the particles and/or fields on the past hyperboloid; thus causality is naturally accounted for. Information given on the past hyperboloid of a point can be propagated to information on any other past hyperboloid using the dynamical equations that are defined via the momentum 4-vector P, which is a 4-dimensional analogue of the nonrelativistic Hamiltonian. The Hamiltonian corresponding to motion in a fixed timelike direction u is given by H=u dot P. The commutativity of the components of P is the condition for the uniqueness of the resulting state at a different point x, independent of the path along which x is reached from 0. In principle, there are many other forms of relativistic dynamics: As Dirac mentions on p. 396 of his paper, any 3-dimensional surface in Minkowski space works as kinematical space if it meets every world line with timelike tangents exactly once. In general, those transformations are kinematical which are also symmetries of the surface one treats as kinematical reference surface. By choosing a surface without symmetries _all_ transformations become dynamical.
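For the point form, the split into kinematical and dynamical transformations can be checked directly on the past hyperboloid. The following sketch uses the Minkowski square x^2 = x_0^2 - \x^2 of this FAQ; the boost parametrization by a rapidity, the sample point, and the function names are assumptions made for illustration.

```python
import math

def minkowski_square(x):
    """x^2 = x_0^2 - x_1^2 - x_2^2 - x_3^2 for a 4-vector x = (x0, x1, x2, x3)."""
    return x[0] ** 2 - x[1] ** 2 - x[2] ** 2 - x[3] ** 2

def boost_x(x, rapidity):
    """Lorentz boost along the 1-axis, parametrized by the rapidity."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return (ch * x[0] + sh * x[1], sh * x[0] + ch * x[1], x[2], x[3])

def translate(x, a):
    """Space-time translation of x by the 4-vector a."""
    return tuple(xi + ai for xi, ai in zip(x, a))

def past_hyperboloid_point(space, L=1.0):
    """Point with x^2 = L^2 and x_0 < 0 having the given spatial part."""
    x0 = -math.sqrt(L ** 2 + sum(s * s for s in space))
    return (x0,) + tuple(space)

x = past_hyperboloid_point((0.3, 0.4, 0.0))
boosted = boost_x(x, 0.7)
shifted = translate(x, (0.5, 0.0, 0.0, 0.0))

print(minkowski_square(x), minkowski_square(boosted), boosted[0] < 0)
print(minkowski_square(shifted))
```

The boost maps the hyperboloid x^2 = L^2, x_0 < 0 into itself, while the time translation moves the point off it; in this sense the former is a symmetry of the kinematical surface and the latter is not.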
For reasons of economy, one wants however, a large kinematical symmetry group. The full Poincare group is possible only for free dynamics. This leaves as interesting large subgroups two with 6 linearly independent generators, the Euclidean group ISO(3), leading to the instant form, and the Lorentz group SO(1,3), leading to the point form, and one with 7 linearly independent generators, the stabilizer of a front (or infinite momentum plane), a 3-space with lightlike normal, leading to the front form. This third natural form of relativistic dynamics according to Dirac, has many uses in quantum field theory, but here I won't discuss it further. All forms are equivalent, related classically by canonical transformations preserving algebraic operations and the Poisson bracket, and quantum mechanically by unitary transformations preserving algebraic operations and hence the commutator. This means that any statement about a system in one of the forms can be translated into an equivalent statement of an equivalent system in any of the other forms. Preferences are therefore given to one form over the other depending solely on the relative simplicity of the computations one wants to do. This is completely analogous to the choice of coordinate systems (cartesian, polar, cylindric, etc.) in classical mechanics. For a multiparticle theory, however, the different forms and the need to pick a particular one seem to give different pictures of reality. This invites paradoxes if one is not careful. This can be seen by considering trajectories of classical relativistic many-particle systems. There is a famous theorem by Currie, Jordan and Sudarshan Rev. Mod. Phys. 35 (1963), 350-375 which asserts that interacting two-particle systems cannot have Lorentz invariant trajectories in Minkowski space. Traditionally, this was taken by mainstream physics as an indication that the multiparticle view of relativistic mechanics is inadequate, and a field theoretical formulation is essential. 
However, as time proceeded, several approaches to valid relativistic multi-particle (quantum) dynamics were found (see the FAQ entry on 'Is there a multiparticle relativistic quantum mechanics?'), and the theorem had the same fate as von Neumann's proof that hidden-variable theories are impossible. Both results are now simply taken as an indication that the assumptions under which they were made are too strong. In particular, once the assumption by Currie, Jordan and Sudarshan that all observers see the same trajectories of a system of interacting particles is rejected, their no-go theorem no longer applies. The question then is how to find a consistent and covariant description without this at first sight very intuitive property. But once it is admitted that different observers see the same world but represented in different personal spaces, the formerly intuitive property becomes meaningless. For objectivity, it is enough that one can consistently translate the views of any observer into that of any other observer. Precisely this is the role of the dynamical Poincare transformations. Thus nothing forbids an instant observer to observe particle trajectories in its present space, or a point observer to observe particle trajectories in its past hyperboloid. However, the present space (or the past hyperboloid) of two different observers is related not by kinematical transforms but dynamically, with the result that trajectories seen by different observers on their different kinematical 3-surface look different. Classically, this looks strange on first sight, although the Poincare group provides well-defined recipes for translating the trajectories seen by one observer into those seen by another observer. Quantum mechanically, trajectories are fuzzy anyway, due to the uncertainty principle, and as various successful multiparticle theories show, there is no mathematical obstacle for such a description. 
The mathematical reason for this superficially paradoxical situation lies in the fact that there is no observer-independent definition of the center of mass of relativistic particles, and the related fact that there is no observer-independent definition of space-time coordinates for a multiparticle system. The best one can do is to define either a covariant position operator whose components do not commute (thus defining a noncommutative space-time), or a spatial position operator, the so-called Newton-Wigner position operator, which has three commuting coordinates but is observer-dependent. (See the FAQ entry on 'Localization and position operators'.) ------------------------------------------------------------- S2e. Is there a multiparticle relativistic quantum mechanics? ------------------------------------------------------------- In his QFT book, Weinberg says no, arguing that there is no way to implement the cluster separation property. But in fact there is: There is a big survey by Keister and Polyzou on the subject B.D. Keister and W.N. Polyzou, Relativistic Hamiltonian Dynamics in Nuclear and Particle Physics, in: Advances in Nuclear Physics, Volume 20, (J. W. Negele and E.W. Vogt, eds.) Plenum Press 1991. www.physics.uiowa.edu/~wpolyzou/papers/rev.pdf that covered everything known at that time. This survey was quoted at least 116 times, see http://www.slac.stanford.edu/spires/find/hep?c=ANUPB,20,225 looking these up will bring you close to the state of the art on this. They survey the construction of effective few-particle models. There are no singular interactions, hence there is no need for renormalization. The models are _not_ field theories, only Poincare-invariant few-body dynamics with cluster decomposition and phenomenological terms which can be matched to approximate form factors from experiment or some field theory. (Actually many-body dynamics also works, but the many particle case is extremely messy.)
They are useful phenomenological models, but somewhat limited; for example, it is not clear how to incorporate external fields. The papers by Klink at http://www.physics.uiowa.edu/~wklink/ and work by Polyzou at http://www.physics.uiowa.edu/~wpolyzou/ contain lots of multiparticle relativistic quantum mechanics, applied to real particles. See also the Ph.D. thesis by Krassnigg at http://physik.uni-graz.at/~ank/dissertation-f.html (Other work in this direction includes Dirac's many-time quantum theory, with a separate time coordinate for each particle; see, e.g., Marian Guenther, Phys Rev 94, 1347-1357 (1954) and references there. Related multi-time work was done under the name of 'proper time quantum mechanics' or 'manifestly covariant quantum mechanics', see, e.g., L.P. Horwitz and C. Piron, Helv. Phys. Acta 48 (1973) 316, but it does not reproduce standard physics, and apparently never reached a stage useful to phenomenology.) Note that in the working single-time approaches, covariance is always achieved through a representation of the Poincare group on a Hilbert space corresponding to a fixed time (or another 3D manifold in space-time), rather than through multiple times. Thus the whole theory has a single time only, whose dynamics is generated by the Hamiltonian, the generator H=P_0 of the Poincare group. (This is completely analogous to the nonrelativistic case, where multiparticle systems also have a single time only.) The natural manifestly covariant picture is that of a vector bundle on Minkowski space-time, with a standard Fock space attached to each point. An observer (i.e., formally, an orthonormal frame attached at some space-time point) moves in space-time via the Poincare group, and this action extends to the bundle by means of the representation defining the Fock space. ---------------------- S2f. What is a photon? 
---------------------- According to quantum electrodynamics, the most accurately verified theory in physics, a photon is a single-particle excitation of the free quantum electromagnetic field. More formally, it is a state of the free electromagnetic field which is an eigenstate of the photon number operator with eigenvalue 1. The pure states of the free quantum electromagnetic field are elements of a Fock space constructed from 1-photon states. A general n-photon state vector is an arbitrary linear combination of tensor products of n 1-photon state vectors; and a general pure state of the free quantum electromagnetic field is a sum of n-photon state vectors, one for each n. If only the 0-photon term contributes, we have the dark state, usually called the vacuum; if only the 1-photon term contributes, we have a single photon. A single photon has the same degrees of freedom as a classical vacuum radiation field. Its shape is characterized by an arbitrary nonzero real 4-potential A(x) satisfying the free Maxwell equations, which in the Lorentz gauge take the form nabla dot nabla A(x) = 0, nabla dot A(x) = 0, expressing the zero mass and the transversality of photons. Thus for every such A there is a corresponding pure photon state |A>. Here A(x) is _not_ a field operator but a photon amplitude; photons whose amplitudes differ by an x-independent phase factor are the same. For a photon in the normalized state |A>, the observable electromagnetic field expectations are given by the usual formulas relating the 4-potential and the fields, <\E(x)> = <A|\E(x)|A> = - partial \A(x)/partial x_0 - c nabla_\x A_0(x), and <\B(x)> = <A|\B(x)|A> = nabla_\x x \A(x) [hmmm. check if this really is the case...] Here \x (fat x) and x_0 are the space part and the time part of a relativistic 4-vector, \E(x), \B(x) are the electromagnetic field operators (related to the operator 4-potential by analogous formulas), and c is the speed of light.
Amplitudes A(x) producing the same \E(x) and \B(x) are equivalent and related by a gauge transformation, and describe the same photon. In momentum space (frequently but not always the appropriate choice), single photon states have the form |A> = integral d\p^3/p_0 A(\p)|\p>, where |\p> is a single particle state with definite 3-momentum \p (fat p), p_0=|\p| is the corresponding photon energy divided by c, and the photon amplitude A(\p) is a polarization 4-vector. Thus a general photon is a superposition of monochromatic waves with arbitrary polarizations, frequencies and directions. (The Fourier transform of A(\p) is the so-called analytic signal A^(+)(x), and by adding its complex conjugate one gets the real 4-potential A(x) in the Lorentz gauge.) The photon amplitude A(\p) can be regarded as the photon's wave function in momentum space. Since photons are not localizable (though they are localizable approximately), there is no meaningful photon wave function in coordinate space; see the next entry in this FAQ. One could regard the 4-potential A(x) as coordinate space wave function, but because of its gauge dependence, this is not really useful. [ This is second quantized notation, as appropriate for quantum fields. This is how things always look in second quantization, even for a harmonic oscillator. The wave function psi(x) or psi(p) in standard (first quantized) quantum mechanics becomes the state vector psi = integral dx psi(x) |x> or integral dp psi(p) |p> in Fock space; the wave function at x or p turns into the coefficient of |x> or |p>. In quantum field theory, x, A (the photon amplitude), and E(x) (the electric field operator) correspond to k (a component of the momentum), x, and p_k.
Thus the coordinate index k is inflated to the spacetime position x, the argument of the wave function is inflated to a solution of the free Maxwell equations, the momentum operator is inflated to a field operator, and the integral over x becomes a functional integral over photon amplitudes, psi = integral dA psi(A) |A>. Here psi(A) is the most general state vector in Fock space; for a single photon, psi depends linearly on A, psi(A) = integral d\p^3/p_0 A(\p)|\p> = |A>. Observable electromagnetic fields are obtained as expectation values of the field operators \E(x) and \B(x) constructed by differentiation of the textbook field operator A(x). Just as the observed components of the mean momentum, say, in ordinary quantum mechanics are <p_k> = integral dx psi(x)^* p_k psi(x), so the observed values of the electromagnetic field are <\E(x)> = <psi|\E(x)|psi> = integral dA psi(A)^* \E(x) psi(A), <\B(x)> = <psi|\B(x)|psi> = integral dA psi(A)^* \B(x) psi(A). ] In a frequently used interpretation (valid only approximately), the term A(\p)|\p> represents the one-photon part of a monochromatic beam with frequency nu=cp_0/h, direction \n(\p)=\p/p_0, and polarization determined by A(\p). Here h = 2 pi hbar, where hbar is the reduced Planck constant; omega=cp_0/hbar is the angular frequency. The polarization 4-vector A(\p) is orthogonal to the 4-momentum p composed of p_0 and \p, obtained by a Fourier transform of the 4-potential A(x) in the Lorentz gauge. (The wave equation translates into the condition p_0^2=\p^2, causality requires p_0>0, hence p_0=|\p|, and orthogonality p dot A(\p) = 0 expresses the Lorentz gauge condition. For massless particles, there remains the additional gauge freedom to shift A(\p) by a multiple of the 4-momentum p, which can be used to fix A_0=0.)
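The residual gauge freedom just described can be checked numerically. The sketch below assumes the Minkowski product p dot A = p_0 A_0 - \p dot \A of this FAQ and works at a single momentum; the function names and sample numbers are hypothetical. It builds a null momentum p, an amplitude A with p dot A = 0, and then subtracts the multiple (A_0/p_0) p of p.

```python
import math

def minkowski_dot(a, b):
    """Bilinear Minkowski product a_0 b_0 - a_1 b_1 - a_2 b_2 - a_3 b_3."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def null_momentum(px, py, pz):
    """4-momentum of a massless particle: p_0 = |p| > 0."""
    return (math.sqrt(px * px + py * py + pz * pz), px, py, pz)

def lorentz_gauge_amplitude(p, a_space):
    """Complete a (complex) spatial amplitude to a 4-vector A with p dot A = 0."""
    a0 = (p[1] * a_space[0] + p[2] * a_space[1] + p[3] * a_space[2]) / p[0]
    return (a0,) + tuple(a_space)

def fix_vanishing_time_component(p, a):
    """Shift A by the multiple (A_0/p_0) of p so that the new A_0 vanishes."""
    shift = a[0] / p[0]
    return tuple(a_mu - shift * p_mu for a_mu, p_mu in zip(a, p))

p = null_momentum(1.0, 2.0, 2.0)                      # p_0 = 3
a = lorentz_gauge_amplitude(p, (1 + 2j, 0.5j, -1.0))  # arbitrary sample amplitude
a_fixed = fix_vanishing_time_component(p, a)

print(abs(minkowski_dot(p, p)), abs(minkowski_dot(p, a)))
print(abs(a_fixed[0]), abs(minkowski_dot(p, a_fixed)))
```

Because p dot p = 0 for a massless particle, the shift does not disturb the condition p dot A = 0; after it, the spatial part of A is orthogonal to \p, leaving two independent polarization directions.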
A(\p) is usually written (in the gauge with vanishing time component) as a linear combination of two specific polarization vectors eps^+(p) and eps^-(p) for circularly polarized light (corresponding to helicities +1 and -1), forming together with the direction vector \n(\p) an orthonormal basis of complex 3-space. In particular, eps^+(p) eps^+(p)^* + eps^-(p)eps^-(p)^* + \n(\p)\n(\p)^* = 1 is the 3x3 identity matrix. (This is used in sums over helicities for Feynman rules.) Specifically, eps^+(p) and eps^-(p) can be obtained by finding normalized eigenvectors for the eigenvalue problem [check. The original eigenvalue problem is p dot J eps = lambda eps.] p x eps = lambda eps with lambda = +-i|p|. For example, if p is in z-direction then eps^+(p) = (1, -i, 0)/sqrt(2), eps^-(p) = (i, -1, 0)/sqrt(2), and the general case can be obtained by a suitable rotation. An explicit calculation gives almost everywhere eps^+(p) = u(p)/p_0 where p_0=|p| and u_1(p) = p_3 - i p_2 p'/p'', u_2(p) = -i p_3 - i p_1 p'/p'' u_3(p) = p' with p' = p_1+ip_2, p''= p_3+p_0. [what is eps^-(p)?] These formulas become singular along the negative p_3-axis, so several charts are needed to cover For experiments one usually uses nearly monochromatic light bundled into narrow beams. If one also ignores the directions (which are usually fixed by the experimental setting, hence carry no extra information), then only the helicity degrees of freedom remain, and the 1-photon part of the beam behaves like a 2-level quantum system ('a single spin'). A general monochromatic beam with fixed direction in a pure state is given by a second-quantized state vector, which is a superposition of arbitrary multiphoton states in the Bosonic Fock space generated by the two helicity degrees of freedom. This is the basis for most quantum optics experiments probing the foundations of quantum mechanics. 
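The stated properties of the circular polarization vectors can be checked numerically for the special case of a momentum in z-direction. The following is only a minimal sketch, assuming NumPy is available; the helper cross_matrix and all variable names are illustrative and not part of the FAQ.

```python
import numpy as np

# Circular polarization vectors for a photon moving along the z-axis,
# copied from the formulas in the text above.
n = np.array([0.0, 0.0, 1.0])                  # direction vector n(p)
eps_plus = np.array([1, -1j, 0]) / np.sqrt(2)  # helicity +1
eps_minus = np.array([1j, -1, 0]) / np.sqrt(2) # helicity -1


def cross_matrix(p):
    """Return the 3x3 matrix P with P @ v == np.cross(p, v) (illustrative helper)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])


P = cross_matrix(n)

# Eigenvalue problem p x eps = lambda eps with lambda = +-i|p| (here |p| = 1).
eig_plus_ok = np.allclose(P @ eps_plus, 1j * eps_plus)
eig_minus_ok = np.allclose(P @ eps_minus, -1j * eps_minus)

# Orthonormality (Gram matrix) and the completeness relation
# eps^+ eps^+^* + eps^- eps^-^* + n n^* = 1.
basis = [eps_plus, eps_minus, n.astype(complex)]
gram = np.array([[np.vdot(a, b) for b in basis] for a in basis])
completeness = sum(np.outer(v, v.conj()) for v in basis)
```

For a general direction one would rotate these vectors; the sketch deliberately avoids the explicit chart formula, whose details are marked as unchecked in the text.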
The simplest state of light (generated for example by lasers) is a coherent state, with state vector proportional to e(A) = |vac> + |A> + 1/sqrt(2!) |A> tensor |A> + 1/sqrt(3!) |A> tensor |A> tensor |A> + ... where |A> is a one-photon state. Thus coherent states also have the same degrees of freedom as classical electromagnetic radiation. Indeed, light in coherent states behaves classically in most respects. At low intensity, the higher order terms in the expansion are negligible, and since the vacuum part is not directly observable, a low intensity coherent state resembles a single photon state. On the other hand, true single photon states are very hard to produce to good accuracy, and were created experimentally only recently: B.T.H. Varcoe, S. Brattke, M. Weidinger and H. Walther, Preparing pure photon number states of the radiation field, Nature 403, 743--746 (2000). See also http://www.qis.ucalgary.ca/quantech/fock.html Ordinary light is essentially never, and high-tech light almost never, describable by single photons. A good informal discussion of what a photon is from a more practical perspective was given by Paul Kinsler in http://www.lns.cornell.edu/spr/2000-02/msg0022377.html But this does not tell the whole story. An interesting collection of articles explaining different current views is in The Nature of Light: What Is a Photon? Optics and Photonics News, October 2003 http://www.osa-opn.org/Content/ViewFile.aspx?Id=3185 Further discussion is given in the section ''Coherent states of light as ensembles'' of the present FAQ. The standard reference for quantum optics is L. Mandel and E. Wolf, Optical Coherence and Quantum Optics, Cambridge University Press, 1995. Mandel and Wolf write (in the context of localizing photons), about the temptation to associate with the clicks of a photodetector a concept of photon particles. [If there is interest, I can try to recover the details.]
The wording suggests that one should resist the temptation, although this advice is usually not heeded. However, the advice is sound since a photodetector clicks even when it detects only classical light! This follows from the standard analysis of a photodetector, which treats the light classically and only quantizes the detector. Thus the clicks are an artifact of photodetection caused by the quantum nature of matter, rather than a proof of photons arriving!!! A coherent light source (laser) produces a coherent state of light, which is a superposition of the vacuum state, a 1-photon state, a 2-photon state, etc, with squared amplitudes given by a Poisson distribution. At low intensity, this is misinterpreted in practice as random single photons arriving at the end of the beam in a random Poisson process, because the photodetector produces clicks according to this distribution. Incoherent light sources usually consist of thermal mixtures and produce other distributions, but otherwise the description (and misinterpretation) is the same. Nevertheless, one must understand this misinterpretation in order to follow much of the literature on quantum optics. Thus the talk about photons is usually done inconsistently; almost everything said in the literature about photons should be taken with a grain of salt. There are even people like the Nobel prize winner Willis E. Lamb (the discoverer of the Lamb shift) who maintain that photons don't exist. See towards the end of http://web.archive.org/web/20040203032630/www.aro.army.mil/phys/proceed.htm The reference mentioned there at the end appeared as W.E Lamb, Jr., Anti-Photon, Applied Physics B 60 (1995), 77--84 This, together with the other reference mentioned by Lamb, is reprinted in W.E Lamb, Jr., The interpretation of quantum mechanics, Rinton Press, Princeton 2001. 
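The Poisson weights of the n-photon components of a coherent state can be tabulated directly. The sketch below assumes the standard relation P(n) = exp(-Nbar) Nbar^n / n! with Nbar = |z|^2; the value of nbar and the function name are hypothetical choices for illustration only.

```python
import math


def photon_number_probs(nbar, nmax):
    """Poisson probabilities P(n) = exp(-nbar) * nbar**n / n! for n = 0..nmax."""
    return [math.exp(-nbar) * nbar**n / math.factorial(n) for n in range(nmax + 1)]


nbar = 0.01                       # hypothetical mean photon number of a weak beam
probs = photon_number_probs(nbar, 10)

p_vac = probs[0]                  # weight of the vacuum component
p_one = probs[1]                  # weight of the 1-photon component
p_multi = 1.0 - p_vac - p_one     # weight of all components with n >= 2
ratio = p_multi / p_one           # multiphoton weight relative to 1-photon weight
```

For such a weak beam the state is almost entirely vacuum, and the multiphoton weight is smaller than the one-photon weight by a factor of roughly nbar/2, which is what makes the "random single photons" reading tempting.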
I think the most apt interpretation of an 'observed' photon as used in practice (in contrast to the photon formally defined as above) is as a low intensity coherent state, cut arbitrarily into time slices carrying an energy of h*nu = hbar*omega, the energy of a photon at frequency nu and angular frequency omega. Such a state consists mostly of the vacuum (which is not directly observable hence can usually be neglected), and the contributions of the multiphoton states are negligible compared to the single photon contribution. With such a notion of photon, most of the actual experiments done make sense, though it does not explain the quantum randomness of the detection process (which comes from the quantized electrons in the detector). A nonclassical description of the electromagnetic field where states of light other than coherent states are required is necessary mainly for special experiments involving recombining split beams, squeezed state amplification, parametric down-conversion, and similar arrangements where entangled photons make their appearance. There is a nice booklet on this kind of optics: U. Leonhardt, Measuring the Quantum State of Light, Cambridge, 1997. Nonclassical electromagnetic fields are also relevant in the scattering of light, where there are quantum corrections due to multiphoton scattering. These give rise to important effects such as the Lamb shift, which very accurately confirm the quantum nature of the electromagnetic field. They involve no observable photon states, but only virtual photon states, hence they are unrelated to experiments involving photons. Indeed, there is no way to observe virtual particles, and their name was chosen to reflect this. (Observed particles are always onshell, hence massless for photons, whereas it is an easy exercise that the virtual photon mediating electromagnetic interaction of two electrons in the tree approximation is never onshell.) ------------------------------------------------- S2g. 
Particle positions and the position operator ------------------------------------------------- The standard probability interpretation for quantum particles is based on the Schr"odinger wave function psi(x), a square integrable single- or multicomponent function of position x in R^3. Indeed, with ^* denoting the conjugate transpose, rho(x) := psi(x)^*psi(x) is generally interpreted as the probability density to find (upon measurement) the particle at position x. Consequently, Pr(Z) := integral_Z dx |psi(x)|^2 is interpreted as the probability of the particle being in the open subset Z of position space. Particles in highly localized states are then given by wave packets which have no appreciable size |psi(x)| outside some tiny region Z. If the position representation in the Schr"odinger picture exists, there is also a vector-valued position operator x, whose components act on psi(x) by multiplication with x_j (j=1,2,3). In particular, the components of x commute, satisfy canonical commutation relations with the conjugate momentum p = -i hbar partial_x, and transform under rotations like a 3-vector, so that the commutation relations with the angular momentum J take the form [J_j,x_k] = i eps_{jkl} x_l. Moreover, in terms of the (unnormalizable) eigenstates |x,m> of the position operator corresponding to the spectral value x (and a label m to distinguish multiple eigenstates) we can recover the position representation from an arbitrary representation by defining psi(x) to be the vector with components psi_m(x) := <x,m|psi>. Therefore, if we have a quantum system defined in an arbitrary Hilbert space in which a momentum operator is defined, the necessary and sufficient condition for the existence of a spatial probability interpretation of the system is the existence of a position operator with commuting components which satisfy standard commutation relations with the components of the momentum operator and the angular momentum operator.
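As an illustration of the probability interpretation Pr(Z) = integral_Z dx |psi(x)|^2, here is a rough numerical sketch. It assumes a one-dimensional Gaussian wave packet instead of a wave function on R^3, and the width sigma, the grid, and the helper prob are hypothetical choices made only for this example.

```python
import math

import numpy as np

sigma = 1.0                                  # hypothetical packet width
x = np.linspace(-10.0, 10.0, 20001)          # position grid
dx = x[1] - x[0]

# Normalized Gaussian wave packet; |psi|^2 is a normal density with std sigma.
psi = (2 * np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (4 * sigma**2))
rho = np.abs(psi) ** 2                       # probability density rho = psi^* psi


def prob(a, b):
    """Riemann-sum approximation of Pr(Z) for the interval Z = (a, b)."""
    mask = (x > a) & (x < b)
    return float(np.sum(rho[mask]) * dx)


total = prob(-10.0, 10.0)                    # should be close to 1
inside = prob(-sigma, sigma)                 # probability of |x| < sigma
exact = math.erf(1 / math.sqrt(2))           # closed form for the Gaussian case
```

The packet has "no appreciable size" outside a few multiples of sigma, in the sense that prob over such an interval is already very close to total.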
Thus we have reduced the existence of a probability interpretation for particles in a bounded region of space to the question of the existence of a position operator with the right properties. We now investigate this existence problem for elementary particles, i.e., objects represented by an irreducible representation of the full Poincare group. We consider first the case of particles of mass m>0, since the massless case needs additional considerations. A. Massive case, m>0: Let M := R^3 be the manifold of 3-momenta p. On the Hilbert space H_m^d obtained by completion of the space of all C^infty functions with compact support from M to the space C^d of d-component vectors with complex entries, with inner product defined by <phi|psi> := integral d\p/sqrt(p^2+m^2) phi(p)^*psi(p), we define the position operator q := i hbar partial_p, which satisfies the standard commutation relations, the momentum in time direction, p_0 := sqrt(m^2+|p|^2), where m>0 is a fixed mass, and the operators J := q x p + S, K := (p_0 q + q p_0)/2 + p x S/(m+p_0), where S is the spin vector in a unitary representation of so(3) on the vector space C^d of complex vectors of length d, with the same commutation relations as J. This is a unitary representation of the Poincare algebra; verification of the standard commutation relations (given, e.g., in Weinberg's Volume 1, p.61) is straightforward. It is not difficult to show that this representation is irreducible and extends to a representation of the full Poincare group. Obviously, this representation carries a position operator. Since the physical irreducible representations of the Poincare group are uniquely determined by mass and spin, we see that in the massive case, a position operator must always exist. An explicit formula in terms of the Poincare generators is obtained through division by m in the formula mq = K - ((K dot p) p/p_0 + J x p)/(m+p_0), which is straightforward, though a bit tedious to verify from the above.
That there is no other possibility follows from T.F. Jordan, Simple derivation of the Newton-Wigner position operator, J. Math. Phys. 21 (1980), 2028-2032. Note that the position operator is always observer-dependent, in the sense that one must choose a timelike unit vector to distinguish space and time coordinates in the momentum operator. This is due to the fact that the above construction is not invariant under Lorentz boosts (which give rise to equivalent but different representations). Note also that in case of the Dirac equation, the position operator is _not_ the operator multiplying a solution psi(x) of the Dirac equation by the spacelike part of x (which would mix electron and positron states), but a related operator obtained by first applying a so-called Foldy-Wouthuysen transformation. L.L. Foldy and S.A. Wouthuysen, On the Dirac Theory of Spin 1/2 Particles and Its Non-Relativistic Limit, Phys. Rev. 78 (1950), 29-36. B. Massless case, m=0: Let M_0 := R^3\{0} be the manifold of nonzero 3-momenta p, and let p_0 := |p|, n := p/p_0. The Hilbert space H_0^d, defined as before but now with m=0 and with M_0 in place of M, i.e., obtained by completion of the space of all C^infty functions with compact support from M_0 to the space C^d of d-component vectors with complex entries, with inner product defined by <phi|psi> := integral d\p/|p| phi(p)^*psi(p), carries a natural massless representation of the Poincare algebra, defined by J := q x p + S, K := (p_0 q + q p_0)/2 + n x S, where q = i hbar partial_p is the position operator, and S is the spin vector in a unitary representation of so(3) on C^d, with the same commutation relations as J. Again, verification of the standard commutation relations is straightforward. (Indeed, this representation is the limit of the above massive representation for m --> 0.)
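The remark about the limit m --> 0 can be spelled out in one line. The following LaTeX fragment is only a sketch of that step, written with bold letters for the 3-vectors that the FAQ denotes by plain p, S, n; the subscripts m and 0 on K are labels introduced here for the massive and massless boost generators.

```latex
\[
  K_m = \tfrac12\,(p_0 q + q p_0) + \frac{\mathbf{p}\times\mathbf{S}}{m+p_0},
  \qquad p_0 = \sqrt{m^2+|\mathbf{p}|^2},
\]
\[
  \lim_{m\to 0}\frac{\mathbf{p}\times\mathbf{S}}{m+p_0}
  = \frac{\mathbf{p}\times\mathbf{S}}{|\mathbf{p}|}
  = \mathbf{n}\times\mathbf{S}
  \quad (\mathbf{p}\neq 0),
  \qquad\text{hence}\qquad
  K_m \to K_0 = \tfrac12\,(p_0 q + q p_0) + \mathbf{n}\times\mathbf{S}.
\]
```

The restriction to nonzero momenta is the reason why the massless construction uses M_0 rather than M.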
It is easily seen that the helicity lambda := n dot S is central in the (suitably completed) universal envelope of the Lie algebra, and that the possible eigenvalues of the helicity are s,s-1,...,-s, where s=(d-1)/2. Therefore, the eigenspaces of the helicity operator carry by restriction unitary representations of the Poincare algebra, which are easily seen to be irreducible. They extend to a representation of the connected Poincare group. Moreover, the invariant subspace H_s formed by the direct sum of the eigenspaces for helicity s and -s forms a massless irreducible spin s representation of the full Poincare group. (It is easy to see that changing K to K-t(p_0)p for an arbitrary differentiable function t of p_0 preserves all commutation relations, hence gives another representation of the Poincare algebra. Since the massless irreducible representations of the Poincare group are uniquely determined by their spin, the resulting representations are equivalent. This corresponds to the freedom below in choosing a position operator.) Now suppose that a Poincare invariant subspace H of L^2(M_0)^d has a position operator x satisfying the canonical commutation relations with p and the above commutator relations with J. Then F=q-x commutes with p, hence its components must be a (possibly matrix-valued) function F(p) of p. Since the components of x commute, partial_p x F = 0, and, since M_0 is simply connected, F is the gradient of a scalar function f. Rotation invariance then implies that this function depends only on p_0=|p|. Thus F = partial_p f(p_0) = f'(p_0) n. Thus the position operator takes the form x = q - f'(p_0) n. In particular, x x p = q x p. Now the algebra of linear operators on the dense subspace of C^infty functions in H contains the components of p, J, K and x, hence those of J - x x p = J - q x p = S. Thus the (p-independent) operators from the spin so(3) act on H.
But this implies that either H=0 (no helicity) or H = L^2(M_0)^d (all helicities between s and -s). Since the physical irreducible representations of the Poincare group are uniquely determined by mass and spin, and for s>1/2, the spin s Hilbert space H_s is a proper, nontrivial subspace of L^2(M_0)^d, we proved the following theorem: Theorem. An irreducible representation of the full Poincare group with mass m>=0 and finite spin has a position operator transforming like a 3-vector and satisfying the canonical commutation relations if and only if either m>0 or m=0 and s<=1/2 (but s=0 if only the connected Poincare group is considered). This theorem was announced without giving details in T.D. Newton and E.P. Wigner, Localized states for elementary systems, Rev. Mod. Phys. 21 (1949), 400-406. A mathematically rigorous proof was given in A. S. Wightman, On the Localizability of Quantum Mechanical Systems, Rev. Mod. Phys. 34 (1962), 845-872. See also T.F. Jordan, Simple proof of no position operator for quanta with zero mass and nonzero helicity, J. Math. Phys. 19 (1978), 1382-1385, who also considers the massless representations of continuous spin, and D. Rosewarne and S. Sarkar, Rigorous theory of photon localizability, Quantum Opt. 4 (1992), 405-413. For spin 1, the case relevant for photons, we have d=3, and the subspace of interest is the space H obtained by completion of the space of all vector-valued C^infty functions A(p) of a nonzero 3-momentum p with compact support satisfying the transversality condition p dot A(p)=0, with inner product defined by <A|A'> := integral dp/|p| A(p)^* A'(p). It is not difficult to see that one can identify the wave functions A(p) with the Fourier transform of the vector potential in the radiation gauge where its 0-component vanishes. This relates the present discussion to that given in the FAQ entry ''What is a photon?''.
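For d=3 the helicity spectrum and the role of the transversality condition can be checked numerically. The sketch below assumes the Cartesian form (S_j)_{kl} = -i eps_{jkl} of the spin 1 matrices (one common convention, not stated in the FAQ) and a hypothetical sample momentum p.

```python
import numpy as np

# Spin-1 matrices in the Cartesian basis: (S_j)_{kl} = -i * eps_{jkl}
# (an assumed convention; any unitarily equivalent choice would do).
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0
S = -1j * eps                               # S[j] is the 3x3 matrix S_j

p = np.array([1.0, 2.0, -2.0])              # hypothetical nonzero 3-momentum
n = p / np.linalg.norm(p)                   # direction n = p/p_0

# Helicity operator lambda = n dot S; it is Hermitian, so eigh applies.
helicity = np.einsum("j,jkl->kl", n, S)
values, vectors = np.linalg.eigh(helicity)  # eigenvalues in ascending order

longitudinal = vectors[:, 1]                # eigenvector for helicity 0
transverse = vectors[:, [0, 2]]             # eigenvectors for helicity -1, +1
```

In this example the helicity 0 eigenvector is parallel to n, while the helicity +-1 eigenvectors satisfy p dot A = 0, so the transversality condition selects exactly the subspace of helicities s and -s for s=1.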
As a consequence of our discussion, photons (m=0, s=1) and gravitons (m=0, s=2) cannot be given natural probabilities for being in any given bounded region of space. Chiral spin 1/2 particles also do not have a position operator and hence have no such probabilities, by the same argument, applied to the connected Poincare group. (Note that only frequencies, intensities and S-matrix elements are measured; these don't need a well-defined position concept but only a well-defined momentum concept, from which frequencies can be found via omega=p_0/hbar - since c=1 in the present setting - and directions via n = p/p_0.) However, assuming there are scalar massless Higgs particles (s=0), one could combine such a Higgs, a photon, and a graviton into a single reducible representation on L^2(M_0)^5, using the above construction. By our derivation, one can find position eigenstates which are superpositions of Higgs, photon, and graviton. Thus to be able to regard photons and gravitons as particles with a proper probability interpretation, one must consider Higgs, photons, and gravitons as aspects of the same localizable particle, which we might call a graphoton. (Without gravity, a phiggs particle would also do.) If the concept of an observable is not tied to that of a Hermitian operator but rather to that of a POVM (positive operator-valued measure), there is more flexibility, and covariant POVMs for position measurements can be meaningfully defined, even for photons. See, e.g., A. Peres and D.R. Terno, Quantum Information and Relativity Theory, Rev. Mod. Phys. 76 (2004), 93. [see, in particular, (52)] K. Kraus, Position observable of the photon, in: The Uncertainty Principle and Foundations of Quantum Mechanics, Eds. W. C. Price and S. S. Chissick, John Wiley & Sons, New York, pp. 293-320, 1976. M. Toller, Localization of events in space-time, Phys. Rev. A 59, 960 (1999). P. Busch, M. Grabowski, P. J.
Lahti, Operational Quantum Physics, Springer-Verlag, Berlin Heidelberg 1995, pp.92-94. Note that a POVM describes the statistics of a particular measurement process rather than some underlying reality. This is reflected in the fact that there are many nonequivalent possible definitions of POVMs, pertaining to the possible different ways to get a measured position. Therefore, the concept of a photon position is necessarily subjective, since it depends on the POVM used, hence on the way the measurement is performed. It does not describe something objective. The POVM does not allow one to talk about the position of a photon - which could exist only if the corresponding operator existed -, but only about the measured position: The photon is somewhere near the range of values established by the measurement, without any more definite statement being possible. On the other hand, for observables corresponding to Hermitian operators, there are states in which a definite statement is (at least theoretically) possible that the observable has a value in a given range. Papers related to position operators: M.H.L. Pryce, Commuting Co-ordinates in the new field theory, Proc. Roy. Soc. London Ser. A 150 (1935), 166-172. (first construction of position operators in the massive case) B. Bakamjian and L.H. Thomas, Relativistic Particle Dynamics. II, Phys. Rev. 92 (1953), 1300-1310. (first construction of massive representations along the above lines) L.L. Foldy, Synthesis of Covariant Particle Equations, Physical Review 102 (1956), 568-581. (nice and readable version of the Bakamjian-Thomas construction for massive representations of the Poincare group) R. Acharya and E. C. G. Sudarshan, ''Front'' Description in Relativistic Quantum Mechanics, J. Math. Phys. 1 (1960), 532-536. (a ''most local'' description of the photon by wave fronts) I.
Bialynicki-Birula, Photon wave function, http://arxiv.org/abs/quant-ph/0508202 (A 53 page recent review article, covering various possibilities to define photon wave functions without a position operator acting on them. The best is (3.5), with a nonstandard inner product (5.8). What is left of the probability interpretation is (5.28) and its subsequent discussion.) See also the entry ''Localization and position operators'' in this FAQ. There are a few papers by M. Hawton, e.g. http://arxiv.org/abs/quant-ph/0101011 http://arxiv.org/abs/0711.0112v1 on a nonstandard position operator which does not transform like a 3-vector. This is unphysical since it does not give orientation independent probabilities for observing a photon in a given region of space. Claims to the contrary in http://lanl.arxiv.org/pdf/0804.3773v2, supposedly constructing a Lorentz invariant photon number density, are erroneous; see http://groups.google.at/group/sci.physics.research/browse_thread/thread/815435df4bf2ea93?hl=en# Other nonstandard position operators violating the conditions necessary for a probability interpretation were discussed earlier, starting with M.H.L. Pryce, The Mass-Centre in the Restricted Theory of Relativity and Its Connexion with the Quantum Theory of Elementary Particles, Proc. Roy. Soc. London, Ser. A, 195 (1948), 62-81. ---------------------------------------- S2h. Localization and position operators ---------------------------------------- Position operators are part of the toolkit of relativistic quantum mechanics. In a relativistic setting, one always has a representation of the Poincare algebra. From the generators of the Poincare algebra (namely the 4-momentum p, the angular momentum \J, and the boost generators \K) one can make up (in massive representations) a nonlinear expression for a 3-dimensional \x (the position operator) that together with the space part \p of the 4-momentum has canonical commutation rules and hence gives a Heisenberg algebra. 
(The backslash is a convenient ascii notation to indicate bold face letters, corresponding to 3-vectors.) The position operator so constructed is unique, once the time coordinate is fixed, and is usually called the Newton-Wigner position operator, although it appears already in earlier work of Pryce. Relevant applications are related to the names Foldy and Wouthuysen (for their transform of the Dirac equation, widely used in relativistic quantum chemistry) and Bakamjian and Thomas (for their relativistic multi-particle theories); both groups rediscovered the Newton-Wigner results independently, not being aware of their work. That the time coordinate has to be fixed means that the position operator is observer-dependent. Each observer splits space-time into its personal time (in direction of its total 4-momentum) and personal 3-space (orthogonal to it), and the position operator relates to this 3-space. By a Lorentz transformation, one can transform the 4-momentum to the vector (E_obs 0 0 0), which makes time the 0-component. Most papers on the subject work in the latter setting. For massless representations of spin >1/2, the construction breaks down. This is related to the fact that massless particles with spin >1/2 don't have modes of all helicities allowed by the spin (e.g., photons have spin 1 but no longitudinal modes), which makes them always spread out, and hence not completely localizable. For details, see the FAQ entry ''Particle positions and the position operator''. Here are a few references: J.P. Costella and B.H.J. McKellar, The Foldy-Wouthuysen transformation, arXiv:hep-ph/9503416 * This paper discusses the physical relevance of the Newton-Wigner representation, and its relation to the Foldy-Wouthuysen transformation T. D. Newton, E. P. Wigner, Localized States for Elementary Systems, Rev. Mod. Phys. 21 (1949) 400-406 * The original paper on localization L. L. Foldy and S. A.
Wouthuysen, On the Dirac Theory of Spin 1/2 Particles and Its Non-Relativistic Limit, Phys. Rev. 78 (1950), 29-36. * On the transform of the Dirac equation now carrying the authors' names B. Bakamjian and L. H. Thomas Relativistic Particle Dynamics. II Phys. Rev. 92 (1953), 1300-1310. and related papers in Phys. Rev. 85 (1952), 868-872. Phys. Rev. 121 (1961), 1849-1851. * First constructive papers on relativistic multiparticle dynamics, based on a 3D position operator L. L. Foldy, Synthesis of Covariant Particle Equations, Phys. Rev. 102 (1956), 568-581 * A lucid exposition of Poincare representations which start with a 3D position operator, and a discussion of electron localization. Before eq. (189), he notes that an observer-independent localization of a Dirac electron (which generally is considered to be a pointlike particle since it can be exactly localized in a given frame) necessarily leaves a fuzziness of the order of the Compton wavelength of the particle. (This is also related to the so-called Zitterbewegung, see, e.g., the discussion in Chapter 7 of Paul Strange's "Relativistic Quantum Mechanics".) A. S. Wightman, On the Localizability of Quantum Mechanical Systems, Rev. Mod. Phys. 34 (1962) 845-872 * A group theoretic view in terms of systems of imprimitivity T. O. Philips, Lorentz invariant localized states, Phys. Rev. 136 (1964), B893-B896. * A covariant coherent state alternative which does not require singling out a time coordinate V. S. Varadarajan, Geometry of Quantum Theory (second edition), Springer, 1985 * A book discussing some of this stuff L. Mandel and E. Wolf, Optical Coherence and Quantum Optics, Cambridge University Press, 1995. * The bible on quantum optics, a thick but very useful book. Relevant here since it contains a good discussion of the localizability of photons (which can be done only approximately, in view of the above) from a reasonably practical point of view. G.N.
Fleming, Reeh-Schlieder meets Newton-Wigner http://philsci-archive.pitt.edu/archive/00000649/ * This paper gives some relations to quantum field theory ------------------------------------------------------------ S2i. Position operators in relativistic quantum field theory ------------------------------------------------------------ In relativistic quantum field theory in its usually given form, position is promoted to the same status as time, and hence becomes a parameter in the quantum field, while in quantum mechanics it is an operator vector. This poses the question of whether there is a position operator in relativistic quantum field theory. Many people think that there is none. But even though there is a parameter called x and referred to as 4-dimensional position, there is also a vector defining a 3-dimensional position operator, provided the relativistic system under consideration is not massless. Indeed, any relativistic theory possesses the Poincare group as a symmetry group, whose infinitesimal generators satisfy the standard commutation rules of the Poincare algebra. But given these, the standard construction by Newton and Wigner gives (in each Lorentz frame) a 3-dimensional position operator with commuting components, and the associated conjugate momentum operators. (See Section S2g ''Particle positions and the position operator'' of this FAQ.) These play exactly the same role as the position and momentum operators in nonrelativistic quantum mechanics. ------------------------------------------ S2j. Coherent states of light as ensembles ------------------------------------------ Let us look in some detail at the setting of a weak laser switched on at time t_0 and switched off again at time t_1. The time T:=t_1-t_0 that the laser is switched on is a variable that we can choose at will. Conventionally one models the light produced by a laser by coherent states.
If one tests the photon content at the end of the beam by a photodetector, one measures a series of clicks indicating (according to tradition) the presence of single photons. Each click is conventionally regarded as the measurement of a single photon; hence one measures an ensemble of photons. Without this interpretation, much of the talk about photons in quantum optics would not make sense. Technically, and completely precisely, one has an ensemble of photons in an indefinite photon number state. (Even a superposition of states describes an ensemble, in the conventional interpretation.) In a weak coherent state, the multiparticle content is negligible; one has essentially a superposition of the vacuum and the single particle state. Conventionally (as for all somewhat rare events), the vacuum part is ignored - one just restricts attention to the times where a particle is present. This leaves a single particle state. Thus, at least for weak coherent states, it is a good approximation to say that a coherent state of definite frequency is an ensemble of single-particle systems. More formally, in the usual abbreviated form, a weak coherent state of a stationary monochromatic beam has the form |psi> = (1-eps)|0> + eps|1> + O(eps^2), (*) with eps<<1, and <N> = <psi|N|psi> = eps^2 + O(eps^3) is not a mean photon number, but a mean rate - the mean intensity. More precisely, each coherent state has a mode A=A(p); the modes are in 1-1 correspondence with creation operators a^*(A). They create, in field theory language, one photon in this mode. So far, these photons are only constructs on paper, used to be able to write down multiparticle states, and have not yet an observable meaning. An N-particle state of mode A is defined recursively from the vacuum state by |1,A> := a^*(A)|vac>, |N,A> := a^*(A)|N-1,A> for N>0, and coherent states with mode A have the form |z,A>> := const* sum_N z^N/sqrt{N!} |N,A> with a complex amplitude z, and satisfy a(A)|z,A>> = z|z,A>>.
The mean photon number associated with the coherent state is Nbar := <N> = <a^*(A)a(A)> = <<z,A|a^*(A)a(A)|z,A>> = <<z,A|z^*z|z,A>> = z^*z <<z,A|z,A>> = z^*z, hence Nbar = |z|^2, independent of the time T. The events are the clicks, and there is exactly one click per event in a weak signal (for strong signals, one cannot separate the events). But the events happen randomly in time, with a rate proportional to eps^2. It is conventional to regard each click as evidence for the presence of a single photon - this more or less defines the experimental notion of a photon. (See also the discussion in the section ''What is a photon?'' of this FAQ.) Note that two photons arriving at different times cannot be considered as being part of an N-particle state with N>1, since states are considered at a fixed time! Also, the fact that the weak coherent state has a negligible contribution of doubly excited states means that N-particle states with N>1 are here completely irrelevant. Thus one has an ensemble of single photons. Clearly, the number of observable photons (in the sense of detector clicks) is proportional to T. This shows that the formal photon number operator in Fock space, N = a^*(A)a(A), has nothing to do with the photon number as defined by the number of clicks; instead its expectation is proportional to the mean rate of clicks per unit time. Thus (*) describes an ensemble of O(T*eps^2) single photons, where T is the duration of the experiment. In particular, plane monochromatic light in the form of a coherent state (three mathematical idealizations involved here) is an endless stream of infinitely many photons passing with the speed of light through a particular position on the beam. The rate of emission of photons is proportional to the intensity of the incident beam. But the fact that the model is an approximation only and that for real preparations, observations are bounded in space and time does not change the results of this analysis.
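The two algebraic facts used above, a(A)|z,A>> = z|z,A>> and Nbar = |z|^2, can be verified numerically in a truncated single-mode Fock space. This is only a sketch: it assumes the standard normalized number states (so that a|n> = sqrt(n)|n-1>), and the truncation dimension dim and the amplitude z are hypothetical values chosen for illustration.

```python
import math

import numpy as np

dim = 30                                        # Fock-space truncation (assumption)

# Annihilation operator in the number basis: a|n> = sqrt(n) |n-1>.
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)
number_op = a.conj().T @ a                      # N = a^* a

z = 0.1 + 0.05j                                 # hypothetical small complex amplitude

# Coefficients z^n / sqrt(n!) of the coherent state, then normalization.
coeffs = np.array([z**n / math.sqrt(math.factorial(n)) for n in range(dim)])
state = coeffs / np.linalg.norm(coeffs)

nbar = np.vdot(state, number_op @ state).real   # expectation of N
residual = np.linalg.norm(a @ state - z * state)  # deviation from a|z> = z|z>
```

The truncation error is negligible here because the coefficients decay like |z|^n/sqrt(n!); for large |z| a larger dim would be needed. Note that nothing in this computation refers to the duration T, in line with the remark that Nbar is independent of T.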
On the other hand, it is clear that a coherent state is not a 1-photon state but a state with an indefinite number of photons (i.e., not an eigenstate of the number operator). Thus there seems to be a conflict in terminology - weak laser light is described by a coherent state without definite number contents, but it behaves experimentally as an ensemble of single photons.

This shows that the concept of a photon is somewhat ambiguous. Different people mean different and often quite vague things by ''photon'', if they bother to spell out the meaning in some detail (which is usually not done). This can be seen from the diverging explanations given in a recent special issue on this topic:
   The Nature of Light: What Is a Photon?
   Optics and Photonics News, October 2003
   http://www.osa-opn.org/Content/ViewFile.aspx?Id=3185
which presents five mutually incompatible views,
   * Light reconsidered (Arthur Zajonc)
   * What is a photon? (Rodney Loudon)
   * What is a photon? (David Finkelstein)
   * The concept of the photon - revisited (Ashok Muthukrishnan, Marlan O. Scully, and M. Suhail Zubairy)
   * A photon viewed from Wigner phase space (Holger Mack and Wolfgang P. Schleich)

In QED, a ''one-photon state'' is a well-defined object, but ''one photon'' in an experiment is not (unless one identifies it with a detector click - which leaves unsaid what an undetected photon would be). The relation between the two is quite indirect, and there is no agreement in the literature on the precise relation.

My own views (not mainstream, but consistent with experiment) are:
1. that clicks have nothing at all to do with photons, they are just a stochastic measure of intensity, and arise also if the incident field is modelled completely classically;
2.
that what is typically called a photon is not an arbitrary single particle state of the electromagnetic field (in particular, never an approximately plane wave) but a state of the electromagnetic field that at each time is localized in space, whose energy content is hbar*omega. Otherwise, the idea of producing photons on demand makes no sense.
3. It is the field of the incident beam that counts; the talk about photons in the incoming beam is not very meaningful and only blurs the picture; the right language is that of field theory.

Indeed, a theoretical model of a photo-detector excited by an external classical monochromatic e/m field contains no photons, but in this model the detector responds by clicking randomly according to Poisson statistics; see Chapter 9 of the book
   L. Mandel and E. Wolf, Optical Coherence and Quantum Optics, Cambridge University Press, 1995.
Thus a precise meaning of ''photon'' is not needed to defend statement 1.

No matter which view one takes with regard to statement 1., the question is how one relates a 1-photon state to what one actually prepares in a beam of light. What does it mean in experimental terms to have prepared _one_ photon in this state? Reading the details of preparation schemes for photons on demand as discussed (with references to the original literature) in
   http://www.mat.univie.ac.at/~neum/ms/lightslides.pdf
   http://www.mat.univie.ac.at/~neum/ms/optslides.pdf
one finds that no clear answer can be given to this question, but that the evidence points to statement 2. of my view presented above.

In this view, the difference between the preparation of a coherent state and that of a single photon is that a weak coherent state generates an infinitely long random sequence of Poisson-distributed clicks, while a single photon (in the above sense of a space-localized field) generates (in an ideal detector) a single click only.
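The Poisson click statistics of a detector in a stationary field can be illustrated by a small simulation. This is only a sketch under stated assumptions: the clicks are modelled as a homogeneous Poisson process, the click rate (standing in for the beam intensity times a detector efficiency) is an arbitrary illustrative number, and the function name is hypothetical. No photon concept enters the model; only a rate does.

```python
import numpy as np

def simulate_clicks(rate, duration, rng):
    """Sorted click times of a homogeneous Poisson process on [0, duration]."""
    n_clicks = rng.poisson(rate * duration)
    return np.sort(rng.uniform(0.0, duration, size=n_clicks))

rng = np.random.default_rng(0)
rate = 50.0   # assumed mean click rate per unit time (illustrative)

# The number of clicks grows in proportion to the observation time T.
counts = {T: len(simulate_clicks(rate, T, rng)) for T in (10.0, 100.0, 1000.0)}
for T, n in counts.items():
    print(T, n, n / T)
```

The ratio n/T fluctuates around the assumed rate, with relative fluctuations shrinking like 1/sqrt(rate*T), so the count itself is proportional to the duration of the experiment.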
The practice seems to be that one silently ignores the vacuum contribution in (*) and obtains after rescaling to a normalized state a state
   psi' = |1> + O(eps)   (*')
which, with perfect right, can be considered to be an approximate 1-photon state. Indeed, most photon states produced in the laboratory are superpositions with the vacuum, and still people speak of photons.

This also holds for other systems than simple laser light. For example, entanglement studies are typically made with squeezed states, which differ from coherent states only in that they have instead of (*) a representation
   psi = (1-eps)|0> + eps|2> + O(eps^2),   (**)
and everyone refers to (**) as an ensemble of 2-photon states. Indeed, parametric down conversion is well-known to produce an ensemble of 2-photon states, but if one looks closer at the models one finds that they actually produce states of the form (**) that produce endless streams of photon pairs.

While photons on demand are based on exciting single atoms, the only way of reliably creating single photons was for a long time to use a source in a state of the form (**), where the photon pairs are entangled pairs of photons with different momentum vectors (hence located on different beams). Then one observes photons (clicks) on the left beam with a detector, and knows from general principles that at the same time a photon is underway in the other beam. Thus one can know about the presence of single photons without having them observed yet.

This interpretation again explains away the vacuum part of the state in (**). One restricts attention to the 2-photon sector of (**) by ignoring the times where nothing but the vacuum part is observed, and focuses on the times when something - and then by the form of (**) the 2-photon part - is observed. This is the sense in which one interprets (**) as an ensemble of 2-photon states. Then one observes the part of the 2-photon system in one beam, to know when a photon is present in the other beam.
But of course, although this is the way one talks about the situation, in reality one still has the superposition with the vacuum, except that one chooses to ignore the times where nothing happens to get rid of the vacuum.

---------------------------------------------
S3a. What are 'bare' and 'dressed' particles?
---------------------------------------------

A bare electron is the formal entity discussed in textbooks when they do perturbative quantum electrodynamics. The intuitive picture generally given is that a bare electron is surrounded by a cloud of virtual photons and virtual electron-positron pairs to make up a physical, 'dressed' electron. Only the latter is real and observable. The former is a formal caricature of the latter, with paradoxical properties (infinite mass, etc.).

On a more substantial level, the observable electrons are produced from the bare electrons by a process called renormalization, which modifies the propagators by self-energy terms and the currents by form factors. As the name says, the latter define the 'form' of a particle. (In the above picture, it would correspond to the shape of the virtual cloud, though it is better to avoid giving the virtual particles too much meaning.)

The dressed object is the renormalized, physical object, described perturbatively as the bare object 'clothed' by the cloud of virtual particles. The dressed interaction is the 'screened' physical interaction between these dressed objects.

To draw an analogy in nonrelativistic quantum mechanics, think of nuclei as bare atoms, electrons as virtual particles, atoms as dressed nuclei and the residual interaction between atoms, computed in the Born-Oppenheimer approximation, as the dressed interaction. Thus, for Argon atoms, the dressed interaction is something close to a Lennard-Jones potential, while the bare interaction is Coulomb repulsion. This is the situation physicists had in mind when they invented the notions of bare and dressed particles.
Of course, it is only an analogy, and should not be taken very seriously. It just explains the intuition about the terminology used. (For the serious version of renormalization, see Chapter 8.) The electrons in QM are real, physical electrons that can be isolated. The reason is that they are good eigenstates of the Hamiltonian. On the other hand, virtual particles don't have this nice attribute since the relativistic Hamiltonian H from field theory contains creation and annihilation operators which mess things up. The bare particles correspond to 1-particle states in the Hilbert space (though that is not quite true since there is no good Hilbert space picture in conventional interacting QFT). Multiplying them with H introduces terms with other particle numbers, hence a bare particle can never be an eigenstate of H, and thus never be observable in the way a nonrelativistic particle is. The eigenstates of the relativistic Hamiltonian are, instead, complicated multibody states consisting of a superposition of states with any number of particles and antiparticles, just subject to the restriction that the total quantum numbers come out right. These are the dressed particles. For the computational side of dressing, see, e.g., nucl-th/0102037, or http://www.geocities.com/meopemuk/ ------------------------------------------------ S3b. How meaningful are single Feynman diagrams? ------------------------------------------------ The standard model is a theory defined in terms of a Lagrangian. To get computable output, Feynman graph techniques are used. But individual Feynman graphs are meaningless (often infinite); only the sum of all terms of a given order can be given - after a process called renormalization - a well-defined (finite) meaning. This is well-known; so no-one treats the Feynman graphs as real. What is taken as real is the final outcome of the calculations, which can be compared with measurements. -------------------------------------- S3c. 
How real are 'virtual particles'?
--------------------------------------

Virtual particles are used in perturbation theory with Feynman diagrams. (See the FAQ entry ''Why Feynman diagrams'' for an explanation of their meaning. They do _not_ describe processes in space and time, but certain multiple integrals...) Feynman diagrams change their nature depending on the way one does perturbation theory and what is resummed.

In their treatise on QED, Landau and Lifshitz discuss virtual particles in Section 79. They start at the outset with the remark that things depend on which kind of perturbation theory is used, and contrast 'virtual' explicitly with 'real'.

Virtual particles are called that in contrast to 'real particles' which are observable and hence real. Unlike the latter, virtual particles occurring in computations _must_ have disappeared from the formulas by the time the calculations lead to something that can be compared with experiment. Hence their 'reality', if there is any, is like the reality of characters in a dream. For example, just as we can fly in a dream, virtual particles can be faster than light (since they may have imaginary mass)...

The following is a more detailed discussion of the question how meaningful it is to ascribe some sort of reality to virtual particles.

All language is only an approximation to reality, which simply is. But to do science we need to classify the aspects of reality that appear to have more permanence, and consider them as real. Nevertheless, all concepts, including 'real', have a fuzziness about them, unless they are phrased in terms of rigorous mathematical models (in which case they don't apply to reality itself but only to a model of reality).
In the informal way I use the notion, 'real' in theoretical physics means a concept or object that
 - is independent of the computational scheme used to extract information from a theory,
 - has a reasonably well-defined and consistent formal basis,
 - does not give rise to misleading intuition.
This does not give a clear definition of real, of course. But it makes for example charge distributions, inputs and outputs of (theoretical models of) scattering experiments, and quarks something real, while making bare particles and virtual particles artifacts of perturbation theory.

Quarks must be considered real because one cannot dispense with them in any coherent explanation of high energy physics.

Virtual particles must not be considered real since they arise only in a particular approach to high energy physics - perturbation theory before renormalization - that does not even survive the modifications needed to remove the infinities. Moreover, the virtual particle content of a real state depends so much on the details of the computational scheme (canonical or light front quantization, standard or renormalization group enhanced perturbation theory, etc.) that calling virtual particles real would produce a very weird picture of reality.

Whenever we observe a system we make a number of idealizations that serve to identify the objects in reality with the mathematical concepts we are using to describe them. Then we calculate something, and at the end we retranslate it into reality. If our initial idealization was good enough and our theory is good enough, the final result will match reality well. Because of this idealization, 'real' real particles (moving in the universe) are slightly different from 'mathematical' real particles (figuring in equations).

Modern quantum electrodynamics and other field theories are based on the theory developed for modeling scattering events.
Scattering events take a very short time compared to the lifetime of the objects involved before and after the event. Therefore, we represent a prepared beam of particles hitting a target as a single particle hitting another single particle, and whenever this in fact happens, we observe end products, e.g. in a wire chamber. Strictly speaking (i.e., in a fuller model of reality), we'd have to use a multiparticle (statistical mechanics) setting, but this is never done since it does not give better information and the added complications are formidable. As long as we prepare the particles long (compared to the scattering time) before they scatter and observe them long enough afterwards, they behave essentially as in and out states, respectively. (They are not quite free, because of the electromagnetic self-field they generate, this gives rise to the infrared problem in quantum electrodynamics and can be corrected by using coherent states.) The preparation and detection of the particles is outside this model, since it would produce only minute corrections to the scattering event. But to treat it would require to increase the system to include source and detector, which makes the problem completely different. Therefore at the level appropriate to a scattering event, the 'real' real particles are modeled by 'mathematical' in/out states, which therefore are also called 'real'. On the other hand, 'mathematical' virtual particles have nothing to do with observations, hence have no counterpart in reality; therefore they are called 'virtual'. The figurative virtual objects in QFT are there only because of the well-known limitations of the foundations of QFT. In a nonperturbative setting they wouldn't occur at all. This can be seen by comparing with QM. One could also do nonrelativistic QM with virtual objects but no one does so (except sometimes in motivations for QFT), because it does not add value to a well-understood theory. 
Virtual particles are an artifact of perturbation theory that gives an intuitive (but if taken too far, misleading) interpretation for Feynman diagrams. More precisely, a virtual photon, say, is an internal photon line in one of the Feynman diagrams. But there is nothing real associated with it. Detectable photons are always real, 'dressed' photons.

Virtual particles, and the Feynman diagrams they appear in, are just a visual tool for keeping track of the different terms in a formal expansion of scattering amplitudes into multi-dimensional integrals involving multiple propagators - the momenta of the virtual particles represent the integration variables. They have no meaning at all outside these integrals. They get out of mathematical existence once one changes the formula for computing a scattering amplitude.

Therefore virtual particles are essentially analogous to virtual integers k obtained by computing
   log(1-x) = - sum_k x^k/k   (k=1,2,3,...)
by expansion into a Taylor series. Since we can compute the logarithm in many other ways, it is ridiculous to attach to k any intrinsic meaning. But ...

... in QFT, we have no good ways to compute scattering amplitudes without at least some form of expansion (unless we only use the lowest order of some approximation method), which makes virtual particles look a little more real. But the analogy to the Taylor series shows that it's best not to look at them that way. (For a very informal view of quantum electrodynamics in terms of clouds of virtual particles see
   http://groups.google.com/groups?selm=3EBBE37C.4D771C4B%40univie.ac.at
and the later mails in this thread.)

A sign of the irreality of virtual particles is the fact that when one does partial resummations of diagrams (which is essential for renormalization), many of the virtual particles disappear. A fully nonperturbative theory would sum everything, and no virtual particles would be present anymore.
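The Taylor series analogy can be made concrete in a few lines. The following sketch (with a hypothetical function name and arbitrarily chosen values of x and of the number of terms) computes log(1-x) once as a partial sum of -x^k/k over the 'virtual integers' k, and once by a library routine that never refers to any k. Both give the same number; the summation index is a feature of one computational scheme, not of the result.

```python
import math

def log1m_series(x, terms):
    """Partial sum -sum_{k=1}^{terms} x^k/k, converging to log(1-x) for |x|<1."""
    return -sum(x ** k / k for k in range(1, terms + 1))

x = 0.3                            # illustrative value with |x| < 1
series_value = log1m_series(x, 60) # scheme 1: expansion indexed by k
direct_value = math.log1p(-x)      # scheme 2: no expansion index visible

print(series_value, direct_value, abs(series_value - direct_value))
```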
Thus virtual particles are entirely a consequence of looking at QFT in a perturbative way rather than nonperturbatively.

In the standard covariant Feynman approach, energy (cp_0) and momentum (\p; the backslash indicates 'boldface') are conserved, and virtual particles are typically off-shell (i.e., they do not satisfy the equation p^2 = p_0^2 - \p^2 = m^2 for physical particles). To see this, try to model a vertex in which an electron (mass m_e) absorbs a photon (mass 0). One cannot keep the incoming electron and photon and the outgoing electron on-shell (satisfying p^2 = m^2) without violating the energy-momentum balance. Indeed, if p and k are the on-shell momenta of the incoming electron and photon, then (p+k)^2 = m_e^2 + 2 p.k, and p.k = m_e k_0 > 0 in the rest frame of the incoming electron, so that the outgoing momentum p+k cannot satisfy (p+k)^2 = m_e^2.

However, many physicists work in light front quantization. There one keeps all particles on-shell, and instead has energy and momentum nonconservation (removed formally by adding an additional 'spurion'). The effect of this is that the virtual particle structure of the theory is changed completely: For example, the physical vacuum and the bare vacuum now agree, while in the standard approach, the vacuum looks like a highly complicated medium made up from infinitely many bare particles.... But bare particles must still be dressed to become physical, though less heavily than in the traditional Feynman approach.

Another group of physicists calculates consequences of the standard model using quantization on a lattice. Here virtual particles are completely absent.

Clearly concepts such as virtual particles that depend so much on the method of quantization cannot be regarded as being real.

Of course, physicists would not talk of virtual particles if the concept had no relevance at all. One can argue with virtual particles to get an intuitive idea of 'dressing', and to gain in this way some understanding of phenomena such as the Casimir effect, Rabi oscillations, the Lamb shift, anomalous magnetic moments, etc.
From a nonperturbative point of view, these effects all show up as a consequence of renormalized, effective interactions between physical (dressed, on-shell) particles. See also earlier discussions on s.p.r. such as http://www.lns.cornell.edu/spr/2003-06/msg0051674.html also http://www.lns.cornell.edu/spr/1999-02/msg0014762.html and followups; maybe http://www.lns.cornell.edu/spr/2003-05/msg0051023.html is also of interest. [For a longwinded alternative view of virtual particles that I do _not_ share but rather find misleading, see http://www.desy.de/user/projects/Physics/Quantum/virtual_particles.html] ------------------------------------------------------- S3d. What is the meaning of 'on-shell' and 'off-shell'? ------------------------------------------------------- This applies only to relativistic particles. A particle of mass m is on-shell if its momentum p satisfies p^2 (= p_0^2-p_1^2-p_2^2-p_3^2) = m^2, and off-shell otherwise. The 'mass shell' is the manifold of momenta p with p^2=m^2. Observable (i.e., physical) particles are asymptotic states (scattering states) described (modulo unresolved mathematical difficulties) by free fields based on the dispersion relation p^2=m^2, and hence are necessarily on-shell. Off-shell particles only arise in intermediate perturbative calculations; they are necessarily 'virtual'. The situation is muddled by the fact that one has to distinguish (formal) bare mass and (physical) dressed mass; the above is valid only for the dressed mass. Moreover, the mass shell loses its meaning in external fields, where, instead, a so-called 'gap equation' appears. ---------------------------------------------- S3e. Virtual particles and Coulomb interaction ---------------------------------------------- Virtual objects have strange properties. For example, the Coulomb interaction between two electrons is mediated by virtual photons faster than the speed of light, with imaginary masses. 
(This is often made palatable by invoking a time-energy uncertainty relation, which would allow particles to go off-shell. But there is no time operator in QFT, so the analogy to Heisenberg's uncertainty relation for position and momentum is highly dubious.)

Strictly speaking, the Coulomb interaction is simply the Fourier transform of the photon propagator 1/q^2, followed by a nonrelativistic approximation. (In the static limit q_0=0 one has q^2 = -\q^2, and the 3-dimensional Fourier transform of 1/\q^2 is 1/(4 pi r), which is the Coulomb potential up to the charge factors.) It has nothing at all to do with virtual particle exchanges --- except if one does perturbation theory. But then there is no surprise that it must influence already the tree level.

By a hand waving argument (equate the Born approximations) this gives the nonrelativistic correspondence. But to get the Coulomb interaction as part of the Schroedinger equation, one needs to sum all ladder diagrams with 0,1,2,3,...,n,... exchanged photons arranged in form of a ladder. Then one needs to approximate the resulting Bethe-Salpeter equation. These are nonperturbative techniques. (The computations are still done at few loops only, which means that questions of convergence never enter.)

Virtual photons mediating the Coulomb repulsion between electrons have spacelike momenta and hence would proceed faster than light if there were any reality to them. But there cannot be; one would need infinitely many of them, and infinitely many virtual electron-positron pairs (and then superpositions of any numbers of these) to match exactly a real, dressed object or interaction.

-----------------------------------------------------------
S3f. Are virtual particles and decaying particles the same?
-----------------------------------------------------------

Decaying particles and resonances are used synonymously in the literature; they are complementary views of the same unstable state. A very sharp resonance has a long lifetime relative to a scattering event, hence behaves like a particle in scattering.
It is regarded as a real object if it lives long enough that its trace in a wire chamber is detectable, or if its decay products are detectable at places significantly different from the place where it was created.

On the other hand, a very broad resonance has a very short lifetime and cannot be differentiated well from the scattering event producing it; so the idealization defining the scattering event is no longer valid, and one would not regard the resonance as a particle.

Of course, there is an intermediate grey regime where different people apply different judgment. This can be seen, e.g., in discussions concerning the tables of the Particle Data Group.

The only difference between a short-lived particle and a stable particle is the fact that the stable particle has a real rest mass, while the mass m of the resonance has a small imaginary part.

Note that states with complex masses can be handled well in a rigged Hilbert space (= Gelfand triple) formulation of quantum mechanics. Resonances appear as so-called Siegert (or Gamow) states. A good reference on resonances (not well covered in textbooks) is
   V.I. Kukulin et al., Theory of Resonances, Kluwer, Dordrecht 1989.
For rigged Hilbert spaces (treated in Appendix A of Kukulin), see also quant-ph/9805063 and for its functional analysis ramifications,
   K. Maurin, General Eigenfunction Expansions and Unitary Representations of Topological Groups, PWN Polish Sci. Publ., Warsaw 1968.

But a very short-lived particle is not the same as a virtual particle. Often it is a complicated, nearly bound state of other particles. On the other hand, virtual particles are essentially always elementary. (There are exceptions when deriving Bethe-Salpeter equations and the like for the approximate calculations of bound states and resonances, where one creates an effective theory in which the latter are treated as elementary.)

Even an unstable elementary particle can be distinguished from a virtual particle.
In perturbation theory, unstable elementary particles are modelled exactly like stable particles, namely as external lines in a Feynman diagram. Virtual particles in Feynman diagrams are exactly those parts of the diagram which are not given by external lines. In particular, what is real and what is virtual is not affected by a diagram rotation - this only affects what is input and what is output.

The difference can also be seen in the mathematical representation. In an effective theory where the resonance (e.g., the neutron or a meson) is regarded as an elementary object, the resonance appears in in/out states as a real particle, with complex on-shell momentum satisfying p^2=m^2, but in internal Feynman diagrams as a virtual particle with real mass, almost always off-shell, i.e., violating this equation.

There are also some unstable elementary particles like the weak gauge bosons. Usually, one observes a 4-fermion interaction and the gauge bosons are virtual. But at high energy = very short scales, one can in principle observe the gauge bosons and make them real. This means that they now appear as external lines in the corresponding perturbative calculations, which displays their nonvirtual nature.

In any case, from a mathematical point of view, one must choose the framework. Either one works in a Hilbert space; then masses are real and there are no unstable particles (since these 'are' poles on the so-called 'unphysical' sheet); in this case, there are no asymptotic gauge bosons and all are therefore virtual. Or one works in a rigged Hilbert space and deforms the inner product; this makes part of the 'unphysical' sheet visible; then the gauge bosons have complex masses and there exist unstable particles corresponding to in/out gauge bosons which are real. The modeling framework therefore decides which language is appropriate.

------------------------------------------
S4a. How do atoms and molecules look like?
------------------------------------------

Today, images of single atoms and molecules can be routinely produced.
   M. Herz, F.J. Giessibl and J. Mannhart
   Probing the shape of atoms in real space
   Phys. Rev. B 68, 045301 (2003)
   http://prola.aps.org/pdf/PRB/v68/i4/e045301
write in the introduction: ''quantum mechanics specifies the probability of finding an electron at position x relative to the nucleus. This probability is determined by |psi(x)|^2, where psi(x) is the wave function of the electron given by Schroedinger's equation. The product of -e and |psi(x)|^2 is usually interpreted as charge density, because the electrons in an atom move so fast that the forces they exert on other charges are essentially equal to the forces caused by a static charge distribution -e|psi(x)|^2.''

One of the authors, Jochen Mannhart, is one of the 10 winners of the Leibniz prize 2008,
   http://www.dfg.de/aktuelles_presse/preise/leibniz_preis/2008/
among others for the achievement that, for the first time, he made pictures of atoms with subatomic resolution possible. The Leibniz prize is the highest German academic prize, endowed with a research grant of up to 2.5 Million Euro for each winner, awarded each year to a few excellent younger scientists from all sciences.

The orbitals one can look at in physics and chemistry books are the pictures of the squared absolute values of basis functions used for representing single electron wave functions. The actual shape of the wave function of each electron is some linear combination of such basis functions. These are calculated (in the simplest realistic approximation) by Hartree-Fock calculations.

The atom shape is the shape of all electrons together, forming in the Hartree-Fock approximation a Slater determinant formed from the single-particle wave functions, and in general a linear combination of such Slater determinants. These live in a multidimensional space with 3n dimensions for an atom with n electrons.
The shape one can measure is actually a 3-dimensional charge density rho(x) (x in R^3) formed by integrating the square of the absolute value of the 3n-dimensional wave function psi over 3n-3 dimensions. More precisely, it is defined (nonrelativistically) such that (apart from a constant factor and the charge contribution of the nucleus)
   integral dx rho(x) f(x) = psi^* O_1(f) psi   (1)
for all nice 3-dimensional functions f(x) of the space coordinate vector x, where
   O_1(f) = integral dx f(x) a^*(x) a(x)
is the 1-particle operator corresponding to f. Here a^* and a denote creation and annihilation operators.

Since rho(x) decays quickly as x moves away from the atom center, the atom looks like a charge cloud with slightly fuzzy boundary. For isolated atoms in the absence of external fields, rho is typically spherically symmetric, giving symmetric shapes. (In case of particles of nonzero spin, this assumes that we are in a thermal setting where the spin directions average out. In this case, we have instead of (1) the formula
   integral dx rho(x) f(x) = tr O_1(f) rhohat,
where rhohat is the density matrix of the mixed state.)

For molecules, rho is in fact also a function of the coordinates of all nuclei involved, and there is no longer any reason to have more symmetry than the symmetry of the configuration of nuclei, which is very little and often none.

The shape of molecules is therefore mainly determined by the geometry of the positions of the nuclei. In equilibrium, these arrange themselves such that the potential energy, i.e., the smallest eigenvalue of the Hamiltonian operator for the electrons, is minimal among all positions (or at least a local minimum from which a deeper lying state is very difficult to reach).
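The construction of rho(x) from a many-particle wave function can be illustrated with a toy model. The following is a sketch under strong simplifying assumptions that are not part of the text above: one space dimension instead of three, n=2 spinless fermions, harmonic oscillator orbitals in units hbar=m=omega=1, and a single Slater determinant. It integrates |psi|^2 over all coordinates but one and checks that the resulting density integrates to the particle number and equals the sum of the squared orbitals.

```python
import numpy as np

# Grid for one space coordinate (1D toy model; a real atom needs x in R^3).
x = np.linspace(-8.0, 8.0, 401)
dx = x[1] - x[0]

# Two orthonormal single-particle orbitals (assumed: oscillator eigenfunctions).
phi0 = np.pi ** -0.25 * np.exp(-x ** 2 / 2)
phi1 = np.sqrt(2.0) * x * phi0

# Slater determinant psi(x1, x2), stored as psi[i, j] = psi(x_i, x_j).
psi = (np.outer(phi0, phi1) - np.outer(phi1, phi0)) / np.sqrt(2.0)

# One-particle density: n times the integral of |psi|^2 over the other coordinate.
n_particles = 2
rho = n_particles * np.sum(np.abs(psi) ** 2, axis=1) * dx

total = np.sum(rho) * dx                                # should be close to n
max_dev = np.max(np.abs(rho - (phi0 ** 2 + phi1 ** 2))) # should be close to 0

print(total, max_dev)
```

For a single Slater determinant the density is just the sum of the orbital densities; for a linear combination of determinants this no longer holds, but the same integration over all but one coordinate still defines rho(x).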
The charge density of molecules can be identified by means of X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy; however, for complex molecules, doing this reliably from the available indirect information is a highly nontrivial art. A few years ago, I wrote a survey of molecular modeling of proteins, the largest molecules in nature (apart from crystals, which are essentially molecules of macroscopic size): A. Neumaier, Molecular modeling of proteins and mathematical prediction of protein structure, SIAM Review 39 (1997), 407-460. http://www.mat.univie.ac.at/~neum/papers/physpapers.html#protein Viewing atoms or molecules with a scanning tunneling microscope (STM) or an atomic force microscope (AFM) http://en.wikipedia.org/wiki/Atomic_force_microscope amounts to scanning the response of the 3-dimensional charge density to (or, more precisely, the current or force induced by it on) the scanning device, from which a computer generates a picture. http://www.physics.purdue.edu/nanophys/images.html http://www.almaden.ibm.com/vis/stm/gallery.html Thus rho(x) is actually observable, with a resolution of currently up to 0.6 Angstrom = 0.6 10^{-10}m. http://www.hypography.com/article.cfm?id=34288 For a discussion of the charge density of molecules and the resulting operative interpretation of atoms in molecules see, e.g., the encyclopedic article R.F.W. Bader Atoms in Molecules http://59.77.33.35/non-cgi/usrd8wqiernb/5/20/Atoms20in20Molec_1193580192.pdf or Bader's web site http://www.chemistry.mcmaster.ca/faculty/bader/aim/aim_0.html On the other hand, whether atomic or molecular substructures such as orbitals are observable is controversial. See, e.g., J.M. Zuo et al., Direct Observation of d-orbital holes and Cu-Cu bonding in Cu_2O, Nature 401 (1999), 49-52. 
   http://www.nature.com/nature/journal/v401/n6748/pdf/401049a0.pdf
   http://www.nature.com/nature/journal/v401/n6748/pdf/401021a0.pdf
   http://www.public.asu.edu/~jspence/NewsViews.pdf
   http://web.missouri.edu/~glaserr/412f99/synopsis_nv1.pdf
   http://philsci-archive.pitt.edu/archive/00000228/00/Orbital_Observed.pdf
   https://jchemed.chem.wisc.edu/HS/Journal/Issues/2001/Jul/abs877_2.html
   http://philsci-archive.pitt.edu/archive/00001077/00/Jenkins.doc
for discussions in 1999-2001, a discussion presenting a positive majority vote among 22 textbooks:
   http://wwwcsi.unian.it/educa/inglese/halfacen.html
and from 2007:
   http://jjap.ipap.jp/link?JJAP/46/L161/
Also, see the nice pictures in
   M. Herz, F.J. Giessibl and J. Mannhart
   Probing the shape of atoms in real space
   Phys. Rev. B 68, 045301 (2003)
   http://prola.aps.org/pdf/PRB/v68/i4/e045301

Apparently, it is a matter of terminology. Those who use the term orbital to refer to a charge distribution corresponding to a particular electronic state (and the ball-, dumbbell-, or ring-shaped pictures of orbitals in textbooks show just that) find orbitals observable, while purists who restrict the term orbital to particular single-electron wave functions find them unobservable.

Note that Scerri, who in
   http://philsci-archive.pitt.edu/archive/00000228/00/Orbital_Observed.pdf
defends the unobservability of orbitals, writes explicitly: ''What can be observed, and frequently is observed in experiments, is electron density. In fact, the observation of electron density is a major field of research in which several monographs and review articles have been written.'' and then cites two books and a review article. A more recent review article of some aspects is
   J.M. Zuo
   Measurements of electron densities in solids: a real-space view of electronic structure and bonding in inorganic crystals
   Rep. Progr. Phys. 67 (2004), 2053-2103.

--------------------------------------------------
S4b. Why are observable densities state-dependent?
--------------------------------------------------

In the preceding, the mass and charge density of an n-particle system (or of a single particle) depends on its quantum state. This is sometimes regarded as a reason for denying the 'reality' of the mass and charge density. However, such reasoning is misguided. Indeed, the phenomenon is already present in classical mechanics.

That the mass and charge density depends on the state is no more surprising than that the trajectory of a classical particle depends on its classical state (its position and momentum), or that the density of a cloud in the sky depends on its classical state (the position and momentum of all its particles, or, in the customary fluid mechanics approximation, its mass density field and its velocity field). Of course it has to, in order to match a particular real life situation.

What seems strange at first sight is that the above applies already to a single, indivisible particle. But this is really strange only if one assumes that the particle is pointlike - which we know is the case only for unphysical, bare particles, but not for the physical, renormalized ones. (See the entry ''Are electrons pointlike/structureless?'' elsewhere in this FAQ.) Once one realizes that physical particles are extended (although they are indivisible), there is enough room to accommodate the internal structure described by densities.

Thus the only quantum paradox that remains is that particles with nontrivial internal structure (and shape) can nevertheless be indivisible, a fact coming from the representation theory of the fundamental symmetry group of our universe: Indivisibility of an object just means that this object is described by an irreducible representation, which cannot be decomposed further without violating a fundamental symmetry.

-------------------------------------------
S4c. Are electrons pointlike/structureless?
-------------------------------------------

Both electrons and neutrinos are considered to be pointlike as bare particles, because of the way they appear in the standard model. But physical, relativistic particles are not pointlike.

A pointlike electron would be described exactly by the 1-particle Dirac equation, which has a degenerate spectrum. But the real electron is described by a modified Dirac equation, resulting in an anomalous magnetic moment and a nonzero Lamb shift resolving the degeneracy of the spectrum. Both are measurable to high accuracy. The relations between form factors for spin 1/2 particles and terms in a modified Dirac equation describing the covariant dynamics of a particle deviating from a point particle are given in
   L. L. Foldy
   The Electromagnetic Properties of Dirac Particles
   Phys. Rev. 87 (1952), 688 - 693.

An intuitive argument for the lack of pointlikeness is the fact that localizing such a particle to a region significantly smaller than its Compton wavelength would need energies larger than that needed to create particle-antiparticle pairs, which changes the nature of the system. (See also the entries about localization in this FAQ, and Foldy's papers quoted there.)

On a more formal, quantitative level, the physical, dressed particles have nontrivial form factors, due to the renormalization necessary to give finite results in QFT. The form factor measures the deviation from the behavior of an ideal point particle, i.e., a particle obeying exactly the Dirac equation. The form factor can be measured indirectly, through the anomalous magnetic moment and the Lamb shift. (A point particle has no anomalous magnetic moment and no Lamb shift since it satisfies the Dirac equation exactly.)

Nontrivial form factors give rise to a positive charge radius. In his book
   S. Weinberg, The quantum theory of fields, Vol. I, Cambridge University Press, 1995,
Weinberg defines and explicitly computes in (11.3.33) a formula for the 'charge radius' of a physical electron.
But his formula is not fully satisfying since it is not fully renormalized (infrared divergence: the expression contains a fictitious photon mass, and diverges if this goes to zero).

For electron form factors in light atoms, see hep-ph/0002158 = Physics Reports 342, 63-126 (2001): Equation (28) uses a binding energy dependent cutoff, which makes the electron charge radius depend on its surroundings.

Of course, other particles also have form factors and associated charge radii. For proton and neutron form factors, see hep-ph/0204239 and hep-ph/030305.

Neutrons have a negative mean squared charge radius. This looks strange but is not, since the charge density that serves as the weight in the mean is not positive everywhere; but it means that a classical interpretation of the charge radius of neutrons is dubious. In the introduction of
   S. Kopecky et al.,
   Phys. Rev. C 56, 2229-2237 (1997)
one can read: ''The charge radius of the neutron or the mean squared charge radius <r_n^2> is described by the volume integral over the neutron, integral rho(r) r^2 dr, where r is the distance to the center of the neutron and rho(r) is the charge density. Positive as well as negative values of rho(r) will occur coming from the distributions of valence quarks and the negative pi-meson cloud outside. Since rho(r) is negative for larger r values, caused by the meson cloud, the r^2 dependence of the integral will lead to a negative value of <r_n^2>.''

The paper
   L.L. Foldy,
   Neutron-electron interaction,
   Rev. Mod. Phys. 30, 471-481 (1958)
discusses the extendedness of the electron in a phenomenological way.

On the numerical side, I only found values for the charge radius of the neutrinos, computed from the standard model to 1 loop order. The values are about 4-6 10^-14 cm for the three neutrino species. See (7.12) in Phys. Rev. D 62, 113012 (2000).

The abstract at
   http://adsabs.harvard.edu/cgi-bin/nph-bib_query?1992PhDT.......130L
of a 1982 thesis of Anzhi Lai gives an electron charge radius of ~ 10^{-16} cm. (But I haven't seen the thesis.)
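That a neutral object can have a negative mean squared charge radius, as mentioned above for the neutron, is easy to check on a toy model. The sketch below is an illustration only, not a model of the real neutron: the two Gaussian widths and the helper name gaussian_charge are assumptions chosen for convenience. A narrow positive core is superposed with a wider negative cloud of equal total charge, and the moments are computed by radial integration.

```python
import numpy as np

r = np.linspace(0.0, 20.0, 20001)  # radial grid, arbitrary length units
dr = r[1] - r[0]

def gaussian_charge(width):
    """Spherically symmetric Gaussian charge density with unit total charge."""
    return np.exp(-r**2 / (2.0 * width**2)) / (2.0 * np.pi * width**2) ** 1.5

# Assumed toy distribution: positive core (width 0.5), negative cloud (width 1.0).
rho = gaussian_charge(0.5) - gaussian_charge(1.0)

# Radial integrals over the volume element 4 pi r^2 dr.
total_charge = np.sum(4.0 * np.pi * r**2 * rho) * dr
mean_r2 = np.sum(4.0 * np.pi * r**4 * rho) * dr  # integral of rho(r) r^2 over space
```

Since a unit Gaussian of width w contributes 3 w^2 to the second moment, the result is 3 (0.25 - 1) = -2.25 in these units: the weight r^2 favors the negative outer cloud, although the total charge vanishes.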
The "form" of an elementary particle (considered as a free particle at rest) is described by its form factor, which is a well-defined physical function (though at present computable only in perturbation theory) describing how the (spin 0, 1/2, or 1) particle's response to an external classical electromagnetic field deviates from the Klein-Gordon, Dirac, or Maxwell equations, respectively. The form factor contains the complete state-independent information about a free particle, since it determines the (single-particle) Hamilton operator of the free particle and everything else can be computed from it. In Foldy's paper, the form factors are encoded in the infinite sum in (16). The sum is usually considered in the momentum domain; then one simply gets two k-dependent form factors, where k represents the 4-momentum transferred in the interaction. These form factors can be calculated in a good approximation perturbatively from QFT, see for example Peskin and Schroeder's book. An extensive discussion of form factors of Dirac particles and their relation to the radial density function is in D. R. Yennie, M. M. Levy and D. G. Ravenhall, Electromagnetic Structure of Nucleons, Rev. Mod. Phys. 29, 144-157 (1957). and R. G. Sachs High-Energy Behavior of Nucleon Electromagnetic Form Factors Phys. Rev. 126, 2256-2260 (1962) Yennie et al. write: ''Information about the internal structure of the individual nucleons is contained in the results of a variety of experiments performed in recent years. [...] The Lamb shift and the hyperfine splitting also give such information, [...] The charge-current density of the nucleon (proton or neutron) includes all of the effects of the internal structure. [...] 
The nucleon charge-current density must have the form [...] The functions F_1 and F_2 are relativistic generalizations of the form factors characteristic of finite extension occurring in other experiments, [...]''

However, the form factor contains no interaction- or state-dependent information at all, since the interaction-dependent information is coded in an external potential or a multiparticle formulation, and the state-dependent information is coded in the wave function or density matrix, which (at any given time) is independent of the Hamiltonian.

Also, the information contained in the form factor is only about the free particle in the rest system, defined by a pure state in which momentum and orbital angular momentum vanish identically. In an external potential, or in a state where momentum (or orbital angular momentum) doesn't vanish, the charge density (and the resulting charge radius) can differ arbitrarily much from the charge density (and charge radius) at rest.

For example, for a hydrogen electron in the ground state, the charge density is significant in a region of diameter about 10^-8 cm (a small multiple of the Bohr radius), while the charge radius at rest is probably (in view of the above partial results) << 10^-12 cm.

In all cases, the charge distribution is defined as the expectation of the charge density operator of the corresponding quantum field. For molecules, this charge distribution is the computational target of much of quantum chemistry, and defines the shape of a molecule.

The shape of a particle determined by the form factor therefore corresponds to the equilibrium shape a molecule takes in its rest frame in the absence of forces, i.e., in its ground state, while the state-dependent shape corresponds to the much less predictable shape of a molecule interacting with its environment.

-------------------------------------------
S4d. How much information is in a particle?
------------------------------------------- Knowing a particular electron intimately is infinitely precious. A pure state of an electron is defined by its wave function (up to a phase). Thus knowing all about an electron requires in the traditional interpretation to know all about this wave function - an infinite amount of information. The information humans are interested in is however always finite, since they can hardly remember even 20 decimal digits seen only once. And the amount of information humans are capable of retrieving by experiment is still limited, since each experiment has only a finite accuracy. Thus they simplify things to the point that all they want to know about an electron is its mass, charge and its state to a small number of decimal places. This is only a few bits. But if you want to tell someone else exactly where the electron is that you are referring to, you have an infinitely more difficult task. Of course, any human 'else' will not be patient enough to hear the whole (infinite) story but will be satisfied with a crude position and momentum estimate consistent with the uncertainty relation. But this is not the best possible statement about the electron, which would be telling its complete wave function. You can do it only if you force the electron into a prison where it has to behave in a dull (and hence completely describable) way, being restricted in its freedom to at most a few bits of change. This is indeed done when studying qubits for quantum information processing. For an N-state system, one needs N^2-1 independent pieces of information to reconstruct (by quantum tomography) the density matrix of a finite mixed quantum system, and a fortiori the wave function of a finite pure quantum system. Most natural systems, unlike those systems carefully prepared by modern technology, have infinitely many states, and therefore need an infinite amount of information for their reconstruction to full accuracy. 
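For the smallest case N=2 (a qubit), the count N^2-1 = 3 can be made concrete. The following sketch assumes ideal, noise-free expectation values of the three Pauli matrices; the function names and the example state are hypothetical and chosen only for illustration. It rebuilds the density matrix from these three real numbers via rho = (1 + sum_i r_i sigma_i)/2.

```python
import numpy as np

# The three Pauli matrices sigma_x, sigma_y, sigma_z.
PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def num_real_parameters(n):
    """Number of independent real parameters of an n x n density matrix."""
    return n * n - 1

def reconstruct_qubit(expectations):
    """Rebuild rho from the expectation values (r_x, r_y, r_z) of the Pauli matrices."""
    rho = 0.5 * np.eye(2, dtype=complex)
    for r_i, sigma in zip(expectations, PAULI):
        rho = rho + 0.5 * r_i * sigma
    return rho

# Assumed example of a mixed qubit state (Hermitian, trace 1, positive).
rho_true = np.array([[0.75, 0.25 - 0.1j], [0.25 + 0.1j, 0.25]])

# Idealized "measurements": exact expectation values Tr(rho sigma_i).
measured = [np.trace(rho_true @ sigma).real for sigma in PAULI]
rho_rec = reconstruct_qubit(measured)
```

In a real tomography experiment each expectation value is only estimated from finitely many runs, so the reconstruction is approximate, and extra care is needed to keep the estimate positive semidefinite.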
------------------------------------
S4e. Entropy and missing information
------------------------------------

[This continues the preceding entry.]

How is this notion of information related to information in terms of entropy? Informally, entropy is often equated with information, but this is not correct - entropy is _missing_ information!

More precisely, in the statistical interpretation, the state belongs not to a single particle but to an ensemble of particles. Entropy measures the amount of information missing for a complete probabilistic description of a system. Entropy is the mean number of binary questions that must be asked in an optimal decision strategy to determine the state of a particular realization given the state of the ensemble to which it belongs. See Appendix A of my paper
   A. Neumaier,
   On the foundations of thermodynamics,
   arXiv:0705.3790
   http://lanl.arxiv.org/abs/0705.3790

The formula for the entropy S found in every statistical mechanics textbook is, for a system in a mixed state described by the density matrix rho,

   S = <-kbar log rho>,

where <f> = Tr rho f and kbar is Boltzmann's constant. (I use the bar to be free to use k as an index.) In any representation where rho is diagonal,

   rho = sum_k p_k |k><k|,

we have

   S = - kbar sum_k p_k log p_k.

Since Tr rho = <1> = 1 and rho is positive semidefinite,

   sum_k p_k = 1,   all p_k >= 0.

Thus p_k can be consistently interpreted as the probability of the system to occupy state |k>. This probability interpretation depends on the orthonormal basis used to represent rho; which basis to use is a famous and not really solved problem in the foundations of quantum mechanics.

For a pure state psi, rho has rank 1, and the sum extends only over the single index k with |k> = psi. Thus in this case, p_k = 1 and S = - kbar 1 log 1 = 0, as it should be for a state of maximal information. The amount of missing information is zero.

For more along these lines, and in particular for a way to avoid the probabilistic issues indicated above, see Sections 6 and 12 and Appendix A of my paper A.
Neumaier, On the foundations of thermodynamics, arXiv:0705.3790 http://lanl.arxiv.org/abs/0705.3790

But how does the infinite amount of information in a pure state (wave function) square with the finiteness of entropy?

Specifying a mixed state _exactly_ already provides an infinite amount of information, since the density matrix rho must be specified to infinite precision. Defining the eigenstates that are of interest in measurement amounts to specifying a Hamiltonian operator H _exactly_, which again already provides an infinite amount of information, since the coefficients of H in an explicit description must be specified to infinite precision. Then only a finite amount of information is missing to determine in which of the eigenstates a particular particle is.

Of course in practice one just _postulates_ rho and H based on a finite number of measurements, and _pretends_ (i.e., proceeds as if) they are known exactly, while knowing well that one knows them only approximately.

In practice, a number of approximations are made. Frequently, one postulates exact equilibrium, hence a grand canonical ensemble, which of course is not exactly valid. Deviations from equilibrium are handled by means of a hydrodynamical approximation, in which entropy is no longer a number but a field - and specifying the entropy density again requires an infinite amount of information. Of course, one also represents this only to some limited accuracy, to keep things tractable.

Thus finiteness of the entropy in a particular model is enforced by making simplifying assumptions which are valid only if one doesn't look too closely. Indeed, as the Gibbs paradox (discussed, e.g., as Example 9.1 in my above thermodynamics paper) shows, the amount of entropy depends on the level of modeling.
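The statement that a pure state has zero entropy while a mixed state has positive, finite entropy can be checked with a few lines of code. This is a minimal sketch, assuming units with kbar = 1 and a small cutoff for numerically vanishing eigenvalues (both choices are mine, as is the function name); it evaluates S = - kbar sum_k p_k log p_k from the eigenvalues p_k of the density matrix.

```python
import numpy as np

def von_neumann_entropy(rho, kbar=1.0):
    """Entropy S = -kbar * sum_k p_k log p_k from the eigenvalues p_k of rho."""
    p = np.linalg.eigvalsh(rho)   # rho is Hermitian, so eigenvalues are real
    p = p[p > 1e-12]              # drop zero eigenvalues (0 log 0 counts as 0)
    return float(-kbar * np.sum(p * np.log(p)))

# Pure state: rho = |psi><psi| has rank 1.
psi = np.array([1.0, 1.0j]) / np.sqrt(2.0)
rho_pure = np.outer(psi, psi.conj())

# Maximally mixed qubit: both basis states with probability 1/2.
rho_mixed = 0.5 * np.eye(2)

s_pure = von_neumann_entropy(rho_pure)    # no missing information
s_mixed = von_neumann_entropy(rho_mixed)  # log 2, i.e. one binary question
```

The value log 2 for the maximally mixed qubit corresponds to exactly one binary question, in line with the interpretation of entropy as the mean number of questions needed to identify a particular realization.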
An analogy contributed by Gerard Westendorp: To describe a classical, slightly biased die exactly by a probability distribution also requires an infinite amount of information, namely the specification of 5 infinite decimal expansions of the probabilities p_k for rolling k pips. (The sixth is then determined, since probabilities sum up to 1.) This is much more than the finite amount of information in saying which particular value k was obtained in a specific throw. On the other hand, _given_ the distribution, the entropy S = - sum p_k log p_k is finite.

In general, describing the probabilistic state of an ensemble exactly requires much more information than the exact description of a particular realization.

-----------------------------------
S4f. How real is the wave function?
-----------------------------------

In thought experiments one often assigns a state to a single particle. How defensible is this, and what is the meaning of the state?

In a statistical interpretation - see the section on measurements -, this would make no sense, since there the state is a property of the ensemble of particles generated by a given source. But then it is difficult to visualize what happens in each single case. Thus many people prefer the 'realistic' language of particles having definite states. So let us discuss some of its implications.

Suppose that the particle is in the pure state represented by the wave function psi. It is possible to give the wave function, or rather its absolute value squared, a geometric interpretation: m(x)=m|psi(x)|^2 is the mass density and e(x)=e|psi(x)|^2 the charge density. Thus while the wave function itself has no tangible interpretation, certain fields computable from it have.
This extends - but not quite in the obvious way - to multiparticle systems: For a system of several, say n, particles, the wave function psi(x_1,...,x_n) is 3n-dimensional, each x_i being an ordinary 3-dimensional position vector, but the correct densities are still 3-dimensional, obtained by integration:

   m(x) = sum_a m_a integral dx_1...dx_n delta(x-x_a) |psi(x_1:n)|^2,
   e(x) = sum_a e_a integral dx_1...dx_n delta(x-x_a) |psi(x_1:n)|^2.

This reduces for n=1 to the above, and is consistent with the definition of mass and charge density in quantum field theory as

   m(x) = <m Psi_0(x)^* Psi_0(x)>,   e(x) = <e Psi_0(x)^* Psi_0(x)>,

where Psi_0(x) is the time component of the relevant matter field. These formulas are the common starting point for the derivation from first principles of the semiconductor equations in solid state physics. It is also what chemists draw as molecular shapes, using a cutoff where m(x) and e(x) are negligible to delineate the boundary.

Indeed, chemists use such an interpretation all the time when visualizing molecules in terms of orbitals, and with great success. The charge distribution of the electron cloud of a molecule is one of the important outputs of quantum chemistry packages such as
   GAUSSIAN (commercial) http://www.gaussian.com/
   MOLPRO (commercial) http://www.molpro.net/
   GAMESS (free after registration) http://www.msg.ameslab.gov/GAMESS/pcgamess.shtml

In the ground state (but also in definite excited states), the mass or charge distribution is spread out over an infinite region, although it becomes negligibly small outside a tiny core region (or, sometimes, such as in Stern-Gerlach experiments, where the wave function is multimodal, outside a few disconnected core regions).

The infinite extension invites an apparent paradox: upon collapse (e.g., due to hitting a detector screen), the particle contracts from its infinite extension to a single spot. This seems to violate the central tenet of relativity that information cannot flow faster than the speed of light.
However, special relativity only restricts the observational consequences of theory. Since most of the wave function of an individual particle is unobservable, there is no contradiction. (It is like the nonlocality in tests of Bell's inequalities. Nonlocality is unavoidable in QM, but the observable consequences respect the bound relativity puts on the speed of information flow.) For example, on a TV set, one observes just 3 position degrees of freedom of each electron reaching the screen, while - in contrast to the case of a classical particle - the wave function characterizing a pure state of the electron sits in a space of functions of 3 variables, which has infinitely many degrees of freedom. Thus one observes only a tiny little bit about the electron's state. It is like knowing the velocity of the wind (a 3-dimensional vector field) in the earth's atmosphere at a single point (giving a velocity vector with 3 coordinates)! This unobservability of most of the state causes a problem for those who require that everything a theory is talking about is observable. But this requirement is not satisfied anyway in current microphysics - no one ever observed a quark, but it is generally believed that they make up most of the matter in our universe. Thus, while it is reasonable to require that theory has observable consequences in agreement with Nature, it is not reasonable to require that everything the theory talks about is observable. Then the unobservability of most of the state of a single particle is harmless. On the other hand, one can probe the state of particles in detail if one has a large ensemble of identically prepared particles (to make sure that they have the same state). These are usually created by a carefully calibrated source, such as a laser. Then one can subject them to different kinds of measurements from which one can reconstruct a reasonable approximation of the state by quantum tomography. 
In theory, one can make the approximation arbitrarily good.

Similarly a particle bound to a surface in a stationary state will be measurable repeatedly if after the measurement the particle returns to its state (which is natural if the bound system is in equilibrium). Therefore one can measure equilibrium properties quite accurately.

In this sense one can say that the state of a single particle is indeed real, and objective.

Note that single particles can nowadays be routinely prepared and studied; see, e.g.,
   D. Leibfried, R. Blatt, C. Monroe, and D. Wineland,
   Quantum dynamics of single trapped ions,
   Reviews of Modern Physics 75 (2003), 281-324.

   S.M. Reimann and M. Manninen,
   Electronic structure of quantum dots,
   Reviews of Modern Physics 74 (2002), 1283-1342.

----------------------------------
S4g. How real are Feynman's paths?
----------------------------------

In Feynman's version of quantum mechanics, amplitudes are calculated as a sum over all possible classical paths a particle (or a system) can take in a classical phase space.

The paths in the Feynman picture of QM should not be regarded as real. All possible paths are about as real as all possible books that can be written, or - closer to physics - all possible items in a statistical ensemble modeling a classical ideal gas. Of course only one state is realized, not all conceivable ones; all others are just there to compare to and compute probabilities.

In QM things are slightly more complicated, however, since the 'true' path is smeared by the uncertainty principle. (Even in the many-worlds interpretation, quantum objects have no sharp paths, while the paths integrated over in a path integral must be perfectly accurate.)

The paths are just calculational devices that cease to exist once a different approach to computation is taken. This is why I don't ascribe any reality to them. The real objects remain present in _any_ sensible description; the unreal ones don't.

---------------------------------------
S4h.
Can particles go backward in time?
---------------------------------------

In the old relativistic QM (e.g., in Volume 1 of Bjorken and Drell) antiparticles are viewed as particles traveling backward in time. This is based on a consideration of the solutions of the Dirac equation and the idea of a filled sea of negative-energy solutions in which antiparticles appear as holes (though this picture only works for fermions since it requires an exclusion principle).

One can go some way with this view, but more sophisticated stuff requires the QFT picture (as in Volume 2 of Bjorken and Drell and most modern treatments). In relativistic QFT, all particles (and antiparticles) travel forward in time, corresponding to timelike or lightlike momenta. (Only 'virtual' particles may have unrestricted momenta; but these are unobservable artifacts of perturbation theory.)

In QFT, the need for antiparticles is instead revealed by the fact that they are necessary to construct operators with causal (anti)commutation relations, in connection with the spin-statistics theorem. See, e.g., Volume 1 of Weinberg's quantum field theory book.

Thus talking about particles traveling backward in time, the Dirac sea, and holes as positrons is outdated; today it does more harm than good.

-------------------------------------------------------
S4i. What about particles faster than light (tachyons)?
-------------------------------------------------------

Tachyons are hypothetical particles with speed exceeding the speed of light. Special relativity demands that such particles have imaginary rest mass (negative m^2), and hence can never be brought to rest (or below the speed of light); unlike ordinary particles, they speed up as they lose energy.

Charged tachyons would produce Cerenkov radiation in vacuum, which has never been observed. However, Cerenkov radiation is indeed observed when fast particles enter a dense medium in which the speed of light is smaller than the particle's speed.
This is not a problem since relativity only demands that no particle with real mass is faster than the speed of light in vacuum. (Unfortunately, this no longer allows one to discriminate between massless particles, which have the vacuum speed of light, and tachyons.)

Neutrinos are uncharged and have a squared mass of zero or very close to zero, and hence could possibly be tachyons. Recently observed neutrino oscillations confirmed a small squared mass difference between at least two species of neutrinos. This does not yet settle the sign of m^2 for any species. Direct measurements of m^2 have experimental errors still compatible with m^2=0. For data see
   http://cupp.oulu.fi/neutrino/

The initial interest in tachyons stopped around 1980, when it was clear that the QFT of tachyons would be very different from standard QFT, and that experiment didn't demand their existence. The publications of the particle data group, which contain the biennially revised consensus of the particle physics community, do not even include the search for tachyons in their reviews of hypothetical particles:
   http://pdg.lbl.gov/2004/reviews/contents_sports.html#hyppartetc

In fact, the theory of symmetry breaking demands that tachyons do _not_ exist: When a relativistic field theory is deformed in a way that the square of the mass (pole of the S-matrix) of some physical particle would cross zero, the old physical vacuum becomes unstable and induces a phase transition to a new physical vacuum in which all particles have real nonnegative mass. This would happen already at tiny negative m^2, and is believed to be the cause of inflation in the early universe. (Of course, the exact mechanism is not known since it would require a nonperturbative definition of QFT. But classical and semiclassical computations strongly suggest the correctness of this picture.)
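The classical part of this picture can be sketched with the standard textbook toy model of a quartic potential V(phi) = - mu^2 phi^2/2 + lambda phi^4/4. The model, the parameter values and the function names below are assumptions made for illustration and are not taken from this FAQ. The curvature V''(phi) plays the role of m^2 for small fluctuations around a stationary point: it is negative at the unstable maximum phi=0 and positive at the true minimum.

```python
import math

MU2 = 1.0   # assumed value of mu^2 (sets the scale)
LAM = 0.5   # assumed value of the quartic coupling lambda

def potential(phi):
    """Toy quartic potential with an unstable maximum at phi = 0."""
    return -0.5 * MU2 * phi**2 + 0.25 * LAM * phi**4

def curvature(phi):
    """Second derivative V''(phi), the classical m^2 of small fluctuations."""
    return -MU2 + 3.0 * LAM * phi**2

phi_min = math.sqrt(MU2 / LAM)      # stationary point with V'(phi) = 0, phi > 0

m2_at_maximum = curvature(0.0)      # -mu^2 < 0: formally tachyonic
m2_at_minimum = curvature(phi_min)  # 2 mu^2 > 0: ordinary massive excitation
```

The negative value at phi=0 only signals that this point is unstable; fluctuations around the minimum, which has lower energy, have positive m^2.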
Expanding a theory (such as the standard model) around an unstable state (e.g., the Higgs potential has a local maximum at vanishing vacuum expectation value) formally produces a bare tachyon. This does not contradict the above assertion, but only indicates the instability of the bare vacuum. Asymptotic power series expansions around maxima (especially those with tiny or vanishing convergence radius) make meaningless assertions about the behavior of a function near one of its minima. Since physical particles arise from field excitations near the global minimum of the effective energy, perturbations around the maximum are unphysical.

An expansion around an unstable state gives no significant information, unless one has a system that actually _is_ close to such an unstable state (as perhaps the very early universe). But in that case there are no relevant excitations (tachyons), since the whole process of motion (inflation) towards a more stable state proceeds so rapidly that excitations do not form and everything can be analyzed semiclassically.

The physical Higgs field is far away from the unstable maximum, and its particle excitations have a positive real mass, hence are not tachyons.

Below are some references about tachyons. The more important papers are marked by an asterisk.

 * G. Feinberg, Possibility of Faster-Than-Light Particles, Phys. Rev. 159, 1089 (1967).
   J. Dhar and E. C. G. Sudarshan, Quantum Field Theory of Interacting Tachyons, Phys. Rev. 174, 1808-1815 (1968)
   M. Glueck, Note on Causal Tachyon Fields, Phys. Rev. 183, 1514 (1969).
   D. G. Boulware, Unitarity and Interacting Tachyons, Phys. Rev. D 1, 2426 (1970).
 * B. Schroer, Quantization of m^2<0 Field Equations, Phys. Rev. D 3, 1764 (1971).
   G. Feinberg, Lorentz invariance of tachyon theories, Phys. Rev. D 17, 1651 (1978)
   C. Schwartz, Some improvements in the theory of faster-than-light particles, Phys. Rev. D 25, 356 (1982)
   M. B. Davis, M. N. Kreisler, and T. Alvaeger, Search for Faster-Than-Light Particles, Phys. Rev.
183, 1132 (1969)
 * L. W. Jones, A review of quark search experiments, Rev. Mod. Phys. 49, 717 (1977) [Section IIIG reviews the vain search for tachyons.]

The Wikipedia entry for tachyons,
   http://en.wikipedia.org/wiki/Tachyon
gives some more explanations.
   http://www.weeklyscientist.com/ws/articles/tachyons.htm
although mainly speculating about connections between tachyons and inflation, has some links with further useful information.

-----------------------------
S4j. Do free particles exist?
-----------------------------

Free particles are a convenient mathematical abstraction. In Nature, there are - strictly speaking - no free particles, only interacting ones. This holds both for photons and for other more tangible particles like electrons.

However, in sufficiently localized (and nearly empty) regions of space, particles can be approximately free. Again, this holds for both photons and other particles. It is very convenient to approximate such states by free states. For example, this allows one to explain much of quantum mechanics in terms of particle scattering. The S-matrix interpretation depends crucially on the fact that the ingoing and outgoing asymptotic states of photons, electrons, quarks, etc. are free.

Thus, in this sense, free photons exist just as much (or just as little) as free electrons.

------------------------------------
S5a. QM pictures and representations
------------------------------------

QM exists in different pictures, of which the Schroedinger picture, the Heisenberg picture, the interaction picture, and Feynman's path integral representation are frequently invoked. There is also the algebraic approach using unitary representations of the canonical commutation rules (CCR).

The Schroedinger picture, the Heisenberg picture, and the interaction picture are equivalent because there are unitary transformations between them.
They all provide different representations of the same canonical commutation rules
   i[p_j,q_k] = hbar delta_jk
between components p_j of momentum p and q_k of position q. The Stone-von Neumann theorem guarantees that the canonical commutation relations (or their unitary version, the Weyl relations) have a unique unitary representation apart from unitary transformations, and hence suffice to specify the QM of finitely many degrees of freedom uniquely, no matter which picture is used.

The Stone-von Neumann theorem fails for systems of infinitely many degrees of freedom (see the FAQ entry on 'Inequivalent representations of the CCR/CAR'), which in a sense 'causes' the difficulties in quantum field theory. Nevertheless, QFT still has a Schroedinger picture and a Heisenberg picture, and these are still equivalent: The Heisenberg picture can be immediately constructed from the Wightman fields. Then the canonical procedure - fixing the Heisenberg operators at time t=0 and instead defining dynamical states psi(t) := exp(-itH)psi - produces the Schroedinger picture from it.

The Feynman path integral is related to the other pictures via the Feynman-Kac formula, which makes the often only formally stated equivalence precise, after analytically continuing the time to purely imaginary times. The Osterwalder-Schrader theory [see, e.g., math-ph/0001010 or the book by Glimm and Jaffe] shows how to go back in the case of relativistic quantum field theory.

The Feynman path integral only gives time-ordered expectation values; this suffices to compute S-matrix elements, but is inadequate for the dynamical investigations needed for nonequilibrium quantum mechanics. The latter can be treated with the so-called closed time path (CTP) integral within the Schwinger-Keldysh formalism.

------------------------------------------------
S5b. Inequivalent representations of the CCR/CAR
------------------------------------------------

Ordinary quantum mechanics of N particles can be written in terms of creation and annihilation operators for the 3N modes of an associated reference harmonic oscillator. The field case, on the other hand, is characterized by the fact that there are infinitely many modes.

If the creation and annihilation operators are those in the action or Hamiltonian defining the QFT, the different modes are traditionally referred to as 'bare particles', though this is not recommended for reasons discussed elsewhere in this FAQ. If the creation and annihilation operators are properly renormalized so that they create and annihilate physical particles from the physical vacuum, the modes are referred to as 'dressed particles'; only these have physical relevance.

A state in which k modes are excited is called a k-particle state. In many states of interest, however (the most prominent ones being the coherent states), infinitely many modes are excited (although the notion of infinitely many particles is strained in this case). Thus one needs to cater in the formalism for states with arbitrarily many or even infinitely many modes. This has subtle consequences, which account for the big difference between quantum field theory and ordinary quantum mechanics.

The canonical commutation rules (CCR) for creation and annihilation operators in field theory take in the simplest case (countably many modes, corresponding to fields confined to a bounded region) the form
   [a(k),a^*(l)] = delta_kl,   k,l=0,1,2,...           (1)
The Stone-von Neumann theorem, which guarantees that the canonical commutation relations of quantum mechanics (or their unitary version, the Weyl relations) have a unique unitary representation apart from unitary transformations, fails for systems of infinitely many degrees of freedom.
The reason for this is that the natural representation space for creation and annihilation operators is the vector space consisting of all formal linear combinations
   sum psi(n1,n2,n3,...) |n1,n2,n3,...>
with _arbitrary_ complex coefficients psi(n1,n2,n3,...), on which a(k) and a^*(l) act as
   a(k)|n1,...,n_k,...> = sqrt(n_k)|n1,...,n_k - 1,...>,
   a^*(l)|n1,...,n_l,...> = sqrt(1+n_l)|n1,...,1+n_l,...>.
This vector space V has no natural Hilbert space structure. To provide a definite inner product, one must select a suitable subspace where this inner product can be defined. This allows many choices; the choice usually discussed in QFT treatises is Fock space, where only basis vectors
   |n1,...,n_k,0,0,...>
with finitely many particles are allowed, and these basis vectors are declared orthonormal. As a result, Fock space contains only the linear combinations
   sum psi(n1,n2,n3,...,n_k) |n1,n2,n3,...,n_k>
where k is variable and sum |psi(n1,n2,n3,...,n_k)|^2 is finite.

Unfortunately, if this choice is made for the representation of the bare creation and annihilation operators, it excludes the states relevant for the physical, interacting situation. This is the essential message of Haag's no interaction theorem.

Indeed, the physical states lie in a different, inequivalent unitary representation, characterized by a different subspace of V. This subspace is generated by applying to the physical (= renormalized) vacuum state the dressed (= renormalized) creation operators an arbitrary number of times, then taking all finite linear combinations, and finally taking the closure with respect to the inner product in which all a^*(n_1)...a^*(n_k)|vac> are orthonormal.

In general, this Hilbert space has only the null vector (_not_ the vacuum) in common with the Fock space, even for the simplest (i.e., quadratic) Hamiltonians and actions. This case is well understood, giving rise to the theory of quasiparticles and in particular of superconductivity.
For example (counting modes by signed nonzero integers for simplicity - they become momenta in the infinite volume limit), if the bare a(k) and b(k) satisfy the CCR then so do the dressed annihilation operators
   alp(k) = A(k) a(k) - B(-k) b^*(-k),
   bet(k) = A(k) b(k) - B(-k) a^*(-k),
and their formal adjoints
   alp^*(k) = A(-k) a^*(k) - B(k) b(-k),
   bet^*(k) = A(-k) b^*(k) - B(k) a(-k),
provided that A(k), B(k) are real numbers satisfying
   A(k)^2 - B(k)^2 = 1,
or, equivalently, that
   A(k) = cosh(theta(k)),   B(k) = sinh(theta(k)).
If there were only finitely many modes, we could define in Fock space the unitary operator
   G = exp [- sum_k theta(k) (a(k)b(-k) - b^*(-k)a^*(k))],
and verify that
   alp(k) = G a(k) G^{-1},   bet(k) = G b(k) G^{-1},
showing that we get an equivalent representation of the CCR. We could deduce that |vac> := G|>, where |> is the bare vacuum, is the dressed vacuum on which alp and bet act naturally. The dressed states would simply be the images of the bare states under the Bogoliubov operator G.

Unfortunately, if there are infinitely many modes, G can no longer be consistently defined as an operator in Fock space, and the infinite-dimensional version of this scenario breaks down. Ignoring this, one would find all sorts of infinities. Mathematically, however, one has simply changed the unitary representation - G does not exist although the dressed representation exists.

Physicists say that the above computations hold 'formally', and mean (if a mathematician tries to give it a precise meaning) that they hold in finite mode approximations but do not survive the limit, although they are usually formulated in the meaningless limit form.

The canonical anticommutation rules (CAR) also have the form (1), except that the commutator is replaced by an anticommutator. All statements above are valid with appropriate modifications, the most important one being that occupation numbers are now restricted to 0 and 1, and the definition of a^*(l) has 1-n_l in place of 1+n_l.
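The breakdown of the Bogoliubov operator G for infinitely many modes, discussed above for the CCR case, can be made plausible numerically. The following is a minimal sketch, not part of the original argument. It assumes the standard result that, for a single pair of modes, the normalized state G|> equals (1/cosh theta) sum_n (tanh theta)^n |n,n> up to sign and phase conventions, so that its overlap with the bare vacuum has modulus 1/cosh theta. It also assumes, purely for illustration, that all mode pairs share the same angle theta. The function names are hypothetical.

```python
import math

def vacuum_overlap(thetas):
    """Modulus of <bare vacuum|dressed vacuum> for independent mode pairs.

    Assumption: each pair contributes a factor 1/cosh(theta).
    """
    return math.prod(1.0 / math.cosh(t) for t in thetas)

def squeezed_norm_squared(theta, nmax=200):
    """Squared norm of (1/cosh) sum_n tanh^n |n,n>, truncated at n = nmax."""
    t2 = math.tanh(theta) ** 2
    return sum(t2 ** n for n in range(nmax + 1)) / math.cosh(theta) ** 2

theta = 0.5  # illustrative value, the same for every mode pair
overlaps = {n: vacuum_overlap([theta] * n) for n in (1, 10, 100, 1000)}

for n, value in overlaps.items():
    print(n, value)
print("norm check:", squeezed_norm_squared(theta))
```

For N pairs the overlap is cosh(theta)^(-N). It decreases geometrically with N, so in the limit of infinitely many modes the overlap of the dressed vacuum with the bare vacuum vanishes, consistent with the statement that G does not exist as an operator on Fock space.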
For more details see the book

   H. Umezawa, H. Matsumoto, and M. Tachiki, Thermo Field Dynamics and Condensed States, North Holland 1982.

--------------------------------------------
S5c. Why does QFT look so different from QM?
--------------------------------------------

This is only because of technical reasons and the power of tradition.

In ordinary quantum mechanics, pure states are described by wave functions (more precisely by rays) in a Hilbert space, there is a Hamiltonian H and an associated Schroedinger equation
   i hbar psidot = H psi,
the time evolution is described by a unitary operator, the bound states are normalized eigenstates of the Hamiltonian, etc.

This is also done in traditional quantum field theory, though it is not directly apparent. But one can see it when studying constructive field theory. It gives everything in the case of 2D quantum fields. There is a well-defined Hilbert space, a well-defined Hamiltonian constructed without any use of perturbation theory, a well-defined unitary dynamics, well-defined bound states that are eigenstates of the Hamiltonian, and everything is invariant under the 2D Poincare group ISO(1,1). See the book

   J. Glimm and A. Jaffe, Quantum Physics: A Functional Integral Point of View, Springer, Berlin 1987.

The only thing wanting is an explicit formula for H in the traditional nonrelativistic form H=H_0+V. Instead, H is constructed in a more abstract way, as analytic continuation of an operator in Euclidean field theory.

That the 4D case is more difficult has to do with obstacles in getting tight enough bounds for the analytic estimates needed. These are mathematical difficulties, but not inconsistencies - no one has proved that there are contradictions, and the practice of QFT suggests that there are indeed none (at least for asymptotically free theories).

On the perturbative level, there is no difficulty at all - see, e.g., the book

   M. Salmhofer, Renormalization: An Introduction, Texts and Monographs in Physics, Springer, Berlin 1999,

which constructs the Euclidean theory for Phi^4 theory in 4 dimensions perturbatively, i.e., in the formal power series topology, with full mathematical rigor. If this construction worked nonperturbatively (i.e., gave functions instead of formal power series), analytic continuation using Osterwalder-Schrader theory would do the rest. The latter is described, e.g., in Chapter 6 of the above book by Glimm and Jaffe.

--------------------------------------------
S5d. Why is QFT based on a classical action?
--------------------------------------------

The path integral approach to QFT begins with classical fields that are varied to produce quantum amplitudes as a 'sum over all possible paths'. But, with the exception of the electromagnetic field, the classical fields one meets there are not fields occurring in classical physics.

Nevertheless they are rightfully labelled 'classical'. Classical physics is the physics of processes slowly varying in space and time; of course, elementary particles do not belong there. But classical mechanics can also be considered as an abstract mathematical framework for dynamics in a general phase space (described by a Poisson manifold), which has much wider applicability. The classical fields that figure in the path integral belong in this sense to classical mechanics.

In QFT, one needs a classical action to be able to implement unitarity of the S-matrix and the cluster decomposition. The first is essential for a correct probabilistic interpretation of QFT, since it amounts to preservation of probability, and the second is necessary to account for the fact that all our experiments are done locally, and what is far away does not contribute significantly except through effectively classical far fields. (What happens with the stars should be irrelevant to experiments on the earth, except for the experiments of astronomers.
This is the basis of all physics.)

In terms of microphysics, cluster decomposition means that one cannot scatter particles (clusters of elementary particles) off very distant particles (clusters). The arguments why this requires a classical action expressed in terms of creation and annihilation operators are explained in detail in Weinberg's quantum field theory book, Volume I, Chapters 3-7.

We need cluster decomposition because it is observed. We need local fields and microcausality, mainly because they imply (modulo fine print involving contact terms) cluster decomposition, at least perturbatively, and there is no other known way in QFT to ensure the latter.

But there are covariant N-particle models with cluster decomposition, discussed, e.g., in

   B.D. Keister and W.N. Polyzou, Relativistic Hamiltonian Dynamics in Nuclear and Particle Physics, in: Advances in Nuclear Physics, Volume 20, (J. W. Negele and E.W. Vogt, eds.) Plenum Press 1991.
   www.physics.uiowa.edu/~wpolyzou/papers/rev.pdf

(The constructions are quite messy; they have, however, the advantage that they do not need renormalization, and are useful phenomenological models.)

The lack of references to cluster decomposition in standard textbooks of QFT is explained by the fact that local QFT automatically satisfies cluster decomposition. Most people take QFT as the starting point, without asking why. Weinberg's treatise is about the only book that asks this question and answers it in some depth.

But when you look at the literature on phenomenological covariant multiparticle models, cluster decomposition plays an essential role in that it is the main hurdle to overcome to get realistic models for systems made of more than two unconfined particles. For details see the survey by Keister and Polyzou mentioned above, and the references there.
Cluster decomposition for field theory is also discussed from a rigorous point of view in the book by Glimm and Jaffe, where connections are made to multiparticle scattering.

Indeed, books on (nonrelativistic) scattering theory are the ones where the cluster decomposition is discussed in detail, since it is needed to describe the result of the most general multiparticle scattering experiments, and an understanding of it is essential for proving the asymptotic completeness of scattering states.

Nonrelativistic theory also shows that the 'correct' cluster decomposition is always one for bound states, as can be seen from a more detailed nonrelativistic analysis. (This is not apparent from Weinberg's argument, since perturbation theory breaks down in the presence of bound states. This explains why QCD has no cluster decomposition for isolated quarks.)

Unfortunately, most physicists tend to work in isolated fragments of the whole edifice of physics, thus losing connections that may be important to understanding.

Cluster decomposition would perhaps be more prominent in QFT if it were easier to calculate properties of bound states and their scattering or breaking up, since that is where one can see the principle at work. But such calculations are presently out of reach without severe approximations.

--------------------------------------------------------
S5e. Why does the action only contain first derivatives?
--------------------------------------------------------

On the classical level, higher derivatives cause no formal problems; one can form the variational equations as always. There might be problems with causality (= symmetric hyperbolicity), however. These problems become worse (and apparently intractable) in the quantum case.

In a k derivative theory with k>1, one can always introduce new fields for the k-1 first derivatives, and add terms to the action that give as variation their defining equations.
Thus one can reduce any theory to an equivalent one with only first derivatives in the action.

The problems appear when trying to go from the Lagrangian picture to the Hamiltonian picture - then one gets difficulties similar to those for gauge theories.

-------------------------
S5f. Why normal ordering?
-------------------------

Field theory often deals with polynomial expressions in annihilation operators a(p) and their adjoint creation operators a^*(p). While a(p) is a linear operator on a dense subspace H of the corresponding Fock space, its adjoint isn't. But both are densely defined sesquilinear forms on Fock space.

A sesquilinear form is a linear mapping f from a space H (the domain; a dense subspace of the Hilbert space, in the present case of Fock space) to its dual space H^* (which properly contains H), while an operator maps H into H. Thus the latter can be iterated while the former usually cannot.

<phi|f|psi> is always defined when phi,psi in H (since f|psi> is in H^*, the inner product is defined). Thus Hermitian sesquilinear forms are satisfactory candidates for 'observables'. However, matrix elements <phi|fg|psi> of products fg make sense only for operators f,g, since fg|psi> is not defined if g|psi> is outside H.

In particular, a(p)a^*(p) is a meaningless construct, while
   :a(p)a^*(p): = +-a^*(p)a(p)
makes sense as a Hermitian sesquilinear form. But f(p)=a^*(p)a(p) is no longer an operator in any sense (though good 1-particle operators can be made by integration with suitable test functions). That's why f(p)f(q) is meaningless while the permuted form
   :f(p)f(q): = +-a^*(p)a^*(q)a(q)a(p)
(+ for Bosons, - for Fermions) is well defined (again as a sesquilinear form only).

More generally, any product O of creation and annihilation operators which has all its creation terms to the left of all its annihilation terms (these are called normally ordered products) defines a sesquilinear form.
The reason is that such an O can be written as O=A^*B where A and B are products of annihilation operators only, hence <phi|O|psi> = <A phi|B psi> can be interpreted as the inner product of the two vectors A|phi> and B|psi> obtained from phi and psi by applying annihilation operators only, which produces vectors in H for which the inner product is always defined.

Normal ordering just permutes arbitrary products to put them into the normally ordered and hence well-defined form (and adds a minus sign if an odd number of transpositions of Fermion operators is needed to order the product). This is extended by linearity to polynomials and infinite series in power products. Note that normal ordering is defined for formal expressions (i.e., strings of letters), not for operators or forms; only _after_ normal ordering an expression O does one get a sesquilinear form :O:.

In Fock spaces over finite-dimensional Hilbert spaces, the situation is different; there a(p) and a^*(p) are indeed operators on Fock space (and the index p ranges over finitely many items only). Thus all products make sense, and the normally ordered version of a product differs from the original product by terms involving fewer operators.

Normal ordering is usually motivated by starting with a finite-dimensional discretization where integrals become finite sums; then one can do all the formal manipulations rigorously. Upon passing to the continuum limit, most expressions become infinite and hence meaningless, but the normally ordered expressions happen to have a well-defined limit and hence are meaningful. So these are the relevant 'operators' or rather sesquilinear forms. Presenting things as above avoids any infinities.

---------------------------------------------------
S5g. Why locality and causal commutation relations?
---------------------------------------------------

In measurement terms, locality is the idea that a measurement here and a simultaneous measurement there can be performed independently, and in particular don't limit each other in precision.

This is encoded in the requirement that 'local' quantities described by fields Phi_a(\x,t) here (at \x) and fields Phi_b(\y,t) there (at \y) commute if the positions \x and \y are distinct.

The covariant form of this locality requirement is that, with x=(ct,\x) and the +--- norm defined by x^2=x_0^2-\x^2,
   [Phi_a(x),Phi_b(y)]=0  if (x-y)^2<0                 (*)
Indeed, if x_0=y_0=ct then
   (x-y)^2=(x_0-y_0)^2-(\x-\y)^2=-(\x-\y)^2<0,
so this commutation relation holds at equal time. But then Lorentz covariance implies that it must hold whenever (x-y)^2<0, since any pair (x,y) with (x-y)^2<0 can be transformed into an equal time pair.

Thus locality is a property of distinguished fields satisfying (*), called local fields. This property is completely independent of states, since it is understood that the property holds independently of the coincidental properties of the state.

Quantum field theory is physics in the Heisenberg picture, with states fixed once and for all, and all spacetime dependence in the fields. The universe is in a definite though largely unknown state, and apart from the Lagrangian of the standard model plus gravitation, all the history, present and future of the universe is encoded in this universal state.

Lacking knowledge of this state, physicists are usually content with describing tiny portions of this state, namely the restriction of the state to a subalgebra of accessible quantities within the lab (or at least close to the solar system). Since there are many such subsystems of interest, and all these are in different states even if described by the same algebra (more precisely by isomorphic ones), all generic properties of physical systems must be independent of the states.
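The step in the covariance argument above, that any spacelike separated pair can be transformed into an equal time pair, can be checked in a small numerical sketch. As a simplifying assumption it is restricted to one space dimension, with the separation lying along the x-axis; the function names are hypothetical. For a spacelike separation, (c dt)^2 < dx^2, the boost velocity v = c^2 dt/dx satisfies |v| < c and makes the two events simultaneous.

```python
import math

def equal_time_velocity(dt, dx, c=1.0):
    """Boost velocity that makes two spacelike separated events simultaneous."""
    if (c * dt) ** 2 - dx ** 2 >= 0:
        raise ValueError("separation is not spacelike")
    return c * c * dt / dx

def boost(dt, dx, v, c=1.0):
    """Apply a Lorentz boost with velocity v along x to the separation (dt, dx)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (dt - v * dx / c ** 2), gamma * (dx - v * dt)

# Illustrative spacelike separation in units with c = 1
dt, dx = 1.0, 2.0
v = equal_time_velocity(dt, dx)
dt_new, dx_new = boost(dt, dx, v)

interval_old = dt ** 2 - dx ** 2              # (x-y)^2 before the boost
interval_new = dt_new ** 2 - dx_new ** 2      # (x-y)^2 after the boost
print(v, dt_new, interval_old, interval_new)
```

The boosted time difference vanishes while (x-y)^2 is unchanged, which is what the argument needs. For a timelike or lightlike separation no such boost exists, and the sketch raises an error instead.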
------------------------------------------------
S5h. Creation operators and rigged Hilbert space
------------------------------------------------

Physicists regard Fock space as the Hilbert space containing the basis states |x_1:N> = |x_1,...,x_N> and their linear combinations. However, there is no Hilbert space containing these states.

The state |x_1:N> = |x_1,...,x_N> is not in the Hilbert Fock space, for the same reason for which |x> is not in the 1-particle Hilbert space. It is only a distribution. The Hilbert Fock space is made instead of all wave functions
   psi = sum_N integral dx_1:N psi_N(x_1:N) |x_1,...,x_N>
with finite
   <psi|psi> = sum_N |psi_N|^2/N!

Physicists also define annihilation operators a(x) and their adjoints, creation operators a^*(x). However, these are not operators, but operator-valued distributions. For example, a^*(x) maps the vacuum state |vac> (with psi_0=1, other psi_N=0) into a^*(x)|vac> = |x>, which is not in the Hilbert Fock space. More generally, for every nonzero Hilbert Fock space vector psi, the vector psi' = a^*(x) psi lies outside the Hilbert Fock space. Thus the domain of a^*(x) is just {0}.

However, the states |x_1:N> = |x_1,...,x_N> lie in the top layer H^* of the appropriate Gelfand triple (= rigged Hilbert space). This is the name for a triple
   H subset Hbar subset H^*
of vector spaces, where Hbar is a Hilbert space, H a dense 'nuclear' subspace (containing very smooth states with very good behavior at infinity) and H^* its dual space (containing among others very singular states and states with very poor behavior at infinity).

Observables (in the weak sense) are bilinear forms, or, which is the same, linear mappings from H to H^*. The adjoint of such a linear mapping is again an observable in the weak sense. Annihilation operators a(x) (and their adjoints a^*(x)) are observables in this weak sense, although they are not Hermitian (and a fortiori not self-adjoint).

Most physicists have taken this lightly since the times of Dirac.
They don't bother about self-adjointness or any other functional analytic concept, unless ignoring it brings them into trouble. Almost everything they do in the nonrelativistic regime can be made rigorous in the rigged Hilbert space, so they fare well even when they imagine wrongly that they work in a Hilbert space. Thus they get away with their bad practices. What they call 'Hilbert space' _is_ in fact always a rigged Hilbert space, although most of them just don't know and don't care.

--------------------------
S5i. Why Feynman diagrams?
--------------------------

Feynman diagrams resemble processes with particles moving in space and time, and are often figuratively treated as such. But in fact they do _not_ describe such processes, but certain multiple integrals. (To emphasize this, the particles involved in Feynman diagrams are called 'virtual particles'. Still, many people think mistakenly that virtual particles are somehow also real. See the entries about virtual particles elsewhere in this FAQ.)

Although it is nowhere said explicitly, Feynman diagrams are just a mnemonic for nicely picturing the composition of higher order tensors. Create for each tensor of a theory a different vertex type, draw a vertex of this type for each occurrence of this tensor in a product expression in Einstein summation convention, and draw a line between two such vertices whenever they share an index to be summed over. The form of the lines defines the value of the coefficient function in such a product, and the sum over Feynman diagrams simply means that one considers a linear combination of these products, integrated over the arguments. Thus this defines a generic representation of an expansion of a function of the tensors of the theory.

Thus Feynman diagrams can be used whenever one expands a function of one or more tensors into a linear combination of products of components of these tensors.
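As a toy illustration of this tensor bookkeeping (a sketch with made-up 2x2 tensors, not a physical computation), consider a diagram with three vertices A, B, C, each carrying two indices, joined in a triangle. Each line stands for a summed index, so the diagram denotes sum_{i,j,k} A_ij B_jk C_ki, which is the trace of the matrix product ABC. Coefficient functions and integrations over arguments are left out here, and all names are hypothetical.

```python
from itertools import product

def contract_triangle(A, B, C):
    """Value of the triangle diagram: sum over i, j, k of A[i][j] B[j][k] C[k][i]."""
    n = len(A)
    return sum(A[i][j] * B[j][k] * C[k][i]
               for i, j, k in product(range(n), repeat=3))

def matmul(X, Y):
    """Plain matrix product of two square nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    """Sum of the diagonal entries."""
    return sum(X[i][i] for i in range(len(X)))

# Made-up vertex tensors, one per vertex of the diagram
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 3]]

diagram_value = contract_triangle(A, B, C)
print(diagram_value, trace(matmul(matmul(A, B), C)))
```

Both numbers agree: the diagram is only a picture of which indices are summed, and any other connected pattern of lines corresponds in the same way to another index contraction.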
Indeed, for this reason, they are also used in classical statistical mechanics and in the analysis of stochastic differential equations by functional integration techniques.

---------------------------------------------------------
S6a. Nonperturbative computations in quantum field theory
---------------------------------------------------------

There is a well-defined theory for computing contributions to the S-matrix in quantum electrodynamics (and other renormalizable field theories) by perturbation theory. There is also much more which uses handwaving arguments and appeals to analogy to compute approximations to nonperturbative effects. Examples are:
- relating the Coulomb interaction and corrections to scattering amplitudes and then using the nonrelativistic Schroedinger equation,
- computing Lamb shift contributions (now usually done in what is called the NRQED expansion),
- Bethe-Salpeter and Schwinger-Dyson equations obtained by resumming infinitely many diagrams.

The use of 'nonperturbative' and 'expansion' together sounds paradoxical, but is common terminology in QFT. The term 'perturbative' refers to results obtained directly from renormalized Feynman graph evaluations. From such calculations, one can obtain certain information (tree level interactions, form factors, self energies) that can be used together with standard QM techniques to study nonperturbative effects - generally assuming without clear demonstrations that this transition to quantum mechanics is allowed.

Of course, although usually called 'nonperturbative', these techniques also use approximations and expansions. The most conspicuous high accuracy applications (e.g., the Lamb shift) are highly nonperturbative. But on a rigorous level, so far only the perturbative results (coefficients of the expansion in coupling constants) have any validity.
Although the perturbation series in QED are believed to be asymptotic only, one can get highly accurate approximations for quantities like the Lamb shift. However, the Lamb shift is a nonperturbative effect of QED. One uses an expansion in the fine structure constant, in the ratio electron mass/proton mass, and in 1/c (well, different methods differ somewhat). Starting, e.g., with
   Phys. Rev. Lett. 91, 113005 (2003)
one should be able to track the literature.

Perturbative results are also often improved by partial summation of infinite classes of related diagrams. This is a standard approach to go some way towards a nonperturbative description. Of course, the series diverges (in case of a bound state it _must_ diverge, already in the simplest, nonrelativistic examples!), but the summation is done on a formal level (as everything in QFT) and only the result is reinterpreted in a numerical way.

In this way one can get Schroedinger's equation in the ladder approximation, and Bethe-Salpeter equations, etc., in other approximations. See Volume 1 of Weinberg's quantum field theory book.

---------------------------------------------------
S6b. The formal functional integral approach to QFT
---------------------------------------------------

On a purely formal level (i.e., with power series in place of actual numbers), 4D QFT is very alive and useful. It is now almost always based upon functional integrals.

The path integral is discussed, e.g., in Weinberg I, Chapter 9, or Peskin/Schroeder, also Chapter 9. As one can see there, the path integral formalism involves no operators at all, only classical (commuting or anticommuting) fields.

The quantities obtained in the expansion of the path integral in powers of hbar are time-ordered vacuum expectation values.
Since the original ordering in a time-ordered vacuum expectation value is immaterial (apart from a sign for fermions), the same must be the case for the path integral itself, which explains why the fields in the path integral are classical (i.e., commute or anticommute at all arguments).

The main strength of the path integral approach is precisely that it avoids quantum operators and replaces all operator arguments by averages over classical paths. (The main weakness is that this averaging process is logically ill-defined. There exists no prescription for how the limit in the 'definition' of the path integral is to be taken to yield (in theory - independent of the difficulty of computing them) numbers that have the properties commonly ascribed to the path integral.)

The older canonical quantization approach was fraught with difficulties because of inconsistencies in the operator approach. For example, the canonical commutation rules (CCR) are valid only in the free case, and no one knows how they should be in the interacting case - though one knows that (anti)commutators must still vanish at spacelike related arguments. Moreover, the renormalization program plays havoc with operators.

Unfortunately, this means that dynamical issues and bound state questions, which are comparatively easy to handle in an operator framework, become almost intractable in the path integral approach.

However, as Weinberg stresses in his QFT book, an understanding of the relation between path integral and canonical quantization is essential to get the properties of the latter correct in cases like the nonlinear sigma model.

---------------------------------------------------------------
S6c. Functional integrals, Wightman functions, and rigorous QFT
---------------------------------------------------------------

QFT assumes the existence of interacting fields Phi(x) (operator-valued distributions) with certain properties, which imply the existence of distributions
   W(x_1,...,x_n)=<0|Phi(x_1)...Phi(x_n)|0>.
But the right hand side makes no rigorous sense in traditional QFT as found in most textbooks, except for free fields.

Axiomatic QFT therefore tries to construct the W's - called the Wightman functions - directly such that they have the properties needed to get an S-matrix (Haag-Ruelle theory), whose perturbative expansion can be compared with the nonrigorous mainstream computations. This can be done successfully for many 2D theories and for some 3D theories, but not, so far, in the physically relevant case of 4D.

To construct something means to prove its existence as a mathematically well-defined object. Usually this is done by giving a construction as a sort of limit, and proving that the limit is well-defined. (This is different from solving a theory, which means computing numerical properties, often approximately, occasionally - for simple problems - in closed analytic form.)

To compare it to something simpler: In mathematics one constructs the Riemann integral of a continuous function over a finite interval by some kind of limit, and later the solution of an initial value problem for ordinary differential equations by using this and a fixed point theorem. This shows that each (nice enough) initial value problem is uniquely solvable. But it tells very little about the properties of the solution, and in practice no one uses this construction to calculate anything. But it is important as a mathematical tool since it shows that calculus is logically consistent.

Such a logical consistency proof for any 4D interacting QFT is presently still missing.
Since logical consistency of a theory is important, the first person who finds such a proof will become famous - it means inventing new conceptual tools that can handle this currently intractable problem. Wightman functions are the moments of a linear functional on some algebra generated by field operators, and just as linear functionals on ordinary function spaces are treated in terms of Lebesgue integration theory (and its generalization), so Wightman linear functionals are naturally treated by functional integration. The 'only' problem is that the latter behaves much more poorly from a rigorous point of view than ordinary integration. Wightman functions are the moments of a positive state < . > on noncommutative polynomials in the quantum field Phi, while time-ordered correlation functions are the moments of a complex measure < . > on commutative polynomials in the classical field Phi. In both cases, we have a linear functional, and the linearity gives rise to an interpretation in terms of a functional integral. The exponential kernel in Feynman's path integral formula for the time-ordered correlation functions comes from the analogy between (analytically continued) QFT and statistical mechanics, and the Wightman functions can also be described in a similar analogy, though noncommutativity complicates matters. The main formal reason for this is that a Wick theorem holds both in the commutative and the noncommutative case. For rigorous quantum field theory one essentially avoids the path integral, because it is difficult to give it a rigorous meaning when the action is not quadratic. Instead, one only keeps the notion that an integral is a linear functional, and constructs rigorously useful linear functionals on the relevant algebras of functions or operators. In particular, one can define Gaussian functionals (e.g., using the Wick theorem as a definition, or via coherent states); these correspond exactly to path integrals with a quadratic action. 
If one looks at a Gaussian functional as a functional on the algebra of fields appearing in the action (without derivatives of fields), one gets - after time-ordering the fields - the traditional path integral view and the time-ordered correlation functions. If one looks at it as a functional on the bigger algebra of fields and their derivatives, one gets - after rewriting the fields in terms of creation and annihilation operators - the canonical quantum field theory view with Wightman functions. The algebra is generated by the operators a(f) and a^*(f), where f has compact support, but normally ordered expressions of the form S = integral dx : L(Phi(x), Nabla Phi(x)) : make sense weakly (i.e., as quadratic forms). The art and difficulty is to find well-defined functionals that formally match the properties of the functionals 'defined' loosely in terms of path integrals. This requires a lot of functional analysis, and has been successfully done only in dimensions d<4. For an overview, see: A.S. Wightman, Hilbert's sixth problem: Mathematical treatment of the axioms of physics, in: Mathematical Developments Arising From Hilbert Problems, edited by F. Browder, (American Mathematical Society, Providence, R.I.) 1976, pp.147-240. --------------------------------------------------------- S6d. Is there a rigorous interacting QFT in 4 dimensions? --------------------------------------------------------- The Wightman axioms and the Osterwalder-Schrader axioms [see, e.g., math-ph/0001010 or the book by Glimm and Jaffe] are currently the basis on which rigorous quantum field theory (at least for massive particles) is discussed. In spite of many attempts (and though numerous uncontrolled approximations are routinely computed), no one has so far succeeded in rigorously constructing a single QFT in 4D which has nontrivial scattering. 
Not even QED is a mathematical object, although it is the theory that was able to reproduce experiments (anomalous magnetic moment of the electron; see the entry ``Is QED consistent?'' in this FAQ) with an accuracy of 1 in 10^12. But till today no one knows how to formulate the theory in such a way that the relevant objects whose approximations are calculated and compared with experiment are logically well-defined. See, e.g., the s.p.r. threads http://groups.google.com/groups?q=Unsolved+problems+in+QED http://groups.google.com/groups?q=What+is+well-defined+in+QED This probably explains the high price tag of 1,000,000 US dollars, promised for a solution to one of the Clay millennium problems, which asks to find a valid construction for d=4 quantum Yang-Mills theories that is strong enough to prove correlation inequalities corresponding to the existence of a mass gap. The problem is to explain rigorously why the mass spectrum for compact Yang-Mills QFT begins at a positive mass, while the classical version has a continuous spectrum beginning at 0. The mass gap is a property of the theory, not of a wave function. Intuitively, it means that, in the rest frame of the total system, the ground state (=vacuum) is an isolated eigenstate of the Hamiltonian H, i.e., that the spectrum of H is a subset of {0} union [E_1,inf). The largest E_1 with this property defines the mass gap m_1=E_1/c^2. This would make proper sense for a nonrelativistic theory. For a relativistic theory one has to read between the lines and interpret everything in terms of suitable analogies, for lack of a consistent mathematical theory. The millennium problem essentially asks for a rigorous mathematical setting in which the above can be made precise and proved. The real problem is the rigorous construction of a Hilbert space with a unitary representation of the Poincare group, such that a perturbation argument recovers the traditional renormalized order by order approximation of quantum field theory.
The state of the art at the time the problem was crowned by a prize is given in www.claymath.org/Millennium_Prize_Problems/Yang-Mills_Theory/_objects/Official_Problem_Description.pdf and the references quoted there. See also http://www.claymath.org/millennium/Yang-Mills_Theory/ym2.pdf I don't think significant progress has been published since then. (The paper hep-th/0511173 which claims to have solved the problem only consists of a bunch of heuristic arguments. That the author calls it a proof doesn't turn it into a mathematical proof.) Yang-Mills theories are (perhaps erroneously) believed to be the simplest (hopefully) tractable case, being asymptotically complete while not having the extra difficulties associated with matter fields. (There are only gluons, no quarks or leptons.) Of course, one would like to show rigorously that QED is consistent. But QED has certain problems (the Landau pole, see below) that are absent in so-called asymptotically free theories, of which Yang-Mills is the simplest. Note that rigorous interacting relativistic theories in 2D and 3D exist; see, e.g., J. Glimm and A Jaffe, Quantum Physics: A Functional Integral Point of View, Springer, Berlin 1987. This book is quite difficult on first reading. Volume 3 of Thirring's Course in Mathematical Physics (which only deals with nonrelativistic QM but in a reasonably rigorous way) might be a good preparation to the functional analysis needed. A more leisurely introduction of the physical side of the matter is in Elcio Abdalla, M. Christina Abdalla, Klaus D. Rothe Non-Perturbative Methods in 2 Dimensional Quantum Field Theory World Scientific, 1991, revised 2nd. ed. 2001. http://www.wspc.com/books/physics/4678.html The book is about rigorous results, with a focus on solvable models. Note that 'solvable' means in this context 'being able to find a closed analytic expression for all S-matrix elements'. These solvable models are to QFT what the hydrogen atom is to quantum mechanics. 
The helium atom is no longer 'solvable' in the present sense, though of course very accurate approximate calculations are possible. Unfortunately, solvable models appear to be restricted to 2 dimensions. The deeper reason for the observation that dimension d=2 is special seems to be that in 2D the light cone is just a pair of lines. Thus space and time look completely alike, and by a change of variables (light front quantization), one can disentangle things nicely and find a good Hamiltonian description. This is no longer the case in higher dimensions. (But 4D light front quantization, using a tangent plane to the light cone, is well alive as an approximate technique, e.g., to get numerical results from QCD.) Thus, while 2D solvable models pave the way to get some rigorous understanding of the concepts, they are no substitute for the functional analytic techniques needed to handle the non-solvable models such as Phi^4 theory. ------------------------------ S6e. Constructive field theory ------------------------------ Rigorously defined Lorentz-covariant quantum field theories are known to exist in 2 and 3 dimensions; the standard reference (for d=2) is the book by J. Glimm and A. Jaffe, Quantum physics. A functional integral point of view, New York, 1981. A recent review of the achievements of constructive quantum field theory in dimensions < 4 is V. Rivasseau, Constructive Field Theory and Applications: Perspectives and Open Problems, J. Math. Phys. 41 (2000), 3764-3775. http://lanl.arxiv.org/pdf/math-ph/0006017 The case d=4 is a famous unsolved problem; the special case of 4D quantum Yang-Mills gauge theory with a compact simple, nonabelian gauge group is one of the Clay Millennium problems with a 1 million dollar prize attached to its solution. Let me explain some aspects of the construction given in Glimm and Jaffe. First one needs to understand that the construction breaks the Lorentz symmetry.
This is (although they don't draw this connection) because in irreducible Poincare representations, one can construct only three commuting coordinates, and their construction is observer-dependent, i.e., dependent on singling out a preferred time. Of course, the final theory is again Lorentz invariant. To motivate the construction, one therefore needs to choose a time coordinate; then one makes an analytic continuation to Euclidean time (i.e., i*t in place of t), and shows that one gets an SO(4) symmetric field theory in place of the Lorentz symmetry. The advantage gained is that the functional calculus over a space with definite metric is well-defined mathematically (via a limit approach through lattices, or via Wiener measures) - this is just classical stochastic calculus. Conversely, and this is the constructive part, given an SO(4) symmetric field theory, one can choose a direction as Euclidean time and obtain (via a fairly simple construction detailed in Chapter 7) within that theory a well-defined Hamiltonian on a suitably constructed Hilbert space of 3-dimensional fields. This Hamiltonian defines a time evolution as in ordinary quantum mechanics. The nontrivial part (which is the Osterwalder-Schrader reconstruction theorem stated in Chapter 7 but proved much later in the book - the forward references in Glimm and Jaffe are, unfortunately, quite confusing) is to show that the resulting theory is Lorentz invariant. Thus the construction reduces to constructing the Euclidean field theory. This is done via a lattice regularization; indeed, all lattice field theory and computation is based on the Euclidean formulation rather than the Minkowski formulation. In 2D and 3D, the existing analytic error estimation techniques are sufficient to prove the existence of the limit with suitably renormalized operators. In 4D, there are additional technical problems that have not been overcome so far. But neither has it been proved that any of the 4D field theories cannot exist.
There are some informal arguments suggesting this or that, but none of them is conclusive in the sense of having paved the way towards a construction or a no-go theorem. -------------------------------------------- S6f. The classical limit in relativistic QFT -------------------------------------------- The classical limit of a quantum field theory is the theory defined by taking the Lagrangian occurring in the functional formalism and making the corresponding action stationary. Note that a functional integral is an integral in which all fields have classical meaning. The quantum interpretation comes from taking the functional integral as a generating functional for S-matrix elements, while the classical interpretation comes from taking a saddle point approximation. Since the k loop contributions scale with hbar^k, they disappear in the classical limit hbar to 0, so only the tree diagrams are left in the expansion, which correspond to the saddle point approximation in the functional integral. This needs a slight qualification for fermions, e.g., electrons. A fermion field Psi(x) itself, being an anticommuting field, has no direct classical meaning, but has the numerical advantage that it is a field in 3 instead of 6 variables. Products of two Psi terms commute with each other, hence have a direct classical interpretation. Indeed, classically there is an electron density field W(x,p) given by the Wigner transform of Psi(x)Psi(y)^*, where Psi(x) is the classical Grassmann field occurring in the Lagrangian, satisfying a Dirac equation with an electromagnetic interaction added. This field W(x,p) is measurable and plays a role in semiconductor modeling. (In the definition of the Wigner transform, a second hbar appears, a remnant of second quantization. If one moves this to zero, too, the description in terms of Psi is no longer possible, and one gets instead a Vlasov equation for W.)
Thus the classical limit of the standard model is a mathematically well-defined theory, while the quantum version is only perturbatively defined, which means that it is mathematically undefined - even for QED. Nevertheless, the renormalization prescription makes at least the coefficients of the asymptotic series in hbar well-defined, which is what particle physicists use to extract approximate physical information. In this relaxed sense, the quantum standard model is also well-defined. ----------------------------------- S6g. What are interpolating fields? ----------------------------------- Traditional QFT has rules for computing reasonable approximations to the S-matrix of a field theory. The S-matrix describes the behavior of a state of the system under a transition from time t=-inf to time t=+inf. But in a complete dynamical theory, one would like to be able to know what happens in-between at finite times. In nonrelativistic QM, this information is given by the Schroedinger equation. In QFT it is given by the interpolating field - called interpolating since it interpolates between the infinite limiting times. More precisely, the dynamical information about the interpolating field is represented mathematically in the Wightman functions, which give the (renormalized) vacuum expectations of field products at arbitrary combinations of space-time points. Unfortunately, no one knows how to compute the latter in relativistic 4D quantum field theories. However, Wightman functions have been constructed rigorously in lower dimensions (more precisely in certain superrenormalizable theories in 2 and 3 dimensions). ----------------------------------------------------------------------- S6h.
Hilbert space and Hamiltonian in relativistic quantum field theory ----------------------------------------------------------------------- Most of current quantum field theory (i.e., everything with the exception of 2D and 3D constructive field theory - which doesn't even cover QED) does not have a well-defined Hilbert space at all, in which a time operator would be defined. Well-defined are only the asymptotic Hilbert spaces of in and out states for scattering experiments. These are Fock spaces of free particles, and hence defined on a mass shell. There is a basic result called Haag's theorem which states that these asymptotic Fock spaces cannot carry a nontrivial local dynamics, as would be required for a field theory. The full dynamics can be defined only indirectly, via CTP (closed time path) integration, and subject to all interpretation problems of the renormalization procedures. Constructing for a relativistic field theory a physical Hamiltonian which is bounded below is really difficult, and has been achieved only in theories in fewer than 4 dimensions. The construction is usually based on a preferred time coordinate, which is needed in all cases I am familiar with: - in the Foldy-Wouthuysen transformation (for the Dirac equation, where p_0 also fails to have the right properties), - in the Newton-Wigner construction (for single particles in an arbitrary massive irreducible representation of the Poincare group) and - in the Osterwalder-Schrader reconstruction theorem (for Lorentz-invariant field theories from Euclidean field theories). While the Hilbert space and the Hamiltonian depend on the choice of the time coordinate, the physics is independent of it since all these Hilbert spaces are isomorphic via isomorphisms that map the Hamiltonians into each other. --------------------------------------- S6i.
2-dimensional quantum field theory --------------------------------------- Much of the state of the art in 2-dimensional relativistic quantum field theories is covered in two books, Elcio Abdalla, M. Christina Abdalla, Klaus D. Rothe, Non-Perturbative Methods in 2 Dimensional Quantum Field Theory, World Scientific, 1991, revised 2nd. ed. 2001. and J. Glimm and A. Jaffe, Quantum Physics: A Functional Integral Point of View, Springer, Berlin 1987. The first book treats exactly solvable theories, the second book treats general polynomial interactions. The methods are completely different in the two cases, and the two books are essentially disjoint. Unfortunately, both books are somewhat difficult to read. Abdalla et al. treat those (very special) 2-dimensional quantum field theories having closed analytic expressions for all S-matrix elements. These solvable models are to 2-dimensional quantum field theory what the hydrogen atom is to quantum mechanics. Their book gives lots of details about many solvable models, but I found it too specialized to give me a feeling of general 2-dimensional quantum field theory. Glimm and Jaffe assume a lot of measure theory and functional analysis. This is summarized in Appendix A of their Part I, but working first through Volume 3 of Thirring's Course in Mathematical Physics (which only deals with nonrelativistic QM but in a reasonably rigorous way) would be a good preparation for tackling Glimm and Jaffe. They construct - rigorously - for 2-dimensional relativistic Lagrangian scalar field theories with polynomial interaction a Hilbert space, a well-defined Hamiltonian, a well-defined unitary dynamics, with well-defined bound states that are eigenstates of the Hamiltonian, and everything is invariant under the 2D Poincare group ISO(1,1). Chapter 3 defines a rigorous version of the path integral for ordinary quantum mechanics, or rather for the Euclidean version of it, with the i in the Schroedinger equation dropped.
This amounts to analytic continuation to imaginary time, where everything is easy and respectable. In place of a hyperbolic differential equation one gets a parabolic one (the heat equation), which makes things tractable since the heat kernel is positive and hence the measures needed to make the path integral rigorous are positive Wiener measures, with a good rigorous theory. Quantum field theory starts in Chapter 6. It is presented in a Euclidean and a Minkowski version, the former being an analytic continuation of the latter. Both versions are defined axiomatically, by the Osterwalder-Schrader axioms and the Wightman axioms, respectively. Again, the Euclidean version is the tractable one, in which one can generalize the path integral and perform the estimates needed for proving the existence of all the tools. The Osterwalder-Schrader theory then guarantees that, given the satisfaction of the Euclidean axioms, analytic continuation to the Minkowski case is indeed possible. This is outlined in Section 6.1; the remainder of the chapter discusses the (easy) special case of free fields. Chapters 7-12 and 19 then define the machinery needed to show how to satisfy the axioms in the case of 2-dimensional relativistic Lagrangian scalar field theories with polynomial interaction. Chapter 7 discusses the Gaussian measures that define the Euclidean path integral of free fields, Chapter 8 presents a rigorous theory of perturbation theory for Euclidean path integrals, and the remaining chapters mentioned provide the estimates needed to make sure that everything works. -------------------------- S7a. What is the mass gap? -------------------------- In a relativistic theory, whenever there is a state with definite 4-momentum p, there is also one with definite momentum p' = Lambda p obtained by applying a Lorentz transformation Lambda. The orbit of 4-momenta obtained in this way forms a hyperboloid in the future cone (because of causality), characterized by a mass m>=0.
   p^2=m^2, p_0>0.
This includes as a limiting case massless states with m=0, where the orbit consists of the future light cone with 0 excluded. Therefore the possible values of p are characterized by the possible values of m, which defines the mass spectrum of the theory. The mass spectrum is the relativistic analogue of the energy spectrum of the Hamiltonian in a nonrelativistic theory, shifted such that the ground state has E=0. The only state with zero 4-momentum is the ground state, usually called the vacuum. If the values of p^2 for the realizable nonzero p are bounded below by a positive number, the theory is said to have a mass gap. The largest value of m>0 for which m^2 is such a lower bound defines the precise value of the mass gap. Usually there is a state for which p^2=m^2; this is then interpreted as the state of a single 'dressed' particle.
In general, the mass spectrum consists of a discrete and a continuous part. The discrete part of the spectrum corresponds to bound states, the continuous part to scattering states. The continuous spectrum starts when there is the possibility of scattering, which means that the energy is large enough that two asymptotically independent systems can exist. Given a state of mass m, one expects to have states with two almost independent systems of mass m and an arbitrary relative momentum, giving a continuous spectrum of scattering states with all possible values p^2 >= (2m)^2, as a simple calculation reveals: If p is the sum of two timelike vectors p1,p2 of mass m then
   p^2 = (sqrt(\p1^2+m^2) + sqrt(\p2^2+m^2))^2 - (\p1+\p2)^2
       = 2m^2 + 2 sqrt((\p1^2+m^2)(\p2^2+m^2)) - 2 \p1 dot \p2.
By making \p2=-\p1 with \p1 large, one gets arbitrarily large values of p^2, hence part of the continuous spectrum. By the Cauchy-Schwarz inequality, the minimum of p^2 occurs for \p2=\p1, and is then (2m)^2, independent of the spatial momentum. Thus the continuous spectrum extends from mass 2m to infinity, where m is the mass gap.
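The threshold calculation above can be checked numerically. The following Python sketch is only an illustration under stated assumptions: units with c=1, mass m=1, and the function and variable names are chosen here for the example and do not come from the FAQ. It evaluates p^2 = p_0^2 - \p^2 for the sum of two on-shell momenta, once for equal spatial momenta (the threshold case), once back to back, and for a random sample.

```python
import math
import random

def invariant_mass_squared(p1, p2, m):
    """Return p^2 = p_0^2 - \\p^2 for p = p1 + p2 (units with c = 1).

    p1 and p2 are the spatial parts of two on-shell 4-momenta of mass m.
    """
    e1 = math.sqrt(sum(x * x for x in p1) + m * m)  # energy of particle 1
    e2 = math.sqrt(sum(x * x for x in p2) + m * m)  # energy of particle 2
    total_momentum_sq = sum((a + b) ** 2 for a, b in zip(p1, p2))
    return (e1 + e2) ** 2 - total_momentum_sq

m = 1.0  # assumed mass, arbitrary units

# Equal spatial momenta: the minimum (2m)^2, independent of the momentum.
threshold = invariant_mass_squared((3.0, -1.0, 2.0), (3.0, -1.0, 2.0), m)

# Back-to-back momenta: p^2 = 4 m^2 + 4 \p1^2 grows without bound.
back_to_back = invariant_mass_squared((3.0, 0.0, 0.0), (-3.0, 0.0, 0.0), m)

# Random pairs never fall below the threshold.
rng = random.Random(0)
samples = [
    invariant_mass_squared(
        [rng.uniform(-5.0, 5.0) for _ in range(3)],
        [rng.uniform(-5.0, 5.0) for _ in range(3)],
        m,
    )
    for _ in range(10000)
]
min_sample = min(samples)

print(threshold, back_to_back, min_sample)
```

Up to rounding, the first value equals (2m)^2 = 4 and the second equals 4m^2 + 4*9 = 40, while the sampled minimum stays at or above 4.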
There may be bound states with mass m_b<2m, forming the discrete spectrum. These are not scattering states, hence not obtained by simply adding momenta. For bound states of k particles with masses m_1,...,m_k, one needs to subtract from (m_1+...+m_k)c^2 the binding energy of the bound particles. There might be bound states with mass m_b>2m embedded in the continuous spectrum, but these are possible only if there are selection rules that forbid the decay into particles with smaller mass. In particular, the state of minimal mass m, if it exists, is always a bound state (including the case of a single particle). If there is no mass gap, one expects massless dressed particles to be present. This corresponds to the limiting case m --> 0 of the above discussion. ------------------------------------------------------- S7b. Why can a bound state of massless quarks be heavy? ------------------------------------------------------- A system has a well-defined mass if it is in an eigenstate of p^2, where p is the total momentum operator (whatever this is; relativistically, bound states are very poorly understood). So to understand, view it from a nonrelativistic perspective. Because of E=mc^2, the mass shows up as energy, i.e., as an eigenvalue of the Hamiltonian. Now a bound state at rest defines the rest energy, and by giving it uniform motion one can increase the energy by an arbitrary amount of kinetic energy. The rest energy (and hence the rest mass), on the other hand, is determined by the discrete spectrum of the Hamiltonian in reduced coordinates, i.e., with center of mass motion separated out. For forces that decay with distance, a bound state necessarily has a mass that is less than the sum of the masses of the constituents. For particles involving quarks, this does not apply since the strong force increases with distance. Hence the rest mass of a bound state of quarks could be anything. ------------------------------------------------------ S7c.
Bound states in relativistic quantum field theory ------------------------------------------------------ Bound states are supposed to be poles of the S-matrix, and Bethe-Salpeter equations for the bound state dynamics can be obtained approximately from resumming infinite families of Feynman diagrams. See Chapter 14 of Weinberg's QFT I. But... Perturbative QED (even in Scharf's rigorous treatment) has nothing at all to say about how to model bound states. Bound states don't exist perturbatively: The poles in the S-matrix can arise only by summing infinitely many Feynman diagrams. (Sum the geometric series 1+x+x^2+... to see how poles arise by summation.) I haven't seen a single rigorous treatment of such an issue in quantum field theory. Weinberg states in his QFT book (Vol. I) repeatedly that bound state problems (and this includes the Lamb shift) are still very poorly understood (though the Lamb shift is one of the most accurately predicted physical quantities). On p.564 he says, 'These problems are those involving bound states [...] such problems necessarily involve a breakdown of ordinary perturbation theory. [...] The pole therefore can only arise from a divergence of the sum of all diagrams [...]' On p.560, he writes, 'It must be said that the theory of relativistic effects and radiative corrections in bound states is not yet in an entirely satisfactory shape.' This remark suggests that he thinks that, in contrast, for scattering problems, the theory is in an entirely satisfactory state, as given in the rest of his book. Thus 'satisfactory' does not mean 'mathematically rigorous', but only 'well understood from a physical, approximate point of view'. There are, of course, methods for approximating bound state problems, based on Bethe-Salpeter equations, Schwinger-Dyson equations, and some other approaches. See, e.g., the review H. Grotch and D.A. Owen, Foundations of Physics 32 (2002), 1419-1457. or hep-ph/0308280.
But all of this is done in completely uncontrolled approximations, and to get numerically consistent results is currently more an art than a science. This leaves plenty of scope for interesting (but hard) new work on bound states on both the physical and mathematical side. ------------------------- S8a. Why renormalization? ------------------------- Quantum field theory is what particle physicists define it to be, and this includes many working interacting QFTs. But it is not a theory in the mathematical sense. This is due to the freedom they take when discussing the renormalization needed to remove formal infinities from their theories. Finite renormalization just refers to the fact that the coefficients in a Hamiltonian are not directly measurable but only computable as a function of some key observables. It is simply a consequence of the historical accident that these coefficients were given names (masses, charges) that sound like real properties, while they are in fact only indirectly related to them. Thus in solid state physics one gets bare masses of quasiparticles from the coefficients of a Hamiltonian, but they are just parameters and related to the measurable masses by some transformation, which is dubbed the finite renormalization. Infinite renormalization is needed in ordinary QM when the potential gets too singular, for example with delta-function potentials that model contact interactions. This is hardly ever discussed in textbooks but is important for understanding. See, e.g., hep-th/9710061, or Chapter I.3 in R. Jackiw, Diverse topics in theoretical and mathematical physics, World Scientific, Singapore 1995. A paper by Dimock (Comm. Math. Phys. 57 (1977), 51-66) shows rigorously that, at least in 2 dimensions, delta-function potentials define the correct nonrelativistic limit of local scalar field theories.
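As a hedged illustration of infinite renormalization in ordinary QM, consider the attractive delta-function potential in 2 space dimensions, H = \p^2/(2m) - g delta(x), in the standard textbook treatment with a momentum cutoff Lambda and hbar=1 (an assumption of this sketch; it is not a formula quoted from the FAQ). The bound state condition is 1/g = (m/(2 pi)) log(1 + Lambda^2/(2 m E_b)), where E_b>0 is the binding energy. The Python sketch below, with function names invented for the example, shows that at fixed bare coupling g the binding energy grows without bound with the cutoff, while keeping E_b fixed forces the bare coupling g(Lambda) to run (here towards 0, so that 1/g diverges logarithmically).

```python
import math

def bare_coupling(cutoff, binding_energy, mass=1.0):
    """Bare coupling g(Lambda) that keeps the binding energy fixed (hbar = 1).

    Solves 1/g = (m / (2 pi)) * log(1 + Lambda^2 / (2 m E_b)) for g.
    """
    x = cutoff ** 2 / (2.0 * mass * binding_energy)
    return 2.0 * math.pi / (mass * math.log1p(x))

def binding_energy_at_fixed_coupling(cutoff, coupling, mass=1.0):
    """Binding energy E_b implied by a given bare coupling at cutoff Lambda.

    Inverse of the relation used in bare_coupling.
    """
    return cutoff ** 2 / (2.0 * mass * math.expm1(2.0 * math.pi / (mass * coupling)))

cutoffs = [1e1, 1e2, 1e3, 1e4]
physical_binding_energy = 1.0  # the measured quantity held fixed (assumed value)

# Bare coupling tuned so that the physical binding energy stays fixed.
couplings = [bare_coupling(c, physical_binding_energy) for c in cutoffs]

# Binding energy at a fixed bare coupling g = 1: grows like Lambda^2.
energies_fixed_g = [binding_energy_at_fixed_coupling(c, 1.0) for c in cutoffs]

# Consistency check: the tuned coupling reproduces the physical input.
round_trip = [
    binding_energy_at_fixed_coupling(c, g) for c, g in zip(cutoffs, couplings)
]

for c, g, e in zip(cutoffs, couplings, energies_fixed_g):
    print(f"Lambda={c:8.0f}  g(Lambda)={g:.4f}  E_b at g=1: {e:.4e}")
```

The bare coupling has no limit with physical meaning, but it is never needed: only the relation between measurable quantities survives when the cutoff is removed.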
In mathematical terms, infinite renormalization means that the interaction is a limit of regularized interactions related to fixed measurable quantities by finite transformations which, however, diverge when the regularization is removed. The limiting interaction remains, however, well-defined as a densely defined operator in Hilbert space. For exactly the same reason it is needed in relativistic QFT, since local fields imply singular interactions. But in 4 dimensions, the limiting process is not well understood mathematically. In 1+1 dimensions, everything is well-defined mathematically in terms of rigorous renormalization theory, for arbitrary polynomial interactions. (See the book by Glimm and Jaffe). The 1+2-dimensional case is significantly more difficult and needs a restriction on the polynomial degree. There is a nontrivial renormalization theory for Phi^4 theory, which is mathematically well-understood. Only the 1+3 dimensional case is at present completely open. What is loosely called 'infinite' in traditional discussions of renormalization means, strictly speaking, only that the limit where a cutoff goes to infinity does not exist. At any finite value of the cutoff, both the Hamiltonian and the counterterms are finite. If it were not so, one couldn't do renormalization and get something finite. The problem solved by Tomonaga, Schwinger and Feynman, for which they got the Nobel prize, was that they discovered how to produce a well-defined limiting theory for cutoff to infinity which allows to extract finite values for quantities that can be compared with experiment. All renormalization until today follows the same pattern. One does certain formal computations at finite cutoff and at some point where it no longer harms moves the cutoff to infinity, being left with approximate formulas at some (fixed or variable) loop order which no longer contain a cutoff and have finite values. ----------------------------------------- S8b. 
Renormalization without infinities I -----------------------------------------
Renormalization in QFT is often associated with the need to handle infinities. This makes everything look like nonsense from a mathematical point of view. But this is just the sloppiness of physicists; it is not difficult to get a satisfying view of renormalization without encountering any weird infinities. The basic principles can be explained without knowing anything about quantum mechanics, since renormalization is a much more general phenomenon associated with idealizations in a theory and the corresponding limits. As such it is also needed in various classical situations (classical point electrons, turbulence, etc.). hep-th/0212049 is a nice paper discussing most of renormalization without ever mentioning fields (which come in quite late).
In all cases, we want to describe a situation which is a limit of more complex and often less symmetric situations. This limit is the only problematic thing, and sometimes generates infinities if done in an improper way. This is just as when trying to compute
   s_N = sum_{k=0}^N (-1)^k/(k+1)^s = u_N - v_N
by summing the even and odd contributions u_N and v_N separately. The limit N to inf is well-defined for s>0, but can be obtained only for s>1 by going to the limit in u_N and v_N separately. One needs to proceed similarly to the techniques for evaluating limits which naively give inf-inf, by using some transformation that cancels the infinities analytically. Example:
   lim sqrt(n^2+n)-sqrt(n^2+1)
      = lim ((n^2+n)-(n^2+1))/(sqrt(n^2+n)+sqrt(n^2+1))
      = lim (n-1)/(sqrt(n^2+n)+sqrt(n^2+1)) = 1/2.
In quantum physics, the data (the Hamiltonian in QM, the action in QFT) depends on some parameter vector v of dimension d, say, without direct physical meaning. For example, v may consist of bare mass, bare charge, and bare coupling constant.
Without the renormalization conditions we get a family of solutions parameterized by v, from which we can compute measurable quantities combined into a vector q=q_N(v) of some dimension e>d, where N is the parameter in which we want to take the limit. (N might be an energy cutoff at energies beyond observability, and q the observed particle spectrum.) Anything we can reliably measure must clearly be essentially independent of N, once N is large enough. Therefore the equation q=q_N(v) defines a (generically) d-dimensional manifold in R^e whose limit as a set is also a well-defined d-dimensional manifold. This is the manifold of interest, since picking a particular finite value for N is usually subjective. In a theory with finite renormalization, this limit manifold can still be parameterized by v, since the limit q(v)= lim_{N to inf} q_N(v) (*) exists. Although v is unobservable it can be calculated from the measurements by solving the equation q=q(v) in the least squares sense. Rather than doing that (which would be numerically best in case the measurements are inexact or q(v) is not exactly known) one proceeds in theoretical work as if a d-dimensional vector mu of key physical data and a corresponding subset of d equations were known exactly, which can then be solved exactly for v=v(mu). Then one gets a renormalized parameterization q=q_ren(mu), with q_ren(mu)= q(v(mu)), (**) expressing everything in terms of the physical parameters mu. When the limit (*) does not exist, the situation is more complicated. Since there is no limiting q, one has to work at finite N. Proceeding as before, one solves d of the equations in q=q_N(v) for v, getting v=v_N(mu), but since the limit (*) does not exist, there will also be no limit v(mu) = lim_{N to inf} v_N(mu) which would enable the use of (**). Instead, v_N(mu) diverges. Loosely speaking, we get infinite bare masses and bare coupling constants. But this limit will never be used, hence there are no problems.
It is just the loose way of speaking that creates the impression of weirdness. The 'infinities' are caused by the nature of the interactions. If they are too singular for a standard treatment then the limits needed for a finite renormalization simply do not exist anymore. But this does not mean that the theory becomes meaningless but only that one has to be careful to perform the limit only where this is allowed. This requires a small change in our procedure. At finite N, we can still define a renormalized parameterization q = q_{N,ren}(mu), with q_{N,ren}(mu)= q_N(v_N(mu)). For a renormalizable theory, the limit q_ren(mu) = lim_{N to inf} q_{N,ren}(mu) exists although neither q_N nor v_N converge. Once this limit replaces the naive bare recipe (*)-(**) which is ill-defined, everything behaves properly as it should. The situation may be slightly more complex than indicated above. Instead of working with directly measurable quantities one often works with formally more tractable quantities q that are finitely related to the key measurable quantities mu (such as observed mass spectra). However, their definition depends on an additional scale parameter E that fixes the renormalization conditions. (This parameter should not be mixed up with the cutoff energy, which after renormalization is always infinite!) Thus we actually have q=q_N(v,E), solve some of these equations for v=v_N(mu,E), and get as a result q = q_{N,ren}(mu,E), with q_{N,ren}(mu,E)= q_N(v_N(mu,E),E), hence q_ren(mu,E) = lim_{N to inf} q_{N,ren}(mu,E). But since the scale E can be chosen arbitrarily, the final renormalized result of physical predictions P(q,E) must be independent of E. Thus, d/dE P(q_ren(mu,E),E) = 0, which is a form of the renormalization group equations. To get a renormalized Hamiltonian, one also needs wave function renormalization, which means using a cutoff-dependent inner product in the space of wave functions (in the functional Schr"odinger picture).
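The recipe q_{N,ren}(mu) = q_N(v_N(mu)) can be made concrete with a deliberately artificial example. The following Python sketch is an assumption made purely for illustration (the functions q_N, v_N and the choice d=1, e=2 are hypothetical and model no physical theory): at fixed bare parameter v the 'observables' q_N(v) diverge with N, the bare parameter v_N(mu) needed to reproduce a fixed measurement mu diverges too, but the renormalized parameterization has a perfectly good limit.

```python
import math

def q_N(v, N):
    """Toy 'observables' (e=2) from one bare parameter v (d=1) at cutoff N.
    Hypothetical model: at fixed v both entries diverge as N grows."""
    x = v - math.log(N)
    return (x, x * x + 1.0 / N)

def v_N(mu, N):
    """Solve the first equation q_1 = mu for the bare parameter v."""
    return mu + math.log(N)

def q_N_ren(mu, N):
    """Renormalized parameterization q_{N,ren}(mu) = q_N(v_N(mu))."""
    return q_N(v_N(mu, N), N)

if __name__ == "__main__":
    mu = 0.3  # illustrative 'measured' value
    for N in (10**2, 10**4, 10**8):
        print(N, q_N(1.0, N), v_N(mu, N), q_N_ren(mu, N))
```

In this toy model the limit of q_{N,ren}(mu) is (mu, mu^2), a one-dimensional manifold in R^2 parameterized by the physical quantity mu, although no limit of v_N(mu) exists.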
The limiting Hamiltonian is perturbatively well-defined in the physical Hilbert space obtained as limit of renormalized Hilbert spaces at finite cutoff, as the cutoff goes to infinity. ------------------------------------------ S8c. Renormalization without infinities II ------------------------------------------ In bare (divergent) QFT, infinities arise because integrals taken over unbounded momenta don't exist; so working with them leads to nonsense. Instead, proper QFT takes regularized integrals, for example by adding an explicit cutoff Lambda. This simply means that everything is calculated with an action that depends on Lambda as an additional parameter. Once this is done, everything is finite, but Lambda-dependent. The only problem with that is that the cutoff destroys Lorentz covariance - apart from that it would be a completely respectable field theory in itself. Now Lorentz invariance is violated only at energies >O(Lambda); hence to have the theory conform to physics that can be checked it suffices to take Lambda large. But for aesthetic reasons or since we believe that symmetries are fundamental, we want to have fully invariant theories. This requires that we let Lambda go to infinity. But in order that the results have a finite limit we must at the same time make the coupling constants g dependent on Lambda. If this is done in a correct way (and the textbooks on QFT teach one or more of the known correct ways under the heading of 'renormalization'), one encounters no infinities at all in the whole process. Thus renormalized quantities are never infinite. The essentials of the renormalization process, namely the need for Lambda-dependent coupling constants for sufficiently singular Hamiltonians, can be understood nonperturbatively on the nonrelativistic level. What happens is that one has a family of Hamiltonians H(Lambda,g) that depend on a scale parameter Lambda and a coupling constant g (or several).
H(Lambda,g) has a good limit H(g) as Lambda to inf, with g fixed, but the corresponding limit of the resolvent G(Lambda,g) does not exist; hence if one tries to do calculations with H(g) directly (the 1930 way of doing things, which was a dead end), one gets infinities all over the place. On the other hand, if one chooses a good parameterization g(Lambda,mu) then, although H(Lambda,g(Lambda,mu)) has no longer a good limit as Lambda to inf, its resolvent G(Lambda,g(Lambda,mu)) has a well-defined limit G(mu). (At least in 1D and 2D field theory, where this can be proved in certain cases. In 3D and 4D, one probably needs also a Lambda-dependent inner product defining the Hilbert space to ensure that one ends up in the right representation, and Lambda-dependent wave functions to ensure that the limiting renormalized wave functions remain bounded in the limiting renormalized inner product.) Since all dynamical information including scattering information is in the resolvent, G(mu) defines a good physical model for a scattering process. In some simple cases, renormalization can be done nonperturbatively. For example, standard perturbation theory for a Hamiltonian p^2/2m +g delta(x) produces infinities. The renormalization of this particular example is treated nonperturbatively in hep-th/9305052. Thus, infinities only appear if one takes the limit in a way it cannot be taken consistently. Of course, the relativistic case is more involved and at present not understood nonperturbatively, but there is no difference in principle. The local interaction of the formal Lorentz invariant action is replaced by a nonlocal interaction depending on the UV cutoff Lambda. Thus one has V(g,Lambda) in place of V(g), where g are the coupling constants (including masses). To do so, one writes the (Euclidean = Wick rotated) field as Phi(x) = integral dp exp(-i p dot x) Phihat(p) and substitutes it into the action. This gives an action in the momentum representation. 
Then one regularizes the interaction term by throwing away the momenta above some cutoff Lambda. Introducing the cutoff makes the interaction nonlocal, as one can see by going from the momentum representation of the regularized interaction term back to the position representation by substituting Phihat(p) = const * integral dx exp(i p dot x) Phi(x). Instead of the delta functions which would appear without the cutoff there are now explicit nonlocal potential terms. (Note that Coulomb interaction in nonrelativistic QFT is nonlocal. See also H. Ekstein, Phys. Rev. 117, 1590-1595 (1960) for more on nonlocal interactions and relations to the S-matrix.) (But actually one does not need to care about locality or not, since the regularized interaction in the momentum representation is mathematically ok and one can do everything else in this representation.) More precisely, one starts with the smeared Lagrangian interaction defined by the cutoff, uses the representation of the S-matrix as a time-ordered exponential to work out the corresponding Hamiltonian interaction in the interaction picture, and takes this as definition of the regularized dynamics. (Note that Haag's theorem, which asserts that a nontrivial Lorentz-invariant theory satisfying microlocality cannot have an interaction picture, does not apply since the theory with cutoff is neither Lorentz invariant nor microlocal.) From here on, one can do standard perturbation theory without encountering any infinity at all; one gets meaningful formulas throughout the whole renormalization procedure. All contributions to the S-matrix elements of this regularized theory are finite, and give (after analytic continuation to real time) the S-matrix of the regularized interaction. The result is an asymptotic series S(g,Lambda) for the S-matrix of the regularized interaction, with finite, computable coefficients. 
This S-matrix is unitary and has all properties one would like to have, except that, because of the cutoff, it is only approximately Lorentz invariant. Of course, for a general nonlocal theory in position representation, one gets more complicated Feynman rules than those traditionally written down. In momentum space, the formulas become the standard formulas, but with explicit cutoff included. Thus to do the suggested exercise, one should always work in the momentum representation. To restore Lorentz invariance, one uses a running coupling constant g=g(Lambda,mu) which, for fixed renormalization point mu (a vector of the same dimension as g containing the free constants in the matching of the renormalization conditions), is uniquely determined (for any fixed renormalization scheme) as the solution of a renormalization group equation whose coefficients are also defined as a (presumably even convergent) asymptotic expansion. Having this, one can take the limit S(mu) = lim_{Lambda to inf} S(g(Lambda,mu),Lambda) which is an asymptotic series in hbar with finite, computable coefficients when the theory is renormalizable, and is Lorentz invariant and microlocal. Thus one gets the desired Lorentz invariant, microlocal theory as a perturbatively well-defined limit of perturbatively well-defined but not Lorentz invariant or microlocal theories. At the very end one can pass to the limit, but not earlier. The only infinity encountered is not worse than the infinity encountered in defining Riemann integrals over the real line, where one also gets a finite limit by letting a finite cutoff go to infinity. The real mathematical difficulties in QFT are not in the renormalization procedure but in giving a nonperturbative construction of the S-matrix S(mu). ---------------------------------------- S8d.
Renormalization and coarse graining ---------------------------------------- In QFT, there are two different scales, one on the bare level and one on the renormalized level, and the meaning of the renormalization group is slightly different from that in statistical mechanics. On the statistical mechanics level, there is the cutoff beyond which one cannot (or does not want to) observe anything. This effective cutoff is a parameter Lambda in an effective theory defined by coarse graining. The effective theory depends on Lambda: For different values of Lambda you get a _different_ effective theory, though their low energy predictions are essentially the same. This is expressed by the Wilson flow, described by renormalization group equations that relate the parameters g(Lambda,mu) in the different effective theories such that some key low energy observables mu keep the same values. The number of such key observables (i.e., the dimension of mu) equals the number of parameters in the effective theory (i.e., the dimension of g); most other observables are different at different cutoffs (though only slightly if they are observable at low energy), because of the coarse graining done when lowering the cutoff scale Lambda. In QFT, the above is mimicked on the _bare_ level. The cutoff is a large energy Lambda beyond which the bare interaction is modified to be able to get a meaningful limit; this corresponds to coarse-graining. The resulting bare theory with cutoff Lambda is a well-defined effective theory and behaves precisely as described above. To define the renormalized theory, one needs, in addition to the cutoff, renormalization conditions defining the bare parameters in terms of renormalized parameters q. These conditions depend on a renormalization scale E figuring in the equations defining the renormalization conditions. Because of the dimensional nature of momentum, there always has to be such a parameter E, no matter which renormalization procedure is followed.
In QFT, one usually refers to a mass scale M, which is the same as E=Mc^2 in units such that c=1. Then M is the constant needed in the renormalization conditions to relate certain computable expressions to the renormalized parameters. This is discussed at length in the QFT book by Peskin and Schroeder, Section 12.2, for a massless Phi^4 theory, and in Section 12.5 for the general case. (For an online source, see, e.g., equations (9)-(11) of hep-th/9804079. M is introduced there without comment; the role of M is described later, after (20).) In the following, I continue to use E in place of M. Thus the bare parameters are functions g(Lambda,q,E) of the cutoff Lambda, the renormalized parameters q, and the renormalization scale E. The renormalization group equations in the statistical mechanics sense (the Wilson flow) would describe how g(Lambda,q,E) changes as the cutoff Lambda is altered. However, in QFT, this is of no physical interest. Indeed, Lambda is completely eliminated from considerations: The renormalized theory is obtained at fixed E by letting the cutoff Lambda go to infinity. This has the effect that the bare parameters become meaningless, since the limit lim_{Lambda to inf} g(Lambda,q,E) does not exist. At this stage it becomes obvious that all bare objects are unphysical. Although nonphysical, the renormalization group equations in Lambda are an important tool in the _construction_ of QFTs, where the limit of all correlation functions must be shown to exist in a suitable topology, and the absence of divergences shown. In the weakest topology, based on the ultrametric norm and corresponding to perturbation theory at all orders, this is shown rigorously in a nice book: M. Salmhofer, Renormalization: An Introduction, Springer, Berlin 1999. Unfortunately, this topology is too weak to give the existence of the correlation functions as functions; they are only shown to exist as formal power series.
All expressions of the theory that survive the limit, in particular all n-point correlation functions, n=1,2,3,..., describe observable physics. They can therefore be expressed as functions of q and E only, whose detailed form comes from the standard theory. However, there is a little twist since the scale E can be chosen arbitrarily, hence cannot be measurable. In terms of a fixed set of physical parameters mu (measurable under well-defined experimental conditions), we can predict mu by some function of q and E, mu=mu(q,E). Solving for q, we can express q in terms of mu and E, q=q_ren(mu,E). But the exact renormalized result of a physical prediction P(q,E) must be completely independent of E, uniquely determined by the physical parameters mu. Thus, d/dE P(q_ren(mu,E),E) = 0, which are the Callan-Symanzik equations, the renormalization group equations of interest in quantum field theory. In contrast to the Wilson flow, however, the sliding scale in the Callan-Symanzik flow is the renormalization scale E and _not_ the cutoff Lambda (which at this stage is already infinite). Moreover, since observable physics is completely independent of the renormalization scale E, the latter has no intuitive 'physical' interpretation. There is no relation between the two flows, except by analogy. The Wilson flow is needed to _get_ the renormalized theory at fixed renormalization conditions, the Callan-Symanzik flow describes what happens when you _change_ these conditions. -------------------------------------------------------- S8e. Renormalization scale and experimental energy scale -------------------------------------------------------- The picture drawn in the preceding is somewhat incomplete with regard to the practice of computing, due to the fact that we cannot compute this renormalized theory at any E, since it is exceedingly complicated. Thus we need to consider approximations. 
These approximations are no longer independent of E, since the approximation errors depend on it. It turns out that the approximation errors are small only when the energy scale of the experiment for which a prediction is made is close to the renormalization scale E, since (see, e.g., Weinberg's QFT book, Vol. 2, Chapter 18.1) the perturbative expansion contains arbitrary powers of log(E_experiment/E), which therefore must be kept small. Thus one needs to evaluate the theory near the scale of interest. However, perturbation theory is valid only near a fixed point E^* of the renormalization group equations. Therefore, one determines approximate formulas for the quantities q_ren(mu,E) with E close to the appropriate fixed point E^*, and then uses (also approximate) renormalization group equations to transform the result to the scale of interest. Thus there are two different scales involved, the energy scale E_exp where the experiments are done, and the renormalization scale E_ren (previously denoted by E). On the experimental side, coupling constants (such as the charge) are determined with reference to some effective, coarse grained theory (such as the nonrelativistic Schroedinger equation). This effective theory depends on E_exp (for QED, the charge is traditionally defined in the low energy limit E_exp to 0). This effective theory behaves like any other coarse-grained theory, giving rise to running coupling constants such as e=e_exp(E_exp). But these depend on the details of the coarse-graining scheme, and the computed results depend on the coarse-graining, too, and hence on E_exp. The experimental running coupling constants are only loosely related to the running coupling constants such as e=e_ren(E_ren) obtained by the Callan-Symanzik equation (= the renormalization group equation in terms of the renormalization scale E_ren). The latter are, in theory, uniquely defined by the renormalization prescription.
There the coupling constants are defined not by an experimental prescription but as parameters in the renormalization prescription. For example, in Phi^4 theory, lambda=lambda(M) is defined by equation (12.30) in Peskin/Schroeder (and E_ren=Mc^2), and the charge e=e(M) in QED by (10.39) [but at spacelike momentum p^2=-M^2 as in Chapter 12]. As discussed, the physical predictions at any energy are completely independent of M if e(M) and the other renormalized parameters slide with M. At least this would be the case in a fully nonperturbative calculation (which we cannot do). However, the few-loop approximations depend heavily on M, and give a reasonable approximation to the exact theory _only_ at energies close to E_ren=Mc^2. Thus the few-loop approximation behaves just like an effective theory, provided we choose E_ren = E_exp (or close). But the analogy is not complete since in a true effective theory we could choose the coarse-graining scale anywhere at or above E_exp, while for good few-loop approximations we need to choose it always close to E_exp. Thus, if one could solve the equations exactly, the dependence on M and the Callan-Symanzik equation would be completely irrelevant, and nothing at all could be extracted from it. But in practice one can work only at few loops, and then different values of M may give vastly different results, and the equation is very useful since it enables one to work with the right M. The renormalization group equations are used to move from an M near the fixed point (where one can do perturbation theory and has reliable few-loop calculations but where the approximation errors = the higher order terms in the perturbation series are huge) to an M near the experimental scale (where the approximation error is small, and the few-loop calculation therefore reasonably accurate). This is often expressed by saying, loosely, that the renormalization group approach partially resums the perturbation series.
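The role of the renormalization scale in truncated calculations can be illustrated by a toy model. The following Python sketch is not taken from any of the cited books; the one-loop-like flow dg/dlog(E) = B g^2, the constants B, G0, E0 and the 'prediction' P are all hypothetical choices made for illustration. The exact prediction is independent of E_ren, while its two-term truncation contains a power of log(E_exp/E_ren) and is accurate only when E_ren is close to E_exp.

```python
import math

B, G0, E0 = 0.1, 0.5, 1.0  # hypothetical flow coefficient and initial data

def g_ren(E_ren):
    """Exact solution of the toy flow dg/dlog(E) = B*g^2 with g(E0) = G0."""
    return G0 / (1.0 - B * G0 * math.log(E_ren / E0))

def exact_prediction(E_exp, E_ren):
    """Toy prediction P, written in terms of g(E_ren); independent of E_ren."""
    g = g_ren(E_ren)
    return g / (1.0 - B * g * math.log(E_exp / E_ren))

def truncated_prediction(E_exp, E_ren):
    """First two terms of the expansion of P in powers of g(E_ren)."""
    g = g_ren(E_ren)
    return g + B * g * g * math.log(E_exp / E_ren)

if __name__ == "__main__":
    E_exp = 100.0
    for E_ren in (1.0, 10.0, 100.0, 1000.0):
        exact = exact_prediction(E_exp, E_ren)
        approx = truncated_prediction(E_exp, E_ren)
        print(E_ren, exact, approx, abs(exact - approx))
```

In this toy model, sliding g with E_ren by means of the flow equation and then truncating at E_ren = E_exp reproduces the exact value, which is the elementary mechanism behind the loose statement about partial resummation.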
One gets what is called 'renormalization group improved perturbation theory', which is predictive about a much larger range of coupling constants than simple renormalized perturbation theory (which only works for very weak coupling). ------------------------------- S8f. Dimensional regularization ------------------------------- The neatest way to perform regularization, and the only one which works well in complicated cases such as nonabelian gauge theories, is dimensional regularization. Unfortunately, it is presented in most textbooks in a way that looks quite mysterious, involving unphysical fractional dimensions. This is however just sloppiness on the side of physical tradition, and a more rigorous approach removes everything strange. The rules for dimensional regularization are derived in Euclidean space rather than Minkowski space. To get the latter, one needs an additional analytic continuation. For p in Euclidean d-space (d>0 integral), we put p^2=p^Tp. If d is a positive integer and f(p^2) is integrable (i.e. decays fast enough), then standard Lebesgue integration gives the formula integral dp^d/(2 pi)^d f(p^2) = C_d integral_0^inf dr r^{d-1}f(r^2), (1) where C_d is given in terms of the Gamma function as C_d = 2 pi^{d/2}/((2 pi)^d Gamma(d/2)). (2) We observe that the formula (2) makes sense for arbitrary complex d with nonnegative real part, and that therefore for f(r^2)=r^{2j}/(r^2+m^2)^n, n>j+d/2, the well-defined right hand side of (1) is an expression I(d,j,n) which depends analytically on d,j,n. In particular, the cases j=0 and j=1 lead to the expressions given in P/S (7.85/86). A similar reasoning produces (7.87) and more complicated rules analogous to those given in P/S on p.807 (where, however, analytic continuation to Minkowski space has already been performed). These rules, together with the Feynman trick stated as (A.39) on p.806 of P/S, can be used to evaluate integrals of arbitrary rational Lorentz-invariant expressions provided that they decay fast enough.
Note that the resulting formula integral dp^d/(2 pi)^d f(p^2) = I(d,j,n) (3) is valid only for integral d and n>j+d/2, which ensures sufficiently fast decay at infinity to make the Lebesgue integral well-defined. For other values the above computations are meaningless, and any contradiction derived from them is therefore irrelevant. As irrelevant as the well-known fact that a divergent alternating infinite sum can be given any value whatsoever by formal rearrangements. Remarkably, however, I(d,j,n) (and the analogous formulas on p.807) can be analytically continued to the interesting case d=4-eps. This allows us to _define_ an _extended_ Lebesgue integral for d=4-eps by the formula integral dp^d/(2 pi)^d p^{2j}/(p^2+m^2)^n := I(d,j,n) (4) and similar expressions for arbitrary rational Lorentz-invariant expressions. Moreover, if these expressions happen to have good limits for eps to 0 (which cannot happen for (4) itself but can for suitable linear combinations) they define the value also for d=4. The derivation ensures that it gives the correct results in all cases where the integral makes sense in the traditional (Lebesgue) way. Thus we have defined a consistent extension of the Lebesgue integral of Lorentz-invariant expressions to the singular case. This is similar in spirit to Lebesgue's extension of the Riemann integral to the Lebesgue integral. A good, mathematically rigorous exposition of d-dimensional integration theory for general complex dimension d is given in P. Etingof, Note on dimensional regularization, pp. 597-607 in: Pierre Deligne et al., Quantum Fields and Strings, A Course for Mathematicians, Vol. 1, Amer. Math. Soc., Providence, Rhode Island, 1999. See also http://wwwthep.physik.uni-mainz.de/~scheck/Meyer.ps The theory of renormalization now shows that all integrals occurring in the expressions for S-matrix elements in renormalizable theories have a well-defined _extended_ Lebesgue integral for d=4. This is all that is required for consistency.
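The radial integral on the right hand side of (1) can be checked by machine. The following Python sketch is an illustration under stated assumptions, not the derivation in P/S: the function names are hypothetical, the closed form used is the standard Beta function integral integral_0^inf dr r^{d-1+2j}/(r^2+m^2)^n = m^{d+2j-2n} Gamma(j+d/2) Gamma(n-j-d/2)/(2 Gamma(n)), and the prefactor C_d is left out. For integral d with n>j+d/2 the closed form agrees with ordinary numerical integration; the same closed form is then simply evaluated at d=4-eps.

```python
import math

def radial_numeric(d, j, n, m=1.0, steps=20000):
    """Midpoint rule for int_0^inf r^(d-1+2j)/(r^2+m^2)^n dr,
    using the substitution r = t/(1-t) with t in (0,1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        r = t / (1.0 - t)
        total += r ** (d - 1 + 2 * j) / (r * r + m * m) ** n / (1.0 - t) ** 2
    return total * h

def radial_closed(d, j, n, m=1.0):
    """Closed form of the same integral; analytic in d, j, n."""
    a = j + d / 2.0
    return (0.5 * m ** (d + 2 * j - 2 * n)
            * math.gamma(a) * math.gamma(n - a) / math.gamma(n))

if __name__ == "__main__":
    for d, j, n in ((1, 0, 1), (3, 0, 2), (3, 1, 3)):
        print(d, j, n, radial_numeric(d, j, n), radial_closed(d, j, n))
    for eps in (1e-1, 1e-2, 1e-3):
        # continuation to d = 4 - eps; the value grows like 1/eps
        print(eps, radial_closed(4.0 - eps, 0, 2))
```

The 1/eps growth in the last loop is the pole of Gamma(n-j-d/2) at d=4 for j=0, n=2, in line with the remark that (4) itself has no limit for eps to 0.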
For those who dislike unphysical complex dimensions, the uniqueness of analytic continuation implies that one can get completely equivalent results by keeping the physical dimension d=4. In this case, one must replace the propagator (p^2+m^2)^{-1} by (p^2+m^2)^{-n} with sufficiently large n, and continue the result analytically to the physical value n=1. Then all integrals are (in Euclidean space) ordinary Lebesgue integrals. The formulas used for the extended Lebesgue integral defined as above still apply; however, computations are now slightly more involved. Those who worry about the appropriateness of analytic continuation might wish to consider the functions f, g defined by f(d):=(sqrt(2-d))^{-2}, g(d):=1/(2-d) in the real domain. They are equal for d<2 but f does not make sense for d>=2. Nevertheless, it makes exceedingly much sense to extend the definition of f to arguments d>2 by making f(d):=g(d) a definition. Indeed, g(d) is the unique meromorphic extension of f to arbitrary complex arguments. This uniqueness is in the nature of analytic continuation, and makes the latter an extremely useful device in many applications. It is the reason why we can use such useful equations as exp(ix)=cos(x)+i*sin(x), which one would have no right to use if one did not silently identify analytic functions defined on part of their domain with the full analytic function on the associated Riemann surface. ----------------------------------------- S8g. Nonrelativistic quantum field theory ----------------------------------------- The right way to understand relativistic QFT is to regard it as a limit of nonlocal nonrelativistic quantum field theory. The latter is much better behaved. Interacting QFT in 3+1 dimensions exists as a rigorous mathematical theory in the nonrelativistic case, since there only finite renormalizations are needed and no infinities occur. In this context, Feynman-Dyson perturbation theory can be given a rigorous meaning.
Note that nonrelativistic QFT is nonlocal because of the Coulomb potential interaction. Interacting QFT based on Feynman-Dyson perturbation theory in 3+1 dimensions exists as a rigorous mathematical theory in the relativistic case, as a limit of smeared, nonrelativistic theories. This is done for Phi^4 theory in all details in Salmhofer's book. For technical reasons, one gets the results however only in a very weak topology corresponding to power series in the coupling constant, rather than as true functions of the coupling constants. Thus perturbative relativistic QFT is rigorously established in 4D while nonperturbative relativistic QFT in 4D is still elusive. However, the infinities that plague 4D relativistic QFT are already present in 3D, and there rigorous constructions have been given. Exactly the same kind of renormalization tricks are used in 3D. Thus our present lack of understanding cannot be blamed on renormalization, but has to do with the difficulty of getting the hard analytical estimates needed to justify the constructions. ----------------------------------------------------- S8h. Nonrenormalizable theories as effective theories ----------------------------------------------------- The difference between renormalizable and unrenormalizable theories is that the former are specified by a (small) finite number of parameters while the latter are specified by an infinite number of parameters. In a renormalizable quantum field theory, only few counterterms must be added to the action in order to get a consistent finite perturbative expansion at all orders. This means that a few parameters suffice to get a consistent theory which will be correct at the energies of interest (which should be essentially independent of what happens at the inaccessible large energies). In a nonrenormalizable quantum field theory, infinitely many counterterms must be added to the action in order to get a consistent finite perturbative expansion at all orders.
This means that with a few parameters one can only get an effective low order theory, which may, however, still be good enough at the energies of interest. But for better approximation, one needs to determine more and more parameters... In both cases, it is possible to extract approximate results from computations, and the parameters can be tuned to fit the experimental results. This gives a consistent procedure for predictions. Indeed, many nonrenormalizable theories are in use as effective field theories. (See hep-ph/0308266 for a recent survey on effective field theories.) People who dislike nonrenormalizable theories do this on the basis of a claim that their predictive value is nil because of the infinitely many constants. But this is as unfounded as saying that thermodynamics is not predictive because it depends on a function (the expression for the free energy, say) that requires an infinite number of degrees of freedom for its complete specification. Clearly, in the latter case, the widespread use of finitely parameterized imperfect free energies does not hamper the usefulness of thermodynamics. The same can be said about nonrenormalizable field theories. It only implies that to extract arbitrarily precise predictions one needs correspondingly much information as input. We know that this is the case already for many simpler phenomena in physics. See also J. Gegelia, G. Japaridze, N. Kiknadze, K. Turashvili, "Renormalization" Of Non Renormalizable Theories, hep-th/9507067; J. Gegelia, G. Japaridze, Perturbative Approach to Non-renormalizable Theories, hep-th/9804189. ------------------------------------- S8i. What about infrared divergences? ------------------------------------- Renormalization theory deals with the regularization of ultraviolet divergences, occurring at very high but unobservable energies. In contrast, infrared divergences arise if there are problems at very low energies. They are not cured by renormalization and need completely different techniques.
Theories without massless particles have no infrared problems at all, since at low energies only few particles can coexist. Indeed, the sum of the rest masses of physical particles is bounded by the total energy of the system. In QED one has infrared problems because the photon is massless, so a bound on the sum of the rest masses does not limit the number of possible photons. Indeed, a closer calculation shows that there may be an arbitrary number of very low energy ('soft') photons. One can handle the situation in some approximation by giving the photon a tiny mass mu. But this is an _additional_ parameter, quite different from the renormalization scale M. And the renormalized theory at finite mu depends on mu (so that one needs to take in the end the limit mu to 0 to get physically correct results), while it is still independent of M. A better way to handle the infrared divergences is to avoid them completely by using coherent states. These sum the contributions of arbitrarily many soft photons in a coherent way. ----------------------------- S9a. Summing divergent series ----------------------------- There is a second kind of divergences, different from those cured by renormalization. Most perturbation series in QFT are believed to be asymptotic only, hence divergent. Strong arguments (which have not lost their persuasive power in half a century) supporting the view that one should expect the divergence of the power series for S-matrix elements in QED (and other relativistic QFTs), for all values of alpha>0 (and independent of energy), are given in F.J. Dyson, Divergence of perturbation theory in quantum electrodynamics, Phys. Rev. 85 (1952), 631--632. The remarkable fact is that QED is very accurate in spite of this. It produces verifiable predictions by restricting attention to the first few terms of a (most probably divergent) asymptotic series, but it has no way to make sense of the whole series. This is what Dirac found deficient in the foundations.
An asymptotic series is a series such as
   f(x) = sum_{k=0:inf} k! x^k
with radius of convergence zero. For small enough x, the first few terms give seemingly good approximations, but if one includes - for any fixed nonzero x - enough terms, the series diverges. Thus, as Dirac asserts, one neglects arbitrarily large terms to get the approximations which work so well in QED.

There are infinitely many different ways to assign to an asymptotic series a function with this series as Taylor expansion. The problem is to have a way to choose the right one. Borel summation is often taken as default, but seems to be no cure for QFT in view of the so-called renormalon problem.

At present, there is no sound mathematical foundation of relativistic quantum field theory. Whoever finds one will be awarded one of the 1 Million Dollar Clay Millennium prizes...

If we have a well-defined Hamiltonian H(g) depending infinitely differentiably on a parameter g, it typically has a well-defined S-matrix S(g), also depending infinitely differentiably on g. Perturbation theory computes a power series expansion
   S(g) = S_0 + S_1 g + ...
which often diverges for all g although each S_k is finite. This happens already for the anharmonic oscillator with
   H(g) = 1/2 (p^2+q^2) + g q^4.
Thus a correct Hamiltonian with a convergent (in the anharmonic oscillator case even finite, hence trivially convergent) expansion is quite consistent with a divergent expansion of the S-matrix.

However, one can still extract information by so-called resumming techniques. One can study these things quite well with functions which have known asymptotic expansions (e.g., improper integrals, using Watson's lemma).

In many cases (and under well-defined conditions), the resulting infinite series is Borel summable in the following sense: To sum
   f(x) = sum a_k x^k                  (1)
if it is divergent or very slowly convergent, one can sum instead its Borel transform Bf(z) = sum a_k/k!
z^k (2)
which obviously converges much faster (if not yet, one could probably repeat the procedure). In many cases, f can be reconstructed from Bf by means of
   Sf(x) = integral_0^inf dz/x exp(-z/x) Bf(z)
         = integral_0^inf dt exp(-t) Bf(tx).
Sf is called the Borel summation of the asymptotic series (1), and is defined whenever Bf is convergent. If Bf has singularities, the integral over t may have to be done along a contour in the complex plane; see, e.g., physics/0010038.

It is easy to show that BSf=Bf and that Sf has the same asymptotic expansion as f. Moreover, the identity Sf(x)=f(x) can be easily verified if (1) has a positive radius of convergence, but also under other natural assumptions (but stronger than simply asserting that (1) is an asymptotic expansion for f).

The book
   J.S. Feldman, T.R. Hurd, L. Rosen and J.D. Wright,
   QED: A proof of renormalizability,
   Lecture Notes in Physics 312,
   Springer, Berlin 1988
claims to prove on p. 112ff that the coefficients in the loop expansion of the QED S-matrix are bounded by const*(n!)^{1/2}/R^n for some R>0, which would imply that it is locally Borel summable. But hep-ph/9701418 seems to make opposite claims. See also hep-ph/9807443.

Of course, since there are many functions with the same asymptotic expansion (e.g., one can add arbitrary multiples of terms like e^{-a/x}, e^{-a/x} log x, etc.), one has to show that the Borel summed Sf actually has the properties that the original f was supposed to have (and from which the asymptotic series was derived). If, in addition, f is uniquely determined by these properties, we know that f=Sf. Unfortunately, a proof for such a statement is missing in QED.

In some 2D cases, where nonperturbative QM applies, one can show that the nonperturbative result satisfies the properties needed to show that Borel summation of the perturbative expansion reproduces the nonperturbative result.
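
As a concrete illustration (not taken from the references above), consider the alternating variant a_k = (-1)^k k! of the factorial series. Its Borel transform (2) sums to Bf(z) = 1/(1+z), so that Sf(x) = integral_0^inf dt exp(-t)/(1+tx). The following minimal sketch compares truncated partial sums of (1) with a numerical value of Sf; the function names, the choice x = 0.1 and the quadrature rule are assumptions made only for this example.

```python
import math

def partial_sum(x, n):
    """Sum of the terms k = 0..n of the series sum_k (-1)^k k! x^k."""
    return sum((-1) ** k * math.factorial(k) * x ** k for k in range(n + 1))

def borel_sum(x, t_max=60.0, steps=60000):
    """Sf(x) = int_0^inf exp(-t) Bf(t x) dt with Bf(z) = 1/(1+z).

    Composite Simpson rule on [0, t_max] (steps must be even);
    the neglected tail of the integral is smaller than exp(-t_max).
    """
    h = t_max / steps

    def g(t):
        return math.exp(-t) / (1.0 + t * x)

    total = g(0.0) + g(t_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0

x = 0.1
s = borel_sum(x)
# Distance of each truncated series from the Borel sum.
errors = [abs(partial_sum(x, n) - s) for n in range(31)]
best_n = min(range(31), key=lambda n: errors[n])

print("Borel sum Sf(0.1)     =", s)
print("best truncation order =", best_n, " error =", errors[best_n])
print("error after 30 terms  =", errors[30])
```

The truncated series first approaches Sf(x), is closest near the order where the terms k! x^k are smallest (around k = 1/x), and then moves away without bound, which is the typical behavior of an asymptotic series described above.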
See also the thread
   Re: unsolved problems in QED
starting with
   http://www.lns.cornell.edu/spr/2003-03/msg0049669.html

With experimental results one just has numbers, and not infinite series, so questions of convergence do not occur.

On the other hand, if one knows only a finite number of terms of an infinite series, the result can be, strictly speaking, anything. But usually one applies some extrapolation algorithm (e.g., the epsilon or eta algorithm) to get a meaningful guess for the limit, and estimates the error by doing the same several times, keeping a variable number of terms. The difference between consecutive results can count as a reasonable (though not foolproof) error estimate of these results.

Similarly, given a finite number of coefficients of a power series, one can use Pade approximation to find an often excellent approximation of the 'intended' function, although of course, a finite series says, strictly speaking, nothing about the limit of the sequence.

But to have reliable bounds one needs to know an exact definition of what one is approximating, and work from there. Such an exact definition is, at present, missing for quantum electrodynamics.

-----------------------
S9b. Is QED consistent?
-----------------------

Quantum electrodynamics (QED) gives the most accurate predictions quantum physics currently has to offer. The anomalous magnetic dipole moment matches the experimental data to 12 significant digits:
   M. Passera,
   Precise mass-dependent QED contributions to leptonic g-2 at order alpha^2 and alpha^3,
   Phys. Rev. D 75, 013002 (2007).
   http://arxiv.org/abs/hep-ph/0606174
   B. Odom, D. Hanneke, B. D'Urso, and G. Gabrielse,
   New Measurement of the Electron Magnetic Moment Using a One-Electron Quantum Cyclotron,
   Phys. Rev. Lett.
97, 030801 (2006)
   http://hussle.harvard.edu/~gabrielse/gabrielse/papers/2006/NewElectronMagneticMoment.pdf

The Lamb shift, whose prediction made QED and renormalization respectable, is much more difficult to measure with high precision, hence offers no such phenomenal test of accuracy:
   S.G. Karshenboim,
   Precision physics of simple atoms: QED tests, nuclear structure and fundamental constants,
   Phys. Rep. 422 (2005), 1-63
   http://arxiv.org/abs/hep-ph/0509010

Nevertheless, many physicists think that QED cannot be a consistent theory. There is a phenomenon called the Landau pole:
   http://en.wikipedia.org/wiki/Landau_pole
It indicates that at extremely large energies (far beyond the range of physical validity of QED, even far beyond the Planck energy) something might go wrong with QED. (QED loses its validity already at energies of about 10^11 eV, where the weak interaction becomes essential. The Planck energy at about 10^28 eV is the limit where some current theories try to make predictions. But the Landau pole, if it exists, has an energy far larger than the latter.) This is probably why Yang-Mills and not quantum electrodynamics was chosen as the model theory for the Millennium prize.

Since the existence of the Landau pole is confirmed only in low order perturbation theory and in lattice calculations (hep-lat/9801004 and hep-th/9712244), the question whether the alleged Landau pole implies limits to the consistency of QED currently has no rigorous mathematical substance.

The observations about the Landau pole in perturbation theory can be recast in mathematically rigorous terms using so-called renormalons, obstructions to Borel summability; see
   V. Rivasseau,
   From Perturbative to Constructive Renormalization,
   Princeton 1991
But the resulting analysis is inconclusive as regards the existence of the theory.
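
To get a feeling for the energy scales mentioned in connection with the Landau pole, here is a minimal sketch of the textbook one-loop estimate. It assumes that only the electron contributes to the running of the coupling, alpha(mu) = alpha/(1 - (2 alpha/(3 pi)) ln(mu/m_e)), so that the denominator vanishes at mu = m_e exp(3 pi/(2 alpha)). The formula and the numerical inputs are standard values supplied for this illustration and are not taken from the references above; the function names are made up.

```python
import math

ALPHA = 1 / 137.035999   # fine structure constant at zero energy (assumed input)
M_E_EV = 0.51099895e6    # electron rest energy in eV (assumed input)
PLANCK_EV = 1.22e28      # Planck energy in eV, same order as quoted in the text

def running_alpha(mu_ev):
    """One-loop running coupling with a single charged fermion (the electron)."""
    log_term = math.log(mu_ev / M_E_EV)
    return ALPHA / (1.0 - (2.0 * ALPHA / (3.0 * math.pi)) * log_term)

def log10_landau_pole_ev():
    """log10 of the energy (in eV) at which the one-loop denominator vanishes."""
    exponent = 3.0 * math.pi / (2.0 * ALPHA)
    return math.log10(M_E_EV) + exponent / math.log(10.0)

log10_pole = log10_landau_pole_ev()
print("1/alpha at 10^11 eV      :", 1.0 / running_alpha(1e11))
print("1/alpha at Planck energy :", 1.0 / running_alpha(PLANCK_EV))
print("log10(Landau pole / eV)  :", log10_pole)
```

In this crude approximation the coupling changes only by a few percent up to the Planck energy, while the pole sits hundreds of orders of magnitude higher, in line with the qualitative statement above.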
The quality of the computed approximations to QED is a strong indication that there should be a consistent mathematical foundation (for not too high energies), although it hasn't been found yet. There is no indication at all that at the energies where QED suffices to describe our world (with electrons and nuclei considered elementary particles), it should be inconsistent.

To show this rigorously, or to disprove it, therefore remains another unsolved (and for physics more important) problem.

Perturbative QED is only a rudimentary version of the 'real QED'; this can be seen from the fact that Scharf's results on the external field case,
   G. Scharf,
   Finite Quantum Electrodynamics: The Causal Approach, 2nd ed.
   New York: Springer-Verlag, 1995.
are much stronger (he constructs the S-matrix in his book) than those for QED proper (where he only shows the existence of the power series in alpha, but not its convergence).

   J.S. Feldman, T.R. Hurd, L. Rosen and J.D. Wright,
   QED: A proof of renormalizability,
   Lecture Notes in Physics 312,
   Springer, Berlin 1988
gives a rigorous proof of perturbative existence of QED at all orders. This means that a formal power series for the S-matrix is shown to exist rigorously. This includes renormalization and is sufficient for actual computations since a few terms in the power series give very high accuracy. However, the power series is believed to diverge if enough (i.e., infinitely many) terms are added, and a consistent nonperturbative treatment of full QED is presently missing.

The quest for 'existence' of QED is the quest for a framework where the formulas make sense nonperturbatively, and where the power series in alpha is a Taylor expansion of a (presumably nonanalytic) function of alpha that is mathematically well-defined for alpha around 1/137 and not too high energy. This is still open.
More precisely: Probably QED (and thus the QED S-matrix) exists nonperturbatively as a 2-parameter theory depending on the fine structure constant alpha and the electron mass m_e; these parameters are the zero energy limits of the corresponding renormalized running coupling constants, and the theory is defined for alpha <= 1/137 and input energies <= some number E_limit(alpha,m_e) larger than the physical validity of pure QED. What is needed is a mathematical proof that the QED S-matrix exists for 0 S with B(s,t) = t(As) for s in T^q, t in T_p, and conversely; indeed, since the image As of s under A is in T^p, its image t(As) is a well-defined scalar. Using the B's in place of the A's gives an alternative way of defining tensors, although one less convenient for visualization.

Given a basis on T and a dual cobasis on T^*, one can use coordinates. Then physicists write
- elements of T as vectors = column vectors with an upper index,
- elements of T^* as linear forms = 1-forms = covectors = row vectors with a lower index,
- elements of T^q as multivectors with q upper indices,
- elements of T_p as multicovectors with p lower indices,
- elements of T_p^q as mixed multi(co)vectors with p lower and q upper indices.
(There is also a dual version of this, where vectors are considered as rows and covectors as columns. The remainder then changes accordingly.)

In particular,
   (0,0)-tensor = scalar,
   (1,0)-tensor = vector (vector in T=T^1) = column vector,
   (0,1)-tensor = covector (vector in the dual space T^*=T_1) = row vector,
   (1,1)-tensor = matrix (linear mapping from T to T).
Clearly, the columns of the matrix A_i^k are column vectors = vectors, the rows are row vectors = covectors, and the indexing is consistent.
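
A small sketch in coordinates may help here. It assumes a 3-dimensional space T with a fixed basis and stores a (1,1)-tensor as a matrix A, a vector s as a list of column entries and a covector t as a list of row entries; the bilinear form B(s,t) = t(As) is then an ordinary number. The helper names and the numbers are made up for this illustration.

```python
def apply_matrix(A, s):
    """(1,1)-tensor as a linear map: (As)[i] = sum_k A[i][k] * s[k]."""
    return [sum(A[i][k] * s[k] for k in range(len(s))) for i in range(len(A))]

def pair(t, u):
    """Covector (row) applied to a vector (column): t(u) = sum_i t[i] * u[i]."""
    return sum(ti * ui for ti, ui in zip(t, u))

def B(A, s, t):
    """Bilinear form associated with A: B(s,t) = t(As)."""
    return pair(t, apply_matrix(A, s))

A = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
s = [1, 0, 2]     # vector (column)
t = [2, -1, 1]    # covector (row)

# The same scalar, obtained by contracting all indices explicitly.
explicit = sum(t[i] * A[i][k] * s[k] for i in range(3) for k in range(3))

print("As      =", apply_matrix(A, s))
print("B(s,t)  =", B(A, s, t), " explicit contraction =", explicit)
# Applying A to a basis vector picks out the corresponding column of A.
print("A e_1   =", apply_matrix(A, [1, 0, 0]))
```

Both ways of computing the scalar agree, which is just the statement that A and B carry the same information.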
The requirement that basis and cobasis are dual is equivalent to the statement that for every vector u and covector w (i.e., linear mapping from vectors to scalars),
   w(u) = w_i u^i;
here the Einstein convention is used that formulas involving pairs of equally labelled indices, one of them a lower index and the other an upper index, must be interpreted as a sum over these indices.

Mathematicians using linear algebra (where no tensors of order >2 appear) write instead all indices as lower indices, no matter whether they belong to row vectors, column vectors, or matrices. They also write all sums explicitly, consider all vectors given by a single letter as column vectors, and write covectors (1-forms) explicitly using the transposition sign (^T, but statisticians often use a prime ' instead, which is also the form used in Matlab). This has many advantages and allows a simple notation which increases understandability of otherwise long formulas.

   Phys. notation: s = x^k y_k                  x vector, y covector
   Math. notation: s = sum_k y_k x_k   or simply   s=y^Tx.

   Phys. notation: y_i = A_i^k x_k              x,y vectors, A matrix
   Math. notation: y_i = sum_k A_ik x_k   or simply   y=Ax.

   Phys. notation: s = A_i^i                    A matrix
   Math. notation: s = sum_i A_ii   or simply   s = tr A (trace).

   Phys. notation: y_i = A_i^j B_j^k x_k        x,y vectors, A,B matrices
   Math. notation: y_i = sum_jk A_ij B_jk x_k   or simply   y=ABx.

   Phys. notation: y_i = A_i^j B_j^k C_k^l D_l^m x_m        x,y vectors, A,B,C,D matrices
   Math. notation: y_i = sum_jklm A_ij B_jk C_kl D_lm x_m   or simply   y=ABCDx.

The linear algebra notation is compact and index-free, in spite of the fact that coordinates are being used.

For higher order tensors, the advantages of the linear algebra notation are less pronounced since one has to specify which pairs of indices must be contracted. However, often, an index-free notation is still possible:

   Phys. notation: A_li = R_ijkl b^j c^k
   Math. notation: A(u,v) = R(v,b,c,u)

   Phys. notation: A_l^i = R^i_j^k_l b^j c_k
   Math. notation: A(u,v^T) = R(v^T,b,c^T,u)

   Phys. notation: A_i^j = R_i^k_k^j
   Math. notation: A = tr_23 R,
where the subscripts indicate which indices must be contracted.

All this is completely independent of any metric.

If a metric = nondegenerate symmetric (0,2)-tensor g is given on T, which associates with u,v in T the scalar g(u,v), one can canonically identify vectors and covectors, at the expense of some confusion if one is not careful.

This reads in physicists' notation as follows: The metric is g_ik=g_ki (expressing the symmetry), and for every vector u^k, the associated covector is
   u_i = g_ik u^k.
Conversely, one can reconstruct the vector from the covector using
   u^k = g^ik u_i,
where g^ik=g^ki is the inverse metric, a symmetric (2,0)-tensor which for consistency must satisfy the equations
   g_ij g^kj = delta_i^k                (*)
with the Kronecker delta delta_i^k = 1 if i=k, = 0 otherwise, which is the identity matrix written as a (1,1)-tensor in index notation. Nondegeneracy is precisely the solvability of (*) for the dual metric.

Mathematicians find it confusing to label different objects with the same symbol, and prefer to always distinguish between a vector and its canonically associated covector. Given a basis of T and the dual cobasis of T^*, coordinates (row and column vectors) can be used to define the elements of T and T^*; the metric g in T_2 is represented in these coordinates by an invertible symmetric matrix = (1,1)-tensor G such that
   g(u,v) = u^TGv   for u,v in T.
The canonical pairing induced by the metric therefore associates with the vector u the covector
   w^T = u^TG.                          (**)
Conversely, one can reconstruct from the covector w^T the canonically associated vector u = G^{-1}w. The dual metric therefore maps u^T, v^T to u^TG^{-1}v, and is represented by the inverse matrix G^{-1}.

The relation between the physicists' form and the linear algebra form of writing things can be inferred from (**) - we simply have

   Phys. notation: g_ik
   Math. notation: G = (g_ik)

   Phys. notation: g^ik
   Math. notation: G^{-1} = (g^ik)

Again, the linear algebra notation is compact and index free, in spite of the fact that coordinates are being used.

--------------------------------------------------------------
S10b. Is quantum mechanics compatible with general relativity?
--------------------------------------------------------------

The difficulty of reconciling quantum mechanics and general relativity counts as one of the big problems of fundamental physics.

There appears to be a problem because canonical quantum gravity based on quantizing the Hilbert action is nonrenormalizable. (See the section on 'Renormalization in quantum gravity' in this FAQ about how to renormalize a nonrenormalizable field theory nevertheless.)

The difference between renormalizable and unrenormalizable theories is that the former are specified by a (small) finite number of parameters while the latter are specified by an infinite number of parameters.

In both cases, it is possible to extract approximate results from computations, and the parameters can be tuned to fit the experimental results. This gives a consistent procedure for predictions. Indeed, many nonrenormalizable theories are in use as effective field theories. (See hep-ph/0308266 for a recent survey on effective field theories.)

People who dislike nonrenormalizable theories do so on the basis of a claim that their predictive value is nil because of the infinitely many constants. But this is as unfounded as saying that thermodynamics is not predictive because it depends on a function (the expression for the free energy, say) that requires an infinite number of degrees of freedom for its complete specification. Clearly, in the latter case, the widespread use of finitely parameterized imperfect free energies does not hamper the usefulness of thermodynamics.

The same can be said about nonrenormalizable field theories.
It only implies that to extract arbitrarily precise predictions one needs correspondingly much information as input. We know that this is the case already for many simpler phenomena in physics.

(For indications that canonical quantum gravity is nonperturbatively renormalizable see, e.g., hep-th/0110021, hep-th/0312114, hep-th/0304222.)

A different matter is the dream of a fundamental theory without any free parameters, which of course conflicts with a theory in which infinitely many parameters are needed for its complete specification. But there is no theorem that says that nature is governed by unique principles. It is quite likely that the designer of the universe had some choices besides the constraints imposed by logical consistency. Thus I think this dream (which also fuels string theory) is misguided, and the correct quantum version of general relativity is standard, nonrenormalizable canonical quantum gravity. This means that, quite likely, general relativity is fully compatible with quantum mechanics.

Of course this conflicts with the view of powerful groups within theoretical physics, who maintain that their approach to quantum gravity (either string theory, or loop quantum gravity) is the road to success. But from what I have seen (at a somewhat superficial level of understanding) I trust neither string theory nor loop quantum gravity to be close to the truth. In any case, both are completely separated from experimental verification.

If experiments in the near future can probe some features of quantum gravity, it will be for small quantum systems interacting with external electromagnetic and gravitational fields. See gr-qc/0408010.

----------------------------------------
S10c. Difficulties in quantizing gravity
----------------------------------------

(i) (mathematical) No consistent interacting relativistic quantum field theory is known in 4 dimensions.
(ii) (theoretical) The accepted ways to avoid divergences in expressions for scattering amplitudes that work in simpler theories all fail because of the lack of renormalizability. See, e.g., the references in Section 2.2 of
   http://relativity.livingreviews.org/Articles/lrr-2002-5/

(iii) (theoretical) The theories for which a (perturbatively) finite scattering theory is available have not been related quantitatively to the established theories. A convincing classical limit (to general relativity), nonrelativistic limit (to a multiparticle Schroedinger equation with Newtonian interaction), and low energy limit (at currently accessible energies no new particles apart from the graviton) would be needed.

(iv) (conceptual) The three limits pose severe constraints on possible quantum gravity theories, and it requires much imagination to come up with a conceptual basis in which these limits make sense and are tractable. (But see the preceding entry.)

(v) (experimental) Quantum effects in gravity are so weak that no experiments sensitive to quantum effects are in reach in the near future, and the data from astronomy that may cast light on quantum gravity are scarce. (Quantum gravity is not demanded by unexplained data but only by the quest for consistency with particle physics.)

----------------------------------------
S10d. Renormalization in quantum gravity
----------------------------------------

Renormalization of QFTs is needed to make the coefficients in the loop expansion (i.e., the expansion in powers of Planck's constant hbar) of the S-matrix well-defined.

Canonical quantum gravity is the theory obtained by writing down the Einstein-Hilbert action in a (3+1)-dimensional splitting (ADM formalism) and either fixing coordinates and solving the constraints (reduced phase space quantization) or quantizing using Dirac's approach to constrained systems (Dirac quantization).
Covariant quantum gravity is the theory obtained as follows: Write down the classical Hilbert action for general relativity, look at the corresponding functional integral defined perturbatively as for QED or QCD, and try to compute S-matrix elements using the usual renormalization prescriptions for the integrals corresponding to the various Feynman diagrams.

Quantum field theories are nowadays almost always defined in the covariant way; the covariant approach has the advantage of being manifestly invariant under the full symmetry group. (The canonical approach to scalar QED fails in certain versions to preserve Poincare symmetries, due to term ordering problems; see gr-qc/9403065.) On the other hand, the canonical approach is intrinsically nonperturbative, while the covariant approach needs extra tricks (renormalization group enhancements) to get partial nonperturbative results.

Covariant quantum gravity only works in the traditional way up to 1 loop (and together with matter not even then); at higher loops (i.e., for corrections of higher order in the Planck constant hbar) one needs more and more counterterms to make the resulting combination of integrals finite. See
   S. Deser,
   Infinities in Quantum Gravities,
   http://arxiv.org/pdf/gr-qc/9911073v1
(and references [2,4] there). This is called 'nonrenormalizability', and is the main blemish of covariant quantum gravity. (For other potential problems, see, e.g., gr-qc/0108040.)

Note that quantum gravity, though nonrenormalizable in the established sense, is renormalizable in a weak sense, where infinitely many counterterms are allowed; see
   J. Gomis and S. Weinberg,
   Are Nonrenormalizable Gauge Theories Renormalizable?
   http://arxiv.org/pdf/hep-th/9510087.

Most researchers in quantum gravity want a renormalizable theory in the strong sense (so that finitely many counterterms suffice); then covariant quantum gravity is out, and people look for fancy alternatives (loop quantum gravity, superstring theory, etc.).
However, these theories have their own difficulties. Some online references are:
   gr-qc/9803024: Strings, loops and others: a critical survey of the present approaches to quantum gravity
   gr-qc/9710008: Loop quantum gravity
      http://relativity.livingreviews.org/Articles/lrr-1998-1/index.html
   hep-th/9709062: Introduction to superstring theory
   astro-ph/0304507: Update on string theory
   hep-th/0311044: The nature and status of string theory
   physics/0605105: a short review of superstring theories

gr-qc/0410049 shows how gravity derives from string theory; a more complete derivation is in section 3.7 of Polchinski's book.

Phys. Rev. Lett. 60, 2105-2108 (1988) discusses the lack of Borel summability of the S-matrix expansion for the bosonic string.

http://math.ucr.edu/home/baez/week195.html tells about the state in 2003 concerning the claims of (super)string theory to be a renormalizable quantum theory. Only the 2 loop case seems to be settled; see arXiv:hep-th/0501197 and hep-th/0211111 (especially Section 14 of the latter for the unsolved problems at 3 loops and higher).

Others treat covariant quantum gravity just as they treat nonrenormalizable effective field theories, and fare well with it. See, for example,
   C.P. Burgess,
   Quantum Gravity in Everyday Life: General Relativity as an Effective Field Theory,
   Living Reviews in Relativity 7 (2004), 5
   http://www.livingreviews.org/lrr-2004-5
for 1-loop corrections, and
   Donoghue, J.F., and Torma, T.,
   Power counting of loop diagrams in general relativity,
   Phys. Rev. D, 54, 4963-4972,
   http://arxiv.org/abs/hep-th/9602121
for higher-loop behavior.

Section 4.1 discusses recent computational studies showing that covariant quantum gravity regarded as an effective field theory predicts quantitative leading quantum corrections to the Schwarzschild, Kerr-Newman, and Reissner-Nordstroem metrics. Only a few new parameters arise at each loop order, in particular only one (the coefficient of curvature^2) at one loop.
In particular, at one loop, Newton's constant of gravitation becomes a running coupling constant with
   G(r) = G - 167/30pi G^2/r^2 + ...
in terms of a renormalization length scale r.

Here is a quote from Section 4.1:

''Numerically, the quantum corrections are so miniscule as to be unobservable within the solar system for the forseeable future. Clearly the quantum-gravitational correction is numerically extremely small when evaluated for garden-variety gravitational fields in the solar system, and would remain so right down to the event horizon even if the sun were a black hole. At face value it is only for separations comparable to the Planck length that quantum gravity effects become important. To the extent that these estimates carry over to quantum effects right down to the event horizon on curved black hole geometries (more about this below) this makes quantum corrections irrelevant for physics outside of the event horizon, unless the black hole mass is as small as the Planck mass''

----------------------------------------------
S10e. Hadamard states and their Hilbert spaces
----------------------------------------------

In his book on quantum field theory in curved spacetime Wald delineates a class of 2-point functions called Hadamard states that have locally the same kind of singular behavior as the flat free 2-point functions. This class of states is also natural from several other points of view, though I cannot give details off-hand since this is slightly outside my field of knowledge.

Associated to each Hadamard state is a Gaussian state |0> of the quantum field which is constructed from the 2-point function via Wick's theorem. This state is often called a 'vacuum state', though this is not quite appropriate, unless one allows the vacuum to carry gravitational and electromagnetic fields. A more appropriate name would be a 'coherent state' since it is the generalization of coherent states in the Fock spaces considered in optics.
Each Gaussian state produces a Hilbert space of wave functions consisting of linear combinations of the a*_k1 a*_k2 ...|0>, weighted by sufficiently smooth functions of the k's to render their norm finite. All states in this Hilbert space are also physically reasonable, but they do not have the same basic (vacuum-like) status as the Hadamard states since they are no longer Gaussian, and hence are harder to work with. But you can evaluate in such a state by expanding everything in terms of vacuum expectations of expressions in a's and a^*'s and applying Wick's theorem. Their leading singular behavior is probably the same as for the Gaussian state itself, though I haven't tried to check this. ----------------------------------- S10f. Why do gravitons have spin 2? ----------------------------------- The reason is that gravitation is described by a metric (symmetric 2-tensor field) modulo general covariance, which gives locally, in the tangent Minkowski space of any point, a spin 2 representation of the Poincare group. Gravitational waves have to be (classically) long range, which requires (after quantization) massless particles. Thus gravitons (although never observed) should be massless spin 2 particles. ----------------------------------- S10g. What is the tetrad formalism? ----------------------------------- A way of writing general relativity such that it can be applied to a spinor (e.g. electron) field. A tetrad is a set of four linearly independent vector fields e_0, e_1, e_2, e_3. Considering them orthonormal in the sense that g(e_j,e_k)=eta_jk (*) where eta is the Minkowski metric defines the metric g uniquely; conversely, for any metric one can choose (on any chart) such an orthonormal basis. If the manifold is parallelizable then one can choose the ONB even globally. In 4 dimensions, any manifold which allows to define spinors consistently is parallelizable (by a result of Geroch), hence reality is most likely described by such a manifold. 
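
The defining relation (*) can be checked numerically. The sketch below assumes, purely as an example not taken from this FAQ, the Schwarzschild metric in standard coordinates (t, r, theta, phi) with G = c = 1 and with the signature (+,-,-,-) matching the Minkowski inner product used throughout. It builds the obvious tetrad for a diagonal metric and verifies g(e_j,e_k) = eta_jk at one point; all function names are made up for the illustration.

```python
import math

# Minkowski metric eta_jk with signature (+,-,-,-).
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def schwarzschild_metric(M, r, theta):
    """Components g_mn at a point outside the horizon (r > 2M), G = c = 1."""
    f = 1.0 - 2.0 * M / r
    diag = [f, -1.0 / f, -r * r, -(r * math.sin(theta)) ** 2]
    return [[diag[m] if m == n else 0.0 for n in range(4)] for m in range(4)]

def diagonal_tetrad(g):
    """Tetrad vectors e_j (components e[j][m]); valid only for diagonal g."""
    return [[1.0 / math.sqrt(abs(g[j][j])) if m == j else 0.0 for m in range(4)]
            for j in range(4)]

def inner(g, u, v):
    """g(u,v) = sum_mn g_mn u^m v^n."""
    return sum(g[m][n] * u[m] * v[n] for m in range(4) for n in range(4))

g = schwarzschild_metric(M=1.0, r=5.0, theta=1.0)
e = diagonal_tetrad(g)

# Gram matrix of the tetrad with respect to g; should reproduce ETA.
gram = [[inner(g, e[j], e[k]) for k in range(4)] for j in range(4)]
for row in gram:
    print([round(entry, 12) for entry in row])
```

For a non-diagonal metric one would need a Gram-Schmidt-like procedure adapted to the indefinite signature instead of the simple rescaling used here.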
Using (*), one can rewrite any formula involving the metric into one involving instead tetrads, and many things simplify - using tetrads is closer to the Cartan formalism of differential geometry than using the metric directly. E.g., sqrt(-det g) = det(e). One has to be slightly careful not to confuse curved and flat indices, but this is learnt very quickly. Then one needs much less index shifting. For gravitation coupled to a (classical) Dirac field, the tetrad formalism is indispensable, since spinors cannot be defined without a flat representation. ---------------------------------- S10h. Energy in general relativity ---------------------------------- Energy is no absolute concept, but depends on the observer (in the nonrelativistic case, by choice of a velocity, in the relativistic case, by choice of time-like unit vector that defines the direction of time and hence the time coordinate). In classical mechanics there is always a (up to rotations) distinguished center of mass frame where the whole system is at rest and the center of mass at zero. The observer is usually (silently) considered to be at rest with respect to that frame; then there is no ambiguity left in the energy. In special relativity things are already more problematic since there is no natural center of mass. But one can fix the time direction by taking it to be that of the total 4-momentum of the whole system. This again fixes a frame, now up to Euclidean motions. On the other hand, this is not what an observer (who has a slightly different eigentime depending on its 4-momentum) sees, and must be corrected accordingly. In general relativity the conserved total 4-momentum is identically zero, so there is no longer a way to fix a time direction. But assuming an asymptotically flat space-time one can take its flat coordinate system (determined up to a Poincare transformation) and use it to chart the localized part, and gets a Minkowski description, to which the preceding applies. 
In general relativity, the concept of energy depends on the choice of a spacelike hypersurface defining a region of space and a time-like vector field along that hypersurface defining the direction of time: Then the integral of [part of] the (0,0)-component of the energy-momentum tensor over this hypersurface defines the corresponding [part of the] energy in this region. This allows one to talk about the (observer-dependent) energy of a subsystem, or of all matter in the universe, etc. What is observer-independent is the energy-momentum tensor density as a whole, but not energy.

The weak-field limit defines a preferred coordinate system, thus reducing the arbitrariness to the choice of the time direction, and the nonrelativistic limit fixes this choice to be the direction of the total momentum of the reference object (e.g., the earth or sun or our galaxy). This makes everything completely determined and gives us a good energy for everyday life.

Note that using the concept of energy does not require a global conservation law. Even in nonrelativistic classical mechanics, energy is conserved only for isolated systems, while the concept is used very profitably in all sorts of nonisolated settings. It just means that one needs to account in the balance equations for what happens at the boundary, and (if necessary) include friction terms (which describe, so to speak, the boundary to the neglected microscopic degrees of freedom).

Thus, to connect general relativity to what most physicists actually study, namely systems localized in a small region of space and time (small may mean, e.g., a laboratory, the earth, the solar system, or our galaxy - within an hour, a year, a few millennia, etc.) one needs to make precise what energy means for such pieces of the whole universe. This requires that the observer specifies the region of space of interest, and the length of time of interest, including the way time is supposed to flow.
The observer also has to specify which part of the energy is of interest, i.e., the terms in the energy-momentum tensor that define the system (contrasted to the environment - which make up all the other terms). After all that is done, energy has a well-defined meaning, as given above. On the other hand, the observer-independent notion generalizing energy is the full energy-momentum tensor; its tensor nature reflects the need for observer information to extract from it numerical values, i.e. real numbers that can be compared with experiment. But apart from energy it also contains the observer-independent part of the information about momentum and stress, which themselves are also observer-dependent. ---------------------------------- S10i. What happened to the aether? ---------------------------------- The aether as supporting substance for electromagnetic waves was a standard hypothesis in the 19th century but fell out of favor with the successes of relativity theory. When in vogue, the aether was the substance filling empty space - i.e., the physics of the aether is the physics of empty space. In a way, the classical background field (also termed the 'vacuum', or more neutrally a 'coherent state' or - in quantum gravity - a 'Hadamard state') around which the quantum field is expanded into excitation modes (photons, gravitons, etc.) is the modern equivalent of the aether. However, nobody uses the term since it is fraught with misleading connotations, and not really needed. In modern language, the aether is called the vacuum, and the properties of the aether are the properties of the vacuum. While the 19th century aether was thought to be at rest, the 20th century aether (= the vacuum in a quantum field theory) is a Poincare invariant state with zero quantum numbers. (In a putative quantum gravity, it would even be a diffeomorphism invariant state, should something like that exist. 
The Unruh effect indicates, however, that there is probably no objective vacuum, since emptiness is observer dependent.) Indeed, Poincare invariance is the modern way of saying 'being at rest' - the momentum of a Poincare invariant state is zero in every frame of reference, and the mass of a Poincare invariant state must also be zero, which implies that the vacuum is empty in terms of mass. (It is however allowed to be filled by a constant nonzero Higgs field, as required in the standard model.) Identifying the aether and the vacuum is consistent with the way Einstein thought about the topic, as the following quotes from Einstein's lecture at the University of Leyden, 1920 (given in German; quoted here in English translation), show: ''Since such fields also occur in the vacuum - i.e., in the free aether - the aether, too, appears as a carrier of electromagnetic fields.'' ''One may add that the whole change in the conception of the aether which the special theory of relativity brought about consisted in taking away from the aether its last mechanical quality, namely its immobility.'' ''One may assume the existence of an aether; only one must give up ascribing a definite state of motion to it, i.e., one must by abstraction take from it the last mechanical characteristic which Lorentz had still left it.'' ''The aether of the general theory of relativity is a medium which is itself devoid of all mechanical and kinematical properties, but which helps to determine mechanical (and electromagnetic) events.'' ''Thus one may also say that the aether of the general theory of relativity has emerged from the Lorentzian aether through relativization.'' 
''... To deny the aether ultimately means to assume that empty space has no physical properties whatsoever...'' For the complete speech in German and in English translation, see http://www.alberteinstein.info/db/ViewCpae.do?DocumentID=34003 (the part with the above quotes is not freely available online). Note that the QFT vacuum is considered by many as a very dynamical entity, being able 1. to have excitations, namely single particles and multiparticle states; in particular photons = quantized electromagnetic waves, 2. to exhibit spontaneous symmetry breaking, and 3. to generate random particle-antiparticle pairs. (In some people's imagination, being able to 4. allow whole universes to pop up or disappear!) Thus the modern vacuum looks much more like the 19th century aether (whose excitations were the classical electromagnetic waves) than the classical vacuum to which Einstein was referring. ------------------- S10j. What is time? ------------------- It is commonly asserted that in general relativity there is no absolute simultaneity. On the other hand, it is asserted that we see the Sun as it was 8 minutes ago and the Andromeda nebula as it was 2.5 million years ago. These two assertions seem to conflict with each other - apparently we have no diffeomorphism invariant way of assigning a relative time to a distant object. Let us take a closer look at the issues involved. The invariant way of defining present is to say that x and y are present if the two points are in a spacelike relation, and to say y was earlier (or later) than x if y lies in or on the past (or future) light cone. Thus the present is well-defined as the complement of the closed light cone. Now suppose that you look at the sun. If one is really pedantic, one would have to say that you see the sun in your eye, as a 2D object, and not out there in 3D. But we are accustomed to interpret our sensations in 3D and hence put the sun far away but into the here. 
In general relativity, one goes a step further. One thinks in terms of the 4D spacetime manifold and places the sun there. Calculating the length of the geodesic gives a value of 0, so the sun is not in your present. Considering the sign of the time component in an arbitrary proper Lorentz frame, one finds that the sun is in your past, as is everything you observe. But the amount of invariant time passed, as measured by the metric, is zero. This looks like a paradox. What happened to the claimed 8 minutes? The answer is that the metric time is not the right way to measure time. It is the only time available in a Poincare-invariant flat universe, or in a diffeomorphism invariant curved universe. An empty universe where only noninteracting observers move has no notion of simultaneity. But a matter-filled, homogeneous and isotropic universe generally has one, defined by the rest frame of the galactic fluid with which general relativity models cosmology. Since the fluid breaks Lorentz symmetry (except in very special cases, which are ruled out by experiment) it creates a preferred foliation of spacetime. This foliation gives a well-defined cosmic time, when scaled to make the expansion of the universe uniform. (Actually there are several natural scalings = monotone transformations of the time parameter; see Section 27.9 in Misner/Thorne/Wheeler, so cosmic time without a reference to the scale used is ambiguous.) This cosmic time figures in all models of cosmology. The values commonly talked about when quoting times for cosmological events, such as the date of the big bang or the time a photon seen now left the Andromeda nebula, refer to this cosmological time. ------------------------------- S10k. Time in quantum mechanics ------------------------------- In the traditional formulation of quantum mechanics, time is not an observable. Nevertheless it can be observed... In the Schroedinger picture, the state is defined at fixed times, which distinguishes the time. 
In this picture, time measurement is difficult to discuss since the time at which a state is considered is always sharp. In the Heisenberg picture, time is simply a parameter in the observables, and therefore also distinguished, but in a different way. Parameters are in fact just continuous indices and not observables. As 3 is not an observable while p_3 is one, so t is not an observable but H(t) is one. Observables have at _each_ time an expected value; the moment of time (''now'') is not modelled as observable. But what can be modelled is a clock, i.e., a system with an observable which changes with time in a predictable way. If the observable u(t) of a system satisfies ubar(t) := <u(t)> = u_0 + v (t - t_0) (v nonzero) (*) with sufficient accuracy, one has a clock and can find out by means of <u(t)> how much time T = Delta t passed between two observed data sets. This is also the usual way we measure time in classical physics. Of course, to be a meaningful time measurement, T must be large enough compared with the intrinsic uncertainty Sigma_T := |v^{-1}| sigma(u(t)). Here sigma(u(t)) = sqrt(<(u(t)-ubar(t))^2>) is the standard deviation in the properly calibrated (quantum mechanical) state <.>. If (*) has significant errors then Sigma_T is of course correspondingly larger. In relativistic quantum field theory (which in its covariant version can only be formulated in the Heisenberg picture), the 1-dimensional time t turns into the 4-dimensional space-time position x. Now x is a vector parameter in the observables (fields), and hence is not an observable. Space and time are now on the same level (allowing a covariant point of view), but both as non-observables. The observables are fields; positions and times of particles are modelled by unsharp 1-dimensional world lines characterized by a high density of the expectations of the corresponding fields. (Think of the trace of a particle in a bubble chamber.) 
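The nonrelativistic clock relation (*) and the intrinsic uncertainty Sigma_T can be turned into a small calculation. The following is only a sketch: the function names and the threshold 'factor' (what counts as ''large enough'') are assumptions made for illustration, and the mean readings and standard deviation are taken as given numbers rather than computed from a quantum state.

```python
def elapsed_time(ubar_1, ubar_2, v):
    """Time T = Delta t between two mean clock readings, from (*):
    ubar(t) = u_0 + v (t - t_0), hence T = (ubar_2 - ubar_1) / v."""
    if v == 0:
        raise ValueError("a clock needs a nonzero rate v")
    return (ubar_2 - ubar_1) / v


def intrinsic_uncertainty(sigma_u, v):
    """Sigma_T = |v^{-1}| sigma(u(t))."""
    if v == 0:
        raise ValueError("a clock needs a nonzero rate v")
    return abs(1.0 / v) * sigma_u


def is_meaningful(T, Sigma_T, factor=10.0):
    """True if T is large compared with Sigma_T.
    'factor' is an assumed, user-chosen threshold, not part of the FAQ."""
    return abs(T) >= factor * Sigma_T


# Hypothetical numbers: pointer mean moves from 1.0 to 7.0 at rate v = 2.0,
# with standard deviation 0.1 of the pointer observable.
T = elapsed_time(1.0, 7.0, 2.0)
Sigma_T = intrinsic_uncertainty(0.1, 2.0)
print(T, Sigma_T, is_meaningful(T, Sigma_T))
```

If (*) holds only approximately, the error of the linear fit would have to be added to Sigma_T before applying the last check.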
For position and time measurement, one now needs a 4-vector field u(x) with <u(x)> = u_0 + V (x - x_0) and a nonsingular 4x4 matrix V, and the intrinsic uncertainty takes the form Sigma_T := sigma(V^{-1}u(x)) with sigma(a(x)) = sqrt(<(a(x)-abar(x))^*(a(x)-abar(x))>), abar(x)=<a(x)>. Conclusion: In nonrelativistic quantum mechanics, time is always measured indirectly via the expectations of distinguished observables of clocks in calibrated quantum mechanical states. In relativistic quantum field theory, the same holds for both position and time. However, this analysis works only when one assigns to single clocks a well-defined state, hence assumes a version of the Copenhagen interpretation. From the point of view of the minimal statistical interpretation, one needs in contrast a whole ensemble of identically prepared clocks to measure time... Note that in relativistic quantum mechanics, a single particle is described (in the absence of an external field) by an irreducible representation of the Poincare group. Here only the components of 4-momentum and the 4-angular momentum are observables. From these, one can reconstruct observer-dependent 3-dimensional (Newton-Wigner) position operators satisfying canonical commutation rules, but not a time operator. -------------------------------------------------- S10l. Diffeomorphism invariant classical mechanics -------------------------------------------------- In mechanics, time is a point in a 1-dimensional manifold, and diffeomorphisms are just smooth reparameterizations of the time. For any Lagrangian of the form L(q,qdot,t) := U(q(t)) qdot(t), where q is an n-dimensional column vector and U an n-dimensional row vector, the action S = integral L(q,qdot,t) dt is diffeomorphism invariant. As a consequence, the Noether energy (the formal Hamiltonian constructed in the transition from a Lagrangian to a Hamiltonian formulation) vanishes identically and has no physical content. 
For one can bring an arbitrary Hamiltonian system xdot=H_p(p,x) , pdot=-H_x(p,x), where H is the physically relevant energy, into the above form by putting q^T = (x^T,p^T,s), U(q) = (p^T,0^T,-H(p,x)). For a careful discussion see Section 4.3 of PJ Olver, Applications of Lie groups to differential equations, Springer, New York 1993. Those who can read German can find more in the section on ''Diffeomorphismeninvariante klassische Mechanik'' in my German Theoretische-Physik-FAQ at http://www.mat.univie.ac.at/~neum/physik-faq.txt For diffeomorphism invariant reformulations of arbitrary field theories, see C.G. Torre, Covariant phase space formulation of parameterized field theories, J. Math. Phys. 33 (1992) 3802-3812 hep-th/9204055 ---------------------------- S10m. The concept of ''Now'' ---------------------------- Time is passing - what is ''now'' in our subjective experience changes. But there is no concept of ''now'' in physics. Classical nonrelativistic mechanics does not know the concept of now. One declares some time to be ''now'' - but which time one declares to be ''now'' is completely subjective (i.e., in different situations it will be declared differently). Similarly, one declares some position to be ''here'', but which position one declares to be ''here'' is completely subjective, in the same sense. Classical relativistic mechanics does not know the concept of now, either, but things change a little: Here one declares some event (= spacetime point) to be ''here and now'' - but which event one declares to be ''here and now'' is completely subjective. Nonrelativistic quantum mechanics treats time completely differently from space (time is a parameter, space coordinates are operators), and introduces stochastic elements into the dynamics. But with respect to ''here'' and ''now'', the situation is identical with that in the classical nonrelativistic case. 
Relativistic quantum mechanics restores the treatment of space and time on equal footing (space and time coordinates are parameters), and introduces stochastic elements into the dynamics. But with respect to ''here and now'', the situation is identical with that in the classical relativistic case. Once one has chosen ''here'' and ''now'', respectively ''here and now'', it serves as origin of the tangent hyperplane, in which localized, flat physics can be done, reflecting faithfully what happens in a neighborhood of the spacetime point. This is the domain of relativistic quantum field theory. ------------------------------------------------------------ S11a. A concise formulation of the measurement problem of QM ------------------------------------------------------------ Quantum mechanics asserts in the Born rule (also called Lueders' rule) that when a particle prepared in a pure state passes an ideal measuring instrument characterized by a finite family of mutually orthogonal projectors P_k (with P_k = P_k^*, P_k P_l = delta_kl P_l and sum_k P_k = 1), it transforms the pure state psi into the normalized pure state psi_k = P_k psi/sqrt(p_k) with probability p_k= psi^* P_k psi. This is a consistent rule in a purely statistical interpretation in which psi is an objective property of a source (describing the statistical behavior of an ideal - stationary and pure - source of particles) rather than an objective property of each individual particle. The measurement problem arises when (as is commonly informally assumed) the wave function is regarded as an objective property of a particle. Then the stochastic transformation demanded by the Born rule, called the collapse of the wave function, conflicts with the deterministic, unitary dynamics of the wave function demanded by quantum mechanics of the joint system consisting of particle+instrument+environment. The unitary dynamics predicts that the joint system is in a macroscopic superposition, which is not observed. 
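The stochastic state transformation described by the rule with projectors P_k can be sketched in a few lines. This is only an illustration for finite-dimensional state vectors; the function names, the list-of-lists representation of the projectors, and the spin-1/2 example are assumptions made here. The outcome state is divided by sqrt(p_k), so that it has unit norm.

```python
import math
import random


def apply(P, psi):
    """Matrix-vector product P psi for a square matrix given as list of rows."""
    return [sum(P[i][j] * psi[j] for j in range(len(psi))) for i in range(len(P))]


def inner(phi, psi):
    """Inner product phi^* psi (conjugate-linear in the first argument)."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))


def born_probabilities(projectors, psi):
    """p_k = psi^* P_k psi for each projector P_k."""
    return [inner(psi, apply(P, psi)).real for P in projectors]


def measure(projectors, psi, rng=random):
    """Draw outcome k with probability p_k and return (k, P_k psi / sqrt(p_k))."""
    probs = born_probabilities(projectors, psi)
    r = rng.random()
    acc = 0.0
    chosen = max(range(len(probs)), key=probs.__getitem__)  # rounding fallback
    for k, p in enumerate(probs):
        acc += p
        if p > 0 and r < acc:
            chosen = k
            break
    norm = math.sqrt(probs[chosen])
    return chosen, [x / norm for x in apply(projectors[chosen], psi)]


# Assumed example: spin 1/2 with projectors onto |up> and |down>,
# prepared in the state (|up> + |down>)/sqrt(2).
P_up = [[1.0, 0.0], [0.0, 0.0]]
P_down = [[0.0, 0.0], [0.0, 1.0]]
psi = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]

print(born_probabilities([P_up, P_down], psi))
print(measure([P_up, P_down], psi))
```

Note that the random draw in measure() is put in by hand. This is exactly the step that has no counterpart in the unitary dynamics of the joint system.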
Note that a measurement does not need a conscious observer. A measurement is any permanent record of an event, whether or not anyone has seen it. Thus the terabytes of collision data collected by CERN are measurements, although most of them have never been looked at by anybody. We human beings only look at crude summaries of such high tech data, but the collapse (which gives rise to individual particle tracks) is clearly independent of whether or when we look at them. -------------------------------- S11b. The double slit experiment -------------------------------- The double slit experiment, where a broad beam of particles passes a screen with two slits, is one of the most fundamental quantum experiments. Standard wave function arguments for purely unitary quantum mechanics predict (at best) that the effect of the screen is to turn a particle in a pure state psi into a superposition of at least three terms, one each for being in one of the two beams (for sufficiently wide slits) or spherical waves (if the slits are narrow enough) passing the slit and a third (or more) for the particle being stuck somewhere on the screen. This conclusion is arrived at as a simple consequence of linearity of the Schroedinger equation, together with natural assumptions of what happens for particles prepared in coherent states. But it is generally believed - and assumed in _all_ discussions of interference - that a double slit screen projects a particle with incoming wave function psi with the correct Born probability to a particle in a superposition of the two beams that pass the slits. The challenge is to derive this from a quantum model of the situation, without invoking explicit collapse anywhere in the derivation. As long as this cannot be done convincingly, I don't consider the measurement problem solved. For a precise version of a (slightly different) challenge, see http://www.mat.univie.ac.at/~neum/collapse.html ---------------------------------- S11c. 
The Stern-Gerlach experiment ---------------------------------- Another basic quantum experiment is the Stern-Gerlach experiment. An input beam of silver atoms is passed through an inhomogeneous magnetic field in a fixed direction, which produces a sideways classical force on each silver atom proportional to the atom's magnetic moment. The magnetic field is said to split the input beam into two separate beams corresponding to atoms of spin up and down, respectively, which shows in the experiment as silver spots where the beams hit a screen. If the beam of silver atoms is replaced by a beam of electrons with very low intensity and the screen is replaced by a more sensitive detector, one observes single detection events, each randomly at one of the two spots. Each such event is generally interpreted as a spin measurement (up or down), which makes sense only if the wave function actually collapses to |up> or |down>. (Though this is very questionable since the electron stops existing as an object separable from the screen.) If a blocker is put in the way of one of the beams, the corresponding spot on the screen disappears, but if the blocker is sensitive as well, single observations are found to occur at the blocker as well. According to strictly orthodox but purely unitary quantum mechanics, the situation is the following: If a single particle leaves the magnetic area, it is in an entangled state consisting of a bilocal superposition of wave packets somewhere along the two beams. When it encounters the blocker, this single electron turns into a still bilocal superposition of wave packets: One remains stuck where the blocked beam meets the block and the other continues its motion along the unblocked beam. A little later, this second wave packet meets the screen, and we end up with a still bilocal superposition of wave packets, now both sitting at the end points of the respective beam. 
Without the blocker, essentially the same happens, except that the electron ends up in a superposition of two spots on the screen. More precisely, what happens is that if one starts with a pure state |x,p> |left>, where |x,p> denotes an approximately coherent state with position x and momentum p, and |left>=1/sqrt(2)(|up>+|down>), one gets approximately a superposition 1/sqrt(2)(|x^+(t),p^+(t)>|up> +|x^-(t),p^-(t)>|down>), where the parameters in the approximately coherent states follow classical paths in phase space determined by approximately classical motion due to the magnetic field, the blocker and the screen - after hitting blocker and screen, respectively, positions are constant and momenta vanish, and the particle is in a superposition of two spots. All this follows without difficulty from the superposition principle, i.e., from the linearity of the Schroedinger equation. To match observations in an objective interpretation of the wave function, one needs a mechanism for changing the unobserved superposition of spots into the observed definite spot. In an observer-independent interpretation this has to happen in the split moment between the particle feeling the presence of blocker or screen and hitting or passing it. This is the so-called collapse of the wave function. According to the old school (von Neumann, London and Bauer, Wigner), in a purely unitary setting it requires a conscious look at what really happened to change the superposition of spots into a definite spot, which gives quantum mechanics an uncomfortable subjective, human-centered touch. -------------------------------- S11d. The minimal interpretation -------------------------------- The minimal interpretation of quantum mechanics does not model what really happens - it only claims probabilities. When quantum mechanics is applied to small systems, one usually asks only for statistical information. 
Here a collapse simply means a change of the point of view resulting in taking conditional expectations, and all difficulties disappear. In that case, each particle simply moves in an undeclared and undeclarable fashion along the experimental setting, the classical instruments are always in a definite state, and instead of superpositions one has probabilities of observation of exactly one of the possible results in the superposition. Now all objectivity (sources and preparation, detectors and measurements) is in the classical setting only, which coexists with the somewhat spooky quantum world, connected by quantum statistics. The problem here is how to unify what happens classically and quantum mechanically. This minimal view becomes inconsistent once one wants to consider the classical system as a large quantum system - all objectivity disappears since macroscopic superpositions are possible. (Generally, nonlinear modifications of the Schroedinger dynamics are considered a possible way out, but this introduces other problems.) The main limitation of the minimal interpretation is that it does not apply to systems that are so large that they are unique. Today no one disputes that the sun is governed by quantum mechanics. But one cannot apply statistical reasoning to unique systems, such as the sun as a whole. If quantum mechanics is a universal theory of nature, it should also apply to the sun as a whole. At least we know that it applies to the extent that it governs the energy generating processes in the sun. The actual numerical analysis of models of the sun just treats the nuclear reactions within a classical reaction-diffusion framework, which (in principle - I don't know whether anyone has actually done it) should be derivable from quantum mechanics using statistical mechanics arguments. A purely statistical interpretation also has a problem with the notion of probability. (See the discussion on probability elsewhere in this FAQ.) 
Probability (and hence the quantum state that predicts it) is often seen as a subjective view about the experimenter's assumed knowledge, or the knowledge an experimenter could gain when 100% attentive. There is the subjectivist difficulty to determine whose knowledge counts and why unobserved (and hence unknown) classical processes still make a difference; but one could imagine an ideal classical observer of the status of Laplace's demon, for whom these problems would be absent. --------------------------------- S11e. The preferred basis problem --------------------------------- Born's rule, stated in the form that |<phi|psi>|^2 is the probability that a system prepared in state psi is, upon measurement, found in state phi, is valid only if a complete set of commuting observables is measured and phi belongs to the preferred basis determined by the experimental setting (i.e., the family of projectors). Given the present state of the universe (which fixes the experimental setting), there is no choice in the preferred basis. Thus, in a mathematical model of quantum mechanics in the large, it has to be deduced from the assumptions about the initial state and the dynamics. The preferred basis is fully determined by Nature, and that's why we can find it out. Given an unknown instrument, one finds out by experimenting with the new piece, letting it interact with systems of known properties, and matching the collected data to trial models until one fits. This is how things are indeed done in practice. The process is called model calibration (or parameter estimation if the model is fixed up to adjustable parameters). At first, one never knows a new instrument precisely, and has to check out its properties. After sufficient experience with enough instruments, one knows reasonably well what to expect of the next, similar one. Then only fine-tuning is needed, which saves time. 
And this knowledge can be used to create new instruments which are likely to behave a certain way; but one still has to check to which extent they actually do, since no theoretical design is realized exactly in practice. Not even in the classical, macroscopic domain! Nature's choice is systematic, hence after having seen that a number of screens have a preferred position basis, we conclude that this is the case generally. As for a spectrometer, if it is built with a prism to analyze light, it is reduced by theory to the observation of light or current at certain positions of the screen, which is done in the preferred position basis. Something similar can be said about the Stern-Gerlach experiment. So once one knows _some_ of Nature's preferences and the general laws, one can deduce other preferences. The challenge posed in the measurement problem is to deduce from first principles that a screen made of quantum matter, with two slits in it, actually has a preferred position basis and projects the incoming system to the part determined by the slits. ------------------------------------------- S11f. Master equation and pointer variables ------------------------------------------- On an approximate level, the preferred basis problem is approached via quantum master equations. A quantum master equation is a dynamical equation for the density matrix of a dissipative quantum system, which approximates a quantum system weakly coupled to an environment at time scales long compared to the typical interaction time but short enough to avoid recurrence effects. More precisely, the dynamics is given by a completely positive Markovian semigroup in a representation named after Lindblad, who discovered its general form. 
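To make the notion of a quantum master equation concrete, here is a minimal numerical sketch of the simplest Lindblad equation, d rho/dt = -i[H,rho] + gamma (L rho L^* - (1/2){L^*L,rho}), for a single two-level system with pure dephasing. The choice H = (omega/2) sigma_z, L = sigma_z, the parameter values, and the Runge-Kutta integrator are assumptions made for illustration only; they are not taken from the references of this FAQ. For this model the off-diagonal entry decays as rho_01(t) = rho_01(0) exp(-(i omega + 2 gamma) t), while the diagonal entries stay constant.

```python
import cmath


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(A):
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]


def add(A, B, s=1.0):
    """Return A + s*B."""
    n = len(A)
    return [[A[i][j] + s * B[i][j] for j in range(n)] for i in range(n)]


def lindblad_rhs(rho, H, L, gamma):
    """Right hand side -i[H,rho] + gamma (L rho L^* - (1/2){L^*L, rho})."""
    Ld = dagger(L)
    LdL = matmul(Ld, L)
    comm = add(matmul(H, rho), matmul(rho, H), -1.0)
    anti = add(matmul(LdL, rho), matmul(rho, LdL))
    diss = add(matmul(matmul(L, rho), Ld), anti, -0.5)
    n = len(rho)
    return [[-1j * comm[i][j] + gamma * diss[i][j] for j in range(n)]
            for i in range(n)]


def evolve(rho, H, L, gamma, t_final, steps):
    """Integrate the master equation with the classical 4th order Runge-Kutta scheme."""
    h = t_final / steps
    for _ in range(steps):
        k1 = lindblad_rhs(rho, H, L, gamma)
        k2 = lindblad_rhs(add(rho, k1, h / 2), H, L, gamma)
        k3 = lindblad_rhs(add(rho, k2, h / 2), H, L, gamma)
        k4 = lindblad_rhs(add(rho, k3, h), H, L, gamma)
        incr = add(add(k1, k4), add(k2, k3), 2.0)  # k1 + 2 k2 + 2 k3 + k4
        rho = add(rho, incr, h / 6)
    return rho


# Assumed parameters: omega = 1, gamma = 0.5, initial pure state (|0>+|1>)/sqrt(2).
omega, gamma = 1.0, 0.5
H = [[omega / 2, 0.0], [0.0, -omega / 2]]
L = [[1.0, 0.0], [0.0, -1.0]]
rho0 = [[0.5, 0.5], [0.5, 0.5]]

rho = evolve(rho0, H, L, gamma, 10.0, 2000)
exact_01 = 0.5 * cmath.exp((-1j * omega - 2 * gamma) * 10.0)
print(abs(rho[0][1]), abs(exact_01))  # numerical and exact off-diagonal size
print(rho[0][0].real, rho[1][1].real)  # populations are unchanged
```

In this toy model the density matrix becomes diagonal in the eigenbasis of sigma_z, which is the kind of long-time behavior a master equation analysis extracts in more realistic settings.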
For a classical damped linear system xdot(t)=Ax(t) with a matrix A whose spectrum is in the closed left complex half plane, the contribution of x in the invariant subspace corresponding to eigenvalues which are not purely imaginary decays to zero, so that at large times t, x(t) essentially approaches the invariant subspace corresponding to purely imaginary eigenvalues. For a quantum master equation, a similar analysis holds and shows that (under suitable conditions) the density matrix at times much larger than the so-called decoherence time approaches a block diagonal form in a suitable basis. Thus it (almost) commutes with a special set of observables, which define the 'pointer variables' of the system. These pointer variables therefore behave essentially classically. If the pointer variables form a complete set of commuting variables, the density matrix approaches a diagonal matrix, and the basis in which this happens is called the 'preferred basis'. For details, see, e.g., cond-mat/0011204 or gr-qc/9406054 ----------------------------------------------------- S11g. Does decoherence solve the measurement problem? ----------------------------------------------------- Many physicists nowadays think that decoherence provides a fully satisfying answer to the measurement problem. But this is an illusion. Decoherence is the (experimentally verified) decay of off-diagonal contributions in a density matrix (written in a preferred basis), when information dissipates into unobservable degrees of freedom in the environment of a system. In particular, decoherence reduces a pure state to a _mixture_ of eigenstates. This is enough to induce classical features in many large quantum systems, characterized by a lack of interference terms. Thus decoherence is very valuable in understanding the classical features of a world that is fundamentally quantum. On the other hand, the 'collapse of the wave function' selects _one_ of the eigenstates as the observed one. 
This ''problem of definite outcomes'' is part of the measurement problem. It is still a riddle, and not explained by decoherence. See the excellent survey article M. Schlosshauer, Decoherence, the measurement problem, and interpretations of quantum mechanics, Rev. Mod. Phys. 76 (2005), 1267-1305. quant-ph/0312059 The champions of the decoherence approach are (not always but at least sometimes) quite careful to delineate what decoherence can do and what it leaves open. For example, Erich Joos, coauthor of the nice book 'Decoherence and the Appearance of a Classical World in Quantum Theory', http://www.iworld.de/~ej/book.html explicitly states in the last paragraph of p.3 in quant-ph/9908008 that (and why) decoherence does not resolve the measurement problem. If the big crowd has a cruder point of view, it means nothing but lack of familiarity with the details. If the quantum mechanical state is taken only as a description of a large ensemble, as in the Statistical Interpretation (see next question), there is no problem. But the riddle is present if one insists that the quantum mechanical state describes a single quantum system (as seems to be required for today's experiments on single atoms in an ion trap), which makes the collapse a necessity. In spite of all results about decoherence, Wigner's mathematically rigorous analysis of the incompatibility of unrestricted unitarity, the unrestricted superposition principle and collapse, Chapter II.2 in: J.A. Wheeler and W. H. Zurek (eds.), Quantum theory and measurement. Princeton Univ. Press, Princeton 1983, in particular pp. 285-288, is unassailable. 
In a nutshell, Wigner's argument goes as follows: If a measurement of 'up' turns the complete system (including the measured system, the detector, and the environment) into the state psi_1 = |up> tensor |up-detected> tensor |env_1> and a measurement of 'down' turns it into psi_2 = |down> tensor |down-detected> tensor |env_2> and the projections of these states are stable under repetition of the measurement (but possibly with different |env> parts) then, by linearity, measuring the state |left> = (|up> + |down>)/sqrt(2) necessarily places the whole system into the superposition (psi_1 + psi_2)/sqrt(2) of such states and _not_ (as would be needed to account for the experimental observations) into a state of the form psi_1 or psi_2, depending on the result of the measurement. Wigner's reasoning implies that a resolution of the measurement problem requires giving up one of the two holy tenets of traditional quantum mechanics: unrestricted unitarity or the unrestricted superposition principle. Von Neumann and with him most textbook authors opted for giving up unitarity by introducing collapse as a process independent of the Schroedinger equation. This is no longer adequate since we now know that there is no dividing line between classical and quantum, so that a measurement can no longer be idealized in the traditional fashion. But then there is no longer a clear place for when the collapse happens, and more specific solutions are called for. My paper A. Neumaier, Collapse challenge for interpretations of quantum mechanics quant-ph/0505172 (see also http://www.mat.univie.ac.at/~neum/collapse.html) contains a collapse challenge for interpretations of quantum mechanics that brings to a focus the requirements for a good solution of the measurement problem. 
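The linearity step in Wigner's argument can be checked on the smallest possible toy model. The sketch below is an assumption-laden illustration, not Wigner's own setting: the environment is dropped, the detector is a single two-level pointer, and the measurement interaction is modelled by a CNOT-type unitary that leaves the pointer in its ready state |0> for 'up' and flips it to |1> for 'down'.

```python
import math


def kron(a, b):
    """Tensor product of two state vectors; index = 2*(system) + (pointer)."""
    return [x * y for x in a for y in b]


def apply(U, v):
    return [sum(U[i][j] * v[j] for j in range(len(v))) for i in range(len(U))]


def is_product(v, tol=1e-12):
    """A two-qubit pure state with coefficients c_ij is a product state
    exactly when c_00 c_11 - c_01 c_10 = 0."""
    return abs(v[0] * v[3] - v[1] * v[2]) < tol


# Assumed measurement interaction: flips the pointer iff the system is 'down'.
U = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

up, down, ready = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]
s = 1.0 / math.sqrt(2.0)
left = [s * (a + b) for a, b in zip(up, down)]

psi_1 = apply(U, kron(up, ready))    # |up> tensor |pointer 0>
psi_2 = apply(U, kron(down, ready))  # |down> tensor |pointer 1>
out = apply(U, kron(left, ready))    # result of measuring |left>

superposition = [s * (a + b) for a, b in zip(psi_1, psi_2)]
print(out == superposition)  # linearity: out = (psi_1 + psi_2)/sqrt(2)
print(is_product(psi_1), is_product(psi_2), is_product(out))
```

The two definite-outcome states psi_1 and psi_2 are product states, while the state obtained from |left> is entangled and equals neither of them. Adding an environment factor enlarges the state space but does not change this conclusion, since the argument uses nothing but linearity.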
In my opinion, the collapse is not a fundamental principle but the result of _approximating_ the entangled dynamics of a system with its environment by a Markovian dynamics for the system itself, resulting in a dissipative master equation of Lindblad type. Such equations have a built-in collapse.

The validity of the Markov approximation is an _additional_ assumption beyond decoherence, which is responsible for the collapse. Its nature is similar to that of the so-called Stosszahlansatz in the derivation of the Boltzmann equation.

Quantum optics and hence all high quality experiments for the foundations of quantum mechanics are unthinkable without the Markov approximation.

------------------------------------------------------------------
S12b. Which textbook of quantum mechanics is best for foundations?
------------------------------------------------------------------

For large ensembles, there seems to be no disagreement about the interpretation. The book
   A. Peres,
   Quantum theory - concepts and methods,
   Kluwer, Dordrecht 1993
is probably the most useful (i.e., both clear and applicable) account of foundational aspects on this level. It is not the easiest book, though, and reading it demands more attention than, say, Sakurai's book. The latter is much more readable but has only sloppy foundations; see the discussion in
   http://groups-beta.google.com/group/sci.physics.research/msg/77630f64b987274f?dmode=source

There are also nice online treatises on certain aspects. For the basics as related to quantum information theory, see, e.g.,
   M. Plenio,
   Quantum Mechanics
   http://www.lsr.ph.ic.ac.uk/~plenio/lecture.pdf

   M.B. Plenio and V. Vedral
   Entanglement in Quantum Information Theory
   quant-ph/9804075

   M.B. Plenio and P.L. Knight
   The Quantum Jump Approach to Dissipative Dynamics in Quantum Optics
   quant-ph/9702007

Modern experiments appear to need, however, a quantum mechanics of individual systems, and that's where controversy and confusion prevail.
I find none of the existing interpretations convincing, and wrote up in
   Int. J. Mod. Phys. B 17 (2003), 2937-2980 = quant-ph/0303047
my own constructive (but incomplete) view of the matter. This paper is completely self-contained and works directly with the statistical mechanics version of QM, with the benefit that it avoids many of the traditional obscurities. It discusses complementarity, ensembles, uncertainty relations, probability, quantum logic, nonlocality, Bell inequalities, sharpness of measurements, and rudiments of quantum dynamics.

The German ''Theoretische Physik FAQ'' at
   http://www.mat.univie.ac.at/~neum/physik-faq.txt
contains a German language exposition of my consistent experiment interpretation of quantum mechanics, which is a much extended version of the above and gives a consistent setting for a quantum universe which explains the nature of quantum chance. A paper on this (in English) is in preparation.

For the history of the interpretation of QM, see the excellent book
   Max Jammer
   The philosophy of quantum mechanics.
   The interpretations of quantum mechanics in historical perspective
   Wiley, New York 1974
and the collection of original papers,
   J.A. Wheeler and W. H. Zurek (eds.),
   Quantum theory and measurement.
   Princeton Univ. Press, Princeton 1983.

----------------------------------------
S12c. What is the role of quantum logic?
----------------------------------------

Quantum logic is a variant of logic often thought to be appropriate for the foundations of quantum mechanics. A good exposition is given in
   K. Svozil,
   Quantum Logic,
   Springer, Singapore 1998.
The book is nice and useful for its material on hidden-variable related arguments.

However, all that is commonly argued in textbooks about QM is argued in terms of classical logic.
Even a cursory look at the large quantum mechanical literature reveals that quantum logic only has a marginal spectator role in QM, while all proofs of all properties of quantum systems have always been discussed using the familiar classical logic.

Even in Svozil's book, one can see that quantum logic is argued in terms of classical logic, and that it has essentially no role in the analysis of actual physical situations (apart from those used for testing the foundations). Beyond a certain point, quantum logic is sterile, which is the reason it never figures in textbooks (except perhaps in passing).

All one ever needs to know about quantum logic (unless one wants to specialize in it) is summarized in Sections 6 and 7 of my paper
   Int. J. Mod. Phys. B 17 (2003), 2937-2980 = quant-ph/0303047.

----------------------------------
S12d. Stochastic quantum mechanics
----------------------------------

For certain Hamiltonians, the Schroedinger equation can be interpreted as a classical diffusion process. This leads to the stochastic quantum mechanics of Nelson. For an overview, see, e.g.,
   http://www-stud.uni-essen.de/~sb0264/stochastic.html

While it gives an interesting aspect to quantum mechanics and its classical limit, Nelson's description has a severe deficiency in that it cannot handle the situation when the wave function vanishes at some point. Nelson writes the wave function as psi = exp(R+iS) with real R and S. At all points where psi vanishes, R has a singularity, and S is entirely undefined. This happens, e.g., for excited states of hydrogen, hence is an integral part of standard quantum mechanics.

Even if one argues that such states are idealized and cannot occur, it seems not to be possible to show that a state that is everywhere nonzero will preserve this property under time evolution. Thus Nelson's representations may develop spurious singularities which are not in the observable part of quantum mechanics.

Also, it is awkward to do scattering calculations in Nelson's framework.

Moreover, Nelson, as quoted on p.
16 of the above paper, says correctly, ''Quantum mechanics can treat much more general Hamiltonians for which there is no stochastic theory.'' Thus it is unlikely to be useful as a 'fundamental' description of nature.

Instead, natural stochastic forms of quantum mechanics are those of quantum diffusion processes and quantum jump processes, in which the wave function itself is regarded as a classical random object. For their use in an experimental context, see, e.g., quant-ph/9805027.

-------------------------------------------------
S12e. Is there a relativistic measurement theory?
-------------------------------------------------

Real measurements take time, and are not instantaneous. To treat the collapse as instantaneous is an idealization, valid for many applications of quantum mechanics.

If relativistic effects play a role, one needs to use quantum field theory. However, the measurement process in quantum field theory is very poorly researched. Thus statements about the conflict of instantaneous collapse and relativity theory are based on very shaky grounds.

For measurement in the relativistic case (but without invoking field theory) see quant-ph/9906034 and other papers by Peres and/or Terno available in the arxiv. They indicate the absence of problems, as far as such a simplified analysis can be trusted.

--------------------------------
S12f. Quantum mechanics and dice
--------------------------------

It is frequently held that quantum mechanics makes only statements about probabilities and not about single events. This is very strange for a theory that claims to be the foundation for everything scientifically observable.

According to the probabilistic view, quantum mechanics is incapable of making any statement about dice that have been thrown already.
Although we can observe with perfect accuracy the value of the throw, all that traditional quantum mechanics can give is the probability distribution of the possible values of the throw, if this value were not yet known.

Quantum mechanics has similar difficulties coping with other actual events, since it never ever predicts what must happen or what must have happened, but only gives probabilities. This is of little consequence for quantities like the value of a throw of three dice, but is a severe defect when discussing the trajectories of the planets of the Solar System (for which we cannot make meaningful statistics), of airplanes, or of cars. Clearly there must be something objective about these, although traditional quantum mechanical interpretations - taken seriously - are unable to account for definite individual events.

---------------------------------------------
S13a. Random numbers and other random objects
---------------------------------------------

In probability theory, a random number is just a random variable x, i.e., a measurable function on the set Omega of possible experiments, that assigns to each experiment omega in Omega the value x(omega) of x in this experiment.

In the important, 'noninformative' case where the measure is invariant under a group transitive on Omega, so that all experiments are identical copies of one another, physicists refer to this set Omega as a (classical) 'ensemble', although they are usually too vague to express this in formal terms. The terminology easily extends to the inhomogeneous case if one allows in ensembles each realization with a different frequency.

Mathematicians prefer to leave the set Omega (which they call the 'sample space') unspecified and talk about 'realizations' in place of 'experiments'. Thus, for each experiment omega in Omega, x(omega) is a realization of x, i.e., what physicists would call the value found in this particular experiment.
By giving a specific definition of the sigma algebra of interest, and specific recipes defining x(omega), one has a model world in which realizations make perfect sense. A difficulty is, of course, that we do not have such a model for the real world, and hence must resort to empirical approximations when treating real-life problems. (This places physicists at a slight disadvantage; however, there is the compensating advantage that their results apply to real life instead of only satisfying one's sense of beauty and precision....)

The only thing not specified in probability theory (unless one specifies a particular model as indicated above) is the mechanism that draws the number, and hence there is no way to know which experiment omega has been realized. Therefore, probability theory makes only statements about _all_ realizations simultaneously.

Example. Given the axioms of probability theory, a random number uniformly distributed between zero and one is defined as a random variable x such that
   <f(x)> = integral_0^1 f(s) ds
for all Lebesgue-integrable functions f on [0,1], and any x(omega) is a realization of it, i.e., an actual number in [0,1]. (In particular, random numbers are _not_ numbers!)

Mechanisms to draw numbers that may be used as approximations to a sequence of independent realizations x(omega) are called random number generators. They do not produce random numbers (since random numbers are not numbers but measurable functions). Instead, they produce sequences that look like typical realizations of sequences of independent, uniformly distributed random numbers (in the sense that they usually pass with high confidence level certain statistical tests valid for such random sequences). Therefore, the numbers they generate are used in practice as (often completely adequate) substitutes for random numbers.
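
The distinction between the random number x (defined by its expectations) and the numbers delivered by a generator can be made concrete with a few lines of Python. This is a minimal sketch, assuming the generator of the standard module 'random' as a stand-in for 'a mechanism that draws the number'; the test function f(s) = s^2, the sample size, and the seed are arbitrary illustrative choices, and sample_mean is a name invented here.

```python
import random

def sample_mean(f, n, seed=0):
    """Average f over n pseudo-random draws from [0,1)."""
    rng = random.Random(seed)   # fixed seed: the same 'realization' is reproduced each run
    return sum(f(rng.random()) for _ in range(n)) / n

def f(s):
    return s * s                # exact expectation: integral_0^1 s^2 ds = 1/3

estimate = sample_mean(f, 100_000)
print(estimate, abs(estimate - 1 / 3))
```

The generated sequence is a fixed list of ordinary numbers, fully determined by the seed; it merely behaves, with respect to this one test, like a typical realization of independent uniformly distributed random numbers.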
(On the other hand, there is no uniformly distributed random natural number since the uniform measure on natural numbers,
   mu(f) = sum_{k>=0} f(k)
is not normalizable.)

Random numbers are comparably simple objects. More complicated random objects need more sophisticated ensembles but otherwise everything remains analogous. Let us consider the physically important example of Brownian motion.

Brownian motion (the random walk in space) is modelled by an ensemble whose realizations (members) are continuous paths in R^3; with probability one, these paths are Hoelder continuous with every exponent below 1/2. The probability of any particular realization of a random walk is exactly zero, and statements with positive probability must hold in uncountably many realizations. Nevertheless, the ensemble is precisely the set Omega composed of all such realizations. And the appropriate sigma algebra carrying the Wiener measure needed to describe the random walk is indeed an algebra of subsets of Omega.

Repeatedly tossing a fair coin is also a (kind of trivial) stochastic process. A fair coin that can be thrown an unlimited number of times with independent outcomes (sampling with replacement) cannot be modelled by the sigma algebra 2^{0,1} over Omega_1 ={0,1}, since this has not even two independent bits. Its sigma algebra is based on the infinite ensemble Omega_inf consisting of all possible sequences of outcomes, and is the tensor product of infinitely many copies of 2^{0,1}. This setting is necessary in order to provide meaning to the concept of 'independent trial' which underlies most of statistical reasoning.

Because of the assumed independence of the trials, one can reduce all computations to computations within 2^{0,1}. This is generally done in elementary probability theory, to simplify the presentation. But once one looks at binary processes which are even slightly correlated (history-dependent), one needs the full sigma algebra over Omega_inf.

-------------------------------------------
S13b. What is the meaning of probabilities?
-------------------------------------------

To say that "The probability that someone in risk group A will die of cancer is 1/3" does _not_ mean that "10 out of 30 people in risk group A will die of cancer". It only means that, "on the average, 10 out of 30 randomly chosen people in risk group A will die of cancer". This can be checked (in the limit) by many repeated simulations, or (directly) by a theoretical computation; both require that the complete ensemble is available.

Of course, in using probabilities for predictive purposes, an insurance company tacitly assumes (without any guarantee) that the group of 30 people of interest is actually well approximated by a random sample, so that one can expect 10 out of the 30 to die of cancer. But this tacit assumption may well turn out to be wrong.

Statements about ensembles are in principle exactly checkable: Operationally, to say that "The probability that someone in risk group A will die of cancer is 1/3" means nothing more or less than that exactly 1/3 of _all_ people in risk group A will die of cancer. (This assumes that risk group A is finite. For infinite ensembles, to define the precise meaning of '1/3 of all', one needs to go into technicalities leading to measure theory. Indeed, measures are the mathematically rigorous versions of 'classical ensembles' in general. For quantum ensembles, see quant-ph/0303047.)

Of course, we cannot check this before we have information about how _all_ people in risk group A died, but once we have this information, we can check and verify or falsify the statement.

In terms of precise mathematics: A classical ensemble is the set of elementary events underlying the sigma algebra over which the measure is defined. For example, in any finite sigma algebra containing random variables representing a fair coin (realizations 0,1; 1=head; each with probability 50%), one has a finite ensemble of elementary events, and exactly half of them come out heads.
For an infinite sigma algebra, the ensemble is infinite; but with the natural weighting, again exactly half of them come out heads.

Usually, however, we only have incomplete knowledge about the ensemble. For example, 'Tossing 10 fair coins' is just a sloppy way of saying 'Selecting a sample of size 10 from the total ensemble'. The sigma algebra for modeling this must contain at least 10 independent random variables representing fair coins. This is the case, e.g., in the direct product of N>=10 sigma algebras isomorphic to 2^{0,1}. For N>10, it is obvious that here the number of heads is 5 (=50%) only on average over many random samples; and it is impossible to infer the exact probability from a single sample.

This is why statisticians say that they _estimate_ probabilities based on _incomplete_ knowledge, collected from a sample. The resulting estimated probabilities are known to be inherently inaccurate; but they can be checked approximately by independent data (cross-validation) providing confidence levels indicating how much the predictions can be trusted.

On the other hand, they _compute_ probabilities from _assumed_ complete knowledge about the ensemble, namely the theoretical probability distribution.

Thus if complete information goes in, exact information comes out, while computations based on incomplete information naturally only give approximate results inheriting some uncertainty from the input.

Computed probabilities are powerful, but only if the assumed stochastic model is correct. Empirical estimates are usually inaccurate but useful. The two approaches are not contradictory; indeed, they are combined in practice without difficulties at all.

The only subjective aspect in the whole thing is the choice of a stochastic model when making theoretical predictions; and even this is made almost objective by the standard rules of statistical inference and model building.
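
The difference between computing and estimating can be seen in a toy case small enough to enumerate. The following Python sketch makes simplifying assumptions that are not in the text above: the complete ensemble is taken to be the 2^10 equally weighted sequences of 10 fair coin tosses (the case N=10), and a single sequence is drawn with the standard pseudo-random generator and an arbitrary seed.

```python
from itertools import product
from fractions import Fraction
import random

N = 10
ensemble = list(product((0, 1), repeat=N))   # all 2^10 equally weighted outcomes, 1 = head

# Computed (exact) statements, using complete knowledge of the ensemble
total_heads = sum(sum(seq) for seq in ensemble)
head_fraction = Fraction(total_heads, N * len(ensemble))
exactly_five = Fraction(sum(1 for seq in ensemble if sum(seq) == 5), len(ensemble))

# Estimated statement, using a single sample of 10 tosses
rng = random.Random(1)                       # arbitrary seed, for reproducibility
sample = rng.choice(ensemble)
estimate = Fraction(sum(sample), N)

print(head_fraction)    # 1/2: exactly half of all tosses in the ensemble are heads
print(exactly_five)     # 63/256: only about a quarter of the samples show exactly 5 heads
print(estimate)         # relative frequency of heads in the one sample drawn
```

The first two numbers are exact because the whole ensemble went in; the last one depends on which sample happened to be drawn and can only be assessed statistically.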
Indeed, the choice of ensemble is _always_ a subjective act that determines what the probabilities mean. It encodes what the user is prepared to assume about the given situation. Once the ensemble is chosen - either as a theoretical, exactly known ensemble, defined by specifying a distribution, or as a real life ensemble of which only a (perhaps growing) sample is available - all probabilities have an objective meaning.

A chosen ensemble is knowledge precisely if it is close to the correct ensemble, and we have a good idea of how close it is. That's why we value highly scientists such as Gibbs who guessed the right ensembles for statistical mechanics, which turned out to be a highly accurate description of equilibrium situations.

Only good choices are knowledge. And what is good is found out only through proper checking, and not through the principle of insufficient reason. In case of tossing a coin we know that the fairness assumption is usually reasonable, being consistent with experience. In case of taking an exam with a newly appointed professor about whom no one knows anything, reasoning from the two possible outcomes (pass or fail) and the principle of insufficient reason to assign a probability of 50% failure is ridiculous, and dangerous for those who are not prepared.

----------------------------------------------------------------
S13c. What about the subjective interpretation of probabilities?
----------------------------------------------------------------

People with a preference for subjective interpretations would say ''probabilities depend on someone's knowledge'' instead of ''probabilities are a property of the ensemble under consideration''.
They talk of ''arrival of new information'' or ''learning'' instead of the objective and unassailable formulation ''restricting the ensemble to a subset defined by the conditions'' when discussing conditional probabilities (the classical analogue of the statistical collapse of the wave packet in quantum mechanics).

But knowledge is an even more poorly defined concept than probability, which at least has an undisputed axiomatic basis. Thus explaining probability in terms of knowledge only makes the meaning of probability more foggy by putting it deep into the psychological realm.

Moreover, the subjective interpretation based on the Bayesian paradigm of conditional probability has no formal way of coping with misinformation (the ensemble grows if one learns that some of the information one believed to know turns out to be false!) while, on the objective level, the latter is just another change of the ensemble.

Thus the subjective interpretation of probability is an inadequate foundation for the use of probabilities in physics.

-------------------------------------------------------
S13d. Are probabilities limits of relative frequencies?
-------------------------------------------------------

Sometimes, probabilities are regarded as limits of relative frequencies as the number of trials becomes arbitrarily large.

But the weak law of large numbers only guarantees that most trial histories will give a sequence of relative frequencies that converge to the probability. It might just fail for the one actually tried...

Moreover, in practice we only have partial knowledge of such an infinite sequence of trials (which cannot be performed). This knowledge about the sample gives no knowledge at all about the limiting ensemble. Just as the knowledge of the first n items of a sequence gives, in theory, no knowledge at all about the limit of the sequence.
That we often estimate the limit using a small part of the sequence is another matter, and is like estimating probabilities from samples. But the estimate may be completely wrong.

Thus interpreting probability as relative frequency is a philosophically difficult interpretation step. For a thorough discussion, see the very informative books by
   T.L. Fine,
   Theory of probability; an examination of foundations.
   Acad. Press, New York 1973.
and
   L. Sklar,
   Physics and Chance,
   Cambridge Univ. Press, Cambridge 1993.

--------------------------------------------------------
S13e. How meaningful are probabilities of single events?
--------------------------------------------------------

(Note: In this FAQ, 'event' is always understood in the ordinary sense of the word, as 'something specific happening'. In axiomatic probability theory based on Kolmogorov's axioms, there is a slightly different, formal meaning of an event as an element of the underlying sigma algebra. An axiomatic foundation of probability theory equivalent to that of Kolmogorov, but not based on sigma algebras, can be found in the book 'Probability via expectation' by Paul Whittle, and a quantum extension in quant-ph/0303047.)

Probabilities of single events are not at all meaningful - at least not in any scientific sense -, although we are used to scientific-sounding phrases such as ''There is a 60% probability for rain tomorrow''.

Instead, probabilities are properties of ensembles of events. In the case just cited, the ensemble is the set of all tomorrows (or rather an infinite idealization of it), and the probability is not an exact probability, but an estimate computed on the basis of a sample of former 'tomorrows', together with statistical weather models.

Probability assignments to single events can be neither verified nor falsified. Indeed, suppose we intend to throw a coin exactly once. Person A claims 'the probability of the coin coming out head is 50%'.
Person B claims 'the probability of the coin coming out head is 20%'. Person C claims 'the probability of the coin coming out head is 80%'. Now we throw the coin and find 'head'. Who was right? It is undecidable. Thus there cannot be objective content in the statement 'the probability of the coin coming out head is p', when applied to a single case.

Subjectively, of course, every person may feel (and is entitled to feel) right about their probability assignment. But for use in science, such a subjective view (where everyone is right, no matter which statement was made) is completely useless.

What is the probability that a particular person, Mrs. X, will die of cancer? This is a single event that either will happen, or will not happen. If one considers this single event only, the probability is 1 or 0, depending on what will actually happen. (But this sort of probability is not what we talk about in physics.)

On the other hand one may assign a probability based on some facts about Mrs. X (smoker? age? gender? already ill? etc.). Each collection of such facts determines an ensemble of people, from which one can form a statistical estimate of the probability. Which probability one assigns clearly depends on which sort of ensemble one regards Mrs. X as belonging to. Mrs. X belongs to many ensembles, and the answer is different for each of these.

Thus probabilities are meaningful not as a property of the single event but only as a property of the ensemble under consideration.

This can also be seen from the mathematical foundations. Classical probabilities are determined by measures over some sigma algebra. All statements in measure theory are _only_ about expectations and probabilities of all possible (often infinitely many) realizations simultaneously, and say nothing at all about any particular realization.
For a random sequence consisting of 9 independent bits, with 0 and 1 equally likely, the sequence 111111111 has exactly the same status and exactly the same probability as the sequences 110100101 or 000000000, although only the second sequence looks random.

(A random sequence is _not_ a sequence of numbers but a sequence of random numbers = measurable functions. Only the _realizations_ of a random sequence are sequences of ordinary numbers. Sequences of ordinary numbers are _never_ random, but they can 'look random', in a subjective sense.)

-----------------------------
S13f. Objective probabilities
-----------------------------

Consider a physical die (for simplicity assumed perfectly symmetric) with six elementary events 1,...,6.

If the die is not thrown, all events are equivalent, and the probabilities are 1/6 for each event. These probabilities are associated to the die (_not_ to a throw), and can be determined uniquely from the knowledge of the geometry and composition of the die. All of probability theory happens at this level, since the 'happening' of an event is not formally defined.

If the die is thrown, a given event (say 3) either happens or does not happen. If the event happens (does not happen), the statement 'This throw is a 3' is true (false), hence has a probability of 100% (0%), although before the throw, these probabilities are not yet known. These probabilities are associated to each particular throw (_not_ to the die).

Thus a die functions as a potential stationary source of throws, and hence _defines_ an ensemble of (conceivable) throws. An actual throw, though a realization of this ensemble, is determined by the outcome, and cannot be assigned a probability different from 0 or 1.
[See, e.g., the wikipedia entry
   http://en.wikipedia.org/wiki/Probability_theory
''Omega is a non-empty set, sometimes called the "sample space", each of whose members is thought of as a potential outcome of a random experiment.'' 'is thought of' signifies the interpretational level. Probabilities are only about 'potential outcomes' (what I call conceivable), not about actual ones.]

A stationary source has objective probability distributions for random vectors computable from observations made on it. These are given in terms of an objective expectation mapping and an associated density. In principle, this density can be measured arbitrarily well, and if the form and composition of the source is known, can be objectively predicted from physical theories.

Thus objective probability distributions exist always when the generating ensemble is completely known, and more generally whenever it is objectively determined.

Similarly, in quantum theory, a laser is a potential stationary source of photons, the oven in a Stern-Gerlach experiment is a stationary source of electrons, etc. The sources are in well-defined, objective quantum mechanical states, defining ensembles with objectively predictable properties.

------------------------------------------------------------
S13g. How probable are realizations of stochastic processes?
------------------------------------------------------------

In a stochastic setting, _every_ realization of a stochastic process typically has probability 0; nevertheless, exactly one of them actually happens.

Taking for simplicity the stochastic process defined by independent flips of a fair coin, a realization is an infinite binary sequence, and each of these has probability zero. (Partial realizations of finite length N each have a probability of 2^-N which is extremely tiny for large N.)
For discrete stochastic processes having a continuum of allowed values at each time step, even partial realizations have zero probability, except in degenerate situations. The same holds for continuous-time stochastic processes.

The case of measuring electron spin, say, is more difficult to analyze because as stated, it is not yet a well-defined stochastic process. If it is taken as a continuous measurement, the flips occur at random times, and so even a single flip at a definite time has probability zero. If it is taken as a discrete process, we need to specify a measuring protocol that applies at definite, equidistant times. Then it is likely that there are some correlations, and probabilities even of finite pieces of a particular realization are hard to come by.

Nevertheless, under reasonably random circumstances (for example, when measuring spins of independent electrons), the probability of the most likely sequence of N measurements decreases exponentially with N, and the probability of a complete realization (infinite sequence) is again zero.

---------------------------------------------
S13h. How do probabilities apply in practice?
---------------------------------------------

If one has a sound probabilistic model of a multitude of N independent events e_i with the same assigned probability p, one would be surprised if the frequency of events is not close to p within a small multiple of sqrt(p(1-p)/N).

Rather than just accepting a rare occurrence (e.g., a brick going upwards due to fluctuations) as something within one's probabilistic model, one would probably rather try to explain it away by assuming a hidden, unobserved cause (someone throwing it).

The way probabilities are used in practice is always as informative guides of what to expect, but not as statements with a 100% exact meaning.

I wrote a paper on surprise:
   A. Neumaier,
   Fuzzy modeling in terms of surprise,
   Fuzzy Sets and Systems 135 (2003), 21-38.
   http://www.mat.univie.ac.at/~neum/papers.html#fuzzy
that may help understand the fuzziness inherent in our concepts of reality.

-----------------------------------------
S13i. Incomplete knowledge and statistics
-----------------------------------------

It is often erroneously assumed that incomplete knowledge can always be described by statistics. But this is by no means the case.

If one knows about a number x only that it is in [0,1], one cannot apply statistics since one knows nothing at all about the distribution (except for its support). It is perfectly consistent with the knowledge that in fact always x=0.75, except that one does not know it, or that x oscillates regularly, or.... The ignorance is in this case simply deterministic lack of information. In particular, it would be a mistake to assume that the distribution is uniform (ignorance interpretation). Using the noninformative prior of the Bayesian school, which makes this assumption, may be seriously flawed.

More realistically, in engineering, an uncertainty of 5% in the elastic modulus of steel bars may be the only information available to an architect; but 3/4 of the bars used later in the building may have a deviation of 0.1% and the remaining quarter one of 3.7%.

In general, all one can deduce from information that takes the form of deterministic bounds on a vector x of variables and/or on expressions in x are bounds on derived quantities y=f(x) one would like to compute from it. This leads to global optimization problems, where f(x) is minimized or maximized subject to the known constraints. See
   http://www.mat.univie.ac.at/~neum/glopt/intro.html

The lack of knowledge that statistics can model is of a different kind. It assumes that the _maximal_attainable_ knowledge about the system - at the given level of description - is a probability distribution, and that this probability distribution is indeed known.
The knowledge of the probability distribution can be replaced by a qualitative knowledge of it (e.g. 'some Gaussian distribution'), together with the knowledge of an incomplete sample from the ensemble of interest; in this case, however, the best statistics can offer are parameter estimation techniques that give credible probability distributions compatible at some confidence level with the sample data.

There are also combinations of both kinds of incomplete information, where one knows the maximal knowledge about a system should be stochastic, but one lacks complete information on the distribution. This is handled by the field of 'imprecise probability', although there is not yet a generally accepted way for analyzing such situations, and different schools with quite different basic approaches compete. See, e.g., the links in
   http://class.ee.iastate.edu/berleant/home/ServeInfo/Interval/intprob.html

Theoretical physics is always concerned with describing the maximal attainable knowledge about a system (at a given level of description), irrespective of what anyone actually knows about it. In this way, and only in this way, it is possible to get close to the objectivity that science always is striving for.

----------------------------------------------
S13j. Priors and entropy in probability theory
----------------------------------------------

For a probability distribution on a finite set of alternatives, given by probabilities p_n summing to 1, the Shannon entropy is defined by
   S = - sum p_n log_2 p_n.
The main use of the entropy concept is the maximum entropy principle, used to define various interesting ensembles by maximizing the entropy subject to constraints defined by known expectation values
   <f> = sum p_n f(n)
for certain key observables f.

If the number of alternatives is infinite, this formula must be appropriately generalized.
In the literature, one finds various possibilities, the most common being, for random vectors with probability density p(x), the absolute entropy
   S = - k_B integral dx p(x) log p(x)
with the Boltzmann constant k_B and Lebesgue measure dx. The value of the Boltzmann constant k_B is conventional and has no effect on the use of entropy in applications. There is also the relative entropy
   S = - k_B integral dx p(x) log (p(x)/p_0(x)),
which involves an arbitrary positive function p_0(x). If p_0(x) is a probability density then, with this sign convention, the relative entropy is nonpositive, and vanishes only if p=p_0 almost everywhere.

For a probability distribution over an _arbitrary_ sigma algebra of events, the absolute entropy makes no sense since there is no distinguished measure and hence no meaningful absolute probability density. One needs to assume a measure to be able to define a probability density (namely as the Radon-Nikodym derivative, assuming it exists). This measure is called the prior (it is often improper = not normalizable to a probability density).

Once one has specified a prior dmu,
   <f> = integral dmu(x) rho(x) f(x)
defines the density rho(x), and then
   S(rho) = <-k_B log(rho(x))>
defines the entropy with respect to this prior. Note that the condition for rho to define a probability density is
   integral dmu(x) rho(x) = <1> = 1.

In many cases, symmetry considerations suggest a unique natural prior. For random variables on a locally compact homogeneous space (such as the real line, the circle, n-dimensional space or the n-dimensional sphere), the conventional measure is the invariant Haar measure. In particular, for probability theory of finitely many alternatives, it is conventional to consider the symmetric group on the set of alternatives and take as the (proper) prior the uniform measure, giving
   <f> = sum_x rho(x) f(x).
The density rho(x) agrees with the probability p_x, and the corresponding entropy is the Shannon entropy if one takes k_B=1/log2.
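The last statement can be checked numerically. The following Python sketch (the function names and the sample distribution are chosen here for illustration only) evaluates the entropy with respect to the uniform prior, using natural logarithms and a conventional factor k_B, and compares it with the Shannon entropy in bits.

```python
import math


def entropy(p, k_B=1.0):
    """S = -k_B * sum_n p_n log p_n (natural log), with 0 log 0 taken as 0."""
    return -k_B * sum(q * math.log(q) for q in p if q > 0)


def shannon_entropy_bits(p):
    """Shannon entropy S = -sum_n p_n log_2 p_n."""
    return -sum(q * math.log2(q) for q in p if q > 0)


# Illustrative distribution on four alternatives.
p = [0.5, 0.25, 0.125, 0.125]

s_bits = shannon_entropy_bits(p)
s_kb = entropy(p, k_B=1.0 / math.log(2))

print(s_bits, s_kb)  # the two values agree up to rounding
```

Any other choice of k_B merely rescales the entropy and does not change which distribution maximizes it.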
For random variables whose support is R or R^n, the conventional symmetry group is the translation group, and the corresponding (improper) prior is the Lebesgue measure. In this case one obtains the absolute entropy given above. But one could also take as prior a noninvariant measure
   dmu(x) = dx p_0(x);
then the density becomes rho(x)=p(x)/p_0(x), and one arrives at the relative entropy.

If there is no natural transitive symmetry group, there is no natural prior, and one has to make other useful choices. In particular, this is the case for random natural numbers.

Choice A. Treating the natural numbers as a limiting situation of finite intervals [0:n] suggests using the measure with integral dmu(x) phi(x) = sum_n phi(n) as (improper) prior, making
   <f> = sum_n rho(n) f(n)
the definition of the density; in this case, p_n=rho(n) is the probability of getting n.

Choice B. Statistical mechanics suggests using as (proper) prior instead a measure with integral dmu(x) phi(x) = sum_n h^n phi(n)/n!, where h is Planck's constant, making
   <f> = sum_n rho(n) h^n f(n)/n!
the definition of the density; in this case, p_n=h^n rho(n)/n! is the probability of getting n.

The maximum entropy ensemble defined by given expectations depends on the prior chosen. In particular, if the mean of a random natural number is given, choice A leads to a geometric distribution, while choice B leads to a Poisson distribution. The latter is the one relevant for statistical mechanics. Indeed, choice B is the prior needed in statistical mechanics of systems with an indefinite number n of particles to get the 'correct Boltzmann counting' in the grand canonical ensemble. With choice A, the maximum entropy solution is unrelated to the distributions arising in statistical mechanics.

Thus while the geometric distribution has greater Shannon entropy than the Poisson distribution, this is irrelevant for classical physics.
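The dependence on the prior can be made concrete with a small numerical experiment. The Python sketch below rests on the following assumptions, none of which come from the text above: the natural numbers are truncated to n < 120, the prescribed mean is 3, and units with h=1 are used. Maximizing -sum_n p_n log(p_n/w_n) for prior weights w_n, subject to normalization and a given mean, yields p_n proportional to w_n t^n, where t is fixed by the mean constraint and found here by bisection.

```python
import math


def maxent_given_mean(weights, mean, iters=200):
    """Maximum entropy distribution on n = 0..N-1 relative to prior weights w_n.

    The stationarity condition gives p_n proportional to w_n * t**n;
    t is determined by bisection so that sum_n n p_n equals the given mean.
    """
    def dist(t):
        u = [w * t**n for n, w in enumerate(weights)]
        z = sum(u)
        return [x / z for x in u]

    def mean_of(p):
        return sum(n * q for n, q in enumerate(p))

    lo, hi = 0.0, 1.0
    while mean_of(dist(hi)) < mean:  # bracket the root; the mean grows with t
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_of(dist(mid)) < mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))


def shannon(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)


N, m, h = 120, 3.0, 1.0          # truncation, prescribed mean, illustrative h

flat = [1.0] * N                 # choice A: w_n = 1
boltzmann = [1.0]                # choice B: w_n = h^n / n!
for n in range(1, N):
    boltzmann.append(boltzmann[-1] * h / n)

p_A = maxent_given_mean(flat, m)
p_B = maxent_given_mean(boltzmann, m)

# Closed forms for comparison.
q = m / (1.0 + m)
geometric = [(1.0 - q) * q**n for n in range(N)]
poisson = [math.exp(-m) * m**n / math.factorial(n) for n in range(N)]

print(max(abs(a - b) for a, b in zip(p_A, geometric)))
print(max(abs(a - b) for a, b in zip(p_B, poisson)))
print(shannon(p_A), shannon(p_B))
```

Up to truncation and rounding errors, choice A reproduces the geometric and choice B the Poisson distribution with the same mean, and the former has the larger Shannon entropy. The value of h drops out once the mean is prescribed, since it is absorbed into t.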
In statistical physics with an indeterminate number of particles, only the relative entropy corresponding to choice B is meaningful. (In the quantum physics of systems with discrete spectrum, however, the microcanonical ensemble is the right prior, and then Shannon's entropy is the correct one.)

The identification of 'information' and 'Shannon entropy' is dubious for situations with infinitely many alternatives. Shannon assumes in his analysis that without knowledge, all alternatives are equally likely, which makes no sense in the infinite case, and may even be debated in the finite case.

(One of the problems of a subjective, Bayesian approach to probability is that one always needs a prior before information theoretic arguments make sense. If there is doubt about the former the results become doubtful, too. Since information theory in statistical mechanics works out correctly _only_ if one uses the right prior (choice B) and the right knowledge (expectations of the additive conserved quantities in the equilibrium case), both the prior and the knowledge are objectively determined. But this is strange for a subjective approach such as the information theoretic one, and casts doubt on the relevance of information theory in the foundations.)


-------------------------------------------------------
S14a. Theoretical challenges close to experimental data
-------------------------------------------------------

Many theoretical physicists seem to think that the only worthwhile challenges in theoretical physics can be found at >TeV energies. But, (un?)fortunately, there are challenges, as difficult and as exciting, in the realm of normal energies, deep in the limits of the unknown (as regards understanding), and far more relevant in my opinion. The manpower and money invested in the exotic realms of nature at very large energies would be much better spent on these challenges closer to experimental data.
For example, finding a consistent nonperturbative setting for QFT, or giving a meaning to the concept of the ground state of a Helium atom in quantum electrodynamics (extended by a field describing the nuclei).

I have not seen a single field-theoretic treatment of Helium, surely a simple system. Helium is a bound state with well-defined asymptotic behavior, as well-defined as a dressed electron or photon, but there is no clear conceptual basis for this in QFT although there should be such a concept. That's why I think it is a very important unsolved problem.

There are papers making heuristic approximations (see hep-ph/9612330) which give accurate predictions - cf. Phys. Rev. A 65 (2002), 032516 and Phys. Rev. Lett. 84 (2000), 3274 -, but they don't give a clue what a helium atom 'is' in QFT. Moreover, they treat two electrons in a classical external Coulomb field instead of a system of two electrons and a nucleus.

The current treatment of bound states in QFT (see elsewhere in this FAQ) is a very loose patchwork of techniques borrowed from perturbative field theory and nonrelativistic quantum mechanics that should make every theoretician shudder. There are some beginnings in algebraic QFT of what bound states should be, but nothing convincing on the quantitative level.

A theory of everything should also be able to answer questions that are well established experimentally but not understood from the foundations. For example, deriving the Navier-Stokes equations for water from quantum theory is another challenge that has so far remained unmet; it was done long ago for dilute gases, but no one has extended it to dense fluids.
There are severe difficulties to overcome, but we know both the final result (to much better accuracy than the parameters of the standard model) and the supposed underlying microscopic model (unlike in quantum gravity); and the availability of a derivation might even have long-term engineering consequences for predicting properties of fluids under thermodynamic conditions where experiments are difficult or impossible.

I am not an expert in this topic, but here are some pointers to what I have seen about the problem. I have never seen any microscopic derivation of Navier-Stokes for water, although this is by far the most important application.

The statistical mechanics text of Reichl derives the equations in Chapter 14F from thermodynamics, and in Chapter 16C-F (for dilute monatomic gases) from classical statistical physics.

Fujita, Nonequilibrium statistical mechanics, derives Boltzmann from QM in Chapter 4.2 and Chapter 6; Navier-Stokes would be roughly analogous (for dilute gases). Similarly for many other books on nonequilibrium stat. mech.

Mueller/Ruggeri, Extended Thermodynamics, treat relativistic versions, deriving them from the Boltzmann equation and from thermodynamics.

Volume 9 of Landau/Lifschitz discusses techniques for the condensed state in general, but no derivation of Navier-Stokes.

   J. Math. Phys 11 (1970), 2481
is a paper summarizing in the introduction what was known by 1970.
   Phys. Rev. D 53, 5799-5809 (1996)
derives hydrodynamic equations from quantum fields but only in a scalar phi^4 theory. More recent related work includes
   Phys. Rev. D 68, 085009 (2003)
   Phys. Rev. D 64, 025001 (2001)
   Phys. Rev. D 61, 125013 (2000)

Thus there is a well-trodden pathway for the dilute gas case, and a set of tools for the condensed phase, but no synthesis of the two. If you find better references, please let me know.


------------------------------------------------
S14b. Does the standard model predict chemistry?
------------------------------------------------

The standard model is widely believed to be in agreement with all we know about matter and radiation on earth, within the range of accessible energies, as long as gravitational effects can be neglected. But this does not mean that it has a high predictivity, except on the level of high energy elementary particle scattering. The reason is that we can compute from it almost nothing at the scales of interest in nuclear, atomic, or molecular physics.

Lattice gauge calculations show that the standard model implies the existence of baryons such as proton and neutron with masses that match the experimental masses with an accuracy of about 5%. This is far too low to be of use in chemistry or even in nuclear physics. The accuracy of the effective forces between them is even poorer. We have very little control over confinement, which is essential to get useful forces at the energies relevant for nuclear physics.

Thus predictivity of the standard model for nuclear information is almost nil. And indeed, nuclear physicists do not use the standard model (except for paying religious lip service to it), but work with their own phenomenological models. They just borrow some of the symmetries. These were of course known long before the standard model was born, and built into the latter to match reality; so they cannot count as predictions from the standard model.

If we had only the standard model and the numerical estimates for the constants of effective actions computed from it, this would give _very_ poor predictions of properties of protons, neutrons, and their bound states.
One can show that the effective dynamics of protons and neutrons is governed by effective field theories whose form can be derived from the standard model (but also follows from assumed symmetry principles built into the standard model) but whose coefficients are derived by fitting calculations to _measured_ data about form factors of proton and neutron, which have _not_ been calculated from the standard model but must be put in by hand as additional information.

From this, one can calculate the energy of the nuclei, using a combined droplet/shell model. We understand the structure of nuclei, in agreement with the standard model, but _not_ derived from it.

If we had only the standard model and the numerical estimates computed from it, this would give _very_ poor predictions of nuclear properties. There would be neither nuclear energy nor nuclear weapons based on knowledge derived from the standard model only.

Even knowing the properties of proton and neutron from measurement and the effective equations (but nothing else) does not allow one to get highly accurate predictions for the properties of larger nuclei.

At atomic distances from the nucleus (for QED-dominated phenomena), one can further approximate the theory by Dirac-Fock equations, or, for light nuclei, by Schroedinger's equation for electrons and nuclei together with relativistic corrections. The details of the nuclei become irrelevant for atomic physics and chemistry, except for their atomic weights. These cannot be derived accurately enough from lower levels, and must again be supplemented by additional experimental information.

If we had only the standard model and the numerical estimates computed from it, this would give _very_ poor predictions of most chemical properties of everything including the hydrogen spectrum.
Only starting on this level, _assuming_ the properties of the nuclei and the electron, we are able to predict much of macroscopic physics:

We can solve the Dirac equation exactly for hydrogen, and compute the radiation corrections from QED and other corrections from the Standard Model. The result agrees with the experimental measurement of hydrogen spectra to extraordinary accuracy.

We can understand why the periodic table works, and predict the properties of even large atoms (such as the color of gold) reasonably well using the Dirac-Fock equations.

From this level on upwards, one has enough experimental data to calculate chemical information for small molecules that is predictive in the sense that it may give quantitative information that is reasonably accurate and not put in by hand. But already for proteins, one again needs to complement the theoretical input by measurements to get predictions of reasonable accuracy.

Thus the standard model is a very inaccurate tool for chemistry. It is useful only for elementary particle scattering experiments. At each higher level, one needs additional information from experiment to complement the predictions of the lower levels.


---------------------------------------------------
S14c. Is the result of a measurement a real number?
---------------------------------------------------

A single measurement (reading from a scale) always gives a rational number, at least if the scale is in terms of rational units. (If the scale gives an angle in degrees which is then converted into arc length, the measurement gives rational multiples of pi instead).

However, this is by convention only, since a pointer position is just a position in 3-space which must be translated into a number by a subjective reading or by a digital reading device of limited resolution. Thus the true position is not determined accurately enough to associate it with a single number.
Infinitely many rationals (and uncountably many reals) are compatible with any observable state of the voltmeter. That's why the error bars are intrinsic to measurement results, even to single readings. Deleting them and claiming exact measurement results is just laziness, acceptable when the resolution of an instrument is known.

Therefore, according to the standards of NIST (National Institute of Standards and Technology), a measurement gives an interval consisting of a rational number together with an error bar; see
   http://physics.nist.gov/cuu/Uncertainty/
Of course, the error bar is also somewhat uncertain, but one generally accounts for this uncertainty by rounding it upwards, to make the whole estimate conservative.

The NIST definition has the advantage that it also applies to indirect measurements obtained from raw measurements by some computations. Indeed, most high quality measurements are of this kind.

Nevertheless there is no contradiction if one assumes that reality is governed by equations in terms of exact real (or complex) numbers, and only the measurement abilities are limited.


-----------------------------------------
S14d. Why use complex numbers in physics?
-----------------------------------------

Complex numbers are _the_ natural number system for all but elementary physics; one needs them to make sense of many advanced concepts. Avoiding complex numbers would make much of what is done incomprehensible.

Already Fourier analysis is most natural with complex numbers, though here it could be avoided by using trigonometric series instead.

The time-independent Schroedinger equation defines the Fourier components of real, measurable expectations. So it is very natural that quantum mechanics is based on complex entities, too.

Dispersion relations in optics are natural only in a complex setting.

Spectra of nonhermitian operators, essential for dissipative systems even in the classical case, are always complex.
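A minimal classical example of the last point, sketched in Python: the damped oscillator x'' + c x' + k x = 0, written as a first-order system, has the real but nonsymmetric matrix [[0, 1], [-k, -c]]. The helper function and the parameter values below are assumptions made for this illustration; for weak damping the eigenvalues form a complex conjugate pair whose real part gives the decay rate and whose imaginary part gives the oscillation frequency.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


damping, stiffness = 0.2, 1.0  # illustrative values for c and k
lam1, lam2 = eigenvalues_2x2(0.0, 1.0, -stiffness, -damping)

print(lam1, lam2)
```

Restricting oneself to real numbers here would force one to treat the decay and the oscillation separately, although both are encoded in a single complex eigenvalue.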
Analytic continuation plays a significant role in some physical theories. For example, lattice gauge theory works in a continuation of quantum field theory to Euclidean space, and the results must be continued back to Minkowski space to get physical meaning.

On the other hand, at first sight it seems that only real quantities are measurable. However this only holds for the most direct measurements where you read a number from a meter. Most measurements are of a more indirect kind, and then this restriction no longer applies.

To measure a family of physical quantities x_l (l=1,...,n), one measures some related real quantities r_1,...,r_m connected to the x_l by a system of equations F(x,r)=0 (in the absence of measurement errors). In fact, there will always be measurement errors, hence one generally uses more equations than unknowns and solves the least squares problem
   ||F(x,r)||^2 = min
(or a more complicated related problem if a model of measurement errors is available) to get an estimate of x.

This recipe is universally used for all sorts of measurements and works whether the x_l are real or complex.


-------------------------------------------
S15a. How precise can physical language be?
-------------------------------------------

The relation between theory and reality necessarily uses ordinary language and is therefore somewhat fuzzy. If one insists on 100% unambiguous statements, one is on the level of pure mathematics or mathematical physics (platonic reality), and cannot have any contact with (physical) reality.

The best one can do is to have completely precise concepts on the theoretical level and a description in ordinary, informal language that relates theory to reality. In the formal theory, all concepts can be precisely defined, and get names corresponding to their intended use in reality. This ensures that one knows precisely what one talks about - on the conceptual level.
In this informal language there must be room for linguistic approximations without specifying their quality more than by fuzzy words interpreted by the circumstances, since this is the way we necessarily perceive reality.

When formulating the interface between theory and reality, one must use the formulations used by the people who work with this interface. They know how 'large' something must be to be taken as 'infinite'. They estimate limits from finite sequences (most of numerical analysis would be void if we couldn't...), usually quite successfully - although this is meaningless mathematically.

A mathematical limit in theory does _not_ translate into a mathematical limit in reality. This is necessary since all our observations are finite, and most of them are noisy.

As there are approximate ways of determining the mass of the Moon, but no exact methods, so there are approximate methods for determining probabilities, but no exact ones. Exact real numbers belong to theory, not to reality. (Even counting is not sure to result in an integer. What about the number of people in a room when someone is just entering?)

Careful protocols for experimentation and measurement are useful to achieve a certain amount of objectivity and repeatability, but even the best protocols cannot reduce the level of fuzziness in the interface between theory and reality to zero.

I recommend
   Experimentation and Measurement, by W.J. Youden,
   reprinted 1997 by the National Institute of Standards and Technology
   http://ts.nist.gov/ts/htdocs/230/233/calibrations/Publications/exp_meas.pdf
Although a very old paper (from 1961), it is still considered by NIST to be up to date and exemplary in its lessons about measurements. Among other things, it discusses on pp. 26ff in greatest detail how to measure the thickness of a sheet of paper in an ensemble of sheets typically called a thick book.
If one follows his argument closely, one finds that even classically, observables such as the 'thickness of a sheet of paper' are probabilistic only, notwithstanding that probably everything relevant about paper can be understood by classical mechanics and thermodynamics.

Thus there are no exact concepts in observed Nature. But in a good theory of Nature, all concepts should be exact.


----------------------------------------
S15b. Why bother about rigor in physics?
----------------------------------------

Approximate methods are almost always more efficient than rigorous ones. You can see this, for example, from the way integrals are calculated in numerical analysis. No one uses the 'constructive proof' by Riemann sums or, harder, by measure theory.

But for the logical coherence of a theory, the rigorous approach is important. To prove that a long, complicated expression in a single variable is monotone may be quite hard and exceed the capacity of a typical mathematician or physicist, but to evaluate it at a few hundred points and look at the plot generated is easy.

If you (the reader) are satisfied with the latter, never try to understand mathematical physics - it will be a waste of your time. But if you want to have physics in general look like classical Hamiltonian mechanics - a beautiful piece of mathematically rich and powerful theory, then you should not be satisfied with the way current quantum field theory (say) is done, and keep looking for a better, more solid, foundation.

About the pitfalls of using mathematics ''formally'' (i.e., without bothering about convergence of the expressions, existence or interchangeability of limits, etc.), I recommend reading
   F. Gieres,
   Mathematical surprises and Dirac's formalism in quantum mechanics,
   Rep. Prog. Phys. 63 (2000) 1893-1931.
   quant-ph/9907069
and
   G. Bonneau, J. Faraut, G. Valent,
   Self-adjoint extensions of operators and the teaching of quantum mechanics,
   Amer. J. Phys. 69 (2001) 322-331.
   quant-ph/0103153
See also:
   K. Davey,
   Is Mathematical Rigor Necessary in Physics?
   British J. Phil. Science 54 (2003), 439-463.
   http://philsci-archive.pitt.edu/archive/00000787/

On the other hand, on the way towards finding out what is true, nonrigorous first steps are the rule, even for die-hard mathematicians. The role of intuition and nonrigorous thinking in mathematics is well depicted in the classics
   J. Hadamard,
   An essay on the psychology of invention in the mathematical field,
   Princeton 1945.
and
   G. Polya,
   Mathematics and plausible reasoning,
   2 Vols., 1954.
or
   G. Polya,
   Mathematical discovery,
   John Wiley and Sons, New York, 1962.

More recently, the article
   A. Jaffe and F. Quinn,
   "Theoretical mathematics": Toward a cultural synthesis of mathematics and theoretical physics,
   Bull. Amer. Math. Soc. (N.S.) 29 (1993) 1-13.
   math.HO/9307227
reports on the potential and dangers of nonrigorous approaches to scientific truth. This paper was commented on in contributions by a number of influential mathematicians and mathematical physicists in
   M. Atiyah et al.,
   Responses to ``Theoretical Mathematics: Toward a cultural synthesis of mathematics and theoretical physics'', by A. Jaffe and F. Quinn,
   Bull. Amer. Math. Soc. 30 (1994) 178-207.
   math/9404229
and the response of Jaffe and Quinn is given in
   A. Jaffe and F. Quinn,
   Response to comments on ``Theoretical mathematics'',
   Bull. Amer. Math. Soc. 30 (1994) 208-211.
   math/9404231
See also
   D. Zeilberger,
   Theorems for a Price: Tomorrow's Semi-Rigorous Mathematical Culture,
   math.CO/9301202,

   J. Borwein, P. Borwein, R. Girgensohn and S. Parnes
   Experimental Mathematics: A Discussion (1996?)
   http://grace.wharton.upenn.edu/~sok/papers/age/expmath.pdf


--------------------------------------------
S15c. Justifying the foundations of a theory
--------------------------------------------

Quantum mechanics is a somewhat unintuitive theory, and has generated a lot of foundational literature aimed at justification and explanation of the conceptual basis.
Justification of the basic postulates of any theory is necessarily circular. If it were not, the postulates would not be basic but derivable. One must take all the basic postulates as a single foundation on which everything else rests without circularity. But the basic postulates themselves can only be motivated, not derived.

Most people simply trust that tradition selected good foundations. If you want to probe that trust you can go into studying the sea of publications on the foundations of quantum mechanics. But unless you are very dedicated and spend a lot of effort on it, it is likely that you'll drown there before having found satisfaction...


----------------------------------------
S15d. Foundations, theory and experiment
----------------------------------------

Foundations of physics is the quest for getting the mathematical concepts right to be able to do correct physics and think correctly about it.

Without correct concepts operational statements have no meaning. The theory defines what a measurement is. Outside the immediate realm of everyday experience, one needs already the conceptual basis to even discuss what has operational meaning.

These statements apply both to good and bad theories. Even a bad theory defines what a measurement is; it just defines it more poorly.

There is in fact a crossfertilization between measurement and foundations. If one gets better the other profits from it. On the other hand, fuzzy foundations lead to poor judgment and ambiguity in measurements, and poor measurements lead to low discrimination among theoretical alternatives.

One can observe from history that progress in concepts leads to better investigations of nature, and better experiments lead to higher demands on the theory, forcing people to look for more stringent concepts and simpler or more encompassing frameworks.


------------------------------------------------------
S15e. Theoretical physics as a formal model of reality
------------------------------------------------------

Can the meaning of all terms in a physical model be determined precisely without an infinite regress? I want to show that the answer is a clear `yes'.

Look at the question `What is a force?' To answer this, one needs to consider the concepts of force, mass, acceleration, pressure, stress, recoil, perhaps the gravitational field, etc., in total a small number of physical items.

If we want to define them in reality, we don't get an infinite chain but a circular definition -- we can only define one in terms of another, illustrating the concepts by pointing to situations where we hope everything is obvious. In practice (i.e., in teaching physics), this works all right since each of us knows reality already and only needs enough context to identify the usage of the concepts -- there is essentially only one fit that works, and once the light goes on, we understand -- or at least the level of understanding deepens. (Later, when doing high precision measurements, we may notice that our understanding is not adequate, and become more careful and sophisticated, and at some advanced stage one can probably write a whole book to get definitions that are really precise...)

But there is another way that is fruitful and neither circular nor infinite. It is obtained by mimicking how modern logic investigates its foundations. It assumes that we know at the 'external reality' level what logic is; then it builds a formal model, a 'formal reality', in which one can talk about everything one talks about in 'real' logic, but in completely formal terms. You don't need to know what truth, propositions, etc. are in reality, but you declare the rules for manipulating them -- since this is the heart of the matter.

This is done in exactly the same way as the Greeks declared rules for manipulating geometric terms.
In addition, they had definitions like 'a point is what has no parts'; but in modern geometry, this is considered to be not a well-defined formal statement (instead it has the circular character of relating the concept to reality), and hence is simply dropped from the list of axioms.

So modern geometers define a projective plane by a few simple statements: ''There are points, there are lines, there is a relation which tells which points are on which lines, through any two distinct points there is exactly one line, and any two distinct lines have exactly one common point.'' That's all, and it is enough to do planar projective geometry with full clarity and completeness. We do not need to know anything about the objects to analyze a situation (unless we want to check its impact on external reality). Of course, it is good to have a few more restrictions and concepts to go really deep, but this is supposed to be just an example.

In the same way, one can discuss _everything_ about the real logic in the formal model of logic, and reach clarity.

It is my proposal to do this for physics as well. Actually it has been nearly achieved in classical physics, and fully achieved in Hamiltonian mechanics. You start with a phase space and a Hamiltonian which fall from heaven. (They are motivated by circular arguments, but these arguments are not part of the theory in the formal sense.) Having this, you can build a whole world, with atoms, dynamics, paths, forces, accelerations, stress, etc. In fact, you can discuss any question about the classical world in this mathematical frame, without ever needing any undefined term. Formal reality is defined by what is expressible in terms of the concepts already available, and 'true' reality with its circularity never enters except as a guide to formulating new concepts and to discussing their consequences.

This is what I think theoretical physics is about.
It builds a formal model of the world, with a 'formal reality', in which every important concept from experimental physics has a well-defined formal meaning, and in which every reasonable question about the physical world can be posed and investigated. What can be posed and analyzed in such a framework counts as understood, and understanding of nature increases by bringing more and more into such a formal model, until everything about physical nature is representable.

My vision is that the same is possible and desirable for quantum physics. For me, realizing this vision is equivalent to having understood quantum physics.

So I want to have a mathematical quantum model of nature, in which one can talk about all the things physicists talk about when they talk about nature in the physical sense. In particular, there will be concepts like particles, fields, detectors, measurement, probability, memory, etc. but -- unlike in real nature -- they will have a precise and unambiguous formal definition, of the same formal quality as force, acceleration, etc. are defined in Hamiltonian mechanics.

Then we can ask about the "meaning" of each term, and get a well-defined answer within the formalism, without infinite regress.


----------------------------
S16a. On progress in science
----------------------------

The frontier in science is the frontier because there is no clear understanding of what is beyond. All that is there is a set of questions bothering those close to the frontier, and a set of experiences of more or less failed attempts to push the frontier forward.

Real improvements in difficult matters never come by starting from scratch - they come from patiently building upon the best of what already exists, being open-minded but critical about new possibilities, and trying to integrate what looks most promising.

Those who had the questions and found real answers published them and advanced the state of the art.
The others can only share their experience and their chart of the uncharted territory. As one can see from the conflicting opinions, these charts are not reliable. If you (the reader) want to proceed further, you need to learn to see with your own eyes, take your own risks, and find out for yourself what can be trusted. There are no guides beyond a certain point. And don't count on recognition before you actually succeed! As long as ideas are tentative and not validated by experiment, they are always hard to defend. Success comes late - either with a triumphal experimental verification, or if people realize that a new way is significantly simpler than the tradition. If neither happens, people will stick to the tradition, except for a minority who live from exploring the consequences of the idea. Innovative research is always a risky business - one must be prepared to continue one's work no matter how much it is criticized, but one must also learn as much as possible from one's critics. Then - if it is indeed the right track - success will come sooner or later. But who knows beforehand what will turn out to be the right track? So people have a right to be critical... Critics usually just present a statement, or point to an incoherence in an opponent's statement. To learn from it is a nontrivial task, since it means that one has to find out a) how to make the criticism strongest, in a constructive sense, and b) how best to defend the original statement. Finding this out is learning from it. Everyone starts their journey from where they are, in the direction they find most promising. The others observe what they do and have to make up their own mind. If people knew what is the right start and the right direction, all important unsolved problems would have been solved by now. The journey is a journey to collect understanding of the ill-understood.
To find bugs in a computer code one doesn't go around speculating, but one carefully compares evidence available and stays as close as possible to the code. Physics needs to find the bug in its foundations, and as with computer programs, it will be very subtle and will be found only by a careful investigator, not by a dreamer. Of course, a certain amount of creativity is needed. But it must be guided closely by general knowledge of similar problems already solved and on the structure of the system, together with the information turned up by a detailed analysis of the code. Thus imaginative speculation works only if checked and confirmed by detailed code analysis. And most of the wild ideas are useless. Not a procedure I'd recommend for research, though, unfortunately, it has become fashionable in some quarters of theoretical physics... Rather, learn as much as you can about how and why the good theories work, and if you have the calibre to be an innovator, you might be able to spot what went wrong. But not by searching in the mist; your search should always be well-directed, or you'll go in circles... Judging from my own experience, understanding is not something that springs into one's head without preparation, but is the result of walking attentively and openminded along many blind alleys, until one sees one which smells like being the real thing. Then one starts grinding away in this direction, and in this process discovers what should have been the guiding principle that would have avoided all the dead ends, bringing one directly to the goal. Then, and only then, the right understanding governs the remainder of the search. This is not only my personal experience but seems to be the general pattern: See G. Polya, Mathematical Discovery, John Wiley and Sons, New York, 1962. ------------------------------------------------------------- S16b. 
How different are physical sciences and social sciences ------------------------------------------------------------- From the subject matter treated, a lot. From the modeling side far less. There is no difference in principle. All science is based on observation and experiment. All experimental data must be observed according to well-defined protocols, to be objective (and hence science). The main difference between physical sciences and social sciences is that in the former one generally studies systems which are strongly constrained by the experimental setting, so that they give much more predictable results. In both cases, however, the correct mathematical model is that of a stochastic process, and physical sciences and social sciences only differ in the size of the noise relative to the signal. Sometimes the difference is so large that one can ignore the noise and treat a physical system as deterministic, while a social system can never be controlled well enough to make the remaining fluctuations negligible. ------------------------------------- S16c. Can good theories be falsified? ------------------------------------- The philosopher Karl Popper claimed that falsifiability is the hallmark of scientific theories. But scientific practice speaks against him. A correct theory cannot be falsified, and in this sense is not falsifiable, in spite of Popper. (Falsifiability can be asserted only in a counterfactual sense, that there are _conceivable_ situations that, according to the theory, are excluded. But for a correct theory, these situations will never happen, hence are completely fictitious.) What happens with good theories is, at worst, that their region of validity or accuracy gets restricted as new data about more remote instances come in. In today's understanding, people are careful to indicate the limits where a theory is claimed to be valid, and the accuracy to which its answers are to be trusted.
For example, the Standard Model is claimed to be valid whenever gravitation is negligible, accuracies conform to present possibilities, and energies are well below a putative unification scale. Failures outside this domain are not counted as falsifications. While limits and accuracy claims are not necessarily part of the theory proper, they are part of the theory as actually taught and applied. Indeed, although people try to extrapolate, one can never be sure whether a theory is correct outside the domain where the data were collected. But one can be reasonably sure within the domain where enough data are available. Good scientific practice requires that a good theory agrees with the data within the tolerances claimed. Once this is the case, these theories can never be falsified. Rather, if people find disagreement in experiments, the theory falsifies the experimental arrangement or analysis. All science students who ever did experiments in the lab know very well that this is common practice. The degree of caution and care at the highest level of quality has been increasing through the centuries. It is now too late to ask Newton whether he believed his theory was valid without restrictions. (Or are there any hints in the Principia Mathematica?) Certainly Newton's theory as taught today is taught with restrictions (i.e., that it is valid at speeds small compared to c and at distances large compared to the radius of the largest atom). But we nevertheless believe that it is the 'same' theory, and if Newton were alive today, I think he would agree with that. And Newton's theory will never be falsified, unless God suddenly decides to change the physics of the Universe. (That the observed advance of Mercury's perihelion did not match Newton's theory was known as a limiting condition already before relativity was born.) ---------------------------------------------- S16d. What, then, distinguishes a good theory?
---------------------------------------------- We can _know_ whether a theory has been correct in the past, and we can _trust_ that it will remain so in the future. There is no other kind of knowledge than that of the past. Relying on that ''anything in the future is like in the past'' is an act of faith. The question is not about faith or not, but about faith in what is best supported by past experience. Theories that conform with the past are easy to trust. But they come in different degrees of stringency. Theories which are not restrictive at all but accommodate everything (such as astrology or psychoanalysis) are in vogue (as society shows) but useless (and probably harmful). These are the ones that Popper calls unfalsifiable. Highly restrictive theories (what Popper calls scientific) are preferred by those who want to control their destiny as far as possible. Theories like Newton's, general relativity, or QED are extremely restrictive and in agreement with past experience, hence both trustworthy and very useful. What makes a theory good is not its potential falsifiability, but that it drastically reduces the number of possibilities which are present without the theory, without eliminating something that can actually happen. If you have no theory and put two marbles into your empty pocket, and then another two, you don't know how many marbles you can take out. If you know arithmetic and the law of conservation of marbles you can predict that exactly four can be taken out. This is testable, and will always come out correct. So you have a correct theory. Of course, its validity is not unlimited, since it assumes that your pocket does not have a hole; so if some experiment does not conform to your theory since you can only take out three, you suspect that the domain of validity was violated; you check for the hole - and surely you'll find it. This is exactly analogous to the way Newton's theory works, within its domain of validity.
If it fails, we suspect speed close to c, or highly accurate measurements, or tiny distances. And surely we'll find it so. ------------------------------------------------ S16e. When is a theory preferred to another one? ------------------------------------------------ Frequently, Ockham's razor ''frustra fit per plura quod potest fieri per pauciora'', that we should not use more degrees of freedom than are necessary to model a phenomenon, is invoked to argue that the theory with the fewest parameters is the best. But this is true only when taken with many grains of salt. Chemists prefer as a starting point of their deepest investigations the theory based on Dirac-Fock theory or even cruder approximations, treating the nuclei (for large problems even atoms) as elementary. This gives them all the information they need, while they can deduce nothing at all from the standard model which is supposed to be a much more exact and general theory. Thus what is preferred depends a lot on which use can be made of it. Ockham's razor is appropriate only if two theories allow the same deductions with a similar amount of work, or if the more parsimonious theory is even superior in allowing one to derive the desired properties. Nothing in science is against a complicated model if it gives more ready access to the quantities of interest than a formally simpler but computationally more difficult or even intractable formulation. Given only the standard model + classical relativity (allegedly correctly describing all phenomena of the world at accessible energies, distances, and accuracy), we'd know very little about our world, and only very inaccurately. Not even the masses of the nuclei can be predicted at present with any confidence, let alone the properties of water or gold. And given only string theory (a theory without any free parameter), we'd know essentially nothing about our world.
(See http://rz70.rz.uni-karlsruhe.de/~ed01/Hyle/Hyle3/hoffman.htm for further discussion of Ockham's razor.) --------------------- S16f. What is a fact? --------------------- In discussions on sci.physics.research, one often finds very good information, but also often poor and misleading information. How to distinguish the good from the poor? Everything called knowledge is in fact a set of beliefs of the person claiming it. And this set of beliefs is more or less close to the objective truth, depending on the standards of that person. Calling so-called knowledge a set of beliefs does not contradict the objectivity of mathematical definitions. When I say that a Banach space is a normed, complete vector space, I both state my belief and happen to coincide with the social consensus of the guild of mathematicians. And when I say that state reduction is a physical process, I both state my belief and happen to coincide with famous physicists like von Neumann and many others, and this is good enough to make this statement honestly, since the community has not reached an agreement on the matter. Telling others what one thinks is true in no way manipulates others any more than feeding others what one thinks is nourishing. But as we shouldn't accept being fed by those with poor judgment about food, we shouldn't accept an opinion for the truth if offered by someone with poor judgment about the relevant areas. It is obvious that whatever a person claims is first and foremost his or her personal opinion, and not a fact. Whoever takes it for a fact is simply misleading himself or herself. Thus there is no need to qualify each of one's statements by clumsy phrases like 'in my opinion', or 'according to what I have read/understood', or 'as far as I am informed' or 'since this makes most sense to me'. These phrases accompany silently any statement by anyone.
It is also obvious that an opinion doesn't become a fact because it is believed by half the number of people from a particular ensemble; truth would otherwise become dependent on the choice of this ensemble. Thus one needs to check the claims, to listen to different sides of a controversy, to ask for sources or justification of an opinion. In this way, anyone who wants to get a clear picture soon notices which claims are trustworthy, which ones are tenable but somewhat shaky, and which ones are poorly founded. On the other hand, in participating in a discussion, honesty only requires that one asserts what one thinks is true, and gives one's reasons upon request. This is the scientific approach, since it lets others check upon the trustworthiness of a claim. ---------------------------- S16g. Physics and experience ---------------------------- On superficial reasoning, time is only a concept that helps us to order our experiences. Thus, ''experience exists; time does not''. By exactly the same argument, ''experience exists; space does not'' ''experience exists; mass does not'' ''experience exists; charge does not'' ''experience exists; gravitation does not'' etc. Physics is exactly about the concepts that are substituted for experience to make experience quantitatively predictable. Therefore, in this deeper sense, time, space, mass, charge, gravitation, etc. exist, and are more fundamental than experience. ---------------------- S16h. Modeling reality ---------------------- In describing reality from a physics point of view, the person modeling a system of interest makes certain choices. These consist in choosing a mathematical model of the system, and setting up a correspondence between informal objects related to the system and formal objects in the mathematical model.
More specifically, an assertion about reality is modelled as a mathematical assertion about mathematical objects in the mathematical model that carry the same names as those in the reality they are supposed to model. -------------------------------------------- S16i. What is a system (e.g., an ideal gas)? -------------------------------------------- Theories of physics do not say what a system (such as an electron, a star, an ideal gas, a crystal) is in reality. Nevertheless, it is possible to check the reality contents of a physical theory. How does this come about? Let us consider thermodynamics. Thermodynamics does not say which system is an ideal gas, which is only a van-der-Waals gas, which is a liquid, or a solid. Indeed, such questions need not be answered by the theory. Instead, they are answered by checking how a system behaves: If a real system behaves as the theory for an ideal gas (a solid, a crystal, an electron) requires, a physicist will say it 'is' an ideal gas (a solid, a crystal, an electron); if not, it is not. While this definition may seem circular, it isn't once it is recognized that one can check some characterizing properties of systems that have a particular label (such as 'ideal gas') by a small number of measurements, and then deduce many more properties from the theory that can be checked subsequently. Engineers call this process 'system identification'. Thus the task of theory is to provide models with just enough flexibility that they cover the range of relevant possibilities, while being still restricted enough so that one can identify the system with a limited amount of data. Exactly in this case a theory has predictive value. --------------------------------- S16j. When is a theory confirmed? --------------------------------- Any deviation from a law can only be 'confirmed' by narrowing error bars for the parameters modeling the deviation. As long as the error bars contain zero, the law counts as confirmed.
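The rule just stated can be sketched in a few lines of Python. This is only an illustration, not an established procedure: the names Measurement and law_confirmed, the 2-sigma threshold, and the numbers are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Estimate of a deviation parameter with a symmetric error bar."""
    value: float  # best estimate of the deviation from the law (hypothetical units)
    sigma: float  # one-standard-deviation uncertainty (same hypothetical units)


def law_confirmed(m: Measurement, n_sigma: float = 2.0) -> bool:
    """Return True if the n_sigma error bar around the estimate contains zero.

    The choice n_sigma = 2.0 is an assumption for illustration only.
    """
    return abs(m.value) <= n_sigma * m.sigma


# Invented numbers: the same estimate, measured with increasing precision.
early = Measurement(value=0.8, sigma=1.0)  # wide error bar, contains zero
later = Measurement(value=0.8, sigma=0.1)  # narrow error bar, excludes zero

print(law_confirmed(early))
print(law_confirmed(later))
```

With the wide error bar the law counts as confirmed; once the error bar has narrowed enough to exclude zero, it is the deviation that counts as confirmed instead.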
With time, confirmation of the law may be at a higher level of accuracy, or (as in the case of neutrino masses) confirmation of the deviation (if the more accurate error bars no longer contain zero). If one disputes any of the established theories for lack of sufficient confirmation, one can as well dispute Lorentz symmetry, translation invariance, zero photon mass, general relativity, etc., which are basic to contemporary physics but all confirmed only to a certain precision. There are experiments testing the limits of all these assumptions, but even when one of these experiments succeeds (as in the case of neutrino masses), the previous theory remains valid to the accuracy it was known to be valid before. In this sense, older theories don't die even when they are superseded. A well-known case is Newton's gravitational theory which is still taught and heavily used although not completely correct. ------------------- S16k. What is real? ------------------- All physics is just a handy way of thinking about certain phenomena. This - a handy way of thinking - is what it means that something - the concept we find useful - exists. We say that people exist, because they are a handy way to describe certain blobs of matter like ourselves. We say that electrons exist, because they are a handy way to describe ionization phenomena. We say that photons exist because they are a handy way to describe quantum optics phenomena. Photons are objectively real because they are needed in the only comprehensive coherent theory of microscopic interactions that we know of. On the other hand, 'photon' is merely a word that physicists use on paper and in conversation. But this is so in precisely the same sense in which entropy, energy, or the electromagnetic field are merely words that physicists use on paper and in conversation. Even our best concepts are 'merely' words. If we give up concepts, only an undifferentiated happening in space-time remains, and even talking about this becomes impossible.
--------------------------------------------------- S16l. How many angels fit onto the tip of a needle? --------------------------------------------------- Anton Zeilinger writes in http://www.ap.univie.ac.at/users/Anton.Zeilinger/philosop.html ''the question whether such a description exists or not was therefore similarly irrelevant as, according to Pauli, the old question how many angels fit onto the tip of a needle.'' This question has become a well-known metaphor for doing irrelevant physics. But how old is this question really? Who was the person who discussed it seriously? http://web.maths.unsw.edu.au/~jim/headsofpins.html mentions explicitly Chillingworth's ''Religion of Protestants a Safe Way to Salvation'' (1638, reprinted 1972, 12th unnumbered page of the preface) accusing unnamed scholars of debating ''Whether a Million of Angels may not fit upon a needles point?'' It seems that, as here, the question has always been used in a derisive manner only. In the historical essay E.D. Sylla, Swester Katrei and Gregory of Rimini: Angels, God and mathematics in the fourteenth century, pp. 251-270 in: Mathematics and the Divine: A Historical Study (T. Koetsier and L. Bergmans, eds.) Elsevier 2005, http://www.elsevier.com/wps/find/bookdescription.cws_home/704302/description#description Sylla conjectures that the question might have been coined by Thomas Hobbes, who had learnt the scholastic tradition in Oxford between 1603 and 1608. See also http://en.wikipedia.org/wiki/How_many_angels_can_dance_on_the_head_of_a_pin%3F But similar questions were discussed much earlier. Sylla mentions an anonymous 14th century mystical treatise ''Swester Katrei'' (= Sister Kate or Sister Catherine) referring to ''a thousand souls in heaven sitting on the point of a needle''. Cf. also the paper G.M. Ross, Angels, Philosophy 60 (1985), 495-511. 
http://www.jstor.org/pss/3750436 and the web site Sister Catherine (Schwester Katrei) http://people.bu.edu/dklepper/RN413/katrei.html Even earlier and most prominent is the discussion of angels in Thomas Aquinas' ''Summa Theologica'', written between 1265 and 1274. It is surprisingly interesting. It looks as if Aquinas was the first writer anticipating quantum theory and the Pauli exclusion principle. Replace 'angel' by 'electron' and he sounds surprisingly modern; in modern terms, angels are fermions, according to Thomas Aquinas. An English translation of the ''Summa Theologica'' is available online. Part I (http://www.newadvent.org/summa/1.htm) contains the chapter on angels. The sections 50-53 on their substance relate to their physical properties and hence are of scientific interest. There he discusses the properties of a point particle from a logical point of view. His 'angels' are not the winged creatures we might imagine them to be, but incorruptible, indivisible, extended objects, ''form without matter'', with quite precise properties. Two angels cannot be in the same place, but they have virtual (sic!) positions, and can be in an extended place: ''So the entire body to which he is applied by his power, corresponds as one place to him.'' They may go from one place to another with or without being observable in between: ''But an angel's substance is not subject to place as contained thereby, but is above it as containing it: hence it is under his control to apply himself to a place just as he wills, either through or without the intervening place.'' Their number roughly matches the number of electrons: ''Hence it must be said that the angels, even inasmuch as they are immaterial substances, exist in exceeding great number, far beyond all material multitude.'' (With ''angel'' interpreted as ''electron'', ''immaterial'' could thus be interpreted as zero baryon number.)
Like early chemists hiding their scientific insights in an alchemist guise, he might have phrased his speculations in terms of notions acceptable to his clerical colleagues... If we attribute to the Greeks the concept of the atom (though they thought of it in - for modern ears bizarre - terms that have little to do with our modern view), we should perhaps be as generous towards Aquinas and attribute to him the exclusion principle. On a more tongue-in-cheek basis, the Annals of Improbable Research published an article A. Sandberg, Quantum Gravity Treatment of the Angel Density Problem, Annals of Improbable Research 7 (Issue 3), (2001), 5-8. http://www.improbable.com/airchives/paperair/volume7/v7i3/angels-7-3.htm http://headofapin.net/ ------------------------------------------------------ S17a. How to get information from sci.physics.research ------------------------------------------------------ If you read sci.physics.research out of curiosity, you may find that the discussions get too specific for you but make you curious to learn more about the background. But it may be difficult to find out where to get started. The right way to find out is to ask on sci.physics.research for what you need, in response to someone's contribution. The writers usually know how they got the knowledge, and are happy to give you hints or recommendations, and others will join in if they think they have better advice. The more specific your question, the more likely you'll get an answer, and the more useful it will be for others, too. By asking good questions you are doing a service to all. My Lord Jesus Christ, for whom I live, asserted: "Ask, and it will be given you; search, and you will find; knock, and the door will be opened for you. For everyone who asks receives, and everyone who searches finds, and for everyone who knocks, the door will be opened." (Matth. 7:7-8) It took me a while to realize that this was excellent advice. ------------------------------------ S17b.
How to get your work published ------------------------------------ You did some work that you think is great (or at least reasonable), but it was rejected by the journal you sent it to? This is disappointing, but not the end of all hope... Rejection letters usually give some reasons for rejection; if they don't you may request (in a polite way!) getting reasons so that you can learn from them. And then _do_ learn from them! Usually the reasons for rejection are sound and mean at least that you didn't pose your case well. It also takes some time to learn the standards that publications should respect, and it is likely that you violated some of the unspoken rules without realizing it. If your idea is far from mainstream, you also need to convince people that your approach is sound and merits spending the time to read through the new proposal. This is difficult since you need to build up trust; it requires that you have a high level of frustration tolerance. The less mainstream an idea the stronger must be its contents and the more carefully it must be argued to be publishable; use the feedback you get to find out the standards expected and then go and meet them. The difference between a crank and a serious researcher is that the latter learns from criticisms and grows through each feedback, while the former 'knows' (and acts on this assumption) that he or she is right and that established physics is just rejecting him or her for no good reasons. If you enter a correspondence with anyone who takes the time to read your work, stay polite even when the answers you get are not what you hoped for. Once the tone of your mail gets defensive or aggressive, you probably lost your case - your partner sees that you try to replace facts by emotions and your credibility is gone. Time is precious for active scientists. So keep your article as short as possible without losing substance.
120 pages of detailed analysis, say, is too much for most people to read, unless they already have high confidence that the contents are sound. If you really need 120 pages to make your case you need to make short versions of your long paper that allow others to do checks for reasonableness with less effort. You'd then have a 1/2 page abstract, a 3 page introductory essay, a 7 page outline version, a 20 page version with the key steps, and a full paper with all the details, and each of these versions should be self-contained and allow the reader to get a feeling of what you do, and why you succeed - in terms of background that shows that you are familiar with the state of the art, and in a language that is both understandable and concise. Then anyone reading it gets a sense of high quality work that is informative and inviting. Note that the most important task is not to present your claim and praise or defend your work, but to convince others that your claim deserves trust enough to spend time on checking it. It is all too easy to make claims that are unsubstantiated but embedded in a complicated manuscript where one gets easily lost, loses track of what is important, and therefore misses the mistakes or gaps in the arguments. It is the responsibility of the innovator to present the news in a way that makes checking and trusting easy. Of course one can find many published papers that do not meet these standards. This is probably because their contents are not important enough to require high standards of checking, or because their conclusions are not inviting suspicion. But innovative work invites suspicion since it is far from the common, and if relevant requires therefore higher standards to be accepted. -------------------------------------------------- S17c.
How to respond to critical referee's reports -------------------------------------------------- [This is taken verbatim from http://authors.aps.org/faq_review.html] What Should I Do When a Referee Criticizes My Paper? Read the referee report carefully and dispassionately. Approach the report with an open mind. What may at first seem like a devastating blow is perhaps a request for more information or for a more detailed explanation. At other times the referee may indeed have found a fatal flaw in the research or logic. Put yourself in the position of a reader, which is exactly the position of the referee. Is the paper well written? Is the presentation clear, unambiguous, and logical? Respond to all referee comments, suggestions, and criticisms. Explain which changes have been made and state your position on points of disagreement. In our experience, appropriate response to some referee comments may require more research or even reconsideration of the research project. ----------------------------------------- S17d. How to sell your revolutionary idea ----------------------------------------- Unless you don't care about making a fool of yourself, don't tell it to others before you worked out enough details to be convincing. Your audience is very likely to be skeptical (since there are too many revolutionary ideas around which don't stand the test); so you need to make best use of this fact. The secret is that most people like to answer questions that fall into their field of expertise, if it does not take too much effort to reply. But few like to listen to half-baked (or even fully baked but only outlined) ideas; too many such offers come from cranks. The devil is always in the details; and if you can't provide them it is likely they'll think it is because it does not work or does not offer any advantage. So the right approach is to ask them for (and afterwards study!)
information about what is known in the direction you want to go, rather than proposing the revolutionary way of doing it correctly. Take heed of the advice of an old saint: "Let every man be swift to hear, slow to speak, slow to wrath." (The Bible, James 1:19) If you really can do it better than others, and you don't find prior relevant work in the literature, work it out yourself and show with a nontrivial application that you can do _something_ more efficiently than tradition. Then submit it to a respectable journal, and people are likely to listen. If you get negative feedback from referees, take it seriously, learn from it as much as you can. Raise your standards according to what you learn, and accommodate the criticism in your future work. The referees are usually competent and have a point in what they say. If not, it is likely that your work was presented in a fashion prone to misunderstanding - in this case formulate your results more carefully, taking into account accepted tradition. It is an author's obligation to minimize the chances of misunderstanding by potential readers. I gained a lot from considering the referees' advice on the many papers I have written. And it takes a while to learn how to write good papers... Even if your work is good but not mainstream, it may take persistence to publicize it properly; publishing is not enough. But publicizing does not mean boasting with great claims - this makes people suspicious and is therefore counterproductive. Be modest in your claims - claim what you can actually prove, but not what you only dream of proving one day. See also: The Crackpot Index (by John Baez) at math.ucr.edu/home/baez/crackpot.html --------------------------------------------------- S17e. Useful background, online lecture notes, etc.
---------------------------------------------------

(incomplete, just some useful references)

The Nobel Prize Winners in Physics
   http://www.mat.univie.ac.at/~neum/nobel.txt

Nobel lectures of the laureates, and their biographies
   http://nobelprize.org/physics/laureates/
worth reading - can be regarded as a sort of lively answer to the
question: What has been important enough in physics to deserve a big
prize?

Gerard 't Hooft, How to Become a _good_ theoretical physicist
   http://www.phys.uu.nl/~thooft/theorist.html

Hyperphysics
   http://hyperphysics.phy-astr.gsu.edu/hbase/hph.html
A tree (well, almost) of physics fields, subfields, and concepts.
The leaves explain things in some detail. There is an index but it
does not contain references to each node. Organization seems to be
experimental physics oriented; for example, I have not found nodes
with 'statistical mechanics' or 'quantum field theory'.

Everything You Always Wanted to Know About the Hydrogen Atom
(But Were Afraid to Ask)
   http://www.pha.jhu.edu/~rt19/hydro/hydro.html

Theory of Renormalization and Regularization
   http://wwwthep.physik.uni-mainz.de/~scheck/Hessbg02.html
contains a very useful set of online notes that may serve as an
introduction to QFT from a mathematical physics point of view.
Lecture Scripts and Online Courses on Quantum Mechanics
   http://cips02.physik.uni-bonn.de/~baehren/scripts/quantum.html
and on other physics topics
   http://www.astron.nl/~bahren/wiki/doku.php?id=studium:lecture_scripts

Introduction to General Relativity (video)
   http://timms.uni-tuebingen.de/

Review articles on Local Quantum Physics
   http://www.lqp.uni-goettingen.de/bibliography/reviews.html

Norbert Dragon, Remarks on Quantum Mechanics
   http://www.itp.uni-hannover.de/~dragon/qm_eng.ps.gz

Lost & Regained Causes in theoretical physics
   http://www.mth.kcl.ac.uk/~streater/lostcauses.html
   http://www.mth.kcl.ac.uk/~streater/regainedcauses.html

Selected Classic Papers from the History of Chemistry
   http://web.lemoyne.edu/~giunta/papers.html

Digest of moderated newsgroup sci.physics.research
   http://www.lns.cornell.edu/spr

Historical Physics Lecture Notes
   http://hrst.mit.edu/hrs/renormalization/public/documents.htm
   * Freeman J. Dyson, Advanced Quantum Mechanics, 1951
   * Fritz Rohrlich, Applied Quantum Electrodynamics, 1953
   * Green and Sengers, Proc. 1965 conference on critical phenomena
   * Cyril Domb's brief historical survey on critical phenomena, 1985
   * 1993 roundtable, Physics in Transition

Resources for the History of Physics & Allied Fields
   http://www.aip.org/history/web-link.htm

Sidney Coleman, Lecture notes on quantum field theory
   http://www.damtp.cam.ac.uk/user/dt281/qft/col1.pdf
   http://www.damtp.cam.ac.uk/user/dt281/qft/col2.pdf


------------------------------
S17f. Stories about physicists
------------------------------

Memories about Theoretical Physicists (by R.F. Streater)
   http://www.mth.kcl.ac.uk/~streater/links.html

Short Stories
   http://www2.physics.umd.edu/~yskim/home/storie.html

Parables for Modern Academia (by D. and L. Haarsma)
   http://www.calvin.edu/~lhaarsma/parables.html
   http://www.asa3.org/archive/asa/200006/0147.html


------------------------
S17g. Other physics FAQs
------------------------

http://math.ucr.edu/home/baez/physics/
   Usenet Physics FAQ
   (extensive, has also links to further physics-related FAQs)

http://www.faqs.org/faqs/physics-faq/
   Physics FAQ (a list of links)

http://www.kar.net/~plasma/faq/
   Plasma FAQ

http://www.iworld.de/~ej/faq.html
   Quantum Physics FAQ (current views of Erich Joos)

http://theory.gsi.de/~vanhees/faq/index.html
   Physik und das Drumherum (Physics FAQ in German)


-----------------------
S17h. Naming in science
-----------------------

How do scientific concepts, effects, or inventions get named after
their discoverers?

It is good practice to name important concepts, effects, or
inventions created by esteemed colleagues after them - good names are
always hard to find, and besides names clearly related to the content,
names naturally related to the history stick best. If a naming is
successful (in that others find it appropriate and useful) it will
spread, and soon everyone is using it. Then the name is established.

It is bad practice if authors call something by their own name before
it has been established by others. It suggests both vanity and a lack
of confidence that others do a good naming job. And if the self-chosen
vanity name does not stick, it serves them right for having made a
fool of themselves.

On the other hand, naming is at times unfair. Not rarely in the past,
a concept (or theorem, etc.) got the name of one of its main
proponents rather than that of its creator.
   http://en.wikipedia.org/wiki/Stigler%27s_law_of_eponymy

There are several reasons for this. It takes time (and a certain
amount of interest) to find the true origin of a concept; but a good
name is needed once it is used by more than a few people. But once a
name is established, it is nearly impossible to change it.

A concept may also be rediscovered independently of its first
inception.
If the time wasn't ripe for it the first time, it is likely that the
name of the rediscoverer sticks, and the voices of those who had known
the first source come too late.

See also:
List of misnamed theorems
   http://en.wikipedia.org/wiki/List_of_misnamed_theorems


-----------------------------------------------
S18a. What is the meaning of 'self-consistent'?
-----------------------------------------------

One speaks of a self-consistent solution (or method, or theory) when
one has two sets of equations relating two sets of unknown quantities,
and wants to solve the equations jointly for the unknowns.

If aspect A of a theory says
   y=x^2
and aspect B of the theory (or of another theory) says
   x=y-2
then self-consistency means that both equations are assumed to be
valid, giving
   x^2 = y = x+2,
which leads to the two solutions x=2, y=4 and x=-1, y=1. That's all.

Of course, the self-consistent Hartree-Fock method, say, has more
variables and is harder to solve, but the principle is the same.


-----------------------
S18b. What is a vector?
-----------------------

A vector is (for the beginner) a list of numbers written below each
other. For example, the x, y, and z coordinates of a point in a
3-dimensional coordinate system. Physicists write the three
coordinates as x_1, x_2, x_3 and combine them into a vector simply
called x.

       /     \
       | x_1 |
   x = | x_2 |     (The parentheses look a bit awkward in ascii.)
       | x_3 |
       \     /

The same holds for a list of n numbers. This gives a vector x with n
coordinates x_1,...,x_n, and is thought of as a point in a space with
n dimensions.

Two vectors are added or subtracted just by adding or subtracting
their entries. A vector is multiplied by a number just by multiplying
each entry with the number. Then there is the inner product of two
vectors
   x dot y = sum_i x_i*y_i
which is a number and not a vector.

Once you have mastered vectors you need to understand matrices. These
are rectangular arrays of numbers.
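For readers who like to experiment, the vector operations described
above (entrywise addition and subtraction, multiplication by a number,
and the inner product) can be sketched in a few lines of Python. This
is only an illustration and not part of the original answer: vectors
are represented as plain Python lists, and the helper names add, sub,
scale and dot are chosen here for the example.

```python
# Illustrative sketch: vectors as plain Python lists of numbers.
# The helper names add, sub, scale and dot are hypothetical choices.

def _check_same_length(x, y):
    """Raise an error unless both vectors have the same number of entries."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")

def add(x, y):
    """Add two vectors by adding their entries."""
    _check_same_length(x, y)
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    """Subtract two vectors by subtracting their entries."""
    _check_same_length(x, y)
    return [a - b for a, b in zip(x, y)]

def scale(c, x):
    """Multiply a vector by a number by multiplying each entry."""
    return [c * a for a in x]

def dot(x, y):
    """Inner product: sum_i x_i*y_i, a number and not a vector."""
    _check_same_length(x, y)
    return sum(a * b for a, b in zip(x, y))

x = [1, 2, 3]
y = [4, 5, 6]
print(add(x, y))    # [5, 7, 9]
print(sub(x, y))    # [-3, -3, -3]
print(scale(2, x))  # [2, 4, 6]
print(dot(x, y))    # 32
```

Note that add, sub and scale return vectors again, while dot returns
a single number, just as stated above.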
Later you need to enrich the meaning of a vector by learning the
concept of a vector space. Now all sorts of objects might also deserve
the name vector, most prominently functions, matrices, tensors,
operators. They behave in many respects just like ordinary vectors.


------------------------------------------
S18c. Learning quantum mechanics at age 14
------------------------------------------

If you want to learn about quantum physics and really understand it,
you first need to learn how to do calculations with vectors and
matrices. Look in your local library for math books about
'linear algebra' or 'analytic geometry'. You may have to try several
before you find one suitable at your level.

Linear algebra (i.e., vectors and matrices) is more fundamental to
quantum mechanics than calculus, although the latter is needed to
understand how things change steadily with time. But one can
understand the time-independent part of quantum mechanics already
without calculus, namely everything involving entanglement,
Schroedinger's cat, quantum cryptography, and the like. This only
needs linear algebra, which may be easier. (On the other hand,
calculus is not really difficult either, once one gets used to it.)

Maybe at first it is better to get math schoolbooks from your older
peers. Good school books are written in such a way that they can be
used for self study. If you are motivated it can be very exciting!

If you like math it is much less work than you might think, and it
is fun! Just start with next year's textbook and read it in your spare
time! I started reading math beyond my age when I was 12, and never
regretted it. With the right motivation, you can learn 10 times as
fast as when you just wait till the subject comes up in school!
And it will be 10 times as interesting!

You don't need to do all the exercises but just enough that you think
you know how it works. Go back to practicing more if you need it.
This speeds up things a lot.
Also, you don't need to read everything in the order it is in the
book - just go where your curiosity leads you, and if you encounter
something you don't know yet, go back to where it was introduced.
In this way you get the idea of what is happening long before you
understand it thoroughly, and it will be a motivation to learn the
missing things.

Learning math and physics is a life-long challenge (so much
interesting stuff accumulated over the centuries...), and you can't
start early enough. And at any time in life there will be parts you
understand well, parts you understand partly or superficially only,
and parts where you know little more than a few buzz words.

So you need not aim at understanding everything fully on first
acquaintance, but learn whatever you can in whatever order you pick
it up. The stuff to be practiced and learnt well is only the part
that comes up over and over again. When you realize that then you
know what to learn, and you quickly see how to do it!


------------------------
S18d. Research at age 16
------------------------

At 16, you should spend your time learning rather than doing
research. Lacking ideas means knowing too little... Once you know
enough about what others did and where they got stuck, you'll have
more than enough ideas to work on.

I'd like to suggest that you read the Nobel lectures of the physics
Nobel laureates,
   http://nobelprize.org/physics/laureates/
The material spans a whole century, and will occupy you for a long
time! It will direct your mind to themes that have been important
enough to merit the prize; most of them will continue to be important
in the future.

In parallel, use the web to sort out all concepts used in the Nobel
lectures that you don't yet understand; at first it will be a lot,
and you have to search a bit to find out where the basics you need
are well explained. Some items might be explained in this theoretical
physics FAQ, or in the book mentioned at the top of this FAQ.
Doing both will put you on a learning track which will end in a
research career and bear plenty of fruit.


------------------------------------------
S18e. Are there indefinite Hilbert spaces?
------------------------------------------

There are no indefinite Hilbert spaces.

There are, however, vector spaces with a distinguished indefinite
inner product; these are called Krein spaces. Their structure is much
weaker than that of Hilbert spaces; there is no natural topology,
no completeness, nothing resembling a Hilbert space except the inner
product.

Since there are physical situations where indefinite inner products
arise naturally, some people show their lack of knowledge of the
literature by referring to Krein spaces as indefinite Hilbert spaces.
But if a few people do so, it doesn't mean that the terminology is
justified.

For example, quant-ph/0211048 uses this poor terminology.
The ghosts referred to in this paper are nonphysical vectors in a
Krein space which contains a definite subspace of physical vectors
whose completion gives the physical Hilbert space. This is a natural
construction in gauge theories (Gupta-Bleuler formalism) where the
direct construction of a physical Hilbert space would manifestly
break Lorentz and/or gauge invariance, while the nonphysical, bigger
Krein space enjoys all desired invariance properties.

The indefinite metric in relativity, also mentioned in that paper,
has nothing to do with indefinite Hilbert spaces, since the underlying
vector spaces (Minkowski space in special relativity, the tangent
spaces at space-time points in general relativity) are 4-dimensional
spaces with the ordinary Euclidean topology (although the metric is
non-Euclidean).


---------------------
S19a. God and physics
---------------------

This is most likely to be controversial; but you might be interested
in how the author of this FAQ sees the issues. The following links
are to some relevant pages from my web site.
How Do We Know Whether God Acts In The World?
   http://www.mat.univie.ac.at/~neum/sciandf/eng/godacts.html
''I found the assumption that `God acts in the world' a superior way
of organizing the events that I see or hear happen.''

Knowledge, Chance, and Creation
   http://www.mat.univie.ac.at/~neum/sciandf/eng/chance.html
(On the difficulty to know, and the role of the second law of
thermodynamics as an instrument of creation)

How to study
   http://www.mat.univie.ac.at/~neum/sciandf/eng/study.html
''When I questioned the bible about the attitude appropriate to the
study of science I found the following instructions.''

How to Create a Universe - Instructions for an Apprentice God.
   http://www.mat.univie.ac.at/~neum/other/turing.txt
(A fantasy to be read at leisure time)

Science and Faith (an extensive collection of links)
   http://www.mat.univie.ac.at/~neum/sciandf.html
''Science is the truth only in matters that can be objectified;
in the spiritual world, where values, goals, authority and purpose
are located, science has nothing to say. It is a poor life that is
restricted to the scientific standard of truth, where you and I are
nothing but a collection of atoms without meaning and purpose.
Realizing the narrow-minded nature of science opens the gate to an
understanding of God that complements the scientific truth and gives
life, love and peace.''

and in German:

Gott - die grosse Unbekannte
   http://www.mat.univie.ac.at/~neum/sciandf/ger/unbek.html

Mathematik, Physik und Ewigkeit (mit einem Augenzwinkern betrachtet)
   http://www.mat.univie.ac.at/~neum/sciandf/ger/neumann.pdf


---------------------
S20a. Acknowledgments
---------------------

Thanks to the contributors to the newsgroup sci.physics.research
for their more or less challenging questions and comments, without
which this FAQ wouldn't exist.
Thanks also to Steve Carlip, Norbert Dragon, Hendrik van Hees,
Don Koks, Nick Maclaren, Alejandro Rivero, Joe Rongen, and
Gerard Westendorp for useful comments that led to improvements
in the FAQ.

Finally, thanks to God for his wonderful and interesting universe,
and for the gift of being able to understand his wonders.