What’s new
Updates on my research and expository papers, discussion of open problems, and other maths-related topics. By Terence Tao
The determinant $\det(A)$ of a square matrix $A$ obeys a large number of important identities, the most basic of which is the multiplicativity property
$$\det(AB) = \det(A) \det(B) \ \ \ \ \ (1)$$
whenever $A, B$ are square matrices of the same dimension. This identity then generates many other important identities. For instance, if $A$ is an $n \times m$ matrix and $B$ is an $m \times n$ matrix, then by applying the previous identity to the $(n+m) \times (n+m)$ square matrices $\begin{pmatrix} 1 & A \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & -A \\ B & 1 \end{pmatrix}$ (where we will adopt the convention that $1$ denotes an identity matrix of whatever dimension is needed to make sense of the expressions being computed, and similarly for $0$) we obtain the Sylvester determinant identity
$$\det( 1_n + AB ) = \det( 1_m + BA ). \ \ \ \ \ (2)$$
This identity, which relates an $n \times n$ determinant with an $m \times m$ determinant, is very useful in random matrix theory (a point emphasised in particular by Deift), particularly in regimes in which $m$ is much smaller than $n$.
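As a quick numerical sanity check of the Sylvester identity (a minimal NumPy sketch; the dimensions and random seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
A = rng.standard_normal((n, m))   # n x m
B = rng.standard_normal((m, n))   # m x n

lhs = np.linalg.det(np.eye(n) + A @ B)   # an n x n determinant
rhs = np.linalg.det(np.eye(m) + B @ A)   # an m x m determinant
print(np.isclose(lhs, rhs))              # True: det(1+AB) = det(1+BA)
```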
Another identity generated from (1) arises when trying to compute the determinant of a $(n+m) \times (n+m)$ block matrix
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}$$
where $A$ is an $n \times n$ matrix, $B$ is an $n \times m$ matrix, $C$ is an $m \times n$ matrix, and $D$ is an $m \times m$ matrix. If $A$ is invertible, then we can manipulate this matrix via block Gaussian elimination as
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ C A^{-1} & 1 \end{pmatrix} \begin{pmatrix} A & B \\ 0 & D - C A^{-1} B \end{pmatrix},$$
and on taking determinants using (1) we obtain the Schur determinant identity
$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(A) \det( D - C A^{-1} B ) \ \ \ \ \ (3)$$
relating the determinant of a block matrix with the determinant of the Schur complement $D - C A^{-1} B$ of the upper left block $A$. This identity can be viewed as the correct way to generalise the $2 \times 2$ determinant formula
$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc = a ( d - c a^{-1} b ).$$
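The Schur identity can likewise be sanity-checked numerically (a minimal sketch; the diagonal boost is just my way of keeping $A$ well-conditioned):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # invertible upper-left block
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m))

M = np.block([[A, B], [C, D]])
schur = D - C @ np.linalg.solve(A, B)             # Schur complement D - C A^{-1} B
print(np.isclose(np.linalg.det(M),
                 np.linalg.det(A) * np.linalg.det(schur)))   # True
```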
It is also possible to use determinant identities to deduce other matrix identities that do not involve the determinant, by the technique of matrix differentiation (or equivalently, matrix linearisation). The key observation is that near the identity, the determinant behaves like the trace, or more precisely one has
$$\det( 1 + \epsilon A ) = 1 + \epsilon \operatorname{tr}(A) + O(\epsilon^2) \ \ \ \ \ (4)$$
for any bounded square matrix $A$ and infinitesimal $\epsilon$. (If one is uncomfortable with infinitesimals, one can interpret this sort of identity as an asymptotic as $\epsilon \rightarrow 0$.) Combining this with (1) we see that for square matrices $A, B$ of the same dimension with $A$ invertible and $A + \epsilon B$ invertible, one has
$$\det( A + \epsilon B ) = \det(A) \left( 1 + \epsilon \operatorname{tr}( A^{-1} B ) + O(\epsilon^2) \right)$$
for infinitesimal $\epsilon$. To put it another way, if $A(t)$ is a square matrix that depends in a differentiable fashion on a real parameter $t$, then
$$\frac{d}{dt} \det(A(t)) = \det(A(t)) \operatorname{tr}\left( A(t)^{-1} \frac{d}{dt} A(t) \right)$$
whenever $A(t)$ is invertible. (Note that if one combines this identity with cofactor expansion, one recovers Cramer's rule.)
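The differentiation formula can be verified by finite differences (a minimal sketch; the base point and direction are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A0 = rng.standard_normal((n, n)) + n * np.eye(n)   # invertible base point A(0)
A1 = rng.standard_normal((n, n))                   # direction, so A'(t) = A1
A = lambda t: A0 + t * A1

h = 1e-6
numeric = (np.linalg.det(A(h)) - np.linalg.det(A(-h))) / (2 * h)
exact = np.linalg.det(A0) * np.trace(np.linalg.solve(A0, A1))
print(abs(numeric - exact) / abs(exact) < 1e-5)    # True up to finite-difference error
```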
Let us see some examples of this differentiation method. If we take the Sylvester identity (2) and multiply one of the rectangular matrices $A$ by an infinitesimal $\epsilon$, we obtain
$$\det( 1 + \epsilon A B ) = \det( 1 + \epsilon B A );$$
applying (4) and extracting the linear term in $\epsilon$ (or equivalently, differentiating at $\epsilon$ and then setting $\epsilon = 0$) we conclude the cyclic property of trace:
$$\operatorname{tr}(AB) = \operatorname{tr}(BA).$$
To manipulate derivatives and inverses, we begin with the Neumann series approximation
$$( 1 + \epsilon A )^{-1} = 1 - \epsilon A + O(\epsilon^2)$$
for bounded square $A$ and infinitesimal $\epsilon$, which then leads to the more general approximation
$$\frac{d}{dt} A(t)^{-1} = - A(t)^{-1} \left( \frac{d}{dt} A(t) \right) A(t)^{-1} \ \ \ \ \ (5)$$
whenever $A(t)$ depends in a differentiable manner on $t$ and $A(t)$ is invertible.
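A finite-difference check of (5), in the same spirit as the previous sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A0 = rng.standard_normal((n, n)) + n * np.eye(n)
A1 = rng.standard_normal((n, n))
A = lambda t: A0 + t * A1

h = 1e-6
numeric = (np.linalg.inv(A(h)) - np.linalg.inv(A(-h))) / (2 * h)
exact = -np.linalg.inv(A0) @ A1 @ np.linalg.inv(A0)   # -A^{-1} A' A^{-1}
print(np.allclose(numeric, exact, atol=1e-4))          # True
```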
We can then differentiate (or linearise) the Schur identity (3) in a number of ways. For instance, if we replace the lower block $D$ by $D + \epsilon H$ for some test $m \times m$ matrix $H$, then by (4), the left-hand side of (3) becomes (assuming the invertibility of the block matrix)
$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} \left( 1 + \epsilon \operatorname{tr}\left( \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ 0 & H \end{pmatrix} \right) + O(\epsilon^2) \right)$$
while the right-hand side becomes
$$\det(A) \det( D - C A^{-1} B ) \left( 1 + \epsilon \operatorname{tr}\left( ( D - C A^{-1} B )^{-1} H \right) + O(\epsilon^2) \right);$$
extracting the linear term in $\epsilon$, we conclude that
$$\operatorname{tr}\left( \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ 0 & H \end{pmatrix} \right) = \operatorname{tr}\left( ( D - C A^{-1} B )^{-1} H \right).$$
As $H$ was an arbitrary $m \times m$ matrix, we conclude from duality that the lower right $m \times m$ block of $\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}$ is given by the inverse $( D - C A^{-1} B )^{-1}$ of the Schur complement:
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} * & * \\ * & ( D - C A^{-1} B )^{-1} \end{pmatrix}.$$
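This block-inverse fact is easy to confirm numerically (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + m * np.eye(m)

M = np.block([[A, B], [C, D]])
schur = D - C @ np.linalg.solve(A, B)
lower_right = np.linalg.inv(M)[n:, n:]                   # lower right m x m block of M^{-1}
print(np.allclose(lower_right, np.linalg.inv(schur)))    # True
```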
One can also compute the other components of this inverse in terms of the Schur complement $D - C A^{-1} B$ by a similar method (although the formulae become more complicated). As a variant of this method, we can perturb the block matrix in (3) by an infinitesimal multiple $\epsilon$ of the identity matrix giving
$$\det \begin{pmatrix} A + \epsilon & B \\ C & D + \epsilon \end{pmatrix} = \det( A + \epsilon ) \det\left( D + \epsilon - C ( A + \epsilon )^{-1} B \right).$$
By (4), the left-hand side is
$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} \left( 1 + \epsilon \operatorname{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} + O(\epsilon^2) \right).$$
From (5), we have
$$D + \epsilon - C ( A + \epsilon )^{-1} B = D - C A^{-1} B + \epsilon ( 1 + C A^{-2} B ) + O(\epsilon^2),$$
and so, expanding the right-hand side with (4) as well and extracting the linear term in $\epsilon$, we arrive at
$$\operatorname{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \operatorname{tr}( A^{-1} ) + \operatorname{tr}\left( ( D - C A^{-1} B )^{-1} ( 1 + C A^{-2} B ) \right),$$
which relates the trace of the inverse of a block matrix, with the trace of the inverse of one of its blocks. This particular identity turns out to be useful in random matrix theory; I hope to elaborate on this in a later post.
As a final example of this method, we can analyse low rank perturbations $A + BC$ of a large ($n \times n$) matrix $A$, where $B$ is an $n \times k$ matrix and $C$ is a $k \times n$ matrix for some $k$. (This type of situation is also common in random matrix theory, for instance it arose in this previous paper of mine on outliers to the circular law.) If $A$ is invertible, then from (1) and (2) one has the matrix determinant lemma
$$\det( A + BC ) = \det(A) \det( 1_k + C A^{-1} B ).$$
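Again, a quick numerical check (a minimal sketch): the determinant lemma reduces an $n \times n$ determinant to a $k \times k$ one.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # invertible n x n
B = rng.standard_normal((n, k))                   # n x k
C = rng.standard_normal((k, n))                   # k x n

lhs = np.linalg.det(A + B @ C)
rhs = np.linalg.det(A) * np.linalg.det(np.eye(k) + C @ np.linalg.solve(A, B))
print(np.isclose(lhs, rhs))   # True
```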
If one then perturbs $A$ by an infinitesimal matrix $\epsilon H$, we have
$$\det( A + BC + \epsilon H ) = \det( A + \epsilon H ) \det\left( 1_k + C ( A + \epsilon H )^{-1} B \right).$$
Extracting the linear component in $\epsilon$ as before, one soon arrives at
$$\operatorname{tr}\left( ( A + BC )^{-1} H \right) = \operatorname{tr}\left( A^{-1} H \right) - \operatorname{tr}\left( ( 1_k + C A^{-1} B )^{-1} C A^{-1} H A^{-1} B \right)$$
assuming that $A$ and $A + BC$ are both invertible; as $H$ is arbitrary, we conclude (after using the cyclic property of trace) the Sherman-Morrison formula
$$( A + BC )^{-1} = A^{-1} - A^{-1} B ( 1_k + C A^{-1} B )^{-1} C A^{-1}$$
for the inverse of a low rank perturbation $A + BC$ of a matrix $A$. While this identity can be easily verified by direct algebraic computation, it is somewhat difficult to discover this identity by such algebraic manipulation; thus we see that the “determinant first” approach to matrix identities can make it easier to find appropriate matrix identities (particularly those involving traces and/or inverses), even if the identities one is ultimately interested in do not involve determinants. (As differentiation typically makes an identity lengthier, but also more “linear” or “additive”, the determinant identity tends to be shorter (albeit more nonlinear and more multiplicative) than the differentiated identity, and can thus be slightly easier to derive.)
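A numerical verification of the low rank inversion formula displayed above (a minimal sketch; for $k = 1$ this is the classical rank-one Sherman-Morrison formula):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, k))
C = rng.standard_normal((k, n))

Ainv = np.linalg.inv(A)
correction = Ainv @ B @ np.linalg.inv(np.eye(k) + C @ Ainv @ B) @ C @ Ainv
print(np.allclose(np.linalg.inv(A + B @ C), Ainv - correction))   # True
```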
Exercise 1 Use the “determinant first” approach to derive the Woodbury matrix identity (also known as the binomial inverse theorem)
$$( A + B D C )^{-1} = A^{-1} - A^{-1} B ( D^{-1} + C A^{-1} B )^{-1} C A^{-1},$$
where $A$ is an $n \times n$ matrix, $B$ is an $n \times k$ matrix, $C$ is a $k \times n$ matrix, and $D$ is a $k \times k$ matrix, assuming that $A$, $D$ and $D^{-1} + C A^{-1} B$ are all invertible.
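For concreteness, here is a numerical check of the Woodbury identity in the form displayed above (the displayed form and the variable names are my reconstruction of the standard statement, not necessarily those of the original exercise):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 6, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # n x n, invertible
B = rng.standard_normal((n, k))                   # n x k
D = rng.standard_normal((k, k)) + k * np.eye(k)   # k x k, invertible
C = rng.standard_normal((k, n))                   # k x n

Ainv, Dinv = np.linalg.inv(A), np.linalg.inv(D)
woodbury = Ainv - Ainv @ B @ np.linalg.inv(Dinv + C @ Ainv @ B) @ C @ Ainv
print(np.allclose(np.linalg.inv(A + B @ D @ C), woodbury))   # True
```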
Mathematicians study a variety of different mathematical structures, but perhaps the structures that are most commonly associated with mathematics are the number systems, such as the integers
or the real numbers
. Indeed, the use of number systems is so closely identified with the practice of mathematics that one sometimes forgets that it is possible to do mathematics without explicit reference to any concept of number. For instance, the ancient Greeks were able to prove many theorems in Euclidean geometry, well before the development of Cartesian coordinates and analytic geometry in the seventeenth century, or the formal constructions or axiomatisations of the real number system that emerged in the nineteenth century (not to mention precursor concepts such as zero or negative numbers, whose very existence was highly controversial, if entertained at all, to the ancient Greeks). To do this, the Greeks used geometric operations as substitutes for the arithmetic operations that would be more familiar to modern mathematicians. For instance, concatenation of line segments or planar regions serves as a substitute for addition; the operation of forming a rectangle out of two line segments would serve as a substitute for multiplication; the concept of similarity can be used as a substitute for ratios or division; and so forth.
A similar situation exists in modern physics. Physical quantities such as length, mass, momentum, charge, and so forth are routinely measured and manipulated using the real number system $\mathbb{R}$ (or related systems, such as $\mathbb{R}^3$ if one wishes to measure a vector-valued physical quantity such as velocity). Much as analytic geometry allows one to use the laws of algebra and trigonometry to calculate and prove theorems in geometry, the identification of physical quantities with numbers allows one to express physical laws and relationships (such as Einstein’s famous mass-energy equivalence $E = mc^2$) as algebraic (or differential) equations, which can then be solved and otherwise manipulated through the extensive mathematical toolbox that has been developed over the centuries to deal with such equations.
However, as any student of physics is aware, most physical quantities are not represented purely by one or more numbers, but instead by a combination of a number and some sort of unit. For instance, it would be a category error to assert that the length of some object was a number such as $3$; instead, one has to say something like “the length of this object is $3$ yards”, combining both a number $3$ and a unit (in this case, the yard). Changing the unit leads to a change in the numerical value assigned to this physical quantity, even though no physical change to the object being measured has occurred. For instance, if one decides to use feet as the unit of length instead of yards, then the length of the object is now $9$ feet; if one instead uses metres, the length is now $2.7432$ metres; and so forth. But nothing physical has changed when performing this change of units, and these lengths are considered all equal to each other:
$$3 \hbox{ yards} = 9 \hbox{ feet} = 2.7432 \hbox{ metres}.$$
It is then common to declare that while physical quantities and units are not, strictly speaking, numbers, they should be manipulated using the laws of algebra as if they were numerical quantities. For instance, if an object travels $10$ metres in $5$ seconds, then its speed should be
$$10\ \mathrm{m} / 5\ \mathrm{s} = 2\ \mathrm{m}\,\mathrm{s}^{-1}$$
where we use the usual abbreviations of $\mathrm{m}$ and $\mathrm{s}$ for metres and seconds respectively. Similarly, if the speed of light $c$ is $3 \times 10^8\ \mathrm{m}\,\mathrm{s}^{-1}$ and an object has mass $10\ \mathrm{kg}$, then Einstein’s mass-energy equivalence $E = mc^2$ tells us that the energy-content of this object is
$$10\ \mathrm{kg} \times ( 3 \times 10^8\ \mathrm{m}\,\mathrm{s}^{-1} )^2 = 9 \times 10^{17}\ \mathrm{kg}\,\mathrm{m}^2\,\mathrm{s}^{-2}.$$
Note that the symbols $\mathrm{kg}, \mathrm{m}, \mathrm{s}$ are being manipulated algebraically as if they were mathematical variables such as $x$ and $y$. By collecting all these units together, we see that every physical quantity gets assigned a unit of a certain dimension: for instance, we see here that the energy $E$ of an object can be given the unit of $\mathrm{kg}\,\mathrm{m}^2\,\mathrm{s}^{-2}$ (more commonly known as a Joule), which has the dimension of $M L^2 T^{-2}$ where $M, L, T$ are the dimensions of mass, length, and time respectively.
There is however one important limitation to the ability to manipulate “dimensionful” quantities as if they were numbers: one is not supposed to add, subtract, or compare two physical quantities if they have different dimensions, although it is acceptable to multiply or divide two such quantities. For instance, if $m$ is a mass (having the units $\mathrm{kg}$) and $v$ is a speed (having the units $\mathrm{m}\,\mathrm{s}^{-1}$), then it is physically “legitimate” to form an expression such as $mv$, but not an expression such as $m + v$ or $m - v$; in a similar spirit, statements such as $m = v$ or $m < v$ are physically meaningless. This combines well with the mathematical distinction between vector, scalar, and matrix quantities, which among other things prohibits one from adding together two such quantities if their vector or matrix type are different (e.g. one cannot add a scalar to a vector, or a vector to a matrix), and also places limitations on when two such quantities can be multiplied together. A related limitation, which is not always made explicit in physics texts, is that transcendental mathematical functions such as $\sin$ or $\exp$ should only be applied to arguments that are dimensionless; thus, for instance, if $v$ is a speed, then $\tanh^{-1}(v)$ is not physically meaningful, but $\tanh^{-1}(v/c)$ is (this particular quantity is known as the rapidity associated to this speed).
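To make the prohibition on dimensionally inconsistent operations concrete, here is a minimal Python sketch (with names of my own choosing) of a quantity type that tracks exponents of $M, L, T$ and refuses to add mismatched dimensions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    value: float
    dim: tuple  # exponents of (M, L, T)

    def __add__(self, other):
        if self.dim != other.dim:
            raise TypeError(f"cannot add dimensions {self.dim} and {other.dim}")
        return Quantity(self.value + other.value, self.dim)

    def __mul__(self, other):
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dim, other.dim)))

m = Quantity(10.0, (1, 0, 0))   # a mass of 10 kg, dimension M
v = Quantity(3.0, (0, 1, -1))   # a speed of 3 m/s, dimension L T^{-1}
momentum = m * v                 # fine: dimension M L T^{-1}
# m + v                          # would raise TypeError: dimensionally inconsistent
print(momentum)
```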
These limitations may seem like a weakness in the mathematical modeling of physical quantities; one may think that one could get a more “powerful” mathematical framework if one were allowed to perform dimensionally inconsistent operations, such as add together a mass and a velocity, add together a vector and a scalar, exponentiate a length, etc. Certainly there is some precedent for this in mathematics; for instance, the formalism of Clifford algebras does in fact allow one to (among other things) add vectors with scalars, and in differential geometry it is quite common to formally apply transcendental functions (such as the exponential function) to a differential form (for instance, the Liouville measure $\omega^n/n!$ of a symplectic manifold can be usefully thought of as a component of the exponential $e^\omega$ of the symplectic form $\omega$).
However, there are several reasons why it is advantageous to retain the limitation to only perform dimensionally consistent operations. One is that of error correction: one can often catch (and correct for) errors in one’s calculations by discovering a dimensional inconsistency, and tracing it back to the first step where it occurs. Also, by performing dimensional analysis, one can often identify the form of a physical law before one has fully derived it. For instance, if one postulates the existence of a mass-energy relationship involving only the mass of an object $m$, the energy content $E$, and the speed of light $c$, dimensional analysis is already sufficient to deduce that the relationship must be of the form
$$E = \alpha m c^2$$
for some dimensionless absolute constant $\alpha$; the only remaining task is then to work out the constant of proportionality $\alpha$, which requires physical arguments beyond that provided by dimensional analysis. (This is a simple instance of a more general application of dimensional analysis known as the Buckingham $\pi$ theorem.)
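The dimensional-analysis step here amounts to solving a tiny linear system for exponents: writing $[E] = M L^2 T^{-2}$, $[m] = M$, $[c] = L T^{-1}$ and seeking $E \propto m^a c^b$ forces $a = 1$, $b = 2$. A minimal sketch of that computation:

```python
import numpy as np

dims_m = np.array([1, 0, 0])     # mass:   M
dims_c = np.array([0, 1, -1])    # speed:  L T^{-1}
dims_E = np.array([1, 2, -2])    # energy: M L^2 T^{-2}

# solve [dims_m | dims_c] @ (a, b) = dims_E in the least-squares sense
coeffs, *_ = np.linalg.lstsq(np.column_stack([dims_m, dims_c]), dims_E, rcond=None)
print(coeffs)   # approximately [1. 2.], i.e. E is proportional to m c^2
```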
The use of units and dimensional analysis has certainly been proven to be very effective tools in physics. But one can pose the question of whether it has a properly grounded mathematical foundation, in order to settle any lingering unease about using such tools in physics, and also in order to rigorously develop such tools for purely mathematical purposes (such as analysing identities and inequalities in such fields of mathematics as harmonic analysis or partial differential equations).
The example of Euclidean geometry mentioned previously offers one possible approach to formalising the use of dimensions. For instance, one could model the length of a line segment not by a number, but rather by the equivalence class of all line segments congruent to the original line segment (cf. the Frege-Russell definition of a number). Similarly, the area of a planar region can be modeled not by a number, but by the equivalence class of all regions that are equidecomposable with the original region (one can, if one wishes, restrict attention here to measurable sets in order to avoid Banach-Tarski-type paradoxes, though that particular paradox actually only arises in three and higher dimensions). As mentioned before, it is then geometrically natural to multiply two lengths to form an area, by taking a rectangle whose line segments have the stated lengths, and using the area of that rectangle as a product. This geometric picture works well for units such as length and volume that have a spatial geometric interpretation, but it is less clear how to apply it for more general units. For instance, it does not seem geometrically natural (or, for that matter, conceptually helpful) to envision the equation $E = mc^2$ as the assertion that the energy $E$ is the volume of a rectangular box whose height is the mass $m$ and whose length and width is given by the speed of light $c$.
But there are at least two other ways to formalise dimensionful quantities in mathematics, which I will discuss below the fold. The first is a “parametric” model in which dimensionful objects are modeled as numbers (or vectors, matrices, etc.) depending on some base dimensional parameters (such as units of length, mass, and time, or perhaps a coordinate system for space or spacetime), and transforming according to some representation of a structure group that encodes the range of these parameters; this type of “coordinate-heavy” model is often used (either implicitly or explicitly) by physicists in order to efficiently perform calculations. The second is an “abstract” model in which dimensionful objects now live in an abstract mathematical space (e.g. an abstract vector space), in which only a subset of the operations available to general-purpose number systems such as $\mathbb{R}$ or $\mathbb{R}^3$ are available, namely those operations which are “dimensionally consistent” or invariant (or more precisely, equivariant) with respect to the action of the underlying structure group. This sort of “coordinate-free” approach tends to be the one which is preferred by pure mathematicians, particularly in the various branches of modern geometry, in part because it can lead to greater conceptual clarity, as well as results of great generality.
Things are pretty quiet here during the holiday season, but one small thing I have been working on recently is a set of notes on special relativity that I will be working through in a few weeks with some bright high school students here at our local math circle. I have only two hours to spend with this group, and it is unlikely that we will reach the end of the notes (in which I derive the famous mass-energy equivalence relation E=mc^2, largely following Einstein’s original derivation as discussed in this previous blog post); instead we will probably spend a fair chunk of time on related topics which do not actually require special relativity per se, such as spacetime diagrams, the Doppler shift effect, and an analysis of my airport puzzle. This will be my first time doing something of this sort (in which I will be spending as much time interacting directly with the students as I would lecturing); I’m not sure exactly how it will play out, being a little outside of my usual comfort zone of undergraduate and graduate teaching, but am looking forward to finding out how it goes. (In particular, it may end up that the discussion deviates somewhat from my prepared notes.)
The material covered in my notes is certainly not new, but I ultimately decided that it was worth putting up here in case some readers here had any corrections or other feedback to contribute (which, as always, would be greatly appreciated).
[Dec 24: notes updated, in response to comments.]
Perhaps the most important structural result about general large dense graphs is the Szemerédi regularity lemma. Here is a standard formulation of that lemma:
Lemma 1 (Szemerédi regularity lemma) Let $G = (V, E)$ be a graph on $n$ vertices, and let $\epsilon > 0$. Then there exists a partition $V = V_1 \cup \ldots \cup V_k$ for some $k \le C(\epsilon)$ with the property that for all but at most $\epsilon k^2$ of the pairs $(i,j)$, $1 \le i < j \le k$, the pair $(V_i, V_j)$ is $\epsilon$-regular in the sense that
$$| d( A, B ) - d( V_i, V_j ) | \le \epsilon$$
whenever $A \subset V_i$, $B \subset V_j$ are such that $|A| \ge \epsilon |V_i|$ and $|B| \ge \epsilon |V_j|$, and $d(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}| / (|A| |B|)$ is the edge density between $A$ and $B$. Furthermore, the partition is equitable in the sense that $| |V_i| - |V_j| | \le 1$ for all $i, j$.
There are many proofs of this lemma, which is actually not that difficult to establish; see for instance these previous blog posts for some examples. In this post I would like to record one further proof, based on the spectral decomposition of the adjacency matrix of $G$, which is essentially due to Frieze and Kannan. (Strictly speaking, Frieze and Kannan used a variant of this argument to establish a weaker form of the regularity lemma, but it is not difficult to modify the Frieze-Kannan argument to obtain the usual form of the regularity lemma instead. Some closely related spectral regularity lemmas were also developed by Szegedy.) I found recently (while speaking at the Abel conference in honour of this year’s laureate, Endre Szemerédi) that this particular argument is not as widely known among graph theory experts as I had thought, so I thought I would record it here.
For reasons of exposition, it is convenient to first establish a slightly weaker form of the lemma, in which one drops the hypothesis of equitability (but then has to weight the cells $V_i$ by their magnitude when counting bad pairs):
Lemma 2 (Szemerédi regularity lemma, weakened variant) Let $G = (V, E)$ be a graph on $n$ vertices, and let $\epsilon > 0$. Then there exists a partition $V = V_1 \cup \ldots \cup V_k$ for some $k \le C(\epsilon)$ with the property that for all pairs $(i,j)$ outside of an exceptional set $\Sigma$ obeying the smallness condition (2), one has the regularity estimate (1).
Let us now prove Lemma 2. We enumerate $V$ (after relabeling) as $\{1, \ldots, n\}$. The adjacency matrix $A$ of the graph $G$ is then a self-adjoint $n \times n$ matrix, and thus admits an eigenvalue decomposition
$$A = \sum_{i=1}^n \lambda_i u_i u_i^*$$
for some orthonormal basis $u_1, \ldots, u_n$ of $\mathbb{C}^n$ and some eigenvalues $\lambda_1, \ldots, \lambda_n \in \mathbb{R}$, which we arrange in decreasing order of magnitude:
$$|\lambda_1| \ge \ldots \ge |\lambda_n|.$$
We can compute the trace of $A^2$ as
$$\sum_{i=1}^n |\lambda_i|^2 = \operatorname{tr}(A^2) = \sum_{v, w \in V} |a_{vw}|^2 \le n^2. \ \ \ \ \ (3)$$
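Here is a small numerical illustration of this trace bound (a sketch; an Erdős–Rényi random graph and the parameter choices are mine): since $\sum_i \lambda_i^2 = \operatorname{tr}(A^2) \le n^2$, at most $1/\epsilon^2$ eigenvalues can exceed $\epsilon n$ in magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 400, 0.1
A = np.triu(rng.random((n, n)) < 0.3, k=1).astype(float)   # random graph, upper triangle
A = A + A.T                                                 # symmetric 0/1 adjacency matrix

lam = np.linalg.eigvalsh(A)
print(np.isclose((lam ** 2).sum(), np.trace(A @ A)))        # sum of lambda_i^2 = tr(A^2)
print((lam ** 2).sum() <= n ** 2)                           # ... which is at most n^2
print((np.abs(lam) >= eps * n).sum() <= 1 / eps ** 2)       # few eigenvalues are large
```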
Let
be a function (depending on
) to be chosen later, with
for all
. Applying (3) and the pigeonhole principle (or the finite convergence principle, see this blog post), we can find
such that
We now design a vertex partition to make
approximately constant on most cells. For each
, we partition
into
cells on which
(viewed as a function from
to
) only fluctuates by
, plus an exceptional cell of size
coming from the values where
is excessively large (larger than
). Combining all these partitions together, we can write
for some
, where
has cardinality at most
, and for all
, the eigenfunctions
all fluctuate by at most
. In particular, if
, then (by (4) and (6)) the entries of
fluctuate by at most
on each block
. If we let
be the mean value of these entries on
, we thus have
Finally, to control
we see from (4) and (8) that
has an operator norm of at most
. In particular, we have from the Cauchy-Schwarz inequality that
Let
be the set of all pairs
where either
,
,
, or
One easily verifies that (2) holds. If
is not in
, then by summing (9), (11), (12) and using (5), we see that
and so (since
)
If we let
be a sufficiently rapidly growing function of
that depends on
, the second error term in (13) can be absorbed in the first, and (1) follows. This concludes the proof of Lemma 2.
To prove Lemma 1, one argues similarly (after modifying
as necessary), except that the initial partition
of
constructed above needs to be subdivided further into equitable components (of size
), plus some remainder sets which can be aggregated into an exceptional component of size
(and which can then be redistributed amongst the other components to arrive at a truly equitable partition). We omit the details.
Remark 1 It is easy to verify that $F$ needs to be growing exponentially in $J$ in order for the above argument to work, which leads to tower-exponential bounds in the number of cells $k$ in the partition. It was shown by Gowers that a tower-exponential bound is actually necessary here. By varying $F$, one basically obtains the strong regularity lemma first established by Alon, Fischer, Krivelevich, and Szegedy; in the opposite direction, taking $F$ to be essentially trivial gives the weak regularity lemma of Frieze and Kannan.
Remark 2 If we specialise to a Cayley graph, in which $G$ is a finite abelian group and the vertices $x, y$ are joined by an edge precisely when $x - y \in S$, for some (symmetric) subset $S$ of $G$, then the eigenvectors are characters, and one essentially recovers the arithmetic regularity lemma of Green, in which the vertex partition classes $V_1, \ldots, V_k$ are given by Bohr sets (and one can then place additional regularity properties on these Bohr sets with some additional arguments). The components of the spectral decomposition of $A$, representing high, medium, and low eigenvalues of $A$, then become a decomposition associated to high, medium, and low Fourier coefficients of the indicator function $1_S$.
Remark 3 The use of spectral theory here is parallel to the use of Fourier analysis to establish results such as Roth’s theorem on arithmetic progressions of length three. In analogy with this, one could view hypergraph regularity as being a sort of “higher order spectral theory”, although this spectral perspective is not as convenient as it is in the graph case.
Lars Hörmander, who made fundamental contributions to all areas of partial differential equations, but particularly in developing the analysis of variable-coefficient linear PDE, died last Sunday, aged 81.
I unfortunately never met Hörmander personally, but of course I encountered his work all the time while working in PDE. One of his major contributions to the subject was to systematically develop the calculus of Fourier integral operators (FIOs), which are a substantial generalisation of pseudodifferential operators and which can be used to (approximately) solve linear partial differential equations, or to transform such equations into a more convenient form. Roughly speaking, Fourier integral operators are to linear PDE as canonical transformations are to Hamiltonian mechanics (and one can in fact view FIOs as a quantisation of a canonical transformation). They are a large class of transformations, for instance the Fourier transform, pseudodifferential operators, and smooth changes of the spatial variable are all examples of FIOs, and (as long as certain singular situations are avoided) the composition of two FIOs is again an FIO.
The full theory of FIOs is quite extensive, occupying the entire final volume of Hörmander’s famous four-volume series “The Analysis of Linear Partial Differential Operators”. I am certainly not going to try to attempt to summarise it here, but I thought I would try to motivate how these operators arise when trying to transform functions. For simplicity we will work with functions $f: \mathbb{R}^n \rightarrow \mathbb{C}$ on a Euclidean domain $\mathbb{R}^n$ (although FIOs can certainly be defined on more general smooth manifolds, and there is an extension of the theory that also works on manifolds with boundary). As this will be a heuristic discussion, we will ignore all the (technical, but important) issues of smoothness or convergence with regards to the functions, integrals and limits that appear below, and be rather vague with terms such as “decaying” or “concentrated”.
A function $f \in L^2(\mathbb{R}^n)$ can be viewed from many different perspectives (reflecting the variety of bases, or approximate bases, that the Hilbert space $L^2(\mathbb{R}^n)$ offers). Most directly, we have the physical space perspective, viewing $f$ as a function $x \mapsto f(x)$ of the physical variable $x \in \mathbb{R}^n$. In many cases, this function will be concentrated in some subregion of physical space. For instance, a gaussian wave packet
$$f(x) = e^{i \xi_0 \cdot x / \hbar} e^{-|x - x_0|^2 / 2\hbar}, \ \ \ \ \ (1)$$
where $x_0 \in \mathbb{R}^n$, $\xi_0 \in \mathbb{R}^n$ and $\hbar > 0$ are parameters, would be physically concentrated in the ball $B(x_0, \sqrt{\hbar})$. Then we have the frequency space (or momentum space) perspective, viewing $f$ now as a function $\xi \mapsto \hat f(\xi)$ of the frequency variable $\xi$. For this discussion, it will be convenient to normalise the Fourier transform using a small constant $\hbar$ (which has the physical interpretation of Planck’s constant if one is doing quantum mechanics), thus
$$\hat f(\xi) := \frac{1}{(2\pi \hbar)^{n/2}} \int_{\mathbb{R}^n} e^{-i \xi \cdot x/\hbar} f(x)\ dx.$$
For instance, for the gaussian wave packet (1), one has
$$\hat f(\xi) = e^{-i x_0 \cdot (\xi - \xi_0)/\hbar} e^{-|\xi - \xi_0|^2/2\hbar},$$
and so we see that $\hat f$ is concentrated in frequency space in the ball $B(\xi_0, \sqrt{\hbar})$.
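This localisation can be checked numerically in one dimension (a minimal sketch; the wave packet and the $(2\pi\hbar)^{-1/2}$ normalisation are the reconstructions written above, so the exact constants should be treated as assumptions):

```python
import numpy as np

hbar, x0, xi0 = 0.01, 1.0, 2.0
x = np.linspace(-10, 10, 40001)       # fine grid, resolving the e^{i xi x / hbar} oscillation
dx = x[1] - x[0]
f = np.exp(1j * xi0 * x / hbar) * np.exp(-(x - x0) ** 2 / (2 * hbar))

xi = np.linspace(xi0 - 1.0, xi0 + 1.0, 801)
# semiclassical Fourier transform  (2 pi hbar)^{-1/2} * integral of e^{-i x xi / hbar} f(x) dx
fhat = np.array([(np.exp(-1j * x * s / hbar) * f).sum() * dx for s in xi]) / np.sqrt(2 * np.pi * hbar)

mass = np.abs(fhat) ** 2
near = np.abs(xi - xi0) <= 5 * np.sqrt(hbar)   # ball of radius O(sqrt(hbar)) around xi0
print(mass[near].sum() / mass.sum() > 0.999)   # essentially all of the L^2 mass lies there
```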
However, there is a third (but less rigorous) way to view a function $f$ in $L^2(\mathbb{R}^n)$, which is the phase space perspective in which one tries to view $f$ as distributed simultaneously in physical space and in frequency space, thus being something like a measure on the phase space $\{ (x,\xi): x, \xi \in \mathbb{R}^n \}$. Thus, for instance, the function (1) should heuristically be concentrated on the region
$$\{ (x, \xi): |x - x_0| \lesssim \sqrt{\hbar}, |\xi - \xi_0| \lesssim \sqrt{\hbar} \}$$
in phase space. Unfortunately, due to the uncertainty principle, there is no completely satisfactory way to canonically and rigorously define what the “phase space portrait” of a function $f$ should be. (For instance, the Wigner transform of $f$ can be viewed as an attempt to describe the distribution of the $L^2$ energy of $f$
in phase space, except that this transform can take negative or even complex values; see Folland’s book for further discussion.) Still, it is a very useful heuristic to think of functions has having a phase space portrait, which is something like a non-negative measure on phase space that captures the distribution of functions in both space and frequency, albeit with some “quantum fuzziness” that shows up whenever one tries to inspect this measure at scales of physical space and frequency space that together violate the uncertainty principle. (The score of a piece of music is a good everyday example of a phase space portrait of a function, in this case a sound wave; here, the physical space is the time axis (the horizontal dimension of the score) and the frequency space is the vertical dimension. Here, the time and frequency scales involved are well above the uncertainty principle limit (a typical note lasts many hundreds of cycles, whereas the uncertainty principle kicks in at
cycles) and so there is no obstruction here to musical notation being unambiguous.) Furthermore, if one takes certain asymptotic limits, one can recover a precise notion of a phase space portrait; for instance if one takes the semiclassical limit
then, under certain circumstances, the phase space portrait converges to a well-defined classical probability measure on phase space; closely related to this is the high frequency limit of a fixed function, which among other things defines the wave front set of that function, which can be viewed as another asymptotic realisation of the phase space portrait concept.
If functions in $L^2(\mathbb{R}^n)$ can be viewed as a sort of distribution in phase space, then linear operators $T: L^2(\mathbb{R}^n) \rightarrow L^2(\mathbb{R}^n)$ should be viewed as various transformations on such distributions on phase space. For instance, a pseudodifferential operator $a(X,D)$ should correspond (as a zeroth approximation) to multiplying a phase space distribution by the symbol $a(x,\xi)$ of that operator, as discussed in this previous blog post. Note that such operators only change the amplitude of the phase space distribution, but not the support of that distribution.
Now we turn to operators that alter the support of a phase space distribution, rather than the amplitude; we will focus on unitary operators to emphasise the amplitude preservation aspect. These will eventually be key examples of Fourier integral operators. A physical translation $Tf(x) := f(x - x_0)$ should correspond to pushing forward the distribution by the transformation $(x,\xi) \mapsto (x + x_0, \xi)$, as can be seen by comparing the physical and frequency space supports of $f$ with that of $Tf$. Similarly, a frequency modulation $Tf(x) := e^{i \xi_0 \cdot x/\hbar} f(x)$ should correspond to the transformation $(x,\xi) \mapsto (x, \xi + \xi_0)$; a linear change of variables $Tf(x) := |\det L|^{-1/2} f(L^{-1} x)$, where $L$ is an invertible linear transformation, should correspond to $(x,\xi) \mapsto (Lx, (L^*)^{-1} \xi)$; and finally, the Fourier transform $Tf := \hat f$ should correspond to the transformation $(x,\xi) \mapsto (\xi, -x)$.
Based on these examples, one may hope that given any diffeomorphism $\Phi$ of phase space, one could associate some sort of unitary (or approximately unitary) operator $T_\Phi$, which (heuristically, at least) pushes the phase space portrait of a function forward by $\Phi$. However, there is an obstruction to doing so, which can be explained as follows. If $T_\Phi$ pushes phase space portraits by $\Phi$, and pseudodifferential operators $a(X,D)$ multiply phase space portraits by $a$, then this suggests the intertwining relationship
$$a(X,D)\, T_\Phi \approx T_\Phi\, (a \circ \Phi)(X,D). \ \ \ \ \ (2)$$
The formalisation of this fact in the theory of Fourier integral operators is known as Egorov’s theorem, due to Yu Egorov (and not to be confused with the more widely known theorem of Dmitri Egorov in measure theory).
Applying commutators, we conclude the approximate conjugacy relationship
$$[ a(X,D), b(X,D) ]\, T_\Phi \approx T_\Phi\, [ (a \circ \Phi)(X,D), (b \circ \Phi)(X,D) ].$$
Now, the pseudodifferential calculus (as discussed in this previous post) tells us (heuristically, at least) that
$$\frac{i}{\hbar} [ a(X,D), b(X,D) ] \approx \{ a, b \}(X,D)$$
and
$$\frac{i}{\hbar} [ (a \circ \Phi)(X,D), (b \circ \Phi)(X,D) ] \approx \{ a \circ \Phi, b \circ \Phi \}(X,D)$$
where $\{\cdot,\cdot\}$ is the Poisson bracket. Comparing this with (2), we are then led to the compatibility condition
$$\{ a, b \} \circ \Phi \approx \{ a \circ \Phi, b \circ \Phi \},$$
thus $\Phi$ needs to preserve (approximately, at least) the Poisson bracket, or equivalently $\Phi$ needs to be a symplectomorphism (again, approximately at least).
Now suppose that $\Phi$ is a symplectomorphism. This is morally equivalent to the graph
$$\Sigma := \{ ( \Phi(y,\eta), (y,\eta) ): (y,\eta) \in \mathbb{R}^n \times \mathbb{R}^n \} \ \ \ \ \ (3)$$
being a Lagrangian submanifold of $(\mathbb{R}^n \times \mathbb{R}^n) \times (\mathbb{R}^n \times \mathbb{R}^n)$ (where we give the second copy of phase space the negative $-\omega$ of the usual symplectic form $\omega = d\xi \wedge dx$, thus yielding $\omega \oplus -\omega$ as the full symplectic form on the product; this is another instantiation of the closed graph theorem, as mentioned in this previous post. This graph is known as the canonical relation for the (putative) FIO that is associated to $\Phi$. To understand what it means for this graph to be Lagrangian, we coordinatise the product of the two copies of phase space as
$$\{ (x, \xi, y, \eta): x, \xi, y, \eta \in \mathbb{R}^n \};$$
suppose temporarily that this graph was (locally, at least) a smooth graph in the $x$ and $y$ variables, thus
$$\Sigma = \{ (x, F(x,y), y, G(x,y)): x, y \in \mathbb{R}^n \}$$
for some smooth functions $F, G: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}^n$. A brief computation shows that the Lagrangian property of $\Sigma$ is then equivalent to the compatibility conditions
$$\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i}, \quad \frac{\partial G_i}{\partial y_j} = \frac{\partial G_j}{\partial y_i}, \quad \frac{\partial F_i}{\partial y_j} = - \frac{\partial G_j}{\partial x_i}$$
for $1 \le i, j \le n$, where $F_1, \ldots, F_n, G_1, \ldots, G_n$ denote the components of $F$ and $G$. Some Fourier analysis (or Hodge theory) lets us solve these equations as
$$F_i = \frac{\partial \phi}{\partial x_i}, \quad G_i = - \frac{\partial \phi}{\partial y_i}$$
for some smooth potential function $\phi: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$.
A reasonable candidate for an operator associated to $\Phi$ and $\phi$ in this fashion is the oscillatory integral operator
$$Tf(x) := \frac{1}{(2\pi \hbar)^{n/2}} \int_{\mathbb{R}^n} e^{i \phi(x,y)/\hbar} a(x,y) f(y)\ dy \ \ \ \ \ (4)$$
for some smooth amplitude function $a$ (note that the Fourier transform is the special case when $a = 1$ and $\phi(x,y) = -x \cdot y$, which helps explain the genesis of the term “Fourier integral operator”). Indeed, if one computes an inner product $\langle Tf, g \rangle$ for gaussian wave packets $f, g$ of the form (1) and localised in phase space near $(y_0, \eta_0)$, $(x_0, \xi_0)$ respectively, then a Taylor expansion of $\phi$ around $(x_0, y_0)$, followed by a stationary phase computation, shows (again heuristically, and assuming $\phi$ is suitably non-degenerate) that $T$ has (3) as its canonical relation. (Furthermore, a refinement of this stationary phase calculation suggests that if $a$ is normalised to be the half-density $|\det \nabla_x \nabla_y \phi|^{1/2}$, then $T$ should be approximately unitary.) As such, we view (4) as an example of a Fourier integral operator (assuming various smoothness and non-degeneracy hypotheses on the phase $\phi$ and amplitude $a$ which we do not detail here).
Of course, it may be the case that the canonical relation $\Sigma$ is not a graph in the $x, y$ coordinates (for instance, the key examples of translation, modulation, and dilation are not of this form), but then it is often a graph in some other pair of coordinates, such as $x$ and $\eta$. In that case one can compose the oscillatory integral construction given above with a Fourier transform, giving another class of FIOs of the form
$$Tf(x) := \frac{1}{(2\pi \hbar)^{n}} \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} e^{i (\phi(x,\eta) - y \cdot \eta)/\hbar} a(x,\eta) f(y)\ dy\ d\eta.$$
This class of FIOs covers many important cases; for instance, the translation, modulation, and dilation operators considered earlier can be written in this form after some Fourier analysis. Another typical example is the half-wave propagator $T := e^{i t \sqrt{-\Delta}}$ for some time $t$, which can be written in the form
$$Tf(x) = \frac{1}{(2\pi \hbar)^{n}} \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} e^{i ( (x - y) \cdot \eta + t |\eta| )/\hbar} a(x,\eta) f(y)\ dy\ d\eta.$$
This corresponds to the phase space transformation $(x,\xi) \mapsto (x + t \frac{\xi}{|\xi|}, \xi)$, which can be viewed as the classical propagator associated to the “quantum” propagator $e^{i t \sqrt{-\Delta}}$. More generally, propagators for linear Hamiltonian partial differential equations can often be expressed (at least approximately) by Fourier integral operators corresponding to the propagator of the associated classical Hamiltonian flow associated to the symbol of the Hamiltonian operator; this leads to an important mathematical formalisation of the correspondence principle between quantum mechanics and classical mechanics, that is one of the foundations of microlocal analysis and which was extensively developed in Hörmander’s work. (More recently, numerically stable versions of this theory have been developed to allow for rapid and accurate numerical solutions to various linear PDE, for instance through Emmanuel Candès’ theory of curvelets, so the theory that Hörmander built now has some quite significant practical applications in areas such as geology.)
In some cases, the canonical relation $\Sigma$ may have some singularities (such as fold singularities) which prevent it from being written as graphs in the previous senses, but the theory for defining FIOs even in these cases, and in developing their calculus, is now well established, in large part due to the foundational work of Hörmander.
I’ve just uploaded to the arXiv my joint paper with Vitaly Bergelson, “Multiple recurrence in quasirandom groups”, which is submitted to Geom. Funct. Anal. This paper builds upon a paper of Gowers in which he introduced the concept of a quasirandom group, and established some mixing (or recurrence) properties of such groups. A $D$-quasirandom group is a finite group with no non-trivial unitary representations of dimension at most $D$. We will informally refer to a “quasirandom group” as a $D$-quasirandom group with the quasirandomness parameter $D$ large (more formally, one can work with a sequence of $D$-quasirandom groups with $D$ going to infinity). A typical example of a quasirandom group is $SL_2(\mathbb{F}_p)$ where $p$ is a large prime. Quasirandom groups are discussed in depth in this blog post. One of the key properties of quasirandom groups established in Gowers’ paper is the following “weak mixing” property: if $A, B$ are subsets of $G$, then for “almost all” $g \in G$, one has
$$\frac{|A \cap gB|}{|G|} \approx \mu(A) \mu(B)$$
where $\mu(A) := |A|/|G|$ denotes the density of $A$ in $G$. Here, we use $x \approx y$ to informally represent an estimate of the form $x = y + o(1)$ (where $o(1)$ is a quantity that goes to zero when the quasirandomness parameter $D$ goes to infinity), and “almost all $g \in G$” denotes “for all $g$ in a subset of $G$ of density $1 - o(1)$”.
As a corollary, if $A, B, C$ have positive density in $G$ (by which we mean that $\mu(A)$ is bounded away from zero, uniformly in the quasirandomness parameter $D$, and similarly for $B$ and $C$), then (if the quasirandomness parameter $D$ is sufficiently large) we can find elements $g, x \in G$ such that $g \in A$, $x \in B$, $gx \in C$. In fact we can find approximately $\mu(A) \mu(B) \mu(C) |G|^2$ such pairs $(g, x)$. To put it another way: if we choose $g, x$ uniformly and independently at random from $G$, then the events $g \in A$, $x \in B$, $gx \in C$ are approximately independent (thus the random variable $(g, x, gx)$ resembles a uniformly distributed random variable on $G^3$ in some weak sense). One can also express this mixing property in integral form as
$$\int_G \int_G f_1(g) f_2(x) f_3(gx)\ d\mu(g)\ d\mu(x) \approx \left( \int_G f_1\ d\mu \right) \left( \int_G f_2\ d\mu \right) \left( \int_G f_3\ d\mu \right) \ \ \ \ \ (1)$$
for any bounded functions $f_1, f_2, f_3: G \rightarrow \mathbb{R}$, where $\mu$ is the uniform probability measure on $G$. (Of course, with $G$ being finite, one could replace the integrals here by finite averages if desired.) Or in probabilistic language, we have
$$\mathop{\bf P}( g \in A, x \in B, gx \in C ) \approx \mu(A) \mu(B) \mu(C)$$
where $g, x$ are drawn uniformly and independently at random from $G$.
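Here is a small empirical illustration of this weak mixing property (a sketch of mine, not from the paper): build $SL_2(\mathbb{F}_p)$ for the very modest prime $p = 13$, pick random dense subsets $A, B$ and a random $g$, and compare the density of $\{ x: x \in A, gx \in B \}$ with $\mu(A) \mu(B)$. For such a small quasirandomness parameter the agreement is only rough, but it is already visible:

```python
import itertools, random

p = 13
G = [(a, b, c, d) for a, b, c, d in itertools.product(range(p), repeat=4)
     if (a * d - b * c) % p == 1]                    # SL_2(F_p), of order p(p^2 - 1)

def mul(g, h):
    a, b, c, d = g
    e, f, gg, hh = h
    return ((a * e + b * gg) % p, (a * f + b * hh) % p,
            (c * e + d * gg) % p, (c * f + d * hh) % p)

random.seed(0)
A = set(random.sample(G, len(G) // 2))               # dense subset of density ~ 1/2
B = set(random.sample(G, len(G) // 2))
g = random.choice(G)

density = sum(1 for x in G if x in A and mul(g, x) in B) / len(G)
print(density, (len(A) / len(G)) * (len(B) / len(G)))   # the two numbers should be close
```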
As observed in Gowers’ paper, one can iterate this observation to find “parallelopipeds” of any given dimension in dense subsets of
. For instance, applying (1) with
replaced by
,
, and
one can assert (after some relabeling) that for
chosen uniformly and independently at random from
, the events
,
,
,
,
,
,
are approximately independent whenever the sets involved are dense subsets of $G$; thus the corresponding tuple resembles a uniformly distributed random variable in the corresponding power of $G$ in some weak sense.
However, there are other tuples for which the above iteration argument does not seem to apply. One of the simplest tuples in this vein is the tuple $(g, x, gx, xg)$ in $G^4$, when $g, x$ are drawn uniformly at random from a quasirandom group $G$. Here, one does not expect the tuple to behave as if it were uniformly distributed in $G^4$, because there is an obvious constraint connecting the last two components $gx, xg$ of this tuple: they must lie in the same conjugacy class! In particular, if $A$ is a subset of $G$ that is the union of conjugacy classes, then the events $gx \in A$, $xg \in A$ are perfectly correlated, so that $\mathop{\bf P}( gx \in A, xg \in A )$ is equal to $\mu(A)$ rather than $\mu(A)^2$. Our main result, though, is that in a quasirandom group, this is (approximately) the only constraint on the tuple. More precisely, we have
Theorem 1 Let $G$ be a $D$-quasirandom group, and let $g, x$ be drawn uniformly at random from $G$. Then for any subsets $A_1, A_2, A_3, A_4$ of $G$, we have
where $o(1)$ goes to zero as $D \rightarrow \infty$, $g, x$ are drawn uniformly and independently at random from $G$, and $y$ is drawn uniformly at random from the conjugates of $gx$ for each fixed choice of $g, x$.
This is the probabilistic formulation of the above theorem; one can also phrase the theorem in other formulations (such as an integral formulation), and this is detailed in the paper. This theorem leads to a number of recurrence results; for instance, as a corollary of this result, we have
for almost all
, and any dense subsets
of
; the lower and upper bounds are sharp, with the lower bound being attained when
is randomly distributed, and the upper bound when
is conjugation-invariant.
To me, the more interesting thing here is not the result itself, but how it is proven. Vitaly and I were not able to find a purely finitary way to establish this mixing theorem. Instead, we had to first use the machinery of ultraproducts (as discussed in this previous post) to convert the finitary statement about a quasirandom group to an infinitary statement about a type of infinite group which we call an ultra quasirandom group (basically, an ultraproduct of increasingly quasirandom finite groups). This is analogous to how the Furstenberg correspondence principle is used to convert a finitary combinatorial problem into an infinitary ergodic theory problem.
Ultra quasirandom groups come equipped with a finite, countably additive measure known as Loeb measure $\mu_G$, which is very analogous to the Haar measure of a compact group, except that in the case of ultra quasirandom groups one does not quite have a topological structure that would give compactness. Instead, one has a slightly weaker structure known as a $\sigma$-topology, which is like a topology except that open sets are only closed under countable unions rather than arbitrary ones. There are some interesting measure-theoretic and topological issues regarding the distinction between topologies and $\sigma$-topologies (and between Haar measure and Loeb measure), but for this post it is perhaps best to gloss over these issues and pretend that ultra quasirandom groups $G$ come with a Haar measure. One can then recast Theorem 1 as a mixing theorem for the left and right actions of the ultra approximate group $G$ on itself, which roughly speaking is the assertion that
for “almost all” $g \in G$, if $f_1, f_2$ are bounded measurable functions on $G$, with $f_2$ having zero mean on all conjugacy classes of $G$, where $L_g, R_g$ are the left and right translation operators
$$L_g f(x) := f(g^{-1} x); \quad R_g f(x) := f(x g).$$
To establish this mixing theorem, we use the machinery of idempotent ultrafilters, which is a particularly useful tool for understanding the ergodic theory of actions of countable groups $G$ that need not be amenable; in the non-amenable setting the classical ergodic averages do not make much sense, but ultrafilter-based averages are still available. To oversimplify substantially, the idempotent ultrafilter arguments let one establish mixing estimates of the form (2) for “many” elements $g$ of an infinite-dimensional parallelopiped known as an IP system (provided that the actions $L_g, R_g$ of this IP system obey some technical mixing hypotheses, but let’s ignore that for sake of this discussion). The claim then follows by using the quasirandomness hypothesis to show that if the estimate (2) failed for a large set of $g$, then this large set would then contain an IP system, contradicting the previous claim.
Idempotent ultrafilters are an extremely infinitary type of mathematical object (one has to use Zorn’s lemma no fewer than three times just to construct one of these objects!). So it is quite remarkable that they can be used to establish a finitary theorem such as Theorem 1, though as is often the case with such infinitary arguments, one gets absolutely no quantitative control whatsoever on the error terms
appearing in that theorem. (It is also mildly amusing to note that our arguments involve the use of ultrafilters in two completely different ways: firstly in order to set up the ultraproduct that converts the finitary mixing problem to an infinitary one, and secondly to solve the infinitary mixing problem. Despite some superficial similarities, there appear to be no substantial commonalities between these two usages of ultrafilters.) There is already a fair amount of literature on using idempotent ultrafilter methods in infinitary ergodic theory, and perhaps by further development of ultraproduct correspondence principles, one can use such methods to obtain further finitary consequences (although the state of the art for idempotent ultrafilter ergodic theory has not advanced much beyond the analysis of two commuting shifts
currently, which is the main reason why our arguments only handle the pattern $(g, x, gx, xg)$ and not more sophisticated patterns).
We also have some miscellaneous other results in the paper. It turns out that by using the triangle removal lemma from graph theory, one can obtain a recurrence result that asserts that whenever $A$ is a dense subset of a finite group $G$ (not necessarily quasirandom), then there are $\gg |G|^2$ pairs $(x, g)$ such that $x$, $gx$, $xg$ all lie in $A$. Using a hypergraph generalisation of the triangle removal lemma known as the hypergraph removal lemma, one can obtain more complicated versions of this statement; for instance, if
is a dense subset of
, then one can find
triples
such that
all lie in
. But the method is tailored to the specific types of patterns given here, and we do not have a general method for obtaining recurrence or mixing properties for arbitrary patterns of words in some finite alphabet such as
.
We also give some properties of a model example of an ultra quasirandom group, namely the ultraproduct of the groups $SL_2(\mathbb{F}_{p_n})$, where $p_n$ is a sequence of primes going off to infinity. Thanks to the substantial recent progress (by Helfgott, Bourgain, Gamburd, Breuillard, and others) on understanding the expansion properties of the finite groups $SL_2(\mathbb{F}_{p_n})$, we have a fair amount of knowledge on the ultraproduct as well; for instance any two elements of the ultraproduct will almost surely generate a group with a spectral gap. We don’t have any direct application of this particular ultra quasirandom group, but it might be interesting to study it further.
A function $f: X \rightarrow Y$ between two sets $X, Y$ can be identified with its graph
$$\Sigma := \{ (x, f(x)): x \in X \},$$
which is a subset of the Cartesian product $X \times Y$.
There are a number of “closed graph theorems” in mathematics which relate the regularity properties of the function $f$ with the closure properties of the graph $\Sigma$, assuming some “completeness” properties of the domain $X$ and range $Y$. The most famous of these is the closed graph theorem from functional analysis, which I phrase as follows:
Theorem 1 (Closed graph theorem (functional analysis)) Let $X, Y$ be complete normed vector spaces over the reals (i.e. Banach spaces). Then a function $T: X \rightarrow Y$ is a continuous linear transformation if and only if the graph $\Sigma := \{ (x, Tx): x \in X \}$ is both linearly closed (i.e. it is a linear subspace of $X \times Y$) and topologically closed (i.e. closed in the product topology of $X \times Y$).
I like to think of this theorem as linking together qualitative and quantitative notions of regularity preservation properties of an operator $T$; see this blog post for further discussion.
The theorem is equivalent to the assertion that any continuous linear bijection $T: X \rightarrow Y$ from one Banach space to another is necessarily an isomorphism in the sense that the inverse map is also continuous and linear. Indeed, to see that this claim implies the closed graph theorem, one applies it to the projection from $\Sigma$ to $X$, which is a continuous linear bijection; conversely, to deduce this claim from the closed graph theorem, observe that the graph of the inverse $T^{-1}$ is the reflection of the graph of $T$. As such, the closed graph theorem is a corollary of the open mapping theorem, which asserts that any continuous linear surjection from one Banach space to another is open. (Conversely, one can deduce the open mapping theorem from the closed graph theorem by quotienting out the kernel of the continuous surjection to get a bijection.)
It turns out that there is a closed graph theorem (or equivalent reformulations of that theorem, such as an assertion that bijective morphisms between sufficiently “complete” objects are necessarily isomorphisms, or as an open mapping theorem) in many other categories in mathematics as well. Here are some easy ones:
Theorem 2 (Closed graph theorem (linear algebra)) Let $X, Y$ be vector spaces over a field $k$. Then a function $T: X \rightarrow Y$ is a linear transformation if and only if the graph $\Sigma := \{ (x, Tx): x \in X \}$ is linearly closed.
Theorem 3 (Closed graph theorem (group theory)) Let $G, H$ be groups. Then a function $\phi: G \rightarrow H$ is a group homomorphism if and only if the graph $\Sigma := \{ (g, \phi(g)): g \in G \}$ is closed under the group operations (i.e. it is a subgroup of $G \times H$).
Theorem 4 (Closed graph theorem (order theory)) Let $X, Y$ be totally ordered sets. Then a function $f: X \rightarrow Y$ is monotone increasing if and only if the graph $\Sigma := \{ (x, f(x)): x \in X \}$ is totally ordered (using the product order on $X \times Y$).
Remark 1 Similar results to the above three theorems (with similarly easy proofs) hold for other algebraic structures, such as rings (using the usual product of rings), modules, algebras, or Lie algebras, groupoids, or even categories (a map between categories is a functor iff its graph is again a category). (ADDED IN VIEW OF COMMENTS: further examples include affine spaces and $G$-sets (sets with an action of a given group $G$).) There are also various approximate versions of this theorem that are useful in arithmetic combinatorics, that relate the property of a map $f$ being an “approximate homomorphism” in some sense with its graph being an “approximate group” in some sense. This is particularly useful for this subfield of mathematics because there are currently more theorems about approximate groups than about approximate homomorphisms, so that one can profitably use closed graph theorems to transfer results about the former to results about the latter.
A slightly more sophisticated result in the same vein:
Theorem 5 (Closed graph theorem (point set topology)) Let $X, Y$ be compact Hausdorff spaces. Then a function $f: X \rightarrow Y$ is continuous if and only if the graph $\Sigma := \{ (x, f(x)): x \in X \}$ is topologically closed.
Indeed, the “only if” direction is easy, while for the “if” direction, note that if $\Sigma$ is a closed subset of $X \times Y$, then it is compact Hausdorff, and the projection map from $\Sigma$ to $X$ is then a bijective continuous map between compact Hausdorff spaces, which is then closed, thus open, and hence a homeomorphism, giving the claim.
Note that the compactness hypothesis is necessary: for instance, the function $f: \mathbb{R} \rightarrow \mathbb{R}$ defined by $f(x) := 1/x$ for $x \neq 0$ and $f(x) := 0$ for $x = 0$ is a function which has a closed graph, but is discontinuous.
A similar result (but relying on a much deeper theorem) is available in algebraic geometry, as I learned after asking this MathOverflow question:
Theorem 6 (Closed graph theorem (algebraic geometry)) Let $X, Y$ be normal projective varieties over an algebraically closed field $k$ of characteristic zero. Then a function $f: X \rightarrow Y$ is a regular map if and only if the graph $\Sigma := \{ (x, f(x)): x \in X \}$ is Zariski-closed.
Proof: (Sketch) For the only if direction, note that the map $x \mapsto (x, f(x))$ is a regular map from the projective variety $X$ to the projective variety $X \times Y$ and is thus a projective morphism, hence is proper. In particular, the image $\Sigma$ of $X$ under this map is Zariski-closed.
Conversely, if $\Sigma$ is Zariski-closed, then it is also a projective variety, and the projection $(x, f(x)) \mapsto x$ is a projective morphism from $\Sigma$ to $X$, which is clearly quasi-finite; by the characteristic zero hypothesis, it is also separated. Applying (Grothendieck’s form of) Zariski’s main theorem, this projection is the composition of an open immersion and a finite map. As projective varieties are complete, the open immersion is an isomorphism, and so the projection from $\Sigma$ to $X$ is finite. Being injective and separable, the degree of this finite map must be one, and hence the function fields of $\Sigma$ and $X$ are isomorphic, hence (by normality of $X$) the coordinate ring of $\Sigma$ is contained in (the image of) the coordinate ring of $X$, which makes the map from $X$ to $\Sigma$ regular, which makes $f$ regular. $\Box$
The counterexample of the map $f: \mathbb{C} \rightarrow \mathbb{C}$ given by $f(z) := 1/z$ for $z \neq 0$ and $f(0) := 0$ demonstrates why the projective hypothesis is necessary. The necessity of the normality condition (or more precisely, a weak normality condition) is demonstrated by (the projective version of) the map $(t^2, t^3) \mapsto t$ from the cuspidal curve $\{ (t^2, t^3): t \in \mathbb{C} \}$ to $\mathbb{C}$. (If one restricts attention to smooth varieties, though, normality becomes automatic.) The necessity of characteristic zero is demonstrated by (the projective version of) the inverse of the Frobenius map $x \mapsto x^p$ on a field $k$ of characteristic $p$.
There are also a number of closed graph theorems for topological groups, of which the following is typical (see Exercise 3 of these previous blog notes):
Theorem 7 (Closed graph theorem (topological group theory)) Let $G, H$ be $\sigma$-compact, locally compact Hausdorff groups. Then a function $\phi: G \rightarrow H$ is a continuous homomorphism if and only if the graph $\Sigma := \{ (g, \phi(g)): g \in G \}$ is both group-theoretically closed and topologically closed.
The hypotheses of being $\sigma$-compact, locally compact, and Hausdorff can be relaxed somewhat, but I doubt that they can be eliminated entirely (though I do not have a ready counterexample for this).
In several complex variables, it is a classical theorem (see e.g. Lemma 4 of this blog post) that a holomorphic function from a domain in $\mathbb{C}^n$ to $\mathbb{C}^n$ is locally injective if and only if it is a local diffeomorphism (i.e. its derivative is everywhere non-singular). This leads to a closed graph theorem for complex manifolds:
Theorem 8 (Closed graph theorem (complex manifolds)) Let $X, Y$ be complex manifolds. Then a function $f: X \rightarrow Y$ is holomorphic if and only if the graph $\Sigma := \{ (x, f(x)): x \in X \}$ is a complex manifold (using the complex structure inherited from $X \times Y$) of the same dimension as $X$.
Indeed, one applies the previous observation to the projection from $\Sigma$ to $X$. The dimension requirement is needed, as can be seen from the example of the map
defined by
for
and
.
(ADDED LATER:) There is a real analogue to the above theorem:
Theorem 9 (Closed graph theorem (real manifolds)) Let $X, Y$ be real manifolds. Then a function $f: X \rightarrow Y$ is continuous if and only if the graph $\Sigma := \{ (x, f(x)): x \in X \}$ is a real manifold of the same dimension as $X$.
This theorem can be proven by applying invariance of domain (discussed in this previous post) to the projection of $\Sigma$ to $X$, to show that it is open if $\Sigma$ has the same dimension as $X$.
Note though that the analogous claim for smooth real manifolds fails: the function $f: \mathbb{R} \rightarrow \mathbb{R}$ defined by $f(x) := x^{1/3}$ has a smooth graph, but is not itself smooth.
(ADDED YET LATER:) Here is an easy closed graph theorem in the symplectic category:
Theorem 10 (Closed graph theorem (symplectic geometry)) Let $(X, \omega_X)$ and $(Y, \omega_Y)$ be smooth symplectic manifolds of the same dimension. Then a smooth map $f: X \rightarrow Y$ is a symplectic morphism (i.e. $f^* \omega_Y = \omega_X$) if and only if the graph $\Sigma := \{ (x, f(x)): x \in X \}$ is a Lagrangian submanifold of $X \times Y$ with the symplectic form $\omega_X \oplus -\omega_Y$.
In view of the symplectic rigidity phenomenon, it is likely that the smoothness hypotheses on $f$ can be relaxed substantially, but I will not try to formulate such a result here.
There are presumably many further examples of closed graph theorems (or closely related theorems, such as criteria for inverting a morphism, or open mapping type theorems) throughout mathematics; I would be interested to know of further examples.
I recently finished the first draft of the last of my books based on my 2011 blog posts (and also my Google buzzes and Google+ posts from that year), entitled “Spending symmetry“. The PDF of this draft is available here. This is again a rather assorted (and lightly edited) collection of posts (and buzzes, and Google+ posts), though concentrating in the areas of analysis (both standard and nonstandard), logic, and geometry. As always, comments and corrections are welcome.
[Once again, some advertising on behalf of my department, following on a similar announcement in the previous three years.]
Two years ago, the UCLA mathematics department launched a scholarship opportunity for entering freshman students with exceptional background and promise in mathematics. We have offered one scholarship every year, but this year due to an additional source of funding, we will also be able to offer an additional scholarship for California residents. The UCLA Math Undergraduate Merit Scholarship provides for full tuition, and a room and board allowance for 4 years. In addition, scholarship recipients follow an individualized accelerated program of study, as determined after consultation with UCLA faculty. The program of study leads to a Masters degree in Mathematics in four years.