Tree decompositions and large-scale computation

A crash course in 18 lectures

by Arnold Neumaier, University of Vienna

This page, http://www.mat.univie.ac.at/~neum/Tree/TDcourse.html, contains information about my course with the same title held from February 6-20, 2012 at the

Department of Computer Science, University of Szeged, Hungary.

The lectures were each workday from 14:00 to 16:30, with 15 minutes
break in the middle, except Fridays where they were from 14:00 to
15:30 without a break.

To have maximal benefit from the course, you should have a good background in graph theory and linear algebra.

Examinations will be in writing a few weeks after the course. (The examination will be organized by Prof. Csendes.) You will be asked to do some multifrontal computations on toy examples similar to those covered in the course or given as exercises, and to answer some questions about the subject matter.

**Lecture 1** (Monday, February 6):

I discussed in general terms the graph
structure of large-scale computations, and which graphs are favorable
to a domain decomposition approach. Most favorable are paths and trees
(a rare case in applications),
but graphs that are pathlike or treelike in the sense that they have
a small pathwidth or treewidth share the multifrontal structure that
makes a domain decomposition approach most efficient. This leads to a
join tree structure of a collection of fronts covering the edges of
the graph. If the fronts are not large, we may solve decomposable
problems in linear time by solving small auxiliary subproblems in a
forward sweep through the fronts from the leaves to the root,
followed by a backward sweep that assembles the solution, following
all branches of the tree from the root to the leaves.

Parallelization is supported if the join tree has lots of leaves. I informally discussed the nested dissection approach for finding a tree decomposition with many leaves, for the eample when the graph is a path (where the natural sequential approach is not parallelizable).

**Lecture 2** (Monday, February 6):

I presented applications from five areas where
multifrontal computations play an important role:

**Lecture 3** (Tuesday, February 7):

I completed the overview over applications, presenting
five further areas where multifrontal computations are important:

**Lecture 4** (Tuesday, February 7):

I introduced Boolean and more general discrete constraint satisfaction
problems (CSPs) and table-based constraint propagation as a basic
pruning technique.

I discussed the NP-completeness of the satisfiability problem, and the consequences for solving not sufficiently structured CSPs. I then defined the special case of sequential CSPs, and announced their solvability in linear time by an appropriate form of constraint propagation.

**Lecture 5** (Wednesday, February 8):

I discussed how to solve sequential CSPs by calculating and updating
tables of possibilities on fronts determined by a moving window,
eliminating in a forward sweep through the fronts the variables that
no longer appear in subsequent fronts. The frontal table of the last
front then contains a list of partial solution, which can be completed
in a backward sweep.

The time is linear in the sequence length but potentially exponential in the window length. It is also linear in the number of solutions computed, but as there may be exponentially many solutions, one may wish to compute only a few ones. An efficient implementation uses the state space form of a sequential CSP, which reduces the window length to two by encoding the past in terms of states.

**Lecture 6** (Wednesday, February 8):

Slightly generalizing sequential CSPs, I introduced the concept of a
path decomposition of the constraint graph, and showed how to create
an efficient state space form to solve problems having a path
decomposition of small width.

For problems in state space form, we discussed cost functions for which it is possible to efficiently find the best among all solutions of a CSP with a good path decomposition. This leads to a dynamic programming problem, where I derived the main recursive relation.

**Lecture 7** (Thursday, February 9):

I presented an example for how to execute a dynamic programming
algorithm for finding the best solution of CSPs in state space form.
In fact we found all (two) optimal solutions.

I then discussed a Boolean CSP in 9 variables, computed its constraint graph and a particular tree decomposition, and illustrated various features of the nonsequential setting.

**Lecture 8** (Thursday, February 9):

The Boolean CSP in 9 variables was solved in complete analogy to the
sequential case. We found all 7 solutions and discussed ways to
answer various questions about the solution without having to generate
them all. This illustrates applications to database queries (that will
be treated later).

As introduction to Part 2 of the course, which is supposed to cover the mathematical side of the subject, we looked at properties of trees that are not present in general graphs, but play an essential role for the use of trees in multifrontal computations. We then looked at the defining properties of join trees, noting that the join property and the intersection property are in fact equivalent.

**Lecture 9** (Friday, February 10):

I discussed how to go from an arbitrary join tree to a minimal one,
and proved that every edge of a join tree gives rise to a separator
of the associated filled graph. Then I introduced tree decompositions of
graphs in a formal way, and deduced that graphs with small treewidth
must have many small separators.

This naturally leads to the nested dissection strategy for the construction of good tree decompositions. I ended with a discussion of the principles involved in creating good nested dissection software.

**Lecture 10** (Monday, February 13):

I discussed clique properties of filled graphs and proved the chordal
graph theorem, the deepest result in the subject.

**Lecture 11** (Monday, February 13):

I introduced maximum cardinality search for finding fill-in free tree
decompositions, and discussed the role of the clique intersection
graph for finding all possible join trees with a given filled graph.
I mentioned without deeper discussion minimum fill and minimum degree
heuristics for the construction of tree decompositions.

**Lecture 12** (Tuesday, February 14):

I introduced the sparsity graph of a matrix, and showed that Gaussian
elimination produces a filled matrix whose sparsity graph is the filled
graph in the corresponding elimination ordering. For matrices whose
sparsity graph has a good tree decomposition, a multifrontal algorithm
sois therefore able to solve the associated linear equations using only
dense matrix computations on matrices of the size of the fronts.
This is very efficient in today's world of parallel vector processing.

**Lecture 13** (Tuesday, February 15):

In extension of the previous lecture, I showed how to use an additional
backward sweep to efficiently calculate for symmetric matrices all
entries of the inverse matrix whose indices are in a common front.
In particular, this gives the diagonal of the inverse, relevant for
assessing the accuracy of least square estimators.

It also allows one to efficiently compute the gradient of the logarithm of the determinant of a symmetric matrix dependent on parameters, which is of crucial importance for applications in animal breeding.

**Lecture 14** (Wednesday, February 16):

To prepare for the discussion of Bayesian network, I reviewed some
concepts from probability theory, in particular conditional
proabilities.

Then I defined hidden Markov models, described their use in speech analysis (from sounds to phomnemes) and in speech synthesis in cell phones, and showed that the negative loglikelihood function has the structure that makes dynamic programming feasible.

**Lecture 15** (Wednesday, February 16):

I introduced Viterbi decoding for finding the most likely state
sequence given a HMM and an observation sequence, and showed that it
reduces to solving a dynamic programming problem.
Then I discussed the training problem - finding the probability tables
for a HMM from a large sample of observation sequences. Prior
probability tables can be found from expert assignments of state
sequences to a subset of the training sample (or just by guessing a
prior). The prior tables may then refined iteratively by
Viterbi training, which labels observed sequences on the basis of
Viterbi decoding and created new probability tables based on counting
in the resulting data.

An important alternative is Baum-Welch training, which deduces the probability tables directly from the HMM probability measure, using belief propagation. This is a much slower process than Viterbi training, but has the advantage of corectly accounting for cases where competing state assignments with almost the same probabilities exist. This leads to ambiguities in the model predictions, where Viterbi decoding often chooses an inferior alternative when the probability tables are inaccurate.

**Lecture 16** (Thursday, February 17):

After introducing the concept of conditional independence,
I defined causal networks and proved some basic results for it.
From a tree decomposition of the causal graph (which define a Bayesian
network) one gets a product formula for the probability measure in
terms of probability tables on fronts. This leads to multifrontal
formulas for the potential (negative loglikelihood), which allows an
efficient solution of the task of finding the most likely completion
of a partial scenario in a causal network, and hence of training
causal networks via Viterbi training.

**Lecture 17** (Thursday, February 17):

I derived the recursion formulas for doing belief propagation in
causal networks, adnd for finding probabilities on subtrees.
based on this, it is possible to do the more accurate Baum-Welch
training.

**Lecture 18** (Friday, February 18):

I discussed the problem of finding all solutions of constrained
nonlinear systems of equations in high dimensions. Local methods
(e.g., multistart strategies) often become unreliable, while global
methods (based on branch and bound) often get stuck in a combinatorial
explosion of untreated subproblems.

If the constraint graph has a small treewidth, one may contemplate to devise a multifrontal solution procedure based on discretization, which reduces the problem to a discrete CSP. However, for a fine discretization, the tables are likely to become huge.

I described some of the problems arising in this approach applied to nonlinear systems for distillation columns, whose constraint graph indeed has small pathwidth. This is work in progress done by my research group in Vienna. I reported on a particular such problem that we could already solve successfully, while conventional methods produce erratic results.

(from before starting the course)

I intend to begin with a general overview of large-scale computation and applications where tree decompositions have a decisive advantage. I'll also review basic themes such as Boolean constraint satisfaction and polynomial-time algorithms, that form a background for the remaining course.

The next topic will be dynamic programming techniques for discrete constraint satisfaction problems with a temporal structure: how to decide whether a solution exists, how to find some or all solutions, how to select the best or the k best solutions according to some criterion.

Then I want to discuss some graph theoretic background and its relations to the direct solution of large linear systems of equations. I'll introduce join trees, tree decompositions, and the concept of treewidth, and show how their properties benefit a number of computations in linear algebra.

This will probably be followed by nonserial dynamic programming - the extension of the dynamic programming paradigm to problems whose structure is given by a sparse graph with comparatively small treewidth.

As an application, I intend to consider algorithms for efficient database queries.

Then I'll probably look at algorithms for efficiently finding good tree decompositions for large graphs, based on principles such as minimum degree, nested dissection, and maximum cardinality search.

The next topic will probably be hidden Markov models (HMMs) for analyzing and predicting discrete time series. I'll discuss training, testing, and updating a HMM, topics closely related to dynamic programming.

Bayesian networks generalize HMMs in the same way as nonserial dynamic programming generalizes dynamic programming. They are the most appropriate tool for reasoning under aleatoric uncertainty (belief propagation) and for forming consistent and parsimonious statistical models for large, structured datasets (machine learning).

Towards the end of the course, I'll also say something about recent work in our group on applying the above ideas to the problem of finding all solutions of difficult constrained nonlinear systems of equations arising in chemical engineering.

Other applications that I might treat if there is time and interest are structured semidefinite programming, restricted maximum likelihood methods for plant and animal breeding, sparse inverse covariance selection for large-scale data sets, and applications to statistical mechanics and quantum spin systems.

A large collection of links on the topics treated can be found at my page on the

This list is neither systematic nor comprehensive.

Restricted Maximum Likelihood (REML) Methods

Tree decomposition at Wikipedia

A Brief Introduction to Graphical Models and Bayesian Networks

Bayesian Networks Bibliography

Probabilistic networks and explanatory coherence (Paul Thagard)