Axioms and CI implication
After installing CInet::Tools
, you can start the program CImake
:
$ CImake
CImake|001>
This presents you with a REPL (read-eval-print loop) for you to enter
Perl code, similar to polymake
. The REPL is just an interface to a
standard Perl interpreter with all the CInet modules loaded.
Counting, enumerating and symmetry reduction
Using CInet::Tools
, axioms for CI structures can easily be written
down. A semigraphoid is a set of conditional independence statements
which is closed under the following boolean formula
for any distinct and a set disjoint from . The module
CInet::Propositional
provides syntactic sugar
which allows us to define semigraphoids in a similarly succinct fashion:
CImake|001> propositional Semigraphoids = cube (ijk|L) -> (ij|L) & (ik|jL) => (ik|L) & (ij|kL);
Subroutine Semigraphoids redefined at (eval 521) line 6, <DATA> line 840.
undef # time=00:00.14
Some remarks about this line are in order since it deviates from usual
Perl code. It uses the powerful Keyword::Declare
module to hijack the Perl parser and add our own syntax extensions. In this
case, the propositional
keyword switches from the usual Perl parser to
our own, which understands a math-like syntax for CI axioms.
The other remark concerns terminology around cube
s. By pure coincide,
the faces of the -dimensional hypercube index many important
concepts in the theory of conditional independence structures. Its points
(vertices) correspond to subsets of and therefore to subvectors of a
random vector. Its edges (1-faces) are described by non-empty subsets with
a distinguished element. We use to denote the edge that goes from
vertex to vertex , whenever . Edges index elementary
functional dependence statements. Lastly, squares (2-faces) are similarly
indexed by where and . This
symbol corresponds to the 2-face with vertices and it
indexes an elementary conditional independence statement.
The part cube (ijk|L)
in the above statement specifies that the following
axioms require distinct atoms and an arbitrary disjoint subset
of the ground set. The axioms given after the arrow ->
will be replicated
for every such -face of the -cube.
The result of this statement in our CImake
environment is that it defines
a function Semigraphoids
. This definition of semigraphoids is actually
built into CInet::Tools
(see CInet::Propositional::Families
),
which is why its redefinition in the shell causes a warning.
The Semigraphoids
function can be called with a ground set argument.
This results in a sequence (CInet::Seq::Propositional
)
to be created for dealing with the set of all semigraphoids on that ground set.
This sequence has a SAT solver (CInet::ManySAT
)
attached which allows to efficiently test membership, count or enumerate
all elements.
CImake|002> Semigraphoids(3)->count
22 # time=00:00.01
CImake|003> Semigraphoids(4)->count
26424 # time=00:00.03
The symmetric group acts on the ground and it induces an action
on the cube and its face lattice which preserves the dimension of each
face. In particular, there is a resulting action on CI structures as sets of
2-faces. This symmetry is implemented in CInet::Symmetry
. The sequence
object generated by Semigraphoids
can be reduced modulo this symmetry
using the following built-in function:
CImake|004> Semigraphoids(4)->modulo(SymmetricGroup)->count
1512 # time=00:00.53
This still takes less than a second to compute! Semigraphoids are also closed under the entire symmetry group of the -cube, also known as the hyperoctahedral group . This is implemented as well:
CImake|005> Semigraphoids(4)->modulo(HyperoctahedralGroup)->count
629 # time=00:01.12
Our final note about axioms in this tutorial is that axiom systems like
Semigraphoid
can be reused. Graphoids are semigraphoids which also
fulfill the intersection axiom.
CImake|006> propositional Graphoids = cube(ijk|L) -> Semigraphoids, (ij|kL) & (ik|jL) => (ij|L) & (ik|L);
CImake|007> Graphoids(4)->count
6482 # time=00:00.03
If a graphoid also satisfies upwards stability and weak transitivity, then it is an undirected graphical model. Due to faithfulness of these models, the following is a very roundabout way of counting undirected graphs:
CImake|008> propositional UndirectedGraphs = cube(ijk|L) -> Graphoids, (ij|L) => (ij|kL), (ij|L) => (ik|L) | (jk|L);
CImake|009> UndirectedGraphs(4)->count
64 # time=00:00.02
CImake|010> UndirectedGraphs(5)->count
1024 # time=00:00.16
CImake|011> UndirectedGraphs(6)->count
32768 # time=00:24.42
The last line took 24 seconds to execute! Recall that the ->count
method
uses an exact model counter (or #SAT solver), in this case the program
dsharp
, on the boolean formula assembled from the given axioms. One piece
of advice from years of using SAT solvers: It is sometimes more efficient
to enumerate all solutions to the formula using an AllSAT solver —
especially when the number of solutions is comparatively small — and to
count the elements in the result set. This can work because counters and
enumerators use very different algorithms and sometimes the enumeration
strategy is more powerful against a given formula.
CImake|012> 0+ UndirectedGraphs(6)->list
32768 # time=00:02.38
Conditional independence implication
Many basic families of CI structures can be defined axiomatically. In this case, SAT solvers can be used to compute CI statements which are implied by the axioms and additional CI assumptions. This is known as the conditional independence implication problem for the family under consideration.
The families considered above are already provided by the
CInet::Propositional::Families
module
which is automatically loaded by CImake
. Families defined using the
propositional
mechanism have implication testing methods built into them.
To check if holds
modulo the gaussoid axioms, simply execute:
CImake|013> my $p = CIR(4, [[1,2],[3]], [[1,3],[4]], [[1,4],[2]])
Relation <*0****0**0**************> over Cube on [1,2,3,4]
CImake|014> Gaussoids->imply($p => [[1,2],[]])
"" # time=00:00.02
The CIR
function creates a CInet::Relation
which is used to represent a CI structure. This structure holds the premises
of the query. The imply
method is then used to check if the statements in
$p
and the gaussoid axioms together imply (12|)
. The result is a “no” in
this case and is reached in just 10ms. Note that this is rule (20) in
[LM07]. It is an implication which is valid for regular Gaussian
distributions but which cannot be derived from the gaussoid axioms.
On the other hand, let’s consider an instance of the long semigraphoid
derivation sequences in [Mat02]. The function L
defined in
the file longmatus.pl
quoted below constructs the sequence
of CI structures defined in the paper. It is shown there that in order to
derive the consequence from one needs to perform
inference steps with the semigraphoid axioms. The SAT solver
is keeping up well solving all tasks from in about 4
seconds. (This includes the time required to construct
from which is extracted.)
CImake|015> do './longmatus.pl'
CImake|016> map Semigraphoids->imply(L($_) => [[1,2],[3,$_]]), 4 .. 10
[1, 1, 1, 1, 1, 1, 1] # time=00:04.27
# longmatus.pl
use Modern::Perl 2018;
use CInet::Tools;
# Return the vertex set of 𝓚_n ordered according to the connected components.
# The first two elements are the first isolated edge. Then every triple is a
# connected component and the last two elements are an isolated edge again.
#
# The ordering of these vertices is crucial. It is from left to right just as
# in Matúš's pictures. This allows us to easily recurse (the construction can
# be seen as a kind of grey code) and to extract the substructure 𝓛_n below.
sub K {
my $n = shift;
if ($n == 4) {
return (
[[1,2],[3]], [[1,3],[]], [[1,3],[2]], [[1,2],[]], [[2,4],[1]],
[[2,4],[]], [[1,2],[4]], [[1,3],[2,4]], [[1,3],[4]], [[1,2],[3,4]]
);
}
else {
my @prev = K($n-1);
return (
@prev, [[2,$n],[1,3,$n-1]], [[2,$n],[3,$n-1]],
reverse map { [$_->[0], [$_->[1]->@*, $n]] } @prev
);
}
}
sub L {
my $n = shift;
my @comps = K $n;
CIR($n,
$comps[0], $comps[1],
@comps[map { 4+3*($_-1) } 1 .. (@comps-4)/3]
)
}