Justus Polzin

Mixed Boolean-Arithmetic (Part 2): Systems of Linear Equations mod n

In this post we are going to try and solve systems of linear congruences, i.e. $A\mathbf{x} = \mathbf{b} \pmod{n}$, where the entries of $A$, $\mathbf{x}$ and $\mathbf{b}$ are integers mod $n$. In addition to finding a solution, we will also find a basis¹ for the kernel, which gives us all possible solutions. If you want to see the motivation for this, see part one, but it is not required.

The post will be structured as follows:

  1. I'm going to show you a puzzle where this problem is at its core, which you can skip.
  2. We will try to apply some standard linear algebra to see where we get stuck.
  3. I will present two different solutions.
  4. We will discuss the differences between these solutions and wrap up with some future work.

Prerequisites for this post are:

  • Linear Algebra. You should know what a field is and how to solve normal systems of linear equations, i.e. Gaussian Elimination.
  • Very basic Abstract Algebra. $\mathbb{Z}_n$ is a commutative ring. The units (elements with a multiplicative inverse) are the elements coprime to $n$.
  • Very basic Number Theory, i.e. knowing what a diophantine equation is; having heard of the (Extended) Euclidean Algorithm wouldn't hurt.

I'm going to define some conventions. I will identify the elements of $\mathbb{Z}_n$ with the representatives $0, 1, \dots, n-1$. Historically, equivalence mod $n$ was written $a \equiv b \pmod{n}$, but I will use an $=$ to be more uniform. Think of the $\bmod n$ as meaning: "This expression is to be evaluated in $\mathbb{Z}_n$". I don't want to write $\equiv$. If it is clear that we are in $\mathbb{Z}_n$, I won't state the $\bmod n$.

If you just want to skip the puzzle, click here!

A puzzle

This is a puzzle in a video game called Genshin Impact.

What's going on here? You can see four stones, each of which can be in 3 states (1, 2, 3). The goal is for every stone to be in state 3 at the same time. Changing the state of a stone is done by attacking it, which increments the state, wrapping around to 1 if it's already 3. But this changes not only the state of the attacked stone, but also that of its neighbor(s) (in this instance of the puzzle).

How does this relate to the "Linear Equations mod $n$" problem?

Let the (states of the) stones be $s_1$, $s_2$, $s_3$, $s_4$ from left to right; their initial values are the ones you can see above. The "wrapping around" part corresponds to modular arithmetic. All calculations are done in $\mathbb{Z}_3$ (mod 3). Because $3 = 0$, a stone that is initially in state 3 is equivalently in state 0.

When the first stone is hit, $s_1$ and $s_2$ are incremented (mod 3).

Let's call $x_1$ the number of times the first stone is hit (and similarly $x_2$, $x_3$, $x_4$ for the others). When the first stone is hit $x_1$ times, the state of the system is

$$s_1' = s_1 + x_1, \quad s_2' = s_2 + x_1, \quad s_3' = s_3, \quad s_4' = s_4$$

where $s_i'$ is the state of stone $i$ after the hits. You can see that the state of stones 1 and 2 was incremented $x_1$ times, while the state of stones 3 and 4 didn't change. But here we only considered hitting stone 1. When the other stones are hit $x_2$, $x_3$, $x_4$ times, we get the following:

$$\begin{aligned}
s_1' &= s_1 + x_1 + x_2 \\
s_2' &= s_2 + x_1 + x_2 + x_3 \\
s_3' &= s_3 + x_2 + x_3 + x_4 \\
s_4' &= s_4 + x_3 + x_4
\end{aligned}$$

Take your time to confirm this is indeed what we want. Hitting stone 1 $x_1$ times increases the state of stones 1 and 2. Hitting stone 2 $x_2$ times increases the state of stones 1, 2 and 3. Hitting stone 3 $x_3$ times increases the state of stones 2, 3 and 4. And hitting stone 4 $x_4$ times increases the state of stones 3 and 4.

To solve this puzzle, the final state of every stone needs to be 3, or equivalently 0, which means all $s_i' = 0$. To get the problem into the form we want, we subtract the initial state in each equation to bring it to the left side.

We can use matrix/vector notation to "simplify" this.

$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} -s_1 \\ -s_2 \\ -s_3 \\ -s_4 \end{pmatrix}$$

This leaves us with the equation $A\mathbf{x} = \mathbf{b}$, where $A$ is the matrix above, $\mathbf{x}$ the vector of the $x_i$ and $\mathbf{b}$ the right-hand side. I won't write the "mod 3" anymore, just consider all calculations to be in $\mathbb{Z}_3$.

Let's actually try to solve this specific instance of the problem by finding the row echelon form of the matrix.

From the fourth row we can read off $x_4$, from the third row $x_3$, from the second row $x_2$, and from the first row $x_1$. So overall we get the number of times each stone has to be hit.

You can see in the video that I hit the fourth stone 4 times, but this is equivalent to 1 mod 3.

Since 3 is prime, $\mathbb{Z}_3$ is a field, which means we can just apply the usual linear algebra techniques. $\mathbb{Z}_n$ is not a field in general, because not all elements have a multiplicative inverse. But we needed these to eliminate entries. For example, in the last step we want to eliminate the first 1 in the last row using the third row. We have to multiply the third row by the inverse of the first entry and subtract it from the last row, which is equivalent to adding the third row. Admittedly, you can avoid this situation in this instance if you choose other operations in previous steps, but it is not avoidable in general. We also used this fact when solving for the $x_i$.
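To make this concrete, here is a small Python sketch of ordinary Gaussian elimination over the field $\mathbb{Z}_3$, applied to the hit-pattern matrix of the puzzle. The initial stone states at the bottom are just made up, since they depend on the particular puzzle instance.

```python
def solve_mod_p(A, b, p):
    """Solve A x = b over Z_p (p prime) for an invertible A, by Gauss-Jordan elimination.
    Returns the unique solution, or None if A is singular mod p."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [b[i] % p] for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None                        # singular; keep the sketch simple
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], p - 2, p)       # inverse exists because p is prime
        M[col] = [(x * inv) % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [(M[r][j] - f * M[col][j]) % p for j in range(n + 1)]
    return [M[i][n] for i in range(n)]

# Hitting stone i increments it and its neighbors, so the coefficient matrix is:
A = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 1, 1, 1],
     [0, 0, 1, 1]]
s = [1, 2, 3, 1]                # hypothetical initial states of the four stones
b = [(-si) % 3 for si in s]     # we need A x = -s (mod 3)
print(solve_mod_p(A, b, 3))     # number of times to hit each stone
```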

First attempt

The general idea is to use Gaussian Elimination but there are some problems with it:

Elementary Row Operations

Suppose this was a – admittedly very simple – row (the concrete numbers are just one possible example, in $\mathbb{Z}_6$):

$$2x = 3$$

It doesn't have a solution, but multiplying it by 2 we get

$$4x = 6 = 0$$

which is solved by $x = 0$ (and also $x = 3$). So what went wrong here? Why do we have a solution now when we didn't before? This can't happen over the real numbers or other fields.

Let's first think about what operations on the rows are allowed without changing the space of solutions. Row operations correspond to left-multiplication by elementary matrices. What makes these operations not change the set of solutions is that they are invertible. Suppose we have an $\mathbf{x}$ that solves the system after an elementary row operation, i.e. such that $EA\mathbf{x} = E\mathbf{b}$, where $E$ is an elementary matrix (or any invertible matrix, actually). Then we also have $A\mathbf{x} = E^{-1}EA\mathbf{x} = E^{-1}E\mathbf{b} = \mathbf{b}$, meaning $\mathbf{x}$ is also a solution to the original system.

At this point we have to verify which of the elementary matrices are invertible over $\mathbb{Z}_n$. Let's be more general and consider any commutative ring $R$. We can find the inverse of a matrix algebraically using Cramer's rule as

$$A^{-1} = \det(A)^{-1} \operatorname{adj}(A)$$

where $\operatorname{adj}(A)$ is the adjugate of $A$, which is defined in terms of minors, which are themselves defined in terms of determinants, which – in particular – involve no inverses, so the adjugate can be computed for any matrix over any commutative ring. What can, however, fail, is for the determinant to be a unit in $R$. So a matrix is invertible iff $\det(A)$ is a unit. In a field, the only non-unit is 0, so a matrix is invertible iff $\det(A) \neq 0$, which is the known result from Linear Algebra.
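As a quick illustration, here is a Python sketch that inverts a matrix over $\mathbb{Z}_n$ exactly this way: compute the adjugate and the determinant over the integers and only multiply by the inverse of the determinant at the very end, which works precisely when the determinant is a unit. (Cofactor expansion is exponential; this is only meant to show the math, and the function names are my own.)

```python
from math import gcd

def minor(M, i, j):
    return [row[:j] + row[j+1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row (integer arithmetic only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adjugate(M):
    n = len(M)
    # adj(A)[i][j] = (-1)^(i+j) * det(minor(A, j, i))  (note the transpose)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)] for i in range(n)]

def inverse_mod(M, n):
    """Return M^{-1} over Z_n, or None if det(M) is not a unit mod n."""
    d = det(M) % n
    if gcd(d, n) != 1:
        return None
    d_inv = pow(d, -1, n)       # modular inverse (Python 3.8+)
    adj = adjugate(M)
    return [[(d_inv * adj[i][j]) % n for j in range(len(M))] for i in range(len(M))]

# The elementary matrix that adds twice row 0 to row 1 has determinant 1,
# so it is invertible over any Z_n.
E = [[1, 0], [2, 1]]
print(inverse_mod(E, 6))        # [[1, 0], [4, 1]] since -2 = 4 (mod 6)
```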

Computing the determinants for the elementary matrices, we find that:

  • Swapping two rows is obviously still okay. The elementary matrix has determinant -1 which is always invertible since it is its own inverse.
  • Multiplication of a row by a constant is only okay if the constant is a unit. The determinant is that constant.
  • Addition of a multiple of one row to another is still okay, no matter what we multiply the row by. The determinant is 1.

Row Reduction

Suppose $n = 6$ and consider this system of linear congruences, which will serve us as an example throughout the post:

Let's think about what Gaussian Elimination would do over the real (or rational) numbers. To make the first entry of the second row zero, we would subtract $\frac{4}{3}$ times the first row from it. Analogously, in the modular case we want to multiply the first row by something so that the first entry becomes a 4, and then subtract it from the second row. We want to solve $3x = 4 \pmod 6$. If 3 had an inverse mod 6, this would be no problem: simply multiply both sides by the inverse of 3 to get $x = 3^{-1} \cdot 4$. But 3 doesn't have an inverse mod 6. Furthermore, the equation doesn't have a solution mod 6 at all. The same goes for switching the roles of the rows, which means trying to eliminate the 4. In general, a linear congruence $ax = b \pmod n$ has a solution if and only if $\gcd(a, n) \mid b$, so we can eliminate neither entry in the first column and we are stuck. Perhaps surprisingly, this system still has a solution, so we will have to find a way around this.

There might be multiple ways to do this, but a very straightforward one is to choose the smallest non-zero element in the column as a pivot and to reduce all other elements using it. For simplicity, we always swap the row of the pivot to the top. In this simple case the pivot is the three and we reduce the second row with the first as much as possible, which in this case is just once.

Of course, we are still not done with this column, but all entries below the pivot are smaller than the pivot. Now, we select the pivot, which is 1, and swap it to the top.

We can reduce the second row 3 times with the first one.

And we're done! If the matrix was bigger, we would continue with the next column.

I think this is one of those algorithms where it's easy to see why it terminates. In each iteration, all elements below the pivot are reduced so that they are strictly smaller than the pivot. So in the following iteration the pivot will be strictly smaller than in the current iteration. If, at some point, all other elements below the pivot are 0, we can continue with the next column. Otherwise, since the pivot is getting smaller in each iteration, it will be 1 at some point, in which case all other entries can be reduced to 0 and we can continue with the next column.
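Here is a small Python sketch of this reduction. The concrete numbers in the example at the bottom are made up, but they are chosen so that the reduction follows the same steps as described above: the 3 is the first pivot, the second row is reduced once, the 1 becomes the new pivot, and the second row is reduced three more times.

```python
def row_reduce_mod_n(M, n):
    """Bring the augmented matrix M into a row echelon form over Z_n,
    choosing the smallest non-zero entry of each column as the pivot."""
    M = [[x % n for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols - 1):               # the last column is the right-hand side
        if r >= rows:
            break
        while True:
            # the row with the smallest non-zero entry in column c becomes the pivot row
            piv = min((i for i in range(r, rows) if M[i][c] != 0),
                      key=lambda i: M[i][c], default=None)
            if piv is None:
                break                       # nothing to eliminate in this column
            M[r], M[piv] = M[piv], M[r]
            done = True
            for i in range(r + 1, rows):
                if M[i][c] != 0:
                    q = M[i][c] // M[r][c]  # reduce as much as possible
                    M[i] = [(M[i][j] - q * M[r][j]) % n for j in range(cols)]
                    done = done and M[i][c] == 0
            if done:
                r += 1                      # only the pivot is left in this column
                break
    return M

# Made-up example mod 6 ("coefficients | right-hand side").
print(row_reduce_mod_n([[3, 1, 4], [4, 4, 2]], 6))   # [[1, 3, 4], [0, 4, 4]]
```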

Anyways, we can read the solution off the final matrix now. The second row is a linear congruence in the second variable alone, and it has two solutions mod 6. Substituting each of them into the first row gives the corresponding value of the first variable. So overall, we get two solutions.

While this representation is fine when $n$ is small, what we really want is a basis for the kernel and any particular solution. In this case, you can deduce the basis as the difference of the two solutions, so we can write the general solution as one particular solution plus any multiple of that basis vector.

Back-Substitution is annoying

In general, finding a particular solution and a basis for the kernel is annoying. Especially when trying to automate it, so I won't go into more detail than I have to here.

Consider this example:

The last row gives a linear congruence in a single variable. The solutions will again be the sum of a particular solution and the solutions to the corresponding homogeneous congruence. To find a particular solution, we can translate the problem into a diophantine equation: a congruence $ax = b \pmod n$ becomes $ax + ni = b$ over the integers. This can be solved using the Extended Euclidean Algorithm and we simply discard the solution for $i$. Since the numbers are so small here, we can see a solution directly. The solutions to the homogeneous congruence are just the multiples of $n / \gcd(a, n)$, so overall we get the full solution set for this variable. You can find the proof of everything we did here.
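A sketch of this single-congruence solver in Python, using the extended euclidean algorithm as described (the function names are my own):

```python
def ext_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b = g."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def solve_congruence(a, b, m):
    """Return (x0, step) such that the solutions of a*x = b (mod m)
    are exactly x0 + k*step (mod m), or None if there is no solution."""
    g, s, _ = ext_gcd(a % m, m)         # s*a + t*m = g, so s*a = g (mod m)
    if b % g != 0:
        return None                     # solvable iff gcd(a, m) divides b
    x0 = (s * (b // g)) % m             # particular solution
    step = m // g                       # solutions of a*x = 0 (mod m) are multiples of m/g
    return (x0, step)

# Example: 3*x = 3 (mod 6) has the solutions 1, 3, 5.
print(solve_congruence(3, 3, 6))        # (1, 2) -> x = 1 + 2k
```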

Now, to do the next step of back-substitution, we substitute our solution into the equation of the next row. We have an additional variable! This can still be solved by generalizing the method we used above, but I won't do that here. I just wanted to show you that things get more annoying. If we step away from trying to solve this algorithmically, we can narrow the solutions down further with parity arguments: some variables are forced to be even, which in turn simplifies the solutions we already found. I will stop here. I think you can see why this is annoying (but certainly not impossible) to automate. We have to solve multi-variable linear congruences and the solutions for old variables can still change. So even if you just wanted any solution to the system, you could not just pick any particular solution for a variable and then continue back-substitution with that choice, since other rows might require a different choice.

First solution: Reduction to Diophantine Equations

We're going to construct a system of linear diophantine equations. This might seem counterintuitive at first: we've just seen that back-substitution results in diophantine equations, which are pretty annoying to solve. And while it's true that this method could also be used to solve the diophantine equations appearing during back-substitution, it will actually end up giving us a solution and a basis for the kernel very easily.

Formally, a system of linear diophantine equations is just $A\mathbf{x} = \mathbf{b}$, where all entries are in $\mathbb{Z}$. Translating our original problem into this form is rather straightforward. Recall that $a = b \pmod n$ iff $a = b + kn$ for some $k \in \mathbb{Z}$. Using the example from the beginning, the matrix

is equivalent to the following system of congruences

which is itself equivalent to

Notice that we need a new variable for each equation, since the multiples of the modulus do not depend on each other. The matrix form for this is

So all we have to do is append $nI$ (the identity matrix scaled by $n$) to the right of $A$ to get a problem over the integers, which, as a ring, have much nicer properties than the integers mod $n$. Now, we have to solve the system of linear diophantine equations, which will be the topic of the next section.

Before we delve into that, I want to talk about how to go from a solution of the diophantine equations to a solution of the original system, which is rather simple. Suppose we have a solution of the extended system: it contains extra entries for the multiples of the modulus, which we just have to truncate; the remaining entries, reduced mod $n$, solve the original system. The same goes for the basis vectors of the kernel, but it is possible to introduce linear dependence² here. For mixed boolean-arithmetic, this doesn't really matter because we just want a random solution.
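As a small sketch, this is what the lift and the truncation look like in code (the coefficients in the example are made up):

```python
def lift_to_diophantine(A, n):
    """Append n*I to the right of A: the system (A | nI) (x, k)^T = b over the integers."""
    m = len(A)
    return [row + [n if i == j else 0 for j in range(m)] for i, row in enumerate(A)]

def truncate_solution(x_full, num_vars, n):
    """Drop the entries for the multiples of the modulus and reduce the rest mod n."""
    return [xi % n for xi in x_full[:num_vars]]

A = [[3, 1], [4, 4]]                 # made-up coefficients for a system mod 6
print(lift_to_diophantine(A, 6))     # [[3, 1, 6, 0], [4, 4, 0, 6]]
```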

System of Linear Diophantine Equations

To simplify notation, the extended matrix $(A \mid nI)$ will just be called $A$ again, so we are solving $A\mathbf{x} = \mathbf{b}$, where the matrices now have integer entries.

The Hermite Normal Form

We are going to compute the Hermite Normal Form (HNF) of a certain matrix that we are going to set up in the next section. The Hermite normal form is a row echelon form of the matrix where the pivots are positive and the elements above each pivot are non-negative and smaller than the pivot.

A simple algorithm for computing it is very similar to the row reduction algorithm for $\mathbb{Z}_n$-matrices that we've seen above, but there are some differences. First, let's figure out which elementary row operations are allowed. Of course, swapping rows is okay and adding a multiple of one row to another is as well. We already found out that multiplication of a row by a constant is only okay if that constant is a unit. The only units in $\mathbb{Z}$ are 1 and -1. In the $\mathbb{Z}_n$ case, we didn't need multiplication by units at all, but could've used it to accelerate eliminating entries. In this case, we need it because the conditions on the HNF state that the pivots are positive. When choosing a pivot in a column, we choose the element with the smallest absolute value and multiply its row by -1 if it is negative. Once the pivot is selected, we reduce all other elements in the column with it in the same way we've seen before. When all elements below the pivot are zero, we can use the pivot to reduce all elements above it, so that they are non-negative and less than the pivot.

I won't work through an example here but you can find the code here, which should hopefully clear up any questions.
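For reference, here is a compact (and unoptimized) Python sketch of the algorithm just described; it also tracks the transformation matrix along the way.

```python
def hnf(A):
    """Row-style HNF sketch: returns (H, T) with H = T*A and T unimodular."""
    A = [row[:] for row in A]
    rows, cols = len(A), len(A[0])
    T = [[int(i == j) for j in range(rows)] for i in range(rows)]   # identity

    def add_multiple(dst, src, q):          # row_dst -= q * row_src (in A and T)
        A[dst] = [a - q * b for a, b in zip(A[dst], A[src])]
        T[dst] = [a - q * b for a, b in zip(T[dst], T[src])]

    r = 0
    for c in range(cols):
        if r >= rows:
            break
        while True:
            nz = [i for i in range(r, rows) if A[i][c] != 0]
            if not nz:
                break
            piv = min(nz, key=lambda i: abs(A[i][c]))
            if A[piv][c] < 0:               # make the pivot positive
                A[piv] = [-x for x in A[piv]]
                T[piv] = [-x for x in T[piv]]
            A[r], A[piv] = A[piv], A[r]
            T[r], T[piv] = T[piv], T[r]
            if all(A[i][c] == 0 for i in range(r + 1, rows)):
                # reduce the entries above the pivot to 0 <= entry < pivot
                for i in range(r):
                    add_multiple(i, r, A[i][c] // A[r][c])
                r += 1
                break
            for i in range(r + 1, rows):
                if A[i][c] != 0:
                    add_multiple(i, r, A[i][c] // A[r][c])
    return A, T
```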

As we've already discussed, the application of row operations corresponds to left-multiplication by elementary matrices. Let $A_i$ be the matrix at step $i$ (with $A_0 = A$) and $E_i$ be the elementary matrix of the row operation applied in step $i$. Then we have

$$A_i = E_i A_{i-1}$$

In the end the HNF is

$$H = E_k \cdots E_2 E_1 A$$

Let $T = E_k \cdots E_2 E_1$ be the overall transformation matrix.

So overall we have $H = TA$.

We can compute $T$ along the way by initializing it to an identity matrix and applying the same row operations to it that we apply to $A$. Since the $E_i$ are invertible, their product is too, which over $\mathbb{Z}$ means that $\det(T) = \pm 1$. Such a matrix is called a unimodular matrix. Important to note is that – as the name "normal form" implies – the HNF is unique, i.e. there is only one HNF of $A$.

Solving Linear Diophantine Equations

Computing the HNF of $A$ doesn't give us the solutions directly. But I promised a way to find the solutions using only the HNF. The approach is outlined in this paper. Basically we construct the matrix

$$D = \begin{pmatrix} A^\mathsf{T} & \mathbf{0} \\ \mathbf{b}^\mathsf{T} & 1 \end{pmatrix}$$

and compute its HNF, which – as discussed above – also gives us an invertible matrix $T$ such that $TD = H$. Let's think about what $H$ will look like. If $A\mathbf{x} = \mathbf{b}$ has a solution, then $x_1, \dots, x_k$ exist such that

$$\mathbf{b} = x_1 \mathbf{a}_1 + \dots + x_k \mathbf{a}_k$$

where $\mathbf{a}_i$ is the $i$-th column of $A$. Since the columns of $A$ appear as rows in $D$ (which is important because we are using row operations), we can eliminate the $\mathbf{b}^\mathsf{T}$ row using the coefficients $x_i$. So all in all, $H$ will contain a row of zeros, except for the last entry, which is a 1 that is inherited from the last column of $D$. Moreover, $H$ will contain the HNF of $A^\mathsf{T}$, which might contain rows of zeros that will be moved to the bottom. Let $\hat{H}$ be the non-zero rows of the HNF of $A^\mathsf{T}$; then overall $H$ takes the form:

$$H = \begin{pmatrix} \hat{H} & \mathbf{0} \\ \mathbf{0}^\mathsf{T} & 1 \\ \mathbf{0} & \mathbf{0} \end{pmatrix}$$

The zero rows at the end may or may not exist. Actually, the argument is not completely sound yet: clearly, the $H$ I just gave you fulfills the HNF conditions, and by the above argument it arises from the application of elementary row operations on $D$. By the uniqueness of the HNF, it is the HNF. This doesn't really help us much yet though, because we want $\mathbf{x}$ and a basis for the kernel. These will be encoded in $T$, so let's understand what that matrix looks like. I think it's easiest if I just tell you the solution and we then verify that it makes sense:

$$T = \begin{pmatrix} T' & \mathbf{0} \\ -\mathbf{x}^\mathsf{T} & 1 \\ K & \mathbf{0} \end{pmatrix}$$

Again, the $K$ rows need not exist if the kernel of $A$ is trivial. We know that $TD = H$ and will derive a bunch of relations between the sub-matrices contained in $T$, $D$ and $H$ from this. All this really is, is following the matrix multiplication. $T'$ is just the transformation matrix from the HNF of $A^\mathsf{T}$, which we don't care about.

From the $(-\mathbf{x}^\mathsf{T} \;\; 1)$ row, we get

$$-\mathbf{x}^\mathsf{T} A^\mathsf{T} + \mathbf{b}^\mathsf{T} = \mathbf{0}^\mathsf{T}$$

Taking the transpose and moving $A\mathbf{x}$ to the other side gives $A\mathbf{x} = \mathbf{b}$, which is what we want and confirms that $\mathbf{x}$ really is the solution we are looking for.

From the $(K \;\; \mathbf{0})$ rows, we get

$$K A^\mathsf{T} = \mathbf{0}$$

Again taking the transpose, we get $A K^\mathsf{T} = \mathbf{0}$. Looking at the columns $\mathbf{k}_i$ of $K^\mathsf{T}$, we have that $A\mathbf{k}_i = \mathbf{0}$, so the columns are in the kernel of $A$. Additionally, the $\mathbf{k}_i$ are linearly independent, since $T$ is invertible, and by the definition of the HNF they have to span the kernel, since otherwise we would have another zero row in $H$, so $\{\mathbf{k}_1, \dots\}$ is a basis.
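Putting the pieces together, a rough Python sketch of the whole construction might look like this. It reuses the hnf function from the sketch above, and the example system at the bottom is again made up.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def solve_diophantine(A, b):
    """Return (x, kernel_basis) for A*x = b over Z, or None if unsolvable."""
    k = len(A[0])
    # D = ( A^T | 0 )
    #     ( b^T | 1 )
    D = [row + [0] for row in transpose(A)] + [list(b) + [1]]
    H, T = hnf(D)
    x, kernel = None, []
    for h_row, t_row in zip(H, T):
        if all(v == 0 for v in h_row[:-1]) and h_row[-1] == 1:
            x = [-v for v in t_row[:k]]      # this row of T is (-x^T, 1)
        elif all(v == 0 for v in h_row):
            kernel.append(t_row[:k])         # rows of T mapping to zero rows of H
    return None if x is None else (x, kernel)

# The made-up example from before, lifted to the integers.
A = [[3, 1, 6, 0], [4, 4, 0, 6]]
b = [4, 2]
print(solve_diophantine(A, b))               # a particular solution and a kernel basis
```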

Hermite Normal Form over $\mathbb{Z}_n$?

Why did we bother to create the diophantine equations? Couldn't we have just set up the matrix in the same way, but with entries in $\mathbb{Z}_n$, and then computed the HNF of it? No. Well, maybe – I couldn't make it work, and while there might be some way to fix the problems, I couldn't see it. The problem is that the Hermite Normal Form is not really well-defined for $\mathbb{Z}_n$-matrices.

One of the problems is that $\mathbb{Z}_n$ is not even an integral domain, meaning non-trivial zero divisors exist, which is a problem when doing euclidean division. Additionally, there is not really a sensible way to choose the pivot, since the ring is cyclic. In the example above we just chose the usual order on the representatives $0, \dots, n-1$. Consider the following matrix (still with $n = 6$):

In the algorithm I just showed you, we would subtract the first row 2 times from the second row and get

This matrix fulfills the conditions for a HNF, but if we instead add the first row to the second, we get

We then simplify further by subtracting the second row from the first to reduce the element above the pivot.

This matrix also fulfills the conditions for a HNF, so the HNF is no longer unique, which we needed for the above construction. Additionally, the second operation is (arguably) better, because we get a diagonal matrix, but how could we have known to make it in advance? If you find some way to save this, let me know 😉.

Second solution: Diagonalization

There is another approach that avoids the translation to diophantine equations entirely. We can transform the matrix into a diagonal matrix using both row and column operations. Zeros are allowed on the diagonal. Whereas the row reduction algorithm over $\mathbb{Z}_n$ was similar to the Hermite normal form, this algorithm will be similar to the Smith Normal Form (SNF). But again, the SNF is defined for $\mathbb{Z}$-matrices and not unique over $\mathbb{Z}_n$. Moreover, the SNF places restrictions on the elements on the diagonal (each has to divide the next), which causes the most trouble when coming up with an algorithm, but we are not going to need that here.

Note that the "diagonalization" here has nothing to do with the usual one in linear algebra, where you find an invertible matrix , such that is diagonal, which can't even be done for every matrix. Instead we are going to find invertible matrices , such that is diagonal, which can always be done.

The algorithm for computing the diagonal matrix uses exactly the same ideas we've already seen. In the first step we eliminate all entries in the first row and first column except the first entry which is on the diagonal. We then just continue recursively with the rest of the matrix.

Let's look at an example in $\mathbb{Z}_6$. We will first use row operations to eliminate the entries in the first column. This is exactly the same thing we've already seen.

We are lucky here, because we can eliminate the 3 in the second column directly. In general this need not happen. We would then have to do the same procedure but with column operations, which might make entries in the row reappear. This is not a problem, however. The only thing that is important is that the diagonal entry, which is also the pivot, is getting smaller in each iteration. So eventually we will be able to use it to eliminate both the row and the column. Nevertheless, in this instance we can now use a single column operation to eliminate the 3.

If the matrix was bigger, we would continue with the next row and column. It is easy to see that the zeros in the first column and row would never be destroyed.
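Here is a Python sketch of this diagonalization, tracking the row operations in $S$ and the column operations in $T$ (again written for clarity rather than efficiency):

```python
def diagonalize_mod_n(A, n):
    """Return (S, D, T) with D = S*A*T diagonal over Z_n and S, T invertible."""
    A = [[x % n for x in row] for row in A]
    m, k = len(A), len(A[0])
    S = [[int(i == j) for j in range(m)] for i in range(m)]
    T = [[int(i == j) for j in range(k)] for i in range(k)]
    for p in range(min(m, k)):
        while True:
            # smallest non-zero entry in the unprocessed part of column p and row p
            cands = [(A[i][p], i, p) for i in range(p, m) if A[i][p] != 0]
            cands += [(A[p][j], p, j) for j in range(p, k) if A[p][j] != 0]
            if not cands:
                break                       # this diagonal entry stays 0
            _, i, j = min(cands)
            # move the pivot to position (p, p) with a row swap and a column swap
            A[p], A[i] = A[i], A[p]
            S[p], S[i] = S[i], S[p]
            for row in A: row[p], row[j] = row[j], row[p]
            for row in T: row[p], row[j] = row[j], row[p]
            done = True
            # reduce column p with row operations (also applied to S)
            for i2 in range(p + 1, m):
                if A[i2][p] != 0:
                    q = A[i2][p] // A[p][p]
                    A[i2] = [(a - q * b) % n for a, b in zip(A[i2], A[p])]
                    S[i2] = [(a - q * b) % n for a, b in zip(S[i2], S[p])]
                    done = done and A[i2][p] == 0
            # reduce row p with column operations (also applied to T)
            for j2 in range(p + 1, k):
                if A[p][j2] != 0:
                    q = A[p][j2] // A[p][p]
                    for row in A: row[j2] = (row[j2] - q * row[p]) % n
                    for row in T: row[j2] = (row[j2] - q * row[p]) % n
                    done = done and A[p][j2] == 0
            if done:
                break
    return S, A, T
```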

The matrices $S$ and $T$ can be computed along the way, in the same way as in the HNF. They are

Once we have $S$, $T$ and the diagonal matrix $D = SAT$, we can solve $D\mathbf{y} = S\mathbf{b}$ and get the final answer as $\mathbf{x} = T\mathbf{y}$. The proof that $T\mathbf{y}$ is a solution to $A\mathbf{x} = \mathbf{b}$ is straightforward:

Left-multiplying both sides by $S$:

$$SA\mathbf{x} = S\mathbf{b}$$

Substituting $\mathbf{x} = T\mathbf{y}$:

$$SAT\mathbf{y} = D\mathbf{y} = S\mathbf{b}$$

This is exactly the system $\mathbf{y}$ was chosen to solve, and since $S$ is invertible the steps are reversible, so $\mathbf{x} = T\mathbf{y}$ indeed solves the original system.

Solving $D\mathbf{y} = S\mathbf{b}$ is straightforward, because each row corresponds to a single-variable linear congruence whose solution we have already discussed. We have to multiply the basis of the kernel by $T$ as well.

Continuing with the example above, we have to solve

So we get the linear congruences

and

So overall

To get $\mathbf{x}$, we left-multiply $\mathbf{y}$ by $T$.

This is the same solution we had already gotten using the other method.
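To wrap the second method up in code, here is a sketch that combines the pieces from the previous sketches (diagonalize_mod_n and solve_congruence); for brevity it only returns one particular solution and omits the kernel.

```python
def matvec(M, v, n):
    return [sum(a * x for a, x in zip(row, v)) % n for row in M]

def solve_via_diagonalization(A, b, n):
    m, k = len(A), len(A[0])
    S, D, T = diagonalize_mod_n(A, n)
    rhs = matvec(S, b, n)
    y = []
    for i in range(m):
        d = D[i][i] if i < k else 0
        if d == 0:
            if rhs[i] != 0:
                return None                 # row reads 0 = rhs_i, unsolvable
            if i < k:
                y.append(0)                 # free variable, pick 0
        else:
            sol = solve_congruence(d, rhs[i], n)
            if sol is None:
                return None
            y.append(sol[0])                # particular solution of one congruence
    y += [0] * (k - len(y))                 # remaining free variables
    return matvec(T, y, n)                  # x = T y

A, b = [[3, 1], [4, 4]], [4, 2]             # the made-up example from before
print(solve_via_diagonalization(A, b, 6))   # prints one solution, e.g. [4, 4]
```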

Conclusion and Future Work

We've seen two ways to solve systems of linear congruences, so which one should you choose? In my mixed boolean-arithmetic implementation I implemented the first solution, but mostly for non-technical reasons: I simply came up with that solution first. I knew I could translate the linear congruences into linear diophantine equations, and Wikipedia told me that those were solved. Furthermore, you only need to implement an HNF algorithm and build the appropriate matrices. Systems of linear diophantine equations are very general and allow you to solve many common problems in number theory (although specific algorithms for those problems are usually faster). Examples are a single multi-variable linear diophantine equation, which we would have had to solve during back-substitution, as well as the Chinese Remainder Theorem. However, a big problem is that we need arbitrary precision integers, because the intermediate results can become big. There might be a way to get around this for systems of linear congruences though – again, let me know if you find one. If that is not possible, the second algorithm is probably better, because the size of the integers is fixed. Additionally, if you have to solve multiple systems of linear congruences with the same $A$, you can precompute the matrices $S$, $D$, $T$ and use them to quickly find the solution.

There are some algorithmic improvements to the common core of both algorithms. When eliminating entries in a row/column, we are basically performing the euclidean algorithm, but performing the additions on whole rows. It is possible to do the euclidean algorithm on just the two leading elements and apply the result to the whole column/row. If you want to find out more, look at basically any HNF algorithm online – almost all of them do this – but I think it takes away from the simplicity.
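For illustration, one elimination step of this kind would look something like the following (using the ext_gcd function from before); the idea is that the 2×2 matrix built from the extended gcd coefficients is unimodular, so applying it to the two rows is a valid sequence of row operations.

```python
def eliminate(row_r, row_i, c):
    """Zero out row_i[c] using row_r[c]; returns the two new rows.
    Equivalent to repeated reduction, but done with a single transformation."""
    a, b = row_r[c], row_i[c]
    g, s, t = ext_gcd(a, b)             # s*a + t*b = g
    u, v = -(b // g), a // g            # det of [[s, t], [u, v]] is (s*a + t*b)/g = 1
    new_r = [s * x + t * y for x, y in zip(row_r, row_i)]
    new_i = [u * x + v * y for x, y in zip(row_r, row_i)]
    return new_r, new_i                 # new_r[c] = g, new_i[c] = 0

print(eliminate([4, 1, 0], [6, 0, 1], 0))   # ([2, -1, 1], [0, -3, 2])
```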

The next section deals with the ways in which I've been using "basis" and "linear independence" loosely. Read it if you are interested, but it probably doesn't matter too much to most people.

Bases in $\mathbb{Z}_n^k$

I've been using the word basis very loosely, but in a way that hopefully makes intuitive sense. The kind of objects we have been dealing with are modules. A module is a vector space where the scalars are allowed to come from a ring instead of a field. In this post we were dealing with the very natural modules $\mathbb{Z}_n^k$ as a $\mathbb{Z}_n$-module and $\mathbb{Z}^k$ as a $\mathbb{Z}$-module. The same definition of a basis makes sense in a module as well.

Let $M$ be an $R$-module and $G = \{g_1, \dots, g_j\} \subseteq M$. $G$ is called a generating set if the elements of $G$ span the module, i.e. if for every $m \in M$, there are $r_1, \dots, r_j \in R$ such that

$$m = r_1 g_1 + \dots + r_j g_j$$

$G$ is a basis if $G$ is a generating set for $M$ and is linearly independent, i.e.

$$r_1 g_1 + \dots + r_j g_j = 0 \implies r_1 = \dots = r_j = 0$$

(These definitions are easily extended to infinite $G$.) Not every module has a basis (if it has one, it is called a free module), and not all bases have to have the same cardinality, which might give you an idea of how much stranger (and more interesting?) modules are. Both $\mathbb{Z}^k$ and $\mathbb{Z}_n^k$ have the obvious basis of standard unit vectors, and all other bases have the same cardinality.

Let's look at the example from earlier again.

Looking at the solution from earlier (or by computing the HNF), we can read off the kernel and a set of vectors that generates it.

But this generating set is not a basis, because it is linearly dependent: there is a non-trivial linear combination of its elements that gives 0. So while the kernel will always be a submodule, it need not have a basis. Furthermore, linear dependence does not necessarily mean that a vector can be removed from the generating set, as seen in this example. Instead, what we really want is a minimal generating set (which does not always exist, though in our finite cases it of course does), but we would need a way to compute it. I don't really need this for MBA, so I don't bother, but you can find a minimal generating set by computing the HNF (or our ill-defined "HNF" mod n) of the matrix that contains the vectors of the generating set as columns. All the non-zero columns of the result will generate the same submodule with as few vectors as possible.

Footnotes

  1. It's not actually a basis, it's a generating set, but it is okay to think of it as a basis for now. It will be discussed in more detail at the end, if you are interested.

  2. Linear dependence is not the issue per se, but minimality of the generating set. It will be discussed in the last section, if you are interested.