Justus Polzin

Mixed Boolean-Arithmetic (Part 3): Binary Permutation Polynomials

In this post we are going to find polynomials P and Q, such that Q(P(X)) ≡ X (mod n), where n = 2^w is a power of two. In particular, this means that P is invertible (bijective) and Q is its inverse, which makes them permutations of ℤ_n.

Polynomials modulo a power of two are called Binary Polynomials, and the ones that permute the integers modulo 2^w are called Binary Permutation Polynomials.

Most of the results in this post can actually be generalized to polynomials modulo a prime power p^w and most of the papers referenced do so. Before we get into a classification, we will begin with some preparations.

The ring ℤ_n[X]

Some of the nice properties of polynomials that we know from ℝ[X] or ℚ[X] are no longer true. Obviously ℤ_n[X] is infinite (e.g. {1, X, X^2, …} is an infinite subset) but there are only n^n (i.e. finitely many) functions ℤ_n → ℤ_n and only n! permutations of ℤ_n.

So there exist polynomials P ≠ Q with p = q. To rephrase, there are polynomials that are different as expressions, i.e. have different coefficients, but compute the same function. Equivalently, there exist P ≠ 0 such that p = 0. An example of this where n = 4 is P = 2X^2 + 2X, since 2x^2 + 2x = 2·x·(x + 1) is divisible by 4 for every x. We will characterize all such polynomials later. This can never happen in ℤ[X] or ℝ[X], or generally in infinite UFDs, as a consequence of Gauss's Lemma. In finite fields GF(q), by the same argument as above, this also happens, but only if X^q − X divides the polynomial, whereas it is more subtle in ℤ_n, which is ultimately a result of the existence of zero divisors.
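To make this concrete, here is a quick sanity check (a sketch of my own; the helper name `evaluate` is just for illustration). X^2 and 3X^2 + 2X have different coefficients, but their difference 2X^2 + 2X is always divisible by 4, so they compute the same function mod 4:

```python
n = 4

def evaluate(coeffs, x, n):
    # Horner evaluation of a polynomial given as [a_0, a_1, ...] mod n
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % n
    return acc

P = [0, 0, 1]  # X^2
Q = [0, 2, 3]  # 3X^2 + 2X

# Different coefficients, same function mod 4:
assert P != Q
assert all(evaluate(P, x, n) == evaluate(Q, x, n) for x in range(n))
```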

For now though, we see that it makes sense to differentiate between the polynomial expressions and the functions they induce. From now on, I will use uppercase letters (e.g. P) for the polynomial expressions and lowercase letters (e.g. p: ℤ_n → ℤ_n) for the corresponding polynomial functions, and 𝒫(ℤ_n) for the set of all functions that can be represented by polynomials. There is also the obvious ring-homomorphism φ: ℤ_n[X] → 𝒫(ℤ_n), P ↦ p. By definition of 𝒫(ℤ_n), φ is surjective. Over ℝ or ℚ, the corresponding map is a bijection, so differentiating between the expressions and functions doesn't really matter.

Some immediate questions are:

  • Are all functions ℤ_n → ℤ_n in 𝒫(ℤ_n), i.e. is every function representable by a polynomial?
  • Is every permutation of ℤ_n representable by a polynomial?

The answer to both questions is no in general. We will see that during the proof of the characterization of binary permutation polynomials. Both are true if and only if n is a prime number (or 1). The next question then is how many functions are representable by polynomials. We will find the answer for Binary Polynomials. You can find a discussion of the general case in [1] or [2].
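As an experimental taste of the answer, we can count by brute force for n = 4 (my own sketch; enumerating degree ≤ 3 turns out to suffice here, which the later section on reduced polynomials justifies):

```python
from itertools import product

n = 4

def func_of(coeffs):
    # the function Z_4 -> Z_4 induced by the polynomial [a_0, a_1, ...]
    def ev(x):
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % n
        return acc
    return tuple(ev(x) for x in range(n))

funcs = {func_of(c) for c in product(range(n), repeat=4)}
perms = {f for f in funcs if len(set(f)) == n}

assert len(funcs) == 64   # out of 4^4 = 256 functions
assert len(perms) == 8    # out of 4! = 24 permutations
```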

It is often advantageous to consider polynomials P ∈ ℤ[X] instead, and ask whether P is a permutation polynomial mod n, because we can (for example) talk about different moduli for the same polynomial. I will often switch between these points of view without mentioning it, because it is very natural and shouldn't cause any confusion. There are however some instances where this makes some minor differences, e.g. when talking about the polynomials which evaluate to zero. In ℤ_n[X] the only such polynomial of degree less than two is the zero polynomial, but in ℤ[X], for every k ≠ 0, the constant polynomial k·n and the linear polynomial k·n·X are different polynomials which all evaluate to zero mod n.

Structure of the Post

In the next section, we will prove the characterization of binary permutation polynomials, which also provides an intuition on how restricted polynomials in ℤ_{2^w}[X] are. Afterwards, we will try to find all polynomials that evaluate to zero, but before we can do that, we need to discuss polynomial division. Finally, we will try to find the inverse of a permutation polynomial, and see three different algorithms to achieve that. I came up with one of them myself and it has (as far as I know) not been published. It is the slowest (but in my opinion still practical 😉), though you should use one of the other two.

Characterization of Binary Permutation Polynomials

As I mentioned in the first post in this series, the first explicit characterization of binary permutation polynomials was given by Rivest in [3]. It's an elementary proof and easy to read, but he mentions that after the fact Mullen told him of a more general lemma that is already present in [2], which can also be used to derive the same characterization. We will follow Rivest's proof here (at times word for word), as we will need some of the results we prove along the way later, when trying to find a polynomial for the inverse permutation. I originally was not going to give a complete proof here (were it not for the intermediary results we will need later), and instead used Hensel's Lemma (from which Mullen's lemma easily follows) to derive the same characterization. I didn't want to simply discard that derivation, so you can find it in the appendix.

We will consider a polynomial P(X) = a_0 + a_1·X + ⋯ + a_d·X^d. Rivest's proof works by reducing the question of whether P is a permutation polynomial mod 2^w to the case w = 1. However, there are certain conditions on this, as we will see. We will start with the base case (w = 1).

Lemma 1: P is a permutation polynomial mod 2 if and only if a_1 + a_2 + ⋯ + a_d is odd.
Proof: Of course P(0) ≡ a_0 and P(1) ≡ a_0 + a_1 + ⋯ + a_d (mod 2). So we need a_0 ≢ a_0 + a_1 + ⋯ + a_d (mod 2). Adding a_0 to both sides (or subtracting it, as they are equivalent mod 2) we get the Lemma.

We will now try to find conditions under which a permutation polynomial mod 2^(w−1) is also a permutation polynomial mod 2^w. This will work inductively by going from 2^(w−1) to 2^w and the other way around. We will need some auxiliary Lemmas for it.

Lemma 2: If P is a permutation polynomial mod 2^w with w ≥ 2, then a_1 is odd.
Proof: If a_1 was even, then P(0) ≡ P(2^(w−1)) (mod 2^w), which follows easily by computation.

The next Lemma will heavily restrict the class of permutations that can be represented by polynomials.

Lemma 3: If P is a permutation polynomial mod 2^w (w ≥ 2), then it is also a permutation polynomial mod 2^(w−1).
Notice that for general permutations, this absolutely doesn't have to be the case: Let n = 4 and consider the permutation σ that swaps 0 and 3 and fixes 1 and 2. Mod 2, σ maps both 0 and 1 to 1 (since σ(0) = 3 ≡ 1), so σ is not a permutation mod 2. Nevertheless, this has to be true for polynomials:

Proof: Of course P(x) ≡ P(x + 2^(w−1)) (mod 2^(w−1)), because x and x + 2^(w−1) are just two representatives for x mod 2^(w−1). If P was not a permutation mod 2^(w−1), then there would be two values x and y that are distinct mod 2^(w−1) such that P(x) ≡ P(y) (mod 2^(w−1)). This means that there are four values mod 2^w (namely x, y, x + 2^(w−1), y + 2^(w−1)) that P maps to a value congruent to P(x) mod 2^(w−1). But there are only two values mod 2^w congruent to P(x) mod 2^(w−1), contradicting that P is a permutation mod 2^w.

Lemma 4: If P is a permutation polynomial mod 2^w, then P(x + 2^(w−1)) ≡ P(x) + 2^(w−1) (mod 2^w) for all x.
Proof: As discussed before, P(x) ≡ P(x + 2^(w−1)) (mod 2^(w−1)). So if one of the two maps to some value y mod 2^w, the other has to map to y + 2^(w−1) mod 2^w, as P is a permutation. So the difference between them will always be 2^(w−1) mod 2^w, which is another way to read the Lemma.

The next Lemma shows us under which condition we can lift a permutation polynomial mod 2^(w−1) to one mod 2^w.

Lemma 5: Let w ≥ 2 and let a_1 be odd. If P is a permutation polynomial mod 2^(w−1), then it is a permutation polynomial mod 2^w if and only if a_3 + a_5 + a_7 + ⋯ is even.
Proof: Since P(x) ≡ P(x + 2^(w−1)) (mod 2^(w−1)) and since P is a permutation polynomial mod 2^(w−1), the only way P could fail to be a permutation polynomial mod 2^w would be if P(x + 2^(w−1)) ≡ P(x) (mod 2^w) for some x. So our next step is evaluating P(x + 2^(w−1)). In order to do that, we will have to compute (x + 2^(w−1))^i. Using the binomial theorem, it is easy to see that all terms with (2^(w−1))^j where j ≥ 2 are zero mod 2^w, so we are left with (x + 2^(w−1))^i ≡ x^i + i·x^(i−1)·2^(w−1) for i ≥ 1. Including the coefficient we have a_i·(x + 2^(w−1))^i ≡ a_i·x^i + a_i·i·x^(i−1)·2^(w−1) (mod 2^w). Usually the latter term will vanish and it will just be equal to a_i·x^i, but when a_i is odd and either
  • i = 1, or
  • i > 1 is odd and x is odd

it will be equal to a_i·x^i + 2^(w−1).

In the case that i = 1, we know that a_1 is odd (by assumption, or from Lemma 2 when w − 1 ≥ 2), so a_1·(x + 2^(w−1)) ≡ a_1·x + 2^(w−1). For even x, by the above, this is the only 2^(w−1) that we get in the evaluation of P(x + 2^(w−1)), so P(x + 2^(w−1)) ≡ P(x) + 2^(w−1). For odd x, we get a 2^(w−1) from every odd i > 1 as well, as long as a_i is odd. If a_i or i is even, a_i·i·x^(i−1)·2^(w−1) ≡ 0 (mod 2^w), so we can simply write P(x + 2^(w−1)) ≡ P(x) + (a_1 + a_3 + a_5 + ⋯)·2^(w−1) (mod 2^w). Since a_1 is odd, we have P(x + 2^(w−1)) ≢ P(x) (mod 2^w) for all x if and only if a_3 + a_5 + a_7 + ⋯ is even.

Now, we can prove the main result.

Theorem: Characterization of Binary Permutation Polynomials
A polynomial P(X) = a_0 + a_1·X + ⋯ + a_d·X^d is a permutation mod 2^w (w ≥ 2) if and only if
  • a_1 is odd
  • a_2 + a_4 + a_6 + ⋯ is even
  • a_3 + a_5 + a_7 + ⋯ is even
Proof:

(⇒) If P is a permutation polynomial mod 2^w, then a_1 is odd by Lemma 2. By Lemma 3, P is also a permutation polynomial mod 2^(w−1) and by Lemma 5, a_3 + a_5 + a_7 + ⋯ is even. By repeated application of Lemma 3, P is a permutation polynomial mod 2, and so a_1 + a_2 + ⋯ + a_d is odd by Lemma 1. Subtracting what we know from the previous results, a_2 + a_4 + a_6 + ⋯ is even.


(⇐) If a_1 is odd, a_2 + a_4 + ⋯ is even, and a_3 + a_5 + ⋯ is even, then P is a permutation polynomial mod 2^w, by induction on w, using Lemma 1 for the base case and Lemma 5 for the inductive step.
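The theorem is easy to turn into code. The following sketch (my own; the function names are made up) checks Rivest's conditions and cross-validates them against brute-force bijectivity for all polynomials of degree at most 3 mod 2^3:

```python
from itertools import product

def is_perm_poly(coeffs):
    # Rivest's conditions on a_1, a_2 + a_4 + ..., a_3 + a_5 + ... (for w >= 2)
    a = list(coeffs) + [0, 0, 0]
    return (a[1] % 2 == 1
            and sum(a[2::2]) % 2 == 0
            and sum(a[3::2]) % 2 == 0)

def is_bijection(coeffs, w):
    n = 1 << w
    values = set()
    for x in range(n):
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % n
        values.add(acc)
    return len(values) == n

# cross-check against brute force for all polynomials of degree <= 3 mod 8
assert all(is_perm_poly(c) == is_bijection(c, 3)
           for c in product(range(8), repeat=4))
```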

We will now concern ourselves with trying to find an inverse. But before we can do that, we need to find the polynomials that evaluate to zero, and before we can do that, we need to talk about polynomial division.

Polynomial Division and Zeros

Usually, polynomial division is only defined when the coefficients are in a field, but it is possible to define a limited version of polynomial division when the coefficients are in any commutative ring.

Theorem: (Limited) Polynomial Division
Let R be a commutative ring, let F, G ∈ R[X], and let the leading coefficient of G be a unit. Then there exist unique polynomials Q, S ∈ R[X], such that F = Q·G + S and deg S < deg G.
Of note is that the leading coefficient has to be a unit (which is trivially satisfied in a field). For example, you can't divide X^2 by 2X in ℤ_4[X].

The proof is straightforward using the usual polynomial division algorithm. Let f, g be the leading coefficients of F, G respectively. To eliminate the leading term of F, we multiply G by f·g^(−1)·X^(deg F − deg G), so that the product also has the leading term f·X^(deg F), and subtract the result from F. The degree of F is now smaller and we can continue recursively.
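Here is the division algorithm as a sketch over ℤ_n (my own illustration; coefficient lists run from lowest to highest degree, and `pow(g, -1, n)` raises a ValueError exactly when the leading coefficient is not a unit):

```python
def strip(p):
    # drop leading zero coefficients (stored last)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def poldiv(F, G, n):
    # Divide F by G in Z_n[X]. The leading coefficient of G must be a unit mod n.
    r = strip([c % n for c in F])
    g = strip([c % n for c in G])
    ginv = pow(g[-1], -1, n)  # ValueError if the leading coefficient is not a unit
    q = [0] * max(1, len(r) - len(g) + 1)
    while len(r) >= len(g) and r != [0]:
        shift = len(r) - len(g)
        factor = (r[-1] * ginv) % n
        q[shift] = factor
        for i, c in enumerate(g):
            r[shift + i] = (r[shift + i] - factor * c) % n
        r.pop()  # the leading term was eliminated
        strip(r)
        if not r:
            r = [0]
    return q, r

# X^3 + 2X^2 + 3X + 4 divided by X + 7 (i.e. X - 1) mod 8:
q, r = poldiv([4, 3, 2, 1], [7, 1], 8)
# q = X^2 + 3X + 6, remainder 2
```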

Using polynomial division we can now prove the following.

Theorem: Let R be a commutative ring. a ∈ R is a root of F ∈ R[X], i.e. F(a) = 0, if and only if (X − a) | F, i.e. there exists Q ∈ R[X] such that F = (X − a)·Q.

(⇐) F(a) = (a − a)·Q(a) = 0, so obviously a is a root.


(⇒) Using polynomial division by (X − a), whose leading coefficient 1 is a unit, we can write F = Q·(X − a) + S, where deg S < deg (X − a) = 1, so S is a constant. We know F(a) = 0, but on the other hand F(a) = Q(a)·(a − a) + S = S, so S = 0 and we have F = Q·(X − a), or in other words (X − a) | F.

This is, of course, what you would expect, but it is good to be aware that polynomial division can't always be done in commutative rings.

Functional Equivalence in ℤ_{2^w}[X]

We will now deal with the problem of deciding when two polynomials represent the same function, as we will need it for the computation of the inverse. Equivalently, our problem is to find all polynomials P that evaluate to 0, i.e. p(x) = 0 for all x ∈ ℤ_{2^w}. In the language of homomorphisms, we are searching for ker φ, which is an ideal. Remember that φ is the homomorphism mapping polynomial expressions to functions. We will call this ideal the zero ideal. (Note that this deviates from the usual meaning of "zero ideal", which is the boring ideal {0}.) By the (first) isomorphism theorem, ℤ_{2^w}[X]/ker φ ≅ 𝒫(ℤ_{2^w}), which means we can uniquely represent a polynomial function by representatives that we will later call "reduced polynomials". Of course, ker φ is infinite (every non-zero ideal in ℤ_{2^w}[X] is), but by Hilbert's basis theorem, it is finitely generated, so what we really want is a set of generators.

Theorem: Vanishing Binary Polynomials
Let P ∈ ℤ_{2^w}[X]. Then p = 0 if and only if P ∈ (F_0, F_1, …, F_d), where d is the smallest integer such that 2^w | d!, and
F_i(X) = (2^w / gcd(2^w, i!)) · X·(X − 1)·⋯·(X − i + 1).
In other words, {F_0, F_1, …, F_d} is a generating set for the zero ideal (though, as we will see, not a minimal one).

This might seem complicated at first, but why these evaluate to zero is actually rather intuitive once you understand what is going on. I've seen two ways of proving this. Singmaster [1] sets up a clever induction. This is in my opinion the more elementary proof, so I suggest you check it out if you want to see a proof. In the Appendix he mentions a different way of proving this using the Newton "series" (it is a finite sum for polynomials). [2] also follows this approach.

To get some understanding of these polynomials and why they evaluate to zero, we will consider polynomials of degree less than three. Let P = a_0 + a_1·X + a_2·X^2 evaluate to zero, that is p(x) = 0 for all x ∈ ℤ_{2^w}. It is relatively easy to see that the only such polynomial of degree less than 2 is P = 0. Let's think through the case deg P = 2. An idea you might have is that since p(x) = 0 for all x, the product (X − 0)·(X − 1)·⋯·(X − (2^w − 1)) should divide P. But this can't be true for degree reasons alone, e.g. 8X^2 + 8X evaluates to zero mod 16 but the product has degree 16. We know that P(0) = 0, so a_0 = 0, and by the theorem above X | P, so we can write P = X·Q. We know P(1) = 0, but on the other hand P(1) = 1·Q(1) = Q(1), so Q(1) = 0, which implies (X − 1) | Q. So overall we can write P = a_2·X·(X − 1). We now have to find all a_2 such that a_2·X·(X − 1) evaluates to zero. We know that the product x·(x − 1) will be even, since one of the factors is even. It is either zero mod 2^w, in which case any a_2 is okay, or in the "worst case" exactly two (for x = 2). So it suffices to choose a_2 to be a multiple of 2^(w−1) = 2^w / gcd(2^w, 2!). This is the definition of F_2 in the theorem above! Generalizing this logic to higher degrees, to see that all polynomials that evaluate to zero are indeed generated by the F_i, is not straightforward but also not too difficult. Again, see the induction in [1].

What we can do is understand the definition of F_i better, and see why those polynomials indeed evaluate to zero. Let's think about the product X·(X − 1)·⋯·(X − i + 1) in the definition of F_i in ℤ[X] instead of ℤ_{2^w}[X]. Note that I will sweep various details under the rug here, as they just distract from the core idea, but rest assured, they can be fixed relatively easily. Plugging in any value x for X, it will be the product of i consecutive numbers, which we can write as
x·(x − 1)·⋯·(x − i + 1) = (x choose i)·i!,
so it is divisible by i!. (The product of i consecutive integers is always divisible by i!.) In the "worst case" the product is exactly i! (e.g. for x = i).

What we want to do now is multiply that product by something that makes the result zero mod 2^w, i.e. such that the result is a multiple of 2^w. The product is already a multiple of i!, so if we want the smallest factor that is always sufficient, we want the result to be the least common multiple of i! and 2^w. By the identity for natural numbers a·b = gcd(a, b)·lcm(a, b), we find the factor
lcm(2^w, i!) / i! = 2^w / gcd(2^w, i!)
that you can see in the definition of F_i.

What I haven't seen mentioned anywhere is that the resulting generating set is not minimal. This is only really useful in that it allows you to save some memory. F_0 = 2^w and F_1 = 2^w·X, both of which are zero mod 2^w, so they can be removed. These polynomials are included when considering polynomials over ℤ which evaluate to zero mod 2^w.

But these are not the only ones. Let i be an even integer with 2 ≤ i < d. We want to show that F_(i+1) is redundant. The leading coefficients of F_i and F_(i+1) are the same: since i is even and i + 1 is odd, gcd(2^w, (i+1)!) = gcd(2^w, i!). Calling the leading coefficient c, we can factor F_(i+1):
F_(i+1) = c·X·(X − 1)·⋯·(X − i) = (X − i)·c·X·(X − 1)·⋯·(X − i + 1) = (X − i)·F_i.
Thus F_(i+1) ∈ (F_i), i.e. F_(i+1) is generated by F_i.

All in all, for odd i the F_i are superfluous and we have
ker φ = (F_2, F_4, F_6, …, F_d).
(Note that d is even: an odd d would contribute no new factor of two to d!, so it could not be the smallest integer with 2^w | d!.)

This generating set is minimal, because the coefficient of the leading term is strictly decreasing as the degree increases, since gcd(2^w, (i+2)!) > gcd(2^w, i!) for even i < d (the step from i! to (i+2)! adds at least one factor of two).
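In code, the trimmed generating set looks like this (my own sketch; `generators` returns F_2, F_4, …, F_d as coefficient lists, lowest degree first):

```python
from math import factorial, gcd

def generators(w):
    # the trimmed generating set F_2, F_4, ..., F_d of the zero ideal mod 2^w
    n = 1 << w
    d = 2
    while factorial(d) % n:
        d += 2  # d is even
    gens = []
    for i in range(2, d + 1, 2):
        c = [1]  # build the falling factorial X(X-1)...(X-i+1)
        for j in range(i):
            out = [0] + c  # multiply by X ...
            for k, ck in enumerate(c):
                out[k] = (out[k] - j * ck) % n  # ... minus j
            c = out
        lead = n // gcd(n, factorial(i))
        gens.append([(lead * ck) % n for ck in c])
    return gens

def evaluate(p, x, n):
    acc = 0
    for c in reversed(p):
        acc = (acc * x + c) % n
    return acc

# every generator vanishes everywhere mod 2^4 = 16
for g in generators(4):
    assert all(evaluate(g, x, 16) == 0 for x in range(16))
```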

Reduced Polynomials

Similar to how there is a (in some sense) good way to choose representatives in ℤ_n as {0, …, n − 1}, the same is true for polynomial functions. The reason is that we want to do computations on these polynomials and want the representation to be small. This means that the degree and the coefficients should be as small as possible.

Let P ∈ ℤ_{2^w}[X] and let d and the F_i be the ones from the theorem on vanishing binary polynomials above. We can always find an equivalent polynomial with degree less than d. Notice that the leading coefficient of F_d is 1, so we can eliminate the coefficient c of X^e where e ≥ d by multiplying F_d by c·X^(e−d) and subtracting the result from P (starting from the highest degree).

We can reduce the other coefficients of P in a similar way. Let f_i = 2^w / gcd(2^w, i!) be the leading coefficient of F_i, which is a power of two. We can reduce the coefficient a_i into the range 0 ≤ a_i < f_i by multiplying F_i by ⌊a_i / f_i⌋ and subtracting the result from P. Since this also changes the lower coefficients, we proceed from the highest degree downwards.

Theorem: Reduced Binary Polynomials
Let P ∈ ℤ_{2^w}[X] and let d be the smallest integer such that 2^w | d!. There exists a unique polynomial Q = b_0 + b_1·X + ⋯ + b_(d−1)·X^(d−1) that computes the same function mod 2^w, i.e. p = q, with
0 ≤ b_i < 2^w / gcd(2^w, i!).
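The reduction procedure can be implemented directly (my own sketch; first the monic F_d eliminates all degrees ≥ d, then each remaining coefficient is shrunk with the corresponding F_i, from the highest degree downwards):

```python
from math import factorial, gcd

def falling(i):
    # coefficients (lowest degree first) of X(X-1)...(X-i+1)
    c = [1]
    for j in range(i):
        out = [0] + c
        for k, ck in enumerate(c):
            out[k] -= j * ck
        c = out
    return c

def reduce_poly(p, w):
    n = 1 << w
    d = 1
    while factorial(d) % n:
        d += 1
    p = [c % n for c in p] + [0] * max(0, d - len(p))
    fd = falling(d)  # F_d is monic of degree d
    for e in range(len(p) - 1, d - 1, -1):
        c = p[e]
        for k, fk in enumerate(fd):
            p[e - d + k] = (p[e - d + k] - c * fk) % n
    p = p[:d]
    for i in range(d - 1, 1, -1):
        lead = n // gcd(n, factorial(i))  # leading coefficient of F_i
        quot = p[i] // lead
        for k, fk in enumerate(falling(i)):
            p[k] = (p[k] - quot * lead * fk) % n
    return p

def evaluate(p, x, n):
    acc = 0
    for c in reversed(p):
        acc = (acc * x + c) % n
    return acc
```

For example, mod 16 (where d = 6) the monomial X^6 reduces to a degree-5 polynomial whose coefficients respect the bounds 16, 16, 8, 8, 2, 2, while inducing the same function.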

This also allows us to count the number of functions that can be represented by polynomials.

Theorem: Number of Distinct Polynomial Functions/Permutations
The number of distinct polynomial functions mod 2^w is
∏_{i=0}^{d−1} 2^w / gcd(2^w, i!).
The number of permutations that can be represented by polynomials is an eighth of that.

The first part follows from the theorem on reduced binary polynomials. By the characterization of binary permutation polynomials, a_1 has to be odd, which excludes half of the functions. The sums a_2 + a_4 + ⋯ and a_3 + a_5 + ⋯ have to be even, which each exclude half again, so overall an eighth of the polynomial functions are permutations.

These are only tiny fractions of the number of possible functions ((2^w)^(2^w)) and permutations ((2^w)!).

When we discuss the inversion of permutation polynomials next, consider all computations on polynomials (e.g. their composition) to include reducing the result, with at least F_d, so that the degree doesn't explode. We can also quickly bound d in terms of w, which will be useful to determine the complexity of the algorithms. d was defined to be the smallest integer such that 2^w | d!. It is easy to see that 2^w | (2w)!, since (2w)! is the product of w even numbers (and w odd numbers). Thus d ≤ 2w, in particular d ∈ O(w).

In the cases we care about the most, w itself is also a power of two, e.g. 8, 16, 32, 64. In those cases d = w + 2. Consider how many times 2 divides w!. For every even number greater than zero and not greater than w, there is a factor of 2. Similarly for every multiple of 4, we get another factor of 2, and so on. There are w/2 such even numbers, w/4 such multiples of 4, and so on. So overall, w! has exactly w/2 + w/4 + ⋯ + 2 + 1 = w − 1 factors of 2. But we wanted w factors of two, i.e. an additional one, which (w + 1)·(w + 2) obviously has, since w + 2 is even (and the odd w + 1 contributes none), so d = w + 2.
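Both facts are quick to check numerically (my own sketch):

```python
from math import factorial

def smallest_d(w):
    # smallest d such that 2^w divides d!
    n = 1 << w
    d = 1
    while factorial(d) % n:
        d += 1
    return d

assert all(smallest_d(w) <= 2 * w for w in range(1, 21))
assert all(smallest_d(w) == w + 2 for w in (2, 4, 8, 16))
```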

Inverting Binary Permutation Polynomials

Let P be a permutation polynomial. It is not immediately clear that the inverse permutation is even representable by a polynomial. To see that it is, consider the group of all permutations on 2^w elements, S_(2^w). We have p^(−1) = p^(k−1), where k = ord(p) is the order of p, i.e. the order of the subgroup ⟨p⟩ generated by p. Note that in this context p^k isn't the product p·p·⋯·p, it is the k-fold composition p ∘ p ∘ ⋯ ∘ p. But p^(k−1) is representable by a polynomial, because the composition of polynomials is still a polynomial. This gives us an extremely inefficient algorithm to invert any permutation polynomial: We just compute p, p^2, p^3, … When the result is the identity, we know that the polynomial in the last step was the inverse. The problem is of course that the order of the subgroup could be very big, e.g. 2^w, and w could easily be 64 in practice.

However, we can turn this into a practical algorithm.

Inversion by Composition

Theorem: Order of Binary Permutation Polynomials
Let P be a permutation polynomial mod 2^w. Then the order of p is a power of two, less than or equal to 2^w, i.e. ord(p) = 2^j with j ≤ w.
The converse of this theorem is not true, as the example in Lemma 3 has order 2, but does not have a polynomial representation as explained there.

The weaker version of this theorem (with a weaker bound on the order) follows from Lagrange's theorem and the fact that the number of distinct permutation polynomials is a power of two, as can be seen from the theorem on the number of distinct polynomial functions/permutations.

The proof will be by induction on w. For w = 1, there are only two permutations: the identity of order one, and swapping 0 and 1, of order two. Let P be a permutation polynomial mod 2^w with w ≥ 2. We know that P is also a permutation polynomial mod 2^(w−1) by Lemma 3. By the induction hypothesis, p has order 2^j mod 2^(w−1) where j ≤ w − 1. We have p^(2^j)(x) ≡ x (mod 2^(w−1)) for all x. We will now try to figure out the order mod 2^w. Fix an x. Let c be the smallest positive integer such that p^c(x) ≡ x (mod 2^(w−1)), i.e. the length of the cycle containing x mod 2^(w−1). We know that c is a power of two and less than or equal to 2^(w−1), since the order of the permutation is both a power of two and the least common multiple of its cycle lengths.

We now have to distinguish between two cases mod 2^w:

Suppose p^c(x) ≡ x (mod 2^w). By Lemma 4 (applied to each of the c applications of p), we know p^c(x + 2^(w−1)) ≡ p^c(x) + 2^(w−1) ≡ x + 2^(w−1) (mod 2^w). So p^c maps x and x + 2^(w−1) to themselves. Obviously, there is no c′ < c such that p^(c′)(x) ≡ x or p^(c′)(x + 2^(w−1)) ≡ x + 2^(w−1) (mod 2^w), as that would contradict that c is the cycle length of x mod 2^(w−1).

Otherwise p^c(x) ≡ x + 2^(w−1) (mod 2^w). By Lemma 4, we know p^c(x + 2^(w−1)) ≡ p^c(x) + 2^(w−1) ≡ x (mod 2^w). Thus
p^(2c)(x) = p^c(p^c(x)) ≡ p^c(x + 2^(w−1)) ≡ x (mod 2^w)
and
p^(2c)(x + 2^(w−1)) ≡ p^c(x) ≡ x + 2^(w−1) (mod 2^w).
As before, there is no c′ < c such that p^(c′)(x) ≡ x or p^(c′)(x + 2^(w−1)) ≡ x + 2^(w−1). There can also be no c ≤ c′ < 2c with that property, since otherwise c would have to divide c′ (reducing mod 2^(w−1)), which, again, contradicts that c is the cycle length of x mod 2^(w−1).

So the cycle length of x mod 2^w has either stayed the same or doubled. And the cycle length of x + 2^(w−1) mod 2^w is the same as that of x. It now easily follows that the order of p mod 2^w has either stayed the same or doubled. In either case, it is a power of two less than or equal to 2^w.

Now, the idea is that p^(−1) = p^(2^j − 1) can be computed in O(j) compositions instead of O(2^j), by using the associativity of function composition:
p^(2^(i+1) − 1) = p^(2^i − 1) ∘ p^(2^i), where p^(2^i) = p^(2^i − 1) ∘ p
So the algorithm can compute p^2, p^4, p^8, … until the result is the identity, i.e. until p^(2^j)(x) = x for all x. The inverse is then p^(2^j − 1), which was computed along the way. Overall, this will take at most 2w compositions.

To implement the algorithm as presented, we would have to store to do the composition at the end. The following algorithm avoids this, but otherwise follows the same idea.

# `f`: Permutation Polynomial
# `w`: Integer width (mod 2^w)
# Returns the inverse of `f`
def invert(f, w):
    # `inverse` will contain f, f^3, f^7, f^15, ..., f^(2^i-1)
    inverse = f
    for _ in range(0, w):
        # Compose `inverse` and `f`
        # This will be f^(2^i).
        g = reduce(compose(inverse, f), w)

        # If this is the identity then
        # f^(2^i-1) is the inverse.
        if g.is_identity():
            return inverse

        # Compose f^(2^i-1) and f^(2^i)
        # to get f^(2^(i+1)-1) for the
        # next iteration.
        inverse = reduce(compose(inverse, g), w)

Let's quickly look at the complexity when everything (composition, reduction, multiplication) uses the naive algorithm. We will consider additions and multiplications in ℤ_{2^w} to have unit cost. Composition is just evaluating one polynomial at another, which can be done using Horner's method with d polynomial multiplications (and additions). A multiplication of two polynomials of degree less than d and the subsequent reduction by F_d is in O(d^2). So one composition is in O(d^3). Since d ∈ O(w), this is equivalent to O(w^3). We have to do O(w) compositions, so overall the asymptotic complexity of this algorithm is O(w^4).
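To make the algorithm fully concrete, here is a self-contained toy implementation for w = 8 (my own code, not from any of the referenced papers; it reduces only with the monic F_d, which is enough to keep degrees below d, and checks the identity by evaluating at all 2^w points, which is fine for small w):

```python
from math import factorial

W = 8
N = 1 << W

D = 1  # smallest D with 2^W | D!
while factorial(D) % N:
    D += 1

FD = [1]  # falling factorial X(X-1)...(X-D+1): monic, zero everywhere mod N
for j in range(D):
    out = [0] + FD
    for k, ck in enumerate(FD):
        out[k] -= j * ck
    FD = out

def reduce_poly(p):
    # eliminate all terms of degree >= D using the monic F_D
    p = [c % N for c in p]
    for e in range(len(p) - 1, D - 1, -1):
        c = p[e]
        for k, fk in enumerate(FD):
            p[e - D + k] = (p[e - D + k] - c * fk) % N
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def compose(outer, inner):
    # outer(inner(X)) mod N via Horner's method
    res = [outer[-1] % N]
    for c in reversed(outer[:-1]):
        prod = [0] * (len(res) + len(inner) - 1)
        for i, ri in enumerate(res):
            for j, sj in enumerate(inner):
                prod[i + j] = (prod[i + j] + ri * sj) % N
        prod[0] = (prod[0] + c) % N
        res = reduce_poly(prod)
    return res

def evaluate(p, x):
    acc = 0
    for c in reversed(p):
        acc = (acc * x + c) % N
    return acc

def is_identity(p):
    return all(evaluate(p, x) == x for x in range(N))

def invert(f):
    inverse = f                        # f^(2^1 - 1)
    for _ in range(W):
        g = compose(inverse, f)        # f^(2^i)
        if is_identity(g):
            return inverse             # f^(2^i - 1) is the inverse
        inverse = compose(inverse, g)  # f^(2^(i+1) - 1)
    return inverse

f = [0, 1, 2]  # f(x) = 2x^2 + x, a permutation mod 256
finv = invert(f)
assert all(evaluate(finv, evaluate(f, x)) == x for x in range(N))
```

The representation (coefficient lists, brute-force identity check) is chosen for clarity, not speed; a real implementation would keep polynomials fully reduced and compare reduced coefficients instead.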

Inversion by Newton's Method

A different algorithm, and (as far as I know) the first general algorithm, was published in 2016 [4]. It is based on Newton's method. Take the following as intuition, not as proof; we will do a lot of things that shouldn't work and barely make sense. You could say, we are doing engineering maths 😉.

The usual step in Newton's method is
x_(k+1) = x_k − f(x_k) / f′(x_k),
where we are trying to approximate a root of f.

In our case, given a permutation polynomial P we are trying to find Q such that P(Q(X)) = X. We can reformulate this as a root finding problem for F(Q) = P(Q) − X, where P(Q) means evaluating P at Q, i.e. the composition. We are treating Q as the variable, i.e. the x_k above. Substituting, the step becomes
Q_(k+1) = Q_k − (P(Q_k) − X) / P′(Q_k).
To evaluate the right-hand side, we would need to know the exact inverse Q inside 1 / P′(Q), which was our original problem. However, taking the derivative of P(Q(X)) = X with respect to X, we get P′(Q(X))·Q′(X) = 1. Dividing by P′(Q(X)), we get Q′(X) = 1 / P′(Q(X)). So overall the step can be written as:
Q_(k+1) = Q_k − (P(Q_k) − X)·Q_k′

[4] cites some French paper for the proof that this will work, which I can't read. Maybe someone who knows some p-adic analysis can prove this. In particular, I think some generalization of Hensel's lemma can probably prove this.

Experimentally, this always finds a solution, i.e. a Q_k with P(Q_k(X)) = X as functions. And the number of iterations is only logarithmic in w, as you would expect from Newton's method roughly doubling the precision in each step. If we trust this experimental observation, then the complexity would be O(w^3 · log w).
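Here is a rough experimental sketch of the iteration (my own code, not taken from [4]; I use the initial guess X − a_0, which is correct mod 2, reduce with the monic F_d to bound the degree, and stop once P(Q(X)) − X vanishes as a function, checked by brute-force evaluation):

```python
from math import factorial

W = 8
N = 1 << W

D = 1  # smallest D with 2^W | D!
while factorial(D) % N:
    D += 1
FD = [1]  # falling factorial X(X-1)...(X-D+1), monic, zero everywhere mod N
for j in range(D):
    out = [0] + FD
    for k, ck in enumerate(FD):
        out[k] -= j * ck
    FD = out

def reduce_poly(p):
    p = [c % N for c in p]
    for e in range(len(p) - 1, D - 1, -1):
        c = p[e]
        for k, fk in enumerate(FD):
            p[e - D + k] = (p[e - D + k] - c * fk) % N
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def polymul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % N
    return reduce_poly(out)

def polysub(a, b):
    out = [0] * max(len(a), len(b))
    for i, ai in enumerate(a):
        out[i] = ai
    for i, bi in enumerate(b):
        out[i] = (out[i] - bi) % N
    return out

def compose(outer, inner):
    res = [outer[-1] % N]
    for c in reversed(outer[:-1]):
        res = polymul(res, inner)
        res[0] = (res[0] + c) % N
    return res

def evaluate(p, x):
    acc = 0
    for c in reversed(p):
        acc = (acc * x + c) % N
    return acc

def newton_invert(P):
    Q = [(-P[0]) % N, 1]  # initial guess X - a_0, correct mod 2
    for _ in range(2 * W):
        err = polysub(compose(P, Q), [0, 1])  # P(Q(X)) - X
        if all(evaluate(err, x) == 0 for x in range(N)):
            return Q
        dQ = [(i * c) % N for i, c in enumerate(Q)][1:] or [0]  # Q'
        Q = reduce_poly(polysub(Q, polymul(err, dQ)))
    raise RuntimeError("did not converge")

P = [0, 1, 2]  # P(x) = 2x^2 + x
Q = newton_invert(P)
assert all(evaluate(Q, evaluate(P, x)) == x for x in range(N))
```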

Inversion by (Lagrange) Interpolation

The last technique we will look at is from [5], but the idea of using interpolation was already presented in [4], where the authors couldn't quite make it work.

Let us try to naively apply Lagrange interpolation to find a polynomial based on sample points. We are not trying to invert anything yet, just doing interpolation. Formally, given k distinct values x_1, …, x_k and any values y_1, …, y_k, we are trying to find a polynomial P such that P(x_i) = y_i for all i.

What you could try to do (the authors of [4] did this) is to construct the Lagrange basis polynomials
L_i(X) = ∏_{j ≠ i} (X − x_j) / (x_i − x_j)
The solution would then be
P(X) = ∑_i y_i·L_i(X)
The problem is that we can't compute the L_i, because we have to invert the denominator ∏_{j ≠ i} (x_i − x_j), which (aside from a few trivial cases) will always be even and thus not invertible.

We could have seen this in a different way: If we could do standard Lagrange interpolation, we could find a polynomial representation for any function ℤ_{2^w} → ℤ_{2^w}, which is not possible, as not every function is representable by a polynomial.

There is, however, another formulation of the problem as a system of linear equations, which can be generalized from fields to the ring ℤ_{2^w} and lets us apply the results from Part 2.

I assume the idea is well-known, so I will go over it quickly. We have to choose the degree of the polynomial beforehand. Usually (i.e. over a field), the degree is k − 1 for k sample points, as this results in a system with a unique solution, but in our case, we know that for any polynomial there is an equivalent reduced polynomial of degree less than d, so we can always choose d − 1.

Note that even when k = 2^w, i.e. the output for every input is specified, we will not get a unique solution (if one exists), since we could add any polynomial in the zero ideal. On the other extreme, even if only two points on the polynomial are specified, there need not exist a polynomial of any degree that passes through these points: Let x_1 = 0 and x_2 = 2. As we have seen before, P(0) ≡ P(2) (mod 2), so we can't have P(0) = 0 and P(2) = 1. This underlines the fact that linear algebra over rings is a lot weirder than over fields.

Moving on, what you do is construct the Vandermonde matrix V with entries V_ij = x_i^j and the vector y with entries y_i, and then solve V·a = y. The vector a will contain the coefficients of the polynomial, if the system has a solution. Note that we need to solve this system over ℤ_{2^w}, which was discussed in Part 2. The algorithms shown there give us a particular solution as well as a basis for the kernel. The elements of the (basis of the) kernel can of course also be treated as polynomials. They are a generating set for all polynomials (of the chosen degree) that evaluate to 0 on the x_i. Some of them will belong to the "zero ideal" and evaluate to zero everywhere, but there could also exist others which are non-zero for inputs not equal to any x_i. Differentiating between the two is easy: just reduce the polynomials. The coefficients of polynomials in the zero ideal will be reduced to 0. By the way, the generating set will usually not be minimal, because we are effectively treating the polynomials as a ℤ_{2^w}-module here, meaning the coefficients of linear combinations are in ℤ_{2^w}, whereas for the F_i generating set, we treat ℤ_{2^w}[X] as a ℤ_{2^w}[X]-module.

Now, we are finally ready to look at the algorithm for inverting a permutation polynomial .

  1. Compute P(x) for x = 0, …, d − 1.
  2. Solve the interpolation problem with the sample points (P(x), x) by constructing and solving the linear system.
  3. The solution Q is the/an inverse.

Once you look at it for a bit, you might think it's rather straightforward, but there is some subtlety to this. The intuitive explanation that is not quite correct is this: We are looking for a polynomial of degree d − 1, meaning d coefficients, so we should have d sample points to find a unique polynomial. We want Q(P(x)) = x for all x, so plugging in the values x = 0, …, d − 1, we get Q(P(x)) = x for those values. So we can use (P(x), x) for x = 0, …, d − 1 as the sample points.

The problem with this intuition is that half of it makes sense for polynomials over a field, while the other half only for polynomials over ℤ_{2^w}. What we know for sure is that Q(P(x)) = x for x = 0, …, d − 1. We still have to show that this also holds for all other x, i.e. there is no solution to the linear system where this doesn't hold for all x. We know there is one solution where this is true: when Q is an inverse of P. So we need to show that the kernel of the linear system only contains polynomials in the "zero ideal". Polynomials R in the kernel satisfy R(P(x)) = 0 for x = 0, …, d − 1. This means we need to show that R(P(x)) = 0 for x = 0, …, d − 1 implies R(P(x)) = 0 for all x. I don't think there is an easier way to see this than to follow the proof that the F_i generate the "zero ideal". In that proof you only need to use the function values at those d points to get the same generators, which are 0 everywhere.

Theorem: A polynomial function p is uniquely determined by its values p(0), p(1), …, p(d − 1), where d is the smallest positive integer such that 2^w | d!.

Note that given values y_0, …, y_(d−1), there is not necessarily a polynomial function p such that p(x) = y_x for all x < d. For example, since p(0) ≡ p(2) (mod 2) for every polynomial function, we can't have y_0 = 0 and y_2 = 1.
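We can verify the theorem experimentally for w = 3, where d = 4 (my own sketch): the 8·8·4·4 = 1024 reduced polynomials mod 8 all induce distinct functions, and they are already distinguished by their values on {0, 1, 2, 3}:

```python
from itertools import product

n, d = 8, 4
bounds = [8, 8, 4, 4]  # 2^w / gcd(2^w, i!) for i = 0..3

def values(coeffs):
    out = []
    for x in range(n):
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % n
        out.append(acc)
    return tuple(out)

tables = [values(c) for c in product(*(range(b) for b in bounds))]
assert len(set(tables)) == 1024              # distinct as functions
assert len({t[:d] for t in tables}) == 1024  # determined by values at 0..3
```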

The authors of [5] took advantage of the special form of the matrix to compute an LU decomposition, which can be used to solve the linear system without needing to resort to a general algorithm for solving linear systems mod 2^w. The complexity will also be better than that of the general algorithm. The LU decomposition can also be used to show the uniqueness of the solution. The authors consider inversions to have unit cost, which I don't think is quite realistic, and come to the conclusion that the algorithm has quadratic complexity.

Conclusion

We've seen the most important theory on binary polynomials:

  • We have a characterization of binary permutation polynomials, which also allows us to generate random permutation polynomials.
  • We can decide when two binary polynomials compute the same function.
  • Given a binary permutation polynomial, we can efficiently find an inverse.

Appendix: Characterization of Binary Permutation Polynomials using Hensel's Lemma

Usually Hensel's Lemma is presented as a statement about lifting factorizations of polynomials mod p to factorizations mod p^w. We will only need the special case of lifting roots:

Theorem: Hensel Lifting of Simple Roots
Let P ∈ ℤ[X], p be a prime number and w ≥ 1 an integer. If r is a simple root of P mod p, i.e.
P(r) ≡ 0 (mod p) and P′(r) ≢ 0 (mod p),
then there exists a unique s mod p^w with s ≡ r (mod p), such that
P(s) ≡ 0 (mod p^w).

Now, let P be a polynomial. We need to rephrase the property of being a permutation polynomial in terms of roots: P is a permutation polynomial mod n if and only if P(X) − m has a single root mod n for every m. Or in other words, if there is a unique x such that P(x) ≡ m (mod n) for every m. Using Hensel's Lemma we can now find a condition on the polynomial mod p that implies the above.

Theorem: Classification of Permutation Polynomials modulo Prime Powers
Let P ∈ ℤ[X], p be a prime, and w ≥ 2. P is a permutation polynomial mod p^w if and only if P is a permutation mod p and P′(x) ≢ 0 (mod p) for all x.

The direction (⇐) is relatively easy. Since P is a permutation polynomial mod p, we know that P(X) − m has a single root mod p for every m. Because this root is simple (P′ never vanishes mod p), we can Hensel lift it to exactly one root mod p^w. So P(X) − m also has a single root mod p^w for every m, which makes P a permutation of ℤ_(p^w).

We will prove the direction (⇒) by contradiction. Suppose P is not a permutation polynomial mod p, meaning there exists an m such that no x is mapped to m via P mod p. But since P(x) ≡ m (mod p^w) implies P(x) ≡ m (mod p), no element is ever mapped into the equivalence class of m mod p, so the function mod p^w is not a bijection.

Now suppose that P is a permutation mod p, but that P′(r) ≡ 0 (mod p) for some r. We need to discuss what happens to repeated roots.

Theorem: Hensel Lifting of Repeated Roots
Let P ∈ ℤ[X], p be a prime number and w ≥ 2 an integer. If r is a repeated root of P mod p, i.e.
P(r) ≡ 0 (mod p) and P′(r) ≡ 0 (mod p),
then there are either no, or at least p, values s mod p^w with s ≡ r (mod p), such that
P(s) ≡ 0 (mod p^w).

See Wikipedia's explanation for a proof. Apply this to P(X) − m for m ≡ P(r) (mod p): if there are no roots, then no value maps to m (only values congruent to r mod p could, as P is a permutation mod p); otherwise there are at least p values that map to m. In either case, it is not a permutation.

Notice that if a polynomial is a permutation polynomial mod p^w for some w ≥ 2, then it is a permutation polynomial mod p^v for all v, since the conditions of the theorem do not depend on w. In the special case of p = 2, we already saw this before.

Let p = 2. For P to be a permutation polynomial mod 2^w, it needs to be a permutation on ℤ_2. The following calculations will all be done in GF(2). Of course x^i = x for i ≥ 1, so
P(x) = a_0 + (a_1 + a_2 + ⋯ + a_d)·x,
so we need a_1 + a_2 + ⋯ + a_d = 1 in GF(2) for P to be a permutation.

So a_1 + a_2 + ⋯ + a_d needs to be odd. Furthermore we need P′(0) ≠ 0 and P′(1) ≠ 0. In GF(2), only the terms of odd degree survive in the derivative:
P′(x) = a_1 + 3·a_3·x^2 + 5·a_5·x^4 + ⋯ = a_1 + (a_3 + a_5 + ⋯)·x
(again using x^i = x). So P′(0) = a_1 needs to be odd. And P′(1) = a_1 + a_3 + a_5 + ⋯ needs to be odd as well, i.e. a_3 + a_5 + ⋯ needs to be even; subtracting both from the first condition, a_2 + a_4 + ⋯ needs to be even. The characterization of Rivest now follows easily:

Theorem: Characterization of Binary Permutation Polynomials
A polynomial P(X) = a_0 + a_1·X + ⋯ + a_d·X^d is a permutation mod 2^w (w ≥ 2) if and only if
  • a_1 is odd
  • a_2 + a_4 + a_6 + ⋯ is even
  • a_3 + a_5 + a_7 + ⋯ is even
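As a brute-force sanity check of the prime-power classification, here it is for p = 3 and w = 2 (my own sketch; `condition` encodes the two requirements of the theorem):

```python
from itertools import product

p, n = 3, 9  # p^w with w = 2

def ev(coeffs, x, m):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % m
    return acc

def is_perm(coeffs, m):
    return len({ev(coeffs, x, m) for x in range(m)}) == m

def condition(coeffs):
    # permutation mod p, and P'(x) != 0 mod p for all x
    deriv = [i * c for i, c in enumerate(coeffs)][1:]
    return (is_perm(coeffs, p)
            and all(ev(deriv, x, p) != 0 for x in range(p)))

# check the equivalence for all polynomials of degree <= 3 mod 9
assert all(is_perm(c, n) == condition(c)
           for c in product(range(n), repeat=4))
```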

References

[1] Singmaster: On polynomial functions (mod m)

[2] Mullen & Stevens: Polynomial functions (mod m)

[3] Rivest: Permutation Polynomials Modulo 2^w

[4] Barthelemy et al.: Binary Permutation Polynomial Inversion and Application to Obfuscation Techniques

[5] Barthelemy et al.: Quadratic Time Algorithm for Inversion of Binary Permutation Polynomials