REMARKS ON THE INPUT ARGUMENTS.
Input checks are intentionally avoided, as lexunrank is supposed to be
called many times, for sampling subsets. Thus, please ensure that:
- k < n;
- N is an integer between 0 and bc(n,p)-1.
It is possible to enable checks, by changing an internal "if" statement to 1.
REMARKS ON THE OUTPUT ARGUMENTS.
As n increases, 'calls' becomes much smaller than 'ncomb'. This means
that lexunrank(n,k,N) is extremely convenient if you are interested in
one or several, but not all, k-combinations at given generation
order(s) N.
To generate all combinations in lexicographic order, it is more
convenient using the FSDA function combsFS. The MATLAB function
with the same purpose, nchoosek(1:4,3), is much less efficient.
ON THE LEXICOGRAPHIC ORDERING.
lexunrank(n,k,N) gives the k-combination of n elements of position N
in the reverse co-lexicographic order of such combinations or,
equivalently, of position bc(n,k)-N in the lexicographic order of the
same combinations.
Note that, in this implementation of the lexicographic unrank, N ranges
over the integers between 0 and bc(n,k)-1. For details see the
"combinatorial number system" discussed by Knuth (2005), pp. 5-6.
To clarify with an example the meaning of the different orders, while
the lexicographic order of the 2-combinations of 3 elements are:
\left(
\begin{array}{ccc}
1 & 2 & 3 \\
1 & 2 & 4 \\
1 & 3 & 4 \\
2 & 3 & 4
\end{array}
\right)
the co-lexicographic order of the same combinations are
\left(
\begin{array}{ccc}
3 & 2 & 1 \\
4 & 2 & 1 \\
4 & 3 & 1 \\
4 & 3 & 2
\end{array}
\right)
and the reverse co-lexicographic order of the original combinations are:
\left(
\begin{array}{ccc}
4 & 3 & 2 \\
4 & 3 & 1 \\
4 & 2 & 1 \\
3 & 2 & 1
\end{array}
\right)
The reasons for choosing a co-lexicographic unrank is that right-to-left
array filling is much faster and elegant. The reverse is due to a similar
motivation.
ALGORITMIC DETAILS.
Given the totally ordered set S=\{1,2,\ldots,n\}, a k-combination is
a subset \{x_1, \ldots, x_k\} of S. Consider the n-lists of
elements of the set \{0,1\}, i.e. the vertices of the hypercube V_n.
Each k-combination \{x_1,\ldots,x_k\} can be associated to the
n-list having a 1 at position x_1, \ldots, x_k, and a 0 elsewhere.
Example:
2-combinations of \{1,2,3,4\}: \{1,2\}, \{1,3\}, \{1,4\},
\{2,3\}, \{2,4\}, \{3,4\}. Corresponding 4-lists of \{0,1\}:
1100, 1010, 1001, 0110, 0101, 0011.
The n-lists of \{0,1\} containing k times 1, and therefore
equivalently the k-combinations of n-elements of S, can be
generated in lexicographic order with an algorithm that builds the
k-list of position t+1 using only the k-list of position t, and
which stops without counting the combinations generated. For example, the
MATLAB function NCHOOSEK(S,k), where S is the row vector of length n
of the elements of S, creates in lexicographic order a k columns
matrix whose rows consist of all possible combinations of the n
elements of S taken k at a time. The number of such combinations,
given by the binomial coefficient n!/((n-k)! k!), can be also computed
with the function NCHOOSEK by replacing the first argument, the row
vector S, with the scalar n.
Unfortunately the binomial coefficient increases rapidly with n, which
makes the generation of all k-combinations computationally hard: with
NCHOOSEK the task is impractical even for values just above 15. However,
a lexicographic algorithm implements a one-to-one correspondence between
the k-combinations and the generation order, i.e. the set of numbers s
= 1,\ldots,(n!/((n-k)!k!)). This fact is used in our function to
determine the n-list corresponding to the k-combination \{x_1,
\ldots, x_k\} which would be generated by the lexicographic algorithm at
a given desired position N. This is useful in a number of applications
which require one or several, but not all, k-combinations at given
generation order(s).