An error-correcting code is essentially just a subset of the set of all possible messages of a given length over some finite "alphabet."
In algebraic coding theory, the "alphabet" is usually some finite field (very often GF(2)) and frequently the error-correcting code is chosen to be a vector subspace of the space of all row vectors of some fixed length n. Such codes are known as Linear Codes, but, however a code is defined the point is to have a collection of "codewords" that are said to be "in the code" and any other word (row vectors that are not "in the code") will be assumed to be a codeword that has been mangled by the addition of noise.
When a message is received that is not a codeword, we ask ourselves the question "Which codeword is closest to this message I've received?" In other words we make the presumption that the received message is actually a codeword that has been changed in a relatively small number of positions -- and we put them back the way they were supposed to be!
That process is called "decoding." Developing codes that have efficient decoding algorithms is one of the central problems of algebraic coding theory.
So let's play around a bit.
Start GAP in a terminal window, then issue the command
gap> LoadPackage("guava");
GUAVA can construct codewords in a variety of ways. One of the most typical cases is for a codeword to consist of binary digits. In that case we say that "the code is over GF(2)" and codewords can be constructed as follows:
gap> c1:=Codeword("101010101"); [ 1 0 1 0 1 0 1 0 1 ] gap> v:=Z(2)*[1,1,1,1,1,1,1,1,1]; [ Z(2)^0, Z(2)^0, Z(2)^0, Z(2)^0, Z(2)^0, Z(2)^0, Z(2)^0, Z(2)^0, Z(2)^0 ] gap> c2:=Codeword(v); [ 1 1 1 1 1 1 1 1 1 ] gap> c3:=c1+c2; [ 0 1 0 1 0 1 0 1 0 ] gap> Weight(c1); 5 gap> Weight(c2); 9 gap> Weight(c3); 4
The previous excerpt from a GAP session shows that codewords can be constructed from quoted strings or from vectors whose entries lie in a finite field. We also see that codewords can be added together and that there is a function called Weight which (if it isn't obvious) tells us how many entries in a codeword are non-zero.
The Hamming distance is used extensively in coding theory. It tells us in how many positions two codewords differ. In GUAVA the Hamming distance is implemented by a function called DistanceCodeword.
gap> DistanceCodeword(c1, c2); 4
Note that the Hamming distance between c1 and c2 happens to give the same value as the weight of their sum. This is no coincidence and has to do with the curious fact that in GF(2) adding and subtracting are the same thing.
A codeword can also be constructed using a polynomial. Indeed, the internal representation of a codeword requires either a polynomial or a vector. There are GUAVA functions that allow one to switch back and forth between the two representations.
gap> x:=Indeterminate(GF(2)); x_1 gap> c4:=Codeword(x^7+x^2+x+1); x^7 + x^2 + x + 1 gap> VectorCodeword(c4); <an immutable GF2 vector of length 8> gap> Display(last); [ Z(2)^0, Z(2)^0, Z(2)^0, 0*Z(2), 0*Z(2), 0*Z(2), 0*Z(2), Z(2)^0 ] gap> c5:=Codeword([1,0,0,0,0,0,1]); [ 1 0 0 0 0 0 1 ] gap> PolyCodeword(c5); x_1^6+Z(2)^0
A code is fundamentally just a collection of codewords. Sometimes a code is merely a set of codewords. Other times a code will be the vector space generated by some small set of codewords.
First let's build a code that is merely a set:
gap> l:=["111000", "011100", "001110", "000111", "100011", "110001", "000000",$ [ "111000", "011100", "001110", "000111", "100011", "110001", "000000", "111111" ] gap> m:=Codeword(l,6,GF(2)); [ [ 1 1 1 0 0 0 ], [ 0 1 1 1 0 0 ], [ 0 0 1 1 1 0 ], [ 0 0 0 1 1 1 ], [ 1 0 0 0 1 1 ], [ 1 1 0 0 0 1 ], [ 0 0 0 0 0 0 ], [ 1 1 1 1 1 1 ] ] gap> C1:=ElementsCode(m, GF(2)); a (6,8,1..6)2..3 user defined unrestricted code over GF(2) gap> IsLinearCode(C1); false gap> WeightDistribution(C1); [ 1, 0, 0, 6, 0, 0, 1 ]
In this example we first wrote out a list of strings, then converted them into codewords over GF(2). The call to ElementsCode constructs a code from a list of elements. It is possible that the set of codewords we used actually is a vector space, but the call to IsLinearCode says no. Finally the last function tells us that there are 6 codewords of weight 3, and one each of weights 0 and 6 in this code.
A very useful feature of GUAVA is the ability to construct random codes:
gap> C:= RandomLinearCode(12,5,GF(2)); a [12,5,?] randomly generated code over GF(2)
An error-correcting code's properties are fairly well captured by three numbers which traditionally are referred to using the letters n, k and d. We ask for a random code by specifying n (the wordlength), and k (the code's dimension) as well as the field which serves as the alphabet for the code.
One of the most salient features of a code (a feature that determines how good it will be at correcting errors) is its minimum weight, d. This is the smallest weight of any nonzero word in the code. If we wish to correct m errors we will need to have a minimum weight of at least 2m+1.
gap> MinimumWeight(C); 3
This particular code would be capable of correcting single bit errors.
Finally, one might be interested in the entire distribution of the weights of the words in a code. The weight distribution is a vector that tells us how many words there are in a code with each possible weight between 0 and n.
gap> WeightDistribution(C); [ 1, 0, 0, 2, 3, 6, 7, 6, 4, 2, 1, 0, 0 ]
generated by GAPDoc2HTML