I Experimental purpose
Master the basic principle of dictionary coding, program LZW decoder with C/C++/Python and other languages, and analyze the encoding and decoding algorithm
II Experimental principle
1.LZW coding principle
Core idea:
The dictionary is initialized to contain all single characters, and the current prefix string P is empty.
Enter the next character C in the data stream,
Judge whether P+C is in the dictionary
(1) Exist
Assign P+C as a new prefix string to P
(2) Does not exist
Output the codeword CW corresponding to P, enter P+C into the dictionary, and then assign C to P as a new prefix string
2.LZW decoding principle
The decoding algorithm mainly involves two variables: CW and PW
Core idea:
(1) The first codeword read into the codeword stream is assigned to CW
(2) And assign Dictionary[CW] to the character stream to be output (decoding)
(3) Assign CW to PW as the prefix string, and CW accepts the next new codeword in the codeword stream
(4) Assign Dictionary[PW] to P and Dictionary[CW] to C
(5) Judge whether P+C exists in the dictionary. If it does not exist (and the dictionary has enough space), write P+C as a new term
(6) Assign CW to PW as a new prefix string, and repeat the above steps
exceptional case:
What should I do if the current codeword does not exist in the dictionary? Why?
First, analyze the reasons why the current codeword does not exist in the dictionary:
In the encoding process, when a new string is encountered, write the new string into the dictionary, and then read the next string. If the next string happens to be the newly written string, the decoder cannot update the index number of the new string synchronously due to the delay problem, so the current codeword does not exist in the dictionary.
However, through observation, we will find that in the above special case, the new entry has an obvious feature: the first character is the same as the last character.
Therefore, at the decoding end, we can copy and paste the first symbol of the previous codeword to the position of the last symbol, and the new codeword is the current codeword we want to decode.
LZW decoding code is as follows:
void LZWDecode(BITFILE* bf, FILE* fp) { int character; int new_code, last_code; int phrase_length; unsigned long file_length; file_length = BitsInput(bf, 4 * 8); if (-1 == file_length) file_length = 0; InitDictionary();//Initialize dictionary last_code = -1;//pw=-1, as the old character in the string while (0 < file_length) { new_code = input(bf);//Read in a new character, cw if (new_code >= next_code)//The index number of the read character is greater than the maximum index number of the initialization dictionary, indicating that the character does not exist in the dictionary { // this is the case CSCSC( not in dict) d_stack[0] = character;//character is the string represented by pw phrase_length = DecodeString(1, last_code);//decode } else//The index number of the newly read character exists in the initialization dictionary { phrase_length = DecodeString(0, new_code);//Direct decoding } character = d_stack[phrase_length - 1];//Save the first character of cw into character while (0 < phrase_length) { phrase_length--; fputc(d_stack[phrase_length], fp);//Output the string corresponding to the current codeword file_length--; } if (MAX_CODE > next_code)//If the dictionary still has space {// add the new phrase to dictionary AddToDictionary(character, last_code);//Save a string that does not exist in the original dictionary } last_code = new_code;//Number of updated entries //Need to fill } }
int DecodeString( int start, int code)//Direct decoding of codeword index number in dictionary { int count; count = start; while (0 <= code) { d_stack[count] = dictionary[code].suffix;//Incoming new character code = dictionary[code].parent;//Assign old characters count++; } return count; //Need to fill }
III Experimental steps
1. First debug the LZW encoding program, take a text file as the input, and get the output LZW encoding file
2. Taking the encoding file obtained in experimental step 1 as the input, the decoding program of LZW is written. When writing the decoding program, you need to annotate the key statements and explain what to do. In the experimental report, it focuses on how to deal with and explain the reasons when the current codeword does not exist in the dictionary.
The decoding program, comments and how to deal with the current codeword when it does not exist in the dictionary are described in the decoding principle.
It can be seen that the content of the original txt file is the same, and the decoding is successful.
3. Select at least ten different format types of files and compress them with LZW encoder to obtain the output compressed bitstream file. Analyze the compression efficiency of files in different formats.
It can be seen that RGB files and YUV files are reduced after compression, while other file formats become larger after compression.
IV Complete code
bitio.h
/* * Declaration for bitwise IO * * vim: ts=4 sw=4 cindent */ #ifndef __BITIO__ #define __BITIO__ #include <stdio.h> typedef struct{ FILE *fp; unsigned char mask; int rack; }BITFILE; BITFILE *OpenBitFileInput( char *filename); BITFILE *OpenBitFileOutput( char *filename); void CloseBitFileInput( BITFILE *bf); void CloseBitFileOutput( BITFILE *bf); int BitInput( BITFILE *bf); unsigned long BitsInput( BITFILE *bf, int count); void BitOutput( BITFILE *bf, int bit); void BitsOutput( BITFILE *bf, unsigned long code, int count); #endif // __BITIO__
bitio.c
/* * Definitions for bitwise IO * * vim: ts=4 sw=4 cindent */ #include <stdlib.h> #include <stdio.h> #include "bitio.h" BITFILE *OpenBitFileInput( char *filename){ BITFILE *bf; bf = (BITFILE *)malloc( sizeof(BITFILE)); if( NULL == bf) return NULL; if( NULL == filename) bf->fp = stdin; else bf->fp = fopen( filename, "rb"); if( NULL == bf->fp) return NULL; bf->mask = 0x80; bf->rack = 0; return bf; } BITFILE *OpenBitFileOutput( char *filename){ BITFILE *bf; bf = (BITFILE *)malloc( sizeof(BITFILE)); if( NULL == bf) return NULL; if( NULL == filename) bf->fp = stdout; else bf->fp = fopen( filename, "wb"); if( NULL == bf->fp) return NULL; bf->mask = 0x80; bf->rack = 0; return bf; } void CloseBitFileInput( BITFILE *bf){ fclose( bf->fp); free( bf); } void CloseBitFileOutput( BITFILE *bf){ // Output the remaining bits if( 0x80 != bf->mask) fputc( bf->rack, bf->fp); fclose( bf->fp); free( bf); } int BitInput( BITFILE *bf){ int value; if( 0x80 == bf->mask){ bf->rack = fgetc( bf->fp); if( EOF == bf->rack){ fprintf(stderr, "Read after the end of file reached\n"); exit( -1); } } value = bf->mask & bf->rack; bf->mask >>= 1; if( 0==bf->mask) bf->mask = 0x80; return( (0==value)?0:1); } unsigned long BitsInput( BITFILE *bf, int count){ unsigned long mask; unsigned long value; mask = 1L << (count-1); value = 0L; while( 0!=mask){ if( 1 == BitInput( bf)) value |= mask; mask >>= 1; } return value; } void BitOutput( BITFILE *bf, int bit){ if( 0 != bit) bf->rack |= bf->mask; bf->mask >>= 1; if( 0 == bf->mask){ // eight bits in rack fputc( bf->rack, bf->fp); bf->rack = 0; bf->mask = 0x80; } } void BitsOutput( BITFILE *bf, unsigned long code, int count){ unsigned long mask; mask = 1L << (count-1); while( 0 != mask){ BitOutput( bf, (int)(0==(code&mask)?0:1)); mask >>= 1; } } #if 0 int main( int argc, char **argv){ BITFILE *bfi, *bfo; int bit; int count = 0; if( 1<argc){ if( NULL==OpenBitFileInput( bfi, argv[1])){ fprintf( stderr, "fail open the file\n"); return -1; } }else{ if( NULL==OpenBitFileInput( bfi, NULL)){ fprintf( stderr, "fail open stdin\n"); return -2; } } if( 2<argc){ if( NULL==OpenBitFileOutput( bfo, argv[2])){ fprintf( stderr, "fail open file for output\n"); return -3; } }else{ if( NULL==OpenBitFileOutput( bfo, NULL)){ fprintf( stderr, "fail open stdout\n"); return -4; } } while( 1){ bit = BitInput( bfi); fprintf( stderr, "%d", bit); count ++; if( 0==(count&7))fprintf( stderr, " "); BitOutput( bfo, bit); } return 0; } #endif
lzw.c
/* * Definition for LZW coding * * vim: ts=4 sw=4 cindent nowrap */ #include <stdlib.h> #include <stdio.h> #include "bitio.h" #define MAX_CODE 65535 struct { int suffix; int parent, firstchild, nextsibling; } dictionary[MAX_CODE+1]; int next_code; int d_stack[MAX_CODE]; // stack for decoding a phrase #define input(f) ((int)BitsInput( f, 16)) #define output(f, x) BitsOutput( f, (unsigned long)(x), 16) int DecodeString( int start, int code); void InitDictionary( void); void PrintDictionary( void)//Print Dictionary { int n; int count; for( n=256; n<next_code; n++)//The initialization dictionary contains only 256 entries (8bit by default) { count = DecodeString( 0, n); printf( "%4d->", n); while( 0<count--) printf("%c", (char)(d_stack[count])); printf( "\n"); } } int DecodeString( int start, int code)//Direct decoding of codeword index number in dictionary { int count; count = start; while (0 <= code) { d_stack[count] = dictionary[code].suffix;//Incoming new character code = dictionary[code].parent;//Assign old characters count++; } return count; //Need to fill } void InitDictionary( void){ int i; for( i=0; i<256; i++){ dictionary[i].suffix = i; dictionary[i].parent = -1; dictionary[i].firstchild = -1; dictionary[i].nextsibling = i+1; } dictionary[255].nextsibling = -1; next_code = 256; } /* * Input: string represented by string_code in dictionary, * Output: the index of character+string in the dictionary * index = -1 if not found */ int InDictionary( int character, int string_code){ int sibling; if( 0>string_code) return character; sibling = dictionary[string_code].firstchild; while( -1<sibling){ if( character == dictionary[sibling].suffix) return sibling; sibling = dictionary[sibling].nextsibling; } return -1; } void AddToDictionary( int character, int string_code){ int firstsibling, nextsibling; if( 0>string_code) return; dictionary[next_code].suffix = character; dictionary[next_code].parent = string_code; dictionary[next_code].nextsibling = -1; dictionary[next_code].firstchild = -1; firstsibling = dictionary[string_code].firstchild; if( -1<firstsibling){ // the parent has child nextsibling = firstsibling; while( -1<dictionary[nextsibling].nextsibling ) nextsibling = dictionary[nextsibling].nextsibling; dictionary[nextsibling].nextsibling = next_code; }else{// no child before, modify it to be the first dictionary[string_code].firstchild = next_code; } next_code ++; } void LZWEncode( FILE *fp, BITFILE *bf) { int character; int string_code; int index; unsigned long file_length; fseek( fp, 0, SEEK_END); file_length = ftell( fp); fseek( fp, 0, SEEK_SET); BitsOutput( bf, file_length, 4*8); InitDictionary(); string_code = -1; while( EOF!=(character=fgetc( fp))){ index = InDictionary( character, string_code); if( 0<=index){ // string+character in dictionary string_code = index; }else{ // string+character not in dictionary output( bf, string_code); if( MAX_CODE > next_code){ // free space in dictionary // add string+character to dictionary AddToDictionary( character, string_code); } string_code = character; } } output( bf, string_code); } void LZWDecode(BITFILE* bf, FILE* fp) { int character; int new_code, last_code; int phrase_length; unsigned long file_length; file_length = BitsInput(bf, 4 * 8); if (-1 == file_length) file_length = 0; InitDictionary();//Initialize dictionary last_code = -1;//pw=-1, as the old character in the string while (0 < file_length) { new_code = input(bf);//Read in a new character, cw if (new_code >= next_code)//The index number of the read character is greater than the maximum index number of the initialization dictionary, indicating that the character does not exist in the dictionary { // this is the case CSCSC( not in dict) d_stack[0] = character;//character is the string represented by pw phrase_length = DecodeString(1, last_code);//decode } else//The index number of the newly read character exists in the initialization dictionary { phrase_length = DecodeString(0, new_code);//Direct decoding } character = d_stack[phrase_length - 1];//Save the first character of cw into character while (0 < phrase_length) { phrase_length--; fputc(d_stack[phrase_length], fp);//The output string corresponding to the current codeword file_length--; } if (MAX_CODE > next_code)//If the dictionary still has space {// add the new phrase to dictionary AddToDictionary(character, last_code);//Save a string that does not exist in the original dictionary } last_code = new_code;//Number of updated entries //Need to fill } } int main( int argc, char **argv){ FILE *fp; BITFILE *bf; if( 4>argc){ fprintf( stdout, "usage: \n%s <o> <ifile> <ofile>\n", argv[0]); fprintf( stdout, "\t<o>: E or D reffers encode or decode\n"); fprintf( stdout, "\t<ifile>: input file name\n"); fprintf( stdout, "\t<ofile>: output file name\n"); return -1; } if( 'E' == argv[1][0]){ // do encoding fp = fopen( argv[2], "rb"); bf = OpenBitFileOutput( argv[3]); if( NULL!=fp && NULL!=bf){ LZWEncode( fp, bf); fclose( fp); CloseBitFileOutput( bf); fprintf( stdout, "encoding done\n"); } }else if( 'D' == argv[1][0]){ // do decoding bf = OpenBitFileInput( argv[2]); fp = fopen( argv[3], "wb"); if( NULL!=fp && NULL!=bf){ LZWDecode( bf, fp); fclose( fp); CloseBitFileInput( bf); fprintf( stdout, "decoding done\n"); } }else{ // otherwise fprintf( stderr, "not supported operation\n"); } return 0; }