Huffman Tree Explained with Examples and Implementation
What is a Huffman Tree?
A Huffman Tree is a binary tree used to create an optimal prefix code for a set of symbols with known frequencies. It minimizes the total weighted path length (average code length) so frequently occurring symbols get shorter codes and rare symbols get longer codes.
Key properties
- Prefix-free: No code is a prefix of another, so codes are uniquely decodable.
- Optimality: For a given set of symbol frequencies, Huffman coding yields the minimum average code length among all prefix codes.
- Binary tree: Leaves represent symbols; path from root to leaf gives the symbol’s binary code (left=0, right=1).
Algorithm (high-level)
- Create a leaf node for each symbol and add it to a priority queue ordered by frequency.
- While there is more than one node in the queue:
- Remove the two nodes with the smallest frequencies.
- Create a new internal node with these two nodes as children and frequency equal to their sum.
- Insert the new node back into the queue.
- The remaining node is the root of the Huffman Tree.
- Traverse the tree to assign codes: left edge = 0, right edge = 1.
Worked example
Symbols with frequencies:
- A: 45
- B: 13
- C: 12
- D: 16
- E: 9
- F: 5
Step-by-step:
- Initial queue (freq): F(5), E(9), C(12), B(13), D(16), A(45)
- Merge F(5)+E(9)=14 → new node FE(14). Queue: C(12), B(13), FE(14), D(16), A(45)
- Merge C(12)+B(13)=25 → CB(25). Queue: FE(14), D(16), CB(25), A(45)
- Merge FE(14)+D(16)=30 → FED(30). Queue: CB(25), FED(30), A(45)
- Merge CB(25)+FED(30)=55 → CBFED(55). Queue: A(45), CBFED(55)
- Merge A(45)+CBFED(55)=100 → root(100)
Assigning codes by traversing (left=0, right=1) yields a possible set:
- A: 0
- B: 101
- C: 100
- D: 111
- E: 1101
- F: 1100
Average code length = (451 + 133 + 123 + 163 + 94 + 54) / 100 = (45 + 39 + 36 + 48 + 36 + 20)/100 = ⁄100 = 2.24 bits/symbol.
Implementation (Python)
python
import heapq from collections import defaultdict, namedtuple Node = namedtuple(“Node”, [“freq”, “symbol”, “left”, “right”]) # For heapq, nodes compare by freq only class HuffmanNode: def init(self, freq, symbol=None, left=None, right=None): self.freq = freq self.symbol = symbol self.left = left self.right = right def lt(self, other): return self.freq < other.freq def build_huffman_tree(freq_map): heap = [HuffmanNode(freq, sym) for sym, freq in freq_map.items()] heapq.heapify(heap) if len(heap) == 1: # Edge case: only one symbol node = heapq.heappop(heap) return HuffmanNode(node.freq, None, left=node) while len(heap) > 1: a = heapq.heappop(heap) b = heapq.heappop(heap) merged = HuffmanNode(a.freq + b.freq, None, left=a, right=b) heapq.heappush(heap, merged) return heap[0] def generate_codes(node, prefix=””, code_map=None): if code_map is None: code_map = {} if node is None: return code_map if node.symbol is not None: code_map[node.symbol] = prefix or “0” # handle single-symbol case else: generate_codes(node.left, prefix + “0”, code_map) generate_codes(node.right, prefix + “1”, code_map) return code_map # Example usage: freq_map = {‘A’:45, ‘B’:13, ‘C’:12, ’D’:16, ‘E’:9, ‘F’:5} root = build_huffman_tree(freq_map) codes = generate_codes(root) print(“Huffman Codes:”, codes)
Encoding and decoding
- Encoding: Replace each symbol with its code and concatenate bits.
- Decoding: Traverse the tree from root for each bit; when reaching a leaf, output the symbol and return to root.
Complexity
- Building tree: O(n log n) using a priority queue (n = number of symbols).
- Generating codes: O(n).
- Encoding/decoding per symbol: O(code length) which averages to the entropy-bound optimal length.
Practical notes
- Huffman coding is optimal among prefix codes but not always globally optimal for all compression schemes (context modeling or arithmetic coding can do better).
- For small alphabets or skewed frequencies, consider canonical Huffman codes for simpler storage of the codebook.
References and further reading
- Huffman, D. A. (1952). A Method for the Construction of Minimum-Redundancy Codes.
- Textbook chapters on data compression (e.g., Salomon, Sayood).
Leave a Reply