Python TDD: HashSet and MinHeap
Overview
An off-curriculum Python/TDD session. James is experienced in TypeScript/JavaScript but new to Python. We set up a uv project, wrote pytest tests for two data structures, and James implemented both. Code reviews after each implementation surfaced Python idioms and real bugs.
Questions
Project Setup with uv and pytest
James set up a minimal Python project using uv init --no-package and uv add --dev pytest, then ran tests with uv run pytest -v.
Delivery: N/A (hands-on setup)
Content: Clean — correct commands, correct mental model of what --no-package does.
HashSet Implementation
James implemented a HashSet backed by an array of buckets with chaining. All tests passed.
Delivery: N/A Content grade: A−
Correct use of __contains__, __len__, and exception translation (ValueError → KeyError in remove). Two small feedback points:
sum([len(x) for x in ...])— wrapping a list comprehension insum()builds an unnecessary intermediate list; generator expressionsum(len(x) for x in ...)is idiomaticfor i in range(...)with unusedi— Python convention is_for unused loop variables
Python Type-Agnostic Hashing
James asked how to detect whether an item is a number or string, and how to convert a string to a number (referencing the TypeScript isNaN pattern). Key insight:
Python’s hash() is already polymorphic — hash(42), hash("hello"), hash((1,2)) all return integers. The likely issue was using item % self.capacity directly instead of hash(item) % self.capacity. Swapping that one call fixed the failing test.
Content grade: A — once the pattern was clear, James immediately identified the fix.
MinHeap Implementation
James implemented a MinHeap with push/pop/peek, _sift_up, _sift_down, _heapify, and a _assert_heap debugging method. Showed good instinct for private helpers and defensive assertions.
Delivery: N/A Content grade: B− — structure solid, but three real bugs found in review (see below).
MinHeap Code Review
Three bugs found:
-
_assert_heapoff-by-one —if bi < length - 1skipped the last valid parent-child pair. For a 2-element heap[5, 1], the assertion never fired. Fix:bi < length. -
_sift_downequal-children case — when both children are equal and less than the parent (e.g.[5, 3, 3]),b < a and b < cfails becauseb == c. Neither branch fires; the heap property is violated silently. Fix: changeb < ctob <= cin the first branch. -
pop()is O(n) —list.pop(0)andlist.insert(0, ...)both shift every element. Standard fix: swap root with last element,list.pop()(O(1)), then sift down.
Style issues also addressed:
minshadows the built-in; useresultorsmallestinitial if initial else []should belist(initial) if initial is not None else []to handle empty-list input and avoid mutating the caller’s list_sift_upwas more complex than necessary; standard implementation only compares with parent, not siblings_heapifyiteration used a confusinglength - ai - 1mapping; cleaner asrange((length-1)//2, -1, -1)
Session transcript
Setup: User requested an off-curriculum Python/TDD lesson as a beginner. Set up uv project, wrote pytest tests for HashSet, user implemented.
HashSet: All tests passed. Asked about detecting number vs string type and converting string to number (TypeScript isNaN pattern). Explained Python’s hash() is already type-agnostic — hash(item) % capacity works for all hashable types. User identified that they were using item % capacity directly. Fixed; test passed.
HashSet review: Two minor style issues — sum([...]) vs generator expression, and i vs _ for unused loop variable. Grade: A−.
MinHeap: Proposed MinHeap as next data structure. Wrote 4 concrete tests and 4 stubs. User implemented, including _heapify, _sift_up, _sift_down, and _assert_heap.
MinHeap review: Found 3 bugs (assert off-by-one, equal-children sift_down, O(n) pop) and 4 style issues. User applied all fixes. Grade before fixes: B−.