James A Rosen

Staff Engineer, AI Systems

James A Rosen, pixelated, wearing a bow tie

Python TDD: HashSet and MinHeap

Python, TDD, Data Structures

Overview

An off-curriculum Python/TDD session. James is experienced in TypeScript/JavaScript but new to Python. We set up a uv project, wrote pytest tests for two data structures, and James implemented both. Code reviews after each implementation surfaced Python idioms and real bugs.

Questions

Project Setup with uv and pytest

James set up a minimal Python project using uv init --no-package and uv add --dev pytest, then ran tests with uv run pytest -v.

Delivery: N/A (hands-on setup) Content: Clean — correct commands, correct mental model of what --no-package does.

HashSet Implementation

James implemented a HashSet backed by an array of buckets with chaining. All tests passed.

Delivery: N/A Content grade: A−

Correct use of __contains__, __len__, and exception translation (ValueErrorKeyError in remove). Two small feedback points:

Python Type-Agnostic Hashing

James asked how to detect whether an item is a number or string, and how to convert a string to a number (referencing the TypeScript isNaN pattern). Key insight:

Python’s hash() is already polymorphic — hash(42), hash("hello"), hash((1,2)) all return integers. The likely issue was using item % self.capacity directly instead of hash(item) % self.capacity. Swapping that one call fixed the failing test.

Content grade: A — once the pattern was clear, James immediately identified the fix.

MinHeap Implementation

James implemented a MinHeap with push/pop/peek, _sift_up, _sift_down, _heapify, and a _assert_heap debugging method. Showed good instinct for private helpers and defensive assertions.

Delivery: N/A Content grade: B− — structure solid, but three real bugs found in review (see below).

MinHeap Code Review

Three bugs found:

  1. _assert_heap off-by-oneif bi < length - 1 skipped the last valid parent-child pair. For a 2-element heap [5, 1], the assertion never fired. Fix: bi < length.

  2. _sift_down equal-children case — when both children are equal and less than the parent (e.g. [5, 3, 3]), b < a and b < c fails because b == c. Neither branch fires; the heap property is violated silently. Fix: change b < c to b <= c in the first branch.

  3. pop() is O(n)list.pop(0) and list.insert(0, ...) both shift every element. Standard fix: swap root with last element, list.pop() (O(1)), then sift down.

Style issues also addressed:

Session transcript

Setup: User requested an off-curriculum Python/TDD lesson as a beginner. Set up uv project, wrote pytest tests for HashSet, user implemented.

HashSet: All tests passed. Asked about detecting number vs string type and converting string to number (TypeScript isNaN pattern). Explained Python’s hash() is already type-agnostic — hash(item) % capacity works for all hashable types. User identified that they were using item % capacity directly. Fixed; test passed.

HashSet review: Two minor style issues — sum([...]) vs generator expression, and i vs _ for unused loop variable. Grade: A−.

MinHeap: Proposed MinHeap as next data structure. Wrote 4 concrete tests and 4 stubs. User implemented, including _heapify, _sift_up, _sift_down, and _assert_heap.

MinHeap review: Found 3 bugs (assert off-by-one, equal-children sift_down, O(n) pop) and 4 style issues. User applied all fixes. Grade before fixes: B−.