3.1. Rule 6: Implement Test-Driven Development with AI#

Frame your test requirements as behavioral specifications before requesting implementation code, and tell the AI what success looks like through concrete test cases. This test-first approach forces you to articulate edge cases, expected inputs/outputs, and failure modes that might otherwise be overlooke [1]. AI will respond better to specific test scenarios than vague functionality descriptions. By providing comprehensive test specifications, you guide the AI toward more robust, production-ready implementations. AI tools (such as chatbots or Github’s Spec Kit) can help develop these specifications in a way that will optimally guide the model. Keep a close eye on the tests that are generated, since the models will often modify the tests to pass without actually solving the problem rather than generating suitable code. Be especially aware that coding agents may generate placeholder data or mock implementations that merely satisfy the test structure without validating actual logic. In many cases, the AI may insert fabricated input values or dummy functions that appear to meet acceptance criteria but do not reflect true functionality. These “paper tests” can be dangerously misleading, seemingly passing as tests while masking broken or incomplete logic. In addition, whenever a bug is identified during your development cycle, ask the model to generate a test that catches the bug, to ensure that it’s not re-introduced in the future.

3.1.1. What separates positive from flawed examples#

Flawed examples ask for implementation first and maybe add tests later as an afterthought, testing only the happy path and perhaps one or two edge cases you thought of. You get code that technically works for anticipated uses but fails on boundary conditions, numerical edge cases, and subtle failure modes you didn’t consider. When bugs appear, you patch the code without adding tests, so the same bugs reappear later. The AI often modifies tests to make them pass rather than fixing the actual problem, or generates “paper tests” with placeholder data and mock implementations that merely satisfy test structure without validating actual logic.

Positive examples start with tests that specify expected behavior before requesting any implementation. You articulate success criteria through concrete test cases, leveraging AI to systematically identify potential failure modes, edge cases, boundary conditions, and scenarios where numerical instability could occur. You evaluate these AI-generated test suggestions critically; some won’t be relevant, but many reveal real gaps in your testing strategy. When the AI generates code, you can immediately verify it meets your specifications. When bugs appear, you first write a test that catches the bug, then fix the implementation, watching carefully to ensure the AI doesn’t modify tests to make them pass. Your test suite becomes comprehensive and your code robust against inputs you didn’t anticipate.


3.1.1.1. Example 1: Implementation First, Tests as Afterthought#

The user asks for code without specifying what success looks like. The AI generates something that works for basic cases but has no clear specification. When tests are added later, they just verify what the code currently does rather than what it should do. Edge cases are discovered in production. When bugs appear, the code gets patched without adding tests to prevent regression. The cycle repeats.


3.1.1.2. Example 2: Tests Define Behavior First#

The user specifies expected behavior through comprehensive test cases before asking for implementation. The tests cover happy path, edge cases, error conditions, and domain-specific requirements (like preserving the diagonal). The AI now has a clear specification of what success looks like. The implementation naturally handles all specified cases. When bugs appear later, tests are added first to catch the bug, then the implementation is fixed.


3.1.1.3. Example 3: Test-First Bug Prevention#

A bug is discovered in production. Instead of immediately patching the code, the user first writes a test that catches the bug. This ensures the bug won’t be reintroduced later. Then the implementation is fixed to pass the new test. The test suite grows to cover real-world failure modes. Each bug becomes a permanent regression test.


3.1.1.4. Example 4: Catching AI Test Manipulation#

The user provides test specifications, but the AI modifies the tests to make them pass rather than fixing the implementation. The user catches this by carefully reviewing what changed. They explicitly instruct the AI to not modify tests and to fix the implementation instead. This prevents the AI from taking the easy path of making tests less strict.


3.1.2. References#

[1]

Kent Beck. Test-Driven Development: By Example. Addison-Wesley, Boston, MA, 2003.