4.2. Rule 9: Critically Review Generated Code#

Be skeptical of the AI’s claims of success; models tend to declare success even when they have not actually solved the problem. Always test the solution independently. Read and understand the code to ensure it solves the problem in ways that make sense for your domain and match your prior expectation of how the problem should be solved (e.g., how you anticipated the solution looking, based on the pseudocode or architecture schematics you developed under Rule 4). AI-generated code requires careful human review to ensure scientific appropriateness, methodological soundness, and alignment with domain standards.

4.2.1. What separates positive from flawed examples#

Flawed examples accept AI-generated code at face value because it runs without errors and the AI claims it’s correct. You don’t actually read through the implementation to verify that it does what you think it does. The code might use inappropriate algorithms, violate domain conventions, or produce subtly wrong results. You discover the problems later, possibly after publishing analyses based on the flawed code.

Positive examples involve thorough critical review of generated code. You read the implementation and verify it matches your conceptual understanding of the solution. You check that algorithms are scientifically appropriate for your domain. You verify outputs make sense and align with established methods. You catch issues before the code gets used for real analyses.


4.2.1.1. Example 1: Accepting Code Without Review#

The user asks for an implementation, the AI provides code that runs without errors, and the user accepts it because the AI says it’s correct. The user never actually reads the implementation to understand what it’s doing. Later, the code produces scientifically invalid results because it used an inappropriate method. The problem could have been caught with basic code review.
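As a hypothetical sketch (the function name and the specific bug are invented for illustration), this is the kind of code that runs cleanly and looks plausible, yet that a basic read-through would flag:

```python
import numpy as np

def confidence_interval_95(samples):
    """Hypothetical AI-generated helper: executes without errors and
    looks reasonable, but is subtly wrong for small scientific samples."""
    mean = np.mean(samples)
    # What review would catch: np.std defaults to the *population*
    # standard deviation (ddof=0), biasing the interval low; sample
    # statistics should use ddof=1. Using z = 1.96 where a t critical
    # value is standard for small n compounds the problem.
    sem = np.std(samples) / np.sqrt(len(samples))
    return mean - 1.96 * sem, mean + 1.96 * sem

data = [2.1, 2.5, 1.9, 2.3, 2.2]
print(confidence_interval_95(data))  # runs cleanly, so it "looks" correct
```

Nothing here raises an error, so accepting the code because it executes, and because the AI asserts correctness, lets a biased estimate flow into every downstream analysis.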


4.2.1.2. Example 2: Critical Review Catches Methodological Issues#

The user receives code from AI and immediately reviews it critically. They notice the implementation doesn’t match standard practice in their field. They ask clarifying questions and verify the approach is scientifically appropriate. They catch a methodological issue before using the code for real analysis.
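As a hedged illustration (the data and the field convention are invented for this sketch), here is the kind of mismatch with standard practice that a critical read can surface:

```python
import numpy as np

# Hypothetical dose-response data with one extreme value -- the kind of
# case where a field's convention is a rank-based (robust) correlation.
dose = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
resp = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 60.0])

# What the AI-generated code did: Pearson correlation, outlier-sensitive.
pearson = np.corrcoef(dose, resp)[0, 1]

def ranks(x):
    """Map each value to its rank (1..n); no tie handling needed here."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

# What the reviewer flags as standard practice here: Spearman correlation,
# i.e., Pearson correlation on the ranks.
spearman = np.corrcoef(ranks(dose), ranks(resp))[0, 1]

print(f"Pearson:  {pearson:.3f}")   # distorted by the single outlier
print(f"Spearman: {spearman:.3f}")  # 1.0: the response is strictly monotonic
```

Both numbers are "valid" outputs; only a reviewer who knows the domain convention notices that the AI picked the wrong one for this data.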


4.2.1.3. Example 3: Verifying Implementation Matches Expectations#

The user has a clear expectation of how the algorithm should work (from Rule 4). When reviewing the generated code, they notice the implementation doesn’t match their understanding. They identify the discrepancy and get it corrected before using the code.
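A minimal sketch, assuming the user's Rule 4 pseudocode specified a centered smoothing window while the generated code (hypothetical here) implemented a trailing one:

```python
import numpy as np

signal = np.array([0.0, 1.0, 10.0, 1.0, 0.0])

def ai_smooth(x, w=3):
    # Hypothetical AI implementation: a *trailing* window average.
    return np.array([x[max(0, i - w + 1): i + 1].mean() for i in range(len(x))])

def expected_smooth(x, w=3):
    # What the user's pseudocode specified: a *centered* window average.
    h = w // 2
    return np.array([x[max(0, i - h): i + h + 1].mean() for i in range(len(x))])

# The discrepancy a careful review catches: the trailing version shifts
# features later in time, which would misalign events in the analysis.
print(np.argmax(ai_smooth(signal)))        # 3: peak shifted one sample late
print(np.argmax(expected_smooth(signal)))  # 2: peak aligned with the raw data
```

Both functions "smooth the signal" and neither raises an error; only comparing the implementation against the expected behavior reveals the mismatch.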


4.2.1.4. Example 4: Testing Against Known Results#

The user generates code and immediately tests it against cases where they know the correct answer. This catches implementation errors that might not be obvious from reading the code alone. The test reveals the implementation is computing the wrong thing despite looking reasonable.
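For example, a known-answer test might look like the following sketch (the integrator and its bug are hypothetical): the integral of x² over [0, 1] is exactly 1/3, so any correct implementation must reproduce it to within its stated accuracy.

```python
def ai_integrate(f, a, b, n=1000):
    """Hypothetical AI-generated 'trapezoidal' integrator: reads plausibly,
    but it is actually a left Riemann sum (the endpoints are mis-weighted)."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

# Known-answer test: the integral of x**2 over [0, 1] is exactly 1/3.
result = ai_integrate(lambda x: x * x, 0.0, 1.0)
error = abs(result - 1.0 / 3.0)
print(f"result={result:.6f}, error={error:.2e}")

# The error is on the order of 5e-4 -- far above genuine trapezoidal
# accuracy at this step size -- revealing that the implementation is
# computing the wrong thing despite looking reasonable on a read-through.
```

A reviewer skimming the code might miss the missing endpoint weights; the known-answer test catches the error mechanically.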