Chapter 5: Making it Better — Refine Code & Add Features

Building on the version control workflow introduced in Chapter 4, commit each refinement as you make it and push it to your repository. This captures an incremental history of the improvements (and partial reverts) made in collaboration with the LLM. Whether you are introducing new features or simply tweaking the logic, you can:

Create a new commit after each LLM-generated update or fix.
Write a concise commit message describing which improvements or features were added.
Compare your latest code against previous versions using git diff or a GitHub pull request.

With that in mind, after the initial review of the LLM-generated code, you will likely have a list of potential improvements or missing elements. This marks the beginning of the Refine Code & Add Features step, where you iteratively enhance the AI’s initial output to better align with your research needs and coding standards [ICE preprint].

flowchart TB
    accTitle: Focused refinement workflow
    accDescr: A vertical refinement loop showing how reviewed code becomes a prioritized wishlist, one focused prompt, a tested change, and a committed improvement before the next feature is selected.
    A[Reviewed code and issue list] --> B[Prioritize one change]
    B --> C[Write focused refinement prompt]
    C --> D[Apply generated change]
    D --> E[Test targeted behavior]
    E --> F{Improvement works?}
    F -->|No| G[Narrow request or revert]
    G --> B
    F -->|Yes| H[Commit improvement]
    H --> I[Select next feature]
    I --> B

The first step in this refinement process is to create a wishlist of improvements [computational biology tips]. Based on your review in the previous chapter, jot down any aspects of the code that you would like to enhance or any features that are currently missing but would be beneficial. For our BMI harmonization example, this wishlist might include items such as implementing more robust handling of missing data, adding more detailed comments to explain the code’s logic, or incorporating the calculation of an additional metric like height percentile into the output dataset [bioinformatics ChatGPT video]. Identifying these desired changes provides a clear direction for the subsequent refinement efforts.

Once you have your wishlist, it is generally best to address issues iteratively [Self-Refine]. Instead of trying to implement all the changes at once, tackle them one at a time. If a modification breaks something, you then know which change caused it, and the LLM has a narrower task to get right. Focus on one major improvement or missing feature in each iteration. For instance, you might first ask the LLM to add the BMI category definitions if they were absent in the initial code, and then in the next iteration, ask it to include the summary table of BMI category counts.
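The two iterations in that example might produce something like the following sketch. The function names `bmi_category` and `summarize_categories` are hypothetical, and the cutoffs are the standard WHO adult thresholds (18.5, 25, 30), which are an assumption here; the category definitions in your own project may differ.

```python
from collections import Counter

# Iteration 1: add BMI category definitions.
# Cutoffs are the WHO adult thresholds, assumed for this sketch.
def bmi_category(bmi):
    """Map a BMI value to a category label; None is reported as 'missing'."""
    if bmi is None:
        return "missing"
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"

# Iteration 2: add a summary table of category counts.
def summarize_categories(bmi_values):
    """Count how many values fall into each BMI category."""
    return Counter(bmi_category(b) for b in bmi_values)

# Synthetic values only; no real participant data.
counts = summarize_categories([17.0, 22.4, 27.9, 31.2, None, 24.9])
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

Because each iteration adds one function, each can be reviewed, tested, and committed separately.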

Refine one meaningful change at a time. Small prompts are easier to review, test, and revert than broad “make it better” requests.

When you are ready to implement a change, it is crucial to re-prompt with specific instructions [data science prompts repo]. Clearly state the exact part of the code that you want the LLM to modify or the new feature that you want it to add. Providing precise instructions will help to avoid ambiguity and ensure that the LLM understands exactly what you are asking for. For example, instead of a vague prompt like “improve the data cleaning,” you could say: “Please modify the code in the ‘Data Cleaning’ section to also remove BMI entries where either the height or the weight is recorded as zero.”
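A change requested this precisely tends to be small enough to review at a glance. A minimal sketch of what the LLM might return, assuming records are dictionaries with hypothetical `height_cm` and `weight_kg` keys (your pipeline's data structure and column names may differ):

```python
def remove_zero_measurements(records):
    """Drop records where height or weight is recorded as zero.

    Assumes each record is a dict with 'height_cm' and 'weight_kg' keys
    (hypothetical names chosen for this sketch). Missing values (None)
    are left untouched so they can be handled by a separate step.
    """
    return [
        r for r in records
        if r["height_cm"] != 0 and r["weight_kg"] != 0
    ]

# Synthetic records only.
sample = [
    {"id": 1, "height_cm": 170.0, "weight_kg": 65.0},
    {"id": 2, "height_cm": 0.0, "weight_kg": 70.0},
    {"id": 3, "height_cm": 160.0, "weight_kg": 0.0},
]
cleaned = remove_zero_measurements(sample)
print([r["id"] for r in cleaned])
```

The function returns a new list rather than modifying `sample`, which keeps the original input available for comparison during review.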

Broad refinement prompts can silently change working behavior. Ask for a targeted change and inspect the diff before accepting it.
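In practice `git diff` or a pull request view is the usual way to inspect a change. For illustration, the same idea can be sketched with Python's standard `difflib` module. The two source strings below are hypothetical stand-ins for the file before and after an LLM edit.

```python
import difflib

# Hypothetical "before" and "after" versions of a cleaning function.
before = '''def clean(records):
    return [r for r in records if r["bmi"] is not None]
'''

after = '''def clean(records):
    return [
        r for r in records
        if r["bmi"] is not None and r["height_cm"] != 0 and r["weight_kg"] != 0
    ]
'''

def show_diff(old_source, new_source):
    """Return a unified diff between two versions of a source file."""
    return "".join(
        difflib.unified_diff(
            old_source.splitlines(keepends=True),
            new_source.splitlines(keepends=True),
            fromfile="before.py",
            tofile="after.py",
        )
    )

diff_text = show_diff(before, after)
print(diff_text)
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added. If the diff touches code you did not ask the LLM to change, that is a signal to narrow the prompt or revert.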

To further illustrate this process, consider some example refinement prompts that operate on code rather than data. You might prompt the LLM to “Modify the BMI-cleaning function so it flags rows where height or weight is zero, writes an auditable count to the local summary report, and adds a synthetic test fixture for this case. Do not inspect or request real data.” Or, if you noticed that the code didn’t include the calculation for height percentile, you could prompt: “Please add a height_percentile calculation based on the height_cm column, preserve the existing input/output contract, and update synthetic tests for the new column.” These specific prompts guide the LLM to make targeted changes and add the desired functionalities without exposing individual-level records.
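The outcome of those two prompts could look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the `zero_measurement` flag, the dictionary-based records, and above all the definition of height percentile, which is computed as a simple within-sample rank rather than against a reference growth chart. The fixtures are synthetic.

```python
from bisect import bisect_right

def flag_zero_measurements(records):
    """Flag rows with zero height or weight; return (flagged_rows, count).

    Rows are flagged, not dropped, so the count can be audited later.
    """
    flagged = []
    for r in records:
        is_zero = r["height_cm"] == 0 or r["weight_kg"] == 0
        flagged.append({**r, "zero_measurement": is_zero})
    count = sum(r["zero_measurement"] for r in flagged)
    return flagged, count

def add_height_percentile(records):
    """Add 'height_percentile': percent of rows with height <= this row's.

    Assumed definition (within-sample rank), not a growth-chart lookup.
    """
    heights = sorted(r["height_cm"] for r in records)
    n = len(heights)
    return [
        {**r, "height_percentile": 100.0 * bisect_right(heights, r["height_cm"]) / n}
        for r in records
    ]

# Synthetic fixtures: no real participant data.
ZERO_FIXTURE = [
    {"id": "s1", "height_cm": 150.0, "weight_kg": 50.0},
    {"id": "s2", "height_cm": 0.0, "weight_kg": 60.0},
    {"id": "s3", "height_cm": 170.0, "weight_kg": 0.0},
]
HEIGHT_FIXTURE = [
    {"id": "h1", "height_cm": 150.0, "weight_kg": 50.0},
    {"id": "h2", "height_cm": 180.0, "weight_kg": 80.0},
    {"id": "h3", "height_cm": 160.0, "weight_kg": 55.0},
    {"id": "h4", "height_cm": 170.0, "weight_kg": 70.0},
]

def test_flag_zero_measurements():
    flagged, count = flag_zero_measurements(ZERO_FIXTURE)
    assert count == 2
    assert len(flagged) == len(ZERO_FIXTURE)  # no rows dropped

def test_height_percentile_preserves_contract():
    result = add_height_percentile(HEIGHT_FIXTURE)
    assert [r["id"] for r in result] == ["h1", "h2", "h3", "h4"]  # order kept
    assert [r["height_percentile"] for r in result] == [25.0, 100.0, 50.0, 75.0]

test_flag_zero_measurements()
test_height_percentile_preserves_contract()
```

The count is returned to the caller instead of being written to a file here, so how it reaches the summary report is left to the surrounding pipeline. Both functions copy each row and only add a key, which is one way to preserve the existing input/output contract.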

In essence, code refinement is an integral part of the LLM-assisted workflow [Alonso]. By addressing issues and adding features one at a time through clear, specific prompts, researchers can shape the AI-generated code into a solution that works, meets their research needs, and holds up to their quality standards. The code evolves collaboratively, with the LLM acting as an assistant guided by your expertise.

Remember to commit and push each refined version to your repository. This incremental approach keeps a clear record of how your code evolved and ensures that every improvement, small or large, is captured and can be revisited or merged as needed.