References

Reference list numbers are for display only. In-text citations link to stable source-key anchors such as #nice-ng246-bmi, so sources can be reordered without breaking citation links.

Advancing bioinformatics with large language models: components, applications and perspectives — PMC, accessed May 27, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC10802675/
Identifying and assessing overweight, obesity and central adiposity — NICE guideline NG246, accessed May 27, 2026, https://www.nice.org.uk/guidance/ng246/chapter/Identifying-and-assessing-overweight-obesity-and-central-adiposity
Package growthcleanr reference manual — R-universe, accessed May 27, 2026, https://cran.r-universe.dev/growthcleanr/doc/manual.html
AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation for Tabular Data — arXiv, accessed May 27, 2026, https://arxiv.org/abs/2412.06724
Why coding with LLMs can be harder than you think — Niraj Chauhan, accessed March 28, 2025, https://www.niraj.life/blog/why-coding-with-llms-can-be-harder-than-you-think/
AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science — arXiv, accessed May 27, 2026, https://arxiv.org/abs/2502.16395
“Prompt it”, not “Google it”: Prompt Engineering for Statistical Programmers and Biostatisticians — PharmaSUG, accessed March 28, 2025, https://pharmasug.org/proceedings/2024/SD/PharmaSUG-2024-SD-141.pdf
Prompt templates — LLM — Datasette, accessed March 28, 2025, https://llm.datasette.io/en/stable/templates.html
Synthetic data in medical research — PMC, accessed May 27, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC9951365/
LLM Prompting Techniques for Developers — Pedro Alonso, accessed March 28, 2025, https://www.pedroalonso.net/blog/llm-prompting-techniques-developers/
Using LLMs in Life Sciences: Building a Bioinformatics Assistant — Genestack, accessed March 28, 2025, https://genestack.com/news/blog/using-llms-in-life-sciences/
What to look for in a code review — Google Engineering Practices, accessed May 27, 2026, https://google.github.io/eng-practices/review/reviewer/looking-for.html
How to do a code review — Google Engineering Practices, accessed May 27, 2026, https://google.github.io/eng-practices/review/
ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research — PMC, accessed March 28, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC11845875/
LLM Evaluation: Comparing Four Methods to Automatically Detect Errors — Label Studio, accessed March 28, 2025, https://labelstud.io/blog/llm-evaluation-comparing-four-methods-to-automatically-detect-errors/
[Literature Review] A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why? — Moonlight, accessed March 28, 2025, https://www.themoonlight.io/review/a-deep-dive-into-large-language-model-code-generation-mistakes-what-and-why
How to analyze MEDICAL DATA in minutes with ChatGPT Code Interpreter — YouTube, accessed March 28, 2025, https://www.youtube.com/watch?v=e8bX48tx66Q
Refining LLMs Outputs with Iterative Consensus Ensemble (ICE) — medRxiv, accessed March 28, 2025, https://www.medrxiv.org/content/10.1101/2024.12.25.24319629v1.full-text
Ten quick tips for harnessing the power of ChatGPT in computational biology — PMC, accessed March 28, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10414555/
How I use ChatGPT as a Bioinformatics Scientist — YouTube, accessed March 28, 2025, https://m.youtube.com/watch?v=Kxy_G7CtRPY&pp=ygUHI2Jpb2dwdA%3D%3D
travistangvh/ChatGPT-Data-Science-Prompts — GitHub, accessed March 28, 2025, https://github.com/travistangvh/ChatGPT-Data-Science-Prompts
Creating Effective Prompts: Best Practices, Prompt Engineering, and How to Get the Most Out of Your LLM — Visible Thread, accessed March 28, 2025, https://www.visiblethread.com/blog/creating-effective-prompts-best-practices-prompt-engineering-and-how-to-get-the-most-out-of-your-llm/
Best practices for using GitHub Copilot — GitHub Docs, accessed May 27, 2026, https://docs.github.com/en/copilot/get-started/best-practices
Self-Refine: Iterative Refinement with Self-Feedback — arXiv, accessed May 27, 2026, https://arxiv.org/abs/2303.17651
Analysis of Code and Test-Code generated by Large Language Models — arXiv, accessed March 28, 2025, https://arxiv.org/html/2408.16601v1
Large language models can help with biostatistics and coding needed in radiology research — PubMed, accessed March 28, 2025, https://pubmed.ncbi.nlm.nih.gov/39406582/
An Empirical Study on the Potential of LLMs in Automated Software Refactoring — arXiv, accessed March 28, 2025, https://arxiv.org/html/2411.04444v1
Syntax — Tidyverse style guide, accessed May 27, 2026, https://style.tidyverse.org/syntax.html
Data Dictionary — Network of the National Library of Medicine, accessed May 27, 2026, https://www.nnlm.gov/guides/data-glossary/data-dictionary
Best practices for using GitHub Copilot to work on tasks — GitHub Docs, accessed May 27, 2026, https://docs.github.com/en/enterprise-cloud@latest/copilot/tutorials/cloud-agent/get-the-best-results
Best practices for Claude Code — Anthropic, accessed May 27, 2026, https://code.claude.com/docs/en/best-practices
Code generation — OpenAI API, accessed May 27, 2026, https://developers.openai.com/api/docs/guides/code-generation
Linux Foundation Announces the Formation of the Agentic AI Foundation, Anchored by MCP, goose, and AGENTS.md — The Linux Foundation, accessed May 27, 2026, https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation
Agent internet access — Codex web — OpenAI Developers, accessed May 27, 2026, https://developers.openai.com/codex/cloud/internet-access
Security Best Practices — Model Context Protocol, accessed May 27, 2026, https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices
OWASP Top 10 for LLM Applications 2025 — OWASP GenAI Security Project, accessed May 27, 2026, https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/
NOT-OD-25-081: Protecting Human Genomic Data when Developing Generative Artificial Intelligence Tools and Applications — NIH, accessed May 27, 2026, https://grants.nih.gov/grants/guide/notice-files/NOT-OD-25-081.html
45 CFR 164.514 — Other requirements relating to uses and disclosures of protected health information — eCFR, accessed May 27, 2026, https://www.ecfr.gov/current/title-45/part-164/section-164.514
Use of AI by Authors — International Committee of Medical Journal Editors, accessed May 27, 2026, https://www.icmje.org/recommendations/browse/artificial-intelligence/ai-use-by-authors.html
Structured model outputs — OpenAI API, accessed May 27, 2026, https://developers.openai.com/api/docs/guides/structured-outputs
Structured output — Gemini API — Google AI for Developers, accessed May 27, 2026, https://ai.google.dev/gemini-api/docs/structured-output
ellmer: Chat with Large Language Models — Posit, accessed May 27, 2026, https://ellmer.tidyverse.org/reference/ellmer-package.html
GitHub Copilot — RStudio User Guide — Posit, accessed May 27, 2026, https://docs.posit.co/ide/user/ide/guide/tools/copilot.html
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity — METR, accessed May 27, 2026, https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
We are Changing our Developer Productivity Experiment Design — METR, accessed May 27, 2026, https://metr.org/blog/2026-02-24-uplift-update/