References

Reference list numbers are for display only. In-text citations link to stable source-key anchors such as #nice-ng246-bmi, so sources can be reordered without breaking citation links.

  1. Advancing bioinformatics with large language models: components, applications and perspectives — PMC, accessed May 27, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC10802675/

  2. Identifying and assessing overweight, obesity and central adiposity — NICE guideline NG246, accessed May 27, 2026, https://www.nice.org.uk/guidance/ng246/chapter/Identifying-and-assessing-overweight-obesity-and-central-adiposity

  3. Package growthcleanr reference manual — R-universe, accessed May 27, 2026, https://cran.r-universe.dev/growthcleanr/doc/manual.html

  4. AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation for Tabular Data — arXiv, accessed May 27, 2026, https://arxiv.org/abs/2412.06724

  5. Why coding with LLMs can be harder than you think — Niraj Chauhan, accessed March 28, 2025, https://www.niraj.life/blog/why-coding-with-llms-can-be-harder-than-you-think/

  6. AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science — arXiv, accessed May 27, 2026, https://arxiv.org/abs/2502.16395

  7. “Prompt it”, not “Google it”: Prompt Engineering for Statistical Programmers and Biostatisticians — PharmaSUG, accessed March 28, 2025, https://pharmasug.org/proceedings/2024/SD/PharmaSUG-2024-SD-141.pdf

  8. Prompt templates — LLM — Datasette, accessed March 28, 2025, https://llm.datasette.io/en/stable/templates.html

  9. Synthetic data in medical research — PMC, accessed May 27, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC9951365/

  10. LLM Prompting Techniques for Developers — Pedro Alonso, accessed March 28, 2025, https://www.pedroalonso.net/blog/llm-prompting-techniques-developers/

  11. Using LLMs in Life Sciences: Building a Bioinformatics Assistant — Genestack, accessed March 28, 2025, https://genestack.com/news/blog/using-llms-in-life-sciences/

  12. What to look for in a code review — Google Engineering Practices, accessed May 27, 2026, https://google.github.io/eng-practices/review/reviewer/looking-for.html

  13. How to do a code review — Google Engineering Practices, accessed May 27, 2026, https://google.github.io/eng-practices/review/

  14. ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research — PMC, accessed March 28, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC11845875/

  15. LLM Evaluation: Comparing Four Methods to Automatically Detect Errors — Label Studio, accessed March 28, 2025, https://labelstud.io/blog/llm-evaluation-comparing-four-methods-to-automatically-detect-errors/

  16. [Literature Review] A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why? — Moonlight, accessed March 28, 2025, https://www.themoonlight.io/review/a-deep-dive-into-large-language-model-code-generation-mistakes-what-and-why

  17. How to analyze MEDICAL DATA in minutes with ChatGPT Code Interpreter. — YouTube, accessed March 28, 2025, https://www.youtube.com/watch?v=e8bX48tx66Q

  18. Refining LLMs Outputs with Iterative Consensus Ensemble (ICE) — medRxiv, accessed March 28, 2025, https://www.medrxiv.org/content/10.1101/2024.12.25.24319629v1.full-text

  19. Ten quick tips for harnessing the power of ChatGPT in computational biology — PMC, accessed March 28, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10414555/

  20. How I use ChatGPT as a Bioinformatics Scientist — YouTube, accessed March 28, 2025, https://m.youtube.com/watch?v=Kxy_G7CtRPY&pp=ygUHI2Jpb2dwdA%3D%3D

  21. travistangvh/ChatGPT-Data-Science-Prompts — GitHub, accessed March 28, 2025, https://github.com/travistangvh/ChatGPT-Data-Science-Prompts

  22. Creating Effective Prompts: Best Practices, Prompt Engineering, and How to Get the Most Out of Your LLM — Visible Thread, accessed March 28, 2025, https://www.visiblethread.com/blog/creating-effective-prompts-best-practices-prompt-engineering-and-how-to-get-the-most-out-of-your-llm/

  23. Best practices for using GitHub Copilot — GitHub Docs, accessed May 27, 2026, https://docs.github.com/en/copilot/get-started/best-practices

  24. Self-Refine: Iterative Refinement with Self-Feedback — arXiv, accessed May 27, 2026, https://arxiv.org/abs/2303.17651

  25. Analysis of Code and Test-Code generated by Large Language Models — arXiv, accessed March 28, 2025, https://arxiv.org/html/2408.16601v1

  26. Large language models can help with biostatistics and coding needed in radiology research — PubMed, accessed March 28, 2025, https://pubmed.ncbi.nlm.nih.gov/39406582/

  27. An Empirical Study on the Potential of LLMs in Automated Software Refactoring — arXiv, accessed March 28, 2025, https://arxiv.org/html/2411.04444v1

  28. Syntax — Tidyverse style guide, accessed May 27, 2026, https://style.tidyverse.org/syntax.html

  29. Data Dictionary — Network of the National Library of Medicine, accessed May 27, 2026, https://www.nnlm.gov/guides/data-glossary/data-dictionary

  30. Best practices for using GitHub Copilot to work on tasks — GitHub Docs, accessed May 27, 2026, https://docs.github.com/en/enterprise-cloud@latest/copilot/tutorials/cloud-agent/get-the-best-results

  31. Best practices for Claude Code — Anthropic, accessed May 27, 2026, https://code.claude.com/docs/en/best-practices

  32. Code generation — OpenAI API, accessed May 27, 2026, https://developers.openai.com/api/docs/guides/code-generation

  33. Linux Foundation Announces the Formation of the Agentic AI Foundation, Anchored by MCP, goose, and AGENTS.md — The Linux Foundation, accessed May 27, 2026, https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation

  34. Agent internet access — Codex web — OpenAI Developers, accessed May 27, 2026, https://developers.openai.com/codex/cloud/internet-access

  35. Security Best Practices — Model Context Protocol, accessed May 27, 2026, https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices

  36. OWASP Top 10 for LLM Applications 2025 — OWASP GenAI Security Project, accessed May 27, 2026, https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/

  37. NOT-OD-25-081: Protecting Human Genomic Data when Developing Generative Artificial Intelligence Tools and Applications — NIH, accessed May 27, 2026, https://grants.nih.gov/grants/guide/notice-files/NOT-OD-25-081.html

  38. 45 CFR 164.514 — Other requirements relating to uses and disclosures of protected health information — eCFR, accessed May 27, 2026, https://www.ecfr.gov/current/title-45/part-164/section-164.514

  39. Use of AI by Authors — International Committee of Medical Journal Editors, accessed May 27, 2026, https://www.icmje.org/recommendations/browse/artificial-intelligence/ai-use-by-authors.html

  40. Structured model outputs — OpenAI API, accessed May 27, 2026, https://developers.openai.com/api/docs/guides/structured-outputs

  41. Structured output — Gemini API — Google AI for Developers, accessed May 27, 2026, https://ai.google.dev/gemini-api/docs/structured-output

  42. ellmer: Chat with Large Language Models — Posit, accessed May 27, 2026, https://ellmer.tidyverse.org/reference/ellmer-package.html

  43. GitHub Copilot — RStudio User Guide — Posit, accessed May 27, 2026, https://docs.posit.co/ide/user/ide/guide/tools/copilot.html

  44. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity — METR, accessed May 27, 2026, https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

  45. We are Changing our Developer Productivity Experiment Design — METR, accessed May 27, 2026, https://metr.org/blog/2026-02-24-uplift-update/