Project Plan - Initial Draft

This example is for synthetic or otherwise approved training data. Do not paste real EHR rows, individual-level records, PHI, PII, private paths, credentials, controlled-access data, or sensitive small-cell outputs into an LLM. Keep protected data outside the code repository and run real-data workflows only in the approved environment.

Data Description

The source file is tab-delimited and contains rows of BMI-related information. Each row includes:

  • A person’s unique identifier (person_id)
  • An encounter identifier (encounter_id)
  • A numerical BMI value (bmi)
  • The person’s height in centimeters (height_cm)
  • The person’s weight in kilograms (weight_kg)
  • The date of measurement (measurement_date)
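
As a rough illustration of the layout described above, the following minimal sketch parses a few synthetic tab-delimited rows using only the Python standard library. It assumes a header row with the listed column names, ISO-formatted dates (YYYY-MM-DD), and blank cells for missing values; none of these are confirmed by the plan. The names SAMPLE_TSV, _to_float, and read_bmi_rows are hypothetical.

```python
import csv
import io
from datetime import date

# Synthetic example rows only; every value here is made up for illustration.
SAMPLE_TSV = (
    "person_id\tencounter_id\tbmi\theight_cm\tweight_kg\tmeasurement_date\n"
    "P001\tE100\t24.2\t170.0\t70.0\t2021-03-01\n"
    "P001\tE101\t24.9\t170.0\t72.0\t2022-04-15\n"
    "P002\tE200\t\t160.0\t80.0\t2021-06-30\n"
)


def _to_float(text):
    """Return a float, or None when the cell is blank or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def read_bmi_rows(handle):
    """Parse tab-delimited rows into dicts with typed values.

    Assumes a header row and ISO dates; both are assumptions, not
    facts stated in the plan.
    """
    rows = []
    for raw in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "person_id": raw["person_id"],
            "encounter_id": raw["encounter_id"],
            "bmi": _to_float(raw["bmi"]),
            "height_cm": _to_float(raw["height_cm"]),
            "weight_kg": _to_float(raw["weight_kg"]),
            "measurement_date": date.fromisoformat(raw["measurement_date"]),
        })
    return rows


rows = read_bmi_rows(io.StringIO(SAMPLE_TSV))
```

For a file on disk, the same function could be given a handle from open(path, newline="", encoding="utf-8"), where path points to synthetic or approved data only.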

These raw data come from an EHR system and were not collected for research purposes, so they need cleaning before analysis. Each person can have multiple rows.

Tasks to Be Accomplished

  1. Read the Data
  2. Clean and Filter
  3. Representative Record Selection
  4. Categorize BMI, Height, and Weight
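
Steps 2 through 4 could be sketched as follows on synthetic rows. The plan does not yet define the rules, so everything below is an assumption made for illustration: the plausibility bounds are hypothetical, "representative" is taken to mean the most recent measurement per person, and only BMI is categorized (using the standard WHO adult cut points of 18.5, 25, and 30 kg/m^2). Height and weight categories are left out because the plan gives no cut points for them.

```python
from datetime import date

# Hypothetical plausibility bounds; the plan does not specify any.
BMI_RANGE = (10.0, 80.0)
HEIGHT_CM_RANGE = (100.0, 230.0)
WEIGHT_KG_RANGE = (25.0, 300.0)


def in_range(value, bounds):
    """True when value is present and lies within the inclusive bounds."""
    return value is not None and bounds[0] <= value <= bounds[1]


def clean(rows):
    """Keep rows whose bmi, height_cm, and weight_kg are present and plausible."""
    return [
        r for r in rows
        if in_range(r["bmi"], BMI_RANGE)
        and in_range(r["height_cm"], HEIGHT_CM_RANGE)
        and in_range(r["weight_kg"], WEIGHT_KG_RANGE)
    ]


def select_representative(rows):
    """Keep one row per person: the latest measurement_date (assumed rule)."""
    latest = {}
    for r in rows:
        current = latest.get(r["person_id"])
        if current is None or r["measurement_date"] > current["measurement_date"]:
            latest[r["person_id"]] = r
    return list(latest.values())


def bmi_category(bmi):
    """Map a BMI value to a WHO adult class."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Synthetic rows only; values are made up.
rows = [
    {"person_id": "P001", "encounter_id": "E100", "bmi": 24.2,
     "height_cm": 170.0, "weight_kg": 70.0, "measurement_date": date(2021, 3, 1)},
    {"person_id": "P001", "encounter_id": "E101", "bmi": 25.6,
     "height_cm": 170.0, "weight_kg": 74.0, "measurement_date": date(2022, 4, 15)},
    {"person_id": "P002", "encounter_id": "E200", "bmi": 31.2,
     "height_cm": 160.0, "weight_kg": 80.0, "measurement_date": date(2021, 6, 30)},
    # Implausible weight and BMI: removed by clean().
    {"person_id": "P002", "encounter_id": "E201", "bmi": 312.5,
     "height_cm": 160.0, "weight_kg": 800.0, "measurement_date": date(2022, 1, 10)},
    # Missing BMI: removed by clean().
    {"person_id": "P003", "encounter_id": "E300", "bmi": None,
     "height_cm": 175.0, "weight_kg": 68.0, "measurement_date": date(2021, 9, 9)},
]

selected = select_representative(clean(rows))
categorized = [{**r, "bmi_category": bmi_category(r["bmi"])} for r in selected]
```

Filtering before selection means a person's latest row is skipped when it is implausible and an earlier valid row is used instead. Whether that is the desired behavior is a design decision the plan still needs to make.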

Expected Output

  1. Cleaned Dataset
  2. Summary Table
  3. Data Dictionary
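
One possible shape for the summary table and the data dictionary is sketched below, again on synthetic rows. The bmi_category column, the choice of counts and percentages as summary statistics, and the helper names summarize and to_tsv are all hypothetical. The column descriptions and units are taken from the Data Description, except the BMI unit (kg/m^2), which follows from the standard definition of BMI.

```python
import csv
import io
from collections import Counter

# Synthetic cleaned rows, one per person; bmi_category is an assumed derived column.
cleaned = [
    {"person_id": "P001", "bmi": 25.6, "bmi_category": "overweight"},
    {"person_id": "P002", "bmi": 31.2, "bmi_category": "obese"},
    {"person_id": "P003", "bmi": 22.0, "bmi_category": "normal"},
    {"person_id": "P004", "bmi": 27.3, "bmi_category": "overweight"},
]


def summarize(rows):
    """Count persons per BMI category and report each count as a percentage."""
    counts = Counter(r["bmi_category"] for r in rows)
    total = sum(counts.values())
    return [
        {"bmi_category": cat, "n": n, "percent": round(100.0 * n / total, 1)}
        for cat, n in sorted(counts.items())
    ]


# Descriptions follow the Data Description section of this plan.
DATA_DICTIONARY = [
    {"column": "person_id", "unit": "", "description": "Person's unique identifier"},
    {"column": "encounter_id", "unit": "", "description": "Encounter identifier"},
    {"column": "bmi", "unit": "kg/m^2", "description": "Numerical BMI value"},
    {"column": "height_cm", "unit": "cm", "description": "Person's height"},
    {"column": "weight_kg", "unit": "kg", "description": "Person's weight"},
    {"column": "measurement_date", "unit": "", "description": "Date of measurement"},
]


def to_tsv(records):
    """Serialize a non-empty list of dicts as tab-delimited text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


summary = summarize(cleaned)
summary_tsv = to_tsv(summary)
dictionary_tsv = to_tsv(DATA_DICTIONARY)
```

With real data, a summary like this should be reviewed for small cell counts before it leaves the approved environment.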