Project Plan - Initial Draft
This example is for synthetic or otherwise approved training data. Do not paste real EHR rows, individual-level records, PHI, PII, private paths, credentials, controlled-access data, or sensitive small-cell outputs into an LLM. Keep protected data outside the code repository and run real-data workflows only in the approved environment.
Data Description
The source file is tab-delimited and contains rows of BMI-related information. Each row includes:
- A person’s unique identifier (person_id)
- An encounter identifier (encounter_id)
- A numerical BMI value (bmi)
- The person’s height in centimeters (height_cm)
- The person’s weight in kilograms (weight_kg)
- The date of measurement (measurement_date)
This raw data from an EHR system was not collected for research purposes. There are multiple rows per person.
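As a minimal sketch of the file layout described above, the following reads tab-delimited rows with Python's standard csv module. The sample values are synthetic, and the header row, the ISO date format (YYYY-MM-DD), and the helper names to_float and read_rows are assumptions made for illustration; the real file may differ.

```python
import csv
import io
from datetime import date

# Synthetic rows only. The header row and ISO dates are assumptions.
SYNTHETIC_TSV = (
    "person_id\tencounter_id\tbmi\theight_cm\tweight_kg\tmeasurement_date\n"
    "P001\tE100\t24.2\t170.0\t70.0\t2021-03-01\n"
    "P001\tE101\t24.9\t170.0\t72.0\t2022-03-01\n"
    "P002\tE200\t\t160.0\t80.0\t2021-06-15\n"
)


def to_float(text):
    """Return a float, or None when the field is blank or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def read_rows(handle):
    """Parse tab-delimited text into dicts with typed fields."""
    rows = []
    for raw in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "person_id": raw["person_id"],
            "encounter_id": raw["encounter_id"],
            "bmi": to_float(raw["bmi"]),
            "height_cm": to_float(raw["height_cm"]),
            "weight_kg": to_float(raw["weight_kg"]),
            "measurement_date": date.fromisoformat(raw["measurement_date"]),
        })
    return rows


rows = read_rows(io.StringIO(SYNTHETIC_TSV))
```

Here P001 appears twice, which mirrors the multiple-rows-per-person structure, and the blank BMI for P002 is read as None so that later cleaning can decide what to do with it.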
Task to Be Accomplished
- Read the Data
- Clean and Filter
- Representative Record Selection
- Categorize BMI, Height, and Weight
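The cleaning, selection, and categorization steps could be sketched as follows, on synthetic records. The plan does not yet define the rules, so several choices here are assumptions: the BMI plausibility bounds (10 to 100), the use of the most recent measurement as the representative record, and the function names. The BMI classes are the standard WHO adult cut-offs; height and weight bins are left out because the plan does not specify them.

```python
from datetime import date

# Assumed plausibility bounds for BMI; the plan does not specify them.
BMI_MIN, BMI_MAX = 10.0, 100.0


def clean(rows):
    """Keep rows that have a BMI within the assumed plausible range."""
    return [
        r for r in rows
        if r["bmi"] is not None and BMI_MIN <= r["bmi"] <= BMI_MAX
    ]


def select_representative(rows):
    """Keep one row per person: the most recent measurement (assumed rule)."""
    latest = {}
    for r in rows:
        current = latest.get(r["person_id"])
        if current is None or r["measurement_date"] > current["measurement_date"]:
            latest[r["person_id"]] = r
    return list(latest.values())


def bmi_category(bmi):
    """Map a BMI value to the standard WHO adult classes."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obesity"


# Synthetic records only.
synthetic_rows = [
    {"person_id": "P001", "bmi": 24.2, "measurement_date": date(2021, 3, 1)},
    {"person_id": "P001", "bmi": 26.1, "measurement_date": date(2022, 3, 1)},
    {"person_id": "P002", "bmi": None, "measurement_date": date(2021, 6, 15)},
    {"person_id": "P003", "bmi": 250.0, "measurement_date": date(2021, 1, 5)},
    {"person_id": "P004", "bmi": 17.9, "measurement_date": date(2020, 9, 9)},
]

selected = select_representative(clean(synthetic_rows))
categorized = [{**r, "bmi_category": bmi_category(r["bmi"])} for r in selected]
```

Filtering before selection means a person whose latest row is invalid can still be represented by an earlier valid row. Whether that ordering is appropriate is a design decision for the plan to settle.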
Expected Output
- Cleaned Dataset
- Summary Table
- Data Dictionary
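One possible shape for the summary table and data dictionary is sketched below, again on synthetic records. The bmi_category column, the n_people header, the write_tsv helper, and the choice of tab-delimited output are assumptions for illustration; the column descriptions are taken from the Data Description section.

```python
import csv
import io
from collections import Counter

# Synthetic cleaned records; bmi_category is a hypothetical derived column.
cleaned = [
    {"person_id": "P001", "bmi": 26.1, "bmi_category": "overweight"},
    {"person_id": "P004", "bmi": 17.9, "bmi_category": "underweight"},
    {"person_id": "P005", "bmi": 27.4, "bmi_category": "overweight"},
]

# Summary table: number of people per BMI category.
summary = Counter(r["bmi_category"] for r in cleaned)

# Data dictionary: one entry per source column.
DATA_DICTIONARY = [
    ("person_id", "Person's unique identifier"),
    ("encounter_id", "Encounter identifier"),
    ("bmi", "Numerical BMI value"),
    ("height_cm", "Height in centimeters"),
    ("weight_kg", "Weight in kilograms"),
    ("measurement_date", "Date of measurement"),
]


def write_tsv(header, records):
    """Render a header and records as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buffer.getvalue()


summary_tsv = write_tsv(["bmi_category", "n_people"], sorted(summary.items()))
dictionary_tsv = write_tsv(["column", "description"], DATA_DICTIONARY)
```

On real data, counts as small as these would be the kind of small-cell output that the note at the top says to keep out of shared material.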