To complete this assignment, please refer to the dataset linked below.
Customer_Survey_Dataset
🔧 Part A: Hands-On Data Cleaning Tasks
Dataset: Use any simple CSV file (e.g., customer information, survey data) with numeric and categorical columns.
Remove Duplicates
Identify and drop duplicate rows from the dataset.
Show the number of rows before and after.
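
A minimal sketch of this task, assuming pandas is available. The small in-memory table and its column names (`customer_id`, `age`, `city`) are made up for illustration; in practice the frame would come from `pd.read_csv` on your own file.

```python
import pandas as pd

# Hypothetical sample data; replace with pd.read_csv("your_file.csv").
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 4],
    "age": [25, 31, 31, 47, 52, 52],
    "city": ["Kyiv", "Lviv", "Lviv", "Odesa", "Dnipro", "Dnipro"],
})

rows_before = len(df)

# duplicated() flags every repeat of a row that has already appeared.
n_duplicates = int(df.duplicated().sum())

# keep="first" retains the first occurrence of each fully identical row.
df_clean = df.drop_duplicates(keep="first").reset_index(drop=True)
rows_after = len(df_clean)

print(f"Rows before: {rows_before}")
print(f"Duplicate rows found: {n_duplicates}")
print(f"Rows after: {rows_after}")
```

If only some columns define a duplicate (for example the ID), `drop_duplicates(subset=[...])` narrows the comparison to those columns.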
Handle Missing Values
Find all missing values.
Replace missing numerical values with mean and categorical values with mode.
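
One possible way to do this with pandas is sketched below. The sample frame and its columns (`age`, `income`, `gender`) are assumptions for illustration only. Numeric columns are filled with their mean, and every other column with its most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in numeric and categorical columns.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "income": [3000.0, 4000.0, np.nan, 5000.0],
    "gender": ["F", "M", None, "F"],
})

# Count missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

filled = df.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        # Numeric column: impute with the mean of the observed values.
        filled[col] = filled[col].fillna(filled[col].mean())
    else:
        # Categorical column: impute with the mode (first one if tied).
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

print(filled)
```

When a categorical column has several equally frequent values, `mode()` returns all of them, so taking `.iloc[0]` is an arbitrary but deterministic tie-break.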
Detect and Handle Data Errors
Manually introduce one incorrect age (e.g., -5 or 135).
Detect the error with a logical validity check (for example, an age must fall within a plausible range) and handle it.
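
The following sketch shows one way to approach this. The plausible age range of 0 to 120 and the choice to replace the bad value with the median of the valid ages are assumptions, not requirements of the task; dropping the row or looking up the true value would be equally defensible.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan", "Eve"],
    "age": [29.0, 41.0, 35.0, 52.0, 38.0],
})

# Manually introduce one impossible age.
df.loc[1, "age"] = -5

# Assumed validity rule: a human age lies between 0 and 120 inclusive.
MIN_AGE, MAX_AGE = 0, 120
invalid = ~df["age"].between(MIN_AGE, MAX_AGE)
print("Rows failing the age check:")
print(df[invalid])

# Treat the invalid value as missing, then impute with the median
# of the remaining valid ages (robust to any other outliers).
df.loc[invalid, "age"] = np.nan
df["age"] = df["age"].fillna(df["age"].median())
print(df)
```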
Correct Formatting Issues
Format a column with inconsistent date formats (e.g., "2024-01-01", "01/01/2024").
Standardize all dates to YYYY-MM-DD.
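
A hedged sketch of the date clean-up follows. It assumes the column contains only two formats, ISO `YYYY-MM-DD` and slash-separated `DD/MM/YYYY`; if your slash dates are `MM/DD/YYYY`, change the second format string accordingly. The column name `signup_date` is hypothetical.

```python
import pandas as pd

# Hypothetical column mixing two date formats.
df = pd.DataFrame({
    "signup_date": ["2024-01-01", "01/01/2024", "2024-03-15", "25/12/2023"],
})

# Parse each known format explicitly; values that do not match become NaT.
iso = pd.to_datetime(df["signup_date"], format="%Y-%m-%d", errors="coerce")
slash = pd.to_datetime(df["signup_date"], format="%d/%m/%Y", errors="coerce")

# Use the ISO parse where it succeeded, otherwise the slash parse.
parsed = iso.fillna(slash)

# Any value still NaT matched neither format and needs manual review.
unparsed_count = int(parsed.isna().sum())

# Write every date back as a YYYY-MM-DD string.
df["signup_date_std"] = parsed.dt.strftime("%Y-%m-%d")
print(df)
```

Parsing each format explicitly avoids silently guessing between day-first and month-first for ambiguous values such as `01/02/2024`.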
Compare Two Data Sources
Create two mini datasets with a record mismatch (e.g., salary of the same person differs).
Detect the discrepancy and choose which value to retain, providing justification.
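
As an illustration, the sketch below builds two small tables, finds the conflicting salary, and resolves it. The table names (`hr`, `payroll`), the `last_updated` column, and the rule "keep the value from the more recently updated record" are all assumptions chosen for this example; another justification (such as trusting the system of record) could be just as valid.

```python
import pandas as pd

# Two hypothetical sources describing the same employees.
hr = pd.DataFrame({
    "employee_id": [101, 102, 103],
    "salary": [50000, 62000, 58000],
    "last_updated": pd.to_datetime(["2024-01-10", "2024-02-01", "2024-01-20"]),
})
payroll = pd.DataFrame({
    "employee_id": [101, 102, 103],
    "salary": [50000, 65000, 58000],
    "last_updated": pd.to_datetime(["2024-01-10", "2024-03-15", "2024-01-20"]),
})

# Align the two sources on the shared key.
merged = hr.merge(payroll, on="employee_id", suffixes=("_hr", "_payroll"))

# Rows where the two sources disagree on salary.
mismatch = merged[merged["salary_hr"] != merged["salary_payroll"]]
print(mismatch[["employee_id", "salary_hr", "salary_payroll"]])

# Assumed resolution rule: retain the value from the newer record.
payroll_newer = merged["last_updated_payroll"] > merged["last_updated_hr"]
merged["salary_final"] = merged["salary_hr"].where(
    ~payroll_newer, merged["salary_payroll"]
)
print(merged[["employee_id", "salary_final"]])
```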
🧮 Part B: Hands-On Transformation Tasks
Standardization (Z-score Normalization)
Apply standardization to a numeric column.
Show the original mean and standard deviation, and verify the new column’s mean is ~0 and std dev is ~1.
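
A minimal sketch of z-score standardization, z = (x - mean) / std, on a made-up `income` column. It uses the population standard deviation (`ddof=0`) for both the transformation and the check; pandas defaults to the sample standard deviation (`ddof=1`), which would make the verified value slightly different from 1.

```python
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({"income": [30000, 45000, 52000, 61000, 75000, 98000]})

original_mean = df["income"].mean()
original_std = df["income"].std(ddof=0)  # population standard deviation
print(f"Original mean: {original_mean:.2f}, std: {original_std:.2f}")

# z = (x - mean) / std
df["income_z"] = (df["income"] - original_mean) / original_std

new_mean = df["income_z"].mean()
new_std = df["income_z"].std(ddof=0)
print(f"Standardized mean: {new_mean:.6f}, std: {new_std:.6f}")
```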
Min-Max Scaling
Apply Min-Max scaling to a numeric column.
Confirm values are scaled between 0 and 1.
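
Min-Max scaling maps each value to (x - min) / (max - min). The sketch below applies it to a hypothetical `age` column and assumes the column is not constant, since a constant column would make the denominator zero.

```python
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({"age": [18, 25, 40, 33, 60]})

col_min = df["age"].min()
col_max = df["age"].max()

# x_scaled = (x - min) / (max - min); requires col_max != col_min.
df["age_scaled"] = (df["age"] - col_min) / (col_max - col_min)

# Confirm the scaled values span exactly [0, 1].
scaled_min = df["age_scaled"].min()
scaled_max = df["age_scaled"].max()
all_in_range = bool(df["age_scaled"].between(0, 1).all())
print(df)
print(f"Scaled min: {scaled_min}, max: {scaled_max}, in range: {all_in_range}")
```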
Log Transformation
Apply a log transformation to a skewed column (add 1 to every value first if the column contains zeros).
Plot histogram before and after to visualize distribution improvement.
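
The sketch below uses `np.log1p`, which computes log(1 + x) and therefore handles zeros. The `purchase_amount` column, its values, and the helper `plot_before_after` (including its default output file name) are hypothetical. Skewness is printed as a numeric check alongside the histograms.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed column that contains a zero.
df = pd.DataFrame({
    "purchase_amount": [0, 10, 20, 30, 50, 80, 150, 400, 1200, 5000],
})

# log1p(x) = log(1 + x), so a value of 0 maps to 0 instead of -inf.
df["purchase_log"] = np.log1p(df["purchase_amount"])

skew_before = df["purchase_amount"].skew()
skew_after = df["purchase_log"].skew()
print(f"Skewness before: {skew_before:.2f}, after: {skew_after:.2f}")


def plot_before_after(frame, path="log_transform_hist.png"):
    """Save side-by-side histograms of the raw and log-transformed column."""
    # Imported here so the rest of the sketch runs without matplotlib.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].hist(frame["purchase_amount"], bins=10)
    axes[0].set_title("Before log transform")
    axes[1].hist(frame["purchase_log"], bins=10)
    axes[1].set_title("After log1p transform")
    fig.tight_layout()
    fig.savefig(path)
    return fig
```

Calling `plot_before_after(df)` writes the two histograms to an image file; in a notebook, the returned figure is displayed inline.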
Categorical Encoding
Use one-hot encoding for a gender column.
Use label encoding for a color column.
Use ordinal encoding for a satisfaction rating column (Low, Medium, High).
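
The three encodings can be sketched with plain pandas as follows. The sample values are invented, and the integer codes assigned to colors are simply the alphabetical order of the categories, which is an arbitrary convention rather than a meaningful ranking.

```python
import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "color": ["red", "blue", "green", "blue"],
    "satisfaction": ["Low", "High", "Medium", "High"],
})

# One-hot encoding: one 0/1 indicator column per gender value.
gender_dummies = pd.get_dummies(df["gender"], prefix="gender", dtype=int)

# Label encoding: each distinct color gets an integer code
# (categories are sorted alphabetically: blue=0, green=1, red=2).
df["color_label"] = df["color"].astype("category").cat.codes

# Ordinal encoding: an explicit mapping that preserves the ranking.
satisfaction_order = {"Low": 0, "Medium": 1, "High": 2}
df["satisfaction_ord"] = df["satisfaction"].map(satisfaction_order)

encoded = pd.concat([df, gender_dummies], axis=1)
print(encoded)
```

Label codes imply no order, so they suit tree-based models or identifiers better than distance-based models; the ordinal mapping is written out by hand precisely because Low < Medium < High cannot be inferred from the strings.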