Statistics Terms Overview

Comprehensive Guide to Statistics for AI & Data Science

This post covers key statistics concepts that are essential for understanding data in AI. It includes visualizations and real-world applications.

1. Mean (Average)

The mean is the sum of all values divided by the number of values.

Example: 10, 20, 30, 40, 50 → Mean = (10+20+30+40+50)/5 = 30
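The calculation above can be checked with a short Python sketch. It uses the standard-library `statistics` module and the same five example values:

```python
from statistics import mean

data = [10, 20, 30, 40, 50]

# Mean = sum of all values / number of values
manual_mean = sum(data) / len(data)   # (10+20+30+40+50) / 5
print(manual_mean)                    # 30.0

# The standard library gives the same result
print(mean(data))
```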

2. Median

The middle value of sorted data. If the number of values is even, take the average of the two middle values.

Example: 2, 5, 7, 8, 9 → Median = 7
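This is a minimal Python sketch of both the odd and even cases. `median_manual` is a hypothetical helper written for illustration, and it assumes a non-empty list. It is checked against `statistics.median`:

```python
from statistics import median

def median_manual(values):
    """Middle value of sorted data; average of the two middle values if the count is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd count: single middle value
    return (s[mid - 1] + s[mid]) / 2    # even count: average of the two middle values

print(median_manual([2, 5, 7, 8, 9]))       # 7
print(median_manual([2, 5, 7, 8, 9, 10]))   # 7.5
print(median([2, 5, 7, 8, 9, 10]))          # 7.5
```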

3. Mode

The value that appears most often in the dataset.

Example: 2, 3, 4, 3, 6, 3, 5 → Mode = 3
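In Python, the mode can be found by counting occurrences. The sketch below uses `collections.Counter` and the standard-library `statistics` module. The last line shows that a dataset can have more than one mode, and `multimode` returns all of them:

```python
from collections import Counter
from statistics import mode, multimode

data = [2, 3, 4, 3, 6, 3, 5]

counts = Counter(data)               # maps each value to how often it appears
print(counts.most_common(1))         # [(3, 3)]: value 3 appears 3 times
print(mode(data))                    # 3
print(multimode([1, 1, 2, 2, 3]))    # [1, 2]: two values tie for most frequent
```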

4. Range

Range = Highest - Lowest

Example: For 5, 10, 15 → Range = 15 - 5 = 10
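A one-line Python sketch of the formula follows. `value_range` is a hypothetical helper name. The second call uses made-up data to show that the range depends only on the two most extreme values:

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

print(value_range([5, 10, 15]))        # 10
print(value_range([5, 10, 15, 200]))   # 195: one extreme value dominates
```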

5. Quartiles (Q1, Q2, Q3)

Quartiles divide sorted data into four equal parts: Q1 is the 25th percentile, Q2 the 50th (the median), and Q3 the 75th.

Useful for understanding the spread of data and identifying outliers.
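This is a small Python sketch using `statistics.quantiles` on made-up data. Several conventions exist for computing quartiles, and they can give slightly different Q1 and Q3 values on the same data. The standard library exposes two of them through the `method` argument:

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# "inclusive": treats the data as the whole population
q_inclusive = quantiles(data, n=4, method="inclusive")

# "exclusive" (the default): treats the data as a sample from a larger population
q_exclusive = quantiles(data, n=4)

print(q_inclusive)   # [3.0, 5.0, 7.0] -> Q1, Q2 (median), Q3
print(q_exclusive)   # [2.5, 5.0, 7.5]
```

Both methods agree on the median here. Only Q1 and Q3 move, depending on how the method interpolates between data points.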

6. Interquartile Range (IQR)

IQR = Q3 - Q1. It measures the spread of the middle 50% of the data.

Resistant to outliers, and it forms the box in a box plot.
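The resistance to outliers can be seen in a short Python sketch. It assumes the "inclusive" quartile method from `statistics.quantiles` and uses made-up data. `iqr` is a hypothetical helper. Replacing the largest value with an extreme one leaves the IQR unchanged, while the range jumps:

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range Q3 - Q1 (inclusive quartile method assumed)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # 10 replaced by an extreme value

print(iqr(clean), max(clean) - min(clean))                        # 4.5 9
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))   # 4.5 99
```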

7. Standard Deviation

Measures how spread out the values are around the mean. A higher SD means more variation.

Used in normalization (for example, z-scores) and in judging how consistent or reliable the data is.
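The sketch below uses two made-up datasets with the same mean (50) but very different spread. `statistics.pstdev` treats the data as a whole population. `statistics.stdev` is the sample version, which divides by n - 1:

```python
from statistics import pstdev, stdev

tight = [48, 49, 50, 51, 52]   # values close to the mean
wide = [10, 30, 50, 70, 90]    # same mean, much more spread

print(round(pstdev(tight), 3))   # 1.414
print(round(pstdev(wide), 3))    # 28.284
print(round(stdev(tight), 3))    # 1.581 (sample SD)
```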

8. Variance

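The average of the squared differences from the mean; SD is the square root of the variance. For a sample rather than a whole population, the sum of squared differences is usually divided by n - 1 instead of n.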
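This Python sketch follows the definition step by step on made-up data and compares the result with the standard-library `statistics` functions:

```python
import math
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = sum(data) / len(data)                      # mean = 5.0
sq_diffs = [(x - mu) ** 2 for x in data]        # squared differences from the mean
pop_var = sum(sq_diffs) / len(data)             # population variance: 32 / 8
sample_var = sum(sq_diffs) / (len(data) - 1)    # sample variance: 32 / 7

print(pop_var)                                  # 4.0
print(math.sqrt(pop_var))                       # 2.0, the population SD
print(pop_var == pvariance(data))               # True
print(math.isclose(sample_var, variance(data))) # True
```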

9. Mean Absolute Deviation (MAD)

The average of the absolute differences between each value and the mean.
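The standard library has no function for the mean absolute deviation, so the sketch below computes it directly. `mean_absolute_deviation` is a hypothetical helper, and the data is made up. On the same data, the MAD is smaller than the SD because the SD squares the differences, which weights large deviations more heavily:

```python
from statistics import mean, pstdev

def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    mu = mean(values)
    return mean([abs(x - mu) for x in values])

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_absolute_deviation(data))   # 1.5
print(pstdev(data))                    # 2.0
```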

10. Z-score

Tells how many standard deviations a value lies above (positive z) or below (negative z) the mean.
z = (x - mean) / standardDeviation

Used in outlier detection and normalization.
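This Python sketch applies the formula above to every value in a small made-up dataset. It uses the population SD (`statistics.pstdev`). A sample SD would give slightly different z-scores:

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(data)        # 5
sigma = pstdev(data)   # 2.0

# z = (x - mean) / standardDeviation for each value
z_scores = [(x - mu) / sigma for x in data]
print(z_scores)   # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After this transformation the values have mean 0 and SD 1. This rescaling is the normalization use mentioned above.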

11. Percentiles

The value below which a given percentage of the data falls. Example: 90% of values lie below the 90th percentile.

Used in grading, ranking, income levels, etc.
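This is a grading-style sketch in Python on made-up exam scores. It assumes the "inclusive" method of `statistics.quantiles`, since percentile conventions vary. The hypothetical helper `percentile_rank` answers the reverse question: what percentage of scores fall strictly below a given score.

```python
from statistics import quantiles

scores = [55, 62, 68, 70, 74, 78, 81, 85, 90, 96]   # made-up exam scores

# n=100 returns 99 cut points; index 89 is the 90th percentile
p90 = quantiles(scores, n=100, method="inclusive")[89]
print(p90)   # 90.6

def percentile_rank(values, x):
    """Percentage of values strictly below x (one of several conventions)."""
    return 100 * sum(v < x for v in values) / len(values)

print(percentile_rank(scores, 85))   # 70.0
```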

12. Skewness

Measures asymmetry. Positive skew = long right tail; negative skew = long left tail.
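One common way to compute skewness is the moment-based formula: the average cubed deviation divided by SD³. The Python sketch below assumes that population version. Other estimators add small-sample corrections. `skewness` is a hypothetical helper, and the data is made up:

```python
def skewness(values):
    """Moment-based (population) skewness: mean cubed deviation / SD**3."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n   # variance
    m3 = sum((x - mu) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 12]   # one large value -> long right tail
left_tail = [-x for x in right_tail]     # mirrored -> long left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tail) > 0)   # True
print(skewness(left_tail) < 0)    # True
print(skewness(symmetric))        # 0.0
```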

13. Kurtosis

Measures tailedness. High kurtosis = heavy tails (more extreme outliers); low kurtosis = light tails.
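Kurtosis is often reported as *excess* kurtosis, which is kurtosis minus 3, so that a normal distribution scores about 0. The Python sketch below assumes that moment-based population version. `excess_kurtosis` is a hypothetical helper, and both datasets are made up:

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: mean fourth-power deviation / variance**2 - 3."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n   # variance
    m4 = sum((x - mu) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

light_tails = [1, 2, 3, 4, 5]                     # evenly spread, no extremes
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]   # mostly zeros plus two extremes

print(round(excess_kurtosis(light_tails), 3))   # -1.3
print(round(excess_kurtosis(heavy_tails), 3))   # 2.0
```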

14. Box Plot

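A visual summary showing the minimum, Q1, median, Q3, and maximum, with potential outliers marked. In the common (Tukey) style, the whiskers reach only the most extreme values within 1.5 × IQR of the box, and points beyond that are drawn individually as outliers.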

Helps summarize data distribution and spot outliers.
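A box plot is drawn from a handful of numbers. This Python sketch computes them without any plotting library. It assumes the "inclusive" quartile method and the common 1.5 × IQR whisker rule. `box_plot_summary` is a hypothetical helper, and the data is made up:

```python
from statistics import quantiles

def box_plot_summary(values, whisker=1.5):
    """Numbers behind a Tukey-style box plot (1.5 x IQR whiskers assumed)."""
    s = sorted(values)
    q1, med, q3 = quantiles(s, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [x for x in s if low_fence <= x <= high_fence]
    return {
        "min": s[0], "q1": q1, "median": med, "q3": q3, "max": s[-1],
        "whisker_low": inside[0],     # whiskers stop at the last point inside the fences
        "whisker_high": inside[-1],
        "outliers": [x for x in s if x < low_fence or x > high_fence],
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Here the box spans Q1 = 3.25 to Q3 = 7.75. The upper whisker stops at 9, and 100 is drawn as a separate outlier point.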

15. Outliers

Data points that are significantly distant from other values.

Use: Outliers can distort the mean and SD, so detecting them is important when preparing data for ML models.
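This Python sketch illustrates how a single outlier affects the mean and SD. It uses two made-up datasets that differ in only one value, and the hypothetical helper `describe` prints the same three statistics for each. The mean and SD shift sharply, while the median does not move:

```python
from statistics import mean, median, pstdev

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # last value replaced by an outlier

def describe(values):
    """Return (mean, median, population SD) for quick comparison."""
    return mean(values), median(values), pstdev(values)

for name, values in [("clean", clean), ("with outlier", with_outlier)]:
    m, med, sd = describe(values)
    print(f"{name:>12}: mean={m:.1f}, median={med:.1f}, SD={sd:.1f}")
```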
