1.2: Histograms, Box Plots, Outliers, and Standard Deviation

Introduction to Histograms

Histogram: a bar graph for quantitative data
The horizontal axis is divided into classes
Each class needs to cover the same range of values
Generally, 5-7 classes is a good minimum
The more classes, the more detail/nuance shown
The vertical axis measures how much data is in each class
The bars must be touching
If a data point falls exactly on a class boundary (a tick mark on the x-axis), it is counted in the bar to the right of that boundary
Frequency histogram: a histogram showing the number of data points in each class
Relative frequency histogram: a histogram showing the percent (or proportion) of the data in each class
- Can be made by taking the frequency in each class and dividing it by the total number of data points
The center is generally found by estimation, especially if only a graph is given
A histogram displays how many pieces of data are in each class
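
The counting rules above can be sketched in a few lines of Python. This is only an illustration: the function names, the made-up scores, and the choice of starting value and class width are all assumptions, not part of the notes.

```python
def class_counts(data, start, width, num_classes):
    """Count how many values fall in each equal-width class.

    Class i covers [start + i*width, start + (i+1)*width), so a value that
    sits exactly on a class boundary is counted in the bar to its right.
    Values outside the covered range are ignored in this sketch.
    """
    counts = [0] * num_classes
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < num_classes:
            counts[i] += 1
    return counts


def relative_frequencies(counts):
    """Divide each class frequency by the total number of data points."""
    total = sum(counts)
    return [c / total for c in counts]


# Made-up test scores; 60, 70, and 80 sit exactly on class boundaries.
scores = [52, 60, 61, 67, 70, 70, 74, 78, 80, 85]
counts = class_counts(scores, start=50, width=10, num_classes=4)
rel = relative_frequencies(counts)
print(counts)  # frequencies for classes 50-60, 60-70, 70-80, 80-90
print(rel)     # the same classes as proportions of the whole data set
```

Multiplying each relative frequency by 100 gives the percent of data in that class.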

Outliers

Mean is the arithmetic average of a data set: the sum of the values divided by the number of values
Standard deviation measures the spread of data about the mean
Standard deviation uses the same units as the original data
Skew and outliers influence both mean and standard deviation
- Skew: the extent to which a distribution is pulled toward one side (a longer tail on the left or the right) rather than balanced around the middle
- If skew or outliers are present in a data set, mean and standard deviation can be misleading; median and IQR are generally better choices
These measurements work well when data is approximately symmetrical with no outliers
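
To see how a single outlier affects these measures, here is a small sketch using Python's standard `statistics` module. The data values and the `summarize` helper are made up for illustration.

```python
import statistics

symmetric = [4, 5, 5, 6, 6, 6, 7, 7, 8]  # made-up, roughly symmetric data
with_outlier = symmetric + [40]          # same data plus one extreme value


def summarize(data):
    """Return the mean, median, and sample standard deviation of data."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
    }


base = summarize(symmetric)
shifted = summarize(with_outlier)
print(base)
print(shifted)
```

In this example the median stays at 6, while the mean moves from 6 to 9.4 and the standard deviation grows several times larger.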

Range = maximum - minimum
IQR = Q3 - Q1
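
A minimal sketch of both formulas with made-up data follows. Note that software and textbooks use slightly different rules for locating Q1 and Q3; this example relies on the default ("exclusive") method of `statistics.quantiles`, which is an assumption and may not match the method used in class for every data set.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 11, 12]  # made-up data, already sorted

# Range = maximum - minimum
data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4)

# IQR = Q3 - Q1
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```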
Standard Deviation
- x̄ = mean
- Standard deviation measures the rough average distance between each point and the mean
  - Larger standard deviations indicate that there is more data further from the mean
  - Moderate standard deviations indicate that data is moderately spread around the mean
  - Smaller standard deviations indicate that there is more data clumped closer to the mean
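
The notes name x̄ but do not show a formula. As a sketch, assuming the sample standard deviation (dividing by n - 1 is an assumption; the population version divides by n), with n the number of data points and x_i the individual values (symbols introduced here for illustration):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```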
Variance
Variance is equal to the square of the standard deviation (equivalently, standard deviation is the square root of variance)
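
The relationship between variance and standard deviation can be checked by computing both by hand. This sketch uses made-up data and assumes the sample versions (dividing by n - 1).

```python
import math
import statistics

data = [2, 4, 4, 5, 7, 8]  # made-up data
n = len(data)

mean = sum(data) / n
squared_devs = [(x - mean) ** 2 for x in data]  # squared distance from the mean

variance = sum(squared_devs) / (n - 1)  # sample variance
std_dev = math.sqrt(variance)           # standard deviation = sqrt(variance)

print(variance, std_dev)
# Cross-check against the standard library's sample versions
print(statistics.variance(data), statistics.stdev(data))
```

Because variance is built from squared distances, its units are the square of the original units, which is one reason standard deviation is usually the measure that gets reported.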
Remember to always plot data; measures of spread and center only display specific facts about a data set, but graphs give the best overall pictures of distributions
