**4.1- Introduction**

• Collect, classify and tabulate statistical data

• Calculate the mean, median, mode and range for individual and discrete data and distinguish between the purposes for which they are used

• Calculate an estimate of the mean for grouped and continuous data

**Sampling** means selecting people or objects from a population in order to learn something about the whole population. For example, we might want to find out how people are going to vote at the next election. We can’t ask everyone in the country, so we ask a sample.

When considering a particular population it is usually advisable to choose a sample in such a way that every part of the population is represented. This is not easy and requires careful thought about sample size and composition. Questionnaires are often devised to collect the required information. These need to be foolproof, so the questions should cover all alternatives and leave little scope for variation in how they are answered.
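One common way to pick a sample is a simple random sample, where every member of the population is equally likely to be chosen. The sketch below is only an illustration: the numbered list of voters and the function name `simple_random_sample` are made up, and other methods (such as sampling each group in proportion to its size) may suit a given survey better.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: voters numbered 1 to 1000.
voters = list(range(1, 1001))
sample = simple_random_sample(voters, 50, seed=1)
print(len(sample), "voters chosen out of", len(voters))
```

A random sample of this kind does not guarantee that every group is represented, which is why sample size and composition still need careful thought.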

**4.2- Representing data**

**Scatter Graphs**

These are used to compare two sets of data. One set of data is put on the x-axis (the horizontal axis) and the other on the y-axis (the vertical axis). If one set of data depends upon the other, the set that depends on the other is put on the y-axis (and is known as the ‘dependent variable’). For example, if you were plotting a child’s height at various times, the height would depend upon the time, so the height is the dependent variable and goes on the y-axis. Time doesn’t depend on the height, so it is the independent variable and goes on the x-axis.

Usually, we are looking to see if there is a relationship between the two sets of data. We draw a line of best fit. This should have roughly the same number of points above and below it.

The less scatter there is about the best-fit line, the stronger the relationship is between the two quantities. If the points are close to the best-fit line, we say that there is a strong **correlation**. If the points are loosely scattered, there is a weak correlation. We say there is **zero correlation** if there is no linear relationship between the variables, in other words if we can’t draw a meaningful best-fit line.

Also, if the best-fit line slopes upwards, then the things we are comparing go up together. We say that there is a **positive correlation**. If the line slopes down, the ‘dependent variable’ decreases as the ‘independent variable’ increases. We say there is a **negative correlation**.
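In these notes the best-fit line is drawn by eye. As a check, a computer can work out a least-squares line and a correlation coefficient r (between -1 and 1), which is a different technique from judging the line by eye. The sketch below assumes made-up age and height data, and the function name `best_fit_and_correlation` is only illustrative.

```python
from math import sqrt

def best_fit_and_correlation(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the mean point
    r = sxy / sqrt(sxx * syy)            # sign matches the slope; size shows strength
    return slope, intercept, r

# Made-up data: age in years (independent) and height in cm (dependent).
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]
slope, intercept, r = best_fit_and_correlation(ages, heights)
print(f"height = {slope:.1f} x age + {intercept:.1f}, r = {r:.3f}")
```

A positive slope with r close to 1 corresponds to a strong positive correlation; r close to 0 corresponds to zero correlation.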

**Stem and Leaf Diagrams**

A stem and leaf diagram is a way of grouping your data into classes. The good thing about it is that from the diagram you can obtain the original data, so no information is lost.
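A minimal sketch of how a stem and leaf diagram could be built, assuming the data are non-negative whole numbers, the tens digit is used as the stem and the units digit as the leaf. The marks and the function names are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens) to its sorted list of leaves (units)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return dict(groups)

def format_diagram(groups):
    """Return one text row per stem, including stems with no leaves."""
    rows = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(rows)

# Made-up test marks.
marks = [23, 31, 27, 45, 38, 31, 29, 42]
diagram = stem_and_leaf(marks)
print(format_diagram(diagram))

# The original data can be rebuilt from the diagram.
rebuilt = [10 * stem + leaf for stem, leaves in diagram.items() for leaf in leaves]
```

The last line shows why no information is lost: every value can be recovered from its stem and leaf.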

**Bar Chart**

A bar chart is a chart where the height of each bar represents the frequency. The data is ‘discrete’ (discontinuous, unlike **histograms**, where the data is continuous). The bars should be separated by small gaps.

**Pie Chart**

A pie chart is a circle which is divided into a number of sectors. The size of each sector shows that category’s share of the total.
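A minimal sketch of how the angle of each part could be worked out, assuming each angle is the category's fraction of the total multiplied by 360 degrees. The survey figures and the function name `sector_angles` are made up.

```python
def sector_angles(frequencies):
    """Return the angle in degrees for each category of a pie chart."""
    total = sum(frequencies.values())
    return {category: 360 * count / total
            for category, count in frequencies.items()}

# Made-up survey: how 30 pupils travel to school.
travel = {"walk": 12, "bus": 9, "car": 6, "cycle": 3}
angles = sector_angles(travel)
for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles should always add up to 360 degrees, which is a useful check before drawing the chart.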

**4.3- Averages and Spread**

• Read, interpret and draw simple inferences from tables and statistical diagrams
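The averages named in the section title (mean, median and mode) and the range, as a measure of spread, can be sketched with Python's standard `statistics` module. The data list is made up, and `multimode` is used so that data with more than one mode are handled.

```python
from statistics import mean, median, multimode

# Made-up data: number of goals scored in 8 matches.
data = [4, 7, 2, 7, 5, 9, 7, 3]

summary = {
    "mean": mean(data),                # total divided by how many values
    "median": median(data),            # middle value once sorted
    "modes": multimode(data),          # most common value(s)
    "range": max(data) - min(data),    # largest minus smallest
}
print(summary)
```

As a rough guide, the median is less affected by one very large or very small value than the mean, and the mode is the only one of the three that also works for non-numerical data.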

**4.4- Frequency Diagrams**

• Identify the modal class from a grouped frequency distribution
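A minimal sketch of finding the modal class and an estimate of the mean from a grouped frequency table. It assumes all classes have the same width (so the modal class is simply the one with the highest frequency) and that each class is represented by its midpoint. The height data and function names are made up.

```python
def modal_class(grouped):
    """Return the class with the highest frequency (equal class widths assumed)."""
    return max(grouped, key=grouped.get)

def estimated_mean(grouped):
    """Estimate the mean using the midpoint of each class."""
    total_frequency = sum(grouped.values())
    weighted_sum = sum(((low + high) / 2) * freq
                       for (low, high), freq in grouped.items())
    return weighted_sum / total_frequency

# Made-up heights in cm: (lower bound, upper bound) -> frequency.
heights = {(140, 150): 4, (150, 160): 11, (160, 170): 8, (170, 180): 2}
print("modal class:", modal_class(heights))
print("estimated mean:", estimated_mean(heights))
```

The result is only an estimate because the exact values inside each class are not known.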

**Cumulative Frequency**

Cumulative frequency is the running total of the frequencies. On a graph, it can be represented by a cumulative frequency polygon, where straight lines join up the points, or a cumulative frequency curve.
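The running total can be sketched with `itertools.accumulate`. The sketch assumes made-up grouped data and plots each cumulative frequency against the upper boundary of its class, which is the usual convention.

```python
from itertools import accumulate

# Made-up heights in cm: upper class boundaries and class frequencies.
upper_bounds = [150, 160, 170, 180]
frequencies = [4, 11, 8, 2]

cumulative = list(accumulate(frequencies))   # running total of the frequencies
points = list(zip(upper_bounds, cumulative)) # points to plot on the graph
print(points)
```

The final cumulative frequency always equals the total number of data values.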

**Histograms**

Histograms are similar to bar charts apart from the consideration of areas. In a bar chart, all of the bars are the same width and the only thing that matters is the height of the bar. In a histogram, the bars can have different widths and it is the area of each bar that represents the frequency.
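Because area matters, the height of each bar in a histogram is usually the frequency density, that is, the frequency divided by the class width. A minimal sketch with made-up classes of unequal width (the function name `frequency_densities` is only illustrative):

```python
def frequency_densities(classes):
    """Return frequency / class width for each ((low, high), frequency) pair."""
    return [freq / (high - low) for (low, high), freq in classes]

# Made-up journey times in minutes, with unequal class widths.
classes = [((0, 10), 5), ((10, 15), 10), ((15, 20), 12), ((20, 40), 8)]
densities = frequency_densities(classes)

for ((low, high), freq), density in zip(classes, densities):
    area = density * (high - low)  # the area gives back the frequency
    print(f"{low}-{high}: height {density}, area {area}")
```

Note that the widest class (20 to 40) has the lowest bar even though its frequency is not the smallest.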

**4.5- Summary and Review**

**4.6- Assessment 4**

**Question 1:** Find the mode, mean and median of the following data: 2, 5, 3, 4, 9, 11, 5, 7, 8, 5, 6, 3. Then calculate the range.

**Question 2:** In a class of 28 students, the mean height of the 12 boys is 1.58 metres and the mean height of all 28 students is 1.52 metres. Calculate the mean height of the girls.