When making estimates about a whole population based on a sample of that population, there is a margin of error around those estimates. Confidence Intervals (C.I.) are used to give an indication of the likely size of this margin. The smaller the sample size and the greater the degree of variation, the wider the C.I. will be.
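As a rough illustration of how sample size affects the width of an interval, the sketch below computes an approximate 95% confidence interval for a sample proportion. The normal-approximation formula, the function name `proportion_ci` and the survey figures are assumptions made for this example; they are not taken from the analyses described here.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)  # margin of error around the estimate
    return p - margin, p + margin

# Same observed proportion (40%) from two hypothetical samples of different size.
small = proportion_ci(40, 100)
large = proportion_ci(400, 1000)

small_width = small[1] - small[0]
large_width = large[1] - large[0]
print(f"n=100:  {small[0]:.3f} to {small[1]:.3f}")
print(f"n=1000: {large[0]:.3f} to {large[1]:.3f}")
```

The interval from the smaller sample is noticeably wider, even though both samples give the same point estimate.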
Confidence intervals are used as a clear and simple method of judging whether the difference between two groups is “statistically significant”, i.e. whether there is sufficient evidence to suggest that it reflects a real difference in the wider population. If the confidence intervals of the two sets of data being compared overlap, then the difference is not treated as statistically significant (e.g. Chart A, below). If the confidence intervals do not overlap, then the difference is statistically significant (e.g. Chart B, below).
Depending on the situation, confidence intervals may be shown explicitly, as in the charts above, or simply be reflected in the commentary: differences may be highlighted if they are statistically significant, and apparent differences not mentioned if they are not.
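The overlap rule described above can be sketched as a simple check on two intervals. The function names and the interval values are hypothetical and chosen only to illustrate the rule.

```python
def intervals_overlap(a, b):
    """Return True if two (low, high) intervals share any values."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high

def is_significant(a, b):
    """Apply the overlap rule: significant only if the intervals do not overlap."""
    return not intervals_overlap(a, b)

# Hypothetical confidence intervals for three groups (illustrative values only).
group_a = (0.30, 0.42)
group_b = (0.38, 0.50)  # overlaps group_a
group_c = (0.45, 0.55)  # does not overlap group_a

print(is_significant(group_a, group_b))
print(is_significant(group_a, group_c))
```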
Medians and quartiles
The median for a dataset is the value such that, when all data points are arranged in order of size, 50% of them are lower and 50% are higher. It is an overall summary measure that is less affected by extreme values (outliers) than the mean, another type of average that is calculated by summing all the data points and dividing by the number of them.
Further explanation of the use of the median and the mean can be found in the resource box.
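A small sketch of how an outlier affects the two measures, using Python's standard `statistics` module. The earnings figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical annual earnings for five people (illustrative values only).
earnings = [18_000, 21_000, 23_000, 25_000, 27_000]

# The same group with one very high earner added.
with_outlier = earnings + [250_000]

print(median(earnings), mean(earnings))
print(median(with_outlier), round(mean(with_outlier)))
```

Adding the single extreme value moves the median only slightly, while the mean rises well above what most people in the group earn.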
The lower quartile for a dataset is the value such that 25% of the data is lower and 75% of the data is higher. The upper quartile for a dataset is the value such that 75% of the data is lower and 25% of the data is higher. The term "quartile" is also used to refer to a range bounded by the quartile values. For example, saying that a score lies "in the upper quartile" really means it lies in a range bounded by the upper quartile value and the highest score achieved. Saying it lies "in the second quartile" really means it lies in a range bounded by the median and the upper quartile value. The "inter-quartile range" is the range of values bounded by the upper and lower quartiles.
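The quartile values and the inter-quartile range can be computed as in the sketch below. It assumes the "inclusive" method of `statistics.quantiles`; other conventions for calculating quartiles exist and can give slightly different values. The scores and the helper `in_upper_quartile` are hypothetical.

```python
from statistics import quantiles

# Hypothetical scores, for illustration only.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# n=4 splits the data into quarters and returns the three cut points.
lower_q, median_value, upper_q = quantiles(scores, n=4, method="inclusive")
inter_quartile_range = upper_q - lower_q

def in_upper_quartile(score):
    """True if the score lies between the upper quartile value and the highest score."""
    return upper_q <= score <= max(scores)

print(lower_q, median_value, upper_q, inter_quartile_range)
```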
Quartiles and medians have been used for two specific purposes:
- To compare earnings in Herefordshire with those elsewhere at three different points on the earnings distribution, e.g. low earners (lower quartile), average earners (median) and high earners (upper quartile).
- To compare performance in Herefordshire with all English Authorities. Here the terms top quartile and bottom quartile have been used.
  - If high values of a performance indicator are desired:
    - the top quartile of authorities equates to the upper quartile of the dataset
    - the bottom quartile of authorities equates to the lower quartile of the dataset.
  - If low values of a performance indicator are desired:
    - the top quartile of authorities equates to the lower quartile of the dataset
    - the bottom quartile of authorities equates to the upper quartile of the dataset.
  - In some cases, the terms "best" and "worst" are used in preference to "upper / lower" or "top / bottom", to avoid confusion.
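The mapping between top / bottom quartile of authorities and upper / lower quartile of the dataset can be sketched as follows. The function `performance_band`, its labels and the treatment of values exactly on a quartile boundary are assumptions made for this example.

```python
def performance_band(value, lower_quartile, upper_quartile, high_is_good):
    """Label an authority's value as top, bottom or middle, given the indicator's direction."""
    if value >= upper_quartile:
        # In the upper quartile of the dataset: best if high values are desired.
        return "top quartile" if high_is_good else "bottom quartile"
    if value <= lower_quartile:
        # In the lower quartile of the dataset: best if low values are desired.
        return "bottom quartile" if high_is_good else "top quartile"
    return "middle quartiles"

# Hypothetical quartile values for an indicator across all authorities.
lower_q, upper_q = 30, 70

print(performance_band(85, lower_q, upper_q, high_is_good=True))
print(performance_band(85, lower_q, upper_q, high_is_good=False))
```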
Understanding sub-county geographies
There are two key things to remember about statistics for areas smaller than counties:
1. the difference between statistical geographies and administrative geographies, and
2. how different areas ‘nest’ into each other (the geographical hierarchy)
Geographies that are used for administrative purposes are not always ideal for publishing statistics: they can vary substantially in population size meaning that comparisons aren’t really appropriate (e.g. wards), or can even be so small that publishing statistics could risk identifying an individual (e.g. parishes). They can also change significantly over time.
Despite these disadvantages, administrative geographies are often well-known and meaningful to people, and statistics are needed for people and organisations to understand the characteristics of their areas. Commonly used administrative geographies in Herefordshire are parishes, wards and localities. Statistics can also be produced for Hereford city and the five market towns.
Another complexity of administrative geographies is their hierarchy. Civil parishes are generally the smallest building block in rural areas and these ‘nest’ into group parishes (in some cases) and then wards. However, in larger urban areas (Hereford, Leominster and Ross) wards can be smaller than parishes, and they don’t always nest perfectly. Localities are built from group parishes, and contain more than one ward – but some wards cross locality boundaries.
In 2004, the Office for National Statistics (ONS) designed a standard set of statistical geographies to avoid the problems associated with administrative geographies. Using the results of the 2001 Census, they were designed to have similar population sizes and, to a certain extent, similar characteristics. They also ‘nest’ into a clear hierarchy: output areas (OAs) of about 300 people group together to form lower super output areas (LSOAs) of about 1,500 people, which in turn group to form middle super output areas (MSOAs) of about 7,500 people. MSOAs don't cross county boundaries.
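Because the statistical geographies nest exactly, a figure for a larger area is simply the sum of the figures for the smaller areas within it. A minimal sketch, assuming lookup tables from each area to its parent; all codes and populations here are hypothetical, not real ONS codes.

```python
from collections import defaultdict

# Hypothetical output areas and populations (illustrative only).
oa_population = {"OA-1": 290, "OA-2": 310, "OA-3": 305, "OA-4": 295}

# Hypothetical lookups: each area belongs to exactly one parent area.
oa_to_lsoa = {"OA-1": "LSOA-A", "OA-2": "LSOA-A", "OA-3": "LSOA-B", "OA-4": "LSOA-B"}
lsoa_to_msoa = {"LSOA-A": "MSOA-X", "LSOA-B": "MSOA-X"}

def aggregate(values, lookup):
    """Sum values for child areas into their parent areas."""
    totals = defaultdict(int)
    for code, value in values.items():
        totals[lookup[code]] += value
    return dict(totals)

lsoa_population = aggregate(oa_population, oa_to_lsoa)
msoa_population = aggregate(lsoa_population, lsoa_to_msoa)
print(lsoa_population)
print(msoa_population)
```

The real areas are much larger than this toy example; the point is only that each level is built by adding up whole areas from the level below.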
More detail about the statistical geographies can be found in a QER article from May 2004, including an explanation of how local names for LSOAs were devised to give more of an indication of the area covered than the generic codes allocated by ONS.
One of the main principles for the statistical geography hierarchy is stability over time, meaning that the results of the 2011 Census were based on the same statistical geographies. Some changes (affecting fewer than 5% of areas across England & Wales) can occur if areas have experienced large population change over the decade, but these are limited to splits or mergers of existing statistical geographies.
The relationship between administrative and statistical geographies in Herefordshire
OAs were designed to respect parish and ward boundaries and LSOAs to respect ward boundaries so it is possible to create statistics for current (2003) wards and parishes by aggregating the statistical geographies. Statistics for different areas can be found using the interactive map on the homepage.
If ward or parish boundaries change (ward boundaries are being reviewed for the 2015 local elections) so that they are no longer aligned to the OA or LSOA boundaries, statistics will have to be created by ‘best-fitting’ the statistical geographies to the administrative ones.
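One way to picture best-fitting is sketched below. The exact method is not described here, so the rule used (allocate each whole output area to the ward containing the largest share of its population) is an assumption, and the area names, populations and shares are hypothetical.

```python
from collections import defaultdict

# Hypothetical output area populations (illustrative only).
oa_population = {"OA-1": 300, "OA-2": 280, "OA-3": 320}

# Hypothetical share of each OA's population living in each revised ward.
oa_ward_share = {
    "OA-1": {"Ward North": 1.0},
    "OA-2": {"Ward North": 0.7, "Ward South": 0.3},
    "OA-3": {"Ward North": 0.1, "Ward South": 0.9},
}

def best_fit_totals(population, shares):
    """Assign each whole OA to its largest-share ward and sum the populations."""
    totals = defaultdict(int)
    for oa, pop in population.items():
        ward = max(shares[oa], key=shares[oa].get)  # ward holding most of this OA
        totals[ward] += pop
    return dict(totals)

ward_estimates = best_fit_totals(oa_population, oa_ward_share)
print(ward_estimates)
```

Because whole output areas are allocated, the ward totals are estimates: OA-2 and OA-3 each straddle a boundary, so some residents are counted in a neighbouring ward.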
Rural urban classifications
Rural and urban classifications are the product of a project commissioned jointly by the Office for National Statistics (ONS), the Department for Environment, Food and Rural Affairs (Defra), the Office of the Deputy Prime Minister (ODPM), the Countryside Agency (CA) and the Welsh Assembly Government to create a new settlement-based definition of urban and rural areas.
In 2004, classifications were published for all output areas, lower super output areas (LSOAs) and wards in England & Wales by settlement form and sparsity. The rural categories used are:
- Rural town (town and fringe)
- Rural village
- Rural dispersed (hamlets and isolated dwellings)
Each of these categories is then divided into "sparse" and "less sparse". Settlements of 10,000 people or more are considered urban. Hereford, Leominster and Ross-on-Wye fall into this category. The remaining market towns (Kington, Bromyard and Ledbury) are classified as "rural towns".
Following the initial classification of output areas, a classification of local authority areas was initiated by Defra. This classified local authorities as either major urban, large urban, other urban, significant rural, rural-50 or rural-80. Herefordshire is classified as rural-50, meaning that between 50% and 80% of the local authority's population live in a rural settlement.
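The rural-50 threshold can be illustrated with a short sketch. Only the rural-50 range (50% to 80%) is stated above; treating 80% or more as rural-80, the handling of values exactly on a boundary, and the single "other" label for everything below 50% are assumptions made for this example. The population figures are hypothetical.

```python
def rural_category(rural_population, total_population):
    """Classify an authority by the share of its population living in rural settlements."""
    share = rural_population / total_population
    if share >= 0.8:
        return "rural-80"  # assumed from the category name
    if share >= 0.5:
        return "rural-50"  # between 50% and 80%, as described above
    return "other"         # remaining categories are not distinguished in this sketch

# Hypothetical authorities (illustrative figures only).
print(rural_category(60_000, 100_000))
print(rural_category(85_000, 100_000))
```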
The rural nature of Herefordshire can be seen in the map below. You may notice a rogue urban area just north-east of Ledbury - it seems likely that when classifications were made, the Malvern Hills were not taken into account, so villages in the Colwall area may have been seen as suburbs of Malvern.
Map 1. Rurality of Herefordshire at Output Area level