AlphaPlus - Supporting awarding organisations

Assessment Quality Improvement service

AQI Services 2014-15

You can find more details of each of our AQI services below. We're always happy to tailor our AQI services to your needs, so please do contact us if you'd like to discuss your requirements further.

Improving the quality of your knowledge-based multiple-choice questions

Description: This strand of analysis looks at fundamental questions for assessment quality. It is delivered as a workshop to discuss the results of the analysis we have undertaken.

Topics covered:

  • The (average) difficulty of questions. Does this correspond to what you (or your item writers) intended?
  • Discrimination (relationship between scoring on a particular question, and the test overall). Is this question separating overall strong from overall weak candidates? If not, does this suggest it is not really assessing what you think it is?
  • For multiple-choice questions, distractor (wrong answer) analysis. Are the distractors plausible? Is one of the distractors too plausible – i.e. is it actually correct?
  • Practical approaches to planning and taking remedial action based on analysis of question performance. Approaches to focusing on achieving the maximum benefit for the minimum amount of time and effort.
  • Item writer guidelines and item writer training. What is good practice in writing questions? How can you train and manage teams of question writers and edit their submissions?
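To make the first three topics concrete, here is a minimal illustrative sketch (toy data, not from any real paper) of the classical item statistics described above: facility (difficulty), point-biserial discrimination against the rest of the test, and distractor counts. All names and values are hypothetical.

```python
# Illustrative sketch of classical item analysis for an MCQ paper (toy data).
from statistics import mean, pstdev

responses = [           # each row: one candidate's chosen options for 3 items
    ["A", "B", "C"],
    ["A", "C", "C"],
    ["B", "B", "C"],
    ["A", "B", "D"],
    ["A", "D", "C"],
]
key = ["A", "B", "C"]   # correct option per item

# Score matrix: 1 if correct, 0 otherwise
scores = [[1 if r[i] == key[i] else 0 for i in range(len(key))] for r in responses]
totals = [sum(row) for row in scores]

def facility(item):
    """Proportion answering the item correctly (item difficulty)."""
    return mean(row[item] for row in scores)

def discrimination(item):
    """Point-biserial correlation between item score and rest-of-test score."""
    rest = [totals[c] - scores[c][item] for c in range(len(scores))]
    x = [scores[c][item] for c in range(len(scores))]
    mx, mr = mean(x), mean(rest)
    cov = mean((xi - mx) * (ri - mr) for xi, ri in zip(x, rest))
    sx, sr = pstdev(x), pstdev(rest)
    return cov / (sx * sr) if sx and sr else 0.0

def distractor_counts(item):
    """How often each option (right or wrong) was chosen."""
    counts = {}
    for r in responses:
        counts[r[item]] = counts.get(r[item], 0) + 1
    return counts

print(facility(0))           # 0.8 -> an easy item
print(round(discrimination(0), 2))
print(distractor_counts(2))  # is any wrong option chosen too often?
```

With real data the same calculations run over the full response matrix; a distractor chosen more often than the keyed answer is a strong signal that the key may be wrong.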

Monitoring and standardising marking processes

Description: This strand uses data analysis to answer questions about marker performance, and also provides best practice advice on marker training and standardisation processes. The aim is to maximise the quality of standardisation while marking is taking place, and then to analyse the outcomes to measure quality and provide intelligence on which to base future training and standardisation. It is delivered as a workshop to discuss the results of the analysis we have undertaken.

Topics covered:

  • Ways of assessing how well your markers are performing. Are they consistent with the lead markers, and/or with each other, and/or over time?
  • What is the agreement rate between markers?
  • Are any markers too lenient, too severe or inconsistent?
  • Do markers drift away from the standard over time?
  • Are there any markers who are not so good at marking particular questions?

Providing best practice advice and guidance in areas such as:

  • Standardisation of markers
  • Writing mark schemes that markers can interpret consistently
  • Effective methods for using on-screen marking engines
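The marker-performance questions above can be explored with simple statistics. As an illustration (toy data, hypothetical marks), the sketch below computes the exact-agreement rate between two markers double-marking the same scripts, and Cohen's kappa, which corrects that rate for chance agreement.

```python
# Illustrative sketch: agreement rate and Cohen's kappa for two markers
# double-marking the same scripts (marks on a 0-3 scale are toy data).
from collections import Counter

marker_a = [2, 3, 1, 2, 0, 3, 2, 1]
marker_b = [2, 3, 2, 2, 0, 2, 2, 1]

def agreement_rate(a, b):
    """Proportion of scripts where the two markers gave the same mark."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for chance: 1 = perfect, 0 = chance level."""
    n = len(a)
    po = agreement_rate(a, b)                             # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)  # expected by chance
    return (po - pe) / (1 - pe)

print(agreement_rate(marker_a, marker_b))        # 0.75
print(round(cohens_kappa(marker_a, marker_b), 2))
```

Tracking these figures per marker, per question and per marking session is one way to spot leniency, severity, inconsistency and drift over time.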

Practical reliability analysis and interpretation

Description: Reliability is a key indicator of assessment quality. It is about the consistency of results; would test takers get a different result if they attempted a different version of the same test, or took the test on a different day, or in a different test window? It also quantifies the extent of error in results; how much of the variation in test scores can be explained by the construct the assessment is intended to measure, and how much is random error?

This is provided as a training course. We customise the content to match the particular assessment scenarios the AO uses, and combine it with a review of the analysis we have undertaken.

Analysis and outputs:

  • Reliability analysis conducted as part of the service provides an indication of whether the assessments are reasonably defensible or not.
  • The outputs are carefully written to be jargon-free and accompanied by clear ‘rules of thumb’ explaining the significance of the result, and what action could be taken.
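One widely used reliability statistic is Cronbach's alpha, an internal-consistency coefficient. The sketch below (toy item-score matrix, illustrative only) shows how it is computed from item and total-score variances.

```python
# Illustrative sketch: Cronbach's alpha, a common internal-consistency
# reliability coefficient, computed from an item-score matrix (toy data).
from statistics import pvariance

# rows = candidates, columns = item scores
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

def cronbach_alpha(matrix):
    k = len(matrix[0])  # number of items
    item_vars = [pvariance([row[i] for row in matrix]) for i in range(k)]
    total_var = pvariance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(round(cronbach_alpha(scores), 2))  # 0.62
```

A common rule of thumb is that alpha above roughly 0.7 suggests acceptable internal consistency for many purposes, though the appropriate threshold depends on the stakes of the assessment.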

Principles of qualification design

Description:  This consultancy service helps assessment organisations to design qualifications to maximise validity and reliability. This is typically provided as a workshop to discuss the results of the sample development work we have undertaken.

It focuses on the principles of writing meaningful learning outcomes and assessment criteria, designing appropriate grading schemes and creating meaningful combinations of units.

We also provide advice on how to gather and organise underpinning evidence for successful regulatory applications.

Grading and standards setting

Description:   We assist your team in planning, running and documenting the results of grading or standards setting meetings. This is typically provided as a workshop to discuss the options for approaches to grading design and standard setting activity.

This includes:

  • ensuring that the meanings of standards and grades are as clear, complete and unambiguous as possible, and ensuring that these definitions underpin the standard setting processes to follow
  • identifying a context-appropriate (effective, timely, affordable) approach to standards setting (e.g. various Angoff approaches, bookmarking, contrasting groups)
  • identifying the types of participants that would be necessary to run a standards setting meeting
  • identifying documents and data necessary to run a standards setting meeting
  • assisting in running the meeting and in recording outcomes to provide a documented and defensible case for the operation of the qualification.
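As a concrete illustration of one approach mentioned above, the sketch below shows the arithmetic behind a basic (modified) Angoff exercise: each judge estimates, for each item, the probability that a minimally competent candidate would answer correctly, and the recommended cut score is the mean of the judges' totals. The judgement values are entirely hypothetical.

```python
# Illustrative sketch of a basic (modified) Angoff cut-score calculation.
from statistics import mean

# rows = judges, columns = items; entries are probability estimates that a
# minimally competent candidate answers each item correctly (toy data)
judgements = [
    [0.7, 0.5, 0.9, 0.6, 0.4],
    [0.6, 0.6, 0.8, 0.5, 0.5],
    [0.8, 0.4, 0.9, 0.6, 0.3],
]

judge_totals = [sum(row) for row in judgements]  # expected mark per judge
cut_score = mean(judge_totals)                   # recommended passing score

print([round(t, 1) for t in judge_totals])  # [3.1, 3.0, 3.0]
print(round(cut_score, 2))                  # 3.03
```

In practice the exercise is usually run over multiple rounds, with judges shown item statistics between rounds, and the spread of judge totals is itself useful evidence about how well the standard is understood.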

Comparability of different versions of tests and questions

Description: We use statistical frameworks such as Classical Test Theory and Item Response Theory to understand the extent to which tests and questions are comparable in terms of difficulty. This can either be provided as a training programme, or as a workshop following analysis by our statistics experts.

Topics covered  may include the following areas of comparability:

  • Various (fixed) forms that may be in use around the country or the world, or across different test sessions
  • Forms generated dynamically from e-assessment item bank databases. (For example, if candidates receive different questions from each other, how might this have a measurable impact on results?)
  • Results at different centres, or regions or types of centre (e.g. private training provider vs. FE college)
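To illustrate the statistical side of form comparability, here is a minimal sketch (toy score distributions, hypothetical throughout) of simple linear mean-sigma equating under Classical Test Theory, which maps scores on one form onto the scale of another so that results are comparable.

```python
# Illustrative sketch: linear (mean-sigma) equating under Classical Test
# Theory, mapping Form B scores onto the Form A scale (toy data).
from statistics import mean, pstdev

form_a = [12, 15, 18, 20, 14, 16, 17, 19]  # toy score distributions from
form_b = [10, 13, 15, 18, 12, 14, 15, 17]  # equivalent candidate groups

def linear_equate(score_b, a, b):
    """Map a Form B score to the Form A scale (match mean and spread)."""
    slope = pstdev(a) / pstdev(b)
    return mean(a) + slope * (score_b - mean(b))

# A candidate scoring 14 on Form B gets this equivalent Form A score:
print(round(linear_equate(14, form_a, form_b), 2))
```

This mean-sigma approach assumes the two candidate groups are equivalent in ability; where they are not, anchor-item designs or IRT-based linking are more defensible.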

Using effective qualitative analysis techniques to establish the demands (the intellectual content) of assessments. For example:

  • If two assessment methods are used in parallel within a unit, are they assessing the same thing (comparability of demands)?
  • Can a new assessment method be introduced without accusations of ‘dumbing down’?

Reasonable use of tests and questions over time

Description: This service allows assessment organisations to balance assessment requirements such as security (prevention of cheating and other aspects of maladministration), reliability and comparability against practical drivers such as producing a feasible number of questions, requiring test versions to be live for economic time periods (i.e. not requiring excessive quantities of tests and items to be produced). This can either be provided as a training programme, or as a workshop following analysis.

The consultancy would look at issues such as:

  • How many questions are necessary to address each Learning Outcome?
  • What practical and justifiable approaches can be taken to sampling (identifying the proportion of LOs to be directly assessed)?
  • What is a reasonable expectation to place on authors for question writing; how many questions can writers be expected to produce in a given time period? What are the approaches to setting reasonable rates of pay for writers? What percentage of questions getting amended or rejected at editing is reasonable?
  • How can under-performing question writers be managed and improved?
  • How frequently should a body rotate items and test versions?
  • How should organisations use statistics to understand test exposure over time (for instance, how to deal with concerns that questions appear to be ‘getting easier’)?


For more information about our AO AQI services, please contact Gillian Whitehouse:

Email Gillian

Phone: 07881 361718