Item Analysis for Language Tests

Learn how to use either SPSS or Microsoft Excel to carry out item analysis and improve your language tests.

by Tim Martyn

What is item analysis?

Item analysis is a set of statistical procedures that can be used to work out how well items in a language test are functioning.

It can be used to ensure that:

  • the items are at an appropriate level of difficulty – not too easy and not too hard – for a particular group of students;
  • the items are discriminating between stronger and weaker students;
  • there aren’t any faulty items or errors in the answer key;
  • distractors in multiple choice questions are indeed distracting.

If you want to ensure that items in your test are functioning as intended, item analysis is critical. Without it, any faulty items in your test will remain hidden. For example, there could be an item in your test that weaker students perform better on than stronger students – which isn’t logical – and you’d be none the wiser.

This tutorial focuses on what’s known as classical item analysis, a powerful but also very accessible set of procedures. To carry out item analysis, you can use either IBM SPSS Statistics – a statistical package that’s very widely used in the social sciences – or Microsoft Excel. This tutorial will provide instructions for both of these programs.

Note: In this tutorial, I mention a few well-established rules of thumb that can help you work out how well your items are functioning. However, it’s important to take the purpose of your test into account when interpreting the results. If it’s an achievement test, for example, you can expect the results to be different to those of a multi-level proficiency test. I comment further on this issue throughout the tutorial.

Prerequisites

There are a few things you’ll need in order to carry out item analysis successfully.

1. A single construct

Construct is the technical term for the thing you’re trying to measure. In a language test, you’re usually trying to measure students’ language proficiency, but for the purpose of item analysis, your construct needs to be more specific than this.

For example, if your test has both reading and listening sections, you’ll need to analyse the reading and listening items separately. This is because reading skills and listening skills aren’t the same construct. A particular student might be very good at reading but much weaker at listening.

2. Dichotomous items

Dichotomous items are ones that can be either correct or incorrect. In language tests, this is usually the case, but some tests use partial credit. In a listening test, for example, students might be awarded half a mark if they get the answer correct but make a spelling mistake. It’s possible to carry out item analysis with this kind of item, but it’s more complex, so we won’t cover it in this tutorial.

3. Enough students

With item analysis – as with all statistical analyses – the larger the sample size, the more confidence you can have in the results. There’s no magic number, but I’d suggest that you aim to have around 100 students in your sample. You can still get useful information with as few as 50 students, but if you have a very small sample (e.g. a class of 18 students), you’re likely to get misleading results.

Step 1 – Preparing your file

The first step is to prepare your file for data entry.

In SPSS, make sure you’re on the Variable View tab. This is where you’ll enter information about your variables. For item analysis, you need one ID variable and one variable for each item.

Add the ID variable

Start by entering ID in the first cell in the Name column. The other values in the first row will populate automatically. In the Decimals column, change 2 to 0. Also change Unknown to Nominal in the Measure column.

Add the item variables

Now, in the cell directly under ID, enter Item01. Again, the rest of the values in the row will populate automatically. Just change Unknown to Scale in the Measure column. Continue adding item variables until there’s one row for each item.

If you’re using Excel, you can download a copy of my spreadsheet, which contains all of the required formulas.

The spreadsheet is set up for 30 students doing a 30-item test. If the number of students/items in your sample is different, you’ll need to add/delete rows and/or columns.

To add students, you need to:

  1. select row 32;
  2. right-click and select Insert;
  3. keep pressing Control + Y until you have enough rows;
  4. add any missing student numbers.

To add items, you need to:

  1. select column AF;
  2. right-click and select Insert;
  3. keep pressing Control + Y until you have enough columns;
  4. add any missing item numbers.

To delete students, you need to:

  1. select row 32;
  2. right-click and select Delete;
  3. continue until you have the correct number of students.

To delete items, you need to:

  1. select column AF;
  2. right-click and select Delete;
  3. continue until you have the correct number of items.

The reason you should start in row 32 when adding/deleting students and column AF when adding/deleting items is so that you don’t disturb the formulas in the spreadsheet.

Also note that, when deleting students or items, you can select multiple rows/columns and delete them in one go if you prefer.

Note: If you add or delete items, you’ll also need to add or delete them on the right-hand side of the spreadsheet in the section labelled “Corrected totals”. At this stage, don’t worry what this side of the spreadsheet is for – I’ll explain in a later step.

Step 2 – Entering data

The next step is to enter your data. This can be very tedious, but it’s a critical step. Take your time and double-check your work. If you enter data incorrectly, the results will be affected.

In SPSS, switch to the Data View tab. This is where you’ll add all of the test data.

In the first column, start by adding an ID number for each student. This can just be 1, 2, 3, etc. As an example, if you’ve got 50 students in your sample, you’ll need 50 rows.

In each cell, enter 0 if the student got the item incorrect and 1 if they got it correct. For example, let’s imagine that student 1 got item 1 correct. In the first row, you’d enter 1 in the Item01 column. But if they got item 2 wrong, you’d enter 0 in the Item02 column.

In the Excel spreadsheet, enter 0 in each cell if the student got the item incorrect and 1 if they got it correct. For example, let’s imagine that student 1 got item 1 correct. You’d enter 1 in cell C3. But if they got item 2 wrong, you’d enter 0 in cell D3.
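
Because a single mistyped value can distort the results, it can help to check your data before moving on. The following is a minimal Python sketch of such a check, not part of SPSS or the spreadsheet. It assumes you've copied your 0s and 1s into a list of rows (one row per student, one value per item); the function name find_invalid_cells and the sample data are hypothetical.

```python
def find_invalid_cells(responses):
    """Return (student, item, value) for every cell that is not 0 or 1."""
    problems = []
    for student, row in enumerate(responses, start=1):
        for item, value in enumerate(row, start=1):
            if value not in (0, 1):
                problems.append((student, item, value))
    return problems


# Hypothetical data: 3 students x 4 items, with one typo (a 2 instead of 0 or 1).
responses = [
    [1, 0, 1, 1],
    [0, 0, 2, 1],
    [1, 1, 0, 0],
]

for student, item, value in find_invalid_cells(responses):
    print(f"Student {student}, item {item}: unexpected value {value!r}")
```

An empty result only tells you that every cell contains a 0 or a 1. It can't tell you whether a 0 should have been a 1, so you'll still need to double-check your entries against the test papers.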

Step 3 – Calculating facility values

You’re now ready for your first calculation. A facility value tells you how easy or difficult each item is.

The calculation itself is very simple. Let’s imagine that 30 students did the test and 18 of those students got item 1 correct. To calculate the facility value for item 1, you’d divide 18 by 30, which would give you a facility value of 0.60. This means that 60% of the students got the item correct. A facility value of 0.05, on the other hand, would mean that only 5% of the students got the item correct.
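
If you'd like to see the same arithmetic outside SPSS or Excel, here's a minimal Python sketch. It assumes the data is a list of rows of 0s and 1s (one row per student); the function name facility_values and the sample data are hypothetical and simply reproduce the 18-out-of-30 example.

```python
def facility_values(responses):
    """Return the proportion of students who got each item correct."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_students
        for i in range(n_items)
    ]


# Hypothetical data: 30 students, 18 of whom got item 1 correct.
item1_scores = [1] * 18 + [0] * 12
responses = [[score] for score in item1_scores]

print(facility_values(responses))  # [0.6]
```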

In SPSS, go to the top menu and click on Analyze → Descriptive Statistics → Frequencies.

In the pop-up window, select all of your item variables and move them across to the Variables box. Then click OK.

You’ll then see an output table for each item. You’ll find the facility value for each item in the Valid Percent column, in the row for 1.00. Note that SPSS reports it as a percentage, so 60.0 corresponds to a facility value of 0.60.

In the Excel spreadsheet, there’s a formula in cell C33, which calculates the facility value for item 1. You can copy this formula to adjacent cells so that you can see the facility value for other items. You need to:

  1. click on cell C33, which is the one that contains the formula;
  2. put the black cross over the little square in the bottom right-hand corner of the cell;
  3. hold the left button of your mouse down as you click on the little square;
  4. drag your mouse to the right and release the button once you’ve got to the last item.

Note: If you added students to the spreadsheet (or deleted some), the cell containing the formula won’t be C33, but you can easily find it. Look for the yellow cell labelled “FAC.” – the cell with the formula is the one to the immediate right of it.

Interpretation

For proficiency tests, facility values between 0.30 and 0.70 are generally considered acceptable (Green, 2019). However, as long as items discriminate between stronger and weaker students, your facility values can be outside this range. (You’ll learn about discrimination in the next step.)

For achievement tests, facility values tend to be quite high – often in the 0.80–0.90 range – which suggests that students have understood the course content (Green, 2013). We’d also expect facility values to be relatively high for items on a proficiency test that targets a specific level (e.g. B1 on the CEFR).

In some cases, test developers opt to leave a small number of easy items at the start of their tests – even if they don’t discriminate well – to make students feel confident and lower their stress levels (Hughes, 2003).

Step 4 – Calculating discrimination indices

A discrimination index tells you how well an item is able to discriminate between stronger and weaker students. In other words, it tells you whether stronger students are more likely to get the item correct, which is what you’d expect.

The index is a special type of correlation – called a point-biserial correlation – between students’ scores on a particular item and their total scores minus their score on that item. This correction matters: without it, the item would count towards both sides of the correlation, and the discrimination indices would be inflated.

If an item discriminates well, we’d expect to see a positive correlation coefficient. If it doesn’t discriminate well, we’d expect the correlation coefficient to be close to 0.
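
To make the correction concrete, here's a minimal Python sketch of a corrected item-total correlation. It's an illustration under stated assumptions rather than what SPSS or the spreadsheet does internally: it computes an ordinary Pearson correlation (which, for 0/1 item scores, is the point-biserial correlation) between each item and the total score with that item subtracted. The function names and the sample data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None if either variable has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # e.g. every student got the item right
    return sxy / sqrt(sxx * syy)


def discrimination_indices(responses):
    """Correlate each item with the total score minus that item."""
    totals = [sum(row) for row in responses]
    indices = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        corrected_totals = [t - score for t, score in zip(totals, item)]
        indices.append(pearson(item, corrected_totals))
    return indices


# Hypothetical data: 4 students x 3 items (far too few for real analysis).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

print([round(d, 3) for d in discrimination_indices(responses)])
# [0.522, 0.707, 0.522]
```

An item that every student gets right (or wrong) has no variance, so its correlation is undefined; the sketch returns None in that case rather than a number.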

In SPSS, go to the top menu and click on Analyze → Scale → Reliability Analysis.

In the pop-up window, select all of your item variables and move them across to the Items box.

Then click on the Statistics button. At the top left, check the boxes next to Item, Scale and Scale if item deleted. Then click Continue.

Make sure Alpha is selected to the right of Model, and then click OK.

You’ll find the discrimination indices in the Item-Total Statistics table in the column called Corrected Item-Total Correlation.

In Excel, there are a few steps involved in calculating discrimination indices.

1. Calculate total scores

Once you’ve entered all of the 0s and 1s, you need to calculate each student’s total score. In the spreadsheet, there’s a formula in cell AG3. You can copy this formula to the cells below it so that you can see the total score for each student. You need to:

  1. click on cell AG3, which is the one that contains the formula;
  2. put the black cross over the little square in the bottom right-hand corner of the cell;
  3. hold the left button of your mouse down as you click on the little square;
  4. drag your mouse down and release the button once you’ve got to the last student.

Note: If you added items to the spreadsheet (or deleted some), the cell containing the formula won’t be AG3, but you can easily find it. Look for the blue column. The cell with the formula is the very top blue one.

2. Calculate corrected totals

Before you can calculate the discrimination index for each item, you need to fill in the Corrected totals section of the spreadsheet. You need to:

  1. click on cell AH3, which is the one that contains the formula;
  2. put the black cross over the little square in the bottom right-hand corner of the cell;
  3. hold the left button of your mouse down as you click on the little square;
  4. drag your mouse down and release the button once you’ve got to the last student;
  5. with all of the cells for item 1 selected, hold the left button of your mouse down as you click on the little square again;
  6. drag your mouse to the right and release the button once you’ve got to the last item.

Note: If you added items to the spreadsheet (or deleted them), the cell containing the formula won’t be AH3, but you can easily find it. Look for the blue column. The cell with the formula is to the immediate right of the very top blue one.

3. Calculate the discrimination indices

You’re now ready to calculate the discrimination indices themselves. You need to:

  1. click on cell AH33, which is the one that contains the formula;
  2. put the black cross over the little square in the bottom right-hand corner of the cell;
  3. hold the left button of your mouse down as you click on the little square;
  4. drag your mouse to the right and release the button once you’ve got to the last item.

Note: If you added students to the spreadsheet (or deleted some), the cell containing the formula won’t be AH33, but you can easily find it. Look for the yellow cell labelled “DIS.” – the cell with the formula is the one to the immediate right of it.

Interpretation

When it comes to discrimination indices, higher is better. As a general rule, items with discrimination indices of 0.30 or higher are considered to be discriminating between stronger and weaker students (Green, 2013).

Negative discrimination indices

If an item has a negative discrimination index, it’s a major red flag because it tells you that weaker students are performing better on that particular item than stronger students, which isn’t logical. There are two common causes:

  • There might be an error in the answer key.
  • There might be something about the item/instructions that’s confusing stronger students.

Unless the cause of the issue is clear and can be fixed (e.g. an error in the answer key), items with negative discrimination indices will need to be dropped from the test.

Small discrimination indices

If the discrimination index is small, it tells you that there’s little to no relationship between students’ scores on that particular item and their total scores. There are several possible causes:

  • Students might be able to guess what the correct answer is. This is often the case with multiple choice questions when some of the distractors are implausible.
  • There might be more than one correct answer.
  • There might not be a correct answer.
  • The item might be measuring or partly measuring another construct. This commonly occurs when students need some kind of general or subject knowledge to be able to answer items correctly. It also occurs when students need to do some kind of mathematical calculation to answer a question correctly.

Items with small discrimination indices need to be reviewed. If they can’t be improved, they may need to be dropped from the test.
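
Once you have facility values and discrimination indices for every item, the rules of thumb above can be applied in one pass. Here's a minimal Python sketch that does this. The function name review_items, its parameters and the sample values are hypothetical, and the default thresholds are just the rules of thumb mentioned in this tutorial, so you may want to adjust them to suit the purpose of your test.

```python
def review_items(facility, discrimination,
                 facility_range=(0.30, 0.70), min_discrimination=0.30):
    """Return (item_number, notes) pairs based on rule-of-thumb thresholds."""
    low, high = facility_range
    report = []
    pairs = zip(facility, discrimination)
    for number, (fac, dis) in enumerate(pairs, start=1):
        notes = []
        if dis < 0:
            notes.append("negative discrimination: check the answer key")
        elif dis < min_discrimination:
            notes.append("low discrimination: review the item")
        if not low <= fac <= high:
            notes.append("facility outside the rule-of-thumb range")
        report.append((number, notes))
    return report


# Hypothetical values for a 3-item test.
facility = [0.60, 0.95, 0.50]
discrimination = [0.45, 0.10, -0.20]

for number, notes in review_items(facility, discrimination):
    print(f"Item {number:02d}:", "; ".join(notes) or "no flags")
```

A flag is only a prompt to look at the item more closely. For an achievement test, for example, you could pass a higher facility_range, since high facility values are expected there.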

Note: For achievement tests, as well as proficiency tests that target a specific level (e.g. B1 on the CEFR), the discrimination indices will generally be lower than they would be on a multi-level proficiency test. This is because students who take such tests generally do relatively well, so there’s likely to be less variability in test scores.

Step 5 – Calculating Cronbach’s alpha

Cronbach’s alpha is a measure of a test’s internal consistency (reliability). It shows the extent to which the items in a test are related to each other. (We want them to be related.)

In SPSS, Cronbach’s alpha was actually calculated as part of Step 4. You’ll find it in the Reliability Statistics table.

SPSS also outputs an interesting statistic called Cronbach’s Alpha if Item Deleted, which you’ll find in the Item-Total Statistics table. This tells you what Cronbach’s alpha would be if a given item were removed from the test. Removing items with small or negative discrimination indices will usually increase Cronbach’s alpha.

In Excel, to calculate Cronbach’s alpha, you first need to calculate the variance for each item. You need to:

  1. click on cell C36, which is the one that contains the formula;
  2. put the black cross over the little square in the bottom right-hand corner of the cell;
  3. hold the left button of your mouse down as you click on the little square;
  4. drag your mouse to the right and release the button once you’ve got to the last item.

Note: If you added students to the spreadsheet (or deleted some), the cell containing the formula won’t be C36, but you can easily find it. Look for the section called “Variances (items)” – the cell with the formula is the very first blue one on the left.

The other calculations you need to work out Cronbach’s alpha – i.e. the number of items in the test, the variance of the total scores and the sum of the variances of the items – are done automatically. If you haven’t added/deleted students, they’ll appear in cells C39, C42 and C45. Cronbach’s alpha will appear in cell C48, to the immediate right of the yellow cell labelled “REL.”
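
To show how those three ingredients fit together, here's a minimal Python sketch of the standard formula: alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. It also recalculates alpha with each item left out, which mirrors the idea behind SPSS's Cronbach's Alpha if Item Deleted. The function names and sample data are hypothetical, and the sketch isn't taken from the spreadsheet's own formulas.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows of item scores (needs 2+ items)."""
    k = len(responses[0])
    item_variances = [
        variance([row[i] for row in responses]) for i in range(k)
    ]
    total_variance = variance([sum(row) for row in responses])
    # Undefined if every student has the same total score (zero variance).
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(responses):
    """Alpha recalculated with each item removed in turn (needs 3+ items)."""
    k = len(responses[0])
    return [
        cronbach_alpha([row[:i] + row[i + 1:] for row in responses])
        for i in range(k)
    ]


# Hypothetical data: 4 students x 3 items (far too few for real analysis).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

print(round(cronbach_alpha(responses), 3))  # 0.75
print([round(a, 3) for a in alpha_if_item_deleted(responses)])
# [0.727, 0.5, 0.727]
```

The sketch uses the sample variance (dividing by n − 1) for both the items and the totals. Because alpha depends only on the ratio of the two, using the population variance for both would give the same result.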

Interpretation

Cronbach’s alphas of 0.70 or higher are generally considered to be acceptable, with alphas over 0.80 preferred (Green, 2019).

A couple of points about Cronbach’s alpha:

  • Because internal consistency and discrimination are related, if the items in the test discriminate well, Cronbach’s alpha will likely be higher.
  • All else being equal, the more items there are in a test, the higher Cronbach’s alpha will be. If Cronbach’s alpha is very high, you can probably remove items from the test and still have a sufficiently high Cronbach’s alpha.

Note: For achievement tests, as well as proficiency tests that target a specific level (e.g. B1 on the CEFR), Cronbach’s alpha will generally be lower than it would be on a multi-level proficiency test. This is because there’s likely to be less variability in test scores and therefore less discrimination. For example, the Cronbach’s alpha for the IELTS test, which is designed to test candidates from a range of levels, is typically around 0.90. For the Cambridge B2 First, on the other hand, it’s lower – usually around 0.80.

References

Green, R. (2013). Statistical analyses for language testers. New York: Palgrave Macmillan.

Green, R. (2019). Item analysis in language assessment. In V. Aryadoust & M. Raquel (Eds.), Quantitative data analysis for language assessment, Volume I: Fundamental techniques (pp. 15–29). London and New York: Routledge.

Hughes, A. (2003). Testing for language teachers (2nd ed.). Cambridge: Cambridge University Press.