One of the nicest, if unintentional, compliments I have received came from a student who complained that “you can’t even Google answers for tests in Dr. Barrett-Fox’s classes.” The student was frustrated that, even though most questions on most of my exams for Intro to Soc are true/false and multiple choice, the answers weren’t ones you could find on a page of your textbook or through a quick internet search; they required students to think and apply familiar knowledge to novel situations. The comment also suggested to me that there are a lot of online classes where exam answers are easily found online. The student’s comments affirmed for me that we can design exams that require deep thought and careful application and still make them easy to grade.
This is a primer on how to do it. It is for use in classes where exams are taken online (which can be F2F classes—just reserve a computer lab and have students take the exam there) and can be used for classes at any level and include any type of question (including essays written in class).
It is also one area where I think an investment of time is worthwhile even if you are pivoting online rather than building an online class.
The heart of it is backwards design. This means you begin by selecting the concepts or skills you want to assess first, then figure out how to teach them. (You probably have already done this if you use exams in a F2F class.)
In a large online class, question banks are central to the success of tests that are valid. Instead of trying to use tech to thwart cheating, I use test design, specifically test banks. Test banks ensure that students are not all facing the same questions, even as the questions they do face cover the same content and present the same level of challenge. Here’s how:
1. Make a list of all the content knowledge, skills, or other things you want to assess.
2. For each, decide at what level in Bloom’s taxonomy you want to measure it. This will depend on a number of factors, including the course level, the other ways you are assessing students, the number of students whose work you will have to evaluate, whether you have support in the form of a grader or graders, and more. In general, higher level courses have fewer students, assume prerequisite knowledge (and so don’t need to assess whether students remember or understand basic concepts), and invite higher order thinking, which means more writing and other creative work that is more time-intensive to grade.
Above, Bloom’s taxonomy. By Fractus Learning – Fracus Learning, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=69357265
3. Create a chart. The first row lists the levels in Bloom’s taxonomy. The first column lists the areas of student knowledge you want to assess, one per row. For example, in my Intro to Soc class, students take an exam over the topic of deviance. Because this is an introductory course, I don’t expect students to be able to perform at the top of the taxonomy, so I don’t include “Create” here (though this might be appropriate for a final project or exam).
Here is an example, but it’s short. For a real test, I have about 40-50 concepts, with one question per concept, but, again, this depends on many factors, like level of the course and other assessments being used to measure student knowledge. So just imagine that this table goes on for about 40 more rows.
|Concept or skill|Remember|Understand|Apply|Analyze|Evaluate|
|---|---|---|---|---|---|
|crime v. deviance||||||
|categories of crime||||||
|how to find the definition of a crime in the FBI Uniform Crime Report||||||
|how to read the Clery Act information for their campus||||||
|deprivation theory of deviance||||||
|strain theory of deviance||||||
|differential-association theory of deviance||||||
|trends in crime rates||||||
5. For each concept or skill, I select ONE level to assess it at. Again, this will vary according to your course level. 4000-level courses should be assessing students at higher levels than 1000-level courses. In Intro, I will typically ask a few remember questions, a lot of understand, apply, and analyze questions, and a few evaluate questions. In the chart below, I’ve selected just 10 of the concepts or skills I want to assess students’ mastery of to illustrate my point, but my actual chart would have many more rows, one for each question students will face (so, typically 40-50 rows for an Intro exam). Note that I could ask a question at any level about any of these. For example, I could ask students to do anything from defining labeling theory all the way up to creating a legal defense based on it.
|crime v. deviance||X|
|trends in crime rates||X|
|categories of crime||X|
|how to find the definition of a crime in the FBI Uniform Crime Report||X|
|how to read the Clery Act information for their campus||X|
|deprivation theory of deviance||X|
|strain theory of deviance||X|
|differential-association theory of deviance||X|
You see in the chart above that 2/10 questions are of the lowest order and 1/10 is of the highest order (among the types of questions I ask, since I don’t ask create questions on exams in Intro). I expect almost all students to get the lower order ones correct, fewer students to get the higher order ones correct, and the fewest students to get the highest order ones correct. (After students take the test, you can analyze patterns of error to see if your questions are distinguishing between students effectively. If students who got every other answer right missed the easiest two questions, maybe they aren’t as easy as you thought.)
These cells become the categories for the test bank. This means that every student will get one remember question about crime v. deviance, one remember question about trends in crime, one understand question about categories of crime, etc.
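The post-test check mentioned above, whether questions distinguish between students, can be sketched in a few lines of Python. This is one common approach (proportion correct plus an upper-minus-lower discrimination index), not necessarily the analysis I or your LMS would run. The function name `item_stats`, the top-third/bottom-third cut, and the score data are all invented for illustration. Each column stands for one category on the chart, scored 1 for correct and 0 for incorrect.

```python
def item_stats(responses):
    """Return (proportion_correct, discrimination) for each column of 0/1 scores.

    responses: one list per student, one 0/1 entry per question category.
    Discrimination here is the share of top scorers who answered correctly
    minus the share of bottom scorers who did (an assumed, simple index).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    ranked = sorted(responses, key=sum, reverse=True)  # best total score first
    group = max(1, n_students // 3)  # assumed cut: top and bottom thirds
    top, bottom = ranked[:group], ranked[-group:]
    stats = []
    for i in range(n_items):
        proportion_correct = sum(r[i] for r in responses) / n_students
        discrimination = (sum(r[i] for r in top) - sum(r[i] for r in bottom)) / group
        stats.append((proportion_correct, discrimination))
    return stats


# Invented scores for six students on three categories.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]

for number, (p, d) in enumerate(item_stats(responses), start=1):
    print(f"category {number}: proportion correct {p:.2f}, discrimination {d:.2f}")
```

A category that you meant to be easy but that has a low proportion correct, or one where strong students do worse than weak ones (a negative discrimination value), is worth rereading before you reuse it.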
6. Next, I write the questions for each category. How many I write per category depends on how many students I have in a class. I always aim for at least three questions, but for large classes, like Intro, I often have five questions per category.
You are doing the math on this, right? If an exam has 50 questions and each category of question has 5 questions within it, that’s 250 questions per exam. Students will only face 50 of them, drawn at random from each category, so it’s unlikely that any two exams will have many of the same questions—even as all students are tested on the same material and presented with questions of the same difficulty.
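To make the arithmetic concrete, here is a minimal Python sketch of what the random draw amounts to. It is not how any particular LMS implements it. The bank contents, the category names, and the `draw_exam` function are placeholders invented for illustration.

```python
import random


def draw_exam(bank, rng):
    """Pick one question at random from every category, in category order."""
    return [rng.choice(questions) for questions in bank.values()]


# Hypothetical bank: 50 categories with 5 placeholder questions each.
bank = {
    f"category_{c:02d}": [f"category_{c:02d}_question_{q}" for q in range(1, 6)]
    for c in range(1, 51)
}

total_questions = sum(len(questions) for questions in bank.values())  # 50 * 5 = 250

rng = random.Random(2020)  # fixed seed only so the sketch is repeatable
exam_a = draw_exam(bank, rng)
exam_b = draw_exam(bank, rng)

# Count the categories in which both students happened to get the same question.
shared = sum(a == b for a, b in zip(exam_a, exam_b))

print(f"{total_questions} questions written, {len(exam_a)} on each exam")
print(f"these two exams share {shared} questions")
```

Every exam built this way covers all 50 categories, which is what keeps the content and difficulty the same even though the individual questions differ.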
Typically, writing an exam this way takes me 12-15 hours. This is a major reason why designing an excellent online course requires at least 100 hours of work.
The questions for each category must be comparable (which is why I use Bloom’s taxonomy to guide me). For example, if I want to be sure that students are able to read the campus Clery Act information, which reports on crimes on campus, I may create one question about property crime on campus, one about property crimes in the dorms, one about minor in possession of alcohol in dorms, one about battery on campus, and one about theft on campus, which are some of the ways that my campus sorts information in the annual report. In any one attempt at the test, students would then have a 1/5 chance of getting any one of those questions—which makes “collaborating” with a classmate much harder. If they take the exam a second time–which I recommend allowing–then they may face it again, but that’s unlikely. And if they do, it’s okay.
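If you want to see how bank size affects overlap, the short Python sketch below works it out exactly. It assumes every question in a category is equally likely to be drawn and that each student's draw is independent; your LMS may behave differently. The function names are made up for this example.

```python
from fractions import Fraction


def expected_shared_questions(category_sizes):
    """Expected number of categories where two independent draws match."""
    return sum(Fraction(1, size) for size in category_sizes)


def chance_of_identical_exams(category_sizes):
    """Probability that two independent draws match in every category."""
    result = Fraction(1)
    for size in category_sizes:
        result *= Fraction(1, size)
    return result


five_per_category = [5] * 50  # 50 categories, 5 questions each
two_per_category = [2] * 50   # 50 categories, 2 questions each

print(expected_shared_questions(five_per_category))  # 10
print(expected_shared_questions(two_per_category))   # 25
print(float(chance_of_identical_exams(five_per_category)))
```

Under these assumptions, two students working side by side would on average have about 10 of their 50 questions in common with five questions per category, and they would not know in advance which ones. The same 1/5 figure applies to a student meeting the same question in a category on a second attempt.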
7. Your LMS will give you the option of creating exams that draw a certain number of questions from the categories you choose. If you aren’t sure how to actually enter the questions in your LMS, do a quick online video search for help.
Should you go to this much effort in a course that is quickly moving online? Not necessarily.
You can save yourself time now by writing essay questions that require more synthesis. For example, you might write just 5-10 essay questions, but each demands that students demonstrate knowledge from multiple categories. (For example: Select a crime that occurred on our campus 2000-2020 and was reported in the student newspaper and recorded in our university’s Clery Act information. After briefly detailing it (being sure to define it using the appropriate local, state, or federal definition) using the journalistic description, which you must cite, analyze it using one structural-functionalist theory of deviance and one conflict theory of deviance.) These are easier to write, but they will take more time to grade. Note that questions that are unique to your students (like a question about their campus) are less “cheatable” than ones like “Choose a theory of deviance and define it.” Again, if you use a test bank, even for essay questions, students would not easily be able to collaborate.
Or you can use quizzes instead of tests, which keeps the focus tighter and allows you to get assessments online faster. So, for example, perhaps instead of having an entire exam over deviance, I could have a quiz over crime (definitions, trends) and one over theories of deviance. In the end, this may be more work for me, but it could mean that students aren’t without work while they wait.
Or you could write just 2 questions per category. That’s still enough, especially when combined with random ordering, to make cheating a challenge. Or maybe some categories have just 2 questions and some have 3, 4, 5, or more.
A final idea is to include an oral question, one that students must answer in a video recording. This doesn’t guarantee they won’t cheat, since they could have someone else write their answer, and you probably can’t guarantee that you can match the face on the screen with the face in your visual roster. And it presents tech barriers. But it could be one option among other choices for students (though a pain to grade if it produces 200 videos of 1-2 minutes each for you to watch).
A final reminder: We don’t know the many barriers our students are facing now. Consider giving students an option of taking a paper-based test and returning it to you via mail. If possible, give them an envelope for it before they leave campus, already stamped and addressed. Anything to lower the burden on them while ensuring that they still have the opportunity to learn.