Balancing Difficulty and Accessibility in Test Writing

Post by **Adi1008** » September 26th, 2018, 10:46 am

Hi all,

I'm writing tests (mainly for Astronomy) for a few tournaments. I've written tests for tournaments in the past, and I've always struggled to balance difficulty against accessibility for less experienced teams, which has often led to very lopsided score distributions.

As someone who has loved Astronomy for as long as I can remember, the last thing I want to do is write a test that discourages students from pursuing the subject further or makes them think Astronomy is too esoteric or difficult to understand. On the other hand, I remember the frustration I felt as a competitor with tests that were too easy and failed to separate competitors, especially among the top teams (even more so when a trip to Nationals was at stake!). More importantly, I personally loved the thrill of taking a challenging test (e.g. Nationals Astronomy 2016, Princeton Astronomy 2017) that pushed me to my limits. Taking hard tests made me love Astronomy even more, and I want to recreate that feeling in everyone, including the best competitors, who might find easier tests boring or lackluster.

Over the past few years, I've come across some interesting takes on test writing, difficulty, and format, such as prioritizing "gettable" questions over "gimme" ones, or writing longer tests with easier problems in the hope that length and time pressure, rather than difficulty, will separate competitors. I'm curious what others think works best, and how their own test-writing philosophies shape the tests they write.

In short, to others who write tests: how do you try to balance these two elements of test writing?

Post by **JoeyC** » September 26th, 2018, 11:31 am

I have written tests on occasion, and I feel that while a few "gimme" questions are necessary to establish that competitors have at least a basic knowledge of the topic (and if they don't, it'll prompt them to learn), the most important parts of the test should be in-depth application questions that require a strong understanding of the principles of the subject. As a test taker, I find easy tests disappointing because they don't prompt me to learn anything; only when something hard and out there appears am I pushed to up my game.

Post by **windu34** » September 26th, 2018, 11:46 am

I too have been exploring this subject with much interest and plan to try a multi-part question format for the next test I write. I think the best tests help students actually learn how to apply their knowledge by asking "real world scenario" problems that really make them think. I plan to try a format with 8 to 15 questions, each with 4 to 6 parts (150 to 300 points total, depending on the competitiveness and number of teams at the tournament), where each question presents the competitor with a scenario that they have to work through. The parts of each question will get progressively more difficult, and most will build on the previous part in some meaningful way. My Gen Chem 2 and Orgo 1 professors used a format like this incredibly effectively (IMO), and I am excited to apply it to Science Olympiad to hopefully make my tests more engaging and interesting to take. Tests that just consist of a crap-ton of unrelated questions don't really force students to understand the concepts; they just reward the students who have seen similar problems before. Of course, the exact methodology for writing a test will depend on the event, so this strategy may not hold for all events, but I am excited to try it out for the Physics events I will be supervising.

Post by **Unome** » September 26th, 2018, 12:13 pm

My attempts so far have mostly worked out how I intended - with the exception of the first Astronomy test that I wrote, which resulted in one ~60% score and everyone else below 35%. I find that a lot of the difficulty comes from the fact that, on many occasions, there really is no meaningful difference between the knowledge of the bottom 50-70% of teams - no matter how gradated the questions are, most of the teams will fall within a relatively small space - for my past tests, usually the 20-40% range - with a few teams really low and a wider distribution near the top.

As windu talked about, I definitely try to relate sequences of questions to each other, although I tend not to explicitly format it that way very often (Astro being the exception).

Post by **nicholasmaurer** » September 26th, 2018, 3:53 pm

For some tests I have written, I structured them explicitly by difficulty: within each topic, I created subsections labelled easy, moderate, or difficult, with questions to match.

Generally, I aim for the low score to be ~20% and the high score to be ~80%. There may be an outlier, but it's generally possible to get almost all of the teams distributed in this range if you're careful with your approach.
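One way to check a finished test against a target band like this is to compute, after scoring, what share of teams landed inside it and which scores fell outside. Below is a minimal Python sketch, assuming scores have already been converted to percentages; the function names `fraction_in_band` and `outliers` and the sample scores are hypothetical and not taken from any real tournament.

```python
def fraction_in_band(scores, low=20.0, high=80.0):
    """Return the share of team scores (in percent) that fall within [low, high]."""
    if not scores:
        raise ValueError("need at least one score")
    inside = [s for s in scores if low <= s <= high]
    return len(inside) / len(scores)


def outliers(scores, low=20.0, high=80.0):
    """Return the scores outside [low, high], sorted ascending."""
    return sorted(s for s in scores if s < low or s > high)


if __name__ == "__main__":
    # Hypothetical percentage scores for eight teams (illustrative only).
    sample = [12.5, 22.0, 31.0, 35.5, 48.0, 61.0, 78.0, 85.0]
    print(fraction_in_band(sample))  # 0.75
    print(outliers(sample))          # [12.5, 85.0]
```

The band defaults can be swapped for whatever benchmarks a writer prefers, e.g. `low=15, high=70`.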

Post by **TheChiScientist** » September 26th, 2018, 4:50 pm

The most effective tests I have seen are the ones where no team ever scores 100%. I have had a hand in both writing and taking tests, and what you want is a format like this. Concepts and plug-and-chug equations should make up 20-40% of a test; these are normally the "easy ones." FRQs and deep-thinking questions should make up about 30-50% of a test; these tend to be moderately hard to hard. Finally, 10-30% of the test should be college-level questions that require in-depth knowledge of the subject. These questions should stay within what the rules allow, but they should also make well-prepared teams go "WTH is this!!!!!". They serve to separate the top performers from the poor performers. Any remaining space can be filled with gimme questions, but try to avoid these, as they teach very little about the event. That's my 50 cents in a nutshell.
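The breakdown leaves "any other space" for gimme questions, and the arithmetic of how much space that is can be sketched in a few lines. This is a minimal Python illustration, assuming the three stated ranges are hard bounds and that whatever remains of 100% goes to gimmes; the category keys and the name `gimme_share` are hypothetical.

```python
# Percent-of-test ranges from TheChiScientist's breakdown, treated as hard bounds.
RANGES = {
    "concept": (20, 40),   # concepts and plug-and-chug equations
    "frq": (30, 50),       # FRQ and deep-thinking questions
    "college": (10, 30),   # college-level questions
}


def gimme_share(mix):
    """Return the percent left for gimme questions, or raise ValueError."""
    for name, (lo, hi) in RANGES.items():
        pct = mix[name]
        if not lo <= pct <= hi:
            raise ValueError(f"{name} share {pct}% is outside {lo}-{hi}%")
    total = sum(mix[name] for name in RANGES)
    if total > 100:
        raise ValueError(f"planned sections total {total}%, over 100%")
    return 100 - total


if __name__ == "__main__":
    print(gimme_share({"concept": 30, "frq": 40, "college": 20}))  # 10
```

With every category at its minimum (20 + 30 + 10 = 60%), up to 40% is left over; at the maximums the ranges add up to 120%, so not every combination inside the ranges is actually feasible.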

Post by **windu34** » September 26th, 2018, 5:02 pm

TheChiScientist wrote: [...] Finally, you should have 10%-30% of the test be college level questions that require in-depth knowledge of the subject in question. These questions should stay within what the rules allow but they should also make the well-prepared teams go "WTH is this!!!!!". [...]

I would disagree with this approach. Throwing in random, loosely related subject matter that isn't relevant to the big picture of the event really just rewards the teams that have perfected their cheat sheets. How is that a good way to assess which teams truly UNDERSTAND what is on their cheat sheets? The hardest questions on the test should involve applying several interrelated concepts of the event to solve a problem (or series of problems). The first reaction shouldn't be "What is this?", but rather "How the heck am I going to approach this?"

Post by **dxu46** » September 26th, 2018, 5:04 pm

Give it to your partner (or some other knowledgeable person) and if they get 85% or more, it's a good test.

Post by **TheChiScientist** » September 26th, 2018, 5:18 pm

windu34 wrote: I would disagree with this approach. [...] The first reaction shouldn't be "What is this?", but rather "How the heck am I going to approach this?"

Whoops. I probably should have worded that differently. The main idea I'm trying to get at is that a test should have a section of concept-understanding questions: the "do you know what you are doing?" questions. Harder questions should test whether you understand what the question is asking and how you must solve it. The hardest questions should really make teams think; they should have to understand the question wholeheartedly and use their cheat sheet to work out how to solve it. These are the questions that make teams initially go "WTH is this!!!", but after some thought, if they truly understand the concepts involved, they should have an "aha" moment. Overall, teams have to understand what is in front of them by having prior background knowledge, not just what is on their cheat sheet. I think that should clear up my thinking.

Post by **Unome** » September 26th, 2018, 5:38 pm

nicholasmaurer wrote: Generally, I aim for the low score to be ~20% and the high score to be ~80%. There may be an outlier, but it's generally possible to get almost all of the teams distributed in this range if you're careful with your approach.

This is basically what I try to do, although I use 15% and 70% as my benchmarks, since a top score of 80% in Georgia (excluding outliers) would mean removing almost every question that requires thinking.

Scioly.org