Musings on Test Length

BennyTheJett
Exalted Member
Posts: 391
Joined: February 21st, 2019, 2:05 pm
Division: C
Pronouns: He/Him/His
Has thanked: 83 times
Been thanked: 171 times

Re: Musings on Test Length

Post by BennyTheJett » January 13th, 2021, 7:17 am

EastStroudsburg13 wrote:
January 12th, 2021, 10:07 am
knightmoves wrote:
January 11th, 2021, 7:55 pm
EastStroudsburg13 wrote:
January 11th, 2021, 6:00 pm
I guess it's technically a choice to break the 3/4 tie and not break the 15/16 one. I just think it's a really bad one.
But unless events are going to have spare medals on hand (so you can award 2nd place to 2 teams), you have to break the tie.
If you simplify down to this, you get my position. In my opinion, you can either not break ties and have spare medals just in case, or you break all ties. I really don't like the idea of treating medalists any differently than all of the other teams competing in the event.
Strongly agree with East here. Consistency is needed across all teams, because a single point can make a difference in team standings. Whichever option you pick, breaking ties or not breaking them, it is incredibly important to apply it to everyone.
2021 Events:
Dynamic Planet, Fossils, Geocaching, Geologic Mapping, Water Quality

Event Volunteering:
- Rickards Fossils C Writer/ES
- SOLVI Dynamic Planet C Cowriter/ES
- Socorro Dynamic Planet B Cowriter/ES
- River Hill Dynamic Planet C Writer/ES
- Menomonie Dynamic Planet B Writer/Grader

knightmoves
Member
Posts: 319
Joined: April 26th, 2018, 6:40 pm
Has thanked: 2 times
Been thanked: 27 times

Re: Musings on Test Length

Post by knightmoves » January 14th, 2021, 11:42 am

Another thought on test length - what is the effect of Monkey noise?

Monkey noise is what I call the effect of random guessing (as by an infinite number of monkeys) on multiple choice tests. I imagine most people make sure to spend the last 30s of the test filling in answers for the questions they didn't get to (whether you do all C, or random, or whatever). So on average, if you have N left-over questions at the end of the test, you expect to score N/5 (assuming 5 answers on the multiple choice), with sigma = sqrt (4N/25). So if your very-long test has 100 extra multiple choice questions that teams guess at, you're adding random Monkey noise of +/- 4 points to the score. Which means that if two teams score within about 4 points of each other, you can't really say which one did better.

If the test isn't multiple choice, monkey noise isn't an issue, because nobody is likely to randomly guess the right answer.
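
For anyone who wants to check the arithmetic, here is a minimal sketch of the numbers above. It assumes one point per question, no guess penalty, and that every blind guess is right with probability 1/5; the function name `monkey_noise` is just made up for illustration.

```python
import math


def monkey_noise(n_guessed, n_choices=5):
    """Mean and standard deviation of the points gained by blind guessing.

    Assumes 1 point per question, no penalty, and a 1/n_choices chance
    of being right on every guessed question (a binomial model).
    """
    p = 1.0 / n_choices
    mean = n_guessed * p                           # N/5 for 5 choices
    sigma = math.sqrt(n_guessed * p * (1.0 - p))   # sqrt(4N/25) for 5 choices
    return mean, sigma


# How the noise grows with the number of guessed questions.
for n in (10, 20, 50, 100):
    mean, sigma = monkey_noise(n)
    print(f"{n:3d} guessed: expect {mean:.1f} +/- {sigma:.2f} points")
```

The +/- here is one standard deviation, so the noise grows like sqrt(N): quadrupling the number of guessed questions doubles it.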

Unome
Moderator
Posts: 4285
Joined: January 26th, 2014, 12:48 pm
Division: Grad
State: GA
Has thanked: 181 times
Been thanked: 59 times

Re: Musings on Test Length

Post by Unome » January 15th, 2021, 12:18 pm

knightmoves wrote:
January 14th, 2021, 11:42 am
Another thought on test length - what is the effect of Monkey noise?

Monkey noise is what I call the effect of random guessing (as by an infinite number of monkeys) on multiple choice tests. I imagine most people make sure to spend the last 30s of the test filling in answers for the questions they didn't get to (whether you do all C, or random, or whatever). So on average, if you have N left-over questions at the end of the test, you expect to score N/5 (assuming 5 answers on the multiple choice), with sigma = sqrt (4N/25). So if your very-long test has 100 extra multiple choice questions that teams guess at, you're adding random Monkey noise of +/- 4 points to the score. Which means that if two teams score within about 4 points of each other, you can't really say which one did better.

If the test isn't multiple choice, monkey noise isn't an issue, because nobody is likely to randomly guess the right answer.
That's part of the reason why I phased out of writing multiple choice for the most part on my tests (that and multiple choice takes an absurd amount of time to write).
Chattahoochee High School Class of 2018
Georgia Tech Class of 2022

Opinions expressed on this site are not official; the only place for official rules changes and FAQs is soinc.org.


Post by BennyTheJett » January 15th, 2021, 12:30 pm

Unome wrote:
January 15th, 2021, 12:18 pm
That's part of the reason why I phased out of writing multiple choice for the most part on my tests (that and multiple choice takes an absurd amount of time to write).
I just never wrote MC to begin with :oops: . If I need something like that, I've just written Fill in the Blanks.


Post by knightmoves » January 15th, 2021, 12:54 pm

Unome wrote:
January 15th, 2021, 12:18 pm
That's part of the reason why I phased out of writing multiple choice for the most part on my tests (that and multiple choice takes an absurd amount of time to write).
I am told that scilympiad encourages multiple choice questions (by auto-grading them, but not successfully auto-grading any other kind of question). It seems as though I've seen more multiple choice this year than normal.

In a paper competition, multiple choice has the advantage of being gradable by non-experts, whereas even fill-in-the-blank questions often have answers with reasonable synonyms. My preference is for multi-step calculation questions, but those are basically impossible to mark for people who aren't subject experts.

PM2017
Member
Posts: 524
Joined: January 20th, 2017, 5:02 pm
Division: Grad
State: CA
Has thanked: 23 times
Been thanked: 12 times

Re: Musings on Test Length

Post by PM2017 » January 15th, 2021, 5:08 pm

One solution for the MC section would be to implement a random guess penalty, so that on average random guessing yields a score of 0. This has its own issues though.

Regardless, I think MCQs definitely have a place in scioly exams (I tend to make my tests 20-30% MC, and a smaller percentage once you weigh the point values). Especially for casual teams, MCQs are a lot more encouraging than other forms of questions. I say this despite absolutely despising writing MCQs and being ambivalent about actually doing MCQs on tests.
knightmoves wrote:
January 14th, 2021, 11:42 am
So if your very-long test has 100 extra multiple choice questions that teams guess at, you're adding random Monkey noise of +/- 4 points to the score. Which means that if two teams score within about 4 points of each other, you can't really say which one did better.
I'm pretty confident you will almost never see a test where there are 100 MCQs that people will randomly choose. And, if you only have maybe 20 MCQs on your exam, I think other random factors that we cannot control will be a bigger influence here. (The biggest being the choice of the specific subject matter on the exam. I know the counter-argument is to simply prepare for anything, but this is a) unrealistic for casual teams and b) still open to random chance, because it is almost certain that a competitor will not be equally comfortable with every subtopic -- the exception being 0% familiarity lol).
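
To put a rough number on the 20-versus-100 comparison, here is a small sketch that computes how often pure guessing alone opens up a gap between two teams. It assumes both teams blindly guess on the same number of 5-choice questions, independently, at one point each; `guess_pmf` and `swing_probability` are hypothetical helper names, and this says nothing about the size of the other random factors.

```python
from math import comb


def guess_pmf(n, n_choices=5):
    """Exact binomial probabilities of getting k = 0..n blind guesses right."""
    p = 1.0 / n_choices
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def swing_probability(n, gap, n_choices=5):
    """P(|X - Y| >= gap) for two teams that each blindly guess n questions.

    Assumes the two teams' guesses are independent and worth 1 point each.
    """
    pmf = guess_pmf(n, n_choices)
    total = 0.0
    for x, px in enumerate(pmf):
        for y, py in enumerate(pmf):
            if abs(x - y) >= gap:
                total += px * py
    return total


# Chance that guessing alone separates two teams by at least `gap` points.
for n in (20, 100):
    for gap in (1, 2, 4):
        print(f"N={n:3d}, gap>={gap}: {swing_probability(n, gap):.3f}")
```

With fewer guessed questions the large swings become much rarer, which is the sense in which 20 MCQs are less of a worry than 100.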
West High '19
UC Berkeley '23

Go Bears!


Post by knightmoves » January 15th, 2021, 5:54 pm

PM2017 wrote:
January 15th, 2021, 5:08 pm
One solution to the MC section would be to implement a random guess penalty? so that on average, random guessing yields a 0 score? This has its own issues though.
That doesn't reduce the noise. There are tests that do this (so you expect a monkey to score 0, rather than N/5). Typically you score 4 for a correct answer and -1 for a wrong one, but you're just scaling the binomial distribution by a factor of 5 (the gap between a right and a wrong answer) and offsetting it - you don't reduce the width of the distribution.

You can introduce a harsher guess penalty (so you expect monkeys to get a negative score) to persuade people not to guess, which would reduce the noise because people wouldn't guess, but I was fairly sure I'd found somewhere that negative scores were against SO policy.
PM2017 wrote:
January 15th, 2021, 5:08 pm
I'm pretty confident you will almost never see a test where there are 100 MCQs that people will randomly choose. And, if you only have maybe 20 MCQs on your exam, I think other random factors that we can not control will be a bigger influence here.
In some of the "very long test" discussions, we were getting close to that. And I agree that there's an element of luck in whether the ES chooses to test topics that you're good at or less good at, but that's a different kind of random. If you scored well because the questions were on your pet topics, you really did do well on that test. If you scored well because you threw seven sixes in a row at the end of the test, you were the beneficiary of pure random chance.
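
A quick sketch of the guess-penalty point, for anyone who wants to see it in numbers. With +4 for a right answer and -1 for a wrong one, X correct guesses out of N score 4X - (N - X) = 5X - N, so the mean moves to 0 but the spread just gets stretched. The function name `guess_score_stats` is hypothetical, and the model assumes blind guessing on 5-choice questions.

```python
import math


def guess_score_stats(n_guessed, right_pts, wrong_pts, n_choices=5):
    """Mean and standard deviation of the score from blind guessing.

    The score is right_pts * X + wrong_pts * (n_guessed - X), where X is
    binomial(n_guessed, 1/n_choices). That is a linear function of X, so
    the spread is |right_pts - wrong_pts| times the binomial spread.
    """
    p = 1.0 / n_choices
    mean = n_guessed * (right_pts * p + wrong_pts * (1.0 - p))
    sigma = abs(right_pts - wrong_pts) * math.sqrt(n_guessed * p * (1.0 - p))
    return mean, sigma


n = 100
plain_mean, plain_sigma = guess_score_stats(n, 1, 0)    # 1 point, no penalty
pen_mean, pen_sigma = guess_score_stats(n, 4, -1)       # +4 / -1 scoring

# Compare the noise as a fraction of the maximum possible score.
print("no penalty:", plain_mean, plain_sigma, plain_sigma / (n * 1))
print("+4/-1     :", pen_mean, pen_sigma, pen_sigma / (n * 4))
```

Under these assumptions the penalty centers the guessers on zero, but the noise as a fraction of the maximum score does not shrink.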


Post by BennyTheJett » January 20th, 2021, 7:26 am

knightmoves wrote:
January 15th, 2021, 5:54 pm
PM2017 wrote:
January 15th, 2021, 5:08 pm
One solution to the MC section would be to implement a random guess penalty? so that on average, random guessing yields a 0 score? This has its own issues though.
That doesn't reduce the noise. There are tests that do this (so you expect a monkey to score 0, rather than N/5). Typically you score 4 for a correct answer and -1 for a wrong one, but you're just scaling the binomial distribution by a factor of 5 (the gap between a right and a wrong answer) and offsetting it - you don't reduce the width of the distribution.

You can introduce a harsher guess penalty (so you expect monkeys to get a negative score) to persuade people not to guess, which would reduce the noise because people wouldn't guess, but I was fairly sure I'd found somewhere that negative scores were against SO policy.
PM2017 wrote:
January 15th, 2021, 5:08 pm
I'm pretty confident you will almost never see a test where there are 100 MCQs that people will randomly choose. And, if you only have maybe 20 MCQs on your exam, I think other random factors that we can not control will be a bigger influence here.
In some of the "very long test" discussions, we were getting close to that. And I agree that there's an element of luck in whether the ES chooses to test topics that you're good at or less good at, but that's a different kind of random. If you scored well because the questions were on your pet topics, you really did do well on that test. If you scored well because you threw seven sixes in a row at the end of the test, you were the beneficiary of pure random chance.
tHiS Is WhY wE nEeD VeNn DiAgRaMs
