Musings on Test Length

Post by **SilverBreeze** » January 6th, 2021, 5:42 pm

knightmoves wrote: ↑January 6th, 2021, 11:00 am
SilverBreeze wrote: ↑January 5th, 2021, 11:53 am I still believe long tests accomplish the same anti-cheating goal as giving one question at a time - they just also give students the ability to manage their own time and do the questions they want to do.
I suppose whether you like long tests or not depends on whether you think getting the correct answer 10s faster than someone else is a particularly valuable thing. Like East said earlier, long tests emphasize speed. I think it's a good thing to get answers fast, but it's not the only thing. I'd rather have a test with progressively more difficult questions that tested whether people could answer the hard ones than a test that tested how fast they could answer easier questions.

One question at a time with a fixed time-per-question is almost the opposite of long tests. As long as you can complete the question in the time available, it doesn't reward speed at all, but does reward accuracy and being able to answer hard questions.

(On the subject of anti-cheating, there are plenty of technological ways to prevent people from cheating in online tests, but none of them are compatible with the goal of allowing people to compete from whatever device they have, which might be a school-issue chromebook or a personally-owned device.)

I was addressing one-question-at-a-time in the context of AstroClarinet's post, which argued that single questions with short time constraints are one way to discourage cheating. (I don't disagree, I just think it's suboptimal.) One-question-at-a-time with short time length seems just as strong an emphasis on speed to me.

There are plenty of technological ways... that can't be used because they limit test accessibility and invade privacy. Long tests prevent cheating, BUT are non-invasive and, as far as I know, are accessible from a school laptop.

"I think it's a good thing to get answers fast, but it's not the only thing. I'd rather have a test with progressively more difficult questions that tested whether people could answer the hard ones than a test that tested how fast they could answer easier questions."

I am tired of the implication that all long tests are easy questions that do not require thinking. Why can long tests not have critical thinking?

With larger competitions, you WILL get teams of equal knowledge. Higher speed with the same accuracy/critical thinking should be rewarded when there would otherwise be a tie. Building speed takes time and practice. I agree speed shouldn't be the only thing a competitor is tested on. I am tired of the implication that a long test only tests speed and not critical thinking.

Post by **PM2017** » January 6th, 2021, 6:14 pm

I am inclined to agree with Silver here.

The argument that long tests are bad because they just focus on getting a lot of trivial questions right is (IMO) baseless.
You can absolutely create long tests that become progressively harder as the test goes on (and in fact, this is how I make most of my tests, personally). This is an argument that comes up repeatedly about the use of computers in the astronomy event. Even before COVID times, competitors were able to use laptops in astronomy, but WiFi was to be turned off. People said that competitors would cheat and use the internet regardless. The thing is, a good astro test is written such that if you have to begin to cheat, you simply won't have the time to finish as much of the exam as people who are legitimately taking the test, simply because there is so much content on the exam.

Those who argue that speed is irrelevant are wrong (again, IMO). The more deeply you understand concepts, the faster you will be able to answer questions (unless, of course, it comes down to just writing speed, not thinking speed, but again, that normally only happens when the test is full of a bunch of trivialities).

Post by **sneepity** » January 7th, 2021, 5:24 am

I agree too. There are tons of ways you can incorporate critical thinking into a long test. Teams should definitely be able to test all their skills and what they studied on the test.
And on what Knightmoves said about open-internet tests (excuse me if someone said this already): some teams who aren't well prepared just copy and paste Wikipedia articles and other resources into their notes. I agree that tests should be formatted in a way that makes students apply their knowledge to a question, instead of scrambling through their notes to find the answer. If the questions were like this, they wouldn't have time to search up specific answers; they'd have to actually think about the question.
And yeah, speed shouldn't be a major factor. Teams should be provided more questions than needed.

Post by **EastStroudsburg13** » January 7th, 2021, 5:57 am

It feels to me that the rhetoric in these posts about long tests not being dependent on speed is focused pretty much entirely on higher-level teams that won't be intimidated by an extremely long test that they won't finish, and is also not making any distinction between writing for invitationals versus regionals/states. Ensuring that length is kept in check is of far greater importance at the regional/state level, and those are the tournaments that should be driving a test length discussion, imo.

I am not trying to say long tests are bad in general, but I don't think you should specifically design a test to be finished by nobody, unless you have a giant tournament that basically needs it to differentiate all of the teams. (And realistically, if you have this many teams, you should break it up into divisions. No reason to have 150+ teams in the same pool). If it's long enough for the top teams not to finish it, teams not at that level are going to feel like they're drowning. They are not going to spend time critically thinking about a question because there is too much other test that they see.

There is a good zone for a long test to be in. I think erring on the long side is MUCH better than erring on the short side. But taking that to an extreme is where I tend to draw the line. It is very easy for a non-elite team to feel discouraged if they studied and still can't finish 50-60% of the test.

Post by **knightmoves** » January 7th, 2021, 9:35 am

SilverBreeze wrote: ↑January 6th, 2021, 5:42 pm I am tired of the implication that all long tests are easy questions that do not require thinking. Why can long tests not have critical thinking?

With larger competitions, you WILL get teams of equal knowledge. Higher speed with the same accuracy/critical thinking should be rewarded when there would otherwise be a tie. Building speed takes time and practice. I agree speed shouldn't be the only thing a competitor is tested on. I am tired of the implication that a long test only tests speed and not critical thinking.

Of course long tests can have critical thinking - but for a test that will take a particular team a certain length of time to complete, it either has more questions that are easier, or fewer questions that require more thought. If the questions are multiple choice, the former generates a greater number of possible total scores, so might be better for avoiding ties, but I think the latter makes for a better test.

But I'll admit that I'm biased in favor of more challenging questions, because I find them more interesting (and because I'm prone to making stupid mistakes, like missing a double negative or inverting the logic of a question, when I'm rushing through it).

SilverBreeze wrote: ↑January 6th, 2021, 5:42 pm One-question-at-a-time with short time length seems just as strong an emphasis on speed to me.

I don't think that's true. Imagine a set of questions that you'd expect a decent team to take 30s per question, on average, to complete. So you're expecting the team to answer 100 questions in a 50 minute test. Now suppose you have a speedier team, who can average 25s per question. In a normal long test, that team will answer an additional 20 questions.

Now consider the single-timed-questions suggested by AstroClarinet. If you set the time per question to 25s, or even 30s, then the speedier team would answer a lot more questions than the other team, and the other team would rapidly become frustrated at not quite being able to complete a question within the time limit. If you set the time limit to 40s, then both teams will answer all the questions. So then the question is, in this mode, how short does the time have to be to make google-cheating impractical?

I suppose what I'm thinking is that there is a time (let's call it 40s in this example) that is short enough to make it hard to google the answer, but long enough to enable teams who know the answer to answer correctly. And I'm assuming that teams who are less good are going to be able to answer fewer questions right, because of a lack of knowledge or understanding, rather than "would be able to answer all the questions right if they had a minute per question".

It would seem perfectly reasonable, in this model, to use time taken to answer the questions as a tiebreak.

Whether you want to do that depends, at least in part, on how you value being able to answer an extra 20 questions in the time available vs being accurate in your answers to the first 100.
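To put numbers on the two formats compared above, here is a minimal Python sketch. The function names are hypothetical, and it makes one loud simplification: every question takes a team exactly its average time, so under a fixed per-question limit a team either completes every question or none. Real teams would fall somewhere in between.

```python
def questions_answered_long_test(test_minutes: float, seconds_per_question: float) -> int:
    """Questions a team gets through on a self-paced long test."""
    return int(test_minutes * 60 // seconds_per_question)


def questions_answered_timed(num_questions: int, limit_seconds: float,
                             seconds_per_question: float) -> int:
    """Questions a team completes when each question has a fixed time limit.

    Assumption (not from the thread): each question takes exactly the team's
    average time, so the team finishes all questions or none of them.
    """
    return num_questions if seconds_per_question <= limit_seconds else 0


# Long test, 50 minutes: the 25s team gets through 20 more questions.
decent = questions_answered_long_test(50, 30)
speedy = questions_answered_long_test(50, 25)
print(decent, speedy, speedy - decent)

# One question at a time with a 40s limit: both teams complete all 100.
print(questions_answered_timed(100, 40, 30), questions_answered_timed(100, 40, 25))
```

Under these assumptions the long test separates the two teams by 20 questions, while the 40s-per-question format does not separate them on speed at all, which is why time taken only comes in as a tiebreak.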

Post by **sneepity** » January 7th, 2021, 10:17 am

I feel like that would create many more ties though. Students should be able to answer however many questions they want to, and spend as much time as they need on each question.

Post by **shadow19** » January 7th, 2021, 10:57 am

EastStroudsburg13 wrote: ↑January 7th, 2021, 5:57 am It feels to me that the rhetoric in these posts about long tests not being dependent on speed is focused pretty much entirely on higher-level teams that won't be intimidated by an extremely long test that they won't finish, and is also not making any distinction between writing for invitationals versus regionals/states. Ensuring that length is kept in check is of far greater importance at the regional/state level, and those are the tournaments that should be driving a test length discussion, imo.

I think that as long as the event supervisor makes it clear that the test is not meant to be finished, there's no issue with running a super long test as the test takers shouldn't really be worried about the number of points they get, rather their relative scores. Like the collective average score could appear really low, but it shouldn't really matter as long as a good distribution curve is seen.

I personally also find the super long tests far more enjoyable than a short test. Creating a test where it's unlikely that even the top teams will finish gives them the room to choose questions/sections they find most fun/are most confident in, rather than being completely screwed over by one bad station/section. For example, on the Yosemite orni test, I was able to skip a station on gen bio/statistics that I thought wasn't time-efficient without worrying about falling behind the curve. While it's possible that this places a considerable emphasis on test-taking strategies, I'd argue that learning such a test-taking strategy is crucial and doesn't really take much away from the experience.

Post by **shadow19** » January 7th, 2021, 11:13 am

In response to the argument for making tests open-internet, I'd argue that while it may work for more math-based events, for events that are binder heavy it may not work nearly as well since it 1. destroys the hours spent on building a binder which really goes against the events rule play idea and 2. relies heavily on the test writer's ability to create critical thinking questions. Especially for high importance competitions like regionals and states, there are just too many risk factors to justify creating an open internet test. While it may work in the case of MIT where they're probably getting very reputable and hardworking event supps that understand the abilities of their competitors, I can already see how much of a disaster the score distributions would be at states/regionals if they had a similar pool of test writers as the past years.


Post by **EastStroudsburg13** » January 7th, 2021, 2:39 pm

shadow19 wrote: ↑January 7th, 2021, 10:57 am
EastStroudsburg13 wrote: ↑January 7th, 2021, 5:57 am It feels to me that the rhetoric in these posts about long tests not being dependent on speed is focused pretty much entirely on higher-level teams that won't be intimidated by an extremely long test that they won't finish, and is also not making any distinction between writing for invitationals versus regionals/states. Ensuring that length is kept in check is of far greater importance at the regional/state level, and those are the tournaments that should be driving a test length discussion, imo.

I think that as long as the event supervisor makes it clear that the test is not meant to be finished, there's no issue with running a super long test as the test takers shouldn't really be worried about the number of points they get, rather their relative scores. Like the collective average score could appear really low, but it shouldn't really matter as long as a good distribution curve is seen.

I personally also find the super long tests far more enjoyable than a short test. Creating a test where it's unlikely that even the top teams will finish gives them the room to choose questions/sections they find most fun/are most confident in, rather than being completely screwed over by one bad station/section. For example, on the Yosemite orni test, I was able to skip a station on gen bio/statistics that I thought wasn't time-efficient without worrying about falling behind the curve. While it's possible that this places a considerable emphasis on test-taking strategies, I'd argue that learning such a test-taking strategy is crucial and doesn't really take much away from the experience.

I would disagree with this, mostly for the reasons in the post you replied to. It is no issue for high-level teams, and I have already acknowledged this. Mid/lower-level teams, by and large, are not going to have the prior experience to know to immediately skip a section they aren't sure about. Or, alternatively, they could easily be overwhelmed by the sheer size of the test, and not be able to recognize questions that might be more doable. I am not saying that super long tests aren't better than short tests, because they are. But there is a middle ground that a lot of people seem to be ignoring.

Long length > medium length ≥* very long length > short length

*flip this for an invitational or for nationals

Post by **Skink** » January 8th, 2021, 6:19 pm

Interesting discussion. A few points:

1. I signed on to search for thoughts on the (troubling) fact that events seem entirely "open note" this season. What's the point in building a binder or cheat sheet when teams are going to have phones, tablets, or other computers available for looking up answers, a substantial fraction of which will be within reach (especially for the strongest competitors, who know what they're looking for and are able to find it)? The average ES' tests are neither challenging nor lengthy, so I fear my teams are walking into something extremely high-scoring where the differences between scores may very well be determined by the quality of the open notes.

2. Whether questions are objective, subjective, or both, length must be there to separate scores adequately. The rule of thumb is the number of registered teams doubled is the minimum number of points there should be, but even this is limiting depending on the subject matter. The real problem here, as I see it, is the scoring system's demands for breaking all ties. See, at an invite of forty teams, there is no theoretical reason why Skink MS JV3 should score uniquely compared against some other team that came (very) inadequately prepared or not at all. Our goal is to separate the medalists and a few more places from there for the sake of the team score. There is no meaningful difference between a rank of 25 and 30, for example, but the scoring system pretends that there is. Discarding the assumption that low scores need to be separated, we shift focus to medalists, which are less of an issue.

3. I saw someone report a two hundred question test. There should almost never be a two hundred question test because this goes against the program's goals. The goal is for the ES to present a limited number of problems for teams to think critically about and solve; this requirement places a (kind of low) ceiling over how many questions can be completed in fifty minutes. Even the most brilliant and Scioly-famous participants squirm under pressure.
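The rule of thumb in point 2 is simple enough to write down as a quick check. This is only a minimal Python sketch; the function names are made up for illustration, and the forty-team figure is just the invite size used in the example above.

```python
def minimum_points(registered_teams: int) -> int:
    """Rule-of-thumb floor on total test points: twice the registered teams."""
    return 2 * registered_teams


def meets_rule_of_thumb(total_points: int, registered_teams: int) -> bool:
    """True if a test offers at least the rule-of-thumb minimum points."""
    return total_points >= minimum_points(registered_teams)


# A forty-team invite would want at least 80 points on the test.
print(minimum_points(40))
print(meets_rule_of_thumb(60, 40), meets_rule_of_thumb(100, 40))
```

As noted in point 2, this is a floor and not a target: depending on the subject matter, even a test that clears it may not separate the top scores adequately.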
