Little late to the party, but I wrote the VA Dynamic Planet regs/states tests this past year and saw a pretty even spread at states. To put some numbers behind it: the mean was 59/115, the high 87/115, the low 23/115, the stdev 18, and 3 of the 24 teams finished within 1 point of each other. I would've liked a higher mean and a lower stdev, but I don't think this spread is horrific by any means. I'm not a subject matter expert or a veteran test writer, but I thought I'd throw in my 2 cents.
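If it helps to see where numbers like that come from, here's a minimal sketch with made-up scores (not the actual results) of how the spread stats can be pulled straight from a score sheet:

```python
# Minimal sketch: spread statistics from a list of hypothetical team scores
# (these are invented values, not the real regs/states results).
from statistics import mean, stdev

scores = [23, 31, 38, 42, 45, 48, 50, 52, 54, 55, 56, 58,
          59, 59, 60, 62, 64, 66, 68, 71, 74, 78, 82, 87]  # 24 hypothetical teams, out of 115

print(f"mean  = {mean(scores):.0f}/115")
print(f"high  = {max(scores)}/115")
print(f"low   = {min(scores)}/115")
print(f"stdev = {stdev(scores):.0f}")
```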
1) If you want a good spread, you need a few basic questions that level the playing field for everyone, and then gradually add questions that force students to problem-solve or think outside the box. The "basic questions" matter because they let everyone (hopefully) pick up 10-20% of the total points, so teams aren't frustrated by a test composed entirely of curveballs or overly difficult/abstract questions.
2) Last year, DP had a focus on problem solving, and I fully embraced that for two reasons. First, most of the DP tests I've taken or seen are incredibly long multiple-choice tests with a few short answers mixed in. I always hated those tests as a competitor because they were tiring, boring, and didn't really exercise my problem-solving ability or thought process, especially compared to the engineering events, which I also competed in heavily. Second, I believe a team that can demonstrate creativity on some pretty tough, open-ended problems deserves to win over a team that memorized a bunch of facts. My professors drilled this philosophy into me, and I really agree with it. On a college midterm, you'll usually get a lot of partial credit if you land on the wrong numeric answer but have the right logic. I try to make my tests follow a similar style: points are awarded for a solid thought process, not just the correct answer.
I made my test like an Astro test: each group of questions required teams to analyze/interpret 1-3 images, primarily graphs of things like tectonic plate location over time or relative sea level (RSL) over time. Instead of asking them what an RSL curve meant, I asked them to extrapolate how many inches it would rise by 2020, or to justify whether a major geologic event that coincided with a dip in the RSL curve directly caused the drop or whether the timing was just a coincidence. In another test, I gave students a set of tectonic plate movement data points that included some bogus data; I asked them to throw out whichever points they thought were bogus, construct a best-fit line from the remaining data, and justify why they removed the points they did. These questions are hard because they're unexpected for the average competitor; students aren't used to doing these kinds of tasks. They also roll multiple concepts into one question, which echoes what someone else in this thread said (about USMLE questions, I think?). I stayed away from calculation questions that just plug numbers into a formula, because those don't really prove you can problem-solve. Astro tests can be guilty of this, so if I were to write an Astro test, that's something I'd be wary of. (Not trying to throw shade at anyone who's written an Astro test, by any means!)
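If you're curious what that bogus-data task looks like mechanically, here's a rough Python sketch with invented numbers (not the actual dataset from my test). The "bogus" indices are hard-coded to mimic a team's judgment call, since the written justification was the real point of the question; the fit and extrapolation are the mechanical part:

```python
# Rough sketch with invented plate-motion numbers, not the real test data.
import numpy as np

years = np.array([0, 2, 4, 6, 8, 10, 12, 14], dtype=float)      # years of a hypothetical GPS record
disp  = np.array([0.0, 5.1, 9.8, 15.2, 2.0, 25.3, 30.1, 90.0])  # cumulative displacement (cm)

# Indices a team decided to throw out (the 2.0 cm and 90.0 cm readings break
# the otherwise steady trend); on the test, the justification mattered most.
bogus = [4, 7]
keep = np.ones(len(years), dtype=bool)
keep[bogus] = False

# Least-squares best-fit line through the remaining points, then extrapolate.
slope, intercept = np.polyfit(years[keep], disp[keep], 1)
print(f"kept {keep.sum()} of {len(years)} points")
print(f"best-fit rate: {slope:.2f} cm/yr")                       # ~2.5 cm/yr for these made-up numbers
print(f"extrapolated displacement at year 20: {slope * 20 + intercept:.1f} cm")
```

The same fit-and-extrapolate idea covers the RSL question too: fit the recent trend and read off the value at the target year.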
3) Pros of writing a problem-solving-based test: you really get to see who knows what. Depending on the kinds of questions you ask, you get to work with real data sets, which is pretty awesome. You also tend to get a solid distribution, because not every team can problem-solve equally well.
4) Cons of writing a problem-solving-based test: it takes a long time to write, it can be frustrating for students to take, and it's hard for the ES to balance the questions. I asked teams for feedback after regs/states, and the most common complaint was something like "I felt like I didn't need to know anything about DP at all; I just needed to know how to understand graphs." That's more a criticism of me and the kinds of questions I asked (maybe I didn't balance them well enough), but at least people are acknowledging that they need some form of knowledge beyond simple recall to do well. Grading also takes significantly longer than for a predominantly MC test.
5) Balance is hard and takes experience. You may think your test is incredibly easy, but the scores might prove otherwise. I used the same problem-solving style of test at both regs and states; the regs scores were significantly lower than the states scores even though the regs test was "easier," simply because most teams anticipated a straightforward MC/short-answer test (the usual VA DP format). I told students that the states test would be similar in style, and I think that heads-up helped them study more effectively.
6) Write a "question bank" and then delegate problems to the regs/states test. When I was caught off-guard by the low regs scores, I had a pool of questions of varying difficulty. I looked at which regs problems gave students the most trouble and threw out similar questions in the bank. The point is, you never know how students are going to perform, so it's better to have a range of questions already written and ready to go than to be shell shocked by what you thought was an "easy" regs test and have to rewrite the entire states test because you made it harder than the regs test. This was incredibly useful because I was really busy with college stuff in the spring semester when I wrote the states test.
Sorry for the massive word vomit haha, but I have some pretty strong opinions on how a scioly test should be written when the rules stipulate an emphasis on problem solving. A few students really disliked my test because it was hard to prepare for and some questions seemed to come out of left field, but I genuinely think that if you want a good spread, this is one way to get it. It also handles the difficulty-vs-accessibility dilemma fairly well, since you don't need advanced resources to study for a problem-solving-focused test, and because problem-solving tests are so wide in scope, you can really dial the difficulty up or down. Regs is a great place to try out a problem-solving-style test and your test-writing style in general. Invites are even better, though regs tests hold more weight, since most teams attend multiple invites and might not treat any single invite test as representative of the "norm." It definitely comes with experience, and even if you write a "bad test," it's a learning experience for all parties. Have fun with it, especially since you're writing for an event you're passionate about!