03.12.04 ARE Score Reporting
“Measure twice, cut once.”
If you or your employees took any of the six multiple-choice divisions of the ARE after February 2, 2004, the results will be delayed until late April.
Instead of receiving score reports, hundreds of young professionals have been issued a letter stating their scores are being delayed. The delay is due to NCARB’s recent update of the ARE, and the need to re-establish and confirm an accurate level of performance for this new version.
2. NCARB Scoring Process
3. Candidate Frustrations
5. The Bell Curve
6. Size Matters
8. Other Professions
9. Additional Resources
Over the past few weeks, we’ve received numerous emails from ARE candidates around the country, expressing frustration and asking ArchVoices to investigate and publicize the reasons for this delay. When calls to five state licensing boards last week resulted in three different responses about the delay in score reporting, we felt it important to try to provide some clarity for ourselves, our readers, and affected ARE candidates.
Today’s issue is our best explanation of NCARB’s recent actions, as well as a sample of the comments sent to us or posted online by affected ARE candidates. Some people will surely assume that NCARB is wrong again, and others will think that interns are just complaining again. We believe that most candidates are ultimately frustrated with inconsistent or insufficient communication, rather than with the score reporting delay itself. And, if after reading today’s issue you’re not convinced, feel free to email us at email@example.com with any lingering questions.
However, after significant research, we’ve come to believe that NCARB’s process of identifying a minimum level of performance on the ARE is worth publicizing and applauding. NCARB is not “holding” candidate scores–rather, there are no scores to report until what is called a cut score study has been completed. We think even those candidates most affected by the delay may actually appreciate NCARB’s cut score process, once it’s described to them. So here it is.
2. NCARB Scoring Process
From conversations with NCARB, the Educational Testing Service (ETS), and a handful of other testing and certification organizations, we have pieced together how the so-called “Modified Angoff Method” works:
To begin the process, six panels of 13-15 “experts” (NCARB-certified architects) will convene for two days between March 26 and 29 to review samples of test questions in the areas to be tested. Each panel focuses on only one division, and the two-day process includes significant training for the panelists (they are supposed to be experts in architecture, not in the cut score process). After training, panelists are instructed to imagine interns who have minimal competency in the particular subject area, and they are given the opportunity as a group to discuss and solidify their conceptions (though not necessarily to agree). Then, the panelists take the actual exam division themselves and are asked to estimate, for each question on that exam, the percentage of minimally qualified interns they would expect to answer the question correctly. Those estimates are then compiled and discussed among the panel as a whole. The main process is then repeated a second time, when panelists can review their original individual determinations in light of comments from the group. NCARB works to ensure that the demographics of the panelists are diverse in age, experience, geographic region, ethnicity and gender. As NCARB Director of Examinations, Stephen Nutt, AIA, said, “It’s not all 50-year old white men sitting around a table doing this.”
While this process sounds fairly subjective and informal, it is used for an incredibly wide range of licensing and other examinations that focus on setting passing thresholds, rather than ranking examinees along a continuum of ability. “The Angoff procedure is one of the most widely used methods for setting passing scores, because it is simple to perform and deals directly with estimates of minimal competence for test content” (Source: “A Comparison of the Angoff, Beuk, and Hofstee Methods for Setting a Passing Score,” by J. J. Bowers & R. R. Shindoll, 1989).
The original Angoff Method was developed by the late ETS Distinguished Research Scientist William Angoff. No such methodology is perfect, and the Angoff Method is no exception. Modifications to the Angoff Method that respond to some of the subsequent criticism include: 1) providing extensive training for the panelists, 2) providing panelists with actual testing data, and 3) including more than one opportunity for panelists to estimate examinee performance (see “A Comparison of Cut Scores using Multiple Standard Setting Methods,” James C. Impara and Barbara S. Plake, June 2000). NCARB’s cut score method includes all three of these important modifications, which is in part why they needed actual candidate scores to complete the process (described in more detail below).
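For readers who like to see the arithmetic, here is a minimal sketch of how panelist estimates are commonly turned into a cut score in descriptions of the Angoff approach: each panelist's per-question estimates are summed to give the raw score that panelist expects from a minimally qualified candidate, and those sums are averaged across the panel. This is our illustration, not NCARB's or ETS's actual procedure. The function name, the three-panelist, four-question panel, and every number below are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return an illustrative Angoff-style cut score as a raw score.

    ratings: one list per panelist; each entry is that panelist's estimate
    (a fraction from 0 to 1) of the share of minimally qualified candidates
    expected to answer a given question correctly.
    """
    if not ratings:
        raise ValueError("at least one panelist is required")
    n_questions = len(ratings[0])
    if any(len(r) != n_questions for r in ratings):
        raise ValueError("every panelist must rate every question")
    if any(not 0.0 <= p <= 1.0 for r in ratings for p in r):
        raise ValueError("estimates must be fractions between 0 and 1")
    # Each panelist's sum is the raw score they expect from a borderline candidate.
    panelist_totals = [sum(r) for r in ratings]
    # The panel's cut score is the average of those expected raw scores.
    return mean(panelist_totals)


# Hypothetical data: 3 panelists, 4 questions, two rounds of estimates.
round_one = [
    [0.60, 0.80, 0.50, 0.70],
    [0.80, 0.90, 0.40, 0.60],
    [0.40, 0.70, 0.60, 0.80],
]
# Second round, after group discussion (assumed here to be the round that counts).
round_two = [
    [0.65, 0.80, 0.50, 0.70],
    [0.70, 0.85, 0.45, 0.60],
    [0.60, 0.75, 0.55, 0.80],
]

print(f"Round one cut score: {angoff_cut_score(round_one):.2f} of 4 questions")
print(f"Round two cut score: {angoff_cut_score(round_two):.2f} of 4 questions")
```

The sketch leaves out the step where panelists see actual candidate data between rounds; in this toy version, that information would simply influence the numbers panelists enter in the second round.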
3. Candidate Frustrations
“I took Lateral Forces on February 2 and have been going nuts checking my mailbox every day. It would have been nice if [NCARB] had told us this when we sat for the exam. It’s not like we wouldn’t have taken it at that point, and it would have saved a lot of anxiety wondering where the scores were.”
–Posted on the web
“I was planning on taking my first exam next Tuesday and really would prefer to not be part of a test study.”
–ARE candidate in Texas
“Seven weeks after [taking a multiple-choice] exam and where the **** is the result? I tried to call NCARB, Chauncey, Prometric…no use. They say ‘wait’ because it could take as much as 7-8 weeks at present. Has this happened to anyone else too?”
–Posted on the web
“I’ve done my multiple-choice exams, but I think that this treatment is unprofessional and deserves some reaction.”
–Posted on the web
“I took a test a few weeks ago and I am taking my final test in a few weeks…. Now I have to wait even longer [to get licensed].”
–ARE candidate in California
“It is totally amazing that [NCARB is] finding new ways to torture us. The worst part is that they don’t even want anyone to know about it!”
–ARE candidate in Michigan
“We’ve been told in the past that trial test questions are thrown in during the exam for future evaluation, if this was done prior to ARE 3.0, then why must they evaluate the new exam?”
–Posted on the web
“NCARB must not have thought it important that intern architects are granted and denied promotions this time of year as well as considered for annual bonuses in their firms based on passing the ARE. They must also have little concern that many interns are reimbursed by their firm for the exam fees only after notice of passing and that by withholding exam scores NCARB is delaying their reimbursement.”
–ARE candidate in North Carolina
“Unfortunately, we find ourselves in a ‘Catch-22.’”
–email from NCARB to state board administrators
In order to execute the cut score study described above, NCARB needs a sufficient number of exams to produce a valid sample. According to Mr. Nutt, NCARB was concerned that if they told candidates ahead of time that there would be a delay, enough candidates might choose not to test that the entire process would be delayed further. And because the NCARB Board must officially certify the results of the cut score study, if those results weren’t available by the board’s April meeting, everyone would have had to wait until the board’s June meeting for exam results–an additional two months.
Some candidates suggested to us that perhaps NCARB could have offered financial or other incentives for candidates to participate in the cut score process voluntarily. We agree that an arrangement of this sort seems like a great way to engage candidates in the process. However, we also appreciate that NCARB was concerned about the potential for such an inducement to skew the sampling of candidates and thus taint the results. Though it doesn’t seem like a big difference to us, the concern doesn’t seem wholly illogical either. And we’re not being advised by professional exam developers with PhDs who do this for a living.
While you may disagree about whether candidates would have actually delayed taking their exams as a result of this eight-week delay in reporting, the whole point is that there was no way to know for sure. Very few people may actually have registered early to avoid the delay. Very few others may have chosen to take a graphics division not subject to the delay. And even fewer might have waited. In the aggregate, the total difference might still be minimal, but the performance standard set next month will likely apply to candidates for the next 6-8 years.
As they say in the woodshop, “Measure twice, cut once.”
5. The Bell Curve
A number of ARE candidates were concerned that NCARB was essentially not using a cut score process at all, but actually setting a curve for the ARE based on their preferred pass rate. Pass rates on four of the six multiple-choice divisions have increased consistently over the past five years. If the cut score process is based solely on a consensus among expert panelists, it logically shouldn’t require actual candidate scores at all.
This concern seemed especially plausible to us and was one of the particular reasons we kept asking questions. However, after asking a lot of those questions, we are convinced that there is no “curve” at all–and not just because NCARB’s Director of Examinations repeatedly emphasized that there isn’t.
We learned from sources unrelated to NCARB or the ARE that the use of candidate exams in the cut score process is actually a specific response to particular criticisms of the original cut score methodology itself (see the Bowers & Shindoll article quoted above if you really want to read a 30-page scientific paper about exactly how it is done and why). This improvement is one of the reasons it’s called the Modified Angoff Method.
6. Size Matters
Finally, one of the continuing challenges of the architecture profession is that we are a relatively tiny profession. Mr. Nutt mentioned that this fact affects the ability of NCARB to conduct a cut score study quickly. The nursing profession, for instance, has 200,000 candidates per division each year. The architecture profession has just over 3,000 candidates per division per year. NCARB determined that they need six to eight weeks of candidate examinations to provide enough data for the cut score study. The nursing profession tests as many candidates every day as the architecture profession does in six weeks.
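The comparison above can be checked with some back-of-the-envelope arithmetic. The sketch below uses only the two per-division figures quoted in this section and assumes, purely for illustration, that candidates test at an even rate across a 365-day year, which neither profession's actual schedule is likely to match. The helper name is ours.

```python
# Figures quoted above, per division per year.
NURSING_CANDIDATES_PER_YEAR = 200_000
ARCHITECTURE_CANDIDATES_PER_YEAR = 3_000  # "just over 3,000"

DAYS_PER_YEAR = 365  # assumption: testing is spread evenly over the year


def candidates_in_window(per_year, days):
    """Estimate how many candidates test in a window of the given length."""
    return per_year * days / DAYS_PER_YEAR


nursing_per_day = candidates_in_window(NURSING_CANDIDATES_PER_YEAR, 1)
arch_six_weeks = candidates_in_window(ARCHITECTURE_CANDIDATES_PER_YEAR, 6 * 7)
arch_eight_weeks = candidates_in_window(ARCHITECTURE_CANDIDATES_PER_YEAR, 8 * 7)

print(f"Nursing, one day:        about {nursing_per_day:.0f} candidates")
print(f"Architecture, six weeks: about {arch_six_weeks:.0f} candidates")
print(f"Architecture, eight weeks: about {arch_eight_weeks:.0f} candidates")
```

Under that even-rate assumption, a single day of nursing exams yields more candidates per division than six, or even eight, weeks of ARE testing.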
The main disagreement seems to come down to whether you think that candidates would or would not have delayed taking the exam had they been told in advance about the delay. While we agree that it sounds extreme to think a lot of candidates would have postponed their exam process for this, we also note all the comments sent to us and posted online about what an inconvenience this delay is. While we note that much of that inconvenience stems precisely from not knowing ahead of time, and that it does seem a bit counterintuitive to assert that people concerned with delay would have chosen to delay even longer if given the option, it also does seem that at least some candidates were concerned about not being “part of a test study” altogether.
Either way, NCARB decided to opt for guaranteed results and a shorter delay for more candidates, rather than take the chance of questionable results and a longer delay for (perhaps) fewer candidates. That’s the “Catch-22” they saw for themselves. But there actually wasn’t a Catch-22 at all. The whole point is that the choice was clear: inconvenience a specified number of candidates, or maintain the highest integrity for the ARE itself. Sorry guys, but NCARB made the right choice. The only real dilemma was how they were going to communicate that choice to affected candidates.
Affected candidates were notified by letter on an individual basis about the delayed score reporting approximately three weeks after their test. We have linked to a PDF version of that letter here. The letter is addressed to “Dear Candidate” and says that the passing threshold must be re-established and confirmed, and so “the scoring of the multiple-choice division that you recently took must be delayed until the end of April 2004…. We apologize for this necessary inconvenience.”
We agree that this delay was a “necessary inconvenience,” but in communicating this inconvenience NCARB made no attempt to explain why it was necessary or to educate candidates about what in the end is a very interesting and credible process. And in a time when all of us get very personalized letters from unknown student loan consolidation companies and well-known presidential candidates, surely someone at NCARB can run a mail merge.
If NCARB’s decision was to apologize to individual candidates one at a time, then they would have done well to embrace that process. Let people know that you feel their pain; explain why this really was necessary. “And for more information on the scoring process, call Stephen or Julia.” That’s what happened anyway, from what we can tell. If we are to accept that a subtle shift in candidate testing might have changed the outcome of NCARB’s entire cut score study, we should be able to assert that a subtle shift in NCARB communication might have changed the responses of the candidates themselves.
As we said earlier, we called five state boards and got three different answers about when test results would be available. While we generally feel that it’s important to be willing to name names so that real people can discuss real issues, in this case we feel that identifying the specific boards would only serve to unfairly focus attention on five boards, when we could easily have called those same boards the following day and gotten a completely different set of answers (we’ve done that too). The point is a much broader one, about the lack of uniformity and consistency of information available to students, interns, and ARE candidates.
The most accurate information, it turns out, was posted on random internet chat rooms and shared via email and phone calls among local ARE study groups. At least a few state licensing boards gave out (and presumably continue to give out) inaccurate or incomplete information about what happened. Hopefully this relatively minor conflict will serve to highlight the importance of sharing information publicly, not just for disseminating information, but also for ensuring consistency.
When we posed as an affected candidate and spoke with an NCARB customer service employee, she said that she “wouldn’t characterize this as a ‘surprise,’ but rather as part of ‘the natural course of business.’” Of course, many candidates who eagerly anticipated their scores and instead got a notice one day that they would have to wait two more months do in fact think of this news as a “surprise.” What is unfortunate is that “surprise” for many people actually does characterize “the natural course of business” with NCARB. NCARB may deny this, or they may point to areas where they are genuinely improving in this regard, but the reality is that NCARB does not publicly share a great deal of information.
8. Other Professions
We also asked Mr. Nutt about the possibilities of taking advantage of the computerized examination process to shorten the average response time for candidates going forward (currently 2-3 weeks for the multiple-choice divisions, and 4-6 weeks for the graphic divisions). Additionally, we asked about the possibility of providing candidates in the future with more information than pass, fail or almost pass. Both of these concerns were raised by respondents to the 2003 Internship & Career Survey, so we followed up.
NCARB has no current plans to either shorten the response time or provide more comprehensive information to candidates. However, NCARB’s existing response time is already relatively good (see chart below, with the professional examinations ranked by speed of response). Of particular note, the newly-computerized exam for landscape architects allows candidates in nine states to get their scores directly from CLARB on the internet. This is also a good place to emphasize that most state boards still require that scores go to the candidates directly from them, thus making it difficult for NCARB to expedite the process alone (of course, the state boards comprise NCARB, so they aren’t alone).
As for providing diagnostic information to candidates who fail a particular division of the exam, it was hard to tell how much information other professional examinations provide. Almost every professional examination that we surveyed provides some such information, and only a few were clearly more extensive than others.
9. Additional Resources
We should warn you that some of this stuff is incredibly dense. Enter at your own risk.
National Organization for Competency Assurance (NOCA)
Established in 1977, the National Organization for Competency Assurance (NOCA) is the leader in setting quality standards for credentialing organizations. Through its annual conference, regional seminars, and publications, NOCA serves its membership as a clearinghouse for information on the latest trends and issues of concern to practitioners and organizations focused on certification, licensure, and human resource development. NOCA’s membership is composed of over 300 credentialing organizations, testing companies, and individual professional development consultants, including NCARB.
Educational Testing Service
Major research and testing organization. Website features pages on college and university guidance, testing, and research.
American College Testing (ACT) Research
ACT is an independent, not-for-profit organization that provides more than a hundred assessment, research, information, and program management services in the broad areas of education and workforce development. Though designed to meet a wide array of needs, all ACT programs and services have one guiding purpose: to help people achieve education and career goals by providing information for life’s transitions. Research topics include test reliability and validity, the meaning of test-score differences, racial/ethnic achievement differences, gender bias, occupations, interests, and related subjects.
inTASC is a not-for-profit research group that works collaboratively with schools, educational agencies, and businesses to conduct research and development on a variety of issues related to technology and assessment. inTASC brings together researchers who have examined several aspects of technology and assessment in schools over the past decade to focus on new questions and issues that arise from the field. inTASC is housed in the Center for the Study of Testing, Evaluation and Educational Policy and the Lynch School of Education at Boston College.
Practical Assessment, Research and Evaluation (PARE) Journal
Practical Assessment, Research and Evaluation (PARE) is an on-line journal supported, in part, by the Department of Measurement, Statistics, and Evaluation at the University of Maryland, College Park. Its purpose is to provide education professionals access to refereed articles that can have a positive impact on assessment, research, evaluation, and teaching practice, especially at the local education agency (LEA) level.
Center for Research on Evaluation, Standards, and Student Testing
For more than 36 years, the UCLA Center for the Study of Evaluation (CSE) and, more recently, the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) have been on the forefront of efforts to improve the quality of education and learning in America. Located within UCLA’s Graduate School of Education & Information Studies, CSE/CRESST has pioneered the development of scientifically based evaluation and testing techniques, vigorously promoting the accurate use of data, test scores, and technology for improved accountability and decision making.
As always, we welcome your thoughts by email at firstname.lastname@example.org.
ArchVoices is an independent, nonprofit organization and think tank on architecture education and internship…