By Marko Velikonja
Every Foreign Service career is different, yet one experience remains shared among Foreign Service officers: the Foreign Service Institute’s (FSI’s) language test. Located on the fourth floor of Building F in the George P. Shultz National Foreign Affairs Training Center in Arlington, Va., FSI’s Language Testing Unit (LTU) conducts more than 4,000 language tests annually. LTU’s mission is to provide oversight of language tests administered to foreign affairs professionals. These language test scores are used to determine assignments, tenure, promotion, and language incentive pay, among other purposes.
LTU comprises three sections: Operations, Training, and Quality Assurance. LTU conducts tests in approximately 70 languages each year, with roughly 80% of tests administered in Arabic, Chinese (Mandarin), French, Portuguese, Russian, and Spanish. Most tests are conducted by 20 full-time raters; raters are selected from a pool of nearly 250 instructors from FSI’s language sections. Testing specialists oversee the training and certification of raters, conduct quality control (QC) assessments of tests, rescore tests as necessary, and provide feedback to raters.
A two-person team conducts the FSI language test: an examiner, who interacts with the examinee in English, and a tester, who interacts in the language being tested. Tests consist of two modalities—speaking and reading—scored on a scale ranging from zero to five established by an interagency group of language specialists known as the Interagency Language Roundtable (ILR). Testing teams use a rubric known as Skill Level Descriptions to guide scoring of tests based upon what an examinee is able (or unable) to accomplish.
The COVID-19 pandemic prompted significant changes in language testing. With social distancing ruling out traditional in-person tests, the School of Language Studies temporarily replaced the LTU test for end-of-training evaluation with Assessment by Observation (ABO) in March 2020. This means that students are now evaluated by their instructor and their language training supervisor. However, LTU continues to use the regular test for non-training examinees. Since January 2021, it has conducted most tests by online video conference. Those that involve reading must be conducted at FSI or from an overseas post to keep reading materials secure. Surveys show high user satisfaction with testing by video conference, and LTU expects to retain this testing option post-pandemic.
Examinees sometimes express concern about the fairness of the FSI test. Native and heritage speakers sometimes feel it does not properly assess their language skill, and others are concerned about potential bias based on accents, language variety, or where the testing team or examinee studied. FSI has undertaken a year-long effort to examine training and testing of heritage speakers to ensure that they are tested fairly, their skills are recognized, and other particular challenges are addressed. LTU takes the concerns of all examinees seriously and strives to ensure fairness in the testing process. It also requires all testing staff to take training aimed at mitigating unconscious bias.
LTU also has a set of QC measures to ensure consistency in test scoring. These include test review and a quality management program. In any given year, about 8% of examinees request a review of their test. Using a recording of that examinee’s test, a new team, guided by a testing specialist unaware of the original score, conducts a separate rating of the test. If the new team awards a higher score, the testing specialist will meet with the two teams to establish a final score.
In 2015, LTU introduced a quality management program in which testing staff re-score a random selection of tests in the six high-volume languages: Arabic, Chinese (Mandarin), French, Portuguese, Russian, and Spanish. As part of this program, additional steps are conducted for approximately 20% of tests. One is a third rating, in which a rater re-scores a randomly assigned test. In cases of disagreement, a testing specialist conducts a QC assessment. There are plans to expand this activity from LTU raters to testing staff in the language sections, and to third-rate 50% of all tests. Another QC measure is a review by a testing specialist, who examines a test (either through random selection or because a question was raised about its administration during a third rating) and evaluates both the examinee and the performance of the testing team. If the testing specialist awards a higher score, the examinee will receive that score.
In fiscal year 2021, approximately 60% of speaking test reviews and 67% of reading reviews confirmed the original score. For the randomly selected tests, more than 85% of speaking third ratings and QCs and more than 90% of reading third ratings and QCs confirmed the original score. These agreement rates, and other statistical analyses, meet or exceed industry standards, such as those set by similar testing organizations and the American Council on Education, which offers academic credit for FSI tests in high-volume languages. The high rate of agreement between independent evaluators also confirms that, in most cases, testing teams have correctly followed the guidelines in awarding their scores.
LTU continues its efforts to make testing more efficient and a better experience for the examinee. The introduction of non-concurrent testing in 2016 made it possible to test in a single modality following a full test; previously all examinees had to retake an entire test to get a higher score in one modality. The next step is an exploration of full single-modality testing. Other goals include the introduction of an online reading test, which was piloted in 2020.
Language and its use continue to evolve, and LTU strives to keep up with those changes. LTU participated in a seven-year effort to update the ILR Skill Level Descriptions to better reward language skill as used by Foreign Service officers in the field. LTU is currently training raters in the new standards, which maintain the familiar ILR benchmarks but add context to make the benchmarks clearer and more useful, in anticipation of incorporating them into testing by the middle of 2022.
FSI also commissioned a consensus study from the National Academies of Sciences, Engineering, and Medicine. Published in early 2020, the report, “A Principled Approach to Language Assessment,” identified potential reforms to the content and administration of the FSI test. Drawing from the report, in early 2021, a task force led by then-School of Language Studies Dean Michael Ratney proposed recommendations to reform the test. FSI has since begun designing a new test that better meets the needs of the modern Foreign Service and hopes to introduce it by late 2022. For more information about the FSI Language Test Reform project, visit the project’s SharePoint site (Internal link).
Marko Velikonja is deputy director of the Foreign Service Institute’s Language Testing Unit.