
The Simulated Hazardous Operational Tasks Laboratory was created in 2008 with the assistance of the Critical Job Tasks Simulation Laboratory Expansion for WSU Sleep & Performance Research Center (Vila, PI) grant from the US DOD Office of Naval Research under the Defense University Research Instrumentation Program (DURIP).

Developing a Common Metric for Evaluating Police Performance in Deadly Force Situations

There is a critical lack of scientific evidence about whether deadly force management, accountability and training practices actually have an impact on police officer performance in deadly force encounters, the strength of such impact, or whether alternative approaches to managing deadly force could be more effective. The primary cause of this lack is that current tools for evaluating officer-involved shootings are too coarse or ambiguous to adequately measure such highly variable and complex events. There also are substantial differences in how key issues associated with police deadly encounters are conceptualized, even by subject matter experts, how agencies can or should train for them, and what officers should, or reasonably can, be held accountable for. As a consequence, trainers and policy makers have generally been limited to subjective or rough assessments of deadly force performance or of how challenging a deadly force situation was.

Our research addressed this problem by using a novel pairing of two well-established research methods, Thurstone scaling and concept mapping. With them, we developed measurement scales that dramatically improve our ability to measure police officer performance in deadly force encounters. We expect that these metrics will make it possible to better evaluate the impact of management and training practices, refine them, and make assessment of accountability more just and reasonable.
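For readers unfamiliar with Thurstone scaling, the following is a minimal sketch of the textbook "Case V" form of the method, which turns paired-comparison judgments into interval-level scale values. It is illustrative only: the function name, the three items, and the proportions are hypothetical, and it is not necessarily the variant or the data used in our research.

```python
from statistics import NormalDist

def thurstone_case_v(pref):
    """Return interval-level scale values from paired-comparison proportions.

    pref[i][j] is the proportion of raters who judged item i to be
    "more" (e.g. more difficult) than item j. Diagonal entries are ignored.
    """
    n = len(pref)
    inv = NormalDist().inv_cdf  # inverse of the standard normal CDF
    scores = []
    for i in range(n):
        # Clip proportions so that 0 or 1 do not map to infinite z-scores.
        z = [inv(min(max(pref[i][j], 0.01), 0.99)) for j in range(n) if j != i]
        # Row mean of z-scores; the diagonal contributes z = 0.
        scores.append(sum(z) / n)
    low = min(scores)
    # Shift so the lowest-ranked item sits at zero on the scale.
    return [s - low for s in scores]

# Hypothetical proportions for three items A, B, C (not real data).
pref = [
    [0.5, 0.2, 0.1],  # A judged "more" than A, B, C
    [0.8, 0.5, 0.3],  # B judged "more" than A, B, C
    [0.9, 0.7, 0.5],  # C judged "more" than A, B, C
]
scores = thurstone_case_v(pref)
print(scores)
```

Because the resulting values lie on an interval scale, differences between items are meaningful, which is what allows such scores to be averaged and compared statistically.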


Accelerating Realistic Deadly-Force Judgment and Decision Making Training
Defense Advanced Research Projects Agency (DARPA) through Advanced Brain Monitoring, Inc (Vila, PI)

Johnson et al (2014) Identifying psychophysiological indices of expert versus novice performance in deadly force judgment and decision making. Frontiers in Human Neuroscience 8(512). doi:10.3389/fnhum.2014.00512

Objective: To demonstrate that psychophysiology may have applications for objective assessment of expertise development in deadly force judgment and decision making (DFJDM).

Background: Modern training techniques focus on improving decision-making skills with participative assessment between trainees and subject matter experts primarily through subjective observation. Objective metrics need to be developed. The current proof of concept study explored the potential for psychophysiological metrics in deadly force judgment contexts.

Method: Twenty-four participants (novice, expert) were recruited. All wore a wireless Electroencephalography (EEG) device to collect psychophysiological data during high-fidelity simulated deadly force judgment and decision-making simulations using a modified Glock firearm. Participants were exposed to 27 video scenarios, one-third of which would have justified use of deadly force. Pass/fail was determined by whether the participant used deadly force appropriately.

Results: Experts had a significantly higher pass rate compared to novices (p < 0.05). Multiple metrics were shown to distinguish novices from experts. Hierarchical regression analyses indicate that psychophysiological variables are able to explain 72% of the variability in expert performance, but only 37% in novices. Discriminant function analysis (DFA) using psychophysiological metrics was able to discern between experts and novices with 72.6% accuracy.

Conclusion: While limited due to small sample size, the results suggest that psychophysiology may be developed for use as an objective measure of expertise in DFJDM. Specifically, discriminant function measures may have the potential to objectively identify expert skill acquisition.

Application: Psychophysiological metrics may create a performance model with the potential to optimize simulator-based DFJDM training. These performance models could be used for trainee feedback, and/or by the instructor to assess performance objectively.

SAFE Driving from California Commission on Peace Officer Standards and Training (POST)

Experimental Test of the Impact of Work-Related Fatigue on Police Officer Vehicle Collision Risk
California Commission on Peace Officer Standards and Training (CA POST) (Vila, PI)

James, S.M. (2015) Distracted driving impairs police patrol officer driving performance. Policing: An International Journal of Police Strategies & Management 38(3), 505-516.

James, S.M., & Vila, B. (2012) Driven to distraction. The Journal Of California Law Enforcement 46(2), 14-18.

James, S. M., & Vila, B. (2015) Police drowsy driving: predicting fatigue-related performance decay. Policing: An International Journal of Police Strategies & Management 38(3), 517-538.

California POST SAFE Driving

Impact of Work-Related Fatigue on Deadly Force Judgment and Decision Making Performance and Driving Performance Among Day vs. Night Sleepers
US DOD Office of Naval Research (Vila, PI)

James, L., James, S.M., & Vila, B. (2016) The reverse racism effect: are cops more hesitant to shoot black suspects? Criminology and Public Policy 15(2), 457-479.

James, L., James, S.M., & Vila, B. (2017) Does the “reverse racism effect” withstand the test of police officer fatigue? Policing: An International Journal of Police Strategies & Management 40(2), 184-196. doi:10.1108/PIJPSM-01-2016-0006

Development of Tactical Social Interaction (TSI) training from the Defense Advanced Research Projects Agency (DARPA) Strategic Social Interaction Modules (SSIM) program


Empowering the Strategic Corporal: Training Young Warfighters to be Socially Adept with Strangers in Any Culture

Defense Advanced Research Projects Agency (DARPA)


The interactions of young enlisted warfighters with strangers often form the operational center of gravity in counterinsurgency, peacekeeping, nation-building, and humanitarian missions. Consequences from the decisions they make in fast paced, low information encounters with strangers can reverberate across tactical, strategic, and political boundaries. Despite the critical nature of their decisions in the field, however, our “strategic corporals” frequently are teenagers whose frontal lobes have yet to develop fully.
The DARPA-funded research reported here took an important step toward empowering these young warfighters to do something that is vital to the success of our nation’s strategic interests, but which few of them are well equipped to do: interact successfully during ambiguous operational encounters in very foreign lands with people who are very different from themselves.

Although most warfighters receive pre-deployment training that touches on language and cultural skills, or teaches them to better attend to the human terrain as they hunt for foes and watch for threats, that training tends to be focused on the characteristics of the place to which they are being deployed. This isn’t efficient when today’s warfighter may be assigned to Iraq on one tour and Afghanistan on the next, then suddenly be rerouted to Uganda or Indonesia. In a world where young enlisted warfighters may be sent anywhere, there is a critical need to help them learn the fundamental skills needed to adapt rapidly in any culture.

Our highly experienced interdisciplinary research and training development team attacked this critical gap using a novel process that included:

1. Logic model and metric development that used novel research techniques we pioneered to rapidly identify generic causal models for understanding the fundamental dynamics of stranger encounters, and to develop interval-level metrics for measuring both the relative difficulty of those encounters and individual performance in them.

2. Training test instrument development that used rigorous experimentation to create novel instruments and techniques that enable trainers—and our research team—to readily assess trainee baseline capabilities, strengths and weaknesses in ways that are objective, scientifically valid, and reliable; measure the relative impact of each training technique and module as well as overall training program success; and also use non-intrusive ambulatory neurophysiological measurement devices to track and differentiate between trainee engagement, frustration, and cognitive workload whenever possible during training in order to assess the individual training dosage received.

3. Tactical social interaction (TSI) curriculum development that identified novel training techniques that connect with young enlisted warfighters and give them the foundation for figuring out how to interact effectively in a novel environment. The elements of the TSI curriculum also were designed to be modular, scalable, and blendable. Modularity makes it possible to teach the curriculum as a whole or piece by piece. Scalability makes it easy to adapt the curriculum and related tools to presentation in settings ranging from the schoolhouse to the company, platoon or even fire-team level. Blendability—an attribute recommended by the Marine Corps’ Training and Education Command—makes it possible to integrate TSI training modules seamlessly into existing training programs to minimize costs and encourage warfighters to see tactical social interaction as a core part of their fieldcraft.

4. Pilot testing of the TSI curriculum using military and police students who learned from the curriculum, critiqued it, and helped refine both the curriculum and test instruments.

5. Assessment of the TSI curriculum using the training test instruments developed to assess students’ behavioral, cognitive, and affective pre- and post-training improvements.

6. Ongoing coordination and liaison with performers from other technical areas in DARPA’s Strategic Social Interaction Modules (SSIM) program to absorb as much of their knowledge into our work as possible, then transition the results of each phase of our research to technology developers, social scientists, and evaluators.
In short, this research nailed down logic models and metrics that give trainers, evaluators, and future technology developers a unified framework from which they can proceed, one that helps ensure that the work of each complements the work of the others. The interval-level performance metrics we created give research teams, trainers, technology developers, and evaluators a common yardstick that makes it possible to use powerful mathematical and statistical techniques, and that also is valuable for software and hardware development. These new capabilities will enable advances in the science, systems, and devices used to train young warfighters and build social skills that are invaluable in counterinsurgency, peacekeeping, nation-building, and humanitarian operations.

Military success requires understanding threat capabilities, intentions, and activities, as well as local human, social, cultural, and behavioral factors. Many assume that the skills necessary to do this require social graces and nuanced insights that are beyond the experience or ability of young warfighters. However, our research challenges that assumption. Every social creature from ants to dogs, dolphins, and people is naturally equipped with the potential to learn social skills and nuances—and a drive to do so. As a species, humans tend to excel at reading one another, establishing connections, and finding ways to communicate. Even though performing these mundane tasks can be problematic in a foreign culture, learning to solve these problems in an organized and intuitively reasonable manner is half the battle.

Our research has demonstrated that creative training approaches which focus on conveying the fundamental dynamics of encounters with strangers can be taught in ways that engage warfighters and help them learn to be better at observing what matters, solving problems, and connecting with people who seem different from themselves. Our work has the potential to radically change established training practices and increase the effectiveness of warfighters on the ground in counterinsurgency, peacekeeping, nation building, and humanitarian missions.


The Spokesman-Review:
TSI delivered in Colorado

Training Partners: I2s

Crisis Intervention Team (CIT) Training Metric Development

We applied research techniques pioneered by our team to rapidly identify generic causal models for understanding the fundamental dynamics of situations involving mental illness. During this phase, we brought together 20 law enforcement and mental health professional (MHP) subject matter experts in a two-day focus group to identify key indicators for measuring both the relative difficulty of crisis encounters and individual performance in them. Using this information, we created and widely distributed surveys to identify the level of importance that law enforcement officers and MHPs place on difficulty and performance indicators.

Validation of the ACRA Cognitive Assessment System

WSU conducted controlled laboratory experimental trials designed to test the validity of cognitive test batteries for predicting driving performance in high-fidelity driving simulators. The validation study design required twenty-four adult participants to take part in 1) a three-hour screening and training session and 2) a six-hour period of data collection at the Simulated Hazardous Operational Tasks Laboratory at the Sleep and Performance Research Center.

Analyzing Novel Experimental Research Data to Better Understand and Manage Fatigue Across the Range of Military Operations

The research analyzed exploratory data from ONR-funded experiments to identify and develop new ways to manage fatigue and understand its impact on warfighters’ safety and health, interactions with non-combatants, and driving.

Fatigue management applies to every current and long-term ONR expeditionary warfare goal and focus area involving human decision making, information collection, communication and reporting, adaptability in complex combat environments, or operational safety and health. Yet relatively little is known about how to manage operational fatigue in the types of counterinsurgency, stabilization and humanitarian missions that dominate contemporary expeditionary and irregular warfare.

Our focus on individual-level effects of fatigue is especially important in these highly distributed global operations, during which small teams must conduct missions in extreme, politically unstable environments while sleep deprived and physically depleted. Fatigue-related degradation of the strategic corporal’s perception, judgment, decision making, performance, and stress management can undermine both tactical and strategic imperatives. Thus, our lack of knowledge about how to manage the individual-level effects of fatigue constitutes a critical need.

Our research aimed to:

1. Provide Navy/Marine Corps with an empirical basis for setting work-hours, scheduling, and equipment-use policies, training drivers to better manage fatigue and distraction load, and improving the structure and presentation of instruments and equipment inside motorized vehicles;

2. Identify individual risk factors associated with performance of operational driving, deadly force judgment and decision making, and tactical social interaction in order to understand the extent to which performance is affected by fatigue-related risk propensity, PTSD symptomology, and mood;

3. Assess the impact of fatigue on warfighters’ tactical social interaction skills and other behaviors that influence non-combatants’ perceptions of their legitimacy, fairness and civility; and

4. Assess the extent to which fatigue-related driving accidents may be reduced by understanding the effects of fatigue and the timing of work shifts on collision risks as well as operational costs such as fuel consumption and maintenance.

Spokane Police Department Training Development Assistance

The Simulated Hazardous Operational Tasks Laboratory at Washington State University (WSU) Spokane provided Training Development Assistance to the Spokane Police Department (SPD). The aim of this contract was to assist the SPD in enhancing the department’s training capabilities to improve officer safety and wellbeing, better serve the community, and meet the recommendations set forth by the U.S. Department of Justice, Office of Community Oriented Policing Services’ Collaborative Reform Process.

Using Interval-Level Police Performance Metrics to Test the Effectiveness of Seattle Police Department’s Early Intervention System

In response to its consent decree, the Seattle Police Department (SPD) implemented an Early Intervention System (EIS) to identify officers who may be exhibiting potentially concerning behaviors. One of the main concerns when implementing an EIS, however, is the selection of “triggers” used to identify such officers. This is problematic because, for example, citizen complaints do not take into account individual officers’ exposure rates to high-risk encounters and situations. As a consequence, officers who are more proactive and/or respond to more calls for service are more likely to receive citizen complaints, regardless of their behavior. The goal of the study was to evaluate the ability of SPD’s EIS to correctly identify officers who are behaving in ways that are truly problematic and warrant investigation. To do this, we used our newly developed, NIJ-funded, interval-level metrics to evaluate the field performance of officers flagged by the EIS. In doing so, we determined how many officers flagged by the EIS were actually exhibiting problem behaviors.

Oregon Department of Public Safety Standards and Training – Training Development Assistance

The aim of this contract was to assist the Oregon DPSST in enhancing DPSST’s Basic Police training program via the identification and measurement of decision points and behaviors, in dynamic social encounters, that are most likely to contribute to police legitimacy.

Online Training for Law Enforcement to Reduce Risks Associated with Shift Work and Long Work Hours

We reviewed the existing online training program, ‘NIOSH training for nurses on shift work and long work hours’, and assisted the Program Manager with tailoring the content for law enforcement. This included providing photographic images to include in the training (taken of local law enforcement officers, who were reimbursed for allowing us to use their images); narration services (sourced in house to avoid cost); and videos for the training (6 “motivational” interviews with police officers and experts sourced locally).

An Evaluation of Simulation vs. Classroom-Based Implicit Bias Training to Improve Police Decision Making and Enhance the Outcomes of Police-Citizen Encounters

In response to broad concerns about racially motivated policing, implicit bias training is becoming a staple among many police departments. Two modalities for implicit bias training exist: a classroom-based academic presentation on the science of bias, and simulation-based training to teach officers to focus on objective threat indicators over suspect characteristics. The problem, however, is that our knowledge of the effectiveness and persistence of implicit bias training is severely limited. Furthermore, no evidence exists for which implicit bias training modality is superior (from the perspective of the persistence of training-related behavior change over time), or whether both types are required to have an impact on police decision making on the street.

Our subjects were 400 officers, assigned to patrol in diverse metropolitan departments with nationally representative demographics.

Patrol officers were randomly assigned to one of four groups: the first received classroom-based implicit bias training, the second received simulation-based implicit bias training, the third received both types of training, and the fourth served as a no-training control group.
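As an illustration of this kind of four-arm random assignment, here is a minimal sketch. The group labels, the seed, the use of integer officer IDs, and the equal group sizes are all assumptions made for the example, not details of the actual study protocol.

```python
import random

# Hypothetical labels for the four study arms described above.
GROUPS = ["classroom", "simulation", "both", "control"]

def assign_groups(officer_ids, seed=0):
    """Shuffle officers, then deal them round-robin into the four arms.

    Dealing after a shuffle keeps the arms equal in size (an assumption;
    the source does not state the group sizes).
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(officer_ids)
    rng.shuffle(shuffled)
    return {oid: GROUPS[k % len(GROUPS)] for k, oid in enumerate(shuffled)}

# 400 placeholder officer IDs, matching the sample size reported above.
assignment = assign_groups(range(400))
counts = {g: list(assignment.values()).count(g) for g in GROUPS}
print(counts)
```

Shuffling before dealing means each officer has the same chance of landing in any arm, while the round-robin step avoids the unequal arm sizes that independent coin flips would produce.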

Test measures included: 1) Fairness in officer decision making (measured by scoring body camera footage using custom metrics for measuring officer performance during police-citizen encounters); 2) Citizen perceptions of police legitimacy (measured by citizen complaints); 3) Arrestee perceptions of how fairly they were treated by police (measured by survey); and, 4) Police perceptions of training effectiveness (measured by survey and focus groups).

The Impact of Shift-accumulated Fatigue on Patient Care and Risk of Post-shift Driving Collisions among 12-hour Day and Night Shift Nurses

Although ample evidence exists that shift work is dangerous for patients and nurses, very little is known about optimal shift scheduling. Experiments that quantify the fatigue-related risks associated with shift work are desperately needed to inform policy regulating shift scheduling. To meet this need, we studied the impact of shift-accumulated fatigue on the spectrum of daily activities nurses engage in, from patient care to the post-shift drive home. The between-groups, repeated-measures quasi-experiment was conducted in the Washington State University (WSU) College of Nursing and Sleep and Performance Research Center. Nurse participants (N=100) reported to WSU for testing on two separate occasions: once immediately following their 3rd consecutive 12-hour shift and once on their 3rd consecutive day (72 hours) off work.

This research provided objective evidence of the impact of shift work on nurses’ patient care-related critical skills and risk of collisions during their post-shift drive home. This resulted in concrete recommendations regarding safe shift scheduling for day and night shift nurses. The information we generated may provide the push needed to set national work hour policies for nurses. Given that doctors have had regulations on work hours since 1987, it is unacceptable that nurses still do not have set policies protecting them against safety risks and protecting their patients against preventable medical errors.

Using Interval-Level Metrics to Investigate Situational-, Suspect-, and Officer-Level Predictors of Police Performance During Encounters With the Public

The issue of how to measure the impact of situational-, suspect-, and officer-level factors on police actions has long been debated in the policing literature. One promising method is to use interval-level metrics developed via a combined method of concept mapping and Thurstone scaling. Our objective here was to use these metrics to score 667 incident reports from a large (n ∼ 1,500) urban police department. From this process, we explored significant trends in how police officers perform during encounters with the public. We found that officers performed better in “higher stakes” encounters and excelled in vigilance, situational assessment, use of tactics, and adapting tactics. Officers tended to receive the worst scores in routine police–citizen interactions and the highest in crisis encounters. Interpretation and implications of these findings for American policing are discussed.

Evaluating the effectiveness of a police department’s early intervention system

Police departments around the country are implementing Early Intervention Systems (EIS) to identify officers who may be exhibiting problematic or unprofessional behaviors. The goal of EIS is to minimize officer misconduct and increase officer accountability. To evaluate whether EIS can actually differentiate “problem” from “non-problem” officers, we analyzed the performance of officers from incident reports of police–citizen interactions.

Using a blind scoring method, we evaluated performance from 1000 police reports; 500 randomly selected reports from EIS-flagged officers (treatment group) and 500 randomly selected reports from non-flagged officers (control group). Six hundred and sixty-seven reports contained relevant performance data. The interval-level metrics used to score officer performance were developed by Vila and colleagues (2016, 2018) to assess performance—expressed as a percentage—across a range of police–citizen encounters.

The overall performance score assigned to officers across all 667 incident reports was 80.46% (SD = 8.75%). When separated into EIS-flagged and non-EIS–flagged incidents, performance scores were 80.63% (SD = 8.58%) and 80.27% (SD = 8.95%), respectively. There was not a statistically significant difference between EIS-flagged and non-EIS–flagged performance.
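To give a rough sense of how small this difference is, the sketch below computes a standardized effect size (Cohen's d) from the means and standard deviations reported above. It assumes the two groups are of equal size, because the split of the 667 scored reports between groups is not given here; the function name is ours, and this is not the statistical test used in the study.

```python
from math import sqrt

def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using a pooled SD that assumes equal group sizes."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Reported means and SDs (percentage points): EIS-flagged vs. non-flagged.
d = cohens_d_equal_n(80.63, 8.58, 80.27, 8.95)
print(round(d, 3))
```

Under that assumption d comes out at roughly 0.04, far below the 0.2 that is conventionally treated as a "small" effect.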

The EIS evaluated does not appear to be differentiating between problem behavior and non-problem behavior. This suggests that the “thresholds” used to identify problem officers are not working effectively.

Sleep health and predicted cognitive effectiveness of nurses working 12-hour shifts: an observational study

Due to the 24-hour nature of society, shift work has become an integral part of many industries. Within the literature there exists an abundance of evidence linking shift work-related sleep restriction and fatigue with errors, accidents, and adverse long-term health outcomes.

The study goal was to physiologically measure sleep patterns and predicted cognitive decline of nurses working both 12-hour day and night shifts to address the growing concern about sleep restriction among healthcare workers.

This study presents the results of a quasi-experimental, mixed between-within design where the sleep of 12-hour day and night shift nurses was measured using ReadiBand wrist actigraphs. The between-groups component compared day vs. night shift nurses. The within-groups component consisted of two separate measurement periods for each nurse: once for three consecutive days while they were working shifts (on duty) and once for three consecutive days off work (off duty).

Participants wore the wrist actigraph at home and in the hospital, and were instructed to adhere to their regular sleep schedule.

Participants were recruited from two hospitals in Washington State (n=90). Participants were 48 night- and 42 day-shift nurses. All participants worked 12-hour shifts.

Sleep was measured using ReadiBand wrist actigraphs, which are licensed with the Sleep, Activity, Fatigue, and Task Effectiveness (SAFTE™) Alertness Score model, a biomathematical model that predicts cognitive effectiveness based on sleep/wake schedule. ReadiBands also calculate sleep quantity, sleep efficiency, and sleep latency. Results were analysed in SPSS (v26) through multilevel modelling.

Differences were observed in sleep quantity, efficiency, and latency based on shift type (day vs. night) and shift duty (on vs. off). The most extreme differences, however, were noted in cognitive effectiveness (SAFTE™), whereby night shift nurses experienced substantial decline, frequently into the “high risk” zone, throughout their shifts compared to day shift nurses.

The present study identifies sleep characteristics that differ between day and night nurses working 12-hour shifts using objective measurements of sleep. Biomathematical modelling can offer a novel method to estimate hours of greatest cognitive decline, and has implications for policy around shift duration, timing, and overtime allocation.

An Effectiveness Evaluation of the Oregon State Revised Basic Police Academy Curriculum

In 2013 the State of Oregon passed House Bill 3194, the Justice Reinvestment Act, resulting in the creation of the Oregon Center for Policing Excellence. One goal of the center was to revise the curriculum for the statewide Basic Police Academy, with a focus on topics such as communications, crisis intervention, and procedural justice. This curriculum revision was then evaluated independently. A force option simulator was used to assess recruit interaction with community members; this allowed for a reliable and repeatable measurement tool to assess each cohort month after month. Evaluation of behavior changes in recruits from before to after curriculum revisions revealed significant improvements in key policing skills related to interacting with civilians in ways that build trust in police legitimacy, de-escalating hostile situations, and reducing the need for use of force. This chapter describes the curriculum revision in detail, presents results, and discusses them in light of police training moving forward.

Pilot Test of “NIOSH Training for Law Enforcement on Shift Work and Long Work Hours”


This study pilot tested the effectiveness of an online training program for managing shift work and long work hours.

Fifty-seven officers from across the United States participated for 12 weeks in a pre-test, training intervention, post-test design assessing the following measures: sleep using actigraphy, diaries, and surveys; knowledge and feedback about the training using surveys.

After the training, actigraphy data showed significant reductions in sleep latency and awakenings during sleep. Survey data showed reductions in sleepiness, difficulty staying awake during the day, and difficulty getting things done. Frequency of nightmares also decreased. Participants’ knowledge about sleep improved and satisfaction with the training was high.

Participants were satisfied with the training and showed objective improvements in their sleep and subjective improvements in feelings when awake. This research will help inform interventions to improve police officer health and wellness.