Oct. 30, 2023
By Jane Phillips, MSc Management, Organizational Behaviour and Human Resources Management
Chet Robie, Professor, Organizational Behaviour and Human Resources Management
Artificial intelligence (AI) has attracted significant and growing interest from the media, from corporations around the world, and from researchers. Generative AI, and specifically large language models (LLMs), are drawing attention for their accessibility and ease of use. LLMs like ChatGPT differ markedly from previous AI algorithms because they generate language based solely on user prompts.
As more people use LLMs for help with writing, prediction and comprehension, the need to understand their role in a variety of fields is undeniable. There is growing concern that people will use LLMs to mislead others and claim AI-generated work as their own. In personnel selection, self-misrepresentation has long been a top issue in personality assessments. This misrepresentation, called "faking," happens when a candidate gives the answers they think will get them the job rather than answers that accurately represent their true self.
In our recent paper published in Personality and Individual Differences, "Can a computer outfake a human?", we examined the potential role of LLMs in assisting candidates with faking the personality assessments used in personnel selection. We pitted GPT-3.5, GPT-4, Google Bard and Jasper against two personality assessments: one fairly easy for humans to fake and one more challenging.
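To give a concrete sense of the general setup, the sketch below shows how one might prompt an LLM to answer a Likert-scale personality item under a "respond like an ideal applicant" instruction. This is a minimal illustration, not our study's actual protocol: the item text, the instruction wording and the scale are all hypothetical, and it assumes the openai Python package with an API key set in the environment.

```python
# Illustrative sketch only: asking an LLM to answer one Likert-scale
# personality item as if it were a job applicant trying to look ideal.
# The item, instruction wording and scale are hypothetical examples,
# not the study's actual assessment materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ITEM = "I remain calm under pressure."          # hypothetical item
SCALE = "1 = strongly disagree ... 5 = strongly agree"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a job applicant completing a personality "
                "assessment. Answer so as to maximize your chance of "
                "being hired. Reply with a single digit from the scale."
            ),
        },
        {"role": "user", "content": f"Item: {ITEM}\nScale: {SCALE}"},
    ],
)
print(response.choices[0].message.content)  # e.g. "5"
```

Repeating a prompt like this over every item in an assessment yields a complete "faked" response profile that can then be scored the same way a human candidate's responses would be.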
Most of the LLMs were able to fake the "easy to fake" personality assessment at a level at or above that of humans. However, most of the LLMs had more difficulty faking the "harder to fake" assessment. Strikingly, GPT-4 was the clear winner among the LLMs: on average, it faked better than 99.6 per cent of the student population on the "easy to fake" assessment and better than 91.8 per cent on the "harder to fake" assessment.
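To see concretely what "faked better than 99.6 per cent of the student population" means, a faked assessment score can be converted to a percentile rank against the human comparison sample. The sketch below does exactly that with made-up numbers (not our data), using scipy's percentileofscore.

```python
# Illustrative only: percentile rank of one LLM's faked assessment
# score relative to a human comparison sample. All numbers are
# made up for demonstration; they are not the study's data.
from scipy.stats import percentileofscore

human_scores = [52, 61, 58, 70, 65, 74, 68, 59, 63, 71]  # hypothetical
llm_score = 78  # hypothetical faked-assessment score for one LLM

pct = percentileofscore(human_scores, llm_score)
print(f"The LLM outscored {pct:.1f}% of the human sample.")
```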
These results have a few important implications. The superior performance of GPT-4 suggests that generative AI may make faking harder to prevent in personality-based selection. Hiring processes may need to adopt strategies such as timed assessments or lockdown browsers to combat AI-assisted faking. Further testing is required to assess how effectively LLMs can fake, but the varied results suggest that AI-generated assistance should be approached with caution.
In the coming months, we are continuing this research by retesting a student population on more complex, phrase-based questions and comparing the results to a set of LLMs. We will integrate new LLMs that have appeared since our initial study and seek out new ways to challenge the efficacy of LLM-assisted faking. This work will help protect the authenticity of personnel selection.