Semantic Embeddings: Fraud and Creativity

One of the most interesting parts of anti-fraud research is applying math and machine learning techniques to analysis and novel feature engineering. In this space we can use statistics to build on previously conducted social science research in areas such as behavioral psychology and linguistics, which helps us better understand how fraud practically manifests in the data. One recent analysis highlights how this approach can be used to understand what differences exist between fraudsters and regular applicants in a government grant program.

Creative Divergence and Semantic Scoring

First, let’s consider a famous creativity test question. The question, part of the Alternative Uses Test attributed to J.P. Guilford in 1967, goes something like, “Write as many uses as you can think of for a brick” (The Alternative Uses Test, n.d.). Examinees then list as many uses for a brick as they can. Sure, there are the normal answers like “help build a wall” or “use it as a step.” But some examinees also branch out into more creative uses; things like “break a store window for a smash-and-grab robbery” or “a paperweight for your outdoor desk during a wind storm” end up on the list too. The test works because it elicits creative responses and illustrates divergent thinking.

Interestingly, we can score the creativity of word choice using math. We do this with a semantic embedding model and a cosine score. Cosine, you might ask? Yes, that cosine; the one your high school math teacher told you was really important and worth paying attention to in class. Semantic embeddings are used in Natural Language Processing (NLP) to represent words or phrases in numerical form, creating a high-dimensional vector space that can be treated numerically. In this instance we are measuring the distance between a concept and a response as the angle between two vectors (Dumas et al., 2021). That article does a good job of illustrating the concept with a question similar to the brick question, asking instead about uses for a hammer. Once we have our question and our responses, the cosine score shows that more creative answers sit at wider (more obtuse) angles to the prompt, while more common answers sit at narrower (more acute) angles. This allows us to calculate the “originality” of an answer.
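
To make the angle idea concrete, here is a minimal sketch of the calculation. It assumes the open-source sentence-transformers library and its off-the-shelf all-MiniLM-L6-v2 model as a stand-in; the cited research used its own embedding models, so the model choice and the toy answers are illustrative only.

```python
# Minimal sketch: originality as the angle between prompt and response
# embeddings. all-MiniLM-L6-v2 is an illustrative stand-in, not the
# model used in the cited research.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

prompt, common, creative = model.encode([
    "uses for a brick",
    "help build a wall",
    "a paperweight for your outdoor desk during a wind storm",
])

# A smaller cosine means a wider angle, so originality is often taken as
# 1 - cosine; the creative answer should score higher than the common one.
print("common answer originality:  ", round(1 - cosine(prompt, common), 3))
print("creative answer originality:", round(1 - cosine(prompt, creative), 3))
```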

Behavioral Psychology and Fraud Detection

Obviously, we aren’t concerned with hammers and bricks, so how could we use this research for anti-fraud purposes? Well, I am glad you asked, but you will need to wait just a little longer. First, we need to talk about some interesting behavioral psychology findings from the last 20 or so years. Most notably, researchers found that when deception was employed in various lab-controlled experiments, an individual’s language shrank in both content and context (Adams, 2002). Multiple studies bore these points out, ranging from analyses of 5 to 100 documents to experiments with up to 128 students (Craig et al., 2013; Clatworthy and Jones, 2006; Caso et al., 2005). The stress of deception caused written (or spoken) descriptions to become more vague (Adams, 2002), meaning deceivers used less rich, less creative, and less emotionally loaded words. If we then scored a common prompt and its responses for known grant fraud awards against not known fraud awards, we should see a marked difference between the two groups’ semantic scores (illustrated with cosine scores).

Figure 1: Angles of abstract creativity scores comparing not known fraud to known fraud (cosine angle, known fraud vs. not known fraud).

Applying NLP for Anti-Fraud Research

Ok, back to anti-fraud research. Because NLP research has produced huge lexicons of scored word associations (some as large as 8.4 billion word associations), we can use cosine scores to compare word choices against large prompts. So, what if instead of asking, “List the uses of a brick,” we treated a project abstract as the response to the prompt, “Describe your grant project”? We could then clean each abstract and score it to create a composite creativity score for every award we wanted to analyze. By using DOJ press releases to identify known fraud awards and comparing their scores to a random sample of not known fraud awards from the same program, we have the makings of a statistical test.

But why do we care about abstracts in grant proposals at all? Ultimately, they provide a direct line of communication from good (or bad) actors in the fraud space. This allows us to understand how different types of actors communicate and what kinds of patterns we should try to understand. Sometimes those patterns are based on key words identifying risks to targeted programs. Sometimes, though, like in this study, we can create numerical scores and employ statistics to quantify and evaluate risks based on cut-points. Think of it like an interview with someone suspected of fraud: as anti-fraud professionals we would never turn down the opportunity to ask them questions and understand their state of mind. While the eyes may be the window to the soul, word choice appears to be the window to a bad actor’s conscience.

Case Study: TrackLight's Research

So, that is exactly what we did at TrackLight. We took the awards from companies that pled guilty or were convicted of fraud in the Small Business Innovation Research (SBIR) program (n = 747). Then we compared those to a random sample of not known fraud awards (n = 3,228). This created an achieved power of .99 for a small effect size (0.2); or, said another way, a 99% chance of rejecting the null hypothesis when it is really false. This level of power helps limit Type II error (false negatives) and detect the true effect if it exists. But this is a balance: a larger n increases power, but it also shrinks p-values and can make a trivial difference look statistically significant even when it isn’t a meaningful real-world effect. So, we want to evaluate both the p-value and the effect size.
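
For readers who want to check the power claim, the calculation can be reproduced with statsmodels. The group sizes and the small effect size (d = 0.2) come from the study; the alpha of .05 is an assumption on my part.

```python
# Sketch of the achieved-power check for a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

n_fraud, n_not_fraud = 747, 3228

power = TTestIndPower().power(
    effect_size=0.2,              # small effect size (Cohen's d)
    nobs1=n_fraud,                # size of the smaller group
    ratio=n_not_fraud / n_fraud,  # second group size relative to the first
    alpha=0.05,                   # assumed significance level
)
print(f"Achieved power: {power:.2f}")  # ~0.99
```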

The 3,975 extracted project abstracts were scored for creativity. In this instance, creativity is scored as a composite of the words used to describe the project in the abstract. Typical NLP text-cleaning procedures were used to standardize the text and remove stop words: words that are important for communication but don’t convey meaningful information (e.g., the, of, a, but, and). The word-level scores for a single award are then added together to create an overall creativity score per award.
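
A sketch of that pipeline is below. The function names are hypothetical, not TrackLight’s actual code; it assumes NLTK’s English stop-word list and the same stand-in embedding model as the earlier sketch, and it implements the composite as a simple sum of word-level originality scores, per the description above.

```python
# Illustrative pipeline: standardize an abstract, drop stop words, score
# each remaining word against the prompt, and sum the word-level
# originality scores into one composite score per award.
# Requires nltk.download("stopwords") on first run.
import re
import numpy as np
from nltk.corpus import stopwords
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in model
STOP_WORDS = set(stopwords.words("english"))
PROMPT = "Describe your grant project"

def clean(abstract: str) -> list[str]:
    # Lowercase, keep alphabetic tokens only, and remove stop words.
    tokens = re.findall(r"[a-z]+", abstract.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def composite_creativity(abstract: str) -> float:
    # Composite = sum of per-word originality (1 - cosine to the prompt).
    words = clean(abstract)
    if not words:
        return 0.0
    vectors = model.encode([PROMPT] + words)
    p, w = vectors[0], vectors[1:]
    cosines = (w @ p) / (np.linalg.norm(w, axis=1) * np.linalg.norm(p))
    return float(np.sum(1.0 - cosines))

print(composite_creativity("A novel sensor platform for autonomous maritime surveys."))
```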

Figure 2: Composite Creativity Scores (Known Fraud vs. Not Known Fraud).

When comparing the scores, we see that known fraud awards do score lower in creativity (mean 49.48, SD 34.84, SEM 1.01) than not known fraud awards (mean 64.28, SD 27.77, SEM 0.61). But is this mean creativity difference of 14.8 significant? Using an independent-samples t-test, we see high significance (p < .001) and a moderate effect size (Cohen’s d = .47). This supports the behavioral psychologists’ finding that the stress of deceit does in fact alter fraudsters’ use of content and context words. Think about how Benford’s Law identifies anomalous numbers when bad actors try to camouflage their expenses in receipts; similarly, this score seems to identify stress-induced semantic camouflage by fraudsters. And we can now evaluate this score mathematically in near real time.
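
The comparison itself is a standard two-sample test. The sketch below simulates stand-in data from the reported group means and standard deviations (the real analysis used the actual per-award composite scores) and computes the t-test and Cohen’s d with scipy and numpy.

```python
# Sketch of the reported comparison using simulated stand-in data drawn
# from the published group means/SDs; not the actual per-award scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
fraud = rng.normal(49.48, 34.84, size=747)       # known fraud awards
not_fraud = rng.normal(64.28, 27.77, size=3228)  # not known fraud sample

# Independent-samples t-test; equal_var=False gives Welch's variant,
# a reasonable choice given the unequal group SDs.
t_stat, p_value = stats.ttest_ind(fraud, not_fraud, equal_var=False)

# Cohen's d using the pooled standard deviation.
n1, n2 = len(fraud), len(not_fraud)
pooled_sd = np.sqrt(((n1 - 1) * fraud.var(ddof=1) +
                     (n2 - 1) * not_fraud.var(ddof=1)) / (n1 + n2 - 2))
d = (not_fraud.mean() - fraud.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.2e}, Cohen's d = {d:.2f}")
```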

Advancing Fraud Detection Methods

But, ultimately, how is this knowledge useful? I have long been a proponent of updating anti-fraud research methods to move beyond just outlier detection and rule-based risk analysis. Methodologically sound analysis, grounded in social science research and evaluated statistically, helps us better understand how fraud manifests in the data. It also creates variables that can be used in models and supports the early development of screening tools for government programs. While creativity by itself cannot indicate fraud (or its absence), this score, used in conjunction with other indicators, can help identify high-risk awards that need further review prior to award. If we understand these indicators, we can support moving from a pay-and-chase system to a forward-leaning detection system. And maybe, in the process, we can increase public trust and support getting money to the right applicants in competitive systems.

References:

The alternative uses test. (n.d.). Creative Huddle. https://www.creativehuddle.co.uk/post/the-alternative-uses-test

Adams, S. H. (2002). Communications under stress: Indicators of veracity and deception in written narratives (Doctoral dissertation, Virginia Polytechnic Institute and State University). http://scholar.lib.vt.edu/theses/available/etd-04262002-164813/unrestricted/adams1.pdf

Caso, L., Gnisci, A., Vrij, A., & Mann, S. (2005). Processes underlying deception: An empirical analysis of truth and lies when manipulating the stakes. Journal of Investigative Psychology and Offender Profiling, 2(3), 195–202. https://doi.org/10.1002/jip.32

Clatworthy, M. A., & Jones, M. J. (2006). Differential patterns of textual characteristics and company performance in the chairman’s statement. Accounting, Auditing & Accountability Journal, 19(4), 493–511. https://doi.org/10.1108/09513570610679100

Craig, R., Mortensen, T., & Iyer, S. (2013). Exploring top management language for signals of possible deception: The words of Satyam’s chair Ramalinga Raju. Journal of Business Ethics, 113(2), 333–347. https://doi.org/10.1007/s10551-012-1307-5

Dumas, D., Organisciak, P., & Doherty, M. (2021). Measuring divergent thinking originality with human raters and text-mining models: A psychometric comparison of methods. Psychology of Aesthetics, Creativity, and the Arts, 15(4), 645–663.