According to the Coalition Against Insurance Fraud, nearly $80 billion in fraudulent claims is made annually in the US. This figure includes all lines of insurance and is likely a conservative estimate because so much insurance fraud goes undetected and unreported.
One of the largest US personal lines insurers was among the companies experiencing significant losses to fraud. Like most insurers, it relied on a combination of business rules and manual review to identify suspicious patterns and inflated claims. But leaders determined that the process was not very effective: Many actual fraud cases were not being caught, and investigators’ time was often spent chasing false alarms.
The company had successfully applied analytics to make more informed decisions in other areas of the business, and decided it was time to apply analytics to the problem of fraud prevention. Leaders tapped analytics solutions firm Mu Sigma, which had helped the insurer improve efficiency in claims handling, to investigate and come up with a more effective means for fraud detection.
One of our hypotheses was that the text entries captured during various stages of the claims process could reveal scripted patterns indicative of fraudulent activity. This included unusual narratives in claims notes, indications of prior interest, and use of specific words and phrases such as “chiropractor” that the company associated with past fraud cases. Since it was difficult, if not impossible, to spot some of these patterns via manual review, a lot of these cases escaped attention. We worked with the insurer to build a text-mining algorithm and generated a fraud propensity score that combined the business rule classification with the authenticity of the text entries.
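To make the idea concrete, here is a minimal sketch of how a text signal might be blended with a business-rule flag. It is an illustration only, not the insurer's algorithm: a simple phrase match stands in for the text-mining model, and the phrase list (beyond "chiropractor"), the weights, and the function names `text_score` and `propensity` are all assumptions.

```python
import re

# Hypothetical phrase weights. The article names only "chiropractor";
# the other phrases and all weights are made up for illustration.
SUSPICIOUS_PHRASES = {
    "chiropractor": 0.4,
    "cash only": 0.3,
    "no witnesses": 0.3,
}

def text_score(note: str) -> float:
    """Sum the weights of suspicious phrases found in a claim note, capped at 1.0."""
    text = note.lower()
    total = sum(
        weight
        for phrase, weight in SUSPICIOUS_PHRASES.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", text)
    )
    return min(total, 1.0)

def propensity(note: str, rule_flagged: bool, rule_weight: float = 0.5) -> float:
    """Blend the business-rule flag (0 or 1) with the text score (assumed 50/50 mix)."""
    return rule_weight * float(rule_flagged) + (1 - rule_weight) * text_score(note)

if __name__ == "__main__":
    note = "Claimant visited a chiropractor the same day; no witnesses."
    print(round(propensity(note, rule_flagged=True), 2))
```

A production system would more likely learn which words and narratives matter from historical claims notes rather than rely on a fixed list.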
Testing with the fraud investigation unit in the field revealed that the number of fraud cases being caught improved by 5 percent over a control group. Further, some of the suspicious text patterns found through the model indicated collaboration among various entities involved in the claims process, which the field unit then wanted to investigate further.
To link those entities, we borrowed the concept of influence mapping, which is used in social media marketing, and applied social network analysis to identify suspicious relationships among parties involved in a claim (claimant to employees, claimant to medical providers, etc.). This element was added to the propensity score. Although this further improved the number of fraud cases being flagged, the number of false positives was still high and remained a cause of concern for the investigators.
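One simple way to picture the relationship analysis is to count how often the same pair of parties shows up together across different claims. The sketch below does only that. The claim records, party labels, and the `network_score` definition are hypothetical, and the actual analysis was presumably richer than pair counting.

```python
from collections import defaultdict

# Hypothetical claim records: claim id -> parties involved in that claim.
CLAIMS = {
    "C1": {"claimant:ann", "provider:dr_x", "employee:bob"},
    "C2": {"claimant:cal", "provider:dr_x", "employee:bob"},
    "C3": {"claimant:dee", "provider:dr_y", "employee:eve"},
}

def party_pairs(parties):
    """Return every unordered pair of parties, in a stable sorted order."""
    ordered = sorted(parties)
    return [(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:]]

def pair_counts(claims):
    """Count how many claims each pair of parties appears in together."""
    counts = defaultdict(int)
    for parties in claims.values():
        for pair in party_pairs(parties):
            counts[pair] += 1
    return counts

def network_score(claim_id, claims, counts):
    """Fraction of this claim's party pairs that also co-occur in another claim."""
    pairs = party_pairs(claims[claim_id])
    if not pairs:
        return 0.0
    repeated = sum(1 for pair in pairs if counts.get(pair, 0) > 1)
    return repeated / len(pairs)

if __name__ == "__main__":
    counts = pair_counts(CLAIMS)
    for cid in CLAIMS:
        print(cid, round(network_score(cid, CLAIMS, counts), 2))
```

In this toy data, the same provider and employee appear together on two claims, so those claims score higher than the unrelated third one.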
The team next listed all possible factors that could be indicative of suspicious activity and identified relevant data elements, such as relationships among parties as referenced above. The factors identified were then used to build a logistic regression model to predict probability of fraud given historical customer and behavioral characteristics from known cases of fraud. This model replaced the business rules being used and helped improve accuracy of the cases flagged.
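For readers unfamiliar with logistic regression, the following is a minimal sketch that fits one by plain gradient descent on a tiny, made-up training set. The two features (a scaled prior-claims count and a relationship score), the data values, and the learning settings are assumptions chosen for illustration and do not come from the project.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """Predicted probability of fraud; w[0] is the bias, w[1:] the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the average log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical history: [prior claims (scaled 0-1), relationship score]; 1 = known fraud.
X_TRAIN = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.3], [0.8, 0.7], [0.9, 1.0], [1.0, 0.6]]
Y_TRAIN = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    weights = fit_logistic(X_TRAIN, Y_TRAIN)
    print(round(predict_proba(weights, [0.9, 0.9]), 2))
    print(round(predict_proba(weights, [0.05, 0.0]), 2))
```

In practice a team would typically use a statistics package and far more features and cases. The point here is only that the model outputs a probability rather than a yes/no rule hit.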
Finally, a composite fraud propensity score was generated using all three elements: the text-mining algorithm, social network analysis, and logistic regression model. A claims case tool was developed and deployed that allowed investigators to visualize and examine the fraud propensity score along with relevant reasons for suspecting fraud for each case. This helped investigators fine-tune which claims required manual review. When run against a sample set of claims data (a subset of which had been previously proven fraudulent), the approach positively identified 90 percent of fraudulent claims. The insurer plans to fine-tune the model periodically to ensure it’s keeping up with new approaches criminals use to perpetrate fraud.
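The article does not say how the three elements were combined, so the sketch below assumes a simple weighted average, with a list of "reasons" of the kind an investigator-facing tool might display. The weights, the review threshold, and all names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClaimScores:
    text: float        # score from the text-mining element, 0 to 1
    network: float     # score from the social network analysis, 0 to 1
    regression: float  # probability from the logistic regression model, 0 to 1

# Assumed weights for illustration only; they sum to 1.
WEIGHTS = {"text": 0.3, "network": 0.3, "regression": 0.4}

def composite(scores: ClaimScores, threshold: float = 0.6):
    """Return (composite score, needs manual review?, reasons sorted by strength)."""
    parts = {
        "text": scores.text,
        "network": scores.network,
        "regression": scores.regression,
    }
    total = sum(WEIGHTS[name] * value for name, value in parts.items())
    # Report any element scoring at least 0.5 as a reason, strongest first.
    reasons = [
        name
        for name, value in sorted(parts.items(), key=lambda kv: -kv[1])
        if value >= 0.5
    ]
    return total, total >= threshold, reasons

if __name__ == "__main__":
    total, review, reasons = composite(ClaimScores(text=0.8, network=0.2, regression=0.9))
    print(round(total, 2), review, reasons)
```

Raising or lowering the threshold trades investigator workload against missed fraud, which is the tuning decision the tool is meant to support.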
This entire project, from start to finish, took approximately three months. Based on the savings it generated, the project paid for itself within a month. The insurer reports the following results:
- It is catching 20 percent more fraudulent claims than before, even though it investigates significantly fewer claims -- just 3.5 percent of total claims.
- This has led to $30 million in annual savings, not only from fraud reduction but also from reducing the cost of manual investigation efforts and false alarms.
What can insurers learn from this experience? The primary lesson is that would-be defrauders are getting smarter and more sophisticated. Insurers must counter with high-tech approaches that leverage data analytics. The upfront cost and time required are certain to pay off in the form of fraud avoidance and investigative cost reductions.
Harsha Rao, associate director at Mu Sigma, contributed to this article.
Saurabh Tandon is a senior manager with Mu Sigma. He has over a decade of experience working in analytics across various domains including banking, financial services and insurance. Saurabh holds an MBA in strategy and finance from the Kellogg School of Management and a ...