Machine Learning: Advancing Market Research Goals

By Paul Sideleau, Imperium Senior Director, Technology

Machine learning (ML) has transformed the application of technology in recent years, making complex tasks – like speech recognition, online searches and shopping – simpler and more intuitive. But ML is also quietly revolutionizing workflows in other important areas, including fraud prevention.

As rapid digital transformation exposes increasing vulnerabilities – and, in turn, cyber criminals become more adept at exploiting them – organizations will need to leverage the opportunities of AI and ML, if they are to manage and minimize emerging threats to data quality and security.

Fraud in the market research (MR) sector is endemic. According to industry insights specialist GreenBook, it’s estimated that, on average, between 15 and 30 percent of market research data is fraudulent. And, while bots and other automated programs are responsible for swathes of fake MR survey traffic, professional fraudsters who bypass quality control checks are also on the rise.

Survey fraudsters ruthlessly exploit gaps in user authentication to undermine intelligence gathering. As fast as organizations’ tech teams plug these gaps, new ones appear, and the cycle continues. Imperium’s data shows that survey fraud has been increasing throughout the pandemic; at the end of last year, we saw fraudulent survey responses double overall, with an increase in poor open ends (OE) of around 40-50 percent during 2020.

In the face of these escalating challenges, a multi-layered approach to fraud prevention is necessary. But how can we excise the fraudsters while protecting and supporting legitimate panelists?

On its own, human oversight is no longer enough to ensure data integrity. We already know that individual data points aren’t sufficient to determine fraud or weed out bad actors; only by analyzing multiple signals – calibrated by advanced ML algorithms and audited by manual review – can we hope to achieve our goal of optimal data integrity.

The most effective way of improving data quality is to look at respondent behavior – not in terms of a single interaction but by examining all the events leading up to, and following, the interaction. Obviously, this involves a huge volume of data, which can only be parsed by setting appropriate benchmarks for keystroke patterns, answer speed and other response characteristics, so that outliers can be properly scrutinized.
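As a rough illustration of benchmarking a response characteristic so that outliers stand out, the sketch below scores answer speed against the median using a modified z-score (median absolute deviation). This is a generic statistical technique, not a description of Imperium's tooling; the sample timings, the function names and the 3.5 cut-off are assumptions chosen for the example.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return the positions of values whose modified z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Hypothetical average seconds per question for eight respondents.
seconds_per_question = [8.2, 7.5, 9.1, 6.8, 0.9, 8.8, 7.9, 10.3]
print(flag_outliers(seconds_per_question))  # -> [4], the 0.9-second respondent
```

A flag like this would only mark a respondent for closer scrutiny alongside other signals; on its own, a single data point is not enough to determine fraud.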

In effect, machine learning replicates the work of dozens of teams of analysts – in a fraction of the time (and at a fraction of the cost). It can run thousands of simultaneous queries, comparing results and constantly analyzing what it recognizes as ‘normal’ activity, while isolating and blocking anomalous transactions. Decisions are only escalated when specific human insights are needed.

ML models quickly spot suspect behaviors, without affecting the panelist’s survey experience, and also get smarter over time as they process more and more data. ML systems thrive on an abundance of clean and accurate data: the more examples of good and bad respondents they have, the more efficiently they can compare behaviors and apply the learning to predict future transgressions.

When we don’t have clean data, we must leverage ML to help uncover fraudulent activity that may be missed by manual checking. Moreover, ML models need to be fed the right data points, something that can only be done by experts, like Imperium, with specific domain expertise in data quality.

No business wants to take too hard a line on data monitoring. The more rules you lay down, the more likely it is you’ll throw out genuine – if flawed – respondents alongside the fraudulent ones. Behavior thresholds also change over time, which means that benchmarks can become blurred and fraudsters start to slip through the net.
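The trade-off can be made concrete with a small sketch. It assumes each respondent already has a fraud score between 0 and 1 and a known outcome from manual review; the scores, labels and thresholds below are invented for illustration only.

```python
def rejection_counts(scored, threshold):
    """Count fraudulent and genuine respondents rejected at a given score threshold."""
    rejected = [is_fraud for score, is_fraud in scored if score >= threshold]
    fraud_caught = sum(rejected)
    genuine_lost = len(rejected) - fraud_caught
    return fraud_caught, genuine_lost


# Hypothetical (fraud score, confirmed fraudulent?) pairs.
scored = [
    (0.95, True), (0.80, True), (0.65, False), (0.60, True),
    (0.40, False), (0.35, False), (0.10, False),
]

for threshold in (0.9, 0.6, 0.3):
    caught, lost = rejection_counts(scored, threshold)
    print(f"threshold {threshold}: {caught} fraudulent caught, {lost} genuine rejected")
```

In this toy data, lowering the threshold from 0.9 to 0.6 catches all three fraudulent respondents at the cost of one genuine one, while lowering it further to 0.3 catches no additional fraud and rejects three genuine respondents.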

Intelligent ML models will process historical outcomes and anomalies alongside more recent inputs, creating an automated feedback loop that makes it easier not only to improve current survey data but to predict fraudulent or anomalous behavior in the future. That’s not to say that ML will ever replace human analysis – human intuition has yet to be successfully synthesized by machines – but implementing effective ML frees up time for skilled analysts to focus on more productive tasks.

We speak from experience. By introducing ML into Imperium’s software development cycle, we’ve been able to ensure that survey data quality tools such as Real Answer®, RelevantID® and QualityScore™ are continuously evolving and becoming more effective. It’s a carefully controlled process with checks and balances in place. We continuously improve and implement MLOps best practices, including real-time monitoring, automated deployments, versioning models, A/B testing and data drift detection. We also include guard rails to ensure the machines can’t take over (just yet!).
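For readers unfamiliar with data drift detection, one widely used measure is the Population Stability Index (PSI), which compares how a feature is distributed in a baseline period against a recent one. The sketch below is a generic illustration and not a description of how Imperium's products work; the bin edges, sample timings and function name are assumptions made for the example.

```python
import math


def population_stability_index(baseline, recent, edges, eps=1e-6):
    """PSI between two samples, using fixed bin edges shared by both."""

    def bin_shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # The bin index is the number of edges the value meets or exceeds.
            counts[sum(v >= e for e in edges)] += 1
        # Floor each share at eps so empty bins do not cause log(0).
        return [max(c / len(values), eps) for c in counts]

    base, new = bin_shares(baseline), bin_shares(recent)
    return sum((n - b) * math.log(n / b) for b, n in zip(base, new))


# Hypothetical seconds-per-question samples from two periods.
baseline = [4, 6, 7, 8, 9, 10, 11, 12, 8, 10]
recent = [4, 5, 6, 6, 7, 7, 8, 9, 10, 11]

psi = population_stability_index(baseline, recent, edges=[5, 10])
print(round(psi, 3))
```

A PSI of zero means the two distributions match across the chosen bins. A common rule of thumb treats values above roughly 0.1 as worth investigating and values above roughly 0.25 as a significant shift, though suitable cut-offs depend on the feature and the bins.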

Taking this holistic approach to improving our anti-fraud and data-hygiene solutions not only enables us to more confidently support MR and panel companies in their quest for better data but also means we can more easily tailor models to suit clients’ unique specifications.

Importantly, we know when and how to use machine learning. It’s not a silver bullet; we continue to deploy traditional approaches, together with statistics and data science analysis. ML is a component part of a broad suite of best practices and applications that help ensure data quality.

By using ML models to thwart the bots and fraudsters, we can not only become far better at predicting the behaviors that threaten the veracity of survey data but also improve the scope and quality of the datasets themselves.

Paul Sideleau is the Senior Director of Technology at Imperium. He focuses on the design, implementation and expansion of existing and new products. His twenty years of experience in scalable application development, together with his impressive software design expertise, bring a high level of technical know-how to Imperium’s board.
