
How to stop professional survey cheaters

By Tim McCarthy, Imperium General Manager

Published in Quirk’s Media May 31, 2022

For market researchers everywhere, tackling fraud has become a high-stakes battle to defend data integrity. Yet, even as organizations scramble to thwart increasingly sophisticated attempts to infiltrate online surveys – including the wholesale deployment of bots and click farms – determined fraudsters constantly seek out fresh ways to subvert the system, ruthlessly exploiting every vulnerability.

Data integrity in the marketing research industry

The impact is significant; it’s estimated that between 15 and 25% of all market research survey respondents are fraudulent. Our own research shows that survey fraud escalated sharply during the pandemic, with fraudulent survey responses nearly doubling at times in 2020 – a worrying development when interference even at the lower end of that estimate can deliver dangerously skewed results.

However, with professional fraudsters not only ramping up attempts to evade detection but also sharing their tactics on YouTube, it’s a problem that’s set to persist.

Further, just as fraudsters are raising the bar, restrictions on third-party access to personal data that can help identify fakes and dupes are gaining ground. For example, Apple recently introduced its iCloud Private Relay service, which can be used to send all browser information and traffic through Apple’s relays, returning it through a temporary IP address that is often shared by many others in the same region. Ironically, the very same processes that are designed to safeguard ordinary people from intrusive data harvesting are also serving to obscure the identities and intentions of bad actors.

Taking a measured approach to data integrity

At the same time, with sample scarcity being where it is, it’s more important than ever that market researchers don’t lose sight of the need to provide genuine respondents with a rewarding survey experience. Even with the focus firmly on preserving data integrity, there’s a balance to be struck between taking essential precautions to eject cheaters and providing a frictionless journey for genuine panelists.

It requires a holistic approach. Researchers must be aware of dozens of individual survey elements that have the potential to affect data quality – as well as understanding how to manage and mitigate these threats while promoting process integrity.

1. Survey design.

Some QA can be addressed through meticulous study design. Surveys should be engaging, relevant, clear and concise – those that can be completed in under 20 minutes are most likely to result in a completely usable data set. It’s best to incorporate a range of question types that are designed to identify poor respondents and are targeted and fit for purpose, without going overboard. Open-end, grid, low-incidence and differing-response questions will help weed out cheaters. Data reviews should be conducted consistently while fielding, ideally in real time through automation.

2. Checks and balances.

Pre-survey checks are necessary to filter out obvious frauds and dupes. But, even when clear fraudulent respondents are removed prior to, or upon survey entry, a further ~10% still need to be removed manually post-completion due to inappropriate survey data/behavior. It’s relatively easy to filter out the fraudsters whose data is clearly and abundantly poor. But experience shows that flagging less-commonly monitored details – such as respondents re-entering the same phrase/response, copying and pasting OE responses or gaming LOI calculations by pausing for long periods of time before completing – can have a dramatic effect on the quality of the respondent pool.

3. Keep it real.

Build your quality checks into the survey itself and ensure that all or most of them relate to the survey content. Creating unrelated quality-check pre-screeners and/or setting multiple, off-topic trap questions can backfire and unnecessarily extend a survey’s length. Instead, deploy actual survey questions with anomalistic/inappropriate behaviors flagged. Really obvious red herring questions can often be identified by bogus respondents and will sometimes confuse or frustrate real respondents – especially if they’re inserted towards the end of a long survey, when attentiveness isn’t at its peak for any respondent.

4. Recalibrate expectations.

Real people aren’t perfect – they can be distracted, inattentive and contrary at times, while still being valid (and valuable) respondents. Rather than ejecting respondents at the first incidence of concern, it’s important to find secondary data points – extracted from other questions or passive checks – to confirm suspicions of poor quality. Identifying a fraudster or malicious respondent is not about catching a single poor response, but rather about recognizing multiple flags consistent with cheating, throughout the course of a survey. Sophisticated cheaters can appear to be the perfect respondents at first glance – completing in precisely the allocated time, for example. You’ll need to dig deeper into the data and use passive data points.

5. Technology can help.

In-survey automated checks that utilize machine learning (ML) models can help quickly spot anomalous behaviors, without impacting the panelist’s survey experience. Intelligent ML models will process inputs to create an automated feedback loop that makes it easier not only to improve extant survey data but to predict fraudulent or unusual behavior in the future. It’s unlikely that ML will ever completely replace human intuition but implementing effective automation and ML frees up time for skilled analysts to focus on more productive tasks.
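The less-commonly monitored checks from point 2 and the "multiple flags" rule from point 4 can be sketched together as follows. This is a minimal illustration under stated assumptions, not Imperium's implementation: the record layout, the function names, the 600-second pause threshold and the two-flag removal rule are all hypothetical.

```python
from collections import Counter


def normalize(text):
    """Lower-case and collapse whitespace so trivially edited copies still match."""
    return " ".join(text.lower().split())


def build_flags(respondents, pause_threshold=600):
    """Compute two illustrative quality flags per respondent.

    respondents: dict of respondent ID -> {"open_end": str, "page_seconds": [float]}
    (a hypothetical layout). pause_threshold is an assumed value in seconds.
    """
    # Count how many respondents gave each (normalized, non-empty) open-end answer.
    counts = Counter(
        normalize(r["open_end"])
        for r in respondents.values()
        if r["open_end"].strip()
    )
    flags = {}
    for rid, r in respondents.items():
        text = r["open_end"]
        flags[rid] = {
            # The same open-end text appears for more than one respondent.
            "repeated_open_end": bool(text.strip()) and counts[normalize(text)] > 1,
            # One very long page can pad total LOI so a speeder looks normal.
            "long_pause": any(s > pause_threshold for s in r["page_seconds"]),
        }
    return flags


def decide(flag_row, min_flags=2):
    """Remove only when several flags agree; a single flag just prompts a review."""
    tripped = [name for name, hit in flag_row.items() if hit]
    if len(tripped) >= min_flags:
        return "remove"
    return "review" if tripped else "keep"


respondents = {
    "r1": {"open_end": "Good product", "page_seconds": [20, 900, 15]},
    "r2": {"open_end": "good   product", "page_seconds": [25, 30, 40]},
    "r3": {"open_end": "I liked the price", "page_seconds": [30, 45, 50]},
}
decisions = {rid: decide(row) for rid, row in build_flags(respondents).items()}
```

With this made-up data, `r1` trips both flags, `r2` trips only one and is queued for a closer look rather than ejected, and `r3` trips none.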

The AI paradox

Just as artificial intelligence has offered fresh tools for tackling fraud, it’s also provided fraudsters with opportunities to execute increasingly creative encroachments.

Highly sophisticated automated scripts can slice through surveys, flooding them with bad respondents, for example, while advanced AI can analyze OEs to synthesize a closely related answer that may pass superficial inspection. This is why gathering supporting data points to work out what respondents are doing on a page-by-page basis is crucial. As fraud becomes more automated, our response levels must match it in both speed and accuracy.

Mitigating fraud is a long-term commitment. Cheaters will always try to find new ways to circumvent tricks and traps but if the cost-benefit doesn’t add up, the rewards simply won’t justify the effort. By forcing fraudsters to spend more time and energy gaming a survey than its monetary worth – while also providing genuine respondents with a satisfying experience that elicits generous engagement – market researchers could well tip the quality balance in their favor for good.

Tim McCarthy is General Manager at Imperium. He has over 15 years of experience managing market research and data-collection services and is an expert in survey programming software, data analysis and data quality. Imperium is the foremost provider of technology services and customized solutions to panel and survey organizations, verifying personal information and restricting fraudulent online activities.

Data Quality: Can Programmatic Quality Checks Outperform Manual Reviews?

By Tim McCarthy, Imperium General Manager

We launched our new automated tool, QualityScore, in 2021, to help streamline in-survey data quality checks, at the same time delivering significant cost and time savings.

As we analyzed the data from over 17K surveys (with 20m completes scored and 1.5m+ respondents flagged for removal), we expected to see automated outputs mirroring manual removals. Instead, we learned something far more interesting about the nature of the overlap between automated and manual removals.

We found that the automated checks, while flagging roughly similar numbers of respondents for removal, were, in fact, returning different groups from those identified via the manual process. The average overlap between automated and manual removals was just 50-60%.
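One way to measure this kind of overlap is sketched below. The article does not say how the 50-60% figure was calculated, so the definition used here (shared removals as a share of each list) is an assumption, and the function name and respondent IDs are hypothetical.

```python
def removal_overlap(automated, manual):
    """Compare two removal lists of respondent IDs.

    Returns counts for each region of the comparison plus the shared removals
    expressed as a share of each list. This definition of "overlap" is an
    assumption made for illustration.
    """
    automated, manual = set(automated), set(manual)
    both = automated & manual
    return {
        "both": len(both),
        "automated_only": len(automated - manual),
        "manual_only": len(manual - automated),
        "share_of_automated": len(both) / len(automated) if automated else 0.0,
        "share_of_manual": len(both) / len(manual) if manual else 0.0,
    }


# Two lists of equal size that agree on only half of their removals.
result = removal_overlap({"a", "b", "c", "d"}, {"c", "d", "e", "f"})
```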

Initially, this seemed like a worrying development. Were we checking for the right factors? Was our ML model operating correctly? Had we trained it with the right data?

The limited overlap isn’t a glitch – far from it. The fact that automated and manual checks are returning somewhat different results is actually a very good thing – and here’s why.

While manual checks can be effective in identifying clearly fraudulent respondents, they’re also more likely to miss more sophisticated cheaters – the ones who know not to fall foul of the standard speeder checks, the ones who aren’t straightlining at grids, the ones who provide seemingly accurate OEs. At times, to the naked eye, these could appear to be among the best respondents in the survey. Our automated checks are calibrated to catch these cheaters by utilizing passive and behavioral data (e.g. survey-taking acceleration, copy/pasted OEs, repeat OEs across different respondents and mouse movement).

Also, while our research demonstrates that cheaters don’t all cheat in the same way, it also demonstrates that good respondents don’t fall neatly into a homogeneous data subset.

Good respondents don’t necessarily exhibit uniformly good behaviors – and if you base your removals on a single factor, you’ll risk biasing your data. For example, removing respondents on the results of speeding checks alone may unfairly affect younger age groups who generally complete surveys more quickly. A recent report, created through QualityScore activity, showed that respondents in the 18-34 age category failed standard speeding checks twice as often as respondents aged 45+.
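A bias check of this kind can be sketched as a simple tally of speeding failures per age group. This is an illustration only, not the QualityScore method: the function name, the 300-second cutoff and the sample data are invented for the example.

```python
from collections import defaultdict


def speeding_fail_rate_by_group(respondents, cutoff_seconds):
    """Share of respondents in each group who finished faster than the cutoff.

    respondents: iterable of (age_group, completion_seconds) pairs
    (a hypothetical layout).
    """
    totals = defaultdict(int)
    fails = defaultdict(int)
    for group, seconds in respondents:
        totals[group] += 1
        if seconds < cutoff_seconds:
            fails[group] += 1
    return {group: fails[group] / totals[group] for group in totals}


# Made-up completion times, chosen so the younger group fails twice as often.
sample = [
    ("18-34", 200), ("18-34", 400), ("18-34", 250), ("18-34", 500),
    ("45+", 280), ("45+", 450), ("45+", 600), ("45+", 700),
]
rates = speeding_fail_rate_by_group(sample, cutoff_seconds=300)
```

If one group fails a single check far more often than another, removing on that check alone would thin that group out of the final data.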

Manual checks are useful, but a more automated respondent-scoring process delivers greater objectivity and increases the likelihood of engaging with the most appropriate respondents. Manual checks often cast a large net to make sure they catch the highest number of poor respondents. However, they frequently return a high false-positive rate (real respondents who may have unwittingly tripped one or two flags); we see this a lot when a respondent submits one relatively poor or unengaged OE response.

We know that the broad-brush approach that’s often a feature of manual checks isn’t deployed because it’s necessarily the best method, but because reviewing all data points in unison for every respondent would simply be too time consuming. By removing respondents who trip flags – even for minor infractions – you will likely catch all the bad ones, but you’ll also risk excluding many who are real, somewhat imperfect, respondents. Only by reviewing all key data points together can you confidently identify the difference between an imperfect real person with mostly good data and a flat-out cheater.

Because our automation also flags respondents who stray from the standard response pattern for that specific survey, it also counters any bias that’s down to the quirks of survey design – if 90% of respondents are straightlining at one question, for example, maybe it’s a fault in the survey rather than an indication of wholesale cheating.
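The survey-specific baseline idea can be sketched like this: straightlining only counts against a respondent on questions where most other respondents did not straightline. The data layout, the function names and the 50% cut-off are assumptions made for illustration, not a description of QualityScore.

```python
def is_straightlined(grid_answers):
    """True if a grid with more than one row received the same answer on every row."""
    return len(grid_answers) > 1 and len(set(grid_answers)) == 1


def straightlining_flags(grids_by_respondent, max_share=0.5):
    """Flag straightlining, except on questions where it is the norm.

    grids_by_respondent: dict of respondent ID -> {question ID: [row answers]}
    (a hypothetical layout). Returns (flags, suspect_questions), where flags
    maps each respondent to the questions flagged against them and
    suspect_questions holds questions straightlined by more than max_share
    of respondents, which are treated as a possible design fault.
    """
    hits = {}  # question ID -> respondents who straightlined it
    seen = {}  # question ID -> number of respondents who answered it
    for rid, grids in grids_by_respondent.items():
        for qid, answers in grids.items():
            seen[qid] = seen.get(qid, 0) + 1
            if is_straightlined(answers):
                hits.setdefault(qid, set()).add(rid)

    suspect_questions = {q for q, rids in hits.items() if len(rids) / seen[q] > max_share}

    flags = {rid: [] for rid in grids_by_respondent}
    for qid, rids in hits.items():
        if qid in suspect_questions:
            continue  # widespread straightlining: question the survey, not the people
        for rid in rids:
            flags[rid].append(qid)
    return flags, suspect_questions


data = {
    "r1": {"q1": [3, 3, 3, 3], "q2": [1, 1, 1, 1]},
    "r2": {"q1": [3, 3, 3, 3], "q2": [1, 2, 4, 3]},
    "r3": {"q1": [3, 3, 3, 3], "q2": [2, 5, 1, 4]},
    "r4": {"q1": [2, 3, 3, 4], "q2": [5, 4, 2, 1]},
}
flags, suspect = straightlining_flags(data)
```

In this made-up data, three of four respondents straightline `q1`, so it is set aside as a likely design issue, while `r1` is still flagged for straightlining `q2`.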

On average, QualityScore flags about 8% of respondents as ‘bad’, with an accuracy rate of 99% in field tests (confirmed through manual checks).

Data quality is a moving target, which means that when it comes to tech, you can’t ‘set it and forget it’. Relying on increasingly out-of-date intelligence simply won’t work when even less sophisticated cheaters are becoming smarter as they realize they are getting flagged or removed.

As a rule, real-time automated tools give researchers much greater control over the QA process than manual checks, without committing additional time and resources. Research companies using our fully automated solution save about 85% of the time they would otherwise spend checking survey results to identify bad respondents. Moreover, using ML to learn from respondents – good and bad – helps to refine our data-quality solutions so they become more intelligent and more responsive. Standing still simply isn’t an option.

Tackling Change: Putting Brands in the Driving Seat

By Tim McCarthy, Imperium General Manager

Last quarter, COVID-19 was briefly knocked off the top of the news agenda, as climate change became the most talked-about topic – thanks to the COP26 summit in Glasgow, Scotland.

What the prominence of this news story shows for those of us in the market research sector is that climate change is no longer a discussion solely for politicians and environmentalists. It’s a conversation that extends to people’s everyday actions and interactions, impacting the relationships consumers have with brands in almost every niche.

And, as businesses respond to growing public pressure by pursuing purpose alongside profitability, savvy brands are looking to take the initiative by more closely matching their messaging to their audiences. Amid this complex and developing narrative, being able to rely on clean, high-quality data is at the top of every brand’s – and every research agency’s – agenda.

Consumers expect authenticity

Buying decisions are increasingly influenced by philosophical and emotional factors, with individual actions more likely than ever to be rooted in personal experience and aspirations. It’s a mood that brands need to tap into if they want to stay ahead of the curve.

With misconceptions about demographics commonplace, companies looking to forge closer connections with their target audience must also train their focus towards accurate data gathering rather than relying on broad-brush trends analysis.

For example, the commonly held belief that younger generations care more about the environment than their older peers was debunked by a recent sustainability report from McCann Worldgroup’s Truth Central which showed that Boomers reported even higher levels of climate anxiety than Gen Z respondents.

If brands want to foster a greater sense of trust, cohesion and community with their customer base, they need to prioritize collecting the cleanest data to underpin and uplift their campaigns and communications.

Small changes can make a big difference

Naturally, understanding precisely what messages elicit the most positive responses helps brands to target the right people with the right insights.

By fine-tuning research surveys, brands can gain a deeper understanding of culture which, in turn, will enable them to refine and realign their messaging, leaning into the messages that will resonate most powerfully with their audience.

As consumers occupy increasingly stratified niches – with views that are constantly evolving – it’s more important than ever that brands and research agencies have access to the most accurate data available.

In a recent Imperium article, Technology Director Paul Sideleau talks about how AI and ML have become more crucial in the drive to maintain data integrity and minimize opportunities for fraud. This piece also highlights the need to prioritize input quality, given that output quality is so critical. However sophisticated or wide-ranging the panel’s scope, unless the right data points are used, entire models could simply miss the mark (lending weight to the ‘garbage in/garbage out’ theory).

Putting data quality at the heart of change

At Imperium, we know that quality data begins with selecting good survey participants and is further shaped by individual survey elements that can impact data for good or bad.

We routinely use ML – constantly tweaking and refining the algorithms – to ensure that tools such as Real Answer® and RelevantID® are as effective as they can be in detecting fraud and assuring data integrity. It means we’re better at predicting bad behavior and more confident in the scope and quality of the datasets we return.

As brands look for more authentic ways to engage with their audiences and communicate their ethical credentials, the demand for data that provides a more granular picture of the consumer mindset – and with it, the opportunity to more closely align with evolving views on ethical practices – can only increase.

Accurate data analysis can enable brands to not only illuminate and demystify abstract concepts, such as sustainability, but also to promote the essential actions, both corporate and individual, that will help move us toward a more hopeful future.

Machine Learning: Advancing Market Research Goals

By Paul Sideleau, Imperium Senior Director, Technology

Machine learning (ML) has transformed the application of technology in recent years, making complex tasks – like speech recognition, online searches and shopping – simpler and more intuitive. But ML is also quietly revolutionizing workflows in other important areas, including fraud prevention.

As rapid digital transformation exposes increasing vulnerabilities – and, in turn, cyber criminals become more adept at exploiting them – organizations will need to leverage the opportunities of AI and ML, if they are to manage and minimize emerging threats to data quality and security.

Fraud in the market research (MR) sector is endemic. According to industry insights specialist GreenBook, it’s estimated that, on average, between 15 and 30 percent of market research data is fraudulent. And, while bots and other automated programs are responsible for swathes of fake MR survey traffic, professional fraudsters who bypass quality control checks are also on the rise.

Survey fraudsters ruthlessly exploit gaps in user authentication to undermine intelligence gathering. As fast as organizations’ tech teams plug these gaps, new ones appear, and the cycle continues. Imperium’s data shows that survey fraud has been increasing throughout the pandemic; at the end of last year, we saw fraudulent survey responses double overall, with an increase in poor open ends (OE) of around 40-50 percent during 2020.

In the face of these escalating challenges, a multi-layered approach to fraud is necessary. But how can we excise the fraudsters while protecting and supporting legitimate panelists?

On its own, human oversight is no longer enough to ensure data integrity. We already know that individual data points aren’t sufficient to determine fraud or weed out bad actors; only by analyzing multiple signals – calibrated by advanced ML algorithms and audited by manual review – can we hope to achieve our goal of optimal data integrity.

The most effective way of improving data quality is to look at respondent behavior – not in terms of a single interaction but by examining all the events leading up to, and following, the interaction. Obviously, this involves a huge volume of data, which can only be parsed by setting appropriate benchmarks for keystroke patterns, answer speed and other response characteristics, so that outliers can be properly scrutinized.
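As a rough illustration of benchmarking one response characteristic, the sketch below sets a benchmark for answer speed from the survey's own data and singles out respondents who sit far from it. The median-and-MAD rule, the multiplier of 3 and all names are assumptions chosen for the example; the article does not specify how benchmarks are set.

```python
from statistics import median


def speed_outliers(seconds_by_respondent, k=3.0):
    """Return respondent IDs whose completion time is far from the survey's norm.

    The benchmark is the median completion time; distance is measured in units
    of the median absolute deviation (MAD). Both the rule and k=3.0 are
    illustrative assumptions.
    """
    values = list(seconds_by_respondent.values())
    if not values:
        return set()
    benchmark = median(values)
    mad = median([abs(v - benchmark) for v in values])
    if mad == 0:
        return set()  # no spread to measure against, so nothing can be called an outlier
    return {
        rid
        for rid, v in seconds_by_respondent.items()
        if abs(v - benchmark) / mad > k
    }


# Made-up completion times in seconds: one extreme speeder, one very long session.
times = {"r1": 300, "r2": 320, "r3": 310, "r4": 290, "r5": 305, "r6": 40, "r7": 2400}
outliers = speed_outliers(times)
```

Outliers found this way are candidates for scrutiny rather than automatic removal, since a single signal is not enough to determine fraud.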

In effect, machine learning replicates the work of dozens of teams of analysts – in a fraction of the time (and at a fraction of the cost). It can run thousands of simultaneous queries, comparing results and constantly analyzing what it recognizes as ‘normal’ activity, while isolating and blocking anomalous transactions. Decisions are only escalated when specific human insights are needed.

ML models quickly spot suspect behaviors, without impacting on the panelist’s survey experience, and also get smarter over time as they process more and more data. ML systems thrive on an abundance of clean and accurate data: the more examples of good and bad respondents they have, the more efficiently they can compare behaviors and apply the learning to predict future transgressions.

When we don’t have clean data, we must leverage ML to help uncover fraudulent activity that may be missed by manual checking. Moreover, ML models require feeding with the right data points, something that can only be done by experts like Imperium with specific domain expertise in data quality.

No business wants to take too hard a line on data monitoring. The more rules you lay down, the more likely it is you’ll throw out genuine – if flawed – respondents alongside the fraudulent ones. Behavior thresholds also change over time which means that benchmarks can become blurred and fraudsters start to slip through the net.

Intelligent ML models will process historical outcomes and anomalies alongside more recent inputs, creating an automated feedback loop that makes it easier not only to improve current survey data but to predict fraudulent or anomalous behavior in the future. That’s not to say that ML will ever replace human analysis – human intuition has yet to be successfully synthesized by machines – but implementing effective ML frees up time for skilled analysts to focus on more productive tasks.

We speak from experience. By introducing ML into Imperium’s software development cycle, we’ve been able to ensure that survey data quality tools such as Real Answer®, RelevantID® and QualityScore™ are continuously evolving and becoming more effective. It’s a carefully controlled process with checks and balances in place. We continuously improve and implement MLOps best practices, including real-time monitoring, automated deployments, model versioning, A/B testing and data drift detection. We also include guard rails to ensure the machines can’t take over (just yet!).

Taking this holistic approach to improving our anti-fraud and data-hygiene solutions not only enables us to more confidently support MR and panel companies in their quest for better data but also means we can more easily tailor models to suit clients’ unique specifications.

Importantly, we know when and how to use machine learning. It’s not a silver bullet; we continue to deploy traditional approaches, together with statistics and data science analysis. ML is a component part of a broad suite of best practices and applications that help ensure data quality.

By using ML models to thwart the bots and fraudsters, we can not only become infinitely better at predicting the behaviors that threaten the veracity of survey data but also improve the scope and quality of the datasets themselves.

Paul Sideleau is the Senior Director of Technology at Imperium. He focuses on the design, implementation and expansion of existing and new products. His twenty years of experience in scalable application development, together with his impressive software design expertise, bring a high level of technical know-how to Imperium’s board.

Imperium Shortlisted for Prestigious Technology Award

Imperium announced as a finalist in Quirk’s 2021 Marketing Research and Insight Excellence Awards

(NEW YORK – Sep 16, 2021) – Data quality solutions specialist Imperium (www.imperium.com) has been shortlisted for Quirk’s 2021 Technology Impact Award – a category that recognizes outstanding innovations in the marketing research industry.

Imperium is renowned for its powerful data integrity, validation and hygiene tools, including flagship ID-verification product RelevantID® and new multi-point respondent-scoring tool QualityScore™.

The company’s sector-specific solutions are used by some of the world’s leading market research companies and panels to boost data quality. Imperium’s real-time automated tools assess both passive and behavioral data to swiftly, and cost-effectively, weed out fraudsters and dupes, at the same time delivering consistently high-quality respondents.

“We’re delighted that Imperium is a finalist for Quirk’s Technology Impact Award, especially as the judges are considering the real-world application and long-term benefits of shortlisted technologies,” commented Tim McCarthy, General Manager, Imperium.

“Over the years, our focus on innovation has enabled us to create a suite of top-level tools that assure data quality for marketing research clients, providing an informed response to increasingly complex fraud attempts. Deploying a sophisticated machine learning model enables us to continuously adapt to new behavior patterns, allowing us to isolate fraudulent and poor-quality respondents before they have the chance to subvert data.”

“The threat posed to the integrity of online panel surveys by disengaged or fraudulent responses is on the rise. Although meticulous study design is the cornerstone of effective data collection, Imperium’s tools give researchers greater control over the QA process without the need to commit additional time and resources.”

The Marketing Research and Insight Excellence Awards, powered by Quirk’s Media, recognize the researchers, suppliers and products and services that are adding value and impact to marketing research. Finalists are selected by a panel of judges made up of a combination of end-client researchers, supplier partners and Quirk’s editorial staff.

Award winners will be announced at The Marketing Research and Insight Excellence Awards Virtual Ceremony on November 9, 2021.

About Imperium

Founded in 1990, Imperium provides a comprehensive suite of technology services and customized solutions to verify personal information and restrict fraudulent online activities. The world’s most respected market research and e-commerce businesses rely on Imperium’s superior technology and solutions to validate their customers’ identities, verify data accuracy, automate review processes and uncover the intelligence that improves profitability. The company’s flagship product RelevantID® is widely recognized as the market research sector’s de facto data-quality and anti-fraud tool. In recent years, Imperium has invested heavily in machine learning, NLP and neural networks, capitalizing on its domain knowledge to expertly map fraudsters’ behavior. Last year, Imperium prevented 1 billion instances of fraud at source. https://www.imperium.com/

About Quirk’s

Quirk’s Marketing Research Media provides sector-focused resources devoted to professionals responsible for conducting, coordinating and purchasing marketing research products and services. www.quirks.com.

Press contact
connect@theflexc.com

Top 5 Tips for Ensuring Research Data Quality

By Tim McCarthy, Imperium General Manager

High-quality data – data that’s accurate, valid, reliable and relevant – is the ultimate goal for market researchers and brands alike.

While well-constructed engagements return critical insights on timely topics, their poorly planned/designed counterparts can accrue flawed results: invalid, unreliable or irreproducible data that can lead to the wrong conclusions and inform bad decisions further down the line.

Taking the time to explore and improve data quality delivers tangible benefits for everyone, increasing reliability, reducing the costs and aggravation associated with refielding and saving on time and resources all round. Luckily, there are multiple ways of fine-tuning survey processes to minimize problems and optimize data quality.

Quality data begins with selecting the right participants – respondents who are well suited to the aims and objectives of the study. But researchers also need to be cognizant of dozens of individual survey elements that have the potential to impact data quality – and understand how to manage and mitigate these threats while maintaining process integrity.

Technology can help. Greater automation helps select the best participants at the outset. By reducing subjectivity and lowering bias, automation also achieves a more balanced and consistent view of what constitutes “good” and “bad” respondents. A smoothly automated system reduces friction, boosting project speed while scaling back the cost and duration of manual checks.

Here are my top 5 tips for ensuring data quality in research:

1. Plan properly

Great results start with careful planning. Everyone involved should thoroughly understand the aims and objectives of the research before it leaves the starting blocks. Early actions include engaging in a process to identify the ideal target audience for a study, followed by building out an accurate sample plan based on both audience make-up and research objectives. Once this stage is agreed, it’s time to create a thorough screener that only allows appropriate respondents to proceed to the main survey.

2. Set candidates up for success

A successful survey isn’t one that’s predicated on tripping up respondents. It’s in everyone’s interests to build an engaging survey that is relevant to the audience and – crucially – not over-long. Surveys should be mobile friendly and shouldn’t include numerous trap/trick questions that may confuse even the most genuine of participants. Trick questions can backfire if respondents get frustrated and abandon the survey or intentionally answer incorrectly just to “see what happens”. Likewise, if you think the use of “insider jargon” could be problematic, you may want to consider conducting research to identify how your audience speaks about a particular topic, so you can communicate with them in a way that makes sense and is more likely to return relevant responses.

3. Include an appropriate mouse trap

You will need to incorporate a range of question types designed to identify poor respondents, but make sure they’re targeted and fit for purpose. Employ a variety of Open-End Questions, Grid Questions, Low Incidence Questions, Red Herring Questions and Conflicting Answers to weed out the weakest candidates. But don’t overdo it by adding multiple trap questions that are unrelated to the survey; instead use actual survey questions with anomalistic/inappropriate behaviors flagged. Also, be sure to not throw out respondents at the first sign of concern, but rather look for secondary data to confirm your suspicions of poor quality. Flushing out acceptable respondents because they’ve triggered one flag depletes the potential respondent pool and risks biasing data at a time when we need more diverse voices.

4. Ditch the fraudsters and dupes!

Using the right tech at the right time will save you time and money: by utilizing survey data quality solutions like RelevantID® at the outset, you’ll be able to build up a detailed picture of respondents’ fraud potential. RelevantID maps a participant’s ID against dozens of data points (including geo-location, language and IP address) to weed out obvious fraud and dupes before they enter the survey. This will not only mean that fewer respondents will need to be manually reviewed/removed after survey data is collected but will also prevent large-scale attacks from bots and click farms.

5. Develop consistent, efficient and accurate data quality checks

Ensure that you design an effective and efficient plan for removing bad data that is consistent for all respondents, and run it frequently so that you don’t face large numbers of removals upon quota completion. Automating in-survey data reviews is one of the best ways to safely streamline the quality process. By utilizing solutions like QualityScore™, you can ensure your data checks are consistent and run in real time, while reducing the time and resources needed to conduct effective manual reviews. Our data shows that by using QualityScore, clients save about 85 percent of the time they would otherwise spend checking survey results to identify bad respondents.

Automation Paving the Way to Standardize ‘Good Quality’ Data in Surveys

By Tim McCarthy, Imperium General Manager

As brands look to better understand the rapidly evolving requirements and motivations of their customers, the need for quality data has never been more urgent.

But, with demand for respondents at an all-time high, one of the central challenges facing the industry is that there’s no baseline of data quality on which all parties can agree – and with so much left to subjective reviews, it’s easy to see how disagreements on data quality continue to proliferate.

In principle, everyone involved – sample providers, market research agencies and brands alike – wants the same thing: good respondents. In practice, this means removing those who demonstrate some level of bad behavior during a survey without eliminating otherwise strong candidates who’ve provided one or two sub-optimal responses. Spotting bad respondents manually is trickier than it sounds and extremely time consuming. Moreover, without an agreed benchmark for quality, it’s not surprising that standards vary.

Respondents are often removed at the first sign of a poor response because it is very labor intensive to review the data more holistically and determine whether a poor response was an isolated incident (potentially due to survey setup/design) or part of a broader pattern of unfavorable responses supported by additional data.

At Imperium, we believe that the answer lies in moving to a more automated respondent-scoring process. Reducing subjectivity and tailoring quality checks to the specifics of each survey is key to reaching a more balanced agreement on what constitutes “good” and “bad” respondents. A smoothly automated system also increases project speed while greatly decreasing the cost and duration of manual checks.

We’ve been reviewing data from our new QualityScore™ solution, and it has revealed some useful insights. We analyzed approximately 200,000 respondents across more than 125 projects and found tangible links between the various types of behavior that produce an overall bad respondent score.

For example, our analysis revealed that those who scored in the bottom quadrant for Open-Ends had a 40 percent likelihood of being in the bottom quadrant for quality overall, while speeding (16 percent correlation) and straight-lining (10 percent correlation) were less reliable indicators of generally poor-quality respondents.

When it comes to banding respondents based on quality, QualityScore metrics have led us to some interesting observations. Our analysis shows that, on average, 8 percent of respondents fall into our poor-quality range, while 65 percent rate highly. This leaves a middle group of roughly 27 percent whose results are less clear cut.

It’s an important group, comprising respondents who may have triggered one or two flags but have nevertheless scored within an acceptable range. Ditching all of these respondents at the first sign of concern will not only waste a significant percentage of the potential respondent pool, which could lead to difficulty in reaching quotas, but also risks biasing data at a time when listening to more diverse voices is critical.

Importantly, QualityScore uses machine learning to compare each respondent’s data against peers from that specific question/survey. For example, if any part of the survey is set up in a way that lends itself to straight-lining or poor Open-End responses, respondents will only be flagged for poor quality if there is other supporting evidence.
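The principle of judging each respondent against peers on the same question can be sketched in a few lines of Python. This is plain descriptive statistics rather than machine learning, and it is not the QualityScore method; the function name, the thresholds (`speed_ratio`, `peer_share`) and the sample values are all hypothetical.

```python
from statistics import median

def peer_relative_flags(times, straightlined, speed_ratio=0.4, peer_share=0.5):
    """Flag behavior only when it stands out against peers on the same survey.

    times: respondent ID -> completion time in seconds
    straightlined: respondent ID -> True if the grid answers were all identical
    """
    typical_time = median(times.values())
    share_straightlining = sum(straightlined.values()) / len(straightlined)
    flags = {}
    for rid in times:
        found = []
        if times[rid] < speed_ratio * typical_time:
            found.append("speeding")
        # If most peers straight-lined this grid, treat it as a survey design
        # issue rather than evidence against the individual respondent.
        if straightlined[rid] and share_straightlining < peer_share:
            found.append("straightlining")
        flags[rid] = found
    return flags

# Invented sample data.
times = {"a": 300, "b": 280, "c": 90, "d": 310}
straightlined = {"a": False, "b": False, "c": True, "d": False}
flags = peer_relative_flags(times, straightlined)
```

With these inputs only respondent `c` is flagged. If every respondent had straight-lined the same grid, the straight-lining flag would be suppressed for all of them, while the speeding flag for `c` would remain.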

Our data shows that clients using the fully automated QualityScore solution save about 85 percent of the time they would otherwise spend checking survey results to identify bad respondents. This not only provides time and cost savings for our clients, but, by reducing the potential for conflict between sample providers and market researchers, we hope it will provide a sound basis for driving data quality higher for the industry as a whole.

New Imperium Solution Automates In-Survey Data Cleaning Process

QualityScore™ tool automatically assesses survey respondent quality to significantly improve data set accuracy and fielding efficiency

(NEW YORK – Feb 25, 2021) – Data quality solutions specialist Imperium (www.imperium.com) today announced the release of QualityScore, a fully automated, platform-agnostic tool that improves survey data quality, while reducing reliance on costly manual checks.

“We know this tool will save clients time and money, by providing valuable insights into how respondents are performing,” said Tim McCarthy, General Manager, Imperium. “We calculate that QualityScore will improve data accuracy by ~10%, resulting in savings of thousands of dollars per project. It will enable researchers and project managers to allocate more time and energy to their customers and to productive data analysis rather than spending hours reviewing data simply to check the quality.”

The new tool will not only ensure cleaner data but will also greatly reduce the frustration caused by having to return to field after closing quotas or ending a project, only to find out you need to remove or replace poor respondents.

“By implementing QualityScore, companies can have greater confidence in their completes,” explained McCarthy. “There’ll no longer be any need to either revisit surveys or to include sub-par respondents just so they can close fielding on time.”

QualityScore uses a sophisticated scoring algorithm to return a per-respondent quality rating that incorporates behavioral information, such as speeding, straightlining and poor open-end (OE) responses, as well as passive data points, such as mouse movement and browser activity, to paint a complete picture of the respondent’s attentiveness and data reliability. Because it’s completely automated – and customizable – it allows MR companies to focus on their business priorities without the distraction of bad data/respondents.
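As a rough illustration of what a per-respondent rating built from several signals could look like, the Python sketch below subtracts a penalty for each triggered flag and then bands the result. The actual QualityScore algorithm is not described here; the flag names, the weights and the band cut-offs are invented for this example.

```python
# Hypothetical penalty weights on a 0-100 scale. Poor open-ends are weighted
# most heavily here purely for illustration.
WEIGHTS = {
    "poor_open_end": 40,
    "speeding": 16,
    "straightlining": 10,
    "no_mouse_movement": 10,
}

def quality_score(flags) -> int:
    """Start at 100 and subtract a penalty for every flag that was triggered."""
    penalty = sum(WEIGHTS.get(flag, 0) for flag in flags)
    return max(0, 100 - penalty)

def band(score: int, good_at: int = 80, poor_below: int = 50) -> str:
    """Sort a score into good, poor, or a middle band that needs a closer look."""
    if score >= good_at:
        return "good"
    if score < poor_below:
        return "poor"
    return "review"

one_flag = quality_score({"speeding"})                      # 84
two_flags = quality_score({"poor_open_end", "speeding"})    # 44
open_end_only = quality_score({"poor_open_end"})            # 60
```

Under these assumed weights a single speeding flag still lands in the good band, which reflects the point that one sub-optimal signal alone need not disqualify a respondent.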

A recent review of online panels by industry specialists Grey Matter Research and Harmon Research reported that at least 90 percent of researchers are not taking adequate steps to ensure online panel quality – a statistic that has serious implications for the quality of MR survey data.

Researchers have long been aware of the threat posed by disengaged or fraudulent responses to the integrity of online panel surveys. Yet, tens of thousands of market researchers rely on the efficacy of those surveys. While meticulous study design is the cornerstone of effective data collection, tools like QualityScore give researchers much greater control over the QA process without committing additional time and resources.

Targeting Better Quality Data

A recent study by Grey Matter Research and Harmon Research revealed that online panels are incredibly susceptible to respondent quality issues – a finding that threatens to undermine the trust in MR data.

The study’s researchers fielded an online questionnaire with a handful of the largest panel providers that included a range of tests and quality control measures designed to assess the caliber of respondents. Just under half (46 percent) of respondents failed to meet the researchers’ standards for inclusion due to multiple errors or outright fraud; overall, the report concluded that around 90 percent of researchers weren’t taking “sufficient steps to ensure online panel quality”. It’s a statistic which, if even remotely accurate, is deeply worrying.

The battle for better data

Among the quality issues identified were nonsensical responses to OE questions, failure to identify and remove straightliners, as well as a lack of fake or ‘red herring’ questions designed to weed out inauthentic responses. Crucially, the report showed stark differences between respondents whose identities the researchers had verified and those tagged as bogus: the bogus respondents significantly skewed results, rendering data ineffective, at best.

Research companies and panels face a constant battle to guard against the infiltration of fraudulent and disengaged respondents – including bots and click farms – into online surveys but it’s becoming more difficult to detect them as their methodologies become increasingly sophisticated.

Obviously, panel companies are under significant pressure to provide fast, affordable data, a demand that doesn’t always go hand in hand with the need to provide quality. It’s not a problem that can be easily solved after the data is collected – without a lot of manual checking – so, it’s best tackled before results are aggregated.

Some QA can be addressed through study design. Every questionnaire should include measures to determine respondents’ validity. Data reviews should be scheduled during and after fielding. Obviously, different types and lengths of study require different solutions – speeding issues are more obvious on longer questionnaires, for example, while straightlining won’t be a problem where there are no grids. Red herring questions can be readily identified by bogus respondents, so aren’t always effective.
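Two of the in-survey measures mentioned above are simple enough to sketch in Python. The helper names, the minimum grid size and the fictitious brand are assumptions for illustration only, not part of any Imperium product.

```python
def is_straightlining(grid_answers, min_items: int = 5) -> bool:
    """True when every answer in a sufficiently long grid is identical."""
    return len(grid_answers) >= min_items and len(set(grid_answers)) == 1

def failed_red_herring(selected, fake_options) -> bool:
    """True when the respondent picked at least one option that does not exist."""
    return bool(set(selected) & set(fake_options))

# Invented examples: a six-item grid and a brand-awareness question
# that includes one made-up brand ("Zentrovia").
flat_grid = is_straightlining([3, 3, 3, 3, 3, 3])
varied_grid = is_straightlining([3, 4, 2, 3, 5, 1])
short_grid = is_straightlining([2, 2, 2])
picked_fake = failed_red_herring({"Acme", "Zentrovia"}, {"Zentrovia"})
```

The `min_items` guard means short grids are never flagged, since identical answers on two or three items can easily be genuine.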

Cleaning the respondent pool

The majority of bad or fraudulent respondents can be taken out of the pool before being given the opportunity to start a survey. Imperium’s data integrity solutions are fully automated and are designed to validate only those respondents who pass our stringent checks.

RegGuard® is a broad-spectrum, automated data-validation solution. It combines our flagship ID-validation API RelevantID® with Fraudience®, Real Answer® and Real Mail™ tools during registration to perform a customizable 360-degree check on each registrant – importantly, before they’re added to your panel.

It weeds out fraudsters and dupes, at the same time verifying their IP reputation. Items are scored and flagged, making it easy to identify registrants for processing or removal. When used in concert with the self-reported data-authentication tool Verity® and the CASS™-certified postal record-checking tool Address Correction™, it provides a robust first line of defense for MR companies and panels.

Looking forward

We’re adding a new tool to our collection in the New Year that will give MR companies and panels even more control over data quality using an entirely automated process that will totally transform survey results, saving time, money and resources. Stay tuned!

Prioritizing Quality: Tackling the Covid-19 Spike in Survey Fraud Head-on

By Tim McCarthy, Imperium General Manager

The Covid-19 pandemic has already taken a devastating toll on the health of tens of millions of people across the globe. But it’s also opened up fresh opportunities for fraud from unscrupulous operators. News agency Reuters estimates that losses from coronavirus-related fraud and identity theft in the U.S. alone have reached nearly $100 million since March this year, with complaints about scams at least doubling in most states.

As you might expect, much of this criminal activity is targeted at ordinary American citizens – the kind of shopping scams and phony ‘cures’ that make capital out of misery, while creating massive anxiety for people already suffering from the physical, mental and economic distress caused by the pandemic.

But fraudsters are also causing consternation for businesses that rely on collecting accurate data – like the survey companies and panels we work with every day at Imperium. In a sector where trust is at a premium, any increase in fraudulent activity can undermine core intelligence gathering – and carry the potential for ruining hard-won reputations.

We know from our own experience that survey fraud has been on a steady incline since COVID-19’s first wave. In spring this year we identified a 25 percent increase in fraudulent survey respondents; more recently we have seen this spiking to around double the expected numbers. And, while we’re witnessing an inevitable rise in fraudsters trying to enter surveys, even verified respondents are providing less actionable insights in their open-ends (OEs) due to an increase in the poor OE rate of about 40-50 percent.

Dupe rates are particularly interesting: although they dipped at the outbreak of COVID-19, the numbers have not only bounced back but are now rocketing. Although it appears anomalous, there’s a logical reason for this pattern. As fears grew over the spread of the coronavirus earlier in the year, market research activities slowed correspondingly, reducing MR companies’ reliance on multi-sourcing for their projects and resulting in a greater-than-sufficient supply of respondents, with commensurately lower levels of overlap.

Over the past four or five months, however, with the market rapidly rebounding, the survey duplicate trend has swung in the other direction: the number of projects per month is on a steep incline, while panelist supply has leveled off sharply. All of which means that MR agencies are now having to rely on multi-sourcing methods to meet their quotas. This supply-demand imbalance naturally fuels higher dupe rates, which are likely to persist until the shortfall is rectified.

Event-driven impacts – like those we’re seeing as a result of the current crisis – will often lead to short- and medium-term problems, but there’s no one-size-fits-all solution to improving data accuracy in the long term. Some believe that adopting a more human-centric approach to data collection could help address some of the problems created by the wholesale movement of market research methodologies to online platforms. While there’s little likelihood of a return to the resource-intensive days of in-person interviews, online panels can contribute valuable, high-quality feedback, as long as we take the steps needed to maintain a powerful connection between brands and their consumers.

Survey companies have an important part to play in this dynamic by creating robust systems capable of withstanding the closest scrutiny. We know that surveys are all-too-easily inundated with bots, survey farms, and fraudsters, and that quotas can be easily met with misinformation. Offering brands access to millions of consumers only holds currency if you have the response rates to back your claims up.

We recommend taking an agile approach to research. Adopting more flexible methodologies enables the creation of a highly iterative model that allows you to engage more consistently and dig deeper for more granular information when necessary. This only works if you prioritize the respondent experience – providing a better UX will always result in better outputs.

Whatever approach you favor, implementing robust multi-level security measures is essential. Current conditions are creating a perfect storm for the depletion of trust in MR data: (1) the proliferation of fraudsters attempting to enter surveys, just as (2) MR companies are being forced to multi-source their projects, at a time when (3) real respondents are returning less actionable OE information. It’s more important than ever for research companies to incorporate the necessary tools into their surveys to ensure these conditions are not allowed to negatively impact the overall quality of the insights they are providing.

Tim McCarthy is General Manager at Imperium. He has over 15 years of experience managing market research and data collection services and is an expert in survey programming software, data analysis and data quality. Imperium is the foremost provider of technology services and customized solutions to panel and survey organizations, verifying personal information and restricting fraudulent online activities.