Same customer. Two calls. Two different answers. Why consistency in the contact centre is one of insurance's most underrated CX challenges, and what it takes to fix it.
Why Customer Experience in Insurance Contact Centres Is Harder Than It Looks
Insurance contact centres operate under a unique set of pressures that most other industries don't face. Agents aren't just handling queries; they're navigating policy wording, regulatory disclosure requirements, vulnerable customer protocols, live claims, and emotionally charged conversations, often back-to-back across an entire shift.
The result is that even well-run operations with strong training programmes, clear call guides, and experienced QA teams will produce inconsistent outcomes. Not through negligence, but because the volume of interactions, combined with the complexity of each one, makes consistency genuinely difficult to achieve and even harder to measure.
For customers, that inconsistency is often the most damaging part of the experience. Research consistently shows that a single negative interaction with an insurer (a conflicting answer on a policy, an agent who failed to acknowledge distress during a claim, a renewal call where the customer felt rushed) has a disproportionate effect on retention compared with an equivalent positive experience. In an industry where the average customer deals with the contact centre relatively rarely, those moments carry enormous weight.
The QA Sampling Problem: Why 2–5% Isn't Enough
Traditional contact centre quality assurance is built on sampling. A team of QA assessors, sometimes sitting within the contact centre, sometimes in a central quality function, listens to a selection of recorded calls, scores them against a framework, and uses those scores to drive coaching, performance management, and reporting upward into the business.
In most insurance contact centres, that sample sits somewhere between 2% and 5% of total call volume. Some operations go higher, but resource constraints typically cap meaningful manual review well below 10%.
The statistical problem is straightforward. A contact centre handling 8,000 calls a week that reviews 400 of them is forming its entire view of quality on the basis of 5% of interactions: what agents are doing well, what risks are present, and whether TCF (Treating Customers Fairly) obligations are being met. The other 95% are logged, stored, and largely invisible.
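To make the sampling point concrete, the sketch below computes an approximate 95% Wilson score interval for a QA pass rate. The 8,000 weekly calls and 400 reviews come from the example above; the 90% pass rate and the team of 160 agents (roughly 2 to 3 reviewed calls per agent per week) are hypothetical assumptions chosen only for illustration. It ignores the finite-population correction, which would narrow the team-level interval slightly.

```python
import math


def wilson_interval(passes: int, reviewed: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a pass rate seen in `reviewed` calls."""
    if reviewed <= 0:
        return (0.0, 1.0)  # no evidence at all
    p = passes / reviewed
    denom = 1 + z**2 / reviewed
    centre = (p + z**2 / (2 * reviewed)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / reviewed + z**2 / (4 * reviewed**2))
    return (max(0.0, centre - half_width), min(1.0, centre + half_width))


if __name__ == "__main__":
    # From the text: 8,000 calls a week, 400 reviewed.
    # Hypothetical: 360 of the 400 pass, and the sample is spread over 160 agents
    # (about 2.5 reviewed calls each).
    print("Team level, 360/400 pass:", wilson_interval(360, 400))
    print("One agent, 2/2 pass:", wilson_interval(2, 2))
```

With these assumed figures, the team-level pass rate is pinned down to roughly 87% to 93%, but an agent who passed both of their reviewed calls could plausibly have a true pass rate anywhere from about 34% to 100%. The Wilson interval is used instead of the simpler normal approximation because it stays sensible for very small samples and for rates near 0% or 100%.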
This creates several specific vulnerabilities:
Selection bias and sampling error in QA scoring.
When assessors cherry-pick calls, the sample is biased; when selection is random but the sample is small, the results are still not reliably representative. An agent can perform poorly on the majority of their calls and still appear compliant based on the handful reviewed.
Compliance gaps that compound over time.
A systematic issue (agents consistently omitting a required disclosure, for example, or handling a specific objection in a way that doesn't align with policy) can exist across thousands of calls before appearing in the QA data. By the time it's visible, it may already represent a material regulatory risk.
AHT pressure masking quality problems.
In high-volume contact centres, Average Handle Time (AHT) targets can inadvertently drive agents to shorten or skip parts of the call guide. This rarely shows in sampled QA unless the assessor is specifically looking for it. Across the full call population, the pattern is far more visible.
Inconsistent calibration between assessors.
Even with standardised scorecards and calibration sessions, different QA assessors will score the same call differently. Subjectivity is inherent in human review. Where scores drive agent performance ratings and remuneration, this inconsistency creates operational and people management challenges.
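The first of these vulnerabilities can be quantified with the hypergeometric distribution. The sketch below assumes calls are reviewed at random without replacement and gives the probability that none of an agent's poor calls lands in the review sample. The figures in the usage lines (50 calls a week, 30 of them below standard, 2 reviewed) are hypothetical.

```python
from math import comb


def chance_poor_calls_go_unseen(total_calls: int, poor_calls: int, reviewed: int) -> float:
    """Hypergeometric probability that a random review sample contains none of the poor calls."""
    if not (0 <= poor_calls <= total_calls and 0 <= reviewed <= total_calls):
        raise ValueError("need 0 <= poor_calls <= total_calls and 0 <= reviewed <= total_calls")
    # Ways to draw the whole sample from acceptable calls, over all ways to draw it.
    # math.comb returns 0 when asked to choose more items than exist.
    return comb(total_calls - poor_calls, reviewed) / comb(total_calls, reviewed)


if __name__ == "__main__":
    # Hypothetical agent: 50 calls this week, 30 of them below standard (60%).
    for reviewed in (2, 3, 50):
        print(reviewed, "reviewed:", round(chance_poor_calls_go_unseen(50, 30, reviewed), 3))
```

Under those assumptions, an agent whose calls are mostly below standard still has about a 16% chance (190 / 1,225) of looking clean when two calls are reviewed, about 6% with three, and zero once every call is analysed.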
Key Metrics That Matter and What They're Actually Measuring
Insurance contact centre leaders track a range of metrics, but it's worth being clear about what each one does and doesn't tell you about customer experience.
NPS (Net Promoter Score) captures likelihood to recommend, typically gathered through post-call surveys. It's useful as a directional signal, but it suffers from response bias (customers who had strongly positive or negative experiences are the most likely to respond), and it's disconnected from the specific behaviours that drove the score.
CSAT (Customer Satisfaction Score) is similar: a useful indicator of sentiment but a lagging metric that tells you little about what happened in the conversation or why the customer felt as they did.
FCR (First Contact Resolution) is one of the strongest predictors of both customer satisfaction and operational cost. Every callback represents a failure: either the agent didn't fully resolve the query, gave incorrect information that prompted a follow-up, or failed to proactively address a related need. In insurance, where many queries involve complex policy conditions, FCR rates below 75–80% are common and represent significant hidden cost.
AHT (Average Handle Time) is frequently misused as a proxy for efficiency. Shorter calls are not inherently better. A call that resolves a query correctly in seven minutes is better than a call that closes in four minutes but generates a callback and a complaint. AHT should always be contextualised against FCR and CSAT before any performance conclusions are drawn.
QA Score is the metric most directly under the contact centre's control, yet, as outlined above, its reliability is limited when it is based on small samples and subjective human assessment.
Churn Propensity Indicators are increasingly important in insurance, where the cost of acquiring a new customer significantly exceeds the cost of retaining an existing one. Conversations where customers express pricing objections, comparison shopping intent, or general dissatisfaction are early signals of attrition risk, but they're only actionable if you can identify them at scale.
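FCR and AHT only make sense side by side. As a minimal sketch, the code below approximates FCR as "no repeat contact from the same customer within seven days" (the window used in the checklist later in this piece) and reports it next to AHT per agent. The `Call` record and its field names are hypothetical, and real definitions of a repeat contact (same reason? same policy?) vary between operations; this proxy counts any follow-up call and attributes it to the agent who took the original one.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Call:
    # Hypothetical call record; field names are illustrative.
    call_id: str
    customer_id: str
    agent_id: str
    started: datetime
    handle_seconds: int  # talk + hold + wrap, however the operation defines AHT


def resolved_first_time(calls: list[Call], window: timedelta = timedelta(days=7)) -> dict[str, bool]:
    """Mark a call as resolved if the same customer does not call again within `window`.

    Caveat: calls in the final `window` of the data have not had time to attract a
    callback, so they are counted as resolved.
    """
    by_customer: dict[str, list[Call]] = defaultdict(list)
    for call in calls:
        by_customer[call.customer_id].append(call)
    resolved: dict[str, bool] = {}
    for history in by_customer.values():
        history.sort(key=lambda c: c.started)
        for current, following in zip(history, history[1:] + [None]):
            resolved[current.call_id] = following is None or following.started - current.started > window
    return resolved


def agent_summary(calls: list[Call], window: timedelta = timedelta(days=7)) -> dict[str, dict[str, float]]:
    """Per-agent call count, AHT in seconds, and FCR on the repeat-contact proxy."""
    resolved = resolved_first_time(calls, window)
    by_agent: dict[str, list[Call]] = defaultdict(list)
    for call in calls:
        by_agent[call.agent_id].append(call)
    return {
        agent: {
            "calls": len(agent_calls),
            "aht_seconds": sum(c.handle_seconds for c in agent_calls) / len(agent_calls),
            "fcr": sum(resolved[c.call_id] for c in agent_calls) / len(agent_calls),
        }
        for agent, agent_calls in by_agent.items()
    }


if __name__ == "__main__":
    day0 = datetime(2024, 1, 1, 9, 0)
    sample = [
        Call("c1", "X", "A", day0, 240),
        Call("c2", "X", "B", day0 + timedelta(days=2), 420),
        Call("c3", "Y", "A", day0, 240),
        Call("c4", "Y", "B", day0 + timedelta(days=3), 420),
    ]
    for agent, stats in agent_summary(sample).items():
        print(agent, stats)
```

In the sample data, agent A averages four-minute calls but every one generates a callback, while agent B takes seven minutes and resolves both: the pattern that AHT on its own would misread.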
Consumer Duty and the Obligation to Evidence Good Outcomes
The FCA's Consumer Duty, which came into force for new and existing products open to sale in July 2023 and for closed-book products in July 2024, has materially raised the bar for what insurers are expected to evidence about their customer interactions.
The Duty requires firms to demonstrate, not simply assert, that they are consistently delivering good outcomes across the four outcome areas: products and services, price and value, consumer understanding, and consumer support. The contact centre sits squarely within consumer support, and the FCA has been explicit that firms should be able to show how they are monitoring and improving customer interactions.
A sample-based QA process that reviews 3% of calls is difficult to hold up as evidence of consistent good outcomes. It demonstrates that some interactions were reviewed and found acceptable. It says very little about the other 97%.
For compliance and risk teams, this creates a meaningful question about how the organisation can demonstrate that required disclosures are being made reliably, that vulnerable customers are being identified and handled appropriately, and that the information customers receive is accurate and consistent. These are not questions that can be answered confidently from a small sample.
Vulnerable customer handling is particularly significant in this context. The FCA's guidance on the fair treatment of vulnerable customers (FG21/1) sets out the expectation that firms have processes in place to identify vulnerability signals and adapt their approach accordingly.
In practice, identifying those signals reliably requires either very strong agent training and consistent application, or tooling that can flag indicators across entire call populations that human reviewers would miss at volume, such as tone, specific language patterns, hesitation, and emotional distress.
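As an illustration of the tooling end of that spectrum, here is a deliberately naive keyword-based flagger. The phrase lists are hypothetical examples grouped under the four drivers of vulnerability that FG21/1 describes (health, life events, resilience, and capability). Production tools typically combine language models with acoustic cues such as tone and hesitation, which simple pattern matching cannot capture.

```python
import re

# Hypothetical indicator phrases, grouped under FG21/1's four drivers of vulnerability.
# Illustrative only: real vulnerability detection needs richer signals and human judgement.
INDICATORS = {
    "health": [r"\bhospital\b", r"\bdiagnos(?:ed|is)\b", r"\bmental health\b"],
    "life_events": [r"\bbereave(?:d|ment)\b", r"\bpassed away\b", r"\b(?:lost|losing) my job\b"],
    "resilience": [r"\bcan'?t afford\b", r"\bin debt\b", r"\bbehind on\b"],
    "capability": [r"\bdon'?t understand\b", r"\bconfus(?:ed|ing)\b", r"\bread (?:it )?to me\b"],
}


def flag_vulnerability(transcript: str) -> dict[str, list[str]]:
    """Return the matched phrases per driver; an empty dict means nothing was flagged."""
    text = transcript.lower()
    hits: dict[str, list[str]] = {}
    for driver, patterns in INDICATORS.items():
        found = [m.group(0) for pattern in patterns for m in re.finditer(pattern, text)]
        if found:
            hits[driver] = found
    return hits


def calls_to_review(transcripts: dict[str, str]) -> list[str]:
    """Apply the flagger to every call and list the IDs that need a human look."""
    return [call_id for call_id, text in transcripts.items() if flag_vulnerability(text)]


if __name__ == "__main__":
    sample = {
        "call-001": "My husband passed away last month and I can't afford the excess",
        "call-002": "I'd like to update my address",
    }
    print(calls_to_review(sample))
```

A flag here is a prompt for human review, not a determination that the customer is vulnerable; the value of running it over every call is that nothing depends on the call happening to be sampled.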
What Speech Analytics and AI-Driven QA Actually Do
The technology landscape for contact centre quality assurance has changed significantly. What was once the preserve of large composite insurers with significant technology budgets is now accessible to mid-market and specialist insurers.
At the core is speech analytics: the automated transcription and analysis of call recordings. Modern speech-to-text accuracy is generally high enough for transcripts to be analysed reliably for content, including specific phrases, disclosure language, agent script adherence, and sentiment indicators.
Layered on top of transcription, AI-driven QA platforms can score calls automatically against a defined framework (the same framework a human assessor would use) across 100% of recorded interactions. This is sometimes referred to as automated quality monitoring (AQM) or AI-assisted QA, and it represents a fundamental shift from sampling to full-population analysis.
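A rule-based stand-in for automated scoring is sketched below. Each criterion on a hypothetical scorecard is a regular expression plus a weight, every call is scored rather than a sample, and the population-level output answers questions such as what share of calls met every criterion. This is a simplification under stated assumptions: AI-driven platforms generally use trained models rather than fixed patterns, and the criteria here are illustrative, not a regulatory checklist.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    pattern: str  # regular expression searched for in the agent's side of the transcript
    weight: float


# Hypothetical scorecard for illustration only; a real one would mirror the
# framework human assessors already use.
SCORECARD = (
    Criterion("identity_check", r"\b(date of birth|postcode)\b", 1.0),
    Criterion("cancellation_rights", r"\b(cooling[- ]off|right to cancel)\b", 2.0),
    Criterion("complaints_process", r"\bcomplain", 1.0),
)


def score_call(agent_text: str, scorecard: tuple[Criterion, ...] = SCORECARD) -> tuple[float, list[str]]:
    """Return (weighted score between 0 and 1, names of criteria not evidenced)."""
    text = agent_text.lower()
    missed = [c for c in scorecard if not re.search(c.pattern, text)]
    total = sum(c.weight for c in scorecard)
    earned = total - sum(c.weight for c in missed)
    return earned / total, [c.name for c in missed]


def score_population(transcripts: dict[str, str]) -> dict:
    """Score every call and report the share meeting every criterion."""
    per_call = {call_id: score_call(text) for call_id, text in transcripts.items()}
    fully_met = sum(1 for _, missed in per_call.values() if not missed)
    share = fully_met / len(per_call) if per_call else 0.0
    return {"per_call": per_call, "share_fully_compliant": share}


if __name__ == "__main__":
    result = score_population({
        "c1": "Can I take your date of birth? You have a 14-day cooling-off period. "
              "If you want to complain, we can explain how.",
        "c2": "Can I confirm your postcode?",
    })
    print(result)
```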
Practical applications in insurance include:
Disclosure verification.
Automatically confirming that required regulatory language (key facts statements, right of cancellation, how complaints are handled, fair value assessments) was present on every relevant call, rather than relying on agent compliance and sampled review.
Script adherence monitoring.
Identifying where agents deviate from call guides, whether on objection handling, upsell conversations, or FNOL (First Notification of Loss) processes. Deviations aren't always bad (experienced agents often improve on scripts), but knowing where they occur enables more targeted management.
Sentiment analysis.
Tracking emotional tone across the call, including escalation points where customer sentiment shifts negatively. Calls with a significant negative sentiment trajectory that don't result in a logged complaint are often the interactions most worth reviewing: these are the silent churners.
Dead air and talk-over detection.
Excessive periods of silence can indicate agent uncertainty, system issues, or customer confusion. High talk-over rates can signal tense or unresolved conversations. Manual listening would catch either on an individual call, but only automated analysis makes the patterns visible at scale.
Call categorisation and reason coding.
AI-driven categorisation of call intent provides far more accurate MI than agent-coded wrap data, which is subject to selection error and time pressure at the point of After Call Work (ACW). Typical categories include claims FNOL, renewal, mid-term adjustment (MTA), complaints, and general enquiry.
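Dead air and talk-over can both be derived from diarised transcripts, in which each utterance carries a speaker label and start and end times. The sketch below assumes a hypothetical `Segment` record of that shape and a 5-second dead-air threshold; both are illustrative choices rather than an industry standard.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical diarisation output: who spoke, and when (seconds from call start).
    speaker: str
    start: float
    end: float


def dead_air(segments: list[Segment], min_gap: float = 5.0) -> list[tuple[float, float]]:
    """Silences longer than `min_gap` seconds during which nobody is speaking."""
    gaps: list[tuple[float, float]] = []
    speech_end = None  # latest end time of any speech seen so far
    for seg in sorted(segments, key=lambda s: s.start):
        if speech_end is not None and seg.start - speech_end > min_gap:
            gaps.append((speech_end, seg.start))
        speech_end = seg.end if speech_end is None else max(speech_end, seg.end)
    return gaps


def talk_over_seconds(segments: list[Segment]) -> float:
    """Seconds in which different speakers overlap.

    Assumes each speaker's own segments do not overlap one another, so summing
    pairwise overlaps between speakers does not double count in a two-party call.
    """
    total = 0.0
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            if a.speaker != b.speaker:
                total += max(0.0, min(a.end, b.end) - max(a.start, b.start))
    return total


if __name__ == "__main__":
    call = [Segment("agent", 0, 10), Segment("customer", 9, 20), Segment("agent", 30, 35)]
    print("dead air:", dead_air(call))
    print("talk-over seconds:", talk_over_seconds(call))
```

Run across every call, these per-call figures can be aggregated by agent, call type, or time of day, which is where the patterns become visible.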
The Agent Experience Angle
It's worth addressing something that QA conversations often avoid: how agents experience quality processes matters.
Traditional QA, particularly where samples are small and feedback is infrequent, is often experienced by agents as arbitrary. An agent who handles 50 calls a day but receives feedback on only two calls, selected at random and weeks after the fact, has limited ability to connect that feedback to current behaviour. When scoring varies between assessors and calibration is inconsistent, agents reasonably question whether the process is fair.
Full-call analytics changes this dynamic. When feedback is drawn from a genuinely representative sample of an agent's interactions, it becomes harder to dispute and more clearly useful for development. Agents who are performing well, but whose good calls were never in the review sample, become visible. Coaching becomes more specific and credible.
There is also a wellbeing dimension worth noting. Agents in insurance contact centres frequently handle difficult conversations: distressed claimants, customers facing financial hardship, complaints that escalate mid-call. Identifying these interactions automatically, and making sure agents receive appropriate support, debrief, and recognition, is better for people as well as for quality outcomes.
Building a Contact Centre QA Framework That Scales
For insurers looking to improve the rigour and coverage of their quality assurance without simply hiring more assessors, the practical approach typically involves a combination of:
Defining a clear QA framework that maps to both internal standards and regulatory obligations. This means being explicit about what good looks like across accuracy, empathy, compliance, resolution, and vulnerability handling, rather than relying on a scorecard that has been in place for years without being reviewed against current Consumer Duty requirements.
Increasing calibration rigour among human assessors so that where manual review does occur, scores are consistent and defensible. Calibration sessions should be regular, use a standardised set of benchmark calls, and resolve scoring differences explicitly rather than averaging them away.
Identifying the high-value call types for human review (complaints, escalations, vulnerable customer interactions, high-value renewals, FNOL calls), while using automated tooling to monitor compliance and baseline quality across the broader population.
Closing the loop between QA data and coaching. QA findings that don't result in visible behaviour change are wasted effort. The most effective operations have clear processes for translating QA insights into targeted 1-2-1 coaching, team-level trend discussions, and upstream product or process improvements where call data reveals systemic issues.
Using MI from call data proactively. If your QA data is only feeding performance management conversations, you're leaving value on the table. Call analytics can inform product design, renewal strategy, claims process improvement, and complaints root cause analysis, all of which have direct commercial value beyond the contact centre itself.
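One way to make calibration rigour measurable is Cohen's kappa, which compares how often two assessors agree with how often they would agree by chance alone. The sketch below assumes each assessor gives a categorical outcome (here pass/fail) on the same set of benchmark calls; the ratings in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(assessor_a: list[str], assessor_b: list[str]) -> float:
    """Chance-corrected agreement between two assessors rating the same calls."""
    if len(assessor_a) != len(assessor_b) or not assessor_a:
        raise ValueError("need two non-empty rating lists of the same length")
    n = len(assessor_a)
    observed = sum(a == b for a, b in zip(assessor_a, assessor_b)) / n
    counts_a, counts_b = Counter(assessor_a), Counter(assessor_b)
    # Agreement expected if each assessor kept their own mix of ratings but assigned them at random.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / n**2
    if expected == 1.0:
        return 1.0  # both used one identical category throughout; kappa is undefined, treat as full agreement
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical pass/fail ratings of ten benchmark calls.
    a = ["pass"] * 7 + ["fail"] * 3
    b = ["pass"] * 5 + ["fail"] * 4 + ["pass"]
    print(round(cohens_kappa(a, b), 2))
```

In the example the assessors agree on 7 of 10 calls, but kappa is only about 0.35 once chance agreement is removed, a sign that raw agreement percentages can flatter calibration. Tracking kappa per benchmark set over time gives calibration sessions a concrete target.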
The Competitive Dimension
It's easy to frame contact centre quality purely as a cost and compliance issue. But there is a competitive angle that insurers in personal and commercial lines markets should take seriously.
Customer retention in insurance is heavily influenced by perceived value and by the experience at the point of need. A customer who calls to report a claim and has an outstanding experience (empathetic, efficient, with clearly communicated next steps) is statistically less likely to shop around at renewal, more likely to add further products, and more likely to recommend. The inverse is equally true.
In a market where PCW (price comparison website) aggregators have compressed differentiation on premium, the contact centre interaction is one of the few remaining places where an insurer can genuinely stand out. Customers can't see your reserving methodology or your reinsurance programme. They can feel whether the person they spoke to understood their situation, gave them accurate information, and left them confident about what happens next.
Building the capability to understand and consistently improve those interactions across every call, not just the ones that happened to be reviewed, is as much a competitive investment as a compliance one.
What Good Looks Like: A Checklist for Insurance Contact Centre Leaders
A contact centre quality framework that genuinely supports customer experience in insurance should be able to answer the following questions with confidence:
- What proportion of calls in the last 30 days contained all required regulatory disclosures?
- Which agents have the highest and lowest FCR rates, and what characterises the conversations driving those outcomes?
- Where in the call journey are customers most likely to express confusion, frustration, or intent to switch?
- How are vulnerability signals being identified and handled, and is that consistent across the team?
- What are the most common drivers of repeat contact within 7 days, and what's the cost of those callbacks?
- Is AHT correlated with quality outcomes, or are shorter calls masking unresolved queries?
- How does QA score vary by call type (renewals vs claims vs MTAs), and what does that tell us about where to focus coaching?
If those questions can be answered from your current QA and MI setup, your operation is ahead of most. If they can't, or if the answers rely on extrapolating from a small sample, it is worth exploring what a more complete picture would make possible.
Insights360 provides AI-driven conversation analytics for insurance contact centres, helping teams move beyond sample-based QA to full-call intelligence. To explore how the platform supports Consumer Duty compliance, agent performance, and customer experience improvement, visit www.conversant.technology/insights360/