Why Customer Satisfaction Surveys and Text Analytics Belong Together
Wednesday, May 25, 2011 | by Michaela Mora

Customer satisfaction surveys are pretty common these days. Every time I make a purchase on Amazon.com or Bestbuy.com, I get a customer satisfaction survey. Every time I buy something at Kohl’s or Walmart, they hand me a survey invitation printed on the receipt, promising a chance to win something if I fill out the survey. Nowadays many retailers have some form of enterprise feedback system aimed at assessing the customer experience during specific transactions. That’s terrific.
Common questions in satisfaction surveys include overall satisfaction, likelihood to recommend, and likelihood to buy again.
They are often analyzed together as a composite (the 3M model) because they seem to capture different dimensions of satisfaction. Asking only about overall satisfaction doesn’t provide a complete picture of how the customer feels and is likely to act. Customers may not be completely satisfied with a company, but they may keep buying its products out of habit (inertia) or because there is no other alternative. Often we do have other alternatives but keep patronizing a brand because, although we may not be happy with certain aspects of it, we still like others.
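The sketch below makes the idea of a composite concrete by averaging the three ratings into a single index. It assumes all three questions use the same 1-to-5 scale and carry equal weight. The function name `composite_3m` and the 0-100 rescaling are illustrative choices and not a standard definition of the 3M model.

```python
def composite_3m(satisfaction: int, recommend: int, repurchase: int,
                 scale_min: int = 1, scale_max: int = 5) -> float:
    """Hypothetical 3M composite: equal-weight mean of three ratings, rescaled to 0-100."""
    ratings = (satisfaction, recommend, repurchase)
    for rating in ratings:
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside {scale_min}-{scale_max}")
    mean = sum(ratings) / len(ratings)
    # Map scale_min to 0 and scale_max to 100 so the index reads like a percentage.
    return 100 * (mean - scale_min) / (scale_max - scale_min)


# Ratings like those in the laptop example: satisfaction 3, recommend 4, buy again 5.
print(composite_3m(3, 4, 5))  # 75.0
```

A composite of 75 looks healthy. It hides the fact that the satisfaction rating on its own sits at the midpoint of the scale.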
Likelihood to recommend, which is sometimes used alone to predict satisfaction or purchase behavior, doesn’t tell the whole story either. Recommending a product or a brand doesn’t always translate into purchase behavior. I may recommend a product to someone else because I find it a good fit for that person, but not necessarily for me. On the other hand, a consumer might recommend a product for a particular use even if she is not entirely satisfied with it.
Finally, likelihood to buy again is not a very reliable metric. Too many factors are at play when it is time to make a purchase. Our financial situation, the competition, or our own irrational consumer selves can override whatever intentions we claimed when answering this question in a survey.
“OVERALL” SATISFACTION METRICS DON’T TELL THE WHOLE TRUTH
For years I have been buying laptops from a particular brand. Overall, they have worked pretty well. On the latest model I bought, however, I really don’t like what they have done to the buttons on the touchpad. They feel too hard, and the finger-guided pointer has a life of its own. I hate this feature, so if you asked me to rate my satisfaction on a scale from 1 (dissatisfied) to 5 (satisfied), I would give my laptop a 3.
On the other hand, if you asked how likely I am to buy another laptop from this brand or to recommend it to others, I would say 4 (somewhat likely) or 5 (very likely). Why? I’m satisfied with other attributes that are more important to me. I was able to customize it to all my requirements at a reasonable price, and it does the job. I mostly use a mouse, so the hard click buttons only bother me when I travel with my laptop. However, since you asked about my “overall” satisfaction and nudged me to take everything about my laptop into account, I feel pushed to the middle of the scale as I try to balance out the good and the bad.
I feel the same way when I get an overall satisfaction question after a customer service call in which the experience during the call was good but my problem was not solved for whatever reason. The call gets a 5, but the brand may get a 2. If my problem doesn’t get solved, no amount of courtesy from a customer service rep will ease my frustration with the unresolved issue.
What I have seen over years of doing customer satisfaction research is that asking these questions without context produces answers that are too sterile and disjointed to be meaningful. This is why they often fail to predict how customers will behave.
We need to dig deeper and ask why a customer gives a particular answer. This is especially true with rating questions, because scale points mean different things to different people. See the example below about a product review. If we average the number of stars it gets, the overall rating is very high, as most reviewers give it 4 stars and one gives it 5 stars. However, if you actually read what the reviewers meant when rating the product, you see that 4 stars are associated with both positive and negative reviews.
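The arithmetic behind this trap is easy to reproduce. The sketch below uses a small, made-up set of reviews, not the ones in the example. Each review's text has been hand-coded as positive or negative. The average star rating looks strong, but the cross-tab shows that half of the 4-star ratings came with negative comments.

```python
from collections import Counter
from statistics import mean

# Made-up reviews: (stars, sentiment hand-coded after reading the review text).
reviews = [
    (4, "positive"),
    (4, "negative"),
    (4, "positive"),
    (4, "negative"),
    (5, "positive"),
]

average_stars = mean(stars for stars, _ in reviews)
print(f"Average rating: {average_stars:.1f} stars")  # Average rating: 4.2 stars

# Cross-tabulate star ratings against the sentiment expressed in the text.
breakdown = Counter(reviews)
for (stars, sentiment), count in sorted(breakdown.items()):
    print(f"{stars} stars, {sentiment}: {count}")
```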

EVALUATING SPECIFIC ATTRIBUTES
To get around this problem, customer satisfaction surveys usually include questions asking customers to evaluate the performance of the brand, product, or organization on different aspects (e.g., product performance, customer service, product availability, and selection).
Companies that go beyond reporting frequencies often run key driver analyses to determine what influences overall satisfaction or a composite of the three questions (the 3M model). Even so, they have a hard time linking customer satisfaction survey results to actual sales. Why?
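A key driver analysis is often run as a regression of the overall rating on the attribute ratings. The sketch below shows one common version of that idea: ordinary least squares with NumPy's `np.linalg.lstsq`. It is fit on simulated data whose "true" weights we chose ourselves. The function name `key_driver_weights` and the attribute names are illustrative, and other techniques exist, especially when attributes are correlated.

```python
import numpy as np


def key_driver_weights(attributes: np.ndarray, overall: np.ndarray) -> np.ndarray:
    """Estimate one weight per attribute by ordinary least squares."""
    # Prepend an intercept column so the fit is not forced through the origin.
    design = np.column_stack([np.ones(len(attributes)), attributes])
    coefs, *_ = np.linalg.lstsq(design, overall, rcond=None)
    return coefs[1:]  # drop the intercept, keep the attribute weights


# Simulated data: 200 respondents rate three attributes on a 1-5 scale.
rng = np.random.default_rng(seed=0)
attributes = rng.integers(1, 6, size=(200, 3)).astype(float)
true_weights = np.array([0.6, 0.3, 0.1])  # chosen by us, not estimated from real data
overall = 0.5 + attributes @ true_weights + rng.normal(0, 0.1, size=200)

names = ["product performance", "customer service", "selection"]
for name, weight in zip(names, key_driver_weights(attributes, overall)):
    print(f"{name}: {weight:.2f}")
```

The model is fit on pooled responses. Its weights therefore describe an average customer rather than any individual one.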
The answer is aggregation. When we aggregate individual answers to these questions, we lose information about what drives the behavior of individual customers, and the prediction error at the individual level increases. In the aggregate, the link between answers to customer satisfaction questions and individual purchase behavior is lost. We need a model that takes the individual context of each answer into account to weight that answer up or down.
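The sketch below illustrates what weighting answers by individual context could look like, using a deliberately simple and hypothetical scheme. Each respondent's attribute ratings are weighted by how much that attribute matters to that person. The respondents, attributes, and importance weights are made up and loosely echo the laptop example. This is not a specific published model.

```python
# Hypothetical respondents who give identical attribute ratings (1-5) but care
# about different things. Importance weights sum to 1 and might come from a
# stated-importance question or from coded open-ended comments.
respondents = {
    "frequent traveler": {
        "ratings": {"touchpad": 2, "price": 5, "customization": 5},
        "importance": {"touchpad": 0.6, "price": 0.2, "customization": 0.2},
    },
    "desk user": {
        "ratings": {"touchpad": 2, "price": 5, "customization": 5},
        "importance": {"touchpad": 0.1, "price": 0.4, "customization": 0.5},
    },
}


def weighted_score(ratings: dict, importance: dict) -> float:
    """Weight each attribute rating by how much it matters to this respondent."""
    return sum(ratings[attr] * importance[attr] for attr in ratings)


for name, r in respondents.items():
    plain = sum(r["ratings"].values()) / len(r["ratings"])
    weighted = weighted_score(r["ratings"], r["importance"])
    print(f"{name}: unweighted {plain:.1f}, context-weighted {weighted:.1f}")
```

Both respondents average 4.0 on the raw ratings. Once context is applied, the traveler drops to 3.2 and the desk user rises to 4.7. An aggregate average erases exactly this difference.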
WHAT CAN WE DO?
To be fair, I have seen many customer satisfaction surveys that attempt to capture context using open-ended questions. Realistically, however, it is almost impossible to code the thousands of responses to these questions that many companies receive from their transactional customer satisfaction surveys.
There is no tool yet that can capture all the nuances of human language, particularly when there are typos, half-finished sentences, grammatical errors, or the sarcasm we often see in answers to open-ended questions. For now, we have to read customers’ answers, code them, and adjust the results based on the expressed sentiment, which is very, very time-consuming and costly.
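To see why automated coding is hard, consider the crudest possible approach. A hypothetical keyword lexicon scores each comment by counting positive and negative words. It handles plain statements but misreads sarcasm and misses misspelled words, two of the problems just described. The word lists and the function name `code_comment` are illustrative assumptions.

```python
import re

# Hypothetical keyword lexicon; real coding schemes are much richer.
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"broken", "slow", "rude", "unresolved"}


def code_comment(text: str) -> str:
    """Code a comment as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


comments = [
    "The rep was great and very helpful.",
    "Shipping was slow and the box arrived broken.",
    "Oh great, another hour on hold. Love it.",  # sarcasm, yet coded positive
    "The app is slwo and keeps crashing",  # typo and unlisted word: coded neutral
]
for comment in comments:
    print(f"{code_comment(comment):8} | {comment}")
```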
Nonetheless, I’m hopeful, and my hope lies in the text analytics field. With the explosion of information from social networking, many companies are racing to develop text analytics tools that would make text analysis and coding easier, faster, and more efficient. They just need to hurry up!

Attitudinal questions are common in surveys. They are often asked using an agree-disagree rating question format. The challenge is always to create statements that capture important elements of the attitudes we are trying to measure. Ideally, if the budget allows it, we should do qualitative research to gather insights into such elements and how people think and talk about them.
Even with qualitative data available, writing good attitudinal statements is not an easy task. Here are some guidelines to facilitate the process:

Writing short surveys is an uphill battle with many clients. Whenever word gets out that a survey will be conducted, everybody close to the subject, whether the product team, senior management, or operations, wants to add questions. The thinking is, “Since we are doing a survey, let’s get as much out of it as possible.”
Unfortunately, the only thing you get out of a very long survey is bad-quality data. Why?
NON-RESPONSE & ABANDONMENT
As survey length increases, so do non-response bias and the abandonment rate. Simply put, respondents won’t stay long answering questions. Many won’t even start if they know the survey length (it is a best practice to announce the length of the survey in the invitation).

For those who think they can get away with not announcing how long the survey will be, think again. Respondents can figure out the length from the progress bar and will drop out in the middle of the survey if they perceive it as too long (even when no progress bar is shown). High abandonment and non-response rates hurt sample representativeness.
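Abandonment and non-response are usually tracked as simple ratios over the survey funnel. The helper below is a sketch with illustrative definitions: start rate is starters over invitees, and abandonment rate is the share of starters who did not finish. The function name `survey_funnel` and the counts in the example are hypothetical, and organizations define these rates in different ways.

```python
def survey_funnel(invited: int, started: int, completed: int) -> dict:
    """Start, completion, and abandonment rates as fractions (illustrative definitions)."""
    if invited <= 0 or not 0 <= completed <= started <= invited:
        raise ValueError("expected 0 <= completed <= started <= invited and invited > 0")
    completion_rate = completed / started if started else 0.0
    return {
        "start_rate": started / invited,  # invitees who began the survey
        "completion_rate": completion_rate,  # starters who reached the end
        "abandonment_rate": 1 - completion_rate if started else 0.0,  # starters who dropped out
    }


# Made-up counts for illustration only.
print(survey_funnel(invited=1000, started=400, completed=300))
```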
In an experiment conducted by Galesic and Bosnjak (2003) to prove this point, 3,472 respondents were divided into three groups, each assigned an online survey of a different length (10, 20, and 30 minutes). The chart above shows how the number of respondents who started and completed the survey declined as the survey length increased.
DATA QUALITY
Respondents who are willing to endure a long survey are at high risk of experiencing a heavy burden and becoming “satisficers.”
Satisficing occurs when respondents select answer options without giving them much thought. They take the path of least mental effort that satisfies the question’s requirements, rather than working to find the answers that best represent their opinions. Respondents may start selecting the first choice in every question, straight-lining in grid questions (selecting the same answer for every item in the grid), or simply selecting random choices without much consideration. This type of behavior renders the data worthless.
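Some of these behaviors leave traces in the data that can be screened for. The sketch below flags two of them with simple rules. The first is straight-lining, meaning identical answers across a grid and therefore zero variability. The second is always picking the first listed option. The data layout, the respondent records, and the function names are hypothetical.

```python
from statistics import pstdev


def grid_variability(answers: list[int]) -> float:
    """Population standard deviation of one respondent's answers across a grid."""
    return pstdev(answers)


def is_straight_liner(answers: list[int]) -> bool:
    """True when every item in the grid got the same answer (zero variability)."""
    return len(answers) > 1 and len(set(answers)) == 1


def always_first_option(choice_positions: list[int]) -> bool:
    """True when the respondent chose the first listed option (position 0) every time."""
    return len(choice_positions) > 0 and all(p == 0 for p in choice_positions)


# Hypothetical respondents: answers to a 6-item agreement grid (1-5) and the
# position of the option picked on five single-choice questions.
respondents = {
    "r1": ([4, 2, 5, 3, 4, 1], [2, 0, 3, 1, 1]),
    "r2": ([3, 3, 3, 3, 3, 3], [1, 2, 0, 1, 3]),
    "r3": ([5, 4, 5, 4, 5, 4], [0, 0, 0, 0, 0]),
}
for rid, (grid, positions) in respondents.items():
    print(rid, round(grid_variability(grid), 2),
          is_straight_liner(grid), always_first_option(positions))
```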
The same experiment by Galesic and Bosnjak was also designed to test the impact of survey length on data quality. Quality was measured with a variety of indicators, including response times, item response rate, length of answers to open-ended questions, and variability of answers to questions in grids.
Of all the indicators, item response rate (defined as the percentage of questions answered out of all questions presented in a block) was the only one that seemed unaffected by survey length. However, it is unclear whether the survey was programmed to force respondents to answer before moving forward. For the other indicators, the results strongly suggest that survey length affects data quality.
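The item response rate defined above is a simple ratio. The sketch below assumes a hypothetical layout in which a block is a list of presented questions and `None` marks one left unanswered. The function name is illustrative.

```python
def item_response_rate(block: list) -> float:
    """Fraction of the questions presented in a block that were answered.

    Each entry is one presented question; None marks a question left unanswered.
    """
    if not block:
        raise ValueError("block must contain at least one presented question")
    answered = sum(answer is not None for answer in block)
    return answered / len(block)


# Hypothetical block of five questions, two of them skipped.
print(item_response_rate([3, None, 5, 2, None]))  # 0.6
```

If a survey forces an answer before respondents can move on, every completed block scores 1.0 by construction. In that case this indicator cannot reveal a length effect, which is why the way the survey was programmed matters.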

There are powerful reasons that push clients, and force research vendors, to launch long surveys: budget, time constraints, and the differing agendas of internal groups, among others. However, when surveys start getting too long, clients and research vendors should take a minute to think about the implications. After all, if we get bad data, we have wasted what little time and money we started with.