This section describes in detail the checks carried out by Inspirient to assess the quality of datasets. It also defines the metrics used in the process, with a particular emphasis on metrics applicable to survey data.

Data Quality Assessment Checks

The following three checks are available for assessing the quality of survey datasets:

Survey Duration Anomalies

Detecting cases with abnormally short or long duration can help identify survey quality issues for investigation. Abnormal case durations are detected using the generalized ESD (Extreme Studentized Deviate) test on the specified duration variable. The analysis output is a list of the Top (high-value) and Bottom (low-value) case duration outliers.
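As a rough illustration of how this check could be implemented, the sketch below applies a generalized ESD test (Rosner, 1983) to a numeric duration column using SciPy; the function name, the significance level, and the upper bound on the number of outliers are illustrative assumptions, not Inspirient's implementation.

    import numpy as np
    from scipy import stats

    def generalized_esd(values, max_outliers=10, alpha=0.05):
        """Flag up to `max_outliers` abnormal durations with the generalized ESD test."""
        x = np.asarray(values, dtype=float)
        n = len(x)
        remaining = x.copy()
        candidates, test_stats, critical_values = [], [], []
        for i in range(1, max_outliers + 1):
            mean, std = remaining.mean(), remaining.std(ddof=1)
            deviations = np.abs(remaining - mean)
            j = int(np.argmax(deviations))
            test_stats.append(deviations[j] / std)    # test statistic R_i
            candidates.append(remaining[j])           # most extreme remaining duration
            p = 1 - alpha / (2 * (n - i + 1))
            t = stats.t.ppf(p, n - i - 1)
            critical_values.append(                   # critical value lambda_i
                (n - i) * t / np.sqrt((n - i - 1 + t ** 2) * (n - i + 1)))
            remaining = np.delete(remaining, j)
        # The number of outliers is the largest i for which R_i exceeds lambda_i
        num_outliers = max((i for i in range(1, max_outliers + 1)
                            if test_stats[i - 1] > critical_values[i - 1]), default=0)
        return candidates[:num_outliers]

    # e.g., outliers = generalized_esd(survey["duration_seconds"], max_outliers=20)

Sorting the flagged durations then separates the Top (longest) from the Bottom (shortest) outliers reported in the output.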

Input requirements:
  • A survey duration variable

Generated output:
  • Histogram chart with outlier detection
  • Top and Bottom duration outliers list

Straightliner Indicator

The straightliner indicator uses a score between 0 and 1 that expresses the degree to which a case is considered a straightliner.

For each interview, the straightliner score is derived by comparing the variation of the observed responses in that interview against the expected variation, which is calculated from the responses of all interviews in the survey.

In order to derive a measure of response variation, batteries of questions with matching Likert-scale domains are grouped, and the relative frequency distribution is then calculated for each domain group to produce…

  1. The mean and variance for each domain for every interview, i.e., the observed domain response variance for any given interview, and
  2. The mean and variance for each domain across all interviews, i.e., the expected domain response variance for the entire survey

From the observed and expected variance, a variance distance measure can be calculated for each interview. This distance measure is then normalized to derive the survey response quality indicator, also referred to as the Case Divergence Score in the output generated by Inspirient’s Automated Analysis.

Input requirements:
  • A case ID variable
  • Multiple response variables with the same point-scale

Generated output:
  • Bar chart of the case divergence classifications
  • Detailed report of the divergence result for each case in a survey as a Microsoft Excel file

Interviewer Effect Indicator

The interviewer effect indicator provides a data-driven estimation of the trustworthiness of each interviewer for any given survey. Currently, the indicator combines two factors: the degree of survey response deviation from the expected response distribution and the degree of interview duration deviation from the expected time to complete the survey. The survey response deviation score is calculated by locating outliers in the expected vs. actual frequency distributions of the survey response variables for each interviewer; the interviewers with the most deviation across interviews are surfaced to the top and could indicate foul play.

The interview duration deviation score is calculated from the average survey duration per interviewer; the interviewers with survey durations significantly quicker than average are surfaced to the top and could indicate foul play.

Input requirements:
  • An interviewer ID variable
  • A survey duration column
  • At least one response variable

Generated output:
  • Interviewer effect quadrant analysis
  • Top 10 list of interviewers with largest interviewer effect score
  • Detailed report of the interviewer effect score calculation for each survey case available as a Microsoft Excel file and JSON file

Scoring Methods

This section provides a deep-dive into the following quality assessment indicators:

Straightliner Score

The straightliner score, ranging between 0 and 1, defines the degree to which a case is considered a straightliner.

The algorithmic steps of the analysis are as follows (a code sketch of the complete procedure follows the list):

  1. Group survey response variables with equal Likert scale domains. For example, a survey may contain 20 response variables with a 3-point scale, which we can call Domain #1, 10 response columns with a 5-point scale, which we can call Domain #2, and 5 response columns with a 7-point Likert scale, which we can call Domain #3.

  2. For each interview, i(1…n), calculate the frequency distribution for each survey response domain, d(1…m). For example, the domain frequency distribution for i1 might look like the following:

Domain #1 (3-point Likert scale)
Point value   1      2      3      Total
Freq. abs.    3      10     7      20
Freq. rel.    0.15   0.5    0.35   1

Domain #2 (5-point Likert scale)
Point value   1      2      3      4      5      Total
Freq. abs.    2      2      2      2      2      10
Freq. rel.    0.2    0.2    0.2    0.2    0.2    1

Domain #3 (7-point Likert scale)
Point value   1      2      3      4      5      6      7      Total
Freq. abs.    5      0      0      0      0      0      0      5
Freq. rel.    1      0      0      0      0      0      0      1

  3. Now that the domain frequency distributions for each interview have been calculated, the overall domain frequency distributions are calculated for the survey, i.e., all interviews, to derive an expected domain distribution. For example, a survey with 10 interviews might have the following overall domain frequency distributions:

Domain #1 (3-point Likert scale)
Point value   1      2      3      Total
Freq. abs.    60     100    40     200
Freq. rel.    0.3    0.5    0.2    1

Domain #2 (5-point Likert scale)
Point value   1      2      3      4      5      Total
Freq. abs.    15     12     35     28     10     100
Freq. rel.    0.15   0.12   0.35   0.28   0.1    1

Domain #3 (7-point Likert scale)
Point value   1      2      3      4      5      6      7      Total
Freq. abs.    5      7.5    10     12.5   7.5    5      2.5    50
Freq. rel.    0.1    0.15   0.2    0.25   0.15   0.1    0.05   1

  4. For each interview, i(1…n), the distance between the observed frequency distribution and the expected frequency distribution is derived by calculating the normalized residual, r̃, i.e., the normalized difference between the relative variance of the observed frequency distribution, rv_obs, and the relative variance of the expected frequency distribution, rv_exp. For each domain, d(1…m), this calculation can be broken down into three sub-steps:

    i. First, calculate the relative variance rv (also known as the index of dispersion) of the observed relative frequencies and the expected relative frequencies using the formula:

    rv = \frac{variance}{mean}
    

    Thus, for i1, the observed relative variance of domain d1 is:

    rv_{obs(i=1, d=1)} = \frac{0.031}{0.333} = 0.093
    

    And, the expected relative variance of domain d1 is:

    rv_{exp(d=1)} = \frac{0.023}{0.333} = 0.07
    

    ii. Then, calculate the residual r to derive a distance measure between the observed and expected relative frequency distributions for interview i and domain d:

    r_{(i, d)} = rv_{obs(i, d)} - rv_{exp(d)}
    

    Thus,

    r_{(i=1, d=1)} = 0.023
    

    iii. Finally, normalize the residual for a given domain to the range [-1, +1], where a value in [-1, 0) indicates that the observed relative frequency distribution has lower variation than expected, while a value in (0, +1] indicates greater variation than expected. A score of 0 indicates that the observed relative frequency distribution matches the expected distribution. The following calculation is used to normalize the residual:

    r̃_{(i, d)} = \begin{cases}
        \frac{r_{(i, d)}}{rv_{exp(d)}}, & \text{if } r_{(i, d)} < 0 \\
        \frac{r_{(i, d)}}{1 - rv_{exp(d)}}, & \text{otherwise}
    \end{cases}
    

    Thus,

    r̃_{(i=1, d=1)} = \frac{0.023}{1 - 0.07} = 0.024
    

    i.e., the observed relative frequency distribution of the responses for domain d1 of interview i1 shows slightly more variation than expected.

  5. For a given interview i, repeat steps 2 to 4 for each domain, d(1…m), to produce a set of normalized residuals, R, i.e., one normalized residual for each domain. A weighted average of R is used to derive a divergence score, s_div, for a given interview, i.e., survey case. The weights applied are the normalized response counts for each domain, e.g., for the example above, the domain weightings would be: d1 = 0.57, d2 = 0.29, and d3 = 0.14.

  6. Finally, repeat step 5 for each interview, i(1…n), to produce a set of divergence scores, S_div, i.e., one for each interview.
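As a non-authoritative sketch of the complete procedure above, the following code computes one divergence score per interview, assuming a pandas DataFrame with one row per interview (indexed by the case ID) and responses coded numerically as 1…k; the function names and the structure of the domains argument are illustrative assumptions, not Inspirient's implementation.

    import numpy as np
    import pandas as pd

    def relative_variance(rel_freqs):
        """Index of dispersion: sample variance of the relative frequencies divided by their mean."""
        rel_freqs = np.asarray(rel_freqs, dtype=float)
        return rel_freqs.var(ddof=1) / rel_freqs.mean()

    def divergence_scores(df, domains):
        """df: one row per interview; domains: {name: (point_scale, [response columns])}."""
        # Step 1: domain weights are the normalized response counts per domain
        total_items = sum(len(cols) for _, cols in domains.values())
        weights = {d: len(cols) / total_items for d, (_, cols) in domains.items()}

        # Step 3: expected relative frequency distribution and relative variance per domain
        rv_exp = {}
        for d, (scale, cols) in domains.items():
            pooled = df[cols].stack()
            rel = pooled.value_counts(normalize=True).reindex(range(1, scale + 1), fill_value=0.0)
            rv_exp[d] = relative_variance(rel)

        # Steps 2, 4 and 5: observed distributions, normalized residuals, weighted average
        scores = {}
        for case_id, row in df.iterrows():
            residuals = {}
            for d, (scale, cols) in domains.items():
                rel = row[cols].value_counts(normalize=True).reindex(range(1, scale + 1), fill_value=0.0)
                r = relative_variance(rel) - rv_exp[d]
                residuals[d] = r / rv_exp[d] if r < 0 else r / (1 - rv_exp[d])
            scores[case_id] = sum(weights[d] * residuals[d] for d in domains)

        # Step 6: one divergence score per interview
        return pd.Series(scores, name="case_divergence_score")

    # e.g., divergence_scores(survey, {"d1": (3, cols_3pt), "d2": (5, cols_5pt), "d3": (7, cols_7pt)})

For the example survey above, the weights computed in this sketch would be 20/35 ≈ 0.57, 10/35 ≈ 0.29, and 5/35 ≈ 0.14, matching step 5.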

Interviewer Effect Score

The interviewer effect score provides a data-driven estimation of the trustworthiness of each interviewer for any given survey. Currently, the indicator combines two factors: the degree of survey response deviation from the expected response distribution and the degree of interview duration deviation from the expected time to complete the survey.

To calculate the degree of survey response deviation for each interviewer, carry out the following steps (a code sketch follows the list):

  1. Calculate a contingency table of each survey response variable against the interviewer ID variable and perform a Chi-square test on each table

  2. For each contingency table, calculate the Chi-square residuals, i.e., the differences between the actual and expected cell frequencies of the interviewer ID by response variable table

  3. Locate anomalies by searching for outlier residuals (threshold defined in Haberman 1973)

  4. For each anomaly, sum the absolute residual values by interviewer ID to obtain the total survey response deviation for each interviewer.

  5. Normalize the survey response deviation score for each interviewer to a range between 0 and 1 so that it can be combined with other scores later on
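A minimal sketch of these steps, assuming categorical response columns in a pandas DataFrame; the function name, the adjusted-residual formula, and the 1.96 cutoff reflect one common reading of the Haberman (1973) threshold and are illustrative assumptions rather than Inspirient's implementation.

    import numpy as np
    import pandas as pd
    from scipy.stats import chi2_contingency

    def response_deviation_scores(df, interviewer_col, response_cols, z_threshold=1.96):
        """Sum of anomalous |adjusted residuals| per interviewer, rescaled to [0, 1]."""
        totals = pd.Series(0.0, index=pd.Index(df[interviewer_col].unique(), name=interviewer_col))
        for col in response_cols:
            table = pd.crosstab(df[interviewer_col], df[col])    # step 1: contingency table
            _chi2, _p, _dof, expected = chi2_contingency(table)  # step 1: Chi-square test
            observed = table.to_numpy(dtype=float)
            n = observed.sum()
            row_frac = observed.sum(axis=1, keepdims=True) / n
            col_frac = observed.sum(axis=0, keepdims=True) / n
            # Step 2: adjusted standardized residuals (assumes no empty expected cells)
            adjusted = (observed - expected) / np.sqrt(expected * (1 - row_frac) * (1 - col_frac))
            # Steps 3-4: keep only residuals beyond the threshold, sum their absolute values per interviewer
            anomalous = np.where(np.abs(adjusted) > z_threshold, np.abs(adjusted), 0.0)
            totals = totals.add(pd.Series(anomalous.sum(axis=1), index=table.index), fill_value=0.0)
        # Step 5: normalize to [0, 1] so the score can be combined with the duration score
        return totals / totals.max() if totals.max() > 0 else totals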

To calculate the degree of survey duration deviation for each interviewer, carry out the following steps (a code sketch follows the list):

  1. Calculate the average survey duration for each interviewer, i.e., group the interviews by the interviewer ID variable and average their durations

  2. Calculate the global average survey duration and set the deviation of all interviewers whose average survey duration is greater than the global average to zero, so that only interviewers who are faster than average are penalized

  3. Normalize the survey duration deviation score for each interviewer to a range between 0 and 1 so that it can be combined with other scores later on
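A matching sketch for the duration component, under the same assumptions (the function and column names are illustrative):

    def duration_deviation_scores(df, interviewer_col, duration_col):
        """Per-interviewer duration deviation; only faster-than-average interviewers are penalized."""
        per_interviewer = df.groupby(interviewer_col)[duration_col].mean()  # step 1
        global_mean = df[duration_col].mean()                               # step 2
        # Faster than average -> positive deviation; slower than (or equal to) average -> 0
        deviation = (global_mean - per_interviewer).clip(lower=0.0)
        # Step 3: normalize to [0, 1] so the score can be combined with the response deviation score
        return deviation / deviation.max() if deviation.max() > 0 else deviation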

Finally, the interviewer effect indicator for each interviewer is calculated by taking the equally weighted average of the survey response deviation and survey duration deviation scores.
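Continuing the sketches above (with an illustrative DataFrame named survey and illustrative column names), the combination is then:

    response_scores = response_deviation_scores(survey, "interviewer_id", response_cols)
    duration_scores = duration_deviation_scores(survey, "interviewer_id", "duration_seconds")
    interviewer_effect = 0.5 * response_scores + 0.5 * duration_scores   # equally weighted average
    top_10 = interviewer_effect.sort_values(ascending=False).head(10)    # largest interviewer effect scores

Because both scores are pandas Series indexed by interviewer ID, the addition aligns the two components per interviewer before the Top 10 list is taken.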