While reviewing the results, remember that it is difficult to draw a conclusion from a single metric. Use charts that display two or three metrics together, and view single-metric charts in conjunction with other charts. The panel moderator should never focus on "improving" one of a subject's metrics without taking all of the subject's metrics into account. For example, suppose a panel moderator finds a subject's mean standard deviation score too high and works on reducing it over several training sessions. The subject succeeds in significantly reducing the score, but closer examination of all the subject's metrics reveals that the subject's range across all products has shrunk, and with it the subject's ability to discriminate. The "improvement" in the mean standard deviation thus reduced the subject's contribution to the panel's decisions.


Panel Analysis

  • Panel Analysis - For every study, RedJade provides a comprehensive summary about the panel as a whole. This summary is designed to be a quick overview of how cohesively the panel performed in determining product similarities and differences and how well the panel together used the language to encompass their perceptions of the products.
    • Improvement Areas - A summary is generated highlighting the areas needed for improvement of the panel overall.
    • Language Completeness - RedJade determines how complete the language is in describing the products and whether some attributes in the language are not being used by the panel to describe the given product array.
    • Attribute Decisions for Review - A unique feature of RedJade is the ability to quickly assess and summarize lists of attributes that require attention from the analyst and/or panel moderator. The summaries list attributes where a product decision was overly influenced by one or two subjects as well as attributes that require further panel discussion and clarification during group language sessions.
  • Subject Analysis - RedJade summarizes in detail the metrics of each panelist and displays them visually in graphical form. The metrics and graphical displays are designed to give the analyst and panel moderator the ability to see specifically how an individual is contributing to the panel results.
    • Action Table - For each of the major subject metrics, Judgments and Actions are suggested. These suggestions are a guide for the panel moderator to help determine where to focus their future activities in relation to panel training.
    • Charts
      • The charts contain multiple metrics and are created to give a full visual picture of each subject's scoring pattern and contribution to the data.
      • Charts with a single metric allow the moderator to focus on a specific subject's scoring behavior.
      • Charts are also created showing each subject's metrics across multiple studies allowing a panel moderator to see the subjects' scoring trend.



Metrics and Interpretations Defined


Subject Standard Deviation

  • Definition – Measures the variation associated with the Subject’s mean scores for each Attribute.
  • Purpose – To provide a measure of the Subject’s ability to repeat his or her scores on multiple servings of the same product.
  • Score Range – Varies depending on the number of repetitions and the scale. For example, with three repetitions the maximum score is about 57% of the line scale maximum. Therefore, on a 100-point scale with three repetitions the Subject standard deviation score could range from 0 to 57.
  • Score Defined – Subject SD = √(SSes / DFes), where SSes = Sum of Squares Error for the Subject and DFes = Degrees of Freedom Error for the Subject.
  • How to Use – Large standard deviation scores indicate that the Subject is not scoring Product repetitions in the same area of the line scale. If the analyst looks at all of the Subjects’ SD scores for an Attribute and sees an unusually large number of high scores across the Panel, it could be caused by a poorly defined Attribute or by Product variation within that Attribute (commonly seen with visual Attributes) rather than by the Subjects being inconsistent. If a score is unusually small, it may indicate that the Subject is using some cue to remember where to score. The anchors and middle of the scale may be used by a Subject; such nested scores exhibit less variability than one would expect under normal circumstances. Typically, one should become concerned with SD scores exceeding 16.7% of the line scale. If a Subject has a large number of these scores compared to the rest of the Panel, it may imply that the Subject is having trouble measuring his or her perceptions.
  • Interpretation
    • Summary
      • Percentage Calculation – The percentage of a Subject’s standard deviation scores that exceed 16.67% of the scale.
    • Judgment
      • Well Performing – If percentage calculation ≤ 33%.
      • Poor Standard Deviation – If percentage calculation > 33%.
    • Action
      • Monitor Performance – If percentage calculation > 33%.
      • Remedial Training – If percentage calculation > 50%.
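The pooled-SD calculation and the 16.67%-of-scale judgment rule above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names and the 100-point default scale are mine, not RedJade's implementation.

```python
import math

def subject_sd(rep_scores_by_product):
    """Pooled standard deviation of a Subject's repeated scores on one
    Attribute: SD = sqrt(SSes / DFes). SSes pools squared deviations
    from each Product's mean; DFes sums (repetitions - 1) per Product.

    rep_scores_by_product: list of lists, one inner list of repetition
    scores per Product.
    """
    ss_es = 0.0
    df_es = 0
    for reps in rep_scores_by_product:
        mean = sum(reps) / len(reps)
        ss_es += sum((x - mean) ** 2 for x in reps)
        df_es += len(reps) - 1
    return math.sqrt(ss_es / df_es)

def sd_judgment(sd_scores, scale_max=100.0):
    """Percentage of the Subject's attribute SD scores above 16.67% of
    the scale, with the judgment/action thresholds from the text."""
    threshold = 0.1667 * scale_max
    pct = 100.0 * sum(sd > threshold for sd in sd_scores) / len(sd_scores)
    if pct <= 33:
        return pct, "Well Performing", None
    action = "Remedial Training" if pct > 50 else "Monitor Performance"
    return pct, "Poor Standard Deviation", action
```

A Subject who scores identical values on every serving gets an SD of 0 for that attribute; the judgment function then summarizes how often the per-attribute SDs cross the 16.67% line.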


Crossover

  • Definition – Identify Subjects whose rank order is contrary to the Panel rank order when there is a Significant Product difference.
  • Purpose – Determine whether a crossover may be responsible for loss of potential significance. Identify Subjects that frequently rank Products contrary to the Panel.
  • Score Range – 0 to 100
  • Score Defined – 0 is no crossover and 100 means scores are completely reversed when all Products are significantly different from all other Products. Typically, we become concerned with a Subject’s crossover score greater than 20.
  • How to Use – Any score greater than 0 indicates crossover for a Subject and the larger the score the greater the crossover. Although trained Subjects may disagree on the magnitude of a Product difference, it is expected that they would agree on the direction (order) of that difference where the difference is statistically significant. Crossover scores are displayed for each Subject on each significant Attribute, and also are summarized in the Panel Management section.
  • Interpretation
    • Summary
      • Percentage Calculation – The percentage of a Subject’s crossover scores that exceed 20. This is only counted on significant attributes since attributes that are not significant do not have a crossover score calculated.
    • Judgment
      • Well Performing – If percentage calculation ≤ 20%.
      • Poor Crossover – If percentage calculation > 20%.
    • Action
      • Monitor Performance – If percentage calculation > 20%.
      • Remedial Training – If percentage calculation > 33%.
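RedJade's exact crossover formula is not given in this section, but the idea of "rank order contrary to the Panel" can be illustrated with a simple pairwise measure: the percentage of product pairs the Subject orders in the opposite direction to the Panel. The function name and scaling here are assumptions for illustration only.

```python
from itertools import combinations

def crossover_score(subject_means, panel_means):
    """Illustrative crossover measure, 0-100: the percentage of Product
    pairs where the Subject's mean difference has the opposite sign to
    the Panel's. 0 = full agreement in rank order; 100 = completely
    reversed. (Not RedJade's published formula.)"""
    pairs = list(combinations(range(len(panel_means)), 2))
    discordant = 0
    for i, j in pairs:
        panel_dir = panel_means[i] - panel_means[j]
        subj_dir = subject_means[i] - subject_means[j]
        if panel_dir * subj_dir < 0:  # opposite sign = reversed ordering
            discordant += 1
    return 100.0 * discordant / len(pairs)
```

A Subject whose means follow the Panel's ordering exactly scores 0; reversing every pair scores 100, matching the score range defined above.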


Scale Range

  • Definition – Measures the range of means for an individual Subject for an attribute.
  • Purpose – Useful for determining whether a Subject’s scoring range distorted Product means and/or significant differences.
  • Score Range – 0 to Scale Maximum
  • Score Defined – The difference between the largest and smallest Product mean scores for the Subject on each Attribute. Additionally, the difference between the Subject’s range and the Panel’s range appears adjacent to the calculation, in parentheses.
  • How to Use – Range interaction is usually less of a concern and reflects differences in sensitivity among Subjects. Values similar to the Panel’s are desired. Panel Performance highlights score ranges that exceed twice the Panel average or fall below half of it. If interaction is significant, one needs to determine whether it is caused by range, crossover, or both. Range can also be used to determine whether the Subject is scoring a narrow or wide area for every Attribute. Subjects with high crossover scores as well as a high range can cause the most distortion of Product means and/or significant differences. Sometimes highly mixed range scores reflect Product variation rather than Panel inconsistency.
  • Interpretation
    • Summary
      • Percentage Calculation – The percentage of a Subject’s range scores whose difference is greater than the acceptable threshold setting. If the Subject’s range is larger than the Panel’s range the acceptable threshold setting is 20% of the Maximum scale value. If the Subject’s range score is less than the Panel’s range then the acceptable threshold setting is 15% of the Maximum scale value.
    • Judgment
      • Well Performing – If percentage calculation ≤ 33%.
      • Poor Magnitude Usage – If percentage calculation > 33%.
    • Action
      • Monitor Performance – If percentage calculation > 33%.
      • Remedial Training – If percentage calculation > 50%.
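The asymmetric thresholds in the Percentage Calculation above (20% of scale maximum when the Subject's range is wider than the Panel's, 15% when narrower) can be sketched as follows; the function name and 100-point default scale are assumptions for illustration.

```python
def range_flags(subject_means, panel_means, scale_max=100.0):
    """For one Attribute, return the Subject's range, its difference
    from the Panel's range, and whether that difference exceeds the
    acceptable threshold: 20% of scale max when the Subject's range is
    wider than the Panel's, 15% when it is narrower."""
    subj_range = max(subject_means) - min(subject_means)
    panel_range = max(panel_means) - min(panel_means)
    diff = subj_range - panel_range
    threshold = 0.20 * scale_max if diff > 0 else 0.15 * scale_max
    return subj_range, diff, abs(diff) > threshold
```

The percentage of flagged attributes then feeds the same 33% / 50% judgment and action cutoffs listed above.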


Discrimination

  • Definition – Ability of Subjects to discriminate between Products which are different.
  • Purpose – Identify Subjects that are or are not able to differentiate sensory differences between Products that are different.
  • Score Range – 0% to 100%
  • Score Defined – The percentage of Attributes for which the Subject’s p value was greater than or equal to 0.50.
  • How to Use – Successful discrimination is based on each Subject’s p value of the F statistic calculated from MS Between/MS Within. p < 0.25 is highly discriminating, p between 0.25 and 0.50 is discriminating, and p ≥ 0.50 is judged non-discriminating. The lenient criterion of p < 0.50 is due to the small N involved in the F calculation for the Subject’s “Product Effect”. Where a Subject’s p values are large (0.50 to 0.99) compared to the Panel (for significant Attributes), the Subject is viewed as a poor discriminator. Generally, a poor discriminator is a Subject who fails to discriminate on more than half of the significant Attributes.
  • Interpretation
    • Summary
      • Percentage Calculation – The percent of a Subject’s p for Product that was ≥ 0.50.
    • Judgment
      • Well Performing – If “% p ≥ 0.50” ≤ 20% or “% p ≥ 0.50” ≤ 1.3x panel average.
      • Non-Discriminator – If “% p ≥ 0.50” > 20% and “% p ≥ 0.50” > 1.3x panel average.
    • Action
      • Monitor Performance – If “% p ≥ 0.50” > 20% and “% p ≥ 0.50” > 1.3x panel average.
      • Remedial Training – If “% p ≥ 0.50” > 50% and “% p ≥ 0.50” > 2x panel average.
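Given the per-attribute p values for a Subject's "Product effect" (computing them requires an F distribution and is assumed done upstream), the classification rules above reduce to a few comparisons. This sketch uses assumed function and variable names:

```python
def discrimination_summary(subject_ps, panel_avg_pct):
    """Classify a Subject from their per-attribute Product-effect
    p values on the significant Attributes.

    subject_ps: p values, one per significant Attribute.
    panel_avg_pct: the Panel-average '% p >= 0.50' used in the
    relative (1.3x / 2x) comparisons from the text.
    """
    pct = 100.0 * sum(p >= 0.50 for p in subject_ps) / len(subject_ps)
    if pct <= 20 or pct <= 1.3 * panel_avg_pct:
        return pct, "Well Performing", None
    action = ("Remedial Training"
              if pct > 50 and pct > 2 * panel_avg_pct
              else "Monitor Performance")
    return pct, "Non-Discriminator", action
```

Note the two-sided rule: a Subject is only flagged when their percentage is both high in absolute terms and high relative to the Panel average, which protects Subjects on panels where few attributes discriminate for anyone.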

 

Scale Mean Position

  • Definition – Identifies where on the scale the subject scores.
  • Purpose – Identify Subjects whose mean scores are excessively higher or lower compared to the Panel. Subject perceptions may be different (approaching a near normal curve); however, extreme means and distributions (for example, bi-Modal) need to be noted and accounted for.
  • Score Range – 0 to Scale Maximum
  • Score Defined – For each Attribute, the mean score across all Products is calculated for the Subject. Additionally, the difference between the Subject’s Scale Position and the Panel’s Scale Position appears adjacent to the calculation, in parentheses.
  • How to Use – One should not expect every Subject’s scores to fall at the same location on the scale; the Analysis of Variance accounts for these differences. Each Subject’s Product means and grand mean for an Attribute are examined for the size of the difference from the Panel, to determine whether extreme scores have distorted the Product means and/or a significant difference. If so, the distortion should be corrected and noted. Whether the Panel needs more training is a judgment call. Highly variable scale use may also reflect Product variation rather than Panel inconsistency.
  • Interpretation
    • Summary
      • Percentage Calculation – The percentage of a Subject’s Scale Position scores whose difference from the panel average is greater than 25% of the Maximum scale value.
    • Judgment
      • Well Performing – If percentage calculation ≤ 33%.
      • Poor Scale Usage – If percentage calculation > 33%.
    • Action
      • Monitor Performance – If percentage calculation > 33%.
      • Remedial Training – If percentage calculation > 50%.
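The 25%-of-scale flag in the Percentage Calculation above is a straightforward comparison per attribute; this short sketch (names and 100-point default scale assumed) shows the flagging step:

```python
def scale_position_flags(subject_grand_means, panel_grand_means,
                         scale_max=100.0):
    """Flag each Attribute where the Subject's grand mean sits more
    than 25% of the scale maximum away from the Panel's grand mean."""
    threshold = 0.25 * scale_max
    return [abs(s - p) > threshold
            for s, p in zip(subject_grand_means, panel_grand_means)]
```

The fraction of True flags is then judged against the same 33% / 50% cutoffs listed above.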

 

Subject Attribute Decision Influence Table

  • Definition – Identify Subjects that influenced the attribute significance decision.
  • Purpose – Identify Subjects who have a strong influence on the Panel’s results.
  • Score Range – 0% to 100%
  • Score Defined – For each Attribute and each Subject the ANOVA is run again with a Subject’s data removed. If the ANOVA results cause the significance decision to change and the change in the calculated p value is greater than 0.13, then the Subject is flagged for the attribute. For each Subject the percentage of flagged attributes is calculated and displayed.
  • How to Use – Subjects with scores greater than 5% should have a review of their past scores to see if they consistently have a strong influence on the Panel. Review all Attributes that were influenced by any individual Subject to determine if the issue is a Subject issue or Panel issue. Based upon the results you can then choose to either monitor or train the Panel as a whole or the individual Subject.
  • Interpretation
    • Summary
      • Percentage Calculation – The percentage of Attributes for which excluding the Subject’s data from the Attribute’s ANOVA calculation produces a large p value shift that changes the Attribute’s Significance Decision.
    • Judgment
      • No Attribute Decision Influence – If percentage calculation = 0%.
      • Minimal Attribute Decision Influence – If percentage calculation > 0% and ≤ 5%.
      • Considerable Attribute Decision Influence – If percentage calculation > 5%.
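The leave-one-out flagging rule above (significance decision flips and the p value shifts by more than 0.13) can be sketched as follows. The ANOVA recomputation itself is assumed done upstream; here the function simply compares the full-panel p value with precomputed leave-one-out p values. Names and the 0.05 significance level are assumptions.

```python
def decision_influence(full_p, loo_ps, alpha=0.05):
    """For one Attribute: given the full-Panel p value and a dict
    mapping each Subject to the p value with that Subject's data
    removed, flag Subjects whose removal both flips the significance
    decision and shifts p by more than 0.13."""
    full_sig = full_p < alpha
    flagged = {}
    for subject, p in loo_ps.items():
        decision_changed = (p < alpha) != full_sig
        flagged[subject] = decision_changed and abs(p - full_p) > 0.13
    return flagged
```

Repeating this per Attribute and computing each Subject's percentage of flagged Attributes yields the 0% / ≤5% / >5% influence judgments above.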




Overall

The objective of RedJade's Descriptive Analysis module is to achieve accurate conclusions about product differences. The purpose of Panel Performance is to determine whether any training or other changes should be made to enhance panel accuracy.


Panel Performance provides a comprehensive summary of four performance aspects of a project: Overall Panel, Individual Subject, Attribute, and Product performance. The Overall Panel Performance and Individual Subject Performance summaries can be analyzed for an individual study or across multiple studies as a tracking procedure. Attribute Performance and Product Performance are analyzed only on an individual study basis.


The human perceptual system is complex and interactive, with considerable individual differences based on genetics, physiology of the senses, and past product and test experience. Individuals differ from one another in their sensitivity to various stimuli (for example, a pure chemical, a simple mixture, or a complex mixture such as a food) and in assessing the strength of the response.


Variations in response behavior within and across individuals have to be taken into account in the descriptive process. Variations in responses can be minimized, but not eliminated, through subject selection; invariance is atypical and contrary to our knowledge of the physiology of the perceptual process. Individuals also differ in their use of language to describe perceptions, for example, using the same word to represent different sensations or the converse. Communicating a perception is complex and context-related; therefore, the descriptive process must take into account individual differences in sensitivity, in estimating the strength of a perception, and in the use of language as a means of verbalizing perceptions. Products themselves are another source of variability.


All of this presents a challenge for Sensory Evaluation. Analysis of Variance is a useful tool for determining whether observed differences (variability) are within the realm of chance or represent a statistically significant occurrence. Additional metrics help guide us about how variations in response patterns of the individual subjects contribute to the panel decision.