Prioritizing and annotating the dimensions of input data is part of the optional guidance step for each analysis, as introduced in Working with Inspirient > Providing Guidance. For both dimension priorities and annotations, suggested values are provided by the system based on its current understanding of the dataset. In most cases, users only need to review and possibly tweak these suggestions. If an analysis is started via the I’m feeling lucky button, all suggestions are used directly without reconfirming them with the user.

Prioritization of Input Dimensions

Prioritization affects the sort order of results: results derived from higher-priority dimensions are displayed more prominently and are more likely to be included in stories.

The exact effects of low and high priorities are as follows. They are designed so that overall comprehensiveness across all analytical methods degrades gracefully for very large datasets:

  • Dimensions set to the lowest priority setting are ignored during the analysis.
  • Dimensions set to the highest priority setting are analyzed preferentially, i.e., they are guaranteed to be evaluated by all applicable analytical methods.
  • Other low-priority dimensions may be omitted from certain analytical methods if computational requirements would otherwise exceed the allocated resources.
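
The three rules above can be illustrated with a minimal sketch. It is not the system's actual implementation: the numeric priority scale, the per-dimension cost, the budget, and the names Dimension and select_dimensions are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical priority scale: 0 = lowest (ignored), 4 = highest (always analyzed).
LOWEST, HIGHEST = 0, 4


@dataclass
class Dimension:
    name: str
    priority: int
    cost: int  # assumed relative cost of evaluating this dimension with one method


def select_dimensions(dimensions, budget):
    """Return the names of the dimensions one analytical method would evaluate."""
    selected = []
    remaining = budget
    # Highest-priority dimensions are always included, even if the budget is exceeded.
    for dim in dimensions:
        if dim.priority == HIGHEST:
            selected.append(dim.name)
            remaining -= dim.cost
    # Remaining dimensions are considered in descending priority while budget lasts.
    # Lowest-priority dimensions are never considered.
    others = sorted(
        (d for d in dimensions if LOWEST < d.priority < HIGHEST),
        key=lambda d: d.priority,
        reverse=True,
    )
    for dim in others:
        if dim.cost <= remaining:
            selected.append(dim.name)
            remaining -= dim.cost
    return selected


dims = [
    Dimension("revenue", 4, 5),
    Dimension("region", 2, 3),
    Dimension("notes", 0, 1),
    Dimension("sku", 1, 4),
]
print(select_dimensions(dims, budget=8))
```

In this sketch, "notes" is never analyzed, "revenue" is always analyzed, and "sku" is dropped once the assumed budget is used up.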

Client-specific and hardware-specific options may be set by system administrators to fine-tune system behavior.

Analysis Guidance - Dimension priority and context dialogs

Contextualization of Input Dimensions

Annotations allow users to establish the analytical context of input dimensions, for example, to specify whether calculating the sum over a column of numeric values is sensible (e.g., for inventory quantities) or not (e.g., for time-series measurements).

There are four kinds of data annotations:

  • Filter – Annotations that filter the input data
  • Transformation – Annotations that transform the input values
  • Semantic – Annotations that explicitly communicate the meaning of a column
  • Analysis – Annotations that affect how the analysis is carried out, e.g., variable roles and weighting

The following table lists all supported annotations and their effects.

Table of supported data annotations
Annotation | Type | Description | Details
FILTER_ON | Filter | Filter input table with a given criterion (regular expressions accepted) |
FILTER_ON_DOMINANT_DOMAIN | Filter | Filter table on the most frequent items (accounting for 80% of occurrences) |
FILTER_ON_DOMINANT_DOMAIN_BY_VALUE | Filter | Filter table on items with largest sum of a given value column (accounting for 80% of accumulated value) |
FILTER_ON_TOP_3_BY_VALUE | Filter | Filter table on the 3 items with the largest sum of a given value column |
FILTER_ON_TOP_10_BY_VALUE | Filter | Filter table on the 10 items with the largest sum of a given value column |
ABC_CLASSIFICATION | Transformation | Classify column into n categories using ABC analysis |
ANONYMIZE | Transformation | Anonymize items in column with a generated ID value | Annotates a dimension to be anonymized by replacing it with a new column that contains a cryptographically strong hash value for each original value. A look-up table to map hashed values to original values is made available separately to the user who owns the analysis.
DEEP_DRILL_DOWN | Transformation | Split input table on each item in column |
DEFAULT_VALUE | Transformation | Transform missing values, i.e., absent or null, to a default value | Annotates a dimension to use a given default value in case no value is present, e.g., {DEFAULT_VALUE(no data available)}
DEFINE_AS_MISSING | Transformation | Define a value to be treated as missing during analysis | Annotates a dimension to define a value to be treated as missing, e.g., {DEFINE_AS_MISSING(-1:not applicable)}
DRILL_DOWN | Transformation | Split input table on the most frequent items (accounting for 80% of occurrences) |
IGNORE_VALUE | Transformation | Ignore specified value(s), i.e., treat as absent or null | Annotates a dimension to ignore certain values by excluding matching rows during analysis, e.g., {IGNORE_VALUE(John Doe)}
JOINABLE_ID_VALUES | Transformation | Enable joining tables on specified ID values | Annotates a dimension to specify that it can be used to join tables with a corresponding JOINABLE_ID_VALUES annotation
USE_AS_IS | Transformation | Disable any automated transformations during analysis for this column |
CATEGORICAL | Semantic | Values representing categorical items | Annotates a dimension as containing categorical values without a natural order (also see ORDINAL)
DEMOGRAPHIC_VARIABLE | Semantic | Socio-demographic information | Annotates a dimension to be treated as a socio-demographic variable, e.g., when analyzing survey data
HAS_SUBTOTALS | Semantic | Numeric column contains subtotals | Annotates a dimension as including subtotals
ID | Semantic | Values representing ID values | Annotates a dimension to be treated as ID values
IS | Semantic | Values representing selected meaning |
LESS_IS_BETTER | Semantic | A lower numeric value is better | Annotates a dimension to contain numeric values for which lower values are more desirable in the context of the current analysis
MAXIMIZABLE | Semantic | Numeric values where the maximum should be considered for all numeric operations |
MINIMIZABLE | Semantic | Numeric values where the minimum should be considered for all numeric operations |
MORE_IS_BETTER | Semantic | A higher numeric value is better | Annotates a dimension to contain numeric values for which greater values are more desirable in the context of the current analysis
NATURAL_LANGUAGE_TEXT | Semantic | Text values that should be considered for natural language processing | Annotates a dimension to be treated as natural language text, to which the corresponding analytical methods are applied
NOT_CATEGORICAL | Semantic | Values that should be considered for all numeric operations | Annotates a dimension as not containing categorical values
NOT_SUMMABLE | Semantic | Numeric values that cannot be summed | Annotates a dimension as not containing summable values
ORDINAL | Semantic | Numeric values representing ordered categorical items | Annotates a dimension as containing numeric categorical values with a natural order (also see CATEGORICAL)
SUMMABLE | Semantic | Numeric values that can be summed | Annotates a dimension as containing summable values
SURVEY_DURATION | Semantic | Survey duration indicator | Annotates a dimension to be treated as a survey duration indicator
SURVEY_INTERVIEWER_ID | Semantic | Survey interviewer identifier | Annotates a dimension to be treated as a survey interviewer ID
SURVEY_META | Semantic | Survey meta-information | Annotates a dimension to be treated as survey meta-information
SURVEY_RESPONSE | Semantic | Survey response values | Annotates a dimension to be treated as a survey response
AGGREGATION_WEIGHT | Analysis | Weight aggregations by values in this dimension, typically used for analysis of survey data to reduce bias | Annotates a dimension to be treated as a weighting variable, e.g., when analyzing survey data
DEPENDENT_VARIABLE | Analysis | An input variable of interest that should be explained | Annotates a dimension to be treated as a dependent variable in the current analysis. Also known as a target or label in machine learning.
INDEPENDENT_VARIABLE | Analysis | A control variable that is used to explain effects on a dependent variable | Annotates a dimension to be treated as an independent variable in the current analysis. Also known as a predictor or feature in machine learning.
OVERRIDE_RESTRICTIONS | Analysis | Disable the analysis restrictions that are in place for performance optimization | Annotates a dimension to be analyzed without any restrictions that would usually be in place to ensure acceptable runtime when analyzing very large tables. Use with caution!
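
The "dominant domain" filters in the table keep the items that together account for 80% of occurrences or of accumulated value. The following is a minimal sketch of that idea, not the system's actual implementation. The function names, the handling of ties, and the decision to include the item that crosses the 80% threshold are all assumptions.

```python
from collections import Counter, defaultdict


def dominant_items(weights, coverage=0.8):
    """Return the heaviest items that together reach `coverage` of the total weight."""
    total = sum(weights.values())
    kept, covered = [], 0
    for item, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break  # the items kept so far already reach the requested coverage
        kept.append(item)
        covered += weight
    return kept


def weights_by_occurrence(rows, column):
    """Weight each item by how often it occurs (cf. FILTER_ON_DOMINANT_DOMAIN)."""
    return Counter(row[column] for row in rows)


def weights_by_value(rows, column, value_column):
    """Weight each item by its summed value (cf. FILTER_ON_DOMINANT_DOMAIN_BY_VALUE)."""
    sums = defaultdict(float)
    for row in rows:
        sums[row[column]] += row[value_column]
    return sums


# Hypothetical sample data: customer A is frequent, customer D is rare but valuable.
rows = (
    [{"customer": "A", "revenue": 1.0}] * 6
    + [{"customer": "B", "revenue": 1.0}] * 2
    + [{"customer": "C", "revenue": 1.0}]
    + [{"customer": "D", "revenue": 90.0}]
)
print(dominant_items(weights_by_occurrence(rows, "customer")))
print(dominant_items(weights_by_value(rows, "customer", "revenue")))
```

The same sample data yields different dominant items depending on whether occurrences or values are accumulated, which is why the table offers both filters.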

Advanced users may also prefer to embed these annotations directly in their data by appending them, enclosed in curly brackets, to column labels, e.g., {SUMMABLE}.
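
To illustrate the embedded format, the sketch below separates a column label from the annotations appended to it. It is based only on the examples shown in the table, such as {SUMMABLE} and {IGNORE_VALUE(John Doe)}. The function name split_label, the column names, and the assumption that a label may carry several annotations are hypothetical, and the system's own parser may behave differently.

```python
import re

# Matches {NAME} or {NAME(argument)}, where NAME consists of upper-case letters,
# digits, and underscores. This pattern is inferred from the documented examples.
ANNOTATION = re.compile(r"\{([A-Z][A-Z0-9_]*)(?:\((.*?)\))?\}")


def split_label(label):
    """Split a column label into its plain name and a list of (annotation, argument)."""
    annotations = [(m.group(1), m.group(2)) for m in ANNOTATION.finditer(label)]
    name = ANNOTATION.sub("", label).strip()
    return name, annotations


print(split_label("Revenue {SUMMABLE}"))
print(split_label("Status {DEFINE_AS_MISSING(-1:not applicable)}"))
print(split_label("Contact {IGNORE_VALUE(John Doe)} {ANONYMIZE}"))
```

Annotations without an argument are returned with None as their argument.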

Best Practices

  • Prioritize sparingly, but with confidence – In most cases, it is not necessary to fine-tune the priorities of every dimension of a dataset. It’s more time-efficient to quickly adjust the priorities of the most important dimensions, and then later use tags to filter out less important results.
  • Annotate selectively – Annotations help the system handle the dimensions of a dataset correctly in corner cases. In most cases, the correct analytical methods are applied even without annotations. If pressed for time, some users may even do a quick initial run with the I’m feeling lucky button, check key results for issues, and add only the annotations required to address these issues.
  • Re-use prior priorities and annotations – Priorities and annotations of all past analyses are scanned to make the best possible suggestion for the current dataset. This includes analyses by other users with accounts on the same Inspirient service instance. Suggested priorities and annotations may thus reflect what your co-workers found appropriate for data similar to yours.
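
The idea of re-using prior annotations can be pictured as a simple vote over past analyses. The sketch below is purely illustrative: how the system actually matches columns and ranks suggestions is not described here, and the function suggest_annotation, the matching by column name, and the sample data are all assumptions.

```python
from collections import Counter


def suggest_annotation(column_name, past_analyses):
    """Suggest the annotation most often used for a column of this name, or None."""
    key = column_name.strip().lower()
    votes = Counter(
        annotation
        for analysis in past_analyses  # each analysis maps column name -> annotation
        for name, annotation in analysis.items()
        if name.strip().lower() == key
    )
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical history of annotations chosen in earlier analyses.
past = [
    {"Revenue": "SUMMABLE"},
    {"revenue": "SUMMABLE"},
    {"Revenue": "NOT_SUMMABLE"},
    {"Temperature": "NOT_SUMMABLE"},
]
print(suggest_annotation("Revenue", past))
print(suggest_annotation("Unknown", past))
```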