Before any analytical methods can be automatically applied to a submitted dataset, the data must be ingested, consolidated and enriched. This section covers the details of these three steps.

Data Ingestion

Inspirient’s automated analytics solution can ingest tabular business data from a range of data sources including Microsoft Excel files (XLS or XLSX), text files with delimited values (CSV), and data from Relational Database Management Systems (RDMS).

The data import component is responsible for ingesting the source data into Inspirient’s internal data structure which provides a standardized data representation upon which best practice statistics and advanced analytical methods, general business heuristics and use case specific processing can be applied.

To achieve this, the component performs the following tasks:

  • Format detection of the input data source, which includes Microsoft Excel, SPSS, CSV and RDMS
  • Automatic table detection for semi-structured data sources, for example Excel files containing multiple tables with non-tabular meta-information
  • Classification of input dimensions by structural properties, e.g., automatic detection of yes/no values, numeric values including inference of decimal and thousands separator, text values, and date values from different locales

Data Consolidation

Following data ingestion, Inspirient’s Automated Analytics Engine carries out a data quality assessment (reference list of checks published here) and then performs common data wrangling tasks:

  • Data model standardization – Data can be modelled in different ways, e.g., relational, matrix, time-series, this consolidates tables to a standard format for further automated processing, including table joining capability
  • Table filtering and splitting – Tables can be filtered through logical expressions or automatically split by every possible category for optional deep-drill down analysis
  • Data cleansing – Common data cleansing tasks are automatically performed on the input data, including, standardizing common N/A and NULL values, date-time standardization, and un-scaling of numeric values (e.g. from ‘1K’ to ‘1000’)
  • Standardization of values – For example, standardization of country and city names, and conversion of post-codes, cities and longitude/latitude values to geo-coordinates, or dates of varying formats transformed to a single standard format
  • Anonymization – Values in columns can be automatically anonymized to unique hash values if specified and can be converted back to the original values with secure mapping table if needed

Data Enrichment

Following data consolidation, Inspirient’s Automated Analytics Engine enriches the input data to increase the relevance of the analytical output and surface previously unknown insights. The following enrichments are performed on applicable input dimensions:

  • Date and time – Derived date and time values at various resolutions are generated from source timestamp columns, e.g., hour of day, day of week, quarter of year, etc.
  • Demographic – Derived demographic values such as age segments categorization from date of birth columns
  • Geographic – Distance between geo-coordinates and geographical and political indicators
  • Natural language – Key phrase detection and text analysis indicators such as word count for columns containing natural language text
  • Business KPIs – Common business KPIs are calculated if the required columns are detected for example if a table contains both ‘Unit cost’ and ‘Quantity’ then ‘Revenue’ is automatically calculated
  • External data – Enriching with external data sources such as country indicators from The World Bank. Global indicators available at data.worldbank.org
  • Custom rules – User-defined custom rules can be configured as Excel-style functions or JavaScript which are applied to the input table as an enrichment