Learn the 1.5(IQR) Statistical Outlier Rule to Analyze Datasets
Learn how to apply the statistical theory of the box & whisker plot to find trends and patterns in datasets of any size: small, medium or big. You can watch a video I created that explains how the box & whisker plot and the 1.5(IQR) Rule can be used to analyze data with an Excel AddIn that I developed: Learn the 1.5(IQR) Statistical Outlier Rule.

Participants will receive several automated Microsoft Excel spreadsheet templates that use outlier detection in different ways to analyze datasets (time series or point in time). You can find output from some of these templates on my We Protect T.O.R.O.N.T.O blog.

1.5(IQR) Statistical Outlier Rule Analysis: Easier to Apply and Interpret on Datasets

Use the box & whisker plot outlier analysis technique instead of other data analysis techniques currently used in software. This holds whether you are attempting to segment bricks & mortar customers, online visitors or other behaviour, at one point in time or over time. Creating segments using outlier detection with the theory of the box & whisker plot is easier to do and interpret than K-Means Clustering or Hierarchical Clustering algorithms, on small datasets or Big Data.

The following is a summary of the advantages of using box & whisker plot statistical outlier detection to segment data vs.
other techniques used in data analysis software:
- segments can be applied with SQL or another language such as Python
- results can be applied directly and dynamically to operational databases
- it is not sensitive to extremely high or low values, because it relies on the Median to define segments
- there is no need for extensive training of users on software or statistical analysis
- it produces a guaranteed # of segments (between 0 and n+1, where n is the # of columns in the analysis) every time
- there is no need to guess the optimal number of segments, as other techniques require
- results are much easier to interpret than output from other techniques
- segments are found much quicker, since multiple runs are not required
- every result produces statistically significant segments
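
To make the rule concrete, here is a minimal Python sketch of the standard 1.5(IQR) fences from the box & whisker plot: values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR) are flagged as outliers. It is an illustration only, not the Excel AddIn or the templates described above. The function names (`iqr_fences`, `label_outliers`), the three labels, and the sample `monthly_spend` figures are all hypothetical, and the "inclusive" quartile method is an assumption; other tools may compute quartiles slightly differently, which shifts the fences.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) for the k(IQR) rule.

    Assumption: quartiles use the 'inclusive' interpolation method;
    other quartile conventions give slightly different fences.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def label_outliers(values, k=1.5):
    """Label each value relative to the fences (hypothetical label names)."""
    lower, upper = iqr_fences(values, k)
    labels = []
    for v in values:
        if v < lower:
            labels.append("low outlier")
        elif v > upper:
            labels.append("high outlier")
        else:
            labels.append("typical")
    return labels


# Made-up sample data: one customer spends far more than the rest.
monthly_spend = [12, 15, 14, 13, 16, 15, 14, 95]

lower, upper = iqr_fences(monthly_spend)  # (11.5, 17.5) for this sample
for value, label in zip(monthly_spend, label_outliers(monthly_spend)):
    print(value, label)
```

Because the fences are built from quartiles rather than the mean, the single value of 95 is flagged without pulling the fences toward it. The same comparison against two fence values could be expressed as a SQL `CASE` expression once the quartiles are known.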