How to handle outliers so they don't skew your data

Data is crucial to understanding your business, but outliers can complicate your results. Here's what you need to know to avoid common pitfalls.


Resources

Key quotes:

  • "Data is called skewed when the curve appears distorted or skewed either to the left or to the right in a statistical distribution."
  • "In a normal distribution, the graph appears symmetric, meaning that there are about as many data values on the left side of the median as on the right side."
  • "So, a distribution that is right skewed has a long tail that extends to the right or positive side of the x axis, as in the plot below."
  • "Below is one real-life example. You can clearly see that it is right-skewed data with its tail on the positive side of the distribution."
  • "So in skewed data, the tail region may act as an outlier for the statistical model and we know that outliers adversely affect the model’s performance especially regression-based models."
  • "A log transformation can help to fit a very skewed distribution into a Gaussian one."
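To make the last quote concrete, here is a minimal Python sketch of how a log transformation can pull in a long right tail. The sample values and the `skewness` helper are hypothetical and chosen for illustration (they do not come from the quoted article), and the helper uses the simple moment-based definition of skewness.

```python
import math

def skewness(values):
    """Moment-based (population) skewness: third central moment / variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample: each value is double the previous one.
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# Natural log of each value; logs are only defined for positive numbers.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)        # clearly positive: long tail to the right
log_skew = skewness(logged)     # approximately zero: evenly spaced after the log

print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

This sample was picked because its logarithms are evenly spaced, so the effect is unusually clean. Real data will rarely become perfectly symmetric, and values that are zero or negative need a different treatment before a log can be applied.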

Transcript: In this video I'm going to describe and discuss the method that SPSS uses to detect outliers. I'll point out in the first instance that there are two ways SPSS goes about that, based on something called the interquartile range rule and the multiplier it uses in that context. I'm also going to discuss whether it's really appropriate and, if I had to choose, which one I would use. I've created three variables, one of which does not actually include an outlier but two of which might. Now I'm going to use the utility in SPSS to help me identify whether there are outliers in these three variables. To do that, I go to Analyze, Descriptive Statistics, and Explore. I've got a variable in there already because I did this analysis, so I'm just going to look at the first variable in the first instance. You don't have to click on any buttons; SPSS produces the result by default if you press OK. So I'm going to do that and...

Transcript: When you perform some specific statistical analyses, there are some criteria that should be met in order to get a really reliable result. For example, in the independent samples t-test or ANOVA, in one way or another, you should be sure that your data is normally distributed or that there are no outliers, or extreme outliers, in your data. In this video I'm going to show you how to detect and deal with outliers. The procedure for detecting outliers is as follows: click on Analyze, hover your mouse over Descriptive Statistics, and click on Explore. Now, again, let me just change this to display variable names so it's shorter. We want to check outliers in these two scales, competence in the argument, so we select both and add them to our dependent list. In this case we are not going to add a factor list; a factor list might be sex or gender, but we are not going to use it now. Additionally, we don't want statistics for this example, so we are just going to display plots. When I click here, notice...

Transcript: In this video we want to identify outliers in a set of data. If you are not sure what an outlier is, here is a definition. An outlier is an extremely high or extremely low value in the data set. Now, in addition to just being something extremely high or something low, you want to make sure that it satisfies the following criteria. To be an outlier, a value must be greater than Q3 + 1.5(Interquartile Range) or lower than Q1 - 1.5(Interquartile Range). This makes sure that it really is an extremely high or extremely low value. You can see, though, that you need to compute a few different things, like Q3, Q1, and the Interquartile Range, if we are going to properly identify one of these outliers. So let's look at some data and see how this works. In my data I have a chart of how many phone calls were received on any given day. So I have 10 phone calls on the first day, 12 phone calls on the second day, and so on and so forth. If I'm going to compute things like Q1 and Q3 and the Interquartile Range, it's probably a good idea to take all of...
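The interquartile range rule described in the transcripts above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact procedure used in either video: the daily call counts are hypothetical (the transcript only gives the first two values), `iqr_outliers` is a made-up helper name, and quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`. Other tools may use a different quartile convention and so produce slightly different fences.

```python
import statistics

def iqr_outliers(values, multiplier=1.5):
    """Return (outliers, (lower_fence, upper_fence)) using the IQR rule."""
    # Quartile conventions differ between tools; "inclusive" is one common choice.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return outliers, (lower_fence, upper_fence)

# Hypothetical phone calls received per day over ten days.
calls = [10, 12, 11, 14, 13, 12, 15, 11, 13, 45]

outliers, fences = iqr_outliers(calls)
print("fences:", fences)      # (7.5, 17.5) for this sample
print("outliers:", outliers)  # [45]
```

Raising `multiplier` makes the rule stricter, so only more extreme values are flagged.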

Transcript: Now let's look at some continuous variables using histograms and plots. For a basic histogram, you call the function hist with a variable of interest, in this case interest rate. You can use the arguments main and xlab for nicer labels. The frequencies for the variable of interest are shown on the y-axis. Here you can see that all loans had an interest rate over 5% and very few loans had an interest rate higher than 20%. Let's have a look at the histogram of annual income. We notice that we get a strange result here, with seemingly just one big bar. Storing the histogram in hist_income and using $breaks, we get information on the location of the histogram breaks. In order to get a clear idea of the data structure, you can change the number of breaks using the breaks argument such that you get a more intuitive plot. This can be done by choosing a number that seems more appropriate or by using a rule of thumb such as the square root of the number of observations in the data set. This results in a much longer vector of breaks; however, the result still...
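The video works in R, but the square-root rule of thumb for choosing the number of histogram bins is easy to sketch in plain Python. Everything below is an assumption made for illustration: the income figures are invented, and `histogram_counts` is a hypothetical helper that uses equal-width bins, not a reimplementation of R's hist.

```python
import math

def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical, so everything lands in the first bin.
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in values:
        # The maximum value would index one past the end, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical annual incomes in thousands, with one very large value.
incomes = [24, 30, 31, 35, 38, 40, 42, 45, 47, 50, 52, 55, 60, 65, 70, 104]

# Rule of thumb: number of bins is about the square root of the sample size.
bins = round(math.sqrt(len(incomes)))
counts = histogram_counts(incomes, bins)

print("bins:", bins)
print("counts:", counts)
```

A single extreme value stretches the range and leaves the upper bins nearly empty, which is the same effect that produces the "one big bar" histogram described in the transcript.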

Key quotes:

  • "The variance is computed by finding the difference between every data point and the mean, squaring them, summing them up and then taking the average of those numbers."
  • "Let’s look at an example that illustrates the difference between variance and standard deviation: Imagine a data set that contains centimeter values between 1 and 15, which results in a mean of 8."
  • "Squaring the difference between each data point and the mean and averaging the squares renders a variance of 18.67 (squared centimeters), while the standard deviation is 4.3 centimeters."
  • "In a normal distribution, approximately 34% of the data points are lying between the mean and one standard deviation above or below the mean."
  • "Since a normal distribution is symmetrical, 68% of the data points fall between one standard deviation above and one standard deviation below the mean."
  • "Note that a perfect normal distribution would have a skewness of zero because the mean equals the median."
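The figures in the quotes above can be checked with Python's standard library. This sketch assumes the example data set is the whole numbers 1 through 15 and that the quoted variance is the population variance (dividing by n rather than n - 1). Both assumptions are consistent with the quoted mean of 8 and variance of 18.67.

```python
import statistics

# Assumed data set: centimeter values 1, 2, ..., 15.
values = list(range(1, 16))

mean = statistics.mean(values)
# Population variance: average of squared differences from the mean.
variance = statistics.pvariance(values)
# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(values)

print(f"mean = {mean}")
print(f"variance = {variance:.2f} (squared centimeters)")
print(f"standard deviation = {std_dev:.2f} (centimeters)")
```

The standard deviation is reported in the original unit (centimeters), which is why it is usually easier to interpret than the variance.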

Key quotes:

  • "It measures the lack of symmetry in data distribution. It differentiates extreme values in one versus the other tail."
  • "There are two types of skewness: positive and negative. Positive skewness means the tail on the right side of the distribution is longer or fatter."
  • "If the peak of the distributed data was right of the average value, that would mean a negative skew."
  • "High kurtosis in a data set is an indicator that data has heavy tails or outliers."
  • "Low kurtosis in a data set is an indicator that data has light tails or lack of outliers."
  • "It means that the extreme values of the distribution are similar to that of a normal distribution characteristic."
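As a rough illustration of the quotes above, the following Python sketch computes skewness and excess kurtosis from central moments and shows how a single extreme value moves both. The two samples and the helper names are hypothetical. The formulas are the simple population (moment-based) versions, and statistical packages often apply small-sample corrections, so their numbers may differ slightly.

```python
def central_moment(values, k):
    """k-th central moment: average of (value - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skewness(values):
    """Positive when the right tail is longer, negative when the left tail is."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Kurtosis minus 3, so a normal distribution scores about zero."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical samples: a symmetric one, and the same with one extreme value added.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 5, 30]

for name, sample in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    print(f"{name}: skewness = {skewness(sample):.2f}, "
          f"excess kurtosis = {excess_kurtosis(sample):.2f}")
```

Adding the extreme value pushes the skewness positive and raises the kurtosis, which matches the description of heavy tails as a sign of outliers.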

Key quotes:

  • "Different features in the data set may have values in different ranges."
  • "In such a situation, applying statistical measures across this data set may not give desired result."
  • "Data transformation predominantly deals with normalizing (also known as scaling) data, handling skewness, and aggregation of attributes."
  • "Min-Max normalization: It is a simple way of scaling values in a column."
  • "Here is the formula. Converting it into R can be pretty simple, as follows. Let’s apply this normalization technique to the year attribute of our data set."
  • "Let us calculate the normalized values manually as well as using the scale() function."
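The quoted article works in R and its formula did not survive in the excerpt. As a hedged sketch, the standard min-max formula rescales each value to (x - min) / (max - min), which maps the column onto the range 0 to 1. The Python below assumes that standard definition; the `min_max_normalize` helper and the year values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical "year" column.
years = [2010, 2012, 2015, 2020]
normalized = min_max_normalize(years)

print(normalized)  # [0.0, 0.2, 0.5, 1.0]
```

Because the minimum and maximum define the scale, a single outlier can squeeze all other values into a narrow band, so it is worth checking for outliers before applying this transformation.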

Key quotes:

  • "We will also look into the outlier detection and treatment techniques while seeing their impact on different types of machine learning models."
  • "Many machine learning models, like linear & logistic regression, are easily impacted by the outliers in the training data."
  • "This can become an issue if that outlier is an error of some type, or if we want our model to generalize well and not care for extreme values."
  • "These extreme values need not necessarily impact the model performance or accuracy, but when they do they are called “Influential” points."
  • "With multiple predictors, extreme values may be particularly high or low for one or more predictors (univariate analysis — analysis of one variable at a time) or may be “unusual” combinations of predictor values (multivariate analysis). In the following figure, all the points on the right-hand side of the orange line are leverage points."
  • "It represents the number of standard deviations an observation is away from the mean. Here, we normally define outliers as points whose modulus of z-score is greater than a threshold value."
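The z-score approach in the last quote can be sketched as follows. The quote does not name a threshold, so the default of 3 used here is an assumption (it is a common convention), and the sample data and the `zscore_outliers` helper are hypothetical.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)  # population standard deviation
    if std_dev == 0:
        return []  # no spread, so nothing can be flagged
    # z-score: how many standard deviations a value lies from the mean.
    return [v for v in values if abs((v - mean) / std_dev) > threshold]

# Hypothetical measurements with one extreme value.
measurements = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

flagged = zscore_outliers(measurements)
print("flagged:", flagged)  # [95]
```

One caveat with this method is that the outlier itself inflates the mean and standard deviation used to judge it. In very small samples a z-score cannot exceed a fixed bound, so a threshold of 3 may never be reached and the interquartile range rule can be a more robust choice.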
