Comparison of Supervised and Unsupervised Fraud Detection

Photo by Erik Mclean on Unsplash

Introduction

Since Yelp’s early days, reviews have been one of the most important factors customers rely on to judge the quality and authenticity of a business. A local consumer review survey published last year shows that 90% of consumers used the internet to find a local business in the previous year, and 89% of 35–54-year-olds trust online reviews as much as personal recommendations. Although many Yelp listings have hundreds or thousands of reviews, many of those reviews can’t be trusted.

“Fake reviews can be devastating to a brand. Simply put, once shoppers suspect a company of having fake reviews, trust is in question. In an era of misinformation and fake news, brand integrity is essential to building consumer trust, which directly translates to profit.” …


Best Resources to Upscale Your Skills and Portfolio

Photo by Sincerely Media on Unsplash

There are thousands of free courses online. Thanks to the internet, we can now learn almost anything online. However, if you only watch all the videos, you may forget what you learned soon after you get the course certificate. The best way to learn machine learning, or anything else, is by working on a project. Besides, it’s always great to add another personal project to your GitHub repo. 🙌

If you’re new to data science, you probably don’t know where to start. Don’t worry! Everyone starts somewhere. I still remember myself working on the famous Titanic dataset several years ago. …


Top SQL Interview Questions You Should Know in 2021

Image Taken by Author

It’s almost the end of the year! I hope you all are studying hard and doing well. The beginning of the year (January and February) is usually the best time of the year to look for a job. So we need to be better prepared now and hit the ground running in 2021!

Tech companies usually store large amounts of data in relational databases, so they want to see whether candidates can extract and manipulate data with complex SQL queries before moving on to any modeling. SQL questions can be very tricky, so don’t underestimate SQL. And no matter how good you are with SQL, you might still find it challenging to write queries fast enough, especially under pressure. We all know how intimidating interviews can be; it’s OK if you didn’t pass your last interview. …


A Step-by-Step Guide to Accessing Spotify Data and Creating a Radar Chart

Image by Author

It’s almost the end of 2020! If you have been using Spotify for years, you probably know that at the end of each year, Spotify provides premium users with personalized insights, such as their favorite songs and artists and how much time they spent on the service. As a data scientist, I wanted to look at the audio features of all the songs in my Discover Weekly playlist and see which music features I enjoy the most based on my listening history on Spotify.
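The preview stops before any charting code, so here is only a rough sketch of the idea: average a handful of audio features over the playlist’s tracks, then turn the averages into the vertices of a radar chart. The feature list is an assumption (these particular Spotify audio features are reported on a 0 to 1 scale, so they can share one axis), the function names are hypothetical, and the track dictionaries are assumed to have already been fetched from the Web API.

```python
import math

# Assumed subset of Spotify audio features; all of these are on a 0-1 scale.
FEATURES = ["danceability", "energy", "speechiness", "acousticness", "liveness", "valence"]


def average_features(tracks, features=FEATURES):
    """Average each audio feature over a list of per-track feature dicts."""
    return [sum(track[f] for track in tracks) / len(tracks) for f in features]


def radar_points(values):
    """Return the closed (x, y) outline of a radar chart with one spoke per value."""
    n = len(values)
    points = []
    for k, r in enumerate(values):
        angle = 2 * math.pi * k / n  # spokes are evenly spaced around the circle
        points.append((r * math.cos(angle), r * math.sin(angle)))
    points.append(points[0])  # repeat the first vertex so the outline closes
    return points
```

Repeating the first vertex is the detail most often forgotten: without it, the last spoke is not connected back to the first and the polygon is left open.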

Setting Up Spotify’s Web API


GETTING STARTED

A Step-by-Step Guide to Setting Up a Cron Job

Photo by Possessed Photography on Unsplash

Introduction

Have you ever found yourself doing repetitive tasks on a regular basis? For example, you might delete temporary files every week to free up disk space, scrape a website every week to gather new information, or send recurring emails to the same group of people for “reminder” campaigns. If so, you might want to set up a cron job, which will automatically perform these tasks for you on a schedule.

Cron comes from “chron,” the Greek prefix for “time.” …
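Each cron job is driven by a schedule expression with five fields: minute, hour, day of month, month, and day of week. To make that syntax concrete, here is a minimal, hypothetical Python sketch that checks whether a given time matches an expression. It supports only a subset of the syntax (`*`, single numbers, `a-b` ranges, `*/n` and `a-b/n` steps, and comma lists) and does not handle month or weekday names, `7` as Sunday, or macros such as `@daily`. It follows the classic crontab rule that when both day fields are restricted, a match on either one is enough.

```python
from __future__ import annotations

from datetime import datetime

# Allowed values for: minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field such as "*/15", "1-5" or "0,30" into its allowed values."""
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_str = part.split("/")
            step = int(step_str)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_str, end_str = part.split("-")
            start, end = int(start_str), int(end_str)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) matches the five-field cron expression."""
    fields = expr.split()
    minute, hour, dom, month, dow = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    # cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    dom_ok = when.day in dom
    dow_ok = cron_dow in dow
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = dom_ok or dow_ok  # both day fields restricted: either may match
    else:
        day_ok = dom_ok and dow_ok
    return when.minute in minute and when.hour in hour and when.month in month and day_ok
```

For instance, the weekly cleanup mentioned above could be scheduled as `0 3 * * 0` (03:00 every Sunday), and `cron_matches("0 3 * * 0", datetime(2021, 1, 3, 3, 0))` returns True because January 3, 2021 was a Sunday.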


GETTING STARTED

With No Errors

Photo by Noah Boyer on Unsplash

If you are familiar with the job-hunting process, you have probably noticed that some companies like to use take-home assignments to determine whether a candidate is the right fit. Since most companies use SQL, they commonly want to see whether you can solve problems with it. However, not every company will provide a dataset to work with. A company might provide only a table schema, leaving you wondering whether your queries can actually run. In that case, importing a dataset into a database can be very helpful.

In this article, I will cover how to install MySQL Workbench and import data into it step by step. …
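If you only have a table schema and just want a quick sanity check that a query runs, one low-setup option is Python’s built-in `sqlite3` module with an in-memory database. This is a swapped-in technique, not the MySQL Workbench workflow described in the article, and SQLite’s SQL dialect differs from MySQL in places, so a passing query is only a rough check. The `orders` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema from a take-home prompt; replace it with the one you were given.
SCHEMA = """
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount      REAL NOT NULL
);
"""

# A few hand-made rows, just enough to exercise the query.
INSERT_SQL = "INSERT INTO orders VALUES (?, ?, ?)"
SAMPLE_ROWS = [(1, 101, 20.0), (2, 101, 35.5), (3, 102, 12.0)]

QUERY = """
SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC;
"""


def run_query_against_schema(schema, insert_sql, rows, query):
    """Build a throwaway in-memory database from the schema, load rows, run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.executemany(insert_sql, rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_query_against_schema(SCHEMA, INSERT_SQL, SAMPLE_ROWS, QUERY))
```

Because the database lives in memory, nothing needs to be installed or cleaned up, which makes it easy to rerun while iterating on a query.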


GETTING STARTED

A gentle introduction to imputation of missing values

Photo by Markus Winkler on Unsplash

The biggest challenge for data scientists is probably something that sounds mundane but is very important for any analysis: cleaning dirty data. When you think of dirty data, you are probably thinking of inaccurate or malformed data. But in truth, missing data is the most common form of dirty data. Imagine trying to do a customer segmentation analysis when 50% of the records have no address on file. The analysis would be hard or impossible to do, because the results would be biased, showing no customers in certain areas.

Explore Missing Data

  • How much data is missing? You can run a simple exploratory analysis to look at the frequency of your missing data. If it’s a small percentage, say 5% or less, and the data is missing completely at random, you could consider ignoring and deleting those cases. But keep in mind that it’s always better to analyze all of the data if possible, and dropping data can introduce bias. Therefore, it’s always better to check the distribution to see where the missing data are coming from. …
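Both checks in the bullet above, how much data is missing and where it comes from, can be sketched in plain Python over a list of records, with `None` standing in for a missing value. The column names `address` and `channel` in the usage are hypothetical. With pandas, the first check is usually a one-liner, `df.isna().mean()`.

```python
from collections import Counter


def missing_rate(rows, columns):
    """Fraction of rows in which each column is missing (None or absent)."""
    n = len(rows)
    return {col: sum(row.get(col) is None for row in rows) / n for col in columns}


def missing_by_group(rows, column, group_col):
    """Count rows with a missing `column`, broken down by the value of `group_col`."""
    return Counter(row.get(group_col) for row in rows if row.get(column) is None)
```

If `missing_by_group` shows the gaps piling up in one group (for example, one sign-up channel), the data is probably not missing completely at random, and simply deleting those rows would bias the analysis.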


PYTHON FOR PROBABILITY AND STATISTICS

Four Types of Sampling Methods All Data Scientists Must Know

Photo by Churrasqueira Martins on Kindpng

Why do we need Sampling?

Sampling is used when we want to draw conclusions about a population without observing all of it. The population is the complete collection of observations we want to study, and a sample is a subset of the target population. Here’s an example. A Gallup poll¹, conducted from July 15 to 31 last year, found that 42% of Americans approved of the way Donald Trump was handling his job as president. The results were based on telephone interviews with a random sample of about 4,500 calls (assuming one adult per call, about 4,500 adults) aged 18 and older, living in the U.S. The poll was conducted during a period of controversy over Trump’s social media comments. For this survey, the population is ALL the U.S …
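The preview ends before the four methods are named, so purely as an illustration, here is a sketch of two widely used designs: simple random sampling, where every member has the same chance of being selected (as in the Gallup example), and stratified sampling, where the same fraction is drawn from each subgroup. The function names and the idea of grouping by region are hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, key, fraction, seed=None):
    """Draw the same fraction from each stratum defined by key(member)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[key(member)].append(member)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # per-stratum size, rounded
        sample.extend(rng.sample(members, k))
    return sample
```

Stratifying guarantees that each subgroup appears in proportion to its size, which a simple random sample only achieves on average. Because each stratum’s size is rounded separately, the total can differ slightly from `fraction * len(population)`.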


What I’ve Learned as a First-Time Webinar Speaker

Photo by Nycholas Benaia on Unsplash

I recently spoke at a webinar about the Mentorship Effect hosted by Correlation One, a data and analytics training program sponsored by leading employers. Although this is not my first time speaking in front of a crowd, I had never been invited to be a panelist before. As a new grad, I found the idea of being one of the speakers intimidating but extremely exciting. I have attended many Data Science Meetup events where the speakers share their success stories. Who would have thought that one day I would be invited to share my experience as well! …


Probability and Statistics

The Most Common Discrete Probability Distributions Explained with Examples

Image by Author

Probability Distributions

A probability distribution is a mathematical function that describes the likelihood of each possible value of a random variable. A probability distribution may be either discrete or continuous. A discrete distribution is one in which the data can take on only certain (countable) values, while a continuous distribution is one in which the data can take on any value within a specified range (which may be infinite).
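In symbols (standard definitions, with our own notation), a discrete random variable X has a probability mass function that assigns a probability to each individual value, while a continuous one has a density f_X that only yields probabilities when integrated over an interval:

```latex
\text{Discrete:}\quad p_X(x) = P(X = x), \qquad p_X(x) \ge 0, \qquad \sum_{x} p_X(x) = 1

\text{Continuous:}\quad f_X(x) \ge 0, \qquad \int_{-\infty}^{\infty} f_X(x)\,dx = 1, \qquad P(a \le X \le b) = \int_{a}^{b} f_X(x)\,dx
```

One consequence is that for a continuous variable P(X = x) = 0 for any single value x, which is why the two cases are handled differently.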

There are a variety of discrete probability distributions, and which one to use depends on the properties of your data. For example, use the:

  • Binomial distribution to calculate probabilities for a process where only one of two possible outcomes may occur on each trial, such as coin tosses. …
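For the binomial case, the probability of exactly k successes in n independent trials, each succeeding with probability p, is C(n, k) p^k (1 − p)^(n − k). A minimal sketch using only the standard library (the function name is our own):

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)
```

For example, the probability of exactly 3 heads in 5 fair coin tosses is `binomial_pmf(3, 5, 0.5)`, which is 10 × (1/2)^5 = 0.3125.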

About

👩🏻‍💻 Kessie Zhang

I’m passionate about the possibilities that Data Science can enable. I write about what I’ve learned. Never stop learning because life never stops teaching.❤️
