Have you ever found yourself performing the same tasks over and over? For example, deleting temporary files every week to conserve disk space, scraping a site regularly to gather new data, or sending recurring "reminder" emails to the same set of people. If so, you might want to set up a cron job, which will run those tasks for you automatically on a schedule.

Cron comes from "chron," the Greek root for "time." …
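As a sketch of the first task mentioned above, deleting stale temporary files, here is a small Python script you might schedule with cron. The directory path and the seven-day cutoff are assumptions for illustration:

```python
import os
import time

def delete_old_files(directory, max_age_days=7):
    """Delete regular files in `directory` older than `max_age_days`; return their names."""
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds in a day
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Only remove regular files whose modification time is before the cutoff
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

To run it weekly, you could save it as a script (say, `cleanup.py`, a hypothetical name) and add a crontab entry such as `0 3 * * 0 python3 /path/to/cleanup.py`, which runs every Sunday at 3 a.m.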

If you are familiar with the job-hunting process, you have probably noticed that some companies like to use take-home assignments to determine whether a candidate is the right fit. Since most companies use SQL, it's common for them to check whether you can solve problems with it. However, not every company will provide a dataset to work with. A company might only provide a table schema, leaving you wondering whether your queries would actually run. In that case, importing a dataset into a database of your own can be very helpful.

In this article, I will cover how to install MySQL Workbench and import data into MySQL Workbench step by step. …
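The article walks through MySQL Workbench's GUI import wizard; as a rough sketch of what such an import does under the hood, here is an example using Python's built-in sqlite3 module as a stand-in for MySQL. The table name, columns, and CSV contents are invented for illustration:

```python
import csv
import io
import sqlite3

# A tiny CSV, standing in for a dataset file you might download
csv_text = """id,name,city
1,Alice,Seattle
2,Bob,Austin
"""

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")

# Parse the CSV and bulk-insert the rows with parameter placeholders
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(r["id"], r["name"], r["city"]) for r in reader]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Once the data is in a table, you can run your take-home queries against it instead of guessing from the schema alone.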

Since Yelp's early days, reviews have been one of the most important factors customers rely on to judge the quality and authenticity of a business. A local consumer review survey published last year shows that 90% of consumers used the internet to find a local business in the previous year, and 89% of 35–54-year-olds trust online reviews as much as personal recommendations. Although Yelp's listings often have hundreds or thousands of reviews, many of those reviews can't be trusted.

"Fake reviews can be devastating to a brand. Simply put, once shoppers suspect a company of having fake reviews, trust is in question. In an era of misinformation and fake news, brand integrity is essential to building consumer trust, which directly translates to profit." …

The biggest challenge for data scientists is probably something that sounds mundane but is crucial for any analysis: cleaning dirty data. When you think of dirty data, you probably picture inaccurate or malformed records. In truth, missing data is the most common form of dirty data. Imagine trying to run a customer segmentation analysis when 50% of the records have no address: the analysis would be difficult or impossible, and biased toward showing no customers in certain areas.

**How much data is missing?** You can run a simple exploratory analysis to look at the frequency of your missing data. If it's a small percentage, say 5% or less, and the data is missing completely at random, you could consider ignoring or deleting those cases. But keep in mind that it's always better to analyze all the data if possible, since dropping data can introduce bias. Therefore, always check the distribution to see where the missing data is coming from. …
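That frequency check can be sketched in plain Python over a list of records; the field names and sample records below are invented for illustration:

```python
def missing_rates(records, fields):
    """Return the fraction of records where each field is None or empty."""
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = missing / len(records)
    return rates

# Toy customer records: half have no usable address
customers = [
    {"name": "Ann", "address": "12 Oak St"},
    {"name": "Ben", "address": None},
    {"name": "Cal", "address": ""},
    {"name": "Dee", "address": "9 Elm Ave"},
]
rates = missing_rates(customers, ["name", "address"])
# address is missing in 2 of 4 records, i.e. a rate of 0.5
```

A 50% missing rate on a key field, as in the segmentation example above, is a clear signal that you cannot simply drop the incomplete cases.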

Sampling is used when we try to draw a conclusion without observing the entire population. **Population** refers to the complete collection of observations we want to study, and a **sample** is a subset of that target population. Here's an example. A Gallup poll¹, conducted between July 15 and 31 last year, found that 42% of Americans approve of the way Donald Trump is handling his job as president. The results were based on telephone interviews with a random sample of roughly 4,500 calls (assuming one adult per call, about 4,500 adults), aged 18 and older, living in the U.S. The poll was conducted during a period of controversy over Trump's social media comments. For this survey, the population is ALL the U.S …
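For a proportion estimated from a simple random sample, the standard 95% margin of error is 1.96·sqrt(p(1−p)/n). Plugging in the poll's figures (p = 0.42, n ≈ 4,500, and treating it as a simple random sample for the sake of the sketch) gives roughly ±1.4 percentage points:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.42, 4500)
# about 0.0144, i.e. roughly +/- 1.4 percentage points
```

So the 42% approval figure, under these assumptions, would come with an uncertainty of about plus or minus 1.4 points.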

I recently spoke at a webinar about the Mentorship Effect hosted by Correlation One, a data and analytics training program sponsored by leading employers. Although this was not my first time speaking in front of a crowd, I had never been invited to be a panelist before. The idea of being one of the speakers as a new grad sounded totally intimidating but extremely exciting. I have attended many Data Science Meetup events where the speakers share their success stories. Who would have thought that one day I would be invited to share my experience as well! …

A probability distribution is a mathematical function that describes the likelihood of obtaining the possible values for an event. A probability distribution may be either discrete or continuous. A discrete distribution is one in which the data can only take on certain values, while a continuous distribution is one in which data can take on any value within a specified range (which may be infinite).

There are a variety of discrete probability distributions. Which one to use depends on the properties of your data. For example, use the:

- Binomial distribution to calculate probabilities for a process where only one of two possible outcomes may occur on each trial, such as coin tosses. …
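To make the coin-toss case concrete: the binomial probability of exactly k successes in n trials with success probability p is C(n, k)·p^k·(1−p)^(n−k). A direct computation for 5 heads in 10 fair coin tosses:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_five_heads = binomial_pmf(5, 10, 0.5)
# C(10, 5) / 2**10 = 252 / 1024 = 0.24609375
```

Even the most likely single outcome, five heads, occurs less than a quarter of the time, which is why the full distribution matters more than any one point.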

As data analysts or data scientists, we need to know not only probability and statistics, machine learning algorithms, and coding, but, most importantly, how to use these techniques to solve business problems. Most of the time, you will be given a 30–45 minute interview with a single data scientist or hiring manager in which you'll answer a multifaceted business problem that's likely related to the organization's daily work.

When I first started preparing for the case study interview, I didn't know there were different types of case studies. The fastest way to become proficient is to learn the frameworks for solving each kind. A case study interview helps interviewers evaluate whether a candidate would be a good fit for the position. Sometimes they might even ask you a question they actually encountered. …

This blog is a continuation of my previous work¹, in which I talked about how I gathered product reviews and information through web scraping. I will now explain how I built the product recommendation system.

The goals of this project were to:

- Gather product information and review data from BackCountry.com through web scraping with Selenium and BeautifulSoup (Part I)
- Perform an exploratory data analysis using the ScoreFast™ platform
- Convert text data into vectors
- Build a KNN predictive model to find the most similar products
- Run a Sentiment Analysis on product reviews
- Use each review's sentiment score to predict its review's…
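The vectorize-then-KNN steps above can be sketched in plain Python with a simple bag-of-words count vector and cosine similarity; the toy catalog below is invented for illustration (the actual project used the ScoreFast™ platform):

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counter vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_products(query, descriptions, k=2):
    """Return the k product names whose descriptions are most similar to `query`."""
    q_vec = Counter(query.lower().split())
    scored = [
        (cosine_similarity(q_vec, Counter(text.lower().split())), name)
        for name, text in descriptions.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# Toy product catalog (invented)
catalog = {
    "down jacket": "warm insulated down jacket for winter hiking",
    "rain shell": "lightweight waterproof rain shell jacket",
    "trail shoes": "grippy trail running shoes for rocky terrain",
}
```

In a production system you would typically use TF-IDF weighting rather than raw counts so common words don't dominate, but the nearest-neighbor idea is the same.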

Today, if we think of the most successful and widespread applications of machine learning in business, recommender systems are among the first examples that come to mind. Each time you purchase something online, you might see a "products you might also like" section. Recommender systems help users discover items they might like but have not yet found, which helps companies maximize revenue through upselling and cross-selling. As a Data Science Intern at ScoreData, I took the opportunity to build a recommendation model and analyze data on ScoreData's ML platform (ScoreFast™). Since we don't have customers' purchase history from any e-commerce website, I decided to build a content-based recommendation system using product descriptions and reviews. …
