1. Generalities about Kaggle

What is Kaggle?

Kaggle is an Airbnb for data scientists: this is where they spend their nights and weekends. It's a crowd-sourced platform to attract, nurture, train, and challenge data scientists from all around the world to solve data science, machine learning, and predictive analytics problems. It has over 536,000 active members from 194 countries and receives close to 150,000 submissions per month. Founded in Melbourne, Australia, Kaggle moved to Silicon Valley in 2011, raised some 11 million dollars from the likes of Hal Varian (Chief Economist at Google), Max Levchin (PayPal), Index Ventures, and Khosla Ventures, and was ultimately acquired by Google in March 2017. Kaggle is the number one stop for data science enthusiasts all around the world who compete for prizes and boost their Kaggle rankings. To date, there are only 94 Kaggle Grandmasters in the world.

Did you know that many data scientists learn mostly theory and rarely get a chance to practice before being employed in the real world?

Kaggle solves this problem by giving data science enthusiasts a platform to interact and compete in solving real-life problems. The experience you get on Kaggle is invaluable in preparing you to understand what goes into finding feasible solutions for big data.

Kaggle enables data scientists and other developers to run machine learning contests, write and share code, and host datasets. The data science problems posted on Kaggle range from predicting cancer occurrence by examining patient records to analyzing the sentiment evoked by movie reviews and how it affects audience reaction.

Different sources post projects on this trailblazing platform. While some are just for educational purposes and fun brain exercises, others are genuine issues that companies are trying to solve. Kaggle makes the environment competitive by awarding prizes and rankings for winners and participants. The prizes are not only monetary but can also include attractive rewards such as jobs or free products from the company hosting the competition.

Monetary prizes are exciting to most Kagglers. For instance, Home Depot offered a winning prize of a whopping $40,000 in search of an algorithm to improve search results on homedepot.com. For most data science enthusiasts, however, this innovative website is not only a source of money but also an indispensable learning tool: it helps them gain experience and knowledge, sharpen their skills, and learn from their mistakes by resubmitting their code. It is the perfect platform to practice consistently.

The Kaggle community is growing fast. There are currently over one million Kaggle members (Kagglers). This data community has submitted more than four million machine learning models to different competitions. Kaggle users have shared over one thousand datasets, more than 170,000 forum posts, and over 250 kernels. According to the founder, this incredibly fast growth can be attributed to the high-quality content, data, and code shared by Kagglers.

Most Kaggle users are committed and active, hence the 4,000 forum posts per month and more than 3,500 competition submissions daily. This platform is the place to be for data scientists and machine learning engineers worldwide.

How Kaggle Works

Why use Kaggle?

The host of the competition is in charge of preparing the data and writing a detailed description of the problem at hand. To make this more convenient for hosts, Kaggle offers an additional consulting service that can help prepare the data and describe the problem in the best possible format.

Participants build their models with a variety of techniques and submit them to the competition. The work is shared on the platform through detailed Kaggle scripts, with the intention of inspiring new ideas and achieving better benchmarks. In most Kaggle competitions, submissions are scored immediately and summarised publicly on the live leaderboard.

Competitors are not limited to a single attempt at solving a problem. Until the deadline expires, they are allowed to revise their submissions as they see fit. This motivates competitors to keep innovating, be creative, and polish their skills to produce better, more elegant, and more effective solutions. Allowing revisions also raises the accuracy and precision of the final solutions.
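The scoring and resubmission loop described above can be sketched in a few lines of Python. This is only an illustration, not Kaggle's actual implementation: the Leaderboard class, the accuracy metric, and the team names are all hypothetical, and a real competition uses whatever metric its host chooses.

```python
from dataclasses import dataclass, field


def accuracy(truth, predictions):
    """Fraction of predictions that match the hidden ground truth."""
    if len(truth) != len(predictions):
        raise ValueError("submission must contain one prediction per row")
    correct = sum(t == p for t, p in zip(truth, predictions))
    return correct / len(truth)


@dataclass
class Leaderboard:
    """Hypothetical leaderboard: scores submissions, keeps each team's best."""
    truth: list
    best: dict = field(default_factory=dict)

    def submit(self, team, predictions):
        score = accuracy(self.truth, predictions)
        # A resubmission only replaces the team's entry if it scores higher.
        if score > self.best.get(team, float("-inf")):
            self.best[team] = score
        return score

    def ranking(self):
        # Highest score first; ties are broken alphabetically by team name.
        return sorted(self.best.items(), key=lambda item: (-item[1], item[0]))


board = Leaderboard(truth=[1, 0, 1, 1])
board.submit("team_a", [1, 0, 0, 0])  # 2 of 4 correct
board.submit("team_b", [1, 0, 1, 0])  # 3 of 4 correct
board.submit("team_a", [1, 0, 1, 1])  # revised entry, 4 of 4 correct
print(board.ranking())  # [('team_a', 1.0), ('team_b', 0.75)]
```

In this sketch a weaker resubmission never lowers a team's standing, which is one simple way to model the freedom to revise before the deadline.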

When the deadline for a competition expires, the host pays the prize money to the winner. In exchange, the host receives sole ownership of the intellectual property and a royalty-free license to use the winning entry in any way it wants.

How the Winner is Selected

The host screens participants based on their position on the leaderboard and on the content of the final scripts they submitted. Most hosts take the prerogative to reach out to strong contenders and arrange interviews.

Do Kaggle Projects Have Any Real Impact?

One of its biggest and most recognized projects is the one by Heritage Health, which offered a remarkable cash prize of $3 million. Competitions hosted on Kaggle have had far-reaching impacts, such as enabling and enhancing state-of-the-art HIV/AIDS research and improving traffic forecasting.

  • Several informative academic papers have been written and published on the basis of findings generated through Kaggle competitions. Essentially, Kaggle has given companies the opportunity to seek solutions from the best data scientists in the world and to have an external pair of eyes look at the problems they are trying to solve.
  • Interesting and challenging projects where contributors can learn and practice. Kaggle competitions involve solving challenging and interesting problems that companies post to numerous contributors. It is especially a great place for beginners who are just trying to break into the data science field. Aside from the competitions that are open to the general public, Kaggle also has private competitions that are open only to top-rated participants (Kaggle Masters).
  • Insightful discussions with industry leaders and learned experts. Apart from the projects, Kaggle also hosts live discussions among the many people on the platform. These forums are interesting, stimulating, and informative. Through these discussions, you can either seek advice from others or offer advice to people who are dealing with issues you understand.
  • A chance to join the biggest data science community in the world. The platform is trusted by some of the world's largest data science companies, such as Walmart, Facebook, and Winton Capital. On Kaggle, data scientists get exposure and a chance to work in real time on problems faced by big companies. While it is not a guarantee, there is always the chance that a company will be impressed enough to recruit you.

Kaggle characteristics and advantages

In summary, Kaggle allows users to:

  1. Find and publish data sets
  2. Explore and build models in a web-based data-science environment
  3. Work with other data scientists and machine learning engineers
  4. Enter competitions to solve data science challenges.

2. Kaggle in this project

The competition

In this particular case, the competition was launched directly on Kaggle. Beyond the reasons already mentioned, it was therefore clearly easier to code directly on Kaggle: the platform has all the resources needed to complete the competition at hand, and the community working on it is directly accessible there.


This data science platform is the brainchild of Anthony Goldbloom, a brilliant 28-year-old econometrics expert.

His objective was to bring large and open data to the masses through crowdsourcing.

According to Goldbloom, Kaggle has united data scientists and businesses in a meaningful way. His concept did not initially receive sufficient backing in Australia, so he decided to relocate to Silicon Valley in the United States.

At a recent tech conference, Goldbloom expressed his surprise at how much talent was available yet inaccessible to companies before the inception of Kaggle.

