A baseline is the most basic thing shared in a data competition.

It brings together ideas, methods, and content.

Maybe all that separates you from the top players is a baseline!

01

Project introduction

If you are new to data competitions, a baseline is more than a shared approach to one contest; it also summarizes the methods for a whole class of data problems. Our aim is to collect, organize, and share baseline solutions for a variety of competitions.

Why share baselines, you might ask, rather than the winners' code? First, compared with a winning solution, baseline code is relatively simple and easy to organize and learn from. Second, baseline code is more practical and concise, which makes it well suited to beginners.

www.kaggle.com/c/nfl-big-d…

On Kaggle, the shared baseline is generally the most popular and most-liked kernel in a competition. A baseline not only lowers the barrier to entry but also does a great deal to motivate competitors.

Since domestic competition platforms have no comparable sharing mechanism, we (A Shui and Fishman) plan to run a baseline-sharing project for domestic competitions at Datawhale. Our goal is to build the most complete collection of baselines and case studies for domestic competitions.

The initial build of our open-source Baseline project is complete:

Github.com/datawhalech…

02

Project content

We have compiled the common data competition platforms at home and abroad:

Foreign competition platforms:

    • Kaggle
    • DrivenData
    • CodaLab
    • CrowdAI
    • Kelvins
    • Signate
    • Analytics Vidhya

Domestic competition platforms:

    • Tianchi
    • Dianshi
    • JData
    • DataCastle
    • DataFountain
    • Biendata
    • Kesci
    • AI study club
    • Turing federal
    • AI Studio
    • FlyAI

We have also compiled a comprehensive set of baselines for domestic competitions. To help you learn more effectively, we group them into three typical competition types according to the kind of data involved:

  • Structured data competitions: tabular problems;
  • Computer Vision (CV) competitions: image problems;
  • Natural Language Processing (NLP) competitions: text problems.

Structured Data Competition:

  • White wine quality prediction
  • Gesture prediction from muscle activity signals
  • User retention rate of the Baidu Haokan app
  • kaggle-two-sigma-connect-rental-listing-inquiries
  • kaggle-allstate-claims-severity


Computer Vision (CV) Competition:

  • Chest X-ray pneumonia detection
  • CCF 2019: Video copyright detection algorithm
  • kaggle-quickdraw-doodle-recognition
  • TinyMind RMB Denomination & Serial Number Recognition Challenge


Natural Language Processing (NLP) Competition:

  • Smart Source & Institute of Computing – Internet Fake News Detection Challenge
  • New entity discovery in Internet finance
  • Relevance calculation model for technical demands and technical achievement projects
  • Sentiment analysis of Internet news
  • The 3rd Alibaba Cloud Security Algorithm Challenge

03

Project collaboration

A great open-source project depends on collaboration, and we hope you will take part in the sharing so that Baseline can help more people learn and grow.

To keep contributions organized and consistent, we have drawn up the following initial collaboration guidelines:

  1. Code is organized by competition, and each entry states the competition website, the data type, and the problem being solved;
  2. Code states its runtime environment and the minimum machine configuration, for example:
     • Operating system: Linux, 16GB memory
     • Python environment: Python 2/3
     • PyTorch version: 0.4.0
  3. Baseline code provides only runnable code and ideas. Please do not provide a result file that can be submitted directly;
  4. Code providers are responsible for the copyright and sharing rights of their code.
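
As a rough illustration of the environment details requested in the guidelines above, a contributor might generate them with a short script. This is only a sketch: the function name describe_environment and its default package list are hypothetical and not part of the project, and memory size is not detected here, so it would still need to be filled in by hand.

```python
import platform
from importlib import metadata


def describe_environment(packages=("torch",)):
    """Return a dict with the OS, the Python version, and package versions.

    `packages` is an illustrative default; list whatever the baseline needs.
    """
    info = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            # Reads the installed distribution's version without importing it.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the header of a baseline would give readers the operating system, Python version, and PyTorch version in one place.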

  • Project address (stars are welcome)