Title: Building Machine Learning Projects

Course 3 looks relatively light, with only two weeks of content and no programming assignments, but the material is valuable: it helps you make better decisions and understand your project more deeply. Once your project is up and running, the crucial optimization work begins, and reaching your goal takes real effort. In that process there are many strategies to choose from, so it is well worth using engineering judgment to analyze them and pick the more reliable ones, rather than wasting a lot of time heading in the wrong direction. Here are some methods and strategies to help with those decisions.

1. A single-number evaluation metric can greatly speed up your optimization, because it lets you compare experiments at a glance and quickly decide which change helped.
2. Projects generally need both an optimizing metric (the performance you care about most) and one or more satisficing metrics (constraints such as latency or model size): optimize the former as far as possible while keeping the latter within their limits (see the first sketch after this list).
3. Split the data into training/dev/test sets. The dev and test sets do not need to follow the usual 70/20/10 (or similar) ratio; it is enough that they contain sufficient examples for reliable evaluation (roughly a thousand to tens of thousands). Because the training set needs as much data as possible, it is not strictly required to share the same distribution as the dev/test sets, but the dev set and the test set must come from the same distribution.
4. Deep learning is mostly supervised learning, and the best human performance is used as a proxy for the Bayes error to gauge how much room for improvement remains. Roughly: error ≈ Bayes error + avoidable bias, where the avoidable bias is the gap between the training error and human-level error.
5. When the training set has a different distribution, carve out part of it as a training-dev set. If the error rises sharply between the training-dev results and the dev/test results, that gap is a data-mismatch error caused by the distribution difference, and it can be corrected at the data level: analyze the distribution differences in depth and try to improve your training data, for example through artificial synthesis. The gap between the training-dev results and the training results can be read as variance error, i.e. overfitting, which can be attacked with the methods covered earlier (dropout, L2 regularization, batch norm, an improved network structure, etc.). Sometimes you also need to increase the size of the test set to get a better estimate. The second sketch after this list shows the arithmetic for this breakdown.
6. Sometimes you need to do manual error analysis. You usually do not need to fix incorrect labels in the training set (it is too large to fix by hand), but if the percentage of incorrect labels in the test set is high, you need to fix them manually and re-evaluate. Set up a table of the misclassified examples, tally which error categories have the highest impact and whether they can be acted on (optimized), and use that to decide what to do next (see the error-analysis tally after this list).
7. End-to-end deep learning systems vs. deep learning as part of a system. "End-to-end" simply means defining the whole problem as one deep learning problem and solving it directly with a single model. This can be a good solution when you have a large dataset and a model you trust, but in many cases deep learning is only one link in the system, and other processing steps are needed to help the whole system work better, such as image preprocessing or splitting a complex problem into two sub-problems.
8. Transfer learning and multi-task learning. Multi-task learning is not the same as softmax multi-class classification: it means producing multiple different outputs from the same input. Multi-task learning has relatively few applications, but it does appear in computer vision (for example, detecting several categories of objects in one picture simultaneously). Transfer learning lets us take advantage of already-built models and their parameters and move on to other tasks, either freezing some layers or retraining everything. It works when the pre-trained network captures features relevant to the new task, so it is worth experimenting; if the tasks are not related at all, it probably will not help (see the last sketch after this list).
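The snippet below is a minimal sketch of point 2: one optimizing metric (accuracy) and one satisficing metric (latency). The model names and numbers are made up for illustration.

```python
# Minimal sketch: pick the best model on the optimizing metric (accuracy)
# among those that satisfy the satisficing constraint (latency budget).
candidates = [
    {"name": "model_a", "accuracy": 0.92, "latency_ms": 80},
    {"name": "model_b", "accuracy": 0.95, "latency_ms": 140},
    {"name": "model_c", "accuracy": 0.94, "latency_ms": 95},
]

LATENCY_BUDGET_MS = 100  # hypothetical constraint: must answer within 100 ms

# Keep only the models that meet the constraint...
feasible = [m for m in candidates if m["latency_ms"] <= LATENCY_BUDGET_MS]
# ...then optimize the single-number metric among them.
best = max(feasible, key=lambda m: m["accuracy"])
print(best["name"])  # -> model_c
```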
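For point 5, the breakdown into avoidable bias, variance, and data mismatch is just subtraction between error rates measured on the different sets. The error numbers below are hypothetical.

```python
# Hypothetical error rates illustrating the decomposition described above.
human_error     = 0.01   # proxy for Bayes error
train_error     = 0.05
train_dev_error = 0.06   # same distribution as training, but not trained on
dev_error       = 0.10   # dev/test distribution

avoidable_bias = train_error - human_error        # 0.04 -> focus on bias reduction
variance       = train_dev_error - train_error    # 0.01 -> little overfitting
data_mismatch  = dev_error - train_dev_error      # 0.04 -> distributions differ

print(f"avoidable bias: {avoidable_bias:.2f}")
print(f"variance:       {variance:.2f}")
print(f"data mismatch:  {data_mismatch:.2f}")
```

In this made-up case the largest gaps are avoidable bias and data mismatch, so reducing bias and improving the training data's match to the dev/test distribution would be the priorities.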
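For point 6, an error-analysis table can be as simple as tagging each misclassified dev/test example while reviewing it and counting how often each tag appears. The tags and rows below are invented examples.

```python
from collections import Counter

# Hypothetical tags attached to misclassified examples during manual review;
# one example can carry several tags.
misclassified = [
    {"tags": ["blurry"]},
    {"tags": ["mislabeled"]},
    {"tags": ["blurry", "low_light"]},
    {"tags": ["low_light"]},
    # ... more rows filled in while reviewing examples
]

counts = Counter(tag for ex in misclassified for tag in ex["tags"])
total = len(misclassified)
for tag, n in counts.most_common():
    # The ceiling on improvement from fixing one category is its share of errors.
    print(f"{tag:12s} {n:3d}  ({100 * n / total:.0f}% of analysed errors)")
```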
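For point 8, here is a sketch of transfer learning assuming PyTorch and torchvision are available; the choice of a pre-trained ResNet-18 and a 5-class new task is arbitrary, not something prescribed by the course.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights (torchvision >= 0.13 'weights' API).
model = models.resnet18(weights="IMAGENET1K_V1")

# Option 1: freeze all pre-trained layers and train only a new head.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for the new task's 5 classes;
# the new layer is trainable by default.
model.fc = nn.Linear(model.fc.in_features, 5)

# Option 2 (fine-tuning): skip the freezing loop and retrain everything
# with a small learning rate when the new dataset is large enough.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Freezing makes sense when the new dataset is small and similar to the original one; full fine-tuning becomes attractive as the new dataset grows.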