A summary of the machine learning / data mining / data analysis books I have read. Some are suitable for beginners and some for advanced readers. I have not sorted them strictly by level; I am summarizing first, so the list runs roughly from introductory to advanced. I have basically read every book listed below; otherwise I would not dare write about it, for fear of misleading people.

Data analysis

When I was an intern, I only knew Matlab. The company was small and there was no money for a licensed copy, so my manager asked me to learn R in two weeks. These are the books I read at that time.

1. R in Action

Comments: a good primer covering installation, getting started, basic statistical analysis, plotting commands, and common classification, regression, and dimensionality reduction methods.

Rating: five stars

2. Data Analysis: R Language in Practice

Evaluation: a data analysis book written specifically for R. It can be read after mastering the basics of R. It focuses on the basic methods of data analysis and introduces some common analysis techniques at a fairly basic level.

Rating: four and a half stars

3. Exploratory data analysis

Review: written by a foreign author, but the translation is really bad. The content also offers little of practical substance; for concepts such as quantiles and spread, you may as well go straight to a statistics textbook.

Rating: Three stars
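
For readers who have not met the quantile and spread concepts mentioned in this review, here is a minimal sketch using only Python's standard library. The sample data is made up for illustration and does not come from the book; the "inclusive" quantile method is just one of several common conventions.

```python
import statistics

# Hypothetical sample data, invented for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 21]

# Quartiles: the three cut points that split the sorted data into four parts.
# method="inclusive" treats the data as the whole population of interest.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Two simple measures of spread.
iqr = q3 - q1                      # interquartile range (middle 50% of the data)
value_range = max(data) - min(data)  # full range

print("quartiles:", q1, median, q3)
print("IQR:", iqr, "range:", value_range)
```

The interquartile range is less sensitive to extreme values than the full range, which is why exploratory analysis usually reports both.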

4. The Art of R Programming

Review: I stumbled across this great book in the library. It is good on data structures and performance tuning in R.

Recommended rating: four stars

5. Python for Data Analysis

The book is written by the author of pandas. To sum it up in one sentence: it is the manual for pandas, an essential package for data analysis in Python.

Recommended rating: four stars

Data mining/machine learning

4. R language data mining in the era of big data: R language in practice

Evaluation: this seems to belong to the same series as the R data analysis book above (item 2 in the data analysis section). Basically all the common data mining methods are introduced, with both theory and examples. Suitable for beginners.

Recommended rating: four stars

5. Data Mining: Concepts and Techniques

Evaluation: an introductory book, heavy on theory; it seems to be the data mining textbook for many graduate students. Very detailed. Professor Meng Xiaofeng's translation is decent; compared with the many very bad translations out there, it is acceptable.

Recommended rating: four stars

6. Machine Learning in Action

Comments: the examples are written in Python, so if you have no Python background you had better learn Python first. The book is almost entirely examples; the code is very detailed and easy to understand, and it can be downloaded from GitHub.

Rating: five stars

7. Programming Collective Intelligence

Evaluation: like Machine Learning in Action, this book is basically all examples. The translation is acceptable, and much better than that of the exploratory data analysis book! There is code you can practice with; if you truly master it, you can handle general data mining needs.

Rating: five stars

8. Statistical learning methods

Evaluation: written by Dr. Li Hang, this book gives fairly detailed mathematical derivations of the common machine learning algorithms and is very good for understanding their mathematical foundations. If you lack the mathematical background, you can first read books on mathematical analysis, advanced algebra, and convex optimization. Suitable for readers who already have some foundation.

Rating: five stars

9. Recommender system practice

If you don't know what a recommender system is, this book is worth a careful read. After reading it, you will basically understand the general framework and workflow of a recommender system. There are also some examples, but both the examples and the theory are shallow, so it is only suitable for beginners.

Recommended rating: four stars

10. Introduction to data mining

Evaluation: this was the undergraduate course textbook of a fellow intern. It is written by foreign authors who are also big names in the field, and it is very easy to understand and extremely detailed.

Recommended rating: four stars

I'll stop here for now. These are basically a few introductory books; some others are in my Evernote notes, and I'll summarize them later. Next time I'll write about Hadoop/Python/Spark books and some good papers.

# 4.12 update

11. Learning Spark

Evaluation: this is the Chinese edition of Learning Spark. It is a very simple book that introduces Spark's basic syntax and commands.

Rating: five stars

12. Advanced Analytics with Spark

Evaluation: there are few reviews on Douban, but after I bought it I found that it is quite good. It basically uses examples to explain classification, clustering, recommendation, and credit investigation. The explanations are quite detailed and it is quick to read. I think well of it.

Rating: five stars

13. Hadoop: The Definitive Guide

Evaluation: 7.8/10. Very thick, and it covers Hadoop in great depth, so it is not well suited to beginners. It is suitable for people working on data warehouses; those doing data mining can read Hadoop in actual combat first.

Rating: Three stars

14. Hadoop in actual combat

Evaluation: 7.0/10. The one I read was written by a professor in China; it is not the Chinese translation of "Hadoop in Action". It is quite shallow and suitable for beginners, but I feel that Hadoop in Action is the better book.

Recommended rating: three and a half stars

15. Hive Programming Guide

Evaluation: 7.4/10. It is about operating Hive. Honestly, if you only want to know how to operate Hive, you don't need to read this book; just search for a list of Hive commands. The book is better suited to people doing ETL, and if you are only getting started with data mining you can skip it for now. The book itself is very good, though.

Recommended rating: four stars

16. R Language and Website Analysis

Evaluation: 7.4/10. I just happened to read this book when I went to Guitu, but after a few chapters I found it very clear. The examples later in the book are also very good, so I bought the Kindle e-book on Amazon.

Recommended rating: four stars

17. R's Geek Ideal: Tools

Evaluation: 7.5/10. The author is Zhang Dan. I started out by following his blog, which is very clear, with well laid out steps. The latter part of this book is mainly about R performance and about combining R with databases, Hadoop, and Hive. Worth a look.

Recommended rating: four stars

18. MySQL Crash Course

Comments: 8.4/10. Not much to say: a must-read for getting started with MySQL, and a very thin booklet.

Rating: five stars

19. High Performance MySQL

Review: 8.7/10. A professional MySQL book, suitable for advanced readers, but the Chinese translation is very poor. Buy the English edition.

Recommended rating: two stars (and three for the English version)

20. Convex Optimization

Evaluation: 9.4/10. A very good and comprehensive textbook. It includes a lot of what I learned earlier in my numerical analysis course, and many machine learning concepts can be found in it, so it gives you a deeper understanding of machine learning rather than just applying a package.

Recommended rating: 5 stars!

21. Pattern Recognition and Machine Learning

Comments: 9.6/10. PRML is a classic machine learning textbook and is well worth reading! Someone has made a Chinese translation; if you need it, leave a comment and I will send the link ~

Rating: five stars

22. Statistical natural language processing

Evaluation: 8.8/10. An introduction to natural language processing. The book is very thick and largely conceptual, but it does not feel boring. The only drawback, probably because it is a classic textbook, is that it has few examples and reads a bit like a survey. If you want hands-on practice, have a look at a book on natural language processing written with Python, Natural Language Processing with Python.

Recommended rating: four stars

A few popular science books I recommend; read them in your spare time to build interest

1. From 0 to 1

2. Age of big data

3. Top of the wave

4. The beauty of mathematics

5. Top of the data

There are others that I can't remember for the moment; I'll update next time ~

# 4.19 update

23. Machine learning

Evaluation: I somehow forgot to include this book last time. It is Professor Zhou's new book, with detailed formula derivations and a Douban score of 9.2. Early in the book, Professor Zhou discusses how to evaluate the performance of algorithms and how to choose between them, which gives you a big-picture understanding of the basics of machine learning. When you study individual algorithms afterwards, you also have a general sense of the scenarios they apply to. The author's thinking is very clear. Strongly recommended!!

Rating: five stars