In the previous article I talked about why it matters to read source code the way you read a book; today I’ll talk about how to read a body of code. By “a body of code” I mean anything from a few thousand lines to tens of thousands, sometimes even hundreds of thousands of lines, that works together like an organism to perform some important function. A few famous examples are Rails, Django and Phoenix:



The comparison is interesting: Rails and Django are in the 200K range, while Phoenix is an order of magnitude smaller at around 20K. Can differences in language expressiveness really be that large for roughly the same functionality? That doesn’t make sense. In fact, what Phoenix implements corresponds to ActionPack/ActionView in Rails, which is about 80K of code. The ActiveModel/ActiveRecord bundled with Rails should instead be benchmarked against Elixir’s Ecto, which again comes out at 80K versus 20K. The gap reflects differences in language expressiveness as well as differences in framework maturity.

A codebase of a few tens of thousands of lines is like a book of moderate size: neither too thin nor too thick, and just right for reading.

Some codebases, like the Linux kernel, are “War and Peace” in size: hopelessly large and beyond what one person can take in. Every time we resolve to read them, the energy we pour in vanishes without a trace, like a stone thrown into a lake.

Don’t despair if you can’t read it all. Set a small goal and read one part at a time, such as the scheduler (20K) or memory management (80K).

Once you’ve settled on a reasonable amount of code and the source to read, clear your desk, set up your Mac, get pen and paper ready, and set aside anywhere from an hour to half a day to start swimming in code.

Because the previous article (Why should we read the source code?) compared reading code to reading a book from cover to cover, many people can’t help wondering whether this article will mirror “How to Read a Book” and offer the same ideas: basic (elementary) reading, review (inspectional) reading, analytical reading, and comparative (syntopical) reading. Yes, these reading methods are all useful for reading code. For example, review reading maps well onto reading a large code project:

  • First read the README in the root directory, and any other file that seems to be inviting you to open it. This is like the preface of a book, and it helps you understand what the code is for;

  • Next, look at the directory structure and the name of each source file. They are like a book’s table of contents. If a directory has its own README, skim that too. Many languages and frameworks follow established conventions, so the directory layout quickly tells you which parts can be skipped, such as Django’s management/commands directory or Elixir’s mix/tasks directory. These hold side stories that you can read when you need them or when you’re bored;

  • Then start from the entry point and trace the main line. Different languages and frameworks have different entry points, such as main() for C, app:start for Erlang/Elixir OTP, or the index.js at the root of a Node.js project. Generally speaking, the main line of an application is clear and leads to a main loop, while the main line of a framework is more obscure, because a framework is usually an abstraction of the parts that applications have in common.

But this article won’t make too many comparisons — if you’re interested, why not read the book yourself (and re-read it as you read code)? It is, after all, more complete and systematic.

I want to talk about how to read from another angle: the scenario in which you read. Mention scenarios and many people think of a well-known book, Linux Kernel Scenario Analysis. Its authors clearly grasped the essence of reading code: pick up a thread and follow it end to end in a self-contained way. In different scenarios, what we already know, what we don’t know, and what we want to get out of reading all differ, so the methods differ too. It is like reading books: to become wise, read history; to become witty, read poetry; to become rigorous, study mathematics; to become profound, study philosophy; and so on.

Next, this article walks through a few code-reading scenarios and shares some personal experience from each.

Scenario 1: Read the code to solve the crime

This is the most common code-reading scenario. At work we have to use all kinds of open source systems (other people’s code), and we run into all kinds of weird problems. They may come from misunderstanding the documentation, from copying an example off the Internet that doesn’t fit our use case, or from actually hitting a bug. When your coworkers can’t help, when Google and StackOverflow can’t help, and when nobody on the forums answers your “urgent, waiting online” post, you start to panic. You feel as if a Dementor has latched onto you and all the good things in life are slipping away.

At this point, like a CSI detective, you have to start dissecting the code to cut through the fog. You tune out all the noise, pick up a lead, follow the problem, and read only the code directly related to solving it. I call this state “hunter mode”: like a hunter in Africa who has singled one lion out of the pride and is chasing it, you concentrate all your energy, keep your eyes locked on the prey, run like the wind, and keep calculating whether a throat lock or a blow to the face gives you the better odds. The stones on the road hurt your feet, and stumbling over some odd creature is annoying, but none of that matters. Even if distant Mount Kilimanjaro is crowned with a double rainbow so beautiful that a selfie would earn hundreds of likes from your friends, you have no time for such details.

Focus: attack one point, and only that point. That is the main way to read code in this scenario.

Take, for example, an nginx cache problem I ran into. A year ago, when I took over Tubi TV’s slow and hard-to-maintain API system, I decided to rewrite it eventually, but the immediate problem was performance. There wasn’t much room left in the application layer (the data was already in Redis), so I had to look at the web layer. Given the choice between HAProxy and nginx cache, I chose the latter because nginx was already heavily used in production at the time. I had never used nginx cache before, but enabling it is not difficult: after setting the cache path and size according to the documentation, I set the cache key in each location that needed caching and turned it on. My simple local test worked fine. In production, however, requests that should have been hits kept coming back as misses. I was at a loss. I tried all kinds of fixes found on the Internet, to no avail. Eventually I decided to build nginx myself with debugging enabled (--with-debug), log more, and look for the problem with the source code in hand.

The cache-related code in nginx’s cache and upstream modules is not large, only a few thousand lines. I went through it quickly, used the log output to locate the relevant processing path, and speculated about what could lead into each of the major bail-out branches. Because the nginx debug log was still not detailed enough for my needs, I added debug code to each branch the log did not cover, recompiled, and ran it again.

“Guessing” plays a big role in this process. I remember my undergraduate math teacher, a lovely little old man, who was fond of saying, “Guess!” He always said that even Monte Carlo is a way to solve problems, and that great mathematicians are also great guessers.

When we read code, we guess the intention behind a file name, a function name, or a variable name; we guess the intention of a branch or of a block of code; and finally we verify our guesses against the results of running the code and the debug output we print. It’s an interesting cat-and-mouse game between reader and author. The more you read, guess, and verify, forming an effective feedback loop (read, guess, verify), the better your chances of succeeding next time.

In the end I found the problem: two or three configuration mistakes. The answer on StackOverflow was partially correct and solves most people’s problem: almost every first-time user overlooks the need to ignore the Cache-Control-related headers, and that was one of my configuration mistakes. The reason the answer didn’t fully solve my problem was that nginx in our production environment carried an obscure setting that disabled proxy buffering, which caused nginx to skip the cache.

Stepping back from the process above, here is what to pay attention to when reading code to solve a crime:

  1. Pick up a clue and find, in the pile of code, the parts relevant to the problem. In the nginx cache case, the clue is the proxy upstream: the cache never hits, so the offending code must be related to the cache, the proxy, and the upstream. Look through the source directory and pick out the files you need to read. Because the problem is in the cache, within those files look specifically at cache-related function names, macro names, and code.

  2. Focus on the selected content and ignore irrelevant noise. As you read, look for the paths that could trigger the problem, and use the “guess, guess, guess” method to decide where to add debugging output.

  3. Compile and run the modified code, reproduce the problem, and analyze the debug output. If it’s bingo, congratulations! Please proceed to step 5.

  4. If your guess was wrong, return to step 1. Don’t worry, this is not a college entrance exam: you always get another try, until your boss gets fed up and fires you.

There’s also a crucial step 5, which I’m singling out. Most of the time, after going round the loop several times and finally hitting bingo at step 3, we celebrate like Chun-Li after a K.O., jumping up with both hands in the air, so pleased with ourselves that we forget to carry out step 5.

Joy is short-lived, and so is memory. Your goal was clear, your execution was strong, and you did whatever it took to get there. Three days later the boss asks: “Cheng, great job. How did you conquer that incredibly nasty bug?” You rack your brain, but it is like drawing water with a sieve: after a long struggle you come up with nothing. You begin to doubt life: am I the same person I was three days ago?

So the crucial step 5 is: review. Once you’ve solved the problem, don’t rush off to accept your colleagues’ thanks and admiring glances. While the memory is still warm, open Evernote (or XXX) and write down the entire process in the simplest way possible: the key code, the critical path, the whole chain of guesses that led to the answer, and the logs that confirmed them. Keep the logs that proved your guesses wrong as well (with luck, your console or terminal is still open at this point), as a running record of every trick you tried along the way. Finally, analyze and summarize:

  • What is the root cause of this problem? What is the flow of the code that triggers it?

  • What did I guess right and what didn’t I guess right when I read the code?

  • What parts of the code are worth perusing if you have the time?

  • The next time a similar problem occurs, how can I locate the problem faster in the source code?

In this “solving” code reading process, if there is no review, 70% of your effort is wasted — you spend a lot of time, read a lot of code, and get nothing but a good result. Unfortunately, we skip this step in most work scenarios. So am I. While writing this chapter, I searched my Evernote and looked through my mail, and there was no record of anything other than a brief email from early last year explaining how I had enabled nginx cache. Fortunately, I had read some nginx cache code after solving the problem, so I was able to use it as an example.

A review helps this information settle, gives you a chance to re-examine and organize it, and solidifies it into knowledge, as described in the previous article. As such notes accumulate, a halo slowly forms over your head, and on it is proudly written: expert.

Scenario 2: Read the code for clarity

Scenario 1 describes a passive way of reading code. Reading to solve a problem is the only way most people improve their coding ability. It’s like playing Heroes as someone who only holds the castle, collects from the mines, and recruits troops, never goes out to pick a fight with monsters, and just waits for the enemy to attack. Three months in, you have fought a few battles, but your experience points have grown far too slowly.

What if you want to level up faster? Make the first move! Many algorithms, fundamentals, and theories in computing are only half understood after reading books and articles. At that point, reading code is the fastest way to consolidate and deepen your understanding:

  • Algorithms: How is a Bloom filter implemented? How would you use a bandit algorithm to make simple recommendations on your own system? What does production osSIP code actually look like? How is an O(1) scheduler implemented?

  • Fundamentals: How do you implement a complete REST API framework that covers the HTTP 1.1 protocol? How does a packet travel from the OS driver all the way up to the application? What is zero copy, and how does the Linux kernel implement it?

  • Theory: What are IoC, DI, and Pub/Sub? How do various frameworks implement these design patterns? What is the implementation behind the supervisor behaviour?
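As one concrete answer to the first question in the list, here is a minimal Bloom filter sketch in Python. The class name, the default sizes, and the double-hashing scheme built on SHA-256 are illustrative assumptions, not taken from any particular library.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Derive k bit positions from two slices of one SHA-256 digest
        # (double hashing: h1 + i * h2).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Reading a production implementation after a sketch like this makes the real questions stand out: how the bit array is sized, which hash functions are used, and how the false-positive rate is controlled.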

This process is a positive feedback loop, a cumulative Matthew effect. The more books you read, the more knowledge you have, and the more questions arise. Those questions push you to read the relevant code to confirm and clarify your doubts. When you have read a lot of code, you feel your theory is lacking, so you go back to the books, and the cycle keeps going. Conversely, if you read little, no question marks form in your head, and you feel no need to read code to find answers.

Take the REST API framework as an example. Two years ago, when I was working on web security at Juniper, I needed to build a solid API system. As we know, networking vendors work far more rigorously (and therefore far more slowly) than Internet companies, so I spent some time reading RFC 2616 and its later revisions (RFC 7230-7235). The next step was to choose a suitable API framework. I was into Clojure at the time, so I picked Liberator. Liberator was inspired by Erlang’s Webmachine and implements the decision tree elegantly with simple macros. Later I also skimmed Webmachine’s decision tree: pattern matching plus recursion, very beautiful. Unfortunately, I was on a team with a rigid mindset that only had room for Python, so in the end I chose Eve, a REST API framework for Python. Eve’s code is conventional and plain; it looks just like code you or I would write.

Let me say more about how I read Webmachine. I was led to it entirely by Liberator: its author said that his decision tree came from Webmachine and attached the diagram. I felt like someone who had just practiced Bruce Lee’s Jeet Kune Do and then heard that it was derived from Wing Chun: I was suddenly itching to find out what Wing Chun was really like.

Webmachine’s code is short, just 4,700 lines. Going by the file names, you quickly find webmachine_decision_core.erl, about 800 lines, which is the main thing to read. Those 800 lines fall into three parts: the first 150 lines are the scaffolding of the decision tree; the middle 300-plus lines implement the individual decisions; and the remaining 200 or so lines are helper functions.

The flow of each decision is shown below:

Each atom is named after its position in the diagram. For example, the atom v3b13 is the decision node at column B, row 13 of the v3 diagram. It is the first decision: if the service is available, the flow continues downward; otherwise 503 Service Unavailable is returned.
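To make the shape of such a decision flow concrete, here is a small Python sketch of a table-driven decision graph. It is not Webmachine’s Erlang code: only the node name v3b13 and its 503 outcome come from the description above, while the other two nodes, their tests, the request dictionary keys, and the final 200 shortcut are simplifying assumptions.

```python
from typing import Callable, Dict, Tuple, Union

# A decision outcome is either the name of the next node or a final HTTP status.
Outcome = Union[str, int]
# Each node: (test, outcome if the test is true, outcome if it is false).
Node = Tuple[Callable[[dict], bool], Outcome, Outcome]

# Hypothetical three-node excerpt; a real diagram has many more nodes.
DECISIONS: Dict[str, Node] = {
    "v3b13": (lambda req: req.get("service_available", True), "v3b12", 503),
    "v3b12": (lambda req: req["method"] in {"GET", "HEAD", "PUT"}, "v3b11", 501),
    "v3b11": (lambda req: len(req["path"]) <= 2048, 200, 414),
}


def run(request: dict, start: str = "v3b13") -> int:
    """Walk the decision graph until a node yields a status code."""
    outcome: Outcome = start
    while isinstance(outcome, str):
        test, if_true, if_false = DECISIONS[outcome]
        outcome = if_true if test(request) else if_false
    return outcome
```

For example, `run({"method": "GET", "path": "/"})` walks all three nodes, while a request with `"service_available": False` stops at the first node with 503.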

With this in mind, the execution flow is easy to follow. Read the decision node code one node at a time, and RFC 2616 comes alive before your eyes. Let’s look at one more example, the ETag check, which is v3g11:

This code reads the list of ETags from the If-Match header of the HTTP request, then calls generate_etag via resource_call and checks whether the generated ETag matches any item in that list. If it matches, the flow jumps to v3h10; otherwise 412 Precondition Failed is returned. How does Webmachine know how to generate an ETag? This is where the framework earns its keep: it extracts and implements the common parts of the protocol and leaves the business logic to the applications that use it. In other words, generate_etag is a callback that the application implements. This is IoC.
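The same check can be sketched in Python to show where the inversion of control sits. This is a simplified illustration, not a port of Webmachine: the function names are hypothetical, the header parser ignores weak validators and the `*` wildcard, and it would mis-split entity tags that contain commas.

```python
from typing import Callable, List, Union


def split_quoted_strings(header: str) -> List[str]:
    """Simplified parser: split on commas, then strip whitespace and quotes."""
    return [part.strip().strip('"') for part in header.split(",") if part.strip()]


def decide_if_match(if_match: str,
                    generate_etag: Callable[[], str]) -> Union[str, int]:
    """Return the next node name on a match, or 412 Precondition Failed."""
    etags = split_quoted_strings(if_match)
    current = generate_etag()  # callback supplied by the application (IoC)
    return "v3h10" if current in etags else 412
```

The framework owns the protocol logic in `decide_if_match`; the application only supplies `generate_etag`, for example `decide_if_match('"abc", "def"', lambda: "def")`.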

This code should be obvious to anyone who understands the role of the ETag in HTTP or is familiar with concurrency control. But I’m sure many people will struggle to see what it is for, so let me explain further. Suppose Xiao Ming and Xiao Hong are the two administrators of Program Life. Through the API they both fetch version v1 of its basic information (name, description, and so on) at the same time. Xiao Ming changes the name to “programmer life” and calls the PUT API, successfully updating the data to v2. Xiao Hong edits the data at the same time, but her change is based on the original v1, so when she submits it, Xiao Ming’s change is overwritten. This is the classic concurrency control problem: a race condition. What should we do (think about how you normally handle it)? HTTP’s answer is that Ming and Hong each receive a version identifier for the data (think of it as the SHA-1 of the data), called the ETag. After Xiao Ming’s update, the ETag of the data changes. When Xiao Hong submits with the old ETag, the server compares it with the current ETag, finds a mismatch, and returns 412. This is a simplified optimistic lock.
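The Xiao Ming and Xiao Hong scenario can be replayed with a few lines of Python. This is a minimal in-memory sketch under stated assumptions: the `Store` class and its methods are hypothetical, the ETag is taken to be the SHA-1 of the JSON-encoded data, and a real server would also have to make the compare-and-write step atomic.

```python
import hashlib
import json
from typing import Tuple


class Store:
    """In-memory resource guarded by an ETag (optimistic concurrency)."""

    def __init__(self, data: dict) -> None:
        self.data = data

    def etag(self) -> str:
        # Version identifier: SHA-1 of the canonical JSON form of the data.
        payload = json.dumps(self.data, sort_keys=True).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()

    def get(self) -> Tuple[dict, str]:
        return dict(self.data), self.etag()

    def put(self, new_data: dict, if_match: str) -> int:
        # Reject the write if the caller's version is stale.
        if if_match != self.etag():
            return 412  # Precondition Failed
        self.data = dict(new_data)
        return 200


store = Store({"name": "Program Life"})
_, ming_tag = store.get()   # both administrators read v1
_, hong_tag = store.get()
ming_status = store.put({"name": "programmer life"}, ming_tag)  # accepted
hong_status = store.put({"name": "coder life"}, hong_tag)       # stale ETag
```

Here `ming_status` is 200 and `hong_status` is 412, so Xiao Hong has to fetch the new version and redo her edit instead of silently overwriting Xiao Ming’s.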

After rambling this much, I only want to say one thing: being able to read code is not the same as understanding what it is for. But once you really understand it, your own code gets better. The next time you update a shared object in a concurrent environment, a question will pop into your head: why not skip the lock and consider something like If-Match instead?

Back to the point. Up to here, I have been trying to understand the author’s intention. Once I’m satisfied on that front, I usually ask:

  • Is there anything about this code that can be optimized?

  • Are there potential security vulnerabilities?

  • Are there any unhandled states or exceptions?

In this short five-line piece of code, lists:member is an O(N) operation, and any O(N) operation deserves our attention. As we know, a set is a better fit than a list for membership checks. From a security standpoint, splitting the ETags is a weak link: an attacker can construct an If-Match header that is large and complex enough to slow down the processing of a single request, which makes a DoS attack easier. As for unhandled states, rest assured: with a flowchart (state machine) as detailed as the one above, the code is unlikely to have big problems.
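Both concerns can be addressed together, as in this Python sketch. The cap of 32 entity tags and the function names are hypothetical choices for illustration, not values taken from Webmachine or from any RFC.

```python
from typing import FrozenSet, List

MAX_ETAGS = 32  # hypothetical cap; pick a limit that suits your API


def parse_candidates(etags: List[str]) -> FrozenSet[str]:
    """Bound the input size, then build a set for O(1) average lookups."""
    if len(etags) > MAX_ETAGS:
        # Refuse oversized headers instead of doing unbounded work.
        raise ValueError("too many entity tags in If-Match")
    return frozenset(etags)


def etag_matches(current: str, candidates: FrozenSet[str]) -> bool:
    return current in candidates
```

Building the set is itself O(N), so for a single lookup the cap does most of the work; the set pays off when the same candidates are checked more than once.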

OK, this example has gone on long enough. Let’s compare the amount of code in the three API frameworks: Liberator 1.2K, Webmachine 5K, and Eve 12K. Reading Liberator feels like reading the Songs of Chu, beautiful but obscure; reading Webmachine feels like reading a math textbook, full of recursive derivations; reading Eve is like reading an undergraduate thesis, merely functional, and it leaves no lasting impression. The framework also oversteps: it makes some decisions that a framework shouldn’t make, so that we ended up forking its code to meet our requirements, which is a big no-no for a framework (we were using 0.4; it was at 0.7 at the time of writing).

Let’s conclude — it’s not too hard to read code for clarity:

  1. Go through the entire code using the review method described above to find the core code worth reading.

  2. Read through this part of the code and break it down further. Keep pen and paper (or another convenient tool) at hand and take notes. Diagrams are the best way to record what you find. Software tools are not recommended at this stage (unless you have one that feels as natural as writing by hand).

  3. Read the code closely. Use what you already know to work out what the code needs, and guess at and reconstruct how an event, a message, or a process flows through it. Take notes. If you run into peripheral code (calls to external functions) at this point, you can skip it for now as long as it does not block your understanding, and come back to it once the whole flow is clear. Ask as many questions as possible to minimize the cases of “I think I understand, but I don’t.”

  4. Skim the rest of the code with the review method, and skip to 2 if you find something else worth reading.

  5. Use comparative reading (or thematic reading) to skim similar repos. Try to restate the different authors’ implementations in your own words, note where they differ, and try to judge those differences.

  6. Use software to digitize your handwritten notes for future review. Text can go straight into a note-taking tool (or even GitBook), and diagrams can be drawn with PlantUML if you can’t afford Visio, OmniGraffle, and the like. For how to use them, see my article “The Drawing Tools I Chased Over the Years”.

The last step is time-consuming, and unless you have extraordinary persistence or a passion for teaching, you may not find the motivation to put your heart into it. On the other hand, in this scenario we read code at a leisurely pace, with no client or boss cracking a whip behind us, so what we read sticks in memory. Even without electronic notes, flipping through the code again brings the memory back.

Another confession: I have not done step 6 well either. Some of my handwritten notes would have been lost forever had I not photographed them.

(My quick summary of initializing the App Master)

The more you do this reading, the more theory, knowledge, and algorithms you really understand, the more irreplaceable you become. Sometimes, if we just spend a few months really figuring out a big project, we’ll be the best of the best.

Scenario 3: Reading code for energy level transitions

Middle school physics tells us that, under light, an atom can jump from a low-energy state to a high-energy state by absorbing a photon, changing the orbital, or energy level, of its electron. This is the same idea as the shift from quantitative change to qualitative change that philosophy often talks about. A programmer’s growth works the same way: you climb slowly at work, reach a plateau, and stall there, like a stock trading sideways in a box. Then at some point some stimulus arrives (winning the Dragon King’s power in a field battle, say, or reading Program Life *_*), and you suddenly hit the daily limit several times in a row and land on the next plateau.

To break through a plateau and achieve an energy level transition, you need to absorb the right photon. That photon can be a groundbreaking project (such as Google Maps, Docker, or Alibaba’s Taobao), but you and I don’t always get such opportunities; most people do insignificant things day after day. Small jobs only level you up slowly, like a level 7 hero leading a flock of eagles who can only pick fights with big-eared monsters, hellhounds, griffins and the like. At such times, rather than sinking in silence, be like the fish of the Northern Darkness in Zhuangzi: float in the water and store up energy, waiting for the day you soar up ninety thousand li.

One way to store energy for the transition is to read code. Which code? Code so fundamental that you think you will never write anything like it, such as the Linux kernel or OTP. You never know when you will finish this kind of reading, so bring plenty of patience and time. Review reading, thematic reading, and mind mapping are the usual tools. The figure below shows the OTP source tree after my review reading: I have highlighted the parts I plan to read over time, and the parts in bold are those I have already read through roughly.

OTP is not a small codebase, but its coupling is very low, so in the end it breaks down into a number of Scenario 2 readings. Let’s look at the total amount of code:

1.4M LOC, almost terrifying. But after removing the examples and some irrelevant auxiliary code:

930K, roughly a third smaller. The first batch of code I picked out to read was only 130K, which can be read roughly in two days or intensively in six months at most. So after half a year, you are bound to be no longer the Ah Meng of Wu you once were.

It should be said that this kind of reading can be very discouraging at times. You will run into a great many knowledge gaps and have to go off and find books and material to fill them, which slows down your reading. Sometimes you make no progress for days, and the enthusiasm you started with gradually drains away. At this point, hold steady! These knowledge gaps are a gift: a great opportunity to fill in what you didn’t know you didn’t know. Take it easy and enjoy the extra knowledge.

Enrich yourself like this for a few years, and when you write code and build projects, you will not be far from the captivating state in which Yu Guangzhong described Li Bai: “one breath from his embroidered mouth, and there is half the High Tang.”

To end this post, let me share a short story:

In my early years at Juniper, while writing a summary of the NetScreen data plane code (the internal document I have mentioned in other articles, which colleagues called the “Sunflower Book”), I wanted to better understand how an IPSec Phase 1 SA is established, so I wandered into IKE’s back garden. I didn’t know much about IKEv2 at the time and did a lot of reading before I started on the code, yet I still couldn’t make sense of it. Strange terms such as catcher and thrower left me as lost as the Yellow Emperor in Chiyou’s fog. Only later did a colleague point out that the terminology came from baseball, and once I had looked up the baseball terms on Wikipedia, the code became endearing.

A few years later I read “How to Read a Book” for the first time. The author devotes a long passage to explaining the skill of reading by analogy with the skill of the catcher in baseball, likening the relationship between reader and writer to that between catcher and pitcher. As I read on, I couldn’t help going back to that sunny afternoon more than a decade ago: I used the diskless SunRay workstation on my desk to log in to a Solaris server named Gretel and Hansel, opened the IKE connection code with its six-message handshake in Vim, sipped a cup of coffee, and watched a great baseball game…