by Rob Pike

Abstract

This is a modified version of Rob Pike’s keynote talk at SPLASH 2012 in Tucson, Arizona, on October 25, 2012.

The Go programming language was conceived in late 2007 to address some of the problems we encountered developing software infrastructure at Google. Today’s computing landscape has little to do with the environment in which the languages then in use (primarily C++, Java, and Python) were created. The problems introduced by multicore processors, networked systems, massive computation clusters, and the web programming model were being worked around rather than addressed head-on. Moreover, the scale has changed: today’s server programs comprise tens of millions of lines of code, are worked on by hundreds or even thousands of programmers, and are updated literally every day. To make matters worse, build times, even on large compilation clusters, have stretched to many minutes, even hours.

Go was designed and developed to make working in this environment more productive. Besides its better-known aspects, such as built-in concurrency and garbage collection, Go’s design considerations include rigorous dependency management, the adaptability of software architecture as the system grows, and robustness across the boundaries between components.

This article explains how these issues were addressed while building an efficient, compiled programming language that feels lightweight and pleasant. Examples and explanations are drawn from the real-world problems faced at Google.

Introduction

Go is a compiled, concurrent, garbage-collected, statically typed language developed at Google. It is an open source project: Google imports the public repository rather than the other way around.

Go is efficient, scalable, and productive. Some programmers find it fun to work in; others find it unimaginative, even boring. In this article we will explain why those are not contradictory positions. Go was designed to address the problems faced in software development at Google, which led to a language that is not a breakthrough research language but is nonetheless an excellent tool for engineering large software projects.

Go at Google

Go is a programming language designed by Google to help solve Google’s problems. Google has big problems.

The hardware is big and the software is big. There are many millions of lines of software, with servers mostly in C++ and lots of Java and Python for the other pieces. Thousands of engineers work on the code, at the “head” of a single tree comprising all the software, so from day to day there are significant changes to all levels of the tree. A large custom-designed distributed build system makes development at this scale feasible, but it is still big.

And of course, all this software runs on tens of thousands of machines, which are treated as a modest number of independent, networked compute clusters.

In short, Google’s development is big, can be slow, and often clumsy. But it works.

The goals of the Go project were to eliminate the slowness and clumsiness of software development at Google, and thereby to make the process more productive and scalable. The language was designed by and for people who write, read, debug, and maintain large software systems. Go’s purpose is therefore not to do research into programming language design; it is to improve the working environment for its designers and their coworkers. Go is more about software engineering than programming language research. Or to rephrase, it is about language design in the service of software engineering. But how can a language help software engineering? The rest of this article is an answer to that question.

Pain points

When Go was released, some claimed it was missing particular features or methodologies that were regarded as necessary for a modern language. How could Go be worthwhile in the absence of these facilities? Our answer is that the properties Go does have address the issues that make large-scale software development difficult. These issues include:

  • Slow builds
  • Uncontrolled dependencies
  • Each programmer uses a different subset of the language
  • Poor program understanding (code difficult to read, improper documentation, etc.)
  • Duplication of labor effort
  • Cost of updates
  • Version skew
  • The difficulty of writing automatic tools
  • Cross-language builds

Individual language features do not address these issues. A larger view of software engineering is required, and in the design of Go we tried to focus on solutions to these problems.

As a simple, self-contained example, consider the representation of program structure. Some observers objected to Go’s C-like block structure with braces, preferring the use of spaces for indentation, in the style of Python or Haskell. However, we have had extensive experience tracking down build and test failures caused by cross-language builds where a Python snippet embedded in another language, for instance through a SWIG invocation, is subtly and invisibly broken by a change in the indentation of the surrounding code. Our position is therefore that, although spaces for indentation are nice for small programs, they don’t scale well, and the bigger and more heterogeneous the code base, the more trouble they cause. It is better to forgo convenience for safety and dependability, so Go has brace-bounded blocks.

Dependencies in C and C++

More substantial illustrations of scale and other issues arise when dealing with package dependencies. We begin by reviewing how they work in C and C++.

ANSI C, first standardized in 1989, promoted the idea of #ifndef “guards” in standard header files. The idea, ubiquitous now, is that each header file is bracketed with a conditional compilation clause, so the file may be included multiple times without error. For example, the Unix header file <sys/stat.h> looks schematically like this:

/* Large copyright and licensing notice */
#ifndef _SYS_STAT_H_
#define _SYS_STAT_H_
/* Types and other definitions */
#endif

The intent is that the C preprocessor reads the file but ignores the text on the second and subsequent readings of the file. The symbol _SYS_STAT_H_, called the “guard”, is defined on the first reading of the file.

This design has some nice properties, most important that each header file can safely #include all of its dependencies, even if other header files will also include them. If that rule is followed, it permits tidy code, for example, alphabetical ordering of #include clauses.

But it scales very badly.

In 1984, a compilation of ps.c, the source to the Unix ps command, was observed to #include <sys/stat.h> 37 times by the time all the preprocessing had been done. Even though the contents are discarded 36 times while doing so, most C implementations would open the file, read it, and scan it all 37 times. Without great cleverness, in fact, the potentially complex macro semantics of the C preprocessor require this behavior.

The effect on software is the gradual accumulation of #include clauses in C programs. Adding them does not break programs, and it is hard to know when they are no longer needed. Removing an #include and compiling the program again is not even sufficient to test that, since another #include might itself contain an #include that pulls it in anyway.

Technically speaking, it does not have to be this way. Realizing the long-term problems with the use of #ifndef guards, the designers of the Plan 9 libraries took a different, non-ANSI approach. In Plan 9, header files were forbidden from containing further #include clauses; all #includes were required to be in the top-level C file. Of course this required some discipline (the programmer was required to list the necessary dependencies exactly once, in the correct order), but documentation helped and in practice it worked very well. The result was that, no matter how many dependencies a C source file has, each #include file is read exactly once when compiling that file. And, of course, it is also easy to see whether any #include is necessary by taking it out: the edited program will compile if and only if the dependency was unnecessary.

The most important result of the Plan 9 approach was much faster compilation: the amount of I/O required to compile a program is far less than when compiling with #ifndef guards.

Outside of Plan 9, though, the “guarded” approach remains best practice for C and C++. In fact, C++ exacerbates the problem by using the same approach at an even finer granularity. By convention, C++ programs are usually structured with one header file per class, or perhaps a small group of related classes, a grouping much smaller than, say, <sys/stat.h>. The dependency tree is therefore much more intricate, reflecting not library dependencies but the full type hierarchy. Moreover, C++ header files usually contain real code (type, method, and template declarations), not just the simple constants and function signatures typical of a C header file. Thus C++ pushes more to the compiler, and what it pushes is harder to compile, and each invocation of the compiler must reprocess this information. When building a large C++ binary, the compiler might be taught how to represent a string many thousands of times by processing the header file <string>. (For the record, around 1984 Tom Cargill observed that the use of the C preprocessor for dependency management would be a long-term liability for C++ and should be addressed.)

Building a single C++ binary at Google can require opening and reading hundreds of individual header files tens of thousands of times. In 2007, build engineers at Google instrumented the compilation of a major Google binary. The file comprised about two thousand files that, if simply concatenated together, totaled 4.2 megabytes. By the time the #includes had been expanded, over 8 gigabytes were being delivered to the input of the compiler, a blow-up of 2000 bytes for every C++ source byte.

As another data point, in 2003 Google’s build system was moved from a single Makefile to a per-directory design with better-managed, more explicit dependencies. A typical binary shrank about 40% in file size, just from having more accurate dependencies recorded. Even so, the properties of C++ (or C for that matter) make it impractical to verify those dependencies automatically, and today we still do not have an accurate understanding of the dependency requirements of large Google C++ binaries.

The consequence of these uncontrolled dependencies and massive scale was that it became impractical to build Google server binaries on a single computer, so a large distributed compilation system was created. With this system, involving many machines, much caching, and much complexity (the build system is a large program in its own right), builds at Google remained cumbersome but practical.

Even with the distributed build system, a large Google build still takes a long time. Using the precursor distributed build system, the 2007 binary mentioned above took 45 minutes to build; today’s version of the same program takes 27 minutes, but of course the program and its dependencies have grown in the interim. The engineering effort required to scale up the build system has barely been able to stay ahead of the growth of the software it builds.

Enter Go

When builds are slow, there is time to think. The origin myth for Go states that it was during one of those 45-minute builds that Go was conceived. It was believed to be worth trying to design a new language suitable for writing large Google programs, such as web servers, with software engineering considerations that would improve the quality of life of Google programmers.

While the discussion so far has focused on dependencies, there are many other issues that need attention. In this case, the main considerations for the success of any language are:

In this context, the main considerations for any language to succeed were:

  • It must work at scale, for large programs with large numbers of dependencies, and for large teams of programmers.
  • It must be familiar, roughly C-like. Programmers working at Google are early in their careers and are most familiar with procedural languages, particularly from the C family. The need to get programmers productive quickly in a new language means the language cannot be too radical.
  • It must be modern. C, C++, and to some extent Java are quite old, designed before the advent of multicore machines, networking, and web application development. There are features of the modern world that are better met by newer approaches, such as built-in concurrency.

With that background, let’s take a look at Go’s design from a software engineering perspective.

Dependencies in Go

Now that we’ve looked at dependencies in C and C++ in detail, a good place to start our tour is to see how Go handles them. Dependencies are defined, syntactically and semantically, by the language. They are explicit, clear, and “computable”, which is to say, easy to write tools to analyze.

Syntactically, after the package clause (the subject of the next section), each source file may have one or more import statements, comprising the import keyword and a string constant identifying the package to be imported into this source file (only):

import "encoding/json"

The first step toward making Go scale is that the language defines unused dependencies as a compile-time error (not a warning, an error). If the source file imports a package it does not use, the program will not compile. This guarantees by construction that the dependency tree of any Go program is precise, with no extraneous edges. That, in turn, guarantees that no extra code is compiled when building the program, which minimizes compilation time.

There is another step, this time the implementation of the compiler, which further ensures efficiency. Consider a Go program with three packages and this dependency graph:

package A imports package B; package B imports package C; package A does not import package C

This means that package A uses C only transitively, through its use of B; that is, no identifiers from C are mentioned in A’s source code, even if some of the items A uses do mention C. For instance, package A might reference a struct type defined in B that has a field with a type defined in C, but that A’s source code does not reference itself. As a motivating example, imagine that A imports a formatted I/O package B that uses a buffered I/O implementation provided by C, but that A does not itself invoke buffered I/O.

To build this program, first C is compiled; dependent packages must be built before the packages that depend on them. Then B is compiled; finally A is compiled, and then the program can be linked.

When A is compiled, the compiler reads the object file for B, not its source code. That object file contains all the type information the compiler needs to satisfy the

import "B"

clause in A’s source code. This information includes whatever information about C is needed by B’s clients at compile time. In other words, when B is compiled, the generated object file includes type information for all dependencies of B that affect the public interface of B.

This design has the important effect that when the compiler executes an import clause, it opens exactly one file, the one identified by the string in the import clause. This is of course reminiscent of the Plan 9 C (as opposed to ANSI C) approach to dependency management, except that the compiler, in effect, writes the “header file” itself when compiling the Go source file. The process is more automatic and even more efficient than in Plan 9 C, though: the data read when evaluating an import is just “exported” data, not general program source code. The effect on compile time overall can be dramatic, and scales well as the code base grows. The time to walk the dependency graph, and hence to compile, can be exponentially less than in the “include the include files” model of C and C++.

It is worth noting that this general approach to dependency management is not original; These ideas date back to the 1970s and flow through languages like Modula-2 and Ada. In the C series, Java has elements of this approach.

To make compilation even more efficient, the object file is arranged so that the export data is the first thing in the file, so the compiler can stop reading as soon as it reaches the end of that section.

This approach to dependency management is the single biggest reason why Go compilations are faster than C or C++ compilations. Another factor is that Go puts the export data in the object file; some languages require the author to write, or the compiler to generate, a second file containing this information. That means opening twice as many files. In Go, importing a package opens a single file. Also, the single-file approach means the export data (or the header file, in C/C++) can never be out of date relative to the object file.

For the record, we measured the compilation of a large Google program written in Go to see how the source fan-out compared to the C++ analysis done earlier. We found it was roughly 40 to 50 times better than the C++ case (and also simpler, and therefore faster to process), but still bigger than we expected. There were two reasons. First, we found a bug: the Go compiler was generating a substantial amount of data in the export section that did not need to be there. Second, the export data uses a verbose encoding that could be improved. We plan to address these issues.

Still, these improvements turn minutes into seconds: coffee-break builds become interactive builds.

Another feature of the Go dependency graph is that it has no cycles. The language defines that there can be no circular imports, and the compiler and linker both check that they do not exist. Although they are occasionally useful, circular imports introduce significant problems at scale. They require the compiler to deal with larger sets of source files all at once, which slows down incremental builds. More important, in our experience such imports end up entangling huge swaths of the source tree into large subpieces that are difficult to manage independently, bloating binaries and complicating initialization, testing, refactoring, release, and other tasks of software development.

The lack of circular imports causes occasional annoyance, but it keeps the build tree cleanly delineated, forcing clear demarcation between packages. As with many of the design decisions in Go, it forces programmers to think earlier about a larger-scale issue (in this case, package boundaries) that, if left until later, may never be addressed satisfactorily.

Through the design of the standard library, great effort was spent on controlling dependencies. It can be better to copy a little code than to pull in a big library for one function. (A test in the system build complains if new core dependencies arise.) Dependency hygiene trumps code reuse. One example of this in practice is that the (low-level) net package has its own integer-to-decimal conversion routine to avoid depending on the bigger and dependency-heavy formatted I/O package. Another is that the string conversion package strconv has a private implementation of the definition of “printable” characters rather than pull in the large Unicode character class tables; that strconv honors the Unicode standard is verified by the package’s tests.

Packages

Go’s package system was designed to combine the properties of libraries, namespaces, and modules into a single construct.

Each Go source file, such as “encoding/json/json.go”, begins with the package clause, as follows:

package json

where json is the package name, a plain identifier. Package names are usually concise.

To use a package, the importing source file identifies it by package path in the import clause. The meaning of “path” is not specified by the language, but in practice, by convention, it is the slash-separated directory path of the source package within the repository, here:

import "encoding/json"

and then uses the package name (which is distinct from the path) to qualify items from the package in the importing source file:

var dec = json.NewDecoder(reader)

This design provides an important property of clarity: one can always tell from the syntax whether a name is local to the file or not: name versus pkg.Name. (More on this later.)

For our example, the package path is "encoding/json" while the package name is json. Outside the standard repository, the convention is to place the name of the project or company at the root of the namespace:

import "google/base/go/log"

It is important to recognize that package paths are unique, but there is no such requirement for package names. The path must uniquely identify the package to be imported, while the name is just a convention for how clients of the package refer to its contents. The package name need not be unique, and it can be overridden in each importing source file by providing a local identifier in the import clause. These two imports both reference packages that call themselves log, but to import them into a single source file one must be renamed (locally):

import "log"                          // Standard package
import googlelog "google/base/go/log" // Google-specific package

Every company might have its own log package, but there is no need to make the package names unique. Quite the opposite: Go style suggests keeping package names short and clear, rather than worrying about collisions.

Another example: Google has many server packages in its code base.

Remote packages

An important feature of Go’s package system is that the package path, being an arbitrary string, can be chosen to refer to a remote repository by having it identify the URL of the site serving the repository.

Here is how to use the doozer package from GitHub. The go get command runs the Go build tool to fetch the repository from the site and install it. Once installed, it can be imported and used like any regular package.

$ go get github.com/4ad/doozer // Shell command to fetch the package

Import "github.com/4ad/doozer" // import statement of Doozer client

It is worth noting that the go get command downloads dependencies recursively, a property made possible only because the dependencies are explicit. Also, since the allocation of the space of import paths is delegated to URLs, the naming of packages is decentralized and therefore scalable, in contrast to the centralized registries used by other languages.

Syntax

Syntax is the user interface of a programming language. Although it has limited effect on the semantics of the language, which is arguably the more important component, syntax determines the readability and hence clarity of the language. Also, syntax is critical to tooling: if the language is hard to parse, automated tools are hard to write.

Go was therefore designed with clarity and tooling in mind, and has a clean syntax. Its grammar is modest in size compared to other languages in the C family, with only 25 keywords (C99 has 37; C++11 has 84; the numbers continue to grow). More important, the grammar is regular and therefore easy to parse (mostly; there are a couple of quirks we might have fixed but did not catch early enough). Unlike C and Java, and especially C++, Go can be parsed without type information or a symbol table; there is no type-specific context. The grammar is easy to reason about, so tools are easy to write.

One detail of Go’s syntax that surprises C programmers is that its declaration syntax is closer to Pascal’s than to C’s:

var fn func([]int) int
type T struct { a, b int }

as opposed to C’s

int (*fn)(int[]);
struct T { int a, b; }

Declarations introduced by a keyword are easier to parse, for both humans and computers, and having the type syntax not be the expression syntax, as it is in C, has a significant effect on parsing: it adds grammar but eliminates ambiguity. There is a nice side effect, too: for initializing declarations, one can drop the var keyword and just take the type of the variable from the expression. These two declarations are equivalent; the second is shorter and idiomatic:

var buf *bytes.Buffer = bytes.NewBuffer(x) // explicit
buf := bytes.NewBuffer(x)                  // derived

A blog post, linked from golang.org/s/decl-syntax, describes Go’s declaration syntax in more detail and explains why it is so different from C’s.

Function syntax is straightforward for simple functions. Here is a declaration of a function Abs that takes a single variable x of type T and returns a single float64 value:

func Abs(x T) float64

A method is just a function with a special argument, its receiver, which can be passed to the function using the standard “dot” notation. Method declaration syntax places the receiver in parentheses before the function name. Here is the same function as a method of type T:

func (x T) Abs() float64

And here is a variable (closure) with an argument of type T; Go has first-class functions and closures:

negAbs := func(x T) float64 { return -Abs(x) }

Finally, functions in Go can return multiple values. A common case is to return a function result together with an error value as a pair, like this:

func ReadByte() (c byte, err error)

c, err := ReadByte()
if err != nil { ... }

We’ll talk more about errors later.

One feature missing from Go is that it does not support default function arguments. This was a deliberate simplification. Experience tells us that default arguments make it too easy to patch over API design flaws by adding more arguments, resulting in too many arguments with interactions that are difficult to disentangle or even understand. The lack of default arguments requires more functions or methods to be defined, as one function cannot hold the entire interface, but that leads to a clearer API that is easier to understand. Those functions all need separate names, too, which makes it clear which combinations exist, as well as encouraging more thought about naming, a critical aspect of clarity and readability.

One mitigating factor for the lack of default arguments is that Go has easy-to-use, type-safe support for variadic functions.
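As a minimal sketch of that mitigation (the sum function is a hypothetical helper, not from the talk), a variadic parameter lets callers pass zero, one, or many arguments, covering a number of cases where other languages reach for default parameters:

```go
package main

import "fmt"

// sum accepts any number of int arguments; nums is an []int
// inside the function body.
func sum(nums ...int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

func main() {
	fmt.Println(sum())        // 0: zero arguments are allowed
	fmt.Println(sum(1, 2, 3)) // 6
	xs := []int{4, 5}
	fmt.Println(sum(xs...)) // 9: an existing slice can be expanded
}
```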

Naming

Go takes an unusual approach to defining the visibility of an identifier, the ability of a client of a package to use the item named by the identifier. Instead of private and public keywords, in Go the name itself carries the information: the case of the initial letter of the identifier determines the visibility. If the initial character is an upper case letter, the identifier is exported (public); otherwise it is not:

  • Upper case initial letter: Name is visible to clients of the package
  • Otherwise: name (or _Name) is not visible to clients of the package

This rule applies to variables, types, functions, methods, constants, fields... everything. That’s all there is to it.

This was not an easy design decision. We spent over a year trying out notations to define the visibility of identifiers. Once we settled on using the name itself, we soon realized it had become one of the most important properties of the language. After all, the name is what clients of a package use; putting the visibility in the name, rather than in its type, means it is always clear, when looking at an identifier, whether it is part of the public API. After using Go for a while, it feels burdensome to go back to other languages that require looking up a declaration to discover this information.

The result is, once again, clarity: the program source text simply expresses what the programmer means.

Another simplification is that Go has a very compact scope hierarchy:

  • universe (predeclared identifiers such as int and string)
  • package (all the source files of a package live at the same scope)
  • file (for package import renames only; not important in practice)
  • function (the usual)
  • block (the usual)

There is no scope for namespaces, classes, or other wrapping constructs. Names come from a very small number of places in Go, and all names follow the same scope hierarchy: at any given location in the source, an identifier denotes exactly one language object, independent of how it is used. (The only exception is statement labels, the targets of break and continue statements; they always have function scope.)

This has clarifying consequences. For instance, notice that methods declare an explicit receiver, and it must be used to access the fields and methods of the type. There is no implicit this. That is, one always writes

rcvr.Field

(where rcvr is whatever name is chosen for the receiver variable), so all elements of the type are always bound, lexically, to a value of the receiver type. Similarly, there is always a package qualifier on imported names; one writes io.Reader, not Reader. Not only is this clear, it also frees the identifier Reader as a useful name to use in any package. There are in fact multiple exported identifiers in the standard library called Reader, or Printf for that matter, yet it is always unambiguous which one is being referenced.
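A tiny example of the explicit-receiver rule (the Point type is invented for illustration):

```go
package main

import (
	"fmt"
	"math"
)

type Point struct{ X, Y float64 }

// The receiver p is an ordinary parameter: fields must be accessed
// as p.X and p.Y; there is no implicit this.
func (p Point) Norm() float64 {
	return math.Hypot(p.X, p.Y)
}

func main() {
	fmt.Println(Point{3, 4}.Norm()) // 5
}
```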

Finally, these rules combine to guarantee that every name (that is, the first component of every name), other than the top-level predeclared names such as int, is always declared in the current package.

In short, names are local. In C, C++, or Java the name y could refer to anything. In Go, y (or even Y) is always defined within the package, while the interpretation of x.Y is clear: find x locally, and Y belongs to it.

These rules provide an important property for scaling, because they guarantee that adding an exported name to a package can never break a client of that package. The naming rules decouple packages, providing scaling, clarity, and robustness.

There is one more aspect of naming worth mentioning: method lookup is always by name alone, never by the signature (type) of the method. In other words, a single type can never have two methods with the same name. Given a method x.M, there is only ever one M associated with x. Again, this makes it easy to identify which method is meant given only its name. It also makes method invocation simple to implement.

Semantics

The semantics of Go statements are generally C-like. It is a compiled, statically typed, procedural language with pointers and so on. By design, it should feel familiar to programmers accustomed to the C family. For a new language to succeed, it is important that the target audience can learn it quickly; Go’s roots in the C family help ensure that young programmers, most of whom know Java, JavaScript, and C, find Go easy to learn.

That said, Go makes many small changes to the C semantics, mostly in the service of robustness. These include:

  • No pointer arithmetic

  • No implicit numeric conversions

  • Array bounds are always checked

  • No type aliases (after type X int, X and int are distinct types, not aliases)

  • ++ and -- are statements, not expressions

  • Assignment is not an expression

  • Taking the address of a stack variable is legal (and even encouraged)

And there are even bigger changes, away from the traditional C, C++, and even Java models. These include language support for:

  • concurrent

  • Garbage collection

  • Interface type

  • reflection

  • Type switches

The following sections briefly discuss two topics in Go, concurrency and garbage collection, mostly from a software engineering perspective. For a full discussion of language semantics and usage, see the many resources on the golang.org web site.
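As a brief illustration of the last item in the list (a sketch of my own, not from the talk), a type switch branches on the dynamic type held by an interface value:

```go
package main

import "fmt"

// describe uses a type switch to branch on the dynamic type of v;
// within each case, x has the corresponding static type.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int: %d", x)
	case string:
		return fmt.Sprintf("string: %q", x)
	case []byte:
		return fmt.Sprintf("bytes of length %d", len(x))
	default:
		return fmt.Sprintf("unhandled type %T", x)
	}
}

func main() {
	fmt.Println(describe(42))      // int: 42
	fmt.Println(describe("hello")) // string: "hello"
	fmt.Println(describe(3.14))    // unhandled type float64
}
```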

Concurrency

Concurrency is important in the modern computing environment, with its multicore machines running web servers with multiple clients, which is a fair description of the typical Google program. This kind of software is not especially well served by C++ or Java, which lack sufficient concurrency support at the language level.

Go embodies a variant of CSP with first-class channels. CSP was chosen partly for familiarity (one of us had worked on predecessor languages built on CSP’s ideas), but also because CSP has the property that it can be added to a procedural programming model without profound changes to that model. That is, given a C-like language, CSP can be added in a mostly orthogonal way, providing extra expressive power without constraining the language’s other uses. In short, the rest of the language can remain “ordinary”.

The approach is thus the composition of independently executing functions (goroutines), with otherwise regular procedural code.

The resulting language allows us to couple concurrency with computation smoothly. Consider a web server that must validate the security certificate of each incoming client call. In Go, it is easy to construct the software using CSP, managing each client as an independently executing procedure, while the full power of an efficient compiled language is available for the expensive cryptographic calculations.
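A hypothetical sketch of that shape, with goroutine-per-client handling reduced to plain functions; handleClient is an invented stand-in for the certificate check and the serving work:

```go
package main

import (
	"fmt"
	"sync"
)

// handleClient stands in for per-client work such as validating a
// certificate and then serving requests. The expensive part runs as
// compiled code while other clients proceed concurrently.
func handleClient(id int) string {
	return fmt.Sprintf("client %d served", id)
}

func main() {
	var wg sync.WaitGroup
	results := make([]string, 5)
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(i int) { // one goroutine per client
			defer wg.Done()
			results[i] = handleClient(i)
		}(i)
	}
	wg.Wait()
	fmt.Println(results[0])
}
```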

In summary, CSP works well for Go and for Google. The model is a good fit for writing a web server, the canonical Go program.

There is one important caveat: Go is not purely memory safe in the presence of concurrency. Sharing is legal, and passing a pointer over a channel is idiomatic (and efficient).

Some experts in concurrency and functional programming are disappointed that Go does not take a write-once approach to value semantics in the context of concurrent computation, that Go is not more like Erlang, for example. Again, the reasons are mostly about familiarity and suitability for the problem domain. Go's concurrency features work well in an environment most programmers are familiar with. Go enables simple, safe concurrent programming, but it does not forbid programming errors. We compensate by convention, training programmers to think of message passing as a version of ownership control. The motto is, "Do not communicate by sharing memory; instead, share memory by communicating."

Our limited experience with programmers unfamiliar with both Go and concurrent programming suggests that this is a practical approach. Programmers appreciate the simplicity that built-in concurrency brings to networked software, and that simplicity engenders powerful functionality.
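The ownership-by-convention idea can be sketched as follows; the buffer type and transfer function are invented for illustration. The sender passes a pointer over a channel and, by convention, never touches the value again:

```go
package main

import "fmt"

type buffer struct{ data []byte }

// transfer sends a buffer's pointer over a channel. By convention the
// sender stops using it once sent, so ownership moves with the message.
func transfer() string {
	ch := make(chan *buffer)
	go func() {
		b := &buffer{data: []byte("hello")}
		ch <- b // sender relinquishes ownership here
	}()
	b := <-ch // receiver is now the sole owner
	return string(b.data)
}

func main() {
	fmt.Println(transfer())
}
```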

Garbage collection

Garbage collection can be a controversial feature for a systems language, but it took us very little time to decide that Go would be a garbage-collected language. Go has no explicit memory-freeing operation: the only way allocated memory returns to the pool is through the garbage collector.

This was an easy decision to make because memory management has a profound effect on the way a language works in practice. In C and C++, too much programming effort is spent on allocating and freeing memory. The resulting designs tend to expose memory-management details that could otherwise be hidden; conversely, memory considerations limit how the interfaces can be used. Garbage collection, by contrast, makes interfaces easier to specify.

Moreover, automatic memory management is almost essential in a concurrent object-oriented language, because ownership of a piece of memory can be difficult to manage when it is passed among concurrent executions. It is important to separate behavior from resource management.

The language is much easier to use because of garbage collection.

Of course, garbage collection comes with significant costs: general overhead, latency, and implementation complexity. Nonetheless, we believe that the benefits most often felt by programmers outweigh the costs, which are borne primarily by the language implementers.

In particular, Java's experience as a server language has made some people nervous about garbage collection in user-facing systems: the overhead is uncontrollable, latencies can be large, and much parameter tuning is required for good performance. Go, however, is different. Properties of the language mitigate some of these problems. Not all of them, of course, but some.

The key is that Go gives the programmer tools to limit allocation by controlling the layout of data structures. Consider this simple type definition for a data structure containing a buffer (array) of bytes:

    type X struct {
        a, b, c int
        buf     [256]byte
    }

In Java, the buf field would require a second allocation, and accessing it a second level of indirection. In Go, however, the buffer is allocated in a single block of memory along with the containing structure, and no indirection is required. For systems programming, this design can give better performance and reduce the number of items known to the collector. At scale it can make a significant difference.

As a more extreme example, it is simple and efficient in Go to provide second-order allocators, for instance an arena allocator that allocates a large array of structures and links them into a free list. Libraries that repeatedly use many small structures like this can, with suitable pre-arrangement, generate no garbage at all while remaining efficient and responsive.
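A minimal sketch of such an arena-style free list, with invented node and freeList types; the whole arena is one slice, so the collector tracks a single object rather than many small ones:

```go
package main

import "fmt"

type node struct {
	val  int
	next *node
}

// freeList is a trivial second-order allocator: all nodes come from
// one pre-allocated slice, and recycled nodes are chained for reuse.
type freeList struct {
	arena []node
	head  *node
}

func newFreeList(n int) *freeList {
	f := &freeList{arena: make([]node, n)}
	for i := range f.arena {
		// taking the address of a slice element is an interior pointer
		f.arena[i].next = f.head
		f.head = &f.arena[i]
	}
	return f
}

// get pops a node from the free list, or returns nil if exhausted.
func (f *freeList) get() *node {
	n := f.head
	if n != nil {
		f.head = n.next
		n.next = nil
	}
	return n
}

// put returns a node to the free list for reuse.
func (f *freeList) put(n *node) {
	n.next = f.head
	f.head = n
}

func main() {
	f := newFreeList(4)
	a := f.get()
	a.val = 1
	f.put(a)
	fmt.Println(f.get().val)
}
```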

Thus, although Go is a garbage-collected language, a knowledgeable programmer can improve performance by limiting the pressure placed on the collector. (Also, the Go installation comes with good tools for studying the dynamic memory performance of a running program.)

To give programmers this flexibility, Go must support what we call interior pointers: pointers into objects allocated in the heap. The x.buf field in the example above lives within the struct, but it is legal to capture the address of this inner field, for instance to pass it to an I/O routine. In Java, as in many garbage-collected languages, it is not possible to construct an interior pointer like this, but in Go it is idiomatic. This design point affects which collection algorithms can be used, and may make them harder to write, but after careful thought we decided that it was necessary to allow interior pointers because of the benefit to the programmer and the ability to reduce pressure on the (perhaps harder to implement) collector. Our experience so far comparing similar Go and Java programs shows that use of interior pointers can have a significant effect on total arena size, latency, and collection time.
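A small sketch of an interior pointer in this sense, using the X type from the earlier example; the fill helper is invented. The slice x.buf[:] is effectively a pointer into the middle of x's single allocation:

```go
package main

import "fmt"

// X embeds its buffer directly, as in the earlier example:
// the whole value is one allocation.
type X struct {
	a, b, c int
	buf     [256]byte
}

// fill writes into the interior array through a slice of it,
// capturing the address of the inner field.
func fill(x *X, src []byte) int {
	return copy(x.buf[:], src)
}

func main() {
	var x X
	n := fill(&x, []byte("data"))
	fmt.Println(n, string(x.buf[:n]))
}
```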

In summary, Go is garbage collection, but gives programmers some tools to control the cost of collection.

Garbage collection remains an active area of development. The current design is a parallel mark-and-sweep collector, and there remain opportunities to improve its performance, or perhaps even its design. (The language specification does not require any particular implementation of the collector.) Still, if the programmer takes care to use memory well, the current implementation works fine for production use.

Composition, not inheritance

Go takes an unusual approach to object-oriented programming, allowing methods on any type, not just classes, but without any form of type-based inheritance such as subclassing. This means there is no type hierarchy. This was an intentional design choice. Although type hierarchies have been used to build much successful software, we believe the model has been overused and that it is worth taking a step back.

Instead, Go has interfaces, an idea that has been discussed at length elsewhere (see research.swtch.com/interfaces, for example), but here is a brief summary.

In Go, an interface is just a set of methods. For example, here is the definition of the Hash interface from the library.

    type Hash interface {
        Write(p []byte) (n int, err error)
        Sum(b []byte) []byte
        Reset()
        Size() int
        BlockSize() int
    }

All data types that implement these methods satisfy the interface implicitly; there is no implements declaration. That said, interface satisfaction is statically checked at compile time, so despite this decoupling, interfaces are type-safe.

A type typically satisfies a number of interfaces, each corresponding to a subset of its methods. For example, any type that satisfies the Hash interface also satisfies the Writer interface:

    type Writer interface {
        Write(p []byte) (n int, err error)
    }

This fluidity of interface satisfaction encourages a different approach to software construction. But before explaining it, we should explain why Go has no subclassing.
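A brief sketch of that implicit satisfaction: the invented countWriter type has a Write method with the right signature, so it satisfies io.Writer with no declaration, and the assignment is checked at compile time:

```go
package main

import (
	"fmt"
	"io"
)

// countWriter counts the bytes written to it. It satisfies io.Writer
// implicitly, simply by having a Write method of the right signature.
type countWriter struct{ n int }

func (c *countWriter) Write(p []byte) (int, error) {
	c.n += len(p)
	return len(p), nil
}

func main() {
	var w io.Writer = &countWriter{} // satisfaction checked statically
	fmt.Fprintf(w, "hello")
	fmt.Println(w.(*countWriter).n)
}
```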

Object-oriented programming provides a powerful insight: the behavior of data can be generalized independently of the representation of that data. The model works best when the behavior (the method set) is fixed, but once you subclass a type and add a method, the behaviors are no longer identical. If instead the set of behaviors is fixed, as in Go's statically defined interfaces, the uniformity of behavior allows data and programs to be composed uniformly, orthogonally, and safely.

One extreme example is the Plan 9 kernel, in which all system data items implement exactly the same interface, a file system API defined by 14 methods. Even today this uniformity allows a degree of object composition rarely achieved in other systems. Examples abound. Here is one: a system could import the TCP stack (in Plan 9 terms) of another machine onto a computer that had no TCP or even Ethernet, connect over that network to a machine with a different CPU architecture, import its /proc tree, and run a local debugger to do breakpoint debugging of a remote process. Operations like this were everyday stuff on Plan 9, nothing special. The ability to do such things fell out of the design; it required no special arrangement (and was all done in plain C).

We argue that this style of systems construction has been neglected by the languages that push for design by type hierarchy. Type hierarchies result in brittle code. The hierarchy must be designed early, often as the first step of designing the program, and once the program is written those early decisions can be difficult to change. As a consequence, the model encourages early overdesign, as the programmer tries to predict every possible use the software might require, adding types and layers of abstraction just in case. This is upside down. The way the pieces of a system interact should adapt as the system grows, not be fixed at the dawn of time.

Go therefore encourages composition rather than inheritance, using simple, often one-method interfaces to define trivial behaviors that serve as clean, comprehensible boundaries between components.

Consider the Writer interface shown above, which is defined in package io, alongside its complement, the Reader interface:

    type Reader interface {
        Read(p []byte) (n int, err error)
    }

These two complementary methods allow type-safe chaining with rich behavior, like general-purpose Unix pipes. Files, buffers, networks, encryptors, compressors, image encoders, and so on can all be connected together. The Fprintf formatted I/O routine writes to an io.Writer rather than, as C does, to a FILE*. The formatted printer has no knowledge of what it is writing to; it may be an image encoder that in turn writes to a compressor that in turn writes to an encryptor that in turn writes to a network connection.
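A runnable sketch of such a chain using standard packages (bytes, compress/gzip): Fprintf writes to a compressor, which writes to an in-memory buffer, and no stage knows what lies beyond its io.Writer:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// pipeline formats text into a compressor, which writes to an
// in-memory buffer; each stage sees only an io.Writer.
func pipeline(msg string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf) // a Writer wrapping a Writer
	fmt.Fprintf(zw, "message: %s\n", msg)
	zw.Close() // flush the compressed stream
	return buf.Bytes()
}

func main() {
	fmt.Println(len(pipeline("hello")) > 0)
}
```

Replacing the buffer with a file or network connection changes nothing in the formatting or compression stages.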

Interface composition is a different style of programming, and people accustomed to type hierarchies need to adjust their thinking to do it well, but the result is an adaptability of design that is harder to achieve through type hierarchies.

Note too that eliminating the type hierarchy also eliminates a hierarchy of dependencies. Interface satisfaction allows the program to grow organically without predetermined contracts. And it is a linear form of growth: a change to an interface affects only the immediate clients of that interface; there is no subtree to update. The lack of implements declarations disturbs some people, but it enables programs to grow naturally, gracefully, and safely.

Go's interfaces have a major effect on program design. One place we see this is in the use of functions that take interface arguments. These are not methods; they are functions. Some examples should illustrate their power. ReadAll returns a byte slice (array) containing all the data that can be read from an io.Reader:

    func ReadAll(r io.Reader) ([]byte, error)

Wrappers that take an interface and return an interface are also common. Here are some prototypes. LoggingReader logs every Read call on the Reader it is passed. LimitingReader stops reading after n bytes. ErrorInjector simulates I/O errors to aid testing. And there are many more.

    func LoggingReader(r io.Reader) io.Reader
    func LimitingReader(r io.Reader, n int64) io.Reader
    func ErrorInjector(r io.Reader) io.Reader

These designs are nothing like hierarchical, subtype-inherited methods. They are looser (even ad hoc), organic, decoupled, independent, and therefore scalable.

Errors

Go has no exceptions in the traditional sense, that is, no control structure associated with error handling. (Go does provide mechanisms for handling truly exceptional conditions such as division by zero. A pair of built-in functions called panic and recover lets the programmer guard against such things. These functions are intentionally clumsy, however, rarely used, and not integrated into the library the way, say, Java libraries use exceptions.)
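A sketch of the rare, deliberate use of those built-ins: recover in a deferred function converts a run-time panic (here, division by zero) into an ordinary error value. The safeDiv function is invented for illustration:

```go
package main

import "fmt"

// safeDiv converts a run-time panic (division by zero) into an error
// using recover; err is a named result so the deferred func can set it.
func safeDiv(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	q, err := safeDiv(10, 2)
	fmt.Println(q, err)
	_, err = safeDiv(1, 0)
	fmt.Println(err != nil)
}
```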

The key language feature for error handling is a predefined interface type called error that represents a value carrying an Error method, which returns a string:

    type error interface {
        Error() string
    }

Libraries use the error type to return a description of the error. Combined with the ability of functions to return multiple values, it is easy to return the computed result alongside an error value, if any. For instance, the equivalent of C's getchar does not return an out-of-band value at EOF, nor does it throw an exception; it just returns an error value alongside the character, with a nil error meaning success:

    func (b *Reader) ReadByte() (c byte, err error)

This is a straightforward design, and easy to understand. Errors are just values, and programs compute with them as they would compute with values of any other type.
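To illustrate, here is an invented pathError type satisfying the error interface; any type with a suitable Error method will do, and callers treat the result as an ordinary value:

```go
package main

import "fmt"

// pathError is a sketch of a concrete type satisfying the error
// interface: it just needs an Error() string method.
type pathError struct {
	path string
	op   string
}

func (e *pathError) Error() string {
	return e.op + " " + e.path + ": failed"
}

// open is a hypothetical function returning a result and an error.
func open(path string) (string, error) {
	if path == "" {
		return "", &pathError{path: "<empty>", op: "open"}
	}
	return "contents of " + path, nil
}

func main() {
	if _, err := open(""); err != nil {
		fmt.Println(err)
	}
}
```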

The decision not to include exceptions in Go was deliberate. Although many critics disagree with it, we believe there are several reasons it makes for better software.

First, there is nothing truly exceptional about errors in computer programs. For instance, the inability to open a file is a common issue that does not deserve special linguistic constructs; if and return are fine.

    f, err := os.Open(fileName)
    if err != nil {
        return err
    }

Also, if errors use special control structures, error handling distorts the control flow of a program that handles them. The Java-like style of try-catch-finally blocks interlaces multiple overlapping flows of control that interact in complex ways. Although in contrast Go makes it more verbose to check errors, the explicit design keeps the flow of control straightforward, literally.

No doubt the resulting code can be longer, but the clarity and simplicity of that code offsets its verbosity. Explicit error checking forces the programmer to think about errors, and deal with them, when they arise. Exceptions make it too easy to ignore them rather than handle them, passing the buck up the call stack until it is too late to fix the problem or diagnose it well.

Tools

Software engineering needs tools. Each language runs in an environment with other languages and countless tools to compile, edit, debug, configure, test, and run programs.

Go's syntax, package system, naming conventions, and other features were designed to make tools easy to write, and the library includes a lexer, a parser, and a type checker.

Tools to manipulate Go programs are easy to write, and many have been created, some with interesting effects on software engineering.

The best known of these is gofmt, the Go source code formatter. From the beginning of the project, we intended that Go programs would be formatted by machine, eliminating an entire class of argument between programmers: how do I lay out my code? Gofmt runs on all the Go programs we write and is used by most of the open source community. It runs as a "pre-commit" check for the code repositories to ensure that all checked-in Go programs are formatted the same.

Gofmt is often cited by users as one of Go's best features, even though it is not part of the language. The existence and use of gofmt means that from the beginning the community has always seen Go code as gofmt formats it, so Go programs have a single style that everyone is now familiar with. Uniform presentation makes code easier to read and therefore faster to work on. Time not spent on formatting is time saved. Gofmt also affects scalability: since all code looks the same, teams find it easier to work together or with others' code.

Gofmt has enabled another class of tools that we did not foresee. The program works by parsing the source code and reformatting it from the parse tree itself. This makes it possible to edit the parse tree before formatting, and hence a suite of automatic refactoring tools arose. These tools are easy to write, can be semantically rich because they work directly on the parse tree, and automatically produce code in the canonical format.

The first example was gofmt's own -r (rewrite) flag, which uses a simple pattern-matching language to enable expression-level rewrites. For example, we once introduced a default value for the right-hand side of a slice expression: the length itself. The entire Go source tree was updated to use this default with the single command:

    gofmt -r 'a[b:len(a)] -> a[b:]'

A key point about this transformation is that, because the input and output are both in canonical format, the only changes made to the source code are the semantic ones.

A similar but more intricate process was used to update the tree when Go no longer required semicolons as statement terminators if the statement ended at a newline.

Another important tool is gofix, which runs tree-rewriting modules written in Go itself, enabling more advanced refactorings. The gofix tool allowed us to make sweeping changes to APIs and language features leading up to the Go 1 release, including a change in the syntax for deleting entries from a map, a completely different API for manipulating time values, and many more. As these changes rolled out, users could update all their code by running the simple command

    gofix

Note that these tools allow us to update code even when the old code still works. As a result, it is easy to keep Go repositories up to date as the libraries evolve. Old APIs can be deprecated quickly and automatically, so only one version of an API needs to be maintained. For example, we recently changed Go's protocol buffer implementation to use "getter" functions that were not previously in the interface. We ran gofix on all of Google's Go code to update every program that uses protocol buffers, and now only one version of the API is in use. A similar overhaul of the C++ or Java libraries would be nearly infeasible in Google's code base.

The existence of a parsing package in the standard Go library has enabled many other tools as well. Examples include the go tool, which manages program construction including fetching packages from remote repositories; the godoc document extractor; a program to verify that API compatibility contracts are maintained when libraries are updated; and more.

Although tools like these are rarely mentioned in the context of language design, they form an integral part of a language's ecosystem, and the fact that Go was designed with tooling in mind has had a huge effect on the development of the language, the libraries, and the community.

Conclusion

The use of Go is growing inside Google.

Several large user-facing services use it, including youtube.com and dl.google.com (a download server that offers Chrome, Android and other downloads), as well as our own golang.org. Of course, many of the smaller ones are built using Google App Engine’s native support for Go.

Many other companies also use Go; the list is long, but a few of the better known are:

  • BBC Worldwide

  • Heroku

  • Nokia

  • SoundCloud

It looks like Go is meeting its goals. Still, it is too early to declare it a success. We do not yet have enough experience, especially with big programs (millions of lines of code), to know whether the attempts to build a scalable language have paid off. But all the indicators are positive.

On a smaller scale, some minor things are not quite right and might be adjusted in a later (Go 2?) version of the language. For example, there are too many forms of variable declaration syntax, programmers are easily confused by the behavior of nil values inside non-nil interfaces, and many library and interface details could use another round of design.

It is worth noting that gofix and gofmt gave us the chance to fix many other problems during the run-up to Go version 1. Go as it exists today is much closer to what its designers wanted than it would have been without these tools, which were themselves enabled by the language's design.

Not everything was fixed, though. We are still learning (but the language is frozen for now).

The most significant shortcoming of the language is that its implementation still needs work. The code generated by the compiler and the performance of the runtime should be better, and work continues on both. There is progress already; in fact, some benchmarks show a doubling of performance with today's development version compared to the first release of Go version 1 in early 2012.

Summary

Software engineering guided the design of Go. More than most general-purpose programming languages, Go was designed to address a set of software engineering issues we had encountered in the construction of large server software. Offhand, that might make Go sound rather dull and industrial, but in fact the focus on clarity, simplicity, and composability throughout the design resulted in a productive, fun language that many programmers find expressive and powerful.

The attributes that lead to this include:

  • Clear dependencies
  • Clear syntax
  • Clear semantics
  • Composition instead of inheritance
  • Simplicity provided by the programming model (garbage collection, concurrency)
  • Easy tooling (the go tool, gofmt, godoc, gofix)

If you haven't tried Go yet, we suggest you do: golang.org