$ ./goBible --name="goBible (2)" --chapter="Basic Data Types"

The section on string encoding is 2.4.

(I have to say, Go makes it really convenient to read the source code.)

1. Numbers (integer/float/complex)

1.1 Types

  1. int is a distinct type, different from int32. Its size depends on the platform (32 or 64 bits) and is computed as 32 << (^uint(0) >> 63). The source (src/strconv/atoi.go) defines it as: const intSize = 32 << (^uint(0) >> 63)
  2. rune and int32, like byte and uint8, are the same type: rune and byte are aliases, not separately defined types. That means the two names can be used interchangeably in any situation.

See the source (src/builtin/builtin.go), which declares type byte = uint8 and type rune = int32.
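As a quick check of both points, here is a minimal sketch. The helper names intBits and sameType are made up for illustration; only strconv.IntSize comes from the standard library.

```go
package main

import (
	"fmt"
	"strconv"
)

// intBits reproduces the expression from the text: ^uint(0) is all ones, so
// shifting it right by 63 leaves 1 on a 64-bit platform and 0 on a 32-bit one.
func intBits() int {
	return 32 << (^uint(0) >> 63)
}

// sameType shows that rune/int32 and byte/uint8 need no conversion between them.
func sameType() (int32, uint8) {
	var r rune = 'A'
	var i int32 = r // no conversion needed: rune is an alias for int32
	var b byte = 'A'
	var u uint8 = b // no conversion needed: byte is an alias for uint8
	return i, u
}

func main() {
	fmt.Println(intBits() == strconv.IntSize) // true
	fmt.Println(sameType())                   // 65 65
}
```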

1.2 Arithmetic

  1. The sign of a % result follows the operand on the left of %: -5%3 == -2 and -5%-3 == -2.
  2. Whether / yields an integer or a fractional result depends on its operands. If both are integers, the result is an integer (truncated toward zero); if a floating-point value is involved, the result is floating point, and for untyped constants its type defaults to float64.
  3. If you need a float, use float64 rather than float32. The limited precision of float32 often produces all sorts of weird errors.
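The three rules above can be sketched as follows. The function names are hypothetical, and the float32 case is just one well-known example of lost precision, not the only kind of error you may hit.

```go
package main

import "fmt"

// remainders returns -5%3 and -5%-3; both take the sign of the left operand.
func remainders() (int, int) {
	a, b, c := -5, 3, -3
	return a % b, a % c
}

// divisions contrasts integer division with floating-point division.
func divisions() (int, float64) {
	n := 5
	return n / 2, float64(n) / 2
}

// float32LosesIntegers reports whether adding 1 to 1<<24 is lost in float32.
func float32LosesIntegers() bool {
	var f float32 = 1 << 24 // 16777216: above this, float32 cannot count by 1
	return f+1 == f
}

func main() {
	fmt.Println(remainders())           // -2 -2
	fmt.Println(divisions())            // 2 2.5
	fmt.Println(float32LosesIntegers()) // true
}
```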

1.3 Unsigned and signed

  1. Use int rather than uint even for values such as lengths, which can never be negative, because the unsigned type spreads through assignment. For example:
    var length uint32 = getLength()
    for i := length; i >= 0; i-- {
        // do sth
    }

    This loops forever. At i := length, i becomes a uint32 as well, so i >= 0 is always true: decrementing past 0 wraps around to the largest value instead of going negative. We only wanted length to be a uint32, and i accidentally became one too.

    • So avoid unsigned types unless you are doing bit operations or something similarly special.
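A minimal sketch of the wrap-around and of a signed countdown that does terminate. The names wrapped and countdown are illustrative, and unlike the loop in the text this countdown starts at length-1.

```go
package main

import "fmt"

// wrapped shows what happens to an unsigned counter decremented past zero.
func wrapped() uint32 {
	var i uint32 = 0
	i-- // wraps around instead of going negative
	return i
}

// countdown visits length-1 down to 0 with a signed index,
// so the condition i >= 0 can actually become false.
func countdown(length uint32) []int {
	var visited []int
	for i := int(length) - 1; i >= 0; i-- {
		visited = append(visited, i)
	}
	return visited
}

func main() {
	fmt.Println(wrapped())    // 4294967295
	fmt.Println(countdown(3)) // [2 1 0]
}
```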

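1.4 Floating-point numbers

  1. Converting a float to an int discards the fractional part (truncation toward zero). This matches floor only for non-negative values.
  2. The float types have several special values: +0, -0, +Inf, -Inf, and NaN.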
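The sketch below compares the conversion with math.Floor and produces the special values at run time. truncVsFloor is a made-up helper; variables are used for the divisions because dividing constants by zero does not compile.

```go
package main

import (
	"fmt"
	"math"
)

// truncVsFloor converts f to int and also returns math.Floor(f) for comparison.
func truncVsFloor(f float64) (int, float64) {
	return int(f), math.Floor(f)
}

func main() {
	fmt.Println(truncVsFloor(2.7))  // 2 2
	fmt.Println(truncVsFloor(-2.7)) // -2 -3

	zero := 0.0
	posInf := 1 / zero // +Inf
	nan := zero / zero // NaN
	fmt.Println(posInf, -posInf, nan) // +Inf -Inf NaN
	fmt.Println(math.IsNaN(nan))      // true
}
```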

1.5 Complex numbers

Go provides the built-in complex function (declared in builtin.go) for creating a complex number. complex64 and complex128 are built from float32 and float64 parts respectively.

var a complex64 = complex(1, 2) // a is a complex64
var b complex128 = complex(1, 2) // b is a complex128

Or use a literal:

a := 1 + 2i

Can you guess the type of a here, complex64 or complex128? The answer is complex128, for the simple reason that complex128 corresponds to float64, and floating-point literals default to float64.
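To confirm the inferred types, a small sketch that prints them with %T. The kinds helper is hypothetical.

```go
package main

import "fmt"

// kinds returns the types of a declared complex64, a literal,
// and a complex number built from float32 parts.
func kinds() (string, string, string) {
	var a complex64 = complex(1, 2) // constant arguments adapt to the declared type
	b := 1 + 2i                     // a literal defaults to complex128
	var re, im float32 = 1, 2
	c := complex(re, im) // float32 parts give a complex64
	return fmt.Sprintf("%T", a), fmt.Sprintf("%T", b), fmt.Sprintf("%T", c)
}

func main() {
	fmt.Println(kinds())                    // complex64 complex128 complex64
	fmt.Println(real(1+2i), imag(1+2i))     // 1 2
}
```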

2. Strings

2.1 In Go, strings are made of bytes, not characters

  • The len function on a string returns the number of bytes, not the number of characters. For English (ASCII) text the two are the same; for Chinese or any other multi-byte characters they differ.
  • In the same way, s[i] is the i-th byte, not the i-th character.
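For instance, a minimal sketch comparing byte counts with character counts. byteAndRuneCounts is an illustrative helper built on the standard utf8.RuneCountInString.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts returns len(s) (bytes) and the number of characters in s.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(byteAndRuneCounts("human")) // 5 5
	fmt.Println(byteAndRuneCounts("人类"))    // 6 2

	s := "人类"
	fmt.Println(s[0]) // 228: the first byte of 人, not a character
}
```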

2.1 Ways to read from a string, and basic operations

  • The type of a string is string.
  • s[i] has type byte (uint8), not string. To get a string you must convert it: string(s[i])
  • s[0:3] has type string. Note: this has nothing to do with slices; it only uses the same syntax. The result is a new string. In other words, s[i:j] is equivalent to a substring function.

That is, to get "h" from s := "human", use s[0:1] rather than s[0]. If s := "人类", then to get "人" use s[0:3].

  • You can use + to concatenate two strings. Note that both operands must be strings; a byte will not work either.
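A short sketch of indexing, substring syntax, and concatenation together. firstByte is a made-up helper name.

```go
package main

import "fmt"

// firstByte returns s[0], which is a byte, and the one-byte substring s[0:1].
func firstByte(s string) (byte, string) {
	return s[0], s[0:1]
}

func main() {
	b, sub := firstByte("human")
	fmt.Printf("%T %v %q\n", b, b, sub) // uint8 104 "h"
	fmt.Println(string(b) + "uman")     // human: the byte is converted before +

	zh := "人类"
	fmt.Println(zh[0:3]) // 人 (one character, three bytes)
}
```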

2.2 Immutable Strings

As in most languages, strings are immutable. The + operator produces a new string, and s[i] cannot be assigned to.

Immutable strings do not mean that memory cannot be shared; rather, immutable strings make it safe to share memory.

s := "human"
t := s[0:3]
// t and s share the memory of the first three characters.
// Since that memory cannot be changed in any way, sharing it is safe.
// If you assign a new value to t, t points at new memory (or memory shared with other strings).
// The memory of the first three characters of s is not affected.

2.3 Literals

Besides the usual syntax, Go has raw string literals, wrapped in backquotes (`). Nothing inside is escaped: \n is just a backslash followed by n, and the literal may contain real line breaks. There is one exception, the carriage return, commonly known as the CR character. It is removed from the value so that the string is the same on every machine.
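A minimal sketch contrasting an interpreted literal with a raw one. The literals helper is hypothetical; the CR case is not shown because it would require a carriage return in the source file itself.

```go
package main

import "fmt"

// literals returns the same characters written as an interpreted and as a raw literal.
func literals() (string, string) {
	interpreted := "a\nb" // \n is an escape: one newline byte
	raw := `a\nb`         // backslash and n are kept as two separate bytes
	return interpreted, raw
}

func main() {
	i, r := literals()
	fmt.Println(len(i), len(r)) // 3 4

	// A raw literal may span lines; the line break is part of the value.
	multi := `line one
line two`
	fmt.Printf("%q\n", multi) // "line one\nline two"
}
```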

2.4 A little background on encodings

2.4.1 ASCII

Long ago, computers used ASCII to represent all text. But it can only express English, not other languages or the many other symbols in use. ASCII uses 7 bits, which gives 128 different characters.

2.4.2 Unicode

Then came Unicode, whose fixed-width encoding is UTF-32 (also called UCS-4). In Go, a Unicode code point is a rune, that is, an int32. However, UTF-32 has a drawback: every character takes 32 bits (4 bytes), even English characters that need only one byte. In practice, most characters in everyday use fit in 2 bytes, and no code point needs more than 21 bits.

In Go, a character enclosed in single quotes is of type rune. It is written like the char type in other languages, except that a rune is a four-byte Unicode code point.

2.4.3 UTF-8

Hence UTF-8. UTF-8 encodes Unicode once more (so you see, an extra layer solves everything). UTF-8 is a variable-length encoding: ASCII characters take a single byte, and when one byte is not enough, multiple bytes are used. This is exactly the encoding Go chooses for strings.

A double-quoted literal, that is, a string, is encoded internally in UTF-8. Iterating with range automatically decodes it to UTF-32 (runes), while an index-based for i loop walks the raw UTF-8 bytes.
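To make the variable-length encoding visible, here is a small sketch. encode is an illustrative helper; utf8.RuneLen is from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode returns the UTF-8 bytes of a single rune.
func encode(r rune) []byte {
	return []byte(string(r))
}

func main() {
	fmt.Println(encode('A'), utf8.RuneLen('A')) // [65] 1
	fmt.Println(encode('人'), utf8.RuneLen('人')) // [228 186 186] 3

	// An index-based loop sees the individual bytes, not the character.
	s := "人"
	for i := 0; i < len(s); i++ {
		fmt.Printf("%d:%x ", i, s[i]) // 0:e4 1:ba 2:ba
	}
	fmt.Println()
}
```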

2.5 The special behavior of range

When you use range on a string, it automatically decodes UTF-8 into UTF-32 (runes), so you do not have to worry about how many bytes each character takes or how to advance the index. For example:

s := "Hi, 人类"
for i, r := range s {
    fmt.Println(i, r, string(r))
}
// Output:
// 0 72 H
// 1 105 i
// 2 44 ,
// 3 32
// 4 20154 人
// 7 31867 类

Notice the index of the last one: it jumps by 3, because 人 takes up 3 bytes, so 类 starts at 7. Of course, r is of type rune, so to see it as text you need a string conversion or Printf formatting.

2.5.1 Use range to get the character length of a string

As mentioned earlier, len counts bytes. So here we use this feature of range to count the actual number of characters in the string.

length := 0
for _, _ = range s {
    length++
}

Or, since no variables are needed, you can use the short form of range that declares none:

length := 0
for range s {
    length++
}

Of course, the standard library already provides this: utf8.RuneCountInString()

2.5.2 Decoding failures

If part of a string cannot be decoded, Go substitutes a special replacement character for it. When you see it, it is telling you that something is wrong with where your string came from. The Unicode code point of this character is \uFFFD, which prints as a white question mark on a black diamond, like this: �.
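A minimal sketch of the replacement behavior, assuming a string with one deliberately invalid byte. decode is a made-up helper; utf8.RuneError is the standard name for U+FFFD.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decode collects the runes that range produces for s.
func decode(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	bad := "a\xffb" // the byte 0xff can never appear in valid UTF-8
	fmt.Println(utf8.ValidString(bad))            // false
	fmt.Println(len(decode(bad)))                 // 3
	fmt.Println(decode(bad)[1] == utf8.RuneError) // true: U+FFFD stands in for the bad byte
}
```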

2.6 Decoding with a []rune conversion

Here is the code:

s := "Oriental"

r := []rune(s) // Decode to []rune

One thing to note here: while string and []byte in Go have a lot in common, []rune gets no such special treatment. It is just a slice of rune.
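Because []rune is an ordinary slice, it can be indexed and modified per character. The sketch below uses that to reverse a string; reverse is an illustrative helper, not something from the original text.

```go
package main

import "fmt"

// reverse decodes s into runes, swaps them in place, and encodes back to UTF-8.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	fmt.Println([]rune("人类"))       // [20154 31867]
	fmt.Println(len([]rune("人类")))  // 2
	fmt.Println(reverse("Hi, 人类")) // 类人 ,iH
}
```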

2.7 The pitfall of converting an int to a string

You might think i := 65; s := string(i) yields "65", but it actually yields "A". The string conversion interprets the int as a rune (that is, a UTF-32 code point) and then encodes it as UTF-8. This happens even though you declared i as an integer: it is treated as a rune with the value 65, since rune is int32, which is really hard to tell apart from int.

If you want the decimal string, use strconv.Itoa.
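The two conversions side by side, as a sketch. asCodePoint and asDecimal are hypothetical names; the conversion is written string(rune(i)) because go vet flags a bare string(i) on an int.

```go
package main

import (
	"fmt"
	"strconv"
)

// asCodePoint treats i as a Unicode code point, which is what string(i) does.
func asCodePoint(i int) string {
	return string(rune(i))
}

// asDecimal formats i as decimal digits.
func asDecimal(i int) string {
	return strconv.Itoa(i)
}

func main() {
	fmt.Println(asCodePoint(65), asDecimal(65)) // A 65
}
```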

Quiz: why doesn't string() convert the number straight to a byte? Why go through Unicode first and then UTF-8?

  • The answer:
    1. Because UTF-8 is an encoding of Unicode, you cannot get UTF-8 without going through Unicode. UTF-8 text is made of bytes, but that does not mean a byte value is UTF-8 encoded. A byte is just a representation, and UTF-8 is a convention for interpreting bytes. It depends on what your code wants to say:
      • If you print a byte with %d, you get a uint8 number.
        • If you print a byte with %q, you get a quoted character (a one-byte UTF-8 character is also just a byte).
      • If you print a []byte with %q, you get the quoted UTF-8 string.
        • If you print a []byte with %d, you get the numeric value of each byte, for example [104 105].
      • If you print a rune with %q, you get the quoted UTF-32 character.
        • If you print a rune with %d, you get an int32 number.
    2. string and []byte are not the same thing: among other differences, a string is immutable while a slice is mutable. But they do share many features.
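The verb combinations listed above can be checked with a sketch like this one. verbs is a made-up helper that collects the formatted results in the same order as the list.

```go
package main

import "fmt"

// verbs formats a byte, a []byte, and a rune with %d and %q.
func verbs() []string {
	return []string{
		fmt.Sprintf("%d", byte('A')),    // 65
		fmt.Sprintf("%q", byte('A')),    // 'A'
		fmt.Sprintf("%q", []byte("hi")), // "hi"
		fmt.Sprintf("%d", []byte("hi")), // [104 105]
		fmt.Sprintf("%q", '人'),          // '人'
		fmt.Sprintf("%d", '人'),          // 20154
	}
}

func main() {
	for _, v := range verbs() {
		fmt.Println(v)
	}
}
```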

2.8 string and []byte

There are many ways to manipulate strings, and there are libraries that do it for us. Four are particularly important: strings, bytes, unicode, and strconv.

2.8.1 strings library

The strings library, as its name implies, performs all kinds of operations on strings. It has pretty much everything you need.

2.8.2 bytes library

As the name suggests, this is a library that operates on []byte. Because strings are immutable, operating on them can be costly in many cases, especially when a string is changed repeatedly inside a for loop. Converting the string to []byte is useful here. For this reason, the bytes library implements many functions with the same effect as those in the strings library.

[]byte and string can be converted to each other. Whichever direction you convert, the data is copied, because the string must not change. So there is no need to convert to []byte if you only search the string without changing it, or if you make only a few small changes.
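A minimal sketch of the copy-on-conversion behavior and of the parallel strings/bytes functions. capitalize is an illustrative helper and assumes ASCII input.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// capitalize edits a byte copy of s in place and converts it back.
// Assumption: s is ASCII; multi-byte characters would need rune handling.
func capitalize(s string) string {
	if s == "" {
		return s
	}
	b := []byte(s) // copies the bytes of s
	if 'a' <= b[0] && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b) // copies again
}

func main() {
	s := "human"
	fmt.Println(capitalize(s), s)                 // Human human (s is untouched)
	fmt.Println(strings.ToUpper(s))               // HUMAN
	fmt.Println(string(bytes.ToUpper([]byte(s)))) // HUMAN
}
```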

2.8.3 strconv library

Because of the string() conversion pitfall, Go provides this library for converting various types to strings and back, such as the famous Itoa and Atoi. It also provides quoting and unquoting functions.

  • Itoa is just a wrapper around FormatInt. The first argument to FormatInt must be an int64, so Itoa performs a type conversion; the second argument is the base.
  • Atoi is a little different. It also wraps ParseInt, but for short numeric strings (depending on intSize: fewer than 10 characters on 32-bit, fewer than 19 on 64-bit) Atoi converts them directly. For longer strings, Atoi calls ParseInt.
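A short sketch of the functions mentioned above. roundTrip is a hypothetical helper; everything it calls is in the standard strconv package.

```go
package main

import (
	"fmt"
	"strconv"
)

// roundTrip formats n in the given base and parses it back.
func roundTrip(n int64, base int) (string, int64, error) {
	s := strconv.FormatInt(n, base)
	back, err := strconv.ParseInt(s, base, 64)
	return s, back, err
}

func main() {
	fmt.Println(strconv.Itoa(65)) // 65
	fmt.Println(roundTrip(65, 2)) // 1000001 65 <nil>

	n, err := strconv.Atoi("65")
	fmt.Println(n, err) // 65 <nil>

	_, err = strconv.Atoi("6x")
	fmt.Println(err != nil) // true: "6x" is not a number

	fmt.Println(strconv.Quote("hi\n")) // "hi\n"
}
```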

2.8.4 unicode library

unicode is the library that operates on runes. It has a number of functions such as IsDigit and IsUpper. Unfortunately they take only a single rune, so to check an entire string you have to check it one rune at a time in a for loop. Fortunately, range decodes the runes for us automatically, which saves some decoding effort.

func isLetterString(s string) bool {
	for _, r := range s {
		if !unicode.IsLetter(r) {
			return false
		}
	}
	return true
}

2.8.5 bytes.Buffer

The bytes library also provides the bytes.Buffer type for working with []byte more conveniently.

var buf bytes.Buffer // a zero-value Buffer can be used without initialization
buf.WriteRune('A')
fmt.Fprintf(&buf, "%d", byte(65))
// buf now holds "A65"

// fmt.Fprintf can also be replaced with WriteString.
// Fprintf only formats and then writes into buf, so for a ready-made string it is better to call WriteString directly.
buf.WriteString(strconv.FormatInt(65, 10))

3. Constants

3.1 iota

I won't go into how to use it; we all know that.

Note that it is iota, not itoa. Iota is a Greek letter, the smallest one.

3.2 Untyped constants

A constant must have a value, but it need not have a type. A rough kind is still inferred: the examples below are inferred to be integers, but they are not treated as int, int32, int64, and so on.

const a = 27891723897213897219837219
const b = 123213213213213213213

3.2.1 Calculation accuracy of untyped constants

Obviously, the numbers on the right exceed what int64 can represent. But you can store them in a constant, and you can even compute with them, with extreme accuracy. For example, a/b is evaluated exactly at compile time. If b were 2, however, assigning a/b to a variable would be an error, because no integer type has room for the result.
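A sketch of exact constant arithmetic, reusing the constants a and b from the text. quotient and shifted are made-up helpers, and the 1 << 100 constant is an extra illustration not taken from the original.

```go
package main

import "fmt"

const a = 27891723897213897219837219
const b = 123213213213213213213

// quotient is evaluated exactly at compile time; only the result must fit in an int.
func quotient() int {
	return a / b
}

// shifted uses a constant far beyond 64 bits as an intermediate value.
func shifted() int {
	const huge = 1 << 100
	return huge >> 98
}

func main() {
	fmt.Println(quotient(), shifted())
	// var x = a / 2 // would not compile: the constant overflows int
}
```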

3.2.2 Types of untyped constants

An untyped constant can be used as any of the specific types within its general kind.

  • math.Pi, for example, is an untyped constant, so it can be assigned to any floating-point type.
    1. If you assign it to a new variable, the type has to be inferred from the value on the right. For example, in var f = 0.0 the right-hand side is untyped, so float64 is inferred.
    2. But if f is an already declared float32, as in f := getFloat32(); f = 0.0, then 0.0 is implicitly converted to float32.

3.2.3 When assigning a value to a variable, the right-hand side is actually an untyped constant

Such as:

var a = 0 // the 0 on the right is an untyped integer constant; it is converted to int on assignment
var b = 0.0 // 0.0 is an untyped floating-point constant; it is converted to float64 on assignment
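The default types can be confirmed with %T, as in this sketch. defaults is a hypothetical helper; the rune and complex cases are added for completeness and are not in the original text.

```go
package main

import (
	"fmt"
	"math"
)

// defaults reports the types inferred for untyped constants in var declarations.
func defaults() []string {
	var i = 0
	var f = 0.0
	var r = 'a'
	var c = 0i
	return []string{
		fmt.Sprintf("%T", i),
		fmt.Sprintf("%T", f),
		fmt.Sprintf("%T", r),
		fmt.Sprintf("%T", c),
	}
}

func main() {
	fmt.Println(defaults()) // [int float64 int32 complex128]

	var small float32 = math.Pi // the untyped constant is converted to float32
	var wide float64 = math.Pi  // the same constant, converted to float64
	fmt.Printf("%T %T\n", small, wide) // float32 float64
}
```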