Moment For Technology

Go Language Tutorial Series: Strings

Posted on Oct. 3, 2022, 4:34 a.m. by Kathy Manning
Category: The back-end Tag: The back-end Go

This is the sixth day of my participation in Gwen Challenge

Hi, I'm @Luo Zhu

This article was first published on Luo Zhu's official website.

This article is also published on the public account Luo Zhu early teahouse. Please contact the author before reprinting.

This article was translated from Golang Tutorial Series

In Go, strings deserve special mention because they are implemented differently than in other languages.

What is a string?

In Go, a string is a read-only slice of bytes. A string can be created by placing a sequence of characters inside double quotes.

Let's look at a simple example of creating a string and printing it out.

package main

import (
    "fmt"
)

func main() {
    name := "Hello World"
    fmt.Println(name)
}

The above program prints Hello World.

Strings in Go are Unicode-compliant and UTF-8 encoded.
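
As a quick check of that claim, the sketch below uses utf8.ValidString from the standard unicode/utf8 package. The helper name isUTF8 is made up for this illustration. Go source files are UTF-8, so string literals are valid UTF-8, but a string built from arbitrary bytes does not have to be.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isUTF8 is a hypothetical helper: it reports whether s holds valid UTF-8.
func isUTF8(s string) bool {
	return utf8.ValidString(s)
}

func main() {
	fmt.Println(isUTF8("Hello World")) // true
	fmt.Println(isUTF8("Señor"))       // true

	// A string can hold arbitrary bytes; 0xff never appears in valid UTF-8.
	fmt.Println(isUTF8(string([]byte{0xff}))) // false
}
```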

Accessing a single byte of a string

Because a string is a slice of bytes, you can access each of its bytes.

package main

import (
    "fmt"
)

// printBytes prints every byte of s in hexadecimal.
func printBytes(s string) {
    fmt.Printf("Bytes: ")
    for i := 0; i < len(s); i++ {
        fmt.Printf("%x ", s[i])
    }
}

func main() {
    name := "Hello World"
    fmt.Printf("String: %s\n", name) // print the string itself
    printBytes(name)
}

%s is the format specifier for printing a string. len(s) returns the number of bytes in the string, and we use a for loop to print each of those bytes in hexadecimal notation. %x is the format specifier for hexadecimal. The output of the above program is:

String: Hello World
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64

These are the Unicode UTF-8 encoded values of Hello World. To understand strings better, you need a basic understanding of Unicode and UTF-8. I recommend reading naveenr.net/unicode-cha... to learn more about Unicode and UTF-8.
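
For comparison, the fmt package can produce the same hexadecimal dump without an explicit loop: the space flag in the "% x" verb puts a space between the bytes. This is only a sketch, and the helper name hexBytes is made up for the illustration.

```go
package main

import "fmt"

// hexBytes is a hypothetical helper that formats the bytes of s as
// space-separated hexadecimal values.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	fmt.Println(hexBytes("Hello World")) // 48 65 6c 6c 6f 20 57 6f 72 6c 64
}
```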

Accessing a single character of a string

Let's modify the above program slightly to print the characters of the string.

package main

import (
    "fmt"
)

// printBytes prints every byte of s in hexadecimal.
func printBytes(s string) {
    fmt.Printf("Bytes: ")
    for i := 0; i < len(s); i++ {
        fmt.Printf("%x ", s[i])
    }
}

// printChars prints s one byte at a time, formatting each byte as a character.
func printChars(s string) {
    fmt.Printf("Characters: ")
    for i := 0; i < len(s); i++ {
        fmt.Printf("%c ", s[i])
    }
}

func main() {
    name := "Hello World"
    fmt.Printf("String: %s\n", name)
    printChars(name)
    fmt.Printf("\n")
    printBytes(name)
}

In the printChars function, the %c format specifier prints each character of the string. The program prints:

String: Hello World
Characters: H e l l o   W o r l d
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64

Although the above program looks like a legitimate way to access the individual characters of a string, it has a serious bug. Let's find out what that bug is.

package main

import (
    "fmt"
)

// printBytes prints every byte of s in hexadecimal.
func printBytes(s string) {
    fmt.Printf("Bytes: ")
    for i := 0; i < len(s); i++ {
        fmt.Printf("%x ", s[i])
    }
}

// printChars prints s one byte at a time, formatting each byte as a character.
func printChars(s string) {
    fmt.Printf("Characters: ")
    for i := 0; i < len(s); i++ {
        fmt.Printf("%c ", s[i])
    }
}

func main() {
    name := "Hello World"
    fmt.Printf("String: %s\n", name)
    printChars(name)
    fmt.Printf("\n")
    printBytes(name)
    fmt.Printf("\n\n")
    name = "Señor"
    fmt.Printf("String: %s\n", name)
    printChars(name) // the bug shows up here: ñ is not printed correctly
    fmt.Printf("\n")
    printBytes(name)
}

The output of the above program is:

String: Hello World
Characters: H e l l o   W o r l d
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64

String: Señor
Characters: S e Ã ± o r
Bytes: 53 65 c3 b1 6f 72

We try to print the characters of Señor, but the program prints S e Ã ± o r, which is wrong. Why does this program break for Señor when it works perfectly well for Hello World? The reason is that the Unicode code point of ñ is U+00F1, and its UTF-8 encoding occupies two bytes, c3 and b1. We printed the characters on the assumption that each code point is one byte long, which is incorrect. **In UTF-8 encoding, a code point can occupy more than 1 byte.** So how do we solve this problem? This is where rune comes to the rescue.
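
The mismatch between bytes and code points can be checked directly. The sketch below compares len, which counts bytes, with utf8.RuneCountInString from the standard unicode/utf8 package, which counts code points. The helper name byteAndCodePointCount is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndCodePointCount is a hypothetical helper that returns the number of
// bytes and the number of Unicode code points in s.
func byteAndCodePointCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"Hello World", "Señor"} {
		b, c := byteAndCodePointCount(s)
		fmt.Printf("%q: %d bytes, %d code points\n", s, b, c)
	}
}
```

For Hello World both counts are 11, while Señor has 6 bytes but only 5 code points, which is why the byte-by-byte loop goes wrong.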

Rune

rune is a built-in type in Go and an alias for int32. A rune represents a Unicode code point in Go. No matter how many bytes a code point occupies, it can be represented by a single rune. Let's modify the program above to print the characters using rune.

package main

import (
    "fmt"
)

// printBytes prints every byte of s in hexadecimal.
func printBytes(s string) {
    fmt.Printf("Bytes: ")
    for i := 0; i < len(s); i++ {
        fmt.Printf("%x ", s[i])
    }
}

// printChars prints every Unicode code point of s.
func printChars(s string) {
    fmt.Printf("Characters: ")
    runes := []rune(s) // the string is converted to a slice of runes
    // Then we loop over the slice and print each character.
    for i := 0; i < len(runes); i++ {
        fmt.Printf("%c ", runes[i])
    }
}

func main() {
    name := "Hello World"
    fmt.Printf("String: %s\n", name)
    printChars(name)
    fmt.Printf("\n")
    printBytes(name)
    fmt.Printf("\n\n")
    name = "Señor"
    fmt.Printf("String: %s\n", name)
    printChars(name)
    fmt.Printf("\n")
    printBytes(name)
}

The above program prints:

String: Hello World
Characters: H e l l o   W o r l d
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64

String: Señor
Characters: S e ñ o r
Bytes: 53 65 c3 b1 6f 72

The output above is perfect, just what we wanted.
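
To make the relationship between rune and int32 concrete, here is a small sketch. The helper runeAt is hypothetical and simply indexes into the converted rune slice.

```go
package main

import "fmt"

// runeAt is a hypothetical helper that returns the i-th code point of s.
// It panics if i is out of range, like any slice index.
func runeAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	r := runeAt("Señor", 2)
	fmt.Printf("%c %U %d\n", r, r, r) // ñ U+00F1 241
	fmt.Printf("%T\n", r)             // int32, because rune is an alias for int32

	// 6 bytes, but only 5 runes.
	fmt.Println(len("Señor"), len([]rune("Señor"))) // 6 5
}
```

Note that []rune(s) allocates a new slice, so for long strings it costs extra memory compared with reading the string in place.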

Accessing individual runes using a for range loop

The program above is a perfectly good way to iterate over the individual runes of a string. But Go offers an easier way to do this: the for range loop.

package main

import (
    "fmt"
)

func charsAndBytePosition(s string) {
    // A for range loop over a string yields each rune and its starting byte index.
    for index, r := range s {
        fmt.Printf("%c starts at byte %d\n", r, index)
    }
}

func main() {
    name := "Señor"
    charsAndBytePosition(name)
}

On each iteration, the loop yields the byte position at which a rune starts, along with the rune itself. This program outputs:

S starts at byte 0
e starts at byte 1
ñ starts at byte 2
o starts at byte 4
r starts at byte 5

As you can see from the output above, ñ occupies two bytes, because the next character, o, starts at byte 4 rather than byte 3.
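
The same starting positions can be computed by hand, which shows roughly what the for range loop does for valid UTF-8. This is a sketch rather than the actual runtime implementation. It uses utf8.DecodeRuneInString from the standard unicode/utf8 package, and the helper name runeOffsets is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets is a hypothetical helper that returns the byte index at which
// each code point of s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		// Decode the code point that starts at byte i and get its width in bytes.
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	fmt.Println(runeOffsets("Señor")) // [0 1 2 4 5]
}
```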

Creating a string from a slice of bytes

package main

import (
    "fmt"
)

func main() {
    byteSlice := []byte{0x43, 0x61, 0x66, 0xC3, 0xA9}
    str := string(byteSlice)
    fmt.Println(str)
}

byteSlice contains the UTF-8 encoded hexadecimal bytes of the string Café. The program prints:

Café

What if we use the decimal equivalents of the hexadecimal values? Will the above program still work? Let's take a look.

package main

import (
    "fmt"
)

func main() {
    byteSlice := []byte{67, 97, 102, 195, 169} // decimal equivalent of {'\x43', '\x61', '\x66', '\xC3', '\xA9'}
    str := string(byteSlice)
    fmt.Println(str)
}

The decimal values work as well, and this program also prints Café.
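
The conversion also works in the other direction. The sketch below converts the string back with []byte(str) and shows that the conversion copies the bytes, so the string does not change when the slice is modified afterwards. The helper name sameBytes is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// sameBytes is a hypothetical helper: it reports whether the bytes of s
// are exactly the bytes in b.
func sameBytes(s string, b []byte) bool {
	return bytes.Equal([]byte(s), b)
}

func main() {
	byteSlice := []byte{0x43, 0x61, 0x66, 0xC3, 0xA9}
	str := string(byteSlice)

	// Converting back with []byte(str) yields the original bytes.
	fmt.Println(sameBytes(str, byteSlice)) // true

	// The conversion copied the bytes, so changing the slice afterwards
	// leaves the string untouched.
	byteSlice[0] = 0x63 // 'c'
	fmt.Println(str)               // Café
	fmt.Println(string(byteSlice)) // café
}
```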
