String

  • Swift strings do not support random access

Unicode instead of fixed width

Unicode today is a variable-length format, and "variable length" here has two different meanings:

  • A Unicode character, also known as an extended grapheme cluster, consists of one or more Unicode scalars.
  • A Unicode scalar can be encoded as one or more code units.

The basic building block of Unicode is the code point: an integer in the Unicode code space, which ranges from 0 to 0x10FFFF (1,114,111 in decimal).

Unicode scalars and code points are, for the most part, the same thing: every code point outside the range 0xD800-0xDFFF is a Unicode scalar. The 2,048 values in 0xD800-0xDFFF are the surrogate code points, which UTF-16 uses in pairs to represent characters with values greater than 65,535 (0xFFFF).

The same Unicode data can be encoded in several different encodings, the most common being 8-bit (UTF-8) and 16-bit (UTF-16). The smallest unit an encoding works with is called a code unit, so UTF-8 code units are 8 bits wide and UTF-16 code units are 16 bits wide.
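To make the difference between code points, scalars, and code units concrete, here is a small sketch. It takes one scalar above 0xFFFF (U+1F600, chosen only for illustration), inspects its code units through String's utf8 and utf16 views, and checks that a lone surrogate value is rejected as a Unicode scalar.

```swift
// U+1F600 lies above 0xFFFF, so UTF-16 needs a surrogate pair for it.
let smiley = "\u{1F600}"

// The same single scalar, seen as code units in two encodings.
let utf8Units = Array(smiley.utf8)    // four 8-bit code units
let utf16Units = Array(smiley.utf16)  // two 16-bit code units (a surrogate pair)

print(utf8Units.map { String($0, radix: 16) })   // ["f0", "9f", "98", "80"]
print(utf16Units.map { String($0, radix: 16) })  // ["d83d", "de00"]

// Surrogate code points are not Unicode scalars, so this failable initializer returns nil.
let surrogate = Unicode.Scalar(UInt32(0xD800))
print(surrogate == nil)  // true
```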

A "single character" that the user sees on screen may be made up of multiple Unicode scalars. Unicode's term for such a user-perceived character is an (extended) grapheme cluster.

Grapheme clusters and canonical equivalence

Combining marks

let single = "Pok\u{00E9}mon" // Pokémon
let double = "Poke\u{0301}mon" // Pokémon

(single, double) // ("Pokémon", "Pokémon")

single.count // 7
double.count // 7

single == double // true

single.unicodeScalars.count // 7
double.unicodeScalars.count // 8

Emoji

  • Unicode represents many complex emoji as a sequence of simpler emoji joined by an invisible zero-width joiner (ZWJ), the scalar U+200D.
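As a quick check of the ZWJ description, the sketch below builds a family emoji from its individual scalars (the specific emoji is just an example) and compares its Character count with its scalar count.

```swift
// Man, woman, girl, boy: four emoji scalars glued together by three ZWJs (U+200D).
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

print(family.count)                                // 1 Character
print(family.unicodeScalars.count)                 // 7 scalars
print(family.unicodeScalars.contains("\u{200D}"))  // true

// Listing the scalars in hex reveals the hidden joiners between the emoji.
for scalar in family.unicodeScalars {
    print(String(scalar.value, radix: 16, uppercase: true))
}
```

Only the scalar view exposes the joiners; the Character view treats the whole sequence as one user-perceived character.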

Strings and collections

Bidirectional, not random access

  • String is not a random-access collection. Knowing the position of the nth character in a string doesn't help calculate how many Unicode scalars precede that character. String therefore conforms only to BidirectionalCollection: you can start at either end of the string and move forward or backward, and the code examines the composition of adjacent characters to skip over the correct number of bytes. Either way, you can only move one character at a time.
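Here is a minimal sketch of bidirectional traversal, reusing the combining-accent spelling of "Pokémon" from earlier. It steps backward from endIndex, jumps forward by an offset (an O(n) walk), and reverses the string, showing that the accent stays attached to its base character.

```swift
let word = "Poke\u{0301}mon"  // "Pokémon" spelled with a combining accent

// Walk backward from the end: one step lands on the last character.
let lastIndex = word.index(before: word.endIndex)
print(word[lastIndex])  // n

// Jumping n characters forward is O(n): each step must inspect the
// underlying bytes to find the next grapheme cluster boundary.
let fourth = word.index(word.startIndex, offsetBy: 3)
print(word[fourth])  // é (two scalars, one Character)

// Reversal works on whole characters, so the accent stays on its "e".
let reversedWord = String(word.reversed())
print(reversedWord)  // nomékoP
```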

Range-replaceable, not mutable

  • String also conforms to RangeReplaceableCollection, but it is not a MutableCollection.

    var greeting = "Hello, world!"
    if let comma = greeting.firstIndex(of: ",") {
      greeting[..<comma] // Hello
      greeting.replaceSubrange(comma..., with: " again.")
    }
    greeting // Hello again.
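A few more RangeReplaceableCollection operations are sketched below (the strings are illustrative). Editing goes through methods that take indices or ranges. Assigning through a single-character subscript is not allowed, because String is not a MutableCollection.

```swift
var greeting = "world"

// insert(contentsOf:at:) and append(_:) come from RangeReplaceableCollection.
greeting.insert(contentsOf: "Hello, ", at: greeting.startIndex)
greeting.append("!")
print(greeting)  // Hello, world!

// removeSubrange(_:) deletes a range of characters (here, just the comma).
if let comma = greeting.firstIndex(of: ",") {
    greeting.removeSubrange(comma...comma)
}
print(greeting)  // Hello world!

// String is not a MutableCollection, so this would not compile:
// greeting[greeting.startIndex] = "J"
```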

String indices

  • Swift doesn't allow subscripting strings with integers. Integer subscripting couldn't run in constant time (an intuitive expectation for Collection subscripts), because finding the nth Character requires examining every byte before it.

  • The API for manipulating string indices is the same one you use with any other collection, because it is defined by the Collection protocol.

    let s = "abcdef"
    let second = s.index(after: s.startIndex)
    s[second] // b
  • Extended delimiters surround a string literal with # characters. Inside such a literal, quotation marks and backslashes can be used directly without escaping them.

    let csv = #"""
    "Value in quotes","can contain, characters"
    "Values without quotes work as well:",42
    """#
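Since integer subscripts are unavailable, index arithmetic goes through the Collection methods. The sketch below wraps index(_:offsetBy:limitedBy:) in a small helper. The name character(at:in:) is hypothetical and not a standard library API. The helper returns nil instead of trapping when the offset runs past the end.

```swift
/// Hypothetical helper (not a standard library API): returns the character
/// `offset` positions after startIndex, or nil if the string is too short.
/// Each call walks from the start, so it costs O(offset).
func character(at offset: Int, in s: String) -> Character? {
    guard offset >= 0,
          let i = s.index(s.startIndex, offsetBy: offset, limitedBy: s.endIndex),
          i != s.endIndex
    else { return nil }
    return s[i]
}

print(character(at: 3, in: "abcdef") ?? "-")           // d
print(character(at: 9, in: "abcdef") ?? "-")           // -
print(character(at: 3, in: "Poke\u{0301}mon") ?? "-")  // é
```

Because each call walks from the start, looping over offsets this way is quadratic. Iterating the string directly, or keeping a running index, avoids that cost.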

Substrings

  • Like all collections, String has a specific SubSequence type, called Substring. A Substring is much like an ArraySlice: it is a view into the original string's storage, delimited by its own start and end indices.
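The sketch below illustrates the "view into the original string" description with an arbitrary example sentence. A Substring shares indices with its base string, and converting it to String produces an independent copy.

```swift
let sentence = "The quick brown fox"

// prefix(while:) returns a Substring: a view into sentence's storage.
let firstWord = sentence.prefix(while: { $0 != " " })
print(type(of: firstWord))  // Substring

// The slice uses the base string's indices, so no offset translation is needed.
print(firstWord.startIndex == sentence.startIndex)          // true
print(sentence[firstWord.startIndex..<firstWord.endIndex])  // The

// Converting to String copies the characters into new, independent storage.
let owned = String(firstWord)
print(owned)  // The
```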

StringProtocol

  • Substring and String have nearly identical interfaces. This is achieved through a common protocol, StringProtocol, to which both String and Substring conform.

    func lastWord(in input: String) -> String? {
      // Handle input and manipulate substrings
      let words = input.split(whereSeparator: { $0 == "," || $0 == " " })
      guard let lastWord = words.last else { return nil }
      // Converts to a string and returns
      return String(lastWord)
    }
    
    lastWord(in: "one, two, three, four, five") // Optional("five")
  • The fundamental reason to discourage storing substrings long term is that a substring keeps the entire original string alive.

  • By working with substrings inside an operation and creating a new String only at the end, we defer copying until the last possible moment, ensuring that we only pay for copies that are actually needed.

  • For most APIs, it is simpler and clearer to accept String than to make them generic over StringProtocol (generics have their own overhead). The exception is general-purpose APIs that are likely to be called with substrings and that cannot be generalized further to Sequence or Collection operations, such as joined(separator:) in the standard library:

    extension Sequence where Element: StringProtocol {
      /// Returns a new string by concatenating the elements of the sequence,
      /// adding the given separator between each element.
      /// (Standard library declaration; body omitted.)
      public func joined(separator: String = "") -> String
    }
  • If you want to extend String with new functionality, it's a good idea to put the extension on StringProtocol so that the String and Substring APIs stay consistent. StringProtocol was designed with exactly this kind of extension in mind.

  • StringProtocol is not meant to be adopted by your own string types. Its documentation states:

    Do not declare new conformances to StringProtocol. Only the String and Substring types in the standard library are valid conforming types.
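Both pieces of advice are sketched below. joined(separator:) accepts an array of Substrings directly, and a helper written as an extension on StringProtocol works on String and Substring alike. The vowelCount property is a hypothetical example, not a standard library member.

```swift
// joined(separator:) is constrained to Element: StringProtocol,
// so the [Substring] returned by split can be joined without conversion.
let parts = "one two three".split(separator: " ")
let dashed = parts.joined(separator: "-")
print(dashed)  // one-two-three

// Extending StringProtocol (rather than String) makes the helper
// available on both String and Substring. `vowelCount` is hypothetical.
extension StringProtocol {
    /// Counts the ASCII vowels a, e, i, o, u, ignoring case.
    var vowelCount: Int {
        lowercased().filter { "aeiou".contains($0) }.count
    }
}

let greeting = "Hello, World"
print(greeting.vowelCount)               // 3
print(greeting.dropFirst(7).vowelCount)  // 1 (the Substring "World")
```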

Code unit views

  • Sometimes grapheme clusters are not the right level of abstraction, and we need to look at and manipulate a string at a lower level, such as Unicode scalars or code units. String provides three views for this: unicodeScalars, utf16, and utf8.
  • UTF-8 is the de facto standard for storing text or sending it over the network. The utf8 view has the lowest overhead of all of String's code unit views, because UTF-8 is the native in-memory storage format of Swift strings.
  • Note that the utf8 collection does not include a trailing null byte. If you need a null-terminated representation, use String's withCString method or its utf8CString property.
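The sketch below compares the length of the combining-accent "Pokémon" in each view. It also shows that only utf8CString and withCString add the terminating null byte.

```swift
let word = "Poke\u{0301}mon"

print(word.count)                 // 7 Characters
print(word.unicodeScalars.count)  // 8 scalars ("e" plus U+0301)
print(word.utf16.count)           // 8 UTF-16 code units
print(word.utf8.count)            // 9 UTF-8 code units (U+0301 takes two bytes)

// utf8 has no trailing null byte; utf8CString appends one for C interop.
print(word.utf8CString.count)      // 10
print(word.utf8CString.last == 0)  // true

// withCString lends a temporary null-terminated pointer to the closure.
let firstByte = word.withCString { $0.pointee }
print(firstByte)  // 80, the code unit for "P"
```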

Shared indices

  • Strings and their views share the same index type, String.Index.
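A small sketch of shared indices: the example finds an index through the Character view and then uses that same String.Index in the scalar and UTF-8 views. The force unwrap is acceptable only because the character is known to be present.

```swift
let pokemon = "Poke\u{0301}mon"

// "é" (U+00E9) is canonically equivalent to "e" + U+0301, so it is found.
let idx = pokemon.firstIndex(of: "é")!

// The same String.Index works in the other views.
print(pokemon.unicodeScalars[idx])  // e
print(pokemon.utf8[idx])            // 101, the UTF-8 code unit for "e"
print(pokemon[idx...])              // émon
```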

Strings and Foundation

  • Swift's String type is closely related to Foundation's NSString. Any String instance can be bridged to NSString with the as operator, and Objective-C APIs that take or return NSString are automatically exposed as taking or returning String.
  • The native in-memory encoding of Swift strings is UTF-8, whereas NSString uses UTF-16. This difference adds some performance overhead when Swift strings are bridged to NSString. For an NSString, moving an index by a UTF-16 offset takes constant time, whereas for a Swift string it is a linear-time operation. To narrow this gap, Swift implements a complex but efficient index-caching scheme that gives these linear-time operations amortized constant-time performance.
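One visible consequence of the encoding difference is how length is measured. The sketch below uses the family emoji as an arbitrary test string. It calls NSString(string:) rather than as, so the example does not depend on implicit bridging, and compares NSString's length with String's views.

```swift
import Foundation

let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

// NSString measures length in UTF-16 code units...
let nsLength = NSString(string: family).length
print(nsLength)            // 11
print(family.utf16.count)  // 11: the same measure, through String's utf16 view

// ...while String's count is in user-perceived characters.
print(family.count)        // 1
```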

Other string-based Foundation APIs

  • NSAttributedString is the immutable attributed string type, and NSMutableAttributedString is its mutable counterpart. Unlike the standard library's collections, which have value semantics, both have reference semantics.

  • NSRange is a structure containing two integer fields location and length:

    public struct NSRange {
      public var location: Int
      public var length: Int
    }
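Because NSRange's location and length count UTF-16 code units, Foundation provides initializers that convert between NSRange and Range<String.Index>. The sketch below round-trips a range over the combining-accent "Pokémon". The example string is chosen because its UTF-16 offsets differ from its Character offsets.

```swift
import Foundation

let text = "Poke\u{0301}mon"

// A Swift range covering "mon" (the force unwrap is safe: "m" is present).
let start = text.firstIndex(of: "m")!
let swiftRange = start..<text.endIndex

// NSRange counts UTF-16 code units, so the combining accent puts "m" at
// location 5, even though only four Characters precede it.
let nsRange = NSRange(swiftRange, in: text)
print(nsRange.location, nsRange.length)  // 5 3

// Converting back is failable: an arbitrary NSRange may not land on valid indices.
if let roundTrip = Range(nsRange, in: text) {
    print(text[roundTrip])  // mon
}
```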

Range of characters

  • Character does not conform to the Strideable protocol, and only ranges over Strideable elements with integer strides are countable collections. A range of Characters therefore cannot be iterated.
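The sketch below shows both sides of this limitation. A ClosedRange<Character> still supports containment checks because Character is Comparable. To enumerate the letters, one workaround is to iterate over the scalars' UInt32 values, which are Strideable. That workaround is just one possible approach.

```swift
// A range of Characters supports containment checks (Character is Comparable)...
let lowercase: ClosedRange<Character> = "a"..."z"
print(lowercase.contains("q"))  // true
print(lowercase.contains("Q"))  // false

// ...but it is not a Sequence, so this would not compile:
// for c in lowercase { print(c) }

// Workaround: iterate over the scalar values (UInt32 is Strideable).
let a = ("a" as Unicode.Scalar).value
let z = ("z" as Unicode.Scalar).value
let letters = (a...z).compactMap { Unicode.Scalar($0) }.map { Character($0) }
print(String(letters))  // abcdefghijklmnopqrstuvwxyz
```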

CharacterSet

  • CharacterSet should really be called UnicodeScalarSet, because it is a data structure that represents a set of Unicode scalars. It is not compatible with the Character type at all.
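Because CharacterSet is really a set of scalars, membership tests take a Unicode.Scalar. Checking a whole String therefore means walking its unicodeScalars view. In the sketch below, isAllDigits is a hypothetical helper written for illustration.

```swift
import Foundation

let digits = CharacterSet.decimalDigits

// contains(_:) takes a Unicode.Scalar; the literal "7" is inferred as one.
print(digits.contains("7"))  // true

/// Hypothetical helper: true if the string is non-empty and every
/// Unicode scalar in it is a decimal digit.
func isAllDigits(_ s: String) -> Bool {
    !s.isEmpty && s.unicodeScalars.allSatisfy { digits.contains($0) }
}

print(isAllDigits("2024"))  // true
print(isAllDigits("20x4"))  // false
```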

Unicode properties

  • In Swift 5, some of CharacterSet's functionality was moved to Unicode.Scalar, in the form of scalar properties.

    ("😀" as Unicode.Scalar).properties.isEmoji // true
    ("∬" as Unicode.Scalar).properties.isMath // true

    You can find the complete list of properties in the documentation for Unicode.Scalar.Properties.
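The scalar properties also make it easy to see what a string is made of. The sketch below prints the name and general category of every scalar in the combining-accent "Pokémon", which exposes the combining accent as a separate scalar.

```swift
let word = "Poke\u{0301}mon"

// One line per scalar: escaped form, Unicode name, and general category.
for scalar in word.unicodeScalars {
    let props = scalar.properties
    print(scalar.escaped(asASCII: true), props.name ?? "?", props.generalCategory)
}
// The accent reports its name as COMBINING ACUTE ACCENT
// and its category as nonspacingMark.
```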

Internal structure of String and Character

  • Like the other collection types in the standard library, String is a value type that implements copy-on-write.
  • In Swift 5, Swift native strings (as opposed to strings received from Objective-C) are represented in memory in UTF-8 format.
  • As a special optimization, Swift does not allocate a separate storage buffer for small strings with fewer than 16 UTF-8 code units (fewer than 11 on 32-bit platforms). Because a String value is itself 16 bytes wide on 64-bit platforms, these code units can be stored inline, directly in the value.
  • A Character is now represented internally as a String of length 1.
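A minimal sketch of the value semantics described above. The example string is deliberately longer than 15 UTF-8 code units so that it needs a heap buffer. A copy shares that buffer until its first mutation, and the original never observes the change.

```swift
// Longer than 15 UTF-8 code units, so it cannot be stored inline.
let original = "This string does not fit inline"

var copy = original  // shares original's buffer (copy-on-write)
copy.append("!")     // first mutation: copy gets its own buffer

print(original)  // This string does not fit inline
print(copy)      // This string does not fit inline!

// The String value itself is fixed-size; small strings live in these bytes.
print(MemoryLayout<String>.size)  // 16 on 64-bit platforms
```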

String literals

  • "" is a string literal. By conforming your own types to the ExpressibleByStringLiteral protocol, you can let them be initialized with string literals.
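Here is a minimal sketch of an ExpressibleByStringLiteral conformance. The SafeHTML type below is a hypothetical wrapper that stores its text in a value property. It is written for this example and does not reproduce any particular library's type.

```swift
/// Hypothetical wrapper type for illustration: marks a string as HTML.
struct SafeHTML: ExpressibleByStringLiteral {
    let value: String

    // The single requirement of ExpressibleByStringLiteral.
    init(stringLiteral value: String) {
        self.value = value
    }
}

// The compiler calls init(stringLiteral:) on our behalf.
let paragraph: SafeHTML = "<p>Hello</p>"
print(paragraph.value)  // <p>Hello</p>
```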

String interpolation

String interpolation is a syntactic feature that has existed since Swift launched. It lets us insert expressions into string literals (for example, "a * b = \(a * b)").
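The sketch below shows basic interpolation and then Swift 5's extensible form. Adding an appendInterpolation overload to String.StringInterpolation enables a new \(...) syntax. The width: label is a hypothetical example, not a built-in.

```swift
let a = 6, b = 7
let product = "a * b = \(a * b)"
print(product)  // a * b = 42

// Each \(...) segment calls a matching appendInterpolation overload.
// This overload (the `width:` label is hypothetical) right-aligns an Int.
extension String.StringInterpolation {
    mutating func appendInterpolation(_ value: Int, width: Int) {
        let digits = String(value)
        let padding = String(repeating: " ", count: max(0, width - digits.count))
        appendLiteral(padding + digits)
    }
}

let padded = "[\(42, width: 5)]"
print(padded)  // [   42]
```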

Custom String description

  • Conform to CustomStringConvertible to customize the output of print and String(describing:). Conform to CustomDebugStringConvertible to customize the output of debugPrint and String(reflecting:).

    // Minimal stand-in for the SafeHTML type (assumed here: it wraps a String in `value`).
    struct SafeHTML {
      var value: String
    }

    extension SafeHTML: CustomStringConvertible {
      var description: String {
        return value
      }
    }
    
    extension SafeHTML: CustomDebugStringConvertible {
      var debugDescription: String {
        return "SafeHTML: \(value)"
      }
    }

Text output stream

  • The standard library's print and dump functions write text to standard output. By default, they forward to print(_:to:) and dump(_:to:). The to argument is the output target, which can be any type that conforms to the TextOutputStream protocol:

    // Standard library declaration (body omitted)
    public func print<Target: TextOutputStream>(
      _ items: Any..., separator: String = " ", terminator: String = "\n", to output: inout Target)
  • We can also create our own output streams. The TextOutputStream protocol has a single requirement: a write method that takes a string and writes it to the stream. For example, this output stream appends its input to a buffer array:

    struct ArrayStream: TextOutputStream {
      var buffer: [String] = []
      mutating func write(_ string: String) {
        buffer.append(string)
      }
    }
    
    var stream = ArrayStream()
    print("Hello", to: &stream)
    print("World", to: &stream)
    stream.buffer // ["","Hello","\n","","World","\n"]
  • On the other side, the values written to a stream can be of any type that conforms to the TextOutputStreamable protocol. Its one requirement is the generic method write(to:), which accepts any type that conforms to TextOutputStream and writes self into it. A stream does not have to buffer its output, either: the following stream writes directly to standard error.

    import Foundation  // for fputs and stderr

    struct StdErr: TextOutputStream {
      mutating func write(_ string: String) {
        guard !string.isEmpty else { return }
        
        // A Swift String can be passed directly to a C function that takes a const char *
        fputs(string, stderr)
      }
    }
    
    var standardError = StdErr()
    print("oops!", to: &standardError)
  • Streams can also hold state or transform their output, and multiple streams can be chained together.

    import Foundation  // for replacingOccurrences(of:with:)

    struct ReplacingStream: TextOutputStream, TextOutputStreamable {
      let toReplace: KeyValuePairs<String, String>
      private var output = ""
      
      init(replacing toReplace: KeyValuePairs<String, String>) {
        self.toReplace = toReplace
      }
      
      // Apply every replacement to the incoming text, then buffer the result.
      mutating func write(_ string: String) {
        let toWrite = toReplace.reduce(string) { partialResult, pair in
          partialResult.replacingOccurrences(of: pair.key, with: pair.value)
        }
        print(toWrite, terminator: "", to: &output)
      }

      // Emit the buffered, transformed text into another stream.
      func write<Target: TextOutputStream>(to target: inout Target) {
        output.write(to: &target)
      }
    }
    
    var replacer = ReplacingStream(replacing: [
      "in the cloud": "on someone else's computer"
    ])
    
    let source = "People find it convenient to store their data in the cloud."
    print(source, terminator: "", to: &replacer)
    
    var output = ""
    print(replacer, terminator: "", to: &output)
    output
    // People find it convenient to store their data on someone else's computer.
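To isolate the TextOutputStreamable side of the protocol pair, the sketch below defines a hypothetical Greeting type that writes itself piece by piece into whatever stream it receives. The standard library's print recognizes TextOutputStreamable values and calls write(to:) for them. The ReplacingStream example above relies on the same behavior.

```swift
/// Hypothetical type used for illustration.
struct Greeting: TextOutputStreamable {
    let name: String

    // Write self into any TextOutputStream, one fragment at a time.
    func write<Target: TextOutputStream>(to target: inout Target) {
        target.write("Hello, ")
        target.write(name)
    }
}

// print calls write(to:) for TextOutputStreamable values.
var log = ""
print(Greeting(name: "World"), to: &log)
print(log, terminator: "")  // Hello, World

// write(to:) can also be called directly, e.g. with a String as the stream.
var direct = ""
Greeting(name: "Swift").write(to: &direct)
print(direct)  // Hello, Swift
```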