string

A String in Swift is a collection of Character values. A Character is what a human reader perceives as a single character, regardless of how many Unicode scalars it is composed of.

Strings in Swift do not support random access. You cannot write something like str[999] to get the thousandth character of a String.

Because characters have variable width, the string does not know where the nth character is stored. It has to examine every character before it to determine that position, so the lookup cannot be an O(1) operation.

Unicode, not fixed width

An ASCII string is simply a sequence of integers between 0 and 127. You can put this integer in an 8-bit byte.

But eight bits is not enough for many languages.

When the fixed width coding space is used up, there are two options:

  • Increase the width (Unicode was originally defined as a fixed-width 2-byte format, but 2 bytes turned out not to be enough, and 4 bytes would be too wasteful)
  • Switch to a variable-length encoding (this is the option that was eventually chosen)
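
To make the idea of variable width concrete, here is a minimal sketch that counts how many UTF-8 bytes a few single-scalar strings need. The sample values are chosen purely for illustration.

```swift
// Each sample consists of exactly one Unicode scalar.
let samples = [
    "A",          // U+0041, plain ASCII
    "\u{E9}",     // é, U+00E9
    "\u{20AC}",   // €, U+20AC
    "\u{1F602}",  // 😂, U+1F602
]

// The number of UTF-8 bytes grows with the scalar value: 1, 2, 3, then 4.
let byteCounts = samples.map { $0.utf8.count }
print(byteCounts)

// An ASCII-only string uses only byte values below 128.
let isPlainASCII = "Hello".utf8.allSatisfy { $0 < 128 }
print(isPlainASCII)
```

Only ASCII text keeps the one-byte-per-character property. Everything else takes two to four bytes per scalar.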

How some Unicode terms relate

  • One Swift Character = one grapheme cluster
  • One user-perceived character = one grapheme cluster = one or more Unicode scalars
  • A Unicode scalar can be encoded into one or more code units
  • In most cases, a Unicode scalar can be thought of as a code point

Code points

  • The most basic element of Unicode is the code point, an integer in the Unicode code space (from 0 to 0x10FFFF, which is 1,114,111 in decimal).
  • Each character, other linguistic unit, or emoji in Unicode has a unique code point.
  • Code points are always written as hexadecimal numbers prefixed with U+. For example, the euro sign is U+20AC, which in Swift is written "\u{20AC}" = "€".
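
The difference between code points and Unicode scalars can be seen by building scalars from raw integer values. The sketch below assumes nothing beyond the standard library. The variable names are illustrative.

```swift
// Unicode.Scalar's UInt32 initializer is failable, because not every
// code point in 0...0x10FFFF is a valid Unicode scalar.
let euroScalar = Unicode.Scalar(UInt32(0x20AC))       // U+20AC, valid
let surrogate = Unicode.Scalar(UInt32(0xD800))        // surrogate code point, not a scalar
let outOfRange = Unicode.Scalar(UInt32(0x110000))     // beyond the Unicode code space

// Turn the valid scalar into a String for display.
let euroString = euroScalar.map { String(Character($0)) }
print(euroString ?? "invalid")
print(surrogate == nil, outOfRange == nil)
```

Surrogate code points (0xD800 through 0xDFFF) are the reason a scalar is a code point only "in most cases".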

Code units

  • Unicode data can be encoded in a number of different encodings, the most commonly used being 8-bit (UTF-8) and 16-bit (UTF-16).
  • The smallest entity in an encoding is called a code unit: a UTF-8 code unit is 8 bits wide and a UTF-16 code unit is 16 bits wide.
  • An added benefit of UTF-8 is its backward compatibility with ASCII, which has helped UTF-8 take over from ASCII as the most popular encoding for the web and for file formats today.
  • In Swift, the code unit values used by UTF-8 and UTF-16 are represented as UInt8 and UInt16 respectively (also available under the aliases Unicode.UTF8.CodeUnit and Unicode.UTF16.CodeUnit).
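
As a small illustration of code units, the following sketch lists the UTF-8 and UTF-16 code units of two single-scalar strings. The constant names are made up for this example.

```swift
let euroSign = "\u{20AC}"      // €
let joyEmoji = "\u{1F602}"     // 😂

// UTF-8 code units are UInt8 values.
let euroUTF8: [UInt8] = Array(euroSign.utf8)       // E2 82 AC
let joyUTF8: [UInt8] = Array(joyEmoji.utf8)        // F0 9F 98 82

// UTF-16 code units are UInt16 values. A scalar above U+FFFF
// needs two of them (a surrogate pair).
let euroUTF16: [UInt16] = Array(euroSign.utf16)    // 20AC
let joyUTF16: [UInt16] = Array(joyEmoji.utf16)     // D83D DE02

print(euroUTF8, joyUTF8, euroUTF16, joyUTF16)
```

The two UTF-16 code units for 😂 are why languages that count UTF-16 units report its length as 2.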

Grapheme clusters and canonical equivalence

Combining marks

String

let single = "Pok\u{00E9}mon" // Pokémon
let double = "Poke\u{0301}mon" // Pokémon
(single, double) // ("Pokémon", "Pokémon")
single.count // 7
double.count // 7
// By default, comparison uses canonical equivalence
single == double // true
// Compare the Unicode scalars that make up the strings
single.unicodeScalars.count // 7
double.unicodeScalars.count // 8
// Compare the UTF-8 code units
single.utf8.elementsEqual(double.utf8) // false

let chars: [Character] = [
    "\u{1ECD}\u{300}",      // ọ̀
    "\u{F2}\u{323}",        // ọ̀
    "\u{6F}\u{323}\u{300}", // ọ̀
    "\u{6F}\u{300}\u{323}"  // ọ̀
]
let allEqual = chars.dropFirst().allSatisfy { $0 == chars.first } // true

NSString

let nssingle = single as NSString
nssingle.length // 7
let nsdouble = double as NSString
nsdouble.length // 8
nssingle == nsdouble // false
// To compare two NSStrings by canonical equivalence, use NSString.compare(_:)

Emoji

In Java or C#, “😂” is considered to be two “characters” long. Swift handles this correctly:

let oneEmoji = "😂" // U+1F602
oneEmoji.count // 1

What matters here is how the string is rendered in the program, not how it is stored in memory.

Some emojis can also be composed of multiple Unicode scalars:

let flags = "🇧🇷🇳🇿"
flags.count // 2

To see the Unicode scalars that make up a string, we can use its unicodeScalars view. Here we format the scalar values in the hexadecimal notation commonly used for code points:

flags.unicodeScalars.map {
    "U+\(String($0.value, radix: 16, uppercase: true))"
}
// ["U+1F1E7", "U+1F1F7", "U+1F1F3", "U+1F1FF"]

Combining a base character such as 👧 with one of the five skin tone modifiers (such as 🏽) yields a character with that skin tone, such as 👧🏽. Again, Swift handles it correctly:

let skinTone = "👧🏽" // 👧 + 🏽
skinTone.count // 1

Emoji depicting groups of people come in countless combinations of gender and number of people, and defining a separate code point for each of them would be problematic. Once skin tones are taken into account as well, it becomes practically impossible to have a code point for every case.

Unicode's solution is to represent such complex characters as a sequence of simple emoji joined by an invisible zero-width joiner (ZWJ), the scalar U+200D.

The presence of a ZWJ is a hint to the operating system to render the joined characters as a single glyph, if possible.

let family1 = "👨‍👩‍👧‍👦"
let family2 = "👨\u{200D}👩\u{200D}👧\u{200D}👦"
family1 == family2 // true
family1.count // 1
family2.count // 1

Strings and collections

String is a collection of Character values.

Since Swift 4: when concatenating two collections, you might assume that the length of the result is the sum of the lengths of the two inputs. For strings this does not always hold: if the end of the first string and the beginning of the second form a grapheme cluster, the counts no longer add up.

let flagLetterC = "🇨"
let flagLetterN = "🇳"
let flag = flagLetterC + flagLetterN // 🇨🇳
flag.count // 1
flag.count == flagLetterC.count + flagLetterN.count // false

Bidirectional indexing, not random access

A String is not a random-access collection. Knowing the position of the nth character in a given string does not help to calculate how many Unicode scalars precede that character. String only conforms to BidirectionalCollection. You can start at the beginning or the end of the string and move forwards or backwards, and the code will examine the combinations of adjacent characters and skip the correct number of bytes. Either way, you can only move one character at a time.

Keep this performance impact in mind when writing string processing code. Algorithms that require random access to maintain their performance guarantees are not a good fit for Unicode strings.
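
What bidirectional traversal allows can be sketched as follows: stepping back from endIndex, using a negative offset, and reversing. The sample string is arbitrary.

```swift
let text = "Swift \u{1F1E7}\u{1F1F7}"      // "Swift 🇧🇷", 7 characters

// Step back one character from the end.
let lastIndex = text.index(before: text.endIndex)
let lastCharacter = text[lastIndex]          // the flag, a two-scalar cluster

// Negative offsets are allowed because String is bidirectional.
let tailStart = text.index(text.endIndex, offsetBy: -3)
let tail = text[tailStart...]                // "t 🇧🇷"

// Reversing walks the string backwards character by character,
// so the flag's two scalars stay together.
let reversedText = String(text.reversed())
print(lastCharacter, tail, reversedText)
```

Each of these calls still walks the string one character at a time. None of them is a constant-time jump.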

prefix always starts from the beginning of the string and then walks over the requested number of characters. Running a linear operation inside another linear loop means the algorithm below is O(n^2).

extension String {
    var allPrefixes1: [Substring] {
        return (0...count).map(prefix)
    }
}

let hello = "Hello"
hello.allPrefixes1 // ["", "H", "He", "Hel", "Hell", "Hello"]

A better version iterates over the string once to get its indices collection. The subscript operation inside map is O(1), which keeps the complexity of the whole algorithm at O(n).

extension String { 
    var allPrefixes2: [Substring] { 
        return [""] + indices.map { index in self[...index] } 
    } 
}
let hello = "Hello" 
hello.allPrefixes2 // ["", "H", "He", "Hel", "Hell", "Hello"]

Ranges are replaceable, not mutable

String also conforms to RangeReplaceableCollection.

First find an appropriate range of string indices, and then perform the replacement by calling replaceSubrange:

var greeting = "Hello, world!"
if let comma = greeting.firstIndex(of: ",") {
    greeting[..<comma] // Hello
    greeting.replaceSubrange(comma..., with: " again.")
}
greeting // Hello again.

As before, keep in mind that the replacement string may form new grapheme clusters with the adjacent characters of the original string.
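
A minimal sketch of that effect: inserting a lone combining accent at the end of a string changes the last character instead of adding a new one. The variable names are illustrative.

```swift
var word = "cafe"
let countBefore = word.count                     // 4 characters

// Replace the empty range at the end with a combining acute accent.
word.replaceSubrange(word.endIndex..<word.endIndex, with: "\u{301}")

let countAfter = word.count                      // the accent merges with the "e"
let scalarCount = word.unicodeScalars.count      // one more scalar than before
print(word, countBefore, countAfter, scalarCount)
```

The character count stays the same even though the string grew by one scalar.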

String index

The Index type of String is String.Index, which is essentially an opaque value that stores a byte offset from the beginning of the string.

Computing the index that corresponds to the nth character takes O(n) time.

Subscripting the string with an existing index takes O(1) time.

The API for manipulating string indexes is the same as the indexing operations you would use for any other Collection, and they are all based on the Collection protocol.

index(after:)

let s = "abcdef" 
let second = s.index(after: s.startIndex) 
s[second] // b

index(_:offsetBy:)

let sixth = s.index(second, offsetBy: 4)
s[sixth] // f

The limitedBy: parameter

let safeIdx = s.index(s.startIndex, offsetBy: 400, limitedBy: s.endIndex)
safeIdx // nil
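
Building on limitedBy:, a bounds-checked lookup could be sketched like this. The character(at:) helper is hypothetical and not part of the standard library. Note that limitedBy: returns endIndex itself (not nil) when the offset lands exactly on the limit, so an extra check is needed before subscripting.

```swift
extension String {
    /// Hypothetical helper: the character at a given offset, or nil if out of bounds.
    /// This is an O(n) operation, because the index has to be computed by walking the string.
    func character(at offset: Int) -> Character? {
        guard offset >= 0,
              let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              i < endIndex   // endIndex is a valid index but not a valid subscript
        else { return nil }
        return self[i]
    }
}

let letters = "abcdef"
let second = letters.character(at: 1)
let last = letters.character(at: 5)
let pastEnd = letters.character(at: 6)
let farAway = letters.character(at: 400)
print(second as Any, last as Any, pastEnd as Any, farAway as Any)
```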

Some simple requirements, using indexes, can seem cumbersome:

s[..<s.index(s.startIndex, offsetBy: 4)] // abcd

But strings can also be accessed through the Collection interface:

s.prefix(4) // abcd

let date = "2019-09-01"
date.split(separator: "-")[1] // 09
date.dropFirst(5).prefix(2) // 09

var hello = "Hello!"
if let idx = hello.firstIndex(of: "!") {
    hello.insert(contentsOf: ", world", at: idx)
}
hello // Hello, world!

Some string processing tasks are not covered by a ready-made Collection API, such as parsing a CSV file. A simple loop over the characters does the job:

func parse(csv: String) -> [[String]] {
    var result: [[String]] = [[]]
    var currentField = ""
    var inQuotes = false
    for c in csv {
        switch (c, inQuotes) {
        case (",", false):
            result[result.endIndex-1].append(currentField)
            currentField.removeAll()
        case ("\n", false):
            result[result.endIndex-1].append(currentField)
            currentField.removeAll()
            result.append([])
        case ("\"", _):
            inQuotes = !inQuotes
        default:
            currentField.append(c)
        }
    }
    result[result.endIndex-1].append(currentField)
    return result
}

// A raw string literal delimited by #, so the quotes inside need no escaping
let csv = #"""
"Values in quotes","can contain , characters"
"Values without quotes work as well:",42
"""#
parse(csv: csv)
/*
[["Values in quotes", "can contain , characters"],
 ["Values without quotes work as well:", "42"]]
*/

With a little extra work, we can also ignore empty lines, ignore spaces around quotes, and allow escaped quotes inside fields.

Substring

A Substring is a view into the content of the original string, marked by its own start and end positions.

  • Substrings share their text storage with the original string. The benefit is that slicing a string becomes a very efficient operation.
let sentence = "The quick brown fox jumped over the lazy dog."
let firstSpace = sentence.firstIndex(of: " ") ?? sentence.endIndex
let firstWord = sentence[..<firstSpace] // The
type(of: firstWord) // Substring
// Creating firstWord does not trigger an expensive copy or a memory allocation
  • split returns a [Substring].
let poem = """
Over the wintry
forest, winds howl in rage
with no leaves to blow.
"""
let lines = poem.split(separator: "\n")
lines // ["Over the wintry", "forest, winds howl in rage", "with no leaves to blow."]
type(of: lines) // Array<Substring>
// No copy of the input string is made
  • split accepts a closure as an argument.
extension String {
    func wrapped(after maxLength: Int = 70) -> String {
        var lineLength = 0
        let lines = self.split(omittingEmptySubsequences: false) { character in
            if character.isWhitespace && lineLength >= maxLength {
                lineLength = 0
                return true
            } else {
                lineLength += 1
                return false
            }
        }
        return lines.joined(separator: "\n")
    }
}

let sentence = "The quick brown fox jumped over the lazy dog." 
sentence.wrapped(after: 15)
/*
The quick brown
fox jumped over
the lazy dog.
*/
  • A split variant that takes a sequence of multiple separators can be added with an extension.
extension Collection where Element: Equatable {
    func split<S: Sequence>(separators: S) -> [SubSequence] where Element == S.Element {
        return split { separators.contains($0) }
    }
}

"Hello, world!".split(separators: ",! ") // ["Hello", "world"]

StringProtocol

  • The interfaces of Substring and String are almost identical because both conform to StringProtocol.

  • Almost all String APIs are defined on StringProtocol, so for the most part you can treat a Substring as if it were a String.

  • Like all slices, Substring is intended for short-term storage, to avoid expensive copies during an operation.

  • When the operation is complete, create a new String from the Substring using an initializer. The fundamental reason for discouraging long-term storage of substrings is that a substring always keeps the entire original string alive, which wastes memory in much the same way as a leak.

func lastWord(in input: String) -> String? {
    // Process the input, working on substrings
    let words = input.split(separators: [",", " "])
    guard let lastWord = words.last else { return nil }
    // Convert to String and return
    return String(lastWord)
}

lastWord(in: "one, two, three, four, five") // Optional("five")
  • Most functions take a String or a StringProtocol. Very few accept a Substring. If you need to pass a Substring, you can create one that spans the whole string:
let substring = sentence[...]
  • It is not recommended to convert all of your APIs from accepting String instances to accepting any StringProtocol-conforming type. The advice is to stick with String:

    • Generics also introduce overhead of their own.
    • String is much simpler and cleaner.
    • Converting to String in the few places where it is needed does not put much of a burden on the caller.
  • If you want to extend String with new functionality, it is a good idea to put the extension on StringProtocol to keep the String and Substring APIs consistent. StringProtocol is designed for exactly this purpose. To move an existing extension from String to StringProtocol, the only change you usually need is to replace any self passed to an API that requires a String with a concrete instance created via String(self).

  • Do not declare any new types that conform to StringProtocol. Only the standard library's String and Substring are valid conforming types.
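
The advice about extending StringProtocol rather than String can be sketched as follows. The wordCount property is a hypothetical helper invented for this example.

```swift
extension StringProtocol {
    /// Hypothetical helper: the number of space-separated words.
    /// Because it is defined on StringProtocol, it is available
    /// on both String and Substring.
    var wordCount: Int {
        return split { $0 == " " }.count
    }
}

let phrase = "one two three"
let rest = phrase.dropFirst(4)            // Substring "two three"

let wordsInString = phrase.wordCount      // called on a String
let wordsInSubstring = rest.wordCount     // called on a Substring, no conversion needed
print(wordsInString, wordsInSubstring)
```

Had the extension been declared on String, the second call would have required String(rest) first.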

Code unit view

When Character is not the right level of abstraction, we can also inspect and manipulate strings at lower levels, such as Unicode scalars or code units.

  • String provides three views for this: unicodeScalars, utf16, and utf8.

  • Why would you want to access and manipulate a view?

    • Rendering text into a UTF-8 encoded web page.
    • Interacting with a non-Swift API that only accepts a particular encoding.
    • Needing information about the string in a particular format.
    • Operating on code units is faster than operating on full characters.
  • Twitter's former character counting algorithm was based on NFC-normalized scalars:

    let tweet = "☕️e\u{301}🇫🇷☀️"
    print(tweet.count) // 1+1+1+1=4
    var characterCount = tweet.unicodeScalars.count
    print(characterCount) // 2+2+2+2=8
    characterCount = tweet.precomposedStringWithCanonicalMapping.unicodeScalars.count
    print(characterCount) // 2+1+2+2=7
    // precomposedStringWithCanonicalMapping normalizes the string to Normalization Form C (NFC).
    // NFC composes base letters with combining marks, so the e and the accent in "cafe\u{301}" become a single precomposed scalar.
  • UTF-8 is the de facto standard for storing text or sending it over a network. Because the utf8 view is a collection, you can use it to pass the UTF-8 bytes of a string to any API that accepts a sequence of bytes, such as the initializers of Data or Array:

    let tweet = "☕️e\u{301}🇫🇷☀️"
    let utf8Bytes = Data(tweet.utf8)
    print(utf8Bytes.count) // 6+3+8+6=23
  • Of all the code unit views of String, utf8 has the lowest overhead, because UTF-8 is the native in-memory storage format of Swift strings.

  • The utf8 collection does not include a null byte at the end of the string. If you need a null-terminated representation, use String's withCString method or its utf8CString property. The latter returns an array of bytes.

    let tweet = "☕️e\u{301}🇫🇷☀️"
    let withCStringCount = tweet.withCString { strlen($0) }
    print(withCStringCount) // 23
    let nullTerminatedUTF8 = tweet.utf8CString
    print(nullTerminatedUTF8.count) // 24
  • None of these views provide the random access we might want. As a result, algorithms that require random access will not perform well on String and its views.

  • If you really need random access, you can convert the string itself or one of its views into an array, such as Array(str) or Array(str.utf8), and operate on that.
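
The array conversion mentioned above might look like this in practice. This is a sketch: the conversion itself is O(n) and copies the data, so it only pays off when many random lookups follow.

```swift
let str = "h\u{E9}llo \u{1F1E7}\u{1F1F7}!"     // "héllo 🇧🇷!"

// One linear pass produces an array with constant-time subscripting.
let characters = Array(str)                      // [Character]
let seventh = characters[6]                      // O(1) lookup of the flag

// The same works for a code unit view.
let bytes = Array(str.utf8)                      // [UInt8]
print(characters.count, seventh, bytes.count)
```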

Shared indices

  • A string and its views share the same index type, String.Index. You can take an index from the string and use it to subscript one of its views.
let pokemon = "Poke\u{301}mon" // Pokémon
if let index = pokemon.firstIndex(of: "é") {
    let scalar = pokemon.unicodeScalars[index]
    String(scalar) // e
}
  • As long as you go from the top down, that is, from characters to scalars to UTF-16 or UTF-8 code units, this works without problems. The other direction is not necessarily safe, because not every valid index in a code unit view lies on a Character boundary.
let family = "👨‍👩‍👧‍👦"
let someUTF16Index = String.Index(utf16Offset: 2, in: family)
family[someUTF16Index] // Not on a Character boundary: may crash or give an unexpected result, depending on the Swift version
  • samePosition(in:) returns nil if the input index has no corresponding position in the given view.
let pokemon = "Poke\u{301}mon" // Pokémon
if let accentIndex = pokemon.unicodeScalars.firstIndex(of: "\u{301}") {
    accentIndex.samePosition(in: pokemon) // nil
}

Strings and Foundation

  • String instances and NSString instances can be converted to each other with the as operator.

  • In Swift 5.0, String still lacks many of the features found in NSString. String gets special treatment from the compiler: once Foundation is imported, NSString members can be accessed on String instances.

  • The two libraries have some overlapping features, and sometimes there are two APIs with completely different names that do almost the same thing.

    • The split method in the standard library and components(separatedBy:) in Foundation.

    • The standard library designs its predicates around Boolean values, whereas Foundation uses ComparisonResult to represent the result of a comparison.

      Boolean predicate

      let valueId = "666"
      assert(valueId.isEmpty == true) // The assertion fails because valueId is not empty

      ComparisonResult

      let result = valueId.compare("777")
      print(result.rawValue) // -1 (.orderedAscending)
    • enumerateSubstrings(in:options:_:) is a very powerful method that iterates over the input string by grapheme cluster, word, sentence, or paragraph, passing strings and ranges to the closure. The corresponding API in Swift deals in substrings.

      let sentence = """
      The quick brown fox jumped
      over the lazy dog.
      """
      var words: [String] = []
      sentence.enumerateSubstrings(in: sentence.startIndex..., options: .byLines) { (word, range, _, _) in
          guard let word = word else { return }
          words.append(word)
      }
      print(words) // ["The quick brown fox jumped", "over the lazy dog."]
  • Swift strings use UTF-8 as their native in-memory encoding while NSString uses UTF-16, so bridging a Swift string to an NSString incurs some additional performance overhead. For example, calling enumerateSubstrings(in:options:using:) on an NSString can be faster than calling it on a String, because moving by an offset measured in UTF-16 takes constant time on an NSString but is a linear operation on a String.
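
The overlap between split and components(separatedBy:) mentioned above hides a few behavioral differences. The sketch below shows two of them, using an arbitrary input string.

```swift
import Foundation

let line = "a,,b"

// Standard library: returns [Substring] and drops empty pieces by default.
let pieces = line.split(separator: ",")

// The empty pieces can be kept explicitly.
let piecesWithEmpty = line.split(separator: ",", omittingEmptySubsequences: false)

// Foundation: returns [String] and always keeps empty pieces.
let components = line.components(separatedBy: ",")

print(pieces, piecesWithEmpty, components)
```

When only non-empty fields matter and no copies are wanted, split is usually the lighter choice. components(separatedBy:) is handy when the separator is itself a longer string or a CharacterSet.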

Other string-based Foundation apis

  • The native NSString APIs are the most convenient Foundation APIs to use with Swift strings, because the compiler does most of the bridging for you.

  • Many other Foundation APIs that deal with strings are less friendly to use because Apple has not created special Swift wrappers for them. NSAttributedString is one example.

    • NSAttributedString is immutable. NSMutableAttributedString is mutable and has reference semantics.
    • The NSAttributedString API used to accept NSString, but it now accepts Swift.String. The whole API is still built on NSString's concept of a collection of UTF-16 code units, though, and frequent bridging between String and NSString can incur unexpected performance overhead.
    let text = "👉 Click here for more info."
    let linkTarget = URL(string: "https://www.youtube.com/watch?v=DLzxrzFCyOs")!
    // Despite the use of `let`, the object is still mutable (reference semantics)
    let formatted = NSMutableAttributedString(string: text)
    // Modify the attributes of part of the text
    if let linkRange = formatted.string.range(of: "Click here") {
        // Convert the Swift range to an NSRange.
        // Note that the range starts at 3, because the emoji preceding the text
        // cannot be represented in a single UTF-16 code unit
        let nsRange = NSRange(linkRange, in: formatted.string) // {3, 10}
        // Add the attribute
        formatted.addAttribute(.link, value: linkTarget, range: nsRange)
    }

    // Query the attributes at a specific character position:
    // here, the attributes that start at the word "here"
    if let queryRange = formatted.string.range(of: "here") {
        let nsRange = NSRange(queryRange, in: formatted.string)
        var attributesRange = NSRange()
        // Execute the query
        let attributes = formatted.attributes(at: nsRange.location, effectiveRange: &attributesRange)
        attributesRange // {3, 10}
        // Convert the NSRange back to a Range<String.Index>
        if let effectiveRange = Range(attributesRange, in: formatted.string) {
            // The substring the attribute applies to
            formatted.string[effectiveRange] // Click here
        }
    }

    Such code is a far cry from true Swift idiomatic writing.

Range of characters

  • A range of Characters cannot be iterated.

    let lowercaseLetters = ("a" as Character)..."z" // ClosedRange<Character>
    for c in lowercaseLetters { // Error
        ...
    }
    // The cast of "a" to Character is necessary, otherwise the literal would default to String.
    // Character does not conform to Strideable, and only ranges whose bounds conform to it are countable collections.

  • The only operation you can perform on a range of characters is to check whether it contains a given character.

    let lowercaseLetters = ("a" as Character)..."z"
    lowercaseLetters.contains("A") // false
    lowercaseLetters.contains("é") // false
  • For the Unicode.Scalar type, the concept of a countable range makes sense as long as you stay within ASCII or some other well-ordered subset of Unicode. The order of Unicode scalars is defined by their code point values, so there is always a finite number of scalars between two bounds.

    extension Unicode.Scalar: Strideable {
        public typealias Stride = Int

        public func distance(to other: Unicode.Scalar) -> Int {
            return Int(other.value) - Int(self.value)
        }

        public func advanced(by n: Int) -> Unicode.Scalar {
            return Unicode.Scalar(UInt32(Int(value) + n))!
        }
    }

    let lowercase = ("a" as Unicode.Scalar)..."z"
    for c in lowercase { } // No error
    print(Array(lowercase.map(Character.init)))
    /*
    ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
    */

CharacterSet

CharacterSet is a Foundation type. It should really be called UnicodeScalarSet, because it is a data structure that represents a set of Unicode scalars and is not compatible with Character at all.

let favoriteEmoji = CharacterSet("👩‍🚒👨‍🎤".unicodeScalars)
favoriteEmoji.contains("🚒") // true
// Because the woman firefighter emoji is actually the sequence woman + ZWJ + fire engine

Unicode properties

In Swift 5, Foundation types are no longer needed to test whether a scalar belongs to one of the official Unicode categories. Instead, we simply access a property on Unicode.Scalar, such as isEmoji or isWhitespace. To avoid crowding Unicode.Scalar with too many members, all Unicode properties live in the properties namespace.

("😀" as Unicode.Scalar).properties.isEmoji // true
("∬" as Unicode.Scalar).properties.isMath // true

Listing the code point, name, and general category of each scalar in a string now requires only a little string formatting:

"I’m a 👩🏽‍🚒.".unicodeScalars.map { scalar -> String in
    let codePoint = "U+\(String(scalar.value, radix: 16, uppercase: true))"
    let name = scalar.properties.name ?? "(no name)"
    return "\(codePoint): \(name) -- \(scalar.properties.generalCategory)"
}.joined(separator: "\n")
/*
U+49: LATIN CAPITAL LETTER I -- uppercaseLetter
U+2019: RIGHT SINGLE QUOTATION MARK -- finalPunctuation
U+6D: LATIN SMALL LETTER M -- lowercaseLetter
U+20: SPACE -- spaceSeparator
U+61: LATIN SMALL LETTER A -- lowercaseLetter
U+20: SPACE -- spaceSeparator
U+1F469: WOMAN -- otherSymbol
U+1F3FD: EMOJI MODIFIER FITZPATRICK TYPE-4 -- modifierSymbol
U+200D: ZERO WIDTH JOINER -- format
U+1F692: FIRE ENGINE -- otherSymbol
U+2E: FULL STOP -- otherPunctuation
*/

These Unicode scalar properties are very low-level and are mainly defined in terms of Unicode concepts that may be unfamiliar. Character provides some similar, more commonly used categories:

Character("4").isNumber // true
Character("$").isCurrencySymbol // true
Character("\n").isNewline // true

The internal structure of String and Character

  • Strings are copy-on-write. When you create a copy of a string or create a substring, all of these instances share the same buffer. The character data is copied only when the buffer is shared with one or more other instances and one of them is mutated.

  • In Swift 5, native Swift strings are represented in memory as UTF-8, which in theory allows the best string processing performance, since traversing the UTF-8 view is faster than traversing the UTF-16 or Unicode scalar views.

  • Strings received from Objective-C are backed by an NSString. To make bridging as efficient as possible, an NSString-backed String is converted to a native Swift string only when it is mutated.

  • As a special optimization, Swift does not allocate a dedicated buffer for small strings of fewer than 16 UTF-8 code units. Since a String value is itself 16 bytes wide (on 64-bit platforms), these code units can be stored inline.
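
The observable side of copy-on-write is plain value semantics, which a short sketch can show. The buffer sharing itself is an implementation detail and is not visible from this code. The size printed at the end is expected to be 16 on 64-bit platforms, but that is an assumption about the platform.

```swift
let original = "Hello, Swift strings!"
var copy = original            // cheap: no character data needs to be copied yet
copy.append(" More text.")     // mutating the copy leaves the original untouched

print(original)
print(copy)

// The size of a String value itself, independent of its contents.
print(MemoryLayout<String>.size)
```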

String literals

By conforming to ExpressibleByStringLiteral, your own types can support initialization from string literals.

When using a SafeHTML value, we can be sure that all potentially risky HTML tags in the string it represents have been escaped. Pros: security issues can be avoided. Cons: you have to write a lot of code to wrap strings before calling these APIs.

extension String {
    var htmlEscaped: String {
        return replacingOccurrences(of: "<", with: "&lt;")
            .replacingOccurrences(of: ">", with: "&gt;")
    }
}

struct SafeHTML {
    private(set) var value: String
    init(unsafe html: String) {
        self.value = html.htmlEscaped
    }
}

let safe: SafeHTML = SafeHTML(unsafe: "<p>Angle brackets in literals are not escaped</p>")
print(safe) // SafeHTML(value: "&lt;p&gt;Angle brackets in literals are not escaped&lt;/p&gt;")

Conforming SafeHTML to ExpressibleByStringLiteral keeps this safety while removing the wrapping boilerplate.

extension SafeHTML: ExpressibleByStringLiteral {
    public init(stringLiteral value: StringLiteralType) {
        self.value = value
    }
}

let safe: SafeHTML = "<p>Angle brackets in literals are not escaped</p>"
print(safe)//SafeHTML(value: "<p>Angle brackets in literals are not escaped</p>")

String interpolation

String interpolation lets us insert expressions into a string literal, for example: "a * b = \(a * b)"

Swift 5 opens up a public API to support string interpolation when building custom types.

let input = ... // This part is entered by the user and is not safe!
let html = "<li>Username: \(input)</li>"

In the code above, input must be escaped before it is used because it comes from an untrusted source. The literal segments of html, however, should be left unchanged, because there we are deliberately writing HTML tags. To implement this logic, we can define custom string interpolation for SafeHTML.

The Swift string interpolation API consists of two protocols: ExpressibleByStringInterpolation and StringInterpolationProtocol.

Demo:

final class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        let unsafeInput = "<script>alert('Oops!')</script>"
        let safe2: SafeHTML = "<li>Username: \(unsafeInput)</li>"
        print(safe2) // SafeHTML(value: "<li>Username: &lt;script&gt;alert(\'Oops!\')&lt;/script&gt;</li>")
        let star = "<sup>*</sup>"
        let safe3: SafeHTML = "<li>Username\(raw: star): \(unsafeInput)</li>"
        print(safe3)
    }
}

// MARK: - String interpolation
extension SafeHTML: ExpressibleByStringInterpolation {
    // Called after the StringInterpolationProtocol methods have run, to finish initialization
    public init(stringInterpolation: SafeHTML) {
        value = stringInterpolation.value
    }
}

extension SafeHTML: StringInterpolationProtocol {
    /*
     literalCapacity is roughly how much space is needed to store all the literal segments,
     and interpolationCount is the expected number of interpolations.
     If interpolation performance matters, these two arguments can be used to reserve space.
     */
    init(literalCapacity: Int, interpolationCount: Int) {
        value = ""
    }

    // Literal (non-interpolated) segments are appended unchanged
    mutating func appendLiteral(_ literal: String) {
        value += literal
    }

    // Interpolated segments are escaped
    mutating func appendInterpolation<T>(_ x: T) {
        self.value += String(describing: x).htmlEscaped
    }

    // Interpolated segments marked raw are inserted as is
    mutating func appendInterpolation<T>(raw x: T) {
        self.value += String(describing: x)
    }
}

Custom String description

  • A custom SafeHTML type, printed with print:

    struct SafeHTML {
        private(set) var value: String
        init(unsafe html: String) {
            value = html
        }
    }

    let safe: SafeHTML = SafeHTML(unsafe: "<p>Hello, World!</p>")
    print(safe) // SafeHTML(value: "<p>Hello, World!</p>")
    print(String(describing: safe)) // SafeHTML(value: "<p>Hello, World!</p>")
    print(String(reflecting: safe)) // SafeHTML(value: "<p>Hello, World!</p>")
  • Make SafeHTML conform to CustomStringConvertible:

    extension SafeHTML: CustomStringConvertible {
        var description: String {
            return value
        }
    }

    let safe: SafeHTML = SafeHTML(unsafe: "<p>Hello, World!</p>")
    print(safe) // <p>Hello, World!</p>
    print(String(describing: safe)) // <p>Hello, World!</p>
    print(String(reflecting: safe)) // <p>Hello, World!</p>
  • Make SafeHTML conform to CustomDebugStringConvertible:

    extension SafeHTML: CustomDebugStringConvertible {
        var debugDescription: String {
            return "Debug: \(value)"
        }
    }

    let safe: SafeHTML = SafeHTML(unsafe: "<p>Hello, World!</p>")
    print(safe) // Debug: <p>Hello, World!</p>
    print(String(describing: safe)) // Debug: <p>Hello, World!</p>
    print(String(reflecting: safe)) // Debug: <p>Hello, World!</p>
  • Make SafeHTML conform to both CustomStringConvertible and CustomDebugStringConvertible:

    let safe: SafeHTML = SafeHTML(unsafe: "<p>Hello, World!</p>")
    print(safe) // <p>Hello, World!</p>
    print(String(describing: safe)) // <p>Hello, World!</p>
    print(String(reflecting: safe)) // Debug: <p>Hello, World!</p>

    Conclusion: if a type does not implement CustomDebugStringConvertible, String(reflecting:) falls back to the result provided by CustomStringConvertible. If a type does not implement CustomStringConvertible, String(describing:) falls back to the result provided by CustomDebugStringConvertible.

    The author’s advice:

    • If your custom type is simple, there is no need to implement CustomDebugStringConvertible.
    • If your type is a container, implementing CustomDebugStringConvertible is the friendlier choice, because it lets you print the debug information of each element.
    • If you want the debug output to be processed in some special way, that should also be done by implementing CustomDebugStringConvertible.
    • If you would provide the same result for description and debugDescription, implement just one of them.
  • Array always prints the debug version of the elements it contains, even if you pass it to String(describing:). This is because the normal string description of an array should never be presented to the user. (Take the empty string "" as an example: String.description omits the quotes surrounding the string.)

    let str = ""
    print(str) // (prints an empty line)
    print(String(describing: str)) // (prints an empty line)
    print(String(reflecting: str)) // ""
    let array: [String] = ["", "", ""]
    print(array) // ["", "", ""]
    print(String(describing: array)) // ["", "", ""]
    print(String(reflecting: array)) // ["", "", ""]
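
The advice about container types can be sketched with a small stack type. Stack is a hypothetical example type, not something from the standard library. It prints the debug representation of each element, in the same way Array does.

```swift
/// Hypothetical container used only to illustrate the advice above.
struct Stack<Element>: CustomDebugStringConvertible {
    private var items: [Element] = []

    init() {}

    mutating func push(_ element: Element) {
        items.append(element)
    }

    // Use the debug representation of every element, so that
    // empty strings remain visible as "".
    var debugDescription: String {
        let body = items.map { String(reflecting: $0) }.joined(separator: ", ")
        return "Stack([\(body)])"
    }
}

var stack = Stack<String>()
stack.push("a")
stack.push("")

let reflected = String(reflecting: stack)
// No CustomStringConvertible conformance, so this falls back to debugDescription.
let described = String(describing: stack)
print(reflected)
print(described)
```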

Text output stream

Types that conform to TextOutputStream can be used as output targets

The standard library's print and dump functions write text to standard output. The default versions of both functions forward to print(_:to:) and dump(_:to:). The to argument is the output target, which can be any type that conforms to TextOutputStream.

  • String is the only output stream type in the standard library.

    var s = "" 
    let numbers = [1, 2, 3, 4]
    print(numbers, to: &s) 
    print(s) // [1, 2, 3, 4]
  • Create your own output stream: conform to TextOutputStream and implement a write method that takes a string and writes it to the stream:

    struct ArrayStream: TextOutputStream {
        var buffer: [String] = []
        mutating func write(_ string: String) {
            buffer.append(string)
        }
    }

    var stream = ArrayStream()
    print("Hello", to: &stream)
    print("World", to: &stream)
    print(stream.buffer) // ["", "Hello", "\n", "", "World", "\n"]
    // print also passes separators and terminators to write as separate strings, which is where "" and "\n" come from
  • Extend the Data type so that it accepts stream input and stores it as UTF-8 encoded bytes.

    extension Data: TextOutputStream {
        mutating public func write(_ string: String) {
            self.append(contentsOf: string.utf8)
        }
    }

    var utf8Data = Data()
    utf8Data.write("café")
    print(Array(utf8Data)) // [99, 97, 102, 195, 169]

    Using print, you get the same result as above:

    var utf8Data = Data()
    print("café", terminator: "", to: &utf8Data)
    print(Array(utf8Data)) // [99, 97, 102, 195, 169]

Types that conform to TextOutputStreamable can be used as output sources

Demo:

struct ReplacingStream: TextOutputStream, TextOutputStreamable {
    // KeyValuePairs is used because it neither removes duplicate keys nor reorders the keys.
    let toReplace: KeyValuePairs<String, String>
    private var output = ""

    init(replacing toReplace: KeyValuePairs<String, String>) {
        self.toReplace = toReplace
    }

    mutating func write(_ string: String) {
        let toWrite = toReplace.reduce(string) { partialResult, pair in
            partialResult.replacingOccurrences(of: pair.key, with: pair.value)
        }
        print(toWrite, terminator: "", to: &output)
    }

    func write<Target>(to target: inout Target) where Target : TextOutputStream {
        output.write(to: &target)
    }
}

var replacer = ReplacingStream(replacing: ["in the cloud": "on someone else's computer"])
let source = "People find it convenient to store their data in the cloud."
print(source, terminator: "", to: &replacer)

var finalSource = ""
print(replacer, terminator: "", to: &finalSource)
print(finalSource) // People find it convenient to store their data on someone else's computer.

How it works:

  • ReplacingStream conforms to both protocols, so it can act as both an output source and an output target.
  • print(source, terminator: "", to: &replacer) writes source into the ReplacingStream, which performs the replacements. (ReplacingStream as the output target)
  • print(replacer, terminator: "", to: &finalSource) transfers the output held by the ReplacingStream to the external finalSource. (ReplacingStream as the output source)
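
For comparison, a type that is only an output source can be much smaller. The Temperature type below is a hypothetical example. It conforms to TextOutputStreamable alone and writes itself into whatever target it is given.

```swift
/// Hypothetical value type that knows how to write itself to a stream.
struct Temperature: TextOutputStreamable {
    var celsius: Double

    func write<Target: TextOutputStream>(to target: inout Target) {
        target.write("\(celsius) °C")
    }
}

// print uses write(to:) for TextOutputStreamable values.
var report = ""
print(Temperature(celsius: 21.5), terminator: "", to: &report)

// String(describing:) goes through the same method.
let described = String(describing: Temperature(celsius: 21.5))
print(report, described)
```

The advantage over building a String first is that the value can write directly into any target, such as a String, a file-backed stream, or a custom stream like the ones above.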