Study notes for The Python Language and Its Applications

String (str): a sequence of Unicode characters, used to store text data.
bytes and bytearray: sequences of 8-bit values (integers 0 to 255), used to store binary data.
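
A short sketch of how the three types behave. The variable names are only illustrative. It shows that indexing bytes yields an integer, that bytearray can be modified in place while bytes cannot, and that every value must fit in 8 bits.

```python
text = 'abc'                      # str: a sequence of Unicode characters
data = bytes([1, 2, 3, 255])      # bytes: immutable sequence of integers 0 to 255
buf = bytearray([1, 2, 3, 255])   # bytearray: mutable counterpart of bytes

print(data[3])        # 255: indexing bytes gives an int, not a character
buf[0] = 127          # allowed, because bytearray is mutable
print(buf)            # bytearray(b'\x7f\x02\x03\xff')

try:
    data[0] = 127     # bytes is immutable, so this raises TypeError
except TypeError as exc:
    print('bytes is immutable:', exc)

try:
    bytes([256])      # 256 does not fit in 8 bits, so this raises ValueError
except ValueError as exc:
    print('out of range:', exc)
```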

1. Encoding and decoding

Encoding: The process of converting a Unicode string into a series of bytes

# str type
snowman = '\u2603'
len(snowman)  # 1: a single Unicode character, however many bytes it needs when encoded

# bytes type
ds = snowman.encode('utf-8') # b'\xe2\x98\x83'
len(ds)  # 3: the UTF-8 encoding takes 3 bytes
str.encode(encoding='utf-8', errors='strict') -> bytes
The first parameter is the encoding; the second (errors) chooses how characters that cannot be encoded are handled:

# 'strict' (the default): raises UnicodeEncodeError on any unencodable character

# 'ignore': drops characters that cannot be encoded
snowman = 'abc\u2603'
ds = snowman.encode('ascii', 'ignore')
print(ds) # b'abc'
print(len(ds)) # 3

# 'replace': substitutes ? for each unencodable character
snowman = 'abc\u2603'
ds = snowman.encode('ascii', 'replace')
print(ds) # b'abc?'
print(len(ds)) # 4

# 'backslashreplace': converts unencodable characters into \uXXXX escape sequences
snowman = 'abc\u2603'
ds = snowman.encode('ascii', 'backslashreplace')
print(ds) # b'abc\\u2603'
print(len(ds)) # 9

# 'xmlcharrefreplace': converts unencodable characters into XML numeric character references
snowman = 'abc\u2603'
ds = snowman.encode('ascii', 'xmlcharrefreplace')
print(ds) # b'abc&#9731;'
print(len(ds)) # 10
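
The 'strict' handler is described above but not demonstrated. A minimal sketch of what it does, followed by the four non-strict handlers applied to the same string for comparison. The names raised, bad_span and results are just illustrative.

```python
snowman = 'abc\u2603'

try:
    snowman.encode('ascii')            # errors='strict' is the default
    raised = False
except UnicodeEncodeError as exc:
    raised = True
    print(exc.reason)                  # ordinal not in range(128)
    print(exc.start, exc.end)          # 3 4: position of the unencodable character
    bad_span = (exc.start, exc.end)

# The same string under each non-strict handler, side by side
results = {h: snowman.encode('ascii', h)
           for h in ('ignore', 'replace', 'backslashreplace', 'xmlcharrefreplace')}
for handler, encoded in results.items():
    print(handler, encoded)
```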

Decoding: the process of converting a sequence of bytes into a Unicode string. The bytes must be decoded with the same encoding that produced them (for example UTF-8); otherwise you get garbled text or a UnicodeDecodeError.

place = 'caf\u00e9'
place_bytes = place.encode('utf-8') # b'caf\xc3\xa9'
place2 = place_bytes.decode('utf-8') # 'café'
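
To see why the encodings must match, here is a small sketch that decodes the same UTF-8 bytes with two other codecs (Latin-1 and ASCII are chosen only as examples). One silently produces the wrong text, the other fails.

```python
place_bytes = 'caf\u00e9'.encode('utf-8')   # b'caf\xc3\xa9'

# A different single-byte codec decodes without complaint but gives mojibake
wrong = place_bytes.decode('latin-1')
print(wrong)                                 # cafÃ©

# ASCII cannot represent bytes above 0x7f, so decoding fails outright
try:
    place_bytes.decode('ascii')
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print(exc.reason)                        # ordinal not in range(128)
```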

2. Formatting

Old-style formatting uses %; new-style formatting uses {} placeholders with the format() method.

n = 42
f = 7.03
s = 'string cheese'

print('{} {} {}'.format(n, f, s)) # default: arguments fill the {} in order
print('{2} {0} {1}'.format(f, s, n)) # positional index
print('{n} {f} {s}'.format(n=n, f=f, s=s)) # keyword arguments

d = {'n': n, 'f': f, 's': s}
print('{0[n]} {0[f]} {0[s]} {1}'.format(d, 'other')) # dictionary

print('{0:5d} {1:10f} {2:20s}'.format(n, f, s)) # default: numbers right-aligned, strings left-aligned
print('{0:>5d} {1:>10f} {2:>20s}'.format(n, f, s)) # right-aligned
print('{0:<5d} {1:<10f} {2:<20s}'.format(n, f, s)) # left-aligned
print('{0:^5d} {1:^10f} {2:^20s}'.format(n, f, s)) # centered
print('{0:!^5d} {1:#^10f} {2:&^20s}'.format(n, f, s)) # fill character, placed before the alignment sign
print('{0:!^5d} {1:10.4f} {2:&^20.4s}'.format(n, f, s)) # precision: 4 decimal places / first 4 characters
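
The old % style is mentioned above without an example. A rough sketch of equivalents for some of the format() calls, using the same n, f and s; the exact field widths are arbitrary choices.

```python
n = 42
f = 7.03
s = 'string cheese'

print('%d %f %s' % (n, f, s))            # 42 7.030000 string cheese
print('%5d|%10.4f|%-20s|' % (n, f, s))   # width, precision, '-' for left-align
print('%(n)d %(s)s' % {'n': n, 's': s})  # named fields looked up in a dict
```

The `%` forms are still supported, but format() (and its mini-language shown above) is the more flexible option.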

3. Regular expressions

match() checks whether the source string starts with the pattern:

import re

source = 'Young man'
m = re.match('You', source)
if m:
  print(m.group())  # You
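
A quick sketch contrasting match() with search() on the same string: match() only tries position 0, while search() scans forward until it finds the pattern.

```python
import re

source = 'Young man'

m_start = re.match('man', source)    # match() only tries position 0
m_any = re.search('man', source)     # search() scans the whole string
print(m_start)                       # None
print(m_any.group(), m_any.span())   # man (6, 9)
```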

search() returns the first match found anywhere in the string:

import re

source = 'Young man'
m = re.search('man', source)
if m:
  print(m.group())  # man

# match() with a leading .* also succeeds, because .* consumes 'Young '
m1 = re.match('.*man', source)
if m1:
  print(m1.group())  # Young man

findall() returns a list of all non-overlapping matches:

import re

source = 'Young man'
m = re.findall('n.?', source)
print(m)  # ['ng', 'n']

split() works like the string split() method, except that it splits on a pattern instead of fixed text:

import re

source = 'Young man'
m = re.split('n', source)
print(m)  # ['You', 'g ma', '']
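
Splitting on 'n' alone does not show the advantage of a pattern. A small sketch with a hypothetical line that mixes separators, which str.split() cannot handle in one call:

```python
import re

line = 'red, green;blue  yellow'

# str.split only knows one fixed separator
print(line.split(','))                 # ['red', ' green;blue  yellow']

# re.split takes a pattern: any run of commas, semicolons, or whitespace
parts = re.split(r'[,;\s]+', line)
print(parts)                           # ['red', 'green', 'blue', 'yellow']
```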

sub() replaces every match, like the string replace() method, except that it takes a pattern instead of fixed text:

import re

source = 'Young man'
m = re.sub('n', '?', source)
print(m)  # You?g ma?
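
A sketch of sub() with an actual pattern (a character class of lowercase vowels, chosen just for illustration), plus the count parameter that limits how many matches are replaced:

```python
import re

source = 'Young man'

starred = re.sub('[aeiou]', '*', source)       # replace any lowercase vowel
print(starred)                                 # Y**ng m*n

first_only = re.sub('n', '?', source, count=1) # stop after the first replacement
print(first_only)                              # You?g man
```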

Special characters:

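Pattern   Matches
.         any character except \n
*         zero or more repetitions of the preceding item
+         one or more repetitions of the preceding item
?         zero or one repetition of the preceding item (makes it optional)
\d        a digit character
\w        a word character: alphanumeric or underscore
\s        a whitespace character
\b        a word boundary (the edge between a word character and a non-word character)
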
import string
import re

printable = string.printable  # all printable ASCII characters

# Raw strings (r'...') keep the backslash for the regex engine
re.findall(r'\d', printable) # ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

re.findall(r'\s', printable) # [' ', '\t', '\n', '\r', '\x0b', '\x0c']
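
The table lists *, +, ?, \w and \b without examples. A sketch exercising each one; the sample strings are made up for illustration.

```python
import re

text = 'a dish of fish, 3 fishes and 12 fish_bones'

print(re.findall(r'\bfish\b', text))          # ['fish']: \b keeps only the whole word
print(re.findall(r'fish\w*', text))           # ['fish', 'fishes', 'fish_bones']
print(re.findall(r'\d+', text))               # ['3', '12']: + means one or more digits
print(re.findall(r'ab*', 'a ab abbb'))        # ['a', 'ab', 'abbb']: * allows zero b's
print(re.findall(r'colou?r', 'color colour')) # ['color', 'colour']: ? makes u optional
```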

Defining the match output

m.groups() returns a tuple of all the captured groups:

import re

source = 'a dish of fish tonight.'

m = re.search(r'(. dish\b).*(\bfish)', source)
print(m.group())  # a dish of fish
print(m.groups()) # ('a dish', 'fish')
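
Besides groups(), each group can be read by number, and its position in the source can be recovered. A sketch on the same pattern and string:

```python
import re

source = 'a dish of fish tonight.'
m = re.search(r'(. dish\b).*(\bfish)', source)

print(m.group(0))   # a dish of fish  (whole match, same as m.group())
print(m.group(1))   # a dish          (first parenthesized group)
print(m.group(2))   # fish
print(m.span(2))    # (10, 14): start and end offsets of group 2
```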

(?P<name>expr) matches expr and stores the result in a group called name:

import re

source = 'a dish of fish tonight.'

m = re.search(r'(?P<DISH>. dish\b).*(?P<FISH>\bfish)', source)
print(m.group())   # a dish of fish
print(m.group('DISH'))  # a dish
print(m.group('FISH'))  # fish
print(m.groups()) # ('a dish', 'fish')
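
Named groups can also be collected all at once with groupdict(), and they keep their numeric index as well. A sketch using the same pattern:

```python
import re

source = 'a dish of fish tonight.'
m = re.search(r'(?P<DISH>. dish\b).*(?P<FISH>\bfish)', source)

print(m.groupdict())                  # {'DISH': 'a dish', 'FISH': 'fish'}
print(m.group(1) == m.group('DISH'))  # True: named groups are still numbered
```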

4. Reading and writing files

open(filename, mode): the first letter of mode specifies the operation on the file:

  • r: read mode
  • w: write mode (creates the file if it does not exist; if it exists, its contents are overwritten)
  • x: create and write a new file; raises FileExistsError if the file already exists
  • a: append mode (writes data to the end of the file; creates it if it does not exist)

The second letter specifies the file type:

  • t: text (the default)
  • b: binary
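
A small sketch of the modes in action. It works inside a temporary directory so no real files are touched; the file name log.txt is hypothetical. It writes, appends, confirms that x refuses an existing file, then reads the result back as text and as raw bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.txt')                # hypothetical file name

    with open(path, 'wt', encoding='utf-8') as fout:   # w: create (or truncate)
        fout.write('first\n')
    with open(path, 'at', encoding='utf-8') as fout:   # a: append to the end
        fout.write('second\n')

    try:
        open(path, 'xt', encoding='utf-8')             # x: refuses an existing file
        exists = False
    except FileExistsError:
        exists = True

    with open(path, 'rt', encoding='utf-8') as fin:    # r + t: decoded text
        text = fin.read()
    with open(path, 'rb') as fin:                      # r + b: raw bytes
        raw = fin.read()

print(repr(text))  # 'first\nsecond\n'
print(raw)         # b'first\nsecond\n' on Linux/macOS (Windows text mode writes \r\n)
```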

Write files with write():

poem = '''
The moon was shining before my bed,
and I thought it might be frost on the ground.
Looking up at the bright moon,
I lower my head and think of home.
'''

with open('a.txt', 'wt', encoding='utf-8') as fout:
  fout.write(poem)


# Write the data in chunks of 100 characters
with open('a.txt', 'wt', encoding='utf-8') as fout:
  size = len(poem)
  offset = 0
  chunk = 100
  while True:
    if offset >= size:
      break
    fout.write(poem[offset:offset+chunk])
    offset += chunk

# Avoid overwriting: mode 'x' raises FileExistsError if the file already exists
try:
  with open('a.txt', 'xt', encoding='utf-8') as fout:
    fout.write(poem)
except FileExistsError:
  print('File already exists')

Read files with read(), readline(), or readlines():

# read() with no argument reads the whole file at once
with open('a.txt', 'rt', encoding='utf-8') as fin:
  poem = fin.read()

# readline() reads one line at a time and returns '' at the end of the file
with open('a.txt', 'rt', encoding='utf-8') as fin:
  poem = ''
  while True:
    line = fin.readline()
    if not line:
      break
    poem += line

# Iterate over the file object, which yields one line at a time
with open('a.txt', 'rt', encoding='utf-8') as fin:
  poem = ''
  for line in fin:
    poem += line

# readlines() reads all lines and returns them as a list of single-line strings
with open('a.txt', 'rt', encoding='utf-8') as fin:
  lines = fin.readlines()
# lines == ['\n', 'The moon was shining before my bed,\n', 'and I thought it might be frost on the ground.\n', 'Looking up at the bright moon,\n', 'I lower my head and think of home.\n']

tell() returns the current byte offset in the file; seek(n) jumps to byte offset n.

seek(offset, origin): with origin=0 (the default) the offset is counted from the start of the file, with origin=1 from the current position, and with origin=2 from the end of the file.

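# Assumes 'b' is a binary file holding the 256 bytes 0 to 255 (e.g. written from bytes(range(256)))
with open('b', 'rb') as fin:
  print(fin.tell())   # 0: reading starts at the beginning
  fin.seek(254, 0)    # jump to offset 254 (the last two bytes)
  print(fin.tell())   # 254
  fin.seek(1, 1)      # move one byte forward from the current position
  print(fin.tell())   # 255
  data = fin.read()   # read to the end of the file: b'\xff'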

print(data[0])  # 255
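
The example above never uses origin=2, and it relies on a file 'b' that is not created in these notes. A self-contained sketch that first writes the assumed 256-byte file into a temporary directory, then seeks relative to the end and to the current position:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'b')
    with open(path, 'wb') as fout:
        fout.write(bytes(range(256)))   # the 256 byte values 0 to 255, in order

    with open(path, 'rb') as fin:
        fin.seek(-1, 2)                 # origin=2: offset counted from the end of the file
        pos_end = fin.tell()            # 255
        last = fin.read()               # b'\xff'; the position is now 256
        fin.seek(-2, 1)                 # origin=1: move back two bytes from the current position
        pos_back = fin.tell()           # 254
        next_two = fin.read()           # b'\xfe\xff'

print(pos_end, last, pos_back, next_two)
```

The os module also provides the named constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END for origins 0, 1 and 2. Relative seeks like these are only this flexible in binary mode.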