1. Foreword

Hello, everyone, I am Li Peng Li from fate.

This course will focus on strings in Python, so let’s get started.

This article is about 3500 words and takes about 15 minutes to read. Enjoy reading it.

2. String

Before we move on to today’s topic, let’s take a look at what a string is.

The language we speak and use to communicate with other people is, in essence, made up of strings.

In Python, a string is simply a data type.

But how is this type represented inside a computer? Can I use Chinese on my computer?

Let’s start by exploring the encoding format in Python.

2.1 ASCII encoding

First of all, at the lowest level a computer cannot process Chinese, or any other text, directly.

Text has to be converted to numbers, and those numbers are stored in bytes.

Note:

  • 1 byte = 8 bits
  • The largest integer that can be represented by one byte is 255 (binary 11111111).
  • The largest integer that can be represented by two bytes is 65,535.
  • The largest integer that can be represented by four bytes is 4,294,967,295.

In the early days of computing, only 128 characters were encoded (codes 0 to 127), including digits, letters, and special symbols.

This code table is known as ASCII.

Of all these codes, only three need to be memorized.

Decimal   Character   Meaning
48        0           Character 0
65        A           Letter A (uppercase)
97        a           Letter a (lowercase)

By remembering these three values, we can deduce the ASCII codes for 0 to 9, A to Z, and a to z, respectively.
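As a quick check of that deduction, here is a minimal Python 3 sketch that derives each code by counting from the three anchor values 48, 65, and 97. The helper name ascii_code is made up for illustration.

```python
import string

def ascii_code(ch):
    """Return the ASCII code of one digit or English letter (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch in string.digits:
        return 48 + string.digits.index(ch)           # '0' is 48
    if ch in string.ascii_uppercase:
        return 65 + string.ascii_uppercase.index(ch)  # 'A' is 65
    if ch in string.ascii_lowercase:
        return 97 + string.ascii_lowercase.index(ch)  # 'a' is 97
    raise ValueError("not a digit or an English letter")

print(ascii_code("9"))  # 57
print(ascii_code("Z"))  # 90
print(ascii_code("z"))  # 122
```

The results agree with what the built-in ord() returns for the same characters.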

2.2 Unicode

Now that we have ASCII, is that enough?

Of course not. China has its own code table (GB2312), Japan has its own (Shift_JIS), and Korea has its own (EUC-KR). Wouldn’t that be chaos?

What you write in China becomes a pile of gibberish when it is opened in Japan or Korea, and that would seriously affect our work and life.

So what do we do?

That’s when the International Organization for Standardization (ISO) and the Unicode Consortium, an association of multilingual software manufacturers, joined forces to unify these encodings. The result is what we know as Unicode.

With Unicode, text can be exchanged between different countries and languages.

But some countries use Chinese characters while others use English for daily communication, so their storage needs differ a lot.

Remember that Chinese characters fall outside the ASCII range. Most need two bytes to represent and some even need four, while English letters need only one byte.

The ASCII code for the letter A is 65 in decimal and 01000001 in binary;

The ASCII encoding for character 0 is 48 in decimal and 00110000 in binary. Note that the character ‘0’ is different from the integer 0;

The Chinese character 中 is beyond the ASCII range; its Unicode code point is 20013 in decimal and 01001110 00101101 in binary.

Should we then use the full Unicode form for all content, Chinese and English alike?

For text that contains only English letters and digits, ASCII is more compact.
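The three values above can be reproduced in Python 3 with ord() and format(). This is only a small sketch; the variable names are illustrative.

```python
# Code points of the three characters discussed above.
code_a = ord("A")
code_zero = ord("0")
code_zhong = ord("中")

print(code_a, format(code_a, "08b"))           # 65 01000001
print(code_zero, format(code_zero, "08b"))     # 48 00110000
print(code_zhong, format(code_zhong, "016b"))  # 20013 0100111000101101

# The character '0' and the integer 0 are different things.
print("0" == 0)  # False
```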

This brings us to a new standard, UTF-8.

2.3 UTF-8

To save space, content that can be represented in ASCII should not be stored in a wider Unicode form.

ASCII code of A: 01000001
Unicode code of A: 00000000 01000001

So how does a computer tell when to use Unicode and when to use ASCII?

To solve this problem, a family of intermediate encoding formats emerged, known as UTF (Unicode Transformation Format).

Common UTF formats are:

  • UTF-7
  • UTF-7.5
  • UTF-8
  • UTF-16
  • UTF-32

The one we need to focus on is UTF-8, a “variable-length encoding”.

UTF-8 encodes each Unicode character into 1 to 4 bytes, depending on the size of its code point. (The original design allowed up to 6 bytes; the current standard, RFC 3629, limits it to 4.)

Common English letters are encoded into 1 byte, Chinese characters usually take 3 bytes, and rarer characters take 4 bytes.
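To see the variable length in practice, the following Python 3 sketch encodes three sample characters and counts the bytes. The samples are my own choice; the emoji U+1F600 simply stands in for a character outside the commonly used range.

```python
# One ASCII letter, one common Chinese character, one character beyond U+FFFF.
samples = ["A", "中", "\U0001F600"]

byte_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths[ch] = len(encoded)
    print(repr(ch), len(encoded), encoded)

# 'A' takes 1 byte, '中' takes 3 bytes, U+1F600 takes 4 bytes.
```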

One thing to note here:

Python was born before the Unicode standard was published, so the earliest versions of Python supported only ASCII.

In Python 2, an ordinary string such as ‘ABC’ is stored as ASCII-encoded bytes; in Python 3, str holds Unicode text.

An added benefit of UTF-8 is that ASCII is a subset of it: any ASCII text is also valid UTF-8.

So a lot of legacy software that only supports ASCII can continue to work with UTF-8 data.

This is also why Python 2 source files that contain Chinese need a UTF-8 declaration at the top. Python 3 reads source files as UTF-8 by default.

2.4 “Slicing” of Strings

Now that we’ve sorted out how characters are encoded, we’re ready to use strings.

Let’s say I declare a variable name.

name = "lipeng"

Suppose one day my student, Sister Fei, wants to tease me by taking out just part of my name. This is where “slicing” comes in.

Note that strings, lists, and tuples all support slicing.

Let’s start with the syntax of slicing:

Syntax of slicing: [start:end:step]
Note: the selected interval is left-closed and right-open. It begins at the "start" position and stops just before the "end" position (the "end" position itself is excluded).

So let’s do a real example.

Last login: Tue Aug  7 09:27:48 on ttys000
➜  ~ python

>>> name = "lipeng"
>>> print(name[0:3])
lip
>>> 

That gave us the characters at indices 0 to 2, so now let’s try indices 0 to 4.

Last login: Tue Aug  7 09:27:48 on ttys000
➜  ~ python

>>> name = "lipeng"
>>> print(name[0:5])
lipen
>>> 

Now let’s take just the characters at indices 3 and 4.

Last login: Tue Aug  7 09:27:48 on ttys000
➜  ~ python

>>> name = "lipeng"
>>> print(name[3:5])
en
>>> 

What if we need all the data from a certain location?

Last login: Tue Aug  7 09:27:48 on ttys000
➜  ~ python

>>> name = "lipeng"
>>> print(name[2:])
peng
>>> 

Suppose we don’t know the total length, but we do know how many characters to leave off at the end. What do we do? A negative index counts from the end of the string.

Last login: Tue Aug  7 09:27:48 on ttys000
➜  ~ python

>>> name = "lipeng"
>>> print(name[1:-1])
ipen
>>> 

If you think you know how to slice, try these questions for yourself:

>>> print(name[:3])
lip
>>> print(name[::2])
lpn
>>> print(name[5:1:2])
"" # empty
>>> print(name[1:5:2])
ie
>>> print(name[::-2])
gei
>>> print(name[5:1:-2])
ge
>>> 
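If any of the answers above are surprising, one way to inspect them is to list the indices a slice actually visits. The sketch below uses the built-in slice object and its indices() method; the helper name visited_indices is hypothetical.

```python
name = "lipeng"

def visited_indices(s, start=None, stop=None, step=None):
    """Return the indices that s[start:stop:step] visits (hypothetical helper)."""
    return list(range(*slice(start, stop, step).indices(len(s))))

print(visited_indices(name, 5, 1, 2))         # [] (cannot walk forward from 5 to 1)
print(visited_indices(name, 5, 1, -2))        # [5, 3]
print(visited_indices(name, None, None, -2))  # [5, 3, 1]
print(visited_indices(name, 1, -1))           # [1, 2, 3, 4]
```

With a positive step the slice walks left to right, and with a negative step it walks right to left, which is why name[5:1:2] is empty while name[5:1:-2] is not.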

2.5 Common methods in Strings

2.5.1 ord() : Integer representation of a character

➜  ~ python

>>> ord('a')
97
>>> ord('A')
65
>>> ord('0')
48

2.5.2 chr() : Converts a code to the corresponding character

>>> chr(69)
'E'
>>> chr(103)
'g'
>>> 

2.5.3 find() : Queries whether the corresponding text exists

Checks whether str is contained in mystr, returning the starting index if it is, or -1 otherwise.

Standard syntax: mystr.find(str, start=0, end=len(mystr))

>>> name = "my name is MR_LP"
>>> name.find("MR_LP")
11

2.5.4 index() : Queries whether the corresponding text exists

Checks whether str is contained in mystr, returning the starting index if it is; otherwise a ValueError is raised.

Standard syntax: mystr.index(str, start=0, end=len(mystr))

>>> name = "my name is MR_LP"
>>> name.index("MR_LP")
11
>>> name.index("ZZZ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> 
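The practical difference between find() and index() is how a miss is reported. The sketch below wraps find() in a hypothetical helper, locate, that returns None instead of -1, and shows the exception that index() raises.

```python
name = "my name is MR_LP"

def locate(text, sub):
    """Return the index of sub in text, or None if it is absent (hypothetical helper)."""
    pos = text.find(sub)
    return pos if pos != -1 else None

print(locate(name, "MR_LP"))  # 11
print(locate(name, "ZZZ"))    # None

# The optional start argument skips the beginning of the string.
print(name.find("m", 1))      # 5

try:
    name.index("ZZZ")
    index_raised = False
except ValueError:
    index_raised = True
print(index_raised)           # True
```

When only a yes/no answer is needed, the in operator ("MR_LP" in name) is usually the simpler choice.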

2.5.5 count() : Returns the number of occurrences of characters

Returns the number of times str occurs in mystr between start and end.

Standard syntax: mystr.count(str, start=0, end=len(mystr))

>>> name = "my name is MR_LP"
>>> name.count('m')
2

2.5.6 replace() : Replaces text, no more than count times

Replaces str1 in mystr with str2, no more than count times if count is specified.

Standard syntax: mystr.replace(str1, str2, mystr.count(str1))

>>> name = "my name is MR_LP"

>>> name.replace("MR_LP", "233")
'my name is 233'

>>> name.replace("m", "233", 1)
'233y name is MR_LP'
>>> 
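One detail worth making explicit: strings in Python are immutable, so replace() returns a new string and leaves the original untouched. A minimal sketch, with illustrative variable names:

```python
name = "my name is MR_LP"

renamed = name.replace("MR_LP", "233")
print(renamed)  # my name is 233
print(name)     # my name is MR_LP (unchanged)

# To keep the change, assign the result back to a name.
first_only = name.replace("m", "233", 1)
print(first_only)  # 233y name is MR_LP
```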

2.5.7 split() : Splits the string according to the delimiter

Splits mystr using str as the delimiter; if maxsplit is specified, at most maxsplit splits are made.

Standard syntax: mystr.split(str, maxsplit)

>>> name = "my name is MR_LP"

>>> name.split(' ')
['my', 'name', 'is', 'MR_LP']

>>> name.split(' ', 2)
['my', 'name', 'is MR_LP']
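A related point, sketched below under the assumption of Python 3: calling split() with no delimiter behaves differently from split(' ') when the text contains repeated spaces. The sample string is my own.

```python
text = "my  name is MR_LP"   # note the two spaces after "my"

by_space = text.split(" ")
print(by_space)       # ['my', '', 'name', 'is', 'MR_LP']

by_whitespace = text.split()
print(by_whitespace)  # ['my', 'name', 'is', 'MR_LP']
```

With an explicit delimiter, each single space is a boundary, so two adjacent spaces produce an empty string. With no argument, runs of whitespace are treated as one separator.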

2.5.8 capitalize() : Capitalizes the first letter of a string

Capitalize the first character of the string

Standard syntax: mystr.capitalize()

>>> name = "my name is MR_LP"

>>> name.capitalize()
'My name is mr_lp'
>>> 

2.5.9 title() : Capitalizes the first letter of each word in the string

Capitalizes the first letter of each word in the string

Standard syntax: mystr.title()

>>> name = "my name is MR_LP"

>>> name.title()
'My Name Is Mr_Lp'
>>> 

2.5.10 startswith() : Checks whether the string starts with a certain content

Checks if the string begins with obj, returning True if it does, and False otherwise

Standard syntax: mystr.startswith(obj)

>>> name = "my name is MR_LP"

>>> name.startswith('my')
True
>>> name.startswith('My')
False
>>> 

2.5.11 endswith() : Checks whether the string ends with a certain content

Checks if the string ends with obj, returning True if it does, and False otherwise

Standard syntax: mystr.endswith(obj)

>>> name = "my name is MR_LP"

>>> name.endswith('LP')
True
>>> name.endswith('lp')
False
>>> 

2.5.12 lower() : Converts uppercase characters to lowercase

Converts all uppercase characters in mystr to lowercase

Standard syntax: mystr.lower()

>>> name = "my name is MR_LP"

>>> name.lower()
'my name is mr_lp'
>>> 

2.5.13 upper() : Converts lowercase characters to uppercase

Converts all lowercase characters in mystr to uppercase

Standard syntax: mystr.upper()

>>> name = "my name is MR_LP"

>>> name.upper()
'MY NAME IS MR_LP'
>>> 

2.5.14 ljust() : Left-align and fill the string

Returns a new string of length width, with the original left-aligned and padded with spaces on the right

Standard syntax: mystr.ljust(width)

>>> s = "lipeng"

>>> s.ljust(10)
'lipeng    '
>>> 

2.5.15 rjust() : Right-align and fill the string

Returns a new string of length width, with the original right-aligned and padded with spaces on the left

Standard syntax: mystr.rjust(width)

>>> s = "lipeng"

>>> s.rjust(10)
'    lipeng'
>>> 

2.5.16 center() : Center and fill the string

Returns a new string of length width, with the original centered and padded with spaces on both sides

Standard syntax: mystr.center(width)

>>> s = "lipeng"

>>> s.center(20)
'       lipeng       '
>>> 
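All three alignment methods also accept an optional second argument, the fill character, which makes the padding easier to see than spaces. A minimal sketch in Python 3:

```python
s = "lipeng"

left = s.ljust(10, "*")
right = s.rjust(10, "*")
middle = s.center(10, "*")
print(left)    # lipeng****
print(right)   # ****lipeng
print(middle)  # **lipeng**

# If width is not larger than the string, the string is returned unchanged.
short = s.ljust(3)
print(short)   # lipeng
```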

2.5.17 lstrip() : Clears the left whitespace

Removes whitespace characters from the left side of mystr

Standard syntax: mystr.lstrip()

>>> s = ' lipeng '

>>> s.lstrip()
'lipeng '
>>> 

2.5.18 rstrip() : Clears blank characters on the right

Removes whitespace characters from the right side of mystr

Standard syntax: mystr.rstrip()

>>> s = ' lipeng '

>>> s.rstrip()
' lipeng'
>>> 

2.5.19 strip() : Clears blank characters on the left and right sides

Removes whitespace characters from both sides of mystr

Standard syntax: mystr.strip()

>>> s = ' lipeng '

>>> s.strip()
'lipeng'
>>> 
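The three strip methods also take an optional argument listing the characters to remove, and "whitespace" includes tabs and newlines as well as spaces. A short Python 3 sketch with made-up sample strings:

```python
raw = "  --lipeng--\n"

no_space = raw.strip()            # removes the leading spaces and the trailing newline
print(repr(no_space))             # '--lipeng--'

no_dash = no_space.strip("-")     # removes the given characters from both ends
print(repr(no_dash))              # 'lipeng'

left_only = "xxlipengxx".lstrip("x")
right_only = "xxlipengxx".rstrip("x")
print(left_only, right_only)      # lipengxx xxlipeng
```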

2.5.20 rfind() : Checks whether the corresponding text exists, searching from the right

Like find(), but searches from the right: returns the highest index at which str is found, or -1 otherwise

Standard syntax: mystr.rfind(str, start=0, end=len(mystr))

>>> name = "lipeng"

>>> name.rfind("p")
2

2.5.21 rindex() : Queries the corresponding text on the right

The same as index(), but the search starts from the right

Standard syntax: mystr.rindex(str, start=0, end=len(mystr))

>>> name = "lipeng"

>>> name.rindex("e")
3
>>> name.rindex("LI")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> 

2.5.22 partition() : Divides a string into three parts based on the corresponding text

Splits mystr at the first occurrence of str into three parts: the part before str, str itself, and the part after str

Standard syntax: mystr.partition(str)

>>> s = "i'm lipeng, you can call me MR_LP"

>>> s.partition("can")
("i'm lipeng, you ", 'can', ' call me MR_LP')

2.5.23 rpartition() : Splits a string into three parts based on the corresponding text, searching from the right

Like partition(), but searches from the right: splits mystr at the last occurrence of str into the part before str, str itself, and the part after str

Standard syntax: mystr.rpartition(str)

>>> s = "i'm lipeng, you can call me MR_LP"

>>> s.rpartition("can")
("i'm lipeng, you ", 'can', ' call me MR_LP')
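In the example string "can" occurs only once, so partition() and rpartition() return the same tuple. The difference shows up when the separator occurs more than once, as in this Python 3 sketch with a made-up file name:

```python
filename = "lipeng.tar.gz"

first = filename.partition(".")
last = filename.rpartition(".")
print(first)  # ('lipeng', '.', 'tar.gz')
print(last)   # ('lipeng.tar', '.', 'gz')

# When the separator is missing, the whole string lands at opposite ends.
missing_first = filename.partition("/")
missing_last = filename.rpartition("/")
print(missing_first)  # ('lipeng.tar.gz', '', '')
print(missing_last)   # ('', '', 'lipeng.tar.gz')
```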

2.5.24 splitlines() : Splits the string by lines

Splits mystr at line breaks and returns a list with one line per element

Standard syntax: mystr.splitlines()

>>> s = "MR_LP.\nmr_lp"

>>> print(s)
MR_LP.
mr_lp

>>> s.splitlines()
['MR_LP.', 'mr_lp']
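For comparison, here is a small Python 3 sketch of how splitlines() differs from split("\n") when the text ends with a newline, plus the optional keepends flag. The sample text is the one above with a trailing newline added.

```python
s = "MR_LP.\nmr_lp\n"

lines = s.splitlines()
print(lines)        # ['MR_LP.', 'mr_lp']

by_newline = s.split("\n")
print(by_newline)   # ['MR_LP.', 'mr_lp', ''] (note the trailing empty string)

with_ends = s.splitlines(True)   # keepends=True keeps the line breaks
print(with_ends)    # ['MR_LP.\n', 'mr_lp\n']

windows = "a\r\nb".splitlines()  # Windows-style line endings are handled too
print(windows)      # ['a', 'b']
```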

2.5.25 isalpha() : Checks whether all characters are letters

Returns True if mystr is not empty and all of its characters are letters, False otherwise

Standard syntax: mystr.isalpha()

>>> s = "abc"
>>> s.isalpha()
True

>>> s = "123"
>>> s.isalpha()
False

>>> s = "abc 123"
>>> s.isalpha()
False

2.5.26 isdigit() : Checks whether all characters are digits

Returns True if mystr contains only digits, False otherwise.

Standard syntax: mystr.isdigit()

>>> s = "abc"
>>> s.isdigit()
False

>>> s = "123"
>>> s.isdigit()
True

>>> s = "abc123"
>>> s.isdigit()
False
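Note that isdigit() looks at characters only, so a sign or a decimal point makes it return False. The sketch below, with sample inputs of my own, shows why it should not be treated as a general "is this a number" test.

```python
samples = ["123", "-5", "1.5", ""]

results = {text: text.isdigit() for text in samples}
for text, ok in results.items():
    print(repr(text), ok)

# '123' is True; '-5', '1.5' and the empty string are all False.
```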

2.5.27 isalnum() : Checks whether all characters are letters or digits

Returns True if all characters of mystr are letters or digits, False otherwise

Standard syntax: mystr.isalnum()

>>> s = "123"
>>> s.isalnum()
True

>>> s = "abc"
>>> s.isalnum()
True

>>> s = "abc123"
>>> s.isalnum()
True

>>> s = "abc 123"
>>> s.isalnum()
False

2.5.28 isspace() : Checks whether only whitespace is included

Returns True if mystr is not empty and contains only whitespace characters, False otherwise.

Standard syntax: mystr.isspace()

>>> s = "abc123"
>>> s.isspace()
False

>>> s = ""
>>> s.isspace()
False

>>> s = "   "  # spaces
>>> s.isspace()
True

>>> s = "\t"  # the Tab key
>>> s.isspace()
True

2.5.29 join() : Joins the elements of a sequence into one string

Inserts mystr between every pair of elements in the sequence and returns the result as a new string.

Standard syntax: mystr.join(sequence)

>>> s = " "
>>> li = ["my", "name", "is", "LIPENG"]
>>> s.join(li)
'my name is LIPENG'

>>> s = "_"
>>> li = ["my", "name", "is", "LIPENG"]
>>> s.join(li)
'my_name_is_LIPENG'
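One thing to watch for with join(): every element must already be a string. The Python 3 sketch below, using illustrative data, shows the error raised for a list of integers and the usual fix of converting each element with str() first.

```python
words = ["my", "name", "is", "LIPENG"]
sentence = " ".join(words)
print(sentence)  # my name is LIPENG

numbers = [1, 2, 3]
try:
    "-".join(numbers)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True

joined_numbers = "-".join(str(n) for n in numbers)
print(joined_numbers)  # 1-2-3

# join() and split() undo each other when the separator is not in the words.
print("_".join(words).split("_") == words)  # True
```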