Do you really know what Python strings are? In Python, a string is an immutable sequence of Unicode characters. It shares a number of operations with other sequence types, such as membership testing, concatenation, slicing, finding the length, finding the minimum and maximum, locating the index of an element, counting occurrences, and so on.
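A quick sketch of those shared sequence operations on a made-up sample string. The same expressions, apart from the string-specific substring check, also work on lists and tuples:

```python
# Operations that strings share with other sequence types (sample string is illustrative)
s = 'Hello world'

print('world' in s)     # membership test: True
print(s + '!')          # concatenation: Hello world!
print(s[0:5])           # slicing: Hello
print(len(s))           # length: 11
print(min('python'))    # smallest character by code point: h
print(max('python'))    # largest character by code point: y
print(s.index('o'))     # index of the first 'o': 4
print(s.count('o'))     # number of occurrences of 'o': 2
```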

Beyond these, strings also have many operations of their own that are worth learning, so today I'll continue the discussion of strings.

This article focuses on Python's string manipulation methods, such as concatenation, splitting, replacement, searching, and character checks, and highlights some possible pitfalls. It ends with two extended reflections: why don't Python strings have some of the operations that lists have, and why don't they have some of the operations that Java strings have? Hopefully this will give you a better understanding of how to use Python strings.

0. Concatenating strings

String concatenation is the most commonly used operation, and I wrote an article on this topic called “Seven Ways to Concatenate Strings in Python.”

Here is a brief review. By how they are implemented, the seven methods fall into three types: the formatting type (% placeholders, format(), and Template), the concatenation type (the + operator, tuple-like adjacent literals, and join()), and the interpolation type (f-strings). In practice, I suggest the following:

Use join() when dealing with sequence structures such as lists of strings. When the concatenation length is no more than 20, use the + operator. When it exceeds 20, use f-strings on newer versions (Python 3.6+), and format() or join() on older versions, depending on the situation.
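A minimal sketch of those rules of thumb. The word list, name, and age below are made-up sample data, and the threshold of 20 is the author's guideline rather than anything enforced by Python:

```python
# Sample data (hypothetical)
words = ['Python', 'cat', 'says', 'hi']
name, age = 'Python cat', 3

# A list of strings: join() it with a separator
sentence = ' '.join(words)

# A few short pieces: the + operator reads fine
greeting = 'Hello, ' + 'world'

# Longer templates on Python 3.6+: an f-string
intro = f'My name is {name} and I am {age} years old.'

# The same template with format(), which also works on older versions
intro_old = 'My name is {} and I am {} years old.'.format(name, age)

print(sentence)            # Python cat says hi
print(greeting)            # Hello, world
print(intro == intro_old)  # True
```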

I won't claim that these are the only seven ways to concatenate strings, but they are the most common. One thing I missed is string multiplication, which concatenates a string with itself repeatedly. Judging by the result, it counts as an eighth way, so I'm adding it here.
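A short illustration of string multiplication; the strings are arbitrary examples:

```python
# Multiplying a string by an integer n concatenates it with itself n times
line = '-' * 10
print(line)            # ----------
print('ab' * 3)        # ababab

# A count of zero (or a negative count) yields an empty string
print(repr('ab' * 0))  # ''
```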

One more recommendation: in complex scenarios, avoid these native methods and use a more powerful external library instead. For example, when building SQL statements, you often assemble different queries depending on conditional branches and insert different variable values. In such scenarios, hand-rolled concatenation only increases the complexity of the code and hurts readability and maintainability. A library such as SQLAlchemy can handle this much more cleanly.

1. Splitting strings

Among the concatenation methods, join() combines a list of string elements into one long string, while split() does the opposite, splitting a long string into a list. As mentioned earlier, strings are immutable sequences, so split() returns new strings and leaves the original string unchanged.

The split() method takes two arguments. The first is the delimiter, the string used to separate the parts. If it is omitted (or None), the string is split on whitespace, which includes spaces, newlines (\n), tabs (\t), and so on; in this mode, a run of consecutive whitespace counts as a single delimiter, and leading and trailing whitespace is ignored. Splitting consumes the delimiters, so they do not appear in the result.

s = 'Hello world'
l = '''Hi there , my name   is Python cat
Do you like me ?
'''

# When no arguments are passed, the delimiter is any run of whitespace
s.split() >>> ['Hello', 'world']
s.split(' ') >>> ['Hello', 'world']
s.split('  ') >>> ['Hello world']  # There are no two consecutive spaces
s.split('world') >>> ['Hello ', '']

# Whitespace includes single spaces, runs of spaces, newlines, and so on
l.split() >>> ['Hi', 'there', ',', 'my', 'name', 'is', 'Python', 'cat', 'Do', 'you', 'like', 'me', '?']

The second argument of split(), maxsplit, is a number that limits how many splits are performed. Its default, -1, means no limit, so the string is split at every delimiter. It can be passed by position or by keyword.

# Pass the arguments by position
l.split(' ', 3) >>> ['Hi', 'there', ',', 'my name   is Python cat\nDo you like me ?\n']

# Pass maxsplit by keyword
l.split(maxsplit=3)
>>> ['Hi', 'there', ',', 'my name   is Python cat\nDo you like me ?\n']

# Incorrect usage
l.split(3)
---------------
TypeError  Traceback (most recent call last)
<ipython-input-42-6c16d1a50bca> in <module>()
----> 1 l.split(3)
TypeError: must be str or None, not int

split() works from left to right, while rsplit() works from right to left. The two give the same result unless maxsplit is set. rsplit() is used less often, but in the right place it works wonders.
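A small sketch of where the direction matters; the file name is a made-up example:

```python
# split() and rsplit() only differ when maxsplit limits the number of splits
path = 'archive.tar.gz'

print(path.split('.', 1))   # ['archive', 'tar.gz']  (first dot from the left)
print(path.rsplit('.', 1))  # ['archive.tar', 'gz']  (first dot from the right)

# Without maxsplit, both produce the same list
print(path.split('.') == path.rsplit('.'))  # True
```

Splitting once from the right is a handy way to peel off the last component of a name, such as a final file extension.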

There is another way to split strings: splitlines(), which splits a string at line boundaries. Its single argument, keepends (True or False), determines whether the line breaks are kept. The default is False, meaning the line breaks are dropped.

# Default does not keep newlines
'ab c\n\nde fg\rkl\r\n'.splitlines()
>>> ['ab c', '', 'de fg', 'kl']

# Pass True to keep the line breaks
'ab c\n\nde fg\rkl\r\n'.splitlines(True)
>>> ['ab c\n', '\n', 'de fg\r', 'kl\r\n']
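A common pitfall is to reach for split('\n') instead of splitlines(). A minimal comparison on made-up text:

```python
text = 'line one\nline two\n'

# split('\n') leaves an empty string after the trailing newline
print(text.split('\n'))    # ['line one', 'line two', '']

# splitlines() does not produce that trailing empty element
print(text.splitlines())   # ['line one', 'line two']

# splitlines() also recognizes '\r\n' and '\r' as line boundaries
print('a\r\nb\rc'.splitlines())  # ['a', 'b', 'c']
```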

2. Replacing strings

String replacement covers several scenarios: case conversion, replacing special symbols, and replacing custom fragments.

Again, strings are immutable objects, and the following operations do not alter the original string.

These methods are straightforward to use, so I suggest trying them yourself. I'll only single out strip() here. It is a common method that, by default, removes whitespace from both ends of a string; when you pass it characters, it removes those characters from both ends instead.

s = '******Hello world******'
s.strip('*') >>> 'Hello world'
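A sketch of the three replacement scenarios using built-in str methods; the sample strings are illustrative. The last line shows a classic pitfall: strip() treats its argument as a set of characters, not as a substring:

```python
s = 'hello Python cat'

# Case conversion
print(s.upper())        # HELLO PYTHON CAT
print(s.title())        # Hello Python Cat
print(s.capitalize())   # Hello python cat
print(s.swapcase())     # HELLO pYTHON CAT

# Replacing a fragment; the optional third argument limits the count
print(s.replace('cat', 'dog'))       # hello Python dog
print('a-b-c'.replace('-', '+', 1))  # a+b-c

# Stripping only one side
print('**Hi**'.lstrip('*'))  # Hi**
print('**Hi**'.rstrip('*'))  # **Hi

# Pitfall: every character in 'wcom.' is stripped from both ends
print('www.example.com'.strip('wcom.'))  # example
```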

3. Searching strings

Checking whether a string contains something is a common operation, and Python offers several ways to do it, such as the built-in find() method. find() is not used all that often, though, because it only tells you the index of what you are looking for, and usually the position is not what we are after.

find() works like index(), except when the content is not found: find() returns -1, while index() raises an exception:

s = 'Hello world'

s.find('cat') >>>  -1

s.index('cat') 
>>> ValueError  Traceback (most recent call last)
<ipython-input-55-442007c50b6f> in <module>()
----> 1 s.index('cat')

ValueError: substring not found
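When you only need a yes/no answer, the `in` operator is the idiomatic choice. The sketch below, reusing the sample string, also shows a pitfall with find(): a match at index 0 is falsy, so `if s.find(...)` is the wrong test:

```python
s = 'Hello world'

# Containment test with `in`
print('world' in s)   # True
print('cat' in s)     # False

# find() returns 0 for a match at the very start, and 0 is falsy
print(s.find('Hello'))        # 0
print(s.find('Hello') != -1)  # True: compare against -1 instead

# rfind() searches from the right; find() also accepts a start index
print(s.find('o'), s.rfind('o'))  # 4 7
print(s.find('o', 5))             # 7
```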

find() and index() only meet the simplest search requirements. In practice, we often need to find content that follows a particular pattern, such as a date string in a certain format, which calls for more powerful tools. Regular expressions let you define custom matching rules, and the re module provides functions such as match(), search(), and findall() that can be combined to implement complex lookups. I'll cover these two tools in more detail in a future article, but here's a simple example:

import re
datepat = re.compile(r'\d+/\d+/\d+')
text = 'Today is 11/21/2018. Tomorrow is 11/22/2018.'
datepat.findall(text)
>>> ['11/21/2018', '11/22/2018']
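A slightly extended sketch of the same date example. It adds capture groups, shows that match() only anchors at the start of the string while search() scans all of it, and uses sub() to reorder each date (the year-month-day target format is just an illustration):

```python
import re

# Capture groups pull out the month, day, and year separately
datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/21/2018. Tomorrow is 11/22/2018.'

print(datepat.findall(text))  # [('11', '21', '2018'), ('11', '22', '2018')]

# match() only matches at the start of the string; search() scans the whole string
print(datepat.match(text))    # None
m = datepat.search(text)
print(m.group(0), m.groups()) # 11/21/2018 ('11', '21', '2018')

# sub() rewrites every match, here into year-month-day order
print(datepat.sub(r'\3-\1-\2', text))  # Today is 2018-11-21. Tomorrow is 2018-11-22.
```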

4. Character checks

Checking what kinds of characters a string contains is also common. For example, when a website requires usernames to contain only letters and digits, validating the input means checking that it contains nothing else. Python strings provide a family of check methods for such cases, including isalnum(), isalpha(), isdigit(), isspace(), islower(), and isupper(), as well as startswith() and endswith() for checking prefixes and suffixes.
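A sketch of the username check plus a few other common checks. The validation rule and the helper name `is_valid_username` are hypothetical. Note that isalnum() alone also accepts non-ASCII letters, so the sketch adds isascii(), which requires Python 3.7+:

```python
def is_valid_username(name: str) -> bool:
    # Hypothetical rule: ASCII letters and digits only, at least one character.
    # isalnum() is False for an empty string, so '' is rejected too.
    return name.isascii() and name.isalnum()

print(is_valid_username('pythoncat2018'))  # True
print(is_valid_username('python cat'))     # False: a space is not alphanumeric
print(is_valid_username('café'))           # False: 'é' is not ASCII
print(is_valid_username(''))               # False
print('café'.isalnum())                    # True: isalnum() accepts non-ASCII letters

# Other common checks
print('abc'.isalpha())                          # True: letters only
print('2018'.isdigit())                         # True: digits only
print(' \t\n'.isspace())                        # True: whitespace only
print('hello'.islower(), 'HELLO'.isupper())     # True True
print('Hello world'.startswith('Hello'))        # True
print('report.pdf'.endswith(('.pdf', '.doc')))  # True: a tuple checks several suffixes
```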

5. What strings can’t do

These are the main string manipulation methods in Python. After reading this far, you should have a better idea of what Python strings can do.

But that's not enough to answer the question in the title of this article: do you really know how to use Python strings? These specific operations, combined with the common sequence operations covered in the previous article, reading and writing strings from files, printing strings, the string interning mechanism, and so on, come close to answering it.

For the sake of rigor, though, I'll try to answer the question from the opposite direction by asking what Python strings can't do. Time to open your mind and brainstorm:

(1) Restricted sequence

Compared with typical sequence types, strings lack the following list operations: append(), clear(), copy(), insert(), pop(), remove(), and so on. Why is that?

Some of these are easy to understand. append(), insert(), pop(), and remove() all operate on single elements. A single element of a string is a single character, which usually carries little meaning on its own, and we rarely add or remove one, so it makes sense that strings don't have these methods.

The list's clear() method empties the list, which can help save memory, and is equivalent to anylist[:] = []. But here comes the odd part: Python does not support emptying or deleting a string.

First, strings have no clear() method. Second, because strings are immutable, neither the assignment anystr[:] = '' nor del anystr[:] is supported:

s = 'Hello world'

s[:] = ''
>>> TypeError: 'str' object does not support item assignment

del s[:]
>>> TypeError: 'str' object does not support item deletion

Of course, you can't delete a string with del s either, because the name s is just a reference to the string object, a label (a topic I'll write about later). Deleting the label does not by itself destroy the object.
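A minimal sketch of that point: after `del s`, the string object lives on as long as another name still refers to it. The names `s` and `t` are just illustrative labels:

```python
s = 'Hello world'
t = s          # a second label bound to the same string object

del s          # removes the name s, not the object
print(t)       # Hello world: the object is still reachable through t

try:
    print(s)
except NameError as e:
    print('NameError:', e)  # the name s no longer exists
```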

So there seems to be no way to manually empty or delete a Python string.

Finally, there is the copy() method, which copies a list, but strings don't have it either. Why? Is there no use case for copying strings? I can't think of an answer at the moment, so I'll set the question aside for now.
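One plausible angle on the copy() question, offered as a hedged sketch rather than an official rationale: since a string can never change, a copy would be indistinguishable from the original. In CPython, the usual "copying" idioms simply hand back the same object. The identity results below describe CPython's behavior and are not a language guarantee:

```python
import copy

s = 'Hello world'

# Each of these produces a string equal to s...
print(s[:] == s, str(s) == s, copy.copy(s) == s)  # True True True

# ...and in CPython each one returns the very same object
print(s[:] is s, str(s) is s, copy.copy(s) is s)  # True True True
```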

Comparing these common list operations, we can see that strings are a rather restricted kind of sequence. A list can be thought of as a train of linked carriages, while a string feels more like a single carriage lined with a long row of connected seats.

(2) Compare: who's afraid of whom?

Next up: Python strings versus Java strings. In the previous article, "Do You Really Know What Python Strings Are?", the two already went two rounds on how string objects are defined, and the balance tipped in Python's favor. Let's see what happens this time.

Java has two methods for comparing strings: compareTo(), which compares the characters of two strings one by one and returns the difference as an integer, and equals(), which checks whether the contents of two strings are equal as a whole.

Python strings don’t have these two separate methods, but it’s easy to implement similar functionality. Let’s take an example:

myName = "Python cat"
cmpName = "world"
newName = myName

# Compare directly with the comparison operators
myName > cmpName
>>> False
myName == newName
>>> True
cmpName != newName
>>> True

# Check whether two names refer to the same object
myName is cmpName
>>> False
myName is newName
>>> True

In the example above, these comparisons would work just the same if the strings were replaced by lists or other objects. In other words, the ability to compare is a basic capability of every Python citizen; it neither restricts you nor grants you privileges just because you are a string.
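If you do want Java's integer-returning compareTo() behavior, it takes only a few lines. This is a sketch of a hypothetical helper, `compare_to`, which follows Java's documented rule: the difference of the first pair of differing characters, otherwise the difference in length. One caveat: Java compares UTF-16 code units while ord() returns code points, so results can differ for characters outside the Basic Multilingual Plane:

```python
def compare_to(a: str, b: str) -> int:
    """Hypothetical Python stand-in for Java's String.compareTo()."""
    for ca, cb in zip(a, b):
        if ca != cb:
            # First differing position: return the character difference
            return ord(ca) - ord(cb)
    # One string is a prefix of the other: return the length difference
    return len(a) - len(b)

print(compare_to('Python cat', 'world'))  # -39  ('P' is 80, 'w' is 119)
print(compare_to('abc', 'abc'))           # 0
print(compare_to('abcde', 'abc'))         # 2
```

Java's equals() needs no helper at all, since Python's == already compares string contents.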

Similarly, every Python citizen can have its length measured. len() is a built-in function that accepts any sequence and returns its length. In Java, each type has its own way: strings have a length() method, arrays have a length field, and collections have size(). To put it figuratively, Python uses one scale that anyone from any walk of life can stand on, while Java has many scales: you weigh yours, and I weigh mine.
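A quick sketch of the "one scale" idea: the same built-in len() measures strings, lists, tuples, and even non-sequence containers such as dicts. The values are arbitrary examples:

```python
# One built-in len() works on any object that defines __len__
print(len('Python cat'))        # 10
print(len([1, 2, 3]))           # 3
print(len((1, 2)))              # 2
print(len({'name': 'python'}))  # 1
print(len(range(5)))            # 5
```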

Python 2 had a cmp() function and a __cmp__() magic method, but both were removed in Python 3. The operator module still offers the comparison operators in function form, such as eq(), gt(), and lt():

import operator
operator.eq('hello', 'name')
>>> False
operator.eq('hello', 'hello')
>>> True
operator.gt('hello', 'name')
>>> False
operator.lt('hello', 'name')
>>> True
>>> True
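If you miss the three-way result of the old cmp(), a common Python 3 idiom is `(a > b) - (a < b)`. The helper name `cmp` below is a hypothetical re-creation, not a built-in. functools.cmp_to_key() lets such an old-style function still drive sorting:

```python
from functools import cmp_to_key

def cmp(a, b):
    """Hypothetical re-creation of Python 2's cmp(): returns -1, 0, or 1."""
    return (a > b) - (a < b)

print(cmp('hello', 'name'))   # -1
print(cmp('hello', 'hello'))  # 0
print(cmp('name', 'hello'))   # 1

# Old-style comparison functions can still be used for sorting via cmp_to_key()
words = ['pear', 'Apple', 'banana']
print(sorted(words, key=cmp_to_key(cmp)))  # ['Apple', 'banana', 'pear']
```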

(3) A door in the wall

In Java, strings also have a powerful valueOf() method that accepts arguments of various types, such as boolean, char, char arrays, double, float, and int, and returns their string representation. For example, to convert an int to a String, you can use String.valueOf(anynum).

Python strings still don’t have this separate method, but it’s easy to do the same thing. For Python, converting different data types to strings is a piece of cake, for example:

str(123) >>> '123'
str(True) >>> 'True'
str(1.22) >>> '1.22'
str([1, 2]) >>> '[1, 2]'
str({'name': 'python', 'sex': 'male'})
>>> "{'name': 'python', 'sex': 'male'}"

Converting a string to another type is just as easy. For example, int('123') produces the number 123 from the string '123'. In Java, the same operation is written as Integer.parseInt("123").
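A few more conversions from strings, with two pitfalls worth knowing: int() rejects a decimal string, and bool() of any non-empty string is True, even for 'False'. The inputs are illustrative:

```python
print(int('123') + 1)  # 124
print(float('1.22'))   # 1.22
print(int('ff', 16))   # 255: the optional second argument is the base
print(int(' 42 '))     # 42: surrounding whitespace is allowed

# Pitfall 1: int() does not accept a decimal string
try:
    int('1.22')
except ValueError as e:
    print('ValueError:', e)

# Pitfall 2: any non-empty string is truthy
print(bool('False'))   # True
```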

In Java, the wall between different data types stands so high that you need a drawbridge to cross it, whereas in flexible Python you can simply open a door in the wall and walk back and forth.

To recap, Python strings do have fewer methods than Java strings, but for good reason: Python is so naturally gifted that all of these operations are simple to implement. Python strings can't do certain things on their own, but Python as a whole does those things very well.

6. Summary

In this article, we introduced Python's string manipulation methods, such as concatenation, splitting, replacement, searching, and character checks. Finally, we answered the question in reverse: what can't Python strings do? Some of the things they can't do are in fact deliberately left undone so that they can be done better elsewhere. After all, Python strings have everything they should have.

This article once again compares Python with Java, and a few small differences reflect the different worldviews of the two language systems. As the ancients said, with bronze as a mirror, one can straighten one's clothes. Likewise, in the world of programming languages, it pays to use another language as a mirror. I hope this cross-language collision of ideas sparks some insight for you.


—————–

This article was originally published on the WeChat public account [Python Cat].

Read more:

Seven Ways to Concatenate Strings in Python

Do You Really Know What Python Strings Are?

Java string comparison method:

Blog.csdn.net/barryhappy/…

Why did Python 3 remove the cmp method:

www.zhihu.com/question/47…