This is the second day of my participation in the First Challenge 2022.

The lstrip source code

(To make the principles behind the C code easier for beginners to understand, this article simplifies a great deal; the source code and documentation are the final authority.)

When called with an argument, the (l/r)strip methods are implemented by the do_xstrip function, located in the Objects/stringobject.c file.

Here is a brief explanation of this function, again using the line in the title:

char *s = PyString_AS_STRING(self);
Py_ssize_t len = PyString_GET_SIZE(self);
char *sep = PyString_AS_STRING(sepobj);
Py_ssize_t seplen = PyString_GET_SIZE(sepobj);

The first four lines of the function fetch the contents and lengths of the string and of the (l/r)strip argument: s, len, sep, and seplen, respectively.

To be precise, s and sep are C pointers holding the memory address of the first character. In C, a string is represented as a contiguous array of characters, and the char type is 1 byte in size. In other words, s points to the character at index 0, s + 1 points to the character at index 1, and so on.

Py_ssize_t i, j;

i = 0;
if (striptype != RIGHTSTRIP) {
    while (i < len && memchr(sep, Py_CHARMASK(s[i]), seplen)) {
        i++;
    }
}

j = len;
if (striptype != LEFTSTRIP) {
    do {
        j--;
    } while (j >= i && memchr(sep, Py_CHARMASK(s[j]), seplen));
    j++;
}

Here two indices, i and j, are defined to scan the string from opposite directions. The while loop uses the C library function memchr, which is worth describing here:

memchr takes three arguments:

  1. ptr: Pointer to the block of memory to search.
  2. value: The value to be located. The value is passed as an int, but the function converts it to unsigned char and searches byte by byte. (The Py_CHARMASK macro above converts a character or integer in the range [-128, 127] or [0, 255] to unsigned char.)
  3. num: The number of bytes to search.
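
As a rough illustration of the list above, the following Python sketch models the two pieces. The helper names py_charmask and memchr_found are made up for this example and are not part of any CPython API; they only mimic the behaviour described, and the real memchr returns a pointer rather than a boolean.

```python
def py_charmask(c: int) -> int:
    """Model of Py_CHARMASK: map a value in [-128, 127] or [0, 255] to [0, 255]."""
    return c & 0xFF


def memchr_found(block: bytes, value: int, num: int) -> bool:
    """Model of memchr: is the byte `value` among the first `num` bytes of `block`?"""
    return py_charmask(value) in block[:num]


sep = b"w."
print(memchr_found(sep, ord("w"), len(sep)))  # True
print(memchr_found(sep, ord("i"), len(sep)))  # False
print(py_charmask(-1))                        # 255
```

In do_xstrip only the truth value of the memchr result matters, which is why a boolean is enough for this sketch.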

While i < len and the character at index i occurs in sep, i is incremented by 1; the final value of i is the starting position of the string returned later.

Since we are talking about lstrip, the second if in the code snippet above will not be explained, but the principle is basically the same.

if (i == 0 && j == len && PyString_CheckExact(self)) {
    Py_INCREF(self);
    return (PyObject*)self;
}
else
    return PyString_FromStringAndSize(s+i, j-i);

Finally comes the return phase. If the string has not changed, the function returns the object itself, self. Before returning, the Py_INCREF macro increments the reference count of self to prevent the object from being deallocated while the caller still uses it.

If the string has changed, PyString_FromStringAndSize is called to create a new string, which is then returned. The first argument to this function is a pointer to the start of the new string's content (that is, the address of its first character), and the second argument is its length in bytes.
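
To tie the three steps together, here is a minimal pure-Python model of the scanning and return logic, working on bytes. The name do_xstrip_model and the numeric values of the three constants are assumptions made for this sketch; it is not how CPython exposes the function.

```python
LEFTSTRIP, RIGHTSTRIP, BOTHSTRIP = 0, 1, 2  # values assumed for this sketch


def do_xstrip_model(s: bytes, striptype: int, sep: bytes) -> bytes:
    """Pure-Python model of the scanning logic described above."""
    length = len(s)

    # Scan from the left while the current byte occurs in sep.
    i = 0
    if striptype != RIGHTSTRIP:
        while i < length and s[i] in sep:
            i += 1

    # Scan from the right; j ends one past the last byte that is kept.
    j = length
    if striptype != LEFTSTRIP:
        j -= 1
        while j >= i and s[j] in sep:
            j -= 1
        j += 1

    # Nothing was stripped: return the same object instead of a copy.
    if i == 0 and j == length:
        return s
    return s[i:j]


print(do_xstrip_model(b"www.wikipedia.org", LEFTSTRIP, b"w."))  # b'ikipedia.org'
print(do_xstrip_model(b"xxhixx", BOTHSTRIP, b"x"))              # b'hi'
```

The slice s[i:j] plays the role of PyString_FromStringAndSize(s+i, j-i): a start position plus a length of j-i bytes.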

Complete source code

Py_LOCAL_INLINE(PyObject *)
do_xstrip(PyStringObject *self, int striptype, PyObject *sepobj)
{
    char *s = PyString_AS_STRING(self);
    Py_ssize_t len = PyString_GET_SIZE(self);
    char *sep = PyString_AS_STRING(sepobj);
    Py_ssize_t seplen = PyString_GET_SIZE(sepobj);
    Py_ssize_t i, j;

    i = 0;
    if (striptype != RIGHTSTRIP) {
        while (i < len && memchr(sep, Py_CHARMASK(s[i]), seplen)) {
            i++;
        }
    }

    j = len;
    if (striptype != LEFTSTRIP) {
        do {
            j--;
        } while (j >= i && memchr(sep, Py_CHARMASK(s[j]), seplen));
        j++;
    }

    if (i == 0 && j == len && PyString_CheckExact(self)) {
        Py_INCREF(self);
        return (PyObject*)self;
    }
    else
        return PyString_FromStringAndSize(s+i, j-i);
}

What does the documentation say?

lstrip(self, chars=None, /)

Return a copy of the string with leading whitespace removed.

If chars is given and not None, remove characters in chars instead.

From help(str.lstrip), we can tell that the chars argument is a string specifying the set of characters to be removed. In simple terms, lstrip scans the string from left to right, removing characters that appear in chars until it reaches one that does not, and then returns the rest of the string.
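
A short example of this behaviour; the URL is just sample data chosen for illustration:

```python
url = "www.wikipedia.org"

# chars is treated as a set of characters, not as a prefix.
as_prefix = url.lstrip("www.")
print(as_prefix)                 # ikipedia.org

# Order and repetition inside chars make no difference.
as_set = url.lstrip(".w")
print(as_set)                    # ikipedia.org

# Without an argument, leading whitespace is removed.
no_arg = "  \t text ".lstrip()
print(repr(no_arg))              # 'text '
```

The leading "w" of "wikipedia" is removed too, because it belongs to the character set, which is exactly the surprise that trips up many beginners.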

PEP 616: String methods to remove prefixes and suffixes

Because (l/r)strip is often confusing for beginners (plenty of people on StackOverflow, the bug tracker, python-ideas, and elsewhere ask about it, and many assumed it was a Python bug), PEP 616 was written.

The proposal adds two methods to address this: removeprefix and removesuffix. Before that, we had to implement prefix/suffix removal ourselves.

When type(self) is type(prefix) is type(suffix) is str, the two methods behave as follows:

def removeprefix(self: str, prefix: str, /) -> str:
    if self.startswith(prefix):
        return self[len(prefix):]
    else:
        return self[:]

def removesuffix(self: str, suffix: str, /) -> str:
    # suffix='' should not call self[:-0].
    if suffix and self.endswith(suffix):
        return self[:-len(suffix)]
    else:
        return self[:]

(This is conceptual code, of course; the actual implementation is in C.)
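
A brief usage sketch of the built-in methods, assuming Python 3.9 or later; the file name is made-up sample data. The last lines show why the conceptual removesuffix checks for an empty suffix first.

```python
name = "test_tokenizer.py"

print(name.removeprefix("test_"))  # tokenizer.py
print(name.removesuffix(".py"))    # test_tokenizer
print(name.removeprefix("spam"))   # test_tokenizer.py (no match, unchanged)

# -0 == 0, so name[:-0] is name[:0], i.e. the empty string.
print(repr(name[:-0]))             # ''
print(name.removesuffix(""))       # test_tokenizer.py
```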

Benefits

  1. Less fragile code

    Python users do not have to manually work out the length, offsets, and so on of the text.

  2. More efficient

    Code that uses these methods does not need to call the built-in len function, nor the more expensive str.replace method.

  3. More descriptive

    These methods provide a higher-level API and improve the readability of the code compared to traditional string slicing.
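
To make the first and third points concrete, compare a hand-written slice with the method call. Both function names and the "http://" prefix are hypothetical examples, and the second function assumes Python 3.9 or later.

```python
def strip_scheme_manual(url: str) -> str:
    # Hand-counted length: 7 must stay in sync with len("http://").
    if url[:7] == "http://":
        return url[7:]
    return url


def strip_scheme(url: str) -> str:
    # No length or offset to compute, and the intent is spelled out.
    return url.removeprefix("http://")


print(strip_scheme_manual("http://example.com"))  # example.com
print(strip_scheme("http://example.com"))         # example.com
print(strip_scheme("ftp://example.com"))          # ftp://example.com
```

If the prefix in the manual version is later changed without updating the two 7s, the function silently breaks; the method-based version has no such coupling.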

The proposal was implemented in Python 3.9, and you can find the two methods on the str, bytes, bytearray, and collections.UserString classes.
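
A quick check on the non-str types, assuming Python 3.9 or later; the values are arbitrary sample data:

```python
from collections import UserString

bytes_result = b"test_case".removeprefix(b"test_")
print(bytes_result)        # b'case'

bytearray_result = bytearray(b"data.bin").removesuffix(b".bin")
print(bytearray_result)    # bytearray(b'data')

user_result = UserString("test_case").removeprefix("test_")
print(user_result)         # case
```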

Conclusion

str.(l/r)strip is not designed for removing prefixes or suffixes, so don't misuse it for that purpose in your code.

