Today in writing a small program to check the input data type, found this problem is very interesting, can test you to think about the problem is comprehensive, the data structure in Python understanding is accurate. You can think about it and try to write an implementation of this method.

The background of the problem is that we read a configuration from a YAML format file, and we need to check that the configuration is formatted, and for places that support input lists, such as:

scripts:
  - abcCopy the code

If there is only one item in the list, you can write it as well

scripts: abcCopy the code

And some fields may accept many different formats, such as simply entering a string or an object that contains more information. You need to check if any of the formats can be matched at check time.

The method interface to implement looks like this:

def check_type(value, type):
    ...Copy the code

Where value is a plain Python value, which may include lists, dictionaries, their nesting, and so on. The type to check is interesting. It can specify the type of the value in the list, as generics like Java do. Here’s how we define it:

type ::= python_type | None | (type, type, ...) | () | [] | [type] | {} | {"key": type, "key": type, ... }Copy the code

That is, type can be:

  1. Any valid Python type, such as STR, int, list, etc. Note that object can match all possible values (including None)
  2. None, can only match None. Since None is an instance of a special type in Python, it is equivalent to types.nonetype.
  3. A tuple in which each element is a valid type (note: not necessarily a Python type, but a recursively defined type). Indicates that any of the types can be matched, and returns the result of the first successful match. For example, (STR, None) can match STR type, or None. (STR, int) can match STR or int. (STR, [STR]) can match STR or [STR].
  4. In particular, if an empty tuple () represents any value other than None.
  5. A list containing only one element, which is a valid type. It matches two types of values: a list or a tuple, requiring all elements to match the internal type, and automatically converting the tuple to a list on return. If it is not a list value, the value must match the internal type, which is automatically converted on return to a list containing one element, such as “ABC”, which successfully matches [STR] and returns [” ABC “]. Note that [list] can only match [[1]], not [1] — automatic conversion is not allowed in ambiguous cases. Also note that [[STR]] matches “ABC” — because “ABC” matches STR, which matches [STR], and therefore matches [[STR]]. It’s a little weird but it’s allowed.
  6. If it is an empty list, it is equivalent to [object] — notice that it means that if the position is a value other than a list or tuple, it is converted to a list containing a value; Otherwise, keep the list.
  7. If it is a dictionary, the value must be a dictionary.
  8. If the dictionary contains keys, the type of the corresponding key in the dictionary is further checked. The value corresponding to each key in the Type dictionary must also be a valid type. Key must be a string. To make this feature more useful, we specify:

    1. If key starts with? At the beginning, like “? “ABC” : indicates that “ABC” is an optional key. It may not exist, but if it does exist, it must match the corresponding type.
    2. If key starts with! Start something like “! “ABC” : indicates that “ABC” is a key that must exist. If the key does not exist, the match cannot be successful.
    3. If the key starts with ~, for example, “~ ABC.*def”, the key of any regular expression that matches ~ must conform to the corresponding type. The exception is that if a key already matches the first two cases, the rule of regular expression is not considered. In this way, all other keys that are not defined can be matched with “~”. You can then use “~”: NoMatchType (NoMatchType is a user class that cannot be constructed, so no object is an instance of it) to prevent an out-of-definition key.
    4. Other cases, as in front of! , which is the required key.

    I don’t know if THAT’s clear, but let’s take a few examples:

    > > > check_type (" ABC ", STR) "ABC" > > > check_type ([1, 2, 3], [int]) [1, 2, 3] > > > check_type ((1, 2, 3), (int)) [1, 2, 3] > > > check_type ([1, 2, "ABC"], [int]) (Exception...). >>> check_type("abc", [str]) ["abc"] >>> check_type(None, str) (Exception...) > > > check_type (None, (STR, None)) None > > > check_type ([1, 2, "ABC", [" def ", "ghi"]], [(int, (STR))]) [1, 2, [" ABC "], [" def ", "ghi"]] > > > check_type ({" ABC ": 123," def ":" ghi "}, {" ABC ": int," def ": str}) {"abc":123, "def":"ghi"} >>> check_type({"abc": {"def": "test", "ghi": 5}, "def": 1}, {"abc": {"def": str, "ghi": int}, "def": [int]}) {"abc": {"def": "test", "ghi": 5}, "def": [1]}Copy the code

    Clever reader, have you figured out how to do this? Think about it, practice it, see what weird situations your implementation can handle, and think about which inputs are buggy.

    tip

    Think before you look

    There is a huge catch to this question.

    In Python, types like list and dict can contain themselves by holding references, forming recursive structures such as:

    a = []
    a.append(a)Copy the code

    By our definition, it is a perfectly legal type that can match any layer of arbitrarily structured parenthes-only data, such as [[],[[]],[],[]]].

    More seriously, the input data may also contain such recursion. Imagine what would happen if your program executed the following statement.

    a = []
    a.append(a)
    check_type(a, a)Copy the code

    Rules that require input data to be modified and returned, such as “ABC” returning [” ABC “] when matching [STR], also need to be valid for recursive data, such as

    my_type = []
    my_type.append(([str], my_type))
    
    my_data = ["abc"]
    my_data.append(my_data)
    
    check_type(my_data, my_type)
    # should return [["abc"], [["abc"], ...]]Copy the code

    The recursive structure must also be returned after modification, which can be an implementation difficulty.

    The same object that contains a recursive definition may match different types to generate different values, not necessarily a source object corresponds to a destination object:

    my_type = {} my_type["abc"] = my_type my_type["def"] = [my_type] my_data = {} my_data["abc"] = my_data my_data["def"] = my_data check_type(my_data, my_type) # should return {"abc": {"abc": {... }, "def": [{...}]}, "def": [{"abc": {...}, "def": [{...}]}]}Copy the code

    Does your program handle these situations correctly?

    As an aside, many people may not know that YAML really allows such a recursive structure to be input as long as anchor is used:

    >>> import yaml >>> data=""" ... &my_data ... abc: ... abc: 1 ... def: *my_data ... """ >>> a = yaml.safe_load(data) >>> a {'abc': {'abc': 1, 'def': {... }}} >>> a['abc']['def'] {'abc': {'abc': 1, 'def': {... }}}Copy the code

    Well, it’s not your fault that recursive structures don’t work well, you see CPython himself

    >>> a = []
    >>> a.append(a)
    >>> b = []
    >>> b.append(b)
    >>> a == b
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RecursionError: maximum recursion depth exceeded in comparisonCopy the code