Reposted from:
https://www.cnblogs.com/sctb/…



1. What is configfs?

configfs is a RAM-based file system that provides the converse of the functionality of sysfs. Where sysfs is a filesystem-based view of kernel objects, configfs is a filesystem-based manager of kernel objects, or config_items.

In sysfs, an object is created in the kernel (for example, when a device is discovered) and registered with sysfs. Its attributes then appear in sysfs, where user space can read them via readdir(3)/read(2), and some of them can be modified via write(2). The important point is that the object is created and destroyed in the kernel: the kernel controls the life cycle of the sysfs representation, and sysfs is merely a window onto it.

A configfs config_item is created by an explicit user-space operation, mkdir(2), and destroyed by rmdir(2). Its attributes appear at mkdir(2) time and can be read or modified via read(2) and write(2). As with sysfs, readdir(3) lists the items and attributes, and symlink(2) can be used to group items together. Unlike sysfs, the life cycle of the representation is driven entirely by user space, and the kernel modules backing the items must respond to it.

sysfs and configfs can and should coexist on the same system; neither is a replacement for the other.

2. Use configfs

configfs can be compiled as a module or into the kernel. It is made accessible by mounting it:

sudo mount -t configfs none /sys/kernel/config/

The configfs tree is empty unless client modules are also loaded. These are the modules that register their item types with configfs as subsystems. Once a client subsystem is loaded, it appears as one or more subdirectories under /sys/kernel/config/. As with sysfs, the configfs tree always exists, whether or not it is mounted on /sys/kernel/config/.

An item is created with mkdir(2). The attributes of the item appear at the same time: readdir(3) shows which attributes exist, read(2) returns their default values, and write(2) stores new values.
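The readdir(3)/read(2) sequence described above can be driven from a small user-space C helper. The following is a minimal sketch rather than anything configfs ships: the function name list_item_attrs() and the fixed 4096-byte buffers are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: print "name = value" for every readable attribute
 * file in an item directory. Returns the number of attributes printed,
 * or -1 if the directory cannot be opened.
 */
int list_item_attrs(const char *item_dir)
{
    DIR *dir = opendir(item_dir);
    struct dirent *ent;
    int count = 0;

    if (!dir)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        char path[4096];
        char value[4096];   /* assumed large enough for a one-page attribute */
        struct stat st;
        ssize_t len;
        int fd;

        snprintf(path, sizeof(path), "%s/%s", item_dir, ent->d_name);
        /* Skip ".", "..", child groups and symlinks: only files are attributes. */
        if (lstat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;

        fd = open(path, O_RDONLY);
        if (fd < 0)
            continue;       /* for example, a write-only attribute */
        len = read(fd, value, sizeof(value) - 1);
        close(fd);
        if (len < 0)
            continue;
        value[len] = '\0';
        value[strcspn(value, "\n")] = '\0';   /* drop the trailing newline */

        printf("%s = %s\n", ent->d_name, value);
        count++;
    }
    closedir(dir);
    return count;
}
```

A caller would pass the path of an item it has just created with mkdir(2), for example list_item_attrs("/sys/kernel/config/fakenbd/disk1") for the hypothetical driver used later in this article.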

Note: do not mix more than one attribute in one attribute file.

There are two types of configfs attributes:

  1. Normal attributes, similar to sysfs attributes, are small ASCII text files with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably only one value per file should be used, and the same caveats as for sysfs apply.

    configfs expects write(2) to store the entire buffer at once. When writing to a normal configfs attribute, a user-space process should first read the entire file, modify the portions it wants to change, and then write the entire buffer back.

  2. Binary attributes are somewhat similar to sysfs binary attributes, but with a few slight changes to the semantics. The PAGE_SIZE limit does not apply, but the whole binary item must fit in a single kernel vmalloc'ed buffer.

    write(2) calls from user space are buffered, and the write_bin_attribute method of the attribute is invoked on the final close. It is therefore imperative for user space to check the return code of close(2) to verify that the operation finished successfully.

    To keep a malicious user from OOMing the kernel (exhausting its memory), each binary attribute has a maximum buffer size.
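Both rules in the list above (store the whole value with one write(2), and check the result of close(2)) fit in one short user-space helper. This is a sketch under assumptions: attr_store() is a hypothetical name, and treating a short write as -EIO is a choice made here, not something configfs prescribes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: store `len` bytes into an attribute file with a
 * single write(2), and report failures from close(2) as well, because a
 * binary attribute only reaches its kernel method when the file is closed.
 * Returns 0 on success or a negative errno value.
 */
int attr_store(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY);
    ssize_t written;
    int err = 0;

    if (fd < 0)
        return -errno;

    /* One write(2) carrying the whole value; never send it piecemeal. */
    written = write(fd, buf, len);
    if (written < 0)
        err = -errno;
    else if ((size_t)written != len)
        err = -EIO;     /* short write: the value was not stored whole */

    /* For binary attributes this is where the store can actually fail. */
    if (close(fd) != 0 && err == 0)
        err = -errno;

    return err;
}
```

A program that only wants to change part of a normal attribute would first read the whole file into a buffer, edit the buffer, and then hand the complete buffer to a helper like this one.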

When an item needs to be destroyed, remove it with rmdir(2). An item cannot be destroyed while any other item has a link to it (created via symlink(2)). Links are removed with unlink(2).

3. Configuring FakeNBD: an example

Imagine a Network Block Device (NBD) driver that lets you access remote block devices. Call it FakeNBD. FakeNBD uses configfs for its configuration. Obviously, there will be a user-space program that system administrators use to configure FakeNBD, but somehow that program has to tell the driver about the configuration. This is where configfs comes in.

When the FakeNBD driver is loaded, it registers itself with configfs, and readdir(3) sees it just fine:

ls /sys/kernel/config
  fakenbd

A fakenbd connection is created with mkdir(2). The name is arbitrary, but the tool will likely make some use of it: perhaps it is a UUID or a disk name.

mkdir /sys/kernel/config/fakenbd/disk1
ls /sys/kernel/config/fakenbd/disk1
  target device rw

The target attribute contains the IP address of the server to which FakeNBD is connecting, and the device attribute is the device on the server. Predictably, the rw attribute determines whether the connection is read-only or read-write.

echo 10.0.0.1 > /sys/kernel/config/fakenbd/disk1/target
echo /dev/sda1 > /sys/kernel/config/fakenbd/disk1/device
echo 1 > /sys/kernel/config/fakenbd/disk1/rw

That’s it. The device is configured through the shell.
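The user-mode configuration program mentioned at the start of this section could perform the same steps with plain system calls. The sketch below is an assumption-laden illustration: FakeNBD is itself imaginary, and the names put_attr() and fakenbd_configure() are invented here. It relies only on the behaviour described above, namely that the attribute files exist as soon as mkdir(2) returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write one string value to <item_dir>/<attr>; returns 0 or a negative errno. */
static int put_attr(const char *item_dir, const char *attr, const char *value)
{
    char path[4096];
    size_t len = strlen(value);
    ssize_t n;
    int fd, err = 0;

    snprintf(path, sizeof(path), "%s/%s", item_dir, attr);
    fd = open(path, O_WRONLY);  /* no O_CREAT: configfs created the file at mkdir(2) */
    if (fd < 0)
        return -errno;
    n = write(fd, value, len);
    if (n < 0)
        err = -errno;
    else if ((size_t)n != len)
        err = -EIO;
    if (close(fd) != 0 && err == 0)
        err = -errno;
    return err;
}

/*
 * Hypothetical equivalent of the shell session above: create the
 * connection item, then fill in its three attributes.
 */
int fakenbd_configure(const char *subsys_dir, const char *name,
                      const char *target, const char *device, int read_write)
{
    char item_dir[4096];
    int err;

    snprintf(item_dir, sizeof(item_dir), "%s/%s", subsys_dir, name);
    if (mkdir(item_dir, 0755) != 0 && errno != EEXIST)
        return -errno;

    err = put_attr(item_dir, "target", target);
    if (err == 0)
        err = put_attr(item_dir, "device", device);
    if (err == 0)
        err = put_attr(item_dir, "rw", read_write ? "1" : "0");
    return err;
}
```

With the imaginary driver loaded, the shell session would correspond to fakenbd_configure("/sys/kernel/config/fakenbd", "disk1", "10.0.0.1", "/dev/sda1", 1). On an ordinary directory the call fails with -ENOENT, because a regular file system does not populate attribute files on mkdir(2).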

4. Coding with configfs

Every object in configfs is a config_item. A config_item reflects an object in the subsystem, and it has attributes that match values on that object. configfs handles the filesystem representation of the object and its attributes, allowing the subsystem to ignore everything but the basic show/store interaction.

Items are created and destroyed inside a config_group. A group is a collection of items that share the same attributes and operations. Items are created by mkdir(2) and removed by rmdir(2), but configfs handles that; the group has a set of operations to perform these tasks.

A subsystem is the top level of a client module. During initialization, the client module registers the subsystem with configfs, and the subsystem appears as a directory at the top (root) of the configfs filesystem. A subsystem is also a config_group and can do everything a config_group can.

4.1 The config_item structure

struct config_item {
    char                    *ci_name;
    char                    ci_namebuf[UOBJ_NAME_LEN];
    struct kref             ci_kref;
    struct list_head        ci_entry;
    struct config_item      *ci_parent;
    struct config_group     *ci_group;
    struct config_item_type *ci_type;
    struct dentry           *ci_dentry;
};

void config_item_init(struct config_item *);
void config_item_init_type_name(struct config_item *, const char *name, struct config_item_type *type);
struct config_item *config_item_get(struct config_item *);
void config_item_put(struct config_item *);

In general, the config_item structure is embedded in a container structure that actually represents what the subsystem is doing, and the config_item part of it is how objects interact with configfs.

Whether statically defined in a source file or created by a parent config_group, a config_item must have one of the _init() functions called on it. This initializes the reference count and sets up the appropriate fields.

All users of a config_item should hold a reference on it via config_item_get(), and drop that reference when they are done via config_item_put().

By itself, a config_item cannot do much more than appear in configfs. Usually a subsystem wants the item to display and/or store attributes, among other things. For that, it needs a type.

In other words, the config_item_type structure is what gives an item its attributes and operations.

4.2 The config_item_type structure

struct configfs_item_operations {
    void (*release)(struct config_item *);
    int (*allow_link)(struct config_item *src, struct config_item *target);
    void (*drop_link)(struct config_item *src, struct config_item *target);
};

struct config_item_type {
    struct module                           *ct_owner;
    struct configfs_item_operations         *ct_item_ops;
    struct configfs_group_operations        *ct_group_ops;
    struct configfs_attribute               **ct_attrs;
    struct configfs_bin_attribute           **ct_bin_attrs;
};

The most basic function of a config_item_type is to define what operations can be performed on a config_item. All dynamically allocated items need to provide the ct_item_ops->release() method, which is called when the reference count of the config_item reaches zero.
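The three ideas covered so far (an item embedded in a container, references taken and dropped in pairs, and a release method that frees the container when the last reference goes away) can be modelled in ordinary user-space C. The sketch below is only a model: every toy_ name is invented for illustration, the counter is a plain int rather than a kref, and nothing here is thread-safe or part of the kernel API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct config_item: a name, a counter, a release hook. */
struct toy_item {
    const char *name;
    int refcount;
    void (*release)(struct toy_item *item);
};

/* The real object of the subsystem, with the item embedded in it. */
struct toy_disk {
    int read_write;
    struct toy_item item;
};

/* Recover the container from a pointer to its embedded member. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static int toy_disks_released;  /* bookkeeping for the demonstration only */

/* Plays the role of ->release(): free the container, not just the item. */
static void toy_disk_release(struct toy_item *item)
{
    struct toy_disk *disk = toy_container_of(item, struct toy_disk, item);

    free(disk);
    toy_disks_released++;
}

/* Like the _init() functions: start with one reference held by the creator. */
struct toy_disk *toy_disk_new(const char *name)
{
    struct toy_disk *disk = calloc(1, sizeof(*disk));

    if (!disk)
        return NULL;
    disk->item.name = name;
    disk->item.refcount = 1;
    disk->item.release = toy_disk_release;
    return disk;
}

/* Take a reference; returns the item so calls can be chained. */
struct toy_item *toy_item_get(struct toy_item *item)
{
    if (item)
        item->refcount++;
    return item;
}

/* Drop a reference; the last put triggers the release hook. */
void toy_item_put(struct toy_item *item)
{
    if (item && --item->refcount == 0)
        item->release(item);
}
```

The point of the model is the ownership rule: code that receives only a struct toy_item pointer can still reach and free the enclosing struct toy_disk, which is why dynamically allocated items need a release method of their own.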

4.3 The configfs_attribute structure

struct configfs_attribute {
    char                    *ca_name;
    struct module           *ca_owner;
    umode_t                 ca_mode;
    ssize_t (*show)(struct config_item *, char *);
    ssize_t (*store)(struct config_item *, const char *, size_t);
}; 

When a config_item wants an attribute to appear as a file in the configfs directory of the item, it must define a configfs_attribute describing it. It then adds the attribute to the NULL-terminated array config_item_type->ct_attrs. When the item appears in configfs, the attribute file appears with configfs_attribute->ca_name as its file name, and configfs_attribute->ca_mode specifies the file permissions.

If an attribute is readable and provides a ->show method, that method is called whenever user space asks for a read(2) on the attribute. If an attribute is writable and provides a ->store method, that method is called whenever user space asks for a write(2) on the attribute.
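To make the show/store contract concrete, here is a user-space model of one integer attribute and a NULL-terminated attribute table. It is a sketch, not kernel code: the toy_ types and the attribute name "storeme" are assumptions made for illustration, and the model assumes the buffer handed to the store function is NUL-terminated.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct toy_child { int storeme; };

/* Toy counterpart of configfs_attribute: a name plus show/store methods. */
struct toy_attr {
    const char *name;
    ssize_t (*show)(struct toy_child *child, char *page);
    ssize_t (*store)(struct toy_child *child, const char *page, size_t count);
};

/* show: format the value as text into the page, return the byte count. */
static ssize_t storeme_show(struct toy_child *child, char *page)
{
    return sprintf(page, "%d\n", child->storeme);
}

/* store: parse the text, reject garbage, report the buffer as consumed. */
static ssize_t storeme_store(struct toy_child *child, const char *page,
                             size_t count)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(page, &end, 10);
    if (errno || end == page || (*end != '\0' && *end != '\n'))
        return -EINVAL;
    child->storeme = (int)value;
    return (ssize_t)count;
}

static struct toy_attr attr_storeme = { "storeme", storeme_show, storeme_store };

/* NULL-terminated, in the same spirit as config_item_type->ct_attrs. */
static struct toy_attr *toy_attrs[] = { &attr_storeme, NULL };

/* Walk the table until the NULL terminator, matching on the file name. */
struct toy_attr *toy_find_attr(const char *name)
{
    for (struct toy_attr **a = toy_attrs; *a; a++)
        if (strcmp((*a)->name, name) == 0)
            return *a;
    return NULL;
}
```

Returning the full count from the store function mirrors the expectation that a write stores the entire buffer at once, while a negative errno value tells the writer that nothing was stored.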

4.4 The configfs_bin_attribute structure

struct configfs_bin_attribute {
    struct configfs_attribute   cb_attr;
    void                        *cb_private;
    size_t                      cb_max_size;
};

A binary attribute is used when a binary blob needs to appear as the contents of a file in the configfs directory of the item.

(A blob, or binary large object, is an opaque chunk of binary data.)

To use one, add the binary attribute to the NULL-terminated array config_item_type->ct_bin_attrs. When the item appears in configfs, the attribute file appears with configfs_bin_attribute->cb_attr.ca_name as its file name, and configfs_bin_attribute->cb_attr.ca_mode specifies the file permissions.

The cb_private member is provided for use by the driver, while the cb_max_size member specifies the maximum amount of vmalloc buffer to be used.

If the binary attribute is readable and the config_item provides a ct_item_ops->read_bin_attribute() method, that method is called whenever user space asks for a read(2) on the attribute. The converse happens for write(2), which ends up in ct_item_ops->write_bin_attribute(). Reads and writes are buffered, so only a single read or write reaches the method; the attribute itself need not concern itself with the buffering.

4.5 The config_group structure

A config_item cannot live in a vacuum. The only way one can be created is via mkdir(2) on a config_group, which triggers the creation of a child item.

struct config_group {
    struct config_item          cg_item;
    struct list_head            cg_children;
    struct configfs_subsystem   *cg_subsys;
    struct list_head            default_groups;
    struct list_head            group_entry;
};

void config_group_init(struct config_group *group);
void config_group_init_type_name(struct config_group *group, const char *name, struct config_item_type *type);

The config_group structure contains a config_item. Configuring that item properly means that a group can behave as an item in its own right.

However, a group can do more: it can create child items or groups. This is accomplished via the group operations specified on the config_item_type of the group.

struct configfs_group_operations {
    struct config_item *(*make_item)(struct config_group *group, const char *name);
    struct config_group *(*make_group)(struct config_group *group, const char *name);
    int (*commit_item)(struct config_item *item);
    void (*disconnect_notify)(struct config_group *group, struct config_item *item);
    void (*drop_item)(struct config_group *group, struct config_item *item);
};

A group creates child items by providing the ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the directory of the group. The subsystem allocates a new config_item (or, more likely, its container structure), initializes it, and returns it to configfs. configfs then populates the filesystem tree to reflect the new item.

If the subsystem wants the child to be a group itself, it provides ct_group_ops->make_group(). Everything else behaves the same, using the group _init() functions on the group.

Finally, when user space calls rmdir(2) on the item or group, ct_group_ops->drop_item() is called. As a config_group is also a config_item, a separate drop_group() method is not necessary. The subsystem must config_item_put() the reference that was initialized upon item allocation. If a subsystem has no work to do beyond that, it may omit the ct_group_ops->drop_item() method, and configfs will call config_item_put() on the item on behalf of the subsystem.

Important: drop_item() returns void, and as such cannot fail. When rmdir(2) is called, configfs WILL remove the item from the filesystem tree (assuming it has no children keeping it busy), and the subsystem is responsible for responding to this. If the subsystem holds references to the item in other threads, the memory is safe. It may take some time for the item to actually disappear from the subsystem, but it is already gone from configfs.

When drop_item() is called, the linkage of the item has already been torn down: it no longer has a reference on its parent and has no place in the item hierarchy. If a client needs to do some cleanup before this teardown happens, the subsystem can implement the ct_group_ops->disconnect_notify() method. That method is called after configfs has removed the item from the filesystem view but before the item is removed from its parent group. Like drop_item(), disconnect_notify() returns void and cannot fail. Client subsystems should not drop any references here, as they must still do so in drop_item().

A config_group cannot be removed while it still has child items. This is implemented in the configfs rmdir(2) code: ->drop_item() is not called, because the item has not been dropped, and rmdir(2) fails because the directory is not empty.
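From user space, the rule that a group with children cannot be removed shows up as an ordinary rmdir(2) failure. The helper below is a minimal sketch with a hypothetical name, remove_item(). It singles out ENOTEMPTY, which is what Linux reports for a non-empty directory, and EBUSY, which this article describes later for items that another driver depends on.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to remove an item or group directory and
 * explain the outcome. Returns 0 on success or a negative errno value.
 */
int remove_item(const char *path)
{
    int err;

    if (rmdir(path) == 0)
        return 0;

    err = errno;    /* save it: fprintf() may overwrite errno */
    switch (err) {
    case ENOTEMPTY:
        fprintf(stderr, "%s: still has child items, remove them first\n", path);
        break;
    case EBUSY:
        fprintf(stderr, "%s: item is busy\n", path);
        break;
    default:
        fprintf(stderr, "%s: %s\n", path, strerror(err));
        break;
    }
    return -err;
}
```

A tool that tears down a whole group would therefore remove the children first and the group last, which is the same order a recursive directory removal uses.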

4.6 The configfs_subsystem structure

A subsystem must register itself, usually at module_init, which tells configfs to make the subsystem appear in the file tree.

struct configfs_subsystem {
    struct config_group    su_group;
    struct mutex        su_mutex;
};

int configfs_register_subsystem(struct configfs_subsystem *subsys);
void configfs_unregister_subsystem(struct configfs_subsystem *subsys);

A subsystem consists of a top-level config_group and a mutex. The group is where child config_items are created. For a subsystem, this group is usually defined statically. Before calling configfs_register_subsystem(), the subsystem must have initialized the group via the usual group _init() functions, and it must also have initialized the mutex.

When the register call returns, the subsystem is live and visible via configfs. At that point mkdir(2) can be called, and the subsystem must be ready for it.

5. An example

The best example of these basic concepts is the simple_children subsystem/group and the simple_child item in samples/configfs/configfs_sample.c. They show a trivial object displaying and storing an attribute, and a simple group creating and destroying these children.

configfs_sample.c :
https://github.com/torvalds/l…

6. Hierarchy navigation and the subsystem mutex

configfs provides one extra bonus. Because config_groups and config_items appear in a filesystem, they are arranged in a hierarchy. A subsystem is NEVER to touch the filesystem parts, but it might be interested in this hierarchy. For this reason, the hierarchy is mirrored via the config_group->cg_children and config_item->ci_parent structure members.

A subsystem can navigate the cg_children list and the ci_parent pointer to see the tree it has created. This can race with the way configfs itself manages the hierarchy, so configfs uses the subsystem mutex to protect modifications. Whenever a subsystem wants to navigate the hierarchy, it must do so under the protection of the subsystem mutex.

A subsystem is prevented from acquiring the mutex while a newly allocated item has not yet been linked into the hierarchy. Similarly, it cannot acquire the mutex while an item being dropped has not yet been unlinked. This means that the ci_parent pointer of an item is never NULL while the item is in configfs, and that the item is in the cg_children list of its parent for exactly the same duration. This allows a subsystem to trust ci_parent and cg_children while it holds the mutex.

7. Item aggregation via symlink(2)

configfs provides a simple grouping through the group->item parent/child relationship. Often, however, a larger environment requires aggregation outside of the parent/child connection. This is implemented via symlink(2).

A config_item may provide the ct_item_ops->allow_link() and ct_item_ops->drop_link() methods. If the ->allow_link() method exists, symlink(2) may be called with the config_item as the source of the link. Links are only allowed between configfs config_items; any symlink(2) attempt that points outside the configfs filesystem is denied.

When symlink(2) is called, the ->allow_link() method of the source config_item is called with itself and the target item as arguments. If the source item allows linking to the target item, it returns 0. A source item may wish to reject a link if it only wants links to a certain type of object (say, objects in its own subsystem).

When unlink(2) is called on the symbolic link, the source item is notified via the ->drop_link() method. Like the ->drop_item() method, this is a void function and cannot fail. The subsystem is responsible for responding to the change.

A config_item cannot be removed while it links to any other item, nor can it be removed while an item links to it. Dangling symlinks are not allowed in configfs.
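On the user-space side, aggregation is nothing more than symlink(2) and unlink(2) on paths inside the configfs tree. The two wrappers below are a sketch with invented names (item_link() and item_unlink()). The comments state which kernel callback each call is expected to reach according to the description above; a link refused by the source item simply surfaces as a symlink(2) error.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helpers. `link_path` is a new name inside the directory of
 * the source item; `target_dir` is the directory of the item linked to.
 * Both return 0 on success or a negative errno value.
 */
int item_link(const char *target_dir, const char *link_path)
{
    /* On configfs this is the call that ends up in ->allow_link(). */
    return symlink(target_dir, link_path) == 0 ? 0 : -errno;
}

int item_unlink(const char *link_path)
{
    /* On configfs the source item is told through ->drop_link(). */
    return unlink(link_path) == 0 ? 0 : -errno;
}
```

Because neither end of a link can be removed while the link exists, a cleanup routine would call item_unlink() on every link before attempting rmdir(2) on the items involved.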

8. Automatically create groups

A new config_group may want to have two types of child config_items. While this could be codified by magic names in ->make_item(), it is much more explicit to have a mechanism that lets user space see the distinction.

ConfigFS provides a way to automatically create one or more subgroups within a parent group when it is created, rather than placing items that behave differently in the same group. Thus, mkdir(“parent”) results in “parent”, “parent/subgroup1”, up to “parent/subgroupN”. Now, an item of type 1 can be created in the directory “parent/subgroup1”, and an item of type N can be created in the directory “parent/subgroupN”.

These automatically created sub-groups, or default groups, do not affect other sub-groups of the parent group. If ct_group_ops->make_group() exists, other sub-groups can also be created directly on the parent group.

A configfs subsystem specifies default groups by adding them to the parent config_group structure with the configfs_add_default_group() function. Each added group is populated in the configfs tree at the same time as the parent group. Similarly, they are removed at the same time as the parent, with no extra notification: when a ->drop_item() call tells the subsystem that the parent group is going away, it also means that every default group child of that parent is going away.

As a consequence, default groups cannot be removed directly via rmdir(2). They are also not counted when rmdir(2) on the parent group checks for children.

9. Dependent subsystems

Sometimes other drivers depend on particular configfs items. For example, ocfs2 mounts depend on a heartbeat region item. If that region item is removed with rmdir(2), the ocfs2 mount must BUG or go read-only.

configfs provides two additional API calls: configfs_depend_item() and configfs_undepend_item(). A client driver can call configfs_depend_item() on an existing item to tell configfs that the item is depended on. configfs will then return -EBUSY from rmdir(2) for that item. When the item is no longer depended on, the client driver calls configfs_undepend_item() on it.

These APIs cannot be called underneath any configfs callbacks, as they would conflict. They can block and allocate. A client driver probably should not call them of its own accord; rather, it should provide an API that external subsystems call.

How does this work? Imagine the ocfs2 mount process. When it mounts, it asks for a heartbeat region item, which is done via a call into the heartbeat code. Inside the heartbeat code, the region item is looked up, and the heartbeat code calls configfs_depend_item(). If that succeeds, the heartbeat code knows the region is safe to give to ocfs2. If it fails, the region was being torn down anyway, and the heartbeat code can gracefully pass up an error.

10. Committable items

Note: committable items are currently unused.

Some config_items cannot have a valid initial state. That is, no default values can be specified for the attributes of the item such that the item can do its work. User space must configure one or more attributes before the subsystem can start whatever entity the item represents.

Consider the FakeNBD device above. Without a target address and a target device, the subsystem has no idea what block device to import. The simple example assumes that the subsystem merely waits until all the appropriate attributes are configured and then connects. This works, but it means every attribute store must check whether all the attributes are initialized, and every attribute store must fire off the connection if that condition is met.

Far better would be an explicit action notifying the subsystem that the config_item is ready to go. More importantly, an explicit action allows the subsystem to provide feedback as to whether the attributes are initialized in a way that makes sense. configfs provides this as committable items.

configfs still uses only normal filesystem operations. An item is committed via rename(2): it is moved from a directory where it can be modified to a directory where it cannot.

Any group that provides the ct_group_ops->commit_item() method has committable items. When such a group appears in configfs, mkdir(2) does not work directly in the group. Instead, the group has two subdirectories, “live” and “pending”. The “live” directory does not support mkdir(2) or rmdir(2) either; it only allows rename(2). The “pending” directory does allow mkdir(2) and rmdir(2). An item is created in the “pending” directory, where its attributes can be modified at will. User space commits the item by renaming it into the “live” directory, at which point the subsystem receives the ->commit_item() callback. If all required attributes are filled to satisfaction, the method returns zero and the item is moved to the “live” directory.

As rmdir(2) does not work in the “live” directory, an item must be shut down, or “uncommitted”. Again, this is done via rename(2), this time from the “live” directory back to the “pending” one. The subsystem is notified by the ct_group_ops->uncommit_object() method.

Reference:
https://www.kernel.org/doc/Do…