• Tutorial: Write a Shell in C
  • By Stephen Brennan
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: nettee
  • Proofread by: Kasheemlew, Jake Ggie

It’s easy to think you’re “not a real programmer.” There are programs that everyone uses, and it’s easy to put their developers on a pedestal. Although developing large software projects isn’t easy, the basic idea behind that software is often quite simple. Implementing it yourself is a fun way to prove that you have what it takes to be a real programmer. So, this article walks through how I wrote my own simple Unix shell in C, in the hope that it makes other people feel that way too.

The source code for the shell described in this article, called LSH, is available on GitHub.

Attention, students! Many classes have assignments that ask you to write a shell, and some instructors know about this tutorial and its code. If you are a student in one of these classes, do not copy (or copy and modify) this code without permission. Even then, I advise against relying heavily on this tutorial.

The basic life cycle of a shell

Let’s look at a shell from the top down. A shell does three main things in its life cycle.

  • Initialize: In this step, a typical shell reads and executes its configuration files. These change aspects of the shell’s behavior.
  • Interpret: Next, the shell reads commands from standard input (which could be interactive input or a file) and executes them.
  • Terminate: After its commands are executed, the shell executes any shutdown commands, frees up any memory, and terminates.

These steps are so general that they could apply to many programs, but we can use them as the basis for our shell. Our shell will be so simple that there won’t be any configuration files and no shutdown commands. So, we’ll just call the looping function and then terminate. In terms of architecture, though, it’s important to keep in mind that the lifetime of a program is more than just its loop.

int main(int argc, char **argv)
{
  // Load the configuration file if it exists.

  // Run the command loop
  lsh_loop();

  // Do some shutdown and cleanup.

  return EXIT_SUCCESS;
}

Here you can see that I just wrote a function: lsh_loop(). This function loops and interprets the execution of each command. We will see how this loop is implemented next.

The basic loop of a shell

We now know how the shell starts up. Next comes the basic program logic: what does the shell do during its loop? A simple way to handle commands is with three steps:

  • Read: Read the command from standard input.
  • Parse: Split the command string into a program name and arguments.
  • Execute: Run the parsed command.

Here, I translate those ideas into code for lsh_loop():

void lsh_loop(void)
{
  char *line;
  char **args;
  int status;

  do {
    printf("> ");
    line = lsh_read_line();
    args = lsh_split_line(line);
    status = lsh_execute(args);

    free(line);
    free(args);
  } while (status);
}

Let’s walk through this code. The first few lines are just declarations. A do-while loop is more convenient for checking the status variable, because it executes once before checking the variable’s value. Inside the loop, we print a prompt, call a function to read a line, call a function to split the line into arguments, and execute those arguments. Finally, we free the memory we allocated earlier for line and args. Notice that we use the status variable returned by lsh_execute() to decide when to exit the loop.

Read a line of input

Reading a line from standard input sounds simple, but it can be a hassle in C. The problem is that you can’t know ahead of time how much text the user will type into the shell, so you can’t simply allocate one block and hope it is big enough. Instead, you start with a block of some size, and if the user’s input outgrows it, you reallocate with more space. This is a common strategy in C, and we’ll use it to implement lsh_read_line().

#define LSH_RL_BUFSIZE 1024
char *lsh_read_line(void)
{
  int bufsize = LSH_RL_BUFSIZE;
  int position = 0;
  char *buffer = malloc(sizeof(char) * bufsize);
  int c;

  if (!buffer) {
    fprintf(stderr, "lsh: allocation error\n");
    exit(EXIT_FAILURE);
  }

  while (1) {
    // Read a character
    c = getchar();

    // If we hit EOF or a newline, replace it with a null character and return.
    if (c == EOF || c == '\n') {
      buffer[position] = '\0';
      return buffer;
    } else {
      buffer[position] = c;
    }
    position++;

    // If we have exceeded the buffer, reallocate.
    if (position >= bufsize) {
      bufsize += LSH_RL_BUFSIZE;
      buffer = realloc(buffer, bufsize);
      if (!buffer) {
        fprintf(stderr, "lsh: allocation error\n");
        exit(EXIT_FAILURE);
      }
    }
  }
}

The first part is a lot of declarations. In case you haven’t noticed, I prefer the old C style of declaring variables before the rest of the code. The heart of the function is the (apparently infinite) while (1) loop. In this loop, we read a character and store it as an int, not a char. That’s important! getchar() returns an int so that EOF, a negative value, can be told apart from every valid character; if you store the result in a char, that distinction is lost. This is a common beginner mistake in C. If the character is a newline or EOF, we null-terminate the current string and return it. Otherwise, we add the character to the current string.

Next, we check whether the next character will exceed the current buffer size. If it does, we reallocate the buffer first (and check if the memory allocation was successful). That’s it.

If you’re familiar with newer versions of the C library, you may notice that stdio.h has a getline() function that does most of what we just implemented. To be honest, I didn’t even know it existed until after I wrote the code above. It was a GNU extension until 2008, when it was added to POSIX, so most modern Unix systems should have it by now. I’m keeping the code I’ve written, and I encourage you to learn it this way before using getline(). Otherwise, you’d be missing a learning opportunity! Anyway, with getline(), the function becomes much simpler:

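char *lsh_read_line(void)
{
  char *line = NULL;
  size_t bufsize = 0; // have getline() allocate a buffer for us

  if (getline(&line, &bufsize, stdin) == -1) {
    // EOF (e.g. Ctrl-D) exits cleanly; anything else is a real error.
    if (feof(stdin)) exit(EXIT_SUCCESS);
    perror("lsh: getline");
    exit(EXIT_FAILURE);
  }
  return line;
}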

Parsing a line of input

Okay, so let’s go back to the loop. We’ve now implemented lsh_read_line(), and we have a line of input. Next, we need to parse that line into a list of arguments. I’m going to make a glaring simplification here: we won’t allow quoting or backslash escapes in our command-line arguments, and will simply use whitespace to separate arguments. So the command echo "this message" would not call echo with the single argument this message. Instead, it would call echo with two arguments, "this and message" (each keeping its quote character).

With this simplification, all we need to do is tokenize the string using whitespace as delimiters. That means we can use the classic library function strtok() to do the hard work for us.

#define LSH_TOK_BUFSIZE 64
#define LSH_TOK_DELIM " \t\r\n\a"
char **lsh_split_line(char *line)
{
  int bufsize = LSH_TOK_BUFSIZE, position = 0;
  char **tokens = malloc(bufsize * sizeof(char*));
  char *token;

  if (!tokens) {
    fprintf(stderr, "lsh: allocation error\n");
    exit(EXIT_FAILURE);
  }

  token = strtok(line, LSH_TOK_DELIM);
  while (token != NULL) {
    tokens[position] = token;
    position++;

    if (position >= bufsize) {
      bufsize += LSH_TOK_BUFSIZE;
      tokens = realloc(tokens, bufsize * sizeof(char*));
      if (!tokens) {
        fprintf(stderr, "lsh: allocation error\n");
        exit(EXIT_FAILURE);
      }
    }

    token = strtok(NULL, LSH_TOK_DELIM);
  }
  tokens[position] = NULL;
  return tokens;
}

This code looks suspiciously similar to lsh_read_line(), and that’s because it is! We use the same strategy of keeping a buffer and expanding it dynamically. But this time, we’re building a null-terminated array of pointers instead of a null-terminated array of characters.

At the start of the function, we begin tokenizing by calling strtok(), which returns a pointer to the first token. What strtok() actually does is return pointers into the string you pass it, placing a \0 byte at the end of each token. We store each returned pointer in an array (buffer) of character pointers.

Finally, we reallocate the array of pointers if necessary. This process repeats until strtok() returns no more tokens, at which point we null-terminate the list of tokens.

So, once all is said and done, we have an array of tokens ready to execute. Which raises the question: how do we actually execute it?

How a shell starts processes

Now we’re really at the heart of what a shell does. Starting processes is the main function of a shell, so writing a shell means you need to know exactly what happens to processes and how they are started. That’s why I’m going to digress for a moment and talk about processes in Unix.

In Unix, there are only two ways to start a process. The first way (not really a way) is to be an Init process. When a Unix machine starts, its kernel is loaded. Once the kernel is loaded and initialized, a separate process is started, called the Init process. This process runs for as long as the machine is running, and is responsible for starting any other processes you need so that the machine can work properly.

Since most programs aren’t Init, there’s really only one way to start a process: the fork() system call. When this function is called, the operating system makes a duplicate of the current process and starts both of them running. The original process is called the parent, and the new one is called the child. fork() returns 0 in the child and the child’s process ID (PID) in the parent. In essence, this means that the only way to start a new process is to copy an existing one.

That seems to be a problem. In particular, when you want to run a new process, you don’t want to run the same program all over again — you want to run another program. This is what the exec() system call does. It replaces the currently running program with a brand new one. This means that every time you call exec, the operating system stops your process, loads a new program, and then starts the new program in place. A process never returns from an exec() call (unless there is an error).

With these two system calls, we have the building blocks for how most programs are run on Unix. First, an existing process forks itself into two separate processes. Then, the child uses exec() to replace the program it is running with a new one. The parent process can continue doing other things, and it can even keep track of its child using the system call wait().

Phew! That’s a lot of background. But with it in mind, the following code for launching a program should make sense:

int lsh_launch(char **args)
{
  pid_t pid, wpid;
  int status;

  pid = fork();
  if (pid == 0) {
    // Child process
    if (execvp(args[0], args) == -1) {
      perror("lsh");
    }
    exit(EXIT_FAILURE);
  } else if (pid < 0) {
    // Error forking
    perror("lsh");
  } else {
    // Parent process
    do {
      wpid = waitpid(pid, &status, WUNTRACED);
    } while (!WIFEXITED(status) && !WIFSIGNALED(status));
  }

  return 1;
}

This function takes the argument list we created earlier. It then forks the current process and saves the return value. Once fork() returns, we actually have two processes running concurrently. The child takes the first if branch (pid == 0).

In the child process, we want to run the command the user supplied. So, we use one of the many variants of the exec system call: execvp. The variants of exec do slightly different things. Some take a variable number of string arguments, some take a list of strings, and some let you set the environment the process runs with. execvp expects a program name and an array (also called a vector, hence the ‘v’) of string arguments, whose first element should be the program name. The ‘p’ means that instead of providing the full file path of the program, we provide only its name and let execvp() search for it in the directories listed in the PATH environment variable.

If the exec call returns -1 (or, really, if it returns at all), we know something went wrong. So, we use perror() to print the system’s error message along with our program name, so the user knows what went wrong. Then, the child exits, which lets the shell (the parent process) keep running.

The second if condition (pid < 0) checks for fork() errors. If something goes wrong, we print an error and continue — we don’t do any more error handling than inform the user. We let the user decide if they want to quit.

The third branch means that fork() succeeded, and it is where the parent process ends up. We know the child is going to execute the command, so the parent needs to wait for it to finish. We use waitpid() to wait for the process’s state to change. Unfortunately, waitpid() has a lot of options (just like exec()). A process can change state in many ways, and not every state change means the process has ended. A process may exit (normally, or with an error code), or it may be killed by a signal. So, we use the macros provided with waitpid(), WIFEXITED() and WIFSIGNALED(), to keep waiting until the process has either exited or been killed. Finally, the function returns 1, telling the caller to prompt the user for input again.

Shell built-in functions

As you may have noticed, the lsh_loop() function calls lsh_execute(). But the function we wrote above is called lsh_launch(). This is intentional. While most of the commands executed by the shell are programs, some are not. Some commands are built into the shell.

The reason is actually quite simple. If you want to change the current directory, you use the function chdir(). The catch is that the current directory is a property of the process. So, if you wrote a program called cd to change the current directory, it would change only its own current directory and then terminate; the current directory of its parent process would not change. Instead, the shell process itself has to call chdir() to update its own current directory. Then, when it starts child processes, they inherit the new directory as well.

Similarly, if there were a program called exit, it would have no way to make the shell that called it exit. That command must also be built into the shell. Also, most shells are configured by running configuration scripts such as ~/.bashrc. These scripts use commands that change the behavior of the shell, and such commands can only change the shell’s behavior if they are implemented within the shell process itself.

So it makes sense that we need to add some commands to the shell itself. The commands I added to my shell are cd, exit, and help. Here are their implementations:

/* Function declarations for built-in shell commands: */
int lsh_cd(char **args);
int lsh_help(char **args);
int lsh_exit(char **args);

/* A list of built-in commands and their corresponding functions. */
char *builtin_str[] = {
  "cd",
  "help",
  "exit"
};

int (*builtin_func[]) (char **) = {
  &lsh_cd,
  &lsh_help,
  &lsh_exit
};

int lsh_num_builtins(void) {
  return sizeof(builtin_str) / sizeof(char *);
}

/* Function implementations of built-in commands. */
int lsh_cd(char **args)
{
  if (args[1] == NULL) {
    fprintf(stderr, "lsh: expected argument to \"cd\"\n");
  } else {
    if (chdir(args[1]) != 0) {
      perror("lsh");
    }
  }
  return 1;
}

int lsh_help(char **args)
{
  int i;
  printf("Stephen Brennan's LSH\n");
  printf("Type program names and arguments, and hit enter.\n");
  printf("The following are built in:\n");

  for (i = 0; i < lsh_num_builtins(); i++) {
    printf(" %s\n", builtin_str[i]);
  }

  printf("Use the man command for information on other programs.\n");
  return 1;
}

int lsh_exit(char **args)
{
  return 0;
}

This code has three parts. The first part contains forward declarations of my functions. A forward declaration declares (but does not yet define) a symbol, so that you can use its name before its definition. I do this because lsh_help() uses the array of built-in commands, and that array in turn contains lsh_help(). The cleanest way to break this dependency cycle is with forward declarations.

The second part is an array of built-in command names, followed by an array of their corresponding functions. The idea is that, in the future, you can add built-in commands simply by modifying these arrays, rather than by editing a huge “switch” statement somewhere in the code. If you’re confused by the declaration of builtin_func, that’s OK! I am too. It’s an array of function pointers (to functions that take an array of strings and return an int). Any declaration involving function pointers in C can get complicated. I still have to look up how function pointers are declared myself!

Finally, I implemented each function. The lsh_cd() function first checks for the existence of its second argument and prints an error message if it does not. It then calls chdir(), checks for errors, and returns. The help function prints a nice message, along with the names of all the built-in functions. The exit function returns 0, which is the signal for the command loop to exit.

Putting together built-in commands and processes

The last missing piece of our program is implementing lsh_execute(). This function starts either a built-in command or a process. If you’ve read all the way to this point, you’ll know that we only have one very simple function left to implement:

int lsh_execute(char **args)
{
  int i;

  if (args[0] == NULL) {
    // The user entered an empty command
    return 1;
  }

  for (i = 0; i < lsh_num_builtins(); i++) {
    if (strcmp(args[0], builtin_str[i]) == 0) {
      return (*builtin_func[i])(args);
    }
  }

  return lsh_launch(args);
}

All this function does is compare the command against each built-in command and, if one matches, run it. If no built-in command matches, we call lsh_launch() to start a process. Note that if the user entered an empty string or a string containing only whitespace, args contains just a null pointer. So, we need to check for this case at the beginning.

Put it all together

That’s all the code for this shell. If you’ve read it, you should have a complete understanding of how the shell works. To try it out (on a Linux machine), you need to copy the code snippet into a file (main.c) and compile it. Make sure you include only one implementation of lsh_read_line() in your code. You need to include the following header files at the top of the file. I’ve added comments so you know where each function comes from.

  • #include <sys/wait.h>
    • waitpid() and associated macros
  • #include <unistd.h>
    • chdir()
    • fork()
    • execvp()
    • pid_t
  • #include <stdlib.h>
    • malloc()
    • realloc()
    • free()
    • exit()
    • EXIT_SUCCESS, EXIT_FAILURE
  • #include <stdio.h>
    • fprintf()
    • printf()
    • stderr
    • getchar()
    • getline() (only for the getline-based lsh_read_line())
    • perror()
  • #include <string.h>
    • strcmp()
    • strtok()

Once you have the code and header files ready, simply run gcc -o main main.c to compile it, and then ./main to run it.

Alternatively, you can get the code from GitHub. This link jumps directly to the current version of the code as I write this article — I may update the code in the future to add some new features. If the code is updated, I will try to update the code details and implementation ideas in this article.

Conclusion

If you read this article and wondered how on earth I knew how to use these system calls, the answer is simple: man pages. Section 3p of the manual has thorough documentation of every system call. If you know what you’re looking for and just want to know how to use it, the man pages are your best friend. If you don’t know what interfaces the C library and Unix offer you, I recommend reading the POSIX specification, especially Chapter 13, “Headers”, where you can find each header file and everything it is required to define.

Obviously, this shell is not feature-rich enough. Some notable omissions are:

  • Only whitespace is used to separate arguments, and quotes and backslash escapes are not considered.
  • No pipes or redirects.
  • Too few built-in commands.
  • There are no wildcards.

Implementing these features is fun, but it’s more than I can handle in a single article. If I start implementing any of these, I’ll be sure to write a follow-up article about it. But I encourage readers to try these features out for themselves. If you’re successful, let me know in the comments section below, I’d love to see your code.

Finally, thanks for reading this tutorial (if anyone did). I had fun writing it, and I hope you have fun reading it. Let me know what you think in the comments section!

Update: In an earlier version of this article, I had a couple of nasty bugs in lsh_split_line() that just happened to cancel each other out. Thanks to /u/munmap (and other commenters) on Reddit for catching them! Find out what I did wrong here.

Update 2: Thanks to GitHub user ghswa for contributing some null pointer checks for malloc() that I had forgotten. They also pointed out that the man page for getline() specifies that its first argument must be freeable, so in my lsh_read_line() implementation that uses getline(), line should be initialized to NULL.

If you find any mistakes in this translation or other areas that could be improved, you are welcome to revise it and submit a PR to the Nuggets Translation Project, and you can earn reward points for doing so. The permanent link at the beginning of this article is the Markdown link to this article on GitHub.


The Nuggets Translation Project is a community that translates high-quality English technical articles from around the Internet and shares them on Nuggets. Its content covers Android, iOS, front-end, back-end, blockchain, product, design, artificial intelligence, and other fields. For more high-quality translations, follow the Nuggets Translation Project, its official Weibo account, and its Zhihu column.