Learn how to construct a C file and write a C main function to handle command line arguments successfully.

I know, Python and JavaScript are what the kids write all their crazy “apps” in these days. But don’t be so quick to write C off: it has a lot to offer, and it’s simple. If you need speed, writing in C might be your answer. If you’re looking for a stable career or want to learn how to hunt down NULL pointer dereferences, C might be the answer for you too! In this article, I’ll explain how to structure a C file and write a C main function that handles command line arguments successfully.

Me: A diehard Unix system programmer.

You: someone with an editor, a C compiler, and time to kill.

Let’s get to work.

A boring but correct C program

Parody O’Reilly book cover, “Hating Other People’s Code”

C programs begin with the main() function and are usually stored in a file named main.c.

/* main.c */
int main(int argc, char *argv[]) {

}

This program compiles but does nothing.

$ gcc main.c
$ ./a.out -o foo -vv
$

Correct but boring.

The main function is unique.

The main() function is the first function in your program that runs when it begins executing, but it is not the first function executed. The first function is _start(), which is usually provided by the C runtime and linked in automatically when your program is compiled. The details depend heavily on the operating system and compiler toolchain, so I’m going to pretend I didn’t mention it.

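The main() function takes two arguments, traditionally called argc and argv, and returns a signed integer. Most Unix environments expect programs to return 0 (zero) on success and a nonzero value on failure; -1 (minus one) is a common choice, although the shell only sees the low eight bits of the status, so it shows up as 255.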

Argument  Name             Description
argc      Argument count   Length of the argument vector
argv      Argument vector  Array of character pointers

The argument vector argv is a tokenized representation of the command line that calls your program. In the above example, argv would be a list of the following strings:

argv = [ "/path/to/a.out", "-o", "foo", "-vv" ];

In normal use, the argument vector holds at least one string in its first index, argv[0], which is the name the program was invoked with (often, but not always, its full path).

Anatomy of a main.c file

When I write main.c from scratch, its structure usually looks like this:

/* main.c */
/* 0 copyright/licensing */
/* 1 includes */
/* 2 defines */
/* 3 external declarations */
/* 4 typedefs */
/* 5 global variable declarations */
/* 6 function prototypes */

int main(int argc, char *argv[]) {
/* 7 command-line parsing */
}

/* 8 function declarations */

I’m going to talk about each of these numbered sections, except for the one numbered zero. If you must put copyright or licensing text in your source code, put it there.

Another thing I don’t want to talk about is comments.

"Comments lie." - A cynical but smart and good-looking programmer.

Instead of using comments, use meaningful function and variable names.

Programmers are inherently lazy, and once you add comments, you have doubled your maintenance load. If you change or refactor the code, you need to update or extend the comments. Over time, the code can change beyond recognition and end up completely different from what the comments describe.

If you must write comments, don’t write about what the code is doing; instead, write about why the code is being written that way. Write comments that you will read in five years when you have forgotten all about the code. The fate of the world depends on you. No pressure.

1. Includes

The first things I add to a main.c file are includes, which make a large number of standard C library functions and variables available to my program. The C library does a lot of things. Browse through the headers in /usr/include to see what they can do.

The #include string is a C preprocessor (cpp) directive that pulls the referenced file, in its entirety, into the current file. Header files in C are usually named with a .h extension and should not contain any executable code, only macros, defines, typedefs, external variables, and function prototypes. The string #include <header.h> tells cpp to look for a file named header.h in the system-defined header path, which is usually /usr/include.

/* main.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libgen.h>
#include <errno.h>
#include <string.h>
#include <getopt.h>
#include <stdint.h>

This is the minimum set of global includes that I use by default, and here is what each one brings in:

#include file  What it provides
stdio          FILE, stdin, stdout, stderr, and the fprintf() family of functions
stdlib         malloc(), calloc(), and realloc(), plus EXIT_FAILURE and EXIT_SUCCESS
unistd         POSIX declarations, including getopt() and its external variables
libgen         The basename() function
errno          The external errno variable and all the values it can take on
string         memcpy(), memset(), and the strlen() family of functions
getopt         External optarg, opterr, optind, and the getopt() function
stdint         Typedef shortcuts such as uint32_t and uint64_t

2. Defines

/* main.c */
<...>

#define OPTSTR "vi:o:f:h"
#define USAGE_FMT "%s [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]"
#define ERR_FOPEN_INPUT "fopen(input, r)"
#define ERR_FOPEN_OUTPUT "fopen(output, w)"
#define ERR_DO_THE_NEEDFUL "do_the_needful blew up"
#define DEFAULT_PROGNAME "george"

This doesn’t make a lot of sense right now, but the OPTSTR define is where I state which command-line switches the program accepts. Refer to the getopt(3) man page to see how OPTSTR affects the behavior of getopt().

USAGE_FMT defines a printf()-style format string that is used in the usage() function.

I also like to place string constants in the #define section of the file. Gathering them together makes it easier to correct spelling, reuse messages, and internationalize messages if needed.

Finally, use all uppercase letters when naming a #define to distinguish it from variable and function names. You can run the words together or separate them with underscores if you want; just make sure they’re all uppercase.

3. External declarations

/* main.c */
<...>

extern int errno;
extern char *optarg;
extern int opterr, optind;

An extern declaration brings a name into the namespace of the current compilation unit (that is, the “file”) and allows the program to access that variable. Here, I’ve brought in the declarations for three integer variables and a character pointer. The opt-prefixed variables are used by the getopt() function, and the C library uses errno as an out-of-band communication channel to report why a function might have failed. Strictly speaking, the headers included earlier already declare all four, and errno is often a macro rather than a plain int, so these lines mostly serve as documentation.

4. Typedefs

/* main.c */
<...>

typedef struct {
  int           verbose;
  uint32_t      flags;
  FILE         *input;
  FILE         *output;
} options_t;

After external declarations, I like to declare typedefs for structures, unions, and enumerations. How you name a typedef is largely a matter of convention; I’m a big fan of using the _t suffix to indicate that the name is a type. In this example, I’ve declared options_t as a struct with four members. C is a whitespace-neutral programming language, so I use whitespace to line up the field names in the same column. I just like the way it looks. For the pointer declarations, I attach the asterisk to the name to make it clear that it’s a pointer.

5. Global variable declarations

/* main.c */
<...>

int dumb_global_variable = -11;

Global variables are a bad idea and you should never use them. But if you must use global variables, declare them here and be sure to give them a default value. Seriously, don’t use global variables.

6. Function prototypes

/* main.c */
<...>

void usage(char *progname, int opt);
int  do_the_needful(options_t *options);

When writing functions, add them after the main() function instead of before it, and put the function prototypes here. Early C compilers used a single-pass strategy, which meant that every symbol (variable or function name) you used in your program had to be declared before you used it. Modern compilers do far more work per file, but the language rule has not gone away: since C99, calling a function that has no declaration in scope is not valid C, and compilers will warn about it or refuse outright. So prototype your functions and keep doing so.

Of course, I always include a usage() function, which main() calls when it doesn’t understand what you’re passing in from the command line.

7. Command-line parsing

/* main.c */
<...>

int main(int argc, char *argv[]) {
    int opt;
    options_t options = { 0, 0x0, stdin, stdout };

    opterr = 0;

    while ((opt = getopt(argc, argv, OPTSTR)) != EOF)
       switch(opt) {
           case 'i':
              if (!(options.input = fopen(optarg, "r")) ){
                 perror(ERR_FOPEN_INPUT);
                 exit(EXIT_FAILURE);
                 /* NOTREACHED */
              }
              break;

           case 'o':
              if (!(options.output = fopen(optarg, "w")) ){
                 perror(ERR_FOPEN_OUTPUT);
                 exit(EXIT_FAILURE);
                 /* NOTREACHED */
              }    
              break;
              
           case 'f':
              options.flags = (uint32_t )strtoul(optarg, NULL, 16);
              break;

           case 'v':
              options.verbose += 1;
              break;

           case 'h':
           default:
              usage(basename(argv[0]), opt);
              /* NOTREACHED */
              break;
       }

    if (do_the_needful(&options) != EXIT_SUCCESS) {
       perror(ERR_DO_THE_NEEDFUL);
       exit(EXIT_FAILURE);
       /* NOTREACHED */
    }

    return EXIT_SUCCESS;
}

Okay, that’s a lot of code. The purpose of this main() function is to collect the arguments the user supplies, perform the most basic input validation, and then pass the collected arguments to the functions that use them. This example declares an options variable initialized with default values and parses the command line, updating options as needed.

At the heart of the main() function is a while loop that uses getopt() to traverse argv looking for command-line options and their arguments, if any. OPTSTR, defined earlier in the file, is the template that drives getopt() behavior. The opt variable accepts the character value of any command line option found by getopt(), and the program’s response to detecting the command line option occurs in the switch statement.

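If you were paying attention, you might ask: why is opt declared as a 32-bit int when it is expected to hold an 8-bit char? It turns out that getopt() returns an int, and it returns -1 when it reaches the end of argv. I check for that with EOF (the end-of-file marker), which is -1 in common C libraries. Whether a plain char is signed depends on the platform, so a char might never compare equal to that value, and in any case I like to match variables to their function return values.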

When a known command line option is detected, option-specific behavior happens. Some options take an argument, which is specified in OPTSTR with a trailing colon. When an option takes an argument, the next string in argv is made available to the program through the externally defined variable optarg. I use optarg to open files for reading and writing, or to convert a command-line argument from a string to an integer value.

Here are a few key points about code style:

  • Initialize opterr to 0, which stops getopt() from printing its own error messages (it still returns ? for an unknown option).
  • Use exit(EXIT_FAILURE); or exit(EXIT_SUCCESS); in the middle of main().
  • /* NOTREACHED */ is one of my favorite lint directives.
  • Use return EXIT_SUCCESS; at the end of functions that return an int.
  • Explicitly cast implicit type conversions.

The command-line signature for this program, once compiled, looks like this:

$ ./a.out -h
a.out [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]

In fact, that is what usage() emits to stderr once the program is compiled.

8. Function declarations

/* main.c */
<...>

void usage(char *progname, int opt) {
   fprintf(stderr, USAGE_FMT, progname ? progname : DEFAULT_PROGNAME);
   exit(EXIT_FAILURE);
   /* NOTREACHED */
}

int do_the_needful(options_t *options) {

   if (!options) {
     errno = EINVAL;
     return EXIT_FAILURE;
   }

   if (!options->input || !options->output) {
     errno = ENOENT;
     return EXIT_FAILURE;
   }

   /* XXX do needful stuff */

   return EXIT_SUCCESS;
}

Finally, I write the functions that aren’t boilerplate. In this example, do_the_needful() accepts a pointer to an options_t structure. I verify that the options pointer is not NULL and then go on to verify the input and output structure members. If either test fails, the function returns EXIT_FAILURE and, by setting the external global variable errno to a conventional error code, tells the caller the general reason for the failure. The caller can use the convenience function perror() to emit an easy-to-read error message based on the value of errno.

Functions should almost always validate their input in some way. If full validation is expensive, try to do it once and treat the validated data as immutable. The usage() function validates the progname argument using a conditional expression inside the fprintf() call. The usage() function then exits, so I don’t bother setting errno or making a fuss about using a correct program name.

The biggest mistake I want to avoid here is dereferencing a NULL pointer. That causes the operating system to send my process a special signal called SIGSEGV, which results in unavoidable death. The last thing users want to see is a crash caused by SIGSEGV. It is much better to catch the NULL pointer, emit a more helpful error message, and shut the program down gracefully.

Some people complain about having multiple return statements in the body of a function, and they babble about “continuity of control flow.” Honestly, if there is an error in the middle of a function, you should return this error condition. Writing a bunch of nested if statements with a single return is never a “good idea” ™.

Finally, if you write a function that takes more than four arguments, consider bundling them into a structure and passing a pointer to that structure. This makes the function signature simpler, easier to remember, and less error-prone to call. It also makes calling the function slightly faster because fewer things need to be copied onto the function’s stack frame. In practice, this only matters if the function is called millions or billions of times. If that doesn’t make sense, that’s fine.

Wait, I thought you said no comments!?!!

In the do_the_needful() function, I wrote a special type of comment that was designed as a placeholder, not to explain the code:

/* XXX do needful stuff */

When you are in the zone, sometimes you don’t want to stop and write some particularly complicated piece of code. You promise yourself you’ll do it later, just not now. That’s where I leave myself a note to come back to. I insert a comment with an XXX prefix and a short remark describing what needs to be done. Later, when I have more time, I search the source code for XXX. It doesn’t matter what prefix you use; just make sure it’s unlikely to show up in your code base in another context, such as a function name or variable.

Putting it all together

Well, when you compile this program, it still does almost nothing. But now you have a solid skeleton from which to build your own command-line parsing C programs.

/* main.c - the complete listing */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libgen.h>
#include <errno.h>
#include <string.h>
#include <getopt.h>
#include <stdint.h>

#define OPTSTR "vi:o:f:h"
#define USAGE_FMT "%s [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]"
#define ERR_FOPEN_INPUT "fopen(input, r)"
#define ERR_FOPEN_OUTPUT "fopen(output, w)"
#define ERR_DO_THE_NEEDFUL "do_the_needful blew up"
#define DEFAULT_PROGNAME "george"

extern int errno;
extern char *optarg;
extern int opterr, optind;

typedef struct {
  int           verbose;
  uint32_t      flags;
  FILE         *input;
  FILE         *output;
} options_t;

int dumb_global_variable = -11;

void usage(char *progname, int opt);
int  do_the_needful(options_t *options);

int main(int argc, char *argv[]) {
    int opt;
    options_t options = { 0, 0x0, stdin, stdout };

    opterr = 0;

    while ((opt = getopt(argc, argv, OPTSTR)) != EOF)
       switch(opt) {
           case 'i':
              if (!(options.input = fopen(optarg, "r")) ){
                 perror(ERR_FOPEN_INPUT);
                 exit(EXIT_FAILURE);
                 /* NOTREACHED */
              }
              break;

           case 'o':
              if (!(options.output = fopen(optarg, "w")) ){
                 perror(ERR_FOPEN_OUTPUT);
                 exit(EXIT_FAILURE);
                 /* NOTREACHED */
              }    
              break;
              
           case 'f':
              options.flags = (uint32_t )strtoul(optarg, NULL, 16);
              break;

           case 'v':
              options.verbose += 1;
              break;

           case 'h':
           default:
              usage(basename(argv[0]), opt);
              /* NOTREACHED */
              break;
       }

    if (do_the_needful(&options) != EXIT_SUCCESS) {
       perror(ERR_DO_THE_NEEDFUL);
       exit(EXIT_FAILURE);
       /* NOTREACHED */
    }

    return EXIT_SUCCESS;
}

void usage(char *progname, int opt) {
   fprintf(stderr, USAGE_FMT, progname ? progname : DEFAULT_PROGNAME);
   exit(EXIT_FAILURE);
   /* NOTREACHED */
}

int do_the_needful(options_t *options) {

   if (!options) {
     errno = EINVAL;
     return EXIT_FAILURE;
   }

   if (!options->input || !options->output) {
     errno = ENOENT;
     return EXIT_FAILURE;
   }

   /* XXX do needful stuff */

   return EXIT_SUCCESS;
}

Now you are ready to write C that is easier to maintain. If you have any questions or feedback, please share it in the comments.


Via: opensource.com/article/19/…

By Erik O’Shaughnessy, lujun9972

This article is originally compiled by LCTT and released in Linux China