Preface

Sometimes we need to strip the comments from source code, either to keep information private or simply to make the code easier to read. I first considered regular expressions, but found them cumbersome to get right. A state machine, by contrast, lets us group many situations into a single state and handle each state separately, which greatly simplifies the problem. This article takes the state-machine approach.

Remove C/C++ code comments

Cases to consider

  • //
  • /* */
  • // and /* */ nested within each other (note that /* */ cannot be nested inside another /* */)
  • Comments continued onto the next line (a \ at the end of the line)
  • / and * appearing inside character literals
  • // and /* */ appearing inside string literals
  • Strings continued onto the next line (a \ at the end of the line)
  • A / in a header-file path, e.g. #include <sys/types.h>
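The line-continuation case is the least obvious one in this list. Below is a minimal sketch that handles only // comments and the trailing-backslash splice, assuming "\n" line endings; LineSpliceDemo and strip are hypothetical names, and strings, character literals and /* */ comments are deliberately ignored. Newlines inside the comment are still emitted so the line count does not change.

```java
public class LineSpliceDemo {
    enum State { CODE, SLASH, NOTE_SINGLELINE, BACKSLASH }

    // Strips only // comments, honouring a trailing backslash that continues
    // the comment onto the next line. Assumes "\n" line endings (an assumption
    // of this sketch). Newlines inside a comment are kept to preserve line numbers.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (char c : src.toCharArray()) {
            switch (state) {
                case CODE:
                    if (c == '/') state = State.SLASH;
                    else out.append(c);
                    break;
                case SLASH:
                    if (c == '/') {
                        state = State.NOTE_SINGLELINE;
                    } else {
                        out.append('/').append(c); // a lone '/' is ordinary code
                        state = State.CODE;
                    }
                    break;
                case NOTE_SINGLELINE:
                    if (c == '\\') {
                        state = State.BACKSLASH;
                    } else if (c == '\n') {
                        out.append('\n');          // comment ends, keep the newline
                        state = State.CODE;
                    }
                    break;
                case BACKSLASH:
                    if (c == '\n') {
                        out.append('\n');          // spliced: the next line is still comment
                        state = State.NOTE_SINGLELINE;
                    } else if (c != '\\') {
                        state = State.NOTE_SINGLELINE; // backslash was not at end of line
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "continued" belongs to the comment; "int b;" is code again
        System.out.print(strip("int a; // comment \\\ncontinued\nint b;\n"));
    }
}
```

Note that the splice only pulls in the one following line: after that line ends, the machine is back in CODE, even if the continued line is blank.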

Description of state transitions

The blog post listed first under the references explains the state-machine approach to removing all comments from C/C++ code very well. Building on that post, I made the following changes and optimizations:

  • The original post did not handle comments such as /***/, where the total number of * characters is odd; this has been fixed
  • Targets the Windows platform and supports the Windows newline \r\n (note: if the original file does not end with a newline, one is added automatically)
  • The states are now enumeration constants
  • State transitions were rewritten from if … else if … else chains to a switch … case structure, which is clearer and, for large inputs, more efficient
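To see why the odd-star case matters, here is a minimal sketch that handles only /* */ comments (no strings, character literals or // comments). BlockCommentDemo, strip and the fixed flag are illustrative names, not part of the program at the end. With fixed set to false, any character other than '/' sends NOTE_MULTILINE_STAR back to NOTE_MULTILINE, which is one transition that misses /***/; with fixed set to true, another '*' keeps the machine in NOTE_MULTILINE_STAR.

```java
public class BlockCommentDemo {
    enum State { CODE, SLASH, NOTE_MULTILINE, NOTE_MULTILINE_STAR }

    // Strips only /* */ comments so the NOTE_MULTILINE_STAR transition can be
    // observed in isolation. fixed == false models the missing self-loop on '*'.
    static String strip(String src, boolean fixed) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (char c : src.toCharArray()) {
            switch (state) {
                case CODE:
                    if (c == '/') state = State.SLASH;
                    else out.append(c);
                    break;
                case SLASH:
                    if (c == '*') {
                        state = State.NOTE_MULTILINE;
                    } else {
                        out.append('/').append(c); // not a comment opener
                        state = State.CODE;
                    }
                    break;
                case NOTE_MULTILINE:
                    if (c == '*') state = State.NOTE_MULTILINE_STAR;
                    break;
                case NOTE_MULTILINE_STAR:
                    if (c == '/') {
                        state = State.CODE;                // "*/" closes the comment
                    } else if (c == '*' && fixed) {
                        state = State.NOTE_MULTILINE_STAR; // a further '*' may still close it
                    } else {
                        state = State.NOTE_MULTILINE;
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("a/***/b", true));  // prints "ab"
        System.out.println(strip("a/***/b", false)); // prints "a": the comment never closes
    }
}
```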
Figure: state transition diagram for removing code comments

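Besides changing state, most states must also process characters (or strings) to keep the output correct: code and literals are copied through, and line breaks inside comments are emitted so that line numbers are preserved. See the code at the end for details.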
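As a minimal illustration of states that emit output as well as switch state, the sketch below strips only // comments while copying string literals through unchanged, so that a // inside a string survives. StringAwareDemo and strip are hypothetical names; character literals, /* */ comments and line continuations are deliberately left out.

```java
public class StringAwareDemo {
    enum State { CODE, SLASH, NOTE_SINGLELINE, CODE_STRING, STRING_ESCAPE_SEQUENCE }

    // Strips // comments but copies string literals verbatim, so "//" inside
    // a string is not mistaken for the start of a comment.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (char c : src.toCharArray()) {
            switch (state) {
                case CODE:
                    if (c == '/') {
                        state = State.SLASH;
                    } else {
                        out.append(c);
                        if (c == '"') state = State.CODE_STRING;
                    }
                    break;
                case SLASH:
                    if (c == '/') {
                        state = State.NOTE_SINGLELINE;
                    } else {
                        out.append('/').append(c);
                        state = (c == '"') ? State.CODE_STRING : State.CODE;
                    }
                    break;
                case NOTE_SINGLELINE:
                    if (c == '\n') {              // the comment is dropped, the newline kept
                        out.append(c);
                        state = State.CODE;
                    }
                    break;
                case CODE_STRING:
                    out.append(c);
                    if (c == '\\') state = State.STRING_ESCAPE_SEQUENCE;
                    else if (c == '"') state = State.CODE;
                    break;
                case STRING_ESCAPE_SEQUENCE:
                    out.append(c);                // an escaped char such as \" never ends the string
                    state = State.CODE_STRING;
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The "//" and the escaped quote inside the literal are kept; "// note" is removed
        System.out.print(strip("s = \"http://a \\\" // b\"; // note\n"));
    }
}
```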

Remove Java code comments

Cases to consider

  • //
  • /* */
  • /** */
  • // and /* */ nested within each other (note that block comments cannot nest: neither /* */ nor /** */ can appear inside another /* */ or /** */)
  • // and /** */ nested within each other
  • / and * appearing inside character literals
  • //, /* */, and /** */ appearing inside string literals

Description of state transitions

As you can see, Java's comment rules are simpler: /** */ is fully covered by the /* */ states. Java also has no line-continued comments or strings, so its state machine is much simpler; I have not drawn it here, but interested readers can draw it themselves. In other words, the same program that removes C/C++ comments can also remove Java comments, with one caveat: it treats a Java // comment that ends in a backslash as continuing onto the next line, which Java does not do.

The program

package code_tools;


import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;

import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.BufferedWriter;

import java.io.IOException;

import java.util.Scanner;

/**
 * @author xiaoxi666
 * @version 1.0.0 2017.12.01
 */

public class deleteCAndCplusplusAndJavaNote {

    /** States of the comment-removal state machine. */
    enum State {
        CODE, // Normal code
        SLASH, // Slash (possible start of a comment)
        NOTE_MULTILINE, // Multi-line comment
        NOTE_MULTILINE_STAR, // Multi-line comment, just saw *
        NOTE_SINGLELINE, // Single-line comment
        BACKSLASH, // Backslash in a single-line comment (possible line continuation)
        CODE_CHAR, // Character literal
        CHAR_ESCAPE_SEQUENCE, // Escape sequence in a character literal
        CODE_STRING, // String literal
        STRING_ESCAPE_SEQUENCE // Escape sequence in a string literal
    }

    /**
     * Removes comments from code and returns the result as a String.
     *
     * @param strToHandle the code whose comments are to be removed
     * @return the code with comments removed
     */
    public static String delete_C_Cplusplus_Java_Note(String strToHandle) {
        StringBuilder builder = new StringBuilder();

        State state = State.CODE;// Initial state
        for (int i = 0; i < strToHandle.length(); ++i) {
            char c = strToHandle.charAt(i);
            switch (state) {
            case CODE:
                if (c == '/') {
                    state = State.SLASH;
                } else {
                    builder.append(c);
                    if (c == '\'') {
                        state = State.CODE_CHAR;
                    } else if (c == '"') {
                        state = State.CODE_STRING;
                    }
                }
                break;
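            case SLASH:
                if (c == '*') {
                    state = State.NOTE_MULTILINE;
                } else if (c == '/') {
                    state = State.NOTE_SINGLELINE;
                } else {
                    builder.append('/');
                    builder.append(c);
                    // The character after a lone '/' may itself open a literal, e.g. a/'b'
                    if (c == '\'') {
                        state = State.CODE_CHAR;
                    } else if (c == '"') {
                        state = State.CODE_STRING;
                    } else {
                        state = State.CODE;
                    }
                }
                break;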
            case NOTE_MULTILINE:
                if (c == '*') {
                    state = State.NOTE_MULTILINE_STAR;
                } else {
                    if (c == '\n') {
                        builder.append("\r\n");// Keep the blank line (it could also be removed)
                    }
                    state = State.NOTE_MULTILINE;// Stay in the current state
                }
                break;
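            case NOTE_MULTILINE_STAR:
                if (c == '/') {
                    state = State.CODE;
                } else if (c == '*') {
                    state = State.NOTE_MULTILINE_STAR;// Stay: a further '*' may still close the comment
                } else {
                    if (c == '\n') {
                        builder.append("\r\n");// Keep the line, e.g. one ending in " *" inside a block comment
                    }
                    state = State.NOTE_MULTILINE;
                }
                break;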
            case NOTE_SINGLELINE:
                if (c == '\\') {
                    state = State.BACKSLASH;
                } else if (c == '\n') {
                    builder.append("\r\n");
                    state = State.CODE;
                } else {
                    state = State.NOTE_SINGLELINE;// Stay in the current state
                }
                break;
            case BACKSLASH:
                if (c == '\\' || c == '\r') {// On Windows, the newline is \r\n
                    state = State.BACKSLASH;// Stay in the current state
                } else if (c == '\n') {
                    builder.append("\r\n");// Keep the blank line (it could also be removed)
                    state = State.NOTE_SINGLELINE;// Only the next line continues the comment
                } else {
                    state = State.NOTE_SINGLELINE;
                }
                break;
            case CODE_CHAR:
                builder.append(c);
                if (c == '\\') {
                    state = State.CHAR_ESCAPE_SEQUENCE;
                } else if (c == '\'') {
                    state = State.CODE;
                } else {
                    state = State.CODE_CHAR;// Stay in the current state
                }
                break;
            case CHAR_ESCAPE_SEQUENCE:
                builder.append(c);
                state=State.CODE_CHAR;
                break;
            case CODE_STRING:
                builder.append(c);
                if (c == '\\') {
                    state = State.STRING_ESCAPE_SEQUENCE;
                } else if (c == '"') {
                    state = State.CODE;
                } else {
                    state = State.CODE_STRING;// Stay in the current state
                }
                break;
            case STRING_ESCAPE_SEQUENCE:
                builder.append(c);
                state=State.CODE_STRING;
                break;
            default:
                break;
            }
        }
        return builder.toString();
    }

    /**
     * Reads the code from the specified file and returns it as a String.
     * The input file is assumed to be UTF-8 encoded.
     *
     * @param inputFileName the file whose comments are to be removed
     * @return the content of the file as a String
     */
    public static String readFile(String inputFileName) {
        StringBuilder builder = new StringBuilder();
        try {
            FileInputStream fis = new FileInputStream(inputFileName);
            InputStreamReader dis = new InputStreamReader(fis, "UTF-8");// Read as UTF-8, not the platform default
            BufferedReader reader = new BufferedReader(dis);
            String s;
            // Read one line at a time until readLine() returns null (end of file)
            while ((s = reader.readLine()) != null) {
                builder.append(s);
                builder.append("\r\n");// Windows newline character
            }
            reader.close();
            dis.close();
            fis.close();
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
        return builder.toString();
    }

    /**
     * Saves the code with comments removed to the specified new file.
     *
     * @param outputFileName the name of the file in which to save the code without comments
     * @param strHandled the code with comments removed
     */
    public static void writeFile(String outputFileName, String strHandled) {
        try {
            FileOutputStream fos = new FileOutputStream(outputFileName);
            OutputStreamWriter dos = new OutputStreamWriter(fos, "UTF-8");// Write as UTF-8
            BufferedWriter writer = new BufferedWriter(dos);
            writer.write(strHandled);
            writer.close();
            dos.close();
            fos.close();
            System.out.println("Code without comments has been saved successfully to " + outputFileName);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Reads the file to be processed, removes its comments, and writes the processed code to a new file.
     *
     * @param args not used
     */
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        // The file whose comments will be removed
        System.out.println("Name of the file whose comments will be removed:");
        String inputFileName = in.nextLine();
        // The file in which to save the code without comments
        System.out.println("Name of the file in which to save the code without comments:");
        String outputFileName = in.nextLine();
        String strToHandle = readFile(inputFileName);
        String strHandled = delete_C_Cplusplus_Java_Note(strToHandle);
        writeFile(outputFileName, strHandled);
    }
}

Notes

  • The program preserves the lines that comments occupied: code outside the comments is unchanged, the number of lines stays the same, and lines that held only comments become blank.
  • The program does not check file extensions (so code saved in a .txt file can be processed too); add such a check yourself if you need one.
  • The program targets Windows; on other platforms such as Linux and macOS, replace the "\r\n" line breaks accordingly. Input files are assumed to be UTF-8.
  • If you are interested, you could wrap the program in a graphical interface that lets you drag files onto it for processing, which would be more convenient.
  • The program has been tested extensively without any bugs being found; if you do find one, please report it.
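For the extension check and the platform-specific newline mentioned in these notes, here is a small sketch. The class name, helper names and the extension list are assumptions to adapt, not part of the original program; rather than hard-coding another newline, it converts the "\r\n" the program emits into System.lineSeparator().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PortabilityHelpers {
    // Hypothetical whitelist of extensions; adjust to the languages you process.
    private static final List<String> SOURCE_EXTENSIONS =
            Arrays.asList(".c", ".h", ".cpp", ".hpp", ".cc", ".java");

    // Returns true if the file name ends with a whitelisted extension (case-insensitive).
    static boolean hasSourceExtension(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : SOURCE_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // Replaces every Windows "\r\n" with the given line separator.
    static String convertNewlines(String text, String separator) {
        return text.replace("\r\n", separator);
    }

    public static void main(String[] args) {
        System.out.println(hasSourceExtension("Main.JAVA")); // prints true
        System.out.println(hasSourceExtension("notes.txt")); // prints false
        // System.lineSeparator() is "\n" on Linux/macOS and "\r\n" on Windows
        System.out.print(convertNewlines("int a;\r\n\r\nint b;\r\n", System.lineSeparator()));
    }
}
```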

References

  • How do I remove all comments from C/C++ code? On the programming idea of state machines
  • Can someone write a regular expression that removes comments?
  • Regular expressions remove comments from code