RTF is short for Rich Text Format. It is a document format similar to DOC (Word document) files, it has good compatibility, and it can be opened and edited with WordPad in the Windows accessories. RTF is a very popular file structure that is supported by many text editors. General formatting settings, such as font and paragraph settings, page settings, and other information, can be stored in RTF, which allows Word and WPS files to be exchanged to a certain extent.

RTF Syntax

An RTF file consists of unformatted text, control words, control symbols, and groups. For ease of transport, a standard RTF file can consist of only 7-bit ASCII characters. (Converters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect 8-bit characters.) There is no set maximum line length for an RTF file.

A control word is a specially formatted command that RTF uses to mark printer control codes and the information that applications use to manage documents. A control word cannot be longer than 32 characters. A control word takes the following form:

\LetterSequence<Delimiter>

Note that each control word begins with a backslash. The LetterSequence consists of lowercase alphabetic characters (a to z). RTF is case sensitive. The delimiter marks the end of an RTF control word and can be one of the following:

· A space. In this case, the space is part of the control word.
· A digit or a hyphen (-), which indicates that a numeric parameter follows. The digit sequence is then delimited by a space or by any character other than a letter or a digit. The parameter can be positive or negative and generally ranges from -32767 to 32767, although Word tends to restrict the range to -31680 to 31680. For a small number of keywords (specifically \bin, \revdttm, and some picture properties), Word allows values from -2,147,483,648 to 2,147,483,648. An RTF parser should be able to handle an arbitrary string of digits as a legal value for a keyword. If a numeric parameter immediately follows the control word, the parameter becomes part of the control word, and the control word is then delimited by a space or a nonalphabetic, nonnumeric character in the same way as any other control word.
· Any character other than a letter or a digit. In this case, the delimiting character terminates the control word but is not part of it.

If a space delimits the control word, the space does not appear in the document. Any characters following the delimiter, including spaces, appear in the document. For this reason, use spaces only where necessary, not merely to break up RTF code.

RTF File Contents

An RTF file has the following syntax:

'{' <header> <document> '}'

This is the standard RTF syntax, and any RTF reader should be able to correctly interpret RTF files written to it. It is worth repeating that an RTF reader does not have to use all control words, but it must be able to harmlessly ignore control words that it does not know (or does not use), and it must be able to correctly skip destinations marked with the \* control symbol. An application that generates RTF may not conform fully to this syntax, so an RTF reader should be robust enough to handle minor variations in control words. Nonetheless, if an RTF writer generates RTF that conforms to this specification, any correct RTF reader should be able to interpret it.
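The delimiter rules above can be sketched as a small tokenizer. This is only a minimal illustration, not a complete reader: the names `TOKEN_RE` and `tokenize` are hypothetical, binary data (\bin) is not handled, and the 32-character limit is applied loosely.

```python
import re

# One alternative per kind of RTF element described above.
TOKEN_RE = re.compile(
    r"\\(?P<word>[a-z]{1,32})(?P<param>-?\d+)? ?"  # control word, optional parameter, optional delimiting space
    r"|\\'(?P<hex>[0-9a-fA-F]{2})"                  # \'hh: one byte written as two hex digits
    r"|\\(?P<symbol>[^a-z])"                        # control symbol such as \~ or \*
    r"|(?P<brace>[{}])"                             # group start or end
    r"|(?P<text>[^\\{}\r\n]+)"                      # run of plain text
)


def tokenize(rtf):
    """Split RTF source into (kind, value[, parameter]) tuples."""
    tokens = []
    for m in TOKEN_RE.finditer(rtf):
        if m.group("word"):
            param = m.group("param")
            tokens.append(("word", m.group("word"), int(param) if param else None))
        elif m.group("hex"):
            tokens.append(("hex", int(m.group("hex"), 16)))
        elif m.group("symbol"):
            tokens.append(("symbol", m.group("symbol")))
        elif m.group("brace"):
            tokens.append(("brace", m.group("brace")))
        elif m.group("text"):
            tokens.append(("text", m.group("text")))
    return tokens


if __name__ == "__main__":
    for token in tokenize(r"{\rtf1\ansi Hello\par}"):
        print(token)
```

A reader built on such a token stream can ignore unknown control words simply by not acting on "word" tokens it does not recognize.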

RTF file structure analysis and its application

RTF is a very popular file structure. Many text editors support it, and development tools such as VB even provide a RichTextBox control. Programmers writing general-purpose tools should consider adding the ability to read and write RTF files to their software, which requires a good understanding of the structure of RTF. The most important means of publishing information today is the WWW, so providing RTF to HTML conversion in editing software is also something programmers should consider. Although Word already offers this capability, it is not a good idea to tell your customers, "Use my program to save to RTF, and then use Word ...". The structure of RTF files and its application are discussed below.

The structure of RTF is not complex, but it covers so much that it cannot all be explained in this article, so it is discussed only in general terms. (If you want to read the full RTF documentation, you can find it on the Internet or contact the author.)

Every RTF file is a text file that the RTF reader formats when displaying it. The file begins with {\rtf, which is essential as the marker of an RTF file and is used by the RTF reader to determine whether a file is in RTF format. It is followed by the file header, which includes the font table, file table, color table, and other data structures. Fonts and styles in the body are formatted according to the information in the file header. Each table is enclosed in braces and contains a number of commands starting with the character "\".
For example, a color table looks like this:

{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}

The \colortbl at the start indicates that the braces enclose a color table. The \red0\green0\blue0 that follows registers a color whose red, green, and blue components are all 0. The other tables follow the same pattern.

The header is followed by the body, which consists of layout formatting commands, text, and various special commands. Only special commands are enclosed in braces; layout commands and text are "open", which separates text from commands. There is a "}" at the end of the file that corresponds to the first "{". Every "{" must be matched by a "}" throughout the file. This format is the basis of the RTF reader and converter algorithms.

RTF is also special in that some characters have a special meaning in commands, so when they appear as text they must be preceded by a "\". For example, "\" itself is written as "\\". This convention is common in most programming languages.

2. Algorithm Analysis

Although the algorithm introduced in this section is for reading and writing RTF, it is also a common method for general file filters and is suitable for converting files of various formats. The idea is to convert each kind of file to an intermediate format and then display or convert it as required. One rule is that the program must be able to filter out formatting it does not recognize. Every kind of file has its own special formatting, and some loss of formatting is inevitable during conversion, which the algorithm should take into account.
For formatted text files like RTF, the most important thing is to display or convert the size, color, font, and style of the text correctly. The program should therefore use a data structure to store this information, which is called the intermediate format.

[Figure: flow chart of the conversion process]

3. Analysis of difficulties

We encountered a number of problems during development, two of which were particularly interesting.

The first problem is the representation of Chinese. In RTF, Chinese is expressed in command form as \'hh, where hh is one byte of the character's internal (machine) code. Note, however, that RTF is a text file, so the internal code is stored as ASCII hexadecimal digits, which must be converted to numbers before use. For example, "electronics and computers" appears in RTF as: \'b5\'e7\'d7\'d3\'d3\'eb\'b5\'e7\'c4\'d4.

The second problem is pictures, which are also the focus of this article. Pictures exist in RTF in two ways: the first is direct embedding, starting with {\pict; the second is embedding as an OLE object, starting with {\object. The OLE data in the RTF file is used when the RTF processor can use OLE directly. Otherwise, the file also provides the picture data directly, starting with {\result. The most common picture format in use is a metafile containing a DIB bitmap, which is not specified in the SDK and is stored in compressed form in RTF, so it is difficult to convert.

4. Extensions to RTF

Finally, I will discuss extensions to the RTF format. As a standard, the RTF format should be uniform, but in some cases it may be necessary to extend it. The most obvious example is Microsoft Word, which has its own RTF commands. If you want to give your software a technical edge, you can do so by creating new RTF commands. For example, if you want to support DHTML in your software, you can embed commands like {\dhtml or {\java in RTF. Since RTF readers are able to filter out unrecognized commands, this does not affect the generality of RTF files.
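Returning to the first difficulty, the conversion from hexadecimal escapes back to Chinese text can be sketched as follows. This is a minimal sketch that assumes the bytes are GBK/GB2312 encoded; the function name `decode_hex_escapes` is hypothetical, and a real reader would pick the encoding from the code page and font character set declared in the header.

```python
import re

# A run of consecutive \'hh escapes; double-byte characters need the whole run.
HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_hex_escapes(rtf_text, encoding="gbk"):
    """Replace runs of \\'hh escapes with the characters they encode."""
    def replace_run(match):
        raw = bytes.fromhex(match.group(0).replace("\\'", ""))
        # Undecodable bytes become U+FFFD instead of aborting the conversion.
        return raw.decode(encoding, errors="replace")

    return HEX_RUN.sub(replace_run, rtf_text)


if __name__ == "__main__":
    print(decode_hex_escapes(r"\'cb\'ce\'cc\'e5ABC"))  # prints 宋体ABC
```

Decoding a whole run at once matters because each Chinese character occupies two escapes; decoding them one byte at a time would produce garbage.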

RTF file format learning and application

1. Introduction

The Rich Text Format (RTF) specification is a method of encoding formatted text and graphics for easy transfer between applications. Users can currently transfer word processing documents between applications on different systems, such as MS-DOS, Windows, OS/2, Macintosh, and Power Macintosh, using special conversion software. The RTF specification provides a format for exchanging text and graphics between different output devices, operating environments, and operating systems. RTF uses the ANSI, PC-8, Macintosh, or IBM PC character set to control the representation and formatting of a document, both on screen and in print. With the RTF specification, documents created under different operating systems and with different software applications can be transferred between those operating systems and applications.

Software that converts a formatted file into an RTF file is called an RTF writer. An RTF writer separates the application's control information from the actual text and generates a new file containing the text and the RTF groups associated with it. Software that converts an RTF file into a formatted file is called an RTF reader.

An RTF file consists of unformatted text, control words, control symbols, and groups. There is no set maximum line length for an RTF file.

A control word is a specially formatted command that RTF uses to mark printer control codes and the information used to manage documents. A control word contains a maximum of 32 characters and has the following form:

\LetterSequence<Delimiter>

Note: Each control word begins with a backslash (\). The letter sequence consists of lowercase letters a to z. Control words (or keywords) should not normally contain any uppercase letters. The delimiter marks the end of the RTF control word and can be one of the following:

· A space, in which case the space is part of the control word.
· A digit or a hyphen (-), which indicates that a numeric parameter follows. The digit sequence is delimited by a space or by any character other than a letter or a digit. The parameter can be positive or negative and usually ranges from -32767 to 32767.
· Any character other than a letter or a digit. In this case, the delimiting character terminates the control word but is not part of it.

A control symbol consists of a backslash followed by a single nonalphabetic character. For example, \~ represents a nonbreaking space. Control symbols do not require delimiters.

A group consists of text and control words or control symbols enclosed in braces ({ }). The opening brace ({) indicates the start of the group, and the closing brace (}) indicates the end of the group. Each group specifies the text it affects and the different attributes of that text. An RTF file can also include groups for fonts, styles, screen color, pictures, footnotes, comments (annotations), headers and footers, summary information, fields, and bookmarks, as well as document, section, paragraph, and character formatting properties.
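Because every group must open and close with matching braces, a converter can cheaply validate a file before parsing it. The following is a minimal sketch of such a check; `max_group_depth` is a hypothetical name, and the function deliberately ignores binary data introduced by \bin.

```python
def max_group_depth(rtf):
    """Return the deepest group nesting; raise ValueError if braces do not pair up."""
    depth = 0
    deepest = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            # Skip the backslash and the character after it, so that the
            # escaped braces \{ and \} are not counted as group delimiters.
            i += 2
            continue
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("closing brace without opening brace at offset %d" % i)
        i += 1
    if depth != 0:
        raise ValueError("%d group(s) left open at end of file" % depth)
    return deepest


if __name__ == "__main__":
    print(max_group_depth(r"{\rtf1{\fonttbl{\f0 Arial;}}text \{ not a group\}}"))  # prints 3
```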
If the font, file, style, screen color, revision mark, and summary information groups and the document formatting properties are included, they must precede the first plain text character in the file. These groups form the header of the RTF file. If the font group is included, it should precede the style group. Any group that is not used can be omitted.

For the detailed syntax and keyword descriptions of RTF files, refer to the Rich Text Format (RTF) Specification V1.7; they are not described further here.

3. Hello World

By convention, we start with a Hello World! example:

{\rtf1\ansi\ansicpg936\deff0\deflang1033\deflangfe2052{\fonttbl{\f0\fmodern\fprq6\fcharset134 \'cb\'ce\'cc\'e5;}}
{\*\generator Msftedit 5.41.21.2500;}\viewkind4\uc1\pard\lang2052\f0\fs20 Hello World!\par
}

Its parts are as follows:

{\rtf1  RTF version
\ansi  ANSI character set
\ansicpg936  code page 936, Simplified Chinese
\deff0  default font is font 0
\deflang1033  default language is American English
\deflangfe2052  default East Asian language is Chinese
{\fonttbl{\f0  font table, font 0
\fmodern\fprq6  modern font family, font pitch 6
\fcharset134  GB2312 character set
\'cb\'ce\'cc\'e5;}}  font name 宋体 (SimSun)
{\*\generator Msftedit 5.41.21.2500;}  generator information

Document attributes:

\viewkind4  normal view
\uc1  one byte follows each Unicode character
\pard  default paragraph attributes
\lang2052  Chinese
\f0  font 0
\fs20  font size 20 half-points (10 pt)
Hello World!  the text
\par  paragraph mark
}  end of file

Note: in RTF files, double-byte characters such as Chinese are represented by the hexadecimal value of each of their bytes. For example, the text "宋体ABC" is written as \'cb\'ce\'cc\'e5ABC, which is why RTF is hard to read.

Once you have mastered the basics of text representation, you will want to explore more advanced text formatting, such as underlined text, colors, bold, italics, and so on, all of which are described in detail in the V1.7 specification. This article lists only some common keywords for reference.

· Font table and color table: every font and color used in the document must be defined in advance in the font table and color table of the document header. An example font table definition is as follows:

{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}
{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}
{\f10\fnil\fcharset2\fprq2{\*\panose 05000000000000000000}Wingdings;}… … }

To use a font, we specify its index in the font table directly. For example, "\f1Happy" denotes the text Happy in the font Arial.
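To tie the pieces of the example together, here is a minimal sketch of a writer that produces a document of the same shape. The helper names `escape_rtf` and `make_rtf` are hypothetical, and the header is deliberately reduced: it omits \ansicpg and always uses \fcharset0, so text containing Chinese would additionally need a matching code page and font character set as in the example above.

```python
def escape_rtf(text, encoding="gbk"):
    """Escape special characters and write non-ASCII characters as \\'hh bytes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # \, { and } are special in RTF
        elif ord(ch) < 128:
            out.append(ch)
        else:
            out.append("".join("\\'%02x" % b for b in ch.encode(encoding)))
    return "".join(out)


def make_rtf(text, fonts=("Arial",), font_index=0, half_points=20):
    """Build a one-paragraph RTF document with a font table."""
    font_table = "".join(
        "{\\f%d\\fnil\\fcharset0 %s;}" % (i, escape_rtf(name))
        for i, name in enumerate(fonts)
    )
    return "{\\rtf1\\ansi\\deff0{\\fonttbl%s}\\pard\\f%d\\fs%d %s\\par}" % (
        font_table, font_index, half_points, escape_rtf(text)
    )


if __name__ == "__main__":
    print(make_rtf("Hello World!"))
```

The space after \fs20 is the delimiter of that control word, so it does not become part of the visible text.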
To add another font, such as 华文中宋 (STZhongsong), we only need to add its description to the font table and reference its index where needed. The name 华文中宋 is written as \'bb\'aa\'ce\'c4\'d6\'d0\'cb\'ce, so the font table entry is:

{\f222\fnil\fcharset134\fprq2 \'bb\'aa\'ce\'c4\'d6\'d0\'cb\'ce;}

The font is then referenced via \f222.

An example color table definition is as follows:

{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;… … }

In the color table, each color is written as RGB components, and each entry is terminated by a semicolon. Note that the first entry (index 0) is empty, which indicates the system default color (usually black). The entries are numbered in order: 0, 1, 2, and so on. To use a color (for example, a font color), we specify its index. For example, "\cf2Sunday" denotes Sunday in the blue font color RGB(0,0,255), and \cb6ABC denotes ABC with the background color RGB(255,0,0). To use other colors, we simply add their definitions to the color table and reference them by the corresponding index.

· The character background syntax is as follows:

Control word  Meaning
==============================================================
\chbrdr  Character border (border on all sides).
\chshdngN  Character shading. N gives the shading of the text in hundredths of a percent.
\chcfpatN  N is the color of the background pattern, specified as an index into the document color table.
\chcbpatN  N is the fill color, specified as an index into the document color table.
\chbghoriz  Horizontal background pattern for text.
\chbgvert  Vertical background pattern for text.
\chbgfdiag  Forward diagonal background pattern for text.
\chbgbdiag  Backward diagonal background pattern for text.
\chbgcross  Cross background pattern for text.
\chbgdcross  Diagonal cross background pattern for text.
\chbgdkhoriz  Dark horizontal background pattern for text.
\chbgdkvert  Dark vertical background pattern for text.
\chbgdkfdiag  Dark forward diagonal background pattern for text.
\chbgdkbdiag  Dark backward diagonal background pattern for text.
\chbgdkcross  Dark cross background pattern for text.
\chbgdkdcross  Dark diagonal cross background pattern for text.

Suppose we want the text 华文中宋 with a horizontal background pattern, in the font 华文中宋 (font index 222), and in red (color index 6). We simply write:

\f222\cf6\chbghoriz\'bb\'aa\'ce\'c4\'d6\'d0\'cb\'ce

· The character underline syntax is as follows:

Control word  Meaning
==============================================================
\ul  Continuous underline.
\ul0  Turns off all underlining.
\ulcN  Underline color. (Note: an uppercase N denotes a number, here and below.)
\uld  Dotted underline.
\uldash  Dashed underline.
\uldashd  Dash-dotted underline.
\uldashdd  Dash-dot-dotted underline.
\uldb  Double underline.
\ulhwave  Heavy wave underline.
\ulldash  Long dashed underline.
\ulnone  Stops all underlining.
\ulth  Thick underline.
\ulthd  Thick dotted underline.
\ulthdash  Thick dashed underline.
\ulthdashd  Thick dash-dotted underline.
\ulthdashdd  Thick dash-dot-dotted underline.
\ulthldash  Thick long dashed underline.
\ululdbwave  Double wave underline.
\ulw  Word underline.
\ulwave  Wave underline.

The underline control words are used in the same way as the shading control words above.

· Other advanced text display properties:

Control word  Meaning
==============================================================
\outl  Outline. \outl0 turns it off.
\scaps  Small capitals. \scaps0 turns it off.
\shad  Shadow. \shad0 turns it off.
\strike  Strikethrough. \strike0 turns it off.
\striked1  Double strikethrough. \striked0 turns it off.
\sub  Subscripts the text and shrinks the point size according to the font information.
\super  Superscripts the text and shrinks the point size according to the font information.

· The alignment syntax is as follows:

Control word  Meaning
==============================================================
\qc  Centered.
\qj  Justified.
\ql  Left aligned (the default).
\qr  Right aligned.
\qd  Distributed.
\qkN  Percentage of line adjustment using the Kashida rule (0: low, 10: medium, 20: high).
\qt  Distributed alignment for Thai text.

· The text indentation syntax is as follows:

Control word  Meaning
==============================================================
\fiN  First-line indent (the default is 0).
\cufiN  First-line indent in hundredths of a character unit. Overrides \fiN, although both can be set to equivalent values.
\liN  Left indent (the default is 0).
\linN  Left indent for left-to-right paragraphs; right indent for right-to-left paragraphs (the default is 0). \linN defines the space before the paragraph.
\culiN  Left indent in hundredths of a character unit. Like \linN, it overrides \liN and \linN, although they can be set to equivalent values.
\riN  Right indent (the default is 0).
\rinN  Right indent for left-to-right paragraphs; left indent for right-to-left paragraphs (the default is 0). \rinN defines the space after the paragraph.
\curiN  Right indent in hundredths of a character unit. Like \rinN, it overrides \riN and \rinN, although they can be set to equivalent values.
\adjustright  Automatically adjusts the right indent when a document grid is defined.

· The text spacing syntax is as follows:

\sbN  Space before the paragraph (the default is 0).
\saN  Space after the paragraph (the default is 0).
\sbautoN  Automatic spacing before the paragraph:
  0  Space before is determined by \sb.
  1  Automatic space before (ignores \sb).
  The default is 0.
\saautoN  Automatic spacing after the paragraph:
  0  Space after is determined by \sa.
  1  Automatic space after (ignores \sa).
  The default is 0.
\lisbN  Space before in hundredths of a character unit. Overrides \sbN, although both can be set to equivalent values.
\lisaN  Space after in hundredths of a character unit. Overrides \saN, although both can be set to equivalent values.
\slN  Line spacing. If this control word is missing or \sl0 is used, the line spacing is determined automatically by the tallest character in the line. If N is positive, the value is used only if it is greater than the height of the tallest character (otherwise the height of the tallest character is used). If N is negative, the absolute value of N is always used, even when it is less than the height of the tallest character.
\slmultN  Line spacing multiple. Indicates that the current line spacing is a multiple of single line spacing. This control word can only follow \sl.
  0  "At least" or "exactly" line spacing.
  1  Multiple line spacing, relative to "single" line spacing.
\nosnaplinegrid  Disables snapping lines to the grid.
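The three cases of the \slN rule above can be written out as a short function. This is only a sketch of the rule as described in this section; the function name `effective_line_height` is hypothetical, and both arguments are assumed to be in the same unit.

```python
def effective_line_height(sl, tallest_char):
    """Apply the \\slN rule: sl is the N of \\slN, tallest_char the tallest character height."""
    if sl == 0:
        # Missing \sl or \sl0: spacing follows the tallest character.
        return tallest_char
    if sl > 0:
        # Positive N is a minimum ("at least").
        return max(sl, tallest_char)
    # Negative N: the absolute value is always used ("exactly").
    return -sl


if __name__ == "__main__":
    for n in (0, 240, 360, -200):
        print(n, effective_line_height(n, 276))
```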

5. Representation of pictures

If you are struggling to understand how pictures are represented in RTF, the following analysis will hopefully speed up your learning.

Picture data in RTF is usually embedded directly in the file. It can be in hexadecimal (the default) or binary format. A picture is a destination that starts with the \pict control word. As the example below shows, the \pict keyword should come after the \*\shppict destination control word. An example picture is as follows:

{\*\shppict{\pict{\*\picprop\shplid1025{\sp{\sn shapeType}{\sv 75}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}{\sp{\sn pibFlags}{\sv 2}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn fLayoutInCell}{\sv 1}}}
\picscalex100\picscaley100\piccropl0\piccropr0\piccropt0\piccropb0\picw4516\pich4516\picwgoal2560\pichgoal2560\jpegblip\bliptag-728883813{\*\blipuid d48e1d9b2268ef9f2741709749fb439c}
ffd8ffe000104a46494600010101004800480000ffdb0043000604040405040605050609060506090b080606080b0c0a0a0b0a0a0c100c0c0c0c0c0c100c0e0f… … }}
{\nonshppict{\pict\picscalex100\picscaley100\piccropl0\piccropr0\piccropt0\piccropb0\picw4516\pich4516\picwgoal2560\pichgoal2560\wmetafile8\bliptag-728883813\blipupi72{\*\blipuid d48e1d9b2268ef9f2741709749fb439c}
0100090000034660000000002160000000000400000003010800050000000b0200000000050000000c02ac00ac00030000001e00040000000701040021600000… … }}

Its analysis is as follows:

{\*\shppict  picture destination
{\pict  start of the picture

Drawing object properties (this group can be omitted):

{\*\picprop  shape properties applied to an embedded picture
\shplid1025  a unique ID for each shape
{\sp  start of a drawing object property definition
{\sn shapeType}{\sv 75}}  the shape type is picture frame
{\sp{\sn fFlipH}{\sv 0}}  horizontal flip: False
{\sp{\sn fFlipV}{\sv 0}}  vertical flip: False
{\sp{\sn pibFlags}{\sv 2}}  linked picture flags
{\sp{\sn fLine}{\sv 0}}  has a line: False
{\sp{\sn fLayoutInCell}{\sv 1}}  the shape anchor can be positioned inside a table cell: True
}  end of the drawing object property definitions

Picture properties:

\picscalex100  horizontal scaling
\picscaley100  vertical scaling
\piccropl0  left cropping value = 0
\piccropr0  right cropping value = 0
\piccropt0  top cropping value = 0
\piccropb0  bottom cropping value = 0
\picw4516  picture width in pixels
\pich4516  picture height in pixels
\picwgoal2560  desired picture width
\pichgoal2560  desired picture height
\jpegblip  the picture source is a JPEG file
\bliptag-728883813  picture ID
{\*\blipuid d48e1d9b2268ef9f2741709749fb439c}  picture unique ID

Hexadecimal picture data:

ffd8ffe000104a46494600010101004800480000ffdb0043000604040405040605050609060506090b080606080b0c0a0a0b0a0a0c100c0c0c0c0c0c100c0e0f… …
}  end of the hexadecimal picture data
}

Metafile content for compatibility:

{\nonshppict  for compatibility only; readers that understand \shppict do not read it
{\pict  start of the picture
\picscalex100\picscaley100\piccropl0\piccropr0\piccropt0\piccropb0\picw4516\pich4516\picwgoal2560\pichgoal2560\wmetafile8\bliptag-728883813\blipupi72{\*\blipuid d48e1d9b2268ef9f2741709749fb439c}
Hexadecimal data of the metafile follows:
0100090000034660000000002160000000000400000003010800050000000b0200000000050000000c02ac00ac00030000001e00040000000701040021600000… …
}  end of the metafile hexadecimal picture data
}

To simplify the analysis, we can remove everything that can be omitted. A picture can then be represented like this:

{\*\shppict{\pict
\piccropl0\piccropr0\piccropt0\piccropb0
\picw  width
\pich  height
\picwgoal  display width
\pichgoal  display height
\jpegblip  JPEG type
\bliptag-728883813  ID value (may be negative)
followed by the actual hexadecimal data of the picture: ffd8ffe0001… …
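To make the layout above concrete, the following sketch recovers the raw picture bytes from the contents of a {\pict ...} group. It is only an illustration under stated assumptions: the function name `pict_hex_to_bytes` is hypothetical, the input is the text inside the \pict group without its outer braces, the data is in hexadecimal rather than \bin form, and a delimiter separates every parameterless control word from the hex data.

```python
import re


def pict_hex_to_bytes(pict_body):
    """Return the picture bytes from the body of a \\pict group (outer braces removed)."""
    # Drop nested groups such as {\*\picprop ...} and {\*\blipuid ...},
    # innermost first, so their hex digits are not mistaken for picture data.
    previous = None
    while previous != pict_body:
        previous = pict_body
        pict_body = re.sub(r"\{[^{}]*\}", "", pict_body)
    # Drop the remaining control words together with their numeric parameters.
    pict_body = re.sub(r"\\[a-z]+-?\d* ?", "", pict_body)
    # Whatever is left is hex data, possibly broken up by whitespace.
    hex_digits = re.sub(r"[^0-9a-fA-F]", "", pict_body)
    return bytes.fromhex(hex_digits)


if __name__ == "__main__":
    body = (r"\pict{\*\picprop\shplid1025{\sp{\sn shapeType}{\sv 75}}}"
            r"\picw4516\pich4516\jpegblip\bliptag-728883813"
            r"{\*\blipuid d48e1d9b2268ef9f2741709749fb439c}"
            "ffd8ffe000104a464946")
    data = pict_hex_to_bytes(body)
    print(data[:2] == b"\xff\xd8")  # JPEG data starts with the bytes FF D8
```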
6. Basic representation of tables

Now that text and pictures have been analyzed, you should have some feel for the RTF file format. Next comes the representation of tables in RTF files. The definition of a table is a little more complicated, but there are rules to follow.

There is no RTF table group; tables are actually described by paragraph properties. A table is a sequence of table rows, and a table row is a sequence of paragraphs divided into cells. In short, a table consists of rows, and rows consist of cells. No matter how complex a table is, it is described row by row, including nested tables.

A table row starts with the \trowd control word and ends with \row. Every paragraph contained in a table row must specify the \intbl control word or inherit it from the previous paragraph. A cell may contain more than one paragraph. Cells are terminated by a cell mark (the \cell control word), and rows are terminated by a row mark (the \row control word). Table rows can also be absolutely positioned, in which case every paragraph in the table row must have the same positioning control words. Table properties can be inherited from the previous row; therefore, a series of table rows can be introduced by a single <tbldef>.
A simple table example is as follows: 1,1, 1,1, 1,3, 2,1, 2,2,3 /trowd /irow0/irowband0/ts15/trgaph108/trleft-108/trbrdrt /brdrs/brdrw10 /trbrdrl/brdrs/brdrw10 /trbrdrb/brdrs/brdrw10 /trbrdrr/brdrs/brdrw10 /trbrdrh/brdrs/brdrw10 /trbrdrv/brdrs/brdrw10 /trftsWidth1/trftsWidthB3/trautofit1/trpaddl108/trpaddr108/trpaddfl3/trpaddft3/trpaddfb3/trpaddfr3/tblrsid2113686/tbllkh drrows/tbllklastrow/tbllkhdrcols/tbllklastcol /clvertalt/clbrdrt/brdrs/brdrw10 /clbrdrl/brdrs/brdrw10 /clbrdrb/brdrs/brdrw10 /clbrdrr /brdrs/brdrw10 /cltxlrtb/clftsWidth3/clwWidth2840/clshdrawnil /cellx2732/clvertalt/clbrdrt/brdrs/brdrw10 /clbrdrl/brdrs/brdrw10 /clbrdrb/brdrs/brdrw10 /clbrdrr/brdrs/brdrw10 /cltxlrtb/clftsWidth3/clwWidth2841/clshdrawnil /cellx5573/clvertalt/clbrdrt /brdrs/brdrw10 /clbrdrl/brdrs/brdrw10 /clbrdrb/brdrs/brdrw10 /clbrdrr/brdrs/brdrw10 /cltxlrtb/clftsWidth3/clwWidth2841/clshdrawnil /cellx8414/pard/plain /qj /li0/ri0/nowidctlpar/intbl/aspalpha/aspnum/faauto/adjustright/rin0/lin0/yts15 /fs21/lang1033/langfe2052/kerning2/loch/af0/hich/af0/dbch/af13/cgrid/langnp1033/langfenp2052 {/insrsid2113686 Hich/af0 DBCH/af13 / loch/f0 1, 1 / cell/hich af0 / DBCH af13 / loch/f0 1, 2 / cell/hich/af0 / DBCH/af13 loch/f0 1, 3 / cell }/pard/plain /ql /li0/ri0/widctlpar/intbl/aspalpha/aspnum/faauto/adjustright/rin0/lin0 /fs21/lang1033/langfe2052/kerning2/loch/af0/hich/af0/dbch/af13/cgrid/langnp1033/langfenp2052 {/insrsid2113686 /trowd /irow0/irowband0/ts15/trgaph108/trleft-108/trbrdrt /brdrs/brdrw10 /trbrdrl/brdrs/brdrw10 /trbrdrb/brdrs/brdrw10 /trbrdrr/brdrs/brdrw10 /trbrdrh/brdrs/brdrw10 /trbrdrv/brdrs/brdrw10 /trftsWidth1/trftsWidthB3/trautofit1/trpaddl108/trpaddr108/trpaddfl3/trpaddft3/trpaddfb3/trpaddfr3/tblrsid2113686/tbllkh drrows/tbllklastrow/tbllkhdrcols/tbllklastcol /clvertalt/clbrdrt/brdrs/brdrw10 /clbrdrl/brdrs/brdrw10 /clbrdrb/brdrs/brdrw10 /clbrdrr /brdrs/brdrw10 /cltxlrtb/clftsWidth3/clwWidth2840/clshdrawnil /cellx2732/clvertalt/clbrdrt/brdrs/brdrw10 /clbrdrl/brdrs/brdrw10 
/clbrdrb/brdrs/brdrw10 /clbrdrr/brdrs/brdrw10 /cltxlrtb/clftsWidth3/clwWidth2841/clshdrawnil /cellx5573/clvertalt/clbrdrt /brdrs/brdrw10 /clbrdrl/brdrs/brdrw10 /clbrdrb/brdrs/brdrw10 /clbrdrr/brdrs/brdrw10 /cltxlrtb/clftsWidth3/clwWidth2841/clshdrawnil /cellx8414/row }/pard/plain /qj /li0/ri0/nowidctlpar/intbl/aspalpha/aspnum/faauto/adjustright/rin0/lin0/yts15 /fs21/lang1033/langfe2052/kerning2/loch/af0/hich/af0/dbch/af13/cgrid/langnp1033/langfenp2052 {/insrsid2113686 / hich/af0 / DBCH/af13 / loch / 2, 1 / cell/f0 hich/af0 / DBCH/af13 / loch / 2, 2 / cell/f0 hich/af0 DBCH af13 / loch/f0 2, 3 / cell }/pard/plain /ql /li0/ri0/widctlpar/intbl/aspalpha/aspnum/faauto/adjustright/rin0/lin0 /fs21/lang1033/langfe2052/kerning2/loch/af0/hich/af0/dbch/af13/cgrid/langnp1033/langfenp2052 {/insrsid2113686 /trowd /irow1/irowband1/lastrow /ts15/trgaph108/trleft-108/trbrdrt /brdrs/brdrw10 /trbrdrl/brdrs/brdrw10 /trbrdrb/brdrs/brdrw10 /trbrdrr/brdrs/brdrw10 /trbrdrh/brdrs/brdrw10 /trbrdrv/brdrs/brdrw10 /trftsWidth1/trftsWidthB3/trautofit1/trpaddl108/trpaddr108/trpaddfl3/trpaddft3/trpaddfb3/trpaddfr3/tblrsid2113686/tbllkh drrows/tbllklastrow/tbllkhdrcols/tbllklastcol /clvertalt/clbrdrt/brdrs/brdrw10 /clbrdrl/brdrs/brdrw10 /clbrdrb/brdrs/brdrw10 /clbrdrr /brdrs/brdrw10 /cltxlrtb/clftsWidth3/clwWidth2840/clshdrawnil /cellx2732/clvertalt/clbrdrt/brdrs/brdrw10 /clbrdrl/brdrs/brdrw10 /clbrdrb/brdrs/brdrw10 /clbrdrr/brdrs/brdrw10 /cltxlrtb/clftsWidth3/clwWidth2841/clshdrawnil /cellx5573/clvertalt/clbrdrt /brdrs/brdrw10 /clbrdrl/brdrs/brdrw10 / CLBRDRB/BRDRS/brdrw10 / CLBRDRR/BRDRS/brdrw10 / CLTXLRTB/clftsWidth3 / clwWidth2841 / clshdrawnil/cellx8414 / row} is a very complicated? Never mind, you’ll see the structure of the tables in the RTF file by dividing it and annotating it appropriately. 
First, the RTF 1.7 specification states that a table row has the following format:

(<tbldef> <cell>+ <tbldef> \row) | (<tbldef> <cell>+ \row) | (<cell>+ <tbldef> \row)

Word 2003 adopts the first form, that is, "definition + content + definition". This involves a lot of data redundancy, which is why even a simple document saved by Word 2003 is large, but it is necessary for compatibility reasons. The definition itself consists of a row definition followed by cell definitions, where the cell definition can be repeated. The annotated code is as follows:

Table row 1

\trowd  start of table row 1

Table properties:
\trgaph108  half the space between cells
\trleft-108  left position of the table row

Row border settings:
\trbrdrt  top border of the row, \brdrs single thickness, \brdrw10 line width
\trbrdrl  left border of the row, \brdrs single thickness, \brdrw10 line width
\trbrdrb  bottom border of the row, \brdrs single thickness, \brdrw10 line width
\trbrdrr  right border of the row, \brdrs single thickness, \brdrw10 line width

Cell 1 border settings:
\clbrdrt  top border of the cell, \brdrw15 line width, \brdrs single thickness
\clbrdrl  left border of the cell, \brdrw15 line width, \brdrs single thickness
\clbrdrb  bottom border of the cell, \brdrw15 line width, \brdrs single thickness
\clbrdrr  right border of the cell, \brdrw15 line width, \brdrs single thickness
\cellx2732  right boundary of the cell

Cell 2 border settings:
\clbrdrt  top border of the cell, \brdrw15 line width, \brdrs single thickness
\clbrdrl  left border of the cell, \brdrw15 line width, \brdrs single thickness
\clbrdrb  bottom border of the cell, \brdrw15 line width, \brdrs single thickness
\clbrdrr  right border of the cell, \brdrw15 line width, \brdrs single thickness
\cellx5573  right boundary of the cell

Cell 3 border settings:
\clbrdrt  top border of the cell, \brdrw15 line width, \brdrs single thickness
\clbrdrl  left border of the cell, \brdrw15 line width, \brdrs single thickness
\clbrdrb  bottom border of the cell, \brdrw15 line width, \brdrs single thickness
\clbrdrr  right border of the cell, \brdrw15 line width, \brdrs single thickness
\cellx8414  right boundary of the cell

Row 1 data:
\pard  reset paragraph properties
\intbl  the paragraph is part of a table
\kerning2  character pair kerning threshold
\f0  font 0
\fs21  font size 21
1,1 \cell  end of cell 1
1,2 \cell  end of cell 2
1,3 \cell  end of cell 3
\f1  font 1
\row  end of row 1
\f0  font 0

Table row 2

\trowd  start of table row 2
\trgaph108  half the space between cells
\trleft-108  left position of the table row

Row border settings:
\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10

Cell 1 border settings:
\clbrdrt\brdrw15\brdrs \clbrdrl\brdrw15\brdrs \clbrdrb\brdrw15\brdrs \clbrdrr\brdrw15\brdrs \cellx2732

Cell 2 border settings:
\clbrdrt\brdrw15\brdrs \clbrdrl\brdrw15\brdrs \clbrdrb\brdrw15\brdrs \clbrdrr\brdrw15\brdrs \cellx5573

Cell 3 border settings:
\clbrdrt\brdrw15\brdrs \clbrdrl\brdrw15\brdrs \clbrdrb\brdrw15\brdrs \clbrdrr\brdrw15\brdrs \cellx8414

Row 2 data:
\intbl  the paragraph is part of a table
2,1 \cell  end of cell 1
2,2 \cell  end of cell 2
2,3 \cell  end of cell 3
\f1  font 1
\row  end of row 2

In the same way, an embedded picture in a cell can be treated as a paragraph of text. Implementing nested tables is more complex, because it involves more advanced issues such as the nesting level of paragraph text, which are not covered in detail here. Interested readers can refer to the Rich Text Format (RTF) Specification V1.7.
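Stripped of borders and padding, the second row form from the specification, <tbldef> <cell>+ \row, is short enough to generate by hand. The sketch below is an illustration only: the function name `table_row` and its default values are hypothetical, every cell gets the same width, and cell text is assumed to be plain ASCII that needs no escaping.

```python
def table_row(cells, cell_width=2840, left=-108, gap=108):
    """Build one borderless RTF table row from a list of cell texts."""
    # Row definition: row start, half-gap between cells, left edge of the row.
    parts = ["\\trowd\\trgaph%d\\trleft%d" % (gap, left)]
    right = left
    for _ in cells:
        right += cell_width
        parts.append("\\cellx%d" % right)   # right boundary of each cell
    # Row content: paragraphs flagged as table text, each ended by \cell.
    parts.append("\\pard\\intbl ")
    for text in cells:
        parts.append("%s\\cell " % text)
    parts.append("\\row")                   # end of the row
    return "".join(parts)


if __name__ == "__main__":
    print(table_row(["1,1", "1,2", "1,3"]))
    print(table_row(["2,1", "2,2", "2,3"]))
```

Emitting the rows one after another inside the document body yields a table; Word's own output repeats the definition after the cells for compatibility, which this sketch leaves out.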

Through the analysis of RTF files, we arrive at the following basic structure of an RTF file. A trailing "?" marks an optional item, "*" an item that may occur zero or more times, and "+" an item that occurs one or more times.

RTF file
  Header
    RTF version \rtf
    Character set
    Default font and locale
    Default font number \deff?
    Font table
    File table?
    Color table <colortbl>?
    Style sheet <stylesheet>?
    List tables <listtables>?
      List table
      List override table {\*\listoverridetable}
    Paragraph group properties {\*\pgptbl}
    Revision tracking?
    RSID table <rsidtable>?
    Generator information?
  Document
    Document information?
      Title <title>?
      Subject <subject>?
      Author <author>?
      Manager <manager>?
      Company <company>?
      Last modified by?
      Document category?
      Keywords <keywords>?
      Comment <comment>?
      Document version number \version?
      Comment shown in the Word summary information?
      Internal version number \vern?
      Creation time?
      Revision time <revtim>?
      Last print time?
      Backup time?
      Total editing time (in minutes) \edmins?
      Number of pages \nofpages?
      Number of words \nofwords?
      Number of characters including spaces \nofchars?
      Internal ID \id?
    Document format properties*
    Section text+
      Section format properties*
      Header and footer settings <hdrftr>?
      Paragraph text <para>+
        Text paragraph <textpar>
          Bullets and numbering <pn>?
          Paragraph borders?
          Paragraph format properties*
          Positioned objects and frames*
          Tab definitions <tabdef>?
          Shading?
          Hidden or not (\v | \spv)?
          Character text+
            Picture <pict>: picture start {\*\shppict{\pict ... }}, picture properties, picture data
            Object
            Drawing object \shp
            Footnote \footnote
            Field
        Table row
          Row start \trowd
          Row definition <tbldef>
          Cell+ (cell definition+, cell content+)
          Repeated row definition <tbldef>
          Row end \row