The problem

When you extract pictures from a Word file, some of them may have been cropped inside the document. Extracting a picture directly gives you the original, uncropped image. How can you get the cropped version?

The solution

After some research, I found that the cropping information is stored in the document's XML. You can read this information and use it to crop the original image. The steps are as follows:

Reading the Word file

Use Apache POI’s XWPFDocument to read the file and retrieve the paragraph that contains the image. The XML of such a paragraph looks like this:

<xml-fragment w14:paraId="70863DC9" w14:textId="14D2A409" w:rsidR="00A84881" w:rsidRDefault="00A84881" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" 
xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
  <w:r>
    <w:rPr>
      <w:noProof/>
    </w:rPr>
    <w:drawing>
      <wp:inline distT="0" distB="0" distL="0" distR="0" wp14:anchorId="35CB9A82" wp14:editId="733F3916">
        <wp:extent cx="4521200" cy="2438400"/>
        <wp:effectExtent l="0" t="0" r="0" b="0"/>
        <wp:docPr id="1" name="Picture 1" descr="Dog and man lying on the ground description generated automatically."/>
        <wp:cNvGraphicFramePr>
          <a:graphicFrameLocks noChangeAspect="1" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"/>
        </wp:cNvGraphicFramePr>
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
              <pic:nvPicPr>
                <pic:cNvPr id="1" name="Picture 1" descr="Dog and man lying on the ground description generated automatically."/>
                <pic:cNvPicPr/>
              </pic:nvPicPr>
              <pic:blipFill rotWithShape="1">
                <a:blip r:embed="rId4">
                  <a:extLst>
                    <a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
                      <a14:useLocalDpi val="0" xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"/>
                    </a:ext>
                  </a:extLst>
                </a:blip>
                <a:srcRect l="16133" t="1806" r="1853" b="63522"/>
                <a:stretch/>
              </pic:blipFill>
              <pic:spPr bwMode="auto">
                <a:xfrm>
                  <a:off x="0" y="0"/>
                  <a:ext cx="4521200" cy="2438400"/>
                </a:xfrm>
                <a:prstGeom prst="rect">
                  <a:avLst/>
                </a:prstGeom>
                <a:ln>
                  <a:noFill/>
                </a:ln>
                <a:extLst>
                  <a:ext uri="{53640926-AAD7-44D8-BBD7-CCE9431645EC}">
                    <a14:shadowObscured xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"/>
                  </a:ext>
                </a:extLst>
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</xml-fragment>

Extract clipping information

The clipping information is stored in the srcRect tag, which has four attributes:

  1. t: top, the proportion cropped from the top edge of the picture
  2. b: bottom, the proportion cropped from the bottom edge of the picture
  3. l: left, the proportion cropped from the left edge of the picture
  4. r: right, the proportion cropped from the right edge of the picture
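If you only have the paragraph XML as a string, the four attributes can also be read with the JDK's built-in DOM parser instead of POI. The following is a minimal sketch, not the method used in the rest of this article: the class name SrcRectReader and the method readSrcRect are made up for illustration, and it assumes the attribute values are plain integers, as in the example XML above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, named for illustration only.
final class SrcRectReader {
    // Namespace bound to the "a" prefix in the example XML.
    static final String DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    /** Returns the raw {l, t, r, b} values of the first srcRect; missing values count as 0. */
    static int[] readSrcRect(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList rects = doc.getElementsByTagNameNS(DRAWINGML_NS, "srcRect");
            int[] result = new int[4];
            if (rects.getLength() == 0) {
                return result; // no srcRect element: nothing was cropped
            }
            Element rect = (Element) rects.item(0);
            String[] names = {"l", "t", "r", "b"};
            for (int i = 0; i < names.length; i++) {
                String value = rect.getAttribute(names[i]); // "" when the attribute is absent
                result[i] = value.isEmpty() ? 0 : Integer.parseInt(value);
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the XML fragment", e);
        }
    }
}
```

An omitted attribute is treated as 0 here, which matches an edge that is not cropped.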

Each attribute gives the cropping proportion for its edge in thousandths of a percent: the attribute value divided by 1000 is the percentage, and divided by 100000 it is the fraction of the picture's width (l, r) or height (t, b). Note: each value is measured from its own edge. A positive value crops toward the inside of the picture, while a negative value moves the edge outward, which extends the visible area beyond the picture.

Here’s an example:

t: 0, b: 25000

The bottom 25% of the picture is cropped away, leaving the top 75%.
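To make the arithmetic concrete, here is a small sketch that turns the four srcRect values into a pixel rectangle. The class name CropMath and the method cropRect are hypothetical, and the image sizes in the usage note are made up for illustration.

```java
import java.awt.Rectangle;

// Hypothetical helper, named for illustration only.
final class CropMath {
    // srcRect values are thousandths of a percent, so 100000 means 100%.
    static final double UNITS_PER_WHOLE = 100000.0;

    /** Converts srcRect values into the pixel rectangle that remains after cropping. */
    static Rectangle cropRect(int width, int height, int l, int t, int r, int b) {
        // Upper-left corner: inset from the left and top edges.
        int x = (int) (width * (l / UNITS_PER_WHOLE));
        int y = (int) (height * (t / UNITS_PER_WHOLE));
        // Lower-right corner: inset from the right and bottom edges.
        int right = (int) (width * (1 - r / UNITS_PER_WHOLE));
        int bottom = (int) (height * (1 - b / UNITS_PER_WHOLE));
        return new Rectangle(x, y, right - x, bottom - y);
    }
}
```

For a 600 x 400 image with t = 0 and b = 25000, cropRect(600, 400, 0, 0, 0, 25000) gives a rectangle at (0, 0) that is 600 wide and 300 high, which is the top 75% of the picture.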

Crop the picture

Once the clipping information has been extracted, the rest is simple: crop the picture according to it. Example code:

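    // Imports needed by this snippet (the statements below belong inside a method):
    import java.awt.image.BufferedImage;
    import java.io.FileInputStream;
    import javax.imageio.ImageIO;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.openxmlformats.schemas.drawingml.x2006.main.CTRelativeRect;

    String filePath = "cropped_image.docx";
    XWPFDocument xwpfDocument = new XWPFDocument(new FileInputStream(filePath));
    // Extract the clipping information (srcRect) of the first picture in the first run
    // of the first paragraph; getSrcRect() returns null if the picture is not cropped
    CTRelativeRect ctRelativeRect = xwpfDocument.getParagraphs().get(0).getRuns().get(0)
            .getEmbeddedPictures().get(0).getCTPicture().getBlipFill().getSrcRect();
    // Get the image data stream; replace "rId4" with the r:embed value of your own picture
    BufferedImage image = ImageIO.read(xwpfDocument.getPartById("rId4").getInputStream());
    int width = image.getWidth();
    int height = image.getHeight();
    // srcRect values are thousandths of a percent, so dividing by 100000.0 gives a fraction.
    // This assumes getL()/getT()/getR()/getB() return int; newer POI schema versions may return Object.
    // Upper-left corner of the crop area
    int x = (int) (width * ctRelativeRect.getL() / 100000.0);
    int y = (int) (height * ctRelativeRect.getT() / 100000.0);
    // Width and height of the crop area
    int w = (int) (width * (1 - ctRelativeRect.getR() / 100000.0)) - x;
    int h = (int) (height * (1 - ctRelativeRect.getB() / 100000.0)) - y;
    // Crop the image; getSubimage throws if the area falls outside the image (e.g. negative values)
    BufferedImage croppedImage = image.getSubimage(x, y, w, h);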
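Two details of getSubimage are worth guarding against: it rejects areas that extend beyond the image, which is what negative srcRect values produce, and the image it returns shares pixel data with the original. The sketch below shows one possible way to handle both. The class name SafeCrop is hypothetical, and treating negative values as 0 is a simplifying assumption: it ignores the outward extension instead of padding the picture.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, named for illustration only.
final class SafeCrop {
    /** Crops by srcRect values (thousandths of a percent); negative values are treated as 0. */
    static BufferedImage crop(BufferedImage image, int l, int t, int r, int b) {
        int width = image.getWidth();
        int height = image.getHeight();
        // Assumption: an edge moved outward (negative value) is simply not cropped.
        int x = (int) (width * (Math.max(l, 0) / 100000.0));
        int y = (int) (height * (Math.max(t, 0) / 100000.0));
        int right = (int) (width * (1 - Math.max(r, 0) / 100000.0));
        int bottom = (int) (height * (1 - Math.max(b, 0) / 100000.0));
        int w = right - x;
        int h = bottom - y;
        if (w <= 0 || h <= 0) {
            throw new IllegalArgumentException("srcRect leaves no visible area");
        }
        BufferedImage view = image.getSubimage(x, y, w, h);
        // getSubimage shares its pixel data with the source image,
        // so copy the pixels into an independent image.
        BufferedImage copy = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        copy.setRGB(0, 0, w, h, view.getRGB(0, 0, w, h, null, 0, w), 0, w);
        return copy;
    }
}
```

The returned image can then be written out, for example with ImageIO.write(cropped, "png", file), without later changes to the original image affecting it.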