## A related story: Anonymous hacker’s “revenge”

On December 10, 2010, the hacker group Anonymous posted a press release explaining the motivation behind its latest attack, code-named “Operation Payback” (Prefect, 2010). Angered by the companies that had dropped their support for WikiLeaks, Anonymous called for retaliation by launching distributed denial-of-service (DDoS) attacks on some of the organisations involved. The press release was unsigned and unattributed, and was published as a PDF (Portable Document Format) file.

This is a document from that time. I dug it up to satisfy my curiosity…

Although the document is unsigned, a short script is enough to pull out its metadata (the anonops_the_press_release.pdf shown here is the actual original file, and its metadata is still intact…).
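To make it concrete where this information lives, here is a minimal, standard-library-only sketch that searches a PDF's raw bytes for entries of the document information dictionary. It assumes the entries are stored uncompressed as simple literal strings such as `/Author (Some Name)`, which is not true of every PDF. `scan_info_strings` and `INFO_KEYS` are names made up for this illustration, not part of any library.

```python
import re

# Info-dictionary keys that tend to identify a person or a machine.
# The selection is an assumption made for this sketch.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title")


def scan_info_strings(raw):
    """Return {key: value} for literal-string Info entries found in raw PDF bytes."""
    found = {}
    for key in INFO_KEYS:
        # Matches e.g. "/Author (Some Name)"; nested or escaped
        # parentheses inside the value are not handled.
        match = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", raw)
        if match:
            found[key.decode("ascii")] = match.group(1).decode("latin-1")
    return found
```

Passing the bytes of a file, for example `scan_info_strings(open("anonops_the_press_release.pdf", "rb").read())`, lists whatever literal-string entries it contains. Hex-encoded or UTF-16 strings and entries inside compressed object streams are missed, so a proper PDF library remains the more reliable route.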

The metadata named the document’s author: Mr. Alex Tapanaris. A few days later, Greek police arrested him…

Mr. Alex Tapanaris’s revenge mission was cut short

The lesson: even if your skills are weak, at least don’t let others find out that you were the one who made the “seed”…


Even today, files on a large number of domestic resource websites still carry sensitive metadata.

Take the technical e-books that I (the blogger) downloaded from major domestic resource websites as an example:

(Don’t ask me where the resources come from; as a programmer, I know a thing or two about getting hold of them…)

To avoid becoming the second Mr. Alex Tapanaris when you post “resources” on “a certain document library” to earn points, here is the script I just finished for deleting PDF metadata in bulk, along with how to use it:

Quickly clear PDF metadata

The effect after clearing

#### Get document metadata in bulk (check others):

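# Uses the legacy PyPDF2 API (PdfFileReader, getDocumentInfo),
# which is only available in PyPDF2 releases before 3.0.
import os
import re
import sys

from PyPDF2 import PdfFileReader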

# Collect the PDF files to inspect
def getFiles():
    # If a single PDF file is given on the command line, inspect only that file;
    # otherwise scan the current directory
    if len(sys.argv) > 1:
        files = [sys.argv[1]]
    else:
        files = os.listdir()

    pdf_files = list()

    for file_name in files:
        # Keep only names ending in ".pdf"
        if re.match(r".*\.pdf$", file_name):
            pdf_files.append(file_name)

    return pdf_files


# Print the metadata of each file
def printMeta(files):
    for filename in files:
        try:
            # Keep the file open while the metadata is being read
            with open(filename, "rb") as f:
                docInfo = PdfFileReader(f).getDocumentInfo()
                print("=== Metadata for file %s:" % filename)
                for metaItem in docInfo:
                    print(metaItem, ":", docInfo[metaItem])
        except Exception as e:
            print("-- Metadata of file %s cannot be read (%s), skipped!" % (filename, e))

if __name__ == "__main__":
    filenames = getFiles()
    printMeta(filenames)
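
When many files are listed at once, the fields that point back to a person or a machine are easy to miss among titles and dates. A small helper like the following could pick them out of the mapping that `getDocumentInfo()` returns. The name `sensitive_fields` and the choice of keys in `SENSITIVE_KEYS` are assumptions made for illustration; adjust the set to whatever you consider identifying.

```python
# Standard Info-dictionary keys that usually reveal who or what produced a file.
# Which keys count as "sensitive" is an assumption for this sketch.
SENSITIVE_KEYS = ("/Author", "/Creator", "/Producer")


def sensitive_fields(doc_info):
    """Return only the non-empty identifying entries of a metadata mapping."""
    if not doc_info:
        # Covers both an empty mapping and None (no metadata available)
        return {}
    return {key: str(doc_info[key]) for key in SENSITIVE_KEYS if doc_info.get(key)}
```

Inside the printing loop, something like `sensitive_fields(docInfo)` would then give just the entries worth worrying about before a file is shared.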


#### Clear source information (hide yourself):

import sys
import os
import re
from PyPDF2 import PdfFileReader, PdfFileWriter

# Get all PDF files in the current directory
def getFiles():
    files = os.listdir()
    pdf_files = list()

    for file_name in files:
        # Keep only names ending in ".pdf"
        if re.match(r".*\.pdf$", file_name):
            pdf_files.append(file_name)

    return pdf_files


def get_page_num(file_name):
    with open(file_name, "rb") as f:
        # Get a PdfFileReader object
        my_pdf = PdfFileReader(f)

        # Get the number of pages
        page_num = my_pdf.getNumPages()
    print("PDF file %s has %s pages" % (file_name, page_num))
    return page_num

    

def create_new_pdf(file_names):
    # Cleaned copies are written to ./pure
    os.makedirs("./pure", exist_ok=True)

    for file_name in file_names:
        try:
            with open(file_name, "rb") as src:
                # Read the original PDF
                my_pdf = PdfFileReader(src)

                # Create a PdfFileWriter object; the original document
                # information is never copied into it
                new_pdf = PdfFileWriter()

                # Copy only the pages
                for i in range(0, get_page_num(file_name)):
                    new_pdf.addPage(my_pdf.getPage(i))

                # Write while the source file is still open
                with open("./pure/%s" % file_name, "wb") as dst:
                    new_pdf.write(dst)
            print("File %s has cleared metadata!" % file_name)
        except Exception as e:
            print("Could not process file %s (%s), it has been skipped automatically!" % (file_name, e))



if __name__ == "__main__":
    create_new_pdf(getFiles())
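
After cleaning, it is worth checking that nothing identifying slipped through. One crude check, sketched below with the standard library only, is to search the bytes of each cleaned copy for strings you know are sensitive, such as your own name or user name. `leftover_markers` and `check_folder` are hypothetical helpers, and the check cannot see inside compressed streams, so an empty result is a good sign rather than proof.

```python
from pathlib import Path


def leftover_markers(raw, names):
    """Return the entries of `names` that still occur in the raw file bytes."""
    hits = []
    for name in names:
        # Try the plain byte form and the UTF-16BE form of each name
        candidates = (name.encode("utf-8"), name.encode("utf-16-be"))
        if any(candidate in raw for candidate in candidates):
            hits.append(name)
    return hits


def check_folder(folder, names):
    """Map each PDF file name in `folder` to the names that were found in it."""
    report = {}
    for path in sorted(Path(folder).glob("*.pdf")):
        report[path.name] = leftover_markers(path.read_bytes(), names)
    return report
```

Calling `check_folder("./pure", ["Alex Tapanaris"])` returns a dictionary keyed by file name; any non-empty list deserves a closer look.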

It reminds me of a funny saying: they know it was you; they’re just too lazy to catch you!