Please do this before uploading resources to a certain library

## A related story: Anonymous hacker’s “revenge”

On December 10, 2010, the hacker group Anonymous posted a message explaining the general motivation behind their latest attack, code-named “Operation Payback” (Prefect, 2010). Angered by the companies that had dropped their support for WikiLeaks, Anonymous called for retaliation by launching distributed denial of service (DDoS) attacks on some of the organisations involved. The message was unsigned and unattributed, and was published as a PDF (Portable Document Format) file.

### This is the document from that time; I dug it up to satisfy my curiosity…

#### Although the document is unsigned, a script can quickly pull out its metadata, which includes the author’s name (the anonops_the_press_release.pdf shown here is the actual original file, and its metadata is still intact…).

#### A few days later, Greek police arrested Mr. Alex Tapanaris…

#### Mr. Alex Tapanaris’ “Operation Payback” ended prematurely

#### This example tells us: even if your technical skills are lacking, do not let others find out that you are the one who made the “seed”…

At present, files on a large number of domestic resource websites still carry sensitive metadata.

#### Take the technical books that this blogger downloaded from major domestic resource websites as an example:

(Don’t ask me where the resources come from. As a programmer, I know a thing or two about getting hold of resources…)

## To avoid becoming the next Mr. Alex Tapanaris when you upload “resources” to earn points on a “certain library”, here is a script the blogger has just finished for deleting PDF metadata in bulk, along with how to use it:

### Quickly clear PDF metadata

The effect after clearing

#### Get document metadata in bulk (check others):

```python
import os
import re
import sys

# Written against PyPDF2 < 3.0. PyPDF2 3.x removed these names in favour of
# PdfReader and reader.metadata.
from PyPDF2 import PdfFileReader


# Get all PDF files in the current directory
def getFiles():
    files = os.listdir()

    # If a single PDF file is given on the command line,
    # only that file's metadata is printed
    if len(sys.argv) > 1:
        files = [sys.argv[1]]

    pdf_files = list()

    for file_name in files:
        # Keep only names that end in .pdf
        if re.match(r".*\.pdf$", file_name):
            pdf_files.append(file_name)

    return pdf_files


# Print the meta information of each file
def printMeta(files):
    for filename in files:
        try:
            with open(filename, "rb") as f:
                pdfFile = PdfFileReader(f)
                docInfo = pdfFile.getDocumentInfo()
                print("=== meta information for file %s is :" % filename)
                for metaItem in docInfo:
                    print(metaItem, ":", docInfo[metaItem])
        except Exception as e:
            print("-- file %s metadata cannot be read, skipped!" % filename)


if __name__ == "__main__":
    filenames = getFiles()
    printMeta(filenames)
```

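The script prints every metadata entry, but only some of them point back at a person or a machine. As a rough sketch, the helper below narrows a metadata mapping (such as the object returned by `getDocumentInfo()`) down to those entries. The name `identifying_fields` and the key list `IDENTIFYING_KEYS` are my own assumptions for illustration and are not part of PyPDF2; the keys themselves are standard Info dictionary names.

```python
from typing import Mapping

# Info-dictionary keys that tend to identify a person, a tool, or a time.
# This selection is an assumption for illustration, not an exhaustive list.
IDENTIFYING_KEYS = ("/Author", "/Creator", "/Producer", "/CreationDate", "/ModDate")


def identifying_fields(doc_info: Mapping) -> dict:
    """Return only the entries of doc_info whose key is in IDENTIFYING_KEYS."""
    if doc_info is None:  # a PDF may have no Info dictionary at all
        return {}
    return {key: str(doc_info[key]) for key in IDENTIFYING_KEYS if key in doc_info}


if __name__ == "__main__":
    # Hypothetical metadata, shaped like what the script above prints
    sample = {"/Title": "Some book", "/Author": "Jane Doe", "/Producer": "SomeTool 1.0"}
    print(identifying_fields(sample))
```

An empty result means that none of the listed keys are present. It does not mean the file is free of every trace.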

#### Clear source information (hide yourself):

```python
import os
import re

# Written against PyPDF2 < 3.0. PyPDF2 3.x renamed these classes to
# PdfReader and PdfWriter.
from PyPDF2 import PdfFileReader, PdfFileWriter


# Get all PDF files in the current directory
def getFiles():
    files = os.listdir()
    pdf_files = list()

    for file_name in files:
        # Keep only names that end in .pdf
        if re.match(r".*\.pdf$", file_name):
            pdf_files.append(file_name)

    return pdf_files


def get_page_num(file_name):
    # Get a PdfFileReader object
    with open(file_name, "rb") as f:
        my_pdf = PdfFileReader(f)

        # Get the page count
        page_num = my_pdf.getNumPages()
    print("Page count of PDF file %s is %s" % (file_name, page_num))
    return page_num


def create_new_pdf(file_names):
    # Cleaned copies are written to ./pure
    os.makedirs("./pure", exist_ok=True)

    for file_name in file_names:
        try:
            with open(file_name, "rb") as src:
                # Read the original PDF
                my_pdf = PdfFileReader(src)

                # Create a PdfFileWriter object; only the pages are copied
                # into it, so the original Info dictionary is left behind
                new_pdf = PdfFileWriter()

                for i in range(0, get_page_num(file_name)):
                    page_info = my_pdf.getPage(i)
                    new_pdf.addPage(page_info)

                # Write while the source is still open: pages are read lazily
                with open("./pure/%s" % file_name, "wb") as dst:
                    new_pdf.write(dst)
            print("File %s has cleared metadata!" % file_name)
        except Exception as e:
            print("There is a problem with file %s encoding, it has been skipped automatically!" % file_name)


if __name__ == "__main__":
    create_new_pdf(getFiles())
```
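After cleaning, it is worth spot-checking the copies in `./pure`. The sketch below uses a different and much cruder technique than PyPDF2: it scans the raw bytes for plain-text entries such as `/Author (…)`. The names `scan_raw_info` and `check_directory` are hypothetical helpers of mine. The scan misses values stored as hex strings, escaped or UTF-16 strings, compressed object streams, and encrypted files, so a clean result is a hint rather than proof. Also note that, as far as I know, PyPDF2's writer stamps its own `/Producer` entry naming the library, which identifies the tool but not you.

```python
import re
from pathlib import Path

# Keys to look for; an illustrative subset of the PDF Info dictionary.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"CreationDate", b"ModDate")

# Matches "/Key (text)" where text has no nested parentheses or escapes.
INFO_RE = re.compile(rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(([^()\\]*)\)")


def scan_raw_info(data: bytes) -> dict:
    """Crude scan of raw PDF bytes for plain-text Info entries."""
    found = {}
    for key, value in INFO_RE.findall(data):
        # Keep the first occurrence of each key
        found.setdefault("/" + key.decode("ascii"), value.decode("latin-1"))
    return found


def check_directory(directory: str = "./pure") -> None:
    """Print any plain-text entries still present in the cleaned copies."""
    root = Path(directory)
    if not root.is_dir():
        print("Directory %s does not exist, nothing to check" % directory)
        return
    for path in sorted(root.glob("*.pdf")):
        leftovers = scan_raw_info(path.read_bytes())
        print(path.name, "->", leftovers if leftovers else "nothing found")


if __name__ == "__main__":
    check_directory()
```

Run it from the same directory as the cleaning script. For a more reliable check, run the metadata-printing script from earlier against the files in `./pure`.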

I know you did it, but I’m too lazy to catch you!

# If you like Python and like stories, please like me or follow me! Your support is the greatest encouragement to the author!
