Background

Subletting an apartment has worn me out lately. I had not lined up a new place early enough, so I ended up having to sublet my old one, otherwise the landlord would not return my deposit. What to do? Sublet it, and just get on with it. I had used Douban groups to pass on an apartment before, so I did the same this time (this is not an ad, don't hit me!). I posted like mad and bumped my own posts like mad: I joined 20 groups and published 20 posts. I originally wanted to write a script to bump the posts on a timer, but that would have meant handling verification codes, and I was too busy at work to do it, so automatic bumping was replaced with manual bumping, once every hour or two.

I finally rented the place out two days ago. I wanted to delete the posts on Douban because they contained too much personal information, but deleting them one by one was too much trouble. Worse, I could not find a delete option anywhere, and then I learned that Douban only lets you delete a post that has no comments. I looked at my posts and was instantly petrified: my own replies alone numbered in the hundreds. How long would it take to delete them one at a time? Instinctively, I started studying how Douban's delete-comment interface is called, and once I had figured out the rules, I rolled up my sleeves and got to work.

Douban interface

Some operations on Douban require authentication, so we need to log in with a browser beforehand and obtain the Cookie and a ck value. I am not sure exactly what ck is for, but most operations need this parameter, and its value basically never changes, so it is easy to find. The best way to obtain the ck value is to delete a comment or a post by hand first. The cid value is the ID of a comment. The code later in this post shows how these parameters are obtained.
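Since the Cookie and the ck value are personal login data, one option is to keep them out of the script file and read them from environment variables. The following is only a minimal sketch under that assumption and is not part of the original script; the helper name loadCredentials and the variable names DOUBAN_COOKIE and DOUBAN_CK are hypothetical.

```javascript
// Hypothetical helper: read the login Cookie and the ck value from an
// environment-like object instead of hard-coding them in the script.
// DOUBAN_COOKIE and DOUBAN_CK are assumed names, not anything Douban defines.
function loadCredentials(env) {
    var cookie = (env.DOUBAN_COOKIE || '').trim();
    var ck = (env.DOUBAN_CK || '').trim();
    if (!cookie || !ck) {
        throw new Error('Set DOUBAN_COOKIE and DOUBAN_CK before running the script');
    }
    return { cookie: cookie, ck: ck };
}

// Example with a fake environment; a real run would pass process.env.
var creds = loadCredentials({ DOUBAN_COOKIE: 'bid=example', DOUBAN_CK: 'abcd' });
console.log('ck loaded:', creds.ck);
```

The values returned here would take the place of the Cookie and ck variables that the script defines at the top.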




Technology selection

I have built crawler tools with Node before, and I am doing the same this time, but this one is relatively simple. It has only two dependencies: superagent and cheerio. Both are commonly used in crawlers, so I will not introduce them in detail.

The main code

Approach:

The main code is very simple: first fetch the list of my own posts, then request every post in that list to get its HTML, then scrape the comment IDs from each post, and once the IDs are in hand, perform the delete operations.
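The step that turns post links into post IDs can be sketched on its own. The script in this post takes the sixth segment of the URL after splitting on "/"; the variant below uses a regular expression instead, so links that are not post links are skipped rather than yielding a wrong ID. It assumes post URLs of the form https://www.douban.com/group/topic/<id>/, the pattern used in the script; the function names and the sample IDs are made up for illustration.

```javascript
// Extract the numeric topic ID from a post URL.
// Assumes URLs shaped like https://www.douban.com/group/topic/<id>/.
function extractTopicId(href) {
    var match = /\/group\/topic\/(\d+)/.exec(href || '');
    return match ? match[1] : null;
}

// Turn a list of hrefs into a de-duplicated list of topic IDs,
// skipping anything that is not a post link.
function collectTopicIds(hrefs) {
    var seen = {};
    var ids = [];
    hrefs.forEach(function (href) {
        var id = extractTopicId(href);
        if (id && !seen[id]) {
            seen[id] = true;
            ids.push(id);
        }
    });
    return ids;
}

// Sample links with invented IDs.
var sampleIds = collectTopicIds([
    'https://www.douban.com/group/topic/111/',
    'https://www.douban.com/group/topic/222/',
    'https://www.douban.com/group/topic/111/',
    'https://www.douban.com/group/explore'
]);
console.log(sampleIds);
```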

Things to note

One thing to note: when deleting comments, you will find that there are two kinds. Your own comments can be deleted by calling the interface directly, but other people's comments must be deleted through a different interface, and a reason has to be filled in. So the first step is to delete all of your own comments, the second step is to delete everyone else's comments, and the last step is to delete all the posts.
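The three-step order described above can be made explicit by building a deletion plan before sending any request. This is a sketch only: the input shape (a list of posts, each with its own comment IDs already split into "mine" and "others") is a hypothetical structure, and how to tell the two kinds of comment apart is not covered here. The action names match the function names used in the script.

```javascript
// Build an ordered list of deletion steps.
// `topics` is an assumed structure: [{ id, ownCids, otherCids }].
function buildDeletionPlan(topics) {
    var plan = [];
    // Step 1: your own comments, across all posts.
    topics.forEach(function (t) {
        t.ownCids.forEach(function (cid) {
            plan.push({ action: 'removeComment', topic: t.id, cid: cid });
        });
    });
    // Step 2: other people's comments (these need a reason when deleted).
    topics.forEach(function (t) {
        t.otherCids.forEach(function (cid) {
            plan.push({ action: 'removeOtherComment', topic: t.id, cid: cid });
        });
    });
    // Step 3: the posts themselves, once no comments are left.
    topics.forEach(function (t) {
        plan.push({ action: 'removeTopic', topic: t.id });
    });
    return plan;
}

// Invented sample data.
var plan = buildDeletionPlan([
    { id: '111', ownCids: ['a1', 'a2'], otherCids: ['b1'] },
    { id: '222', ownCids: [], otherCids: ['b2'] }
]);
console.log(plan.map(function (step) { return step.action; }).join(' > '));
```

Executing the plan one step at a time, waiting for each request to finish before sending the next, would guarantee that a post is only removed after its comments are gone.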

The directory structure

The directory structure is as simple as it gets: run npm init directly in a project folder,

then create a douSpider.js file and install superagent and cheerio with npm.



Code

douSpider.js

var superagent = require('superagent');
var cheerio = require('cheerio');

// Cookie
var Cookie = 'Fill in your Cookie value';
// Host
var host = 'www.douban.com';
// Origin
var Origin = 'https://www.douban.com';
// Referer
var Referer = 'https://www.douban.com/group/';
// The ck parameter
var ck = 'Fill in your CK value';

// Get the list of your own posts
var getTopic = function () {
    // Note: this must be the URL of the page that lists the group posts you published
    var url = 'https://www.douban.com/group/people/1732 * * * * * (fill in your own url)/publish';
    superagent
        .get(url)
        .set('Cookie', Cookie)
        .set('Host', host)
        .set('Referer', Referer)
        .end(function (err, res) {
            if (err) {
                return err;
            }
            var $ = cheerio.load(res.text);
            var urlList = [];
            $('.title a').each(function (index, element) {
                var $element = $(element);
                var href = $element.attr('href');
                // The post ID is the sixth segment of the post URL
                var href_item = href.split('/');
                urlList.push(href_item[5]);
            });
            for (var i = 0; i < urlList.length; i++) {
                console.log('Processing post number ' + i);
                spider(urlList[i]);
                // removeTopic(urlList[i]);
            }
        });
};

// Get the comment IDs (cid) of the specified post
var spider = function (topic) {
    var url = 'https://www.douban.com/group/topic/' + topic + '/';
    superagent
        .get(url)
        // What to do once the request has finished
        .end(function (err, res) {
            if (err) {
                console.log(err);
                return err;
            }
            var $ = cheerio.load(res.text);
            var cid = [];
            $('#comments li').each(function (index, element) {
                var $element = $(element);
                cid.push($element.attr('data-cid'));
            });
            for (var i = 0; i < cid.length; i++) {
                console.log('Deleting comment ' + i + ' of post ' + topic);
                removeComment(topic, cid[i]);
                removeOtherComment(topic, cid[i]);
            }
        });
};

// Remove a post
var removeTopic = function (topic) {
    var url3 = 'https://www.douban.com/group/topic/' + topic + '/remove?ck=' + ck;
    superagent
        .get(url3)
        .set('Cookie', Cookie)
        .set('Host', host)
        .set('Referer', Referer + 'topic/' + topic + '/')
        .end(function (err, res) {
            if (err) {
                console.log(err);
                return err;
            }
            console.log('Deleted post successfully');
        });
};

// Remove other people's comments (this interface requires a reason)
var removeOtherComment = function (topic, cid) {
    var url2 = 'https://www.douban.com/group/topic/' + topic + '/remove_comment?cid=' + cid;
    superagent
        .post(url2)
        .send({'cid': cid, 'ck': ck, 'reason': 'other_reason', 'submit': 'sure'})
        .set('Cookie', Cookie)
        .set('Host', host)
        .set('Origin', Origin)
        .set('Content-Type', 'application/x-www-form-urlencoded')
        .set('Referer', Referer + 'topic/' + topic + '/remove_comment?cid=' + cid)
        .end(function (err, res) {
            if (err) {
                console.log(err);
                return err;
            }
            console.log('Deleted successfully');
        });
};

// Remove your own comments
var removeComment = function (topic, cid) {
    var url1 = 'https://www.douban.com/j/group/topic/' + topic + '/remove_comment';
    superagent
        .post(url1)
        .send({'cid': cid, 'ck': ck})
        .set('Cookie', Cookie)
        .set('Host', host)
        .set('Origin', Origin)
        .set('Content-Type', 'application/x-www-form-urlencoded')
        .set('Referer', Referer + 'topic/' + topic + '/')
        .end(function (err, res) {
            if (err) {
                console.log(err);
                return err;
            }
            console.log('Deleted successfully');
        });
};

getTopic();

Usage

This script is on my personal Git. If you are interested, you can fork it or give it a star. Thank you very much.

Usage instructions are also on Git, though I expect you already know how: just run node followed by the script file name, for example node douSpider.js.

Git portal




Known issues

At present this script can only delete all the comments on all of your posts, so please use it with care. All of my posts were sublet listings, so I could delete them in one go. If some of your posts contain important information, note that you can call the individual functions yourself and pass in the ID of the specific post you want to delete.
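One way to limit the deletion to specific posts is to filter the scraped list of post IDs against a hand-written list before anything is deleted. This is a minimal sketch of that idea and not part of the original script; selectTopics and the sample IDs are hypothetical.

```javascript
// Keep only the topic IDs that were explicitly listed for deletion.
// `allowedIds` is a list you maintain by hand; IDs are compared as strings.
function selectTopics(allIds, allowedIds) {
    var allowed = {};
    allowedIds.forEach(function (id) {
        allowed[String(id)] = true;
    });
    return allIds.filter(function (id) {
        return allowed[String(id)] === true;
    });
}

// Invented sample: three posts were found, only two should be deleted.
var toDelete = selectTopics(['111', '222', '333'], ['333', 111]);
console.log(toDelete);
```

In the script, the filtered list would be looped over in place of urlList, so that spider and removeTopic are only called for the posts you listed.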