My job search has been going slowly lately, and few companies are interested. Watching friends show off HR messages in their WeChat Moments every day, I find myself missing the old Lagou forum, where you could just post your resume and let interested companies come to you. Now that the forum is gone, every company's job posting reads the same, and it's hard to tell which one is what I want.

Since I used to build automated testing tools, and there are plenty of job sites, why not automate resume delivery? If you're reading this, you're probably having as much trouble finding a job as I am; hopefully the automation ideas and scripts here will help.

Before we get started: the key to driving any of these sites is the CSS selector. If you're not familiar with selectors, you can lean on Firefox's "Copy CSS Selector" feature.

Many parameters below need to be filled in with a selector. As for how I arrived at each one: partly experience, partly that same tool. (I know Chrome has the feature too, but Firefox's algorithm is better.)
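To make this concrete, here is the kind of string the feature produces. The selector below is the one used later for Lagou's login button; you can sanity-check any copied selector right in the devtools console:

	// paste into the browser console on the target page to verify a copied selector;
	// this particular one reappears later in the Lagou login script
	const el = document.querySelector('form.active > div:nth-child(5) > input:nth-child(1)');
	console.log(el); // logs the matched element, or null if the selector is wrong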

Enough talk. The project is here.

1. Lagou

Lagou is a recruitment site specializing in the Internet industry. Judging by the calls I've received from Boss Zhipin, Lagou is pretty defenseless against competitors crawling its pages, so automated delivery should work.

The first step was obviously to install Puppeteer: run yarn add puppeteer. It failed, even with my Lantern proxy on.

I had to take the roundabout route: manually download Chrome Canary, then add PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true to .npmrc as the documentation describes. If you have a good VPN or ShadowSocks, the automatic download should just work.

Once everything is installed, try a Hello World:

	const puppeteer = require('puppeteer');

	const main = async () => {
		const browser = await puppeteer.launch({
			headless: false,
			slowMo: 250,
			executablePath: "C:\\Users\\Admin\\AppData\\Local\\Google\\Chrome SxS\\Application\\chrome.exe"
		});
		const page = await browser.newPage();
		await page.goto('https://www.lagou.com/');
		page.on('console', msg => console.log('PAGE LOG:', ...msg.args));
		await page.evaluate(() => console.log(`url is ${location.href}`));
		await browser.close();
	};

	main().catch(console.error);

First, there's an async main function as the entry point, called with a catch so that Node doesn't complain about an unhandled Promise rejection.

In puppeteer.launch, headless controls headless mode (off here so I can watch), slowMo adds a delay between actions for easier debugging, and executablePath points to where I put Chrome Canary.

After that the code just opens a new tab, navigates to the Lagou front page, forwards the page's console output to the terminal, and exits. Nothing more to say.

The second step is to log into an account and browse the job listings. To share the code without giving away my username and password, the obvious move is a configuration tool like dotenv, familiar to anyone who has deployed a Node server.

So I add require('dotenv').config(); on the first line of the file.
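For the record, a minimal sketch of how dotenv wires this up, using the lagou_name / lagou_pass variable names that appear below (the values are placeholders):

	// .env lives next to the script and stays out of version control:
	//   lagou_name=myaccount
	//   lagou_pass=mypassword
	require('dotenv').config();
	// after config(), each line of .env becomes a property of process.env
	console.log(process.env.lagou_name); // "myaccount"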

Then, of course, go directly to the login page, enter your username and password, and click Login.

	await page.goto('https://passport.lagou.com/login/login.html');
	// username
	await page.type('form.active > div:nth-child(1) > input:nth-child(1)', process.env.lagou_name);
	// password
	await page.type('form.active > div:nth-child(2) > input:nth-child(1)', process.env.lagou_pass);
	// log in
	await page.click('form.active > div:nth-child(5) > input:nth-child(1)');
	await page.waitForNavigation();
	// jump straight to the front-end listings
	await page.goto('https://www.lagou.com/zhaopin/webqianduan/?labelWords=label');
	const title = await page.title();
	console.log(title);

Here page.type types text into the matched input, and process.env.lagou_name naturally comes from the .env configuration.

Clicking the login button triggers a navigation, so page.waitForNavigation() waits out the post-login redirect.
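As an aside, to avoid a race where the navigation finishes before the wait even starts, a common Puppeteer pattern is to begin waiting and click in parallel. A variant of the login click using that pattern (not what the script above does, but worth knowing):

	// start the wait first so the navigation can't be missed
	await Promise.all([
		page.waitForNavigation(),
		page.click('form.active > div:nth-child(5) > input:nth-child(1)')
	]);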

After the login redirect you land on some page or other, but there's no need to simulate clicking through to the Node.js or web front-end categories, because those are just ordinary a-tag links. I can simply goto the target page directly.

Once on the job list I usually pick the city and sort by update time, after which Lagou reloads the page with a link like: www.lagou.com/jobs/list_w…

So pull in const { escape } = require('querystring'); and change the goto link to:

	// the keyword 'web前端' (web front-end) and city '天津' (Tianjin) must be URL-escaped
	await page.goto(`https://www.lagou.com/jobs/list_${escape('web前端')}?px=new&city=${escape('天津')}#order`);

The third step has to be the automatic delivery itself. Ideally we'd look around and make up our own minds about which jobs to apply to, but that program is not easy to write, so the current simple and crude method is:

Get the 15 positions on the first page of the job list, filter them, and deliver to the first one that survives. Once you've delivered, Lagou automatically hides positions you've already applied to, so the process can simply repeat.

So:

	const jobs = await page.$$eval('#s_position_list > ul > li', positionList =>
		positionList.map(function mapPosition(position) {
			const dataset = position.dataset;
			const [salary1, salary2] = dataset.salary.split('-');

			return {
				title: dataset.positionname.toLowerCase(),
				company: dataset.company.toLowerCase(),
				salaryLo: parseInt(salary1),
				salaryHi: parseInt(salary2),
				id: parseInt(dataset.positionid)
			};
		})
	);

page.$ runs document.querySelector inside the page, and page.$$ runs document.querySelectorAll. Both have an eval counterpart that takes a callback for processing the result. In other words, page.$$eval performs document.querySelectorAll and then massages whatever it matched.
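A minimal sketch of the difference between the two eval variants (the selectors here are just illustrations):

	// page.$eval: querySelector one element, run the callback on it inside the page
	const firstTitle = await page.$eval('h1', el => el.textContent);
	// page.$$eval: querySelectorAll, the callback receives the whole array of matches
	const linkCount = await page.$$eval('a', links => links.length);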

Lagou's front end dates from the jQuery era, with the data written straight into DOM dataset attributes, so once we have the list of elements we can lift out exactly what we need:

  • positionname: the job title. As a front-end developer, I use it to filter out the Java jobs.
  • company: the company name. Useful for filtering out unfriendly companies or ones I'm already talking to.
  • salary: this needs a little processing before it can be used for filtering.
  • positionid: part of the link to the resume delivery page.

Therefore:

	function getJobLink(jobs) {
		const goodJobs = jobs.filter(function(job) {
			if (job.title.indexOf('java') > -1) {
				return false;
			}
			// other filter criteria
			return true;
		});
		if (goodJobs.length > 0) {
			const job = goodJobs[0];
			return `https://www.lagou.com/jobs/${job.id}.html`;
		}
		return null;
	}

You can add your own criteria inside that filter, say a salary floor or a company blacklist; a hypothetical sketch follows. After filtering, we take the first job that remains and piece together the link to its delivery page.
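For instance, extra checks like these could sit alongside the java test above (the threshold and the blacklist entries are made up):

	// a hypothetical predicate to call from the filter callback
	const companyBlacklist = ['some training school', 'some outsourcing shop'];

	function isAcceptable(job) {
		// salaries parse out of strings like "10k-20k", so salaryHi < 10 means "tops out under 10k"
		if (job.salaryHi < 10) {
			return false;
		}
		if (companyBlacklist.indexOf(job.company) > -1) {
			return false;
		}
		return true;
	}

Then the delivery itself: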

	const jobLink = getJobLink(jobs);
	//console.log(jobLink);

	await page.goto(jobLink);
	await page.click('.fr.btn_apply');

Get a deliverable link, jump to it, and click deliver…

There are at least two possible outcomes. One: Lagou says OK, and you can click "I know". The other: Lagou says my stated years of experience aren't enough and asks whether to confirm anyway. I'm applying for modern front-end and Node jobs, where even the founding figures have maybe 8 years of experience, so of course I ignore the absurd requirement of 5-10 years.

	await page.click('#delayConfirmDeliver').catch(() => {});
	await page.click('#knowed').catch(() => {});
	await page.waitForNavigation();

Click "confirm delivery", and don't crash if it isn't there; click "I know", and again don't crash if it isn't there. Finally, wait for the page to refresh.

With that, automatic delivery works; wrap it in a loop and it can keep delivering until it hits Lagou's cap.
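A rough sketch of that outer loop, where scrapeJobs is a hypothetical wrapper around the page.$$eval call shown earlier and the cap of 20 is arbitrary:

	async function deliverUntilCapped(page, listUrl, cap = 20) {
		for (let i = 0; i < cap; i++) {
			await page.goto(listUrl); // Lagou hides already-delivered jobs on reload
			const jobs = await scrapeJobs(page); // the page.$$eval('#s_position_list > ul > li', ...) call above
			const jobLink = getJobLink(jobs);
			if (jobLink == null) {
				break; // nothing left that passes the filters
			}
			await page.goto(jobLink);
			await page.click('.fr.btn_apply');
			await page.click('#delayConfirmDeliver').catch(() => {});
			await page.click('#knowed').catch(() => {});
			await page.waitForNavigation();
		}
	}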

2. Zhaopin

Zhaopin.com's interface is a mess, especially next to some of the more modern pages out there; it feels awful to use and I don't open it often.

There also seem to be more scam companies and training mills on Zhaopin, so I only automate refreshing my resume there; no delivery.

That said, Zhaopin really is a magical website: some login entrances require a captcha and some don't…

So the first point here: the login link must be the one from Baidu's search results page, which skips the captcha. The address is: ts.zhaopin.com/jump/index_…

Code-wise there is only one magical detail: if the popover that appears after login isn't closed, none of the a-tag links on the page can be clicked.

	await page.click('.Delivery_success_popdiv_title span.fr').catch(() => {});
	await page.click('.amendBtn');
	await page.waitForNavigation();

So here we click the X whether or not it succeeds, then click "edit resume" and wait for the navigation. See zhilian/index.js in the project for the code around login and the refresh click.

3. 100Offer

100Offer describes itself as a job site that “enables the best people to meet better opportunities.”

This site isn't quite like the others: when you click a city or the next page, it fires an Ajax request whose response holds a single field, html, then uses jQuery to splice that HTML into the DOM tree. Truly magical.

So there's nothing much to say about the login and page navigation up front:

	const page = await browser.newPage();
	await page.goto('https://cn.100offer.com/signin');
	// username
	await page.type('#talent_email', process.env.o100_name);
	// password
	await page.type('#talent_password', process.env.o100_pass);
	// log in
	await page.click('#new_talent > div:nth-child(6) > input:nth-child(1)');
	await page.waitForNavigation();
	// jump straight to the job list
	await page.goto('https://cn.100offer.com/job_positions');
	// pick the city
	await page.click('.locations.filters > div:nth-child(3)');
	// no degree filter needed
	await page.click('.degree.filters > div:nth-child(7)');

Here's the tricky part: 100Offer can't filter by job keyword, so the page is littered with irrelevant Java jobs.

So here’s the point:

	async function getJobLink() {
		const jobs = await page.$$eval('.position-list > .position-item a.h3-font', links =>
			links.map(function mapLinks(link) {
				return {
					name: link.text.toLowerCase(),
					url: link.href
				};
			})
		);
		const goodJobs = jobs.filter(function (job) {
			if (job.name.indexOf('node') > -1) {
				return true;
			}
			if (job.name.indexOf('前端') > -1) { // '前端' = front-end
				return true;
			}
			return false;
		});
		if (goodJobs.length > 0) {
			return goodJobs[0].url;
		}
		// turn the page
		const nextEl = await page.$('a.next');
		if (nextEl == null) {
			return null;
		}
		await nextEl.click();
		return await getJobLink();
	}

This getJobLink is a recursive function that searches page after page.

First, grab every job name and link on the page, then filter by name; I only want Node and front-end positions, for example.

If nothing matches, it's time to turn the page. const nextEl = await page.$('a.next'); grabs the next-page button; if there's none to click, the search simply fails. If there is, click it and recurse.

Clicking through and delivering, warnings ignored, works just like Lagou, so I won't repeat it. The code is in the project's 100 folder.

4. Communities

Much of the time I also look for opportunities in developer communities, but the routine is identical every day:

Open communities a, b, and c, click into the jobs section, and skim the latest posts.

Why not automate reading posts?

I only know JS and Go, and Go is the better fit for this job; besides, the Mi notebook I take on business trips has 2 cores and 4 threads, and it would be silly not to use them.

First, the main function:

	func main() {
		results := make(chan *Result)
		var wg sync.WaitGroup
		wg.Add(len(sites))
		for _, site := range sites {
			matcher, ok := matchers[site.resType]
			if !ok {
				matcher = matchers["default"]
			}
			go func(matcher Matcher, url string) {
				err := doMatch(matcher, url, results)
				if err != nil {
					log.Println(err)
				}
				wg.Done()
			}(matcher, site.url)
		}
		go func() {
			wg.Wait()
			close(results)
		}()
		display(results)
	}

If you can't read Go: the go keyword starts a goroutine (think lightweight thread), a chan is a channel for passing results between goroutines, and sync.WaitGroup waits for a batch of goroutines to finish.

Here the results channel collects the results, and wg marks when each post-searching goroutine is done. I loop over the community links I want to visit and parse whatever each one returns. One more goroutine waits for them all and closes the channel, and the results are printed at the command line.

Different sites need different parsing schemes, hence the Matcher interface, defined as follows:

type Matcher interface {
	match(reader io.Reader) ([]*Result, error)
}

Matcher accepts an io.Reader, which is one of the most flexible ways to take input in Go.

For CNode, a community that provides a RESTful interface, the natural thing is to parse JSON.

type CNodeTopic struct {
	Title string `json:"title"`
	CreateAt time.Time `json:"create_at"`
	Content string `json:"content"`
}

type CNodeResp struct {
	Success bool `json:"success"`
	Data []CNodeTopic `json:"data"`
}

type CNodeJSON struct{}

Each CNode topic carries more attributes than these; I just pick out the ones I want.

And then the parsing:

	func (CNodeJSON) match(reader io.Reader) ([]*Result, error) {
		resp, err := ioutil.ReadAll(reader)
		if err != nil {
			return nil, err
		}
		cnodeResp := CNodeResp{}
		if err = json.Unmarshal(resp, &cnodeResp); err != nil {
			return nil, err
		}
		if !cnodeResp.Success || cnodeResp.Data == nil {
			return nil, fmt.Errorf("no response")
		}
		ret := make([]*Result, 0)
		for _, topic := range cnodeResp.Data {
			if time.Since(topic.CreateAt).Nanoseconds()-time.Hour.Nanoseconds()*24*dayLimit > 0 {
				continue
			}
			ret = append(ret, &Result{title: topic.Title, email: emailRe.FindString(topic.Content), content: topic.Content})
		}
		return ret, nil
	}

This is all everyday Go: parse, and return errors as they come. If all goes well, the only decision is how recent a post has to be; new-enough posts get appended to the results. Nothing else to say.

However, most sites aren't as convenient as CNode, and you have to parse HTML.

Take studygolang.com as an example. The first thing is to pull in goquery, Go's jQuery-alike (think cheerio in Node); without it you'd be recursively walking HTML nodes by hand… go get github.com/PuerkitoBio/goquery

And then the parsing:

	type StudyGolangHTML struct{}

	func (StudyGolangHTML) match(reader io.Reader) ([]*Result, error) {
		doc, err := goquery.NewDocumentFromReader(reader)
		if err != nil {
			return nil, err
		}
		ret := make([]*Result, 0)
		doc.Find(".topic").Each(func(i int, selection *goquery.Selection) {
			abbr := selection.Find("abbr")
			timeStr, _ := abbr.Attr("title")
			t, err := time.Parse("2006-01-02 15:04:05", timeStr)
			if err != nil {
				return
			}
			if time.Since(t).Nanoseconds()-time.Hour.Nanoseconds()*24*dayLimit > 0 {
				return
			}
			link := selection.Find(".title a")
			ret = append(ret, &Result{title: link.Text(), email: "", content: link.AttrOr("href", "")})
		})
		return ret, nil
	}

Each topic node on studygolang.com can be selected with .topic; the time lives in the title attribute of an abbr tag, and the title and link sit under .title a.

If there's another site you want to search, such as Rust-China or Kotlin-China, it will usually be JSON or HTML as well, so adapting the code shouldn't be too hard.

Conclusion

I can't believe you made it to the end. Good luck finding a suitable job soon.