Selenium crawler: capturing images from web pages

Preface

For much the same reason as before, I still have to grind through online courses after work, just as I did in college.

Main text

Straight to the point: capturing an image. You all know which image needs capturing (yes, I mean you, CAPTCHA). Other images don't need capturing at all; you just take the address and download the file. That doesn't work for a CAPTCHA, because requesting the same address a second time returns different content.

I don't know why Selenium can't simply export the image of a given img element; it is maddening.

From what I found, there are two main approaches. The first simulates mouse operations: right-click the CAPTCHA, choose Save as, save it locally, and then read the file back. I don't understand why anyone would go this way. The big problem with right-click saving is that you have no control over where the image is stored, which makes the crawler hard to reuse anywhere else. So I passed on it.

The second takes a screenshot of the whole page first and then crops the small CAPTCHA image out of it, using the size and position of the img element. That is ideal in theory, but after many tests I found that different browsers produce different offsets and that the cropped image comes out scaled. I don't know what is going wrong, so I could only fine-tune with hard-coded values, which is painful. It's not perfect, but it's good enough. Here is the code.
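One possible explanation for the offsets and scaling, offered here only as a guess and not something the original tests confirmed, is that the element position is reported in CSS pixels while the screenshot is in device pixels (for example under Windows display scaling). The sketch below shows how a crop rectangle could be scaled and clamped under that assumption. The names CaptchaCrop, ToScreenshotRect and the scale parameter are hypothetical; the scale would have to come from something like window.devicePixelRatio.

```csharp
using System;
using System.Drawing;

public static class CaptchaCrop
{
    // Converts an element rectangle measured in CSS pixels into screenshot pixels.
    // "scale" is an assumed CSS-to-device pixel ratio; it is not measured by the
    // original code and is only a hypothesis about where the offsets come from.
    public static Rectangle ToScreenshotRect(RectangleF cssRect, double scale, Size screenshotSize)
    {
        // Round outwards so the crop never cuts into the CAPTCHA.
        int left = (int)Math.Floor(cssRect.Left * scale);
        int top = (int)Math.Floor(cssRect.Top * scale);
        int right = (int)Math.Ceiling(cssRect.Right * scale);
        int bottom = (int)Math.Ceiling(cssRect.Bottom * scale);

        // Keep the rectangle inside the screenshot bounds.
        left = Clamp(left, 0, screenshotSize.Width);
        top = Clamp(top, 0, screenshotSize.Height);
        right = Clamp(right, left, screenshotSize.Width);
        bottom = Clamp(bottom, top, screenshotSize.Height);

        return Rectangle.FromLTRB(left, top, right, bottom);
    }

    // Math.Clamp is not available on older .NET Framework versions.
    private static int Clamp(int value, int min, int max)
    {
        return Math.Max(min, Math.Min(max, value));
    }
}
```

If the scale turns out to be 1 in a given browser, the function only clamps, so the hard-coded adjustments could be tried on top of it rather than instead of it.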

Code

This is written in C# (WinForm). It is only a code fragment, but the screenshot, cropping and saving parts can still serve as a reference.

The CAPTCHA recognition at the end was added in passing. It uses a Baidu interface, and its accuracy is worrying.

setStatusMsg1("Extracting captcha");
var verifyCode = currentBrowser.WebDriver.FindElement(By.XPath("/html/body/spk-root/spk-login-page/div/section/div[3]/div[2]/div[2]/form/div[3]/div[2]/img"));
setStatusMsg2("Browser screenshot");
// Set the browser size
currentBrowser.WebDriver.Manage().Window.Size = new Size(1280, 800);
// Browser screenshot
var screenshot = ((ITakesScreenshot)currentBrowser.WebDriver).GetScreenshot();
var screenImagePath = Path.Combine(Path.GetTempPath(), $"{System.Guid.NewGuid().ToString("N")}.jpg");
// Save the screenshot
setStatusMsg2("Save the screenshot");
screenshot.SaveAsFile(screenImagePath, ScreenshotImageFormat.Jpeg);

// Clipping the verification code
setStatusMsg2("Clipping verification code");
var codeImagePath = screenImagePath.Replace(".jpg", "_code.jpg");

int x, y, width, height;

// Use js to get information such as the position of the image
switch (currentBrowser.BrowserType) {
    case BrowserEnum.Chrome:
        x = Convert.ToInt32((long)((IJavaScriptExecutor)currentBrowser.WebDriver).ExecuteScript("return document.querySelector('body > spk-root > spk-login-page > div > section > div.login-body.clearfix > div.login-right > div.form-con > form > div.qr-code > div.qrcode-box.clearfix > img').x"));
        y = Convert.ToInt32((long)((IJavaScriptExecutor)currentBrowser.WebDriver).ExecuteScript("return document.querySelector('body > spk-root > spk-login-page > div > section > div.login-body.clearfix > div.login-right > div.form-con > form > div.qr-code > div.qrcode-box.clearfix > img').y"));
        width = verifyCode.Size.Width;
        height = verifyCode.Size.Height;
        // Adjust the captcha position
        //x += 20;
        //y += 8;
        // Adjust the verification code size
        width += 30;
        height += 15;
        break;
    default:
        x = verifyCode.Location.X;
        y = verifyCode.Location.Y;
        width = verifyCode.Size.Width;
        height = verifyCode.Size.Height;
        break;
}

var codeBitmap = new Bitmap(width, height);
var codeGraphics = Graphics.FromImage(codeBitmap);

var destRec = new Rectangle(0, 0, width, height);
var srcRec = new Rectangle(x, y, width, height);
setStatusMsg2(srcRec.ToString());
codeGraphics.DrawImage(new Bitmap(screenImagePath), destRec, srcRec, GraphicsUnit.Pixel);
// Save the captcha image
codeBitmap.Save(codeImagePath, ImageFormat.Jpeg);

// Display the image
picVerifyCode.Load(codeImagePath);
picVerifyCode.Tag = codeImagePath;

// Verification code identification
var result = BaiduAiSdk.VerifyCode(codeImagePath);
if (result.Length > 0) {
    txtVerifyCode.Text = result;
    FrmTips.ShowTipsSuccess(this, $"Verification code identified successfully, result: {result}");

    var input = currentBrowser.WebDriver.FindElement(By.XPath("/html/body/spk-root/spk-login-page/div/section/div[3]/div[2]/div[2]/form/div[3]/div[2]/input"));
    input.Clear();
    input.SendKeys(result);
} else
    FrmTips.ShowTipsError(this, "Verification code recognition failed, please try again!");
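The Chrome branch above runs two scripts with a long selector to read x and y. As a rough sketch, the same values could be read in a single call by passing the element already found as a script argument and returning a plain object. ElementRect, Script and FromScriptResult are hypothetical names, and the sketch assumes that the .NET bindings hand a returned JavaScript object back as a string-keyed dictionary whose numbers arrive as either long or double.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

public static class ElementRect
{
    // Script meant for IJavaScriptExecutor.ExecuteScript(Script, element).
    // It copies the fields into a plain object literal so they serialize reliably.
    public const string Script =
        "var r = arguments[0].getBoundingClientRect();" +
        "return { x: r.left, y: r.top, width: r.width, height: r.height };";

    // Converts the script result into a RectangleF.
    // Convert.ToSingle accepts both long and double, so whole and fractional
    // values are handled without a hard cast.
    public static RectangleF FromScriptResult(IDictionary<string, object> result)
    {
        return new RectangleF(
            Convert.ToSingle(result["x"]),
            Convert.ToSingle(result["y"]),
            Convert.ToSingle(result["width"]),
            Convert.ToSingle(result["height"]));
    }
}

// Intended use with the names from the fragment above (not executed here):
//   var raw = ((IJavaScriptExecutor)currentBrowser.WebDriver)
//       .ExecuteScript(ElementRect.Script, verifyCode);
//   var rect = ElementRect.FromScriptResult((IDictionary<string, object>)raw);
```

Note that getBoundingClientRect reports positions relative to the viewport, so this only lines up with the screenshot when the screenshot covers the viewport as well.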

References

Python + Selenium solve the problem of image authentication code login or register: www.mscto.com/python/6073…
Selenium + ChromeDriver: kaiwu.lagou.com/java_archit…
Selenium positioning and switching frame/iframe: blog.csdn.net/huilan_same…
Selenium: cuiqingcai.com/2599.html

Welcome to communicate

The Program Design Lab focuses on exploring popular new Internet technologies and on the team's agile development practice. Reply to the public account "Program Design Lab" to obtain related technical articles and materials, covering Linux, Flutter, C#, .NET Core, Android, Kotlin, Java, Python and more. You can also leave a message through the public account.

Blog Park: www.cnblogs.com/deali/
Play code studio: live.bilibili.com/11883038
Zhihu: www.zhihu.com/people/deal…
