This article is published by NetEase Cloud.



“Knowing Things from Learning” is a brand column created by NetEase Cloud Yidun. The name comes from the “Real Knowledge” chapter of Wang Chong’s Han-dynasty work Lunheng: people differ in ability; they come to know the truth of things by learning, and become wise by asking; without asking, they do not know. Through technical deep dives, trend analysis, and personal reflections, “Knowing Things from Learning” hopes to bring you real takeaways, broaden your horizons, and help you become a different you. Of course, if you have good insights to share, you are welcome to contribute by email ([email protected]).


In Part 1 of this series, we looked at how machine learning can help hackers gather information and simulate attacks. Here is Part 2:


3. Unauthorized access using machine learning


Next is gaining unauthorized access to user accounts. Imagine a cybercriminal who needs unauthorized access to a user’s session. The obvious way to do this is to keep trying to log in with guessed passwords, and one of the most annoying obstacles to large-scale attacks is the CAPTCHA. Many computer programs can solve simple CAPTCHA tests; the hardest part is object segmentation, and many research papers describe methods for breaking CAPTCHAs. On 27 June 2012, Claudia Cruz, Fernando Uceda, and Leobardo Reyes published one of the first examples of machine learning applied to this problem. They used a support vector machine (SVM) to break the system running on reCAPTCHA images with 82% accuracy, after which CAPTCHA mechanisms were significantly improved. Then came a wave of papers using deep learning to break CAPTCHAs. In 2016, an article [1] detailed how to break simple CAPTCHAs with 92% accuracy using deep learning.
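To make the SVM approach concrete, here is a minimal sketch of a character classifier of the kind such papers describe. It assumes the hardest step, segmenting the CAPTCHA into individual characters, has already been done; the random arrays below are illustrative stand-ins for real labeled character crops.

```python
# Minimal sketch of an SVM character classifier for CAPTCHAs.
# Assumes segmentation is already solved; random data stands in
# for real labeled 20x20 grayscale character crops.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((1000, 20 * 20))     # flattened character images
y = rng.integers(0, 36, size=1000)  # labels: 26 letters + 10 digits

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # one classifier per character slot
clf.fit(X_train, y_train)
print("per-character accuracy:", clf.score(X_test, y_test))
```

With real segmented data, per-character accuracies compound across the whole CAPTCHA, which is why segmentation quality dominates the end-to-end break rate.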


Another study applied one of the latest advances in image recognition [2], a 34-layer deep residual network, to break the CAPTCHA of the popular Indian website IRCTC with 95–98% accuracy. Most of these papers target character-based CAPTCHAs.
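As a rough illustration (not the paper’s exact setup), a 34-layer residual network can be repurposed for per-character CAPTCHA recognition in a few lines with torchvision; the class count and crop size below are assumptions.

```python
# Sketch: a 34-layer residual network for CAPTCHA character recognition.
# Class count (36) and crop size are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet34

model = resnet34(weights=None)                   # train from scratch on CAPTCHA crops
model.fc = nn.Linear(model.fc.in_features, 36)   # 26 letters + 10 digits

crops = torch.randn(8, 3, 64, 64)  # a batch of segmented character crops
print(model(crops).shape)          # torch.Size([8, 36])
```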


One of the most inspiring papers was presented at the Black Hat conference: a study called “I Am Robot.” The researchers broke the latest semantic image CAPTCHAs, compared various machine learning algorithms, and broke Google’s reCAPTCHA with 98% accuracy.


Even more striking, a new article notes that scientists expect to break CAPTCHAs with 100% accuracy before long. Another area where cybercriminals can benefit from machine learning is brute-force password cracking.


Long before deep learning became a hot topic, Markov models were the first used to generate password “predictions,” back in 2005. If you are familiar with modern recurrent neural networks and LSTMs, you have probably seen networks that generate text in the style of their training data: give the network a play by Shakespeare, and it will produce new text in the same style. The same idea can be used to generate passwords. If we train the network on the most common passwords, it can generate many similar ones. Researchers took this approach and applied it to password guessing with good results, producing password lists better than traditional transformation rules, such as swapping letters for symbols (for example, “s” for “$”).
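For intuition, here is a toy character-level Markov generator in the spirit of that early work. The tiny corpus is purely illustrative; a real attacker would train on millions of leaked passwords.

```python
# Toy character-level Markov model for password "prediction".
# The corpus is illustrative; real attacks use huge leaked lists.
import random
from collections import defaultdict

corpus = ["password1", "qwerty123", "dragon", "letmein", "p@ssw0rd"]

# Count character bigram transitions, with "^"/"$" as start/end markers.
transitions = defaultdict(list)
for pw in corpus:
    chars = ["^"] + list(pw) + ["$"]
    for a, b in zip(chars, chars[1:]):
        transitions[a].append(b)

def generate(max_len=16):
    out, cur = [], "^"
    while len(out) < max_len:
        cur = random.choice(transitions[cur])
        if cur == "$":          # end-of-password marker
            break
        out.append(cur)
    return "".join(out)

for _ in range(5):
    print(generate())
```

The generated strings follow the statistics of the training passwords, which is exactly why such lists outperform fixed letter-to-symbol substitution rules.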


Another approach, described in the paper “PassGAN: A Deep Learning Approach for Password Guessing” [3], is to use GANs (generative adversarial networks) to generate passwords. A GAN is a special type of neural network composed of two networks: one is usually called the generator, and the other the discriminator. While one network produces adversarial examples, the other tests whether it can tell them apart from real data. The core idea is to train the networks on real password data collected from recent data breaches. After the release of the largest such database, 1.4 billion passwords aggregated from numerous breaches, realizing this idea looks promising for cybercriminals.
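A drastically simplified sketch of the idea follows. PassGAN itself uses a Wasserstein GAN with residual blocks; this toy version uses plain feed-forward networks, and random tensors stand in for one-hot encoded leaked passwords.

```python
# Toy GAN in the spirit of PassGAN (the paper uses a Wasserstein GAN
# with residual blocks; sizes and architectures here are assumptions).
import torch
import torch.nn as nn

VOCAB, MAXLEN, NOISE = 64, 10, 128  # assumed charset size / password length

G = nn.Sequential(                   # noise -> per-position character logits
    nn.Linear(NOISE, 256), nn.ReLU(),
    nn.Linear(256, MAXLEN * VOCAB),
)
D = nn.Sequential(                   # flattened sequence -> realness score
    nn.Linear(MAXLEN * VOCAB, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, MAXLEN * VOCAB)  # stand-in for one-hot encoded leaks

for step in range(100):
    fake = torch.softmax(G(torch.randn(32, NOISE)).view(32, MAXLEN, VOCAB), dim=-1)
    fake = fake.view(32, -1)

    # Discriminator step: real passwords -> 1, generated -> 0.
    loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to fool the discriminator.
    loss_g = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

After training on real leaks, sampling the generator yields candidate passwords that mimic the distribution of human-chosen ones.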


4. Attacks using machine learning


The fourth area where cybercriminals want to exploit machine learning is the attack itself. In general, attacks have three broad goals: espionage, sabotage, and fraud. Most of them are carried out with malware, spyware, ransomware, or some other kind of malicious program, which victims install after a phishing attack or which attackers plant on their machines. Either way, the attacker needs to somehow get the malware onto the victim’s machine.


Using machine learning to protect against malware may be the first commercially successful application of machine learning in cybersecurity, and there have been many papers describing how different techniques can detect malware using artificial intelligence (AI).

How can cybercriminals use machine learning to create malware? The first well-known example of AI used to create malware was presented in a 2017 paper entitled “Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN” [4], in which the authors built a network called MalGAN.


This study proposes an algorithm for generating malware instances that can bypass black-box machine-learning detection models. The proposed algorithm performs far better than traditional gradient-based generation algorithms and can reduce the detection rate to almost zero. The system takes an original malware sample as input and outputs an adversarial sample based on that sample and some noise. The nonlinear structure of neural networks allows them to generate more complex, flexible examples to deceive the target model.
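A minimal sketch of that setup might look as follows. The OR-style perturbation rule (features can be added, never removed, so the program keeps working) follows the paper’s general idea, but the feature size and architecture here are assumptions, and training against a substitute detector is omitted.

```python
# Sketch of a MalGAN-style generator: malware features + noise -> an
# adversarial variant. Only *adds* features (logical OR with the
# original) so functionality is preserved. Sizes/architecture are
# assumptions; training against a substitute detector is omitted.
import torch
import torch.nn as nn

FEAT, NOISE = 128, 16  # e.g. binary API-call indicator features

gen = nn.Sequential(
    nn.Linear(FEAT + NOISE, 256), nn.ReLU(),
    nn.Linear(256, FEAT), nn.Sigmoid(),
)

def adversarial(malware):
    """Perturb a batch of binary malware feature vectors."""
    z = torch.rand(malware.size(0), NOISE)
    delta = gen(torch.cat([malware, z], dim=1))
    # OR with the original features: bits can be added, never removed.
    return torch.clamp(malware + (delta > 0.5).float(), max=1.0)

malware = (torch.rand(4, FEAT) > 0.7).float()
print(adversarial(malware).shape)  # torch.Size([4, 128])
```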


I mentioned earlier that attacks have three main purposes: espionage, sabotage, and fraud, most of which are carried out using malware. However, there is another relatively new type of attack that can be viewed as sabotage: crowdturfing [5], in short, the malicious use of crowdsourcing services. For example, an attacker pays workers small amounts of cash to write negative online reviews of competing businesses. Because they are written by real people, these reviews often go undetected, since automated tools look for software-generated attacks.


Other options include massive DDoS attacks or the generation of fake information. With the help of machine learning, cybercriminals can reduce the cost of these attacks and automate them. The paper “Automated Crowdturfing Attacks and Defenses in Online Review Systems,” published in September 2017, describes a system that generates fake reviews on Yelp. The result is not just convincing reviews that go undetected, but reviews that readers rate as more useful than human-written ones.
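The underlying generator in that study is a character-level recurrent network: train it on real review text, then sample one character at a time. A toy version is sketched below; the corpus, sizes, and omitted training loop are placeholders.

```python
# Toy character-level LSTM for review generation. Corpus and sizes
# are placeholders; the training loop is omitted for brevity.
import torch
import torch.nn as nn

text = "the food was great and the service was friendly. "
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, 32)
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)
    def forward(self, x, state=None):
        out, state = self.lstm(self.emb(x), state)
        return self.head(out), state

model = CharLSTM(len(chars))
# ... training loop over next-character prediction goes here ...

# Sampling: feed each predicted character back in.
idx = torch.tensor([[stoi["t"]]])
state, out = None, []
for _ in range(40):
    logits, state = model(idx, state)
    probs = torch.softmax(logits[0, -1], dim=-1)
    idx = torch.multinomial(probs, 1).view(1, 1)
    out.append(chars[idx.item()])
print("".join(out))
```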


5. Machine learning for cybercrime automation


Experienced hackers can use machine learning to automate essential tasks in many different areas. It is hard to say exactly when this automation will arrive, but since cybercrime groups have hundreds or even thousands of members, different kinds of such software are likely to emerge, and they may affect us in unexpected ways.


As for specific cybercrimes, there is a new term, hivenet [6]: a smart botnet. The idea is that while a botnet is managed manually by cybercriminals, a hivenet has a brain of its own that can react to a particular event and change its behavior accordingly. Multiple bots sit on a device at the same time and, depending on their tasks, decide among themselves which one will use the victim’s resources, like a chain of parasites in an organism.


Original article: Knowing Things from Learning | In the AI era, how do hackers sharpen their “blades”? (Part 2)



Appendix

  • “Deep learning breaks simple CAPTCHAs with 92% accuracy” [1]
  • “A 34-layer deep residual network cracks CAPTCHAs” [2]
  • “PassGAN: A Deep Learning Approach for Password Guessing” [3]
  • “Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN” [4]
  • “Malicious use of crowdsourcing services: Crowdturfing” [5]
  • “Smart botnets: Hivenets” [6]


Learn more about NetEase Cloud:

NetEase Cloud official website: www.163yun.com/

New user package: www.163yun.com/gift

NetEase Cloud community: sq.163yun.com/