This article is published by NetEase Cloud.


“Know Things by Learning” is a column created by NetEase Yidun. The name comes from the chapter “Real Knowledge” of “Lunheng” (“Balanced Discourses”) by Wang Chong of the Han dynasty: people differ in ability; they come to know the truth of things through learning, and only after learning do they become wise; without asking, they do not know. “Know Things by Learning” hopes to offer you something of value through practical technical articles, trend analysis, and reflections from people in the field, and also to broaden your horizons and help you become a different you. If you have insights or experience to share, you are welcome to contribute by email ([email protected]).


Machine learning (ML) is rapidly being applied to cybersecurity as well as other technical fields, and the past year has produced numerous examples of its use in both defense and attack. Most articles on the topic stay at the level of philosophical argument (I recommend reading “The Truth about Machine Learning in Cybersecurity” [1]), and the picture they paint of cybercriminals' machine learning does not match what one might imagine.


Nevertheless, the U.S. intelligence community is seriously concerned about AI [2]. Recent findings suggest that cybercriminals are working on ways to use machine learning to make attacks more aggressive, faster, and cheaper to execute.


The goal of this article is to systematize real-world information about how machine learning may be deployed for malicious purposes in cyberspace. It is intended to help information security teams prepare for upcoming threats.


1. Tasks of cybercriminals:


Every task a cybercriminal performs, from initial information gathering to system compromise, can be assisted by machine learning. These tasks fall into the following categories:


  • Information gathering – preparing for an attack.
  • Impersonation – attempting to imitate a trusted person.
  • Unauthorized access – circumventing restrictions on access to certain resources or user accounts.
  • Attacks – performing the actual attack, such as malware or DDoS.
  • Automation – automating exploitation and post-exploitation.


2. Machine learning for information gathering:


Gathering information is the first step in a cyber attack, whether it targets an individual or a group of people. The better the information you gather, the better your chance of a successful attack.


When preparing a phishing or infection campaign, hackers may use classification algorithms to decide whether a potential victim belongs to a group worth attacking. Imagine that after collecting thousands of email addresses, you send malware only to the people flagged as most likely to click the link, which makes it less likely that a security team gets involved. Many factors could help here. To take a simple example, you can separate users who write about IT topics on social networks from those who focus on food and cats. As an attacker, I would choose the latter group, because they are less likely to recognize a cyber attack. This separation can be done with a variety of clustering and classification methods, ranging from K-means and random forests to neural networks.
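

The grouping step can be pictured with a toy K-means run. This is only a sketch under stated assumptions: the keyword sets, the two-feature representation, and the sample posts are all made up for illustration, and a real system would learn its features from data rather than from hand-picked word lists.

```python
from collections import Counter

# Hypothetical keyword lists, chosen only for this illustration.
IT_WORDS = {"linux", "python", "firewall", "server", "patch"}
LIFESTYLE_WORDS = {"recipe", "cat", "dinner", "kitten", "coffee"}


def featurize(post):
    """Map a post to (share of IT words, share of lifestyle words)."""
    words = post.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    it_share = sum(counts[w] for w in IT_WORDS) / total
    life_share = sum(counts[w] for w in LIFESTYLE_WORDS) / total
    return (it_share, life_share)


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Made-up sample posts standing in for collected social network text.
posts = [
    "patch your linux server tonight",
    "my cat stole my dinner again",
    "python script for firewall logs",
    "new recipe for coffee cake",
]
points = [featurize(p) for p in posts]
# Seed the two clusters with the first two posts to keep the run deterministic.
labels = kmeans(points, centroids=[points[0], points[1]])
print(labels)
```

With these four posts the IT-themed and lifestyle-themed posts land in separate clusters. The same grouping is equally useful to defenders, for example to decide which users should receive awareness training first.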


When gathering information for a targeted attack, the mission is not to collect as many individual targets as possible but to obtain as much information as possible about the relevant infrastructure. The idea is to automate all checks, including the collection of information about the network infrastructure. Existing tools such as network scanners and sniffers can analyze traditional networks, but the new generation of networks based on software-defined networking (SDN) is too complex for them. This is where machine learning can help. A little-known but interesting concept is the Know Your Enemy attack [3], which allows stealthy intelligence gathering about the configuration of a target SDN network; it is a relevant example of applying machine learning to information gathering tasks. Hackers can gather information ranging from security tools and network virtualization parameters to general network policies such as QoS. By analyzing the rules from one network device and then extrapolating the conditions and types of rules used elsewhere in the network, an attacker can infer sensitive information about the network configuration.


In the probing phase, the attacker tries to trigger the installation of flow rules on a specific switch. The specific characteristics of the probe traffic depend on the information the attacker is interested in.


In the next phase, the attacker analyzes the correlation between the probe traffic generated during the probing phase and the flow rules that were installed in response. From this analysis, he or she can infer which network policies are enforced for specific types of network traffic. For example, by using a network scanning tool during the probing phase, an attacker can learn that a defense policy is implemented by filtering network traffic. Done manually, this can take weeks of data collection, and you still need algorithms with pre-configured parameters; for example, it is hard to decide how many packets of a given type you need, because the number depends on various factors. With the help of machine learning, hackers can automate this process completely.


These are two examples, but in general, any information gathering task that takes a lot of time can be automated as well. For example, DirBuster, a tool for scanning for available directories and files, could be improved by adding a genetic algorithm, an LSTM, or a GAN to generate directory names that more closely resemble existing ones.
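

To make the genetic algorithm idea concrete, here is a minimal sketch of the select, crossover, and mutate loop. Everything in it is an assumption made for illustration: the `KNOWN` list, the fixed string length, and the fitness function (string similarity from the standard library's `difflib.SequenceMatcher`) are not part of DirBuster or of any published tool.

```python
import random
from difflib import SequenceMatcher

# Hypothetical names that were already observed; purely illustrative.
KNOWN = ["admin", "backup", "images", "uploads", "static"]
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def fitness(candidate):
    """Score a candidate by its best similarity to any known name (0.0 to 1.0)."""
    return max(SequenceMatcher(None, candidate, name).ratio() for name in KNOWN)


def mutate(candidate, rng):
    """Replace one randomly chosen character."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def crossover(a, b, rng):
    """Join a prefix of one parent to the suffix of the other."""
    cut = rng.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def evolve(generations=30, size=20, length=6, seed=0):
    """Run the loop and return the best string plus the best score per generation."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(length))
                  for _ in range(size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: size // 2]  # elitism: the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, history = evolve()
print(best, round(history[0], 2), round(history[-1], 2))
```

Because the best half of each generation is carried over unchanged, the best score never decreases. In this toy setup the search simply drifts toward the known names themselves, so it illustrates the mechanics of the loop rather than a useful generator.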


3. Machine learning for impersonation:


Cybercriminals use impersonation to attack victims in a variety of ways, depending largely on the communication channel. After sending an email or using social engineering, attackers can convince victims to follow a link to an exploit or malware. Even a telephone call can therefore be a means of impersonation.


Email spam is one of the safest areas in which to use machine learning, and I expect it to be one of the first where cybercriminals apply ML. Instead of writing spam by hand, they can “teach” a neural network to create spam that looks like real email.


With email spam, however, it is hard to mimic the writing pattern of a specific sender. If you email employees asking them to change their password or download an update in the name of the company's software administrator, you cannot write exactly the way the administrator does. You won't be able to copy the style unless you have seen a large number of that person's emails. Even so, the problem can be solved by phishing on social media.


The biggest advantage of social media phishing over email phishing is openness and easy access to personal information. You can observe and learn a user's behavior by reading his or her posts. This idea was demonstrated in a recent study called “Weaponizing Data Science for Social Engineering: Automated E2E Spear Phishing on Twitter” [4]. The study proposes SNAP_R, an automated tool that can significantly increase the effectiveness of phishing campaigns. Traditional automated phishing achieves roughly 5-14% accuracy, compared with 45% for manual spear phishing. Their method sits in between, with about 30% accuracy and up to 66% in some cases. They used Markov models to generate tweets based on a user's previous tweets and compared the results with a recurrent neural network, specifically an LSTM. The LSTM provides higher accuracy but takes longer to train.
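

The Markov model approach mentioned above is a classic text generation technique and is easy to sketch. The following is a generic first-order, word-level Markov chain, not the study's actual implementation; the function names and the sample timeline are invented for this example.

```python
import random
from collections import defaultdict


def build_chain(texts):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=10, seed=0):
    """Walk the chain from `start`, picking a random observed successor each step."""
    rng = random.Random(seed)
    words = [start]
    # Stop at the length limit or when the last word has no known successor.
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


# Made-up sample posts standing in for a user's earlier timeline.
timeline = [
    "just finished a great run in the park",
    "a great coffee in the morning",
    "the park is quiet in the morning",
]
chain = build_chain(timeline)
sentence = generate(chain, start="a")
print(sentence)
```

Every adjacent word pair in the output was seen in the input posts, which is why the generated text picks up the source's vocabulary and phrasing. A Markov chain needs no training beyond counting, which fits the trade-off noted above: an LSTM can be more accurate but takes longer to train.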


In the new era of artificial intelligence, it is possible to create not only fake text but also fake voices and videos. Lyrebird, a startup focused on voice-mimicking media, showed that it is possible to build a bot that sounds exactly like you. As more data becomes available and networks evolve, hackers stand to gain more, and their odds of success naturally rise. Since we don't know how Lyrebird works, hackers may not be able to use this service for their own needs, but they can find more open platforms, such as Google's WaveNet [5], that can do the same thing.


Notably, attackers are now using generative adversarial networks (GANs), a more advanced type of neural network.


In the next article, we will discuss how hackers may use machine learning to gain unauthorized access and carry out attacks.


Original article: Know Things by Learning | In the AI era, how are hackers sharpening their “blades”? (Part 1)


Appendix:

  • [1] “The Truth about Machine Learning in Cybersecurity”
  • [2] The U.S. intelligence community's concerns about AI
  • [3] Know Your Enemy attacks
  • [4] “Weaponizing Data Science for Social Engineering: Automated E2E Spear Phishing on Twitter”
  • [5] Google's WaveNet

