Substitution and decomposition of character sequences in Java

1. Use the String class

A String object can call the method public String replaceAll(String regex,String replacement), which returns a new String object. The character sequence of the returned String is obtained by replacing every subsequence of the current String's character sequence that matches the parameter regex with the character sequence specified by the parameter replacement. For example:

String s1="123hello456";
String s2=s1.replaceAll("\\d+","Hello."); // "\\d+" is a regular expression that matches one or more digits from 0 to 9
System.out.println(s1); // Prints 123hello456: s1 itself is not changed
System.out.println(s2); // Prints Hello.helloHello.

Another example:

String regex="-?[0-9][0-9]*[.]?[0-9]*";
String s1="999 hello. -123.459804 is off tomorrow.";
String s2=s1.replaceAll(regex,"");
System.out.println("After removing the numbers from \""+s1+"\" the result is:"+s2);
// s2 is " hello.  is off tomorrow.": the numbers 999 and -123.459804 are removed, the spaces and periods around them stay
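
The two replaceAll examples above can be collected into one small runnable class. This is only a sketch: the class name ReplaceAllSketch and the helper names stripNumbers and maskDigits are made up for illustration. It also shows one detail the text does not cover: in the replacement string, `$` and `\` have special meaning, so a literal replacement is passed through Matcher.quoteReplacement first.

```java
import java.util.regex.Matcher;

public class ReplaceAllSketch {
    // Removes every integer or decimal number (optionally signed) from the text.
    static String stripNumbers(String text) {
        return text.replaceAll("-?[0-9][0-9]*[.]?[0-9]*", "");
    }

    // Replaces each run of digits with the given literal text.
    static String maskDigits(String text, String literal) {
        // quoteReplacement keeps '$' and '\' in the literal from being read as group references
        return text.replaceAll("\\d+", Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        String original = "123hello456";
        System.out.println(maskDigits(original, "$"));  // $hello$
        System.out.println(original);                   // 123hello456 (the original String is unchanged)
        System.out.println(stripNumbers("a-1.5b20c"));  // abc
    }
}
```

Because String objects are immutable, both helpers return a new String and leave their argument untouched.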

In fact, the String class provides a useful method:

public String[] split(String regex)

When a String object calls this method, it uses the regular expression regex as the delimiter pattern, breaks the String's character sequence into words, and returns the words in a String array. For example:

// Requirement: extract every run of digits in a character sequence as a word.
String s1="On the night of September 18, 1931, Japan launched its war of aggression against China. Please remember this date!";
String regex="\\D+"; // One or more non-digit characters act as the delimiter
String s2[]=s1.split(regex);
for(String s:s2)
    System.out.println(s);
// Prints an empty line, then 18, then 1931, so s2.length is 3.
// The empty first word is the (empty) text to the left of the first delimiter.

Note that split treats the text on both sides of each delimiter as a word. The additional rule is that an empty word to the left of the first delimiter still counts as a word, so the array starts with an empty string, whereas empty words at the end of the sequence are dropped. For example:

String s1="February 18, A.D. 2022.";
String regex="\\D+";
String s2[]=s1.split(regex);
System.out.println(s2.length); // Prints 3. length is an array field, so s2.length() does not compile
for(String s:s2)
    System.out.println(s);
// s2[0]="" s2[1]="18" s2[2]="2022". s2[0] is an empty string, so an empty line is printed for it.
// The length is 3 rather than 2 because the empty word to the left of "February " is kept,
// while the empty word after the final "." is dropped.
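
The leading and trailing empty-word rule can be checked with a short sketch. The class name SplitSketch and its helper names are hypothetical, and the sample text is invented for the demonstration. The two-argument overload split(regex, -1), which the text above does not mention, keeps trailing empty strings and makes the difference visible.

```java
import java.util.Arrays;

public class SplitSketch {
    // Default behaviour: a leading empty word is kept, trailing empty words are dropped.
    static String[] digitWords(String text) {
        return text.split("\\D+");
    }

    // A negative limit keeps trailing empty words as well.
    static String[] digitWordsKeepTrailing(String text) {
        return text.split("\\D+", -1);
    }

    public static void main(String[] args) {
        String text = "A.D. 2022-02-18.";
        System.out.println(Arrays.toString(digitWords(text)));             // [, 2022, 02, 18]
        System.out.println(digitWords(text).length);                       // 4
        System.out.println(Arrays.toString(digitWordsKeepTrailing(text))); // [, 2022, 02, 18, ]
        System.out.println(digitWordsKeepTrailing(text).length);           // 5
    }
}
```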

2. Use the StringTokenizer class

1. Unlike the split() method, a StringTokenizer object does not use regular expressions as delimiters.
2. To break a character sequence into words that can be used independently, you can use the StringTokenizer class in the java.util package. An object of this class is called a parser of the character sequence. The class has two commonly used constructors.
Constructor 1: StringTokenizer(String s) constructs a StringTokenizer object, for example fenxi. fenxi uses the default delimiters (space, newline, carriage return, tab, and form feed \f) to break out the words in the character sequence of the parameter s; these words are the data that fenxi works on.
Constructor 2: StringTokenizer(String s,String delim) constructs a StringTokenizer object, for example fenxi. fenxi treats every character in the character sequence of the parameter delim as a delimiter when breaking out the words in the character sequence of the parameter s. Note: any run of delimiter characters, in any order, still acts as a single separator.
3. fenxi can call the String nextToken() method to get the words one by one. Each call returns the next word and moves past it, so that word is no longer counted in fenxi.
4. fenxi can call the boolean hasMoreTokens() method. As long as there are words left in fenxi, the method returns true; otherwise it returns false.
5. fenxi can call countTokens() to get the number of words currently left in fenxi.

Example 1:

String s="we are stud,ents";
StringTokenizer fenxi=new StringTokenizer(s," ,"); // Any combination of spaces and commas acts as a delimiter
int number=fenxi.countTokens();
while(fenxi.hasMoreTokens()){
    String str=fenxi.nextToken();
    System.out.println(str);
    System.out.println(fenxi.countTokens()+" words left");
}
System.out.println("s contains "+number+" words in total");
// Output:
// we
// 3 words left
// are
// 2 words left
// stud
// 1 words left
// ents
// 0 words left
// s contains 4 words in total
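
To see how the delimiter string changes the result, the following sketch collects the tokens into a list. The class name TokenizerSketch and the helper tokens are illustrative names, not part of the JDK. It relies only on the constructor and the hasMoreTokens(), nextToken() and countTokens() methods described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSketch {
    // Returns the words of text, treating every character of delimiterChars as a delimiter.
    static List<String> tokens(String text, String delimiterChars) {
        StringTokenizer tokenizer = new StringTokenizer(text, delimiterChars);
        List<String> words = new ArrayList<>(tokenizer.countTokens());
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokens("we are stud,ents", " ,")); // [we, are, stud, ents]
        System.out.println(tokens("we are stud,ents", ","));  // [we are stud, ents]
        // A run of delimiter characters counts as one separator, so no empty words appear.
        System.out.println(tokens("a, ,b", " ,"));            // [a, b]
    }
}
```

Unlike split, a StringTokenizer never returns empty words, which is why the last call yields only two tokens.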

Example 2:

String s="Local call fee: 28.39 yuan, long distance call fee: 49.15 yuan, Internet access fee: 352 yuan";
String delim="[^0-9.]+"; // Any run of characters other than digits and '.' matches delim
s=s.replaceAll(delim,"#");
StringTokenizer fenxi=new StringTokenizer(s,"#");
double totalMoney=0;
while(fenxi.hasMoreTokens()){
    double money=Double.parseDouble(fenxi.nextToken());
    System.out.println(money);
    totalMoney+=money;
}
System.out.println("Total cost:"+totalMoney+" yuan");
// Output:
// 28.39
// 49.15
// 352.0
// Total cost:429.53999999999996 yuan

3. Use the Scanner class

To create a Scanner object that parses a character sequence, pass a String object to the Scanner constructor. For example, for:

String s="telephone cost 876 dollar.Computer cost 2398.89 dollar.";

To parse out the numeric words in the character sequence of s, we can construct a Scanner object as follows:

Scanner scanner=new Scanner(s);

By default, scanner uses whitespace as the delimiter when parsing the words in the character sequence of s. You can also have the Scanner object call the method:

useDelimiter(regular expression);

The regular expression then serves as the delimiter: when the Scanner object parses the character sequence of s, every subsequence that matches the regular expression is treated as a delimiter. The Scanner object parses the character sequence as follows:

The scanner object calls next() to return the words in the character sequence of s one by one. hasNext() returns true while there is still a word to return, and false once next() has returned the last word.
For numeric words in the character sequence of s, scanner can call nextInt() or nextDouble() instead of next(); these methods convert the word to int or double data. A word such as 12.34 needs nextDouble(), because nextInt() accepts only integers.
If the next word is not numeric and scanner calls nextInt() or nextDouble(), an InputMismatchException is thrown and the word is not consumed. The exception handler can then call next() to return the non-numeric word.

A specific example:

String cost="Local call fee: 28.39 yuan, long distance call fee: 49.15 yuan, Internet access fee: 352 yuan";
Scanner scanner=new Scanner(cost);
scanner.useDelimiter("[^0-9.]+"); // Any run of characters other than digits and '.' is a delimiter
double sum=0;
while(scanner.hasNext()){
    try{
        double price=scanner.nextDouble();
        sum+=price;
        System.out.println(price);
    }catch(InputMismatchException e){
        String s=scanner.next(); // Skip the word that is not a number
    }
}
System.out.println("Total cost:"+sum+" yuan");
// Output:
// 28.39
// 49.15
// 352.0
// Total cost:429.53999999999996 yuan
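
As an alternative to catching InputMismatchException, the scanner can test the next word with hasNextDouble() before reading it. The following is a minimal sketch of that variant, not the method used above: the class name ScannerSketch and the helper sumNumbers are hypothetical, and the call to useLocale(Locale.US) is an added assumption so that `.` is read as the decimal separator regardless of the machine's default locale.

```java
import java.util.Locale;
import java.util.Scanner;

public class ScannerSketch {
    // Sums every numeric word in text; words that are not numbers (such as a lone ".") are skipped.
    static double sumNumbers(String text) {
        double sum = 0;
        try (Scanner scanner = new Scanner(text)) {
            scanner.useLocale(Locale.US);      // assumption: '.' is the decimal separator
            scanner.useDelimiter("[^0-9.]+");  // same delimiter as in the example above
            while (scanner.hasNext()) {
                if (scanner.hasNextDouble()) {
                    sum += scanner.nextDouble();
                } else {
                    scanner.next();            // consume the non-numeric word and move on
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumNumbers("apples 1.5 kg, pears 2.5 kg."));  // 4.0
        System.out.println(sumNumbers("no numbers here"));               // 0.0
    }
}
```

In the first call the final period is not a delimiter, so it arrives as the word "." and is skipped by the else branch; the same happens to the periods in the "telephone cost 876 dollar.Computer cost 2398.89 dollar." string shown earlier.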

Comparison:
1. Both the StringTokenizer and Scanner classes can be used to decompose the words in a character sequence, but they work differently.
2. A StringTokenizer object treats each character of its delimiter string as a separator and does not use regular expressions. It scans the character sequence directly, so it gets words quickly, and countTokens() can report how many words remain.
3. A Scanner object matches its delimiter as a regular expression and can convert words to int or double data as it reads them. That makes Scanner more flexible, but relatively slow at getting words.

4. Use Pattern and Matcher classes

The steps for using the Pattern and Matcher classes are as follows:

1. Call Pattern.compile with the regular expression regex as the argument to get a Pattern instance, called the pattern object. For example,

String regex="-?[0-9][0-9]*[.]?[0-9]*";
Pattern pattern=Pattern.compile(regex);

2. The pattern object calls the matcher(CharSequence s) method to return a Matcher object, called the match object. The parameter s is the character sequence (for example, a String object) that the match object will search.

Matcher matcher=pattern.matcher(s);

3. After these two steps, the match object matcher can call various methods to search s.
(1) public boolean find(): looks for the next subsequence of s that matches regex; returns true if one is found, false otherwise. The first call looks for the first matching subsequence in s. After find returns true, the next call looks for the next matching subsequence, starting right after the end of the previous match. Whenever find returns true, matcher can call start() and end() to get the start position and the end position (the index just past the last matched character) of the matched subsequence in s, and group() to return the matched subsequence.
(2) public boolean matches(): checks whether the whole of s matches regex.
(3) public boolean lookingAt(): checks whether a subsequence matching regex begins at the start of s.
(4) public boolean find(int start): resets matcher and checks whether s contains a matching subsequence at or after the position specified by start. With start=0 it searches all of s, like a first call to find(); unlike lookingAt(), the match does not have to begin at position 0.
(5) public String replaceAll(String replacement): returns a String whose character sequence is obtained by replacing every subsequence of s that matches regex with the character sequence specified by the parameter replacement (s itself does not change).
(6) public String replaceFirst(String replacement): returns a String whose character sequence is obtained by replacing the first subsequence of s that matches regex with the character sequence specified by the parameter replacement (s itself does not change).
(7) public String group(): returns a String whose character sequence is the matching subsequence that find located in s.

A specific example:

String regex="-?[0-9][0-9]*[.]?[0-9]*"; // A regular expression that matches integers or floating-point numbers
Pattern pattern=Pattern.compile(regex); // Create the pattern object
String s="Local call fee: 28.39 yuan, long distance call fee: 49.15 yuan, Internet access fee: 352 yuan";
Matcher matcher=pattern.matcher(s); // Create the match object that searches s
double sum=0;
while(matcher.find()){
    String str=matcher.group();
    sum+=Double.parseDouble(str);
    System.out.println("From "+matcher.start()+" to "+matcher.end()+" matched subsequence:");
    System.out.println(str);
}
System.out.println("Total cost:"+sum+" yuan");
String weatherForecast[]={"Beijing: -9 to 7 degrees","Guangzhou: 10 to 21 degrees","Harbin: -29 degrees to -7 degrees"}; // Temperature forecasts for three places
double averTemperture[]=new double[weatherForecast.length]; // Average temperature of each place
for(int i=0;i<weatherForecast.length;i++){
    Matcher matcher1=pattern.matcher(weatherForecast[i]); // New match object created from the same pattern
    double sum1=0;
    int count=0;
    while(matcher1.find()){
        count++; // Count the temperatures found for this place
        sum1+=Double.parseDouble(matcher1.group()); // sum1 is the sum of the lowest and highest temperatures of this place
    }
    averTemperture[i]=sum1/count; // Each pass of the for loop computes the average temperature of one place
}
System.out.println("Average temperature in the three places:"+Arrays.toString(averTemperture));
// Output:
// From 16 to 21 matched subsequence:
// 28.39
// From 52 to 57 matched subsequence:
// 49.15
// From 85 to 88 matched subsequence:
// 352
// Total cost:429.53999999999996 yuan
// Average temperature in the three places:[-1.0, 15.5, -18.0]
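
The difference between matches(), lookingAt() and find(int start), and between replaceAll and replaceFirst, is easiest to see side by side. The sketch below assumes the same number pattern as the example above; the class name MatcherSketch and its helper names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherSketch {
    static final Pattern NUMBER = Pattern.compile("-?[0-9][0-9]*[.]?[0-9]*");

    // True only if the whole of s is a number.
    static boolean wholeTextIsNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    // True if a number begins at position 0 of s.
    static boolean startsWithNumber(String s) {
        return NUMBER.matcher(s).lookingAt();
    }

    // True if a number occurs anywhere at or after position start.
    static boolean containsNumberFrom(String s, int start) {
        return NUMBER.matcher(s).find(start);
    }

    static String maskAll(String s) {
        return NUMBER.matcher(s).replaceAll("#");
    }

    static String maskFirst(String s) {
        return NUMBER.matcher(s).replaceFirst("#");
    }

    public static void main(String[] args) {
        String s = "fee: 28.39 and 49.15";
        System.out.println(wholeTextIsNumber("28.39"));  // true
        System.out.println(wholeTextIsNumber(s));        // false
        System.out.println(startsWithNumber(s));         // false: s begins with "fee"
        System.out.println(containsNumberFrom(s, 0));    // true: find(0) may match later in s
        System.out.println(maskAll(s));                  // fee: # and #
        System.out.println(maskFirst(s));                // fee: # and 49.15
    }
}
```

Compiling the pattern once and reusing it for every matcher, as the static field does here, avoids recompiling the regular expression on each call.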
