Now there is a requirement: the user table has nearly 2 million pieces of data, and the query needs to be sorted from A to Z by the Chinese pinyin of the user name. There are two solutions: 1. Use the CONVERT() function of the database to CONVERT the query, and sort by the first letter of pinyin; 2. Add a new spelling field (spell_name), and the user’s Name will be inserted into the database when the user registers. I adopted the second after balance, because users will continue to increase, the database used to own function will slow down the query speed, the index will be failure, if use the second add a pinyin fields, pinyin of the original old data field is involved an initialization, data quantity is more, using multithreading, the record here.

1. Use JPinyin and Emoji-Java to convert Chinese characters into pinyin

Introducing related JARS

<! <groupId> <artifactId>jpinyin</artifactId> </artifactId> The < version > 1.0 < / version > < / dependency > <! </groupId> <artifactId> </artifactId> </artifactId> </artifactId> The < version > 4.0.0 < / version > < / dependency >Copy the code

Emoji processing tools

Public Class EmojiDealUtil extends EmojiParser {/** * gets a non-emoticon string * @param input * @return
     */
    public static String getNonEmojiString(String input) {
        int prev = 0;
        StringBuilder sb = new StringBuilder();
        List<UnicodeCandidate> replacements = getUnicodeCandidates(input);
        for (UnicodeCandidate candidate : replacements) {
            sb.append(input.substring(prev, candidate.getEmojiStartIndex()));
            prev = candidate.getFitzpatrickEndIndex();
        }
        returnsb.append(input.substring(prev)).toString(); } /** * get expression string * @param input * @return
     */
    public static String getEmojiUnicodeString(String input){
        EmojiTransformer  transformer = new EmojiTransformer() {
            public String transform(UnicodeCandidate unicodeCandidate) {
                returnunicodeCandidate.getEmoji().getHtmlHexadecimal(); }}; StringBuilder sb = new StringBuilder(); List<UnicodeCandidate> replacements = getUnicodeCandidates(input);for (UnicodeCandidate candidate : replacements) {
            sb.append(transformer.transform(candidate));
        }
        return  parseToUnicode(sb.toString());
    }

    public static String getUnicode(String source){
        String returnUniCode=null;
        String uniCodeTemp=null;
        for(int i=0; i<source.length(); i++){ uniCodeTemp ="\\u"+Integer.toHexString((int)source.charAt(i));
            returnUniCode=returnUniCode==null? uniCodeTemp:returnUniCode+uniCodeTemp;
        }
        return returnUniCode; }}Copy the code

Conversion of Chinese characters into pinyin tools

public class ChineseToPinYinUtil {/** * converts to untoned pinyin string * @param pinYinStr Chinese character * @ to be convertedreturnPublic static String changeToToneP */inYin(String pinYinStr) {
        String tempStr = null;
        try {
            tempStr = PinyinHelper.convertToPinyinString(pinYinStr, "", PinyinFormat.WITHOUT_TONE);
        } catch (Exception e) {
            e.printStackTrace();
        }
        returntempStr; }}Copy the code

Pinyin conversion is not as this text point, there are many libraries and code to solve the Internet search, mainly said the following multi-threaded processing.

2. Use multiple threads to query and update the database

Create an initial capacity of 5 thread pool, each time each thread query 500 records and processing, add a synchronization lock, allocated to each thread it queries the start of the record, query out of the record after calling the above Chinese characters into pinyin method processing, after the end of the update to the database.

2.1 Methods of receiving requests

Private static final Integer LIMIT = 500; private static final Integer LIMIT = 500; Private static final Integer THREAD_NUM = 5; ThreadPoolExecutor pool = new ThreadPoolExecutor(THREAD_NUM, THREAD_NUM * 2, 0, TimeUnit. SECONDS, new LinkedBlockingQueue < > (100)); @GetMapping("/chineseToSpellName")
    public void execute(){int count = 0; logger.info("trans start"); / / query to the total number of records int total = userService. GetTotalCount2 (); logger.info("total num:{}",total);
        int num = total/(LIMIT*THREAD_NUM) + 1;
        logger.info("Number of rounds to go through: {}",num);
        for(int j=0; j<num; J++){// start THREAD_NUM thread parallel query update library, lockfor(int i=0; i<THREAD_NUM; i++){ synchronized(ChineseToPinYinController.class){ int start = count*LIMIT; count++; pool.submit(new TransTask(start,LIMIT)); }}}}Copy the code

2.2 Multi-threaded business method

    class TransTask implements Runnable{
        int start;
        int limit;
        public TransTask(int start, int limit) {
            this.start = start;
            this.limit = limit;
        }

        @Override
        public void runList<User> userList = userService.getList2(start,limit);
            logger.info("Update record start position: {}--{}",start,limit);
            if(! CollectionUtils.isEmpty(userList)){ userList.stream().forEach(u -> { u.setSpellName(ChineseToPinYinUtil.changeToTonePinYin(EmojiDealUtil.getNonEmojiString(u.getName())).trim()); userService.updateUser2(u); }); }}}Copy the code

3. Do not use the traditional limit page to query data

Userservice. getList2(start,num) select * from table limit start,num (select * from table limit start,num) Select * from user limit 0,20). This query is not a problem when the data volume is small, but when the data volume is large, the query will be very slow, because it does not go through the index, but the full table scan, the larger the data volume, the slower the later. Select * from table where id>start limit NUM select * from table where id>start limit NUM Select * from user where id>1000 LIMIT 20) select * from user where id>1000 LIMIT 20 This type of query does not slow down significantly to millions of data volumes because it is a primary key index rather than a full table scan.

4. Optimize postscript

After the code is written in actual use, data initialization to more than 700,000, the number of database connections open too much, the database inside all occupied full, consider again optimization, using a piecewise way, passed two parameters, initialization records and initialization of the number. For example, 0-100,000 records are initialized for the first time, and 100,000 to 200,000 records are initialized for the second time, and so on. The advantage of this is that manual intervention is possible. Even if an error occurs, only this part of the interval data can be run, and the code is as follows:

Private static final Integer LIMIT = 500; private static final Integer LIMIT = 500; Private static final Integer THREAD_NUM = 5; ThreadPoolExecutor pool = new ThreadPoolExecutor(THREAD_NUM,Integer.MAX_VALUE,0,TimeUnit.SECONDS, new ArrayBlockingQueue<>(10)); @GetMapping("/chineseToSpellName")
    public void execute(@RequestParam("startId") Integer startId,@RequestParam("total") Integer total){
        logger.info("trans start");
        int num = total/(LIMIT*THREAD_NUM) + 1;
        logger.info("Number of rounds to go through: {}",num);
        for(int j=0; j<num; J++){// start THREAD_NUM thread parallel query update library, lockfor(int i=0; i<THREAD_NUM; i++){ synchronized(ChineseToPinYinController.class){
                    pool.submit(new TransTask(startId,LIMIT));
                    startId+=LIMIT;
                }
            }
        }
    }

    class TransTask implements Runnable{
        int start;
        int limit;
        public TransTask(int start, int limit) {
            this.start = start;
            this.limit = limit;
        }

        @Override
        public void runList<User> userList = userService.getList2(start,limit);
            logger.info("Update record start position: {}--{}",start,limit);
            if(! CollectionUtils.isEmpty(userList)){ userList.stream().forEach(u -> { u.setSpellName(ChineseToPinYinUtil.changeToTonePinYin(EmojiDealUtil.getNonEmojiString(u.getName())).trim()); userService.updateUser2(u); }); }}}Copy the code