A programmer who can't write regular expressions always feels like something is missing. Nobody expects you to do anything fancy with them, but you should at least be able to write one without effort. I was confused by regex too when I had just graduated, but learning it takes no more than working through one tutorial.

I'm not going to waste bandwidth repeating the rules of regex. Instead, I'll start from practical cases and explain why each expression is written the way it is.

Hexadecimal color

The rules:

  1. It starts with #
  2. It is followed by exactly six or exactly three characters, each of which can be a lowercase letter a-f, an uppercase letter A-F, or a digit

The first rule can be written as /^#/. For the second rule, six characters is [a-fA-F0-9]{6} and three characters is [a-fA-F0-9]{3}; join the two with the "or" symbol (|): ([a-fA-F0-9]{6}|[a-fA-F0-9]{3}). Finally, end with $.

All together it is /^#([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/
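To see the pattern in action, here is a minimal sketch. The helper name isHexColor is made up for illustration; only the regex itself comes from the rules above.

```typescript
// Pattern assembled from the two rules: leading #, then 6 or 3 hex characters.
const HEX_COLOR = /^#([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/;

// Hypothetical helper name, used only for this example.
function isHexColor(value: string): boolean {
  return HEX_COLOR.test(value);
}

console.log(isHexColor("#fff"));    // true: three characters
console.log(isHexColor("#1A2b3C")); // true: six characters, mixed case
console.log(isHexColor("#ffff"));   // false: four characters
console.log(isHexColor("fff"));     // false: missing the leading #
```

Without the ^ and $ anchors, a string such as "#ffff" would still pass, because the first three characters after # already satisfy the pattern.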


URL

The goal is to match the protocol, domain name, port number, path, and search (query string)

  1. Protocol

The following protocol prefixes are all valid: http://, https://, and //. What the three have in common is the // string, which can be preceded by https:, by http:, or by nothing at all. Start from the rule for https://, which is ^https:\/\/. The s may or may not be present, so mark it with ?: ^https?:\/\/. And because the whole https?: part may also be absent, mark it with ? as well: ^(https?:)?\/\/

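  2. Domain name

The domain name comes right after // and runs until the : that introduces a port, the / that starts a path, or the ? that starts a search is encountered (or to the end of the string if none of them appears). A negated character class expresses this: [^?:/]+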

  3. Port number

The port is a colon followed by digits: :\d+. Because there may be no port, mark it with ?: (:\d+)?

  4. Path

The path always starts with a slash, and everything up to a ? (the start of search) belongs to it: \/[^?]*. Because there may be no path, mark it with ?: (\/[^?]*)?

  5. Search

Search starts with ?, and all subsequent characters belong to it (hash routing is not considered here): \?(.*). Because there may be no search, mark it with ?: (\?.*)?

The result is: /^((https?):)?\/\/([^?:/]+)(:(\d+))?(\/[^?]*)?(\?(.*))?/
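The capture groups in the final pattern can be read back as the individual parts of the URL. Below is a small sketch of that idea; the function name parseUrl, the UrlParts shape, and the example.com addresses are all placeholders invented for the example.

```typescript
// Groups: 2 = protocol, 3 = domain, 5 = port, 6 = path, 8 = search (without the ?).
const URL_PATTERN = /^((https?):)?\/\/([^?:/]+)(:(\d+))?(\/[^?]*)?(\?(.*))?/;

// Hypothetical result shape for this example.
interface UrlParts {
  protocol: string;
  host: string;
  port: string;
  path: string;
  search: string;
}

function parseUrl(url: string): UrlParts | null {
  const m = URL_PATTERN.exec(url);
  if (m === null) return null;
  // Optional groups that did not take part in the match are undefined.
  return {
    protocol: m[2] || "",
    host: m[3] || "",
    port: m[5] || "",
    path: m[6] || "",
    search: m[8] || "",
  };
}

console.log(parseUrl("https://example.com:8080/a/b?x=1&y=2"));
console.log(parseUrl("//cdn.example.com/lib.js"));
console.log(parseUrl("ftp://example.com")); // null: only http, https, or no protocol
```

Note that the pattern has no $ at the end, so it describes how a URL must begin rather than validating every character up to the end of the string.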


Email

I once saw an email regex on Zhihu that was said to be the one that follows the standard most closely. It was dozens of KB in size, in any case very long, and I can't find it any more. Here we only deal with the commonly used email format. The rules: the name may contain Chinese characters, letters, digits, underscores, and hyphens; the domain name may contain digits, letters, underscores, and hyphens

The range of Chinese characters is [\u4e00-\u9fa5], the range of letters is [a-zA-Z], and the range of digits is [0-9]. Together with the hyphen and underscore, they form the name part of the address: ^[a-zA-Z0-9-_\u4e00-\u9fa5]+

The domain name is [a-zA-Z0-9_-]+, and the domain suffix is one or more dot-separated parts of the same kind: (\.[a-zA-Z0-9_-]+)+

Combined, that is ^[a-zA-Z0-9-_\u4e00-\u9fa5]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$
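A quick way to check the combined pattern is to run it against a few addresses. This is only a sketch: isEmail is a made-up helper name and the addresses are invented samples.

```typescript
// Name part: letters, digits, hyphen, underscore, Chinese characters.
// Domain part: one label followed by at least one ".label" suffix.
const EMAIL = /^[a-zA-Z0-9-_\u4e00-\u9fa5]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$/;

// Hypothetical helper name, used only for this example.
function isEmail(value: string): boolean {
  return EMAIL.test(value);
}

console.log(isEmail("zhang_san-01@example.com"));  // true
console.log(isEmail("张三@mail.example.com"));      // true: Chinese name, two suffixes
console.log(isEmail("user@localhost"));            // false: no ".suffix" after the domain
console.log(isEmail("first last@example.com"));    // false: whitespace in the name
```

Because the rules here are deliberately narrow, addresses that are legal under the mail standards (for example a name containing a dot or a plus sign) are rejected by this pattern.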

Mobile phone number

New number segments may be added over time, so in real projects it is not recommended to restrict the format too tightly

This regex is written according to the following rules:

  1. 11 digits, starting with the digit 1, that is, ^1
  2. Then look at the next digit. If it is 3, it can be followed by any digit, i.e. 3\d; if it is 4, it can be followed by a digit in 5-9, i.e. 4[5-9]; if it is 5, it can be followed by a digit in 0-3 or 5-9, i.e. 5[0-35-9]; if it is 6, it can be followed by one of 2, 5, 6, 7, i.e. 6[2567]; if it is 7, it can be followed by a digit in 0-8, i.e. 7[0-8]; if it is 8, it can be followed by any digit, i.e. 8\d; if it is 9, it can be followed by a digit in 0-3 or 5-9, i.e. 9[0-35-9]
  3. The last 8 digits can be any digits

Putting the three steps together gives /^1(3\d|4[5-9]|5[0-35-9]|6[2567]|7[0-8]|8\d|9[0-35-9])\d{8}$/
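The three steps can be sketched as follows. The helper name isMobile and the sample numbers are invented for the example; the segments are exactly the ones listed in the rules, which may no longer match the segments actually in use.

```typescript
// 1, then a two-digit segment from the list above, then 8 more digits.
const MOBILE = /^1(3\d|4[5-9]|5[0-35-9]|6[2567]|7[0-8]|8\d|9[0-35-9])\d{8}$/;

// Hypothetical helper name, used only for this example.
function isMobile(value: string): boolean {
  return MOBILE.test(value);
}

console.log(isMobile("13812345678")); // true: 3 may be followed by any digit
console.log(isMobile("16612345678")); // true: 6 may be followed by 2, 5, 6 or 7
console.log(isMobile("15412345678")); // false: 5 may not be followed by 4
console.log(isMobile("1381234567"));  // false: only 10 digits
```

If a looser check is preferred, as suggested at the start of this section, the alternation could be replaced with a simpler class, at the cost of accepting segments that are not assigned.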

Number / monetary amount

  1. Support negative numbers

The minus sign is -, and it has to come first. Add a ? after it to say that the minus sign may or may not be present: ^-?

  2. Support thousands separators (optional)

If a thousands separator is present, it must be followed by exactly three digits (otherwise the separator should not be there), and the part before the first separator has at least one and at most three digits, which can be written as \d{1,3}(,\d{3})*. To keep things simple, the number of digits before the first separator doesn't really have to be limited, which gives \d+(,\d{3})*

  3. Support decimals

If there is a decimal part, the decimal point is \., followed by at most two digits, \d{1,2}. The decimal part may or may not be present, so the whole thing gets a ?: (\.\d{1,2})?

Final rule: /^-?\d+(,\d{3})*(\.\d{1,2})?$/
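Here is a short sketch that exercises each of the three parts of the rule. The helper name isAmount and the sample values are made up for illustration.

```typescript
// Optional minus, digits, optional ",ddd" groups, optional 1-2 decimal places.
const AMOUNT = /^-?\d+(,\d{3})*(\.\d{1,2})?$/;

// Hypothetical helper name, used only for this example.
function isAmount(value: string): boolean {
  return AMOUNT.test(value);
}

console.log(isAmount("1,234,567.89")); // true
console.log(isAmount("-0.5"));         // true: negative, one decimal place
console.log(isAmount("1,23"));         // false: separator not followed by three digits
console.log(isAmount("1234.567"));     // false: more than two decimal places
console.log(isAmount("1234,567"));     // true: digits before the first separator are not limited
```

The last case is the price of the simplification in step 2: using \d{1,3} instead of \d+ would reject it.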

ID card number

Here we only look at the second-generation ID card, which has 18 characters. The last one is the check character, which may be a digit or the letter X

  1. The first digit is in the closed interval [1-9], followed by 5 arbitrary digits, which is written as ^[1-9]\d{5}

  2. Next come four digits for the year of birth (YYYY). Among people who hold an ID card, the earliest were born in the 19th century and the latest in the 21st, so the first two of the four digits can only be 18, 19, or 20, i.e. (18|19|20). The last two can be any digits, i.e. \d{2}

  3. Next come two digits for the month (MM). The month can only be in the closed interval [1-12], so it can be written as (01|02|03|04|05|06|07|08|09|10|11|12). The first nine all start with 0 followed by a digit in [1-9], so this simplifies to (0[1-9]|10|11|12)

  4. Next come two digits for the day (DD), in the range [01-31]. You could list all 31 values, but it can also be streamlined as the combination of [01-09], [10-29], 30, and 31, i.e. (0[1-9]|[1-2]\d|30|31)

  5. Immediately following the three digits is the sequential code, i.e. \d{3}

  6. The last bit is the check code, which can be either a number or an X, i.e. [\dX]

Final rule: /^[1-9]\d{5}(18|19|20)\d{2}(0[1-9]|10|11|12)(0[1-9]|[1-2]\d|30|31)\d{3}[\dX]$/
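The six steps above can be checked with a few made-up strings. The numbers below are invented purely to exercise the pattern and do not belong to anyone; isIdCard is a hypothetical helper name.

```typescript
// 6 digits, year 18xx/19xx/20xx, month 01-12, day 01-31, 3 digits, check character.
const ID_CARD = /^[1-9]\d{5}(18|19|20)\d{2}(0[1-9]|10|11|12)(0[1-9]|[1-2]\d|30|31)\d{3}[\dX]$/;

// Hypothetical helper name, used only for this example.
function isIdCard(value: string): boolean {
  return ID_CARD.test(value);
}

console.log(isIdCard("12345620000101123X")); // true
console.log(isIdCard("123456200013011234")); // false: month 13
console.log(isIdCard("123456200001321234")); // false: day 32
console.log(isIdCard("12345620000101123x")); // false: lowercase x is not in [\dX]
console.log(isIdCard("123456200002301234")); // true: the pattern does not know February has no 30th
```

As the last line shows, this is a format check only. It does not validate the calendar date, and it does not verify that the final character is the correct check value for the first 17 digits.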

Password checking

Contains at least six characters, including at least one uppercase letter, one lowercase letter, one digit, and one special character

Take the rule "at least one uppercase letter". The position of the uppercase letter is not fixed; it just has to appear somewhere. If this were the only rule, the regex could be written as ^\S*[A-Z]+\S*$. \S matches any non-whitespace character, so the pattern says that the uppercase letter can be preceded and followed by any number of non-whitespace characters (including zero)

The password also needs at least one lowercase letter, one digit, one special character, and at least six characters. You can write each rule as its own regex and then test the target string against these five regexes one by one. The string is valid only if all of them match. In JS code:

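function match(s: string) {
  return /^\S*[A-Z]+\S*$/.test(s)         // at least one uppercase letter
    && /^\S*[a-z]+\S*$/.test(s)           // at least one lowercase letter
    && /^\S*[0-9]+\S*$/.test(s)           // at least one digit
    && /^\S*[!@#$%^&*?]+\S*$/.test(s)     // at least one special character
    && /^\S*\S{6,}\S*$/.test(s)           // at least six characters
}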

If you want to implement these checks in a single regex, you can, by using a zero-width positive lookahead assertion, (?=exp), which matches the position just before exp

/^\S*(?=\S{6,})(?=\S*\d)(?=\S*[A-Z])(?=\S*[a-z])(?=\S*[!@#$%^&*?])\S*$/

The \S* at the beginning and end of this regex mean the same as before. In the middle, the part that is specific to each of the 5 rules is extracted and wrapped in (?=...). Can the ?= be dropped? No. If it is dropped, the first problem is ordering: with the regex above, removing every ?= would mean the digit must come before the uppercase letter, the uppercase letter before the lowercase letter, and the lowercase letter before the special character.

You can think of each (?=...) as ignoring the others while it matches: it only cares whether its own pattern can match from the current position. When there are several (?=...), each one is concerned only with itself, but the regex as a whole succeeds only if every one of them matches. In other words the results are ANDed together, which is the same logic as the JS code above
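To make the ordering argument concrete, the sketch below compares the lookahead version with the same pattern after every ?= has been removed. The names PASSWORD, ORDERED, and isStrongPassword are made up for the example, and the sample passwords are invented.

```typescript
// Lookahead version: each (?=...) tests the string without consuming it.
const PASSWORD = /^\S*(?=\S{6,})(?=\S*\d)(?=\S*[A-Z])(?=\S*[a-z])(?=\S*[!@#$%^&*?])\S*$/;

// Same groups with ?= removed: every group now consumes characters,
// so the pieces must appear one after another in this exact order.
const ORDERED = /^\S*(\S{6,})(\S*\d)(\S*[A-Z])(\S*[a-z])(\S*[!@#$%^&*?])\S*$/;

// Hypothetical helper name, used only for this example.
function isStrongPassword(s: string): boolean {
  return PASSWORD.test(s);
}

console.log(isStrongPassword("Abc123!"));  // true
console.log(isStrongPassword("!aA1xx"));   // true: order of the classes does not matter
console.log(isStrongPassword("abc123!"));  // false: no uppercase letter
console.log(isStrongPassword("Abc 123!")); // false: contains whitespace
console.log(ORDERED.test("Abc123!"));      // false: the groups cannot all fit in order
console.log(ORDERED.test("xxxxxx1Aa!"));   // true: six characters, then digit, upper, lower, special
```

The ORDERED pattern only accepts strings that happen to follow its built-in sequence, which is why the assertions cannot simply be turned into ordinary groups.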

Extract HTML tag data

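The tag string to extract from looks like <div class="header-box" name="header">

However, regex alone cannot finish the job; some JS is needed as well

First, extract the attribute string of the tag

The tag consists of a start symbol, a tag name, an attribute string, and an end symbol

The start symbol is <. The tag name follows it and continues until whitespace is encountered, so the beginning of the tag is <\w+\s*

Everything from there up to the end symbol > is the attribute string, which a negated character class can match: [^>]*\s*>. Joined together this is /<\w+\s*[^>]*\s*>/. To capture the attribute string, put it in parentheses: /<\w+\s*([^>]*)\s*>/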

const str = `<div class="header-box" name="header">`
const mt = str.match(/<\w+\s*([^>]*)\s*>/)
// properties => class="header-box" name="header"
const properties = mt[1]

This gives class="header-box" name="header", which needs further processing. Look at the pattern: key/value pairs are always separated by whitespace, but the string cannot simply be split on whitespace, because an attribute value may itself contain whitespace, for example class="a b"

Tags can also be self-closing. For example, <hr /> is valid as well. The regex therefore has to tolerate an optional / before the closing >: /<\w+\s*([^>]*)\s*\/?>/
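One detail is worth checking by hand: [^>]* is greedy and / is not >, so in a self-closing tag the slash ends up inside the captured attribute string. The sketch below shows this, together with a lazy variant ([^>]*?) that is my own adjustment rather than something from the original text. The helper name attributeString and the sample tags are also invented.

```typescript
// Pattern from the text: greedy capture, optional / before >.
const OPEN_TAG = /<\w+\s*([^>]*)\s*\/?>/;

// Assumption for illustration: a lazy capture, so that the trailing
// whitespace and / are left for \s*\/? to match.
const OPEN_TAG_LAZY = /<\w+\s*([^>]*?)\s*\/?>/;

// Hypothetical helper: returns the captured attribute string, or null.
function attributeString(tag: string, pattern: RegExp): string | null {
  const m = pattern.exec(tag);
  return m === null ? null : (m[1] || "");
}

const selfClosing = `<hr class="line" />`;
console.log(attributeString(selfClosing, OPEN_TAG));      // class="line" /
console.log(attributeString(selfClosing, OPEN_TAG_LAZY)); // class="line"

const normal = `<div class="header-box" name="header">`;
console.log(attributeString(normal, OPEN_TAG));           // class="header-box" name="header"
console.log(attributeString(normal, OPEN_TAG_LAZY));      // class="header-box" name="header"
```

For tags that are not self-closing the two patterns capture the same text, so the difference only shows up when a / precedes the closing >.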

The attribute name, however, can be pinned down: it is whatever sits to the left of = and contains no whitespace. Again use a negated character class: matching from left to right, take everything that is neither = nor whitespace, i.e. [^\s=]+

Whether the attribute value contains whitespace is not certain, but it is certain that the value is enclosed in quotation marks, so simply take everything inside the quotes to the right of =: =".*?"

There is a problem, however: the quotation marks can be either double or single, i.e. =".*?" or ='.*?'. If the value opens with a double quote it must close with a double quote, and the same goes for a single quote. A backreference handles this: (["']).*?\1. Here \1 means "the same text that the first capture group matched", and the first capture group is (["']). If it matched a double quote then \1 stands for a double quote; otherwise it stands for a single quote

Together with the attribute name this becomes [^\s=]+=(["']).*?\1
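The effect of the backreference is easiest to see next to a version without it. In this sketch NAIVE_ATTR is a deliberately weaker pattern written only for comparison, and firstMatch is a hypothetical helper; neither appears in the original text.

```typescript
// With a backreference: the closing quote must equal the opening quote.
const QUOTED_ATTR = /[^\s=]+=(["']).*?\1/;

// For comparison only: any quote character may close the value.
const NAIVE_ATTR = /[^\s=]+=["'].*?["']/;

// Hypothetical helper: returns the matched text, or null.
function firstMatch(pattern: RegExp, text: string): string | null {
  const m = pattern.exec(text);
  return m === null ? null : (m[0] || "");
}

const nested = `title='say "hi"'`;
console.log(firstMatch(QUOTED_ATTR, nested)); // title='say "hi"'
console.log(firstMatch(NAIVE_ATTR, nested));  // title='say "

const mismatched = `name="broken'`;
console.log(firstMatch(QUOTED_ATTR, mismatched)); // null
console.log(firstMatch(NAIVE_ATTR, mismatched));  // name="broken'
```

Without \1 the match stops at the first quote of either kind, which cuts a value short when it contains the other quote character and also accepts mismatched pairs.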

An attribute may also appear without a value, so the value part is made optional: /[^\s=]+(=(["']).*?\2)?/. Because we want to capture both the attribute name and the attribute value, add parentheses around each of them: /([^\s=]+)(=(["'])(.*?)\3)?/

Continuing with the example from above:

const str = `<div class="header-box" name="header">`
const mt = str.match(/<\w+\s*([^>]*)\s*>/)
// properties => class="header-box" name="header"
const properties = mt[1]
const mt1 = properties.match(/([^\s=]+)(=(["'])(.*?)\3)?/g)
const obj = {}
if (mt1) {
  mt1.forEach(p => {
    // assumes each attribute has a quoted value that contains no "="
    const kv = p.trim().split('=')
    obj[kv[0].trim()] = kv[1].trim().slice(1, -1)
  })
}
// obj => { class: 'header-box', name: 'header' }
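Splitting each match on = works for the sample tag, but it breaks when a value contains = or when an attribute has no value. As an alternative sketch, the capture groups already present in the pattern can be read directly with exec in a loop. The function name parseAttributes and the sample tag are made up for illustration, and valueless attributes are stored as an empty string by my own choice.

```typescript
// Hypothetical helper: returns a name -> value map for one opening tag.
function parseAttributes(tag: string): Record<string, string> {
  const result: Record<string, string> = {};
  const open = /<\w+\s*([^>]*)\s*>/.exec(tag);
  if (open === null) return result;

  // Group 1 = attribute name, group 4 = value without its quotes.
  const attr = /([^\s=]+)(=(["'])(.*?)\3)?/g;
  let m: RegExpExecArray | null;
  // With the g flag, each exec call continues after the previous match.
  while ((m = attr.exec(open[1] || "")) !== null) {
    result[m[1] || ""] = m[4] || ""; // no value => empty string (assumption)
  }
  return result;
}

console.log(parseAttributes(`<div class="header-box" name="header">`));
// { class: 'header-box', name: 'header' }
console.log(parseAttributes(`<a href="x?y=1" title='it "works"' hidden>`));
// { href: 'x?y=1', title: 'it "works"', hidden: '' }
```

The loop cannot spin forever, because the pattern always consumes at least one character of the attribute name on every successful match.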


The best way to learn regex is to practice. Whenever a problem can be solved with a regex, try to solve it that way. You may not manage to write it at first, but you can look up what other people have written online and then write it again yourself. Write enough of them and it becomes natural. There are no tricks; it comes down to mastering the rules of regex
