Lifting the veil on how PHP parses large integers, from top to bottom

Problems encountered

I recently ran into a problem with large integers in PHP. The offending code looks like this:

$shopId = 17978812896666957068;
var_dump($shopId);

The output shows that $shopId has been converted to a float and is printed in scientific notation:

float(1.7978812896667E+19)

However, the program needs the complete number as a parameter for looking up data. At the time I thought the only problem was that the value was being printed in scientific notation, so my fix was to force it out of scientific notation:

$shopId= number_format(17978812896666957068);
var_dump($shopId);

Then something strange happened. Ignoring the thousands separators that number_format adds by default, the output reads:

17978812896666957824

At the time I did not look carefully. I compared only the first ten digits and went no further, so I thought the problem was solved. But when I searched for the data by this ID, nothing was found.

Why number_format fails here is explained later. I also had the idea of converting the original value to a string, but the following attempts still did not work:

$shopId= strval(17978812896666957068);
var_dump($shopId);

$shopId = 17978812896666957068 . '';
var_dump($shopId);

The output is still in scientific notation, only now as a string:

string(19) "1.7978812896667E+19"

In the end, only the following approach worked:

$shopId = '17978812896666957068';
var_dump($shopId);
// Output
// string(20) "17978812896666957068"
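The difference between the failed attempts and the quoted literal can be checked directly. The following is a minimal sketch, assuming a 64-bit PHP build. The precision setting is pinned to 14 (PHP's default) only so that the string form is predictable, and the variable names are illustrative rather than taken from the original program.

```php
<?php
// Assumption: 64-bit PHP. Pin `precision` so the float-to-string
// conversion below is predictable (14 is PHP's default).
ini_set('precision', '14');

// The unquoted literal is already a float before strval() ever sees it.
$fromLiteral = 17978812896666957068;
$viaStrval   = strval($fromLiteral);

// The quoted literal never becomes a number, so every digit survives.
$fromString = '17978812896666957068';

var_dump(is_float($fromLiteral)); // bool(true)
var_dump($viaStrval);             // string(19) "1.7978812896667E+19"
var_dump($fromString);            // string(20) "17978812896666957068"
```

In other words, strval() and string concatenation are not at fault: by the time they run, the digits are already gone.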

PHP is an interpreted language, so my rough guess was that PHP converts numeric literals that are too large into floats at compile time, and that the float is then displayed in scientific notation. But a guess was not enough to satisfy my curiosity; I wanted to see the actual implementation before believing it. So I analyzed and explored step by step until I found the code behind it.

At first I searched the Internet for "PHP large integer parsing process" but did not find an answer, so I had to track it down myself. Since I was unfamiliar with how PHP executes a script, my starting point was to debug step by step.

Sample code:

// test.php
$var = 17978812896666957068;
var_dump($var);

The tracking process

1. Look at the opcodes of the script. The assignment is compiled to an ASSIGN opcode.

Next we want to see where the ASSIGN is executed.

2. Debug with GDB

2-1. Use list to see where breakpoints can be set.

2-2. No clue yet, so try a breakpoint at line 1186.

2-3. Run the program; it stops in sapi/cli/php_cli.c.

2-4. Set a breakpoint with break do_cli, then type n and press Enter repeatedly. The program output appears at line 993 of sapi/cli/php_cli.c:

2-5. Continue with another breakpoint, break php_execute_script, and step through in the same way.

2-6. Continue with another breakpoint, break zend_execute_scripts, and repeat the previous steps. In Zend/zend.c, the program output appears when execution reaches line 1476:

Seeing op_array on line 1475, I guessed that op_array already held a value at that point, so I printed it:

The printout did not seem to show anything useful at first, but it actually contains a lot of information, such as the opcode's handler: ZEND_ASSIGN_SPEC_CV_CONST_RETVAL_UNUSED_HANDLER. I did not notice it at the time, because I only wanted to see how op_array was assigned and what each step was doing, so I overlooked this important detail. I come back to this handler later.

2-7. Starting again from the breakpoint in 2-5, step through the program to see where op_array is assigned:

op_array is assigned from zend_compile_file, so I tried break zend_compile_file, but zend_compile_file was reported as not defined. Using a source-browsing tool, I traced zend_compile_file to compile_file, so the next breakpoint is break zend_compile.

The breakpoint lands in Zend/zend_language_scanner.l. Stepping through, I reached the line pass_two(op_array) and guessed that op_array might hold a value there, so I printed it:

The structure looks the same as before, except that opcodes now has a value. Print it:

The opcode is 38, and 38 means assignment.

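So the question became where op_array->opcodes gets its value.

Stepping further, the opcode is already 38 by the time the line CG(zend_lineno) = last_lineno; is executed.

But CG(zend_lineno) = last_lineno; is just a macro, so there was no clue left to follow, and I was close to giving up…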

So I decided to learn the opcode data structures first. In the book In-depth Understanding of the PHP Kernel I found a chapter on how opcode handler functions are looked up, which gave me some ideas for continuing.

The handler for an opcode is looked up by zend_vm_get_opcode_handler().

In fact, opcode handler functions follow this naming rule:

ZEND_[opcode]_SPEC_(variable type 1)_(variable type 2)_HANDLER

From the earlier debug printout in step 2-6, we already have a handler value:

ZEND_ASSIGN_SPEC_CV_CONST_RETVAL_UNUSED_HANDLER

Find the definition of the function as follows:

As you can see, when the handler runs, the value is fetched with EX_CONSTANT. Expanding this macro according to its definition gives:

opline->op2->execute_data->literals

This tells us two things: the operand is op2, and its value is stored in execute_data->literals. You can print it out to verify this.

The print result is as follows:

The conjecture was confirmed, but I still had not seen where the conversion actually happens. Not wanting to give up, I went on looking for the code in the Zend engine that handles compilation.

According to the open sourceMaking project, the PHP compilation stages are as follows:

My best guess was that the conversion happens in the zendparse and zend_compile_top_stmt stages, because all these stages do is turn PHP code into an opcode array.

I then searched for material on PHP's parsing, and one article discussed how integers are parsed. That led me to the place where PHP actually converts large integers:

<ST_IN_SCRIPTING>{LNUM} {
    char *end;
    if (yyleng < MAX_LENGTH_OF_LONG - 1) { /* Won't overflow */
        errno = 0;
        ZVAL_LONG(zendlval, ZEND_STRTOL(yytext, &end, 0));
        /* This isn't an assert, we need to ensure 019 isn't valid octal
         * Because the lexing itself doesn't do that for us
         */
        if (end != yytext + yyleng) {
            zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);
            ZVAL_UNDEF(zendlval);
            RETURN_TOKEN(T_LNUMBER);
        }
    } else {
        errno = 0;
        ZVAL_LONG(zendlval, ZEND_STRTOL(yytext, &end, 0));
        if (errno == ERANGE) { /* Overflow */
            errno = 0;
            if (yytext[0] == '0') { /* octal overflow */
                ZVAL_DOUBLE(zendlval, zend_oct_strtod(yytext, (const char **)&end));
            } else {
                ZVAL_DOUBLE(zendlval, zend_strtod(yytext, (const char **)&end));
            }
            /* Also not an assert for the same reason */
            if (end != yytext + yyleng) {
                zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);
                ZVAL_UNDEF(zendlval);
                RETURN_TOKEN(T_DNUMBER);
            }
            RETURN_TOKEN(T_DNUMBER);
        }
        /* Also not an assert for the same reason */
        if (end != yytext + yyleng) {
            zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);
            ZVAL_UNDEF(zendlval);
            RETURN_TOKEN(T_DNUMBER);
        }
    }
    ZEND_ASSERT(!errno);
    RETURN_TOKEN(T_LNUMBER);
}

As you can see, during lexical analysis the Zend engine first checks the length of the numeric literal to decide whether it could overflow. If it cannot, the literal is stored as a long. If it might, the engine still tries to store it as a long first, and only when that conversion actually overflows (errno is ERANGE) does it use zend_strtod to convert the literal to a double and store it in a zval of type double.
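The effect of this lexer rule can be observed from PHP itself. The sketch below assumes a 64-bit build, where PHP_INT_MAX is 9223372036854775807; the variable names are illustrative only.

```php
<?php
// Assumption: 64-bit PHP, where PHP_INT_MAX is 9223372036854775807.

// Largest literal that still fits in a long: the lexer emits T_LNUMBER.
$fits = 9223372036854775807;

// One more than PHP_INT_MAX: the strtol conversion overflows, so the
// lexer falls back to zend_strtod and emits T_DNUMBER.
$overflows = 9223372036854775808;

// The shop ID from this article is also beyond the long range.
$shopId = 17978812896666957068;

var_dump(is_int($fits));        // bool(true)
var_dump(is_float($overflows)); // bool(true)
var_dump(is_float($shopId));    // bool(true)
```

The type is decided while the source file is being tokenized, which is why no function called on the value afterwards can bring the lost digits back.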

Why number_format fails

Tracing number_format with GDB leads to the php_conv_fp function, which PHP ultimately calls at the lowest level to convert the number:

The function prototype is as follows:

PHPAPI char * php_conv_fp(register char format, register double num, boolean_e add_dp, int precision, char dec_point, bool_int * is_negative, char *buf, size_t *len);

num is received as a double, so even if a numeric string is passed in, number_format has converted it to a double by the time it reaches php_conv_fp. The final output for this double is 17978812896666957824, because precision was lost when the number was converted to a double, and the original value cannot be recovered from it. This can be verified in C:

double local_dval = 1.7978812896666958e+19;
printf("%f\n", local_dval);

The output is:

17978812896666957824.000000

So this is not a PHP bug. It is simply how double-precision floating point behaves.
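The same check can be made in PHP without going through number_format. This is a minimal sketch, assuming a 64-bit build with IEEE 754 doubles. It relies on the fact that adjacent doubles between 2^63 and 2^64 are 2048 apart, so the stored value has to be a multiple of 2048.

```php
<?php
// Assumption: 64-bit PHP with IEEE 754 doubles.
$asFloat = 17978812896666957068; // already a float after lexing

// Print the double with no exponent and no fraction digits.
$digits = sprintf('%.0f', $asFloat);
echo $digits, PHP_EOL; // 17978812896666957824

// Between 2^63 and 2^64 adjacent doubles are 2048 apart, so the stored
// value must be a multiple of 2048. fmod() is exact for this check.
var_dump(fmod($asFloat, 2048.0) === 0.0); // bool(true)
```

The original ID ends in 957068, which is not a multiple of 2048, so it simply has no exact representation as a double.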

Solutions to such problems

For storage, use bigint or varchar. For assignment in PHP, if you have a large integer to assign, do not write it as an integer literal. For example, do not write this:

$var = 17978812896666957068;

Write this instead:

$var = '17978812896666957068';
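One way to enforce this rule at the boundary of the program is to reject anything that is not a string of digits. The helper below is hypothetical (requireIdString is not from the original code base) and is only a sketch of the idea.

```php
<?php
// Hypothetical helper (not from the original code base): accept an ID only
// if it is a string made up entirely of decimal digits.
function requireIdString($id): string
{
    if (!is_string($id) || preg_match('/\A[0-9]+\z/', $id) !== 1) {
        throw new InvalidArgumentException('ID must be a string of digits');
    }
    return $id;
}

$shopId = requireIdString('17978812896666957068');
echo $shopId, PHP_EOL; // 17978812896666957068

// An unquoted literal this large is already a float, so it is rejected
// instead of silently losing digits.
try {
    requireIdString(17978812896666957068);
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), PHP_EOL;
}
```

Failing loudly here is a design choice: a wrong ID that quietly matches nothing is much harder to notice than an exception.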

As for number_format, the recommended maximum it can handle without losing precision on a 64-bit operating system is 9007199254740991 (2^53 - 1). For more background, see "What you should know about PHP floating point numbers".
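The 9007199254740991 limit comes from the 53-bit significand of a double. A short sketch, assuming 64-bit PHP with IEEE 754 doubles, shows what happens on either side of it; the explicit (float) casts stand in for the conversion to double described above.

```php
<?php
// Assumption: 64-bit PHP with IEEE 754 doubles (53-bit significand).
$maxSafe = 9007199254740991; // 2^53 - 1, still an int on 64-bit PHP

// Up to 2^53 - 1, converting to float and back is lossless.
var_dump((int)(float)$maxSafe === $maxSafe); // bool(true)

// 2^53 + 1 has no exact double; it collapses onto 2^53.
$tooBig = 9007199254740993;
var_dump((float)$tooBig === (float)9007199254740992); // bool(true)
echo sprintf('%.0f', (float)$tooBig), PHP_EOL;         // 9007199254740992
```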

Conclusion

The cause of this problem may not look very important, and it does not teach much that is directly useful for business development; it will not suddenly lift your development ability by several levels. But understanding how PHP handles large integers adds a little to my own body of knowledge. Knowing the reason makes me more careful in day-to-day development, for example when deciding how to store such values and how to assign them. It is a good detail to know.

Looking back, the whole process of solving the problem was rather long: it took about 4 hours to locate the cause. I had only a rudimentary understanding of the PHP kernel and no systematic picture of the whole process, so I did not know where to start and simply debugged according to my own guesses. In hindsight, I should have first learned how PHP compiles and executes a script, and then narrowed down the exact step.

This is an original article. My writing and knowledge are limited, so if anything here is inaccurate, please let me know.

