background

At the beginning of the story, I encountered a need at work to solve the problem of loading data very slowly. The key program had been running in the production environment for quite a long time, and there was no problem, at least in terms of the stability of running online.

Personal view

Only as development summary record here, who wrote it and why do you write, is not in the scope of the record summary, only to make a analysis on the code, if you see the feel or have better idea, please tell me, we can discuss together, if there is wrong, also please feel free to more,

Why write it down?

Do anything must have the output, development as well as daily, in addition to the delivery of the task, also should have its own summary, facilitate subsequent checking, output is not much, a few lines of text, a table, or draw a picture, output is not 100% can understand each question in the development, summed up to look not to understand, is output, subsequent problems or all have not been resolved Summary record here, previously solved will be summarized again.


Positioning reason

After some debugging to locate the code, the problem was found to be mainly divided into two types:

1. Repeat the evaluation

This is easy to understand, which means that the same result has been achieved above, or the same result could be stored in a variable, but it is not done here.

Instead, the expression that requires the required value is executed every time to get the result, which greatly reduces the execution efficiency and improves the resource occupation

2. Loop nesting

Since the company code is involved, it is not convenient to post it. In order to more vividly describe the specific problem, the code Demo and description abstracted by me according to the problem are as follows:

1). Loop 500 times, and there are 50W pieces of data in the corresponding list set

2). Find data where ProductId is 213000 according to Lambda and store it in result set

ForEach is a multithreaded ForEach. ForEach is a multithreaded ForEach. ForEach is a multithreaded ForEach. After investigation, it was found that the core cause of the problem was internal Lambda expression evaluation. Why was the evaluation slow?

See resources, because Lambda is implemented in a loop, with iterators and state machines and other delay mechanisms, but here it is FirstOrDefault(), which evaluates immediately and doesn’t lazy-load.

If analyzed abstractly from the point of view of data structure, the time complexity here is O(500* 50W). As the number of times increases, the execution time increases more and more.

3. Verify the execution time

The validation execution time is: 1195 ms. What if you add another layer of loop?


Are all set operations Lambda?

We have found the problem, the next step is to start to solve the method, the first problem is easy to solve, change the evaluation notation do not repeat evaluation.

It was mainly the solution of the second problem. Before solving the problem, I came to an important conclusion, that is, “Lambda does not apply everywhere, but sometimes improper use can make your program worse “.

The solution

1. How to optimize lambda(loops)?

We need to think about the nature of the problem is how list lookups here are faster than sequential lookups. Sequential table lookups in data structures fall into three categories:

1. Search in sequence

Binary search

3. Index lookup

The key here is to change the time complexity from O(n) or O(10x500x50W) to O(1). There is no doubt that the time complexity of index lookup is O(1), because the time complexity of accessing elements by index subscript is O(1), then index lookup must be better than sequential lookup. Then how to change to index lookup?

We can’t help but ask :” in this case, the index index happens to match ProductId, so what if it is messy “? Then you have to work around it, which is to turn it into a collection of dictionaries and find values by keys.

For a collection with no duplicate data let’s go to Dictionary first and look at the execution time, not as good as an index but compared to a loop?

Now we also have to think about this, because ProductId is non-repetitive you can add it to Dictionary, but what if there’s a duplicate? Don’t panic, there is a solution, that is:

For sets with duplicate data we can use ILookup because it groups the same keys into one Key.


Simple summary

  1. Syntactic sugar and development libraries are great, but they can be a bit of a roadblock for maintenance.
  2. In daily development, although there are many libraries for us to call, in the process of use we should combine the code to think about whether it can be used in a specific place, rather than blindly pursuing the simplicity and novelty of syntax sugar.