What is an index

To understand indexes, it helps to have a mental picture. Imagine a cookbook, and not just any cookbook, but a thick, 5,000-page one covering all kinds of occasions, dishes and seasons. Although this cookbook is complete, it has one drawback: its recipes are in no particular order, so the first page may be fish-flavored eggplant and the third page braised eggplant.

That’s not so important. The problem is that there’s no index!

Here’s the first question you should ask yourself: without an index, how do you find the sweet and sour ribs recipe in this cookbook? The only option is to turn the pages one by one. If it’s on page 3,892, how many pages do you have to turn? In the worst case, if it’s on the last page, you have to go through the whole book.
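To make the cost concrete, here is a minimal Python sketch. It assumes the cookbook is modeled as a list where position i holds the recipe on page i + 1, one recipe per page; `find_by_scan` is a hypothetical helper, not part of any library. Without an index, the number of pages turned grows with the size of the book.

```python
from typing import List, Optional, Tuple


def find_by_scan(pages: List[str], wanted: str) -> Tuple[Optional[int], int]:
    """Turn pages one by one until `wanted` is found.

    Returns (page_number, pages_turned). page_number is None when the
    recipe is not in the book, in which case every page was turned.
    """
    turned = 0
    for page_number, recipe in enumerate(pages, start=1):
        turned += 1
        if recipe == wanted:
            return page_number, turned
    return None, turned


# A hypothetical 5,000-page cookbook with one recipe per page.
book = [f"recipe {n}" for n in range(1, 5001)]
book[3891] = "Sweet and sour ribs"  # list index 3891 is page 3892

print(find_by_scan(book, "Sweet and sour ribs"))  # (3892, 3892)
print(find_by_scan(book, "Mapo tofu"))            # (None, 5000)
```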

The solution is to build an index.

There are several ways to build one; let’s take the recipe name as a starting point. If you create a list of recipe names, each followed by its page number, you have indexed the book by recipe name. The entries might look something like this:

    Braised spare ribs: 45
    Pork dumplings: 320
    Turnip sauce: 199
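A program can binary-search an alphabetical list like this instead of reading the book page by page. A minimal sketch, assuming one recipe per page and unique recipe names; the helper names are hypothetical, and a sorted Python list searched with the standard `bisect` module stands in for a real database index:

```python
import bisect
from typing import Dict, List, Optional, Tuple


def build_name_index(pages: Dict[int, str]) -> List[Tuple[str, int]]:
    """Build (recipe name, page) pairs sorted by name, like the
    alphabetical index at the back of a book."""
    return sorted((name, page) for page, name in pages.items())


def lookup_name(index: List[Tuple[str, int]], name: str) -> Optional[int]:
    """Binary-search the sorted index instead of turning every page."""
    # (name,) sorts just before any (name, page) pair with the same name.
    i = bisect.bisect_left(index, (name,))
    if i < len(index) and index[i][0] == name:
        return index[i][1]
    return None


# The three sample entries above, keyed by page number.
cookbook = {45: "Braised spare ribs", 320: "Pork dumplings", 199: "Turnip sauce"}
name_index = build_name_index(cookbook)
print(lookup_name(name_index, "Pork dumplings"))       # 320
print(lookup_name(name_index, "Sweet and sour ribs"))  # None
```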

As long as you know a recipe’s name, you can now quickly find it in the book through this index. If that’s all you want to do, you’re done.

But that’s not realistic. You might want to look up recipes based on what’s in your fridge, for example, or by type of dish. In that case, you need more indexes.

This leads to the second question: with only an index on recipe names, how can you find all the recipes that use spare ribs? Without a suitable index, you still have to wade through the entire 5,000-page cookbook. The same is true of any search by ingredient or by type of dish.

To do this, we need to build another index, this time on ingredients. The ingredients are listed alphabetically, and each entry points to the page numbers of all the recipes that contain it. A basic ingredient index looks like this:

    Beef: 301, 342, 785, 2310, 2456, 4310...
    Chinese yam: 8, 20, 45, 78, 287, 1295, 4587...
    Pork: 12, 124, 320, 890, 3719...
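Unlike the name index, one key in the ingredient index can point to many pages. A minimal sketch, assuming each page holds one recipe described by a set of ingredients (hypothetical sample data loosely based on the entries above; `build_ingredient_index` is a hypothetical helper):

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_ingredient_index(recipes: Dict[int, Set[str]]) -> Dict[str, List[int]]:
    """Map each ingredient to the ascending page numbers of recipes using it."""
    pages_by_ingredient: Dict[str, List[int]] = defaultdict(list)
    for page in sorted(recipes):  # visit pages in order so each list stays sorted
        for ingredient in recipes[page]:
            pages_by_ingredient[ingredient].append(page)
    # Rebuild as a plain dict with ingredients in alphabetical order.
    return {ing: pages_by_ingredient[ing] for ing in sorted(pages_by_ingredient)}


# Hypothetical sample data: page number -> ingredients of the recipe on it.
recipes = {
    45: {"spare ribs", "chinese yam"},
    301: {"beef"},
    320: {"pork", "cabbage"},
    890: {"pork"},
}
ingredient_index = build_ingredient_index(recipes)
print(ingredient_index["pork"])  # [320, 890]
```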

Is this the index you wanted? Is it useful?

If you only need the list of recipes that contain a given ingredient, this index does the trick. But if you also want other information about those recipes, you still need to “scan”: once you know the pages for beef, you have to turn to every one of them to read the recipe’s name and determine the type of dish. That is better than going through the whole book, but it is far from enough.
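This leftover scan can be sketched as well. Assuming hypothetical page contents stored as (recipe name, dish type) pairs, answering “which beef dishes are soups?” with only the ingredient index still means opening every page the index lists; `find_by_ingredient_and_type` is a hypothetical helper:

```python
from typing import Dict, List, Tuple

# Hypothetical page contents: page number -> (recipe name, dish type).
pages: Dict[int, Tuple[str, str]] = {
    301: ("Beef noodle soup", "soup"),
    342: ("Black pepper beef", "stir-fry"),
    785: ("Tomato beef soup", "soup"),
}
# Hypothetical ingredient index covering only those pages.
ingredient_index: Dict[str, List[int]] = {"beef": [301, 342, 785]}


def find_by_ingredient_and_type(ingredient: str, dish_type: str) -> Tuple[List[str], int]:
    """Use the ingredient index, then open every listed page to check its type.

    Returns (matching recipe names, number of pages opened).
    """
    matches: List[str] = []
    opened = 0
    for page in ingredient_index.get(ingredient, []):
        opened += 1  # the index only gives a page number; we must turn to it
        name, kind = pages[page]
        if kind == dish_type:
            matches.append(name)
    return matches, opened


print(find_by_ingredient_and_type("beef", "soup"))
# (['Beef noodle soup', 'Tomato beef soup'], 3)
```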

For example, a week ago you stumbled across a great chicken recipe in this cookbook, but you’ve forgotten its name, and now you want to find it and cook it for your cute little sister. So far there are two indexes, one on recipe names and one on ingredients. Can you combine the two to find the forgotten chicken recipe?

In practice, this is impossible. If you start with the recipe-name index but can’t remember the name, searching it is only marginally better than flipping through the book. If you start with the ingredient index, you get a list of page numbers, but there is no way to plug page numbers into an index keyed by recipe name. So only one index can be used here, and in this case the ingredient index is the more useful of the two.

It is commonly assumed that when a query searches on two fields, you can simply create a separate index for each. There is an existing algorithm for this: look up the matching page numbers in each index, then scan only the pages that appear in both lists. This reduces the total number of pages scanned. Some databases implement this algorithm (often called index intersection); older versions of MongoDB did not, though MongoDB added support for index intersection in version 2.6. Even where it is available, a compound index on the two fields will generally be more efficient than the algorithm just described. Keep in mind that a query normally uses only one index, so if you query on multiple fields, make sure you have an index that matches those fields.
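The intersection algorithm described above can be sketched like this, assuming two hypothetical single-key indexes that each map a value to an ascending list of page numbers. A merge-style walk keeps only the pages that appear in both lists, so only those pages need to be opened:

```python
from typing import Dict, List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two ascending page lists."""
    i = j = 0
    result: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])  # page is in both indexes
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return result


# Hypothetical single-key indexes: value -> ascending page numbers.
ingredient_index: Dict[str, List[int]] = {"chicken": [82, 2578, 3710]}
dish_type_index: Dict[str, List[int]] = {"braised": [82, 489, 2578]}

pages = intersect_sorted(ingredient_index["chicken"], dish_type_index["braised"])
print(pages)  # [82, 2578]
```

The walk still has to read both page lists, which can be long. A compound index stores the combined key directly and skips that step, which is why it is generally the better choice.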

So what to do? Fortunately, we have composite indexes.

The composite index

So far you’ve built single-key indexes: each one indexes the recipes by a single key. Now you want to build a new index for the whole cookbook, this time using two keys. An index that uses multiple keys like this is called a compound (or composite) index.

This compound index uses the ingredient first and the recipe name second, so we can label it ingredient-recipe. Part of it looks like this:

    Pork:
        Pork cabbage stewed vermicelli: 320
        Pork egg roll: 3719
        Pork breast: 890
    Chicken legs:
        Braised chicken legs: 82
        Cola chicken legs: 3710
        Braised chicken legs with potatoes: 2578
    Tomato:
        Tomato scrambled eggs: 4827
        Tomato egg soup: 2478
        Tomato brisket: 489
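One way to picture the ingredient-recipe index in code, assuming a sorted Python list of (ingredient, recipe name, page) tuples stands in for the database’s ordered index structure (real databases typically use a B-tree). Because entries sort by ingredient first and recipe name second, all recipes for one ingredient sit together, and binary search finds the start of that block. The sample rows come from the listing above; the helper names are hypothetical:

```python
import bisect
from typing import List, Tuple

Entry = Tuple[str, str, int]  # (ingredient, recipe name, page)


def build_compound_index(entries: List[Entry]) -> List[Entry]:
    """Sort by ingredient, then by recipe name: the ingredient-recipe order."""
    return sorted(entries)


def recipes_with(index: List[Entry], ingredient: str) -> List[Tuple[str, int]]:
    """Return (name, page) for every recipe that uses `ingredient`.

    Binary search jumps to the first entry for the ingredient; reading
    stops at the first entry that belongs to a different ingredient.
    """
    i = bisect.bisect_left(index, (ingredient,))
    result: List[Tuple[str, int]] = []
    while i < len(index) and index[i][0] == ingredient:
        result.append((index[i][1], index[i][2]))
        i += 1
    return result


index = build_compound_index([
    ("pork", "Pork cabbage stewed vermicelli", 320),
    ("pork", "Pork egg roll", 3719),
    ("pork", "Pork breast", 890),
    ("chicken legs", "Braised chicken legs", 82),
    ("chicken legs", "Cola chicken legs", 3710),
    ("chicken legs", "Braised chicken legs with potatoes", 2578),
])
print(recipes_with(index, "pork"))
# [('Pork breast', 890), ('Pork cabbage stewed vermicelli', 320), ('Pork egg roll', 3719)]
```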

The value of this index is obvious to a human reader: you can search by ingredient and then narrow down to the recipe, even if you only remember the beginning of its name. It is just as valuable to a machine, which no longer has to scan every recipe that uses the ingredient to check its name.
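Remembering only the beginning of a recipe name turns into a range scan on the second key. A minimal sketch under the same assumption as before (a sorted list of (ingredient, name, page) tuples standing in for a real index; `find_by_name_prefix` is a hypothetical helper):

```python
import bisect
from typing import List, Tuple

Entry = Tuple[str, str, int]  # (ingredient, recipe name, page)


def find_by_name_prefix(index: List[Entry], ingredient: str, prefix: str) -> List[Tuple[str, int]]:
    """Recipes for `ingredient` whose names start with `prefix`.

    Because the index is sorted by (ingredient, name), the matches form one
    contiguous run that begins at the first name >= prefix.
    """
    i = bisect.bisect_left(index, (ingredient, prefix))
    matches: List[Tuple[str, int]] = []
    while (i < len(index) and index[i][0] == ingredient
           and index[i][1].startswith(prefix)):
        matches.append((index[i][1], index[i][2]))
        i += 1
    return matches


index = sorted([
    ("chicken legs", "Braised chicken legs", 82),
    ("chicken legs", "Cola chicken legs", 3710),
    ("chicken legs", "Braised chicken legs with potatoes", 2578),
    ("pork", "Pork egg roll", 3719),
])
print(find_by_name_prefix(index, "chicken legs", "Braised"))
# [('Braised chicken legs', 82), ('Braised chicken legs with potatoes', 2578)]
```

Every name that starts with a given prefix sorts into one contiguous run, so the scan can stop at the first entry that no longer matches instead of checking every recipe for the ingredient.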

Note that the order of the keys in a compound index matters a great deal. If we flipped this index to recipe-ingredient, could it replace the one we have?

Obviously not! With the flipped index, a search is guaranteed to lead to a recipe, that is, to a page in the book, as long as you know its name, and you can check its ingredients right there: if you are looking for a pork recipe that contains bananas, you can quickly be sure it doesn’t exist. But with the flipped order we must know the recipe name before we can reach the ingredients, while in reality we usually know the ingredients and not the name of the recipe.
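The cost of flipping the key order can be shown with the same data indexed both ways. In this sketch (hypothetical sample rows and helper names; sorted lists stand in for real indexes), both functions return the pages for an ingredient, but only the ingredient-first index can jump straight to them. In the recipe-first index the ingredient is the second key, so matching entries are scattered and every entry must be checked:

```python
import bisect
from typing import List, Tuple

# Hypothetical sample rows: (ingredient, recipe name, page).
rows = [
    ("chicken legs", "Braised chicken legs", 82),
    ("chicken legs", "Cola chicken legs", 3710),
    ("pork", "Pork breast", 890),
    ("pork", "Pork egg roll", 3719),
    ("tomato", "Tomato brisket", 489),
]
# The same data indexed in both key orders.
ingredient_recipe = sorted(rows)
recipe_ingredient = sorted((name, ing, page) for ing, name, page in rows)


def pages_for_ingredient_leading(index: List[Tuple[str, str, int]], ingredient: str) -> List[int]:
    """Ingredient is the first key: binary-search to its block, read only that block."""
    i = bisect.bisect_left(index, (ingredient,))
    pages: List[int] = []
    while i < len(index) and index[i][0] == ingredient:
        pages.append(index[i][2])
        i += 1
    return pages


def pages_for_ingredient_trailing(index: List[Tuple[str, str, int]], ingredient: str) -> List[int]:
    """Ingredient is the second key: entries are ordered by recipe name, so
    matches can be anywhere and every entry must be checked (a full scan)."""
    return [page for _name, ing, page in index if ing == ingredient]


print(pages_for_ingredient_leading(ingredient_recipe, "pork"))   # [890, 3719]
print(pages_for_ingredient_trailing(recipe_ingredient, "pork"))  # [890, 3719]
```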

The cookbook now has three indexes: recipe, ingredient and ingredient-recipe, which means we can safely remove the ingredient index. Why? Because any lookup by ingredient can use the ingredient-recipe index instead: if you know an ingredient, you can traverse the compound index to get a list of all the recipes that contain it.

Conclusion

This article uses a metaphor to make indexes easier to understand. From it you can draw a few simple rules of thumb:

  1. Indexes can significantly reduce the amount of work required to retrieve documents. Without a suitable index, the only way to satisfy a query is to scan the documents linearly until the query criteria are met, which usually means scanning the entire collection.
  2. Only one single-key index is used to resolve a query ($or queries are the exception). For queries that involve multiple keys (such as ingredient and recipe name), a compound index containing those keys resolves the query best.

     For example, if a query selects name = “zhangsan” and age = 20 and each field has its own single-key index, only one of those indexes will be used.

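  3. If you have an ingredient-recipe index, you can remove the ingredient index, and you should. More generally, if you have a compound index on A-B, then an index on A alone is redundant. An index on B alone, however, is not redundant, because the A-B index is ordered by A first and cannot efficiently serve queries on B by itself.
  4. The order of the keys in a compound index also matters: it should match how your queries search, leading with the key you will actually know (for example, ingredient before recipe name).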
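The redundancy rule above can be checked mechanically by treating each index as a tuple of key names. A minimal sketch; `redundant_indexes` is a hypothetical helper, and it ignores index options such as uniqueness, which can make an index on A worth keeping even when an A-B index exists:

```python
from typing import List, Tuple

IndexKeys = Tuple[str, ...]  # key names in index order, e.g. ("ingredient", "recipe")


def redundant_indexes(indexes: List[IndexKeys]) -> List[IndexKeys]:
    """Return the indexes whose keys are a strict prefix of another index's keys."""
    redundant: List[IndexKeys] = []
    for idx in indexes:
        for other in indexes:
            if len(other) > len(idx) and other[:len(idx)] == idx:
                redundant.append(idx)  # every lookup on idx can use `other` instead
                break
    return redundant


# The cookbook's three indexes: only the ingredient index is a prefix of another.
cookbook_indexes = [("recipe",), ("ingredient",), ("ingredient", "recipe")]
print(redundant_indexes(cookbook_indexes))  # [('ingredient',)]
```

Note that ("recipe",) is kept: it is the second key of ingredient-recipe, not a prefix, so the compound index cannot replace it.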