
Arrays are one of the most important data structures in every programming language, though each language implements and handles them differently. In Java, an array stores a fixed-size sequence of elements of the same type.

You might say arrays are simple. Well, they hide details that not everyone knows.


Here’s today’s question.

Arrays are almost always numbered from 0. Have you ever wondered why arrays are numbered from 0 instead of 1? Wouldn’t starting at 1 be more natural for humans?

Introduction to arrays

An array is a linear-table data structure that uses a contiguous block of memory to store a set of data of the same type.

There are three key phrases here: linear table, contiguous memory space, and same data type. Let’s look at each of them.

Linear table

A linear table arranges data in a line, like the high-speed train G1024: the carriages are connected end to end, and from any element you can only move “forward” or “backward”. Besides arrays, linked lists, queues, and stacks are also linear structures.


Nonlinear table

Examples include binary trees, heaps, and graphs. They are called nonlinear because the data in them does not have a simple predecessor-successor relationship.

Contiguous memory space

Because an array occupies contiguous memory and stores data of the same type, it has one killer feature: random access. Many of you will be asked in an interview: what is the difference between an array and a linked list? Most people answer, “Linked lists are suitable for insertion and deletion, with time complexity O(1); arrays are suitable for lookup, with lookup time O(1).”

This answer is not rigorous. Arrays are indeed good at lookup, but lookup is not O(1); even in a sorted array, binary search takes O(log n). The correct statement is: “Arrays support random access by subscript, with time complexity O(1).”

Random access

We all know that arrays are accessed by subscript, but how is random access actually implemented?

Take the array int[] a = new int[4] as an example. The computer allocates a contiguous block of memory from 1000 to 1015 for a: an int occupies 4 bytes, so the array takes 4 * 4 = 16 bytes. The first address of the memory block, base_address, is 1000. When the program randomly accesses the i-th element of the array, the computer calculates its memory address with the following addressing formula.

targetAddress = base_address + i * data_type_size

  • targetAddress: memory address of the accessed element.
  • base_address: first address of the array’s memory block.
  • data_type_size: size of the data type in bytes; for example, an int is 4 bytes.

The base address is like the train number G1024, each carriage is like a subscript position in the array, and each seat in a carriage is like a byte.
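
To make the formula concrete, here is a minimal Java sketch. It only simulates the arithmetic (real Java code cannot read the raw address of an array element); the base address 1000 and the 4-byte int size come from the example above.

```java
public class AddressingDemo {

    // Simulates the addressing formula: targetAddress = base_address + i * data_type_size
    static long targetAddress(long baseAddress, int i, int dataTypeSize) {
        return baseAddress + (long) i * dataTypeSize;
    }

    public static void main(String[] args) {
        long baseAddress = 1000;  // first address of the memory block, from the example
        int dataTypeSize = 4;     // an int occupies 4 bytes

        // int[] a = new int[4] -> the four elements occupy addresses 1000~1015
        for (int i = 0; i < 4; i++) {
            System.out.println("a[" + i + "] -> " + targetAddress(baseAddress, i, dataTypeSize));
        }
        // Prints 1000, 1004, 1008, 1012 -- each index jumps exactly data_type_size bytes
    }
}
```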


That’s all there is to the array addressing formula. And this formula is also the ultimate explanation of why subscripts start at 0.

Why do subscripts start at 0?

The most accurate definition of a “subscript” is an “offset”. a[i] means an offset of i elements (i * data_type_size bytes) from the base address, so a[0] is simply an offset of 0, i.e., the base address itself:

targetAddress[i] = base_address + i * data_type_size


If the array index started at 1, the formula for calculating the memory address of a[i] would have to change to:

targetAddress[i] = base_address + (i - 1) * data_type_size


Comparing the two formulas, when numbering starts at 1, every random access to an array element requires one extra subtraction, which means one extra subtraction instruction for the CPU.

Arrays are a very basic data structure, and random access by subscript is a very common programming operation, so its efficiency should be pushed as far as possible. To eliminate that one subtraction, arrays are numbered from 0 instead of 1.

Of course, this is not the whole story. There are also historical reasons: C was designed with 0-based indexing, later high-level languages followed its lead, and doing so made it easier for programmers to switch between languages and reduced the learning cost.

Inefficient inserts and deletes

The same contiguity requirement also makes insertion and deletion inefficient, because data has to be moved to keep the memory contiguous.

Is there any way to improve it?

Insertion

Suppose we insert an element at position k of an array of length n. To keep the data contiguous, slot k must be freed up for the new element, so every element from position k to the end has to be moved one slot back. What is the time complexity of this insertion? Let’s analyze it (see the earlier article on time and space complexity).

When we insert at the end of the array, no data needs to be moved, so the best-case time complexity is O(1). When we insert at the beginning, every element has to be moved one slot back, so the worst-case complexity is O(n). If inserting at each position is equally likely, the average number of moves is (1 + 2 + … + n) / n, so the average time complexity is also O(n).
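
Here is a minimal Java sketch of this ordered insertion; the method name, the `size` parameter tracking the current element count, and the shifting loop are illustrative assumptions, not code from the original article.

```java
/**
 * Inserts value at position k (0-based) of an ordered array,
 * shifting elements k..size-1 one slot toward the end.
 * Assumes size < a.length, i.e., there is still free room at the end.
 */
static void insert(int[] a, int size, int k, int value) {
    // Shift everything from k to size-1 one position back: O(n) worst case (k == 0)
    for (int j = size; j > k; j--) {
        a[j] = a[j - 1];
    }
    a[k] = value; // inserting at the end (k == size) skips the loop entirely: O(1)
}
```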


Optimization idea – the dove occupies the magpie’s nest

If the array has to stay ordered, we really do need to move the data after position k. But if the array is only used as an unordered collection of data, then to insert an element at position k we can simply move the element currently at position k to the end of the array and put the new element into position k. The time complexity drops to O(1), as the sketch below shows.
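
A minimal sketch of this swap trick; again, the method name and the `size` bookkeeping are assumptions for illustration.

```java
/**
 * Inserts value at position k when the array is used as an unordered collection:
 * the current occupant of slot k is moved to the end instead of shifting everything.
 * Assumes size < a.length.
 */
static void insertUnordered(int[] a, int size, int k, int value) {
    a[size] = a[k]; // move the old occupant of slot k to the end
    a[k] = value;   // put the new element into slot k
    // Two assignments regardless of k or n -> O(1)
}
```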

Delete operation

Similarly, suppose we delete the element at position k. If k = n - 1, the best-case time complexity is O(1); if k = 0, the worst-case time complexity is O(n); and the average time complexity is also O(n).
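
The straightforward deletion looks like the sketch below (same assumed `size` bookkeeping as above).

```java
/**
 * Deletes the element at position k by shifting elements k+1..size-1 one slot forward.
 * Returns the new element count.
 */
static int delete(int[] a, int size, int k) {
    // k == size - 1: the loop does nothing, O(1); k == 0: every element moves, O(n)
    for (int j = k + 1; j < size; j++) {
        a[j - 1] = a[j];
    }
    return size - 1;
}
```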

Optimization idea – mark first, delete in batches

In fact, in some scenarios we don’t have to restore data contiguity after every single deletion; multiple deletions can be merged and executed in one batch.

For example, the array number[6] stores six int elements: 1, 2, 3, 4, 5, 6, and we delete 1, 2, and 3 in sequence. To avoid moving the remaining data on every deletion, we only mark each element as deleted. When the number of marked elements reaches a threshold, say 3, we perform a single pass that actually moves the data, which greatly reduces the amount of data movement.
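
A minimal sketch of this mark-then-compact idea, assuming a parallel boolean array as the deletion mark and a threshold of 3 (both are illustrative choices, not from the original text).

```java
import java.util.Arrays;

public class MarkedArray {

    private final int[] data = {1, 2, 3, 4, 5, 6};
    private final boolean[] deleted = new boolean[data.length]; // deletion marks
    private int size = data.length;          // number of valid elements
    private int markedCount = 0;
    private static final int THRESHOLD = 3;  // compact after this many marks

    /** Marks position k as deleted; no data is moved yet. */
    public void delete(int k) {
        if (!deleted[k]) {
            deleted[k] = true;
            markedCount++;
        }
        if (markedCount >= THRESHOLD) {
            compact(); // one batch move instead of one move per deletion
        }
    }

    /** Moves the surviving elements to the front in a single pass. */
    private void compact() {
        int write = 0;
        for (int read = 0; read < size; read++) {
            if (!deleted[read]) {
                data[write++] = data[read];
            }
        }
        size = write;
        Arrays.fill(deleted, false);
        markedCount = 0;
    }
}
```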

Isn’t that the core idea behind the JVM’s mark-sweep garbage collection algorithm? Yes, and that’s the beauty of data structures and algorithms. “Much of the time it’s not about memorizing a particular data structure or algorithm, it’s about learning the ideas and techniques behind it; that is what’s most valuable.” If you pay close attention, you will find algorithms and data structures everywhere in software development and architectural design.

Knowledge extension & summary

An array uses a contiguous block of memory to store a group of data of the same type. Its biggest feature is support for random access, but insertion and deletion are relatively inefficient, with an average time complexity of O(n). In ordinary business development we can simply use the container classes provided by the programming language; for very low-level development it may be more appropriate to use arrays directly.

A question to think about

Based on the array delete operation, we proposed an optimization: mark first, clean up in batches. In the Java JVM, what does the mark-sweep garbage collection algorithm look like? Feel free to share your thoughts in the group, or reply “mark-sweep” to get the answer.

You’re welcome to join the group to discuss and share with us; we will reply as soon as possible.

Recommended reading

1. The importance of data structure algorithms

2. Time complexity and space complexity

3. Best, worst, average and amortized time complexity

MageByte