Preface

I previously introduced my friends to mobile development on iOS and Android, so today I am reorganizing and summarizing a back-end topic: Hibernate. Many companies are gradually moving to Spring Boot and Spring MVC, but the fundamentals stay the same: learning a framework means learning its ideas, and plenty of companies still maintain legacy SSM or SSH projects. Enough preamble; on to the point.

Introduction to caching

A cache sits between the application and the physical data source. Its purpose is to reduce how often the application accesses the physical data source, which improves application performance. The data in the cache is a copy of data in the physical data source. At run time the application reads and writes data in the cache, and the cache is synchronized with the physical data source at specific times or on specific events.

The cache medium is usually memory, so reads and writes are very fast. If the cache has to hold a large amount of data, however, hard disk is also used as a cache medium. A cache implementation must consider not only the storage medium but also how concurrent access to the cache is managed and the life cycle of the cached data.

Session, SessionFactory

Hibernate’s caches comprise the Session cache and the SessionFactory cache, and the SessionFactory cache is further divided into a built-in cache and an external cache. The Session cache is built in and cannot be removed; it is also known as Hibernate’s level 1 cache.

The SessionFactory built-in cache is implemented much like the Session cache: the former is the data held in certain collection attributes of the SessionFactory object, the latter the data held in certain collection attributes of the Session. The built-in cache of the SessionFactory stores mapping metadata and predefined SQL statements. The mapping metadata is a copy of the data in the mapping files, and the predefined SQL statements are derived from the mapping metadata during Hibernate initialization. This built-in cache is read-only: applications cannot modify the mapping metadata or predefined SQL statements in it, so the SessionFactory never needs to synchronize the built-in cache with the mapping files.

The SessionFactory external cache is a configurable plug-in that is not enabled by default. Its data is a copy of database data, and its medium can be memory or hard disk. The external cache of the SessionFactory is also known as Hibernate’s second-level cache.

Both Hibernate caches are in the persistence layer and store copies of database data, so what’s the difference between them? To understand the difference, you need to understand two characteristics of caching at the persistence layer: the scope of the cache and the concurrent access strategy of the cache.

The scope of the persistence layer’s cache

The scope of the cache determines the lifetime of the cache and who can access it. The scope of caches falls into three categories.

Transaction scope

The cache can only be accessed by the current transaction. The lifetime of the cache depends on the lifetime of the transaction, and when the transaction ends, the cache ends its lifetime. In this scope, the cache medium is memory. A transaction can be a database transaction or an application transaction. Each transaction has its own cache, and the data in the cache is usually in the form of interrelated objects.

Process scope

The cache is shared by all transactions within the process. These transactions may access the cache concurrently, so the cache must apply the necessary transaction isolation mechanisms. The lifetime of the cache depends on the lifetime of the process and ends when the process terminates. A process-scoped cache may hold a large amount of data, so its medium can be memory or hard disk. The data in the cache can take the form of interrelated objects or of loose object data. Loose object data is somewhat like the serialized form of an object, but disassembling an object into loose data is faster than serializing it.

Cluster scope

In a clustered environment, the cache is shared by processes on one or more machines. The data in the cache is copied to each process node in the cluster environment. Remote communication between processes ensures the consistency of the data in the cache. The data in the cache is usually in the form of loose data of objects.

For most applications, the need to use cluster-wide caching should be carefully considered, since access is not necessarily much faster than direct access to database data.

The persistence layer can provide caches at more than one scope. If the data is not found in the transaction-scoped cache, the process-scoped or cluster-scoped cache is queried next; if it is not found there either, the only option left is to query the database. The transaction-scoped cache is the first-level cache of the persistence layer and is usually mandatory; the process-scoped or cluster-scoped cache is the second-level cache of the persistence layer and is usually optional.
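The lookup order described above can be sketched in plain Java. This is a minimal single-threaded illustration, not Hibernate code: the class name TieredLookupSketch, its fields, and the Function that stands in for the database are all hypothetical, and a real process-scoped cache would also need the concurrency control discussed later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: these names are hypothetical, not Hibernate classes.
class TieredLookupSketch {
    // First level: private to one transaction.
    private final Map<Long, String> transactionCache = new HashMap<>();
    // Second level: shared by every transaction in the process.
    private final Map<Long, String> processCache;
    // Stands in for a real database query.
    private final Function<Long, String> database;
    // Number of lookups that had to go all the way to the database.
    int databaseHits = 0;

    TieredLookupSketch(Map<Long, String> processCache, Function<Long, String> database) {
        this.processCache = processCache;
        this.database = database;
    }

    String find(long id) {
        String value = transactionCache.get(id);
        if (value != null) {
            return value; // found in the first level
        }
        value = processCache.get(id);
        if (value == null) {
            value = database.apply(id); // missed both levels
            databaseHits++;
            processCache.put(id, value);
        }
        transactionCache.put(id, value);
        return value;
    }
}
```

Two instances that share the same processCache map behave like two transactions in one process: the second one finds data loaded by the first without touching the database.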

The concurrent access strategy for the persistence layer’s cache

When multiple concurrent transactions simultaneously access the same data cached at the persistence layer, concurrency problems arise and the necessary transaction isolation measures must be adopted.

A process-wide or cluster-wide cache, known as the second level cache, has concurrency issues. The following four types of concurrent access policies can therefore be set, each corresponding to a transaction isolation level.

Transactional: Only applicable in a managed environment. It provides the Repeatable Read transaction isolation level. For data that is frequently read but rarely modified, this type of isolation can be used because it prevents concurrency problems such as dirty and unrepeatable reads.

Read-write: Provides the Read Committed transaction isolation level. Only applicable in a non-clustered environment. For data that is frequently read but rarely modified, this type of isolation can be used because it prevents concurrency problems such as dirty reads.

Non-strict read-write: Does not ensure the consistency between the cache and the data in the database. If it is possible for two transactions to access the same data in the cache at the same time, a very short data expiration time must be configured for that data to minimize dirty reads. This concurrent access strategy can be used for data that is rarely modified and allows occasional dirty reads.

Read-only: Use this concurrent access strategy for data that is never modified, such as reference data.

Transactional concurrent access policies have the highest transaction isolation level and read-only policies have the lowest isolation level. The higher the transaction isolation level, the lower the concurrency performance.

What kind of data is appropriate to store in the second-level cache?

1. Data that is rarely modified. 2. Data that is not very important and can tolerate occasional concurrency problems. 3. Data that is never accessed concurrently.

What kind of data is not suitable for the second-level cache?

1. Frequently modified data. 2. Financial data, where concurrency problems are never acceptable.

Hibernate’s second-level cache

As mentioned earlier, Hibernate provides two levels of caching. The first level is the Session cache. Because the lifetime of a Session object usually corresponds to one database transaction or one application transaction, its cache is a transaction-scoped cache. The level 1 cache is mandatory: it cannot be disabled or removed. In the level 1 cache, each instance of a persistent class has a unique OID.

The second-level cache is a pluggable cache managed by the SessionFactory. Because the life cycle of the SessionFactory object corresponds to the whole life of the application process, the second-level cache is process-scoped or cluster-scoped. This cache holds loose object data. The second-level cache is exposed to concurrency problems and therefore needs an appropriate concurrent access strategy, which determines the transaction isolation level of the cached data. A cache adapter integrates a specific cache implementation with Hibernate. The second-level cache is optional and can be configured at the granularity of each class or each collection.

Hibernate’s level 2 caching strategy follows the following general process:

  1. When a query runs, Hibernate issues an SQL statement of the form select * from table_name (selecting all columns) and obtains all the data objects at once.

  2. All obtained data objects are placed in the second-level cache by ID.

  3. When Hibernate accesses a data object by ID, it first looks in the Session level 1 cache. If the object is not found there and a second-level cache is configured, it looks in the second-level cache. If the object is still not found, it queries the database and puts the result into the cache by ID.

  4. The cache is updated whenever data is deleted, updated, or added.

Hibernate’s second-level caching strategy targets queries by ID and has no effect on conditional queries. For that reason, Hibernate provides a separate Query cache for conditional queries.

Hibernate’s Query caching strategy goes as follows:

  1. Hibernate first builds a Query Key from the request information of the conditional query: the SQL, the parameters the SQL requires, the range of records (starting position rowStart, maximum number of records maxRows), and so on.

  2. Hibernate uses this Query Key to look up the corresponding result list in the Query cache. If it is found, the result list is returned; if not, Hibernate queries the database, obtains the result list, and puts the whole list into the Query cache under the Query Key.

  3. The SQL in a Query Key references certain table names. If any data in those tables is modified, deleted, or added, the related Query Keys are cleared from the cache.
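The three steps above can be illustrated with a small stand-alone model. It is only a sketch under stated assumptions: QueryKeySketch and QueryCacheSketch are hypothetical names (Hibernate’s real query key is an internal class), the table set is passed in by hand instead of being derived from the SQL, and result rows are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical key type: SQL + parameters + record range identify a query.
final class QueryKeySketch {
    final String sql;
    final List<Object> parameters;
    final int rowStart;
    final int maxRows;
    final Set<String> tables; // tables referenced by the SQL (supplied by the caller here)

    QueryKeySketch(String sql, List<Object> parameters, int rowStart, int maxRows,
                   Set<String> tables) {
        this.sql = sql;
        this.parameters = parameters;
        this.rowStart = rowStart;
        this.maxRows = maxRows;
        this.tables = tables;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof QueryKeySketch)) {
            return false;
        }
        QueryKeySketch key = (QueryKeySketch) other;
        return sql.equals(key.sql) && parameters.equals(key.parameters)
                && rowStart == key.rowStart && maxRows == key.maxRows;
    }

    @Override
    public int hashCode() {
        return Objects.hash(sql, parameters, rowStart, maxRows);
    }
}

class QueryCacheSketch {
    private final Map<QueryKeySketch, List<String>> results = new HashMap<>();

    // Step 2: look up the result list; null means "query the database".
    List<String> get(QueryKeySketch key) {
        return results.get(key);
    }

    // Step 2: store the whole result list under the key.
    void put(QueryKeySketch key, List<String> rows) {
        results.put(key, new ArrayList<>(rows));
    }

    // Step 3: any change to a table clears every key whose SQL touches it.
    void tableChanged(String table) {
        results.keySet().removeIf(key -> key.tables.contains(table));
    }
}
```

Because equality covers the SQL, the parameters, and the record range, the same query with a different parameter or a different page is a different key and misses the cache.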

Many people know little about the second-level cache or misunderstand it. I have long wanted to write an article about Hibernate’s second-level cache, and today I finally could not resist. My experience comes mainly from Hibernate 2.1; the basic principles are the same in 3.0 and 3.1. Please bear with my stubbornness.

A Hibernate session provides the level 1 cache. Within one session, loading the same ID twice does not send two SQL statements to the database. Once the session is closed, however, its level 1 cache is gone.

The second-level cache is a global cache at the SessionFactory level. It can be backed by different cache libraries such as Ehcache and OSCache. In Hibernate 2.1 you configure hibernate.cache.provider_class=net.sf.hibernate.cache.EhCacheProvider, and if you use the query cache you also add hibernate.cache.use_query_cache=true.

The cache can simply be viewed as a Map: values are looked up in the cache by key.

The class cache

For a single record, that is, a PO, the cache key is the ID and the value is the POJO. Whether you use list, load, or iterate, any object that is read populates the cache. However, list does not read from the cache, whereas iterate first runs a select id against the database and then loads the objects id by id: if an object is in the cache it is taken from the cache, otherwise it is loaded from the database. Assuming a read-write cache, you need to set:

<cache usage="read-write"/> 

If the second-level cache implementation you use is Ehcache, you also need to configure ehcache.xml:

<cache name="com.xxx.pojo.Foo" maxElementsInMemory="500" eternal="false" timeToLiveSeconds="7200" timeToIdleSeconds="3600" overflowToDisk="true" /> 

Here eternal indicates whether cache elements never time out, and timeToLiveSeconds is the time to live of each element (here, a POJO) in the cache. With eternal="false", an element is removed once it exceeds that time. timeToIdleSeconds is the idle time and is optional. Once more than 500 elements have been put into the cache, part of the cache is saved to a temporary file on disk if overflowToDisk="true". Each class that needs caching should be configured this way. If a class has no configuration of its own, Hibernate warns you at startup and then uses the defaultCache configuration, so that multiple classes share one configuration.

When a record is modified through Hibernate, Hibernate knows which ID changed and removes the corresponding cache entry.
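To make the interplay of eternal, timeToLiveSeconds, and timeToIdleSeconds concrete, here is a simplified model of the expiry decision. It is a sketch of the semantics described above, not Ehcache’s actual code; the class and parameter names are hypothetical, times are in seconds, and treating a setting of 0 as "no limit" is an assumption of this sketch.

```java
// Simplified model of cache element expiry; not the real Ehcache implementation.
class ExpirySketch {
    static boolean isExpired(boolean eternal, long timeToLiveSeconds, long timeToIdleSeconds,
                             long createdAt, long lastAccessedAt, long now) {
        if (eternal) {
            return false; // eternal elements never time out
        }
        // Too old: the element has lived longer than its time to live.
        boolean tooOld = timeToLiveSeconds > 0 && now - createdAt > timeToLiveSeconds;
        // Too idle: nobody has accessed the element for longer than its idle time.
        boolean tooIdle = timeToIdleSeconds > 0 && now - lastAccessedAt > timeToIdleSeconds;
        return tooOld || tooIdle;
    }
}
```

With the values from the configuration above (7200 seconds to live, 3600 seconds idle), an element that is read regularly survives for at most two hours, while one that nobody touches is dropped after one hour.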

You might think that, for the same query condition, you could call list the first time and iterate the second time and so benefit from the cache. In practice this is very hard, because you cannot tell which call is the first one, and the conditions usually differ from query to query. Suppose the database holds 100 records with ids from 1 to 100. The first list returns the first 50 ids, and the later iterate matches ids 30 to 70. Ids 30 to 50 are then served from the cache, ids 51 to 70 are fetched from the database, and 1+20 SQL statements are sent in total. That is why I have always considered iterate of little use: there is always a 1+N problem. (A digression: some say that a large query with list loads the whole result set into memory and is slow, so iterate, which only selects ids, is better. But large queries are always paged, and nobody really loads the whole result set. list selects more columns and is somewhat slower than the first select id statement of iterate, but it sends only one statement. It does not load the whole result set either, because Hibernate optimizes for the database dialect, for example with the MySQL limit clause, so overall list should still be faster.) If you want to cache the results of a list or iterate query, you need the query cache.
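The 1+20 figure in the example above can be checked with a few lines of Java. This is only a counting model with hypothetical names: a set of ids stands in for the class cache, and every id missing from it costs one extra statement.

```java
import java.util.HashSet;
import java.util.Set;

// Counting model for the list-then-iterate example; not Hibernate code.
class IterateCostSketch {
    // Ids firstId..lastId as they would sit in the class cache after a list call.
    static Set<Integer> idsCachedByList(int firstId, int lastId) {
        Set<Integer> ids = new HashSet<>();
        for (int id = firstId; id <= lastId; id++) {
            ids.add(id);
        }
        return ids;
    }

    // Number of SQL statements an iterate over ids firstId..lastId would send.
    static int statementsForIterate(Set<Integer> cachedIds, int firstId, int lastId) {
        int statements = 1; // the initial "select id ..." statement
        for (int id = firstId; id <= lastId; id++) {
            // add() returns true only if the id was not cached yet:
            // that id costs one load by id, which then fills the cache.
            if (cachedIds.add(id)) {
                statements++;
            }
        }
        return statements;
    }
}
```

Calling statementsForIterate(idsCachedByList(1, 50), 30, 70) gives 21, that is, the one id query plus 20 single loads for ids 51 to 70. Repeating the same iterate afterwards costs only the id query, because every id is cached by then.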

The query cache

First set hibernate.cache.use_query_cache=true. If you use Ehcache, also configure ehcache.xml; note that from Hibernate 3.0 onward the package name is no longer net.sf:

<cache name="net.sf.hibernate.cache.StandardQueryCache" 
maxElementsInMemory="50" eternal="false" timeToIdleSeconds="3600" 
timeToLiveSeconds="7200" overflowToDisk="true"/> 
<cache name="net.sf.hibernate.cache.UpdateTimestampsCache" 
maxElementsInMemory="5000" eternal="true" overflowToDisk="true"/> 

Then, in the program:

query.setCacheable(true); // activate the query cache
query.setCacheRegion("myCacheRegion"); // optional: use a dedicated cache region

The second line is optional and specifies that the cache region to use is myCacheRegion. In other words, setCacheRegion lets you give each query cache its own configuration, which you then define in ehcache.xml:

<cache name="myCacheRegion" maxElementsInMemory="10" eternal="false" timeToIdleSeconds="3600" timeToLiveSeconds="7200" overflowToDisk="true" /> 

If you omit the second line and do not set a cache region, the standard query cache configuration mentioned above is used, that is, net.sf.hibernate.cache.StandardQueryCache.

For the query cache, the cache key is the SQL generated from the HQL, plus the parameters, paging information, and so on. (You can see this in the log output, but the output is not very readable, so it is best to tweak its code.) For example, the HQL:

from Cat c where c.name like ? 

generates roughly the following SQL:

select * from cat c where c.name like ? 

If the parameter is “tiger%”, the query cache key will look something like this:

select * from cat c where c.name like ? , parameter:tiger% 

In this way, the same keys are guaranteed for the same query and parameters.

Now for the cached value. For a list query, the value is not the whole result set but the list of ids that the query returned. In other words, the first time a query runs, both list and iterate behave exactly as usual, list executing one SQL statement and iterate executing 1+N; the only additional action is that they fill the cache. The second time the same condition is queried, both behave like iterate: the value found under the cache key is a list of ids, and the objects are then loaded one by one from the class cache. This is done to save memory. As you can see, the query cache requires the class cache of the related class to be enabled. When list and iterate run for the first time, they populate both the query cache and the class cache.

Another important issue that is easily overlooked: once the query cache is enabled, even list can run into the 1+N problem! The first time list runs with a given condition, nothing is found in the query cache, so one SQL statement is always sent to the database to fetch all the data, whether or not the class cache already holds it, and the query cache and class cache are then populated. On the second call, however, if the class cache entries have timed out while the query cache entry is still valid, list loads the ids from the database one by one. The class cache timeout must therefore never be shorter than the query cache timeout! If an idle time is set, make sure the class cache idle time is also longer than the query cache time to live. There are other cases as well, such as the program forcibly evicting the class cache, which you have to watch for yourself.
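The trap described above can be reproduced with two plain maps. This is a deliberately simplified model with hypothetical names: the query cache maps a query key to a list of ids, the class cache maps an id to an entity, and sqlCount counts the statements that would reach the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of list() with the query cache enabled; not Hibernate code.
class ListWithQueryCacheSketch {
    final Map<String, List<Long>> queryCache = new HashMap<>(); // query key -> ids
    final Map<Long, String> classCache = new HashMap<>();       // id -> entity
    int sqlCount = 0;

    // matchingIds stands in for the rows the database would return.
    List<String> list(String queryKey, List<Long> matchingIds) {
        List<String> entities = new ArrayList<>();
        List<Long> cachedIds = queryCache.get(queryKey);
        if (cachedIds == null) {
            sqlCount++; // query cache miss: one statement fetches every row
            for (Long id : matchingIds) {
                String entity = "cat-" + id;
                classCache.put(id, entity); // fill the class cache
                entities.add(entity);
            }
            queryCache.put(queryKey, new ArrayList<>(matchingIds)); // store ids only
            return entities;
        }
        for (Long id : cachedIds) {
            String entity = classCache.get(id);
            if (entity == null) {
                sqlCount++; // class cache entry gone: load this id on its own
                entity = "cat-" + id;
                classCache.put(id, entity);
            }
            entities.add(entity);
        }
        return entities;
    }
}
```

With three matching ids, the first call costs one statement and the second call none. If the class cache is emptied while the query cache entry survives, the third call costs three more statements, one per id, which is the 1+N behavior the text warns about.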

In addition, if an HQL query contains a SELECT clause, the value in the query cache is the entire result set.

When Hibernate updates the database, how does it know which query caches to invalidate? Hibernate keeps the last update time of every table in one place, which is the cache configured above as net.sf.hibernate.cache.UpdateTimestampsCache. When an update goes through Hibernate, Hibernate knows which tables it affects and refreshes their last update times. Every query cache entry records when it was generated and which tables it queried. When Hibernate finds an entry in the query cache, it takes the generation time and queried tables of that entry and looks up the last update time of each of those tables; if any table was updated after the generation time, the entry is invalid. So as soon as a table is updated, every query cache entry involving that table is invalidated, and the hit ratio of the query cache may therefore be low.
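The timestamp check just described can be sketched as follows. The class and method names are hypothetical, and the comparison treats an update at exactly the generation time as invalidating, which is a cautious choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of per-table update timestamps used to validate query cache entries.
class UpdateTimestampsSketch {
    private final Map<String, Long> lastUpdateByTable = new HashMap<>();

    // Called whenever an update through the persistence layer touches a table.
    void tableUpdated(String table, long timestamp) {
        lastUpdateByTable.put(table, timestamp);
    }

    // A cached query result is valid only if none of the tables it queried
    // was updated at or after the moment the result was cached.
    boolean isUpToDate(Set<String> queriedTables, long cachedAt) {
        for (String table : queriedTables) {
            Long lastUpdate = lastUpdateByTable.get(table);
            if (lastUpdate != null && lastUpdate >= cachedAt) {
                return false;
            }
        }
        return true;
    }
}
```

Note that the check works per table, not per row: updating a single row of a table invalidates every cached query that touches that table, which is why the hit ratio can be low.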

The Collection cache

It must be enabled inside the collection element of the hbm mapping file:

<cache usage="read-write"/> 

If the class is Cat and the collection is children, the Ehcache configuration is:

<cache name="com.xxx.pojo.Cat.children" 
maxElementsInMemory="20" eternal="false" timeToIdleSeconds="3600" timeToLiveSeconds="7200" 
overflowToDisk="true" /> 

The collection cache works like the list case of the query cache described above. However, it does not expire just because the table has been updated; a collection cache entry expires only when elements are added to or removed from the collection. One consequence: if your collection is sorted by some field and an element has that field updated so that the order changes, the order held in the collection cache is not updated.

Caching strategies

Read-only cache (read-only): for data that is never updated.

Read/write cache (read-write): for data the program may need to update.

Nonstrict read/write cache (nonstrict-read-write): for data the program may need to update, without a guarantee of consistency between the cache and the database.

Transactional cache (transactional): the cache supports transactions and can be rolled back when an exception occurs. It is only supported in JTA environments.

The difference between the read-write cache and the nonstrict read-write cache is that, when the read-write cache updates an entry, it replaces the data in the cache with a lock. If another transaction tries to fetch that entry and finds it locked, it queries the database directly. In the Ehcache implementation of Hibernate 2.1, if a transaction that has locked part of the cache throws an exception, the entry stays locked until the lock times out after 60 seconds. The nonstrict read-write cache does not lock data in the cache.
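The locking behavior of the read-write strategy can be modelled in a few lines. This is a single-threaded sketch of the description above, not Hibernate’s implementation: all names are hypothetical, time is passed in explicitly in milliseconds, and a Function stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified model of a read-write cache entry that is replaced by a lock.
class ReadWriteLockSketch {
    static final long LOCK_TIMEOUT_MILLIS = 60_000; // the 60-second timeout

    private final Map<Long, String> values = new HashMap<>();
    private final Map<Long, Long> lockExpiry = new HashMap<>();

    // A transaction starts updating the row: the cached value is replaced by a lock.
    void lock(long id, long now) {
        values.remove(id);
        lockExpiry.put(id, now + LOCK_TIMEOUT_MILLIS);
    }

    // The transaction commits: the lock is released and the new value is cached.
    void unlock(long id, String newValue) {
        lockExpiry.remove(id);
        values.put(id, newValue);
    }

    String read(long id, long now, Function<Long, String> database) {
        Long expiry = lockExpiry.get(id);
        if (expiry != null && now < expiry) {
            return database.apply(id); // locked: go to the database, leave the cache alone
        }
        lockExpiry.remove(id); // no lock, or a stale lock that has timed out
        return values.computeIfAbsent(id, database); // normal read-through
    }
}
```

If the updating transaction fails without calling unlock, readers keep going to the database until the lock expires, after which the next read repopulates the cache.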

Prerequisites for using a level 2 cache

Your Hibernate program must have exclusive write access to the database, because Hibernate cannot know when another process updates it. All database operations must go through Hibernate: if you call stored procedures or update the database with your own JDBC code, Hibernate does not know about it either. Bulk updates and deletes in Hibernate 3.0 do not update the second-level cache, although 3.1 has reportedly addressed this. This restriction is quite awkward: sometimes Hibernate performs bulk updates and deletes slowly, yet you cannot write your own JDBC to optimize them, which is frustrating.

SessionFactory also provides methods to remove caches. If you must write your own JDBC, you can call these methods to remove caches. These methods are:

void evict(Class persistentClass) 
Evict all entries from the second-level cache. 
void evict(Class persistentClass, Serializable id) 
Evict an entry from the second-level cache. 
void evictCollection(String roleName) 
Evict all entries from the second-level cache. 
void evictCollection(String roleName, Serializable id) 
Evict an entry from the second-level cache. 
void evictQueries() 
Evict any query result sets cached in the default query cache region. 
void evictQueries(String cacheRegion) 
Evict any query result sets cached in the named query cache region. 

I do not recommend doing this, though, because it is hard to maintain. Suppose you call evictQueries(String cacheRegion) to remove the query caches involved and evict(Class persistentClass) to remove the class cache; that looks complete. But one day you add another related query cache and may forget to update the eviction code. And if your JDBC code is scattered all over the place, will you know which other places to change when you add a query cache?

Hibernate Level 1 cache

The level 1 cache is short-lived: its lifetime matches that of the session. It is also called the session-level cache or transaction-level cache.

Methods that support the level 1 cache: get(), load(), iterate() (when querying entity objects)

How to manage the level 1 cache: session.clear(), session.evict()

How to avoid memory overflow when importing a large amount of entity data into the database at once:

If the amount of data is very large, use JDBC. If JDBC cannot meet the requirements either, use a dedicated data import tool.

Hibernate Level 2 Cache

The level 2 cache is also called the process-level cache or the SessionFactory-level cache. It is shared by all sessions. Its life cycle matches that of the SessionFactory, which manages it.

The configuration and use of level 2 cache:

  • Copy the ehcache.xml file to the src directory.

  • Enable the level 2 cache in hibernate.cfg.xml:

<property name="hibernate.cache.use_second_level_cache">true</property>

  • Specify the cache provider in hibernate.cfg.xml, placing the property before the <mapping> elements:

<property name="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</property>

  • Specify which entity classes use the level 2 cache. In hibernate.cfg.xml, use the <class-cache> tag as follows:

<class-cache class="com.bcm.model.Article" usage="read-write"/>

The level 2 cache caches entity objects.

Hibernate query caching

The query cache caches result sets of plain properties; for result sets of entity objects it caches only the ids.

Lifetime of the query cache: it ends as soon as an associated table is modified.

Configuring and using the query cache:

  • Enable the query cache in hibernate.cfg.xml as follows:

<property name="hibernate.cache.use_query_cache">true</property>

  • The query cache must also be enabled manually in the program for each query, for example:

query.setCacheable(true);

// Requires org.hibernate.Session, org.hibernate.Query, java.util.List and java.util.Iterator.
// HibernateUtils is this article's own helper for opening and closing sessions.
// Runs the same query.list() twice in one session with the query cache enabled.
public void testCache1() {
    Session session = null;
    try {
        session = HibernateUtils.getSession();
        session.beginTransaction();

        Query query = session.createQuery("select s.name from Student s");
        // Enable the query cache
        query.setCacheable(true);

        List names = query.list();
        for (Iterator iter = names.iterator(); iter.hasNext(); ) {
            String name = (String) iter.next();
            System.out.println(name);
        }

        System.out.println("-------------------------------------");

        query = session.createQuery("select s.name from Student s");
        // Enable the query cache
        query.setCacheable(true);

        // No query SQL is issued here because the query cache is enabled
        names = query.list();
        for (Iterator iter = names.iterator(); iter.hasNext(); ) {
            String name = (String) iter.next();
            System.out.println(name);
        }

        session.getTransaction().commit();
    } catch (Exception e) {
        e.printStackTrace();
        // Roll back only if a session was actually opened
        if (session != null) {
            session.getTransaction().rollback();
        }
    } finally {
        HibernateUtils.closeSession(session);
    }
}

Entity object query [Important]

With conditional queries, watch out for the N+1 problem: by default, querying with query.iterate may cause it. N+1 means that N+1 SQL statements are issued for one query:

1: first, one SQL statement is issued to query the list of ids.

N: each id in the list is then looked up in the cache, and if the cache holds no matching data, an SQL statement is issued for that id.

The difference between list and iterate:

  • list issues an SQL statement every time. list puts data into the cache but does not use the data in the cache.

  • iterate: by default, iterate uses the data in the cache, but if the data is not in the cache, the N+1 problem can occur.


Conclusion

Do not assume that caching will automatically improve performance; it only does so when you can control it and the conditions are right. Hibernate’s second-level cache still comes with considerable restrictions, and not being able to use JDBC freely can greatly reduce update performance. If you do not understand how it works and use it carelessly, you may run into the 1+N problem. Improper use can also lead to reading dirty data.

If you cannot live with Hibernate’s restrictions, do your own caching at the application level. The higher the level at which you cache, the better the effect. Just as the database implements its own cache even though the disk has one, our application should cache even though the database has one. A low-level cache does not know what the higher level will do with the data, so it can only be generic, whereas a high-level cache can be tailored to its purpose, and caching at a higher level therefore works better.