Modify the data

Elasticsearch provides data manipulation and search capabilities in near real time. By default, expect a delay of about one second (the refresh interval) between the moment data is indexed, updated, or deleted and the moment it appears in search results. This is an important difference from other storage engines such as SQL databases, where data is visible immediately after a transaction completes.
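Because of this delay, a script that indexes a document and searches for it immediately afterwards may not find it yet. One way around this is to force a refresh of the index. The minimal sketch below uses only the Python standard library. It builds, but does not send, a `POST /<index>/_refresh` request. The node address `http://localhost:9200` and the helper name `build_refresh_request` are illustrative assumptions.

```python
import urllib.request

# Assumed address of a local node; adjust for your cluster.
ES_URL = "http://localhost:9200"

def build_refresh_request(index: str) -> urllib.request.Request:
    """Build (but do not send) POST /<index>/_refresh.

    A refresh makes recent index/update/delete operations visible to
    search without waiting for the refresh interval to elapse.
    """
    return urllib.request.Request(f"{ES_URL}/{index}/_refresh", method="POST")

req = build_refresh_request("customer")
print(req.get_method(), req.full_url)  # POST http://localhost:9200/customer/_refresh
```

Passing `req` to `urllib.request.urlopen` would send it. Forcing a refresh after every write is costly, so this is best kept to tests and one-off scripts.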

Index/replace documents

We have already seen how to index a single document. Let's recall that command:

curl -XPUT 'localhost:9200/customer/external/1?pretty' -H 'Content-Type: application/json' -d '{ "name": "John Doe" }'

This indexes the specified document into the customer index, under the external type, with ID 1. If we run the command again with a different (or the same) document, Elasticsearch replaces (that is, reindexes) the existing document with ID 1 with the new one:

curl -XPUT 'localhost:9200/customer/external/1?pretty' -H 'Content-Type: application/json' -d '{ "name": "Jane Doe" }'

The command above changes the name of the document with ID 1 from “John Doe” to “Jane Doe”. On the other hand, if we use a different ID, a new document will be created in the index and the original document will not be changed.

curl -XPUT 'localhost:9200/customer/external/2?pretty' -H 'Content-Type: application/json' -d '{ "name": "Jane Doe" }'

The command above indexes a new document with ID 2. The ID parameter is optional for indexing. If not specified, Elasticsearch will generate a random ID and use it as the document ID. The ID generated by ES (or the ID we specified) will be returned as the result of the API call. The following example shows how to create a document without specifying an ID.

curl -XPOST 'localhost:9200/customer/external?pretty' -H 'Content-Type: application/json' -d '{ "name": "Jane Doe" }'

Note that in the example above we used POST instead of PUT because we did not specify an ID. When an ID is given, either PUT or POST can be used to index or replace a document, with the same result. Without an ID, only POST is accepted, because Elasticsearch has to generate the ID itself.
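The difference between indexing with an explicit ID (PUT) and letting Elasticsearch choose one (POST) can be sketched in Python with only the standard library. The requests are built but not sent. The node address, the `Content-Type` header (required by Elasticsearch 6.0 and later), and all helper names are assumptions for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local node

def _json_request(url, method, doc):
    """Build (but do not send) a request carrying a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )

def build_index_request(index, doc_type, doc_id, doc):
    """PUT /<index>/<type>/<id>: create the document, or replace it
    if a document with this ID already exists."""
    return _json_request(f"{ES_URL}/{index}/{doc_type}/{doc_id}?pretty", "PUT", doc)

def build_auto_id_request(index, doc_type, doc):
    """POST /<index>/<type> without an ID: Elasticsearch generates one."""
    return _json_request(f"{ES_URL}/{index}/{doc_type}?pretty", "POST", doc)

def returned_id(response_body):
    """Read the document ID (given or generated) from an index response."""
    return json.loads(response_body)["_id"]

replace_1 = build_index_request("customer", "external", 1, {"name": "Jane Doe"})
create_2 = build_index_request("customer", "external", 2, {"name": "Jane Doe"})
auto_id = build_auto_id_request("customer", "external", {"name": "Jane Doe"})
print(auto_id.get_method(), auto_id.full_url)
# POST http://localhost:9200/customer/external?pretty
```

To send one of these, pass it to `urllib.request.urlopen` and call `returned_id` on the response body to find out which ID the document received.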


Update the document

In addition to indexing and replacing documents, we can also update them. Note, however, that Elasticsearch does not actually perform in-place updates under the hood. Whenever we perform an update, Elasticsearch deletes the old document and then indexes a new document with the update applied, all within a single update request. The following example updates the document with ID 1 by changing its name field to “Jane Doe”:

curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d '{ "doc": { "name": "Jane Doe" } }'

The following example changes the name field to “Jane Doe” and adds an age field at the same time:

curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d '{ "doc": { "name": "Jane Doe", "age": 20 } }'
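A partial update wraps only the changed fields in a `doc` object. Elasticsearch merges this object into the stored source, so fields that are not mentioned are kept. The minimal standard-library Python sketch below builds, but does not send, such a request. The node address and the helper name `build_partial_update` are assumptions.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local node

def build_partial_update(index, doc_type, doc_id, fields):
    """Build (but do not send) POST /<index>/<type>/<id>/_update?pretty.

    `fields` is wrapped in a "doc" object; only these fields change,
    the rest of the stored document is left as it was.
    """
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/{doc_id}/_update?pretty",
        data=json.dumps({"doc": fields}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_partial_update("customer", "external", 1, {"name": "Jane Doe", "age": 20})
print(req.data.decode())  # {"doc": {"name": "Jane Doe", "age": 20}}
```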

Updates can also be performed with simple scripts. Note that dynamic scripts like the one below are disabled by default as of version 1.4.3; see the scripting documentation for details. This script increases the age by 5:

curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d '{ "script" : "ctx._source.age += 5" }'

(Translator’s note: Scripts are powerful but not recommended, especially in queries. Like stored procedures in a relational database, they can carry complex business logic, but putting overly complex logic inside the ES engine drains performance, so it makes sense for ES to disable this feature by default.)

In the example above, ctx._source refers to the source of the document that is about to be updated (here, the document with ID 1). Note that this API updates only one document at a time. Updating every document that matches a condition, similar to an SQL UPDATE-WHERE statement, is handled by the separate Update By Query API, added in Elasticsearch 2.3.
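The scripted update differs from the partial update only in its body: a `script` string replaces the `doc` object. The minimal standard-library Python sketch below builds, but does not send, the request. It uses the same short string form of `script` shown above. The node address and the helper name `build_script_update` are assumptions, and on 1.4.3 and later 1.x releases dynamic scripting must first be enabled on the node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local node

def build_script_update(index, doc_type, doc_id, script):
    """Build (but do not send) a scripted POST .../<id>/_update?pretty.

    Inside the script, ctx._source is the stored source of the single
    document identified by doc_id.
    """
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/{doc_id}/_update?pretty",
        data=json.dumps({"script": script}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_script_update("customer", "external", 1, "ctx._source.age += 5")
print(req.data.decode())  # {"script": "ctx._source.age += 5"}
```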


Delete the document

Deleting a document is fairly simple. The following example shows deleting a document with ID 2:

DELETE /customer/doc/2?pretty

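Documents can also be deleted in bulk with the Delete By Query plug-in, which deletes all documents matching a query. It is worth noting, however, that if you need to clear out all of the data, deleting the entire index is much more efficient than deleting all of its documents with that plug-in.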

(Translator’s note: Plug-ins do exist for bulk updates and bulk deletes, but internally they simply run a scroll query and then delete the results, so they execute very slowly. If such a job runs for a long time, requests waiting in the service layer's queue may time out. This feature is therefore best run from the command line in a test environment or during off-peak hours, and should not be exposed as an online interface. For online use, prefer the bulk update/delete API, or issue single deletes from multiple threads.)
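The two ways of removing data discussed above, deleting one document by ID and dropping a whole index, look like this as standard-library Python requests. The requests are built but not sent. The node address and the helper names are assumptions. Keep in mind that deleting an index also removes its mappings and settings, so the index has to be recreated before it is used again.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local node

def build_delete_doc(index, doc_type, doc_id):
    """DELETE /<index>/<type>/<id>: remove a single document by ID."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/{doc_id}?pretty", method="DELETE")

def build_delete_index(index):
    """DELETE /<index>: drop the whole index, the cheap way to clear all its data."""
    return urllib.request.Request(f"{ES_URL}/{index}?pretty", method="DELETE")

one_doc = build_delete_doc("customer", "doc", 2)
everything = build_delete_index("customer")
print(one_doc.get_method(), one_doc.full_url)
# DELETE http://localhost:9200/customer/doc/2?pretty
```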


The batch operation

In addition to indexing, updating, and deleting individual documents, Elasticsearch can also perform any of these operations in batches using the _bulk API. This is very efficient, because many operations are carried out as fast as possible with as few network round trips as possible. The following example indexes two documents in one bulk operation:

POST /customer/doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
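The _bulk body is newline-delimited JSON: one action line, then (for indexing) one source line, each object on its own line, and the body must end with a newline. The minimal standard-library Python sketch below serializes such a body and wraps it in a request that is built but not sent. The node address and the helper names are assumptions. `application/x-ndjson` is the content type Elasticsearch documents for bulk requests.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local node

def to_ndjson(objects):
    """Serialize objects one per line, with the trailing newline _bulk requires."""
    return "".join(json.dumps(obj) + "\n" for obj in objects)

def build_bulk_request(index, doc_type, objects):
    """Build (but do not send) POST /<index>/<type>/_bulk?pretty."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/_bulk?pretty",
        data=to_ndjson(objects).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )

body = [
    {"index": {"_id": "1"}}, {"name": "John Doe"},
    {"index": {"_id": "2"}}, {"name": "Jane Doe"},
]
req = build_bulk_request("customer", "doc", body)
print(req.data.decode(), end="")
```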

Update the first document and delete the second document:

POST /customer/doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}

Note that the delete action is not followed by a source document, because a delete only needs the ID of the document to remove.
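The rule that index and update actions carry a second line while delete does not can be captured in a small serializer. The Python sketch below uses only the standard library. The `(action, id, payload)` tuple format and the name `bulk_lines` are hypothetical conventions chosen for illustration.

```python
import json

def bulk_lines(actions):
    """Turn hypothetical (action, doc_id, payload) tuples into _bulk NDJSON.

    index/create/update actions are followed by a payload line;
    delete is a single action line, since only the ID is needed.
    """
    out = []
    for action, doc_id, payload in actions:
        out.append(json.dumps({action: {"_id": doc_id}}))
        if action != "delete":
            out.append(json.dumps(payload))
    return "\n".join(out) + "\n"  # _bulk bodies must end with a newline

ndjson = bulk_lines([
    ("update", "1", {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", "2", None),
])
print(ndjson, end="")
# {"update": {"_id": "1"}}
# {"doc": {"name": "John Doe becomes Jane Doe"}}
# {"delete": {"_id": "2"}}
```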

The bulk API does not fail as a whole when one of its actions fails. If a single action fails for whatever reason, the remaining actions are still processed. When the bulk request completes, the response contains a status for each action, in the same order the actions were sent, so you can check whether a specific action failed.
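Checking a bulk response for failures can be sketched in Python with only the standard library. The sketch relies on the top-level `errors` flag and on each entry in `items` being keyed by its action name, with failed entries carrying an `error` object. The function name `failed_items` is hypothetical. It returns positions so that each failure can be matched to the action that was sent at that position.

```python
import json

def failed_items(response_body):
    """Return (position, action, _id, status) for each failed bulk item."""
    resp = json.loads(response_body)
    if not resp.get("errors"):
        return []  # the flag is false when every action succeeded
    failures = []
    for pos, item in enumerate(resp["items"]):
        # Each item has exactly one key: the action name (index, update, ...).
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((pos, action, result.get("_id"), result.get("status")))
    return failures
```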
