
Technology stack:

SpringBoot 2.5.6

ElasticSearch 7.8.0

Vue

1. Project setup

1. Create a new Spring Boot project

2. Import dependencies

2.1. Override the Elasticsearch version managed by Spring Boot so that it matches the locally installed version

<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.8.0</elasticsearch.version>
</properties>

2.2. Import dependencies

<dependencies>
    <!-- jsoup: parses web pages -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.10.2</version>
    </dependency>
    <!-- fastjson -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.70</version>
    </dependency>
    <!-- ElasticSearch -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <!-- thymeleaf -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <!-- web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- devtools -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <scope>runtime</scope>
        <optional>true</optional>
    </dependency>
    <!-- configuration processor -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-configuration-processor</artifactId>
        <optional>true</optional>
    </dependency>
    <!-- lombok -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <!-- test -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

3. Write a configuration file

# Change ports to prevent conflicts
server.port=9090
# Turn off thymeleaf cache
spring.thymeleaf.cache=false

4. Import static resources

Link: pan.baidu.com/s/10EF40UUK… Extraction code: OT8J

5. Test access to static pages

@GetMapping({"/","/index"})
public String test(){
    return "index";
}

Visit http://localhost:9090/ in the browser.

Project setup is complete!

2. Crawl data

1. Request search.jd.com/Search?keyw… to fetch the search results page

Inspect the page: the product list element has the id J_goodsList,

and each li tag inside it holds the data of one product:

2. Parse the page to get the data

Write a page-parsing utility class to extract the product information (simple version):

package com.cheng.utils;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

/**
 * @author wpcheng
 * @create 2021-11-10
 */
public class HTMLParseUtil {
    public static void main(String[] args) throws IOException {
        // 1. Build the request URL
        String url = "https://search.jd.com/Search?keyword=java";
        // Jsoup returns a Document object, the counterpart of the browser's document,
        // so the DOM methods familiar from JavaScript can be used on it
        Document document = Jsoup.parse(new URL(url), 30000);
        // Get the "J_goodsList" list
        Element element = document.getElementById("J_goodsList");
        // Get all li tags in the "J_goodsList" list
        Elements elements = element.getElementsByTag("li");
        // Iterate over the li tags and extract the data of each product
        for (Element el : elements) {

            // Images are lazy-loaded, so the image address is stored in "data-lazy-img"
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img"); // product image address
            String price = el.getElementsByClass("p-price").eq(0).text(); // product price
            String title = el.getElementsByClass("p-name").eq(0).text(); // product title

            System.out.println("=====================================");
            System.out.println(img);
            System.out.println(price);
            System.out.println(title);
        }
    }
}

Run it: the data is retrieved successfully.

Write an entity class for the product information:

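package com.cheng.pojo;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Content implements Serializable {
    private static final long serialVersionUID = -8049497962627482693L;
    private String name;
    private String img;
    private String price;
}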

Wrap the parsing logic of HTMLParseUtil in a reusable method (full version):

package com.cheng.utils;

import com.cheng.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class HTMLParseUtil {
    public static void main(String[] args) throws Exception {
        // test
        parseJD("java").forEach(System.out::println);
    }

    // static, so that callers can invoke HTMLParseUtil.parseJD(keyword) directly
    public static List<Content> parseJD(String keyword) throws Exception {

        String url = "https://search.jd.com/Search?keyword=" + keyword;
        // Jsoup returns a Document object, the counterpart of the browser's document,
        // so the DOM methods familiar from JavaScript can be used on it
        Document document = Jsoup.parse(new URL(url), 30000);
        // Get the "J_goodsList" list
        Element element = document.getElementById("J_goodsList");
        // Get all li tags in the "J_goodsList" list
        Elements elements = element.getElementsByTag("li");
        // The list collects the product data extracted from each li tag
        List<Content> contents = new ArrayList<>();
        for (Element el : elements) {
            // Images are lazy-loaded, so the image address is stored in "data-lazy-img"
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img"); // product image address
            String price = el.getElementsByClass("p-price").eq(0).text(); // product price
            String title = el.getElementsByClass("p-name").eq(0).text(); // product title

            // Argument order follows the field order of Content: name, img, price
            Content content = new Content(title, img, price);
            contents.add(content);
        }

        return contents;
    }
}

Run the test utility class:

Data crawl test successful!
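
The parseJD method appends the keyword to the search URL as-is, which works for a plain ASCII word such as java. A keyword containing spaces or non-ASCII characters would presumably need to be encoded first. The following is a minimal sketch of that step, using only the JDK; the class SearchUrlSketch and the helper buildSearchUrl are hypothetical names and not part of the original project.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SearchUrlSketch {

    // Base search URL taken from the tutorial
    static final String BASE = "https://search.jd.com/Search?keyword=";

    // Hypothetical helper: encodes the keyword before appending it to the URL
    static String buildSearchUrl(String keyword) {
        try {
            // URLEncoder applies form encoding: spaces become '+',
            // other reserved or non-ASCII characters become %XX sequences
            return BASE + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("java"));
        System.out.println(buildSearchUrl("spring boot"));
    }
}
```

Inside parseJD, the URL would then be built with SearchUrlSketch.buildSearchUrl(keyword) instead of plain string concatenation.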

3. Write the business logic

1. Write the configuration class

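import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ESHighClientConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
        return client;
    }
}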

2. Write the Service layer

package com.cheng.service;

import com.alibaba.fastjson.JSON;
import com.cheng.pojo.Content;
import com.cheng.utils.HTMLParseUtil;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;


@Service
public class ContentService {

    @Autowired
    RestHighLevelClient restHighLevelClient;

    // Crawl data into index
    public boolean parseContent(String keyword) throws Exception {
        // Use custom parsing tool classes to parse web pages and get data
        List<Content> contents = HTMLParseUtil.parseJD(keyword);

        // Add the parsed data to ES in batches
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");
        for (int i = 0; i < contents.size(); i++) {
            bulkRequest.add(
                    new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)), XContentType.JSON));

        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);

        return !bulkResponse.hasFailures(); // true means no document in the bulk request failed
    }


    // Search for document information
    public  List<Map<String,Object>> search(String keyword,int pageIndex,int pageSize) throws IOException {

        if (pageIndex < 0){
            pageIndex = 0;
        }

        // Build query requests against indexes
        SearchRequest request = new SearchRequest("jd_goods");

        // Build the search criteria
        SearchSourceBuilder sourceBuilder = SearchSourceBuilder.searchSource();
        // Query the exact keyword
        TermQueryBuilder termQuery = QueryBuilders.termQuery("name", keyword);
        // Put the exact query into the search criteria
        sourceBuilder.query(termQuery).timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Paging: "from" is the zero-based offset of the first hit, "size" is the number of hits to return
        sourceBuilder.from(pageIndex);
        sourceBuilder.size(pageSize);

        // Put the search criteria into the request
        request.source(sourceBuilder);

        // Execute the query request
        SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);

        List<Map<String,Object>> list = new ArrayList<>();

        for (SearchHit documentFields : response.getHits().getHits()) {
            // Retrieve each item information as a map
            Map<String, Object> sourceAsMap = documentFields.getSourceAsMap();
            // Add to the list
            list.add(sourceAsMap);
        }
        return list;
    }
}
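
In the Elasticsearch search API, from is the zero-based offset of the first hit to return, not a page number, so the search method skips pageIndex documents rather than pageIndex pages. If the endpoint were meant to accept a page number instead, the offset could be derived as sketched below. The class PagingSketch, the helper offsetOf and the zero-based page convention are assumptions for illustration, not part of the original service.

```java
final class PagingSketch {

    // Hypothetical helper: converts a zero-based page index into the "from" offset.
    // Negative page indexes are clamped to 0, mirroring the guard in search().
    static int offsetOf(int pageIndex, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int page = Math.max(pageIndex, 0);
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(page, pageSize);
    }

    public static void main(String[] args) {
        System.out.println(offsetOf(0, 20)); // first page
        System.out.println(offsetOf(1, 20)); // second page
    }
}
```

With such a helper, the service would call sourceBuilder.from(PagingSketch.offsetOf(pageIndex, pageSize)), and a request such as /search/java/1/20 would return hits 21 to 40 instead of hits 2 to 21.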

3. Write the Controller layer

package com.cheng.controller;

import com.cheng.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.ResponseBody;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@Controller
public class TextController {

   @Autowired
   private ContentService contentService;


    // Returns the view name so that Thymeleaf renders index.html
    @GetMapping({"/","/index"})
    public String index(){
        return "index";
    }


    // Crawls the products for the keyword and adds them to ES
    @ResponseBody
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws Exception {
        return contentService.parseContent(keyword);
    }


    // Searches the indexed products and returns them as JSON
    @ResponseBody
    @GetMapping("/search/{keyword}/{pageIndex}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                            @PathVariable("pageIndex") Integer pageIndex,
                                            @PathVariable("pageSize") Integer pageSize) throws IOException {
        return contentService.search(keyword, pageIndex, pageSize);
    }
}

4. Test

Request http://localhost:9090/parse/java to crawl the data and add the documents: the documents have been added to ES.

Request http://localhost:9090/search/java/1/20 to query the product information for the keyword "java" with paging: the query succeeds!

4. Front-end and back-end interaction

1. Import the Vue and axios dependencies

2. Include the JS files in the HTML page

<script th:src="@{/js/vue.min.js}"></script>
<script th:src="@{/js/axios.min.js}"></script>

3. Render index.html

<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="utf-8"/>
    <title>Kuangshen Says Java: ES JD.com clone in practice</title>
    <link rel="stylesheet" th:href="@{/css/style.css}"/>
    <script th:src="@{/js/jquery.min.js}"></script>
</head>
<body class="pg">
<div class="page">
    <div id="app" class=" mallist tmall- page-not-market ">
        <!-- Header search -->
        <div id="header" class=" header-list-app">
            <div class="headerLayout">
                <div class="headerCon ">
                    <!-- Logo -->
                    <h1 id="mallLogo">
                        <img th:src="@{/images/jdlogo.png}" alt="">
                    </h1>
                    <div class="header-extra">
                        <!-- Search -->
                        <div id="mallSearch" class="mall-search">
                            <form name="searchTop" class="mallSearch-form clearfix">
                                <fieldset>
                                    <legend>Tmall search</legend>
                                    <div class="mallSearch-input clearfix">
                                        <div class="s-combobox" id="s-combobox-685">
                                            <div class="s-combobox-input-wrap">
                                                <input v-model="keyword"  type="text" autocomplete="off" id="mq"
                                                       class="s-combobox-input"  aria-haspopup="true">
                                            </div>
                                        </div>
                                        <button type="submit" @click.prevent="searchKey" id="searchbtn">search</button>
                                    </div>
                                </fieldset>
                            </form>
                            <ul class="relKeyTop">
                                <li><a>Kuangshen Says Java</a></li>
                                <li><a>Kuangshen Says Front End</a></li>
                                <li><a>Kuangshen Says Linux</a></li>
                                <li><a>Kuangshen Says Big Data</a></li>
                                <li><a>Kuangshen Talks Finance</a></li>
                            </ul>
                        </div>
                    </div>
                </div>
            </div>
        </div>
        <!-- Product details page -->
        <div id="content">
            <div class="main">
                <!-- Brand category -->
                <form class="navAttrsForm">
                    <div class="attrs j_NavAttrs" style="display:block">
                        <div class="brandAttr j_nav_brand">
                            <div class="j_Brand attr">
                                <div class="attrKey">brand</div>
                                <div class="attrValues">
                                    <ul class="av-collapse row-2">
                                        <li><a href="#">Kuangshen Says</a></li>
                                        <li><a href="#"> Java </a></li>
                                    </ul>
                                </div>
                            </div>
                        </div>
                    </div>
                </form>
                <!-- Sorting rules -->
                <div class="filter clearfix">
                    <a class="fSort fSort-cur">Overall<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Popularity<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">New<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Sales<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Price<i class="f-ico-triangle-mt"></i><i class="f-ico-triangle-mb"></i></a>
                </div>
                <!-- Product details -->
                <div class="view grid-nosku" >
                    <div class="product" v-for="result in results">
                        <div class="product-iWrap">
                            <!-- Product cover -->
                            <div class="productImg-wrap">
                                <a class="productImg">
                                    <img :src="result.img">
                                </a>
                            </div>
                            <!-- Price -->
                            <p class="productPrice">
                                <em v-text="result.price"></em>
                            </p>
                            <!-- Title -->
                            <p class="productTitle">
                                <a v-html="result.name"></a>
                            </p>
                            <!-- Shop name -->
                            <div class="productShop">
                                <span>Shop: Kuangshen Says Java</span>
                            </div>
                            <!-- Transaction information -->
                            <p class="productStatus">
                                <span>Monthly sales<em>999</em></span>
                                <span>Reviews<a>3</a></span>
                            </p>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>
<script src="https://cdn.jsdelivr.net/npm/vue@2/dist/vue.js"></script>
<script th:src="@{/js/axios.min.js}"></script>
<script>
    new Vue({
        el: "#app",
        data: {
            keyword: '', // the search keyword
            results: []  // the results returned by the back end
        },
        methods: {
            searchKey() {
                var keyword = this.keyword;
                console.log(keyword);
                axios.get('h_search/' + keyword + '/0/20').then(response => {
                    console.log(response.data);
                    this.results = response.data;
                });
            }
        }
    });
</script>
</body>
</html>

5. Keyword highlighting

Add a keyword-highlighting search method to ContentService:

How it works: the original field value is overwritten with the highlighted field value returned by Elasticsearch.

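// Additional imports needed in ContentService:
// org.elasticsearch.common.text.Text
// org.elasticsearch.search.SearchHits
// org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder
// org.elasticsearch.search.fetch.subphase.highlight.HighlightField
public List<Map<String, Object>> highlightSearch(String keyword, Integer pageIndex, Integer pageSize) throws IOException {
    // Build the search request against the index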
    SearchRequest searchRequest = new SearchRequest("jd_goods");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Exact query
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // Add a query
    searchSourceBuilder.query(termQueryBuilder);
    // Paging
    searchSourceBuilder.from(pageIndex);
    searchSourceBuilder.size(pageSize);
    // keyword highlight
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("name");
    highlightBuilder.preTags("<span style='color:red'>");
    highlightBuilder.postTags("</span>");
    searchSourceBuilder.highlighter(highlightBuilder);
    // Add the query criteria to the query request
    searchRequest.source(searchSourceBuilder);
    // Execute the query request
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    // Parse the result
    SearchHits hits = searchResponse.getHits();
    List<Map<String, Object>> results = new ArrayList<>();
    for (SearchHit documentFields : hits.getHits()) {
        // Use the new highlighted field value to override the old field value
        Map<String, Object> sourceAsMap = documentFields.getSourceAsMap();
        // Highlight the field
        Map<String, HighlightField> highlightFields = documentFields.getHighlightFields();
        HighlightField name = highlightFields.get("name");
        // Start the substitution
        if (name != null) {
            Text[] fragments = name.fragments();
            // Using StringBuilder is more efficient
            StringBuilder new_name = new StringBuilder();
            for (Text text : fragments) {
                new_name.append(text);
            }
            sourceAsMap.put("name",new_name.toString());
        }
        results.add(sourceAsMap);
    }
    return results;
}
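
The replacement step in highlightSearch can be looked at in isolation from Elasticsearch. The sketch below uses plain strings in place of the client's Text fragments; the class HighlightMergeSketch and the helper applyHighlight are hypothetical names, and the code only illustrates the overwrite idea rather than the client API.

```java
import java.util.HashMap;
import java.util.Map;

final class HighlightMergeSketch {

    // Hypothetical helper: returns a copy of the source map in which the given field
    // is replaced by the concatenated highlight fragments. When there are no
    // fragments, the original value is kept.
    static Map<String, Object> applyHighlight(Map<String, Object> source,
                                              String field,
                                              String[] fragments) {
        Map<String, Object> merged = new HashMap<>(source);
        if (fragments != null && fragments.length > 0) {
            merged.put(field, String.join("", fragments));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("name", "Java in Action");
        source.put("price", "59.00");

        String[] fragments = {"<span style='color:red'>Java</span>", " in Action"};
        System.out.println(applyHighlight(source, "name", fragments).get("name"));
    }
}
```

Unlike the service method, this sketch copies the map instead of modifying it in place, which keeps the original document available if it is needed later.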

Render the highlighted title in index.html with v-html, so that the span tags returned by the back end are interpreted as HTML:

<!-- Title -->
<p class="productTitle">
    <a v-html="result.name"></a>
</p>

6. Final result