1 Overview

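This post covers:

a brief introduction to elasticsearch and what it is used for;
using logstash, a companion tool to elasticsearch, together with its logstash-input-jdbc plugin to sync mysql data into elasticsearch.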

2 Introduction

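elasticsearch is the technology of the moment: Elastic, the company behind it, went public this month with a market capitalization of roughly USD 5 billion. It grew out of an open-source project on github, addresses real pain points in big-data and distributed environments, and has already been adopted into the solution stacks of major vendors. With its user base still growing, its prospects look bright.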

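A few characteristics, at first glance:
(1) It is very easy to use: unpack it and start it with a single command, and you can try it out straight from the command line as well;
(2) Simplicity is its strength, at least on the surface: the entry barrier is much lower than solr's, which has won it a large following, and open-source plugins add to the momentum;
(3) The core technology on its own is not especially novel, though it is far from trivial; the appeal lies more in how well the idea is packaged. Perhaps the RESTful style simply arrived at the right time?
(4) The business model is a free, open-source core with paid plugins and paid technical services;
(5) It is distributed by design, with each index split into shards, which leaves a lot of room to grow; it combines optimized indexing strategies with the full-text search heritage of Lucene, and aggregation is another strong point, so it has a great deal going for it.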

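Two e-books are worth a look: Elasticsearch: The Definitive Guide (《elasticsearch权威指南》) and 《深入理解ElasticSearch》 (the Chinese edition of Mastering ElasticSearch).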

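logstash is a log collection framework, and its logstash-input-jdbc plugin can sync data from mysql to elasticsearch. It does not propagate deletes, so a delete has to be implemented as a logical update (a soft delete).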
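
The soft-delete idea can be sketched as follows. This is only an illustration under assumptions: the `is_deleted` column is hypothetical, sqlite3 stands in for mysql, and integer timestamps are set by hand, whereas in mysql a column declared with ON UPDATE CURRENT_TIMESTAMP would be bumped automatically.

```python
import sqlite3

# Hypothetical schema: `is_deleted` and the integer timestamps are assumptions
# made for illustration; sqlite3 stands in for mysql here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotel ("
    " id INTEGER PRIMARY KEY,"
    " hotel_name TEXT,"
    " is_deleted INTEGER NOT NULL DEFAULT 0,"
    " last_modify_time INTEGER NOT NULL)"
)
conn.execute("INSERT INTO hotel VALUES (1, 'demo hotel', 0, 100)")


def soft_delete(conn, hotel_id, now):
    """Mark a row as deleted and bump its timestamp instead of removing it."""
    conn.execute(
        "UPDATE hotel SET is_deleted = 1, last_modify_time = :now WHERE id = :id",
        {"now": now, "id": hotel_id},
    )


def changed_since(conn, last_value):
    """Rows that a timestamp-based incremental query would pick up."""
    cur = conn.execute(
        "SELECT id, hotel_name, is_deleted FROM hotel"
        " WHERE last_modify_time >= :sql_last_value",
        {"sql_last_value": last_value},
    )
    return cur.fetchall()


soft_delete(conn, 1, now=200)
rows = changed_since(conn, 150)
print(rows)  # the "deleted" row is still returned, now flagged is_deleted = 1
```

Because the row still exists and its timestamp moved forward, the next sync run sees it like any other update; queries on the elasticsearch side would then need to filter on the flag.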

3 Worked example

Main reference: https://www.cnblogs.com/zuolun2017/p/8082996.html

3.1 Installing elasticsearch

Official download page: https://www.elastic.co/downloads/elasticsearch
Download URL: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.4.2.tar.gz

JAVA_HOME must be set; after that, run bin/elasticsearch to start the server.

Check that it is up: $ curl http://localhost:9200/

{
  "name" : "pYmhVpz",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "W64pCc-yRGyVC4QKTCoRRA",
  "version" : {
    "number" : "6.4.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "04711c2",
    "build_date" : "2018-09-26T13:34:09.098244Z",
    "build_snapshot" : false,
    "lucene_version" : "7.4.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
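
A script can run the same check by parsing this response. The sketch below is a minimal illustration, assuming a plain "x.y.z" version string; the function name `parse_version` is made up for this example, and the sample is an abridged copy of the output above.

```python
import json


def parse_version(body):
    """Return the server version from the root endpoint's JSON as a tuple of ints.

    Assumes a plain "x.y.z" version string; a suffix such as "-beta1"
    would need extra handling.
    """
    info = json.loads(body)
    return tuple(int(part) for part in info["version"]["number"].split("."))


# Abridged copy of the response shown above. Against a live node the body
# could be fetched with urllib.request.urlopen("http://localhost:9200/").read().
sample = """
{
  "name" : "pYmhVpz",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "6.4.2", "lucene_version" : "7.4.0" },
  "tagline" : "You Know, for Search"
}
"""

version = parse_version(sample)
print(version)
```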

For installing and configuring the latest version of elasticsearch, see the article: elasticsearch installation

3.2 Installing logstash

Official download page: https://www.elastic.co/downloads/logstash
Download URL: https://artifacts.elastic.co/downloads/logstash/logstash-6.4.2.tar.gz

Create a configuration file, following: https://www.elastic.co/guide/en/logstash/current/configuration.html

input { stdin { } }
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

Start it: bin/logstash -f logstash-simple.conf

Install the plugin: bin/logstash-plugin install logstash-input-jdbc

3.3 Demo

Prepare the database environment and create the tables:

CREATE TABLE `hotel` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `hotel_name` varchar(255) DEFAULT NULL,
  `photo_url` varchar(255) DEFAULT NULL,
  `last_modify_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;

CREATE TABLE `hotel_account` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `hotel_id` int(11) DEFAULT NULL,
  `finance_person` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;

The mysql-related configuration, for example jdbc.conf:

input {
    stdin {
    }
    jdbc {
      # mysql jdbc connection string to our backup database
      jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
      # the user we wish to execute our statement as
      jdbc_user => "root"
      jdbc_password => "root"
      # the path to our downloaded jdbc driver
      jdbc_driver_library => "/home/tao/program/mysql-connector-java-5.1.14-bin.jar"
      # the name of the driver class for mysql
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      jdbc_paging_enabled => "true"
      jdbc_page_size => "50000"
      statement_filepath => "jdbc.sql" # file containing the SQL statement to execute
      # more options are described at https://www.cnblogs.com/zuolun2017/p/8082996.html
      schedule => "* * * * *"  # cron-style schedule; this value runs the statement once per minute
      type => "jdbc" # added to every event, so it shows up as the "type" field of each document
    }
}

filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
}

output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        # the port/host/protocol options are not recognized
        index => "mysql01"
        document_id => "%{id}"
        # cluster => "logstash-elasticsearch" is not recognized either
    }
    stdout {
        codec => json_lines
    }
}
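
The setting document_id => "%{id}" is what makes repeated runs overwrite the same document instead of creating duplicates. To make that concrete, here is a sketch that builds an elasticsearch bulk request body by hand, using each row's id as `_id`. It is an illustration only: the helper name `bulk_index_body` is made up, and the source does not show the exact requests logstash itself sends.

```python
import json


def bulk_index_body(index, rows, doc_type="doc"):
    """Build a bulk API body (newline-delimited JSON) that indexes each row by its id.

    An "index" action with an explicit _id replaces any existing document
    with that id, so sending the same row twice does not duplicate it.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(row["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row, ensure_ascii=False))
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


rows = [
    {"id": 5, "name": "hotel five", "img": "images/madashuai.img"},
    {"id": 6, "name": "taoych's hotel", "img": "images/madashuai.img"},
]
body = bulk_index_body("mysql01", rows)
print(body)
```

Such a body would be sent to the _bulk endpoint with Content-Type application/x-ndjson. The `_type` field matches the 6.x output shown later in this post.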

The SQL statement to execute is:

select
    h.id as id,
    h.hotel_name as name,
    h.photo_url as img,
    ha.id as haId,
    ha.finance_person
from
    hotel h LEFT JOIN hotel_account ha on h.id = ha.hotel_id
where
    h.last_modify_time >= :sql_last_value  /* recorded by logstash: the time of the previous query run */

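So the database side needs a timestamp column that is refreshed on every update, which then serves as the boundary for incremental syncs.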
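
The incremental pattern can be sketched as a small loop. Everything here is a stand-in chosen for illustration: sqlite3 replaces mysql, a plain dict keyed by document id replaces the elasticsearch index, timestamps are integers, and `run_once` is a made-up name. The named parameter mirrors the :sql_last_value placeholder used in the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotel (id INTEGER PRIMARY KEY, hotel_name TEXT, last_modify_time INTEGER)"
)
conn.executemany("INSERT INTO hotel VALUES (?, ?, ?)", [(1, "a", 10), (2, "b", 20)])

QUERY = (
    "SELECT id, hotel_name AS name FROM hotel"
    " WHERE last_modify_time >= :sql_last_value ORDER BY id"
)


def run_once(conn, index, sql_last_value, now):
    """One scheduled run: fetch changed rows, upsert them by id, return the new boundary."""
    touched = []
    for row_id, name in conn.execute(QUERY, {"sql_last_value": sql_last_value}):
        index[str(row_id)] = {"id": row_id, "name": name}  # same id overwrites
        touched.append(row_id)
    return now, touched


index = {}
last, first_touched = run_once(conn, index, 0, now=25)  # first run loads everything
conn.execute("UPDATE hotel SET hotel_name = 'b2', last_modify_time = 30 WHERE id = 2")
last, second_touched = run_once(conn, index, last, now=35)  # only the changed row
print(first_touched, second_touched, index)
```

Since the comparison is >=, a row modified exactly at the boundary can be fetched twice; writing by id keeps that harmless.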

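At each scheduled time, logstash opens a mysql connection (is there any notion of a pool?), runs the query, parses the result, and calls the elasticsearch HTTP API to write the data there. Here is the resulting data: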

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "mysql01",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "id" : 5,
          "name" : "马二帅酒店",
          "img" : "images/madashuai.img",
          "@timestamp" : "2018-10-18T08:41:01.622Z",
          "haid" : null,
          "type" : "jdbc",
          "@version" : "1",
          "finance_person" : null
        }
      },
      {
        "_index" : "mysql01",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "id" : 4,
          "name" : "马二帅酒店",
          "img" : "images/madashuai.img",
          "@timestamp" : "2018-10-18T08:41:01.621Z",
          "haid" : null,
          "type" : "jdbc",
          "@version" : "1",
          "finance_person" : null # no matching row in the joined table
        }
      },
      {
        "_index" : "mysql01",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "id" : 2,
          "name" : "马二帅酒店",
          "img" : "images/madashuai.img",
          "@timestamp" : "2018-10-18T08:41:01.594Z",
          "haid" : 2,
          "type" : "jdbc",
          "@version" : "1",
          "finance_person" : "马二帅"
        }
      },
      {
        "_index" : "mysql01",
        "_type" : "doc",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "id" : 6,
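          "name" : "taoych's hotel", # updated
          "img" : "images/madashuai.img",
          "@timestamp" : "2018-10-18T08:59:00.041Z", # logstash event time of the run that picked up the change, not the database column
          "haid" : 3,
          "type" : "jdbc",
          "@version" : "1", # a logstash event field, not the elasticsearch document version (_version), so it stays "1" after an update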
          "finance_person" : "马二帅"
        }
      },
      {
        "_index" : "mysql01",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "id" : 3,
          "name" : "马二帅酒店",
          "img" : "images/madashuai.img",
          "@timestamp" : "2018-10-18T08:41:01.621Z",
          "haid" : null,
          "type" : "jdbc",
          "@version" : "1",
          "finance_person" : null
        }
      }
    ]
  }
}
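
A response like this is easy to post-process. As a small sketch, the function below lists the hotels that have no row in hotel_account, relying on the LEFT JOIN leaving haid null for them. The function name is made up for illustration and the sample is a trimmed copy of the hits above.

```python
def hits_without_account(response):
    """IDs of hits whose haid is null, i.e. hotels the LEFT JOIN found no account for."""
    return sorted(
        hit["_id"]
        for hit in response["hits"]["hits"]
        if hit["_source"].get("haid") is None
    )


# Trimmed copy of the search response shown above (JSON null becomes None).
sample = {
    "hits": {
        "total": 5,
        "hits": [
            {"_id": "5", "_source": {"id": 5, "haid": None, "finance_person": None}},
            {"_id": "4", "_source": {"id": 4, "haid": None, "finance_person": None}},
            {"_id": "2", "_source": {"id": 2, "haid": 2, "finance_person": "马二帅"}},
            {"_id": "6", "_source": {"id": 6, "haid": 3, "finance_person": "马二帅"}},
            {"_id": "3", "_source": {"id": 3, "haid": None, "finance_person": None}},
        ],
    }
}

missing = hits_without_account(sample)
print(missing)
```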

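4 Application scenarios

4.1 E-commerce systems

Systems of this kind handle large data volumes, and analytics and log-style workloads are common. elasticsearch can serve as the scaffolding for queries and full-text search, while content that needs transactions and updates lives in a traditional relational database such as mysql, or a NoSQL database such as mongodb, and is then synced to elasticsearch.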

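4.2 Blog systems

Submitting blog posts to elasticsearch makes search and statistics very convenient. But what exactly counts as a "post" here, the body text? And how would an HTML site generated from markdown be integrated? One option is to treat the static HTML as one part of a dynamic site: elasticsearch supplies the dynamic query results, which can be highlighted and then linked back to the source HTML pages. This is a fairly quick integration that changes little: feed the md content straight into elasticsearch and set up the links. The query conditions could even be assembled directly in the front-end page, with nginx blocking DELETE/PUT/POST and similar requests.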
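
For the highlighted search mentioned above, the request body might look like the sketch below. The field name `content` and the helper `build_search_body` are assumptions for illustration; the body itself uses the standard match query and highlight section of the search API. How the request gets past an nginx that blocks POST is left out of the sketch.

```python
import json


def build_search_body(text, field="content", size=10):
    """Build a search body that matches `text` in `field` and asks for highlights.

    `field` is a hypothetical name for wherever the markdown text was indexed.
    """
    return {
        "size": size,
        "query": {"match": {field: text}},
        # An empty object asks for highlighting on this field with default settings.
        "highlight": {"fields": {field: {}}},
    }


body = build_search_body("logstash jdbc")
print(json.dumps(body, indent=2))
```

Each hit in the response then carries a highlight section with matched fragments, which the page can render next to a link to the original HTML post.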

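4.3 Other applications

The database keeps its own copy elsewhere and is synced to elasticsearch in near real time. One of the two stores handles key-value lookups, such as returning an ID or a set of IDs, and the other handles all remaining kinds of queries.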