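Project Background

As the business has been continually optimized and adjusted, our development environment has moved from traditional hosts to Docker containers, and the logs produced by development processes and applications have become far more varied. Centralized log management, display, and analysis has therefore become especially important.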

Implementation Architecture

Docker Environment Setup

CentOS 7.5 Docker (Part 1): Installation

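EFK Overview

## Elasticsearch: official site https://www.elastic.co
A distributed search engine that is highly scalable, highly reliable, and easy to manage. It supports full-text search, structured search, and analytics, and can combine all three. Elasticsearch is built on Lucene and is now one of the most widely used open-source search engines; Wikipedia, Stack Overflow, GitHub, and others have built their own search on it.

## Fluentd (td-agent): https://www.fluentd.org
A popular log collector in the open-source community, with a rich set of plugins for different data sources and output destinations. Fluentd is implemented in C and Ruby, with the performance-critical components written in C, so overall performance is good. Docker ships with a fluentd log driver out of the box, so Fluentd was chosen as the sending side. td-agent is the easy-to-install distribution of Fluentd, maintained by Treasure Data, and generally bundles a set of commonly used plugins. Fluentd suits people who like to tinker, while td-agent suits large-scale production installs.

## Kibana: official site https://www.elastic.co
A visualization platform. It searches and displays the index data stored in Elasticsearch, and makes it easy to present and analyze that data with charts, tables, and maps.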

Terminology

Elasticsearch is abbreviated as ES from here on.

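Side-by-Side Comparison of Three Log Collectors

A comparison reposted from earlier work published online:

Log clients (Logstash, Fluentd, Logtail): a comparative review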

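Alibaba Cloud ECS Configuration

Host running ES + Kibana:

    ecs.mn4.small, shared general-purpose, 1 vCPU, 4 GB

    # cat /etc/centos-release
    CentOS Linux release 7.6.1810 (Core)
    # uname -r
    4.4.162-1.el7.elrepo.x86_64

Hosts running td-agent:

    Every host from which data needs to be collected. Note that td-agent is installed on the host itself, not inside the Docker containers. (This depends, of course, on how you design your log collection.)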

Deploying EFK

Downloading the Docker Images

Clicking Download brings up fairly detailed instructions.

The commands actually used are:

    docker pull docker.elastic.co/elasticsearch/elasticsearch:6.5.4
    docker tag docker.elastic.co/elasticsearch/elasticsearch:6.5.4 elasticsearch:6.5.4
    docker rmi docker.elastic.co/elasticsearch/elasticsearch:6.5.4

    docker pull docker.elastic.co/kibana/kibana:6.5.4
    docker tag docker.elastic.co/kibana/kibana:6.5.4 kibana:6.5.4
    docker rmi docker.elastic.co/kibana/kibana:6.5.4

If these commands are unfamiliar, please brush up on Docker basics first.
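The pull, tag, and rmi sequence is the same for both images, so it can be generated rather than typed twice. The following is only a sketch: the function name `elastic_image_cmds` is made up for illustration, and it assumes the registry path always follows the `docker.elastic.co/<name>/<name>:<version>` pattern seen in the commands above.

```shell
#!/bin/sh
# Print the pull / tag / rmi commands for one Elastic image.
# Arguments: image name (e.g. elasticsearch) and version (e.g. 6.5.4).
# Assumption: the remote path is docker.elastic.co/<name>/<name>:<version>.
elastic_image_cmds() {
    name="$1"
    version="$2"
    remote="docker.elastic.co/${name}/${name}:${version}"
    printf 'docker pull %s\n' "$remote"
    printf 'docker tag %s %s:%s\n' "$remote" "$name" "$version"
    printf 'docker rmi %s\n' "$remote"
}

# Dry run: this only prints the commands and does not call docker.
for image in elasticsearch kibana; do
    elastic_image_cmds "$image" 6.5.4
done
```

Piping the output to `sh` would execute the commands; reviewing the dry run first is the safer habit.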

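Kernel Parameter Tuning on the ES + Kibana Host

Edit /etc/sysctl.conf and add the setting so that it persists across reboots:

    vim /etc/sysctl.conf
    vm.max_map_count=262144

Apply it immediately:

    sysctl -w vm.max_map_count=262144

If this is not set, the container reports the following error:

    [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]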
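Before starting the containers, the current value can be checked so the error above does not come as a surprise. This is a minimal sketch: the helper name `needs_raise` is hypothetical, and the script only prints the commands to run instead of changing anything itself.

```shell
#!/bin/sh
# Minimum value required by Elasticsearch (taken from the error message above).
REQUIRED=262144

# Succeeds (exit status 0) when the current value ($1) is below the required one ($2).
needs_raise() {
    [ "$1" -lt "$2" ]
}

# /proc/sys/vm/max_map_count exists on Linux only, hence the guard.
if [ -r /proc/sys/vm/max_map_count ]; then
    current=$(cat /proc/sys/vm/max_map_count)
    if needs_raise "$current" "$REQUIRED"; then
        echo "vm.max_map_count=$current is too low; as root, run:"
        echo "  sysctl -w vm.max_map_count=$REQUIRED"
        echo "  echo 'vm.max_map_count=$REQUIRED' >> /etc/sysctl.conf"
    else
        echo "vm.max_map_count=$current is sufficient"
    fi
fi
```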

Creating the Containers with docker-compose

A few supporting configuration files are used (these require some understanding of how ES and Kibana work).

    mkdir -p /data/docker/EFK
    cd /data/docker/EFK

Elasticsearch

Main configuration file:

    vim elasticsearch.yml

Contents:

    cluster.name: EFK          # (this is a comment) cluster name
    node.name: host-elk01      # name of this node within the cluster
    path.data: /var/lib/elasticsearch
    path.logs: /var/log/elasticsearch

JVM options file for ES:

    vim jvm.options

Contents are shown below. They can be taken from the official installation package; they are simply pasted here.

    # JVM heap size. Note: these two values must be identical, otherwise ES fails to start
    -Xms1500m
    -Xmx1500m

    ## Everything below is unmodified; if you are not familiar with these
    ## parameters, stick with the values provided officially

    ## GC configuration
    -XX:+UseConcMarkSweepGC
    -XX:CMSInitiatingOccupancyFraction=75
    -XX:+UseCMSInitiatingOccupancyOnly

    # pre-touch memory pages used by the JVM during initialization
    -XX:+AlwaysPreTouch

    ## basic

    # explicitly set the stack size
    -Xss1m

    # set to headless, just in case
    -Djava.awt.headless=true

    # ensure UTF-8 encoding by default (e.g. filenames)
    -Dfile.encoding=UTF-8

    # use our provided JNA always versus the system one
    -Djna.nosys=true

    # turn off a JDK optimization that throws away stack traces for common
    # exceptions because stack traces are important for debugging
    -XX:-OmitStackTraceInFastThrow

    # flags to configure Netty
    -Dio.netty.noUnsafe=true
    -Dio.netty.noKeySetOptimization=true
    -Dio.netty.recycler.maxCapacityPerThread=0

    # log4j 2
    -Dlog4j.shutdownHookEnabled=false
    -Dlog4j2.disable.jmx=true

    -Djava.io.tmpdir=${ES_TMPDIR}

    ## heap dumps

    # generate a heap dump when an allocation from the Java heap fails
    # heap dumps are created in the working directory of the JVM
    -XX:+HeapDumpOnOutOfMemoryError

    # specify an alternative path for heap dumps; ensure the directory exists and
    # has sufficient space
    -XX:HeapDumpPath=data

    # specify an alternative path for JVM fatal error logs
    -XX:ErrorFile=logs/hs_err_pid%p.log

    ## JDK 8 GC logging
    8:-XX:+PrintGCDetails
    8:-XX:+PrintGCDateStamps
    8:-XX:+PrintTenuringDistribution
    8:-XX:+PrintGCApplicationStoppedTime
    8:-Xloggc:logs/gc.log
    8:-XX:+UseGCLogFileRotation
    8:-XX:NumberOfGCLogFiles=32
    8:-XX:GCLogFileSize=64m

    # JDK 9+ GC logging
    9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m

    # due to internationalization enhancements in JDK 9 Elasticsearch need to set the provider to COMPAT otherwise
    # time/date parsing will break in an incompatible way for some date patterns and locals
    9-:-Djava.locale.providers=COMPAT

    # temporary workaround for C2 bug with JDK 10 on hardware with AVX-512
    10-:-XX:UseAVX=2
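Because a mismatch between -Xms and -Xmx is said above to stop ES from starting, it may be worth checking the file before mounting it into the container. The sketch below is illustrative only: `heap_flag` and `heap_sizes_match` are hypothetical helper names, and the check is run against a throwaway sample file rather than the real jvm.options.

```shell
#!/bin/sh
# Print the value of a heap flag (Xms or Xmx) found in a jvm.options file.
# Comment lines start with '#', so anchoring the pattern at '^-' skips them.
heap_flag() {
    flag="$1"
    file="$2"
    sed -n "s/^-${flag}\([0-9][0-9]*[kKmMgG]\{0,1\}\)[[:space:]]*\$/\1/p" "$file" | tail -n 1
}

# Succeeds only when both flags are present and identical.
heap_sizes_match() {
    xms=$(heap_flag Xms "$1")
    xmx=$(heap_flag Xmx "$1")
    [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}

# Example against a temporary sample file.
sample=$(mktemp)
printf '%s\n' '# JVM heap size' '-Xms1500m' '-Xmx1500m' > "$sample"
if heap_sizes_match "$sample"; then
    echo "heap sizes match: $(heap_flag Xms "$sample")"
else
    echo "heap sizes differ or are missing" >&2
fi
rm -f "$sample"
```

For the file created above, the call would be `heap_sizes_match /data/docker/EFK/jvm.options`.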

Kibana

    vim kibana.yml

Contents:

    xpack.monitoring.ui.container.elasticsearch.enabled: true   # (this is a comment) X-Pack is an access-control plugin with a 30-day trial
    server.port: 5601                            # port Kibana listens on
    server.host: "0"                             # address Kibana listens on; 0 means all addresses (0.0.0.0/0)
    #server.basePath: ""
    #server.rewriteBasePath: false
    #server.maxPayloadBytes: 1048576
    server.name: kibana                          # server name
    elasticsearch.url: http://elasticsearch:9200 # address used to reach the ES server
    #elasticsearch.preserveHost: true
    #kibana.index: ".kibana"
    #kibana.defaultAppId: "home"
    #elasticsearch.username: "user"
    #elasticsearch.password: "pass"
    #server.ssl.enabled: false
    #server.ssl.certificate: /path/to/your/server.crt
    #server.ssl.key: /path/to/your/server.key
    #elasticsearch.ssl.certificate: /path/to/your/client.crt
    #elasticsearch.ssl.key: /path/to/your/client.key
    #elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]
    #elasticsearch.ssl.verificationMode: full
    #elasticsearch.pingTimeout: 1500
    #elasticsearch.requestTimeout: 30000
    #elasticsearch.requestHeadersWhitelist: [ authorization ]
    #elasticsearch.customHeaders: {}
    #elasticsearch.shardTimeout: 30000
    #elasticsearch.startupTimeout: 5000
    #elasticsearch.logQueries: false
    #pid.file: /var/run/kibana.pid
    #logging.dest: stdout
    #logging.silent: false
    #logging.quiet: false
    #logging.verbose: false
    #ops.interval: 5000
    #i18n.locale: "en"

nginx Reverse Proxy

An nginx reverse proxy is needed, so one container is dedicated to running the nginx server. The SSL certificate can be one issued by Let’s Encrypt: each certificate is free and valid for 90 days, and renewing it on expiry keeps it free indefinitely.

The main configuration file nginx.conf of this nginx container is provided for reference:

    user nginx;
    worker_processes auto;
    worker_rlimit_nofile 60000;
    error_log /var/log/nginx/error.log;
    pid /var/run/nginx.pid;

    events {
        use epoll;
        worker_connections 10240;
    }

    http {
        server_tokens off;
        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';

        #### logs
        access_log /var/log/nginx/access.log main;

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        types_hash_max_size 2048;

        #### include
        include /usr/share/nginx/modules/*.conf;
        include /etc/nginx/mime.types;
        include /etc/nginx/conf.d/*.conf;
        default_type application/octet-stream;

        # server set
        include /data/nginx_conf/vhosts/*.conf;

        # upstream set
        # include /data/nginx_conf/upstream/*.conf;

        ##### Timeout
        keepalive_timeout 60;
        client_header_timeout 12;
        client_body_timeout 120;
        send_timeout 12;

        ##### post
        client_max_body_size 100M;   # maximum allowed upload size; set to suit your needs

        ##### Buffer
        client_body_buffer_size 128k;
        client_header_buffer_size 4k;
        client_body_in_single_buffer on;
        large_client_header_buffers 4 8k;
        open_file_cache max=60000 inactive=20s;
        open_file_cache_valid 30s;
        open_file_cache_min_uses 2;

        #### Compression
        gzip on;
        gzip_comp_level 6;
        gzip_min_length 1k;
        gzip_buffers 16 8k;
        gzip_types text/plain text/css text/xml application/xml text/javascript application/javascript application/x-javascript application/x-httpd-php;
        gzip_vary off;
        gzip_disable "MSIE [1-6]\.";

        # proxy set
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_connect_timeout 300;
        proxy_read_timeout 300;
        proxy_send_timeout 300;
        proxy_intercept_errors off;
        proxy_ignore_client_abort on;
    }

The virtual host configuration of this container is provided for reference:

    # server set
    server {
        listen 9001 default_server ssl;
        listen [::]:9001 default_server ssl;
        server_name demo.com;
        index index.html index.htm;

        # SSL set
        # "ssl on" is redundant with "listen ... ssl" and is deprecated since nginx 1.15.0
        ssl on;
        ssl_certificate "/你的证书链路径/fullchain.cer";        # path to your certificate chain
        ssl_certificate_key "/你的证书私钥路径/demo.com.key";   # path to your private key
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 30m;
        ssl_ciphers HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;
        add_header Strict-Transport-Security "max-age=63072000; includeSubdomains; preload";

        ## kibana
        location / {
            auth_basic "Authorization";            # account check before Kibana can be reached
            auth_basic_user_file /etc/.htpasswd;   # see the htpasswd usage below
            proxy_pass http://172.18.1.222:5601;   # reverse proxy to the internal Kibana host
        }
    }

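htpasswd Usage Reference for nginx Basic Authentication

Note: Kibana has no access control by default, so when it is exposed to the public internet it must sit behind nginx basic authentication and be served over HTTPS. If budget allows, X-Pack can be used instead.

The password file used by the nginx auth_basic module is generated with htpasswd. Depending on the container's base image, install one of the following packages:

    centos: httpd-tools
    alpine: apache2-utils

Creating a user database backed by a text file:

    htpasswd [ -c ] [ -m ] [ -D ] passwdfile username
    htpasswd -b [ -c ] [ -m | -d | -p | -s ] [ -D ] passwdfile username password

    -c: create the password file; use only when the file does not exist yet (an existing file is overwritten)
    -m: MD5 encryption, the default
    -s: SHA encryption
    -D: delete the specified user
    -b: batch mode; the password is read from the command line instead of being prompted for
    -n: do not update the file; only print the result to the screen

Interactive mode:

    # htpasswd -c /etc/httpd/conf.d/.htpasswd hunk1
    New password:
    Re-type new password:
    Adding password for user hunk1

Non-interactive mode:

    # htpasswd -bs /etc/httpd/conf.d/.htpasswd hunk2 1234567
    Adding password for user hunk2

The stored passwords are encrypted:

    # cat .htpasswd
    hunk1:xLhgTub5K6Css
    hunk2:{SHA}IOq+XWSw4hZ5boNPUtYf0LcDMvw=

Printing the result without touching the file:

    # htpasswd -nbs hunk3 1234567
    hunk3:{SHA}IOq+XWSw4hZ5boNPUtYf0LcDMvw=

Deleting a user:

    # htpasswd -D /etc/httpd/conf.d/.htpasswd hunk2
    Deleting password for user hunk2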

docker-compose.yml Orchestration

First create the external custom network and data volume with Docker. Note that this is a prerequisite for starting the stack with docker-compose, because the compose file declares both as external.

    docker network create efk
    docker volume create elasticsearch

The contents of docker-compose.yml are as follows:

    version: "2.4"

    ## declare the network
    networks:
      efk:
        external: true

    ## declare the data volume
    volumes:
      elasticsearch:
        external: true

    ### services
    services:
      elasticsearch:
        image: elasticsearch:6.5.4
        container_name: elasticsearch
        environment:
          - cluster.name=EFK
          - node.name=host-elk01
          - bootstrap.memory_lock=true
          - "discovery.zen.ping.unicast.hosts=elasticsearch"
        ulimits:
          memlock:
            soft: -1
            hard: -1
        networks:
          - efk
        ports:
          - "9200:9200"
        volumes:
          - /etc/localtime:/etc/localtime
          - "elasticsearch:/usr/share/elasticsearch/data"
          - "/data/docker/EFK/jvm.options:/usr/share/elasticsearch/config/jvm.options"
        restart: "always"
        logging:
          driver: "json-file"
          options:
            max-size: "200k"
            max-file: "1"

      kibana:
        image: kibana:6.5.4
        container_name: kibana
        networks:
          - efk
        ports:
          - "5601:5601"
        volumes:
          - /etc/localtime:/etc/localtime
          - /data/docker/EFK/kibana.yml:/usr/share/kibana/config/kibana.yml
        restart: "always"
        logging:
          driver: "json-file"
          options:
            max-size: "200k"
            max-file: "1"

      ### Note: the image for the following service is custom-built on nginx-alpine;
      ### see the reference files given above.
      efk-proxy:
        image: efk-proxy:latest
        container_name: efk-proxy
        networks:
          - efk
        ports:
          - "9001:9001"
        volumes:
          - /etc/localtime:/etc/localtime
          - "/data/docker/EFK/ssl/demo.com:/data/ssl/demo.com:ro"
        command: ["nginx", "-g", "daemon off;"]
        restart: "always"
        depends_on:
          - kibana
        logging:
          driver: "json-file"
          options:
            max-size: "200k"
            max-file: "1"

Starting ES + Kibana + nginx

    docker-compose up -d

The host will now be listening on ports 5601, 9001, and 9200.

A quick check:

    # curl 172.18.1.222:9200
    {
      "name" : "host-elk01",
      "cluster_name" : "EFK",
      "cluster_uuid" : "CMJ4F-E5TcypIhrReze7mQ",
      "version" : {
        "number" : "6.5.4",
        "build_flavor" : "default",
        "build_type" : "tar",
        "build_hash" : "d2ef93d",
        "build_date" : "2018-12-17T21:17:40.758843Z",
        "build_snapshot" : false,
        "lucene_version" : "7.5.0",
        "minimum_wire_compatibility_version" : "5.6.0",
        "minimum_index_compatibility_version" : "5.0.0"
      },
      "tagline" : "You Know, for Search"
    }
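ES can take a while to answer after `docker-compose up -d`, so a single curl may fail simply because it ran too early. The sketch below polls until the banner comes back and pulls the version number out of it. The function names `es_version` and `wait_for_es`, the retry count, and the delay are all assumptions made for illustration; the JSON is parsed with sed only to avoid extra dependencies.

```shell
#!/bin/sh
# Read the JSON banner returned by ES on stdin and print the "number" field.
es_version() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll the given URL until ES answers or the attempts run out.
# Arguments: URL, number of attempts (default 30), seconds between attempts (default 2).
wait_for_es() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    while [ "$attempts" -gt 0 ]; do
        version=$(curl -fsS "$url" 2>/dev/null | es_version)
        if [ -n "$version" ]; then
            echo "ES $version is up at $url"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    echo "ES did not answer at $url" >&2
    return 1
}
```

With the address used above, the call would be `wait_for_es http://172.18.1.222:9200`.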

Open a browser and enter the domain name that corresponds to EFK.

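Deploying Fluentd (td-agent)

Install the client on every machine from which data needs to be collected.

Check whether it is already installed:

    rpm -qa|grep td-agent

Choose the steps that match your operating system version; the Installation Guide at https://www.fluentd.org provides instructions.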

Listing the Locally Installed Components

    # td-agent-gem list --local

    *** LOCAL GEMS ***

    addressable (2.5.2)
    elasticsearch (6.1.0)
    elasticsearch-api (6.1.0)
    elasticsearch-transport (6.1.0)
    excon (0.62.0)
    faraday (0.15.3)
    fluent-config-regexp-type (1.0.0)
    fluent-logger (0.7.2)
    fluent-plugin-elasticsearch (2.11.11)
    fluent-plugin-kafka (0.7.9)
    fluent-plugin-record-modifier (1.1.0)
    fluent-plugin-rewrite-tag-filter (2.1.0)
    fluent-plugin-s3 (1.1.6)
    fluent-plugin-td (1.0.0)
    fluent-plugin-td-monitoring (0.2.4)
    fluentd (1.2.6)

For reasons of space only part of the list is shown. It includes fluent-plugin-elasticsearch, which is used later.

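td-agent Configuration File

The location differs between operating systems. On CentOS the default paths are:

    Configuration file: /etc/td-agent/td-agent.conf
    Log file: /var/log/td-agent/td-agent.log

The configuration file ships with some sample settings. They are left unchanged for now and will be covered in a separate article. The log file records how the td-agent service is running and any errors it reports.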

td-agent Service Commands

    Enable start at boot:        systemctl enable td-agent.service
    Start:                       systemctl start td-agent
    Restart:                     systemctl restart td-agent.service
    Reload the configuration:    systemctl reload td-agent.service
    Stop the service:            systemctl stop td-agent.service
    Check start at boot:         systemctl is-enabled td-agent.service
                                 enabled: turned on
                                 disabled: turned off

Once started, td-agent listens on TCP and UDP port 24224 by default.
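With td-agent listening on port 24224, a container can be pointed at it through Docker's fluentd log driver. The sketch below only builds the option string; the helper name `fluentd_log_opts` and the tag `docker.demo` are made up for illustration, while `--log-driver`, `fluentd-address`, and `tag` are options of the Docker fluentd logging driver.

```shell
#!/bin/sh
# Build the docker logging options that point a container at a Fluentd/td-agent
# forward input. Arguments: host, port, tag (the tag value is hypothetical).
fluentd_log_opts() {
    printf '%s\n' "--log-driver=fluentd --log-opt fluentd-address=$1:$2 --log-opt tag=$3"
}

# Prints: --log-driver=fluentd --log-opt fluentd-address=localhost:24224 --log-opt tag=docker.demo
fluentd_log_opts localhost 24224 docker.demo

# Usage on a host where docker and td-agent are running (left unquoted on
# purpose so the shell splits the options into separate arguments):
#   docker run -d $(fluentd_log_opts localhost 24224 docker.demo) nginx:alpine
#
# To confirm that td-agent is listening first:
#   ss -lntu | grep 24224
```

Whether the records then reach ES depends on the td-agent configuration, which this article leaves at its defaults.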