Logstash Notes (Part 2): Input Plugins
In the "Hello World" example we already saw Logstash's processing flow and the basics of its configuration syntax. Keep one rule in mind: a Logstash configuration must always have an input and an output. In these demos, if no input is given, the input/stdin plugin shown in "hello world" is used by default; likewise, an omitted output defaults to output/stdout.
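As a minimal sketch making that rule explicit, the following wires stdin straight to stdout (the rubydebug codec is added here for readable output, as used throughout these notes; it is not the stdout plugin's default):
input {
    stdin { }
}
output {
    stdout {
        codec => rubydebug
    }
}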

If you run into problems, consult this document: http://udn.yyuap.com/doc/logstash-best-practice-cn/input/index.html. The input plugins are explained below.
(1) Standard input. type and tags are special fields in a Logstash event. type marks the event type; we usually know ahead of time what type an event belongs to. tags, by contrast, are added or removed by individual plugins during processing.
[root@localhost test]# vim stdin.conf
input {
    stdin {
        add_field => { "key" => "value" }
        codec => "plain"
        tags => ["add"]
        type => "std-lqb"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/stdin.conf
Settings: Default pipeline workers: 1
Logstash startup completed
hello world
{
"message" => "hello world",
"@version" => "1",
"@timestamp" => "2017-05-24T08:11:45.852Z",
"type" => "std-lqb",
"key" => "value",
"tags" => [
[0] "add"
],
"host" => "localhost.localdomain"
}
abclqb
{
"message" => "abclqb",
"@version" => "1",
"@timestamp" => "2017-05-24T08:13:21.192Z",
"type" => "std-lqb",
"key" => "value",
"tags" => [
[0] "add"
],
"host" => "localhost.localdomain"
}
##### Modify stdin.conf: change the add_field value and extend the tags list
[root@localhost test]# vim stdin.conf
input {
    stdin {
        add_field => { "key" => "value22222222222222222222222222222222222222222222" }
        codec => "plain"
        tags => ["add","xxyy","abc"]
        type => "std-lqb"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/stdin.conf
Settings: Default pipeline workers: 1
Logstash startup completed
hello world
{
"message" => "hello world",
"@version" => "1",
"@timestamp" => "2017-05-24T09:07:43.228Z",
"type" => "std-lqb",
"key" => "value22222222222222222222222222222222222222222222",
"tags" => [
[0] "add",
[1] "xxyy",
[2] "abc"
],
"host" => "localhost.localdomain"
}
######### Route events based on tags:
[root@localhost test]# vim stdin_2.conf
input {
    stdin {
        add_field => { "key11" => "value22" }
        codec => "plain"
        tags => ["add","xxyy"]
        type => "std"
    }
}
output {
    if "tttt" in [tags] {
        stdout {
            codec => rubydebug { }
        }
    } else if "add" in [tags] {
        stdout {
            codec => json
        }
    }
}
[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/stdin_2.conf
Settings: Default pipeline workers: 1
Logstash startup completed
yyxxx
{"message":"yyxxx","@version":"1","@timestamp":"2017-05-24T09:32:25.840Z","type":"std","key11":"value22","tags":["add","xxyy"],"host":"localhost.localdomain"}
{"message":"","@version":"1","@timestamp":"2017-05-24T09:32:32.480Z","type":"std","key11":"value22","tags":["add","xxyy"],"host":"localhost.localdomain"}
xxyy
{"message":"xxyy","@version":"1","@timestamp":"2017-05-24T09:32:42.249Z","type":"std","key11":"value22","tags":["add","xxyy"],"host":"localhost.localdomain"}
Since every event carries the "add" tag, the first branch ("tttt") never matches and all events are printed by the second branch using the json codec.
(2) Reading files. Logstash uses a Ruby gem library called FileWatch to watch files for changes. The library supports glob expansion of file paths and keeps a database file named .sincedb to track the current read position of each watched log file, so don't worry about Logstash missing your data.
[root@localhost test]# cat log.conf
input {
    file {
        path => "/usr/local/nginx/logs/access.log"
        type => "system"
        start_position => "beginning"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/log.conf
Settings: Default pipeline workers: 1
Logstash startup completed
{
"message" => "192.168.181.231 - - [24/May/2017:15:04:29 +0800] \"GET / HTTP/1.1\" 502 537 \"-\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\" \"-\"",
"@version" => "1",
"@timestamp" => "2017-05-24T09:39:16.600Z",
"path" => "/usr/local/nginx/logs/access.log",
"host" => "localhost.localdomain",
"type" => "system"
}
{
"message" => "192.168.181.231 - - [24/May/2017:15:04:32 +0800] \"GET / HTTP/1.1\" 502 537 \"-\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\" \"-\"",
"@version" => "1",
"@timestamp" => "2017-05-24T09:39:16.614Z",
"path" => "/usr/local/nginx/logs/access.log",
"host" => "localhost.localdomain",
"type" => "system"
}
Explanation: several useful options control the behavior of the FileWatch library:
discover_interval
How often Logstash checks the watched path for new files. Default: 15 seconds.
exclude
Files that should not be watched can be excluded here; like path, this supports glob expansion.
sincedb_path
If you don't want the default $HOME/.sincedb (on Windows, C:\Windows\System32\config\systemprofile\.sincedb), use this option to put the sincedb file somewhere else.
sincedb_write_interval
How often Logstash writes the sincedb file. Default: 15 seconds.
stat_interval
How often Logstash checks watched files for updates. Default: 1 second.
start_position
Where Logstash starts reading file data. The default is the end of the file, so the Logstash process behaves much like tail -F. If you are importing existing data, set this to "beginning": Logstash then reads from the start, somewhat like cat, but it does not stop at the last line; it keeps following the file like tail -F.
Notes
To import existing data into Elasticsearch, you will also need the filter/date plugin to rewrite the default "@timestamp" field; we will cover that later.
FileWatch only supports absolute file paths and does not recurse into directories automatically. If you need several files, list each one explicitly as an array.
LogStash::Inputs::File only initializes a FileWatch object during the registration phase of the process, so it cannot support fluentd-style paths like path => "/path/to/%{+yyyy/MM/dd/hh}.log". To achieve the same effect you have to write path => "/path/to/*/*/*/*.log".
start_position only takes effect the first time a file is watched. If the sincedb file already holds an inode record for the file, Logstash resumes from the recorded position; when testing repeatedly, delete the sincedb file before each run.
Windows has no concept of inodes, so some Logstash versions are unreliable when watching files on Windows; on that platform, consider using nxlog as the collection agent. A sketch pulling these options together follows below.
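As a hedged sketch of a file input using these options (the paths, the exclude pattern, and the sincedb location are illustrative assumptions, not values from the tests above; the intervals shown are the documented defaults):
input {
    file {
        # illustrative: watch every nginx log, but skip compressed rotations
        path => ["/usr/local/nginx/logs/*.log"]
        exclude => "*.gz"
        # check the path for new files every 15s, poll known files every 1s (the defaults)
        discover_interval => 15
        stat_interval => 1
        # keep the read-position database somewhere easy to delete between test runs
        sincedb_path => "/tmp/nginx.sincedb"
        sincedb_write_interval => 15
        # import existing data from the beginning of each file
        start_position => "beginning"
    }
}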
(3) TCP input. Down the road you may use a Redis server or another message queue system in the Logstash broker role, but Logstash also ships with its own TCP/UDP plugins, which are serviceable for ad-hoc jobs, especially in test environments.
[root@localhost test]# cat tcp.conf
input {
    tcp {
        port => 8888
        mode => "server"
        ssl_enable => false
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/tcp.conf
Settings: Default pipeline workers: 1
Logstash startup completed
{
"message" => "GET /jenkins/ HTTP/1.1\r",
"@version" => "1",
"@timestamp" => "2017-05-24T10:09:53.980Z",
"host" => "192.168.181.231",
"port" => 59426
}
{
"message" => "Host: 192.168.180.9:8888\r",
"@version" => "1",
"@timestamp" => "2017-05-24T10:09:54.175Z",
"host" => "192.168.181.231",
"port" => 59426
}
{
"message" => "Connection: keep-alive\r",
"@version" => "1",
"@timestamp" => "2017-05-24T10:09:54.180Z",
"host" => "192.168.181.231",
"port" => 59426
}
Note: first stop any application already listening on port 8888, then start Logstash; requests sent to the port are logged as shown above.
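The capture above came from pointing a browser at the port. A quicker way to exercise the input, assuming the nc (netcat) tool is installed, is a one-liner against the listening host from the example:
echo "hello tcp input" | nc 192.168.180.9 8888
Each line sent this way shows up as one event on stdout, with host and port fields identifying the sender.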
(4) Codec plugins:
Codec is a concept introduced in Logstash 1.3.0 (the word is a contraction of coder/decoder). Before then, Logstash only accepted plain-text input and left all processing to filters; now, thanks to the codec setting, different types of data can be handled at input time. We already used a codec in the first "Hello World" example: rubydebug is a codec, although it normally only appears in the stdout plugin, as a tool for testing or debugging configurations.
(4.1) JSON encoding: feed Logstash pre-formatted JSON directly and you can skip the filter/grok configuration entirely!
The example configuration uses nginx; the steps are as follows:
a. Edit the nginx configuration file nginx.conf: comment out the original log format, replace it with the JSON format, then restart nginx.
[root@localhost test]# vim /usr/local/nginx/conf/nginx.conf
user ftp;
worker_processes 2;
worker_rlimit_nofile 65535;
events {
    use epoll;
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    include proxy.conf;
    #log_format main '$remote_addr - $remote_user [$time_local] "$request" '
    #                '$status $body_bytes_sent "$http_referer" '
    #                '"$http_user_agent" "$http_x_forwarded_for"';
    log_format json '{"@timestamp":"$time_iso8601",'
                    '"@version":"1",'
                    '"host":"$server_addr",'
                    '"client":"$remote_addr",'
                    '"size":$body_bytes_sent,'
                    '"responsetime":$request_time,'
                    '"domain":"$host",'
                    '"url":"$uri",'
                    '"status":"$status"}';
    access_log logs/nginx_access.log json;
    # access_log logs/access.log main;
#################### Note: there are no double quotes around the $request_time and $body_bytes_sent variables; these two values should be numeric in the JSON.
b. Edit your Logstash configuration file json.conf:
[root@localhost test]# vim json.conf
input {
    file {
        path => "/usr/local/nginx/logs/nginx_access.log"
        type => "nginx"
        start_position => "beginning"
        add_field => { "key" => "value" }
        codec => "json"
    }
}
output {
    stdout {
        codec => rubydebug { }
    }
}
c. Start Logstash with this config and test:
[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/json.conf
Settings: Default pipeline workers: 1
Logstash startup completed
{
"@timestamp" => "2017-05-25T03:26:19.000Z",
"@version" => "1",
"host" => "192.168.180.9",
"client" => "192.168.181.231",
"size" => 8250,
"responsetime" => 0.157,
"domain" => "192.168.180.9",
"url" => "/",
"status" => "200",
"path" => "/usr/local/nginx/logs/nginx_access.log",
"type" => "nginx",
"key" => "value"
}
{
"@timestamp" => "2017-05-25T03:26:19.000Z",
"@version" => "1",
"host" => "192.168.180.9",
"client" => "192.168.181.231",
"size" => 450,
"responsetime" => 0.017,
"domain" => "192.168.180.9",
"url" => "/sc.do",
"status" => "200",
"path" => "/usr/local/nginx/logs/nginx_access.log",
"type" => "nginx",
"key" => "value"
}
{
"@timestamp" => "2017-05-25T03:26:19.000Z",
"@version" => "1",
"host" => "192.168.180.9",
"client" => "192.168.181.231",
"size" => 16,
"responsetime" => 0.083,
"domain" => "192.168.180.9",
"url" => "/logger/catch.do",
"status" => "200",
"path" => "/usr/local/nginx/logs/nginx_access.log",
"type" => "nginx",
"key" => "value"
}
{
"@timestamp" => "2017-05-25T03:26:19.000Z",
"@version" => "1",
"host" => "192.168.180.9",
"client" => "192.168.181.231",
"size" => 41153,
"responsetime" => 0.362,
"domain" => "192.168.180.9",
"url" => "/getPageData.do",
"status" => "200",
"path" => "/usr/local/nginx/logs/nginx_access.log",
"type" => "nginx",
"key" => "value"
}
{
"@timestamp" => "2017-05-25T03:26:20.000Z",
"@version" => "1",
"host" => "192.168.180.9",
"client" => "192.168.181.231",
"size" => 51042,
"responsetime" => 0.565,
"domain" => "192.168.180.9",
"url" => "/getPageData.do",
"status" => "200",
"path" => "/usr/local/nginx/logs/nginx_access.log",
"type" => "nginx",
"key" => "value"
}
(4.2) Merging multiline data (multiline): application debug logs sometimes contain very rich content, printing many lines for a single event. Logs like that are hard to analyze with line-oriented command-line parsing. Logstash provides the codec/multiline plugin exactly for this; the multiline plugin also works for other stacked messages, such as the Linux kernel log.
When you start Logstash with the configuration below, it keeps accumulating whatever you type until a line beginning with [ arrives, which terminates the current event:
[root@localhost test]# vim multiline.conf
input {
    stdin {
        codec => multiline {
            pattern => "^\["
            negate => true
            what => "previous"
        }
    }
}
output {
    stdout {
        codec => rubydebug { }
    }
}
[root@localhost logstash]# /usr/local/logstash/bin/logstash -f test/multiline.conf
Settings: Default pipeline workers: 1
Logstash startup completed
hello
hello world
how are you
abc2345
[
{
"@timestamp" => "2017-05-25T03:44:35.604Z",
"message" => "[\nhello\nhello world\nhow are you \nabc2345",
"@version" => "1",
"tags" => [
[0] "multiline"
],
"host" => "localhost.localdomain"
}
In short, the plugin's principle is simple: it keeps appending each incoming line to the previous event until an incoming line matches the "[" pattern. The pattern can also use grok expressions.
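As a hedged variation on the same idea, a common pattern in practice is to treat every line that does not begin with a timestamp as a continuation of the previous event, which suits Java-style stack traces. TIMESTAMP_ISO8601 is a standard grok pattern name; the log path here is an illustrative assumption:
input {
    file {
        path => "/var/log/myapp/app.log"    # illustrative path
        start_position => "beginning"
        codec => multiline {
            # any line NOT starting with an ISO8601 timestamp belongs to the previous event
            pattern => "^%{TIMESTAMP_ISO8601}"
            negate => true
            what => "previous"
        }
    }
}
output {
    stdout {
        codec => rubydebug { }
    }
}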