How does Flume collect logs from a remote server?
log4j.rootLogger=INFO,A1,R
# ConsoleAppender out
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d{yyyy/MM/dd HH:mm:ss} %-5p %-10C{1} %m%n
# File out
# Switch this appender to the Log4jAppender shipped with Flume
log4j.appender.R=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.R.File=${catalina.home}/logs/ultraIDCPServer.log
# Port the logs are sent to; an Avro-type source must be listening on it
log4j.appender.R.Port=44444
# IP of the host the logs are sent to; it runs the Avro-type source
log4j.appender.R.Hostname=localhost
log4j.appender.R.MaxFileSize=102400KB
# log4j.appender.R.MaxBackupIndex=5
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d{yyyy/MM/dd HH\:mm\:ss} %-5p %-10C{1} %m%n
log4j.appender.R.encoding=UTF-8
log4j.logger.com.ultrapower.ultracollector.webservice.MessageIntercommunionInterfaceImpl=INFO,webservice
log4j.appender.webservice=org.apache.log4j.FileAppender
log4j.appender.webservice.File=${catalina.home}/logs/logsMsgIntercommunionInterface.log
log4j.appender.webservice.layout=org.apache.log4j.PatternLayout
log4j.appender.webservice.layout.ConversionPattern=%d{yyyy/MM/dd HH\:mm\:ss} %-5p [%t] %l %X - %m%n
log4j.appender.webservice.encoding=UTF-8
Note: Log4jAppender extends AppenderSkeleton and has no rolling feature, i.e. it cannot roll over to a new file once the log file reaches a given size.
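If the project uses Maven, the Flume client jars the appender needs can be pulled in through the flume-ng-log4jappender artifact. A sketch, assuming Flume 1.2.0 and the usual Apache coordinates (the version must match the agent):

```xml
<!-- Flume Log4jAppender client (assumed coordinates; verify against your Flume release) -->
<dependency>
  <groupId>org.apache.flume.flume-ng-clients</groupId>
  <artifactId>flume-ng-log4jappender</artifactId>
  <version>1.2.0</version>
</dependency>
```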
1.1.3. Flume agent configuration
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = avro
agent1.sources.source1.bind = 192.168.0.141
agent1.sources.source1.port = 44444
# Describe sink1
agent1.sinks.sink1.type = FILE_ROLL
agent1.sinks.sink1.sink.directory = /home/yubojie/flume/apache-flume-1.2.0/flume-out
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Note: a new output file is created at a fixed time interval; each file holds the events the agent received during that interval.
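That interval can be tuned on the file_roll sink. A sketch, assuming Flume 1.x, where `sink.rollInterval` is given in seconds (setting it to 0 disables time-based rolling):

```properties
# roll to a new output file every 300 seconds instead of the 30-second default
agent1.sinks.sink1.sink.rollInterval = 300
```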
1.2. Analysis
1. Simple to use, with little setup work.
2. The application must use log4j as its logging library, and the log4j jar used in the project must be version 1.2.15 or later.
3. The application must add the jars Flume requires to the project (all required jars are listed below); jar conflicts are possible and may affect the application.
4. Provides reliable data transfer. Collecting logs with the Flume Log4jAppender requires no process on the client machine: only the log appender is changed, and log records are sent straight to the collector (see Figure 1). In this setup the data is safe once the collector has received it, but records are lost whenever the client cannot connect to the collector. The improvement is to run an agent on the client machine, so that logs produced while the client and collector cannot connect are still collected once the connection recovers, and no data is lost (see Figure 2). This reliability, however, requires running a process on the client.
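The improved layout in Figure 2 can be sketched as a local agent on the client that accepts events from the Log4jAppender and forwards them to the collector over Avro; a file channel buffers events on disk while the collector is unreachable. This is a sketch, not the article's exact setup: the channel directories are assumptions, and the collector address reuses the one from the agent configuration above.

```properties
# local agent running on the client machine
client.sources = s1
client.sinks = k1
client.channels = c1
# Avro source the Log4jAppender points at (Hostname=localhost, Port=44444)
client.sources.s1.type = avro
client.sources.s1.bind = localhost
client.sources.s1.port = 44444
# forward to the Avro source on the collector machine
client.sinks.k1.type = avro
client.sinks.k1.hostname = 192.168.0.141
client.sinks.k1.port = 44444
# file channel persists buffered events across connection failures and restarts
client.channels.c1.type = file
client.channels.c1.checkpointDir = /home/yubojie/flume/channel/checkpoint
client.channels.c1.dataDirs = /home/yubojie/flume/channel/data
client.sources.s1.channels = c1
client.sinks.k1.channel = c1
```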
1.3. Logging code
Log.info("this message has DEBUG in it");
1.4. Sample of collected data
this message has DEBUG in it
this message has DEBUG in it
2. Exec source (abandoned)
The problem with ExecSource and other asynchronous sources is that the source can not guarantee that if there is a failure to put the event into the Channel the client knows about it. In such cases, the data will be lost. For instance, one of the most commonly requested features is the tail -F [file]-like use case where an application writes to a log file on disk and Flume tails the file, sending each line as an event. While this is possible, there's an obvious problem: what happens if the channel fills up and Flume can't send an event? Flume has no way of indicating to the application writing the log file that it needs to retain the log or that the event hasn't been sent, for some reason. If this doesn't make sense, you need only know this: your application can never guarantee data has been received when using a unidirectional asynchronous interface such as ExecSource! As an extension of this warning, and to be completely clear, there is absolutely zero guarantee of event delivery when using this source. You have been warned.
Note: not even reliability inside the agent itself can be guaranteed.
2.1. Usage
2.1.1. Flume agent configuration
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'
# example.conf: A single-node Flume configuration
# Name the components on this agent
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
#agent1.sources.source1.type = avro
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /home/yubojie/logs/ultraIDCPServer.log
#agent1.sources.source1.bind = 192.168.0.146
#agent1.sources.source1.port = 44444
agent1.sources.source1.interceptors = a
agent1.sources.source1.interceptors.a.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent1.sources.source1.interceptors.a.preserveExisting = false
agent1.sources.source1.interceptors.a.hostHeader = hostname
# Describe sink1
#agent1.sinks.sink1.type = FILE_ROLL
#agent1.sinks.sink1.sink.directory = /home/yubojie/flume/apache-flume-1.2.0/flume-out
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://localhost:9000/user/
agent1.sinks.sink1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
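Since the HostInterceptor above stores the client address in the `hostname` header, the hdfs sink's path can reference it to separate output per client. A sketch, assuming Flume 1.x, where `%{header}` escape sequences in `hdfs.path` are resolved from event headers:

```properties
# write each client's events under its own directory, keyed by the hostname header
agent1.sinks.sink1.hdfs.path = hdfs://localhost:9000/user/%{hostname}
```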
2.2. Analysis
1. tail-based collection requires the host operating system to support the tail command; in practice only Linux/Unix systems do, so logs on a stock Windows system cannot be collected this way.
2. The exec source collects asynchronously, so logs can be lost; data integrity cannot be guaranteed even within a single node.
2.3. Sample of collected data
2012/10/26 02:36:34 INFO LogTest this message has DEBUG 中文 in it
2012/10/26 02:40:12 INFO LogTest this message has DEBUG 中文 in it
2.4. Logging code
Log.info("this message has DEBUG 中文 in it");
3. Syslog
Passing messages using the syslog protocol doesn't work well for longer messages. The syslog appender for Log4j is hardcoded to line-wrap around 1024 characters in order to comply with the RFC. I got a sample program logging to syslog, picking it up with a syslogUdp source, with a JSON layout (to avoid new-lines in stack traces), only to find that anything but the smallest stack trace line-wrapped anyway. I can't see a way to reliably reconstruct the stack trace once it is wrapped and sent through the Flume chain. (Note: it is unclear whether this applies to version 1.2.)
Syslog TCP requires an eventSize to be specified; it defaults to 2500.
Syslog UDP is an unreliable transport; data may be lost in transit.
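A syslog TCP source with an enlarged eventSize can be sketched as follows (assuming Flume 1.x; `eventSize` is in bytes, and the port is an arbitrary choice):

```properties
agent1.sources.source1.type = syslogtcp
agent1.sources.source1.host = 0.0.0.0
agent1.sources.source1.port = 5140
# raise the per-event limit above the 2500-byte default
agent1.sources.source1.eventSize = 4096
```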