How does Flume collect logs from a remote server?

Posted by a user on 2022-04-21 01:01

1 answer

Answered by a helpful user on 2022-06-16 23:47

log4j.rootLogger=INFO,A1,R

# ConsoleAppender out

log4j.appender.A1=org.apache.log4j.ConsoleAppender

log4j.appender.A1.layout=org.apache.log4j.PatternLayout

log4j.appender.A1.layout.ConversionPattern=%d{yyyy/MM/dd HH:mm:ss} %-5p %-10C{1} %m%n

# File out

# Change the log appender to the Log4jAppender provided by Flume

log4j.appender.R=org.apache.flume.clients.log4jappender.Log4jAppender

log4j.appender.R.File=${catalina.home}/logs/ultraIDCPServer.log

# Port to send log events to; an Avro source must be listening on this port

log4j.appender.R.Port = 44444

# IP address of the host to send log events to; that host runs the Avro source

log4j.appender.R.Hostname = localhost

log4j.appender.R.MaxFileSize=102400KB

# log4j.appender.R.MaxBackupIndex=5

log4j.appender.R.layout=org.apache.log4j.PatternLayout

log4j.appender.R.layout.ConversionPattern=%d{yyyy/MM/dd HH\:mm\:ss} %-5p %-10C{1} %m%n

log4j.appender.R.encoding=UTF-8

log4j.logger.com.ultrapower.ultracollector.webservice.MessageIntercommunionInterfaceImpl=INFO, webservice

log4j.appender.webservice=org.apache.log4j.FileAppender

log4j.appender.webservice.File=${catalina.home}/logs/logsMsgIntercommunionInterface.log

log4j.appender.webservice.layout=org.apache.log4j.PatternLayout

log4j.appender.webservice.layout.ConversionPattern=%d{yyyy/MM/dd HH\:mm\:ss}%-5p[%t]%l%X-%m%n

log4j.appender.webservice.encoding=UTF-8

Note: Log4jAppender extends AppenderSkeleton, so it cannot roll over to a new file when the log file reaches a given size.
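If size-based local files are still wanted, one option is to keep a standard log4j RollingFileAppender next to the Flume appender. The fragment below is only a sketch: the appender name F is hypothetical, and the file path, size and pattern are borrowed from the configuration above rather than taken from the original answer.

```properties
# Assumption: an extra appender named F (hypothetical) is added next to A1 and R.
log4j.rootLogger=INFO,A1,R,F

# Standard log4j 1.2 appender that rolls the local file by size.
log4j.appender.F=org.apache.log4j.RollingFileAppender
log4j.appender.F.File=${catalina.home}/logs/ultraIDCPServer.log
log4j.appender.F.MaxFileSize=102400KB
log4j.appender.F.MaxBackupIndex=5
log4j.appender.F.layout=org.apache.log4j.PatternLayout
log4j.appender.F.layout.ConversionPattern=%d{yyyy/MM/dd HH:mm:ss} %-5p %-10C{1} %m%n
log4j.appender.F.encoding=UTF-8
```

With this layout, appender R only ships events to Flume, while F handles local files and rollover.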

1.1.3. Flume agent configuration

agent1.sources = source1

agent1.sinks = sink1

agent1.channels = channel1

# Describe/configure source1

agent1.sources.source1.type = avro

agent1.sources.source1.bind = 192.168.0.141

agent1.sources.source1.port = 44444

# Describe sink1

agent1.sinks.sink1.type = FILE_ROLL

agent1.sinks.sink1.sink.directory = /home/yubojie/flume/apache-flume-1.2.0/flume-out

# Use a channel which buffers events in memory

agent1.channels.channel1.type = memory

agent1.channels.channel1.capacity = 1000

agent1.channels.channel1.transactionCapacity = 100

# Bind the source and sink to the channel

agent1.sources.source1.channels = channel1

agent1.sinks.sink1.channel = channel1

Note: the sink starts a new output file at a fixed time interval; each file holds the events the agent received during that interval.
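The interval is controlled by the file_roll sink's sink.rollInterval property, given in seconds (30 by default; 0 disables rolling so that everything goes into one file). A minimal sketch for agent1 above, where 60 is only an illustrative value:

```properties
# Start a new output file every 60 seconds (illustrative value, not from the original answer).
agent1.sinks.sink1.sink.rollInterval = 60
```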

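1.2. Analysis

1. Simple to use, with little work involved.

2. The application must use log4j as its logging library, and the log4j jar in the project must be version 1.2.15 or later.

3. The application must add the jars that Flume requires to the project. All of them are required, and jar conflicts may arise that affect the running application.

4. Data transfer can be made reliable. With the Flume Log4jAppender, no extra process has to run on the client host: changing the log appender is enough to send log events straight to the collector host (see Figure 1). This guarantees reliability only after the collector has received the data; events are lost whenever the client cannot connect to the collector. The improvement is to start an agent on the client host, so that logs produced while the collector is unreachable are still collected once the connection is restored and no data is lost (see Figure 2). Reliability therefore requires running a process on the client host.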
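The client-side agent described in point 4 could look roughly like the fragment below. This is a sketch under assumptions, not the original author's configuration: the agent name client1, the component names and the directory paths are all hypothetical, and it assumes the Flume version in use ships the file channel. The collector address is the one bound in section 1.1.3.

```properties
# Hypothetical client-side agent; "client1" and all paths are illustrative names.
client1.sources = appSource
client1.channels = fileChannel
client1.sinks = toCollector

# The application's Log4jAppender points at this local Avro source
# (log4j.appender.R.Hostname = localhost, log4j.appender.R.Port = 44444).
client1.sources.appSource.type = avro
client1.sources.appSource.bind = 127.0.0.1
client1.sources.appSource.port = 44444
client1.sources.appSource.channels = fileChannel

# A file channel keeps events on disk while the collector is unreachable.
client1.channels.fileChannel.type = file
client1.channels.fileChannel.checkpointDir = /var/flume/checkpoint
client1.channels.fileChannel.dataDir = /var/flume/data

# Forward events to the Avro source of the collector agent from section 1.1.3.
client1.sinks.toCollector.type = avro
client1.sinks.toCollector.hostname = 192.168.0.141
client1.sinks.toCollector.port = 44444
client1.sinks.toCollector.channel = fileChannel
```

A memory channel would also work here, but events buffered in memory are lost if the client agent restarts, which is why the sketch uses a file channel.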

1.3. Logging code

Log.info("this message has DEBUG in it");

1.4. Sample of collected data

this message has DEBUG in it

this message has DEBUG in it

2. Exec source (abandoned)

The problem with ExecSource and other asynchronous sources is that the source can not guarantee that if there is a failure to put the event into the Channel the client knows about it. In such cases, the data will be lost. As a for instance, one of the most commonly requested features is the tail -F [file]-like use case where an application writes to a log file on disk and Flume tails the file, sending each line as an event. While this is possible, there's an obvious problem; what happens if the channel fills up and Flume can't send an event? Flume has no way of indicating to the application writing the log file that it needs to retain the log or that the event hasn't been sent, for some reason. If this doesn't make sense, you need only know this: Your application can never guarantee data has been received when using a unidirectional asynchronous interface such as ExecSource! As an extension of this warning - and to be completely clear - there is absolutely zero guarantee of event delivery when using this source. You have been warned.

Note: reliability is not guaranteed even inside the agent.

2.1. Usage

2.1.1. Flume agent configuration

# The configuration file needs to define the sources,

# the channels and the sinks.

# Sources, channels and sinks are defined per agent,

# in this case called 'agent1'

# example.conf: A single-node Flume configuration

# Name the components on this agent

agent1.sources = source1

agent1.sinks = sink1

agent1.channels = channel1

# Describe/configure source1

#agent1.sources.source1.type = avro

agent1.sources.source1.type = exec

agent1.sources.source1.command = tail -f /home/yubojie/logs/ultraIDCPServer.log

#agent1.sources.source1.bind = 192.168.0.146

#agent1.sources.source1.port = 44444

agent1.sources.source1.interceptors = a

agent1.sources.source1.interceptors.a.type = org.apache.flume.interceptor.HostInterceptor$Builder

agent1.sources.source1.interceptors.a.preserveExisting = false

agent1.sources.source1.interceptors.a.hostHeader = hostname

# Describe sink1

#agent1.sinks.sink1.type = FILE_ROLL

#agent1.sinks.sink1.sink.directory = /home/yubojie/flume/apache-flume-1.2.0/flume-out

agent1.sinks.sink1.type = hdfs

agent1.sinks.sink1.hdfs.path = hdfs://localhost:9000/user/

agent1.sinks.sink1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory

agent1.channels.channel1.type = memory

agent1.channels.channel1.capacity = 1000

agent1.channels.channel1.transactionCapacity = 100

# Bind the source and sink to the channel

agent1.sources.source1.channels = channel1

agent1.sinks.sink1.channel = channel1

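2.2. Analysis

1. Collecting logs with tail requires that the host can run the tail command. This presumably works only on Linux; log collection on Windows is not supported.

2. The exec source collects asynchronously, so log events can be lost; data integrity is not guaranteed even within the node.

3. tail-based collection requires the host operating system to support the tail command, which a stock Windows system does not.

2.3. Sample of collected data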

2012/10/26 02:36:34 INFO LogTest this message has DEBUG 中文 in it

2012/10/26 02:40:12 INFO LogTest this message has DEBUG 中文 in it

2.4. Logging code

Log.info("this message has DEBUG 中文 in it");

3. Syslog

Passing messages using syslog protocol doesn't work well for longer messages. The syslog appender for Log4j is hardcoded to linewrap around 1024 characters in order to comply with the RFC. I got a sample program logging to syslog, picking it up with a syslogUdp source, with a JSON layout (to avoid new-lines in stack traces) only to find that anything but the smallest stack trace line-wrapped anyway. I can't see a way to reliably reconstruct the stack trace once it is wrapped and sent through the flume chain. (Note: it is unclear whether this applies to version 1.2.)

For the Syslog TCP source, eventSize (the maximum size of a single event line, in bytes) has to be set as needed; it defaults to 2500.

Syslog UDP is an unreliable transport, so data may be lost in transit.
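As a rough illustration of the eventSize setting, a Syslog TCP source could be configured as in the fragment below. The listening address, the port and the 8192-byte limit are illustrative assumptions, not values from the original answer; the agent and channel names reuse those from the configurations above.

```properties
# Syslog TCP source with a larger per-event size limit (all values are illustrative).
agent1.sources.source1.type = syslogtcp
agent1.sources.source1.host = 0.0.0.0
agent1.sources.source1.port = 5140
# Maximum size of a single event line, in bytes (default 2500).
agent1.sources.source1.eventSize = 8192
agent1.sources.source1.channels = channel1
```

Raising eventSize does not help if the sender itself wraps lines at about 1024 characters, as described in the quote above.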
