About Apache SeaTunnel
Apache SeaTunnel, formerly known as Waterdrop, was renamed SeaTunnel in October 2021 when it applied to join the Apache Incubator. Apache SeaTunnel has released 40+ versions so far and is used in production by many companies, including J.P. Morgan, ByteDance, Stey, China Mobile, Foxconn, Tencent Cloud, Gridsum, Zhongke Big Data Research Institute, 360, Shopee, Bilibili, Sina, Sogou, and Vipshop. It is widely used for massive heterogeneous data integration, CDC data synchronization, SaaS data integration, and multi-source data processing.
On December 9, 2021, Apache SeaTunnel officially entered the Apache Incubator with a unanimous vote.
On May 17, 2023, the Apache Board passed the graduation resolution for Apache SeaTunnel. After 18 months of incubation, Apache SeaTunnel officially became an Apache Top-Level Project.
Apache SeaTunnel is a next-generation, high-performance, distributed tool for massive data integration. It supports hundreds of data sources (Database/Cloud/SaaS), real-time CDC and batch synchronization of massive data, and can synchronize trillions of records stably and efficiently.
As an easy-to-use, high-performance data integration platform that supports both real-time streaming and offline batch processing, Apache SeaTunnel offers the following features and advantages:
- Rich and extensible connectors: SeaTunnel provides a connector API that does not depend on a specific execution engine. Connectors (Source, Transform, Sink) developed against this API can run on many different engines, such as the currently supported SeaTunnel Engine, Flink, and Spark.
- Connector plugins: the plugin design makes it easy for users to develop their own connectors and integrate them into the SeaTunnel project. SeaTunnel currently supports more than 100 connectors, and the number keeps growing; the list of currently supported connectors is available in the official documentation.
- Unified batch and stream processing: connectors developed against the SeaTunnel Connector API are fully compatible with offline sync, real-time sync, full sync, incremental sync, and other scenarios, which greatly reduces the difficulty of managing data integration tasks.
- Distributed snapshot algorithm support to guarantee data consistency.
- Multi-engine support: SeaTunnel uses the SeaTunnel Engine for data synchronization by default, but it can also run connectors on Flink or Spark to fit a company's existing technology stack. Multiple versions of Spark and Flink are supported.
- JDBC multiplexing and multi-table parsing of database logs: SeaTunnel supports multi-table and whole-database synchronization, which solves the problem of excessive JDBC connections; it also supports multi-table and whole-database log reading and parsing, which avoids repeatedly reading and parsing logs in multi-table CDC synchronization scenarios.
- High throughput and low latency: SeaTunnel supports parallel reads and writes, providing stable and reliable data synchronization with high throughput and low latency.
- Comprehensive real-time monitoring: SeaTunnel reports detailed monitoring information for every step of the synchronization process, so users can easily see how much data a job has read and written, the data size, QPS, and more.
- Two job development methods: coding and canvas design. The SeaTunnel Web project (https://github.com/apache/seatunnel-web) provides visual management of jobs, scheduling, running, and monitoring.
SeaTunnel system architecture
Installing and deploying SeaTunnel
The SeaTunnel Engine is SeaTunnel's default engine, and the SeaTunnel binary package already contains everything the SeaTunnel Engine needs.
Getting the SeaTunnel installation package
- Option 1: install from the pre-built binary package, which can be downloaded from:
https://dlcdn.apache.org/seatunnel/2.3.3/apache-seatunnel-2.3.3-bin.tar.gz
After downloading, upload the archive to the server and extract it:
# extract into /opt/module
tar -zxvf apache-seatunnel-2.3.3-bin.tar.gz -C /opt/module
- Option 2: build the installation package from source
- Get the source code from https://seatunnel.apache.org/download or https://github.com/apache/seatunnel.git
- Build the installation package with Maven:
./mvnw -U -T 1C clean install -DskipTests -D"maven.test.skip"=true -D"maven.javadoc.skip"=true -D"checkstyle.skip"=true -D"license.skipAddThirdParty"
- The built package can then be found under ${Your_code_dir}/seatunnel-dist/target, e.g. apache-seatunnel-2.3.3-SNAPSHOT-bin.tar.gz
Configure SEATUNNEL_HOME
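One common way to do this (a minimal sketch, assuming the package was extracted to /opt/module as above; adjust the path to your environment) is to export the variable in your shell profile or in /etc/profile.d/seatunnel.sh and then source it:
# assumed install location; change it to match your environment
export SEATUNNEL_HOME=/opt/module/apache-seatunnel-2.3.3
export PATH=$PATH:$SEATUNNEL_HOME/bin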
Install the connector plugins
Starting from 2.2.0-beta, the binary package no longer ships the connector dependencies by default, so the first time you use SeaTunnel you need to run the following command to install the connectors. (Alternatively, you can download the connectors manually from the Apache Maven Repository [https://repo.maven.apache.org/maven2/org/apache/seatunnel/] and move them into the connectors/seatunnel directory.)
sh bin/install-plugin.sh 2.3.3
If you need to specify the connector version, using 2.3.3 as an example, run:
sh bin/install-plugin.sh 2.3.3
Usually we don't need every connector plugin, so you can list only the plugins you need in config/plugin_config. For example, if you only need the connector-console plugin, edit the file as follows:
--seatunnel-connectors--
connector-console
--end--
For the example job below to work, you need the following plugins:
--seatunnel-connectors--
connector-fake
connector-console
--end--
You can find all supported connectors and their corresponding plugin_config names in ${SEATUNNEL_HOME}/connectors/plugin-mapping.properties.
If you prefer to install a V2 connector plugin manually, simply download the connector you need and place it under the ${SEATUNNEL_HOME}/connectors/seatunnel directory.
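For example, a single connector can be pulled straight from the Maven repository instead of running install-plugin.sh. A minimal sketch for the console connector (the artifact path below follows the usual org.apache.seatunnel layout; verify it against the repository and match the version to your release):
# assumed coordinates: org/apache/seatunnel/connector-console/2.3.3
wget https://repo.maven.apache.org/maven2/org/apache/seatunnel/connector-console/2.3.3/connector-console-2.3.3.jar
mv connector-console-2.3.3.jar ${SEATUNNEL_HOME}/connectors/seatunnel/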
Run an example job
Define a job configuration file
We can use the official template config/v2.batch.config.template directly; it defines three sections: env, source, and sink (a rough sketch of the template is shown after this list).
env: runtime settings for the job, such as parallelism, the job mode (BATCH/STREAMING), checkpointing, and so on.
source: defines the data source.
sink: defines the target data source.
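For reference, the 2.3.x template looks roughly like the following: a FakeSource generating a few rows and a Console sink printing them. This is reproduced from memory of the shipped template, so minor details such as row.num or the schema may differ in your copy:
env {
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
}

source {
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
  }
}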
Run the command:
./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local
I used version 2.3.3, and the job failed with an error about a missing class: java.lang.NoClassDefFoundError: com/sun/jersey/client/impl/CopyOnWriteHashMap
The error message:
2023-11-08 17:47:32,243 ERROR org.apache.seatunnel.core.starter.SeaTunnel - Fatal Error,
2023-11-08 17:47:32,243 ERROR org.apache.seatunnel.core.starter.SeaTunnel - Please submit bug report in https://github.com/apache/seatunnel/issues
2023-11-08 17:47:32,243 ERROR org.apache.seatunnel.core.starter.SeaTunnel - Reason:SeaTunnel job executed failed
2023-11-08 17:47:32,244 ERROR org.apache.seatunnel.core.starter.SeaTunnel - Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:191)
at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: java.util.concurrent.CompletionException: java.lang.NoClassDefFoundError: com/sun/jersey/client/impl/CopyOnWriteHashMap
at com.hazelcast.spi.impl.AbstractInvocationFuture.wrapInCompletionException(AbstractInvocationFuture.java:1347)
at com.hazelcast.spi.impl.AbstractInvocationFuture.cascadeException(AbstractInvocationFuture.java:1340)
at com.hazelcast.spi.impl.AbstractInvocationFuture.access$200(AbstractInvocationFuture.java:65)
at com.hazelcast.spi.impl.AbstractInvocationFuture$ApplyNode.execute(AbstractInvocationFuture.java:1478)
at com.hazelcast.spi.impl.AbstractInvocationFuture.unblockOtherNode(AbstractInvocationFuture.java:797)
at com.hazelcast.spi.impl.AbstractInvocationFuture.unblockAll(AbstractInvocationFuture.java:759)
at com.hazelcast.spi.impl.AbstractInvocationFuture.complete0(AbstractInvocationFuture.java:1235)
at com.hazelcast.spi.impl.AbstractInvocationFuture.completeExceptionallyInternal(AbstractInvocationFuture.java:1223)
at com.hazelcast.spi.impl.AbstractInvocationFuture.completeExceptionally(AbstractInvocationFuture.java:709)
at com.hazelcast.client.impl.spi.impl.ClientInvocation.completeExceptionally(ClientInvocation.java:294)
at com.hazelcast.client.impl.spi.impl.ClientInvocation.notifyExceptionWithOwnedPermission(ClientInvocation.java:321)
at com.hazelcast.client.impl.spi.impl.ClientInvocation.notifyException(ClientInvocation.java:304)
at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier.handleResponse(ClientResponseHandlerSupplier.java:164)
at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier.process(ClientResponseHandlerSupplier.java:141)
at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier.access$300(ClientResponseHandlerSupplier.java:60)
at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier$DynamicResponseHandler.accept(ClientResponseHandlerSupplier.java:251)
at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier$DynamicResponseHandler.accept(ClientResponseHandlerSupplier.java:243)
at com.hazelcast.client.impl.connection.tcp.TcpClientConnection.handleClientMessage(TcpClientConnection.java:245)
at com.hazelcast.client.impl.protocol.util.ClientMessageDecoder.handleMessage(ClientMessageDecoder.java:135)
at com.hazelcast.client.impl.protocol.util.ClientMessageDecoder.onRead(ClientMessageDecoder.java:89)
at com.hazelcast.internal.networking.nio.NioInboundPipeline.process(NioInboundPipeline.java:136)
at com.hazelcast.internal.networking.nio.NioThread.processSelectionKey(NioThread.java:383)
at com.hazelcast.internal.networking.nio.NioThread.processSelectionKeys(NioThread.java:368)
at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:294)
at com.hazelcast.internal.networking.nio.NioThread.executeRun(NioThread.java:249)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
Caused by: java.lang.NoClassDefFoundError: com/sun/jersey/client/impl/CopyOnWriteHashMap
at org.apache.seatunnel.engine.server.checkpoint.CheckpointPlan$Builder.<init>(CheckpointPlan.java:66)
Download the missing dependency
Download URL: https://repo.maven.apache.org/maven2/org/apache/seatunnel/seatunnel-hadoop3-3.1.4-uber/2.3.2/seatunnel-hadoop3-3.1.4-uber-2.3.2-optional.jar
Put the jar into $SEATUNNEL_HOME/lib.
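For example (just one possible way to fetch it; any download method works):
# download the uber jar straight into the lib directory
wget https://repo.maven.apache.org/maven2/org/apache/seatunnel/seatunnel-hadoop3-3.1.4-uber/2.3.2/seatunnel-hadoop3-3.1.4-uber-2.3.2-optional.jar -P $SEATUNNEL_HOME/lib/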
Run the job again:
./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local
The output:
2023-11-08 17:55:32,575 INFO org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - wait checkpoint completed: 9223372036854775807
2023-11-08 17:55:32,621 INFO org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - pending checkpoint(9223372036854775807/1@774572734674894849) notify finished!
2023-11-08 17:55:32,621 INFO org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - start notify checkpoint completed, checkpoint:org.apache.seatunnel.engine.server.checkpoint.CompletedCheckpoint@7e4b2ce6
2023-11-08 17:55:32,627 INFO org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - start clean pending checkpoint cause CheckpointCoordinator completed.
2023-11-08 17:55:32,628 INFO org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - Turn checkpoint_state_774572734674894849_1 state from null to FINISHED
2023-11-08 17:55:32,672 INFO org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-610439] [5.1] taskDone, taskId = 20000, taskGroup = TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=1}
2023-11-08 17:55:32,672 INFO org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-610439] [5.1] Task TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=1} complete with state FINISHED
2023-11-08 17:55:32,672 INFO org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-610439] [5.1] Received task end from execution TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=1}, state FINISHED
2023-11-08 17:55:32,674 INFO org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex - Job SeaTunnel_Job (774572734674894849), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource-fake]-SplitEnumerator (1/1)] turn to end state FINISHED.
2023-11-08 17:55:32,674 INFO org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex - Job SeaTunnel_Job (774572734674894849), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource-fake]-SplitEnumerator (1/1)] end with state FINISHED
2023-11-08 17:55:32,684 INFO org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-610439] [5.1] taskDone, taskId = 50000, taskGroup = TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=30000}
2023-11-08 17:55:32,684 INFO org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-610439] [5.1] taskDone, taskId = 50001, taskGroup = TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=30001}
2023-11-08 17:55:33,480 INFO org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-610439] [5.1] taskDone, taskId = 40001, taskGroup = TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=30001}
2023-11-08 17:55:33,480 INFO org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-610439] [5.1] taskDone, taskId = 40000, taskGroup = TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=30000}
2023-11-08 17:55:33,480 INFO org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-610439] [5.1] Task TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=30001} complete with state FINISHED
2023-11-08 17:55:33,480 INFO org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-610439] [5.1] Task TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=30000} complete with state FINISHED
2023-11-08 17:55:33,480 INFO org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-610439] [5.1] Received task end from execution TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=30001}, state FINISHED
2023-11-08 17:55:33,480 INFO org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-610439] [5.1] Received task end from execution TaskGroupLocation{jobId=774572734674894849, pipelineId=1, taskGroupId=30000}, state FINISHED
2023-11-08 17:55:33,482 INFO org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex - Job SeaTunnel_Job (774572734674894849), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource-fake]-SourceTask (1/2)] turn to end state FINISHED.
2023-11-08 17:55:33,482 INFO org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex - Job SeaTunnel_Job (774572734674894849), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource-fake]-SourceTask (2/2)] turn to end state FINISHED.
2023-11-08 17:55:33,482 INFO org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex - Job SeaTunnel_Job (774572734674894849), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource-fake]-SourceTask (1/2)] end with state FINISHED
2023-11-08 17:55:33,482 INFO org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex - Job SeaTunnel_Job (774572734674894849), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource-fake]-SourceTask (2/2)] end with state FINISHED
2023-11-08 17:55:33,482 INFO org.apache.seatunnel.engine.server.dag.physical.SubPlan - Job SeaTunnel_Job (774572734674894849), Pipeline: [(1/1)] end with state FINISHED
2023-11-08 17:55:33,506 INFO org.apache.seatunnel.engine.server.master.JobMaster - release the pipeline Job SeaTunnel_Job (774572734674894849), Pipeline: [(1/1)] resource
2023-11-08 17:55:33,507 INFO org.apache.seatunnel.engine.server.service.slot.DefaultSlotService - received slot release request, jobID: 774572734674894849, slot: SlotProfile{worker=[localhost]:5801, slotID=1, ownerJobID=774572734674894849, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='82bd6da9-17b1-41ba-b745-9f6d4fea5378'}
2023-11-08 17:55:33,507 INFO org.apache.seatunnel.engine.server.service.slot.DefaultSlotService - received slot release request, jobID: 774572734674894849, slot: SlotProfile{worker=[localhost]:5801, slotID=2, ownerJobID=774572734674894849, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='82bd6da9-17b1-41ba-b745-9f6d4fea5378'}
2023-11-08 17:55:33,507 INFO org.apache.seatunnel.engine.server.service.slot.DefaultSlotService - received slot release request, jobID: 774572734674894849, slot: SlotProfile{worker=[localhost]:5801, slotID=3, ownerJobID=774572734674894849, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='82bd6da9-17b1-41ba-b745-9f6d4fea5378'}
2023-11-08 17:55:33,510 INFO org.apache.seatunnel.engine.server.dag.physical.SubPlan - Job SeaTunnel_Job (774572734674894849), Pipeline: [(1/1)] turn to end state FINISHED.
2023-11-08 17:55:33,511 INFO org.apache.seatunnel.engine.server.dag.physical.PhysicalPlan - Job SeaTunnel_Job (774572734674894849) end with state FINISHED
2023-11-08 17:55:33,523 INFO org.apache.seatunnel.engine.client.job.ClientJobProxy - Job (774572734674894849) end with state FINISHED
2023-11-08 17:55:33,546 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand -
***********************************************
Job Statistic Information
***********************************************
Start Time : 2023-11-08 17:55:30
End Time : 2023-11-08 17:55:33
Total Time(s) : 2
Total Read Count : 32
Total Write Count : 32
Total Failed Count : 0
***********************************************
2023-11-08 17:55:33,546 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel-610439] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTTING_DOWN
2023-11-08 17:55:33,548 INFO com.hazelcast.internal.server.tcp.TcpServerConnection - [localhost]:5801 [seatunnel-610439] [5.1] Connection[id=1, /127.0.0.1:5801->/127.0.0.1:33602, qualifier=null, endpoint=[127.0.0.1]:33602, remoteUuid=38fbe3e1-876c-48e9-b145-1989888393ab, alive=false, connectionType=JVM, planeIndex=-1] closed. Reason: Connection closed by the other side
2023-11-08 17:55:33,548 INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [seatunnel-610439] [5.1] Removed connection to endpoint: [localhost]:5801:6f7f4921-7d71-42af-b26a-6f11a7960118, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/127.0.0.1:33602->localhost/127.0.0.1:5801}, remoteAddress=[localhost]:5801, lastReadTime=2023-11-08 17:55:33.543, lastWriteTime=2023-11-08 17:55:33.523, closedTime=2023-11-08 17:55:33.547, connected server version=5.1}
2023-11-08 17:55:33,548 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel-610439] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2023-11-08 17:55:33,549 INFO com.hazelcast.client.impl.ClientEndpointManager - [localhost]:5801 [seatunnel-610439] [5.1] Destroying ClientEndpoint{connection=Connection[id=1, /127.0.0.1:5801->/127.0.0.1:33602, qualifier=null, endpoint=[127.0.0.1]:33602, remoteUuid=38fbe3e1-876c-48e9-b145-1989888393ab, alive=false, connectionType=JVM, planeIndex=-1], clientUuid=38fbe3e1-876c-48e9-b145-1989888393ab, clientName=hz.client_1, authenticated=true, clientVersion=5.1, creationTime=1699437330660, latest clientAttributes=lastStatisticsCollectionTime=1699437330713,enterprise=false,clientType=JVM,clientVersion=5.1,clusterConnectionTimestamp=1699437330653,clientAddress=127.0.0.1,clientName=hz.client_1,credentials.principal=null,os.committedVirtualMemorySize=7048228864,os.freePhysicalMemorySize=15730012160,os.freeSwapSpaceSize=0,os.maxFileDescriptorCount=65535,os.openFileDescriptorCount=51,os.processCpuTime=5290000000,os.systemLoadAverage=0.86,os.totalPhysicalMemorySize=33566306304,os.totalSwapSpaceSize=0,runtime.availableProcessors=8,runtime.freeMemory=994659352,runtime.maxMemory=1029177344,runtime.totalMemory=1029177344,runtime.uptime=2424,runtime.usedMemory=34517992, labels=[]}
2023-11-08 17:55:33,550 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel-610439] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2023-11-08 17:55:33,550 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed SeaTunnel client......
2023-11-08 17:55:33,551 INFO com.hazelcast.core.LifecycleService - [localhost]:5801 [seatunnel-610439] [5.1] [localhost]:5801 is SHUTTING_DOWN
2023-11-08 17:55:33,553 INFO com.hazelcast.internal.partition.impl.MigrationManager - [localhost]:5801 [seatunnel-610439] [5.1] Shutdown request of Member [localhost]:5801 - 6f7f4921-7d71-42af-b26a-6f11a7960118 this is handled
2023-11-08 17:55:33,557 INFO com.hazelcast.instance.impl.Node - [localhost]:5801 [seatunnel-610439] [5.1] Shutting down connection manager...
2023-11-08 17:55:33,558 INFO com.hazelcast.instance.impl.Node - [localhost]:5801 [seatunnel-610439] [5.1] Shutting down node engine...
2023-11-08 17:55:35,980 INFO com.hazelcast.instance.impl.NodeExtension - [localhost]:5801 [seatunnel-610439] [5.1] Destroying node NodeExtension.
2023-11-08 17:55:35,980 INFO com.hazelcast.instance.impl.Node - [localhost]:5801 [seatunnel-610439] [5.1] Hazelcast Shutdown is completed in 2427 ms.
2023-11-08 17:55:35,980 INFO com.hazelcast.core.LifecycleService - [localhost]:5801 [seatunnel-610439] [5.1] [localhost]:5801 is SHUTDOWN
2023-11-08 17:55:35,980 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed HazelcastInstance ......
2023-11-08 17:55:35,981 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed metrics executor service ......
2023-11-08 17:55:35,981 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - run shutdown hook because get close signal
If you see log output like the above, the setup is working.
SeaTunnel example: MySQL to MySQL
Next, let's test using SeaTunnel to synchronize data from one MySQL database to another.
First, we need the MySQL JDBC dependencies. You can either add connector-jdbc to config/plugin_config (the supported names are listed in plugin-mapping.properties) and install it with bin/install-plugin.sh, or place the connector-jdbc jar directly under $SEATUNNEL_HOME/connectors/seatunnel.
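Besides the connector plugin, the JDBC connector also needs the MySQL driver itself on the engine classpath. A minimal sketch, assuming MySQL Connector/J 8.0.28 and the default SeaTunnel Engine (adjust the driver version and target directory to your setup):
# hypothetical example: fetch the MySQL driver and make it visible to the SeaTunnel Engine
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar
cp mysql-connector-java-8.0.28.jar $SEATUNNEL_HOME/lib/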
Create the MySQL databases and tables
create database test_01;
create table test_01.user(userid int(4) primary key not null auto_increment,username varchar(16) not null);
create database test_02;
create table test_02.user(userid int(4) primary key not null auto_increment,username varchar(16) not null);
Insert some test data
insert into test_01.user (username) values ("zhangsan");
insert into test_01.user (username) values ("lisi");
Create the sync job configuration file and save it as config/mysql-to-mysql.conf:
env {
  job.mode = "BATCH"
}
# source configuration
source {
  jdbc {
    url = "jdbc:mysql://172.1.1.54:3306/test_01"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "admin"
    password = "xxxxxxx"
    generate_sink_sql = true
    database = "test_01"
    table = "user"
    query = "select * from test_01.user"
  }
}
transform {
}
# sink (target database) configuration
sink {
  jdbc {
    url = "jdbc:mysql://172.1.1.54:3306/test_02"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "admin"
    password = "xxxxxx"
    generate_sink_sql = true
    database = "test_02"
    table = "user"
  }
}
Run the command:
./bin/seatunnel.sh -e LOCAL -c ./config/mysql-to-mysql.conf
Output like the following indicates the synchronization succeeded:
2023-11-08 20:55:00,468 INFO org.apache.seatunnel.engine.server.service.slot.DefaultSlotService - received slot release request, jobID: 774617897816293377, slot: SlotProfile{worker=[localhost]:5801, slotID=1, ownerJobID=774617897816293377, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='36c05043-4938-47e0-927c-94e7cac4f749'}
2023-11-08 20:55:00,468 INFO org.apache.seatunnel.engine.server.service.slot.DefaultSlotService - received slot release request, jobID: 774617897816293377, slot: SlotProfile{worker=[localhost]:5801, slotID=2, ownerJobID=774617897816293377, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='36c05043-4938-47e0-927c-94e7cac4f749'}
2023-11-08 20:55:00,470 INFO org.apache.seatunnel.engine.server.dag.physical.SubPlan - Job SeaTunnel_Job (774617897816293377), Pipeline: [(1/1)] turn to end state FINISHED.
2023-11-08 20:55:00,471 INFO org.apache.seatunnel.engine.server.dag.physical.PhysicalPlan - Job SeaTunnel_Job (774617897816293377) end with state FINISHED
2023-11-08 20:55:00,483 INFO org.apache.seatunnel.engine.client.job.ClientJobProxy - Job (774617897816293377) end with state FINISHED
2023-11-08 20:55:00,511 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand -
***********************************************
Job Statistic Information
***********************************************
Start Time : 2023-11-08 20:54:58
End Time : 2023-11-08 20:55:00
Total Time(s) : 1
Total Read Count : 2
Total Write Count : 2
Total Failed Count : 0
***********************************************
2023-11-08 20:55:00,511 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel-250845] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTTING_DOWN
2023-11-08 20:55:00,514 INFO com.hazelcast.internal.server.tcp.TcpServerConnection - [localhost]:5801 [seatunnel-250845] [5.1] Connection[id=1, /127.0.0.1:5801->/127.0.0.1:33254, qualifier=null, endpoint=[127.0.0.1]:33254, remoteUuid=332c6983-64e1-4d2a-8d4b-7b40d9677a39, alive=false, connectionType=JVM, planeIndex=-1] closed. Reason: Connection closed by the other side
2023-11-08 20:55:00,514 INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [seatunnel-250845] [5.1] Removed connection to endpoint: [localhost]:5801:1373b501-f19f-47a2-9ef2-66d14e5a31c0, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/127.0.0.1:33254->localhost/127.0.0.1:5801}, remoteAddress=[localhost]:5801, lastReadTime=2023-11-08 20:55:00.508, lastWriteTime=2023-11-08 20:55:00.483, closedTime=2023-11-08 20:55:00.512, connected server version=5.1}
2023-11-08 20:55:00,515 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel-250845] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2023-11-08 20:55:00,516 INFO com.hazelcast.client.impl.ClientEndpointManager - [localhost]:5801 [seatunnel-250845] [5.1] Destroying ClientEndpoint{connection=Connection[id=1, /127.0.0.1:5801->/127.0.0.1:33254, qualifier=null, endpoint=[127.0.0.1]:33254, remoteUuid=332c6983-64e1-4d2a-8d4b-7b40d9677a39, alive=false, connectionType=JVM, planeIndex=-1], clientUuid=332c6983-64e1-4d2a-8d4b-7b40d9677a39, clientName=hz.client_1, authenticated=true, clientVersion=5.1, creationTime=1699448098390, latest clientAttributes=lastStatisticsCollectionTime=1699448098443,enterprise=false,clientType=JVM,clientVersion=5.1,clusterConnectionTimestamp=1699448098383,clientAddress=127.0.0.1,clientName=hz.client_1,credentials.principal=null,os.committedVirtualMemorySize=7048228864,os.freePhysicalMemorySize=12655243264,os.freeSwapSpaceSize=0,os.maxFileDescriptorCount=65535,os.openFileDescriptorCount=51,os.processCpuTime=5340000000,os.systemLoadAverage=0.24,os.totalPhysicalMemorySize=33566306304,os.totalSwapSpaceSize=0,runtime.availableProcessors=8,runtime.freeMemory=994543624,runtime.maxMemory=1029177344,runtime.totalMemory=1029177344,runtime.uptime=2526,runtime.usedMemory=34633720, labels=[]}
2023-11-08 20:55:00,516 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel-250845] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2023-11-08 20:55:00,516 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed SeaTunnel client......
2023-11-08 20:55:00,517 INFO com.hazelcast.core.LifecycleService - [localhost]:5801 [seatunnel-250845] [5.1] [localhost]:5801 is SHUTTING_DOWN
2023-11-08 20:55:00,520 INFO com.hazelcast.internal.partition.impl.MigrationManager - [localhost]:5801 [seatunnel-250845] [5.1] Shutdown request of Member [localhost]:5801 - 1373b501-f19f-47a2-9ef2-66d14e5a31c0 this is handled
2023-11-08 20:55:00,523 INFO com.hazelcast.instance.impl.Node - [localhost]:5801 [seatunnel-250845] [5.1] Shutting down connection manager...
2023-11-08 20:55:00,525 INFO com.hazelcast.instance.impl.Node - [localhost]:5801 [seatunnel-250845] [5.1] Shutting down node engine...
2023-11-08 20:55:03,377 INFO com.hazelcast.instance.impl.NodeExtension - [localhost]:5801 [seatunnel-250845] [5.1] Destroying node NodeExtension.
2023-11-08 20:55:03,377 INFO com.hazelcast.instance.impl.Node - [localhost]:5801 [seatunnel-250845] [5.1] Hazelcast Shutdown is completed in 2858 ms.
2023-11-08 20:55:03,378 INFO com.hazelcast.core.LifecycleService - [localhost]:5801 [seatunnel-250845] [5.1] [localhost]:5801 is SHUTDOWN
2023-11-08 20:55:03,378 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed HazelcastInstance ......
2023-11-08 20:55:03,378 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed metrics executor service ......
2023-11-08 20:55:03,379 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - run shutdown hook because get close signal
Check the sync result
MySQL [(none)]>
MySQL [(none)]> select * from test_02.user;
+--------+----------+
| userid | username |
+--------+----------+
| 1 | zhangsan |
| 2 | lisi |
+--------+----------+
2 rows in set (0.00 sec)
Integrating SeaTunnel with the Flink & Spark engines
Edit the seatunnel-env.sh file:
Set FLINK_HOME to the Flink deployment directory.
Set SPARK_HOME to the Spark deployment directory.
FLINK_HOME=/data/flink-1.14.5/
SPARK_HOME=/data/spark-2.4.6/
Start the Flink cluster (from the Flink home directory):
./bin/start-cluster.sh
Run the job, again using mysql-to-mysql.conf as the example. Since the Flink version here is 1.14.5, the flink-13 starter script is the one to use (the flink-15 script targets Flink 1.15 and above).
Note: when syncing MySQL, the MySQL JDBC driver jar must be placed in Flink's lib directory (see the sketch below). This is no different from running any other data synchronization job on Flink: the required dependencies have to be provided to Flink itself.
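A minimal sketch, assuming the same MySQL Connector/J 8.0.28 jar as before (restart the Flink cluster afterwards so the new jar is picked up):
# hypothetical example: make the MySQL driver available to Flink
cp mysql-connector-java-8.0.28.jar $FLINK_HOME/lib/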
./bin/start-seatunnel-flink-13-connector-v2.sh --config ./config/mysql-to-mysql.conf
The output is as follows:
Execute SeaTunnel Flink Job: ${FLINK_HOME}/bin/flink run -c org.apache.seatunnel.core.starter.flink.SeaTunnelFlink /mnt/apache-seatunnel-2.3.3/starter/seatunnel-flink-13-starter.jar --config ./config/mysql-to-mysql.conf --name SeaTunnel
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/kmr/flink1/1/flink-1.14.5/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/kmr/hadoop/1/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Job has been submitted with JobID a643a1990705f817479b1d2880c9f038
Program execution finished
Job with JobID a643a1990705f817479b1d2880c9f038 has finished.
Job Runtime: 2356 ms
Truncate the test_02.user table and import the data again, this time using Spark:
MySQL [(none)]> truncate table test_02.user;
Query OK, 0 rows affected (0.01 sec)
Run the import command:
./bin/start-seatunnel-spark-2-connector-v2.sh \
--master local[2] \
--deploy-mode client \
--config ./config/mysql-to-mysql.conf
The output is as follows:
23/11/08 21:19:57 INFO executor.FieldNamedPreparedStatement: PrepareStatement sql is:
INSERT INTO `test_02`.`user` (`userid`, `username`) VALUES (?, ?)
23/11/08 21:19:57 INFO v2.DataWritingSparkTask: Commit authorized for partition 0 (task 0, attempt 0, stage 0.0)
23/11/08 21:19:57 INFO v2.DataWritingSparkTask: Committed partition 0 (task 0, attempt 0, stage 0.0)
23/11/08 21:19:57 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1148 bytes result sent to driver
23/11/08 21:19:57 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1486 ms on localhost (executor driver) (1/1)
23/11/08 21:19:57 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
23/11/08 21:19:57 INFO scheduler.DAGScheduler: ResultStage 0 (save at SinkExecuteProcessor.java:117) finished in 1.568 s
23/11/08 21:19:57 INFO scheduler.DAGScheduler: Job 0 finished: save at SinkExecuteProcessor.java:117, took 1.610054 s
23/11/08 21:19:57 INFO v2.WriteToDataSourceV2Exec: Data source writer org.apache.seatunnel.translation.spark.sink.writer.SparkDataSourceWriter@25ea068e is committing.
23/11/08 21:19:57 INFO v2.WriteToDataSourceV2Exec: Data source writer org.apache.seatunnel.translation.spark.sink.writer.SparkDataSourceWriter@25ea068e committed.
23/11/08 21:19:57 INFO execution.SparkExecution: Spark Execution started
23/11/08 21:19:57 INFO spark.SparkContext: Invoking stop() from shutdown hook
23/11/08 21:19:57 INFO server.AbstractConnector: Stopped Spark@6e8a9c30{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
23/11/08 21:19:57 INFO ui.SparkUI: Stopped Spark web UI at http://kmr-b55b8d33-gn-0a6e9139-az1-master-1-2.ksc.***:4040
23/11/08 21:19:57 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/11/08 21:19:57 INFO memory.MemoryStore: MemoryStore cleared
23/11/08 21:19:57 INFO storage.BlockManager: BlockManager stopped
23/11/08 21:19:57 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
23/11/08 21:19:57 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/11/08 21:19:57 INFO spark.SparkContext: Successfully stopped SparkContext
23/11/08 21:19:57 INFO util.ShutdownHookManager: Shutdown hook called
23/11/08 21:19:57 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-6dfc2825-0da5-4e4c-9623-bdd***cab0849
Query the table; the data has been imported successfully:
MySQL [(none)]> select * from test_02.user;
+--------+----------+
| userid | username |
+--------+----------+
| 1 | zhangsan |
| 2 | lisi |
+--------+----------+