Data Pipeline

Edit | Report

Overview

This page explains the jobs to be run to bring up the Data Pipeline services. In order to do so, log into Jenkins and execute the instructions as per the sequence provided on this page.

Build

Switch to the Build folder and run all jobs. For the value of the github_release_tag, refer to Current Release Tags and Jenkins Jobs Reference

Operation Name Function Description CURRENT RELEASE TAG
AnalyticsCore Builds the analytics framework Creates analytics framework artifact release-3.5.0
Core Data Product Builds analytics core data products Creates analytics core data products artifact release-3.5.0
EdDataProducts release-3.5.0
FlinkPipelineJobs release-3.5.0
ETLJobs Builds adhoc scripts which includes content snapshot druid indexer, dialcode redis indexer Creates jar out of adhoc scripts release-3.5.0

DevOps Administration

Operation Name Function
Bootstrap Creates Deployer User

Provision

Switch to Provision/<env>/DataPipeline and run the jobs in the following sequence:

Operation Name Function Description CURRENT RELEASE TAG
AnalyticsSpark Provision the server to run the data products jobs Creates the analyticsuser required folders and installs the dependencies to run spark release-3.5.0
Postgres Provision the Postgres database Installs the dependencies like java and install the Postgres database release-3.5.0
postgres-managed release-3.5.0
Yarn Provision the Yarn cluster Install the dependencies required to create the yarn cluster and creates the yarn cluster to run samza jobs release-3.5.0
Influxdb Provision the influxdb database Install the dependencies like java and install the influxdb database release-3.5.0
Druid deploy and start druid services    

Deploy

Switch to Deploy/<env>/DataPipeline and run the jobs in the following sequence:

Operation Name Function Description CURRENT RELEASE TAG
AnalyticsCore Deploys Analytics framework jars Copies analytics framework artifact to spark server release-3.5.0
CoreDataProducts Deploys core data products Copies core data products artifact to spark server release-3.5.0
EdDataProducts release-3.5.0
KafkaIndexer Deploys logstash configurations and starts process Used for pushing pipeline metrics to influxDB release-3.5.0
KafkaSetup Setup kafka for pipeline Creates kafka topic and updates topic configurations release-3.5.0
FlinkPipelineJobs release-3.5.0
DruidIngestion release-3.5.0
ETLJobs release-3.5.0
ETLUserCacheIndexer release-3.5.0

Edit | Report