Data Pipeline

The getting started guide shows how to create a simple pipeline; this guide goes into more depth.

The schematic below illustrates a possible pipeline configuration. Steps A-F represent independent stages, each of which performs a task such as filtering data or calling a module or an external service. Each step calls out to a module, an independent service that does the data processing.

[Figure: pipelines]

Syntax

YAML

Name: String
Inputs:
  - KeyValuePair
ExecutionTimeout: Integer
Steps:
  - Step
Output:
  CallbackEndpoint: Url
  FailureEndpoint: Url
  DataCollection: String
  DataFormat: DataFormat
  Template: Liquid

JSON

{
  "Name": String,
  "Inputs": [ KeyValuePair, ... ],
  "ExecutionTimeout": Integer,
  "Steps": [ Step, ... ],
  "Output": {
    "CallbackEndpoint": Url,
    "FailureEndpoint": Url,
    "DataCollection": String,
    "DataFormat": DataFormat,
    "Template": Liquid
  } 
}

Properties

Name

The name of the pipeline must be unique among a user's pipelines and is used to reference the pipeline in triggers.
Required: Yes
Type: String

Inputs

The different inputs of the pipeline, each a key-value pair such as an_input: Json. The key is the name of the input; AireInsights provides that input data as a variable in all the templates. The value specifies that input’s data format, so AireInsights knows how to interpret the input string and can map it in the processing steps.
Required: Yes
Type: List of KeyValuePairs (key: value)
Key: any string of alphanumeric characters and underscores; it should always start with a letter and have no leading sigil.
Options for value: Raw (0), Json (1), Xml (2), Csv (3), Text (4)
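
As an illustration of the key rules above, a minimal sketch of an Inputs list could look like the following. The input names order_data and customer_list are hypothetical and chosen only to show keys that start with a letter and contain only alphanumeric characters and underscores.

```yaml
# Hypothetical input names; each value is one of the documented data formats.
Inputs:
  - order_data: Json      # parsed as JSON, exposed to templates as "order_data"
  - customer_list: Csv    # parsed as CSV, exposed to templates as "customer_list"
```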

ExecutionTimeout

Data processing can be slow at times; this parameter makes sure it does not take too long. It specifies the pipeline timeout in milliseconds.
Required: No
Type: Integer
Default: 5000 ms
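
Because the value is given in milliseconds, a longer timeout has to be converted from seconds first. The sketch below assumes a pipeline that needs up to 30 seconds; the figure is illustrative, not a recommendation.

```yaml
# 30 seconds expressed in milliseconds (30 * 1000), overriding the 5000 ms default.
ExecutionTimeout: 30000
```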

Steps

The data processing in AireInsights happens in modules, small services that are connected to the core AireInsights services. To build a custom module, one of the SDKs can be used. In the pipeline definition, a module is referenced and configured in a step.
Required: No
Type: List of Step
Default: Empty list

Output/CallbackEndpoint

Endpoint to which the pipeline output should be sent.
Required: Conditional
Type: Liquid (must evaluate to a URL)

Output/FailureEndpoint

Endpoint to which a failure notification should be sent.
Required: Conditional
Type: Liquid (must evaluate to a URL)

Output/DataCollection

Name of the data collection in which the pipeline output should be saved.
Required: Conditional
Type: Liquid

Output/DataFormat

The data format of the pipeline’s output data. It is used for the Content-Type header when calling the CallbackEndpoint, or stored as metadata when the output is saved to the DataCollection.
Required: Yes
Type: DataFormat
Options: Raw (0), Json (1), Xml (2), Csv (3), Text (4)

Output/Template

Specifies the output of the pipeline using a Liquid template. Each step’s output is available as a variable named after the step.
Required: Yes
Type: Liquid
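
To tie the properties together, here is a minimal sketch of a complete pipeline definition. It is an illustration under stated assumptions, not a verified configuration: the pipeline name, the input key, and the example.com URLs are placeholders, and Steps is left empty because the Step schema is not covered in this section. It also assumes that DataCollection may be omitted when a CallbackEndpoint is given, which the "Conditional" requirement suggests but does not confirm.

```yaml
Name: order_summary                 # hypothetical name; must be unique for the user
Inputs:
  - order_data: Json                # hypothetical input key, interpreted as JSON
ExecutionTimeout: 10000             # 10 seconds, in milliseconds
Steps: []                           # the documented default: no processing steps
Output:
  # Placeholder URLs; both fields are Liquid and must evaluate to a URL.
  CallbackEndpoint: "https://example.com/pipeline/callback"
  FailureEndpoint: "https://example.com/pipeline/failure"
  DataFormat: Json                  # used for the Content-Type header of the callback
  # The Liquid template is quoted so YAML does not read "{{" as a flow mapping.
  Template: '{{ order_data }}'
```

The template here simply echoes the input variable. In a pipeline with steps, the same Liquid syntax would reference a step’s output by the step name.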