Defining a pipeline
Every pipeline needs a unique name, which is how it is referenced in triggers. A pipeline can have multiple inputs, each of which gets a name and a data format.
Name: my-first-pipeline
Inputs:
  input: Xml
Our pipeline needs at least one step; in this example we make use of a fictional module called demo-module. Each step also needs a name, which is how it is referenced in the input templates of other steps. The InputTemplate uses the Liquid templating language to map the input data for this step, which is then sent to the module. Because the data is stored internally as a string, we need to tell AireInsights what the input and output formats of this module are so it can correctly interpret the data and map it between steps.
...
Steps:
- Name: demo-step
  InputTemplate: '{{ input.foo }}' # the pipeline input data is accessed through variables named after the inputs specified
  ModuleName: demo-module
  ModuleInputFormat: Raw
  ModuleOutputFormat: Json
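To make the mapping concrete: if this pipeline were given the XML payload submitted later in this guide (<xml><foo>bar</foo></xml>), the expression {{ input.foo }} would render to the string bar, and that raw string is what demo-module receives. This assumes the XML elements are exposed to the template as properties of the input variable, as the input.foo expression suggests; the exact mapping rules may vary.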
The final part of our minimal pipeline definition is the output configuration. Here we specify, using a Liquid template, how the outputs of the different steps should be combined into the output of the pipeline. The output data of our demo-step step is referenced through a variable with the same name. DataFormat specifies the data format of the output so AireInsights can set the correct Content-Type header. Finally, we specify the callback URL to tell AireInsights where to send the data. Because data processing can sometimes be slow, AireInsights pipelines are designed to run as an asynchronous process. It is also possible to store the processed data back in a data collection instead.
...
Output:
- Template: '{"foo": "{{ demo-step }}" }'
  DataFormat: Json
  CallbackEndpoint: 'https://some-url.com/callback'
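Putting these pieces together, once demo-step completes, AireInsights would send a request to the callback roughly like the one below. The demo-module is fictional, so the value of foo is purely illustrative:
Route: POST https://some-url.com/callback
Header: Content-Type: application/json
Payload: {"foo": "<output of demo-step>"}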
Defining a trigger
Triggers are how data pipelines are invoked. A NewDataTrigger trigger invokes a pipeline every time data is added to the data collection in the data lake specified in DataCollection. The DataSources wire the data from data collections to the different inputs of a pipeline. In this simplest example, the trigger sends the pipeline the newly added data item.
Name: demo-trigger
DataCollection: demo-data
Pipeline: my-first-pipeline
DataSources:
- Type: LatestItem
  DataCollection: demo-data
  Input: input # This has to match one of the inputs defined in the pipeline definition
Submitting data to the data lake
Data in the data lake is organised in data collections, where all the data in one collection should have the same format and structure so it can be treated uniformly when sent to a pipeline. We can submit new data to a data collection by posting it to this endpoint:
Route: POST Api/Data/demo-data
Header: Content-Type: application/xml
Payload: <xml><foo>bar</foo></xml>
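For scripted submissions, a minimal sketch in Python might look like the following. The base URL is an assumption here; substitute the address of your AireInsights instance:

import requests  # third-party: pip install requests

# Placeholder base URL: replace with your AireInsights host.
BASE_URL = "https://your-aireinsights-host"

# Post a new XML item to the demo-data collection. The Content-Type
# header tells the data lake how to interpret the payload.
response = requests.post(
    f"{BASE_URL}/Api/Data/demo-data",
    data="<xml><foo>bar</foo></xml>",
    headers={"Content-Type": "application/xml"},
)
response.raise_for_status()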
With this request, the trigger invokes the pipeline with the newly submitted data item, and once processing completes, the output is sent to the specified callback.
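Because the pipeline runs asynchronously, the callback endpoint must be reachable and able to accept the POST request AireInsights sends when processing finishes. A minimal receiver sketch in Python (the framework choice and port are ours, not part of AireInsights):

from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class CallbackHandler(BaseHTTPRequestHandler):
    # AireInsights delivers the pipeline output as a POST request.
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = self.rfile.read(length)
        print("Pipeline output:", json.loads(body))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8000), CallbackHandler).serve_forever()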