Defining a pipeline
Every pipeline needs a unique name, which is how it is referenced in triggers. A pipeline can have multiple inputs, each of which gets a name and a data format.
Name: my-first-pipeline
Inputs:
  input: Xml
Our pipeline needs at least one step; in this example we make use of a fictional module called demo-module. Each step also needs a name, which is how it is referenced in the input templates of other steps. The InputTemplate uses the Liquid templating language to map the input data for this step, which is then sent to the module. Because the data is stored internally as a string, we need to tell AireInsights what the input and output formats of this module are so it can correctly interpret the data and map it between steps.
...
Steps:
- Name: demo-step
  InputTemplate: '{{ input.foo }}' # the pipeline input data is accessed through variables named after the inputs specified
  ModuleName: demo-module
  ModuleInputFormat: Raw
  ModuleOutputFormat: Json
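To make the mapping concrete: if this pipeline were given the XML payload submitted later in this guide (<xml><foo>bar</foo></xml>), the expression {{ input.foo }} would render to the string bar, and that raw string is what demo-module receives. This assumes the XML elements are exposed to the template as properties of the input variable, as the input.foo expression suggests; the exact mapping rules may vary.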
The final part of our minimal pipeline definition is the output configuration. Here we specify, using a Liquid template, how the outputs of the different steps should be combined into the output of the pipeline. The output data of our demo-step step is referenced through a variable with the same name. DataFormat specifies the data format of the output so AireInsights can set the correct Content-Type header. Finally, we specify the callback URL to tell AireInsights where to send the data. Because data processing can sometimes be slow, AireInsights pipelines are designed to run as an asynchronous process. It is also possible to store the processed data back in a data collection instead.
...
Output:
- Template: '{"foo": "{{ demo-step }}" }'
  DataFormat: Json
  CallbackEndpoint: 'https://some-url.com/callback'
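Putting these pieces together, once demo-step completes, AireInsights would send a request to the callback roughly like the one below. The demo-module is fictional, so the value of foo is purely illustrative:
Route: POST https://some-url.com/callback
Header: Content-Type: application/json
Payload: {"foo": "<output of demo-step>"}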
Defining a trigger
Triggers are how data pipelines are invoked. A NewDataTrigger trigger invokes a pipeline every time data is added to the data collection in the data lake specified in DataCollection. The DataSources wire the data from data collections to the different inputs of a pipeline. In this simplest example, the trigger sends the pipeline the newly added data item.
Name: demo-trigger
DataCollection: demo-data
Pipeline: my-first-pipeline
DataSources:
- Type: LatestItem
  DataCollection: demo-data
  Input: input # This has to match one of the inputs defined in the pipeline definition
Submitting data to the data lake
Data in the data lake is organised in data collections, where all the data in one collection should have the same format and structure so it can be treated uniformly when sent to a pipeline. We can submit new data to a data collection by posting it to this endpoint:
Route: POST Api/Data/demo-data
Header: Content-Type: application/xml
Payload: <xml><foo>bar</foo></xml>
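For scripted submissions, a minimal sketch in Python might look like the following. The base URL is an assumption here; substitute the address of your AireInsights instance:

import requests  # third-party: pip install requests

# Placeholder base URL: replace with your AireInsights host.
BASE_URL = "https://your-aireinsights-host"

# Post a new XML item to the demo-data collection. The Content-Type
# header tells the data lake how to interpret the payload.
response = requests.post(
    f"{BASE_URL}/Api/Data/demo-data",
    data="<xml><foo>bar</foo></xml>",
    headers={"Content-Type": "application/xml"},
)
response.raise_for_status()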
With this request, the trigger invokes the pipeline with the newly submitted data item, and once processing completes, the output is sent to the specified callback.
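Because the pipeline runs asynchronously, the callback endpoint must be reachable and able to accept the POST request AireInsights sends when processing finishes. A minimal receiver sketch in Python (the framework choice and port are ours, not part of AireInsights):

from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class CallbackHandler(BaseHTTPRequestHandler):
    # AireInsights delivers the pipeline output as a POST request.
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = self.rfile.read(length)
        print("Pipeline output:", json.loads(body))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8000), CallbackHandler).serve_forever()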