Create a service to detect anomalies using a model built from scratch (model from scratch template)¶
This tutorial shows how to create an anomaly detection service from a model built from scratch, based on the model from scratch template.
Introduction¶
This tutorial shows how to create a service to detect anomalies using a model built from scratch.
This service will take a time series dataset as input and return a list of anomalies using an autoencoder model.
Prerequisites¶
To follow this tutorial, we highly recommend completing the Getting started guide first; it covers all the tools required for this tutorial.
Bootstrap the service based on the model from scratch template¶
In this section, you will bootstrap a new service based on the Create a new service (model from scratch) template.
You have three ways to bootstrap a new service based on the template:

Use the template (recommended if you are part of the Swiss AI Center GitHub organization)

1. Access the Create a new service (model from scratch) template repository.
2. Use the "Use this template" button to create a new repository based on the template, either in the Swiss AI Center GitHub organization or in your own GitHub account.
3. For the Repository name, use my-anomalies-detection-service.
4. Clone the newly created repository locally. This will be the root directory of your new service for the rest of this tutorial.

Fork the template (recommended if you are not part of the Swiss AI Center GitHub organization)

1. Fork the Create a new service (model from scratch) template repository.
2. For the Repository name, use my-anomalies-detection-service.
3. Clone the newly created repository locally. This will be the root directory of your new service for the rest of this tutorial.

Download the template

1. If you do not want to host your codebase on GitHub, or do not want to be linked to the Swiss AI Center organization, download the Create a new service (model from scratch) template as an archive file ("Download ZIP") from the GitHub repository and start over locally or in a new Git repository.
2. Extract the archive and name the directory my-anomalies-detection-service. This will be the root directory of your new service for the rest of this tutorial.
Explaining the template¶
In this section, you will learn about the different files and folders that are part of the template.
README.md¶
This file contains a checklist of the steps to follow to bootstrap a new service based on the template, so you can track your progress step by step.
model-creation¶
This folder contains the code to create the model. This is where you will implement the code that trains the model and saves it as a binary file. The binary file will then be copied into the model-serving folder.
model-serving¶
This folder contains the code to serve the model. This is where you will implement the code to load the model from the binary file and serve it over a FastAPI REST API.
Implement the anomaly detection service¶
In this section, you will implement the anomaly detection service.
The service is composed of two parts:
- The model creation
- The model serving
Implement the model creation¶
In this section, you will implement the code to create the anomaly detection model. The model will then be served in the next section.
Warning
Make sure you are in the model-creation folder.
Create a new Python virtual environment¶
Instead of installing the dependencies globally, it is recommended to create a virtual environment.
To create a virtual environment, run the following command inside the project folder:
Then, activate the virtual environment:
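The exact commands are not reproduced here; a typical sequence looks like this (the `.venv` directory name is an assumption, and the template may use a different one):

```shell
# Create a virtual environment in a .venv directory
python3 -m venv .venv
# Activate it (Unix-like shells; on Windows use .venv\Scripts\activate)
. .venv/bin/activate
```

Later, running `deactivate` returns you to the global Python environment (used at the end of this section).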
Install the dependencies¶
Warning
Make sure you are in the virtual environment.
Create a requirements.txt file with the following content:
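The file's exact content is not shown here; based on the dependency list given later in this tutorial (matplotlib, numpy, pandas, scikit-learn and tensorflow), it plausibly looks like this (versions omitted; the template may pin specific ones):

```
matplotlib
numpy
pandas
scikit-learn
tensorflow
```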
These are the dependencies required to create the model to detect anomalies.
Then, install the dependencies (execute this in the model-creation folder):
Create a freeze file to pin all dependencies to their current versions:
Create the source files¶
Create a src/train_model.py file with the following content:
This file contains the code to create the model to detect anomalies.
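The template's actual training script is not reproduced here. As a rough illustration of the idea (a hypothetical numpy-only stand-in, not the template's Keras code): train a small autoencoder to reproduce normal measurements, then flag points whose reconstruction error exceeds a threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Normal training data: measurements between -1 and 1, like the tutorial's dataset
x = rng.uniform(-1.0, 1.0, size=(256, 1))

# Tiny 1 -> 8 -> 1 autoencoder with a tanh hidden layer
W1 = rng.normal(0.0, 0.5, size=(1, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=(8, 1)); b2 = np.zeros(1)

def forward(inp):
    hidden = np.tanh(inp @ W1 + b1)
    return hidden @ W2 + b2, hidden

# Train with plain full-batch gradient descent to reproduce the input
for _ in range(2000):
    out, hidden = forward(x)
    err = out - x                          # gradient of the squared error w.r.t. out
    gW2 = hidden.T @ err / len(x); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - hidden**2)  # backpropagate through tanh
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(axis=0)
    W1 -= 0.05 * gW1; b1 -= 0.05 * gb1
    W2 -= 0.05 * gW2; b2 -= 0.05 * gb2

# Anomaly threshold: a margin above the worst error seen on normal data
threshold = 2.0 * np.abs(forward(x)[0] - x).max()

# Points far outside [-1, 1] reconstruct poorly and are flagged
# (3.8765 and -2.876 are the two anomalies from the tutorial's test dataset)
test = np.array([[0.5], [-0.3], [3.8765], [0.1], [-2.876]])
errors = np.abs(forward(test)[0] - test).ravel()
anomalies = test.ravel()[errors > threshold]
print(anomalies)
```

The tanh hidden layer saturates outside the training range, so out-of-range values cannot be reconstructed and produce large errors; the template applies the same principle with a Keras model.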
Create a src/evaluate_model.py file with the following content:
This file contains the code to evaluate the model on the test dataset.
Import the dataset¶
Create the following datasets in the data folder:
data/train.csv:
The training dataset contains 20 measurements between -1 and 1.
data/test.csv:
The test dataset contains 20 measurements and 2 anomalies (3.8765 and -2.876).
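The exact file contents are not reproduced here. A hypothetical shape for data/test.csv (the single value column and the normal values are assumptions; only the two anomalies come from the description above, and the real file contains 20 rows):

```
value
0.12
-0.56
3.8765
0.34
-2.876
-0.91
```

data/train.csv has the same shape, with all 20 values between -1 and 1.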
Train the model¶
Run the following command in the virtual environment to train the model:
Note
If you encounter a libdevice not found at ./libdevice.10.bc error message while using an Nvidia GPU with CUDA, export the CUDA library path by executing the following command:
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/opt/cuda
Adjust the path accordingly. This step is necessary to enable successful GPU-based training for the model.
The model will be saved in the model-creation/model folder under the name anomalies_detection_model.h5.
A plot of the training loss will be saved in the model-creation/evaluation folder under the name training_loss.png.
Evaluate the model¶
Run the following command in the virtual environment to evaluate the model:
A plot of the detected anomalies will be saved in the model-creation/evaluation folder under the name result.png.
Where to find the model binary file¶
The model binary file is saved in the model-creation/model folder under the name anomalies_detection_model.h5. You will need this file in the next section.
Exit the virtual environment¶
Run the following command to exit the virtual environment:
Implement the model serving¶
In this section, you will implement the code to serve the model to detect anomalies you created in the previous section.
Warning
Make sure you are in the model-serving folder.
Create a new Python virtual environment¶
Create a new Python virtual environment in the model-serving folder, as explained in the previous section:
Then, activate the virtual environment:
This will ensure that the dependencies of the model creation and the model serving are isolated.
Install the dependencies¶
Warning
Make sure you are in the virtual environment.
Update the requirements.txt file with the following content:
The common-code package is required to serve the model over a FastAPI REST API and provides boilerplate code to handle the configuration.
You must add to this file all the dependencies your model needs to be loaded and executed. In the case of this model, these are matplotlib, numpy, pandas, scikit-learn and tensorflow.
Install the dependencies in the virtual environment, as explained in the previous section:
Create a freeze file in the virtual environment to pin all dependencies to their current versions:
The specific common-code @ git+https://github.com/swiss-ai-center/common-code.git@<commit> line in the freeze file will conflict with the more general line in requirements.txt due to the explicit commit reference.
From there, you have two options:
- Easier update: Remove the specific common-code @ git+https://github.com/swiss-ai-center/common-code.git@<commit> line from requirements-all.txt. This allows for easier updates of the common-code dependency without adjusting service dependencies.
- Consistent dependencies: Remove the generic line in requirements.txt to keep dependencies consistent across machines and over time. This ensures that the same versions of the dependencies are installed on every machine if you ever share your code with someone else.
Copy the model binary file¶
Copy the model binary file from the model-creation/model folder to the model-serving/model folder (execute this in the model-serving folder):
Update the template files to load and serve the model¶
Update the pyproject.toml file¶
Update the pyproject.toml file to rename the package (the name is usually the name of the repository):
Update the src/main.py file¶
Update the src/main.py file to load the model binary file and serve it over FastAPI:
- Import the dependencies required by the model.
- Edit the description of the service.
- Edit the name of the service.
- Edit the slug of the service.
- Edit the input fields of the service.
- Edit the output fields of the service.
- Edit the tags of the service.
- Edit the has_ai field of the service.
- Optional: Edit the documentation URL of the service.
- Load the model binary file.
- Get the raw data from the dataset field.
- Get the type of the data from the dataset field.
- Use the model to reconstruct the original time series data.
- Return the result of the service.
- Change the API description. The description is a markdown string that will be displayed in the API documentation.
- Edit the summary of the service.
- Edit the title of the service.
- Edit the version of the service.
- Edit the contact information of the service.
- Edit the license information of the service.
Note
The input and output data of the process function are bytes. Depending on the expected type of the data, you might need to convert them to that type.
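For a CSV input like this service's, the bytes can be decoded and re-encoded with the standard library. This is a hypothetical sketch (the template may use pandas instead, and the value column name is an assumption):

```python
import csv
import io

# Raw bytes as received by the process function
raw = b"value\n0.5\n3.8765\n-0.2\n"

# bytes -> list of floats
reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
values = [float(row["value"]) for row in reader]

# ... run the model here; as a placeholder, flag values outside [-1, 1] ...
anomalies = [v for v in values if abs(v) > 1.0]

# result -> bytes, ready to be returned by the process function
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["anomaly"])
writer.writerows([[v] for v in anomalies])
result = out.getvalue().encode("utf-8")
print(result)
```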
Update the Dockerfile¶
Update the Dockerfile to install any OS packages the model might require and to copy the model binary file itself:
- Some OS packages might need to be installed in order to run the model. If needed, you can add them here.
- Change the name of the model file to match the name of your model file.
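As an illustration, the relevant lines might look like this (the example OS package and exact layout are assumptions, not the template's actual content):

```dockerfile
# Install OS packages the model may need (example package; adjust as required)
RUN apt-get update && \
    apt-get install -y --no-install-recommends libgomp1 && \
    rm -rf /var/lib/apt/lists/*

# Copy the model binary file into the image (match your model's file name)
COPY model/anomalies_detection_model.h5 .
```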
Update the .gitignore file¶
To avoid pushing the model binary file to the Git repository, update the .gitignore file to ignore it:
- Add the model binary file to the .gitignore file.
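For example (assuming the model file name used in this tutorial; the template's actual entry may differ):

```
# Ignore the model binary file
model/anomalies_detection_model.h5
```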
Start the service¶
Tip
Start the Core engine as mentioned in the Getting started guide for this step.
Start the service with the following command in the virtual environment:
Warning
The above Uvicorn command tests service registration but lacks access to the core-engine database in Docker.
For full testing, start with Docker or run the database outside Docker.
The service should try to announce itself to the Core engine.
It will then be available at http://localhost:9090.
Access the Core engine either at http://localhost:3000 (Frontend UI) or http://localhost:9090 (Backend UI).
The service should be listed in the Services section.
Test the service¶
Tip
Start the Core engine as mentioned in the Getting started guide for this step.
There are two ways to test the service:
Access the Core engine at http://localhost:3000.
The service should be listed in the Services section.
Try to start a new task with the service. You can use the model-creation/data/test.csv file as input.
The service should execute the task and return a response with the anomalies.
You can download the anomalies plot by clicking on the Download button.
Access the Core engine at http://localhost:9090.
The service should be listed in the Registered Services section.
Start a new task
Try to start a new task with the service. You can use the model-creation/data/test.csv file as input.
A JSON response should be returned with the task ID, similar to this:
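The exact response is not reproduced here; based on the fields described below, it has a shape similar to this (identifiers and values are invented for illustration):

```json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "data_in": ["00000000-0000-0000-0000-000000000000.csv"],
  "data_out": null,
  "status": "pending"
}
```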
- The input data is stored in the data_in field.
- The output is not available yet and will be stored in the data_out field.
- The task ID is stored in the id field.
Get the task status and result
You can then use the task ID to get the task status and the task result from the Tasks section.
Using the task ID, you can get the details of the task similar to this:
- If the task is finished, the output data is stored in the data_out field.
- The status of the task. The service can have a queue of tasks to execute and your task might not be executed yet.
Download the result
Using the file key(s) of the data_out field, you can download the result of the task under the Storage section.
You should then be able to download the anomalies plot.
You have validated that the service works as expected.
Commit and push the changes (optional)¶
Commit and push the changes to the Git repository so they are available to other developers.
Build, publish and deploy the service¶
Now that you have implemented the service, you can build, publish and deploy it.
Follow the How to build, publish and deploy a service guide to build, publish and deploy the service to Kubernetes.
Access and test the service¶
Access the service using its URL (either the URL defined in the DEV_SERVICE_URL / PROD_SERVICE_URL variable or the URL defined in the Kubernetes Ingress file).
You should be able to access the FastAPI Swagger UI.
The service should be available in the Services section of the Core engine it has announced itself to.
You should be able to send a request to the service and get a response.
Conclusion¶
Congratulations! You have successfully created a service to detect anomalies using a model built from scratch.
The service has then been published to a container registry and deployed on Kubernetes.
The service is now accessible through a REST API on the Internet and you have completed this tutorial! Well done!
Go further¶
Move model data to S3 with the help of DVC¶
In this tutorial, you have learned how to create a service to detect anomalies using a model built from scratch.
This model does not contain a lot of data to store and therefore, it is not a problem to store it in the Git repository.
For some models, you might have a lot of data to store and it might not be a good idea to store it in the Git repository (performance issues, Git repository size, etc.).
You might want to store the model data in a cloud storage service like AWS S3.
DVC is the perfect tool to do that.
Learn how to move the model data to S3 with the help of DVC in the How to add DVC to a service guide.