Performance optimization: run OCR in the background
As you probably realized, our solution has many limitations. For example, photos can't be larger than 10 MB. Also, Textract processing can take a few seconds, and an API Gateway request can't run longer than 30 seconds. For serverless applications, performance directly affects the cost of the infrastructure. A Lambda function that runs for 30 seconds is 300x more expensive than a function that runs for 100 ms!
What's the best way to optimize the performance and cost of our solution?
Textract allows us to send a file via an SDK method or via Amazon S3. As we need to upload a photo to S3 anyway, we can use that upload to trigger the Textract analysis. However, that alone does not solve the size limit, and the analysis can still take many seconds.
Serverless applications are event driven, so we can use events to make our app faster and cheaper. First, we can decouple the upload from the processing. Instead of uploading a file via the API, we can use the API to get a presigned URL that allows customers to upload a file directly to Amazon S3. A presigned URL is a temporary URL that lets our customers interact with Amazon S3 directly. Presigned URLs support fine-grained permissions, so you can allow a customer to upload a photo to a specific path in a specific bucket, and make that URL valid for 10 minutes.
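As a rough illustration of the presigned URL step, here is a minimal sketch of a handler that returns an upload URL. It assumes an AWS SDK for JavaScript v2 style S3 client (the method names used in this workshop suggest v2) and receives the client as a parameter so the logic is easy to test. The function name createGetUploadUrlHandler, the uploads/ prefix, the fileName query parameter, and the response shape are all hypothetical choices, not part of the original solution.

```javascript
// Builds a Lambda handler that returns a presigned upload URL.
// `s3` is assumed to be an AWS SDK for JavaScript v2 S3 client (new AWS.S3()).
// The bucket name, the "uploads/" prefix and the response shape are assumptions.
function createGetUploadUrlHandler(s3, bucketName, expiresInSeconds = 600) {
  return async function handler(event) {
    const query = (event && event.queryStringParameters) || {};
    const fileName = query.fileName;
    if (!fileName) {
      return { statusCode: 400, body: JSON.stringify({ error: 'fileName is required' }) };
    }

    // Restrict the upload to one specific key in one specific bucket.
    const safeName = fileName.replace(/[^\w.-]/g, '_');
    const key = `uploads/${Date.now()}-${safeName}`;

    // The URL only permits a PUT of this key, and only until it expires.
    const uploadUrl = await s3.getSignedUrlPromise('putObject', {
      Bucket: bucketName,
      Key: key,
      Expires: expiresInSeconds, // validity in seconds (600 = 10 minutes)
    });

    return { statusCode: 200, body: JSON.stringify({ uploadUrl, key }) };
  };
}

// Possible wiring inside a Lambda module (assumes BUCKET_NAME is set):
// exports.handler = createGetUploadUrlHandler(new AWS.S3(), process.env.BUCKET_NAME);
```

The client then sends the photo with an HTTP PUT to uploadUrl, so the file never passes through API Gateway or Lambda.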
A presigned URL will make our app faster and cheaper. Once the photo is uploaded, Amazon S3 can trigger a function that starts the Textract analysis. Instead of waiting for the analysis to finish, we can use Amazon Simple Notification Service (SNS) to trigger another Lambda function that gets the analysis data and stores it in a DynamoDB table. Once we connect the system, it should look similar to the following diagram.
Sounds more complex than it is, I promise!
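To give a sense of how small the background functions can be, here is a minimal sketch of the function triggered by the S3 upload. It assumes an AWS SDK for JavaScript v2 style Textract client passed in as a parameter. The function name createStartAnalysisHandler, the chosen feature types, and the way the SNS topic ARN and the IAM role ARN are supplied are assumptions for illustration.

```javascript
// Builds a Lambda handler for S3 "object created" events.
// `textract` is assumed to be an AWS SDK for JavaScript v2 client (new AWS.Textract()).
// `snsTopicArn` and `roleArn` (a role Textract can assume to publish to the topic)
// are hypothetical configuration values.
function createStartAnalysisHandler(textract, snsTopicArn, roleArn) {
  return async function handler(event) {
    const jobIds = [];
    for (const record of event.Records) {
      const bucket = record.s3.bucket.name;
      // Keys in S3 events are URL-encoded, with spaces sent as "+".
      const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

      // Start the asynchronous analysis; Textract notifies the SNS topic when done.
      const { JobId } = await textract.startDocumentAnalysis({
        DocumentLocation: { S3Object: { Bucket: bucket, Name: key } },
        FeatureTypes: ['FORMS', 'TABLES'],
        NotificationChannel: { SNSTopicArn: snsTopicArn, RoleArn: roleArn },
      }).promise();

      jobIds.push(JobId);
    }
    // The function returns right away; it does not wait for the analysis.
    return { jobIds };
  };
}
```

Because the handler only starts the job and returns, its running time does not depend on how long Textract needs for the document.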
Why is this faster and cheaper?
Getting the signed URL takes no more than 200-300 ms, which makes the first step faster and cheaper. Then the customer uploads a file directly to Amazon S3, and the process is done for our customer. Everything after that is done in the background, which makes our UX faster (for our customers, that's more important than the real speed of the processing). We have two more Lambda functions in the process, but both of them have only a few lines of code and run for 100-300 ms, which brings our Lambda execution time from up to 30 seconds to less than a second (and makes the app roughly 30x cheaper). This adds Amazon S3 and SNS costs; however, those are just a fraction of the cost of API Gateway.
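The 300x and 30x figures follow from the fact that Lambda bills duration in proportion to running time. A minimal sketch of that arithmetic, assuming the same memory size before and after and ignoring the per-request charge (durationCostRatio is a hypothetical helper name):

```javascript
// For a fixed memory size, Lambda's duration charge scales linearly with
// running time, so the cost ratio is simply the ratio of the durations.
// This ignores the flat per-request charge.
function durationCostRatio(beforeMs, afterMs) {
  return beforeMs / afterMs;
}

console.log(durationCostRatio(30000, 100));  // 300 (30 s vs. 100 ms)
console.log(durationCostRatio(30000, 1000)); // 30  (30 s vs. 1 s in total)
```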
It's time to try to make this!
Task
Your task is to move the processing to the background and optimize our application performance and cost by doing the following:
- Create an endpoint that returns a presigned URL that allows customers to upload a photo to a specified path and stays valid for 30 seconds.
- Create an SNS topic that Textract will use to send a notification when analysis is finished.
- Create another Lambda function that will be triggered when a new file is uploaded to Amazon S3, and will start Textract document analysis.
- Create a third Lambda function that will be triggered by the SNS message, and will read the Textract result and save the data to the DynamoDB table.
Once you complete this exercise, take a minute and discuss this solution with your team. Does this solution seem faster for our customers? Is it more complex to maintain than our first solution?
Hints
Here are a few hints to help you with this task:
- There are many examples of creating a presigned URL. Another option is to use an open source application from the Serverless Application Repository (SAR). Here's one of the apps on SAR that you can use as a part of your app, or at least as inspiration: Serverless S3 Uploader app on SAR.
- Instead of using the textract.analyzeDocument method for photo analysis, you can use the textract.startDocumentAnalysis method, which starts the analysis and uses an SNS topic to trigger a function in the background when the analysis is finished. For more info, see the AWS SDK documentation.
- Don't forget to give your functions the right permissions. For most of the functions, you can use AWS SAM's policy templates.
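For the function triggered by the SNS message, one possible shape is sketched below. It assumes AWS SDK for JavaScript v2 style clients (a Textract client and a DynamoDB DocumentClient) passed in as parameters. The function name createSaveResultHandler, the table's jobId key, and the stored item shape are hypothetical; adapt them to your own table.

```javascript
// Builds a Lambda handler for the SNS notification that Textract sends
// when an asynchronous analysis job finishes.
// `textract` and `documentClient` are assumed to be AWS SDK for JavaScript v2
// clients (new AWS.Textract(), new AWS.DynamoDB.DocumentClient()).
function createSaveResultHandler(textract, documentClient, tableName) {
  return async function handler(event) {
    for (const record of event.Records) {
      // The SNS message body is a JSON string with the job ID and status.
      const message = JSON.parse(record.Sns.Message);
      if (message.Status !== 'SUCCEEDED') {
        console.warn(`Textract job ${message.JobId} ended with status ${message.Status}`);
        continue;
      }

      // Results can be paginated, so follow NextToken until it is absent.
      const blocks = [];
      let nextToken;
      do {
        const params = { JobId: message.JobId };
        if (nextToken) params.NextToken = nextToken;
        const page = await textract.getDocumentAnalysis(params).promise();
        blocks.push(...(page.Blocks || []));
        nextToken = page.NextToken;
      } while (nextToken);

      // Keep only the detected text lines (an assumption about what to store).
      const lines = blocks.filter((b) => b.BlockType === 'LINE').map((b) => b.Text);

      await documentClient.put({
        TableName: tableName,
        Item: {
          jobId: message.JobId, // hypothetical partition key
          documentKey: message.DocumentLocation && message.DocumentLocation.S3ObjectName,
          lines,
        },
      }).promise();
    }
  };
}
```

This function needs permission to read the Textract result and to write to the table, and the S3-triggered function needs permission to start the analysis.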