How to Handle Exceptions When Processing SQS Messages in .NET Lambda Function
Learn how to handle exceptions, configure Dead Letter Queue, re-drive messages from Amazon SQS and how to enable batch error processing from a .NET Lambda Function.
Table of Contents
This article is sponsored by AWS and is part of my AWS Series.
In a previous post, we learned how to process messages coming to Amazon SQS using Lambda Functions written in .NET.
We have messages coming into the queue, the Lambda function gets triggered, processes the message, and successfully removes the message from the queue. Great!
But what happens on an exception processing the message in the Lambda Function?
Maybe it’s an unhandled code path, a null reference exception, or an external third-party integration service that is down.
In this post, you’ll learn,
- What happens on Exception in Processing Messages?
- Handle Retries and Configure Dead-letter Queue (DLQ)
- ReDrive Messages from Dead-letter Queue
- Enabling Batch Error Processing
What happens on Exception in Processing Messages?
We first need to understand what happens when there is an exception in processing a message from SQS.
First, let's simulate an exception in our Lambda Function handler.
Simulating Exception in .NET Lambda Function
Let’s update the Lambda Function code to simulate an exception to see the exception handling in action.
private async Task ProcessMessageAsync(SQSEvent.SQSMessage message, ILambdaContext context)
{
context.Logger.LogInformation($"Processed message {message.Body}");
if (message.Body.Contains("Exception"))
throw new Exception("Message cannot be processed");
// TODO: Do interesting work based on the new message
await Task.CompletedTask;
}
The above code throws an exception anytime the message body contains the word ‘Exception’.
This makes it easy to simulate the exceptions we need when sending messages and see what happens. To see the exception handling in action, send a message to the queue with the word ‘Exception’.
The Lambda Function will throw an error as soon as it picks up for processing.
When an exception happens in processing Lambda Function does not acknowledge the successful processing of the message.
The message is not deleted from the Queue by the Lambda Function. After the default VisbilityTimeout period of the Queue (30secs), the message is made available in the Queue for processing again.
In this case, where we have simulated an exception based on the message body, this message will always throw an exception (with the current Function code). The message gets retried every 30 seconds.
In real-world scenarios, if the exceptions in the processing are because of transient errors (that go away after some time), the message will eventually be processed. E.g., If the Function talks to a third-party service that is currently down, it might be back up when we try again after 30 seconds. In this case, the message will be successfully processed and removed from the queue.
However, if the exception is due to a null reference exception or unhandled code paths, it will continue to throw an error and be retried.
Handle Retries and Configure Dead-letter Queue (DLQ)
If there are a lot of messages in the queue, erroring out eventually, it will create a backlog of messages. It can cause delays in processing new messages coming into the Queue.
So ideally, you want the messages to retry a few times and removed for manual intervention.
Dead-letter Queue (DLQ) lets you isolate and handle problematic messages manually.
Under the Queue configuration, navigate to the Deal-letter queue configuration to set it up.
You must first create a queue to designate as DLQ. I have one created from the SQS Console - user-data-dlq
. Click the Edit button to update the configuration.
Specify the Queue ARN and the Maximum received count.
Maximum Receives Count
The Maximum receives value determines the number of times a message will be retried. If the ReceiveCount for a message exceeds the maximum receive count, SQS moves the message to the associated DLQ.
The above configuration will remove the message when the ReceiveCount is more than 2. So the message will get delivered for processing twice - the first time it comes into the Queue and then once more after 30 secs (default VisiblilityTimeout after an exception in processing).
Any existing messages in the Queue with a ReceiveCount more than the maximum is automatically moved to the DLQ.
You can inspect this message from the DLQ and decide what action to take next. Based on the message content, the application, and the business processes associated with the messages, you can choose to ignore them (delete from DLQ), reprocess the message later, or fix the Lambda Function and then retry the message again.
ReDrive Messages from Dead-letter Queue
Re-processing messages from Queue is referred to as Redrive.
SQS allows easily moving unprocessed/error messages back to the source or other custom destination queues. This helps you to stay on top of the messages that are not getting processed and retry them.
Select the Message destination to redrive the messages. You can redrive the messages to the source queue or choose a custom destination.
You can also specify the Velocity control settings to redrive messages. It controls the rate at with messages are moved from the Dead-letter queue to the destination queue.
Optionally, you can Inspect messages in the queue and pick and choose the messages to redrive. This is useful if you need to filter and process messages.
Selecting the 'DLQ redrive' button sends the messages back into the destination queue and is available for processing. The message will follow the same execution flow as it would have come in the first time.
On successful processing, it gets removed from the queue. If it continues to throw an exception, it will eventually be moved back to the DLQ (based on the maximum receive count setting.)
Enabling Batch Error Processing
Any time the Batch Size property is set to more than 1, the SNSEvent
can have multiple records. This means when looping and processing the messages, they can fail independently.
For example, if you receive ten messages, seven might get processed successfully, but three might fail.
Now whether you fail the full batch on the first exception or loop through all of the messages and continue processing even if one fails is something you can control in your Function code.
To fail the entire batch on the first exception, you only need to bubble up the exception as it occurs, and the code will stop processing more messages. The current Function code stops processing on the first exception.
After the VisibilityTimeout, all the messages in the batch are made available again in the queue for reprocessing.
Batch Error Processing
If you decide to continue processing the messages even after one message throws an error, you can handle the exception at the individual processing level.
However, in these cases, we must tell SQS the messages that were processed successfully and those that failed to process.
The SQSBatchResponse
signals SQS the failed messages in a batch.
public async Task<SQSBatchResponse> FunctionHandler(SQSEvent evnt, ILambdaContext context)
{
var response = new SQSBatchResponse()
{
BatchItemFailures = new List<SQSBatchResponse.BatchItemFailure>()
};
context.Logger.LogInformation($"Received Message of count {evnt.Records.Count}");
foreach (var message in evnt.Records)
{
try
{
await ProcessMessageAsync(message, context);
}
catch (Exception e)
{
context.Logger.LogError(e.Message);
response.BatchItemFailures.Add(new SQSBatchResponse.BatchItemFailure()
{
ItemIdentifier = message.MessageId
});
}
}
return response;
}
The above code handles exceptions from the ProcessMessageAsync
function responsible for processing each record message. On exception in processing, it records the MessageId
of the message that failed processing as a BatchItemFailure
.
Once all messages in the batch are processed, the Lambda Function returns the batch response, which contains only the message id that failed to process.
Reporting batch item failures must be enabled on the Lambda Trigger Configuration for this code to work. You can enable this from the Lambda Trigger Configuration, as shown below.
Once enabled and the new function code deployed, we can successfully process messages in batches. Only messages that fail to process from the batch will be moved to the DLQ.
If you want to simulate and test the batch item failures, you can use the below Linqpad script to send bulk messages.
async Task Main()
{
var sqsClient = new AmazonSQSClient();
var batchRequest = Enumerable.Range(1, 100)
.Select(e => new SendMessageBatchRequestEntry(Guid.NewGuid().ToString(), $"Test {(e % 5 == 0 ? "Exception" : "")} {e}")).ToList();
await Task.WhenAll(
batchRequest
.Batch(10)
.Select(async batch => await sqsClient
.SendMessageBatchAsync("QUEUE-URL", batch.ToList())));
}
The above code sends 100 messages in batches of 10. Every 5th message in the list will trigger an exception.
In the Cloudwatch logs, you will notice the messages with Exception being retried, and it will eventually get moved to the Dead-letter Queue.
I hope this helps you understand how to handle exceptions when consuming messages from SQS from a .NET application.
Rahul Nath Newsletter
Join the newsletter to receive the latest updates in your inbox.