AWS Rekognition: Unable to get image metadata from S3. Check object key, region and/or access permissions.

What does this Amazon Rekognition error mean: Unable to get image metadata from S3.  Check object key, region and/or access permissions.

For me, this meant that I was calling

DetectFaces(Image => { S3Object => { Bucket => "bucket_name", Name => "path_to_image" } })

with bucket_name belonging to an S3 bucket in a different region than the Rekognition endpoint that I was using.  It turns out Rekognition can only access buckets in its own region.  Copying the data to us-west-2, the region of my Rekognition endpoint, resolved the problem.
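In Paws terms, the fix is to point the Rekognition handle at the bucket’s region.  A minimal sketch, with bucket_name and path_to_image as placeholders as above:

use Paws;
# the bucket lives in us-west-2, so talk to Rekognition there too
my $rekognition = Paws->service('Rekognition', region => 'us-west-2');
my $faces = $rekognition->DetectFaces(
   Image => { S3Object => { Bucket => 'bucket_name', Name => 'path_to_image' } },
);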

Sending SMS messages with Amazon SNS and Paws

Paws is a great way to access AWS services in Perl.  Here’s how you can send an SMS:

use Paws;
my $paws = Paws->service('SNS');
$paws->Publish(
   Message => "test",
   PhoneNumber => "+1XXXXXXXXXX",
);

If you get an error message saying

Invalid parameter: PhoneNumber Reason: +1XXXXXXXXXX is not valid to publish to

it may just mean that you aren’t connecting to the right endpoint, as SMS sending through SNS is only available in six regions at the time of this writing.  If your default region isn’t one of them, you can specify the region for this Paws instance like so:

use Paws;
my $paws = Paws->service('SNS', region => 'us-west-2');
$paws->Publish(
   Message => "test",
   PhoneNumber => "+1XXXXXXXXXX",
);
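Publish returns a response object; if you want confirmation that SNS accepted the message, you should be able to read the MessageId off it (a sketch, based on my reading of the Paws response classes):

my $res = $paws->Publish(
   Message => "test",
   PhoneNumber => "+1XXXXXXXXXX",
);
print "sent as ", $res->MessageId, "\n";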

Deduplicating SQS messages using DynamoDB in Perl

Amazon SQS is a massively scalable, highly available message queuing system.  It can handle virtually unlimited amounts of traffic, but it has one particularly awkward issue: duplicate messages.  They are relatively infrequent, but they can have serious consequences, e.g. debiting a bank account a second time.

SQS does provide a solution to this in the form of FIFO queues, but they have some drawbacks.  First, they are limited to about 300 messages per second (which may not be an issue for everyone), but more importantly, they simply aren’t available in many regions.  At the time of this writing they are only operational in US West and US East.
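For comparison, here’s roughly what FIFO-based deduplication looks like, as a sketch: the queue name is made up, and the queue must already have been created as a FIFO queue.  SQS drops any message whose MessageDeduplicationId it has already seen within the last five minutes:

use Paws;
my $sqs = Paws->service('SQS', region => 'us-west-2');
my $queue_url = $sqs->GetQueueUrl(QueueName => 'my-queue-name.fifo')->QueueUrl;
$sqs->SendMessage(
   QueueUrl => $queue_url,
   MessageBody => 'test',
   MessageGroupId => 'jobs',              # required for FIFO queues
   MessageDeduplicationId => 'job-12345', # duplicates of this id are dropped
);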

If you have higher performance requirements or you aren’t lucky enough to be in a region that supports them, you still have options.  One solution is to architect your code such that the operations are idempotent, meaning they can execute multiple times without causing problems.  If the task is something like refreshing a cached web page, you might not have to consider idempotency at all, as an infrequent duplicate refresh of a web page is inconsequential.
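To make the distinction concrete, here’s a hypothetical sketch (the table and attribute names are invented): an overwrite-style write is naturally idempotent, while an arithmetic update is not.

use Paws;
my $dynamodb = Paws->service('DynamoDB', region => 'us-west-2');

# idempotent: running this twice leaves the table in the same state
$dynamodb->PutItem(
   TableName => 'page_cache',
   Item => {
      url  => { S => 'https://example.com/' },
      html => { S => '<html>...</html>' },
   },
);

# NOT idempotent: running this twice debits the account twice
$dynamodb->UpdateItem(
   TableName => 'accounts',
   Key => { account_id => { S => 'abc123' } },
   UpdateExpression => 'ADD balance :debit',
   ExpressionAttributeValues => { ':debit' => { N => '-10' } },
);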

On the other hand, sometimes operations absolutely must never happen twice, or they can be so computationally expensive that avoiding them is critical.  One of the easiest ways to achieve this is to simply assign a GUID to each message and deduplicate on it.  The solution I have used is to combine DynamoDB and SQS.  DynamoDB is also highly scalable, with default capacity limits of 40,000 reads and 40,000 writes per second per table.

To do this in Perl, I’m using Paws.  For clarity, I’m skipping error handling and retry logic.

First, we have to create a table in DynamoDB.  My structure is simple: a string key attribute named job_id and a numeric attribute called taken.
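Here’s a sketch of creating it through Paws (the provisioned throughput numbers are arbitrary).  Note that DynamoDB only declares key attributes up front; taken and any other attributes come into existence when items are written:

use Paws;
my $dynamodb = Paws->service('DynamoDB', region => 'us-west-2');
$dynamodb->CreateTable(
   TableName => 'sqs_job',
   AttributeDefinitions => [
      { AttributeName => 'job_id', AttributeType => 'S' },
   ],
   KeySchema => [
      { AttributeName => 'job_id', KeyType => 'HASH' },
   ],
   ProvisionedThroughput => {
      ReadCapacityUnits  => 5,
      WriteCapacityUnits => 5,
   },
);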

Then, for each job, we store a record in DynamoDB containing the GUID and an attribute indicating that the job hasn’t been taken, and send the SQS message:

use Data::UUID;
use Paws;
use JSON::XS;
# store the job id in DynamoDB
my $job_id = Data::UUID->new->create_str;
my $dynamodb = Paws->service('DynamoDB', region => 'us-west-2');
$dynamodb->PutItem(
   TableName => "sqs_job",
   Item => {
      job_id => {
         S => $job_id,
      },
      taken => {
         N => 0,
      },
      # placeholder for the result; DynamoDB rejects empty string values
      job_result => {
         S => " ",
      },
   },
);

# send the message
my $sqs = Paws->service('SQS', region => 'us-west-2');
my $queue_name = "my-queue-name";
my $queue_url = $sqs->GetQueueUrl(QueueName => $queue_name)->QueueUrl;
$sqs->SendMessage(
   QueueUrl => $queue_url,
   MessageBody => encode_json({ job_id => $job_id }),
);

In the receiver we use a conditional update expression to set taken=1 if and only if taken=0.  If the condition fails, the job has already been handled, so we skip it and avoid duplicate processing:

my $recv = $sqs->ReceiveMessage(
   QueueUrl => $queue_url,
   MaxNumberOfMessages => 1,
   WaitTimeSeconds => $timeout,
);
# $sqs and $dynamodb are the same Paws handles as above;
# Messages may be undef if the queue was empty
foreach my $raw_message (@{ $recv->Messages || [] }) {
   my $message = decode_json($raw_message->Body);
   # ensure that SQS doesn't resend it by deleting it
   $sqs->DeleteMessage(
      QueueUrl => $queue_url,
      ReceiptHandle => $raw_message->ReceiptHandle,
   );
   # get the job_id from the payload (this could be done with a message attribute instead)
   my $job_id = $message->{job_id};
   eval {
      $dynamodb->UpdateItem(
         TableName => "sqs_job",
         Key => {
            job_id => {
               S => $job_id,
            },
         },
         ConditionExpression => "taken = :zero", # only update if taken=0
         UpdateExpression => "SET taken = :one", # set taken=1
         ExpressionAttributeValues => {
            ":zero" => {
               N => 0,
            },
            ":one" => {
               N => 1,
            },
         },
      );
   };
   # "The conditional request failed" is the error DynamoDB returns when a
   # condition expression fails; it means the job was already handled, so skip it
   if ($@ =~ /The conditional request failed/) {
      next;
   }
   # handle the message
}

And that’s it.  You’ll want to add more error handling and retry logic for this to be reliable in production, though.
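As a starting point, a retry wrapper along these lines could handle transient failures (with_retries is my own invention, not part of Paws):

sub with_retries {
   my ($code, $tries) = @_;
   $tries //= 3;
   for my $attempt (1 .. $tries) {
      my $result = eval { $code->() };
      return $result unless $@;
      die $@ if $attempt == $tries; # out of attempts, rethrow
      sleep 2 ** $attempt;          # exponential backoff
   }
}

# usage: with_retries(sub { $dynamodb->PutItem(...) });

Note that a failed ConditionExpression should not be retried; that failure is the deduplication working as intended.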

CPAN PR Challenge – Text::BibTeX

This month I ended up with Text::BibTeX.  Conveniently, this distro came with an actual bug that needed fixing.

This, however, was not a Perl problem at all but rather a C problem.  The module was causing an actual crash with this stack trace:

/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f9f39cc5f37]
/lib/x86_64-linux-gnu/libc.so.6(+0xebdf0)[0x7f9f39cc4df0]
/lib/x86_64-linux-gnu/libc.so.6(+0xead37)[0x7f9f39cc3d37]
/usr/lib/libbtparse.so.1(zzFAIL+0xe4)[0x7f9f38eabdc4]
/usr/lib/libbtparse.so.1(body+0xdf)[0x7f9f38eab56f]
/usr/lib/libbtparse.so.1(entry+0x1ea)[0x7f9f38eab98a]
/usr/lib/libbtparse.so.1(bt_parse_entry+0x100)[0x7f9f38ea9d40]
/usr/lib/perl5/auto/Text/BibTeX/BibTeX.so(XS_Text__BibTeX__Entry__parse+0x135)[0x7f9f390c1a45]
/usr/lib/libperl.so.5.14(Perl_pp_entersub+0x58c)[0x7f9f3a6ba3cc]
/usr/lib/libperl.so.5.14(Perl_runops_standard+0x16)[0x7f9f3a6b19a6]
/usr/lib/libperl.so.5.14(perl_run+0x3a5)[0x7f9f3a6535b5]
/usr/bin/perl(main+0x149)[0x400f89]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f9f39bf7ead]
/usr/bin/perl[0x400fc1]

I don’t actually know anything about BibTeX, so the first thing I did was strip the problem data down to the smallest input that would still break, and turn that into a test case.  As it turned out, the smallest input was about 2000 bytes of any text inside a comment that wasn’t closed.  This was a bit of a clue:

./btparse/pccts/antlr.h:74:#define ZZLEXBUFSIZE 2000
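My reduced test case looked something like the following sketch (parse_s is Text::BibTeX’s parse-from-string method):

use Text::BibTeX;
my $entry = Text::BibTeX::Entry->new;
# ~2000 bytes of anything inside an unterminated comment overflows the lexer buffer
$entry->parse_s('@comment{' . ('x' x 2000));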

The culprit in this case was an improper use of strncat() in this block:

#ifdef LL_K
static char text[LL_K*ZZLEXBUFSIZE+1];
SetWordType *f[LL_K];
#else
static char text[ZZLEXBUFSIZE+1];
SetWordType *f[1];
#endif
SetWordType **miss_set;
char **miss_text;
int *bad_tok;
char **bad_text;
int *err_k;
int i;
va_list ap;
#ifndef __USE_PROTOS
int k;
#endif
#ifdef __USE_PROTOS
va_start(ap, k);
#else
va_start(ap);
k = va_arg(ap, int); /* how many lookahead sets? */
#endif
text[0] = '\0';
for (i=1; i<=k; i++) /* collect all lookahead sets */
{
    f[i-1] = va_arg(ap, SetWordType *);
}
for (i=1; i<=k; i++) /* look for offending token */
{
#ifdef LL_K
    int freeSpace = (LL_K*ZZLEXBUFSIZE+1) - strlen(text);
#else
    int freeSpace = (ZZLEXBUFSIZE+1) - strlen(text);
#endif
    if ( i>1 ) strcat(text, " ");
    strncat(text, LATEXT(i), freeSpace);
    if ( !zzset_el((unsigned)LA(i), f[i-1]) ) break;
}

strncat(a, b, n) looks like it safely avoids overflowing the destination buffer, but in fact, according to its specification:

 If src contains n or more bytes, strncat() writes n+1 bytes to dest (n from src plus the terminating null byte). Therefore, the size of dest must be at least strlen(dest)+n+1.

So to fix it, we just allocate one extra byte:

#ifdef LL_K
static char text[LL_K*ZZLEXBUFSIZE+1+1]; /* extra byte for the terminating NUL strncat writes beyond n */
SetWordType *f[LL_K];
#else
static char text[ZZLEXBUFSIZE+1+1]; /* extra byte for the terminating NUL strncat writes beyond n */
SetWordType *f[1];
#endif