Replicating the SQL exists statement behavior for DynamoDB
In SQL, checking for the existence of a record is a trivial operation:
SELECT 1 FROM posts WHERE id = '123' LIMIT 1;
DynamoDB has no direct equivalent. A standard GetItem on a 50 KB item reads the entire 50 KB, even if you only need to know whether it exists.
In this article, we explore efficient strategies for implementing an exists check using the AWS SDK for Java v2.
The naive approach vs. projections
One option is to use a ProjectionExpression to retrieve only the partition key instead of the full item. This reduces network transfer and deserialization overhead.
public boolean existsByProjection(String subreddit, String id) {
GetItemRequest request = GetItemRequest.builder()
.tableName("posts")
.key(buildKey(subreddit, id))
.projectionExpression("subreddit") // Request only the key, not the full item
.build();
GetItemResponse response = dynamoDbClient.getItem(request);
return response.hasItem();
}
However, this approach does not reduce cost because DynamoDB calculates consumed RCUs based on the total size of the stored item, regardless of which attributes are projected.
Halving the bill: eventually consistent reads
A second optimization is to use eventually consistent reads (which is the default read mode).
Eventually consistent reads may exhibit a replication delay of up to one second under normal conditions. This read mode consumes 1 RCU for items up to 4 KB.
GetItemRequest.builder()
.key(key)
.projectionExpression("id")
.consistentRead(false)
.build();
Strongly consistent reads (consistentRead(true)) should be used when your use case strictly requires the most up-to-date data. This mode consumes 2 RCUs for items up to 4 KB.
The Batch challenge (BatchGetItem)
To check multiple items at once, you can use BatchGetItem instead of making individual sequential calls.
Two behaviors must be handled:
- Missing items are simply omitted from the response; they are not returned as
null. - Unprocessed keys: Under heavy load, the response may be partial.
Implementation with a retry mechanism:
public Map<String, Boolean> batchExists(String subreddit, List<String> ids) {
// 1. Prepare keys
List<Map<String, AttributeValue>> keys = ids.stream()
.map(id -> buildKey(subreddit, id))
.toList();
Map<String, KeysAndAttributes> requestItems = new HashMap<>();
requestItems.put("posts", KeysAndAttributes.builder()
.keys(keys)
.projectionExpression("id") // Only the key
.build());
BatchGetItemRequest request = BatchGetItemRequest.builder()
.requestItems(requestItems)
.returnConsumedCapacity(ReturnConsumedCapacity.INDEXES)
.build();
Map<String, Boolean> result = new HashMap<>();
ids.forEach(id -> result.put(id, false)); // Default to false
int attempts = 0;
// 2. Retry unprocessed keys
do {
attempts++;
BatchGetItemResponse response = dynamoDbClient.batchGetItem(request);
// Found items
var foundItems = response.responses().getOrDefault("posts", List.of());
foundItems.forEach(item -> result.put(item.get("id").s(), true));
// Unprocessed keys
if (response.hasUnprocessedKeys() && !response.unprocessedKeys().isEmpty()) {
request = request.toBuilder()
.requestItems(response.unprocessedKeys())
.build();
// Exponential backoff strategy
backoff(attempts);
} else {
break; // Done processing
}
} while (attempts < 5);
return result;
}
The backoff helper applies an exponential backoff strategy to avoid overloading DynamoDB:
private static void backoff(int attempt) {
try {
TimeUnit.MILLISECONDS.sleep((long) Math.pow(2, attempt) * 100);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
Checking complex attributes
Beyond simple item existence, you may need to verify whether a specific attribute contains data (e.g., a non-empty list).
Using GetItem
Retrieving only the target attribute via GetItem is more direct and efficient than using a Query with a FilterExpression, because DynamoDB locates the item by its primary key (an O(1) operation) and then returns only the requested attributes when a ProjectionExpression is used. RCU cost is still calculated based on the full item size, but you avoid unnecessary network transfer and deserialization of unused attributes.
public boolean hasKeywordsByGetItem(String subreddit, String id) {
GetItemRequest request = GetItemRequest.builder()
.tableName("posts")
.key(buildKey(subreddit, id))
.projectionExpression("keywords")
.consistentRead(false)
.build();
GetItemResponse response = dynamoDbClient.getItem(request);
if (!response.hasItem()) {
return false;
}
AttributeValue keywords = response.item().get("keywords");
// Check if the attribute exists and is non-empty
return keywords != null && keywords.hasL() && !keywords.l().isEmpty();
}
Using Query with FilterExpression
The Query approach remains useful when you need to retrieve a set of items sharing the same partition key and then refine the results server-side. DynamoDB first reads all items matching the partition/sort key condition, consumes RCUs for each of them, and only then applies the filter in memory. Items discarded by the filter are still billed.
public boolean hasKeywords(String subreddit, String id) {
Map<String, AttributeValue> values = new HashMap<>();
values.put(":subVal", AttributeValue.fromS(subreddit));
values.put(":idVal", AttributeValue.fromS(id));
values.put(":zero", AttributeValue.fromN("0"));
QueryRequest request = QueryRequest.builder()
.tableName("posts")
.keyConditionExpression("subreddit = :subVal AND id = :idVal")
.filterExpression("size(keywords) > :zero")
.expressionAttributeValues(values)
.projectionExpression("id")
.limit(1)
.build();
QueryResponse response = dynamoDbClient.query(request);
return response.count() > 0;
}
GetItem should be preferred for single-item checks: it is simpler, avoids query parsing overhead, and makes the intent explicit.
Performance summary
| Technique | Reduces Bandwidth? | Reduces Cost (RCU)? | Ideal Use Case |
|---|---|---|---|
| Full GetItem | No | No | Retrieving actual data |
| Projection | Yes | No | Large items, latency reduction, existence checks (hasItem()) |
| Eventually Consistent | No | Yes (-50%) | Standard existence checks |
| GSI (Keys Only) | Yes | Yes (-90%) | Very large items (>40 KB) |
For items larger than 40 KB, a
KEYS_ONLYGSI costs 0.5 RCU per read, regardless of the item size in the base table.
Conclusion
For a standard existence check:
projectionExpression: Reduces network transfer.consistentRead(false): Halves RCU cost.- Handle
UnprocessedKeysin batch operations. - For items > 40 KB, consider a GSI in KEYS_ONLY to reduce the cost by 90%.
References
- DynamoDB GetItem API Reference
- DynamoDB BatchGetItem API Reference
- DynamoDB Read Consistency
- DynamoDB Projection Expressions
Demo
A demonstration of the concepts covered in this article is available here: micronaut-java-dynamodb-exists