Amazon CloudSearch Cheat Sheet

1. What is Amazon CloudSearch?

  • A fully managed cloud service for creating, managing, and scaling search solutions.
  • Simplifies search implementation, eliminating the need for search expertise and hardware management.
  • Automatically scales to accommodate changes in data volume and traffic.

2. How Search Works:

  • Data Collection: Data to be searched includes unstructured full-text documents, semi-structured documents (e.g., XML), or structured data.
  • Document Representation: Each searchable item is a document with a unique ID and fields containing data.
  • Index Creation: Data is uploaded as JSON or XML batches, and Amazon CloudSearch generates a search index based on domain configuration.
  • Querying: Queries are submitted to the index to find matching documents.
  • Continuous Updates: Updates are applied to add, modify, or delete documents as data changes.
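
As a rough illustration of the document and batch model described above, the following Python sketch builds a small JSON batch with one add and one delete operation. The type/id/fields layout follows the CloudSearch JSON batch format; the helper names, document IDs, and field names (title, year) are made up for the example.

```python
import json

def add_op(doc_id, fields):
    """Build an 'add' operation: adds or replaces the document with this ID."""
    return {"type": "add", "id": doc_id, "fields": fields}

def delete_op(doc_id):
    """Build a 'delete' operation: removes the document with this ID."""
    return {"type": "delete", "id": doc_id}

def build_batch(operations):
    """Serialize a list of operations into a UTF-8 JSON batch (a JSON array)."""
    return json.dumps(operations).encode("utf-8")

# Hypothetical documents; field names must match the domain's index fields.
batch = build_batch([
    add_op("movie_1", {"title": "Example Movie", "year": 2012}),
    delete_op("movie_2"),
])
print(len(batch), "bytes")
```

The resulting bytes would then be sent to the domain's document endpoint through an AWS SDK or the CLI; that step is left out here because it needs a live domain.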

3. Scaling in Amazon CloudSearch:

  • Search Instances: A domain consists of one or more search instances, each with a finite amount of RAM and CPU for indexing data and processing requests.
  • Scaling Factors: The number of search instances required depends on the size of the data and on the volume and complexity of search requests.
  • Automatic Scaling: CloudSearch dynamically selects instance sizes and counts for optimal performance.
  • Initial Instance Type: CloudSearch chooses an initial instance type when data is first uploaded and the index is configured.
  • Scaling Up for Data: When data exceeds the capacity of the current instance, CloudSearch scales up to a larger instance type; once the largest instance type is exceeded, it partitions the index across multiple instances.
  • Scaling Down for Data: When data volume shrinks, CloudSearch scales down to fewer partitions or a smaller instance type to reduce cost.
  • Scaling for Traffic: When request volume grows, CloudSearch deploys duplicate search instances for additional processing power, which increases the domain's depth.
  • Scaling Down for Traffic: When traffic decreases, CloudSearch removes duplicate instances to minimize costs.
  • Sudden Traffic Surges: Additional instances are deployed, but some time is needed to set them up, possibly leading to temporary 5xx errors.
  • Search Request Impact: The type and complexity of search requests can affect performance and required search instances.
  • Handling 5xx Errors: The 5xx errors seen during a traffic surge are temporary, so clients should be prepared to retry failed requests until the new instances are operational.
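
One common way for a client to ride out the temporary 5xx errors mentioned above is to retry with exponential backoff and jitter. The sketch below is a generic pattern, not something specific to CloudSearch: ServerError is a hypothetical exception standing in for a 5xx response, and the attempt count and delays are arbitrary defaults.

```python
import random
import time

class ServerError(Exception):
    """Hypothetical exception standing in for an HTTP 5xx response."""

def with_backoff(request, max_attempts=5, base_delay=0.5, max_delay=8.0,
                 sleep=time.sleep):
    """Call request() and retry on ServerError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ServerError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt and is capped;
            # random jitter spreads out retries from many clients.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

A caller would wrap whatever function issues the search or upload request, for example with_backoff(lambda: run_search("star wars")), where run_search is a placeholder for the application's own request code that raises ServerError on a 5xx response.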

4. Understanding Amazon CloudSearch Limits:

  • Batch Size: Maximum batch size is 5 MB.
  • Data Loading Volume: One document batch every 10 seconds, up to 10,000 batches in 24 hours (each batch up to 5 MB).
  • Document Size: Maximum document size is 1 MB.
  • Document Fields: Up to 200 fields per document.
  • Expressions: Up to 50 expressions can be configured for a domain, with a max size of 10240 bytes each.
  • Highlighting: Up to 5 occurrences of search terms can be highlighted; only the first 10 KB of data in a text field are highlighted.
  • Index Fields: Up to 200 index fields allowed per domain. A dynamic field counts as one index field but can match many document fields, so heavy use of dynamic fields can affect query performance.
  • Naming Conventions: Domain names, field names, expression names, and document IDs have specific naming rules.
  • Policy Document Size: Maximum size of an Amazon CloudSearch policy document is 100 KB.
  • Region Restriction: The ap-northeast-2 region supports only m4 instance types.
  • Relevance Score: A document’s text relevance score (_score) is a positive floating-point value.
  • Search Domains: Each AWS account can create up to 100 search domains.
  • Search Partitions: A search index can be split across a maximum of 10 partitions.
  • Search Replicas: Each search partition can have up to 5 replicas (doubles with Multi-AZ).
  • Search Requests: Various limits apply to search query clauses, request size, facet values, size parameter, sort parameter, and more.
  • Suggesters: Up to 10 suggesters per domain; suggestions generated from the first 512 bytes of a text field.
  • Synonym Dictionary Size: The maximum size of an Amazon CloudSearch synonym dictionary is 100 KB.
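
To show how the batch, document, and field limits above might be enforced on the client side, here is a minimal Python sketch that groups operations into batches under a size cap. It assumes 1 MB means 1024 * 1024 bytes and that document size is measured on the JSON-encoded operation; both are assumptions, and the function name is illustrative.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # 5 MB per batch (assumed binary megabytes)
MAX_DOC_BYTES = 1 * 1024 * 1024    # 1 MB per document (assumed, encoded size)
MAX_FIELDS = 200                   # fields per document

def split_into_batches(operations, max_batch_bytes=MAX_BATCH_BYTES):
    """Group operations into JSON-array batches that stay under the size cap."""
    batches, current, current_size = [], [], 2  # 2 bytes for the enclosing []
    for op in operations:
        encoded = json.dumps(op).encode("utf-8")
        if len(encoded) > MAX_DOC_BYTES or len(encoded) + 2 > max_batch_bytes:
            raise ValueError(f"document {op.get('id')!r} is too large")
        if len(op.get("fields", {})) > MAX_FIELDS:
            raise ValueError(f"document {op.get('id')!r} has too many fields")
        extra = len(encoded) + (1 if current else 0)  # +1 for the comma
        if current and current_size + extra > max_batch_bytes:
            # Close the current batch and start a new one with this operation.
            batches.append(b"[" + b",".join(current) + b"]")
            current, current_size = [], 2
            extra = len(encoded)
        current.append(encoded)
        current_size += extra
    if current:
        batches.append(b"[" + b",".join(current) + b"]")
    return batches
```

A caller would also need to pace uploads to respect the one-batch-every-10-seconds loading limit; that pacing is not shown here.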

Note: Stay within these limits to keep Amazon CloudSearch performing well and to avoid operational issues. To request a limit increase, contact Amazon CloudSearch support.

Always refer to the latest Amazon CloudSearch documentation for the most up-to-date information and best practices.
