Elasticsearch on AWS

What is Elasticsearch?

  • Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene.
  • Since its release in 2010, Elasticsearch has become one of the most widely used search engines, and is commonly applied to log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases.

How does Elasticsearch work?

  • You can send data in the form of JSON documents to Elasticsearch using the API or ingestion tools such as Logstash and Amazon Kinesis Firehose.
  • Elasticsearch automatically stores the original document and adds a searchable reference to the document in the cluster’s index. You can then search and retrieve the document using the Elasticsearch API.
  • You can also use Kibana, an open-source visualization tool, with Elasticsearch to visualize your data and build interactive dashboards.
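
The "searchable reference" described above can be pictured as an inverted index, the structure Lucene is built around. The following is a toy sketch only, not how Elasticsearch is actually implemented: the function names, the simple word tokenizer, and the second sample document are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Toy illustration only. Real Elasticsearch/Lucene indexing involves analyzers,
# segments, and relevance scoring. All names here are hypothetical.
documents = {}                     # document id -> original document
inverted_index = defaultdict(set)  # term -> ids of documents containing that term


def index_document(doc_id, doc):
    """Store the original document and add each of its terms to the index."""
    documents[doc_id] = doc
    for value in doc.values():
        for term in re.findall(r"\w+", str(value).lower()):
            inverted_index[term].add(doc_id)


def search(term):
    """Return the stored documents that contain the given term."""
    doc_ids = inverted_index.get(term.lower(), ())
    return [documents[doc_id] for doc_id in sorted(doc_ids)]


index_document(1, {"userName": "Nguyen Van A", "email": "test@gmail.com"})
index_document(2, {"userName": "Tran Thi B", "email": "b@example.org"})  # made-up sample

hits = search("nguyen")
print(hits)
```

Looking up a term is a dictionary access rather than a scan over every document, which is the basic reason index-based search stays fast as the data grows.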

Is Elasticsearch free?

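  • Yes, Elasticsearch is free to download and use. Versions up to 7.10 are open source under the Apache 2.0 license; later releases use different license terms.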
  • You can run Elasticsearch on-premises, on Amazon EC2, or on Amazon Elasticsearch Service.
  • With on-premises or Amazon EC2 deployments, you are responsible for installing Elasticsearch and other necessary software, provisioning infrastructure, and managing the cluster.
  • Amazon Elasticsearch Service, on the other hand, is a fully managed service, so you don’t have to worry about time-consuming cluster management tasks such as hardware provisioning, software patching, failure recovery, backups, and monitoring.

Elasticsearch benefits

  • HIGH PERFORMANCE
    • The distributed nature of Elasticsearch enables it to process large volumes of data in parallel, quickly finding the best matches for your queries.
  • EASY APPLICATION DEVELOPMENT
    • Elasticsearch provides client support for many languages and runtimes, including Java, Python, PHP, JavaScript (Node.js), Ruby, and many more.
  • NEAR REAL-TIME OPERATIONS
    • Elasticsearch operations such as reading or writing data usually take less than a second to complete.
    • This lets you use Elasticsearch for near real-time use cases such as application monitoring and anomaly detection.
  • COMPLEMENTARY TOOLING AND PLUGINS
    • Elasticsearch comes integrated with Kibana, a popular visualization and reporting tool.
    • It also offers integration with Beats and Logstash, which enable you to easily transform source data and load it into your Elasticsearch cluster.
    • You can also use a number of open-source Elasticsearch plugins such as language analyzers and suggesters to add rich functionality to your applications.

Getting started with Elasticsearch on AWS

  • Step 1: Create an Amazon Elasticsearch Service (Amazon ES) domain

    1. Go to https://aws.amazon.com and choose Sign In to the Console.
    2. Under Analytics, choose Elasticsearch Service.
    3. Choose Create a new domain.
    4. For the deployment type, choose Development and testing.
    5. For the Elasticsearch version, choose the latest version and then choose Next.
    6. Provide a name for the domain.
    7. Under Data nodes, choose the t3.small.elasticsearch instance type and keep the default value for Number of nodes.
    8. Ignore the rest of the settings and choose Next.
    9. For the Network configuration, choose Public access.
    10. For the Fine-grained access control, uncheck Enable fine-grained access control.
    11. For the Domain access policy, choose Custom access policy. Enter the IP address or IAM ARN that should be allowed to access the domain.
    12. Ignore the rest of the settings and choose Next.
    13. Ignore the tags option and choose Next.
    14. Confirm your domain configuration and choose Confirm. New domains typically take 15–30 minutes to initialize, but can take longer depending on the configuration. After your domain initializes, make note of its endpoint.
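
For the custom access policy in step 11, it can help to see what an IP-based policy document roughly looks like. The sketch below builds one in Python and prints it as JSON. The Region, account ID, domain name, and IP range are placeholders, and the helper name is hypothetical; the policy that the console generates for your domain may differ in detail.

```python
import json

# Placeholder values: replace with your own Region, account ID, domain name,
# and the IP address or range that should be allowed.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
DOMAIN_NAME = "my-domain"
ALLOWED_IP = "192.0.2.0/24"  # reserved documentation range, not a real client


def build_ip_access_policy(region, account_id, domain_name, allowed_ip):
    """Return a resource-based policy that allows HTTP access from one IP range."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*",
                # Only requests from this source IP range match the statement.
                "Condition": {"IpAddress": {"aws:SourceIp": [allowed_ip]}},
            }
        ],
    }


policy = build_ip_access_policy(REGION, ACCOUNT_ID, DOMAIN_NAME, ALLOWED_IP)
print(json.dumps(policy, indent=2))
```

Restricting by source IP keeps the curl commands in the later steps working without request signing, while still limiting access on a publicly reachable domain.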
  • Step 2: Upload data to Amazon ES

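    • Option 1: Upload a single document
    • Use the command line (replace domain-endpoint with the endpoint you noted in Step 1):
    • curl -XPUT 'domain-endpoint/users/_doc/1' -d '{"id": 1, "userName": "Nguyen Van A", "email": "test@gmail.com"}' -H 'Content-Type: application/json'
    • Use Postman: send a PUT request to the same URL with the JSON document as the raw request body and the Content-Type header set to application/json.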
    • Option 2: Upload multiple documents from a file. Create a file named users.json with the following content. Each action line is followed by its document on the next line, and the file must end with a newline:
    • { "index" : { "_index": "users", "_id" : "2" } }
      {"id": 2, "userName": "Nguyen Van B", "email": "test+1@gmail.com"}
      { "index" : { "_index": "users", "_id" : "3" } }
      {"id": 3, "userName": "Nguyen Van C", "email": "test+2@gmail.com"}
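    • Use the command line:
    • curl -XPOST 'domain-endpoint/_bulk' --data-binary @users.json -H 'Content-Type: application/json'
    • Use Postman: send a POST request to the same URL with the contents of users.json as the raw request body and the Content-Type header set to application/json.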
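
If the documents come from a program rather than a hand-written file, the bulk payload can be generated instead. The sketch below is one possible approach, using only the Python standard library: it builds the newline-delimited body for the two sample users and prepares, but does not send, the matching POST request. `DOMAIN_ENDPOINT` and `build_bulk_body` are hypothetical names, and the endpoint value is a placeholder.

```python
import json
import urllib.request

DOMAIN_ENDPOINT = "https://domain-endpoint"  # placeholder: use your domain's endpoint

users = [
    {"id": 2, "userName": "Nguyen Van B", "email": "test+1@gmail.com"},
    {"id": 3, "userName": "Nguyen Van C", "email": "test+2@gmail.com"},
]


def build_bulk_body(index_name, docs):
    """Return a bulk payload: one action line, then one document line, per document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": str(doc["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("users", users)

# Prepare the same request that the curl command sends.
bulk_request = urllib.request.Request(
    url=f"{DOMAIN_ENDPOINT}/_bulk",
    data=bulk_body.encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(bulk_request) would send it; omitted so the sketch runs offline.

print(bulk_body, end="")
```

Writing `bulk_body` to users.json would produce a file equivalent to the one shown above, ready for the curl command.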
  • Step 3: Search documents in Amazon ES

    • Search the users index for the term “gmail.com”
    • Use the command line:
    • curl -XGET 'domain-endpoint/users/_search?q=gmail.com&pretty=true'
    • Use Postman: send a GET request to the same URL.
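
The search request above can also be assembled in code. The sketch below builds the same URL and pulls the matching documents out of a search response. The helper names and `DOMAIN_ENDPOINT` are hypothetical, and `sample_response` is a hand-written example trimmed to the fields used here, not output captured from a real domain.

```python
from urllib.parse import urlencode

DOMAIN_ENDPOINT = "https://domain-endpoint"  # placeholder: use your domain's endpoint


def build_search_url(endpoint, index_name, term):
    """Build a URI search request for one index, mirroring the curl command."""
    query = urlencode({"q": term, "pretty": "true"})
    return f"{endpoint}/{index_name}/_search?{query}"


def extract_sources(response):
    """Return the original documents (_source) from a search response body."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


search_url = build_search_url(DOMAIN_ENDPOINT, "users", "gmail.com")

# Hand-written sample shaped like a search response; real responses carry
# more fields (took, _shards, scores, totals).
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "users",
                "_id": "1",
                "_source": {"id": 1, "userName": "Nguyen Van A", "email": "test@gmail.com"},
            }
        ]
    }
}

matches = extract_sources(sample_response)
print(search_url)
print(matches)
```

Each hit wraps the stored document in `_source`, so extracting that field returns the JSON exactly as it was uploaded in Step 2.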
  • Step 4: Delete an Amazon ES domain

    1. Sign in to the Amazon Elasticsearch Service console.
    2. Under My domains, select the domain you want to delete.
    3. Choose Actions and then choose Delete domain.

Reference Document:

https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-gsg.html