99 Problems, But
The Search Ain’t One
Andrei Zmievski • PHP UK • Feb 25, 2011
who am I?
 curl http://localhost:9200/speaker/info/andrei


{"name":       "Andrei Zmievski",
 "projects":   ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
 "likes":      ["coding", "beer", "brewing", "photography"],
 "twitter":    "@a",
 "email":      "andrei@zmievski.org"}
what is elasticsearch?

a search engine for the NoSQL generation

  domain-driven

  distributed

  RESTful

  Hitchhiker’s Guide to the Galaxy (no, really)
document model


document-oriented

JSON-based

schema-free
engine


based on Lucene

multi-tenancy

distributed, out of the box
nomenclature

index

type

document

  _id

node
3 easy steps
1. index
request

           curl -XPOST http://localhost:9200/conf/speaker/1 -d'
           {
               "name": "Andrei Zmievski",
               "talk": "99 Problems, but the Search Ain'\''t One",
               "likes": ["coding", "beer", "photography"],
               "twitter": "a",
               "height": 1…
           }'

response

           {
               "ok": true,
               "_index": "conf",
               "_type": "speaker",
               "_id": "1"
           }
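A minimal Python sketch of the same index request, using only the standard library. The helper name `build_index_request` is hypothetical and the document is a trimmed version of the one on the slide. The request is only built, not sent, so no running node is assumed.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP request that indexes one document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Trimmed version of the speaker document from the slide.
speaker = {
    "name": "Andrei Zmievski",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
}

request = build_index_request("http://localhost:9200", "conf", "speaker", 1, speaker)
print(request.get_method(), request.full_url)
# To actually send it against a running node: urllib.request.urlopen(request)
```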
2. search
request

           curl http://localhost:9200/conf/speaker/_search?q=beer

response

           { "took" : …,                                the execution time
             "_shards" : {
               "total" : 1,
               "successful" : 1,
               "failed" : 0
             },
             "hits" : {
               "total" : 1,                             total number of hits
               "max_score" : 0.…,
               "hits" : [ {
                 "_index" : "conf",                     the index of the doc
                 "_type" : "speaker",                   the type of the doc
                 "_id" : "2",                           the id of the doc
                 "_score" : 0.…,                        the hit score
                 "_source" :                            the original source
                 {
                   "name": "Andrei Zmievski",
                   "talk": "99 Problems, but the Search Ain't One",
                   "likes": ["coding", "beer", "photography"],
                   "twitter": "a",
                   "height": 1…
                 }
               } ] } }
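Reading such a response in Python might look like the sketch below. The payload is a trimmed, hand-written response in the shape shown on the slide; the `took` and score values are made-up placeholders, and `summarize` is a hypothetical helper.

```python
import json

# Hand-written response in the shape of the slide; numeric values are placeholders.
raw = """
{"took": 3,
 "hits": {"total": 1, "max_score": 0.5,
          "hits": [{"_index": "conf", "_type": "speaker", "_id": "2",
                    "_score": 0.5,
                    "_source": {"name": "Andrei Zmievski", "twitter": "a"}}]}}
"""


def summarize(response):
    """Return the total hit count and one (id, score, name) row per hit."""
    hits = response["hits"]
    rows = [(h["_id"], h["_score"], h["_source"]["name"]) for h in hits["hits"]]
    return hits["total"], rows


total, rows = summarize(json.loads(raw))
print(total, rows)
```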
3. profit


that’s up to you
demo
distributed model


provides:

  performance

  resiliency (high-availability)
shards
a portion of the document space

each one is a separate Lucene index

  thus, many per-index settings are available

document is sharded by its _id value

  but can be assigned (routed) to a shard
  deterministically
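Hash-based shard selection can be sketched as follows. This is an illustration only: `zlib.crc32` is a stand-in chosen because it is stable across runs, not the hash function elasticsearch uses, and `pick_shard` / `shard_for` are hypothetical names.

```python
import zlib


def pick_shard(routing_key, number_of_shards):
    """Map a routing key to a shard number in [0, number_of_shards)."""
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % number_of_shards


def shard_for(doc_id, number_of_shards, routing=None):
    """Default: hash the _id. With an explicit routing value, hash that instead."""
    key = routing if routing is not None else doc_id
    return pick_shard(key, number_of_shards)


# Two documents with different ids but the same routing value share a shard.
a = shard_for("tweet-1", 5, routing="andrei")
b = shard_for("tweet-2", 5, routing="andrei")
print(a == b)
```

Because the shard is a function of the key and the shard count, changing the number of shards would move documents, which is one reason a routing value is useful for keeping related documents together.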
zero-conf discovery


zen (multicast and unicast)

cloud (EC2 via API)
auto-routing

master node:

  maintains cluster state

  reassigns shards if nodes leave/join cluster

any node can serve as the request router

the query is handled via scatter-gather mechanism
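A toy sketch of scatter-gather, assuming each shard is a plain dict of documents and that the score is just a term count (a stand-in for Lucene scoring). The function names are hypothetical. Each shard returns its own top hits, and the routing node merges them into a global top list.

```python
import heapq


def search_shard(shard_docs, term, size):
    """Scatter step: score the docs of one shard and return its best `size` hits."""
    scored = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term)
        if score:
            scored.append((score, doc_id))
    return heapq.nlargest(size, scored)


def scatter_gather(shards, term, size=10):
    """Gather step: merge the per-shard results into one global top list."""
    partial = [hit for shard in shards for hit in search_shard(shard, term, size)]
    return heapq.nlargest(size, partial)


shards = [
    {"1": "thomas likes beer", "3": "beer beer beer"},
    {"2": "thomas brews beer and drinks beer"},
]
top = scatter_gather(shards, "beer", size=2)
print(top)
```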
replicas

each shard can have 1 or more replicas

# of replicas can be updated dynamically after
index creation

replicas can be used for querying in parallel
shard allocation
               node 1




       start with a single node
shard allocation
                node 1
                 person1
                 person2




      PUT /person {
         "index": {
            "number_of_shards": 2,
            "number_of_replicas": 1
      }}
shard allocation
       node 1          node 2
       person1         person1
       person2         person2




        start the second node
shard allocation
node 1    node 2         node 3   node 4
person1   person1
person2   person2




            start 2 more nodes
shard allocation
node 1    node 2         node 3    node 4
person1                  person1
          person2                  person2




            start 2 more nodes
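The allocation walkthrough can be imitated with a small round-robin sketch. This is an assumption-laden illustration, not the cluster's real allocator: `allocate` is a hypothetical function, and the only rule it enforces is that two copies of the same shard never share a node. The exact placement it produces for four nodes differs from the slide, but each node still ends up with one shard copy.

```python
def allocate(index, number_of_shards, number_of_replicas, nodes):
    """Spread shard copies over nodes; copies of one shard never share a node."""
    placement = {node: [] for node in nodes}
    unassigned = []
    cursor = 0
    for shard in range(1, number_of_shards + 1):
        name = f"{index}{shard}"
        for copy in range(number_of_replicas + 1):
            if copy < len(nodes):
                node = nodes[(cursor + copy) % len(nodes)]
                placement[node].append(name)
            else:
                unassigned.append(name)  # no free node for this copy yet
        cursor += number_of_replicas + 1
    return placement, unassigned


for count in (1, 2, 4):
    nodes = [f"node {n}" for n in range(1, count + 1)]
    print(count, allocate("person", 2, 1, nodes))
```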
document sharding
node 1    node 2         node 3     node 4
person1                   person1
          person2                   person2

            PUT /person/info/1
            {…}

  hashed to shard 1
  replicated

            PUT /person/info/2
            {…}

  hashed to shard 2
  replicated
scatter-gather
node 1          node 2        node 3          node 4
person1                        person1
                person2                       person2




          GET /person/_search?q=name:thomas
transactional model

per-document consistency

no need to commit/flush

uses write-behind transaction log

write consistency (W) can be controlled

  one, quorum, or all
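The three write consistency levels can be read as "how many shard copies must be available before a write proceeds". The sketch below assumes quorum means a simple majority of all copies (primary plus replicas); the real implementation may treat small replica counts differently, and `required_copies` is a hypothetical name.

```python
def required_copies(consistency, number_of_replicas):
    """Number of shard copies that must be available for a write to proceed."""
    copies = number_of_replicas + 1  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "quorum":
        return copies // 2 + 1  # assumed: simple majority
    if consistency == "all":
        return copies
    raise ValueError(f"unknown consistency level: {consistency}")


for level in ("one", "quorum", "all"):
    print(level, required_copies(level, number_of_replicas=2))
```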
(near) real-time search


1 second refresh rate by default

_refresh API also
index storage

node data considered transient

can be stored in local file system, JVM heap,
native OS memory, or FS & memory combination

persistent storage requires a gateway
gateways
persistent store for cluster state and indices

asynchronous, translog-based write strategy

allows full recovery if a cluster restart is needed

supported gateways:
  local
  shared FS
  Hadoop via HDFS
  S3
mapping
describes document structure to the search
engine

automatically created with sensible defaults

explicit mapping can be provided (generally, a
good idea)

can run into merge conflicts
mapping

important meta fields:

  _source

  _all

  _boost
mapping types

simple:

  string, integer/long, float/double, boolean, and
  null

complex:

  array, object
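How a default mapping could be guessed from a JSON document is sketched below. This is a rough illustration of the idea, not elasticsearch's actual detection logic; the type names follow the slide, and `infer_type` / `infer_mapping` are hypothetical helpers.

```python
def infer_type(value):
    """Guess a mapping type name for one decoded JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def infer_mapping(document):
    """Map every top-level field of a document to its guessed type."""
    return {field: infer_type(value) for field, value in document.items()}


doc = {"user": "derick", "tags": ["profiling", "php"], "priority": 2, "draft": False}
print(infer_mapping(doc))
```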
sample mapping
document

           {"user":      "derick",
            "title":     "Don't Panic",
            "tags":      ["profiling", "debugging", "php"],
            "postDate":  "2010-12-22T1…:1…:12",
            "priority":  2}

mapping

           {"post": {
             "properties" : {
               "user": {"type": "string", "index": "not_analyzed"},
               "message": {"type": "string", "boost": 1.…},
               "tags": {"type": "string", "include_in_all": "no"},
               "postDate" : {"type" : "date", "store": "no"},
               "priority" : {"type" : "integer"}
           }}}
analyzers
break down (tokenize) and normalize fields during
indexing and query strings at search time

analyzer = tokenizer + token filters (0 or more)
   Standard Analyzer =
      Standard Tokenizer +
          Standard Token Filter +
          Lowercase Token Filter +
          Stop Token Filter
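The "analyzer = tokenizer + token filters" formula can be sketched as function composition. The tokenizer regex and the stop word list here are simplified assumptions, not Lucene's implementations, and all names are hypothetical.

```python
import re

# Illustrative stop list; Lucene's default list is different.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "but"}


def simple_tokenizer(text):
    """Split text into word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    return [token for token in tokens if token not in STOP_WORDS]


def make_analyzer(tokenizer, *filters):
    """Compose one tokenizer with zero or more token filters, applied in order."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyze = make_analyzer(simple_tokenizer, lowercase_filter, stop_filter)
tokens = analyze("The Search and the Beer")
print(tokens)
```

Filter order matters: the stop filter runs after lowercasing, so "The" is removed as well as "the".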
analyzers
analyzers, tokenizers, and filters can be
customized

elasticsearch.yml

    index:
      analysis:
        analyzer:
          eulang:
            type: custom
            tokenizer: standard
            filter: [standard, lowercase, stop,
                     asciifolding, porterStem]

mapping

    …
    "title": {"type": "string", "analyzer": "eulang"},
    …
API
API conventions


append ?pretty=true to get readable JSON

boolean values: false/0/off = false, rest is true

JSONP support via callback parameter
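The boolean convention above fits in one function. The sketch assumes parameters arrive as strings and compares them exactly as written; whether the server also accepts other spellings or casings is not covered here. `parse_bool` is a hypothetical name.

```python
FALSE_VALUES = {"false", "0", "off"}


def parse_bool(value):
    """false/0/off mean False; every other value means True."""
    return value not in FALSE_VALUES


for raw in ("false", "0", "off", "true", "1", "yes"):
    print(raw, parse_bool(raw))
```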
API structure

http://host:port/[index]/[type]/[_action/id]

 GET http://es:9200/_status

 GET http://es:9200/twitter/_status

 POST http://es:9200/twitter/tweet/1

 GET http://es:9200/twitter/tweet/1
API structure
http://host:port/[index]/[type]/[_action/id]

 GET http://es:9200/twitter/tweet/_search

 GET http://es:9200/twitter/user/_search

 GET http://es:9200/twitter/tweet,user/_search

 GET http://es:9200/twitter,facebook/_search

 GET http://es:9200/_search
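The URL pattern can be captured by a small builder. This is a sketch with a hypothetical `build_url` helper; it only joins path segments and does no escaping.

```python
def build_url(base, indices=(), types=(), action=None):
    """Join base, comma-separated indices, comma-separated types and an action or id."""
    parts = [base.rstrip("/")]
    if indices:
        parts.append(",".join(indices))
    if types:
        parts.append(",".join(types))
    if action:
        parts.append(action)
    return "/".join(parts)


print(build_url("http://es:9200", ["twitter"], ["tweet", "user"], "_search"))
print(build_url("http://es:9200", ["twitter", "facebook"], action="_search"))
print(build_url("http://es:9200", action="_search"))
```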
_cluster API structure

GET /_cluster/health

GET /_cluster/health/index1,index2

GET /_cluster/nodes/stats

GET /_cluster/nodes/nodeId1,nodeId2/stats
API {core}
index             search

bulk               query

delete             from/size paging

delete by query    sort

get                highlighting

count              selective fields
API {indices}
create                  optimize

delete                  snapshot

open/close              update settings

get/put/delete mapping  analyze

refresh                 status

                        flush
API {cluster}

health

state

nodes info

nodes stats

nodes shutdown
Query DSL
term / terms   query_string

range            default_operator

prefix            analyzer

bool             phrase_slop

fuzzy            etc

wildcard
filters


share some similar features with queries (term,
range, etc)

why use a filter?
filters
faster than queries

cached (depends on the filter)

  the cache is used for different queries against
  the same filter

no scoring

more useful ones: term, terms, range, prefix, and,
or, not, exists, missing, query
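Why a cached, non-scoring filter is cheap can be shown with a toy model. Everything here is an assumption for illustration: documents are dicts, the score is a word count, and `CachedFilters` / `search` are hypothetical names. The filter only answers yes or no per document, so its result set can be stored once and reused by different queries.

```python
class CachedFilters:
    """Toy filter cache: each distinct filter is evaluated once, then reused."""

    def __init__(self, docs):
        self.docs = docs        # {doc_id: {field: value}}
        self.cache = {}         # filter key -> frozenset of matching doc ids
        self.evaluations = 0    # how many times a filter was actually computed

    def term(self, field, value):
        key = ("term", field, value)
        if key not in self.cache:
            self.evaluations += 1
            self.cache[key] = frozenset(
                doc_id for doc_id, doc in self.docs.items() if doc.get(field) == value
            )
        return self.cache[key]


def search(filters, word, field, value):
    """Score by the query (word count); the filter only includes or excludes docs."""
    allowed = filters.term(field, value)
    scored = [
        (doc["text"].split().count(word), doc_id)
        for doc_id, doc in filters.docs.items()
        if doc_id in allowed
    ]
    return sorted((hit for hit in scored if hit[0] > 0), reverse=True)


docs = {
    "1": {"lang": "php", "text": "beer and code"},
    "2": {"lang": "php", "text": "code code"},
    "3": {"lang": "java", "text": "beer beer"},
}
filters = CachedFilters(docs)
beer_hits = search(filters, "beer", "lang", "php")
code_hits = search(filters, "code", "lang", "php")
print(beer_hits, code_hits, filters.evaluations)
```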
facets

provide aggregated data based on the search
request

terms, histogram, date histogram, range,
statistical, and more
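A terms facet is essentially a count of field values over the documents that matched the search. A minimal sketch, assuming the hits are already available as dicts; `terms_facet` is a hypothetical helper.

```python
from collections import Counter


def terms_facet(hits, field, size=10):
    """Count how often each value of `field` occurs across the matching docs."""
    counts = Counter()
    for doc in hits:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common(size)


hits = [
    {"likes": ["coding", "beer"]},
    {"likes": ["beer", "brewing"]},
    {"likes": "beer"},
]
top_terms = terms_facet(hits, "likes", size=1)
print(top_terms)
```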
geo search

implemented as filters (and a facet)

  geo_distance

  geo_bounding_box

  geo_polygon
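The idea behind a geo_distance filter can be sketched with the haversine formula. This is an illustration under assumptions: a spherical Earth with a 6371 km radius, documents as dicts with `lat` / `lon` keys, and hypothetical function names. It says nothing about how elasticsearch computes or optimizes the distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_distance_filter(docs, lat, lon, max_km):
    """Keep the names of the docs that lie within max_km of the given point."""
    return [
        doc["name"]
        for doc in docs
        if haversine_km(lat, lon, doc["lat"], doc["lon"]) <= max_km
    ]


places = [
    {"name": "Brighton", "lat": 50.8225, "lon": -0.1372},
    {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
]
near_london = geo_distance_filter(places, 51.5074, -0.1278, max_km=100)
print(near_london)
```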
interfaces
REST

  including memcached

Java / Groovy

Language clients (REST/Thrift):

  pyes, PHP (standalone and symfony), Ruby, Perl

Flume sink implementation
elastica

similar to the other PHP ElasticSearch client

API naming is consistent with Zend Framework

can be extended for new filters, facets, etc

still under development
elastica example
          $es = new Elastica_Client('vm', 9200);
          $index = new Elastica_Index($es, 'test');
          $index->create(array(), true);
          $type = new Elastica_Type($index, 'person');
          $doc = new Elastica_Document(1, array('name' => 'Andrei Zmievski',
                                                 'email' => 'andrei@test.com',
                                                 'username' => 'andrei',
                                                 'bills' => array(2, 3, 5)));
          $type->addDocument($doc);

          $qs = new Elastica_Query_QueryString('andrei');
          $query = new Elastica_Query($qs);
          $resultSet = $type->search($query);
          print $resultSet->count();
data import

ES is not the primary data store (usually)

to import/synchronize data:

  write an agent (Gearman, message queues, etc)

  use rivers (CouchDB, RabbitMQ, Twitter)
10 more features
versioning                  load balancing nodes

index aliases               plugins

parent/child docs           more_like_this

scripting                   multi_field mapping

dynamic mapping templates   percolation
References

http://github.com/elasticsearch/elasticsearch

http://www.elasticsearch.org/community/forum

IRC: #elasticsearch on irc.freenode.net

twitter: @elasticsearch


             HTTP://ZMIEVSKI.ORG/TALKS

More Related Content

Viewers also liked

The Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXThe Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXAndrea Iacono
 
How to build_a_search_engine
How to build_a_search_engineHow to build_a_search_engine
How to build_a_search_engineAndrea Iacono
 
03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data Out03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data OutOpenThink Labs
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningPetar Djekic
 
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6Andrei Zmievski
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextRafał Kuć
 
Building a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and LuceneBuilding a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and LuceneMirko Calvaresi
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheLeslie Samuel
 

Viewers also liked (10)

The Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXThe Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphX
 
How to build_a_search_engine
How to build_a_search_engineHow to build_a_search_engine
How to build_a_search_engine
 
03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data Out03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data Out
 
Andrei's Regex Clinic
Andrei's Regex ClinicAndrei's Regex Clinic
Andrei's Regex Clinic
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
 
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - Sematext
 
Building a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and LuceneBuilding a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and Lucene
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 

Recently uploaded

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

99 Problems, But The Search Ain't One

  • 1. 99 Problems, But The Search Ain’t One Andrei Zmievski • PHP UK •!Feb 25, 2011
  • 2. who am I? curl http://localhost:9200/speaker/info/andrei {“name”: “Andrei Zmievski”, “projects”: [“PHP”, “PHP-GTK”, “Smarty”, “Unicode/i18n”], “likes”: [“coding”, “beer”, “brewing”, “photography”], “twitter”: “@a”, “email”: “andrei@zmievski.org”}
  • 3. what is elasticsearch? a search engine for the NoSQL generation domain-driven distributed RESTful Hitchhiker’s Guide to the Galaxy (no, really)
  • 8. 1. index request: curl -XPOST http://localhost:9200/conf/speaker/1 -d'{ "name": "Andrei Zmievski", "talk": "99 Problems, but the Search Ain''t One", "likes": ["coding", "beer", "photography"], "twitter": "a", "height": … }' response: { "ok": true, "_index": "conf", "_type": "speaker", "_id": "1" }
  • 9. 2. search request: curl http://localhost:9200/conf/speaker/_search?q=beer response: { "took": …, "_shards": { "total": 1, "successful": 1, "failed": 0 }, "hits": { "total": 1, "max_score": …, "hits": [ { "_index": "conf", "_type": "speaker", "_id": "2", "_score": …, "_source": { "name": "Andrei Zmievski", "talk": "99 Problems, but the Search Ain't One", "likes": ["coding", "beer", "photography"], "twitter": "a", "height": … } } ] } }
  • 10. 2. search (response highlight) "total": 1 inside "hits": total number of hits
  • 11. 2. search (response highlight) "_index": "conf": the index of the doc
  • 12. 2. search (response highlight) "_type": "speaker": the type of the doc
  • 13. 2. search (response highlight) "_id": "2": the id of the doc
  • 14. 2. search (response highlight) "_id": "2": the id of the doc
  • 15. 2. search (response highlight) "_score": the hit score
  • 16. 2. search (response highlight) "_source": the original source
  • 17. 2. search (response highlight) "took": the execution time
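The annotated response fields can be read back in a few lines of client code. This is a minimal Python sketch, assuming a response shaped like the one on the search slides. The numeric values for "took", "max_score" and "_score" are placeholders (the real ones are not recoverable from the transcript), and `summarize` is a hypothetical helper name.

```python
# Sample response shaped like the one on the slides.
# The numbers for "took", "max_score" and "_score" are placeholders.
response = {
    "took": 3,
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "hits": {
        "total": 1,
        "max_score": 0.5,
        "hits": [
            {
                "_index": "conf",
                "_type": "speaker",
                "_id": "2",
                "_score": 0.5,
                "_source": {
                    "name": "Andrei Zmievski",
                    "likes": ["coding", "beer", "photography"],
                },
            }
        ],
    },
}


def summarize(resp):
    """Return (total hits, execution time in ms, one tuple per hit)."""
    hits = resp["hits"]
    rows = [
        (h["_index"], h["_type"], h["_id"], h["_score"], h["_source"]["name"])
        for h in hits["hits"]
    ]
    return hits["total"], resp["took"], rows


total, took_ms, rows = summarize(response)
print(total, took_ms, rows)
```

Each tuple carries the index, type, id and score of a hit, plus one field pulled from the original source document.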
  • 19. demo
  • 20. distributed model provides: performance resiliency (high-availability)
  • 21. shards a portion of the document space each one is a separate Lucene index thus, many per-index settings are available document is sharded by its _id value but can be assigned (routed) to a shard deterministically
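Sharding by _id and deterministic routing both come down to hashing a key onto a fixed number of shards. Below is a minimal Python sketch of that idea. CRC32 is a stand-in chosen for illustration and is an assumption, not the hash elasticsearch uses; `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing_key: str, number_of_shards: int) -> int:
    """Map a routing key (by default the document _id) to a shard number."""
    # CRC32 is a stand-in hash chosen for illustration only.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


# Default: the _id decides the shard, so the same _id always lands
# on the same shard.
print(pick_shard("1", 2), pick_shard("2", 2))

# Custom routing: documents that share a routing value share a shard,
# whatever their _id is.
print(pick_shard("user-andrei", 2) == pick_shard("user-andrei", 2))
```

Because the shard is a pure function of the key and the shard count, any node can work out where a document lives without a lookup table.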
  • 22. zero-conf discovery zen (multicast and unicast) cloud (EC2 via API)
  • 23. auto-routing master node: maintains cluster state reassigns shards if nodes leave/join cluster any node can serve as the request router the query is handled via scatter-gather mechanism
  • 24. replicas each shard can have 1 or more replicas # of replicas can be updated dynamically after index creation replicas can be used for querying in parallel
  • 25. shard allocation node 1 start with a single node
  • 26. shard allocation node 1 person1 person2 PUT /person { “index”: { “number_of_shards”: 2, “number_of_replicas”: 1 }}
  • 27. shard allocation node 1 node 2 person1 person1 person2 person2 start the second node
  • 28. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 start 2 more nodes
  • 29. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 start 2 more nodes
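The allocation walk-through (2 shards, 1 replica, then 1, 2 and 4 nodes) can be imitated with a greedy placement rule. This is a sketch under stated assumptions, not elasticsearch's allocator: each copy goes to the least-loaded node that does not already hold a copy of the same shard, and copies with no eligible node stay unassigned. `allocate` is a hypothetical name.

```python
def allocate(nodes, number_of_shards, number_of_replicas):
    """Greedy sketch: spread shard copies so no node holds a shard twice."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(1, number_of_shards + 1):
        # One primary plus number_of_replicas replicas per shard.
        for _copy in range(number_of_replicas + 1):
            candidates = [n for n in nodes if shard not in placement[n]]
            if not candidates:
                unassigned.append(shard)
                continue
            target = min(candidates, key=lambda n: len(placement[n]))
            placement[target].append(shard)
    return placement, unassigned


for cluster in (["node1"],
                ["node1", "node2"],
                ["node1", "node2", "node3", "node4"]):
    print(len(cluster), "node(s):", allocate(cluster, 2, 1))
```

With one node the replicas have nowhere to go. With two nodes each node holds a copy of both shards. With four nodes each node holds exactly one copy.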
  • 30. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 PUT /person/info/1 {…}
  • 31. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 PUT /person/info/1 hashed to shard 1 {…}
  • 32. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 replicated PUT /person/info/1 {…}
  • 33. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 PUT /person/info/2 {…}
  • 34. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 hashed to shard 2 PUT /person/info/2 {…}
  • 35. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 replicated PUT /person/info/2 {…}
  • 36. scatter-gather node 1 node 2 node 3 node 4 person1 person1 person2 person2 GET /person/_search?q=name:thomas
  • 37. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 GET /person/_search?q=name:thomas
  • 38. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 GET /person/_search?q=name:thomas
  • 39. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 GET /person/_search?q=name:thomas
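The scatter-gather step behind GET /person/_search?q=name:thomas can be sketched in Python as follows. The scoring (a plain term count) and the function names are assumptions for illustration. Real scoring is done by Lucene.

```python
import heapq


def search_shard(shard_docs, term, size):
    """Scatter step: one shard returns its own top `size` (score, id) pairs."""
    scored = [
        (doc["name"].lower().split().count(term), doc["_id"])
        for doc in shard_docs
    ]
    matches = [(score, doc_id) for score, doc_id in scored if score > 0]
    return heapq.nlargest(size, matches)


def scatter_gather(shards, term, size=10):
    """Gather step: merge the per-shard results into one global top `size`."""
    partial = [search_shard(docs, term, size) for docs in shards]
    merged = heapq.nlargest(size, (hit for hits in partial for hit in hits))
    return [doc_id for _score, doc_id in merged]


shards = [
    [{"_id": "1", "name": "Thomas Thomas"}, {"_id": "3", "name": "Anna"}],
    [{"_id": "2", "name": "thomas b"}],
]
print(scatter_gather(shards, "thomas"))
```

Each shard only needs to return its own best hits, so the coordinating node merges a handful of short lists rather than every matching document.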
  • 40. transactional model per-document consistency no need to commit/flush uses write-behind transaction log write consistency (W) can be controlled one, quorum, or all
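The three write consistency levels can be read as "how many shard copies must be available before a write proceeds". The sketch below uses a plain majority rule for quorum. That rule is an assumption for illustration, and the rule elasticsearch applies (especially for small replica counts) may differ. `required_copies` is a hypothetical name.

```python
def required_copies(level: str, number_of_replicas: int) -> int:
    """Number of shard copies that must be available for a write to proceed."""
    copies = number_of_replicas + 1  # the primary plus its replicas
    if level == "one":
        return 1
    if level == "quorum":
        return copies // 2 + 1  # plain majority, assumed for illustration
    if level == "all":
        return copies
    raise ValueError("unknown consistency level: " + level)


for level in ("one", "quorum", "all"):
    print(level, required_copies(level, number_of_replicas=2))
```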
  • 41. (near) real-time search 1 second refresh rate by default _refresh API also
  • 42. index storage node data considered transient can be stored in local file system, JVM heap, native OS memory, or FS & memory combination persistent storage requires a gateway
  • 43. gateways persistent store for cluster state and indices asynchronous, translog-based write strategy allows full recovery if a cluster restart is needed supported gateways: local shared FS Hadoop via HDFS S3
  • 44. mapping describes document structure to the search engine automatically created with sensible defaults explicit mapping can be provided (generally, a good idea) can run into merge conflicts
  • 45. mapping important meta fields: _source _all _boost
  • 46. mapping types simple: string, integer/long, float/double, boolean, and null complex: array, object
  • 47. sample mapping document: {"user": "derick", "title": "Don't Panic", "tags": ["profiling", "debugging", "php"], "postDate": "2010-12-22T…", "priority": 2} mapping: {"post": { "properties": { "user": {"type": "string", "index": "not_analyzed"}, "message": {"type": "string", "boost": …}, "tags": {"type": "string", "include_in_all": "no"}, "postDate": {"type": "date", "store": "no"}, "priority": {"type": "integer"} }}}
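One way to read a mapping is as a per-field type declaration that documents can be checked against. This is a minimal Python sketch along those lines, assuming the field names from the sample mapping slide. The time of day in postDate is a placeholder, the boost setting is omitted, and `mismatches` is a hypothetical helper rather than anything elasticsearch provides.

```python
from datetime import datetime

mapping = {"post": {"properties": {
    "user": {"type": "string", "index": "not_analyzed"},
    "message": {"type": "string"},
    "tags": {"type": "string", "include_in_all": "no"},
    "postDate": {"type": "date", "store": "no"},
    "priority": {"type": "integer"},
}}}


def _is_date(value):
    """Accept ISO-style timestamps such as 2010-12-22T10:00:00."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        return True
    except (TypeError, ValueError):
        return False


CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "date": _is_date,
}


def mismatches(doc, properties):
    """Return the mapped fields whose values do not fit the declared type."""
    bad = []
    for field, value in doc.items():
        spec = properties.get(field)
        if spec is None:
            continue  # unmapped field: left to dynamic mapping
        # An array is treated as several values of the same field type.
        values = value if isinstance(value, list) else [value]
        if not all(CHECKS[spec["type"]](v) for v in values):
            bad.append(field)
    return bad


doc = {
    "user": "derick",
    "title": "Don't Panic",
    "tags": ["profiling", "debugging", "php"],
    "postDate": "2010-12-22T10:00:00",  # placeholder time of day
    "priority": 2,
}
print(mismatches(doc, mapping["post"]["properties"]))
```

A check like this, run before indexing, surfaces the kind of type disagreement that would otherwise show up as a mapping conflict.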
  • 48. analyzers break down (tokenize) and normalize fields during indexing and query strings at search time analyzer = tokenizer + token filters (0 or more) Standard Analyzer = Standard Tokenizer + Standard Token Filter + Lowercase Token Filter + Stop Token Filter
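The "analyzer = tokenizer + token filters" composition is easy to mimic as a function pipeline. The Python sketch below is an illustration only: the regex tokenizer and the short stop-word list are assumptions and do not reproduce Lucene's Standard Tokenizer or its stop set.

```python
import re

# Small illustrative stop list, not Lucene's.
STOP_WORDS = {"a", "an", "and", "the", "but", "of", "to", "is"}


def standard_tokenize(text):
    """Split on non-word characters, keeping inner apostrophes (Ain't)."""
    return re.findall(r"\w+(?:'\w+)?", text)


def lowercase(tokens):
    return [token.lower() for token in tokens]


def stop(tokens):
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text, tokenizer, filters):
    """Run one tokenizer, then zero or more token filters, in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("99 Problems, but the Search Ain't One",
              standard_tokenize, [lowercase, stop]))
# → ['99', 'problems', 'search', "ain't", 'one']
```

Filter order matters: the stop filter here compares against lowercase words, so it has to run after the lowercase filter.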
  • 49. analyzers analyzers, tokenizers, and filters can be customized elasticsearch.yml: index: {analysis: {analyzer: {eulang: {type: custom, tokenizer: standard, filter: [standard, lowercase, stop, asciifolding, porterStem]}}}} mapping: … "title": {"type": "string", "analyzer": "eulang"}, …
  • 50. API
  • 51. API conventions append ?pretty=true to get readable JSON boolean values: false/0/off = false, rest is true JSONP support via callback parameter
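The boolean convention and the ?pretty=true parameter can be sketched in Python as follows. The helper names are hypothetical, and the URL is the localhost example used elsewhere in the deck.

```python
from urllib.parse import urlencode

# Per the convention on the slide: these three spellings mean false,
# anything else counts as true.
FALSE_VALUES = {"false", "0", "off"}


def parse_bool(value: str) -> bool:
    return value not in FALSE_VALUES


def with_params(url: str, **params) -> str:
    """Append query-string parameters such as pretty=true to a URL."""
    return url + "?" + urlencode(params)


print(parse_bool("off"), parse_bool("yes"))
print(with_params("http://localhost:9200/conf/speaker/_search",
                  q="beer", pretty="true"))
```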
  • 52. API structure http://host:port/[index]/[type]/[_action/id] GET http://es:9200/_status GET http://es:9200/twitter/_status POST http://es:9200/twitter/tweet/1 GET http://es:9200/twitter/tweet/1
  • 53. API structure http://host:port/[index]/[type]/[_action/id] GET http://es:9200/twitter/tweet/_search GET http://es:9200/twitter/user/_search GET http://es:9200/twitter/tweet,user/_search GET http://es:9200/twitter,facebook/_search GET http://es:9200/_search
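The URL pattern http://host:port/[index]/[type]/[_action/id] lends itself to a small builder. This is a Python sketch with a hypothetical `es_url` helper. It only joins path segments and does no validation, so passing types without an index would produce an ambiguous path.

```python
def es_url(base, indices=(), types=(), action=None, doc_id=None):
    """Join optional index, type, action and id segments onto a base URL."""
    parts = []
    if indices:
        parts.append(",".join(indices))  # several indices: comma-separated
    if types:
        parts.append(",".join(types))    # several types: comma-separated
    if action:
        parts.append(action)
    if doc_id is not None:
        parts.append(str(doc_id))
    return base.rstrip("/") + "/" + "/".join(parts)


print(es_url("http://es:9200", ["twitter"], ["tweet", "user"], "_search"))
print(es_url("http://es:9200", ["twitter", "facebook"], action="_search"))
print(es_url("http://es:9200", action="_search"))
print(es_url("http://es:9200", ["twitter"], ["tweet"], doc_id=1))
```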
  • 54. _cluster API structure GET /_cluster/health GET /_cluster/health/index1,index2 GET /_cluster/nodes/stats GET /_cluster/nodes/nodeId1,nodeId2/stats
  • 55. API {core} index search bulk query delete from/size paging delete by query sort get highlighting count selective fields
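from/size paging maps a page number onto an offset and a page length. A minimal Python sketch, assuming 1-based page numbers (`page_params` is a hypothetical helper):

```python
def page_params(page: int, size: int = 10) -> dict:
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1))
print(page_params(3, size=20))
```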
  • 56. API {indices} create optimize delete snapshot open/close update settings get/put/delete analyze mapping status refresh flush
  • 58. Query DSL term / terms query_string range default_operator prefix analyzer bool phrase_slop fuzzy etc wildcard
  • 59. filters share some similar features with queries (term, range, etc) why use a filter?
  • 60. filters faster than queries cached (depends on the filter) the cache is used for different queries against the same filter no scoring more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query
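As an illustration of pairing a scored query with an unscored filter, the Python sketch below builds a request body in the "filtered" query form used by elasticsearch releases of that era. Later releases replaced it with a filter clause inside the bool query. The field name and values are taken from the speaker example, and `filtered_search` is a hypothetical helper.

```python
import json


def filtered_search(query_text, field, value):
    """Body with a scored query_string query narrowed by a term filter."""
    return {
        "query": {
            "filtered": {
                # The query part contributes to the score.
                "query": {"query_string": {"query": query_text}},
                # The filter part only includes or excludes documents.
                "filter": {"term": {field: value}},
            }
        }
    }


body = filtered_search("beer", "twitter", "a")
print(json.dumps(body, indent=2))
```

Since the term filter is a yes/no test that does not depend on the query text, its result can be reused when the same filter appears under a different query.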
  • 61. facets provide aggregated data based on the search request terms, histogram, date histogram, range, statistical, and more
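What a terms facet reports can be shown with a local count over a few documents. This Python sketch only imitates the result shape (term plus count). It is not how elasticsearch computes facets, and `terms_facet` and the sample documents are assumptions.

```python
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count how often each term of `field` occurs across the given docs."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return [{"term": term, "count": count}
            for term, count in counts.most_common(size)]


docs = [
    {"name": "a", "likes": ["coding", "beer", "photography"]},
    {"name": "b", "likes": ["beer"]},
    {"name": "c", "likes": "brewing"},
]
print(terms_facet(docs, "likes", size=2))
```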
  • 62. geo search implemented as filters (and a facet) geo_distance geo_bounding_box geo_polygon
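A geo_distance filter keeps the documents whose location lies within a given distance of a point. The Python sketch below shows that idea with the haversine formula. The spherical-earth radius, the "location" field layout and the function names are assumptions, and elasticsearch's own distance computation may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_distance(docs, lat, lon, distance_km):
    """Keep docs whose 'location' is within distance_km of (lat, lon)."""
    return [
        doc for doc in docs
        if haversine_km(lat, lon,
                        doc["location"]["lat"],
                        doc["location"]["lon"]) <= distance_km
    ]


places = [
    {"name": "London", "location": {"lat": 51.5074, "lon": -0.1278}},
    {"name": "Paris", "location": {"lat": 48.8566, "lon": 2.3522}},
]
print([p["name"] for p in geo_distance(places, 51.5, -0.1, 50)])
```

Like the other filters, this is a pure include/exclude test with no effect on scoring.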
  • 63. interfaces REST including memcached Java / Groovy Language clients (REST/Thrift): pyes, PHP (standalone and symfony), Ruby, Perl Flume sink implementation
  • 64. elastica similar to the other PHP ElasticSearch client API naming is consistent with Zend Framework can be extended for new filters, facets, etc still under development
  • 65. elastica example $es = new Elastica_Client('vm', 9200); $index = new Elastica_Index($es, 'test'); $index->create(array(), true); $type = new Elastica_Type($index, 'person'); $doc = new Elastica_Document(1, array('name' => 'Andrei Zmievski', 'email' => 'andrei@test.com', 'username' => 'andrei', 'bills' => array(2, 3, 5))); $type->addDocument($doc); $qs = new Elastica_Query_QueryString('andrei'); $query = new Elastica_Query($qs); $resultSet = $type->search($query); print $resultSet->count();
  • 66. data import ES is not the primary data store (usually) to import/synchronize data: write an agent (Gearman, message queues, etc) use rivers (CouchDB, RabbitMQ, Twitter)
  • 67. 10 more features versioning load balancing nodes index aliases plugins parent/child docs more_like_this scripting multi_field mapping dynamic mapping percolation templates