
Performance Review:

  1. Indexes are critical to performance
  2. Explain (how are indexes used)
  3. Hint (force the usage of an index)
  4. Profiling

NOTE: mongotop (similar to unix top), mongostat (similar to iostat)

//print a report every 3 seconds
>mongotop 3
//shows idx miss % (index miss percentage) and other stats to see the behaviour
>mongostat

Index Query Planner

If you have 3 candidate indexes and run a query, MongoDB builds a query plan and runs the query against all 3 of them (yes, Mongo chooses which index to use based on those results). When one returns results first, it keeps that plan as the fastest solution for that kind of query. That’s cool, Mongo runs experiments on our data :-) Every 100 or so queries it forgets what it learned and experiments again.

To sort on multiple fields, create a compound index. With compound indexes, the results can be in the sorted order of either the full index or an index prefix. An index prefix is a subset of a compound index; the subset consists of one or more fields at the start of the index, in order.

For example, given an index { a:1, b: 1, c: 1, d: 1 }, the following subsets are index prefixes:

{ a: 1 }
{ a: 1, b: 1 }
{ a: 1, b: 1, c: 1 }
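For instance, sorting on a prefix can use the index, while sorting on a subset that skips the first field cannot. A minimal sketch (the records collection is made up):

//create the compound index
db.records.ensureIndex({a:1, b:1, c:1, d:1});
//these sorts can use the index (the full index or a prefix of it)
db.records.find().sort({a:1});
db.records.find().sort({a:1, b:1});
db.records.find().sort({a:1, b:1, c:1});
//this sort skips "a", so it is NOT a prefix and cannot use the index
db.records.find().sort({b:1, c:1});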

NOTE: A really important thing about ASC/DESC compound indexes: you can only use a compound index for a sort if its ASC/DESC directions match the sort in the query (or are all reversed). This applies to compound indexes; with a single-field index Mongo can simply read the index backwards to reverse the result.
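A small sketch of that direction rule, using a hypothetical db.items collection:

db.items.ensureIndex({a:1, b:-1});
//can use the index for sorting: same directions, or ALL directions reversed
db.items.find().sort({a:1, b:-1});
db.items.find().sort({a:-1, b:1});
//cannot use the index for sorting: mixed directions that match neither
db.items.find().sort({a:1, b:1});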

 

You can use hint() to force a desired index: http://docs.mongodb.org/manual/reference/method/cursor.hint/
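For example (collection and index names are hypothetical):

//force the query to use the index on a
db.records.find({a:5, b:7}).hint({a:1});
//or refer to the index by name
db.records.find({a:5, b:7}).hint("a_1");
//force a full collection scan (no index at all)
db.records.find({a:5}).hint({$natural:1});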

 

Index Cardinality

  • Regular indexes -> 1:1 (one index entry per document)
  • Sparse indexes (a sparse index does not index documents that do not have the indexed field) -> <= #documents (Mongo does not keep index entries for non-existing values)
  • Multikey indexes (an index on a value that is an array) -> may be > #documents, one entry per array element (see the sketch below)
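A quick sketch of the sparse and multikey cases (the db.people collection and its fields are made up):

//sparse index: documents without "nickname" get no index entry, so <= #documents entries
db.people.ensureIndex({nickname:1}, {sparse:true});
//multikey index: one entry per array element, so this document alone produces 3 entries
db.people.insert({name:"Ana", hobbies:["chess","running","piano"]});
db.people.ensureIndex({hobbies:1});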

NOTE: If you update a document and it has to be moved on disk, all of its multikey index entries have to be updated! http://docs.mongodb.org/manual/core/indexes/

Logging Slow Queries + Profiling

MongoDB automatically logs slow queries (anything over 100 ms).

About profiling, there are 3 levels:

  • Level 0 – profiler off (the default)
  • Level 1 – log slow operations (slower than slowms)
  • Level 2 – log all operations

 

You turn it on by:

//--slowms 3 means log anything slower than 3 ms
mongod --profile 1 --slowms 3
//Anything that takes more than 3 ms will be written.
//You can see that data using
db.system.profile.find().pretty()

//Database school2, collection students, see the slowest queries
db.system.profile.find({ns:{$regex:"^school2.students"}}).sort({millis:-1}).limit(10).pretty()
//To check the profiling level
db.getProfilingLevel()
db.getProfilingStatus()

//change to level 2 with a slowms threshold of 5 ms
db.setProfilingLevel(2,5)

//find queries that took more than 1 second
db.system.profile.find({millis:{$gt:1000}}).sort({ts:-1})
NOTE: Level 2 (log everything) is really only for debugging...

Geospatial Indexes

You can have 2D (flat) and spherical geospatial indexes on coordinates in MongoDB.

2D Cartesian model type

//Add index to the 2d location
db.places.ensureIndex({location:"2d",type:1})
//find locations near those coordinates; results come back from closest to farthest
db.places.find({location:{$near:[74,140]}}).limit(3) 

//for spherical geospatial queries
db.runCommand({geoNear:"places",near:[50,50],spherical:true,maxDistance:1})

NOTE: maxDistance is in radians for spherical queries, and the collection we are querying is passed as the value of geoNear

 

Index Selectivity

Keep in mind that you want queries to be as fast as possible. If the indexed value is very common, the search is close to a full scan of the data, so try to index the most selective (“exclusive”) fields.
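For example (field and collection names are made up), an index on a low-cardinality field like gender barely narrows the search, while an index on a selective field like email does:

//low selectivity: roughly half the collection matches, close to a full scan
db.users.ensureIndex({gender:1});
db.users.find({gender:"F"}).explain();
//high selectivity: at most one document matches
db.users.ensureIndex({email:1},{unique:true});
db.users.find({email:"ana@example.com"}).explain();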

commands:
//In this case it's ascending, but the direction does not matter for a single-field index
db.bbb.ensureIndex({tags:1});
//create an index that will be unique
//So creation will fail if the existing values are not unique
db.bbb.ensureIndex({tags:1},{unique:true});
//Creates a compound index, ascending on tags and descending on name
db.bbb.ensureIndex({tags:1,name:-1});
//will remove all the duplicates :)
//CAUTION CAUTION it drops duplicate documents
db.bbb.ensureIndex({tags:1},{unique:true,dropDups:true});
//all the indexes in the system
db.system.indexes.find();
//indexes on the bbb collection
db.bbb.getIndexes();
//drop an index
db.bbb.dropIndex({student_id:1});
//the explain command shows how the query was executed
//That's the most important tool
//Without an index
> db.scores.find({score:{$lte:122}}).explain();
{
 "cursor" : "BasicCursor",  //means no index was used (full collection scan)

....

 "millis" : 634,
 "indexBounds" : {
 },
 "server" : "SPMADLT0324:27017"
}
//With an index you see the cursor used is a BtreeCursor
//and the millis (milliseconds) go down
> db.scores.find({score:{$lte:122}}).explain();
{
 "cursor" : "BtreeCursor score_1", //type of cursor and the index used
 "isMultiKey" : false, //the indexed field is not an array
 "n" : 185427, //how many documents were returned
 "nscannedObjects" : 185427, //number of documents scanned
 "nscanned" : 185427,
 "nscannedObjectsAllPlans" : 185427,
 "nscannedAllPlans" : 185427,
 "scanAndOrder" : false,
 "indexOnly" : false, //true means the query can be satisfied ONLY with the index, WITHOUT going to the collection
 "nYields" : 0,
 "nChunkSkips" : 0,
 "millis" : 187,
 "indexBounds" : { //the index key range used to get the results; in this case it has a min and a max
 "score" : [
 [
 -1.7976931348623157e+308,
 122
 ]
 ]
 },
 "server" : "SPMADLT0324:27017"
 }

We want to keep the indexes in memory; if we have to go to disk to read them, it's a waste.
How much space are my indexes using?
db.collection.stats()
db.collection.totalIndexSize()

EXAMPLE Expanded:
{
  "cursor" : "<Cursor Type and Index>",
  "isMultiKey" : <boolean>,
  "n" : <num>,
  "nscannedObjects" : <num>,
  "nscanned" : <num>,
  "nscannedObjectsAllPlans" : <num>,
  "nscannedAllPlans" : <num>,
  "scanAndOrder" : <boolean>,
  "indexOnly" : <boolean>,
  "nYields" : <num>,
  "nChunkSkips" : <num>,
  "millis" : <num>,
  "indexBounds" : { <index bounds> },
  "allPlans" : [
                 { "cursor" : "<Cursor Type and Index>",
                   "n" : <num>,
                   "nscannedObjects" : <num>,
                   "nscanned" : <num>,
                   "indexBounds" : { <index bounds> }
                 },
                  ...
               ],
  "oldPlan" : {
                "cursor" : "<Cursor Type and Index>",
                "indexBounds" : { <index bounds> }
              }
  "server" : "<host:port>",
}

Core Explain Output Fields

This section explains output for queries on collections that are not sharded. For queries on sharded collections, see Explain on Queries on Sharded Collections.

explain.cursor

cursor is a string that reports the type of cursor used by the query operation:

  • BasicCursor indicates a full collection scan.
  • BtreeCursor indicates that the query used an index. The cursor includes the name of the index. When a query uses an index, the output of explain() includes indexBounds details.
  • GeoSearchCursor indicates that the query used a geospatial index.

For BtreeCursor cursors, MongoDB will append the name of the index to the cursor string. Additionally, depending on how the query uses an index, MongoDB may append one or both of the following strings to the cursor string:

  • reverse indicates that the query traverses the index from the highest values to the lowest values (e.g. “right to left”.)
  • multi indicates that the query performed multiple look-ups. Otherwise, the query uses the index to determine a range of possible matches.

explain.isMultiKey

isMultiKey is a boolean. When true, the query uses a multikey index, where one of the fields in the index holds an array.

explain.n

n is a number that reflects the number of documents that match the query selection criteria.

explain.nscannedObjects

Specifies the total number of documents scanned during the query. The nscannedObjects may be lower than nscanned, such as if the index covers a query. See indexOnly. Additionally, the nscannedObjects may be lower than nscanned in the case of multikey index on an array field with duplicate documents.

explain.nscanned

Specifies the total number of documents or index entries scanned during the database operation. You want n and nscanned to be as close in value as possible. The nscanned value may be higher than the nscannedObjects value, such as if the index covers a query. See indexOnly.

explain.nscannedObjectsAllPlans

New in version 2.2.

nscannedObjectsAllPlans is a number that reflects the total number of documents scanned for all query plans during the database operation.

explain.nscannedAllPlans

New in version 2.2.

nscannedAllPlans is a number that reflects the total number of documents or index entries scanned for all query plans during the database operation.

explain.scanAndOrder

scanAndOrder is a boolean that is true when the query cannot use the order of documents in the index for returning sorted results: MongoDB must sort the documents after it receives the documents from a cursor. If scanAndOrder is false, MongoDB can use the order of the documents in an index to return sorted results.

explain.indexOnly

indexOnly is a boolean value that returns true when the query is covered by the index indicated in the cursor field. When an index covers a query, MongoDB can both match the query conditions and return the results using only the index because:

  • all the fields in the query are part of that index, and
  • all the fields returned in the results set are in the same index.
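A minimal sketch of a covered query (collection and fields are hypothetical): both the filter and the projection touch only indexed fields, and _id is excluded, so explain() reports indexOnly: true.

db.scores.ensureIndex({student_id:1, score:1});
//the query and the projection use only indexed fields; _id is excluded
db.scores.find({student_id:10}, {student_id:1, score:1, _id:0}).explain();
//expected in the output: "indexOnly" : true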

explain.nYields

nYields is a number that reflects the number of times this query yielded the read lock to allow waiting writes to execute.

explain.nChunkSkips

nChunkSkips is a number that reflects the number of documents skipped because of active chunk migrations in a sharded system. Typically this will be zero. A number greater than zero is ok, but indicates a little bit of inefficiency.

explain.millis

millis is a number that reflects the time in milliseconds to complete the query.

explain.indexBounds

indexBounds is a document that contains the lower and upper index key bounds. This field resembles one of the following:

"indexBounds" : {
                    "start" : { <index key1> : <value>, ...  },
                    "end" : { <index key1> : <value>, ... }
                },
"indexBounds" : { "<field>" : [ [ <lower bound>, <upper bound> ] ],
                  ...
                }

explain.allPlans

allPlans is an array that holds the list of plans the query optimizer runs in order to select the index for the query. Displays only when the <verbose> parameter to explain() is true or 1.

explain.oldPlan

New in version 2.2.

oldPlan is a document value that contains the previous plan selected by the query optimizer for the query. Displays only when the <verbose> parameter to explain() is true or 1.

explain.server

New in version 2.2.

server is a string that reports the MongoDB server.

http://docs.mongodb.org/manual/reference/method/cursor.explain/#explain-output-fields-core

Application-driven schema design answers:

  1. What pieces of data are we using together?
  2. What pieces of data are used read-only?
  3. What pieces of data are written continuously?
  4. And all the app related questions!

 

We have to remember MongoDB's special characteristics:

  • Rich documents (arrays of items, embedded documents)
  • Pre-join / embed data
  • There is no “join”
  • No constraints
  • Atomic operations (transactions are not supported)
  • No declared schema, but documents in a collection usually share a similar schema

Always keep in mind: Matching the data access patterns of your application.

 

Rule of thumb: if you find yourself doing the schema design the same way as in a relational SQL database, you are probably not using the best approach.

What does Living Without Constraints refer to? Keeping your data consistent even though MongoDB lacks foreign key constraints (at the moment).

 

 

Living Without Transactions

ACID stands for Atomicity, Consistency, Isolation, Durability. MongoDB does not give us multi-document transactions: a life without transactions (I always hated the “redo”)!

It has ATOMIC operations on a single document, so nobody will read a half-applied update while you are editing a document. With rich documents and embedded docs you get the same effect. Why? Because you are modifying the whole document in one atomic operation, so you DO NOT need a transaction at all.

And in any case we can be tolerant: will anybody care if you post something and some of your FB friends see it with a 1 second delay?

You can choose:

  • Restructure your data, so you can use ATOMIC operations on single documents (see the sketch below).
  • Implement something in software… (buffff).
  • Be tolerant of a little inconsistency.
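As a sketch of the first option (the posts collection and its fields are made up), a single atomic update on one rich document replaces what would otherwise need a multi-table transaction:

//push a comment and bump the counter in ONE atomic operation on ONE document
db.posts.update(
  {_id: 1},  //hypothetical post _id
  {$push: {comments: {author: "ana", text: "nice post"}},
   $inc: {comment_count: 1}}
);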

 

1:1 relationships example:

The data will be employee – resume (assuming 1:1).
You can choose to have 2 collections, employees and resumes, linked by an id, or to embed the resume in the employee document (if the document does not exceed 16 MB). It will depend on:

  • freq of access
  • size of items
  • atomicity of data

 Usually you will embed
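Both options as a quick sketch (collection and field names are made up):

//option 1: two collections, linked by the employee _id
db.employees.insert({_id: 42, name: "Ana"});
db.resumes.insert({employee_id: 42, education: ["CS degree"], jobs: ["Acme"]});
//option 2: embed the resume inside the employee (the usual choice, if it stays small)
db.employees.insert({_id: 43, name: "Bob", resume: {education: ["Math degree"], jobs: ["Initech"]}});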

1:N (to many) relationships example:

The data can be city – person. If we think of New York City, it makes sense to have 2 collections and include the city id in the people collection.
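A sketch of that linking (hypothetical collections), with the reference living on the “many” side:

//one city document...
db.cities.insert({_id: "NYC", name: "New York City", population: 8400000});
//...and each person links back to it by id
db.people.insert({name: "Ana", city: "NYC"});
db.people.insert({name: "Bob", city: "NYC"});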

 

1:N(to few) relationships example:

The data can be blog – comment. In this case the best approach will usually be to embed the comments in the blog post.

 Usually you will embed from the many to the one
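For example (hypothetical posts collection), the few comments simply live inside the post document:

db.posts.insert({
  title: "Living without joins",
  body: "some text",
  comments: [
    {author: "ana", text: "nice post"},
    {author: "bob", text: "+1"}
  ]
});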

 

M:N relationships example:

The data can be books – authors or students – teachers. In these cases it is few to few on both sides, so usually you embed the ids in one collection or the other (be careful with atomicity). Where to place them? It will depend on how your app accesses the information. Of course you can put ids in both collections for better performance in some cases.

Usually you will link
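A sketch of linking by ids (hypothetical collections); a multikey index on the array keeps lookups in both directions fast:

db.teachers.insert({_id: 1, name: "Prof. Smith"});
db.teachers.insert({_id: 2, name: "Prof. Jones"});
//each student holds an array of teacher ids
db.students.insert({name: "Ana", teacher_ids: [1, 2]});
db.students.ensureIndex({teacher_ids: 1});
//which students does teacher 2 have?
db.students.find({teacher_ids: 2});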

Tip: Remember that to embed, the containing document has to exist first, so in some cases you cannot insert the embedded data on its own…

 

 

Performance:

Benefits of embedding:

  • Improved read performance (spinning disks take a long time to seek; once in place, if the data is nearby they can read more of it in the same pass)
  • One round trip to the DB (self explanatory)

MongoDB has multikey indexes; that's one of the reasons why it is so fast at searching inside embedded arrays.

 

from: http://api.mongodb.org/

from: https://education.mongodb.com/courses/10gen/M101J/2013_October/courseware/Week_3_-_Schema_Design/MongoDB_Schema_Design/