Monthly Archives: November 2013

Codecademy approach: split the site into 3 layers, so that each area can be optimized and looked after on its own.

Client Layer – HTTP sizes, timing, asset sizes, JS, CSS
App Layer – Controllers, responses, grouping things such as: give me a “user”. The session is stored in Redis.
Data Layer – Encapsulates the access patterns; it serves a “user”, which can contain a lot of collections.

In small documents, the key names can make up 40% of the total size.
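To get a feeling for that ratio, here is a minimal sketch that measures how much of a small document is taken up by its key names. It uses the compact JSON encoding as a rough approximation (MongoDB actually stores BSON, so the real numbers differ), and the two documents are made up for illustration, not taken from the talk.

```python
import json


def key_name_share(doc: dict) -> float:
    """Fraction of the compact JSON encoding used by the top-level key names."""
    total = len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
    keys = sum(len(name.encode("utf-8")) for name in doc)
    return keys / total


# Hypothetical documents: same values, long vs. short key names.
long_doc = {"user_id": 1042, "exercise_id": 77, "completed": True}
short_doc = {"u": 1042, "e": 77, "c": True}

if __name__ == "__main__":
    print(f"long keys:  {key_name_share(long_doc):.0%}")
    print(f"short keys: {key_name_share(short_doc):.0%}")
```

The smaller the values, the larger the share taken by the names, which is why the effect shows up mostly in small documents.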

They used a cool way to guarantee the uniqueness of the (user, exercise) pair: a unique compound index, {user_id, exercise_id, UNIQUE}, so the application code does not have to care about it.
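The idea can be sketched without a database. The class below is only an in-memory stand-in (its name and methods are made up for this example) that behaves like a unique compound index on (user_id, exercise_id): the second insert for the same pair is rejected by the storage layer, not by application checks. With PyMongo, the real index would be created with `create_index([("user_id", 1), ("exercise_id", 1)], unique=True)`, and a duplicate insert raises `DuplicateKeyError`.

```python
class UniquePairIndex:
    """In-memory stand-in for a unique compound index on (user_id, exercise_id)."""

    def __init__(self):
        self._docs = {}

    def insert(self, user_id, exercise_id, doc):
        key = (user_id, exercise_id)
        if key in self._docs:
            # A real MongoDB server would reject the write with a duplicate key error.
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = doc

    def __len__(self):
        return len(self._docs)


if __name__ == "__main__":
    index = UniquePairIndex()
    index.insert(1042, 77, {"answer": "first attempt"})
    try:
        index.insert(1042, 77, {"answer": "second attempt"})
    except KeyError as err:
        print("rejected:", err)
```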

In their 3rd version:


The DB layer (MongoDB) does not store the documents themselves (the exercise responses), only metadata with the Amazon S3 key. The server returns quickly, and the client loads the content straight from Amazon S3 (using Amazon's bandwidth), so the servers stay lightweight and you scale by spreading the bandwidth.
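A minimal sketch of that hand-off, under some assumptions: the record layout, the bucket name and the function name are hypothetical, and the object is assumed to be publicly readable through a virtual-hosted-style S3 URL (a private bucket would need a pre-signed URL instead).

```python
from urllib.parse import quote


def response_location(record: dict, bucket: str) -> dict:
    """Return only a pointer to the stored response, never the response body."""
    key = quote(record["s3_key"])  # keeps "/" and escapes unsafe characters
    return {
        "exercise_id": record["exercise_id"],
        "url": f"https://{bucket}.s3.amazonaws.com/{key}",
    }


if __name__ == "__main__":
    # Hypothetical metadata record as it might sit in the DB layer.
    record = {"user_id": 1042, "exercise_id": 77, "s3_key": "responses/1042/77.json"}
    print(response_location(record, "example-bucket"))
```

The server's answer stays tiny whatever the size of the stored response, and the heavy download never touches the app servers.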

In sharded configurations on Amazon with EC2 instances i1, i2 and i3, where i1 is the primary and the others are secondaries, you can change the instance type one node at a time: turn off i2 and switch it to the new instance type; when it is back in sync, do the same with i1 and later on with i3.

So you are always alive!
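The procedure above can be sketched as a small simulation. This is not how MongoDB elects a primary; the promotion rule, the field names and the instance type labels are simplifications made up for the example. It only shows why taking down one member at a time keeps a primary available throughout.

```python
def rolling_upgrade(members: dict, new_type: str, order: list) -> list:
    """Upgrade one member at a time; return (node taken down, nodes still serving)."""
    log = []
    for name in order:
        node = members[name]
        if node["role"] == "primary":
            # Simplification: hand the primary role to another member first.
            successor = next(n for n in members if n != name)
            members[successor]["role"] = "primary"
            node["role"] = "secondary"
        node["up"] = False
        serving = [n for n, m in members.items() if m["up"]]
        if not any(members[n]["role"] == "primary" for n in serving):
            raise RuntimeError("no primary available")
        log.append((name, serving))
        node["type"] = new_type
        node["up"] = True  # back online once it has caught up
    return log


if __name__ == "__main__":
    cluster = {
        "i1": {"type": "old-type", "role": "primary", "up": True},
        "i2": {"type": "old-type", "role": "secondary", "up": True},
        "i3": {"type": "old-type", "role": "secondary", "up": True},
    }
    for step in rolling_upgrade(cluster, "new-type", ["i2", "i1", "i3"]):
        print(step)
```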


NOTE: They store the session in Redis, used as volatile shared memory, so the server is stateless. The client cookie carries the UNIQUE session_id that links the user to the session.
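A rough sketch of why this makes the servers interchangeable. The `SessionStore` class is an in-memory stand-in for Redis written for this example (with real Redis the expiry would come from commands such as SETEX or EXPIRE), and the handler and its return strings are hypothetical.

```python
import secrets
import time


class SessionStore:
    """In-memory stand-in for Redis: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=1800.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}

    def create(self, user_id):
        session_id = secrets.token_hex(16)  # unguessable value for the cookie
        self._data[session_id] = (user_id, self._clock() + self._ttl)
        return session_id

    def user_for(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        user_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[session_id]  # volatile: expired sessions vanish
            return None
        return user_id


def handle_request(store, cookies):
    """Any server instance can run this: it keeps no state of its own."""
    user_id = store.user_for(cookies.get("session_id", ""))
    if user_id is None:
        return "401 unauthorized"
    return f"200 hello user {user_id}"


if __name__ == "__main__":
    store = SessionStore()
    cookie = {"session_id": store.create(1042)}
    print(handle_request(store, cookie))
```

Because the only thing a server needs is the cookie and access to the shared store, a load balancer can send each request to any instance.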


The way Foursquare uses MongoDB (so far)



Scala + mongo driver

Basic flow:


Mobile phone sends (token + location) -> server validates the token -> MongoDB

MongoDB -> server (user_id + info)

They do not put a cache in front of MongoDB, so… wow, that means a lot less complexity in the app code.

The servers are stateless, so all the state lives on the devices. The servers can then be load balanced without being aware of the session, and a client does not need to stay attached to one particular server.


  • The shard_key is user_id, and it is an integer for backward compatibility: they were using PostgreSQL (auto-increment)
  • They use shards + replica sets (some of them with write concern, some without)
  • They use solid state disks
  • They use 2D geospatial indexes, which work a bit inaccurately in HIGH places
  • Every logical entity has its own collection (users, places)
  • They try to keep the key names as short as possible, and a layer does the translation so nobody has to work with a key named “v”
  • They use a Scala plugin that checks the MongoDB queries and gives a compile-time error if NO index is used (then they use hint)
  • Some queries picked bad indexes, so they forced the right one with “hint”
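The key-name translation layer from the list above could look roughly like this. It is only a sketch: the talk mentions that short keys such as “v” exist, but the mapping table and the field names here are invented for the example.

```python
# Hypothetical mapping between readable names and the short stored keys.
LONG_TO_SHORT = {"user_id": "u", "venue_id": "v", "created_at": "c"}
SHORT_TO_LONG = {short: long for long, short in LONG_TO_SHORT.items()}


def to_storage(doc: dict) -> dict:
    """Rename readable keys to their short form before writing."""
    return {LONG_TO_SHORT.get(key, key): value for key, value in doc.items()}


def from_storage(doc: dict) -> dict:
    """Rename short keys back to readable names after reading."""
    return {SHORT_TO_LONG.get(key, key): value for key, value in doc.items()}


if __name__ == "__main__":
    checkin = {"user_id": 1042, "venue_id": 9, "created_at": 1385856000}
    stored = to_storage(checkin)
    print(stored)
    print(from_storage(stored) == checkin)
```

Application code only ever sees the readable names; the short ones exist purely to save space in every stored document.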