MongoDB: bypass the 2GB limit on 32 Bits


If you run a 32-bit server, you might have noticed that MongoDB by default won't let you store more than about 2GB of data in a single mongod process (because of the limited virtual address space available on such deployments). So how can we bypass this limit? The answer is sharding.

NOTE: I'll be using Fedora Linux as the main OS for my examples. These commands should also apply to CentOS/RHEL and, with very little modification, to any *nix OS out there.

What is sharding?

Sharding is the process by which we split a MongoDB database, replication included, across multiple servers (or across multiple mongod processes running on a single machine, as in this case). Let's see how to implement it:

First: Configure a Shard

We need to set up a shard on the machine that uses several mongod instances (kept in sync with each other) instead of one. Each shard is a replica set, so its nodes usually come in groups of three, and when we need more nodes we create another shard of three. Keep in mind that the data-bearing members of one replica set hold copies of the same data, so on a 32-bit machine it is each shard, not each member, that adds about 2GB to the database's total storage; to grow further, repeat these steps for another shard. In this particular case I'll launch one shard with two usable mongod instances and one arbiter:

1. su -
2. mkdir -p /data/shard0
3. chown user /data/shard0
4. cd /data/shard0
5. mkdir -p rs1 && mkdir -p rs2 && mkdir -p rs3
6. chown user rs1 && chown user rs2 && chown user rs3
7. service mongod stop && chkconfig mongod off
8. mongod --replSet myShrd --logpath "s0-r1.log" --dbpath /data/shard0/rs1 --port 37017 --fork --shardsvr
9. mongod --replSet myShrd --logpath "s0-r2.log" --dbpath /data/shard0/rs2 --port 37018 --fork --shardsvr
10. mongod --replSet myShrd --logpath "s0-r3.log" --dbpath /data/shard0/rs3 --port 37019 --fork --shardsvr
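
The three mongod invocations above differ only in the member number and the port, so they can be generated in a loop. What follows is a minimal sketch of that idea, not part of my original setup: the names SHARD_NAME, BASE_DIR, FIRST_PORT and member_cmd are made up for illustration, and the script only prints the commands (a dry run) so you can review them before running anything as root.

```shell
#!/bin/sh
# Hypothetical helper: print the mongod command for each replica-set member.
# SHARD_NAME, BASE_DIR and FIRST_PORT are illustrative names, not MongoDB settings.
SHARD_NAME="${SHARD_NAME:-myShrd}"
BASE_DIR="${BASE_DIR:-/data/shard0}"
FIRST_PORT="${FIRST_PORT:-37017}"

# member_cmd N -> command line for member N (1-based), same flags as the steps above.
member_cmd() {
    n="$1"
    port=$((FIRST_PORT + n - 1))
    echo "mongod --replSet $SHARD_NAME --logpath \"s0-r$n.log\" --dbpath $BASE_DIR/rs$n --port $port --fork --shardsvr"
}

# Dry run: only print the commands, nothing is started here.
for n in 1 2 3; do
    member_cmd "$n"
done
```

If the printed lines look right, they can be pasted into the root terminal from /data/shard0, exactly like the hand-written ones.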

In the commands above, replace user with your standard (non-root) username and myShrd with the name you want for your shard. Then continue in a standard user terminal (not root):

1. mongo --port 37017
2. config = {_id: "myShrd", members: [{_id: 0, host: "Hostname.local:37017"}, {_id: 1, host: "Hostname.local:37018"}, {_id: 2, host: "Hostname.local:37019", arbiterOnly: true}]}
3. rs.initiate(config)
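
If you'd rather not type the config by hand, the same document can be built from the shard name, the hostname and the first port, and handed to the mongo shell through its --eval option. This is only a sketch under my own assumptions: rs_config is a hypothetical helper name, the three ports are assumed to be consecutive, and the script prints the one-liner instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: build the replica-set config document from the steps above.
# Arguments: shard name, hostname, first port (the next two ports are assumed consecutive).
rs_config() {
    name="$1"; host="$2"; p="$3"
    printf '{_id: "%s", members: [' "$name"
    printf '{_id: 0, host: "%s:%d"}, ' "$host" "$p"
    printf '{_id: 1, host: "%s:%d"}, ' "$host" $((p + 1))
    # The third member is the arbiter; arbiterOnly is written as a boolean.
    printf '{_id: 2, host: "%s:%d", arbiterOnly: true}]}\n' "$host" $((p + 2))
}

# Dry run: print the one-liner instead of executing it.
echo "mongo --port 37017 --eval 'rs.initiate($(rs_config myShrd Hostname.local 37017))'"
```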

Second: Create ConfigServers

That fires up our replica set, ready for sharding (remember to replace myShrd with the name of your shard in the commands above too). Next, exit the mongo shell with CTRL+D and create the config servers; for production we'd normally use three of these, no matter how many shards/nodes we have:

1. su -
2. mkdir -p /data/config
3. chown user /data/config
4. cd /data/config
5. mkdir -p config-a && mkdir -p config-b && mkdir -p config-c
6. chown user config-a && chown user config-b && chown user config-c
7. mongod --logpath "cfg-a.log" --dbpath /data/config/config-a --port 47017 --fork --configsvr
8. mongod --logpath "cfg-b.log" --dbpath /data/config/config-b --port 47018 --fork --configsvr
9. mongod --logpath "cfg-c.log" --dbpath /data/config/config-c --port 47019 --fork --configsvr

The same rule applies here: replace user with your standard (non-root) username in the commands.

Third: Tie Everything Together

Now we need to "connect the dots" with a mongos process:

1. su -
2. mongos --logpath "mongos-1.log" --configdb Hostname.local:47017,Hostname.local:47018,Hostname.local:47019 --fork
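
The --configdb value is just a comma-separated list of host:port pairs, one per config server. As a small sketch (configdb_list is a hypothetical helper of mine, not a MongoDB tool), it can be assembled from the hostname and the ports used in the previous step, and again the script only prints the resulting command:

```shell
#!/bin/sh
# Hypothetical helper: join host:port pairs for mongos --configdb.
# Usage: configdb_list HOST PORT [PORT...]
configdb_list() {
    host="$1"; shift
    list=""
    for port in "$@"; do
        # Prefix a comma before every entry except the first.
        list="${list:+$list,}$host:$port"
    done
    echo "$list"
}

# Dry run: print the mongos command with the config server ports from the previous step.
echo "mongos --logpath \"mongos-1.log\" --configdb $(configdb_list Hostname.local 47017 47018 47019) --fork"
```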

And finally (as a standard user, non-root) we "glue" everything with:

1. mongo
2. db.adminCommand({addshard: "myShrd/Hostname.local:37017"})
3. db.adminCommand({enableSharding: "myDB"})
4. db.adminCommand({shardCollection: "myDB.collection", key: {theKey: 1}})

Where:

  • myShrd is the name of your shard (selected in the first step)
  • myDB is the name of the DB where you want sharding enabled
  • myDB.collection is the collection (in the DB) you want to shard
  • theKey is the name of the indexed field you're going to use as a shard key
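
To avoid editing the placeholders by hand every time, the three commands can be generated from those four values. A rough sketch, assuming a hypothetical helper called shard_commands; it prints the commands so you can paste them into the mongo shell, and doesn't talk to MongoDB itself:

```shell
#!/bin/sh
# Hypothetical helper: fill the placeholders into the three admin commands.
# Arguments: shard name, seed host:port, database, collection, shard key field.
shard_commands() {
    shard="$1"; seed="$2"; db="$3"; coll="$4"; key="$5"
    printf 'db.adminCommand({addshard: "%s/%s"})\n' "$shard" "$seed"
    printf 'db.adminCommand({enableSharding: "%s"})\n' "$db"
    printf 'db.adminCommand({shardCollection: "%s.%s", key: {%s: 1}})\n' "$db" "$coll" "$key"
}

# Print the commands for the example values used in this post.
shard_commands myShrd Hostname.local:37017 myDB collection theKey
```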

All of this might seem a little cumbersome at first, but it's easy to get used to. Just try it and you'll see...

Extra: Running all on boot

The best way I've found to make this setup work at boot is an autostart script run via the nohup command (so we can keep a log). The script doesn't seem to work with other methods such as rc.local, only autostart. To implement this, run the following commands to download the script and make it executable:

1. wget https://spideroak.com/share/PBSW433EMVZXS43UMVWXG/78656e6f6465/srv/CDN/xenodecdn/replSet.sh -O ~/replSet.sh
2. chmod +x ~/replSet.sh

Once downloaded, edit it to match your own setup (change the Hostname variable, the name of the shard, the paths, etc.). If your server has a graphical desktop, adding this script won't be hard: in GNOME, for example, we just open the gnome-session-properties dialog (via Alt+F2) and add the following command there:

nohup path/to/the/script


And that's pretty much it: you're now bypassing MongoDB's limit by using sharding. To verify, reboot your computer (so the script can run) and then open the mongo shell in a terminal; you'll see something like: