With so much information available online, you have to decide carefully whether to go for NoSQL or not and, if you do, which database or datastore is the right choice for you.
To answer the first question, NoSQL or not, you should analyze your requirements in detail. Of the many points to consider, I will mention a few important ones here:
- Size of your data: Evaluate how large your data is. Is storing it in a relational database like MySQL or Oracle an option? How many tables do you need to create? How many columns will each table have on average and, most importantly, how many rows will each table have?
- Flexibility: In a relational database, you need to create the schema first. Do you need some flexibility there? For example, in one of my projects I worked on a logging module where the structure of the logs differed, so I wanted a flexible schema for it.
- Data Retrieval or Faster Write Access: Some applications require fast data retrieval, some require fast write access, and a few require both. Think of Google Search, where fast data retrieval is very important, whereas in an application like Twitter the large volume of tweets demands a lot of write operations.
- Concurrent Read & Write Access: It’s not just the speed of the application that matters but also the concurrent read and write access that should be taken into consideration. Think about the number of Facebook users writing simultaneously on various sections of the website.
- Are you creating an application for Analytics?
- Social network Integration: Does your application have social-network features? Facebook is a big inspiration to choose NoSQL if your application has similar features. Facebook Engineering Notes and Facebook Engineering Papers are good sources of information about the latest technologies used at Facebook.
- Would indexing, caching, etc. not solve your problem?
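The flexibility point above can be made concrete with a small sketch. It assumes three log records of differing shapes (all field names and values here are hypothetical, not taken from the project mentioned) and compares what a single fixed relational table would need against storing each record as its own document.

```python
import json

# Hypothetical log records whose structure differs from one entry to the next.
log_records = [
    {"level": "INFO", "message": "user logged in", "user_id": 42},
    {"level": "ERROR", "message": "payment failed", "order_id": "A-17", "retry_count": 3},
    {"level": "DEBUG", "message": "cache miss", "key": "profile:42", "latency_ms": 12.5},
]


def relational_columns(records):
    """Columns one fixed table would need: the union of all fields, in first-seen order."""
    columns = []
    for record in records:
        for field in record:
            if field not in columns:
                columns.append(field)
    return columns


def null_cells(records):
    """Number of cells that would be NULL if every record went into that one table."""
    columns = relational_columns(records)
    return sum(len(columns) - len(record) for record in records)


def as_documents(records):
    """Document-style storage: each record is serialized as-is, with only its own fields."""
    return [json.dumps(record, sort_keys=True) for record in records]


if __name__ == "__main__":
    print(relational_columns(log_records))
    print(null_cells(log_records))
    for doc in as_documents(log_records):
        print(doc)
```

With a fixed schema, every new kind of log entry means either a schema change or more empty columns; with documents, each entry simply carries the fields it has.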
Before I get to which database is appropriate for your application, a brief introduction to a few of the popular NoSQL databases is required. Though there are over 100 NoSQL databases available (click here for the entire list), a few popular ones are described below:
- MongoDB: Written in C++. It can be used from languages like Java, C, C++, PHP, etc. The data interchange format is BSON (Binary JSON).
- Cassandra: Written in Java, with Thrift used for the external client-facing API. It supports a wide range of languages (Thrift languages) and brings together the best features of Google’s BigTable and Amazon’s Dynamo. Cassandra was developed and later open-sourced by Facebook.
- HBase: It is built on top of Hadoop and written in Java. It is used when realtime read/write access to big data is needed.
- Neo4j: A NoSQL graph database.
- Redis: An advanced key-value store.
MongoDB and CouchDB are document stores, while Cassandra and HBase are column stores (aka ColumnFamilies).
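To give a feel for how these families differ, here is a rough sketch that models each one with plain Python structures. It is only an illustration of the data shapes, not the API of any of the databases named above, and every name and value in it is made up.

```python
# Key-value (Redis-like): a value looked up by a single key.
key_value = {"user:42:name": "Asha"}

# Document (MongoDB/CouchDB-like): a nested, self-contained record.
document = {
    "_id": 42,
    "name": "Asha",
    "emails": ["asha@example.com"],
    "address": {"city": "Pune"},
}

# Column family (Cassandra/HBase-like): row key -> column family -> column -> value.
# Rows are sparse: a row only holds the columns it actually has.
column_family = {
    "user42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2011-01-05"},
    },
    "user7": {
        "profile": {"name": "Ravi"},
    },
}

# Graph (Neo4j-like): nodes plus typed relationships between them.
nodes = {42: {"name": "Asha"}, 7: {"name": "Ravi"}}
edges = [(42, "FOLLOWS", 7)]


def followers_of(node_id):
    """Return the ids of nodes that have a FOLLOWS relationship to node_id."""
    return [src for src, rel, dst in edges if rel == "FOLLOWS" and dst == node_id]


if __name__ == "__main__":
    print(key_value["user:42:name"])
    print(document["address"]["city"])
    print(column_family["user42"]["profile"]["name"])
    print(followers_of(7))
```

The same piece of information (a user's name) is reachable in all four, but the shape around it decides which kinds of queries come naturally.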
Once you are convinced that your application requires NoSQL, here are a few points that will help you decide on a suitable NoSQL database/datastore.
My approach is to choose the one that suits your application’s data model. If your application has something like a social graph (some social networking features), use a graph database like Neo4j. If it needs to store and process a very large amount of data, use a column-oriented database like HBase (Google’s BigTable belongs to the ColumnFamilies group).
If you want fast lookups, choose something like Redis, which stores key/value pairs. When the data structure can vary and you need document-type storage, go for something like MongoDB. If your application requires high concurrency and low-latency data access, go for Membase.
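The rules of thumb above can be written down as a small lookup. This is just a sketch of the reasoning, with hypothetical names for the needs (the labels such as `fast_lookups` are my own shorthand), and it is no substitute for analyzing your requirements.

```python
# Data-model needs mapped to the stores suggested in the text above.
# The keys are hypothetical labels chosen for this sketch.
SUGGESTIONS = {
    "social_graph": "Neo4j",
    "very_large_data_processing": "HBase",
    "fast_lookups": "Redis",
    "varying_document_structure": "MongoDB",
    "high_concurrency_low_latency": "Membase",
}


def suggest_stores(needs):
    """Return the suggested store for each need, in order, without duplicates."""
    suggestions = []
    for need in needs:
        store = SUGGESTIONS.get(need)
        if store is None:
            raise ValueError(f"unknown need: {need!r}")
        if store not in suggestions:
            suggestions.append(store)
    return suggestions


if __name__ == "__main__":
    # An application with several different needs may end up with more than one store.
    print(suggest_stores(["fast_lookups", "social_graph"]))
```

Note that one application can map to more than one store when different modules have different needs.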
Even after you have chosen an appropriate NoSQL database, there are still certain things you should take note of:
1. Is the DB that you have chosen easy to manage and administer?
2. Developer’s angle: Do you have the right set of people who can get started on it quickly? Does it have enough forums and community support? Affirmative answers to these questions are a must, since NoSQL DBs have not matured yet and are still emerging.
3. Are open source communities actively building tools/frameworks/utilities around it in order to make the developer’s life easy?
There is a nice diagram available on the web, Visual Guide to NoSQL Systems, which shows which NoSQL DB fits where, based on the CAP theorem. Typically, choosing one DB may not solve all your problems. Select one keeping certain features of your application in mind, and feel free to go for another one for other modules. Also, this does not mean that you should remove relational databases completely. Personally, I always prefer to keep relational DBs in my applications and use them in places where they really make sense.
Since I am a regular reader of the High Scalability site, I would recommend going through this URL: www.highscalability.com/blog/category/nosql . It has around 38 informative articles on NoSQL.
Apart from this, InfoQ also has good content for NoSQL: www.infoq.com/nosql
Another hot place these days to get smart answers is Quora. Do read the various NoSQL-related questions and the answers written by developers, engineers and architects from top organizations like Facebook, Twitter, LinkedIn, Amazon and various hot startups at: www.quora.com/NoSQL.
The list is indeed big but I am not going to publish too many URLs to divert your attention 🙂