Getting Started with MongoDB on Windows with .NET

MongoDB is currently one of the most popular NoSQL databases. The idea behind this post is to provide a concise guide that will help you get started with MongoDB on Windows with .NET.

Follow the step-by-step guide below to get started:

  • Download the database server from http://www.mongodb.org/downloads. It’s available for 32-bit and 64-bit machines. Note that the 32-bit builds can store a maximum of around 2 GB of data. Refer to http://blog.mongodb.org/post/137788967/32-bit-limitations for details.
  • The 32-bit installation is around 18 MB in size: a zip file containing a handful of executables.
  • To start the server:
    • Create a batch file containing the command: mongod.exe --dbpath "C:\data\db". Here 'dbpath' specifies the location of the database as "C:\data\db", and any newly created database will be placed at this location.
    • Executing this command will start the server, and the console output will show it waiting for connections.

Now, let’s take a look at the available C# .NET drivers for MongoDB. The official driver can be downloaded from https://github.com/mongodb/mongo-csharp-driver/downloads and its details are at http://www.mongodb.org/display/DOCS/CSharp+Language+Center. You can also get a driver with a fluent interface on top of the official driver from https://github.com/craiggwilson/fluent-mongo. I found another driver, NoRM, at https://github.com/atheken/NoRM, which maps documents to C# classes. MongoVUE is an IDE available for Windows.

Sample Application:

  • Create a console application. This keeps the application free from any other code, and everyone can relate to it easily.
  • I will be using the NoRM driver in the sample application. Add NoRM.dll to your project references, and add this directive along with the other using statements: using Norm;
  • To obtain a handle to the database, a static method is available which takes a connection string. Use the following code: IMongo db = Mongo.Create("mongodb://localhost/MongoTest"); Here 'MongoTest' is the name of the database, and I am using the server on my local machine.
  • Assuming that the server is up and running, this call will return a valid handle to the 'MongoTest' database, even if the database has not been created yet. This call alone will not create the database; a subsequent call that saves a value into it will.
  • So, let’s assume that we have employee data to be saved. This is how you save an object of the Employee class: db.GetCollection<Employee>().Save(emp); Here 'emp' is an object of class 'Employee', and executing this line will actually create the database (if it has not been created already). It will also create a new collection named 'Employee' with one entry for the 'emp' object.
  • To get a list of employees: var employees = db.GetCollection<Employee>().AsQueryable().AsEnumerable();
  • To delete an employee: Employee emp = db.GetCollection<Employee>().AsQueryable().Where(e => e.Name == "emp1").SingleOrDefault(); db.GetCollection<Employee>().Delete(emp);
Yes, that’s all you have to do to get started with MongoDB from .NET. The full application is available at https://github.com/vipul15184/MongoTestConsoleApp, and here is the complete Program.cs file to consolidate:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Norm;

namespace MongoTestConsoleApp
{
    public class Employee
    {
        [MongoIdentifier]
        public ObjectId Id { get; set; }

        public string Name { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Employee emp = new Employee();
            emp.Name = "emp1";

            // Connect to the 'MongoTest' database on the local server
            IMongo db = Mongo.Create("mongodb://localhost/MongoTest");

            // Saving the object creates the database and the 'Employee'
            // collection if they do not exist yet
            db.GetCollection<Employee>().Save(emp);
        }
    }
}

NoSQL… Hang on a sec

There is so much information available online that you have to decide carefully whether to go for NoSQL or not, and if you do decide to use NoSQL, which database or datastore is the right choice for you.

Now, to answer the first question, NoSQL or not, you should analyze your requirements in detail. Among the numerous points that have to be considered, I will mention a few important ones here:

  1. Size of your data: Evaluate how large your data is. Is storing your data in a relational database like MySQL or Oracle an option? How many tables do you need to create? How many columns per table will you have on average, and most importantly, how many rows will each table have?
  2. Flexibility: In a relational database, you need to create the schema first. Do you need some flexibility there? For example, in one of my projects I worked on a logging module where the structure of the logs differed, so I wanted a flexible schema for it.
  3. Data Retrieval or Faster Write Access: While some applications require fast data retrieval, some require fast write access, and a few require both. Think about Google Search, where fast data retrieval is very important, whereas an application like Twitter, with its high volume of tweets, requires lots of write operations.
  4. Concurrent Read & Write Access: It’s not just the speed of the application that matters; concurrent read and write access should also be taken into consideration. Think about the number of Facebook users writing simultaneously on various sections of the website.
  5. Are you creating an application for Analytics? 
  6. Social network Integration: Does your application have social-network features? Facebook is a big inspiration to choose NoSQL if your application has similar features. Facebook Engineering Notes and Facebook Engineering Papers are good sources of information about the latest technologies used at Facebook.
  7. Are indexing, caching, etc. not a solution to your problem?
Before I proceed to which database is appropriate for your application, a brief introduction to a few of the popular NoSQL databases is in order. Though there are over 100 NoSQL databases available, a few popular ones are described below:

  • MongoDB: Written in C++, with drivers for languages like Java, C, C++, PHP, etc. The data interchange format is BSON (Binary JSON).
  • CouchDB: Written in Erlang, with JavaScript as the main query language. Uses JSON for documents and REST as its protocol. Useful in cases where data changes occur often and you want to run predefined queries.
  • Cassandra: Written in Java; Thrift is used for the external client-facing API. It supports a wide range of languages (the Thrift languages) and brings together the best blend of features from Google’s BigTable and Amazon’s Dynamo. Cassandra was developed, and later open-sourced, by Facebook.
  • HBase: It is built on top of Hadoop and written in Java. It is used when realtime read/write access to big data is needed.
  • Neo4j: A NoSQL graph database.
  • Redis: An advanced key-value store.
MongoDB and CouchDB are document stores, while Cassandra and HBase are column stores (a.k.a. column-family stores).
Once you are convinced that your application requires NoSQL, here are a few points which will help you decide on a suitable NoSQL database/datastore.

My approach is to choose the one that suits your application’s data model. If your application has something like a social graph (some of the social networking features), then use a graph database like Neo4j. If your application requires storing and processing very large amounts of data, then use a column-oriented database like HBase (Google’s BigTable belongs to the column-family category).
If you want fast lookups, then you should choose something like Redis, which supports key/value pairs. When the data structure can vary and you need document-type storage, go for something like MongoDB. If your application requires high concurrency and low-latency data access, then go for Membase.

Now, even if you have chosen the appropriate NoSQL database, there are still certain things you should make a note of:

1. Is the DB that you have chosen easy to manage and administer?
2. Developer’s Angle: Do you have the right set of people who can get started quickly on this? Does it have enough forums and community support? Affirmative answers to these questions are a must, since NoSQL DBs have not fully matured yet and are still emerging.
3. Are open source communities actively building tools/frameworks/utilities around it in order to make the developer’s life easy?

There is a nice diagram available on the web, the Visual Guide to NoSQL Systems, which shows where each NoSQL DB fits based on the CAP theorem. Typically, choosing one DB may not solve all your problems. Select one keeping certain features of your application in mind, and feel free to go for another one for other modules. Also, this does not mean that you remove relational databases completely. Personally, I always prefer to keep relational DBs in my applications and use them in the places where they really make sense.

Recommended sites:

Since I am a regular reader of the High Scalability site, I would recommend going through this URL: www.highscalability.com/blog/category/nosql. It has around 38 informative articles on NoSQL.

Apart from this, InfoQ also has good content for NoSQL: www.infoq.com/nosql

Another hot place these days to get smart answers is Quora. Do read the various NoSQL-related questions and the answers written by developers, engineers and architects from top organizations like Facebook, Twitter, LinkedIn, Amazon and various hot startups at: www.quora.com/NoSQL.

The list is indeed long, but I am not going to publish too many URLs and divert your attention 🙂

CAP Theorem

Wikipedia (http://en.wikipedia.org/wiki/CAP_theorem): In theoretical computer science, the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency, Availability and Partition tolerance.

The Visual Guide to NoSQL Systems mentioned earlier shows where each NoSQL DB fits based on the CAP theorem. The theorem was put forward by Eric Brewer and, in brief, states that a system can have only two out of the three properties: Consistency, Availability and Partition tolerance.

An application is considered consistent if an operation is performed fully or not at all; if an operation gets performed halfway, inconsistency is created. To give you an example, there are many deals sites at present, and some deals are limited to a specific count, say 100. Suppose you are about to become the 100th consumer of a deal and are paying for it when someone else selects the same deal and proceeds to payment. Both of you cannot be the 100th consumer of the deal; so once you have confirmed your selection, the other buyer should not be allowed to proceed to payment at all.

Availability states that the service remains available. Taking the same example as above: if you want to buy something in a deal but the site itself refuses to open, perhaps because of heavy traffic, that non-availability of service makes the deal of no use.

Partitioning is not needed if all your data resides and runs on a single box. But when the amount of data is huge, it is very likely that the data will be partitioned and distributed. For a system to be partition-tolerant, it should respond correctly in every case except a total network failure.

In most of the NoSQL DBs, it is consistency that gets compromised. As noted earlier, choose a DB keeping certain features of your application in mind, and keep relational databases where they make sense.

In fact, if your application is large enough, you can use a combination of NoSQL databases, utilizing the required features of each of them, plus a traditional database.

Build Automation with Gradle

Gradle is the latest build automation tool in vogue, particularly for Java projects. It aims to keep the flexibility provided by Ant builds, which Maven lacks, and combines it with the build-by-convention functionality that Maven provides, while remaining much more flexible and configurable. It provides dependency management through Ivy, and very good multi-project support through its incremental build methodology as well as support for partial builds.

WHY USE GRADLE?
Gradle builds are written in its own Groovy-based DSL, known as the Gradle Build Language. It is a build programming language with features for organizing build logic in a more readable and maintainable form. For instance, a set of instructions that is used repeatedly can be extracted into a method, which can then be called with appropriate parameters from each task, thus avoiding duplicated code. Gradle’s developers cite Groovy’s closeness to Java in terms of syntax, type system and package structure as the reason for using it for the DSL. The argument is that although Gradle is a general-purpose build tool at its core, its main focus is Java projects, and among dynamic scripting languages like Python, Groovy and Ruby, Groovy provides the greatest transparency and ease of understanding for Java developers.

GRADLE BUILD FILE
Gradle uses a Gradle build file as the build script for a project. It is Gradle’s substitute for Maven’s pom.xml and Ant’s build.xml. The file is actually a build configuration script rather than a build script. The gradle command looks for this file to start the build.

There are two basic building blocks in Gradle: projects and tasks. A project represents any component of the software that can be built, for instance a library JAR, a web application WAR, or a distribution ZIP assembled from the JARs produced by various projects. It could also be something to be done, like deploying the application to various environments. A task represents an atomic piece of work that a build performs, like deleting older classes, compiling some classes, creating a JAR, generating Javadoc or publishing archives to a repository. Thus, tasks are the building blocks of builds in Gradle.

To illustrate a few things that stood out for us while evaluating Gradle: build scripts are made up of code, so you can leverage the full power and flexibility of Groovy. Thus, a build can have easy-to-read, reusable code blocks like:

task upper << {
    String someString = 'mY_nAmE'
    println "Original: " + someString
    println "Upper case: " + someString.toUpperCase()
}

output:
> gradle -q upper
Original: mY_nAmE
Upper case: MY_NAME
Iterations can be done with ease when needed. For example:

task count << {
    4.times { print "$it " }
}

output:
> gradle -q count
0 1 2 3
Methods can be extracted out of the logic and reused. For example:

task checksum << {
    fileList('../tempFileDirectory').each { File file ->
        Ant.checksum(file: file, property: "cs_$file.name")
        println "$file.name Checksum: ${Ant.properties["cs_$file.name"]}"
    }
}

task loadfile << {
    fileList('../tempFileDirectory').each { File file ->
        Ant.loadfile(srcFile: file, property: file.name)
        println "I'm fond of $file.name"
    }
}

File[] fileList(String dir) {
    file(dir).listFiles({ file -> file.isFile() } as FileFilter).sort()
}

Another point to note here is the use of Ant tasks (Ant.checksum and Ant.loadfile). This demonstrates how Ant tasks are treated as first-class citizens in Gradle. The Java plugin that ships with Gradle’s distribution is also a neat addition and reinforces the claim of useful build-by-convention support: it defines a set of conventional build tasks like clean, compile, test and assemble for Java projects.

In conclusion, Gradle has the potential to replace existing build tools and processes in a big way. However, the move from existing systems to Gradle has understandably been limited across projects. This can be attributed to the small yet measurable learning curve that comes with moving to Gradle, the relatively low importance attributed to build systems in a project, or developers preferring systems they are already comfortable with. Gradle is definitely an option worth considering if you are starting a new project, or if your current build tools aren’t cutting it for you anymore.

Better Unit Testing with Mockito

Unit tests are like guidelines that help you test right. They guide your design to be loosely coupled and well etched out, and they provide fast automated regression for refactors and small changes to the code.

The best unit-testing scenario is an independent, isolated class. Unit testing is harder with dependencies on remote method calls, file system operations, DB operations, etc. Unit testing means automation and minimal setup, but all these dependencies require initial setup and take a very long time to execute. They also make it almost impossible to test the class for negative cases, e.g. network failure, an inaccessible file system, or DB errors: you would need to change the response of these dependencies to execute each unit test case for the class. Mocks come to the rescue of unit tests. To mock doesn’t mean to make a mockery of any object.

Mockito Framework

Mockito is an open source testing framework for Java. It allows the creation of test double objects, called “mock objects”, in automated unit tests for the purpose of test-driven development or behavior-driven development. The Mockito framework enables mock creation, stubbing and verification.

What is Object Mocking?
Mock objects simulate (fake) real objects, using the same interface(s) or class as the real object. A mock object allows you to set positive or negative expectations, and it lets you verify whether those expectations were met, i.e. it records all the interactions so they can be verified later.

When to Mock?
Object mocking should be conducted when the real object:

  • Has a behavior that is hard to cause or is non-deterministic.
  • Is slow and difficult to set up.
  • Has (or is) a UI.
  • Does not exist yet. For example, if Team A is working on X and requires Y from Team B while Team B is still building Y, Team A can mock Y to start work on X.
  • Cannot easily be made to misbehave; a mock can simulate both behavior and ill-behavior, as shown in the sketch below.
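To make this concrete, here is a minimal sketch of mock creation, stubbing and verification with Mockito’s core API (mock, when/thenReturn and verify); it mirrors the List example from Mockito’s own documentation:

import static org.mockito.Mockito.*;
import java.util.List;

public class MockitoSketch {
    public static void main(String[] args) {
        // Create a mock of the List interface; no real list is ever touched
        List mockedList = mock(List.class);

        // Stubbing: tell the mock what to return for a given call
        when(mockedList.get(0)).thenReturn("first");
        System.out.println(mockedList.get(0));   // prints "first"

        // Verification: assert that the recorded interaction happened
        verify(mockedList).get(0);
    }
}

In a real project this would live in a JUnit test method rather than main, but the three steps (mock, stub, verify) are the same.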

To know more about Mockito, visit: http://docs.mockito.googlecode.com/hg/org/mockito/Mockito.html

Agile Database Migration Tools for Java

Agile methodologies of software development require a major shift in the approach to database management. The reason is that requirements are never really frozen during agile development. Though changes are controlled, the attitude of the process is to enable change as much as possible, whether in response to the inherent instability of requirements in many projects or to better support dynamic business environments.

Thus agile development needs tools that enable and support evolutionary database design, while solving problems like porting database schema changes to an instance holding critical data.

Ruby on Rails, which is designed for agile development, provides built-in capabilities to cope with this problem in the form of Active Record migrations. Java currently doesn’t have a solution of that pedigree. There are a few viable alternatives though; the most popular of the current breed is Liquibase. I will present an evaluation here which might help you choose the best tool.

Liquibase: Here are my observations from trying to integrate it with a Spring and Hibernate 3 annotation-based project. Liquibase is described as a substitute for Hibernate’s hbm2ddl, and it can be integrated with an existing project in a few simple steps. The advantages are Spring integration, so dev environments always have up-to-date databases, and version management: changesets are identified by ID and stored in a table after being applied to a database, and changelogs can be written in SQL.

Liquibase gives two ways of writing migrations, XML-based and SQL-based. Its biggest disadvantage is the inability to write Java-based migrations. This led us to search for a solution that provides most of the features present in Liquibase, as well as support for Java-based migrations.

c5-db-migration: c5-db-migration supports migrations written in Groovy, as well as migrations from within the application, i.e. it provides APIs for migrations. However, it doesn’t support migrations written in Java, there is no multiple-schema support, and it isn’t present in Maven Central.

Migrate4j: This tool supports Java migrations and also provides an API for migrations. The disadvantage is that it doesn’t support even plain SQL migrations and is terribly short on features (no auto-creation of the metadata table, no multiple-schema support and no Maven support).

Flyway: Flyway offers cleaner versioning than Liquibase, migrations can be written in Java as well as SQL, and it supports auto-discovery of migrations in project packages. The one missing feature is rollback support, though that is more of a design choice taken by the Flyway developers.
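To give a feel for it, here is a minimal sketch, assuming a recent Flyway release (BaseJavaMigration and the fluent configure API come from Flyway 5+; the index, table and datasource below are invented for illustration). A Java-based migration is just a class following Flyway’s V<version>__<description> naming convention:

import org.flywaydb.core.api.migration.BaseJavaMigration;
import org.flywaydb.core.api.migration.Context;
import java.sql.Statement;

// Auto-discovered when it sits in a package Flyway scans for migrations
public class V2__Add_employee_index extends BaseJavaMigration {
    @Override
    public void migrate(Context context) throws Exception {
        try (Statement st = context.getConnection().createStatement()) {
            st.execute("CREATE INDEX idx_employee_name ON employee(name)");
        }
    }
}

Applying all pending SQL and Java migrations at application startup then takes only a few lines:

import org.flywaydb.core.Flyway;

Flyway flyway = Flyway.configure()
        .dataSource("jdbc:h2:mem:appdb", "sa", "")   // placeholder datasource
        .load();
flyway.migrate();   // applies everything newer than the schema history table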

After careful evaluation of each of these tools we decided to go ahead with Flyway for the project, and it has been great so far.

Building Web Services with JAX-WS: An Introduction

Developing SOAP-based web services seems difficult because the message formats are complex. The JAX-WS API makes it easy by hiding this complexity from the application developer. JAX-WS is the abbreviation for Java API for XML Web Services, a technology for building web services and clients that communicate using XML.

JAX-WS allows developers to write message-oriented as well as RPC-oriented web services.

A web service operation invocation in JAX-WS is represented by an XML-based protocol like SOAP. The envelope structure, encoding rules and conventions for representing web service invocations and responses are defined by the SOAP specification. These calls and responses are transmitted as SOAP messages over HTTP.

Though SOAP messages appear complex, the JAX-WS API hides this complexity from the application developer. On the server side, you specify the web service operations by defining methods in an interface written in the Java programming language, and you code one or more classes that implement these methods.
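As a minimal sketch of the server side (the class and operation names here are invented for illustration): annotating a class with @WebService is enough for the JAX-WS runtime to expose its public methods as SOAP operations, and Endpoint.publish starts a lightweight server that also serves the generated WSDL:

import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

@WebService
public class HelloService {
    @WebMethod
    public String sayHello(String name) {
        return "Hello, " + name;
    }

    public static void main(String[] args) {
        // WSDL becomes available at http://localhost:8080/hello?wsdl
        Endpoint.publish("http://localhost:8080/hello", new HelloService());
    }
}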

It is equally easy to write client code. A client creates a proxy, a local object representing the service, and then invokes methods on the proxy. With JAX-WS you do not generate or parse SOAP messages yourself; the JAX-WS runtime system converts the API calls and responses to and from SOAP messages for you!
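In practice you typically generate the client-side proxy classes from the WSDL with the JDK’s wsimport tool (for the sketch above: wsimport -keep http://localhost:8080/hello?wsdl) and then call the proxy like a local object. The names below are the ones wsimport would derive by convention from that service, so treat them as assumptions:

// HelloServiceService is the wsimport-generated subclass of javax.xml.ws.Service
HelloServiceService service = new HelloServiceService();
HelloService proxy = service.getHelloServicePort();

// The runtime turns this call into a SOAP request over HTTP and
// parses the SOAP response back into a String
System.out.println(proxy.sayHello("world"));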

With JAX-WS, web services and clients have a big advantage: the platform independence of the Java programming language. Additionally, JAX-WS is not restrictive: a JAX-WS client can access a web service that is not running on the Java platform, and vice versa. This flexibility is possible because JAX-WS uses technologies defined by the World Wide Web Consortium: HTTP, SOAP and the Web Services Description Language (WSDL). WSDL specifies an XML format to describe a service as a set of endpoints operating on messages.

The three most popular implementations of JAX-WS are:

Metro: Developed and open-sourced by Sun Microsystems, Metro incorporates the reference implementations of the JAXB 2.x data-binding and JAX-WS 2.x web services standards, along with other XML-related Java standards. Here is the project link: http://jax-ws.java.net/. You can go through the documentation and download the implementation.

Axis: The original Apache Axis was based on the first Java standard for web services, JAX-RPC. This did not turn out to be a great approach, because JAX-RPC constrained the internal design of the Axis code, caused performance issues and limited flexibility. JAX-RPC also made some assumptions about the direction of web services development which turned out to be wrong!

By the time Axis2 development started, a replacement for JAX-RPC had already come into the picture, so Axis2 was designed to be flexible enough to support the replacement web services standard on top of the base framework. Recent versions of Axis2 have implemented support for both the JAXB 2.x Java XML data-binding standard and the JAX-WS 2.x Java web services standard that replaced JAX-RPC. (JAX-RPC is the Java API for XML-based Remote Procedure Calls. Since RPC enables clients to execute procedures on other systems, it is often used in a distributed client-server model; RPC in JAX-RPC is represented by an XML-based protocol such as SOAP.) Here is the project link: http://axis.apache.org/axis2/java/core/

CXF: Another web services stack from Apache. Though both Axis2 and CXF originated at Apache, they take very different approaches to how web services are configured and delivered. CXF is very well documented and has much more flexibility and additional functionality if you’re willing to go beyond the JAX-WS specification. It also supports Spring. Here is the project link: http://cxf.apache.org/

At Talentica we have used all the three implementations mentioned above in various projects.

Why CXF is my choice
Every framework has its strengths, but CXF is my choice. It has great integration with the Spring framework, and I am a huge fan of Spring projects. It’s modular and easy to use. It has great community support, and you can find lots of tutorials and resources online. It also supports both the JAX-WS and JAX-RS specifications, though that should not be the deciding factor when choosing CXF. Performance-wise it is better than Axis, and it gives almost the same performance as Metro.

So CXF is my choice, what is yours?

Event: Node.js Over the Air!

Over the Air sessions at Talentica are technical workshops where a bunch of developers roll up their sleeves, tinker around with new platforms/technologies to learn together, gather new insights and get a healthy dollop of inspiration. Last week we had an “Over the Air” session on Node.js.

Node.js is a server-side JavaScript runtime that changes the notion of how a server should work. Its goal is to enable a programmer to build highly scalable applications that handle tens of thousands of simultaneous connections on a single server machine. Node.js is one of the most talked-about technologies today, and to see how it really works, we picked it up for this Over the Air session.

Once we gathered, it took a little while for some of the participants to get used to the event-driven programming style, but pretty soon we were all working together on building a cool chat app. By the end of the day, we had a fully working version of a chat room app in which any user can enter the chat room by simply entering a nickname. Subsequent messages are posted to all logged-in users, and the right-side pane shows everyone who is logged in.

This is a fairly decent basic version. Going forward, we plan to enhance the user interface so that people can play games using the chat app, integrate the UI with the chat engine, and enable users to challenge each other to play while chatting.

First Impressions
Node.js is an excellent option for interactive apps. I will not hesitate to use Node.js in products that require interactive functionality like chat, auctions or online multiplayer games. One can also use Node.js for just the part of a product it suits, rather than building the complete product on it.

The fact that we can code the server side with JavaScript should make JavaScript developers jump with joy. Code reuse between the client and server sides might actually be possible!

On the negative side, I am not sure the event-driven programming model is a good one on the server side; it might lead to spaghetti code with callbacks all over the place. Another thing is that, though the community is very active and plug-ins are being developed at a rapid pace, it is still not a tried and tested technology at this moment!

Multi-server Applications on the Wireless Web

Here we will discuss how to build web applications that can serve wireless clients according to each client’s capabilities.

What are the challenges?
Development of mobile applications is often highly dependent on the target platform. When developing any mobile content portal, we generally think about the accessibility of that portal through mobile browsers (Nokia, Openwave and i-mode browsers, AvantGo on PDAs, etc.), which use markup languages like WML, HDML, cHTML and XHTML. We want to ensure that each browser gets a compatible markup language and can present the portal content in the correct format. In short, creating a wireless application that works on as many devices as possible is not so much difficult as futile: if you invest a huge amount of resources today, chances are that a new device will ship tomorrow and you’ll need to tweak your application all over again.

What is the solution?
Wireless Universal Resource File (WURFL) is an open source project that uses XML to describe the capabilities of wireless devices. It is a database (some call it a “repository”) of wireless device capabilities. With WURFL, figuring out which phone works with which technology is a whole lot easier. We can use WURFL to figure out device capabilities programmatically and serve different content to different devices dynamically, depending on the device accessing the content.

Here are some of the things WURFL can help you know about a device:

  • Screen size of the device
  • Supported image, audio, video, ringtone, wallpaper, and screensaver formats
  • Whether the device supports Unicode
  • Whether it is a wireless device, and what markup it supports
  • What XHTML MP/WML/cHTML features does it support? Does it work with tables? Can it work with standard HTML?
  • Does it have a pointing device? Can it use CSS?
  • Does it have Flash Lite/J2ME support? What features?
  • Can images be used as links on this device? Can it display image and text on the same line?
  • If it is an i-mode phone, what region is it from: Japan, US or Europe?
  • Does the device auto-expand a select drop down? Does it have inline input for text fields?
  • What SMS/MMS features are supported?

The WURFL framework also contains tools, utilities and libraries to parse and query the stored data. The WURFL API is available in many programming languages, including Java, PHP, .NET, Ruby and Python. Various open source tools are built around WURFL: HAWHAW (PHP), WALL (Java), HAWHAW.NET (.NET framework), HawTag (a JSP custom tag library), etc.

How does WURFL work?
When a mobile or non-mobile web browser visits your site, it sends a user agent string along with the request for your page. The user agent contains information about the type of device and browser being used. Unfortunately, this information is very limited and at times not representative of the actual device. The framework looks the user agent up through the WURFL API and extracts the capabilities associated with that device. Based on those capabilities, the framework creates the dynamic content: WML, HTML, XHTML, etc.
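As a sketch of what that looks like in code, here is a small example using the WURFL Java API (the class names are from the old net.sourceforge.wurfl packages and vary by version; the capability names are standard WURFL capabilities, but the width threshold and fallback markup are invented for illustration):

import javax.servlet.http.HttpServletRequest;
import net.sourceforge.wurfl.core.Device;
import net.sourceforge.wurfl.core.WURFLManager;

public class MarkupChooser {
    private final WURFLManager wurfl;

    public MarkupChooser(WURFLManager wurfl) {
        this.wurfl = wurfl;
    }

    // Decide which markup to serve based on the requesting device's capabilities
    public String chooseMarkup(HttpServletRequest request) {
        Device device = wurfl.getDeviceForRequest(request);
        String markup = device.getCapability("preferred_markup");
        int width = Integer.parseInt(device.getCapability("resolution_width"));
        // Fall back to plain WML for very narrow screens (threshold is arbitrary)
        return width < 128 ? "wml_1_1" : markup;
    }
}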

Though there is a concern about the extra latency of the user-agent lookup, the advantages make WURFL well worth using. One of the biggest advantages concerns new devices: if and when a new device enters the market, we do not need to change our application, just update the WURFL data to keep the application optimized. It is very simple and the architecture is sound. Go for it!!!