NoSQL - MongoDB Database

MongoDB is an open source product developed and supported by MongoDB, Inc. (originally named 10gen), and it is a leading NoSQL database. In this blog, we explain the main MongoDB topics: inserting, updating, deleting and querying documents, projection, the sort() and limit() methods, creating and dropping collections, and so on.

"What was the need for MongoDB when so many databases were already in use?"

Modern applications deal with big data and need fast feature development and flexible deployment. Older database systems were not well suited to these demands, so a database like MongoDB was needed.

MongoDB was built with the following goals:

Scalability
Performance
High Availability
Scaling from single server deployments to large, complex multi-site architectures.

MongoDB works on the concepts of collections and documents.

Database:

Database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.

Collection:

Collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema. Documents within a collection can have different fields. Typically, all documents in a collection are of similar or related purpose.

Stores natural aggregates just as they are normally accessed.
Documents are typically represented as JSON (MongoDB stores them in a binary form called BSON), although some document stores use XML.
Each document is referenced by a key.

Document:

A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
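
To make the dynamic-schema idea concrete, here is a minimal JavaScript sketch that models a collection as a plain in-memory array of objects. No MongoDB server is involved; the field names follow the student samples in this post, while `Hobbies` and the string-valued `Age` are made up purely for illustration.

```javascript
// A collection modelled as an array of documents (plain objects).
const students = [
  { studentNo: 1, FName: "Rajiv", Age: 34 },
  // Same collection, different shape: an extra field, and Age held as a string.
  { studentNo: 2, FName: "Mahesh", Age: "forty", Hobbies: ["chess"] },
];

// Collect the field names used by each document.
const fieldSets = students.map((doc) => Object.keys(doc).sort().join(","));
console.log(fieldSets);

// The same field may hold different types in different documents.
const ageTypes = students.map((doc) => typeof doc.Age);
console.log(ageTypes);
```

Both objects sit side by side in the same array without any declared schema, which is the behaviour the paragraph above describes for documents in one collection.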

In short, a collection in MongoDB is equivalent to a table in an RDBMS, and a document is equivalent to a row.

Here is a sample document (row) in MongoDB:

{
  "_id": ObjectId("5a6eb114b1a9d60b8250b4d1"),
  "StudentNo": 1,
  "FName": "Rajiv",
  "LName": "Gupta",
  "Gender": "M",
  "Age": 34
}

_id is a 12-byte value, usually displayed as a 24-character hexadecimal string, which ensures that every document in a collection is unique.
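
The first 4 bytes of a generated ObjectId hold its creation time in seconds since the Unix epoch. The sketch below decodes that timestamp from the hexadecimal string in plain JavaScript. The helper name `objectIdTimestamp` is an assumption made for this example, not a MongoDB API.

```javascript
// Decode the creation time embedded in an ObjectId's hex string.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hexadecimal string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Sample id taken from an example later in this post.
const created = objectIdTimestamp("5a6eb114b1a9d60b8250b4d1");
console.log(created.toISOString());
```

Because the timestamp comes first, sorting on _id roughly sorts documents by creation time.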

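Any relational database has a typical schema design that shows the number of tables and the relationships between them. MongoDB has no built-in concept of relationships such as foreign keys; related data is either embedded inside a document or linked by storing references that the application (or $lookup) resolves.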

Storage:
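MongoDB provides "GridFS", a convention for storing large files (images, videos, text and many other types) in binary form. GridFS splits each file into chunks and stores them as ordinary documents, which makes it possible to store files larger than the 16 MB limit on a single document.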
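
As a rough illustration of how GridFS-style chunking works, the sketch below splits a byte buffer into chunk documents using the default GridFS chunk size of 255 KiB. It runs in plain Node.js without a database; the function name `toChunks` and the file id `"demo-file"` are hypothetical, and the real drivers do considerably more (metadata, indexes, streaming).

```javascript
// Split a file's bytes into GridFS-style chunk documents.
const DEFAULT_CHUNK_SIZE = 255 * 1024; // GridFS default chunk size in bytes

function toChunks(fileId, bytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId, // links the chunk to its file document
      n,                // sequence number of the chunk
      data: bytes.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// A hypothetical 600 KiB file filled with zeros.
const chunks = toChunks("demo-file", Buffer.alloc(600 * 1024));
console.log(chunks.map((c) => c.data.length));
```

Reading the file back is a matter of fetching the chunks for one `files_id` and concatenating them in order of `n`.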

Advantages of MongoDB over RDBMS:

Schema less − MongoDB is a document database in which one collection holds different documents. Number of fields, content and size of the document can differ from one document to another.
Structure of a single object is clear.
No complex joins.
Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
Ease of scale-out − MongoDB is easy to scale.
Conversion/mapping of application objects to database objects not needed.
Uses internal memory for storing the (windowed) working set, enabling faster access of data.

Where to Use MongoDB?

Big Data
Semi-Structured Content Management.
Real-time Analytics & High Speed Logging.
Need Caching & High Scalability
Mobile and Social Infrastructure
User Data Management
Data Hub

So wherever data lands as natural aggregate, MongoDB becomes a solid option to go for.

MongoDB is not a great fit for highly transactional applications. Multi-document ACID transactions were only added in version 4.0.

SQL vs MongoDB Terminology:

SQL	MongoDB
Database	Database
Table	Collection
Row	Document
Column	Field
Index	Index
Joins	Embedded documents, $lookup
Primary key (single or multiple columns)	_id field
Aggregation (GROUP BY)	Aggregation pipeline

Let's dive deeper into MongoDB and see how to use it in detail.

First, download MongoDB onto your Windows/Unix system. You can find the download link here.

Now let's explore the MongoDB commands.

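//Operators used inside a query filter
$gt (greater than), $lt (less than), $gte (greater than or equal), $lte (less than or equal), $ne (not equal), $in, $nin (not in), $all (array contains all listed values)

//Cursor and collection methods covered further down: count, limit, skip, aggregate ($group), etc.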
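
To show what these comparison operators mean, here is a small in-memory matcher written in plain JavaScript. It is only an illustration of the semantics for top-level scalar fields, not how the MongoDB server evaluates queries; the `matches` function and the `operators` table are names invented for this sketch.

```javascript
// Comparison operators expressed as small predicate functions.
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

// True when `doc` satisfies every condition in `filter`.
// Handles only top-level scalar fields.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    const value = doc[field];
    if (condition === null || typeof condition !== "object") {
      return value === condition; // plain equality, e.g. { Age: 30 }
    }
    return Object.entries(condition).every(([op, arg]) => {
      if (!(op in operators)) throw new Error(`unsupported operator: ${op}`);
      return operators[op](value, arg);
    });
  });
}

const sample = [
  { FName: "Rajee", Age: 30 },
  { FName: "Mahesh", Age: 40 },
  { FName: "Raj", Age: 60 },
];

const olderThan35 = sample.filter((d) => matches(d, { Age: { $gt: 35 } }));
console.log(olderThan35.map((d) => d.FName));
```

Several operators on the same field are combined with AND, so `{ Age: { $gte: 40, $lt: 60 } }` selects ages from 40 up to, but not including, 60.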

// Creating/using DB

use school

//Creating collection(table) under 'School' DB

db.createCollection("students")

//Alternatively, insert directly: the collection is created automatically if it does not exist yet.

db.students.insertOne({
  "studentNo": 1,
  "FName": "Rajiv2",
  "LName": "Gupta",
  "Gender": "M",
  "Age": 30,
  "Marks": {"Math": 80, "Science": 90, "Physics": 78}
})

//Inserting multiple documents in one call
//Marks is an array for students 2 and 3 and an embedded document for 4 and 5; MongoDB accepts both shapes in one collection.

db.students.insertMany([
  {
    "studentNo": 2,
    "FName": "Rajee",
    "LName": "Gupta",
    "Gender": "M",
    "Age": 30,
    "Marks": [{"Math": 80, "Science": 90, "Physics": 78}]
  },
  {
    "studentNo": 3,
    "FName": "Mahesh",
    "LName": "Gupta",
    "Gender": "M",
    "Age": 40,
    "Marks": [{"Math": 70, "Science": 90, "Physics": 78}]
  },
  {
    "studentNo": 4,
    "FName": "Raj",
    "LName": "Gupta",
    "Gender": "M",
    "Age": 60,
    "Marks": {"Math": 60, "Science": 90, "Physics": 78}
  },
  {
    "studentNo": 5,
    "FName": "Rajat",
    "LName": "Gupta",
    "Gender": "M",
    "Age": 30,
    "Marks": {"Math": 90, "Science": 90, "Physics": 78}
  }
])

//See list of collection in db

show collections

//Dropping a collection

db.students.drop()

//Dropping the current database

db.dropDatabase()

//Querying documents in a collection

db.students.find()

//Showing results in Pretty form

db.students.find().pretty()

//Get only 1st Record

db.students.findOne()

//find specific records

db.students.find({"Age":36}).pretty()

db.students.find({"Age":{$gt:35}}).pretty()

db.students.find({"Age":{$gte:35}}).pretty()

db.students.find({"Age":{$lt:35}}).pretty()

db.students.find({"Age":{$lte:35}}).pretty()

// Find records with And operator

db.students.find({"Age":66 ,"FName":"Rajiv6"}).pretty()

// Find records with Or operator

db.students.find(

{

$or:[{"Age":66},{"Age":36}]

}

).pretty()

//Find Records with And ,Or operator together

db.students.find(

{

"FName":"Rajiv6" , $or:[{"Age":66},{"Age":36}]

}

).pretty()

//using in operator

db.students.find(

{

"Age":{$in:[10,20,36]}

}

).pretty()

//Using Regexp to get records ,FName starting with letter 'R'

db.students.find(
  {"FName": /^R/}
).pretty()

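// Find records using a nested document filter.
// Dot notation matches on a single nested field; {"Marks": {Math: 80}} would only match documents whose Marks is exactly {Math: 80}.

db.students.find(
  {"Marks.Math": 80}
).pretty()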
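
A filter on an embedded document can either compare the whole sub-document or look at one nested field through dot notation such as "Marks.Math". The plain JavaScript sketch below contrasts the two on an in-memory object; `getPath` is a hypothetical helper written for this example, and the JSON string comparison is only a stand-in for MongoDB's exact sub-document equality.

```javascript
// Resolve a dotted path such as "Marks.Math" inside a nested document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (current, key) => (current == null ? undefined : current[key]),
    doc
  );
}

const student = {
  FName: "Rajiv2",
  Marks: { Math: 80, Science: 90, Physics: 78 },
};

// Dot notation looks at one nested field only.
console.log(getPath(student, "Marks.Math") === 80);

// Whole-sub-document equality compares every field, so { Math: 80 }
// alone is not equal to the three-field Marks document.
const sameAsFilter =
  JSON.stringify(student.Marks) === JSON.stringify({ Math: 80 });
console.log(sameAsFilter);
```

This is why a whole-sub-document filter often returns nothing when the stored document carries additional fields.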

// Update only the first matching record

db.students.updateOne(
  {"Age": 26},
  { $set: {"LName": "Mittal"} }
)

// Update all matching records (equivalent to update() with {multi: true})

db.students.updateMany(
  {"Age": 26},
  { $set: {"LName": "Agrawal"} }
)
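
The difference between updating the first match and updating every match can be sketched on an in-memory array. This is plain JavaScript and not driver code; `applySet` and the sample names "A" and "B" are made up for the illustration.

```javascript
// Apply a $set-style change to matching documents in an in-memory array.
// `multi` mirrors the shell option: false touches only the first match.
function applySet(docs, predicate, changes, multi = false) {
  let modified = 0;
  for (const doc of docs) {
    if (!predicate(doc)) continue;
    Object.assign(doc, changes); // overwrite or add the listed fields only
    modified += 1;
    if (!multi) break;
  }
  return modified;
}

const roster = [
  { FName: "A", Age: 26, LName: "Gupta" },
  { FName: "B", Age: 26, LName: "Gupta" },
];

const first = applySet(roster, (d) => d.Age === 26, { LName: "Mittal" });
const all = applySet(roster, (d) => d.Age === 26, { LName: "Agrawal" }, true);
console.log(first, all, roster.map((d) => d.LName));
```

Like $set, the helper leaves every field that is not listed in the change untouched.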

//save() inserted a new record if the _id did not exist, else it replaced the existing record.
//It is deprecated in newer shells; replaceOne() with upsert:true does the same job.

db.students.replaceOne(
  {"_id": ObjectId("5a6eb114b1a9d60b8250b4d1")},
  {
    "studentNo": 4,
    "FName": "Rajiv4",
    "LName": "Goel",
    "Age": 46
  },
  {upsert: true}
)
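
The insert-or-replace behaviour (an "upsert", which is what save() does for a document that carries an _id) can be sketched on an in-memory array. The function `upsertById` is an assumption made for this example, and the _id is kept as a plain string for simplicity.

```javascript
// Replace the document with a matching _id, or insert it when none exists.
function upsertById(collection, doc) {
  const index = collection.findIndex((existing) => existing._id === doc._id);
  if (index === -1) {
    collection.push(doc);
    return "inserted";
  }
  collection[index] = doc; // whole-document replacement, not a field merge
  return "replaced";
}

const store = [];
const firstCall = upsertById(store, { _id: "5a6eb114b1a9d60b8250b4d1", FName: "Rajiv4", Age: 46 });
const secondCall = upsertById(store, { _id: "5a6eb114b1a9d60b8250b4d1", FName: "Rajiv4", LName: "Goel" });
console.log(firstCall, secondCall, store);
```

Note that the second call drops `Age`, because the stored document is replaced as a whole. To change individual fields and keep the rest, use an update with $set instead.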

//Deleting documents

// This removes all documents from the collection (an empty filter is required)

db.students.deleteMany({})

// Removing all documents that match a filter

db.students.deleteMany(
  {"Age": 66}
)

// Removing only one matching document

db.students.deleteOne(
  {"Age": 36}
)

//Projection command--> selecting only specific fields from collection

db.students.find({},{"FName":1,"_id":0} )

// Limit the result

db.students.find({},{"FName":1,"_id":0} ).limit(3)

//Skip few records from results

db.students.find({},{"FName":1,"_id":0} ).skip(2)

//Skip first 2 and show only 2 records after that

db.students.find({},{"FName":1,"_id":0} ).skip(2).limit(2)

//Sorting in asc order

db.students.find({},{"FName":1,"_id":0} ).sort({"FName":1})

//Sorting in desc order

db.students.find({},{"FName":1,"LName":1,"_id":0} ).sort({"FName":-1})
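
When sort(), skip() and limit() are chained on a cursor, MongoDB applies the sort first, then the skip, then the limit, regardless of the order in which the methods are written. The sketch below reproduces that order on an in-memory array. The `page` helper is hypothetical, and the names are the FName values used in the insert examples above.

```javascript
const names = ["Rajiv2", "Rajee", "Mahesh", "Raj", "Rajat"].map((FName) => ({ FName }));

// Emulates find().sort(...).skip(n).limit(m) on an in-memory array.
// direction is 1 for ascending and -1 for descending, as in the shell.
function page(docs, sortField, direction, skip, limit) {
  return [...docs]
    .sort((a, b) => {
      if (a[sortField] < b[sortField]) return -direction;
      if (a[sortField] > b[sortField]) return direction;
      return 0;
    })
    .slice(skip, skip + limit);
}

const result = page(names, "FName", 1, 2, 2).map((d) => d.FName);
console.log(result);
```

Sorting before skipping is what makes skip/limit usable for paging: each page is cut from the same ordered sequence.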

//Indexes in MongoDB (createIndex replaces the deprecated ensureIndex)

db.students.createIndex({"Age":1})

db.students.dropIndex({"Age":1})

//Aggregation operation (Sum/Min/Max/Average/First/Last)

//This is like count

db.students.aggregate([{$group:{_id:"$Gender" , MyResult:{$sum:1}}}] )

//Getting Sum
db.students.aggregate([{$group:{_id:"$Gender" , MyResult:{$sum:"$Age"}}}] )

//Getting Max

db.students.aggregate([{$group:{_id:"$Gender" , MaxResult:{$max:"$Age"}}}] )

//Getting Min

db.students.aggregate([{$group:{_id:"$Gender" , MinResult:{$min:"$Age"}}}] )

//Getting Average

db.students.aggregate([{$group:{_id:"$Gender" , AvgResult:{$avg:"$Age"}}}] )

//Getting First

db.students.aggregate([{$group:{_id:"$Gender" , FirstResult:{$first:"$Age"}}}] )
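
What the $group stage computes can be sketched with a plain JavaScript loop over an in-memory array. This only illustrates the accumulators ($sum, $min, $max, $avg) and is not how the server executes a pipeline; `groupBy` is a made-up helper, and the "F" row is hypothetical data added so that there are two groups.

```javascript
// Group documents by a key and compute count, sum, min, max and average.
function groupBy(docs, keyField, valueField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const value = doc[valueField];
    const g = groups.get(key) ?? { _id: key, count: 0, sum: 0, min: Infinity, max: -Infinity };
    g.count += 1;                   // like {$sum: 1}
    g.sum += value;                 // like {$sum: "$Age"}
    g.min = Math.min(g.min, value); // like {$min: "$Age"}
    g.max = Math.max(g.max, value); // like {$max: "$Age"}
    groups.set(key, g);
  }
  // The average is derived once all documents have been seen.
  return [...groups.values()].map((g) => ({ ...g, avg: g.sum / g.count }));
}

const people = [
  { Gender: "M", Age: 30 },
  { Gender: "M", Age: 40 },
  { Gender: "F", Age: 26 }, // hypothetical row so there are two groups
];
const grouped = groupBy(people, "Gender", "Age");
console.log(grouped);
```

Each distinct value of the grouping key produces one output document, with the key stored in `_id` just as in the aggregation results.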

//Import command for loading data into a MongoDB collection
mongoimport --db school --collection students --drop --file c:\filelocation\student.json

Note: run this mongoimport command from the command prompt, not from inside the mongo shell.

//Backup DB's

Open command prompt as administrator and go to
"C:\Program Files\MongoDB\Server\3.6\bin" folder

run mongodump.exe if you want to backup all DBS

run mongodump.exe --db school if you want to backup specific db

//Backup Database collection

run mongodump.exe --db school --collection students

//Restore DB's

//Open command prompt as administrator and go to "C:\Program Files\MongoDB\Server\3.6\bin" folder

run mongorestore.exe if you want to restore all DBs

run mongorestore.exe --db school dump/school if you want to restore a specific db

//Restore Database collection

run mongorestore.exe --db school --collection students dump/school/students.bson

MongoDB and GIS

MongoDB has a very useful feature called “geoNear”. Other spatial operators are also available to work with distances on a sphere (like the Earth), e.g. $nearSphere, $centerSphere and $near, but they have restrictions. The most important one is that, before MongoDB 4.0, $near and $nearSphere could not be used on sharded collections, while geoNear could. Note that the geoNear command was removed in MongoDB 4.2; the $geoNear aggregation stage is its replacement.
Geospatial queries: $near, $nearSphere, and bounded queries with $geoWithin (circle via $center/$centerSphere, box via $box)
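
For intuition about "distance on a sphere", here is a minimal haversine sketch in plain JavaScript. It is not MongoDB's internal implementation; the function name, the two sample points and the mean Earth radius of 6371 km are assumptions made for this example. Coordinates follow the GeoJSON order of [longitude, latitude].

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, an approximation

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance between two GeoJSON-style [longitude, latitude] pairs.
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Two hypothetical points on the equator, 90 degrees of longitude apart.
const quarterTurn = haversineKm([0, 0], [90, 0]);
console.log(quarterTurn);
```

A quarter of the way around the equator should come out as one quarter of the circumference, which is a handy sanity check for any spherical distance function.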

//Important link about GEO data
http://tugdualgrall.blogspot.in/2014/08/introduction-to-mongodb-geospatial.html
https://www.percona.com/blog/2016/04/15/creating-geo-enabled-applications-with-mongodb-geojson-and-mysql/

https://dzone.com/articles/creating-geo-enabled-applications-with-mongodb-geo

//Reading from Mongo,Transformation using spark,writing to Mongo

https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html

http://repo1.maven.org/maven2/org/mongodb/mongo-hadoop/

=====================================

import org.apache.hadoop.conf.Configuration

import org.apache.spark.{SparkContext, SparkConf}

import org.apache.spark.rdd.RDD

import org.bson.BSONObject

import com.mongodb.hadoop.{

MongoInputFormat, MongoOutputFormat,

BSONFileInputFormat, BSONFileOutputFormat}

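import com.mongodb.hadoop.io.MongoUpdateWritable

// BasicDBObject is used below to build the update query and operation.

import com.mongodb.BasicDBObject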

object SparkExample extends App {

// Set up the configuration for reading from MongoDB.

val mongoConfig = new Configuration()

// MongoInputFormat allows us to read from a live MongoDB instance.

// We could also use BSONFileInputFormat to read BSON snapshots.

// MongoDB connection string naming a collection to read.

// If using BSON, use "mapred.input.dir" to configure the directory

// where the BSON files are located instead.

mongoConfig.set("mongo.input.uri",

"mongodb://localhost:27017/db.collection")

val sparkConf = new SparkConf()

val sc = new SparkContext("local", "SparkExample", sparkConf)

// Create an RDD backed by the MongoDB collection.

val documents = sc.newAPIHadoopRDD(

mongoConfig, // Configuration

classOf[MongoInputFormat], // InputFormat

classOf[Object], // Key type

classOf[BSONObject]) // Value type

// Create a separate Configuration for saving data back to MongoDB.

val outputConfig = new Configuration()

outputConfig.set("mongo.output.uri",

"mongodb://localhost:27017/output.collection")

// Save this RDD as a Hadoop "file".

// The path argument is unused; all documents will go to "mongo.output.uri".

documents.saveAsNewAPIHadoopFile(

"file:///this-is-completely-unused",

classOf[Object],

classOf[BSONObject],

classOf[MongoOutputFormat[Object, BSONObject]],

outputConfig)

// We can also save this back to a BSON file.

val bsonOutputConfig = new Configuration()

documents.saveAsNewAPIHadoopFile(

"hdfs://localhost:8020/user/spark/bson-demo",

classOf[Object],

classOf[BSONObject],

classOf[BSONFileOutputFormat[Object, BSONObject]])

// We can choose to update documents in an existing collection by using the

// MongoUpdateWritable class instead of BSONObject. First, we have to create

// the update operations we want to perform by mapping them across our current

// RDD.

val updates = documents.mapValues(

value => new MongoUpdateWritable(

new BasicDBObject("_id", value.get("_id")), // Query

new BasicDBObject("$set", new BasicDBObject("foo", "bar")), // Update operation

false, // Upsert

false // Update multiple documents

)

)

// Now we call saveAsNewAPIHadoopFile, using MongoUpdateWritable as the

// value class.

updates.saveAsNewAPIHadoopFile(

"file:///this-is-completely-unused",

classOf[Object],

classOf[MongoUpdateWritable],

classOf[MongoOutputFormat[Object, MongoUpdateWritable]],

outputConfig)

}
