NoSQL - MongoDB Database

MongoDB is an open-source document database and a leading NoSQL database, developed and supported by MongoDB Inc. (formerly named 10gen). This post walks through the main MongoDB topics: inserting, updating, deleting and querying documents, projection, the sort() and limit() methods, and creating and dropping collections.

Why was MongoDB needed when so many databases were already in use?
Modern applications must handle large data volumes, ship features quickly and deploy flexibly, and older relational systems were not designed for all of these demands. MongoDB was built to fill that gap.

Main goals behind MongoDB:
  1. Scalability
  2. Performance
  3. High Availability
  4. Scaling from single server deployments to large, complex multi-site architectures.

MongoDB works on the concepts of collections and documents.
Database:
A database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.

Collection:
A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema. Documents within a collection can have different fields. Typically, all documents in a collection serve a similar or related purpose.
  • Stores natural aggregates just as they are normally accessed.
  • Document stores typically represent data as JSON, though some use XML; MongoDB stores documents as BSON, a binary form of JSON.
  • Each document is referenced by a key.

Document:
A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.

In short, a collection in MongoDB is equivalent to a table in an RDBMS, and a document in MongoDB is equivalent to a row.
Here is a sample document (row) in MongoDB:
{
     "_id": ObjectId("7df78ad8902c"),
     "StudentNo":1,
     "FName":"Rajiv",
     "LName":"Gupta",
     "Gender":"M",
     "Age":34
}
 _id holds a 12-byte ObjectId, shown as a hexadecimal string (24 characters in full; the sample above is shortened), which assures the uniqueness of every document.
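
To make the 12-byte layout a little more concrete, here is a minimal sketch in plain JavaScript (it does not need a MongoDB server). It relies on the fact that the first 4 bytes of an ObjectId hold the creation time in seconds since the Unix epoch. The helper names objectIdSeconds and objectIdDate are made up for this illustration, and the id is the one used in the save() example later in this post.

```javascript
// Sketch: read the creation time embedded in an ObjectId's hex string.
// objectIdSeconds and objectIdDate are hypothetical helper names.
function objectIdSeconds(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters (12 bytes)");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
  return parseInt(hex.slice(0, 8), 16);
}

function objectIdDate(hex) {
  // JavaScript dates count milliseconds, so scale the seconds up.
  return new Date(objectIdSeconds(hex) * 1000);
}

const sampleId = "5a6eb114b1a9d60b8250b4d1";
console.log(objectIdSeconds(sampleId));            // 1517203732
console.log(objectIdDate(sampleId).toISOString()); // 2018-01-29T05:28:52.000Z
```

The remaining 8 bytes are not decoded here; they only serve to keep ids generated in the same second distinct.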

A relational database has a schema design that shows the tables and the relationships between them. MongoDB does not enforce relationships between collections: there are no foreign keys, and related data is either embedded in a document or referenced by its _id.

Storage:

MongoDB includes its own file storage layer called "GridFS", which stores large binary content (images, videos, text and many other types) efficiently by splitting each file into chunks kept as separate documents.


Advantages of MongoDB over RDBMS:
  • Schema-less: MongoDB is a document database in which one collection holds different documents. The number of fields, content and size of the document can differ from one document to another.
  • Structure of a single object is clear.
  • No complex joins.
  • Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
  • Ease of scale-out − MongoDB is easy to scale.
  • Conversion/mapping of application objects to database objects not needed.
  • Uses internal memory for storing the (windowed) working set, enabling faster access of data.

Where to Use MongoDB?
  • Big Data
  • Semi-Structured Content Management.
  • Real-time Analytics & High Speed Logging.
  • Need Caching & High Scalability
  • Mobile and Social Infrastructure
  • User Data Management
  • Data Hub
So wherever data arrives as a natural aggregate, MongoDB is a solid option.

MongoDB is not great for Highly Transactional Applications.

SQL vs MongoDB terminology:

SQL                                    MongoDB
-------------------------------------  ---------------------------
Database                               Database
Table                                  Collection
Row                                    Document
Column                                 Column/Field
Index                                  Index
Joins                                  Embedded documents, $lookup
Primary key (single/multiple column)   _id field
Aggregation (GROUP BY)                 Aggregation pipeline





Let's dive into MongoDB and see how to use it in detail.

First, download MongoDB onto your Windows/Unix system. You can find the download link here.

Now let's explore the MongoDB commands.

//Operators and cursor methods used when filtering query results
$gt (greater than), $lt (less than), $gte (greater than or equal), $lte (less than or equal), $ne (not equal), $all, $in, $nin (not in), count, limit, skip, group, etc.

// Creating/using DB
   use school

//Creating a collection (table) in the 'school' DB
  db.createCollection("students")

//Alternatively, insertOne creates the collection if it does not exist,
//or inserts into the existing collection.
    db.students.insertOne({
        "studentNo":1,
        "FName":"Rajiv2",
        "LName":"Gupta",
        "Gender":"M",
        "Age":30,
        "Marks":{"Math":80, "Science":90, "Physics":78}
    })

//Inserting multiple documents (insertMany is the current form of insert([...]))
db.students.insertMany([
    {
        "studentNo":2,
        "FName":"Rajee",
        "LName":"Gupta",
        "Gender":"M",
        "Age":30,
        "Marks":[{"Math":80, "Science":90, "Physics":78}]
    },
    {
        "studentNo":3,
        "FName":"Mahesh",
        "LName":"Gupta",
        "Gender":"M",
        "Age":40,
        "Marks":[{"Math":70, "Science":90, "Physics":78}]
    },
    {
        "studentNo":4,
        "FName":"Raj",
        "LName":"Gupta",
        "Gender":"M",
        "Age":60,
        "Marks":{"Math":60, "Science":90, "Physics":78}
    },
    {
        "studentNo":5,
        "FName":"Rajat",
        "LName":"Gupta",
        "Gender":"M",
        "Age":30,
        "Marks":{"Math":90, "Science":90, "Physics":78}
    }
])

//See list of collection in db
      show collections

//Dropping a collection
    db.students.drop()

//Dropping the current database
    db.dropDatabase()

//Querying a collection
   db.students.find()
//Showing results in pretty form
   db.students.find().pretty()
//Get only 1st Record
    db.students.findOne()


//find specific records
   db.students.find({"Age":36}).pretty()
   db.students.find({"Age":{$gt:35}}).pretty()
   db.students.find({"Age":{$gte:35}}).pretty()
   db.students.find({"Age":{$lt:35}}).pretty()
   db.students.find({"Age":{$lte:35}}).pretty()

// Find records with And operator
   db.students.find({"Age":66 ,"FName":"Rajiv6"}).pretty()

// Find records with Or operator
   db.students.find(
                      {
                        $or:[{"Age":66},{"Age":36}]
                      }
                            ).pretty()

//Find Records with And ,Or operator together
    db.students.find(
                            {
                            "FName":"Rajiv6" , $or:[{"Age":66},{"Age":36}]
                           }
                                ).pretty()

//using in operator
  db.students.find(
                            {
                           "Age":{$in:[10,20,36]}
                           }
                                ).pretty()
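
To see how these filters combine, here is a rough in-memory sketch in plain JavaScript. It is not how MongoDB evaluates queries internally; matches() and comparators are hypothetical helpers that only mimic the behaviour for simple scalar fields: several keys in one filter act as AND, $or needs at least one sub-filter to match, and an operator object such as {$gt:35} compares the field value. The sample ages are the ones inserted earlier in this post.

```javascript
// In-memory sketch of how a find() filter selects documents.
// matches() and comparators are hypothetical helpers, not MongoDB APIs.
const comparators = {
  $gt: (v, x) => v > x,
  $gte: (v, x) => v >= x,
  $lt: (v, x) => v < x,
  $lte: (v, x) => v <= x,
  $ne: (v, x) => v !== x,
  $in: (v, xs) => xs.includes(v),
  $nin: (v, xs) => !xs.includes(v),
};

function matches(doc, filter) {
  // Every key of the filter must hold: this is the implicit AND.
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") {
      // $or: at least one sub-filter must match.
      return cond.some((sub) => matches(doc, sub));
    }
    const value = doc[key];
    const isOperatorObject =
      cond !== null && typeof cond === "object" && !Array.isArray(cond);
    if (!isOperatorObject) {
      return value === cond; // plain equality, e.g. {"Age": 36}
    }
    return Object.entries(cond).every(([op, arg]) => {
      if (!Object.prototype.hasOwnProperty.call(comparators, op)) {
        throw new Error("unsupported operator in this sketch: " + op);
      }
      return comparators[op](value, arg);
    });
  });
}

// Names and ages taken from the insert examples in this post.
const students = [
  { FName: "Rajiv2", Age: 30 },
  { FName: "Rajee", Age: 30 },
  { FName: "Mahesh", Age: 40 },
  { FName: "Raj", Age: 60 },
  { FName: "Rajat", Age: 30 },
];

const names = (filter) =>
  students.filter((d) => matches(d, filter)).map((d) => d.FName);

console.log(names({ Age: { $gt: 35 } }));                              // [ 'Mahesh', 'Raj' ]
console.log(names({ FName: "Raj", $or: [{ Age: 60 }, { Age: 36 }] })); // [ 'Raj' ]
```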

//Using a regular expression to get records whose FName starts with the letter 'R'
  db.students.find(
                            {
                           "FName":/^R/
                           }
                                ).pretty()

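// Find records by a field of a nested document (dot notation).
// {"Marks":{Math:80}} would only match documents whose Marks is exactly {Math:80}.
  db.students.find(
                            {
                           "Marks.Math":80
                           }
                                ).pretty()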

// Only update first available record
      db.students.update(
                       {"Age":26},
                      { $set:{"LName":"Mittal"} }
                                         )

// Update multi record
       db.students.update(
                 {"Age":26},
                { $set:{"LName":"Agrawal"} },
                {multi:true}
                                         )

//Save method will  create new record if _id does not exist else it will update the record

         db.students.save(
                  {
                     "_id" : ObjectId("5a6eb114b1a9d60b8250b4d1"),
                       "studentNo" : 4,
                       "FName" : "Rajiv4",
                       "LName" : "Goel",
                       "Age" : 46
                }
                                 )
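
One detail worth spelling out: when the _id already exists, save() replaces the whole stored document rather than merging fields. The sketch below imitates that with a plain JavaScript Map standing in for a collection. The save() function, the nextId generator and the ids used here are all hypothetical and exist only for this illustration.

```javascript
// In-memory sketch of save(): insert when _id is new, replace otherwise.
// The Map stands in for a collection; nothing here talks to MongoDB.
function save(collection, doc, generateId) {
  // A document without _id gets a fresh one, as on a normal insert.
  const id = doc._id !== undefined ? doc._id : generateId();
  const existed = collection.has(id);
  // The stored document is replaced as a whole, not merged field by field.
  collection.set(id, { ...doc, _id: id });
  return existed ? "replaced" : "inserted";
}

const collection = new Map();
let counter = 0;
const nextId = () => "generated-" + (++counter); // hypothetical id generator

console.log(save(collection, { FName: "Rajiv4", LName: "Goel", Age: 46 }, nextId)); // inserted
console.log(save(collection, { _id: "generated-1", FName: "Rajiv4" }, nextId));     // replaced
console.log(collection.get("generated-1")); // { _id: 'generated-1', FName: 'Rajiv4' }
```

After the second call the LName and Age fields are gone, which is why $set with update is usually the safer choice for changing a single field.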


//Deleting documents
     // This removes all documents from the collection (an empty filter is required)
            db.students.remove({})

    // Removing specific documents (all matching records) from the collection
           db.students.remove(
                                       {"Age":66}
                                           )

   // Removing only 1 matching document from the collection
           db.students.remove(
                              {"Age":36},
                                1
                                           )

//Projection command--> selecting only specific fields from collection

     db.students.find({},{"FName":1,"_id":0} )
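
As a rough picture of what the projection document does, here is a small plain-JavaScript sketch. project() is a hypothetical helper that covers only the inclusion style used above: fields flagged with 1 are kept, and _id is kept too unless it is explicitly set to 0. The _id value 1 in the sample is a stand-in, not a real ObjectId.

```javascript
// In-memory sketch of an inclusion projection such as {"FName":1,"_id":0}.
// project() is a hypothetical helper, not a MongoDB API.
function project(doc, spec) {
  const out = {};
  // _id is returned by default and must be switched off explicitly.
  if (spec._id !== 0 && "_id" in doc) {
    out._id = doc._id;
  }
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) {
      out[field] = doc[field];
    }
  }
  return out;
}

const student = { _id: 1, FName: "Raj", LName: "Gupta", Age: 60 };
console.log(project(student, { FName: 1, _id: 0 })); // { FName: 'Raj' }
console.log(project(student, { FName: 1 }));         // { _id: 1, FName: 'Raj' }
```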

// Limit the result
      db.students.find({},{"FName":1,"_id":0} ).limit(3)

//Skip few records from results
     db.students.find({},{"FName":1,"_id":0} ).skip(2)

//Skip first 2 and show only 2 records after that
      db.students.find({},{"FName":1,"_id":0} ).skip(2).limit(2)

//Sorting in asc order
    db.students.find({},{"FName":1,"_id":0} ).sort({"FName":1})
//Sorting in desc order
   db.students.find({},{"FName":1,"LName":1,"_id":0} ).sort({"FName":-1})
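
When sort(), skip() and limit() are chained on one cursor, the sort is applied first, then the skip, then the limit, whatever order the calls are written in. The plain-JavaScript sketch below imitates that order on an array. runCursor() is a hypothetical helper, it handles a single sort field only, and the names come from the insert examples in this post.

```javascript
// In-memory sketch of sort -> skip -> limit on a list of documents.
// runCursor() is a hypothetical helper, not a MongoDB API.
function runCursor(docs, { sort, skip = 0, limit = Infinity } = {}) {
  const result = [...docs]; // copy so the input order is left untouched
  if (sort) {
    // direction is 1 for ascending and -1 for descending, as in sort({"FName":1}).
    const [field, direction] = Object.entries(sort)[0];
    result.sort((a, b) => {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
      return 0;
    });
  }
  return result.slice(skip, skip + limit);
}

const docs = ["Rajiv2", "Rajee", "Mahesh", "Raj", "Rajat"].map((FName) => ({ FName }));
const firstNames = (options) => runCursor(docs, options).map((d) => d.FName);

console.log(firstNames({ sort: { FName: 1 } }));                     // [ 'Mahesh', 'Raj', 'Rajat', 'Rajee', 'Rajiv2' ]
console.log(firstNames({ sort: { FName: 1 }, skip: 2, limit: 2 })); // [ 'Rajat', 'Rajee' ]
console.log(firstNames({ skip: 2, limit: 2 }));                      // [ 'Mahesh', 'Raj' ]
```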

//Indexes in MongoDB (createIndex replaces the deprecated ensureIndex)
     db.students.createIndex({"Age":1})
     db.students.dropIndex({"Age":1})

//Aggregation operations (Sum/Min/Max/Average/First/Last)
//Counting documents per group
    db.students.aggregate([{$group:{_id:"$Gender" , MyResult:{$sum:1}}}] )
//Getting Sum
    db.students.aggregate([{$group:{_id:"$Gender" , MyResult:{$sum:"$Age"}}}] )
//Getting Max
    db.students.aggregate([{$group:{_id:"$Gender" , MaxResult:{$max:"$Age"}}}] )
//Getting Min
    db.students.aggregate([{$group:{_id:"$Gender" , MinResult:{$min:"$Age"}}}] )
//Getting Average
    db.students.aggregate([{$group:{_id:"$Gender" , AvgResult:{$avg:"$Age"}}}] )
//Getting First (first document of each group in pipeline order; add a $sort stage before $group for a meaningful result)
    db.students.aggregate([{$group:{_id:"$Gender" , FirstResult:{$first:"$Age"}}}] )
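
For readers who think in terms of SQL GROUP BY, the $group stage can be pictured as the loop below. This is a plain-JavaScript sketch over an in-memory array, not MongoDB's implementation. groupStats() is a hypothetical helper, and the sixth student ("Asha", Gender "F") is invented here only so that there is more than one group.

```javascript
// In-memory sketch of $group with $sum, $min, $max and $avg accumulators.
// groupStats() is a hypothetical helper, not a MongoDB API.
function groupStats(docs, keyField, valueField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const g = groups.get(key) ||
      { _id: key, count: 0, sum: 0, min: Infinity, max: -Infinity };
    const value = doc[valueField];
    g.count += 1;                    // like {$sum: 1}
    g.sum += value;                  // like {$sum: "$Age"}
    g.min = Math.min(g.min, value);  // like {$min: "$Age"}
    g.max = Math.max(g.max, value);  // like {$max: "$Age"}
    groups.set(key, g);
  }
  // The average is derived once every document has been seen.
  return [...groups.values()].map((g) => ({ ...g, avg: g.sum / g.count }));
}

const people = [
  { FName: "Rajiv2", Gender: "M", Age: 30 },
  { FName: "Rajee", Gender: "M", Age: 30 },
  { FName: "Mahesh", Gender: "M", Age: 40 },
  { FName: "Raj", Gender: "M", Age: 60 },
  { FName: "Rajat", Gender: "M", Age: 30 },
  { FName: "Asha", Gender: "F", Age: 25 }, // hypothetical extra document
];

console.log(groupStats(people, "Gender", "Age"));
```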

//Import command for loading data into MongoDB collections
mongoimport  --db school --collection students --drop --file c:\filelocation\student.json
Note: run this mongoimport command from the command prompt, not from inside the mongo shell.
//Backing up databases
Open a command prompt as administrator and go to the
"C:\Program Files\MongoDB\Server\3.6\bin" folder.
Run mongodump.exe to back up all databases.
Run mongodump.exe --db school to back up a specific database.

//Backing up a single collection
Run mongodump.exe --db school --collection students

//Restoring databases
Open a command prompt as administrator and go to the "C:\Program Files\MongoDB\Server\3.6\bin" folder.
Run mongorestore.exe to restore all databases from the dump folder.
Run mongorestore.exe --db school dump/school to restore a specific database.

//Restoring a single collection
Run mongorestore.exe --db school --collection students dump/school/students.bson





MongoDB and GIS

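MongoDB has a very useful feature called “geoNear.” Other MongoDB spatial operators can also calculate distance on a sphere (like the Earth), for example $nearSphere, $centerSphere and $near, but they come with restrictions. The most important one, in the MongoDB 3.6 release used in this post, is that $near and $nearSphere do not support sharded collections, whereas the geoNear command does. (From MongoDB 4.0 onward $near and $nearSphere work on sharded collections, and the geoNear command was later replaced by the $geoNear aggregation stage.)
Geospatial queries: proximity queries ($near, optionally limited with $maxDistance) and bounded queries ($geoWithin with a circle or a box).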
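
To give a feel for what "distance on a sphere" means, here is a small plain-JavaScript sketch of the haversine formula. It is an illustration of the maths only, not MongoDB's own code. The function name haversineKm is made up, and the Earth radius of 6371 km is an assumed round value. Points are written as [longitude, latitude], the order GeoJSON uses.

```javascript
// Sketch: great-circle distance between two [longitude, latitude] points.
// haversineKm is a hypothetical helper; 6371 km is an assumed mean Earth radius.
const EARTH_RADIUS_KM = 6371;

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

function haversineKm([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  // Clamp to 1 so rounding errors can never push the square root below zero.
  const clamped = Math.min(1, a);
  return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(clamped), Math.sqrt(1 - clamped));
}

// Half-way around the equator is half the circumference: pi * radius.
console.log(haversineKm([0, 0], [180, 0]));
// From the equator to the North Pole is a quarter of the circumference.
console.log(haversineKm([0, 0], [0, 90]));
```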
//Important link about GEO data
http://tugdualgrall.blogspot.in/2014/08/introduction-to-mongodb-geospatial.html
https://www.percona.com/blog/2016/04/15/creating-geo-enabled-applications-with-mongodb-geojson-and-mysql/
https://dzone.com/articles/creating-geo-enabled-applications-with-mongodb-geo




//Reading from MongoDB, transforming with Spark, writing back to MongoDB

https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html


http://repo1.maven.org/maven2/org/mongodb/mongo-hadoop/
=====================================
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.rdd.RDD

import org.bson.BSONObject
import com.mongodb.hadoop.{
  MongoInputFormat, MongoOutputFormat,
  BSONFileInputFormat, BSONFileOutputFormat}
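import com.mongodb.hadoop.io.MongoUpdateWritable
// BasicDBObject is used below to build the update query and operation.
import com.mongodb.BasicDBObject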

object SparkExample extends App {
  // Set up the configuration for reading from MongoDB.
  val mongoConfig = new Configuration()
  // MongoInputFormat allows us to read from a live MongoDB instance.
  // We could also use BSONFileInputFormat to read BSON snapshots.
  // MongoDB connection string naming a collection to read.
  // If using BSON, use "mapred.input.dir" to configure the directory
  // where the BSON files are located instead.
  mongoConfig.set("mongo.input.uri",
    "mongodb://localhost:27017/db.collection")

  val sparkConf = new SparkConf()
  val sc = new SparkContext("local", "SparkExample", sparkConf)

  // Create an RDD backed by the MongoDB collection.
  val documents = sc.newAPIHadoopRDD(
    mongoConfig,                // Configuration
    classOf[MongoInputFormat],  // InputFormat
    classOf[Object],            // Key type
    classOf[BSONObject])        // Value type

  // Create a separate Configuration for saving data back to MongoDB.
  val outputConfig = new Configuration()
  outputConfig.set("mongo.output.uri",
    "mongodb://localhost:27017/output.collection")

  // Save this RDD as a Hadoop "file".
  // The path argument is unused; all documents will go to "mongo.output.uri".
  documents.saveAsNewAPIHadoopFile(
    "file:///this-is-completely-unused",
    classOf[Object],
    classOf[BSONObject],
    classOf[MongoOutputFormat[Object, BSONObject]],
    outputConfig)

  // We can also save this back to a BSON file.
  val bsonOutputConfig = new Configuration()
  documents.saveAsNewAPIHadoopFile(
    "hdfs://localhost:8020/user/spark/bson-demo",
    classOf[Object],
    classOf[BSONObject],
    classOf[BSONFileOutputFormat[Object, BSONObject]])

  // We can choose to update documents in an existing collection by using the
  // MongoUpdateWritable class instead of BSONObject. First, we have to create
  // the update operations we want to perform by mapping them across our current
  // RDD.
  val updates = documents.mapValues(
    value => new MongoUpdateWritable(
      new BasicDBObject("_id", value.get("_id")),  // Query
      new BasicDBObject("$set", new BasicDBObject("foo", "bar")),  // Update operation
      false,  // Upsert
      false   // Update multiple documents
    )
  )

  // Now we call saveAsNewAPIHadoopFile, using MongoUpdateWritable as the
  // value class.
  updates.saveAsNewAPIHadoopFile(
    "file:///this-is-completely-unused",
    classOf[Object],
    classOf[MongoUpdateWritable],
    classOf[MongoOutputFormat[Object, MongoUpdateWritable]],
    outputConfig)
}
