NoSQL - MongoDB Database
MongoDB is a leading open-source NoSQL database, developed and supported by a company named 10gen (now MongoDB, Inc.). In this blog, we explain the main MongoDB topics, such as inserting, updating, deleting and querying documents, projection, the sort() and limit() methods, and creating and dropping collections.
"Why was MongoDB needed when so many databases were already in use?"
Modern applications have to handle big data, ship features quickly and deploy flexibly, and the older database systems were not designed for these demands, so a database like MongoDB was needed.
The main goals behind MongoDB:
- Scalability
- Performance
- High Availability
- Scaling from single server deployments to large, complex multi-site architectures.
MongoDB works on the concepts of collections and documents.
Database:
A database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.
Collection:
A collection is a group of MongoDB documents and is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema, so documents within a collection can have different fields. Typically, all documents in a collection serve a similar or related purpose.
- Stores natural aggregates just as they are normally accessed.
- Document stores typically use JSON (MongoDB stores BSON, a binary form of JSON), though some use XML to represent data.
- Each document is referenced by a key (in MongoDB, its _id).
Document:
A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
In short, a collection in MongoDB is equivalent to a table in an RDBMS, and a document in MongoDB is equivalent to a row.
Here is a sample document (row) in MongoDB:
{
_id: ObjectId("5a6eb114b1a9d60b8250b4d1"),
"StudentNo":1,
"FName":"Rajiv",
"LName":"Gupta",
"Gender":"M",
"Age":34
}
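_id is a 12-byte ObjectId, displayed as 24 hexadecimal characters, that uniquely identifies each document in a collection. If you do not supply an _id when inserting, MongoDB generates one automatically.
Any relational database has a schema design that defines the tables and the relationships between them. MongoDB has no enforced relationships (no foreign keys): related data is either embedded in the same document or linked by storing another document's _id.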
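To make the 12 bytes concrete: in current MongoDB versions an ObjectId consists of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value and a 3-byte counter, so the first 8 hex characters encode when the document was created. The plain JavaScript sketch below (JavaScript being the mongo shell's language) reads that timestamp from a hex string. The function name objectIdTimestamp is hypothetical; inside the shell, ObjectId("...").getTimestamp() gives the same result.

```javascript
// Hypothetical helper: extract the creation time stored in an ObjectId.
// The first 4 bytes (8 hex characters) are a big-endian Unix timestamp in seconds.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("an ObjectId is 12 bytes, i.e. 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("5a6eb114b1a9d60b8250b4d1").toISOString());
// 2018-01-29T05:28:52.000Z
```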
Storage:
MongoDB has its own file storage system, GridFS, for efficiently storing large binary files (images, videos and other file types). GridFS splits each file into chunks, so it can store files larger than the 16 MB BSON document size limit.
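As a rough illustration of the chunking GridFS performs: a file is cut into fixed-size pieces (255 KiB by default), and each piece is saved as its own document in the fs.chunks collection with the parent file's id (files_id) and a sequence number (n), while one fs.files document holds the file's metadata. This Node.js sketch models only the chunking step; toChunks and the file id are hypothetical, not part of the driver's GridFS API.

```javascript
// Hypothetical sketch of GridFS chunking (fs.chunks only; fs.files metadata is not modelled).
const DEFAULT_CHUNK_SIZE = 255 * 1024; // GridFS default chunk size: 255 KiB = 261120 bytes

function toChunks(data, filesId, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    // Each chunk document records its parent file and its position in the sequence.
    chunks.push({ files_id: filesId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

const video = Buffer.alloc(600 * 1024); // a fake 600 KiB file of zero bytes
const chunks = toChunks(video, "hypothetical-file-id");
console.log(chunks.map((c) => c.data.length)); // [ 261120, 261120, 92160 ]
```

Reassembling the file is the reverse: sort the chunks by n and concatenate their data.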
Advantages of MongoDB over RDBMS:
- Schema-less − MongoDB is a document database in which one collection holds different documents. The number of fields, the content and the size of a document can differ from one document to another.
- Structure of a single object is clear.
- No complex joins.
- Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
- Ease of scale-out − MongoDB scales horizontally by sharding data across servers.
- No conversion/mapping of application objects to database objects is needed.
- Uses internal memory (RAM) to hold the working set, enabling faster data access.
Where to Use MongoDB?
- Big Data
- Semi-Structured Content Management.
- Real-time Analytics & High Speed Logging.
- Caching & High Scalability
- Mobile and Social Infrastructure
- User Data Management
- Data Hub
MongoDB is not a good fit for highly transactional applications.
SQL vs MongoDB Terminology:
SQL | MongoDB
Database | Database
Table | Collection
Row | Document
Column | Field
Index | Index
Joins | Embedded documents, $lookup
Primary key (single or multiple columns) | _id field
Aggregation (GROUP BY) | Aggregation pipeline
Let's dive deeper and see how to use MongoDB in detail, starting with the shell commands.
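// Operators used inside a query filter:
// $gt (greater than), $lt (less than), $gte (greater than or equal), $lte (less than or equal), $ne (not equal), $all, $in, $nin (not in)
// Cursor methods applied to the results: count(), limit(), skip(), sort(); grouping is done with $group in the aggregation pipeline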
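The comparison operators above can be illustrated with a small in-memory matcher written in plain JavaScript. It is a hypothetical helper, not how the server evaluates queries: it looks at top-level fields only, treats a plain value as equality, and leaves out $all, dot notation and array-field semantics.

```javascript
// Hypothetical helper: evaluates a filter such as {"Age": {$gt: 35}} against one
// document, for top-level fields only. Not the server's actual algorithm.
const comparisonOps = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

// True when every key of the condition object starts with "$", e.g. {$gt: 35}.
function isOperatorObject(cond) {
  return cond !== null && typeof cond === "object" && !Array.isArray(cond) &&
    Object.keys(cond).length > 0 && Object.keys(cond).every((k) => k.startsWith("$"));
}

// Every listed field must match (implicit AND, like find()).
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    const value = doc[field];
    if (!isOperatorObject(cond)) return value === cond; // {"Age": 36} is plain equality
    return Object.entries(cond).every(([op, arg]) => {
      if (!Object.prototype.hasOwnProperty.call(comparisonOps, op)) {
        throw new Error(`unsupported operator ${op}`);
      }
      return comparisonOps[op](value, arg);
    });
  });
}

const students = [
  { FName: "Rajiv2", Age: 30 },
  { FName: "Mahesh", Age: 40 },
  { FName: "Raj", Age: 60 },
];
console.log(students.filter((s) => matches(s, { Age: { $gt: 35 } })).map((s) => s.FName));
// [ 'Mahesh', 'Raj' ]
```

As in MongoDB, a document without the field never satisfies $gt, but does satisfy $ne and $nin.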
// Create/switch to the DB (it is actually created on the first write)
use school
// Create a collection (table) in the 'school' DB
db.createCollection("students")
// Alternatively, inserting a document creates the collection if it does not exist yet, or inserts into the existing one
db.students.insertOne({
"studentNo":1,
"FName":"Rajiv2",
"LName":"Gupta",
"Gender":"M",
"Age":30,
"Marks":{"Math":80 ,"Science":90,"Physics":78}
})
// Inserting multiple documents at once
db.students.insertMany([
{
"studentNo":2,
"FName":"Rajee",
"LName":"Gupta",
"Gender":"M",
"Age":30,
"Marks":[{"Math":80 ,"Science":90,"Physics":78}]
},
{
"studentNo":3,
"FName":"Mahesh",
"LName":"Gupta",
"Gender":"M",
"Age":40,
"Marks":[{"Math":70 ,"Science":90,"Physics":78}]
},
{
"studentNo":4,
"FName":"Raj",
"LName":"Gupta",
"Gender":"M",
"Age":60,
"Marks":{"Math":60 ,"Science":90,"Physics":78}
},
{
"studentNo":5,
"FName":"Rajat",
"LName":"Gupta",
"Gender":"M",
"Age":30,
"Marks":{"Math":90 ,"Science":90,"Physics":78}
}
])
//See list of collection in db
show collections
// Dropping a collection
db.students.drop()
// Dropping the current database
db.dropDatabase()
// Querying documents from a collection
db.students.find()
//Showing results in Pretty form
db.students.find().pretty()
// Get only the first document
db.students.findOne()
//find specific records
db.students.find({"Age":36}).pretty()
db.students.find({"Age":{$gt:35}}).pretty()
db.students.find({"Age":{$gte:35}}).pretty()
db.students.find({"Age":{$lt:35}}).pretty()
db.students.find({"Age":{$lte:35}}).pretty()
// Find records with And operator
db.students.find({"Age":66 ,"FName":"Rajiv6"}).pretty()
// Find records with Or operator
db.students.find(
{
$or:[{"Age":66},{"Age":36}]
}
).pretty()
// Find records with AND and OR operators together
db.students.find(
{
"FName":"Rajiv6" , $or:[{"Age":66},{"Age":36}]
}
).pretty()
//using in operator
db.students.find(
{
"Age":{$in:[10,20,36]}
}
).pretty()
// Using a regular expression: FName starting with the letter 'R'
db.students.find(
{
"FName":/^R/
}
).pretty()
// Find records by a field inside an embedded document (dot notation).
// Note: {"Marks":{Math:80}} would only match documents whose Marks is exactly {Math:80}
db.students.find(
{
"Marks.Math":80
}
).pretty()
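Why the dot-notation form matters: a filter such as {"Marks":{Math:80}} asks for an embedded document that is exactly {Math:80}, whereas "Marks.Math" looks at the Math field inside Marks, and also inside each element when Marks is an array (as for students 2 and 3 above). The hypothetical helpers below mimic both behaviours in plain JavaScript, for illustration only.

```javascript
// Collect every value reachable via a dotted path, descending into arrays
// of embedded documents the way "Marks.Math" does in MongoDB.
function valuesAtPath(value, keys) {
  if (keys.length === 0) return [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAtPath(item, keys));
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[keys[0]], keys.slice(1));
}

// Models {"Marks.Math": 80}.
function dotPathMatch(doc, dottedPath, expected) {
  return valuesAtPath(doc, dottedPath.split(".")).includes(expected);
}

// Models {"Marks": {"Math": 80}}: the whole embedded document must be equal
// (same fields, same order). Comparing JSON strings is a simplification.
function wholeDocumentMatch(doc, field, expected) {
  return JSON.stringify(doc[field]) === JSON.stringify(expected);
}

const asObject = { Marks: { Math: 80, Science: 90, Physics: 78 } };
const asArray = { Marks: [{ Math: 80, Science: 90, Physics: 78 }] };
console.log(wholeDocumentMatch(asObject, "Marks", { Math: 80 })); // false
console.log(dotPathMatch(asObject, "Marks.Math", 80));            // true
console.log(dotPathMatch(asArray, "Marks.Math", 80));             // true
```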
// Update only the first matching document
db.students.updateOne(
{"Age":26},
{ $set:{"LName":"Mittal"} }
)
// Update all matching documents
db.students.updateMany(
{"Age":26},
{ $set:{"LName":"Agrawal"} }
)
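A minimal in-memory model of the difference between updating only the first match and updating every match (update() without and with multi:true, or updateOne() and updateMany()). It assumes plain equality filters and a single $set; updateWithSet is a hypothetical name, and the server itself reports matched and modified counts separately.

```javascript
// Hypothetical in-memory model of a $set update.
// Filters are plain top-level equality here; returns the number of matched documents.
function updateWithSet(docs, filter, fieldsToSet, { many = false } = {}) {
  let matched = 0;
  for (const doc of docs) {
    const isMatch = Object.entries(filter).every(([field, value]) => doc[field] === value);
    if (!isMatch) continue;
    Object.assign(doc, fieldsToSet); // $set changes only the listed fields
    matched += 1;
    if (!many) break;                // single-document update stops at the first match
  }
  return matched;
}

const students = [
  { studentNo: 1, Age: 26, LName: "Gupta" },
  { studentNo: 2, Age: 26, LName: "Gupta" },
];
console.log(updateWithSet(students, { Age: 26 }, { LName: "Mittal" }));                  // 1
console.log(updateWithSet(students, { Age: 26 }, { LName: "Agrawal" }, { many: true })); // 2
```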
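// save() inserts a new document if no document with this _id exists; otherwise it replaces the whole existing document
// (save() is deprecated in newer MongoDB versions; replaceOne({"_id": ...}, doc, {upsert: true}) is the equivalent)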
db.students.save(
{
"_id" : ObjectId("5a6eb114b1a9d60b8250b4d1"),
"studentNo" : 4,
"FName" : "Rajiv4",
"LName" : "Goel",
"Age" : 46
}
)
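Note that save() replaces the whole stored document rather than merging fields into it. The sketch below models that on a JavaScript Map keyed by _id. The helper is hypothetical, it assumes the caller always supplies _id (the real method generates an ObjectId when _id is missing), and a plain string stands in for the ObjectId.

```javascript
// Hypothetical model of save() on a Map keyed by _id.
function save(collection, doc) {
  if (doc._id === undefined) throw new Error("this sketch needs an explicit _id");
  const existed = collection.has(doc._id);
  collection.set(doc._id, { ...doc }); // whole-document replacement, not a merge
  return existed ? "replaced" : "inserted";
}

const students = new Map();
const id = "5a6eb114b1a9d60b8250b4d1"; // a plain string standing in for an ObjectId
console.log(save(students, { _id: id, studentNo: 4, FName: "Raj", Gender: "M", Age: 60 }));      // inserted
console.log(save(students, { _id: id, studentNo: 4, FName: "Rajiv4", LName: "Goel", Age: 46 })); // replaced
console.log(students.get(id).Gender); // undefined: fields missing from the saved document are gone
```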
// Deleting documents
// Remove all documents from the collection (an empty filter matches everything)
db.students.deleteMany({})
// Remove all documents that match a filter
db.students.deleteMany(
{"Age":66}
)
// Remove only the first matching document
db.students.deleteOne(
{"Age":36}
)
// Projection: select only specific fields (1 = include, 0 = exclude)
db.students.find({},{"FName":1,"_id":0} )
// Limit the result
db.students.find({},{"FName":1,"_id":0} ).limit(3)
// Skip the first few documents in the results
db.students.find({},{"FName":1,"_id":0} ).skip(2)
//Skip first 2 and show only 2 records after that
db.students.find({},{"FName":1,"_id":0} ).skip(2).limit(2)
//Sorting in asc order
db.students.find({},{"FName":1,"_id":0} ).sort({"FName":1})
//Sorting in desc order
db.students.find({},{"FName":1,"LName":1,"_id":0} ).sort({"FName":-1})
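Chaining order does not change the result: for a find() cursor, MongoDB applies sort() first, then skip(), then limit(). The hypothetical runCursor function below mimics that order plus a simple inclusion projection over an in-memory array (one sort key only), using the five first names inserted earlier.

```javascript
// Hypothetical model of a cursor: sort, then skip, then limit, then projection.
function runCursor(docs, { sort = null, skip = 0, limit = 0, projection = null } = {}) {
  let out = [...docs];
  if (sort) {
    const [[field, dir]] = Object.entries(sort); // one sort key in this sketch
    out.sort((a, b) => (a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0));
  }
  out = out.slice(skip, limit > 0 ? skip + limit : undefined); // limit(0) means "no limit"
  if (projection) {
    // Inclusion projection: listed fields, plus _id unless it is set to 0.
    const keys = Object.keys(projection).filter((k) => projection[k]);
    if (projection._id !== 0 && !keys.includes("_id")) keys.unshift("_id");
    out = out.map((d) => Object.fromEntries(keys.filter((k) => k in d).map((k) => [k, d[k]])));
  }
  return out;
}

const students = ["Rajiv2", "Rajee", "Mahesh", "Raj", "Rajat"].map((FName, i) => ({ _id: i + 1, FName }));
console.log(runCursor(students, { sort: { FName: 1 }, skip: 2, limit: 2, projection: { FName: 1, _id: 0 } }));
// [ { FName: 'Rajat' }, { FName: 'Rajee' } ]
```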
// Indexes in MongoDB (createIndex replaces the deprecated ensureIndex)
db.students.createIndex({"Age":1})
db.students.dropIndex({"Age":1})
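An ascending index such as {"Age":1} keeps the indexed values in sorted order, so a range query like $gt can jump straight to the first qualifying key instead of scanning every document. The sketch below is a deliberately simplified model, a sorted array plus binary search; real MongoDB indexes are B-trees, and buildIndex and positionsGreaterThan are hypothetical names.

```javascript
// Hypothetical model of an ascending single-field index: sorted [key, documentPosition] pairs.
// Documents missing the field are skipped here (MongoDB would index them under null).
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .filter(([key]) => key !== undefined)
    .sort((a, b) => a[0] - b[0]); // numeric keys only in this sketch
}

// Position of the first entry whose key is >= value.
function lowerBound(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Answers {"Age": {$gt: value}} by jumping into the index instead of scanning every document.
function positionsGreaterThan(index, value) {
  let start = lowerBound(index, value);
  while (start < index.length && index[start][0] === value) start++;
  return index.slice(start).map(([, pos]) => pos);
}

const ages = [30, 30, 40, 60, 30].map((Age) => ({ Age }));
const ageIndex = buildIndex(ages, "Age");
console.log(positionsGreaterThan(ageIndex, 35)); // [ 2, 3 ]
```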
//Aggregation operation (Sum/Min/Max/Average/First/Last)
// Count documents per Gender (like SQL COUNT(*) ... GROUP BY)
db.students.aggregate([{$group:{_id:"$Gender" , MyResult:{$sum:1}}}] )
// Getting the sum of Age per Gender
db.students.aggregate([{$group:{_id:"$Gender" , MyResult:{$sum:"$Age"}}}] )
//Getting Max
db.students.aggregate([{$group:{_id:"$Gender" , MaxResult:{$max:"$Age"}}}] )
//Getting Min
db.students.aggregate([{$group:{_id:"$Gender" , MinResult:{$min:"$Age"}}}] )
//Getting Average
db.students.aggregate([{$group:{_id:"$Gender" , AvgResult:{$avg:"$Age"}}}] )
// Getting First ($first depends on document order, so sort before grouping)
db.students.aggregate([{$sort:{"studentNo":1}}, {$group:{_id:"$Gender" , FirstResult:{$first:"$Age"}}}] )
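What the $group stages above compute can be spelled out in plain JavaScript. groupStats is a hypothetical helper that produces count, sum, max, min, first and average per group in one pass; it assumes every document has a numeric value field, while the real $sum and $avg simply ignore non-numeric values.

```javascript
// Hypothetical plain-JavaScript version of
// {$group: {_id: "$Gender", count: {$sum: 1}, sum: {$sum: "$Age"}, ...}}.
function groupStats(docs, keyField, valueField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const value = doc[valueField];
    if (!groups.has(key)) {
      // $first keeps the value from the first document seen, so input order matters.
      groups.set(key, { _id: key, count: 0, sum: 0, max: -Infinity, min: Infinity, first: value });
    }
    const g = groups.get(key);
    g.count += 1;
    g.sum += value;
    g.max = Math.max(g.max, value);
    g.min = Math.min(g.min, value);
  }
  return [...groups.values()].map((g) => ({ ...g, avg: g.sum / g.count }));
}

// The Gender and Age values of the five students inserted earlier.
const students = [30, 30, 40, 60, 30].map((Age) => ({ Gender: "M", Age }));
console.log(groupStats(students, "Gender", "Age"));
// one group, _id 'M': count 5, sum 190, max 60, min 30, first 30, avg 38
```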
// Import command for loading data into a MongoDB collection
mongoimport --db school --collection students --drop --file c:\filelocation\student.json
Note: run mongoimport from the operating-system command prompt, not from inside the mongo shell. It reads JSON documents from the file (conventionally one per line); add --jsonArray if the file holds a single JSON array. The --drop option drops the collection before importing.
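mongoimport needs a file to read. This Node.js sketch writes documents as JSON, one per line, which mongoimport accepts without extra flags; the helper name toJsonLines and the temporary-file location are assumptions made for illustration.

```javascript
// Sketch: prepare an import file with one JSON document per line.
const fs = require("fs");
const os = require("os");
const path = require("path");

function toJsonLines(docs) {
  return docs.map((doc) => JSON.stringify(doc)).join("\n") + "\n";
}

const students = [
  { studentNo: 1, FName: "Rajiv2", LName: "Gupta", Gender: "M", Age: 30 },
  { studentNo: 2, FName: "Rajee", LName: "Gupta", Gender: "M", Age: 30 },
];
const file = path.join(os.tmpdir(), "student.json"); // hypothetical location
fs.writeFileSync(file, toJsonLines(students));
console.log(`wrote ${students.length} documents to ${file}`);
```

The printed path can then be passed to --file in the mongoimport command.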
// Backing up databases
Open a command prompt as administrator and go to the "C:\Program Files\MongoDB\Server\3.6\bin" folder.
Run mongodump.exe to back up all databases (by default the backup is written to a "dump" folder in the current directory).
Run mongodump.exe --db school to back up a specific database.
// Backing up a single collection
Run mongodump.exe --db school --collection students
// Restoring databases
// Open a command prompt as administrator and go to the "C:\Program Files\MongoDB\Server\3.6\bin" folder
Run mongorestore.exe to restore all databases (it reads the "dump" folder by default).
Run mongorestore.exe --db school dump/school to restore a specific database.
// Restoring a single collection
Run mongorestore.exe --db school --collection students dump/school/students.bson
MongoDB and GIS
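MongoDB has a very useful feature called "geoNear". Other MongoDB operators that calculate distances on a sphere (like the Earth), such as $nearSphere, $centerSphere and $near, have restrictions. The most important one, in the MongoDB 3.x versions this post was written for, is that $near and $nearSphere queries are not supported on sharded collections, whereas the geoNear command supports sharding. (From MongoDB 4.0, $near and $nearSphere work on sharded collections, and MongoDB 4.2 replaced the geoNear command with the $geoNear aggregation stage.)
Geospatial queries --> $near, $nearSphere, and bounded queries with $geoWithin (circle: $center/$centerSphere, box: $box)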
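The distance these spherical operators work with is the great-circle distance. The sketch below computes it with the haversine formula and converts kilometres to the radians that $centerSphere expects, using the 6378.1 km Earth radius the MongoDB documentation suggests for that conversion. MongoDB's internal calculation may differ, and the location field and centre point are hypothetical.

```javascript
// Hypothetical sketch: great-circle distance and the km -> radians conversion used with $centerSphere.
const EARTH_RADIUS_KM = 6378.1; // radius the MongoDB docs use for this conversion
const toRad = (deg) => (deg * Math.PI) / 180;

// Points are [longitude, latitude], the order MongoDB uses for coordinates.
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// $centerSphere takes its radius in radians: distance divided by the Earth's radius.
const kmToRadians = (km) => km / EARTH_RADIUS_KM;

const center = [10, 20]; // arbitrary example point [longitude, latitude]
const filter = { location: { $geoWithin: { $centerSphere: [center, kmToRadians(10)] } } };
console.log(haversineKm([0, 0], [1, 0]).toFixed(1)); // 111.3
console.log(JSON.stringify(filter));
```

The filter object has the same shape you would pass to db.students.find() in the shell, assuming the documents carry a location field with coordinates and a 2dsphere index.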
// Useful links about geospatial data
http://tugdualgrall.blogspot.in/2014/08/introduction-to-mongodb-geospatial.html
https://www.percona.com/blog/2016/04/15/creating-geo-enabled-applications-with-mongodb-geojson-and-mysql/
https://dzone.com/articles/creating-geo-enabled-applications-with-mongodb-geo
// Reading from MongoDB, transforming with Spark and writing back to MongoDB
https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html
http://repo1.maven.org/maven2/org/mongodb/mongo-hadoop/
=====================================
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.rdd.RDD
import org.bson.BSONObject
import com.mongodb.hadoop.{
MongoInputFormat, MongoOutputFormat,
BSONFileInputFormat, BSONFileOutputFormat}
import com.mongodb.hadoop.io.MongoUpdateWritable
import com.mongodb.BasicDBObject
object SparkExample extends App {
// Set up the configuration for reading from MongoDB.
val mongoConfig = new Configuration()
// MongoInputFormat allows us to read from a live MongoDB instance.
// We could also use BSONFileInputFormat to read BSON snapshots.
// MongoDB connection string naming a collection to read.
// If using BSON, use "mapred.input.dir" to configure the directory
// where the BSON files are located instead.
mongoConfig.set("mongo.input.uri",
"mongodb://localhost:27017/db.collection")
val sparkConf = new SparkConf()
val sc = new SparkContext("local", "SparkExample", sparkConf)
// Create an RDD backed by the MongoDB collection.
val documents = sc.newAPIHadoopRDD(
mongoConfig, // Configuration
classOf[MongoInputFormat], // InputFormat
classOf[Object], // Key type
classOf[BSONObject]) // Value type
// Create a separate Configuration for saving data back to MongoDB.
val outputConfig = new Configuration()
outputConfig.set("mongo.output.uri",
"mongodb://localhost:27017/output.collection")
// Save this RDD as a Hadoop "file".
// The path argument is unused; all documents will go to "mongo.output.uri".
documents.saveAsNewAPIHadoopFile(
"file:///this-is-completely-unused",
classOf[Object],
classOf[BSONObject],
classOf[MongoOutputFormat[Object, BSONObject]],
outputConfig)
// We can also save this back to a BSON file.
val bsonOutputConfig = new Configuration()
documents.saveAsNewAPIHadoopFile(
"hdfs://localhost:8020/user/spark/bson-demo",
classOf[Object],
classOf[BSONObject],
classOf[BSONFileOutputFormat[Object, BSONObject]],
bsonOutputConfig)
// We can choose to update documents in an existing collection by using the
// MongoUpdateWritable class instead of BSONObject. First, we have to create
// the update operations we want to perform by mapping them across our current
// RDD.
val updates = documents.mapValues(
value => new MongoUpdateWritable(
new BasicDBObject("_id", value.get("_id")), // Query
new BasicDBObject("$set", new BasicDBObject("foo", "bar")), // Update operation
false, // Upsert
false // Update multiple documents
)
)
// Now we call saveAsNewAPIHadoopFile, using MongoUpdateWritable as the
// value class.
updates.saveAsNewAPIHadoopFile(
"file:///this-is-completely-unused",
classOf[Object],
classOf[MongoUpdateWritable],
classOf[MongoOutputFormat[Object, MongoUpdateWritable]],
outputConfig)
}