Tuesday, January 13, 2015

Bigdata




Big data means flexible, reliable, affordable web-scale computing. The term is an all-encompassing one for any collection of data sets so large or complex that it becomes difficult to process using traditional data processing applications.

The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, prevent diseases, combat crime".

Tools typically used in big data scenarios:

NoSQL Databases: MongoDB, CouchDB, Cassandra, Redis, BigTable, HBase, Hypertable, Voldemort, Riak, ZooKeeper
        
MapReduce: Hadoop, Hive, Pig, Cascading, Cascalog, mrjob, Caffeine, S4, MapR, Acunu, Flume, Kafka, Azkaban, Oozie, Greenplum

Storage: S3, Hadoop Distributed File System.

Servers: EC2, Google App Engine, Elastic Beanstalk, Heroku.

Processing: Yahoo! Pipes, Mechanical Turk, Solr/Lucene, ElasticSearch, Datameer, BigSheets, Tinkerpop


The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
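
To give a feel for what a "simple programming model" can look like, here is a minimal sketch of the MapReduce word-count pattern in plain Python. It runs in a single process and does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are made up for illustration. On a real cluster, the framework would run the map and reduce steps on many machines and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical local driver)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]  # illustrative input
    print(word_count(sample))
```

The point of the pattern is that map_phase and reduce_phase each look at only one line or one key at a time, which is what lets a framework spread the work across many computers.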

