Big data is flexible, reliable, affordable web-scale computing. It is an all-encompassing term for any collection of data sets so large or complex that it becomes difficult to process using traditional data processing applications.
The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. The trend toward larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to spot business trends, prevent diseases, and combat crime.
Tools typically used in big data scenarios:
NoSQL Databases: MongoDB, CouchDB, Cassandra, Redis, BigTable,
HBase, Hypertable, Voldemort, Riak, ZooKeeper
MapReduce:
Hadoop, Hive, Pig, Cascading, Cascalog, mrjob, Caffeine, S4, MapR, Acunu,
Flume, Kafka, Azkaban, Oozie, Greenplum
Storage: S3, Hadoop Distributed File System
Servers: EC2, Google App Engine, Elastic Beanstalk, Heroku
Processing:
Yahoo! Pipes, Mechanical Turk, Solr/Lucene, ElasticSearch, Datameer, BigSheets,
Tinkerpop
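
Several of the tools above (Hadoop, mrjob, Cascading) are built around the MapReduce model. As a rough illustration of the idea only, here is a minimal single-machine sketch in plain Python that counts words. It does not use any of the listed tools or their APIs, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this example. A real framework would run the same three steps spread across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key.

    In a real framework this happens between map and reduce, across the network.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
```

Because each record is mapped independently and each key is reduced independently, both steps can be parallelized, which is what makes the model a good fit for very large data sets.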