How to use Cassandra's MapReduce, with or without Pig?
Can someone explain how MapReduce works with Cassandra 0.6? I've read through the word count example, but I don't quite follow what's happening on the Cassandra end vs. the "client" end.
https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/
For instance, let's say I'm using Python and pycassa: how would I load in a new MapReduce function and then call it? Does the MapReduce function have to be Java installed on the Cassandra server? If so, how do I call it from pycassa?
There's some mention of Pig making this easier, but I'm a complete Hadoop noob, so that didn't really help.
Your answer can use Thrift or whatever; I only mentioned pycassa to denote the client side. I'm just trying to understand the difference between what runs in the Cassandra cluster vs. the actual server making the requests.
From what I've heard (and from here), the way a developer writes a MapReduce program that uses Cassandra as its data source is as follows. You write a regular MapReduce program (the example you linked to is the pure-Java version), and the jars that are now available provide a custom InputFormat that allows the input source to be Cassandra (instead of the default, which is HDFS).
If you're using pycassa, I'd say you're out of luck until either (1) the maintainer of that project adds support for MapReduce, or (2) you throw your Python functions away, write a Java MapReduce program, and run that. The latter is definitely a bit of a hack, but it will get you going.
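To make the "regular MapReduce program with a Cassandra InputFormat" idea concrete, here is a minimal sketch of a word-count job modeled on the linked contrib/word_count example. It assumes the ColumnFamilyInputFormat and ConfigHelper classes from the org.apache.cassandra.hadoop package that ships with Cassandra; the exact key/value types (String vs. byte[] vs. ByteBuffer), the ConfigHelper setter names, and the way you point the job at a Cassandra node vary between versions, and "Keyspace1"/"Standard1" and the output path are placeholder names. Treat it as an outline of where the Cassandra-specific wiring goes, not copy-paste code.

```java
import java.io.IOException;
import java.util.SortedMap;

import org.apache.cassandra.db.IColumn;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CassandraWordCount {

    // The mapper gets one Cassandra row per call: the row key plus the columns
    // selected for the job. (The key and column-value types differ between
    // Cassandra versions; byte[] is used here as an assumption.)
    public static class TokenizerMapper
            extends Mapper<byte[], SortedMap<byte[], IColumn>, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(byte[] key, SortedMap<byte[], IColumn> columns, Context context)
                throws IOException, InterruptedException {
            for (IColumn column : columns.values()) {
                String value = new String(column.value());  // assumes UTF-8 text values
                for (String token : value.split("\\s+")) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Standard Hadoop sum reducer; nothing Cassandra-specific here.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJarByClass(CassandraWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // The Cassandra-specific part: swap in the Cassandra InputFormat and tell
        // ConfigHelper which keyspace/column family to scan. The real word_count
        // example also sets a SlicePredicate (which columns to read) and the
        // Thrift address/port of a Cassandra node; setter names vary by version.
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");

        FileOutputFormat.setOutputPath(job, new Path("/tmp/word_count_output"));
        job.waitForCompletion(true);
    }
}
```

You bundle this into a jar and submit it to the Hadoop cluster (e.g. with `hadoop jar`); the job then pulls its input rows from Cassandra rather than from files in HDFS, which is the whole point of the custom InputFormat.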