Amazon EC2 - Recommendations for Hadoop on EC2?
When running Hadoop in EC2, I seem to have two options:
- A: Manage the cluster myself, using the EC2-specific shell scripts that come with Hadoop.
- B: Use Elastic MapReduce, and pay a little extra for the convenience.
I'm leaning towards B, but I'd appreciate advice from people with more experience. Here are my questions:
- Are there any tasks that can be done with one of these methods but not the other?
- Are there any other options besides these two that I'm overlooking?
- If I choose B, how easy is it to go back to A? That is, what's the danger of vendor lock-in?
I have been told by people close to the Amazon Elastic MapReduce (EMR) development team that there are at least two other advantages to using EMR: a) Amazon is actively applying bug fixes and performance enhancements to the Hadoop code base that is used on EMR, and b) Amazon employs a high-performance network between EMR servers and S3 servers that may not be available between EC2 servers and S3 servers.
Update: See @mat's comments, which refute the rumored advantages of using EMR.