This post records some of the pitfalls we ran into while using Shark, and how we dug ourselves out of each one...
Pitfall 1
SQL:
FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE jointest SELECT t1.bar, t1.foo, t2.bar, t2.foo;
Error description:
The client reports the error below and then hangs indefinitely:
cluster.TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:205)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:216)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:197)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:102)
at shark.execution.JoinOperator$$anonfun$generateTuples$1.apply(JoinOperator.scala:169)
at shark.execution.JoinOperator$$anonfun$generateTuples$1.apply(JoinOperator.scala:154)
at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
at scala.collection.Iterator$$anon$21.next(Iterator.scala:441)
at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
at scala.collection.Iterator$class.foreach(Iterator.scala:772)
at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
at shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:73)
at shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:158)
at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:162)
at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:162)
at spark.scheduler.ResultTask.run(ResultTask.scala:77)
at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
The worker nodes report the following:
java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:205)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:216)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:197)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:102)
at shark.execution.JoinOperator$$anonfun$generateTuples$1.apply(JoinOperator.scala:169)
at shark.execution.JoinOperator$$anonfun$generateTuples$1.apply(JoinOperator.scala:154)
at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
at scala.collection.Iterator$$anon$21.next(Iterator.scala:441)
at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
at scala.collection.Iterator$class.foreach(Iterator.scala:772)
at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
at shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:73)
at shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:158)
at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:162)
at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:162)
at spark.scheduler.ResultTask.run(ResultTask.scala:77)
at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Solution: after switching to AMPLab's Hive build the problem went away. The root cause is that our Hive was patched on top of the official 0.9.0 release, and the official Hive has concurrency problems when driven from multiple threads; AMPLab's Hive fork fixes those bugs. This was discussed on the shark-users group ([thread](https://groups.google.com/forum/#!searchin/shark-users/ArrayIndexOutOfBoundsException/shark-users/OfaZZcYhv2Q/W7HizKk4-54J)).
We plan to pull these AMPLab bug fixes into our own Hive build later on.
Pitfall 2
SQL:
select count(distinct b.deviceid, a.curr_version)
from bi.dpmid_mb_deviceid a
join bi.dpdm_device_user b on (a.device_id = b.deviceid)
join bi.dpdm_user_tg_category c on (b.userid = c.userid)
join bi.dpdm_device_permanent_city_plus d on (a.device_id = d.deviceid and d.cityid = 3 and d.last_day = '2013-05-15')
where c.cat_name = '面包甜点' and
--a.curr_version >= '5.5.6' and
c.cityid = 3 and a.train_id = 10;
Error description:
The client throws an OOM and hangs indefinitely:
13/07/24 16:21:44 INFO cluster.TaskSetManager: Loss was due to java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.mutable.ResizableArray$class.$init$(ResizableArray.scala:33)
at scala.collection.mutable.ArrayBuffer.<init>(ArrayBuffer.scala:47)
at scala.collection.mutable.ArrayBuffer.<init>(ArrayBuffer.scala:61)
at spark.CoGroupedRDD$$anonfun$getSeq$1$1.apply(CoGroupedRDD.scala:106)
at spark.CoGroupedRDD$$anonfun$getSeq$1$1.apply(CoGroupedRDD.scala:106)
at scala.Array$.fill(Array.scala:239)
at spark.CoGroupedRDD.getSeq$1(CoGroupedRDD.scala:106)
at spark.CoGroupedRDD$$anonfun$compute$2.mergePair$1(CoGroupedRDD.scala:118)
at spark.CoGroupedRDD$$anonfun$compute$2$$anonfun$apply$5.apply(CoGroupedRDD.scala:120)
at spark.CoGroupedRDD$$anonfun$compute$2$$anonfun$apply$5.apply(CoGroupedRDD.scala:120)
at scala.collection.Iterator$class.foreach(Iterator.scala:772)
at spark.util.CompletionIterator.foreach(CompletionIterator.scala:6)
at spark.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:120)
at spark.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:111)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.immutable.List.foreach(List.scala:76)
at spark.CoGroupedRDD.compute(CoGroupedRDD.scala:111)
at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
at spark.RDD.iterator(RDD.scala:196)
at spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:19)
at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
at spark.RDD.iterator(RDD.scala:196)
at spark.rdd.MapPartitionsWithIndexRDD.compute(MapPartitionsWithIndexRDD.scala:23)
at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
at spark.RDD.iterator(RDD.scala:196)
at spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:127)
at spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:75)
at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
13/07/24 16:22:43 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, cosmos133.hadoop, 44818) with no recent heart beats
Solution: bump up the number of reduce tasks. With more shuffle partitions, each task co-groups far fewer rows, so the per-key buffers allocated in CoGroupedRDD (visible in the stack trace above) no longer exhaust the heap:
set mapred.reduce.tasks=100;
This issue has also been discussed on the shark user group.
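For completeness, here is a minimal sketch of how the fix is applied in a Shark CLI session, assuming Shark honors Hive's `set` syntax as it did for us. The query is a simplified stand-in for the real one; the tables src_a/src_b and the column id are hypothetical placeholders, not our production tables.
-- raise the reduce-side parallelism before running the join;
-- 100 is the value that worked for us, tune it to your data volume
set mapred.reduce.tasks=100;
-- then re-run the join (src_a, src_b and id are made-up names for illustration)
select count(distinct a.id)
from src_a a
join src_b b on (a.id = b.id);
With the larger number of shuffle partitions, the same query that previously died with "GC overhead limit exceeded" completed for us without touching executor heap sizes.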