Tuesday, August 19, 2014

Executing Custom UDF in Hive






I have the following table in Hive:



>describe weblogs;
OK
originatingip string
clientidentity string
userid string
time string
requesttype string
requestpage string
httpprotocolversion string
responsecode int
responsesize int
referrer string
useragent string
Time taken: 1.065 seconds, Fetched: 11 row(s)


I have created a UDF in Java that maps IP addresses to geographic locations. Here is my UDF:



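package com.prithvi.hive.logprocessing.udf.ipgeo;

import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import com.maxmind.geoip.Location;
import com.maxmind.geoip.LookupService;

public class IpgeoHive extends UDF {

    // Reused output object and per-call lookup results
    Text result = new Text();
    String ipCountry, ipCity;

    // Maps an IP address to "Country/City" using the GeoLite City database
    public Text evaluate(Text input) throws IOException {
        if (input == null) return null;

        // Locate the GeoLite database bundled on the classpath
        URL database_path = getClass().getResource("/GeoLiteCity.dat");
        File file;
        try {
            file = new File(database_path.toURI());
        } catch (URISyntaxException e) {
            file = new File(database_path.getPath());
        }
        // Open the database and look up the address
        LookupService cl = new LookupService(file);
        Location location = cl.getLocation(input.toString());
        if (location != null) {
            ipCountry = location.countryName;
            ipCity = location.city;
        } else {
            ipCountry = "Unknown";
            ipCity = "Unknown";
        }
        result.set(ipCountry + "/" + ipCity);
        return result;
    }
}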


The UDF above returns the expected results when I test it in Eclipse by passing dummy values.
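A plausible reason the Eclipse test passes (an assumption, not confirmed by the output above): when run from the IDE, `GeoLiteCity.dat` is presumably loaded from a build output directory, so `getResource` returns a plain `file:` URL, and `new File(URI)` accepts that. The stdlib-only sketch below illustrates this case; the path `/tmp/classes/GeoLiteCity.dat` and the class `FileUriDemo` are hypothetical stand-ins.

```java
import java.io.File;
import java.net.URI;

public class FileUriDemo {

    // Converts a URI to a File the same way the UDF does (new File(URI)).
    static File toFile(URI uri) {
        return new File(uri);
    }

    public static void main(String[] args) {
        // Hypothetical location of the database when run from an IDE build directory
        URI fileUri = URI.create("file:/tmp/classes/GeoLiteCity.dat");
        File f = toFile(fileUri);
        // A hierarchical file: URI converts without error
        System.out.println(f.getName());
    }
}
```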


After building the jar file, I run it in my sandbox with the following commands:



ADD JAR MapReduce_Examples-0.0.1-SNAPSHOT-jar-with-dependencies.jar;

CREATE TEMPORARY FUNCTION IP2GEO AS 'com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive';

SELECT originatingip, IP2GEO(originatingip) from weblogs limit 10;


However, the job fails with the following error, and I have no idea how to solve it. Any help is greatly appreciated.



Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row{"originatingip":"25.198.250.35","clientidentity":"-","userid":"-","time":"[2014-07-19T16:05:33Z]","requesttype":"\"GET","requestpage":"/","httpprotocolversion":"HTTP/1.1\"","responsecode":404,"responsesize":1081,"referrer":"\"-\"","useragent":"\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\""}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"originatingip":"25.198.250.35","clientidentity":"-","userid":"-","time":"[2014-07-19T16:05:33Z]","requesttype":"\"GET","requestpage":"/","httpprotocolversion":"HTTP/1.1\"","responsecode":404,"responsesize":1081,"referrer":"\"-\"","useragent":"\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\""}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive.evaluate(org.apache.hadoop.io.Text) throws java.io.IOException on object com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive@63c0b9c3 of class com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive with arguments {25.198.250.35:org.apache.hadoop.io.Text} of size 1
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1241)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1217)
... 18 more
Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
at java.io.File.<init>(File.java:418)
at com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive.evaluate(IpgeoHive.java:28)
... 23 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
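The innermost `Caused by` is thrown from `java.io.File.<init>` inside `IpgeoHive.evaluate`. A likely explanation, offered as an assumption: once the UDF runs from `MapReduce_Examples-0.0.1-SNAPSHOT-jar-with-dependencies.jar`, `getResource("/GeoLiteCity.dat")` returns a `jar:file:...!/GeoLiteCity.dat` URL. Its URI is opaque, so `new File(URI)` throws `IllegalArgumentException: URI is not hierarchical`, which the `catch (URISyntaxException e)` block does not handle. The stdlib-only sketch below reproduces that with a hypothetical jar path (`/tmp/udf.jar`) and shows one possible workaround: copying the resource stream to a temporary local file. `JarResourceDemo`, `toFileLikeUdf` and `copyToTempFile` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class JarResourceDemo {

    // Same conversion as the UDF: only URISyntaxException is caught, so an
    // IllegalArgumentException thrown by new File(URI) propagates to the caller.
    static File toFileLikeUdf(URL url) {
        try {
            return new File(url.toURI());
        } catch (URISyntaxException e) {
            return new File(url.getPath());
        }
    }

    // Possible workaround (assumption): copy the resource bytes to a local temp
    // file so the lookup library receives a real file-system path.
    static File copyToTempFile(InputStream in, String prefix, String suffix) throws IOException {
        Path tmp = Files.createTempFile(prefix, suffix);
        tmp.toFile().deleteOnExit();
        try (InputStream src = in) {
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp.toFile();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical jar path; a resource packed in a jar yields a URL of this shape.
        URL inJar = URI.create("jar:file:/tmp/udf.jar!/GeoLiteCity.dat").toURL();
        try {
            toFileLikeUdf(inJar);
        } catch (IllegalArgumentException e) {
            System.out.println("Caught: " + e.getMessage()); // Caught: URI is not hierarchical
        }

        // Stand-in bytes for getClass().getResourceAsStream("/GeoLiteCity.dat")
        File copy = copyToTempFile(new ByteArrayInputStream(new byte[] {1, 2, 3}), "GeoLiteCity", ".dat");
        System.out.println(copy.length()); // 3
    }
}
```

Inside the UDF, the copied file could then be passed to `new LookupService(file)`. Doing the copy and opening the `LookupService` once (for example, lazily in a field) rather than on every `evaluate` call would also avoid reopening the database for each row. This is a sketch and has not been checked against the sandbox.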








