Hive 3.1.2 Built-in System Functions and Registering UDFs Automatically with the System

Preface

I previously wrote a post on how to use UDFs: https://lizhiyong.blog.csdn.net/article/details/126186377

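One of the most important classes there is GenericUDF. That post walked through the overall workflow: extend the class and implement your own logic, build a Jar, load the Jar into Hive, register the function, and call it from HQL. User-written functions only work after this whole series of hoops, so how do Hive's built-in functions manage to be available to tenants directly, with no registration at all?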

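Once that is understood, the most frequently used UDFs can be registered into Hive automatically, which avoids the tedium of repeatedly loading Jars and registering functions. In particular, a self-registered UDF appears to be visible only in the current database by default; calling it from another database requires the db_name.udf_name form, which is not very convenient.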

Finding Hive's Built-in Functions

In IDEA, press Shift twice to search for a Java class. I'll use the RPAD function as the example.

package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.exec.Description;

@Description(name = "rpad", value = "_FUNC_(str, len, pad) - " +
    "Returns str, right-padded with pad to a length of len",
    extended = "If str is longer than len, the return value is shortened to "
    + "len characters.\n"
    + "In case of empty pad string, the return value is null.\n"
    + "Example:\n"
    + "  > SELECT _FUNC_('hi', 5, '??') FROM src LIMIT 1;\n"
    + "  'hi???'\n"
    + "  > SELECT _FUNC_('hi', 1, '??') FROM src LIMIT 1;\n"
    + "  'h'\n"
    + "  > SELECT _FUNC_('hi', 5, '') FROM src LIMIT 1;\n"
    + "  null")
public class GenericUDFRpad extends GenericUDFBasePad {
  public GenericUDFRpad() {
    super("rpad");
  }

  @Override
  protected void performOp(
      StringBuilder builder, int len, String str, String pad) {
    int pos = str.length();

    builder.append(str, 0, pos);

    while (pos < len) {
      builder.append(pad);
      pos += pad.length();
    }
    builder.setLength(len);
  }
}

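That turns up this class. It extends GenericUDFBasePad, and even a quick read of the Java source shows that this thing appends padding characters to the right side of a string.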
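
To make the padding loop concrete, here is a minimal plain-JDK sketch of the same logic. The class name RpadSketch and the static rpad helper are made up for illustration and are not part of Hive; the empty-pad check simply follows the behavior stated in the @Description text above, and len is assumed to be non-negative.

```java
/** Illustration only: RpadSketch is a hypothetical class, not part of Hive. */
final class RpadSketch {

    /** Mirrors the loop in GenericUDFRpad.performOp using only the JDK. */
    static String rpad(String str, int len, String pad) {
        // Assumption: per the @Description, an empty pad string yields null.
        // The check also keeps the loop below from running forever.
        if (str == null || pad == null || pad.isEmpty()) {
            return null;
        }
        StringBuilder builder = new StringBuilder(str);
        int pos = str.length();
        // Keep appending whole copies of pad until we reach or pass len.
        while (pos < len) {
            builder.append(pad);
            pos += pad.length();
        }
        // Cut back to exactly len characters (this also shortens long input).
        builder.setLength(len);
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(rpad("hi", 5, "??")); // hi???
        System.out.println(rpad("hi", 1, "??")); // h
        System.out.println(rpad("hi", 5, ""));   // null
    }
}
```

The three calls in main are the same three examples listed in the @Description annotation.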

Its parent class:

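package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;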

public abstract class GenericUDFBasePad extends GenericUDF {
  private transient Converter converter1;
  private transient Converter converter2;
  private transient Converter converter3;
  private Text result = new Text();
  private String udfName;
  private StringBuilder builder;

  public GenericUDFBasePad(String _udfName) {
    this.udfName = _udfName;
    this.builder = new StringBuilder();
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 3) {
      throw new UDFArgumentException(udfName + " requires three arguments. Found :"
        + arguments.length);
    }
    converter1 = checkTextArguments(arguments, 0);
    converter2 = checkIntArguments(arguments, 1);
    converter3 = checkTextArguments(arguments, 2);
    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object valObject1 = arguments[0].get();
    Object valObject2 = arguments[1].get();
    Object valObject3 = arguments[2].get();
    if (valObject1 == null || valObject2 == null || valObject3 == null) {
      return null;
    }
    Text str = (Text) converter1.convert(valObject1);
    IntWritable lenW = (IntWritable) converter2.convert(valObject2);
    Text pad = (Text) converter3.convert(valObject3);
    if (str == null || pad == null || lenW == null || pad.toString().isEmpty()) {
      return null;
    }
    int len = lenW.get();
    builder.setLength(0);

    performOp(builder, len, str.toString(), pad.toString());
    result.set(builder.toString());
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return getStandardDisplayString(udfName, children);
  }

  protected abstract void performOp(
      StringBuilder builder, int len, String str, String pad);

  private Converter checkTextArguments(ObjectInspector[] arguments, int i)
    throws UDFArgumentException {
    if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
      throw new UDFArgumentTypeException(i, "Only primitive type arguments are accepted but "
      + arguments[i].getTypeName() + " is passed.");
    }

    Converter converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
          PrimitiveObjectInspectorFactory.writableStringObjectInspector);

    return converter;
  }

  private Converter checkIntArguments(ObjectInspector[] arguments, int i)
    throws UDFArgumentException {
    if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
      throw new UDFArgumentTypeException(i, "Only primitive type arguments are accepted but "
      + arguments[i].getTypeName() + " is passed.");
    }
    PrimitiveCategory inputType = ((PrimitiveObjectInspector) arguments[i]).getPrimitiveCategory();
    Converter converter;
    switch (inputType) {
    case INT:
    case SHORT:
    case BYTE:
      converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
      PrimitiveObjectInspectorFactory.writableIntObjectInspector);
      break;
    default:
      throw new UDFArgumentTypeException(i + 1, udfName
      + " only takes INT/SHORT/BYTE types as " + (i + 1) + "-ths argument, got "
      + inputType);
    }
    return converter;
  }
}

Just like an ordinary UDF, it extends GenericUDF. That class is not covered again here.

Following the trail, you will find that Hive's built-in functions are concentrated in the org.apache.hadoop.hive.ql.udf.generic package.

The Java class names make it clear which function each class implements.

Take the Trim function, for example:

package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedExpressions;
import org.apache.hadoop.hive.ql.exec.vector.expressions.StringTrim;

@Description(name = "trim",
    value = "_FUNC_(str) - Removes the leading and trailing space characters from str ",
    extended = "Example:\n"
    + "  > SELECT _FUNC_('   facebook  ') FROM src LIMIT 1;\n" + "  'facebook'")
@VectorizedExpressions({ StringTrim.class })
public class GenericUDFTrim extends GenericUDFBaseTrim {
  public GenericUDFTrim() {
    super("trim");
  }

  @Override
  protected String performOp(String val) {
    return StringUtils.strip(val, " ");
  }

}

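No need to say much: this is the familiar trim function that strips spaces. Clearly, Hive's built-in functions are not much different from user-defined UDFs; underneath, they extend the same classes. The open-source community has simply written the commonly needed functions ahead of time.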
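
One detail worth seeing in code: StringUtils.strip(val, " ") removes only the characters listed in its second argument, here the plain space, from both ends. Below is a rough plain-JDK approximation of that behavior. TrimSketch and stripSpaces are hypothetical names for illustration, not Hive or Commons Lang code.

```java
/** Illustration only: a JDK-only stand-in for StringUtils.strip(val, " "). */
final class TrimSketch {

    /** Removes leading and trailing ' ' characters, and nothing else. */
    static String stripSpaces(String val) {
        int start = 0;
        int end = val.length();
        // Skip spaces at the front.
        while (start < end && val.charAt(start) == ' ') {
            start++;
        }
        // Skip spaces at the back.
        while (end > start && val.charAt(end - 1) == ' ') {
            end--;
        }
        return val.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("'" + stripSpaces("   facebook  ") + "'"); // 'facebook'
    }
}
```

Unlike String.trim(), this sketch leaves a leading tab in place, because only the space character is stripped.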

At this point we have found the package that holds Hive's built-in functions.

Finding How Hive Registers Functions Automatically

Again taking RPAD as the example: in IDEA, Alt+F7 shows where a class is used.

Clearly, the GenericUDFRpad class is passed to the registerGenericUDF method. Judging by the method name, it is almost certainly tied to function registration.

Jumping into the class that makes the call:

package org.apache.hadoop.hive.ql.exec;

public final class FunctionRegistry {

  private static final Logger LOG = LoggerFactory.getLogger(FunctionRegistry.class);

  public static final String LEAD_FUNC_NAME = "lead";
  public static final String LAG_FUNC_NAME = "lag";
  public static final String LAST_VALUE_FUNC_NAME = "last_value";

  public static final String UNARY_PLUS_FUNC_NAME = "positive";
  public static final String UNARY_MINUS_FUNC_NAME = "negative";

  public static final String WINDOWING_TABLE_FUNCTION = "windowingtablefunction";
  private static final String NOOP_TABLE_FUNCTION = "noop";
  private static final String NOOP_MAP_TABLE_FUNCTION = "noopwithmap";
  private static final String NOOP_STREAMING_TABLE_FUNCTION = "noopstreaming";
  private static final String NOOP_STREAMING_MAP_TABLE_FUNCTION = "noopwithmapstreaming";
  private static final String MATCH_PATH_TABLE_FUNCTION = "matchpath";

  public static final Set<String> HIVE_OPERATORS = new HashSet<String>();

  static {
    HIVE_OPERATORS.addAll(Arrays.asList(
        "+", "-", "*", "/", "%", "div", "&", "|", "^", "~",
        "and", "or", "not", "!",
        "=", "==", "<=>", "!=", "<>", "<", "<=", ">", ">=",
        "index"));
  }

  private static final Registry system = new Registry(true);

  static {
    system.registerGenericUDF("concat", GenericUDFConcat.class);
    system.registerUDF("substr", UDFSubstr.class, false);
    system.registerUDF("substring", UDFSubstr.class, false);
    system.registerGenericUDF("substring_index", GenericUDFSubstringIndex.class);
    system.registerUDF("space", UDFSpace.class, false);
    system.registerUDF("repeat", UDFRepeat.class, false);
    system.registerUDF("ascii", UDFAscii.class, false);
    system.registerGenericUDF("lpad", GenericUDFLpad.class);
    system.registerGenericUDF("rpad", GenericUDFRpad.class);
    system.registerGenericUDF("levenshtein", GenericUDFLevenshtein.class);
    system.registerGenericUDF("soundex", GenericUDFSoundex.class);

    system.registerGenericUDF("size", GenericUDFSize.class);

    system.registerGenericUDF("round", GenericUDFRound.class);
    system.registerGenericUDF("bround", GenericUDFBRound.class);
    system.registerGenericUDF("floor", GenericUDFFloor.class);
    system.registerUDF("sqrt", UDFSqrt.class, false);
    system.registerGenericUDF("cbrt", GenericUDFCbrt.class);
    system.registerGenericUDF("ceil", GenericUDFCeil.class);
    system.registerGenericUDF("ceiling", GenericUDFCeil.class);
    system.registerUDF("rand", UDFRand.class, false);
    system.registerGenericUDF("abs", GenericUDFAbs.class);
    system.registerGenericUDF("sq_count_check", GenericUDFSQCountCheck.class);
    system.registerGenericUDF("enforce_constraint", GenericUDFEnforceConstraint.class);
    system.registerGenericUDF("pmod", GenericUDFPosMod.class);

    system.registerUDF("ln", UDFLn.class, false);
    system.registerUDF("log2", UDFLog2.class, false);
    system.registerUDF("sin", UDFSin.class, false);
    system.registerUDF("asin", UDFAsin.class, false);
    system.registerUDF("cos", UDFCos.class, false);
    system.registerUDF("acos", UDFAcos.class, false);
    system.registerUDF("log10", UDFLog10.class, false);
    system.registerUDF("log", UDFLog.class, false);
    system.registerUDF("exp", UDFExp.class, false);
    system.registerGenericUDF("power", GenericUDFPower.class);
    system.registerGenericUDF("pow", GenericUDFPower.class);
    system.registerUDF("sign", UDFSign.class, false);
    system.registerUDF("pi", UDFPI.class, false);
    system.registerUDF("degrees", UDFDegrees.class, false);
    system.registerUDF("radians", UDFRadians.class, false);
    system.registerUDF("atan", UDFAtan.class, false);
    system.registerUDF("tan", UDFTan.class, false);
    system.registerUDF("e", UDFE.class, false);
    system.registerGenericUDF("factorial", GenericUDFFactorial.class);
    system.registerUDF("crc32", UDFCrc32.class, false);

    system.registerUDF("conv", UDFConv.class, false);
    system.registerUDF("bin", UDFBin.class, false);
    system.registerUDF("chr", UDFChr.class, false);
    system.registerUDF("hex", UDFHex.class, false);
    system.registerUDF("unhex", UDFUnhex.class, false);
    system.registerUDF("base64", UDFBase64.class, false);
    system.registerUDF("unbase64", UDFUnbase64.class, false);
    system.registerGenericUDF("sha2", GenericUDFSha2.class);
    system.registerUDF("md5", UDFMd5.class, false);
    system.registerUDF("sha1", UDFSha1.class, false);
    system.registerUDF("sha", UDFSha1.class, false);
    system.registerGenericUDF("aes_encrypt", GenericUDFAesEncrypt.class);
    system.registerGenericUDF("aes_decrypt", GenericUDFAesDecrypt.class);
    system.registerUDF("uuid", UDFUUID.class, false);

    system.registerGenericUDF("encode", GenericUDFEncode.class);
    system.registerGenericUDF("decode", GenericUDFDecode.class);

    system.registerGenericUDF("upper", GenericUDFUpper.class);
    system.registerGenericUDF("lower", GenericUDFLower.class);
    system.registerGenericUDF("ucase", GenericUDFUpper.class);
    system.registerGenericUDF("lcase", GenericUDFLower.class);
    system.registerGenericUDF("trim", GenericUDFTrim.class);
    system.registerGenericUDF("ltrim", GenericUDFLTrim.class);
    system.registerGenericUDF("rtrim", GenericUDFRTrim.class);
    system.registerGenericUDF("length", GenericUDFLength.class);
    system.registerGenericUDF("character_length", GenericUDFCharacterLength.class);
    system.registerGenericUDF("char_length", GenericUDFCharacterLength.class);
    system.registerGenericUDF("octet_length", GenericUDFOctetLength.class);
    system.registerUDF("reverse", UDFReverse.class, false);
    system.registerGenericUDF("field", GenericUDFField.class);
    system.registerUDF("find_in_set", UDFFindInSet.class, false);
    system.registerGenericUDF("initcap", GenericUDFInitCap.class);

    system.registerUDF("like", UDFLike.class, true);
    system.registerGenericUDF("likeany", GenericUDFLikeAny.class);
    system.registerGenericUDF("likeall", GenericUDFLikeAll.class);
    system.registerGenericUDF("rlike", GenericUDFRegExp.class);
    system.registerGenericUDF("regexp", GenericUDFRegExp.class);
    system.registerUDF("regexp_replace", UDFRegExpReplace.class, false);
    system.registerUDF("replace", UDFReplace.class, false);
    system.registerUDF("regexp_extract", UDFRegExpExtract.class, false);
    system.registerUDF("parse_url", UDFParseUrl.class, false);
    system.registerGenericUDF("nvl", GenericUDFNvl.class);
    system.registerGenericUDF("split", GenericUDFSplit.class);
    system.registerGenericUDF("str_to_map", GenericUDFStringToMap.class);
    system.registerGenericUDF("translate", GenericUDFTranslate.class);

    system.registerGenericUDF(UNARY_PLUS_FUNC_NAME, GenericUDFOPPositive.class);
    system.registerGenericUDF(UNARY_MINUS_FUNC_NAME, GenericUDFOPNegative.class);

    system.registerGenericUDF("day", UDFDayOfMonth.class);
    system.registerGenericUDF("dayofmonth", UDFDayOfMonth.class);
    system.registerUDF("dayofweek", UDFDayOfWeek.class, false);
    system.registerGenericUDF("month", UDFMonth.class);
    system.registerGenericUDF("quarter", GenericUDFQuarter.class);
    system.registerGenericUDF("year", UDFYear.class);
    system.registerGenericUDF("hour", UDFHour.class);
    system.registerGenericUDF("minute", UDFMinute.class);
    system.registerGenericUDF("second", UDFSecond.class);
    system.registerUDF("from_unixtime", UDFFromUnixTime.class, false);
    system.registerGenericUDF("to_date", GenericUDFDate.class);
    system.registerUDF("weekofyear", UDFWeekOfYear.class, false);
    system.registerGenericUDF("last_day", GenericUDFLastDay.class);
    system.registerGenericUDF("next_day", GenericUDFNextDay.class);
    system.registerGenericUDF("trunc", GenericUDFTrunc.class);
    system.registerGenericUDF("date_format", GenericUDFDateFormat.class);

    system.registerUDF("floor_year", UDFDateFloorYear.class, false);
    system.registerUDF("floor_quarter", UDFDateFloorQuarter.class, false);
    system.registerUDF("floor_month", UDFDateFloorMonth.class, false);
    system.registerUDF("floor_day", UDFDateFloorDay.class, false);
    system.registerUDF("floor_week", UDFDateFloorWeek.class, false);
    system.registerUDF("floor_hour", UDFDateFloorHour.class, false);
    system.registerUDF("floor_minute", UDFDateFloorMinute.class, false);
    system.registerUDF("floor_second", UDFDateFloorSecond.class, false);

    system.registerGenericUDF("date_add", GenericUDFDateAdd.class);
    system.registerGenericUDF("date_sub", GenericUDFDateSub.class);
    system.registerGenericUDF("datediff", GenericUDFDateDiff.class);
    system.registerGenericUDF("add_months", GenericUDFAddMonths.class);
    system.registerGenericUDF("months_between", GenericUDFMonthsBetween.class);

    system.registerUDF("get_json_object", UDFJson.class, false);

    system.registerUDF("xpath_string", UDFXPathString.class, false);
    system.registerUDF("xpath_boolean", UDFXPathBoolean.class, false);
    system.registerUDF("xpath_number", UDFXPathDouble.class, false);
    system.registerUDF("xpath_double", UDFXPathDouble.class, false);
    system.registerUDF("xpath_float", UDFXPathFloat.class, false);
    system.registerUDF("xpath_long", UDFXPathLong.class, false);
    system.registerUDF("xpath_int", UDFXPathInteger.class, false);
    system.registerUDF("xpath_short", UDFXPathShort.class, false);
    system.registerGenericUDF("xpath", GenericUDFXPath.class);

    system.registerGenericUDF("+", GenericUDFOPPlus.class);
    system.registerGenericUDF("-", GenericUDFOPMinus.class);
    system.registerGenericUDF("*", GenericUDFOPMultiply.class);
    system.registerGenericUDF("/", GenericUDFOPDivide.class);
    system.registerGenericUDF("%", GenericUDFOPMod.class);
    system.registerGenericUDF("mod", GenericUDFOPMod.class);
    system.registerUDF("div", UDFOPLongDivide.class, true);

    system.registerUDF("&", UDFOPBitAnd.class, true);
    system.registerUDF("|", UDFOPBitOr.class, true);
    system.registerUDF("^", UDFOPBitXor.class, true);
    system.registerUDF("~", UDFOPBitNot.class, true);
    system.registerUDF("shiftleft", UDFOPBitShiftLeft.class, true);
    system.registerUDF("shiftright", UDFOPBitShiftRight.class, true);
    system.registerUDF("shiftrightunsigned", UDFOPBitShiftRightUnsigned.class, true);

    system.registerGenericUDF("grouping", GenericUDFGrouping.class);

    system.registerGenericUDF("current_database", UDFCurrentDB.class);
    system.registerGenericUDF("current_date", GenericUDFCurrentDate.class);
    system.registerGenericUDF("current_timestamp", GenericUDFCurrentTimestamp.class);
    system.registerGenericUDF("current_user", GenericUDFCurrentUser.class);
    system.registerGenericUDF("current_groups", GenericUDFCurrentGroups.class);
    system.registerGenericUDF("logged_in_user", GenericUDFLoggedInUser.class);
    system.registerGenericUDF("restrict_information_schema", GenericUDFRestrictInformationSchema.class);
    system.registerGenericUDF("current_authorizer", GenericUDFCurrentAuthorizer.class);

    system.registerGenericUDF("isnull", GenericUDFOPNull.class);
    system.registerGenericUDF("isnotnull", GenericUDFOPNotNull.class);
    system.registerGenericUDF("istrue", GenericUDFOPTrue.class);
    system.registerGenericUDF("isnottrue", GenericUDFOPNotTrue.class);
    system.registerGenericUDF("isfalse", GenericUDFOPFalse.class);
    system.registerGenericUDF("isnotfalse", GenericUDFOPNotFalse.class);

    system.registerGenericUDF("if", GenericUDFIf.class);
    system.registerGenericUDF("in", GenericUDFIn.class);
    system.registerGenericUDF("and", GenericUDFOPAnd.class);
    system.registerGenericUDF("or", GenericUDFOPOr.class);
    system.registerGenericUDF("=", GenericUDFOPEqual.class);
    system.registerGenericUDF("==", GenericUDFOPEqual.class);
    system.registerGenericUDF("<=>", GenericUDFOPEqualNS.class);
    system.registerGenericUDF("!=", GenericUDFOPNotEqual.class);
    system.registerGenericUDF("<>", GenericUDFOPNotEqual.class);
    system.registerGenericUDF("<", GenericUDFOPLessThan.class);
    system.registerGenericUDF("<=", GenericUDFOPEqualOrLessThan.class);
    system.registerGenericUDF(">", GenericUDFOPGreaterThan.class);
    system.registerGenericUDF(">=", GenericUDFOPEqualOrGreaterThan.class);
    system.registerGenericUDF("not", GenericUDFOPNot.class);
    system.registerGenericUDF("!", GenericUDFOPNot.class);
    system.registerGenericUDF("between", GenericUDFBetween.class);
    system.registerGenericUDF("in_bloom_filter", GenericUDFInBloomFilter.class);

    system.registerUDF("version", UDFVersion.class, false);

    system.registerUDF(serdeConstants.BOOLEAN_TYPE_NAME, UDFToBoolean.class, false, UDFToBoolean.class.getSimpleName());
    system.registerUDF(serdeConstants.TINYINT_TYPE_NAME, UDFToByte.class, false, UDFToByte.class.getSimpleName());
    system.registerUDF(serdeConstants.SMALLINT_TYPE_NAME, UDFToShort.class, false, UDFToShort.class.getSimpleName());
    system.registerUDF(serdeConstants.INT_TYPE_NAME, UDFToInteger.class, false, UDFToInteger.class.getSimpleName());
    system.registerUDF(serdeConstants.BIGINT_TYPE_NAME, UDFToLong.class, false, UDFToLong.class.getSimpleName());
    system.registerUDF(serdeConstants.FLOAT_TYPE_NAME, UDFToFloat.class, false, UDFToFloat.class.getSimpleName());
    system.registerUDF(serdeConstants.DOUBLE_TYPE_NAME, UDFToDouble.class, false, UDFToDouble.class.getSimpleName());
    system.registerUDF(serdeConstants.STRING_TYPE_NAME, UDFToString.class, false, UDFToString.class.getSimpleName());

    system.registerUDF(UDFToString.class.getSimpleName(), UDFToString.class, false, UDFToString.class.getSimpleName());
    system.registerUDF(UDFToBoolean.class.getSimpleName(), UDFToBoolean.class, false, UDFToBoolean.class.getSimpleName());
    system.registerUDF(UDFToDouble.class.getSimpleName(), UDFToDouble.class, false, UDFToDouble.class.getSimpleName());
    system.registerUDF(UDFToFloat.class.getSimpleName(), UDFToFloat.class, false, UDFToFloat.class.getSimpleName());
    system.registerUDF(UDFToInteger.class.getSimpleName(), UDFToInteger.class, false, UDFToInteger.class.getSimpleName());
    system.registerUDF(UDFToLong.class.getSimpleName(), UDFToLong.class, false, UDFToLong.class.getSimpleName());
    system.registerUDF(UDFToShort.class.getSimpleName(), UDFToShort.class, false, UDFToShort.class.getSimpleName());
    system.registerUDF(UDFToByte.class.getSimpleName(), UDFToByte.class, false, UDFToByte.class.getSimpleName());

    system.registerGenericUDF(serdeConstants.DATE_TYPE_NAME, GenericUDFToDate.class);
    system.registerGenericUDF(serdeConstants.TIMESTAMP_TYPE_NAME, GenericUDFTimestamp.class);
    system.registerGenericUDF(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME, GenericUDFToTimestampLocalTZ.class);
    system.registerGenericUDF(serdeConstants.INTERVAL_YEAR_MONTH_TYPE_NAME, GenericUDFToIntervalYearMonth.class);
    system.registerGenericUDF(serdeConstants.INTERVAL_DAY_TIME_TYPE_NAME, GenericUDFToIntervalDayTime.class);
    system.registerGenericUDF(serdeConstants.BINARY_TYPE_NAME, GenericUDFToBinary.class);
    system.registerGenericUDF(serdeConstants.DECIMAL_TYPE_NAME, GenericUDFToDecimal.class);
    system.registerGenericUDF(serdeConstants.VARCHAR_TYPE_NAME, GenericUDFToVarchar.class);
    system.registerGenericUDF(serdeConstants.CHAR_TYPE_NAME, GenericUDFToChar.class);

    system.registerGenericUDAF("max", new GenericUDAFMax());
    system.registerGenericUDAF("min", new GenericUDAFMin());

    system.registerGenericUDAF("sum", new GenericUDAFSum());
    system.registerGenericUDAF("$SUM0", new GenericUDAFSumEmptyIsZero());
    system.registerGenericUDAF("count", new GenericUDAFCount());
    system.registerGenericUDAF("avg", new GenericUDAFAverage());
    system.registerGenericUDAF("std", new GenericUDAFStd());
    system.registerGenericUDAF("stddev", new GenericUDAFStd());
    system.registerGenericUDAF("stddev_pop", new GenericUDAFStd());
    system.registerGenericUDAF("stddev_samp", new GenericUDAFStdSample());
    system.registerGenericUDAF("variance", new GenericUDAFVariance());
    system.registerGenericUDAF("var_pop", new GenericUDAFVariance());
    system.registerGenericUDAF("var_samp", new GenericUDAFVarianceSample());
    system.registerGenericUDAF("covar_pop", new GenericUDAFCovariance());
    system.registerGenericUDAF("covar_samp", new GenericUDAFCovarianceSample());
    system.registerGenericUDAF("corr", new GenericUDAFCorrelation());
    system.registerGenericUDAF("regr_slope", new GenericUDAFBinarySetFunctions.RegrSlope());
    system.registerGenericUDAF("regr_intercept", new GenericUDAFBinarySetFunctions.RegrIntercept());
    system.registerGenericUDAF("regr_r2", new GenericUDAFBinarySetFunctions.RegrR2());
    system.registerGenericUDAF("regr_sxx", new GenericUDAFBinarySetFunctions.RegrSXX());
    system.registerGenericUDAF("regr_syy", new GenericUDAFBinarySetFunctions.RegrSYY());
    system.registerGenericUDAF("regr_sxy", new GenericUDAFBinarySetFunctions.RegrSXY());
    system.registerGenericUDAF("regr_avgx", new GenericUDAFBinarySetFunctions.RegrAvgX());
    system.registerGenericUDAF("regr_avgy", new GenericUDAFBinarySetFunctions.RegrAvgY());
    system.registerGenericUDAF("regr_count", new GenericUDAFBinarySetFunctions.RegrCount());

    system.registerGenericUDAF("histogram_numeric", new GenericUDAFHistogramNumeric());
    system.registerGenericUDAF("percentile_approx", new GenericUDAFPercentileApprox());
    system.registerGenericUDAF("collect_set", new GenericUDAFCollectSet());
    system.registerGenericUDAF("collect_list", new GenericUDAFCollectList());

    system.registerGenericUDAF("ngrams", new GenericUDAFnGrams());
    system.registerGenericUDAF("context_ngrams", new GenericUDAFContextNGrams());

    system.registerGenericUDAF("compute_stats", new GenericUDAFComputeStats());
    system.registerGenericUDAF("bloom_filter", new GenericUDAFBloomFilter());
    system.registerUDAF("percentile", UDAFPercentile.class);

    system.registerGenericUDF("reflect", GenericUDFReflect.class);
    system.registerGenericUDF("reflect2", GenericUDFReflect2.class);
    system.registerGenericUDF("java_method", GenericUDFReflect.class);

    system.registerGenericUDF("array", GenericUDFArray.class);
    system.registerGenericUDF("assert_true", GenericUDFAssertTrue.class);
    system.registerGenericUDF("assert_true_oom", GenericUDFAssertTrueOOM.class);
    system.registerGenericUDF("map", GenericUDFMap.class);
    system.registerGenericUDF("struct", GenericUDFStruct.class);
    system.registerGenericUDF("named_struct", GenericUDFNamedStruct.class);
    system.registerGenericUDF("create_union", GenericUDFUnion.class);
    system.registerGenericUDF("extract_union", GenericUDFExtractUnion.class);

    system.registerGenericUDF("case", GenericUDFCase.class);
    system.registerGenericUDF("when", GenericUDFWhen.class);
    system.registerGenericUDF("nullif", GenericUDFNullif.class);
    system.registerGenericUDF("hash", GenericUDFHash.class);
    system.registerGenericUDF("murmur_hash", GenericUDFMurmurHash.class);
    system.registerGenericUDF("coalesce", GenericUDFCoalesce.class);
    system.registerGenericUDF("index", GenericUDFIndex.class);
    system.registerGenericUDF("in_file", GenericUDFInFile.class);
    system.registerGenericUDF("instr", GenericUDFInstr.class);
    system.registerGenericUDF("locate", GenericUDFLocate.class);
    system.registerGenericUDF("elt", GenericUDFElt.class);
    system.registerGenericUDF("concat_ws", GenericUDFConcatWS.class);
    system.registerGenericUDF("sort_array", GenericUDFSortArray.class);
    system.registerGenericUDF("sort_array_by", GenericUDFSortArrayByField.class);
    system.registerGenericUDF("array_contains", GenericUDFArrayContains.class);
    system.registerGenericUDF("sentences", GenericUDFSentences.class);
    system.registerGenericUDF("map_keys", GenericUDFMapKeys.class);
    system.registerGenericUDF("map_values", GenericUDFMapValues.class);
    system.registerGenericUDF("format_number", GenericUDFFormatNumber.class);
    system.registerGenericUDF("printf", GenericUDFPrintf.class);
    system.registerGenericUDF("greatest", GenericUDFGreatest.class);
    system.registerGenericUDF("least", GenericUDFLeast.class);
    system.registerGenericUDF("cardinality_violation", GenericUDFCardinalityViolation.class);
    system.registerGenericUDF("width_bucket", GenericUDFWidthBucket.class);

    system.registerGenericUDF("from_utc_timestamp", GenericUDFFromUtcTimestamp.class);
    system.registerGenericUDF("to_utc_timestamp", GenericUDFToUtcTimestamp.class);

    system.registerGenericUDF("unix_timestamp", GenericUDFUnixTimeStamp.class);
    system.registerGenericUDF("to_unix_timestamp", GenericUDFToUnixTimeStamp.class);

    system.registerGenericUDF("internal_interval", GenericUDFInternalInterval.class);

    system.registerGenericUDF("to_epoch_milli", GenericUDFEpochMilli.class);

    system.registerGenericUDTF("explode", GenericUDTFExplode.class);
    system.registerGenericUDTF("replicate_rows", GenericUDTFReplicateRows.class);
    system.registerGenericUDTF("inline", GenericUDTFInline.class);
    system.registerGenericUDTF("json_tuple", GenericUDTFJSONTuple.class);
    system.registerGenericUDTF("parse_url_tuple", GenericUDTFParseUrlTuple.class);
    system.registerGenericUDTF("posexplode", GenericUDTFPosExplode.class);
    system.registerGenericUDTF("stack", GenericUDTFStack.class);
    system.registerGenericUDTF("get_splits", GenericUDTFGetSplits.class);

    system.registerGenericUDF(LEAD_FUNC_NAME, GenericUDFLead.class);
    system.registerGenericUDF(LAG_FUNC_NAME, GenericUDFLag.class);

    system.registerGenericUDAF("row_number", new GenericUDAFRowNumber());
    system.registerGenericUDAF("rank", new GenericUDAFRank());
    system.registerGenericUDAF("dense_rank", new GenericUDAFDenseRank());
    system.registerGenericUDAF("percent_rank", new GenericUDAFPercentRank());
    system.registerGenericUDAF("cume_dist", new GenericUDAFCumeDist());
    system.registerGenericUDAF("ntile", new GenericUDAFNTile());
    system.registerGenericUDAF("first_value", new GenericUDAFFirstValue());
    system.registerGenericUDAF("last_value", new GenericUDAFLastValue());
    system.registerWindowFunction(LEAD_FUNC_NAME, new GenericUDAFLead());
    system.registerWindowFunction(LAG_FUNC_NAME, new GenericUDAFLag());

    system.registerTableFunction(NOOP_TABLE_FUNCTION, NoopResolver.class);
    system.registerTableFunction(NOOP_MAP_TABLE_FUNCTION, NoopWithMapResolver.class);
    system.registerTableFunction(NOOP_STREAMING_TABLE_FUNCTION, NoopStreamingResolver.class);
    system.registerTableFunction(NOOP_STREAMING_MAP_TABLE_FUNCTION, NoopWithMapStreamingResolver.class);
    system.registerTableFunction(WINDOWING_TABLE_FUNCTION, WindowingTableFunctionResolver.class);
    system.registerTableFunction(MATCH_PATH_TABLE_FUNCTION, MatchPathResolver.class);

    system.registerHiddenBuiltIn(GenericUDFOPDTIMinus.class);
    system.registerHiddenBuiltIn(GenericUDFOPDTIPlus.class);
    system.registerHiddenBuiltIn(GenericUDFOPNumericMinus.class);
    system.registerHiddenBuiltIn(GenericUDFOPNumericPlus.class);

    system.registerGenericUDF(GenericUDFMask.UDF_NAME, GenericUDFMask.class);
    system.registerGenericUDF(GenericUDFMaskFirstN.UDF_NAME, GenericUDFMaskFirstN.class);
    system.registerGenericUDF(GenericUDFMaskLastN.UDF_NAME, GenericUDFMaskLastN.class);
    system.registerGenericUDF(GenericUDFMaskShowFirstN.UDF_NAME, GenericUDFMaskShowFirstN.class);
    system.registerGenericUDF(GenericUDFMaskShowLastN.UDF_NAME, GenericUDFMaskShowLastN.class);
    system.registerGenericUDF(GenericUDFMaskHash.UDF_NAME, GenericUDFMaskHash.class);
  }
}

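Clearly, this is where Hive's system functions get registered. The source shows that several hundred system functions are registered here, and anything registered can of course be used directly.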
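
Why does a static block make the functions usable without any registration step? A toy model may help. The sketch below is not Hive code: ToyUdf, ToyUpper, ToyTrim, and ToyFunctionRegistry are hypothetical names, and lowercasing the function name is my own assumption about how a registry might normalize keys. It only shows the idea that a static initializer fills a name-to-class map as soon as the registry class is loaded.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a UDF base type; not a Hive class. */
interface ToyUdf {
    String evaluate(String input);
}

final class ToyUpper implements ToyUdf {
    public String evaluate(String input) {
        return input.toUpperCase(Locale.ROOT);
    }
}

final class ToyTrim implements ToyUdf {
    public String evaluate(String input) {
        return input.trim();
    }
}

/** Toy registry: maps a function name to the class that implements it. */
final class ToyFunctionRegistry {
    private static final Map<String, Class<? extends ToyUdf>> SYSTEM = new HashMap<>();

    // Runs once when the class is loaded, so callers never register these.
    static {
        register("upper", ToyUpper.class);
        register("trim", ToyTrim.class);
    }

    static void register(String name, Class<? extends ToyUdf> udfClass) {
        // Assumption: names are stored lowercased for case-insensitive lookup.
        SYSTEM.put(name.toLowerCase(Locale.ROOT), udfClass);
    }

    static String call(String name, String arg) {
        Class<? extends ToyUdf> udfClass = SYSTEM.get(name.toLowerCase(Locale.ROOT));
        if (udfClass == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        try {
            // Instantiate the implementation through its no-arg constructor.
            return udfClass.getDeclaredConstructor().newInstance().evaluate(arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + udfClass, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call("UPPER", "hive")); // HIVE
    }
}
```

Adding one more register(...) line to the static block is the toy equivalent of adding a function to the system list.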

The class naturally contains other methods as well:

  public static FunctionInfo registerPermanentFunction(String functionName,
      String className, boolean registerToSession, FunctionResource[] resources) {
    return system.registerPermanentFunction(functionName, className, registerToSession, resources);
  }

  public static void unregisterPermanentFunction(String functionName) throws HiveException {
    system.unregisterFunction(functionName);
    unregisterTemporaryUDF(functionName);
  }

  public static void unregisterPermanentFunctions(String dbName) throws HiveException {
    system.unregisterFunctions(dbName);
  }

  public static FunctionInfo registerTemporaryUDF(
      String functionName, Class<?> udfClass, FunctionResource... resources) {
    return SessionState.getRegistryForWrite().registerFunction(
        functionName, udfClass, resources);
  }

  public static void unregisterTemporaryUDF(String functionName) throws HiveException {
    if (SessionState.getRegistry() != null) {
      SessionState.getRegistry().unregisterFunction(functionName);
    }
  }

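These are the methods invoked under the hood when you run statements like the following from the command line (leave out temporary for a permanent function):

create temporary function udf_name as 'package.ClassName';
desc function extended udf_name;
drop temporary function if exists udf_name;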

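Clearly, system functions are usable out of the box because they are hard-coded in the Java source. In addition, system functions and permanent functions share the same Registry instance, the static field named system.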
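
The difference in scope between the shared system registry and a per-session registry can be sketched with a toy model. ToySession and its methods are hypothetical, and the lookup order (session first, then system) is an assumption on my part rather than something shown in the listing above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Illustration only: a toy session with a shared and a private registry. */
final class ToySession {
    // One map for the whole JVM, like the static system field in the listing.
    private static final Map<String, UnaryOperator<String>> SYSTEM = new HashMap<>();

    static {
        SYSTEM.put("upper", s -> s.toUpperCase());
    }

    // One map per session, standing in for temporary functions.
    private final Map<String, UnaryOperator<String>> temporary = new HashMap<>();

    static void registerPermanent(String name, UnaryOperator<String> fn) {
        SYSTEM.put(name, fn);
    }

    void registerTemporary(String name, UnaryOperator<String> fn) {
        temporary.put(name, fn);
    }

    /** Assumed order: the session's own functions win, then the shared ones. */
    UnaryOperator<String> lookup(String name) {
        UnaryOperator<String> fn = temporary.get(name);
        return fn != null ? fn : SYSTEM.get(name);
    }

    public static void main(String[] args) {
        ToySession a = new ToySession();
        ToySession b = new ToySession();
        a.registerTemporary("shout", s -> s + "!");
        System.out.println(a.lookup("shout") != null); // true
        System.out.println(b.lookup("shout") != null); // false
    }
}
```

In this model a temporary function disappears with its session, while anything placed in the shared map is visible to every session.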

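At this point we have found the system functions that Hive registers automatically, and pinned down the methods to call in order to register a system function: registerUDF, registerGenericUDF, registerGenericUDTF, registerTableFunction, registerHiddenBuiltIn, and so on, all on the Registry class.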

Registering a System Function

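Even without understanding every detail, it is now possible to infer with reasonable confidence how to register your own UDF as a system function so that it can be used directly. The method is simple: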

Near the code shown above, follow the pattern of calls like these:

system.registerGenericUDF("rpad", GenericUDFRpad.class);
system.registerGenericUDAF("collect_set", new GenericUDAFCollectSet());

and write a few similar lines of your own:

system.registerGenericUDF("your_hql_function_name", YourUdfClass.class);

That registers your own function as a system function, so there is no more tedious manual registration each time you use it.

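After changing the source here, it has to be recompiled. Note that your UDF class must also be available on the classpath when FunctionRegistry is compiled and loaded, for example by adding it to the hive-exec module. Clever Java folks surely know the shortcut: ignore compile errors elsewhere and simply replace the corresponding .class files inside the hive-exec Jar with the freshly compiled ones. Whether you use mvn clean install -DskipTests or some other dark magic is up to you.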

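By now, SQL Boys should have a clear picture of which functions Hive ships with, and should no longer be baffled by the curious fact that some Oracle functions do not work in Hive. Important things are said three times: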

Hive is not a database!!!

Hive is not a database!!!

Hive is not a database!!!

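For people doing secondary development on platforms and components, it is now also clear how to add to Hive's system function library and make the platform more powerful.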

Please credit the source when reposting: https://lizhiyong.blog.csdn.net/article/details/127501392

Original: https://blog.csdn.net/qq_41990268/article/details/127501392
Author: 虎鲸不是鱼
Title: Hive 3.1.2 Built-in System Functions and Registering UDFs Automatically with the System
