
pyspark 2.0 throws AlreadyExistsException (message: Database default already exists) when interacting with Hive

I just upgraded from 1.3.1 to version 2.0.0 and wrote a simple piece of code to interact with Hive (1.2.1) through Spark SQL. I put hive-site.xml into Spark's conf directory, and I get the expected result from the SQL query, BUT it throws a strange AlreadyExistsException (message: Database default already exists). How do I get rid of this?

Code:

from pyspark.sql import SparkSession

# Local session with Hive support enabled, so that hive-site.xml in
# Spark's conf directory is picked up.
ss = SparkSession.builder.appName("test").master("local") \
    .config("spark.ui.port", "4041") \
    .enableHiveSupport() \
    .getOrCreate()
ss.sparkContext.setLogLevel("INFO")
ss.sql("show tables").show()

Log:

Setting default log level to "WARN". 
To adjust logging level use sc.setLogLevel(newLevel). 
16/08/08 19:41:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/08/08 19:41:24 INFO execution.SparkSqlParser: Parsing command: show tables 
16/08/08 19:41:25 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes. 
16/08/08 19:41:26 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 
16/08/08 19:41:26 INFO metastore.ObjectStore: ObjectStore, initialize called 
16/08/08 19:41:26 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 
16/08/08 19:41:26 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored 
16/08/08 19:41:26 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 
16/08/08 19:41:27 INFO DataNucleus.Query: Reading in results for query "[email protected]" since the connection used is closing 
16/08/08 19:41:27 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL 
16/08/08 19:41:27 INFO metastore.ObjectStore: Initialized ObjectStore 
16/08/08 19:41:27 INFO metastore.HiveMetaStore: Added admin role in metastore 
16/08/08 19:41:27 INFO metastore.HiveMetaStore: Added public role in metastore 
16/08/08 19:41:27 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty 
16/08/08 19:41:27 INFO metastore.HiveMetaStore: 0: get_all_databases 
16/08/08 19:41:27 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_all_databases 
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=* 
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_functions: db=default pat=* 
16/08/08 19:41:28 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table. 
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/3fbc3578-fdeb-40a9-8469-7c851cb3733c_resources 
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c 
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c 
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c/_tmp_space.db 
16/08/08 19:41:28 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse 
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/8eaa63ec-9710-499f-bd50-6625bf4459f5_resources 
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5 
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5 
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5/_tmp_space.db 
16/08/08 19:41:28 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse 
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: create_database: Database(name:default, description:default database, locationUri:hdfs://localhost:9900/user/hive/warehouse, parameters:{}) 
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=create_database: Database(name:default, description:default database, locationUri:hdfs://localhost:9900/user/hive/warehouse, parameters:{}) 
16/08/08 19:41:28 ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists) 
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:497) 
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) 
    at com.sun.proxy.$Proxy22.create_database(Unknown Source) 
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:644) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:497) 
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156) 
    at com.sun.proxy.$Proxy23.createDatabase(Unknown Source) 
    at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:306) 
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:291) 
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:291) 
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:291) 
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:262) 
    at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:209) 
    at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:208) 
    at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:251) 
    at org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:290) 
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:99) 
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99) 
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99) 
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72) 
    at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:98) 
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147) 
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89) 
    at org.apache.spark.sql.hive.HiveSessionCatalog.<init>(HiveSessionCatalog.scala:51) 
    at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:49) 
    at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48) 
    at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63) 
    at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) 
    at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) 
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) 
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) 
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:497) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
    at py4j.Gateway.invoke(Gateway.java:280) 
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128) 
    at py4j.commands.CallCommand.execute(CallCommand.java:79) 
    at py4j.GatewayConnection.run(GatewayConnection.java:211) 
    at java.lang.Thread.run(Thread.java:745) 

16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_database: default 
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_database: default 
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_database: default 
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_database: default 
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=* 
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_tables: db=default pat=*  
16/08/08 19:41:28 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:-2 
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:-2) with 1 output partitions 
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:-2) 
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Parents of final stage: List() 
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Missing parents: List() 
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:-2), which has no missing parents 
16/08/08 19:41:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.9 KB, free 366.3 MB) 
16/08/08 19:41:29 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.4 KB, free 366.3 MB) 
16/08/08 19:41:29 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.68.80.25:58224 (size: 2.4 KB, free: 366.3 MB) 
16/08/08 19:41:29 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012 
16/08/08 19:41:29 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:-2) 
16/08/08 19:41:29 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks 
16/08/08 19:41:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 5827 bytes) 
16/08/08 19:41:29 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0) 
16/08/08 19:41:29 INFO codegen.CodeGenerator: Code generated in 152.42807 ms 
16/08/08 19:41:29 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1279 bytes result sent to driver 
16/08/08 19:41:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 275 ms on localhost (1/1) 
16/08/08 19:41:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/08/08 19:41:29 INFO scheduler.DAGScheduler: ResultStage 0 (showString at NativeMethodAccessorImpl.java:-2) finished in 0.288 s 
16/08/08 19:41:29 INFO scheduler.DAGScheduler: Job 0 finished: showString at NativeMethodAccessorImpl.java:-2, took 0.538913 s 
16/08/08 19:41:29 INFO codegen.CodeGenerator: Code generated in 13.588415 ms 
+-------------------+-----------+ 
|   tableName|isTemporary| 
+-------------------+-----------+ 
|  app_visit_log|  false| 
|  cms_article|  false| 
|     p4|  false| 
|    p_bak|  false| 
+-------------------+-----------+ 

16/08/08 19:41:29 INFO spark.SparkContext: Invoking stop() from shutdown hook 

PS: everything works fine when I test the same thing in Java.

Any help would be appreciated.

Answers


As the logs show, this message does not mean anything bad happened; it is just Spark checking whether the default database already exists. In fact, this exception would not be logged if the default database did not exist yet.
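If the goal is simply to keep that stack trace out of the output, one possible workaround (a sketch, assuming the log4j 1.x backend Spark 2.0 ships with, and reaching through pyspark's private _jvm py4j gateway) is to raise the level of the logger that emits it before the first query triggers catalog initialization:

from pyspark.sql import SparkSession

ss = SparkSession.builder.appName("test").master("local") \
    .enableHiveSupport() \
    .getOrCreate()

# The Hive catalog is initialized lazily on the first ss.sql() call,
# so silence the offending metastore logger before running any query.
log4j = ss.sparkContext._jvm.org.apache.log4j
log4j.LogManager.getLogger(
    "org.apache.hadoop.hive.metastore.RetryingHMSHandler"
).setLevel(log4j.Level.FATAL)

ss.sql("show tables").show()  # no AlreadyExistsException trace expected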


This misses the question, which is how to prevent the warning. – aaronsteers


Assuming you have no existing Hive warehouse or content, try putting the following setting in the spark-defaults.conf file and restarting the shell.

spark.sql.warehouse.dir=file:///usr/lib/spark/..... (spark install dir) 
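Equivalently, the same property can be set on the builder before the session is created. A minimal sketch; spark.sql.warehouse.dir replaced hive.metastore.warehouse.dir in Spark 2.0, and the local path below is a placeholder, not a recommendation:

from pyspark.sql import SparkSession

# Point the Spark 2.0 warehouse at an explicit location before the
# session (and hence the catalog) is created. Placeholder path.
ss = SparkSession.builder.appName("test").master("local") \
    .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse") \
    .enableHiveSupport() \
    .getOrCreate()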