2019-09-13 12:45:41,582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-09-13 12:45:41,583 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-09-13 12:45:42,737 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2019-09-13 12:45:44,421 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2019-09-13 12:45:44,794 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2019-09-13 12:45:44,795 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-09-13 12:45:44,795 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2019-09-13 12:45:47,337 [DataStreamer for file /tmp/temp-1827035392/tmp-278395215/libthrift-0.9.3.jar] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
2019-09-13 12:45:50,760 [DataStreamer for file /tmp/temp-1827035392/tmp-1899972092/jdo-api-3.0.1.jar] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
2019-09-13 12:45:51,015 [DataStreamer for file /tmp/temp-1827035392/tmp-837313168/hive-hbase-handler-1.1.0-cdh5.13.0.jar] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
2019-09-13 12:45:51,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job480021334576561857.jar
2019-09-13 12:46:06,574 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job480021334576561857.jar created
2019-09-13 12:46:06,575 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2019-09-13 12:46:06,740 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-09-13 12:46:06,839 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-09-13 12:46:06,840 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2019-09-13 12:46:06,841 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-09-13 12:46:07,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-09-13 12:46:07,239 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2019-09-13 12:46:07,245 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 12:46:07,437 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2019-09-13 12:46:07,679 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 12:46:12,221 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-09-13 12:46:12,221 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-09-13 12:46:12,447 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-09-13 12:46:12,755 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-09-13 12:46:14,582 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1568369247673_0002
2019-09-13 12:46:16,513 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1568369247673_0002
2019-09-13 12:46:16,896 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1568369247673_0002/
2019-09-13 12:46:16,901 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1568369247673_0002
2019-09-13 12:46:16,901 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2019-09-13 12:46:16,901 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4] C: R:
2019-09-13 12:46:16,902 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1568369247673_0002
2019-09-13 12:46:17,256 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-09-13 12:50:22,702 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-09-13 12:50:28,657 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2019-09-13 12:50:28,994 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-09-13 12:50:29,018 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0-cdh5.13.0 0.12.0-cdh5.13.0 cloudera 2019-09-13 12:45:44 2019-09-13 12:50:28 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1568369247673_0002 1 0 166 166 166 166 n/a n/a n/a n/a A,B MAP_ONLY hdfs://quickstart.cloudera:8020/user/cloudera/userinfo.out,
Input(s):
Successfully read 52 records (2975 bytes) from: "/user/cloudera/passwd"
Output(s):
Successfully stored 52 records (1480 bytes) in: "hdfs://quickstart.cloudera:8020/user/cloudera/userinfo.out"
Counters:
Total records written : 52
Total bytes written : 1480
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1568369247673_0002
2019-09-13 12:50:29,436 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
The presence of the new data in HDFS can be confirmed:
hdfs dfs -ls /user/cloudera
Found 2 items
-rw-r--r-- 1 cloudera cloudera 2604 2019-09-13 03:51 /user/cloudera/passwd
drwxr-xr-x - cloudera cloudera 0 2019-09-13 12:50 /user/cloudera/userinfo.out
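The Pig Latin script itself is not captured in the log. Judging from the entries above (aliases A and B defined on script lines 1 and 2, and a map-only job reading /user/cloudera/passwd and storing userinfo.out), a minimal script consistent with this run could have looked as follows; the schema and the projected fields are assumptions, not taken from the original session:
grunt> A = LOAD '/user/cloudera/passwd' USING PigStorage(':') AS (uname:chararray, pswd:chararray, uid:int, gid:int, fullname:chararray, hdir:chararray, shell:chararray);
grunt> B = FOREACH A GENERATE uname, fullname, hdir;
grunt> STORE B INTO 'userinfo.out';
The stored records can also be inspected directly; the part file name below assumes the default naming for map-only output:
[cloudera@quickstart ]$ hdfs dfs -cat /user/cloudera/userinfo.out/part-m-00000 | head -5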
Next, work was carried out with Hive.
First, the passwd file was copied into HDFS:
[cloudera@quickstart ]$ hdfs dfs -put /etc/passwd /tmp/
[cloudera@quickstart ]$ hdfs dfs -ls /tmp/
Found 6 items
drwxrwxrwt - mapred mapred 0 2017-10-23 09:15 /tmp/hadoop-yarn
drwx-wx-wx - hive supergroup 0 2019-09-10 22:51 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2017-10-23 09:17 /tmp/logs
-rw-r--r-- 1 cloudera supergroup 2604 2019-09-13 17:04 /tmp/passwd
drwxr-xr-x - cloudera supergroup 0 2019-09-13 12:45 /tmp/temp-1827035392
drwxr-xr-x - cloudera supergroup 0 2019-09-13 04:16 /tmp/temp-1957354021
Launching beeline for interactive access:
[cloudera@quickstart ]$ beeline -u jdbc:hive2://
scan complete in 6ms
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.1.0-cdh5.13.0)
Driver: Hive JDBC (version 1.1.0-cdh5.13.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.13.0 by Apache Hive
Creating the userinfo table and loading the passwd file from HDFS into it:
0: jdbc:hive2://> CREATE TABLE userinfo ( uname STRING, pswd STRING, uid INT, gid INT, fullname STRING, hdir STRING, shell STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ':' STORED AS TEXTFILE;
OK
No rows affected (5.604 seconds)
0: jdbc:hive2://>
0: jdbc:hive2://> LOAD DATA INPATH '/tmp/passwd' OVERWRITE INTO TABLE
. . . . . . . . > userinfo;
Loading data to table default.userinfo
Table default.userinfo stats: [numFiles=1, numRows=0, totalSize=2604, rawDataSize=0]
OK
No rows affected (4.053 seconds)
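Before querying, the table definition and the loaded row count can be sanity-checked; these statements were not part of the original session, and their output is omitted:
0: jdbc:hive2://> DESCRIBE userinfo;
0: jdbc:hive2://> SELECT COUNT(*) FROM userinfo;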
Querying data from the userinfo table:
0: jdbc:hive2://> SELECT uname, fullname, hdir FROM userinfo ORDER BY uname ;
Query ID = cloudera_20190913172727_4c682080-5fbb-4523-99e7-99469081cdb8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
19/09/13 17:27:41 [HiveServer2-Background-Pool: Thread-36]: WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/09/13 17:27:45 [DataStreamer for file /tmp/hadoop-yarn/staging/cloudera/.staging/job_1568369247673_0003/job.split]: WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
19/09/13 17:27:45 [DataStreamer for file /tmp/hadoop-yarn/staging/cloudera/.staging/job_1568369247673_0003/job.splitmetainfo]: WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
Starting Job = job_1568369247673_0003, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1568369247673_0003/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1568369247673_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
19/09/13 17:37:51 [HiveServer2-Background-Pool: Thread-36]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2019-09-13 17:37:51,563 Stage-1 map = 0%, reduce = 0%
19/09/13 17:38:52 [HiveServer2-Background-Pool: Thread-36]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2019-09-13 17:38:52,000 Stage-1 map = 0%, reduce = 0%
19/09/13 17:39:58 [HiveServer2-Background-Pool: Thread-36]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2019-09-13 17:39:58,764 Stage-1 map = 0%, reduce = 0%
2019-09-13 17:40:05,804 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.29 sec
2019-09-13 17:40:51,011 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12.21 sec
MapReduce Total cumulative CPU time: 12 seconds 210 msec
Ended Job = job_1568369247673_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 12.21 sec HDFS Read: 9920 HDFS Write: 1480 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 210 msec
OK
+----------------+-------------------------------+-------------------------------+--+
| uname | fullname | hdir |
+----------------+-------------------------------+-------------------------------+--+
| abrt | | /etc/abrt |
| adm | adm | /var/adm |
| apache | Apache | /var/www |
| avahi-autoipd | Avahi IPv4LL Stack | /var/lib/avahi-autoipd |
| bin | bin | /bin |
| cloudera | | /home/cloudera |
| cloudera-scm | Cloudera Manager | /var/lib/cloudera-scm-server |
| daemon | daemon | /sbin |
| dbus | System message bus | / |
| flume | Flume | /var/lib/flume-ng |
| ftp | FTP User | /var/ftp |
| games | games | /usr/games |
| gdm | | /var/lib/gdm |
| gopher | gopher | /var/gopher |
| haldaemon | HAL daemon | / |
| halt | halt | /sbin |
| hbase | HBase | /var/lib/hbase |
| hdfs | Hadoop HDFS | /var/lib/hadoop-hdfs |
| hive | Hive | /var/lib/hive |
| httpfs | Hadoop HTTPFS | /var/lib/hadoop-httpfs |
| hue | Hue | /usr/lib/hue |
| impala | Impala | /var/lib/impala |
| kms | Hadoop KMS | /var/lib/hadoop-kms |
| llama | Llama | /var/lib/llama |
| lp | lp | /var/spool/lpd |
| mail | mail | /var/spool/mail |
| mapred | Hadoop MapReduce | /var/lib/hadoop-mapreduce |
| mysql | MySQL Server | /var/lib/mysql |
| nobody | Nobody | / |
| ntp | | /etc/ntp |
| oozie | Oozie User | /var/lib/oozie |
| operator | operator | /root |
| postfix | | /var/spool/postfix |
| pulse | PulseAudio System Daemon | /var/run/pulse |
| root | root | /root |
| rpc | Rpcbind Daemon | /var/cache/rpcbind |
| rtkit | RealtimeKit | /proc |
| saslauth | Saslauthd user | /var/empty/saslauth |
| sentry | Sentry | /var/lib/sentry |
| shutdown | shutdown | /sbin |
| solr | Solr | /var/lib/solr |
| spark | Spark | /var/lib/spark |
| sqoop | Sqoop | /var/lib/sqoop |
| sqoop2 | Sqoop 2 User | /var/lib/sqoop2 |
| sshd | Privilege-separated SSH | /var/empty/sshd |
| sync | sync | /sbin |
| tcpdump | | / |
| uucp | uucp | /var/spool/uucp |
| vboxadd | | /var/run/vboxadd |
| vcsa | virtual console memory owner | /dev |
| yarn | Hadoop Yarn | /var/lib/hadoop-yarn |
| zookeeper | ZooKeeper | /var/lib/zookeeper |
+----------------+-------------------------------+-------------------------------+--+
52 rows selected (806.618 seconds)
0: jdbc:hive2://>
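The query above took over 13 minutes on the single-node VM because ORDER BY forces a full MapReduce job with a single reducer. When the result needs to be saved rather than browsed, beeline can also run the query non-interactively and write delimited output; this is a sketch, and the output file name is illustrative:
[cloudera@quickstart ]$ beeline -u jdbc:hive2:// --outputformat=csv2 -e "SELECT uname, fullname, hdir FROM userinfo ORDER BY uname;" > userinfo.csv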
Next, work was carried out with HBase.
Launching the HBase shell:
[cloudera@quickstart ]$ hbase shell
2019-09-13 20:12:05,672 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 1.2.0-cdh5.13.0, rUnknown, Wed Oct 4 11:16:18 PDT 2017
Creating a table and adding records:
hbase(main):002:0* create 'usertableinfo', {NAME=>'username'}, {NAME=>'fullname'}, {NAME=>'homedir'}
0 row(s) in 4.4440 seconds
=> Hbase::Table - usertableinfo
hbase(main):012:0* put 'usertableinfo', 'r1', 'username', 'vcsa'
0 row(s) in 1.6340 seconds
hbase(main):013:0> put 'usertableinfo', 'r2', 'username', 'p'
0 row(s) in 0.0660 seconds
hbase(main):022:0* put 'usertableinfo', 'r1', 'fullname', 'Virtual Machine Admin'
0 row(s) in 0.1040 seconds
hbase(main):031:0* put 'usertableinfo', 'r2', 'fullname', 'Python user'
0 row(s) in 0.0590 seconds
Viewing the contents of the table:
hbase(main):001:0> scan 'usertableinfo'
ROW COLUMN+CELL
r1 column=fullname:, timestamp=1568430958127, value=Virtual Machine Admin
r1 column=username:, timestamp=1568430949476, value=vcsa
r2 column=fullname:, timestamp=1568430963395, value=Python user
r2 column=username:, timestamp=1568430957349, value=p
2 row(s) in 2.7890 seconds
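Note that each put above names only a column family ('username', 'fullname') with no qualifier, so the values land in the empty qualifier of each family, which is why the scan shows column=username:. A more typical form adds an explicit qualifier; the qualifier name below is illustrative and was not used in the original session:
hbase(main):002:0> put 'usertableinfo', 'r1', 'username:login', 'vcsa'
hbase(main):003:0> get 'usertableinfo', 'r1'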
In accordance with the assigned variant, a categories table was created:
hbase(main):001:0> create 'categories', {NAME=>'name'}, {NAME=>'department'}
0 row(s) in 4.6420 seconds
=> Hbase::Table - categories
hbase(main):002:0> put 'categories', 'r1', 'name', 'a'
0 row(s) in 0.8040 seconds
hbase(main):005:0* put 'categories', 'r2', 'name', 'b'
0 row(s) in 0.0180 seconds
hbase(main):006:0> put 'categories', 'r3', 'name', 'b'
0 row(s) in 0.0230 seconds
hbase(main):007:0> put 'categories', 'r1', 'department', 1
0 row(s) in 0.1780 seconds
hbase(main):014:0* put 'categories', 'r2', 'department', 1
0 row(s) in 0.0740 seconds
hbase(main):018:0* put 'categories', 'r3', 'department', 2
0 row(s) in 0.0410 seconds
hbase(main):019:0> scan 'categories'
ROW COLUMN+CELL
r1 column=department:, timestamp=1568683197980, value=1
r1 column=name:, timestamp=1568683197100, value=a
r2 column=department:, timestamp=1568683199019, value=1
r2 column=name:, timestamp=1568683197527, value=b
r3 column=department:, timestamp=1568683199541, value=2
r3 column=name:, timestamp=1568683197821, value=b
3 row(s) in 0.4220 seconds
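For larger tables a full scan can be narrowed server-side. As a sketch using standard HBase shell filter syntax (this command was not part of the original session), a ValueFilter could select only the rows whose department value is 1:
hbase(main):020:0> scan 'categories', {COLUMNS => 'department', FILTER => "ValueFilter(=, 'binary:1')"}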
When working with the HBase shell, attempts to create or otherwise interact with tables occasionally failed with a java.net.SocketTimeoutException; restarting the VM resolved the problem.
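Instead of rebooting the entire VM, it may be enough to restart just the HBase daemons; the service names below assume a standard CDH 5 package installation rather than a Cloudera Manager deployment:
[cloudera@quickstart ]$ sudo service hbase-master restart
[cloudera@quickstart ]$ sudo service hbase-regionserver restart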
CONCLUSIONS
As a result of this laboratory work, it was established that Hadoop provides many utilities (Pig, Hive, HBase) capable of interacting with tabular data.