Report on Laboratory Work No. 1 in the course "Big Data and Cloud Technologies"




2019-09-13 12:45:41,582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1

2019-09-13 12:45:41,583 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1

2019-09-13 12:45:42,737 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032

2019-09-13 12:45:44,421 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job

2019-09-13 12:45:44,794 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent

2019-09-13 12:45:44,795 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2019-09-13 12:45:44,795 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress

2019-09-13 12:45:47,337 [DataStreamer for file /tmp/temp-1827035392/tmp-278395215/libthrift-0.9.3.jar] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

2019-09-13 12:45:50,760 [DataStreamer for file /tmp/temp-1827035392/tmp-1899972092/jdo-api-3.0.1.jar] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

2019-09-13 12:45:51,015 [DataStreamer for file /tmp/temp-1827035392/tmp-837313168/hive-hbase-handler-1.1.0-cdh5.13.0.jar] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

2019-09-13 12:45:51,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job480021334576561857.jar

2019-09-13 12:46:06,574 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job480021334576561857.jar created

2019-09-13 12:46:06,575 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar

2019-09-13 12:46:06,740 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2019-09-13 12:46:06,839 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.

2019-09-13 12:46:06,840 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache

2019-09-13 12:46:06,841 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []

2019-09-13 12:46:07,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.

2019-09-13 12:46:07,239 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address

2019-09-13 12:46:07,245 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 12:46:07,437 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032

2019-09-13 12:46:07,679 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 12:46:12,221 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2019-09-13 12:46:12,221 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

2019-09-13 12:46:12,447 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1

2019-09-13 12:46:12,755 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1

2019-09-13 12:46:14,582 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1568369247673_0002

2019-09-13 12:46:16,513 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1568369247673_0002

2019-09-13 12:46:16,896 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1568369247673_0002/

2019-09-13 12:46:16,901 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1568369247673_0002

2019-09-13 12:46:16,901 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B

2019-09-13 12:46:16,901 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4] C: R:

2019-09-13 12:46:16,902 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1568369247673_0002

2019-09-13 12:46:17,256 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

2019-09-13 12:50:22,702 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete

2019-09-13 12:50:28,657 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces

2019-09-13 12:50:28,994 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete

2019-09-13 12:50:29,018 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features

2.6.0-cdh5.13.0 0.12.0-cdh5.13.0 cloudera 2019-09-13 12:45:44 2019-09-13 12:50:28 UNKNOWN
Success!
Job Stats (time in seconds):

JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs

job_1568369247673_0002 1 0 166 166 166 166 n/a n/a n/a n/a A,B MAP_ONLY hdfs://quickstart.cloudera:8020/user/cloudera/userinfo.out,
Input(s):

Successfully read 52 records (2975 bytes) from: "/user/cloudera/passwd"
Output(s):

Successfully stored 52 records (1480 bytes) in: "hdfs://quickstart.cloudera:8020/user/cloudera/userinfo.out"
Counters:

Total records written : 52

Total bytes written : 1480

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0
Job DAG:

job_1568369247673_0002

2019-09-13 12:50:29,436 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
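For reference, a minimal Pig Latin script consistent with this log (input /user/cloudera/passwd split on ':', aliases A and B, a map-only job storing to userinfo.out) would look roughly like the sketch below; the exact fields projected are an assumption, inferred from the three columns retrieved later in Hive:

A = load '/user/cloudera/passwd' using PigStorage(':');
B = foreach A generate $0, $4, $5;
store B into 'userinfo.out';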
The presence of the new data in HDFS can be confirmed:

hdfs dfs -ls /user/cloudera

Found 2 items

-rw-r--r-- 1 cloudera cloudera 2604 2019-09-13 03:51 /user/cloudera/passwd

drwxr-xr-x - cloudera cloudera 0 2019-09-13 12:50 /user/cloudera/userinfo.out
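The stored records themselves can also be inspected; the part-file name below is an assumption (map-only MapReduce jobs write output files named part-m-00000 by default):

hdfs dfs -cat /user/cloudera/userinfo.out/part-m-00000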

Next, Hive was used.

First, the /etc/passwd file was copied into HDFS:

[cloudera@quickstart ]$ hdfs dfs -put /etc/passwd /tmp/
[cloudera@quickstart ]$ hdfs dfs -ls /tmp/

Found 6 items

drwxrwxrwt - mapred mapred 0 2017-10-23 09:15 /tmp/hadoop-yarn

drwx-wx-wx - hive supergroup 0 2019-09-10 22:51 /tmp/hive

drwxrwxrwt - mapred hadoop 0 2017-10-23 09:17 /tmp/logs

-rw-r--r-- 1 cloudera supergroup 2604 2019-09-13 17:04 /tmp/passwd

drwxr-xr-x - cloudera supergroup 0 2019-09-13 12:45 /tmp/temp-1827035392

drwxr-xr-x - cloudera supergroup 0 2019-09-13 04:16 /tmp/temp-1957354021
Starting beeline for interactive access:

[cloudera@quickstart ]$ beeline -u jdbc:hive2://

scan complete in 6ms

Connecting to jdbc:hive2://

Connected to: Apache Hive (version 1.1.0-cdh5.13.0)

Driver: Hive JDBC (version 1.1.0-cdh5.13.0)

Transaction isolation: TRANSACTION_REPEATABLE_READ

Beeline version 1.1.0-cdh5.13.0 by Apache Hive

0: jdbc:hive2://>
Creating the userinfo table and loading the passwd file from HDFS into it:

0: jdbc:hive2://> CREATE TABLE userinfo ( uname STRING, pswd STRING, uid INT, gid INT, fullname STRING, hdir STRING, shell STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ':' STORED AS TEXTFILE;

OK

No rows affected (5.604 seconds)

0: jdbc:hive2://>

0: jdbc:hive2://> LOAD DATA INPATH '/tmp/passwd' OVERWRITE INTO TABLE

. . . . . . . . > userinfo;

Loading data to table default.userinfo

Table default.userinfo stats: [numFiles=1, numRows=0, totalSize=2604, rawDataSize=0]

OK

No rows affected (4.053 seconds)

0: jdbc:hive2://>
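As a quick sanity check (not part of the original session), the table definition and row count could be verified with standard HiveQL:

DESCRIBE userinfo;
SELECT COUNT(*) FROM userinfo;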
Displaying data from the userinfo table:

0: jdbc:hive2://> SELECT uname, fullname, hdir FROM userinfo ORDER BY uname ;

Query ID = cloudera_20190913172727_4c682080-5fbb-4523-99e7-99469081cdb8

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=

In order to set a constant number of reducers:

set mapreduce.job.reduces=

19/09/13 17:27:41 [HiveServer2-Background-Pool: Thread-36]: WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.

19/09/13 17:27:45 [DataStreamer for file /tmp/hadoop-yarn/staging/cloudera/.staging/job_1568369247673_0003/job.split]: WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/13 17:27:45 [DataStreamer for file /tmp/hadoop-yarn/staging/cloudera/.staging/job_1568369247673_0003/job.splitmetainfo]: WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

Starting Job = job_1568369247673_0003, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1568369247673_0003/

Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1568369247673_0003

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

19/09/13 17:37:51 [HiveServer2-Background-Pool: Thread-36]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead

2019-09-13 17:37:51,563 Stage-1 map = 0%, reduce = 0%

19/09/13 17:38:52 [HiveServer2-Background-Pool: Thread-36]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead

2019-09-13 17:38:52,000 Stage-1 map = 0%, reduce = 0%

19/09/13 17:39:58 [HiveServer2-Background-Pool: Thread-36]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead

2019-09-13 17:39:58,764 Stage-1 map = 0%, reduce = 0%

2019-09-13 17:40:05,804 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.29 sec

2019-09-13 17:40:51,011 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12.21 sec

MapReduce Total cumulative CPU time: 12 seconds 210 msec

Ended Job = job_1568369247673_0003

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 12.21 sec HDFS Read: 9920 HDFS Write: 1480 SUCCESS

Total MapReduce CPU Time Spent: 12 seconds 210 msec

OK

+----------------+-------------------------------+-------------------------------+--+

| uname | fullname | hdir |

+----------------+-------------------------------+-------------------------------+--+

| abrt | | /etc/abrt |

| adm | adm | /var/adm |

| apache | Apache | /var/www |

| avahi-autoipd | Avahi IPv4LL Stack | /var/lib/avahi-autoipd |

| bin | bin | /bin |

| cloudera | | /home/cloudera |

| cloudera-scm | Cloudera Manager | /var/lib/cloudera-scm-server |

| daemon | daemon | /sbin |

| dbus | System message bus | / |

| flume | Flume | /var/lib/flume-ng |

| ftp | FTP User | /var/ftp |

| games | games | /usr/games |

| gdm | | /var/lib/gdm |

| gopher | gopher | /var/gopher |

| haldaemon | HAL daemon | / |

| halt | halt | /sbin |

| hbase | HBase | /var/lib/hbase |

| hdfs | Hadoop HDFS | /var/lib/hadoop-hdfs |

| hive | Hive | /var/lib/hive |

| httpfs | Hadoop HTTPFS | /var/lib/hadoop-httpfs |

| hue | Hue | /usr/lib/hue |

| impala | Impala | /var/lib/impala |

| kms | Hadoop KMS | /var/lib/hadoop-kms |

| llama | Llama | /var/lib/llama |

| lp | lp | /var/spool/lpd |

| mail | mail | /var/spool/mail |

| mapred | Hadoop MapReduce | /var/lib/hadoop-mapreduce |

| mysql | MySQL Server | /var/lib/mysql |

| nobody | Nobody | / |

| ntp | | /etc/ntp |

| oozie | Oozie User | /var/lib/oozie |

| operator | operator | /root |

| postfix | | /var/spool/postfix |

| pulse | PulseAudio System Daemon | /var/run/pulse |

| root | root | /root |

| rpc | Rpcbind Daemon | /var/cache/rpcbind |

| rtkit | RealtimeKit | /proc |

| saslauth | Saslauthd user | /var/empty/saslauth |

| sentry | Sentry | /var/lib/sentry |

| shutdown | shutdown | /sbin |

| solr | Solr | /var/lib/solr |

| spark | Spark | /var/lib/spark |

| sqoop | Sqoop | /var/lib/sqoop |

| sqoop2 | Sqoop 2 User | /var/lib/sqoop2 |

| sshd | Privilege-separated SSH | /var/empty/sshd |

| sync | sync | /sbin |

| tcpdump | | / |

| uucp | uucp | /var/spool/uucp |

| vboxadd | | /var/run/vboxadd |

| vcsa | virtual console memory owner | /dev |

| yarn | Hadoop Yarn | /var/lib/hadoop-yarn |

| zookeeper | ZooKeeper | /var/lib/zookeeper |

+----------------+-------------------------------+-------------------------------+--+

52 rows selected (806.618 seconds)

0: jdbc:hive2://>

Next, HBase was used.

Starting the HBase shell:

[cloudera@quickstart ]$ hbase shell

2019-09-13 20:12:05,672 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help' for list of supported commands.

Type "exit" to leave the HBase Shell

Version 1.2.0-cdh5.13.0, rUnknown, Wed Oct 4 11:16:18 PDT 2017
hbase(main):001:0>
Creating a table and adding records:

hbase(main):002:0* create 'usertableinfo', {NAME=>'username'}, {NAME=>'fullname'}, {NAME=>'homedir'}

0 row(s) in 4.4440 seconds
=> Hbase::Table - usertableinfo

hbase(main):012:0* put 'usertableinfo', 'r1', 'username', 'vcsa'

0 row(s) in 1.6340 seconds
hbase(main):013:0> put 'usertableinfo', 'r2', 'username', 'p'

0 row(s) in 0.0660 seconds
hbase(main):022:0* put 'usertableinfo', 'r1', 'fullname', 'Virtual Machine Admin'

0 row(s) in 0.1040 seconds
hbase(main):031:0* put 'usertableinfo', 'r2', 'fullname', 'Python user'

0 row(s) in 0.0590 seconds
Viewing the table contents:

hbase(main):001:0> scan 'usertableinfo'

ROW COLUMN+CELL

r1 column=fullname:, timestamp=1568430958127, value=Virtual Machine Admin

r1 column=username:, timestamp=1568430949476, value=vcsa

r2 column=fullname:, timestamp=1568430963395, value=Python user

r2 column=username:, timestamp=1568430957349, value=p

2 row(s) in 2.7890 seconds
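A single row can also be fetched without a full scan; for example (a hypothetical command, not part of the original session):

get 'usertableinfo', 'r1'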

According to the assignment variant, the categories table was created:

hbase(main):001:0> create 'categories', {NAME=>'name'}, {NAME=>'department'}

0 row(s) in 4.6420 seconds
=> Hbase::Table - categories

hbase(main):002:0> put 'categories', 'r1', 'name', 'a'

0 row(s) in 0.8040 seconds
hbase(main):005:0* put 'categories', 'r2', 'name', 'b'

0 row(s) in 0.0180 seconds
hbase(main):006:0> put 'categories', 'r3', 'name', 'b'

0 row(s) in 0.0230 seconds
hbase(main):007:0> put 'categories', 'r1', 'department', 1

0 row(s) in 0.1780 seconds
hbase(main):014:0* put 'categories', 'r2', 'department', 1

0 row(s) in 0.0740 seconds
hbase(main):018:0* put 'categories', 'r3', 'department', 2

0 row(s) in 0.0410 seconds
hbase(main):019:0> scan 'categories'

ROW COLUMN+CELL

r1 column=department:, timestamp=1568683197980, value=1

r1 column=name:, timestamp=1568683197100, value=a

r2 column=department:, timestamp=1568683199019, value=1

r2 column=name:, timestamp=1568683197527, value=b

r3 column=department:, timestamp=1568683199541, value=2

r3 column=name:, timestamp=1568683197821, value=b

3 row(s) in 0.4220 seconds
hbase(main):020:0>
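If needed, a scan can be restricted to a single column family, e.g. (not part of the original session):

scan 'categories', {COLUMNS => 'department'}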

While working with the HBase shell, attempts to create or interact with tables occasionally failed with a java.net.SocketTimeoutException; restarting the VM resolved the problem.
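In such cases, a less drastic first step (assuming the CDH package service names on the QuickStart VM) would be to check or restart the HBase daemons before rebooting the whole VM:

sudo service hbase-master status
sudo service hbase-regionserver status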

CONCLUSIONS
As a result of this laboratory work, it was established that the Hadoop ecosystem provides a variety of utilities (Pig, Hive, HBase) for working with tabular data.