
Big Data. Report on Laboratory Work 1 for the course "Big Data and Cloud Technologies"


Date: 02.06.2020


Figure 1 – Result of selecting the 10 most popular categories
Next, the query specified by the assignment variant was executed (Fig. 2).



Figure 2 – All orders with the status "Complete"
The query was limited to 10 orders because of poor performance caused by the small amount of allocated RAM (about 3.3 GB).

Next, Pig was used:

hdfs dfs -put /etc/passwd /user/cloudera

pig -x mapreduce

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

2019-09-13 03:52:34,233 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.13.0 (rexported) compiled Oct 04 2017, 11:09:03

2019-09-13 03:52:34,236 [main] INFO org.apache.pig.Main - Logging error messages to: /home/cloudera/pig_1568371953975.log

2019-09-13 03:52:34,459 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/cloudera/.pigbootup not found

2019-09-13 03:52:39,886 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:52:39,887 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:39,887 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://quickstart.cloudera:8020

2019-09-13 03:52:54,312 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:52:54,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021

2019-09-13 03:52:54,333 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:54,859 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:54,873 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:52:55,678 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:55,700 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:52:56,626 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:56,645 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:52:57,405 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:57,417 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:52:57,807 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:57,809 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:52:58,459 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:58,471 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:52:58,912 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:58,913 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:52:59,466 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:52:59,474 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

Parts of each record in the uploaded file were then extracted and printed: fields (columns) 0, 4 and 5, counted from zero, which hold the user name, the full name and the home directory:

grunt> A = load '/user/cloudera/passwd' using PigStorage(':');

grunt> B = foreach A generate $0, $4, $5 ;

grunt> dump B;
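
For readers unfamiliar with Pig Latin, the three statements above can be approximated in plain Python. This is only an illustrative sketch, not how Pig executes the script: the function name `project_passwd` and the two sample lines are assumptions made for the example (the sample lines follow the standard /etc/passwd layout), and missing fields are returned as empty strings, whereas Pig would produce nulls.

```python
# Rough Python analogue of:
#   A = load '/user/cloudera/passwd' using PigStorage(':');
#   B = foreach A generate $0, $4, $5;
def project_passwd(lines, indexes=(0, 4, 5), sep=":"):
    """Split each passwd-style line on `sep` and keep the selected fields."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        # Positions past the end of the record become "" (simplification)
        rows.append(tuple(fields[i] if i < len(fields) else "" for i in indexes))
    return rows


# Hypothetical sample input in /etc/passwd format
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin",
]

for row in project_passwd(sample):
    # Print in the same tuple style that `dump B;` uses
    print("(" + ",".join(row) + ")")
```

Positions 0, 4 and 5 are zero-based, which is why `$0` selects the login name rather than the password placeholder.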

2019-09-13 03:57:52,730 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN

2019-09-13 03:57:53,005 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}

2019-09-13 03:57:53,835 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false

2019-09-13 03:57:54,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1

2019-09-13 03:57:54,063 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1

2019-09-13 03:57:55,460 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032

2019-09-13 03:57:57,011 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job

2019-09-13 03:57:57,492 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent

2019-09-13 03:57:57,497 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2019-09-13 03:57:57,498 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress

2019-09-13 03:58:04,013 [DataStreamer for file /tmp/temp-1957354021/tmp-1301083020/jdo-api-3.0.1.jar] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

2019-09-13 03:58:04,550 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7965330412867642196.jar

2019-09-13 03:58:19,256 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7965330412867642196.jar created

2019-09-13 03:58:19,257 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar

2019-09-13 03:58:19,396 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2019-09-13 03:58:19,473 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.

2019-09-13 03:58:19,473 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache

2019-09-13 03:58:19,473 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []

2019-09-13 03:58:19,768 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.

2019-09-13 03:58:19,777 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address

2019-09-13 03:58:19,777 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 03:58:19,878 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032

2019-09-13 03:58:20,122 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 03:58:23,961 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2019-09-13 03:58:23,962 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

2019-09-13 03:58:24,134 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1

2019-09-13 03:58:24,437 [DataStreamer for file /tmp/hadoop-yarn/staging/cloudera/.staging/job_1568369247673_0001/job.splitmetainfo] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

2019-09-13 03:58:24,443 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1

2019-09-13 03:58:26,185 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1568369247673_0001

2019-09-13 03:58:30,031 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1568369247673_0001

2019-09-13 03:58:30,456 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1568369247673_0001/

2019-09-13 03:58:30,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1568369247673_0001

2019-09-13 03:58:30,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B

2019-09-13 03:58:30,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4] C: R:

2019-09-13 03:58:30,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1568369247673_0001

2019-09-13 03:58:30,670 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

2019-09-13 04:28:04,194 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete

2019-09-13 04:28:09,874 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces

2019-09-13 04:28:10,234 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete

2019-09-13 04:28:10,555 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features

2.6.0-cdh5.13.0 0.12.0-cdh5.13.0 cloudera 2019-09-13 03:57:56 2019-09-13 04:28:10 UNKNOWN
Success!
Job Stats (time in seconds):

JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs

job_1568369247673_0001 1 0 577 577 577 577 n/a n/a n/a n/a A,B MAP_ONLY hdfs://quickstart.cloudera:8020/tmp/temp-1957354021/tmp195232887,
Input(s):

Successfully read 52 records (2975 bytes) from: "/user/cloudera/passwd"
Output(s):

Successfully stored 52 records (1691 bytes) in: "hdfs://quickstart.cloudera:8020/tmp/temp-1957354021/tmp195232887"
Counters:

Total records written : 52

Total bytes written : 1691

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0
Job DAG:

job_1568369247673_0001

2019-09-13 04:28:11,018 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

2019-09-13 04:28:11,054 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2019-09-13 04:28:11,055 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-09-13 04:28:11,062 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.

2019-09-13 04:28:11,153 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2019-09-13 04:28:11,154 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

(root,root,/root)

(bin,bin,/bin)

(daemon,daemon,/sbin)

(adm,adm,/var/adm)

(lp,lp,/var/spool/lpd)

(sync,sync,/sbin)

(shutdown,shutdown,/sbin)

(halt,halt,/sbin)

(mail,mail,/var/spool/mail)

(uucp,uucp,/var/spool/uucp)

(operator,operator,/root)

(games,games,/usr/games)

(gopher,gopher,/var/gopher)

(ftp,FTP User,/var/ftp)

(nobody,Nobody,/)

(dbus,System message bus,/)

(vcsa,virtual console memory owner,/dev)

(abrt,,/etc/abrt)

(haldaemon,HAL daemon,/)

(ntp,,/etc/ntp)

(saslauth,Saslauthd user,/var/empty/saslauth)

(postfix,,/var/spool/postfix)

(sshd,Privilege-separated SSH,/var/empty/sshd)

(tcpdump,,/)

(zookeeper,ZooKeeper,/var/lib/zookeeper)

(cloudera-scm,Cloudera Manager,/var/lib/cloudera-scm-server)

(rpc,Rpcbind Daemon,/var/cache/rpcbind)

(apache,Apache,/var/www)

(solr,Solr,/var/lib/solr)

(hbase,HBase,/var/lib/hbase)

(sentry,Sentry,/var/lib/sentry)

(hive,Hive,/var/lib/hive)

(hdfs,Hadoop HDFS,/var/lib/hadoop-hdfs)

(yarn,Hadoop Yarn,/var/lib/hadoop-yarn)

(impala,Impala,/var/lib/impala)

(mapred,Hadoop MapReduce,/var/lib/hadoop-mapreduce)

(hue,Hue,/usr/lib/hue)

(sqoop,Sqoop,/var/lib/sqoop)

(flume,Flume,/var/lib/flume-ng)

(spark,Spark,/var/lib/spark)

(sqoop2,Sqoop 2 User,/var/lib/sqoop2)

(oozie,Oozie User,/var/lib/oozie)

(mysql,MySQL Server,/var/lib/mysql)

(kms,Hadoop KMS,/var/lib/hadoop-kms)

(llama,Llama,/var/lib/llama)

(httpfs,Hadoop HTTPFS,/var/lib/hadoop-httpfs)

(gdm,,/var/lib/gdm)

(rtkit,RealtimeKit,/proc)

(pulse,PulseAudio System Daemon,/var/run/pulse)

(avahi-autoipd,Avahi IPv4LL Stack,/var/lib/avahi-autoipd)

(cloudera,,/home/cloudera)

(vboxadd,,/var/run/vboxadd)
The job completed successfully, and 52 records were returned. These commands took quite a long time to run (about half an hour, from 03:57:56 to 04:28:10 according to the script statistics).
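
The "about half an hour" figure can be checked against the script statistics in the log. The sketch below is only a convenience calculation: the two timestamps and the MaxMapTime value of 577 seconds are copied from the Pig output above, while the helper name `elapsed_seconds` is an assumption made for the example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(started_at, finished_at):
    """Return the whole number of seconds between two log timestamps."""
    delta = datetime.strptime(finished_at, FMT) - datetime.strptime(started_at, FMT)
    return int(delta.total_seconds())


# StartedAt / FinishedAt from the "Script Statistics" block
total = elapsed_seconds("2019-09-13 03:57:56", "2019-09-13 04:28:10")
map_time = 577  # MaxMapTime from "Job Stats (time in seconds)"

minutes, seconds = divmod(total, 60)
print(f"total run time: {total} s ({minutes} min {seconds} s)")
print(f"time outside the map task: {total - map_time} s")
```

The single map task accounts for only part of the wall-clock time; the log alone does not show where the remainder was spent, although it is consistent with the memory constraints mentioned earlier.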

This information was then successfully saved:

grunt> store B into 'userinfo.out';

2019-09-13 12:45:39,970 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN

2019-09-13 12:45:40,354 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}

2019-09-13 12:45:40,568 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator

2019-09-13 12:45:41,306 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false