Big Data. Report on Laboratory Work 1 for the course Big Data and Cloud Technologies
Figure 1 – Result of selecting the 10 most popular categories

Next, the query specified by the assignment variant was executed (Fig. 2).

Figure 2 – All orders with the status “Complete”

The result was limited to 10 orders because of low performance caused by the small amount of allocated RAM (about 3.3 GB).

Next, Pig was used:

hdfs dfs -put /etc/passwd /user/cloudera
pig -x mapreduce
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2019-09-13 03:52:34,233 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.13.0 (rexported) compiled Oct 04 2017, 11:09:03
2019-09-13 03:52:34,236 [main] INFO org.apache.pig.Main - Logging error messages to: /home/cloudera/pig_1568371953975.log
2019-09-13 03:52:34,459 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/cloudera/.pigbootup not found
2019-09-13 03:52:39,886 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:52:39,887 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 03:52:39,887 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://quickstart.cloudera:8020
2019-09-13 03:52:54,312 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:52:54,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2019-09-13 03:52:54,333 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated.
Instead, use fs.defaultFS
2019-09-13 03:52:54,859 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 03:52:54,873 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:52:55,678 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 03:52:55,700 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:52:56,626 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 03:52:56,645 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:52:57,405 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 03:52:57,417 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:52:57,807 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 03:52:57,809 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:52:58,459 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 03:52:58,471 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:52:58,912 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated.
Instead, use fs.defaultFS
2019-09-13 03:52:58,913 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:52:59,466 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 03:52:59,474 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

Parts of each record were then extracted from the uploaded file and printed: fields (columns) 0, 4 and 5, which hold the user name, the full name and the home directory:

grunt> A = load '/user/cloudera/passwd' using PigStorage(':');
grunt> B = foreach A generate $0, $4, $5 ;
grunt> dump B;
2019-09-13 03:57:52,730 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2019-09-13 03:57:53,005 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2019-09-13 03:57:53,835 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic?
false
2019-09-13 03:57:54,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-09-13 03:57:54,063 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-09-13 03:57:55,460 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2019-09-13 03:57:57,011 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2019-09-13 03:57:57,492 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2019-09-13 03:57:57,497 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-09-13 03:57:57,498 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated.
Instead, use mapreduce.output.fileoutputformat.compress
2019-09-13 03:58:04,013 [DataStreamer for file /tmp/temp-1957354021/tmp-1301083020/jdo-api-3.0.1.jar] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
2019-09-13 03:58:04,550 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7965330412867642196.jar
2019-09-13 03:58:19,256 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7965330412867642196.jar created
2019-09-13 03:58:19,257 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2019-09-13 03:58:19,396 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-09-13 03:58:19,473 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-09-13 03:58:19,473 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2019-09-13 03:58:19,473 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-09-13 03:58:19,768 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-09-13 03:58:19,777 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated.
Instead, use mapreduce.jobtracker.http.address
2019-09-13 03:58:19,777 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 03:58:19,878 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2019-09-13 03:58:20,122 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 03:58:23,961 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-09-13 03:58:23,962 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-09-13 03:58:24,134 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-09-13 03:58:24,437 [DataStreamer for file /tmp/hadoop-yarn/staging/cloudera/.staging/job_1568369247673_0001/job.splitmetainfo] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
2019-09-13 03:58:24,443 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-09-13 03:58:26,185 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1568369247673_0001
2019-09-13 03:58:30,031 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1568369247673_0001
2019-09-13 03:58:30,456 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url
to track the job: http://quickstart.cloudera:8088/proxy/application_1568369247673_0001/
2019-09-13 03:58:30,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1568369247673_0001
2019-09-13 03:58:30,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2019-09-13 03:58:30,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4] C: R:
2019-09-13 03:58:30,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1568369247673_0001
2019-09-13 03:58:30,670 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-09-13 04:28:04,194 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-09-13 04:28:09,874 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2019-09-13 04:28:10,234 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-09-13 04:28:10,555 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion    PigVersion        UserId    StartedAt            FinishedAt           Features
2.6.0-cdh5.13.0  0.12.0-cdh5.13.0  cloudera  2019-09-13 03:57:56  2019-09-13 04:28:10  UNKNOWN

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1568369247673_0001 1 0 577 577 577 577 n/a n/a n/a n/a A,B MAP_ONLY hdfs://quickstart.cloudera:8020/tmp/temp-1957354021/tmp195232887,

Input(s):
Successfully read 52 records (2975 bytes) from: "/user/cloudera/passwd"

Output(s):
Successfully stored 52 records (1691 bytes) in: "hdfs://quickstart.cloudera:8020/tmp/temp-1957354021/tmp195232887"

Counters:
Total records written : 52
Total bytes written : 1691
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1568369247673_0001

2019-09-13 04:28:11,018 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-09-13 04:28:11,054 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-09-13 04:28:11,055 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-09-13 04:28:11,062 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-09-13 04:28:11,153 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-09-13 04:28:11,154 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(root,root,/root)
(bin,bin,/bin)
(daemon,daemon,/sbin)
(adm,adm,/var/adm)
(lp,lp,/var/spool/lpd)
(sync,sync,/sbin)
(shutdown,shutdown,/sbin)
(halt,halt,/sbin)
(mail,mail,/var/spool/mail)
(uucp,uucp,/var/spool/uucp)
(operator,operator,/root)
(games,games,/usr/games)
(gopher,gopher,/var/gopher)
(ftp,FTP User,/var/ftp)
(nobody,Nobody,/)
(dbus,System message bus,/)
(vcsa,virtual console memory owner,/dev)
(abrt,,/etc/abrt)
(haldaemon,HAL daemon,/)
(ntp,,/etc/ntp)
(saslauth,Saslauthd user,/var/empty/saslauth)
(postfix,,/var/spool/postfix)
(sshd,Privilege-separated SSH,/var/empty/sshd)
(tcpdump,,/)
(zookeeper,ZooKeeper,/var/lib/zookeeper)
(cloudera-scm,Cloudera Manager,/var/lib/cloudera-scm-server)
(rpc,Rpcbind Daemon,/var/cache/rpcbind)
(apache,Apache,/var/www)
(solr,Solr,/var/lib/solr)
(hbase,HBase,/var/lib/hbase)
(sentry,Sentry,/var/lib/sentry)
(hive,Hive,/var/lib/hive)
(hdfs,Hadoop HDFS,/var/lib/hadoop-hdfs)
(yarn,Hadoop Yarn,/var/lib/hadoop-yarn)
(impala,Impala,/var/lib/impala)
(mapred,Hadoop MapReduce,/var/lib/hadoop-mapreduce)
(hue,Hue,/usr/lib/hue)
(sqoop,Sqoop,/var/lib/sqoop)
(flume,Flume,/var/lib/flume-ng)
(spark,Spark,/var/lib/spark)
(sqoop2,Sqoop 2 User,/var/lib/sqoop2)
(oozie,Oozie User,/var/lib/oozie)
(mysql,MySQL Server,/var/lib/mysql)
(kms,Hadoop KMS,/var/lib/hadoop-kms)
(llama,Llama,/var/lib/llama)
(httpfs,Hadoop HTTPFS,/var/lib/hadoop-httpfs)
(gdm,,/var/lib/gdm)
(rtkit,RealtimeKit,/proc)
(pulse,PulseAudio System Daemon,/var/run/pulse)
(avahi-autoipd,Avahi IPv4LL Stack,/var/lib/avahi-autoipd)
(cloudera,,/home/cloudera)
(vboxadd,,/var/run/vboxadd)

The job completed successfully and returned 52 records. These commands took quite a long time to run: about half an hour (from 03:57:56 to 04:28:10 according to the script statistics).
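As a quick local cross-check of the projection above (not part of the original lab), the same three fields can be pulled out of a passwd-format file with awk. This is only a sketch: the function name `project_passwd` and the sample lines, including their UID, GID and shell values, are made up for illustration. Pig numbers fields from 0 while awk numbers them from 1, so `$0, $4, $5` in Pig become `$1, $5, $6` here.

```shell
#!/bin/sh
# Hypothetical helper (not from the lab report): mimic the Pig projection
#   B = foreach A generate $0, $4, $5;
# Pig fields are 0-based, awk fields are 1-based: $0,$4,$5 -> $1,$5,$6.
project_passwd() {
    # With no file argument awk reads standard input.
    awk -F: '{ printf "(%s,%s,%s)\n", $1, $5, $6 }' "$@"
}

# Illustrative sample in passwd format (values invented for the example).
sample='root:x:0:0:root:/root:/bin/bash
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin'

# Print the projected tuples in the same style as "dump B".
printf '%s\n' "$sample" | project_passwd
```

On the machine where the file was uploaded, `project_passwd /etc/passwd | wc -l` gives a record count that can be compared with the 52 records reported by Pig, assuming the local file has not changed since the upload.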
This information was then successfully saved:

grunt> store B into 'userinfo.out';
2019-09-13 12:45:39,970 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2019-09-13 12:45:40,354 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2019-09-13 12:45:40,568 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2019-09-13 12:45:41,306 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false