
MINISTRY OF EDUCATION AND SCIENCE OF THE RUSSIAN FEDERATION

Federal State Autonomous Educational Institution of Higher Education "Sevastopol State University"

Department of Information Systems
Andrey Anatolyevich Povkh
Institute of Information Technologies and Control in Technical Systems

Year 2, group ИСм-18-1-о

09.04.02 Information Systems and Technologies (master's level)


REPORT

on laboratory work No. 1

in the discipline "Big Data and Cloud Technologies"

on the topic "Exploring Ways of Using the Apache Hadoop Ecosystem"

Pass mark ____________________ ________

(date)
Practicum supervisor
Senior Lecturer V.A. Stroganov

(position) (signature) (initials, surname)

Sevastopol, 2019
1 PURPOSE OF THE WORK
Study the purpose of the main components of the Apache Hadoop ecosystem. Explore ways of using the Apache Hadoop ecosystem for selecting structured data.

2 ASSIGNMENT VARIANT

Table 1 – Assignment variants

Task No.  Variant  Table       Output condition
1         1        orders      all orders with the status "Complete"
2         1        categories  print the table using the scan function

Sketches of how these two selections might look are given below.
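For reference, the two tasks could presumably be solved as follows. These are sketches only: the orders column name order_status, the uppercase status value, and the reading of task 2 as an HBase shell scan are assumptions, not part of the assignment text.

# Task 1: all orders with the status "Complete" (Impala SQL via impala-shell)
impala-shell -q "SELECT * FROM orders WHERE order_status = 'COMPLETE';"

# Task 2: print the categories table using the scan function (HBase shell)
echo "scan 'categories'" | hbase shell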



3 PROGRESS OF THE WORK

After preliminary configuration of the virtual machine (Cloudera QuickStart VM 5.13 was used), an attempt was made to import the data from the source DBMS into HDFS with the Apache Sqoop utility, using the following command (-m 1 requests a single map task, --as-avrodatafile stores the result as Avro data files, and --warehouse-dir sets the target directory in HDFS):

sqoop import-all-tables \
  -m 1 \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username=retail_dba \
  --password=cloudera \
  --compression-codec=snapy \
  --as-avrodatafile \
  --warehouse-dir=/user/hive/warehouse



This produced the following result:

Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

19/09/11 12:02:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0

19/09/11 12:02:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

19/09/11 12:02:03 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

19/09/11 12:02:07 INFO tool.CodeGenTool: Beginning code generation

19/09/11 12:02:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1

19/09/11 12:02:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1

19/09/11 12:02:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce

Note: /tmp/sqoop-cloudera/compile/885be4e58d989ac5705c6d3cc6cb6f94/categories.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

19/09/11 12:02:19 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/885be4e58d989ac5705c6d3cc6cb6f94/categories.jar

19/09/11 12:02:19 WARN manager.MySQLManager: It looks like you are importing from mysql.

19/09/11 12:02:19 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

19/09/11 12:02:19 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

19/09/11 12:02:19 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

19/09/11 12:02:19 INFO mapreduce.ImportJobBase: Beginning import of categories

19/09/11 12:02:19 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

19/09/11 12:02:21 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

19/09/11 12:02:22 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: com.cloudera.sqoop.io.UnsupportedCodecException: snapy
As the output shows, the import failed with an I/O error (UnsupportedCodecException) caused by the compression codec name "snapy", a misspelling of "snappy" (with the correct spelling the option would read --compression-codec=snappy). Since compression is not essential here, the import was simply retried without this option:

sqoop import-all-tables \
  -m 1 \
  --connect "jdbc:mysql://quickstart:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --as-avrodatafile \
  --warehouse-dir=/user/hive/warehouse

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:02:07 INFO mapreduce.JobSubmitter: number of splits:1

19/09/12 00:02:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1568180694054_0002

19/09/12 00:02:08 INFO impl.YarnClientImpl: Submitted application application_1568180694054_0002

19/09/12 00:02:08 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1568180694054_0002/

19/09/12 00:02:08 INFO mapreduce.Job: Running job: job_1568180694054_0002

19/09/12 00:02:59 INFO mapreduce.Job: Job job_1568180694054_0002 running in uber mode : false

19/09/12 00:02:59 INFO mapreduce.Job: map 0% reduce 0%

19/09/12 00:03:53 INFO mapreduce.Job: map 100% reduce 0%

19/09/12 00:03:54 INFO mapreduce.Job: Job job_1568180694054_0002 completed successfully

19/09/12 00:03:55 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=172206

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=87

HDFS: Number of bytes written=1032483

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Other local map tasks=1

Total time spent by all maps in occupied slots (ms)=51787

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=51787

Total vcore-milliseconds taken by all map tasks=51787

Total megabyte-milliseconds taken by all map tasks=53029888

Map-Reduce Framework

Map input records=12435

Map output records=12435

Input split bytes=87

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1166

CPU time spent (ms)=9850

Physical memory (bytes) snapshot=149233664

Virtual memory (bytes) snapshot=1511247872

Total committed heap usage (bytes)=50921472

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=1032483

19/09/12 00:03:55 INFO mapreduce.ImportJobBase: Transferred 1,008.2842 KB in 118.8311 seconds (8.485 KB/sec)

19/09/12 00:03:55 INFO mapreduce.ImportJobBase: Retrieved 12435 records.

19/09/12 00:03:55 INFO tool.CodeGenTool: Beginning code generation

19/09/12 00:03:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1

19/09/12 00:03:55 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce

Note: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/departments.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

19/09/12 00:03:58 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/departments.jar

19/09/12 00:03:58 INFO mapreduce.ImportJobBase: Beginning import of departments

19/09/12 00:03:58 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

19/09/12 00:03:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1

19/09/12 00:03:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1

19/09/12 00:03:58 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/departments.avsc

19/09/12 00:03:58 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

19/09/12 00:03:58 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:04:00 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:04:06 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:04:08 INFO db.DBInputFormat: Using read commited transaction isolation

19/09/12 00:04:08 INFO mapreduce.JobSubmitter: number of splits:1

19/09/12 00:04:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1568180694054_0003

19/09/12 00:04:10 INFO impl.YarnClientImpl: Submitted application application_1568180694054_0003

19/09/12 00:04:10 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1568180694054_0003/

19/09/12 00:04:10 INFO mapreduce.Job: Running job: job_1568180694054_0003

19/09/12 00:04:55 INFO mapreduce.Job: Job job_1568180694054_0003 running in uber mode : false

19/09/12 00:04:55 INFO mapreduce.Job: map 0% reduce 0%

19/09/12 00:05:36 INFO mapreduce.Job: map 100% reduce 0%

19/09/12 00:05:38 INFO mapreduce.Job: Job job_1568180694054_0003 completed successfully

19/09/12 00:05:38 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=171312

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=87

HDFS: Number of bytes written=450

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Other local map tasks=1

Total time spent by all maps in occupied slots (ms)=37974

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=37974

Total vcore-milliseconds taken by all map tasks=37974

Total megabyte-milliseconds taken by all map tasks=38885376

Map-Reduce Framework

Map input records=6

Map output records=6

Input split bytes=87

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=833

CPU time spent (ms)=3600

Physical memory (bytes) snapshot=138702848

Virtual memory (bytes) snapshot=1510150144

Total committed heap usage (bytes)=50921472

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=450

19/09/12 00:05:38 INFO mapreduce.ImportJobBase: Transferred 450 bytes in 99.9828 seconds (4.5008 bytes/sec)

19/09/12 00:05:38 INFO mapreduce.ImportJobBase: Retrieved 6 records.

19/09/12 00:05:38 INFO tool.CodeGenTool: Beginning code generation

19/09/12 00:05:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_items` AS t LIMIT 1

19/09/12 00:05:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce

Note: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/order_items.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

19/09/12 00:05:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/order_items.jar

19/09/12 00:05:42 INFO mapreduce.ImportJobBase: Beginning import of order_items

19/09/12 00:05:42 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

19/09/12 00:05:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_items` AS t LIMIT 1

19/09/12 00:05:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_items` AS t LIMIT 1

19/09/12 00:05:42 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/order_items.avsc

19/09/12 00:05:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

19/09/12 00:05:44 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:05:45 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:05:53 INFO db.DBInputFormat: Using read commited transaction isolation

19/09/12 00:05:53 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:05:53 INFO mapreduce.JobSubmitter: number of splits:1

19/09/12 00:05:54 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:05:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1568180694054_0004

19/09/12 00:05:54 INFO impl.YarnClientImpl: Submitted application application_1568180694054_0004

19/09/12 00:05:54 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1568180694054_0004/

19/09/12 00:05:54 INFO mapreduce.Job: Running job: job_1568180694054_0004

19/09/12 00:06:40 INFO mapreduce.Job: Job job_1568180694054_0004 running in uber mode : false

19/09/12 00:06:40 INFO mapreduce.Job: map 0% reduce 0%

19/09/12 00:07:32 INFO mapreduce.Job: map 100% reduce 0%

19/09/12 00:07:34 INFO mapreduce.Job: Job job_1568180694054_0004 completed successfully

19/09/12 00:07:34 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=171910

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=87

HDFS: Number of bytes written=3933008

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Other local map tasks=1

Total time spent by all maps in occupied slots (ms)=48890

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=48890

Total vcore-milliseconds taken by all map tasks=48890

Total megabyte-milliseconds taken by all map tasks=50063360

Map-Reduce Framework

Map input records=172198

Map output records=172198

Input split bytes=87

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=1102

CPU time spent (ms)=11930

Physical memory (bytes) snapshot=148275200

Virtual memory (bytes) snapshot=1511383040

Total committed heap usage (bytes)=50921472

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=3933008

19/09/12 00:07:34 INFO mapreduce.ImportJobBase: Transferred 3.7508 MB in 111.664 seconds (34.3963 KB/sec)

19/09/12 00:07:34 INFO mapreduce.ImportJobBase: Retrieved 172198 records.

19/09/12 00:07:34 INFO tool.CodeGenTool: Beginning code generation

19/09/12 00:07:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `orders` AS t LIMIT 1

19/09/12 00:07:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce

Note: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/orders.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

19/09/12 00:07:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/orders.jar

19/09/12 00:07:37 INFO mapreduce.ImportJobBase: Beginning import of orders

19/09/12 00:07:37 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

19/09/12 00:07:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `orders` AS t LIMIT 1

19/09/12 00:07:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `orders` AS t LIMIT 1

19/09/12 00:07:38 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/orders.avsc

19/09/12 00:07:38 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

19/09/12 00:07:39 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:07:44 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:07:45 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:07:45 INFO db.DBInputFormat: Using read commited transaction isolation

19/09/12 00:07:46 INFO mapreduce.JobSubmitter: number of splits:1

19/09/12 00:07:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1568180694054_0005

19/09/12 00:07:47 INFO impl.YarnClientImpl: Submitted application application_1568180694054_0005

19/09/12 00:07:47 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1568180694054_0005/

19/09/12 00:07:47 INFO mapreduce.Job: Running job: job_1568180694054_0005

19/09/12 00:08:35 INFO mapreduce.Job: Job job_1568180694054_0005 running in uber mode : false

19/09/12 00:08:35 INFO mapreduce.Job: map 0% reduce 0%

19/09/12 00:09:22 INFO mapreduce.Job: map 100% reduce 0%

19/09/12 00:09:25 INFO mapreduce.Job: Job job_1568180694054_0005 completed successfully

19/09/12 00:09:25 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=171499

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=87

HDFS: Number of bytes written=1779793

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Other local map tasks=1

Total time spent by all maps in occupied slots (ms)=45340

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=45340

Total vcore-milliseconds taken by all map tasks=45340

Total megabyte-milliseconds taken by all map tasks=46428160

Map-Reduce Framework

Map input records=68883

Map output records=68883

Input split bytes=87

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=953

CPU time spent (ms)=9510

Physical memory (bytes) snapshot=133357568

Virtual memory (bytes) snapshot=1511342080

Total committed heap usage (bytes)=50921472

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=1779793

19/09/12 00:09:25 INFO mapreduce.ImportJobBase: Transferred 1.6973 MB in 107.2342 seconds (16.2082 KB/sec)

19/09/12 00:09:25 INFO mapreduce.ImportJobBase: Retrieved 68883 records.

19/09/12 00:09:25 INFO tool.CodeGenTool: Beginning code generation

19/09/12 00:09:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `products` AS t LIMIT 1

19/09/12 00:09:25 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce

Note: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/products.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

19/09/12 00:09:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/products.jar

19/09/12 00:09:28 INFO mapreduce.ImportJobBase: Beginning import of products

19/09/12 00:09:28 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

19/09/12 00:09:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `products` AS t LIMIT 1

19/09/12 00:09:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `products` AS t LIMIT 1

19/09/12 00:09:28 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-cloudera/compile/56c17b47fb4edda02e7879974232fc55/products.avsc

19/09/12 00:09:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

19/09/12 00:09:29 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:09:30 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:09:33 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:09:38 INFO db.DBInputFormat: Using read commited transaction isolation

19/09/12 00:09:38 INFO mapreduce.JobSubmitter: number of splits:1

19/09/12 00:09:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1568180694054_0006

19/09/12 00:09:39 INFO impl.YarnClientImpl: Submitted application application_1568180694054_0006

19/09/12 00:09:39 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1568180694054_0006/

19/09/12 00:09:39 INFO mapreduce.Job: Running job: job_1568180694054_0006

19/09/12 00:10:24 INFO mapreduce.Job: Job job_1568180694054_0006 running in uber mode : false

19/09/12 00:10:24 INFO mapreduce.Job: map 0% reduce 0%

19/09/12 00:11:05 INFO mapreduce.Job: map 100% reduce 0%

19/09/12 00:11:07 INFO mapreduce.Job: Job job_1568180694054_0006 completed successfully

19/09/12 00:11:07 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=171804

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=87

HDFS: Number of bytes written=175677

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Other local map tasks=1

Total time spent by all maps in occupied slots (ms)=37548

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=37548

Total vcore-milliseconds taken by all map tasks=37548

Total megabyte-milliseconds taken by all map tasks=38449152

Map-Reduce Framework

Map input records=1345

Map output records=1345

Input split bytes=87

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=905

CPU time spent (ms)=5410

Physical memory (bytes) snapshot=131756032

Virtual memory (bytes) snapshot=1510158336

Total committed heap usage (bytes)=50921472

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=175677

19/09/12 00:11:07 INFO mapreduce.ImportJobBase: Transferred 171.5596 KB in 98.676 seconds (1.7386 KB/sec)

19/09/12 00:11:07 INFO mapreduce.ImportJobBase: Retrieved 1345 records.
As can be seen, this time the operation completed successfully, returning 1345 records for the last table. The log again contains many exception traces, but since they appear only at the WARN level, they can be disregarded.
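To cross-check a reported count against the source database, one could, for example, run sqoop eval with the same connection parameters (a verification step suggested here, not part of the original procedure):

sqoop eval \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --query "SELECT COUNT(*) FROM products"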

The result was then inspected:

[cloudera@quickstart ]$ hadoop fs -ls /user/hive/warehouse

Found 6 items

drwxr-xr-x - cloudera supergroup 0 2019-09-12 00:01 /user/hive/warehouse/categories

drwxr-xr-x - cloudera supergroup 0 2019-09-12 00:03 /user/hive/warehouse/customers

drwxr-xr-x - cloudera supergroup 0 2019-09-12 00:05 /user/hive/warehouse/departments

drwxr-xr-x - cloudera supergroup 0 2019-09-12 00:07 /user/hive/warehouse/order_items

drwxr-xr-x - cloudera supergroup 0 2019-09-12 00:09 /user/hive/warehouse/orders

drwxr-xr-x - cloudera supergroup 0 2019-09-12 00:11 /user/hive/warehouse/products

[cloudera@quickstart ]$ hadoop fs -ls /user/hive/warehouse/categories

Found 2 items

-rw-r--r-- 1 cloudera supergroup 0 2019-09-12 00:01 /user/hive/warehouse/categories/_SUCCESS

-rw-r--r-- 1 cloudera supergroup 1534 2019-09-12 00:01 /user/hive/warehouse/categories/part-m-00000.avro
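The records inside the Avro file itself can also be inspected with the avro-tools jar shipped with CDH, if it is available on the VM (the jar path below is an assumption):

hadoop fs -copyToLocal /user/hive/warehouse/categories/part-m-00000.avro .
hadoop jar /usr/lib/avro/avro-tools.jar tojson part-m-00000.avro | head -5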
The import also created .avsc files with the data schemas in the home directory:

[cloudera@quickstart ]$ ls -1 *.avsc

categories.avsc

customers.avsc

departments.avsc

order_items.avsc

orders.avsc

products.avsc
Note that, contrary to the expected names, these files have lost the "sqoop_import_" prefix.

The contents of one of the files:

[cloudera@quickstart ]$ vim categories.avsc

{
  "type" : "record",
  "name" : "categories",
  "doc" : "Sqoop import of categories",
  "fields" : [ {
    "name" : "category_id",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "category_id",
    "sqlType" : "4"
  }, {
    "name" : "category_department_id",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "category_department_id",
    "sqlType" : "4"
  }, {
    "name" : "category_name",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "category_name",
    "sqlType" : "12"
  } ],
  "tableName" : "categories"
}
Copying the data schemas into HDFS:

[cloudera@quickstart ]$ sudo -u hdfs hadoop fs -mkdir /user/examples

[cloudera@quickstart ]$ sudo -u hdfs hadoop fs -chmod +rw /user/examples

[cloudera@quickstart ]$ hadoop fs -copyFromLocal ~/*.avsc /user/examples

19/09/12 00:30:11 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:30:11 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:30:12 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

19/09/12 00:30:12 WARN hdfs.DFSClient: Caught exception

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1281)

at java.lang.Thread.join(Thread.java:1355)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
The previously seen exceptions appear again, but since they are once more only at the warning level, they can be ignored.
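That the schemas actually arrived can be confirmed with a listing of the target directory, e.g.:

hadoop fs -ls /user/examples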

Tables were then created with Impala on top of the previously imported data:

CREATE EXTERNAL TABLE categories STORED AS AVRO LOCATION 'hdfs:///user/hive/warehouse/categories'
  TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart/user/examples/categories.avsc');

CREATE EXTERNAL TABLE customers STORED AS AVRO LOCATION 'hdfs:///user/hive/warehouse/customers'
  TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart/user/examples/customers.avsc');

CREATE EXTERNAL TABLE departments STORED AS AVRO LOCATION 'hdfs:///user/hive/warehouse/departments'
  TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart/user/examples/departments.avsc');

CREATE EXTERNAL TABLE orders STORED AS AVRO LOCATION 'hdfs:///user/hive/warehouse/orders'
  TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart/user/examples/orders.avsc');

CREATE EXTERNAL TABLE order_items STORED AS AVRO LOCATION 'hdfs:///user/hive/warehouse/order_items'
  TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart/user/examples/order_items.avsc');

CREATE EXTERNAL TABLE products STORED AS AVRO LOCATION 'hdfs:///user/hive/warehouse/products'
  TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart/user/examples/products.avsc');
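The created tables can be listed from impala-shell; if the shell session was opened before the tables were created, the catalog may first need to be refreshed (a standard Impala step, added here as a hint rather than part of the original procedure):

invalidate metadata;
show tables;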
As an example, let us select the 10 most popular product categories (Fig. 1).
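A query of roughly the following shape would produce such a selection (the join columns order_item_product_id, product_id, product_category_id and category_id are assumptions based on the standard retail_db schema):

SELECT c.category_name, COUNT(order_item_quantity) AS cnt
FROM order_items oi
JOIN products p ON oi.order_item_product_id = p.product_id
JOIN categories c ON c.category_id = p.product_category_id
GROUP BY c.category_name
ORDER BY cnt DESC
LIMIT 10;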
