{"id":2035,"date":"2024-01-17T11:43:49","date_gmt":"2024-01-17T02:43:49","guid":{"rendered":"https:\/\/www.kwonline.org\/memo2\/?p=2035"},"modified":"2024-02-19T13:25:36","modified_gmt":"2024-02-19T04:25:36","slug":"connect-s3-from-apache-spark","status":"publish","type":"post","link":"https:\/\/www.kwonline.org\/memo2\/2024\/01\/17\/connect-s3-from-apache-spark\/","title":{"rendered":"Accessing S3 from Spark"},"content":{"rendered":"<p>I wanted to open files stored on S3 from Spark, so here are my notes.<\/p>\n<p>Add your AWS access key and secret key to ~\/.profile.<br \/>\nReplace replace_here below with your actual key and secret. ~\/.profile is only read by login shells, so run source ~\/.profile (or log in again) before starting PySpark.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n# vim ~\/.profile\r\nexport AWS_ACCESS_KEY_ID=&quot;replace_here&quot;\r\nexport AWS_SECRET_ACCESS_KEY=&quot;replace_here&quot;\r\n<\/pre>\n<p>If you list the dependency libraries with --packages when launching PySpark, it downloads and installs them for you.<br \/>\nRun the following, adjusting the versions to match your Spark\/Hadoop installation (hadoop-aws in particular must match the Hadoop version your Spark build ships with).<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\npyspark --packages org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.637\r\n<\/pre>\n<p>The required jar files start downloading. Once the download finishes, the shell is ready to use.<\/p>\n<p>Then run the code:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom pyspark.sql import SparkSession\r\n\r\n# Build (or reuse) a session. The explicit credential settings stay commented out\r\n# because the keys are already exported as environment variables.\r\nspark = (\r\n    SparkSession.builder\r\n    .appName(&quot;ReadFromS3&quot;)\r\n    # .config(&quot;spark.hadoop.fs.s3a.access.key&quot;, &quot;replace_here&quot;)\r\n    # .config(&quot;spark.hadoop.fs.s3a.secret.key&quot;, &quot;replace_here&quot;)\r\n    .getOrCreate()\r\n)\r\n\r\n# Read the CSV through the s3a:\/\/ connector, using the first row as the header\r\n# and inferring column types\r\ndf = spark.read.csv(&quot;s3a:\/\/orenomemo-s3-test\/orders.csv&quot;, header=True, inferSchema=True)\r\ndf.show()\r\n<\/pre>\n<p>The <strong>spark.hadoop.fs.s3a.access.key<\/strong> and <strong>spark.hadoop.fs.s3a.secret.key<\/strong> settings above can be omitted. The credentials are already set in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I wanted to open files stored on S3 from Spark, so here are my notes. Add your AWS access key and secret key to ~\/.profile. Replace replace_here below with your actual key and secret [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25,21,8,29,10],"tags":[],"class_list":["post-2035","post","type-post","status-publish","format-standard","hentry","category-aws","category-data-engineering","category-linux","category-python","category-spark"],"_links":{"self":[{"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/posts\/2035","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/comments?post=2035"}],"version-history":[{"count":4,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/posts\/2035\/revisions"}],"predecessor-version":[{"id":2105,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/posts\/2035\/revisions\/2105"}],"wp:attachment":[{"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/media?parent=2035"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/categories?post=2035"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/tags?post=2035"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}