{"id":464,"date":"2019-02-15T11:01:52","date_gmt":"2019-02-15T11:01:52","guid":{"rendered":"http:\/\/www.wis.co.uk\/blog\/?p=464"},"modified":"2019-03-26T16:23:17","modified_gmt":"2019-03-26T16:23:17","slug":"storage-capacity","status":"publish","type":"post","link":"https:\/\/www.wis.co.uk\/blog\/storage-capacity","title":{"rendered":"AWS Storage capacity"},"content":{"rendered":"\n<p>I remember when I thought a gigabyte was a lot. I bought an external 1GB hard disk in 1995, and filled it up in no time at all. As geeky things go, it was pretty exciting.<\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/hadoop.apache.org\/\" target=\"_blank\">Hadoop<\/a> is designed for petabyte-scale data processing. Hadoop&#8217;s filesystem, HDFS, has <a rel=\"noreferrer noopener\" aria-label=\"a set of Linux-like commands (opens in a new tab)\" href=\"https:\/\/hadoop.apache.org\/docs\/r3.1.2\/hadoop-project-dist\/hadoop-hdfs\/HDFSCommands.html\" target=\"_blank\">a set of Linux-like commands<\/a> for filesystem interaction. For example, this command lists the files in the HDFS directory shown:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>hdfs dfs -ls \/user\/justin\/dataset3\/<\/code><\/pre>\n\n\n\n<p>As in Linux, <a href=\"https:\/\/hadoop.apache.org\/docs\/r3.1.2\/hadoop-project-dist\/hadoop-common\/FileSystemShell.html\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"other commands exist (opens in a new tab)\">other commands exist<\/a> for finding information about file and filesystem usage. 
For example, the <code>du<\/code> command gives the amount of space used by the files in that directory:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>hdfs dfs -du \/user\/justin\/dataset3\n1147227 part-m-00000<\/code><\/pre>\n\n\n\n<p>and the <code>df<\/code> command shows the <a rel=\"noreferrer noopener\" aria-label=\"capacity and free space of the filesystem (opens in a new tab)\" href=\"https:\/\/hadoop.apache.org\/docs\/r3.1.2\/hadoop-project-dist\/hadoop-common\/FileSystemShell.html#df\" target=\"_blank\">capacity and free space of the filesystem<\/a>. The <code>-h<\/code> option displays the output in human-readable format, instead of a very long number of bytes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>hdfs dfs -df -h \/\nFilesystem              Size     Used     Available  Use%\nhdfs:\/\/local:8020       206.8 G  245.6 M    205.3 G   0%<\/code><\/pre>\n\n\n\n<p>The <code>hdfs<\/code> command <a rel=\"noreferrer noopener\" aria-label=\"also supports other filesystems (opens in a new tab)\" href=\"https:\/\/hadoop.apache.org\/docs\/r3.1.2\/hadoop-project-dist\/hadoop-common\/FileSystemShell.html#usage\" target=\"_blank\">also supports other filesystems<\/a>. You can use it to report on the local filesystem instead of HDFS, or on any other filesystem for which it has a suitable driver. 
Object storage systems such as <a href=\"https:\/\/aws.amazon.com\/s3\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Amazon S3 (opens in a new tab)\">Amazon S3<\/a> and <a rel=\"noreferrer noopener\" aria-label=\"OpenStack Swift (opens in a new tab)\" href=\"https:\/\/wiki.openstack.org\/wiki\/Swift\" target=\"_blank\">OpenStack Swift<\/a> are also supported, so you can do this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>hdfs dfs -ls file:\/\/\/var\/www\/html    # the local filesystem\nhdfs dfs -df -h s3:\/\/dataset4\/       # an Amazon s3 bucket called dataset4<\/code><\/pre>\n\n\n\n<p>Here is a screenshot showing the results of doing just that (from within an Amazon EMR cluster node).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"631\" height=\"99\" src=\"http:\/\/www.wis.co.uk\/blog\/wp-content\/uploads\/2019\/02\/hdfs-df-s3.png\" alt=\"Output of hdfs dfs -df -h for an s3 bucket\" class=\"wp-image-467\" srcset=\"https:\/\/www.wis.co.uk\/blog\/wp-content\/uploads\/2019\/02\/hdfs-df-s3.png 631w, https:\/\/www.wis.co.uk\/blog\/wp-content\/uploads\/2019\/02\/hdfs-df-s3-300x47.png 300w\" sizes=\"(max-width: 631px) 100vw, 631px\" \/><\/figure>\n\n\n\n<p>It suggests that the available capacity of this s3 bucket is 8.0 <a rel=\"noreferrer noopener\" aria-label=\"exabytes (opens in a new tab)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Exabyte\" target=\"_blank\">exabytes<\/a>. This is the first time I&#8217;ve ever seen the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Metric_prefix\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"exa SI prefix (opens in a new tab)\">exa SI prefix<\/a> in the output of a disk capacity command. As geeky things go, it&#8217;s pretty exciting.<\/p>\n\n\n\n<p>I assume this is just a reporting limit set in the s3 driver, and that the actual capacity of s3 is higher. 
AWS claim the <a rel=\"noreferrer noopener\" aria-label=\"capacity of s3 is unlimited (opens in a new tab)\" href=\"https:\/\/aws.amazon.com\/s3\/faqs\/\" target=\"_blank\">capacity of s3 is unlimited<\/a> (though each object is limited to 5 TB). AWS is <a rel=\"noreferrer noopener\" aria-label=\"constantly expanding (opens in a new tab)\" href=\"https:\/\/www.zdnet.com\/article\/cloud-computing-aws-bumps-up-its-data-centre-capacity-again\/\" target=\"_blank\">constantly<\/a> <a rel=\"noreferrer noopener\" aria-label=\"expanding (opens in a new tab)\" href=\"https:\/\/aws.amazon.com\/new\" target=\"_blank\">expanding<\/a>, so it is safe to assume that capacity is being added to s3 all the time.<\/p>\n\n\n\n<p>The factor limiting how much data you can store in s3 will be your wallet. Use of s3 is charged per GB per month. <a rel=\"noreferrer noopener\" aria-label=\"standard S3 price (opens in a new tab)\" href=\"https:\/\/aws.amazon.com\/s3\/pricing\/?sc_channel=PS&amp;sc_campaign=acquisition_UK&amp;sc_publisher=google&amp;sc_medium=s3_b&amp;sc_content=sitelink&amp;sc_detail=%2Bs3%20%2Bpricing&amp;sc_category=s3&amp;sc_segment=pricing&amp;sc_matchtype=b&amp;sc_country=UK&amp;s_kwcid=AL!4422!3!285292284150!b!!g!!%2Bs3%20%2Bpricing&amp;ef_id=EAIaIQobChMIweDL-rq94AIVjrztCh0M5wsYEAAYASABEgKrh_D_BwE:G:s\" target=\"_blank\">Prices&nbsp;vary by region,<\/a> and start at 2.1c\/GB per month (e.g. Virginia, Ohio or Ireland). For large-scale, infrequently-accessed data, prices drop to around 1c\/GB, assuming you don&#8217;t want to do anything with your massive data-hoard.<\/p>\n\n\n\n<p>Using &#8220;<a href=\"https:\/\/aws.amazon.com\/s3\/storage-classes\/?nc=sn&amp;loc=3\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"one-zone-IA (opens in a new tab)\">one-zone-IA<\/a>&#8221; (1c\/GB\/month), it will cost US$ 86 MILLION a month to store 8 EB of data, plus support, plus tax. 
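<\/p>\n\n\n\n<p>That headline figure is easy to sanity-check. Here is a minimal Python sketch, assuming the 1c one-zone-IA rate is billed per GiB per month and reading the 8.0 E reported by <code>hdfs dfs -df<\/code> as 8 EiB (1 EiB = 2^30 GiB):<\/p>\n\n\n\n

```python
# Back-of-envelope check of the monthly bill for storing 8.0 EiB in s3.
# Assumption: the one-zone-IA rate quoted above ($0.01) is billed per GiB-month;
# a real bill would add request, transfer and support charges, plus tax.
PRICE_PER_GIB_MONTH = 0.01        # USD, one-zone-IA rate quoted above
capacity_gib = 8 * 2**30          # 8.0 EiB expressed in GiB
monthly_cost = capacity_gib * PRICE_PER_GIB_MONTH
print(f"US$ {monthly_cost:,.0f} a month")   # prints US$ 85,899,346 a month
```

\n\n\n\n<p>Rounded to two significant figures, that is the US$ 86 million quoted above.<\/p>\n\n\n\n<p>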
If you want to do anything useful with the data, a different storage class might be more appropriate, and you should also expect significant costs for processing.<\/p>\n\n\n\n<p style=\"text-align:right\"><em>Justin &#8211; February 2019<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I remember when I thought a gigabyte was a lot. I bought an external 1GB hard disk in 1995, and filled it up in no time at all. As geeky things go, it was pretty exciting. Hadoop is designed for &hellip; <a href=\"https:\/\/www.wis.co.uk\/blog\/storage-capacity\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[50,40],"_links":{"self":[{"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/posts\/464"}],"collection":[{"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/comments?post=464"}],"version-history":[{"count":11,"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/posts\/464\/revisions"}],"predecessor-version":[{"id":476,"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/posts\/464\/revisions\/476"}],"wp:attachment":[{"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/media?parent=464"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/categories?post=464"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wis.co.uk\/blog\/wp-json\/wp\/v2\/tags?post=464"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}