sync; echo 3 > /proc/sys/vm/drop_caches
Cloudflare cache
Digitalocean cdn cache
lscache folder inside remove
although I have done all of the operations I mentioned above, I cannot see the current html file.
I also deleted the history of my computer.
Related
On one of the accounts we use on a cluster there is a hidden folder in the home directory:
/home/user/.felix/
This contains a huge number of directories:
[user#gateway .felix]$ ls | head -10
osgi-cache1050e0f4_15774cb91f4_-7ffe
osgi-cache-1063880a_15289337854_-7ffe
osgi-cache-10716929_155ac249b99_-7ffe
osgi-cache-1076af32_1567b76f77c_-7ffe
osgi-cache10fdd858_15288297a76_-7ffe
osgi-cache1145761a_1567b157a97_-7ffe
osgi-cache-1158de5c_15775794758_-7ffe
osgi-cache-117b5c79_1577655ca87_-7ffe
osgi-cache-1188faa3_154532959fc_-7fff
osgi-cache11bf2822_1528906f443_-7ffe
In each of these folders:
osgi-cache-37166e7_1545cb3b7e0_-7ffe/bundle10
[user#gateway bundle4]$ cat bundle.location
reference:file:/gpfs22/local/centos6/matlab/2013a/java/jar/toolbox/bioinfo.jar
So I'm thinking these files are created by matlab somehow.
This .felix folder contains about ~150k files which is causing us to go over our quota of 300k files. Is there a way to:
disable the creation of these files
clean them up in a safe way (maybe a cron)
possible move the location of where these files are created?
Technically its the apache-felix bundle cache (http://felix.apache.org/documentation/subprojects/apache-felix-framework/apache-felix-framework-usage-documentation.html) and I'm afraid there's no safe way to remove any of these without contacting the user (even when migrating the path).
I noticed that Matlab is creating about 7k files in /tmp/.felix. The space usage is pretty minimal (184k). I was able to delete them by:
find /tmp/.felix -user <my username> -exec rm -r {} \;
But when I run my Matlab code it recreates many (all?) of the files. So at least in the Matlab usage case it seems relatively safe to delete them, but I could imagine there being problems if this info is actively being updated.
Digging into the Felix docs a bit (mentioned in answer), I google "Felix bundle cache", and find that this is used to store pointers to Java jar files, and perhaps to state as well. There are indeed parameters that you can configure to control the location and flushing of this cache. configuring Felix bundle cache
Mathworks also has Matlab specific suggestions. In the case mentioned there, this seemed to be triggered by plotting. Names in the stack trace there suggest it may have to do with implementation of key bindings (keyboard shortcuts).
Rob
I'm a bit confused as to how setting a source works for a script inside an HTML file.
Whenever I use a script, I set the source to something along the lines of:
http://localhost:8080/module_name/module.js
However, when I go through the directory of my server, the location is actually something along the lines of:
modules/module_version2.0/module_name/module.js
How is the client accessing the source file when portions of the directory are emitted?
I think what is throwing you off is the configuration and root of the web-server/website versus the file system of the server.
The URL being read by your browser/client will lead the browser/client to the web-server located at localhost:8080 and the web-server will supply the browser/client with information from that website's root directory for any files/pages that are referenced within the html being used via the tags.
The method of configuring this direction/reference to the root or the document tree of the web-server, varies based on OS and web server setup, but is usually held in a files like httpd.conf; which is the file for Apache.
These configuration files can also have different file locations and aliases setup to create relative paths: i.e. you could have pointers for different locations within the web-server's document tree.
The path example from your server is coming from the perspective of a user account logged in to a machine using some method of exploring the systems local file structure. As such the root directory of that system is the lowest location in the file structure and therefore will show the full path of all the files and directories on that machine.
Where as the web-server/website's root directory in most likely located further away from the systems root.
Hope this helps.
I want to start by thanking you all for your help ahead of time, as this will help clear up a detail left out on the readthedocs.io guide. What I need is to compress several files into a single gzip, however, the guide shows only how to compress a list of files as individual gzipped file. Again, I appreciate any help as there is very few resources and documentation for this set up. (If there is some extra info, please include links to sources)
After I had set up the grid engine, I ran through the samples in the guide.
Am I right in assuming there is not a script for combining multiple files into one gzip using grid-computing-tools?
Are there any solutions on the Elasticluster Grid Engine setup to compress multiple files into 1 gzip?
What changes can be made to the grid-engine-tools to make it work?
EDIT
The reason we are considering a cluster is that we do expect multiple operations occurring simultaneously, zipped up files per order, which will occur systematically so that a vendor can download a single compressed file per order.
May I state the definition of the problem and you can let me know if I understood it correctly, as both Matt and I provided the exact same solution and somehow it doesn't seem sufficient.
Problem Definition
You have an Order defining the start of a task to process some data.
The processing of data would be split among several compute nodes, each producing a resulting file stored on GS directories.
The goal is:
Collect the files from GS bucket (that were produced by each of the nodes),
Archive the collection of files as one file,
Then compress that archive, and
Push it back to a different GS location.
Let me know if I summarized it properly,
Thanks,
Paul
Are the files in question in Cloud Storage?
Are the files in question on a local or network drive?
In your description, you indicate "What I need is to compress several files into a single gzip". It isn't clear to me that a cluster of computers is needed for this. It sounds more like you just want to use tar along with gzip.
The tar utility will create an archive file it can compress it as well. For example:
$ # Create a directory with a few input files
$ mkdir myfiles
$ echo "This is file1" > myfiles/file1.txt
$ echo "This is file2" > myfiles/file2.txt
$ # (C)reate a compressed archive
$ tar cvfz archive.tgz myfiles/*
a myfiles/file1.txt
a myfiles/file2.txt
$ # (V)erify the archive
$ tar tvfz archive.tgz
-rw-r--r-- 0 myuser mygroup 14 Jul 20 15:19 myfiles/file1.txt
-rw-r--r-- 0 myuser mygroup 14 Jul 20 15:19 myfiles/file2.txt
To extract the contents use:
$ # E(x)tract the archive contents
$ tar xvfz archive.tgz
x myfiles/file1.txt
x myfiles/file2.txt
UPDATE:
In your updated problem description, you have indicated that you may have multiple orders processed simultaneously. If the frequency in which results need to be tar-ed is low, and providing the tar-ed results is not extremely time-sensitive, then you could likely do this with a single node.
However, as the scale of the problem ramps up, you might take a look at using the Pipelines API.
Rather than keeping a fixed cluster running, you could initiate a "pipeline" (in this case a single task) when a customer's order completes.
A call to the Pipelines API would start a VM whose sole purpose is to download the customer's files, tar them up, and push the resulting tar file into Cloud Storage. The Pipelines API infrastructure does the copying from and to Cloud Storage for you. You would effectively just need to supply the tar command line.
There is an example that does something similar here:
https://github.com/googlegenomics/pipelines-api-examples/tree/master/compress
This example will download a list of files and compress each of them independently. It could be easily modified to tar the list of input files.
Take a look at the https://github.com/googlegenomics/pipelines-api-examples github repository for more information and examples.
-Matt
So there are many ways to do it, but the thing is that you cannot directly compress on Google Storage a collection of files - or a directory - into one file, and would need to perform the tar/gzip combination locally before transferring it.
If you want you can have the data compressed automatically via:
gsutil cp -Z
Which is detailed at the following link:
https://cloud.google.com/storage/docs/gsutil/commands/cp#changing-temp-directories
And the nice thing is that you retrieve uncompressed results from compressed data on Google Storage, because it has the ability to perform Decompressive Transcoding:
https://cloud.google.com/storage/docs/transcoding#decompressive_transcoding
You will notice on the last line in the following script:
https://github.com/googlegenomics/grid-computing-tools/blob/master/src/compress/do_compress.sh
The following line will basically copy the current compressed file to Google Cloud Storage:
gcs_util::upload "${WS_OUT_DIR}/*" "${OUTPUT_PATH}/"
What you will need is to first perform the tar/zip on the files in the local scratch directory, and then gsutil copy the compressed file over to Google Storage, but make sure that all the files that need to be compressed are in the scratch directory before starting to compress them. Most likely you would need to SSH copy (scp) them to one of the nodes (i.e. master), and then have the master tar/gzip the whole directory before sending it over to Google Storage. I am assuming each GCE instance has its own scratch disk, but the "gsutil cp" transfer is very fast when working on GCE.
Since Google Storage is fast at data transfers with Google Compute instances, the easiest second option to pursue is to mark out lines 66-69 in the do_compress.sh file:
https://github.com/googlegenomics/grid-computing-tools/blob/master/src/compress/do_compress.sh
This way no compression happens, but the copy happens on the last line via gsutil::upload, in order to have all the uncompressed files transferred to the same Google Storage bucket. Then using "gsutil cp" from the master node you would copy them back locally, in order to compress them locally via tar/gz and then copy the compressed directory file back to the bucket using "gsutil cp".
Hope it helps but it's tricky,
Paul
I'm attempting to create a mirror of a WordPress site with clean URLs (i.e. http://example.org/foo not http://example.org/foo.php). When Wget mirrors the site, it gives all pages and links a ".html" extension (i.e. http://example.org/foo.html).
Is it possible to set options for Wget to create a clean URL structure, so that the mirrored file corresponding to the page "http:example.org/foo" would be "/foo/index.html" and the link to that page would be "http:example.org/foo"? If so, how?
If I understand your question correctly, you're asking for what is the default behaviour of Wget.
Wget will only add the extension to the local copy, if the --adjust-extension option has been passed to it. Quoting the man page for Wget:
--adjust-extension
If a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to be appended to the
local filename. This is useful, for instance, when you're mirroring a remote site that uses .asp pages, but you want the mirrored pages to be viewable on your stock Apache server. Another good
use for this is when you're downloading CGI-generated materials. A URL like http://example.com/article.cgi?25 will be saved as article.cgi?25.html.
However, what you seem to be asking for, that Wget saves example.org/foo as /foo/index.html is actually the default option. If you're seeing some other output, you should post the complete output of Wget with the --debug switch.
I'm having a strange problem with making a bucket on google cloud storage open to the public (as a static website).
I created a bucket called fixparser.targetcompid.com. I followed google's procedure of adding an identifying html file to my existing host.
I am able to copy my htmls/css/js/etc. into the bucket and even view the index.html page when I provide the full url:
http://commondatastorage.googleapis.com/fixparser.targetcompid.com/index.html
However, I can't get the index file if I only provide the general website address:
http://commondatastorage.googleapis.com/fixparser.targetcompid.com
Following is what I see when I set MainPageSuffix and NotFoundPage:
$ ./gsutil setwebcfg -m index.html -e 404.html gs://fixparser.targetcompid.com
Setting website config on gs://fixparser.targetcompid.com/...
$ ./gsutil setwebcfg -m index.html -e 404.html gs://fixparser.targetcompidxxxx.com
Setting website config on gs://fixparser.targetcompidxxxx.com/...
GSResponseError: status=404, code=NoSuchBucket, reason=Not Found.
$ ./gsutil getwebcfg gs://fixparser.targetcompid.com
Getting website config on gs://fixparser.targetcompid.com/...
-WebsiteConfiguration>
-MainPageSuffix>
index.html
-/MainPageSuffix>
-NotFoundPage>
404.html
-/NotFoundPage>
-/WebsiteConfiguration>
(I don't know how else to format this xml snippet)
The Google Cloud Storage website configuration will only affect requests directed to CNAME aliases of c.storage.googleapis.com. In this particular example, you probably want to set up a CNAME alias for fixparser.targetcompid.com to point to c.storage.googleapis.com. Once you do that, opening http://fixparser.targetcompid.com will load the index.html page you set up with the gsutil setwebcfg command.
Mike Schwartz, Google Cloud Storage Team