Image Annotation for Large Dataset

I have a huge dataset, and I have taken only a sample of it to show you. As you can see, it has two classes, cat and dog. In the training data the cat and dog images are mixed, so I have to label them manually. Is there an alternative way to do this? I have to annotate the data before I can train a model to classify an image as cat or dog.

One possible solution is to upload your dataset to Labelbox (link: https://www.labelbox.com/). There you can annotate your dataset and then download the results, for instance as a JSON file. The site associates each image with its label, and you can then use that information in your own pipeline.
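Once the labels are exported, a common next step is to sort the images into one folder per class, a layout many training tools accept. The sketch below assumes a simplified export: a JSON list of records with hypothetical `image` and `label` fields. Labelbox's real export schema is different and more nested, so the field access would need adapting.

```python
import json
import shutil
from pathlib import Path


def group_by_label(export_text):
    """Map each label to the image file names carrying it.

    Assumes a simplified, hypothetical export format:
    [{"image": "0001.jpg", "label": "cat"}, ...]
    """
    groups = {}
    for record in json.loads(export_text):
        groups.setdefault(record["label"], []).append(record["image"])
    return groups


def copy_into_class_folders(groups, image_dir, out_dir):
    """Copy each image into out_dir/<label>/ (e.g. out_dir/cat, out_dir/dog)."""
    for label, images in groups.items():
        target = Path(out_dir) / label
        target.mkdir(parents=True, exist_ok=True)
        for name in images:
            shutil.copy2(Path(image_dir) / name, target / name)
```

Copying rather than moving keeps the original mixed folder intact in case a label needs correcting later.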

How to use VTK to efficiently write time-varying field data on a fixed mesh?

I am working on physics simulation research. One of my projects has a large fixed grid that does not change over time; the fields on the grid, on the other hand, do change over time during the simulation. I need to use VTK to record the field data at each step for visualization in ParaView.
My current method is to write a separate *.vtu file to disk at each time step. This basically works, but it writes a lot of duplicate data (the mesh geometry is re-recorded at every step), which not only consumes more disk space but also wastes time on encoding and parsing.
I would like a way to write the mesh information only once and, for the remaining steps, write only the new field data, while still getting the same visualization. Do VTK and ParaView provide such an interface, and if so, how do I implement it?
Using .pvtu files and referring to the same .vtu as the Piece for each step should do the trick.
See this similar post on the ParaView Discourse, and the pvtu documentation.
EDIT
This seems to be a side effect of the format; it is not supported by the writer.
The correct solution is to use another file format ...
Let me share my own research findings for reference.
As Nico said, with a combination of pvtu/vtu files we could in theory store the geometry in a separate vtu file referenced by a pvtu file. Setting the NumberOfPieces attribute of the pvtu file to 1 would produce only one separate vtu file.
However, the VTK library does not expose a dedicated interface for controlling how vtu files are written. No matter how it is configured, as long as the writer's input contains geometry, the writer writes the geometry to disk, and this cannot be skipped through the public interface.
It is, however, possible to make multiple pvtu files point to the same vtu file by manually editing the Piece node in each pvtu file, and ParaView recognizes and visualizes such a file group properly.
I did not proceed to try adding arrays to the unstructured grid and using pvtu output.
So, I think the conclusion is:
If you don't want to dive into VTK's library code and XML implementation, this approach doesn't make sense.
If you are willing to write the full series of files, delete all but one of the vtu files, and then edit the pvtu files so that all their Piece nodes point to the one surviving vtu file, you can save a lot of disk space, but this will not shorten write, read, or parse times.
If you implement an XML writer yourself, you can in theory meet all the requirements, but it takes a lot of coding work.
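The manual Piece-editing step described above can be scripted instead of done by hand. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the file names (`step_*.pvtu`, `mesh.vtu`) are hypothetical, and it only rewrites the `Source` attribute of every `Piece` element, so it shares the limitations listed above (the full series still has to be written first).

```python
import glob
import xml.etree.ElementTree as ET


def repoint_pieces(root, shared_vtu):
    """Set Source on every <Piece> under root to shared_vtu; return how many were set."""
    changed = 0
    for piece in root.iter("Piece"):
        piece.set("Source", shared_vtu)
        changed += 1
    return changed


def repoint_pvtu_files(pattern, shared_vtu):
    """Rewrite each pvtu file matching pattern in place.

    Piece sources are resolved relative to the pvtu file, so shared_vtu
    should be given relative to the directory holding the pvtu files.
    """
    for path in sorted(glob.glob(pattern)):
        tree = ET.parse(path)
        repoint_pieces(tree.getroot(), shared_vtu)
        tree.write(path, xml_declaration=True, encoding="utf-8")


# Example (hypothetical names): repoint_pvtu_files("out/step_*.pvtu", "mesh.vtu")
```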

Literature for Classification Problem with Changing Classes

I am currently looking at a text classification problem (say, N classes) for which labeled training data exists. Occasionally, a new class is created, and some of the labels in the "old" training data become wrong because those examples should now carry the new class label. In other words, the new class recruits from the old classes.
We can assume that we have some new labeled data for the new class, or even that, from an input stream of new data, we eventually obtain the correct labels through human verification (the goal, however, is to require as few manual corrections as possible).
How should I set up a classifier that may face new "recruiting" classes from time to time? Are you aware of approaches or literature for this specific setting?
Basic strategies might include:
trying to relabel the training data and retrain,
using incremental classifiers (e.g., KNN).
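To illustrate why an incremental classifier such as KNN suits this setting: adding a class means adding labeled examples, and a human correction means relabeling a stored example, with no retraining step. The sketch below is a plain-Python KNN over already-vectorized texts (how texts become vectors, e.g. TF-IDF, is assumed and not shown); the class and method names are hypothetical.

```python
import math
from collections import Counter


class IncrementalKNN:
    """k-nearest-neighbour classifier whose training set can grow or be relabeled."""

    def __init__(self, k=3):
        self.k = k
        self.vectors = []
        self.labels = []

    def add(self, vector, label):
        """Store a labeled example; a never-seen label creates a new class."""
        self.vectors.append(tuple(vector))
        self.labels.append(label)
        return len(self.labels) - 1  # index, useful for later corrections

    def relabel(self, index, new_label):
        """Apply a human correction, e.g. an old example recruited by a new class."""
        self.labels[index] = new_label

    def predict(self, vector):
        """Majority vote among the k stored examples closest in Euclidean distance."""
        nearest = sorted(range(len(self.vectors)),
                         key=lambda i: math.dist(self.vectors[i], vector))[: self.k]
        votes = Counter(self.labels[i] for i in nearest)
        return votes.most_common(1)[0][0]
```

The trade-off is that prediction cost grows with the number of stored examples, since every query is compared against all of them.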

DeepLearning4J - Acquiring Data and Train Model

I am trying to create the simplest possible neural network and train it with some data.
To do so, I created a test.csv with the following pattern:
number,number+1;
number2,number2+1
...
I am trying to do a linear regression with the network...
But I cannot find a way to load the data; DataSetIterator does not work.
How do I fit the data, and how do I test it?
In our examples, we encourage people to use DataVec + RecordReaderDataSetIterator.
DataVec has all of the various data loading components.
I'm not sure what you mean by "DataSetIterator not working" without seeing any code, but it seems like you didn't really look at our examples.
Among them are multiple examples of a CSV record reader you can use for both regression and classification use cases.
Consider reorienting your data pipeline to use those.
Those examples are always found here:
https://github.com/deeplearning4j/dl4j-examples
If you follow any of those, the same pattern emerges:
Record reader for whatever data format -> RecordReaderDataSetIterator
The iterator lets you specify common constructor arguments, such as whether the task is regression and which column holds your label.
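DL4J itself is Java, and the real components are the ones named above (a CSV record reader feeding RecordReaderDataSetIterator, as in the linked examples). To keep this page's examples in one language, here is a Python stand-in that only mirrors the pattern; it is not DL4J's API. A reader turns CSV lines into numeric records, and an iterator splits the label column off and groups records into mini-batches. For the `number,number+1` data in the question, the label is column 1 and the task is regression.

```python
import csv
import io


def read_records(text):
    """Record reader: yield each non-empty CSV row as a list of floats."""
    for row in csv.reader(io.StringIO(text)):
        if row:
            yield [float(value) for value in row]


def iterate_batches(records, batch_size, label_index):
    """Dataset iterator: yield lists of (features, label) pairs.

    For regression the label is simply the numeric value in label_index.
    """
    batch = []
    for record in records:
        features = record[:label_index] + record[label_index + 1:]
        batch.append((features, record[label_index]))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch
```

Note that a trailing `;` like the one on the first sample line would make the numeric parse fail here, so the file should be plain comma-separated values.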

How do I obtain a hash of the payload of a digital photo container, ideally in Java?

I have edited EXIF properties on some digital pictures and would like to be able to identify them as identical to the originals (apart from the metadata). I believe this implies extracting the payload stream and computing a hash. What is the best way to do this, ideally in Java, and most ideally in Java using a native implementation for performance?
JPEG files are a series of 'segments'. Some contain image data, others don't.
Exif data is stored in the APP1 segment. You could write some code to compare the other segments, and see if they match. A hash seems like a reasonable approach here. For example, you might compare a hash of only the SOI, DQT or DHT segments. You'd need to experiment to see which of these gives the best result.
Check out the JpegSegmentReader class from my metadata-extractor library.
Java: https://github.com/drewnoakes/metadata-extractor
.NET: https://github.com/drewnoakes/metadata-extractor-dotnet
With that class you can pull out specific segment(s) from a JPEG for processing.
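The segment-skipping idea can be sketched with nothing but the standard library (in Python here for brevity, not Java, and without metadata-extractor). The function below walks the marker segments, leaves out APP1 (where Exif lives) by default, and hashes everything else. Once it reaches SOS it hashes the rest of the file in one go, on the assumption that the metadata segments you want to ignore come before the image data. The function name and the `skip` parameter are hypothetical.

```python
import hashlib

APP1 = 0xE1  # Exif (and XMP) segments
SOS = 0xDA   # start of scan: entropy-coded image data follows
EOI = 0xD9   # end of image


def jpeg_payload_hash(data, skip=frozenset({APP1})):
    """SHA-256 over a JPEG's segments, excluding the marker types in skip."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    digest = hashlib.sha256(data[:2])
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte preceding a marker
            pos += 1
            continue
        if marker == SOS:  # hash the scan and everything after it
            digest.update(data[pos:])
            break
        if marker == EOI or marker == 0x01 or 0xD0 <= marker <= 0xD7:
            digest.update(data[pos:pos + 2])  # standalone marker, no length field
            pos += 2
            continue
        # Length field counts itself (2 bytes) but not the 2-byte marker.
        end = pos + 2 + int.from_bytes(data[pos + 2:pos + 4], "big")
        if marker not in skip:
            digest.update(data[pos:end])
        pos = end
    return digest.hexdigest()
```

Usage would be along the lines of `jpeg_payload_hash(Path("a.jpg").read_bytes())`. Other metadata segments, such as APP13 (used for IPTC), can be added to `skip` if your editor touches them too.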
Let us know how you get on!

Data mining project dilemma

I am researching a dataset consisting of two data files:
The first contains user IDs, artist IDs, and the ratings users gave to the artists they chose to rate.
The second data file contains artist IDs and names.
The research question I have chosen is:
Is the artist popular or not?
In other words, given a new singer who does not appear in the data file, we will use algorithms to classify whether that artist is popular or not.
For the prediction step, I chose logistic regression.
But my problem comes earlier.
I do not know how, technically, to decide which of the existing artists should be labeled successful and which unsuccessful.
I thought of some methods, for example k-means with k=2 (but with this method I have a problem with the distance function), KNN with k=2, etc.
I need guidance on how to cluster the existing data,
and general tips for the project.
Thank you.
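One way to follow the k-means idea from the question while sidestepping the distance-function problem is to reduce each artist to a single number and cluster in one dimension, where distance is just the absolute difference. The sketch below assumes, as a modeling choice not given in the question, that popularity is measured by how many users rated an artist. It runs 2-means on those counts (centroids initialised at the minimum and maximum) and labels the higher cluster popular; those labels could then serve as targets for the logistic-regression step. The function name is hypothetical.

```python
from collections import defaultdict


def artist_popularity(ratings, iterations=100):
    """Label each artist popular (True) or not via 1-D 2-means on rating counts.

    ratings: iterable of (user_id, artist_id, score) triples.
    """
    counts = defaultdict(int)
    for _user, artist, _score in ratings:
        counts[artist] += 1  # number of users who rated this artist
    if not counts:
        return {}
    values = list(counts.values())
    lo, hi = float(min(values)), float(max(values))  # initial centroids
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):  # assignments stopped changing
            break
        lo, hi = new_lo, new_hi
    return {artist: abs(c - hi) < abs(c - lo) for artist, c in counts.items()}
```

If a single count feels too crude, the same approach works on another one-number summary such as mean rating; clustering several features at once brings the distance-scaling question back.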