Adding a background to inputs for a convolutional neural net? - neural-network

I'm trying to train a neural net using YOLOv2 to recognize characters and objects in a video game. For input data, I took screenshots of in-game assets from various angles. However, these screenshots contain no backgrounds - only the character models themselves. In the game, of course, there will be backgrounds behind the characters.
Will this confuse the neural network? And if so, should I go ahead and find some sample background images from the game and apply them randomly to the input data?

Yes, you should add in-game backgrounds to your training images, or you will never get decent detection quality. The network needs to learn the background, the placement of the objects against that background, and even the lighting of the objects in the scene. They all contribute to the final detection quality.
The technique you use to blend the background and your images is also important.
A good read on the subject: Synthesizing Training Data for Object Detection in Indoor Scenes
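If it helps, here is a minimal sketch of that compositing step in Python with Pillow, assuming your asset screenshots are PNGs with an alpha channel and you have a folder of plain in-game background screenshots. The folder names, the single class id, and the scale range are placeholders, not part of any particular pipeline:

```python
# Minimal sketch: paste transparent asset renders onto random in-game
# backgrounds and emit YOLO-style labels. Paths, class id, and sizes are
# placeholders; assets are assumed to be smaller than the backgrounds.
import random
from pathlib import Path
from PIL import Image

ASSET_DIR = Path("assets")        # PNGs of characters with alpha channel
BG_DIR = Path("backgrounds")      # screenshots of empty in-game scenes
OUT_DIR = Path("train")
OUT_DIR.mkdir(exist_ok=True)

backgrounds = list(BG_DIR.glob("*.png"))

for i, asset_path in enumerate(ASSET_DIR.glob("*.png")):
    asset = Image.open(asset_path).convert("RGBA")
    bg = Image.open(random.choice(backgrounds)).convert("RGBA")

    # Random scale and position so the network sees varied placements.
    scale = random.uniform(0.5, 1.0)
    w, h = int(asset.width * scale), int(asset.height * scale)
    asset = asset.resize((w, h))
    x = random.randint(0, bg.width - w)
    y = random.randint(0, bg.height - h)

    # Alpha-composite the asset onto the background.
    bg.paste(asset, (x, y), mask=asset)
    bg.convert("RGB").save(OUT_DIR / f"sample_{i:05d}.jpg")

    # YOLO label: class, x_center, y_center, width, height (all normalized).
    cx, cy = (x + w / 2) / bg.width, (y + h / 2) / bg.height
    label = f"0 {cx:.6f} {cy:.6f} {w / bg.width:.6f} {h / bg.height:.6f}\n"
    (OUT_DIR / f"sample_{i:05d}.txt").write_text(label)
```

Randomizing position and scale keeps the network from latching onto a single placement; the blending at the paste boundary (feathering, matching brightness) is where the linked paper spends most of its attention.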

Related

How to simulate multiple sensors by having a 3D object render differently in different cameras

I am trying to work out how to have objects render differently in different cameras.
I have a situation where we have the visible light rendering in the main camera, but the player-controlled objects may have multiple sensors, each represented by a camera.
For example, we may have:
An IR camera which sees a light emitted by the target with a colour based upon the object's temperature, on the IR layer
A radar, which has its own directional light, and sees only that which is the same colour as its own light on the RF layer, and would basically be a rotating vertical slit camera.
A sound sensor, which "sees" the "light" directly emitted by the target object, as well as that reflected off other hard surfaces, on the sound layer.
Radio direction finders, which see all colours of light on the RF layer
An IFF/identification sensor, which sees barcodes on the body of moving objects (The ability of the 2D barcode to be read simulates shape identification with fewer processing resources than a neural network while maintaining uncertainty as to the identity of a newly-seen object until analysed)
We may also have various sensor tricks such as radar ECM, which would be simulated by having false objects placed in the field of view of the "radar camera".
The output from these sensors may be viewed by the players, but is intended to be used by player-implemented AI routines.
My scenario is 3D, but it could also be applied to 2D.
So...
What would be the best way of handling all these different cameras and renderings? I would like to make the game objects as simple as possible, and minimise coding and CPU overhead.
Some renderings will have to change according to whatever the target object is doing - moving or firing a weapon may generate heat and sound, IFF would be complicated by cover, radar could be complicated by radar-absorbent materials, jamming, and shapes that reduce radar cross-section, and so-on.
Edit
These different renderings would need to be available simultaneously, as if looking at the same scene through all of the sensors, and each sensor camera would need to render appropriately.
edit
Given your comment below, it appears to me that the question you intended to ask is somewhat unclear.
To answer the comment:
Set up as many cameras as you need, and have each one set to "render to texture".
You will need to create a RenderTexture for each camera.
Now each camera will feed its data in real time to a texture, and you can do with those textures whatever you like.
edit
A simple solution to a handful of the different sensor results you're looking for is to find some shaders that produce the desired effect.
Here are some shaders you could try:
https://github.com/paganini24/ShaderExperiment
The sonic sensor I imagine would be much more difficult to create, but I think a good head start would be to explore Unity's AudioSource and AudioPlayer components. If you can get the volume/pitch of the sound being output and couple that with the position it is emitted from, you could map it to a full-screen UI texture.
Not a complete solution to the sonic sensor, but hopefully that points you in the right direction.

Model lost on uniform background surface with ARCamera (Vuforia, Unity)

I'm trying to use Vuforia in Unity to see a model in AR. It works properly when I'm in a room with lots of different colors, but if I go into a room with one single color (for example: white floor, white walls, no furniture), the model keeps disappearing. I'm using Extended Tracking with Prediction enabled.
Is there a way to keep the model on screen whatever background the webcam sees?
Is there a way to keep the model on screen whatever background the webcam sees?
I am afraid this is not possible. Since Vuforia uses markerless tracking, it requires high-contrast feature points.
Since most AR SDKs use only a monocular RGB camera (not RGB-Depth), they rely on computer vision techniques to recover the missing depth information. That means extracting visually distinct feature points and locating the device using the estimated distance to those feature points over several frames as you move.
However, they also leverage sensor fusion, meaning they combine the data gathered from the camera with data from the device's IMU (inertial sensors). Unfortunately, that data is mainly used as a complement when motion tracking fails in situations like excessive motion (when the camera image is blurred). The sensor data by itself is therefore not reliable enough, and that is exactly the case when you walk into a room where there are no distinctive points to extract.
The only way you can solve this is by placing several image targets in that room. That will allow Vuforia to calculate the device position in 3D space. Otherwise this is not possible.
You can also refer to SLAM for more information.
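If you want to see the underlying problem for yourself, a quick way is to count detectable feature points in a frame with OpenCV. This is not Vuforia's internal pipeline, just an illustration of why a textured room gives the tracker something to hold on to while a bare white wall does not; the image file names are hypothetical:

```python
# Illustration only: count ORB feature points in two frames to see why a
# textured scene tracks well and a blank wall does not. This is not what
# Vuforia does internally, just the same general idea.
import cv2

def count_features(image_path: str) -> int:
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints = orb.detect(img, None)
    return len(keypoints)

# Hypothetical file names: one frame of a cluttered room, one of a bare wall.
print("cluttered room:", count_features("cluttered_room.jpg"))
print("bare white wall:", count_features("white_wall.jpg"))
```

On a uniform surface the keypoint count collapses toward zero, which is exactly when the tracker has nothing left to estimate depth from.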

How to find contours of soccer player in dynamic background

My project is the design of a system that analyzes soccer videos. In one part of this project I need to detect the contours of the players and everybody on the playing field. For all players who are not occluded by the advertisement billboards, I have used the color of the playing field (green) to detect contours and extract the players. But I have a problem when players or the referee are occluded by the advertisement billboards. Consider the situation where the advertisements on the billboards are dynamic (LED billboards). As you know, in this situation finding the contours is more difficult because there is no static background color or texture. You can see two examples of this condition in the following images.
NOTE: in order to find the position of the occlusion, I use the region between the field line and the advertisement billboards, because this region has the color of the field (green). This region is shown by a red rectangle in the following image.
I expect the result to be similar to the following image.
Could anyone suggest an algorithm to detect these contours?
You can try several things.
Use the vision.PeopleDetector object to detect people on the field. You can also track the detected people using vision.KalmanFilter, as in the Tracking Pedestrians from a Moving Car example.
Use the vision.OpticalFlow object in the Computer Vision System Toolbox to compute optical flow. You can then analyze the resulting flow field to separate camera motion from player motion.
Use frame differencing to detect moving objects. The good news is that it will give you the contours of the people. The bad news is that it will give you many spurious contours as well.
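As a rough sketch of the frame-differencing idea, here is an OpenCV/Python equivalent of that last suggestion (the MATLAB objects above are one option; this is just an easy-to-prototype alternative, with the video path and thresholds as placeholders to tune for your footage):

```python
# Rough sketch of frame differencing with OpenCV: threshold the absolute
# difference between consecutive frames and extract contours of moving blobs.
# The video path, threshold, and minimum area are placeholders.
import cv2

cap = cv2.VideoCapture("match.mp4")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Absolute difference between consecutive frames highlights motion.
    diff = cv2.absdiff(gray, prev_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)      # close small gaps

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:                 # drop spurious specks
            cv2.drawContours(frame, [c], -1, (0, 0, 255), 2)

    cv2.imshow("moving contours", frame)
    if cv2.waitKey(1) == 27:                         # Esc to quit
        break
    prev_gray = gray

cap.release()
cv2.destroyAllWindows()
```

Expect plenty of spurious contours from the LED billboards themselves; the area filter and some morphological cleanup only go so far, which is why combining this with optical flow analysis is suggested above.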
Optical flow would work for such problems, as it captures motion information. Foreground extraction techniques using HMM, GMM, or non-parametric models may also solve the problem; I have used them for motion analysis in surveillance videos to detect anomalies (where the background was static). The magnitude and orientation of optical flow seem to be an effective method. I have read papers on segmentation using optical flow. I hope this helps.
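Here is a rough sketch of those two ideas with OpenCV: a GMM background model (MOG2) for foreground extraction, and dense Farneback optical flow whose magnitude is compared against the median flow as a crude way of separating player motion from camera motion. The video path and all parameters are assumptions to tune for your footage:

```python
# Sketch of GMM foreground extraction (MOG2) plus dense Farneback optical
# flow. Video path, history length, and thresholds are placeholders.
import cv2
import numpy as np

cap = cv2.VideoCapture("match.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=300, detectShadows=False)

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # GMM foreground mask (works best when the camera is mostly static).
    fg_mask = subtractor.apply(frame)

    # Dense optical flow: per-pixel motion vectors between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Subtracting the median flow is a crude way to remove global camera pan,
    # leaving mostly the players' own motion.
    residual = magnitude - np.median(magnitude)
    motion_mask = (residual > 1.0).astype(np.uint8) * 255

    cv2.imshow("GMM foreground", fg_mask)
    cv2.imshow("flow-based motion", motion_mask)
    if cv2.waitKey(1) == 27:
        break
    prev_gray = gray

cap.release()
cv2.destroyAllWindows()
```

The flow orientation (the angle array) is also available here if you want to cluster motion directions rather than just threshold the magnitude.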

iphone 2d sprites

I am looking at building a 2D game for the iPhone. I am using the cocos2d framework to build the game. However, I am not very good with graphics, so I was hoping there were some good repositories out there for free 2D sprites that are open source. I searched around, but most of the articles are two years old or more. Does anyone have places they go to get 2D graphics for games? Also, could I use 3D graphics in a 2D game? If so, any resources for 3D graphics would be nice too.
I went down this road several times before. It is not fruitful. You will spend a lot of time trying to find free sprites. You amass lots of sprites, none of which really fit what you need in terms of looks, size, transparency, image format, shape, and whatnot. You'll waste time converting, scaling, filtering, and otherwise mangling these images. Still, the end result is nothing but a gross mashup of graphic styles.
As a game programmer with no artist, it's your job to define the size and shape of the images used in your game. An artist can later fill these out perfectly.
You'll be much better off simply using dummy graphics, which need be nothing more than a color gradient, a circle, an X, etc. But at least they'll be the correct size, shape, and format. In particular, size and shape will ultimately define how the game plays. You don't want that to be defined by whatever "free sprites" you can find.

quartz 2d / openGl / cocos2d image distortion in iphone by moving vertices for 2.5d iphone game

We are trying to achieve the following in an iphone game:
Using 2D PNG files, set up a scene that seems 3D. As the user moves the device, the individual PNG files would warp/distort accordingly to give the effect of depth.
example of a scene: an empty room, 5 walls and a chair in the middle. = 6 png files layered.
We have successfully accomplished this using native functions like skew and scale. By applying transformations to the various walls and the chair as the device is tilted or moved, the walls skew/scale/translate. However, the problem is that since we are using 6 PNG files, the edges don't meet as we move the device. We need a new solution using a real engine.
Question:
Instead of applying skew/scale transformations, we are thinking that if we were given the freedom to move the vertices of the rectangular images, we could precisely distort the images and keep all the edges 100% aligned.
What is the best framework to do this in the LEAST amount of time? Are we going about this the correct way?
You should be able to achieve this effect (at least in regards to the perspective being applied to the walls) using Core Animation layers and appropriate 3-D transforms.
A good example of constructing a scene like this can be found in the example John Blackburn provides here. He shows how to set up layers to represent the walls in a maze by applying the appropriate rotation and translation to them, then gives the scene perspective by using the trick of altering the m34 component of the CATransform3D for the scene.
I'm not sure how well your flat chair would look using something like this, but certainly you can get your walls to have a nice perspective to them. Using layers and Core Animation would let you pull off what you want using far less code than implementing this using OpenGL ES.
Altering the camera angle is as simple as rotating the scene in response to shifts in the orientation of the device.
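If you want to see the math behind the m34 trick in isolation, here is a small NumPy sketch (plain matrix arithmetic, not Core Animation code) of how setting that one element of a 4x4 homogeneous transform produces perspective foreshortening; the eye distance of 500 is just an example value:

```python
# Illustration of the m34 perspective trick with plain NumPy, not Core
# Animation itself: a 4x4 homogeneous transform whose m34 element is -1/d
# makes points scale with their z distance after the perspective divide.
import numpy as np

d = 500.0                      # assumed "eye distance", as in m34 = -1/500
transform = np.identity(4)
transform[2, 3] = -1.0 / d     # the m34 element (row-vector convention)

def project(point_xyz):
    """Apply the transform to [x, y, z, 1] and do the homogeneous divide."""
    p = np.array([*point_xyz, 1.0]) @ transform
    return p[:3] / p[3]

# The same point at different depths: nearer (z > 0) ends up further from the
# origin (appears larger), farther (z < 0) ends up closer (appears smaller),
# which is exactly the perspective effect the m34 tweak gives a layer.
for z in (200.0, 0.0, -200.0):
    print(f"z = {z:6.1f} ->", project((100.0, 100.0, z)))
```

Core Animation does this divide for you once m34 is set on the sublayer transform; the sketch just makes visible where the foreshortening comes from.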
If you're going to the effort of warping textures as they would be warped in a 3D scene, then why not let the graphics hardware do the hard work for you by mapping the textures to 3D polygons, then changing your projection or moving polygons around?
I doubt you could do it faster by restricting yourself to 2D transformations; the hardware is geared up to do 3x3 (well, 4x4 homogeneous) matrix multiplication.