I'm currently working on a project involving bots that crawl Facebook to build a "neural map" of the connections between users and pages, and I have some worries about legality.
Considering:
- no information is saved or distributed from profiles
- the bots are polite (no spam, human-like delays between requests, etc.; see the sketch after this list)
- the final outcome will be more of a visual representation of the network
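By "polite" I mean something along these lines, a minimal Python sketch (the delay range, user-agent string, and URL are just placeholders, not anything Facebook prescribes):

```python
import random
import time

import requests  # third-party HTTP library; assumed available

MIN_DELAY, MAX_DELAY = 5.0, 15.0        # seconds between requests; assumed "human-like" values
USER_AGENT = "research-graph-bot/0.1"   # hypothetical identifier

def polite_get(session, url):
    """Fetch one page, then wait a randomized delay before the caller may fetch the next."""
    response = session.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
    time.sleep(random.uniform(MIN_DELAY, MAX_DELAY))
    return response

if __name__ == "__main__":
    with requests.Session() as session:
        page = polite_get(session, "https://example.com/some-public-page")  # placeholder URL
        print(page.status_code)
```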
I've done a lot of research around this problem, but every question/answer I find revolves around saving data and making it publicly available, whereas I neither save nor distribute any data.
Could I get into hot water for such a project? Or, at worst, would I just be asked to stop?
Thanks a lot!
I am pretty sure that if you look carefully at any friend's timeline you can predict much of what is going on in his or her life; you could practically write their whole biography, and even uncover facts they never stated directly but revealed indirectly through the things they shared and liked. Is it possible to build an automated system that reads and analyzes a friend's entire Facebook profile (shared posts, likes, comments, etc.) and produces a report exposing the facts of their life, including the hidden ones, using AI or machine learning?
There's no system that will automatically give you the kind of content and understanding you're looking for. The human mind can infer a lot that computers simply can't. Also, you (generally) know things about these people outside of Facebook, since you are friends with them, and that fills in a lot of detail that an analysis system won't have.
The best thing you can do is clearly define the problem and the question you're asking. There was a 'gaydar' project at MIT that looked at networks of students and was able, in aggregate, to correlate which ones were gay. For a large group you'll find it works overall, but for an individual person you won't get great certainty.
Simply asking the computer to 'find hidden information' won't work; you need a fairly solid model to work with. You're also probably going to need a lot of data with confirmed facts (thousands of labelled points) just to start testing that model. Finally, any social network contains a lot of inaccurate or fake data: people mis-list things all the time for various reasons (humor, etc.), and this will throw off your models.
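To give a concrete sense of what "a solid model plus confirmed labels" means in practice, here is a minimal sketch in Python with scikit-learn (the CSV file, feature columns, and predicted attribute are all hypothetical; this is not how the MIT project worked):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one row per person, features derived from their network
# (e.g. fraction of friends with attribute X, counts of liked pages per category),
# plus a manually confirmed label. Thousands of labelled rows are needed.
data = pd.read_csv("labelled_profiles.csv")                            # hypothetical file
features = data[["friend_fraction_x", "likes_cat_a", "likes_cat_b"]]   # assumed columns
labels = data["confirmed_attribute"]                                   # assumed column

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Aggregate accuracy may look fine even when individual predictions are unreliable,
# which is exactly the caveat mentioned above.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```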
While brainstorming about six years ago, I had what I thought was a great idea: in the future there could be webservice standards and DTDs that effectively turn the web into a decentralized knowledgebase. I listed several areas where I thought this could be applied, one of which was:
For making data available directly from a business's website: open hours, locations, and contact phone numbers. Suggest a web service standard by which businesses expose a standard URL, extended off the main (base) URL of their website, at which a web service is located. That web service in turn offers a standardized set of operations for downloading a list of their locations, contact telephone numbers, and business hours.
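To make the idea concrete, the standard I had in mind might have looked roughly like this (a sketch only; the path `/business-info.json` and the field names are entirely hypothetical, no such standard exists):

```python
import json
from urllib.request import urlopen

# Hypothetical convention: every business site serves machine-readable basics
# at a well-known path relative to its base URL.
BASE_URL = "https://example-business.com"   # placeholder
WELL_KNOWN_PATH = "/business-info.json"     # hypothetical standard path

def fetch_business_info(base_url):
    """Download the (hypothetical) standardized business-info document."""
    with urlopen(base_url + WELL_KNOWN_PATH) as response:
        return json.load(response)

# Imagined shape of the document under such a standard:
# {
#   "locations": [{"address": "...", "phone": "...", "hours": {"mon": "9-17", ...}}]
# }
```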
It's interesting looking back at these notes now, since this is not how things have evolved. Instead of businesses putting this information only on their own website and letting any search engine or other data aggregator crawl it, they are updating it separately on their website, their Facebook page, and Google Maps. Facebook and Google Maps, due to their popularity, have become the solution to the problem I thought my idea would solve.
Is the way things are better than the way I thought they could be? If so, why doesn't my idea fit reality? If not, what's holding my idea back from being realized?
A lot of this information is available via APIs; that doesn't mean it doesn't get put in other places as well, through a variety of means. For example, a company may expose information via an API, and their Facebook app might use that API to populate their Facebook page.
Also, various microformats are in use that encapsulate some of this information.
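As a rough illustration, the h-card family of microformats marks up contact details with class names that any crawler can pick out of the page. A minimal extraction sketch in Python with BeautifulSoup (the sample HTML snippet is made up):

```python
from bs4 import BeautifulSoup  # third-party HTML parser; assumed available

# Made-up snippet using h-card-style class names to mark up business details.
html = """
<div class="h-card">
  <span class="p-name">Example Bakery</span>
  <span class="p-tel">+1-555-0100</span>
  <span class="p-street-address">1 Main St</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find(class_="h-card")
info = {
    "name": card.find(class_="p-name").get_text(strip=True),
    "phone": card.find(class_="p-tel").get_text(strip=True),
    "address": card.find(class_="p-street-address").get_text(strip=True),
}
print(info)
```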
The biggest obstacle is agreeing on what meta-information should be exposed, how it should be exposed, and how it should be accessed.
What is the purpose of CDN service providers?
My guess is that large-scale sites like Facebook, Wikipedia, YouTube, etc. use CDN service providers as a kind of outsourcing.
My understanding:

YouTube keeps its content in these CDNs, while the site itself focuses on algorithms such as video search, suggesting related videos, keeping users' subscriber lists and playlists, etc.

The YouTube site only keeps metadata and indexes (or maybe it also holds one copy of its entire content?). The user connects to the YouTube site and searches for a video; the site finds the file name and sends it to the CDN hub along with the user's IP address.

The CDN hub then, perhaps, locates the CDN node closest to the user and serves the content from there.
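A toy sketch of the "pick the closest node" step I'm imagining (Python; the node list and coordinates are invented, and real CDNs use DNS/anycast and network measurements rather than raw geographic distance):

```python
import math

# Invented example nodes: (name, latitude, longitude).
CDN_NODES = [
    ("us-east", 39.0, -77.5),
    ("eu-west", 53.3, -6.3),
    ("ap-south", 19.1, 72.9),
]

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula."""
    r = 6371.0  # Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def closest_node(user_lat, user_lon):
    """Return the node with the smallest great-circle distance to the user."""
    return min(CDN_NODES, key=lambda n: distance_km(user_lat, user_lon, n[1], n[2]))

print(closest_node(48.9, 2.4))  # a user near Paris -> the "eu-west" node
```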
What is the advantage of this approach?
The most important advantage I can see is that, especially for video, streaming from a server in the same country is presumably much faster than streaming from across the globe.
But does distance really matter that much? Are there any concrete numbers that give a sense of the speed difference between getting a video from across the globe versus from the same country?
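As a rough back-of-envelope sketch (not measurements): light in optical fibre travels at roughly 200,000 km/s, so distance translates directly into round-trip time, and round-trip time caps what a single TCP connection can deliver for a given window size. The figures below are illustrative only:

```python
# Back-of-envelope: how distance alone affects round-trip time and TCP throughput.
FIBRE_SPEED_KM_S = 200_000   # ~2/3 of the speed of light in vacuum
WINDOW_BYTES = 64 * 1024     # a classic 64 KiB TCP receive window (illustrative)

def rtt_ms(distance_km):
    """Minimum round-trip time over fibre for the given one-way distance."""
    return 2 * distance_km / FIBRE_SPEED_KM_S * 1000

def max_throughput_mbit(distance_km):
    """Ceiling of one window per round trip (ignores loss, routing, etc.)."""
    return WINDOW_BYTES * 8 / (rtt_ms(distance_km) / 1000) / 1e6

for km in (500, 12_000):     # roughly: same country vs. across the globe
    print(f"{km:>6} km: RTT >= {rtt_ms(km):5.1f} ms, "
          f"<= {max_throughput_mbit(km):6.1f} Mbit/s per connection")
```

With these assumptions, 500 km gives an RTT floor of about 5 ms (roughly 100 Mbit/s per connection), while 12,000 km gives about 120 ms (only a few Mbit/s per connection), which is why serving video from nearby matters.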
Also, Google doesn't want to install its own storage nodes all over the world; it would rather outsource that to CDN service providers, which have already spread their nodes across the globe, while Google focuses on the algorithms (which it mostly keeps secret).
Is my understanding of the picture correct? Any input/pointers would be highly useful.
Thanks,
I learned about the importance of CDNs for website performance a couple of years back, thanks to Yahoo's "Best Practices for Speeding Up Your Web Site".
This is oft-referenced in YSlow, and Yahoo estimated a 20% speed increase.
Another "benefit" is parallel downloading, which is discussed at length by one of the above authors in this blog post.
These are some resources that I ran into when looking into site optimization so I just thought I'd share. Besides that, you seem to have a good grasp on the concept.
I've searched the web for this to no avail; I hope someone can point me in the right direction. I'm happy to look things up, but it's knowing where to start that's the problem.
I am creating an iPhone app which takes content updates from a webserver and will also push feedback there. Whilst the content is obviously available via the app, I don't want the source address to be discovered and published by some unhelpful person so that it all becomes freely available.
I'm therefore looking at placing the content in a MySQL database and possibly writing some PHP routines to provide access for my HTTP(S) requests. That's all pretty new to me, but I can probably do it. However, I'm not sure where to start with the security question; something simple and straightforward would be great. Also, any guidance on whether to stick with the XML parser I currently have or to switch to JSON would be much appreciated.
The content consists of straightforward data but also HTML and images.
Doing exactly what you want (preventing 'unauthorized' apps from getting access to this data) is rather difficult because, at the end of the day, any access codes and/or URLs have to be stored in your app for someone to dig up and exploit.
If you can, consider authenticating against the USER, not the app. That way, even if a third-party app is created that can access this data from wherever you store it, you can still disable access on a per-user basis.
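A minimal sketch of what per-user tokens look like on the server side (Python standard library for brevity; your PHP backend would do the same thing, and the token table, header name, and port here are all hypothetical):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In reality this lives in the MySQL database; revoking a user = deleting/flagging a row.
USER_TOKENS = {"token-for-alice": "alice", "token-for-bob": "bob"}
DISABLED_USERS = {"bob"}  # per-user kill switch

class ContentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        token = self.headers.get("X-Auth-Token", "")
        user = USER_TOKENS.get(token)
        if user is None or user in DISABLED_USERS:
            self.send_response(403)   # unknown token or disabled user
            self.end_headers()
            return
        body = json.dumps({"user": user, "content": "latest updates here"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ContentHandler).serve_forever()
```

The app would send the user's token with every request (over HTTPS), and you can revoke any single user without affecting the rest.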
Like everything in the field of information security, you have to consider the cost-benefit trade-off. Weigh the value of your data against the cost of your security, both in terms of actual development and ongoing protection, and against the cost of inconveniencing users to the point that you can't sell your data at all.
Good luck!
How does one build a directory of 'Spots' for users to check in to in a native iPhone app? Or does the developer borrow data from, let's say, Google Maps?
When you use data obtained from another network or source, you take the risk that the data may change, may not be accurate, or may cease to exist altogether (more so with Google, LOL: one minute they're there like gangbusters, the next they're gone, no explanation, no apologies, just missing in action). If you're developing an application for a business, it's always best to use your own data sources.
That may be more expensive, but it's the only way you will have any kind of control over your application's resources.
You can go either way; it depends on what you want to do and how you design it. You can ship a prerecorded, static database of spots; you can update it occasionally by connecting to a server; or you can do it all dynamically by loading the data from the internet each time.
Which one to choose? First, design your app with questions like these in mind (a small caching sketch follows the list):

- How much of this data will change?
- How frequently will those changes happen?
- How much will it cost to do an update?

and so on.
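As a middle ground between a fully static database and loading everything live, one common pattern is to keep a local copy and refresh it only when it is stale. A rough sketch (Python for brevity; on the iPhone this would be Objective-C, and the file name, URL, and refresh interval are made up):

```python
import json
import os
import time
from urllib.request import urlopen

CACHE_FILE = "spots_cache.json"                 # hypothetical local copy
SPOTS_URL = "https://example.com/spots.json"    # hypothetical server endpoint
MAX_AGE_SECONDS = 24 * 60 * 60                  # refresh at most once a day (assumed)

def load_spots():
    """Return the spots list, refreshing the local cache only when it is stale."""
    if os.path.exists(CACHE_FILE):
        age = time.time() - os.path.getmtime(CACHE_FILE)
        if age < MAX_AGE_SECONDS:
            with open(CACHE_FILE) as f:
                return json.load(f)
    with urlopen(SPOTS_URL) as response:        # fetch a fresh copy from the server
        spots = json.load(response)
    with open(CACHE_FILE, "w") as f:
        json.dump(spots, f)
    return spots
```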
Developing your own database of places is likely to be quite an undertaking (and your competitors have a big head start). Google is beginning to provide their Places API for "check-in" style applications, so you may be able to get in on their beta.