I am wondering how to determine the age of a web site (not the age of the host / domain registration) in a robust and universal way.
Take this site as an example:
Most of the time, the age / date (December 21, 2011, in this case) appears on the site, but AFAIK there is no universal way of getting this information from the page (it could be in the page body, in a META tag, in a header...).
If you google the headline, Google will show the age (first result; gray; so Google extracted this information somehow):
http://i.stack.imgur.com/BcXwo.png [I don't have privileges to embed this as an image]
Alongside, there are other sites with the same news (I guess it's from a press agency), and Google shows the age for those as well, but not for the last one, even though the date occurs in its text (first line: Wednesday, December 21, 2011).
Q1) How to determine the age in a universal way?
Q2) How does Google do it? Is it just the time the URL showed up in the index? Why isn't there a date then for the last result?
Q3) If there is no other way than actually getting it from Google, how can that be done automatically for a couple of domains? After a number of automated requests, Google will block / prevent you from sending more requests. I had a look at the Google Custom Search API, but the data does not show up in the results there.
Thanks!
If the server supports it, you can use the Last-Modified header of the HTTP response.
Try: curl -I http://online.wsj.com/article/SB10001424052970204058404577110380555673036.html
to get only the HTTP headers of the reply, and have a look at the output:
HTTP/1.1 200 OK
Date: Wed, 09 May 2012 12:40:10 GMT
Server: Apache/2.2.15 (CentOS)
...
FastDynaPage-ServerInfo: secj2kentwap07 - Wed 05/09/12 - 08:40:10 EDT
Last-Modified: Wed, 09 May 2012 12:40:10 GMT
Content-Type: text/html; charset=UTF-8
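A rough Python equivalent of the curl call, as a sketch only. The function names (fetch_headers, last_modified) are made up for this example. Note that in the output above Last-Modified is identical to Date, which usually means the page is generated on each request; the sketch treats that case as "no useful date", which is an assumption of mine and not a rule.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def fetch_headers(url: str) -> Mapping[str, str]:
    """Send a HEAD request (like `curl -I`) and return the response headers."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
        return dict(resp.headers.items())


def last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the parsed Last-Modified date, or None if the header is missing
    or identical to Date (assumed here to mean a dynamically generated page)."""
    lower = {k.lower(): v for k, v in headers.items()}
    value = lower.get("last-modified")
    if value is None or value == lower.get("date"):
        return None
    return parsedate_to_datetime(value)
```

Usage would be something like last_modified(fetch_headers(url)); a None result means this approach gives no usable age for that page.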
Actually I haven't found a proper way to get the date from the URL. So I took another approach: I try to find a feed (either from the site itself or through Google) that contains that URL as an item.
Then there is a good chance that I'll either get a pubDate or dc:date which contains the date of publication. This is then usable.
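A minimal sketch of the feed lookup in Python, assuming an RSS 2.0 feed that has already been downloaded as a string. The function name publication_date is hypothetical, and finding the feed itself is not covered. RSS 1.0 / RDF feeds use namespaced item elements and would need extra handling.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Dublin Core namespace, used by feeds that carry dc:date instead of pubDate.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def publication_date(feed_xml: str, url: str) -> Optional[str]:
    """Return the raw pubDate or dc:date of the feed item that links to `url`."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link != url:
            continue
        date = item.findtext("pubDate") or item.findtext(DC_NS + "date")
        return date.strip() if date else None
    return None  # no item in this feed links to the URL
```

The date is returned unparsed because pubDate and dc:date use different formats (RFC 822 style versus ISO 8601 style).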
Thanks for all the input.
When making a HTTP(S) request, the response contains the header "Date" with the format day-name, day month year hour:minute:second GMT
I am using Django (3.2) with DjangoRestFramework (3.12) and I want to know if it's possible to change the format of this date.
For example, I want to use this format for my django server: "YYYY-MM-DDTHH:mm:ss"
When using python3.6 requests module
import requests
resp = requests.get('https://stackoverflow.com/')
print(resp.headers['Date'])
# 'Tue, 27 Sep 2022 13:31:25 GMT'
So I think it is not possible to change the format of this date.
But I found a solution to my problem: I added a middleware that sets a new header on my response, with the right format.
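The middleware is not shown in the post, so the following is only a sketch of what it could look like. It follows Django's middleware convention (a class that takes get_response and is callable with the request). The header name X-Date-ISO and the injectable now parameter are assumptions made for this example. The standard Date header is left untouched.

```python
from datetime import datetime, timezone


class IsoDateHeaderMiddleware:
    """Django-style middleware that adds a date header in YYYY-MM-DDTHH:mm:ss."""

    header_name = "X-Date-ISO"  # hypothetical header name, choose your own

    def __init__(self, get_response, now=None):
        self.get_response = get_response
        # `now` can be injected for testing; defaults to the current UTC time.
        self.now = now or (lambda: datetime.now(timezone.utc))

    def __call__(self, request):
        response = self.get_response(request)
        # Django's HttpResponse supports item assignment for setting headers.
        response[self.header_name] = self.now().strftime("%Y-%m-%dT%H:%M:%S")
        return response
```

To use it in Django, the class would be listed in the MIDDLEWARE setting by its dotted path.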
Following this article: http://www.ktskumar.com/2017/01/access-sharepoint-online-using-postman/
I was able to register an app and get a client_id as well as a security token.
Now if I follow the article, I'm able to get an access token by using Postman, SOAP UI as well as by using a REST client in browser. I'm also able to fetch data from SharePoint using this token.
However, I need to do this from a Unix-based middleware, which is able to do HTTP calls as well. I tried everything but I can't get it to work.
Preparation that has been done before:
register new app by using https://.sharepoint.com/sites//_layouts/15/appregnew.aspx
add app and permission to site collection to grant access by using https://.sharepoint.com/sites//_layouts/15/appinv.aspx
After this, I do some webservice calls like this
I try to get an access token by calling https://accounts.accesscontrol.windows.net/<mytenant_id>/tokens/OAuth/2 and got one. I can use this token in every REST client as well as in Postman. So I assume it is a valid one.
Now I try to retrieve the Title of web by calling this URL https://<my_tenant>.sharepoint.com/sites/<site_collection>/_api/web?$select=Title
This always returns a 403, but only when using the middleware system. If I do the same from any other client, it works.
Could someone please enlighten me as to what is going wrong here?
This is what the request header looks like (I've shortened some things):
'cookie'='fpc=...some other stuff; domain=.accounts.accesscontrol.windows.net; path=/; secure; HttpOnly; SameSite=None
x-ms-gateway-slice=prod; path=/; SameSite=None; secure; HttpOnly
stsservicecookie=ests; path=/; SameSite=None; secure; HttpOnly'
'User-Agent'='Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'
'accept'='application/json;odata=verbose'
'Authorization'='Bearer eyJ0eXAiOiJKV1QiLCJhbG... lot following here, but only value of access_token'
This is what the response looks like:
'RESPONSE_HTTP_HEADER_X-ASPNET-VERSION'='4.0.30319'
'RESPONSE_HTTP_HEADER_LAST-MODIFIED'='Tue, 23 Jun 2020 08:10:42 GMT'
'RESPONSE_HTTP_HEADER_X-SHAREPOINTHEALTHSCORE'='1'
'RESPONSE_HTTP_HEADER_X-FORMS_BASED_AUTH_RETURN_URL'='https://<mytenant>.sharepoint.com/_layouts/15/error.aspx'
'RESPONSE_HTTP_HEADER_CACHE-CONTROL'='private, max-age=0'
'RESPONSE_HTTP_DATA'='<?xml version="1.0" encoding="utf-8"?><m:error xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"><m:code>-2147024891, System.UnauthorizedAccessException</m:code><m:message xml:lang="en-US">Access denied. You do not have permission to perform this action or access this resource.</m:message></m:error>'
'RESPONSE_HTTP_HEADER_X-POWERED-BY'='ASP.NET'
'RESPONSE_HTTP_HEADER_DATE'='Tue, 23 Jun 2020 08:10:42 GMT'
'RESPONSE_HTTP_STATUSLINE'='Forbidden'
'RESPONSE_HTTP_HEADER_EXPIRES'='Mon, 08 Jun 2020 08:10:42 GMT'
'RESPONSE_HTTP_HEADER_CONTENT-SECURITY-POLICY'='frame-ancestors 'self' teams.microsoft.com *.teams.microsoft.com *.skype.com *.teams.microsoft.us local.teams.office.com;'
'RESPONSE_HTTP_HEADER_MICROSOFTSHAREPOINTTEAMSERVICES'='16.0.0.20203'
'RESPONSE_HTTP_HEADER_X-MSDAVEXT_ERROR'='917656; Access+denied.+Before+opening+files+in+this+location%2c+you+must+first+browse+to+the+web+site+and+select+the+option+to+login+automatically.'
'RESPONSE_HTTP_HEADER_SPREQUESTGUID'='78265f9f-40b3-b000-f2bb-2df685280534'
'RESPONSE_HTTP_HEADER_STRICT-TRANSPORT-SECURITY'='max-age=31536000'
'RESPONSE_HTTP_HEADER_TRANSFER-ENCODING'='chunked'
'RESPONSE_HTTP_HEADER_MS-CV'='n18meLNAALDyuy32hSgFNA.0'
'RESPONSE_HTTP_HEADER_CONTENT-TYPE'='application/xml;charset=utf-8'
'RESPONSE_HTTP_HEADER_P3P'='CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"'
'RESPONSE_HTTP_HEADER_X-FRAME-OPTIONS'='SAMEORIGIN'
'RESPONSE_HTTP_HEADER_X-IDCRL_AUTH_PARAMS_V1'='IDCRL Type="BPOSIDCRL", EndPoint="/sites/<sitecollection>/_vti_bin/idcrl.svc/", RootDomain="sharepoint.com", Policy="MBI"'
'RESPONSE_HTTP_HEADER_SERVER'='Microsoft-IIS/10.0'
'RESPONSE_HTTP_HEADER_REQUEST-ID'='78265f9f-40b3-b000-f2bb-2df685280534'
'RESPONSE_HTTP_HEADER_X-MS-INVOKEAPP'='1; RequireReadOnly'
'RESPONSE_HTTP_HEADER_X-CONTENT-TYPE-OPTIONS'='nosniff'
'RESPONSE_HTTP_HEADER_X-FORMS_BASED_AUTH_REQUIRED'='https://<mytenant>.sharepoint.com/_forms/default.aspx?ReturnUrl=/_layouts/15/error.aspx&Source=%2f_vti_bin%2fclient.svc%2fweb%3f%24select%3dTitle'
'RESPONSE_HTTP_STATUS'='403'
'RESPONSE_HTTP_HEADER_DATASERVICEVERSION'='3.0'
I also tried it with different HTTP headers, with cookies and without them. Nothing works from the middleware, but everything works from my PC.
Patrick
You could try this way to get authentication:
https://www.c-sharpcorner.com/article/access-sharepoint-online-rest-api-via-postman-with-user-context/
I use some kind of middleware called "Lobster data". This is a software product to map data between different kind of systems. It's comparable to Microsoft BizTalk or others.
However this software uses some special prefix for HTTP header which I was not aware of. Thanks to their support team, I was able to overcome this issue.
Commonly, if you set an HTTP header, you simply use the name of the header you want to add, like "content-type" or "authorization", and pass a value.
When using Lobster, you need to add "REQUEST_HTTP_HEADER_" as a prefix, so it needs to be "REQUEST_HTTP_HEADER_authorization" instead of just "authorization". Otherwise it will not send the data as a HTTP Header.
This is only true when using Lobster and not in general. I wasn't aware that they use this syntax.
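To make the naming rule concrete, here is a small Python sketch. Only the "REQUEST_HTTP_HEADER_" prefix comes from the description above. The function name to_lobster_headers is hypothetical, and how Lobster is actually configured is not shown here, so the dict is just a stand-in for its header settings.

```python
from typing import Dict, Mapping

LOBSTER_PREFIX = "REQUEST_HTTP_HEADER_"


def to_lobster_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Prefix each header name; names that already carry the prefix are kept."""
    return {
        name if name.startswith(LOBSTER_PREFIX) else LOBSTER_PREFIX + name: value
        for name, value in headers.items()
    }
```

For example, {"authorization": "Bearer <token>"} becomes {"REQUEST_HTTP_HEADER_authorization": "Bearer <token>"}.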
There is a problem accessing the To alias from an O365 account IF the From account is also O365. If the From account is, say, Gmail, it works.
If I send an email to alias#mycompany.com, which is an alias for realAccount#mycompany.com, and examine the To header in Outlook, it always shows the original alias. If I view the header programmatically, it does NOT show the alias if the message was sent from an O365 account; instead, it shows the real account. If I run the same test from a Gmail account instead of an O365 account, it works: the To: header shows the alias as expected.
How does Outlook access this data? The number of headers is different too. Outlook contains more data. Has anyone experienced this? Any ideas on how to access the alias like Outlook does?
Header when accessing from Outlook:
From: o365Account#somecompany.com
To: ***************** alias#mycompany.com ****************
Subject: shdaKJSDHA
Thread-Topic: shdaKJSDHA
Thread-Index: AQHUSTkz1fQhzI5SG0ie26mNIvHmmQ==
Date: Mon, 10 Sep 2018 19:05:12 +0000
Message-ID: <---#-----.prod.outlook.com>
Header when accessed programmatically:
From: o365Account#company.com
To: *****************realAccount#mycompany.com ****************
Subject: shdaKJSDHA
Thread-Topic: shdaKJSDHA
Thread-Index: AQHUSTkz1fQhzI5SG0ie26mNIvHmmQ==
X-MS-Exchange-MessageSentRepresentingType: 1
Date: Mon, 10 Sep 2018 19:05:12 +0000
Message-ID: <----#-----.prod.outlook.com>
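To spot differences like these without comparing the dumps by eye, the two header blocks can be diffed with Python's standard email package. This is only an illustrative sketch: diff_headers is a made-up name, and it assumes each block is available as raw text with one value per header name.

```python
from email.parser import HeaderParser
from typing import Dict, Optional, Tuple


def diff_headers(raw_a: str, raw_b: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each lower-cased header name whose value differs to (value_a, value_b).
    A header that is missing on one side shows up as None on that side."""
    a = HeaderParser().parsestr(raw_a)
    b = HeaderParser().parsestr(raw_b)
    names = {n.lower() for n in a.keys()} | {n.lower() for n in b.keys()}
    diff = {}
    for name in sorted(names):
        value_a, value_b = a.get(name), b.get(name)  # lookup is case-insensitive
        if value_a != value_b:
            diff[name] = (value_a, value_b)
    return diff
```

Applied to the two dumps above, this would report the To header and the extra X-MS-Exchange-MessageSentRepresentingType header, among any others that differ.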
Keep in mind that Exchange always resolves the sender and recipients to their primary addresses both when sending and when receiving the messages. This is just the way it works.
Are you sending through SMTP?
You know what I ended up doing? I made a Distro list with 1 recipient. The distro list takes the place of the alias. It always shows the To: as the distro list. That way it doesn't get lost. Thank you for helping me understand Exchange Server dmitry-streblechenko.
Hi, I am trying to find the Date header property in System.Web.HttpRequest, but I cannot find it. This property exists in the System.Net library but not in System.Web.
Any idea where or how to find it?
My request is:
Authorization: blablabla
Date: Wed, 25 Feb 2015 21:36:39 +0000
Host: localhost:57449
Content-Length: 128
Those are two completely different types with very different purposes (System.Web.HttpRequest is for servers; System.Net.HttpWebRequest is for clients). Differences in the design reflect those different uses.
Have you tried looking in HttpRequest's Headers collection?
The code (taken from SO):
// create the logger and log writer
$writer = new Zend_Log_Writer_Firebug();
$logger = new Zend_Log($writer);
// get the wildfire channel
$channel = Zend_Wildfire_Channel_HttpHeaders::getInstance();
// create and set the HTTP response
$response = new Zend_Controller_Response_Http();
$channel->setResponse($response);
// create and set the HTTP request
$channel->setRequest(new Zend_Controller_Request_Http());
// record log messages
$logger->info('info message');
$logger->warn('warning message');
$logger->err('error message');
// insert the wildfire headers into the HTTP response
$channel->flush();
// send the HTTP response headers
$response->sendHeaders();
$this->_redirect('/login/success');
Apparently, none of the messages appear if I use _redirect(); however, if I use something like
$this->getResponse()->setHeader('Refresh', '0; URL=/login/success');
it will work. So my question is:
What should I do to make sure the messages will appear in my Firebug Console (using _redirect())?
Update 1:
In the Net tab, I can see that the messages are in the response headers, but they are not appearing in my Firebug console:
Date Wed, 08 Dec 2010 03:42:15 GMT
Server Apache/2.2.16 (Unix) DAV/2 PHP/5.3.3
X-Powered-By PHP/5.3.3
Expires Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma no-cache
X-Wf-Protocol-1 http://meta.wildfirehq.org/Protocol/JsonStream/0.2
X-Wf-1-Structure-1 http://meta.firephp.org/Wildfire/Structure/FirePHP/FirebugConsole/0.1
X-Wf-1-Plugin-1 http://meta.firephp.org/Wildfire/Plugin/ZendFramework/FirePHP/1.6.2
X-Wf-1-1-1-1 156|[{"Type":"INFO","File":"\/home\/foo\/workspace\/php\/identiti\/application\/modules\/default\/controllers\/LoginController.php","Line":64},"info message"]|
X-Wf-1-1-1-2 159|[{"Type":"WARN","File":"\/home\/foo\/workspace\/php\/identiti\/application\/modules\/default\/controllers\/LoginController.php","Line":65},"warning message"]|
X-Wf-1-1-1-3 158|[{"Type":"ERROR","File":"\/home\/foo\/workspace\/php\/identiti\/application\/modules\/default\/controllers\/LoginController.php","Line":66},"error message"]|
Location /login/success
Content-Length 0
Keep-Alive timeout=5, max=100
Connection Keep-Alive
Content-Type text/html
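For anyone who wants to check such a response outside Firebug, the log messages can be read straight from the X-Wf-1-1-1-N headers. The Python sketch below is based only on the shape visible in the dump above (a length, a pipe, a JSON array of metadata and message, a trailing pipe). It ignores the length field and does not handle messages split across several headers. The function name decode_wildfire_messages is hypothetical.

```python
import json
from typing import List, Mapping, Tuple


def decode_wildfire_messages(headers: Mapping[str, str]) -> List[Tuple[str, str]]:
    """Return (type, message) pairs from X-Wf-1-1-1-N headers, ordered by N."""
    found = []
    for name, value in headers.items():
        if not name.lower().startswith("x-wf-1-1-1-"):
            continue
        index = int(name.rsplit("-", 1)[1])
        # Value looks like: <length>|<json>|  (the length is not validated here)
        _length, _, rest = value.partition("|")
        meta, text = json.loads(rest.rstrip("|"))
        found.append((index, meta["Type"], text))
    return [(msg_type, text) for _, msg_type, text in sorted(found)]
```

With the headers above, this should yield the INFO, WARN and ERROR entries in order, which confirms the server sent them even though the console stays empty.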
Update 2:
Apparently it's a bug, confirmed in the official FirePHP forum. I'll wait until there's a real fix before I answer this question.
Thanks for the detailed test case.
This is a bug in FirePHP Companion.
Working on a fix. Will let you know
when done (ETA Friday).
Thanks! Christoph
Does enabling the "Persist" option in the Firebug Console tab help?
This is the official answer from the author himself:
I have good and bad news. Logging during redirects works now for FirePHP 1.0 + FirePHP Companion. It will not work for the native Zend Framework implementation until early next year.
To get a working solution, please upgrade to FirePHP 1.0: http://upgrade.firephp.org/
Also see: http://www.christophdorn.com/Blog/2010/11/29/firephp-1-0-in-5-steps/
Instructions for logging during redirects:
http://reference.developercompanion.com/#/Tools/FirePHPCompanion/FAQ/#Redirect Messages
I would suggest using the FirePHP 1.0 library in addition to or instead of the ZF components. This will be much improved early next year.
Please let me know if you get this working.