X Ray Scraper: Manipulate data before .write - scrape

I'm fiddling around with some scraping, and need to manipulate some of the data before writing it to my json file.
var Xray = require('x-ray');
var x = Xray();
x('http://myUrl.com', '#search_results div div a', [{
title: '.responsive_search_name_combined .search_name .title',
price: '.col.search_price.responsive_secondrow',
}])
.paginate('.search_pagination_right a.pagebtn:last-child@href')
.limit(10)
.write('data.json');
When saved, price looks like this: "price": "\r\n\t\t\t\t\t\t\t\t13,99€\t\t\t\t\t\t\t".
I guess it's because there's a lot of whitespace in div.col.search_price.responsive_secondrow:
<div class="col search_price responsive_secondrow">
9,99€ </div>
So my question is: Would it be possible to manipulate the data before .write?

Yes, you can pass a callback function instead of calling .write. x-ray invokes it Node-style, with an error as the first argument and the result of your scrape as the second, so you have full control over any post-processing you want to do.
So your code would end up something like:
var fs = require('fs');
x('http://myUrl.com', '#search_results div div a', [{
  title: '.responsive_search_name_combined .search_name .title',
  price: '.col.search_price.responsive_secondrow',
}])
(function(err, products){
  if (err) throw err;
  var cleanedProducts = [];
  products.forEach(function(product){
    var cleanedProduct = {};
    // strip the surrounding whitespace from the scraped price
    cleanedProduct.price = product.price.trim();
    //etc
    cleanedProducts.push(cleanedProduct);
  });
  //write out results.json 'manually'
  fs.writeFile('results.json', JSON.stringify(cleanedProducts), function(err){
    if (err) throw err;
  });
})

You could also use x-ray's natively supported approach, filter functions, which covers the case you described.
Filters are custom functions that let you apply your own logic to scraped values as they are processed.
See the code sample below. It defines a custom filter named cleanUpText and applies it to the scraped price.
var Xray = require('x-ray');
var x = Xray({
filters: {
cleanUpText: function (value) { return value.replace('\r\n\t\t\t\t\t\t\t\t', '').replace('\t\t\t\t\t\t\t', ''); },
}
});
x('http://store.steampowered.com/search/?filter=topsellers', '#search_results div div a', [{
title: '.responsive_search_name_combined .search_name .title ',
price: '.col.search_price.responsive_secondrow | cleanUpText', // calling filter function 'cleanUpText'
}])
.paginate('.search_pagination_right a.pagebtn:last-child@href')
.limit(10)
.write('data.json');
data.json looks like below:
{"title": "PLAYERUNKNOWN'S BATTLEGROUNDS",
"price": "$29.99"},
{"title": "PAYDAY 2: Ultimate Edition",
"price": "$44.98"}

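The cleanUpText filter above depends on the exact number of tabs around the price, so it may stop matching if the page markup changes. Here is a minimal sketch of a more tolerant filter, assuming the only unwanted characters are leading and trailing whitespace. The name trimWhitespace is hypothetical, and the function is shown standalone so it can be tried without x-ray:

```javascript
// Hypothetical filter: removes any leading/trailing whitespace
// (spaces, tabs, \r, \n), however many characters there are.
function trimWhitespace(value) {
  // Non-string values (for example, a selector that matched nothing)
  // are passed through untouched.
  return typeof value === 'string' ? value.trim() : value;
}

var raw = '\r\n\t\t\t\t\t\t\t\t13,99€\t\t\t\t\t\t\t';
console.log(JSON.stringify(trimWhitespace(raw))); // "13,99€"
```

It could be registered under the filters option in the same way as cleanUpText and referenced in the selector as '... | trimWhitespace'.
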
Related

Proper custom component with complex data in it

I have the following interface:
export interface Product {
name: string;
provider: {
name: string;
logo: string;
};
pricePerUnit: {
quantity: number;
currency: string;
};
}
And my rowData looks like this:
rowData = [
{
name: 'Fish',
provider: {
name: 'Amazon',
logo: 'url to amazon logo',
},
pricePerUnit: {
quantity: 5,
currency: 'USD',
},
},
]
So, as you can see, I have at least two complex objects here, and by design I should display provider as an image plus name, and price as quantity plus currency symbol.
I'm using custom Angular components with styling for that.
Actual problem
In order to pass these objects to my custom components, I set the field property in colDefs as follows (example for price):
{
headerName: 'Price',
field: 'pricePerUnit',
cellRenderer: PriceCellRendererComponent,
},
And here is the catch: because I specified a complex object in the field property, I am no longer able to visualize the data using integrated charts. For them to work, the field property has to contain the path to the number itself, like so:
{
field: 'pricePerUnit.quantity',
}
But now I've broken my custom component, because params.value holds just a number and not my complex object. The same goes for provider.
It also broke grouping, sorting and filtering.
The HTML template for one of my custom components (provider) looks like this:
<div class="wrapper provider">
<tui-avatar [avatarUrl]="params.value.logo" class="provider__logo"></tui-avatar>
<div class="provider__name">{{params.value.name}}</div>
</div>
So the question is:
How do I properly set up custom components so that they work with grouping, sorting and filtering, and so that integrated charts use a simple primitive, such as a number, to display the data correctly?

Meteor JS $near Reactive Sorting

I was happy to see that $near support for geospatial indexes was recently added to minimongo in Meteor 0.6.6. However, it doesn't appear that the sorting behavior of $near (it should sort in order of distance) is reactive. That is, when a document is added to the collection, the client loads it, but always at the end of the result list, even if it is closer to the $near coordinate than other documents. When I refresh the page, the order is corrected.
For example:
Server:
Meteor.publish('events', function(currentLocation) {
return Events.find({loc: {$near:{$geometry:{ type:"Point", coordinates:currentLocation}}, $maxDistance: 2000}});
});
Client:
Template.eventsList.helpers({
events: function() {
return Events.find({loc: {$near:{$geometry:{ type:"Point", coordinates:[-122.3943391, 37.7935434]}},
$maxDistance: 2000}});
}
});
Is there a way to get it to sort reactively?
There is nothing special about sorting reactivity for $near queries as opposed to any other query in minimongo: minimongo uses a sorting function based either on the sort specifier passed in your query or on a default sort for queries containing the $near operator.
Minimongo sorts everything and compares the previous order with the new order every time something updates.
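To illustrate the idea of re-sorting and comparing orders on every update, here is a rough standalone sketch. It is not minimongo's implementation: squaredDistance and sortByDistance are hypothetical helpers, and a flat planar distance is assumed purely for ordering nearby points.

```javascript
// Planar squared distance, used only to order points that are close together.
// Illustrative stand-in; not the distance function minimongo uses.
function squaredDistance(a, b) {
  var dx = a[0] - b[0];
  var dy = a[1] - b[1];
  return dx * dx + dy * dy;
}

// Returns a new array of documents ordered by distance from `center`.
function sortByDistance(docs, center) {
  return docs.slice().sort(function (p, q) {
    return squaredDistance(p.coordinates, center) -
           squaredDistance(q.coordinates, center);
  });
}

var center = [0, 0];
var docs = [];
var previousOrder = [];

// Simulate documents arriving one at a time and re-sort after each arrival.
[10, 2, 4].forEach(function (i) {
  docs.push({ name: String(i), coordinates: [i / 1000, i / 1000] });
  var newOrder = sortByDistance(docs, center).map(function (d) { return d.name; });
  console.log('[' + previousOrder.join(',') + '] -> [' + newOrder.join(',') + ']');
  previousOrder = newOrder;
});
```

Each newly inserted document ends up at its distance-ordered position rather than at the end of the list.
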
From your original question, it's unclear what behavior you expect and what you see instead. Just to show that this sorting works reactively, I wrote a mini-app:
html templates:
<body>
{{> hello}}
</body>
<template name="hello">
Something will go here:
{{#each things}}
<p>{{name}}</p>
{{/each}}
</template>
and JS file:
C = new Meteor.Collection('things');
if (Meteor.isClient) {
Template.hello.things = function () {
return C.find({location:{$near:{$geometry:{type: "Point",coordinates:[0, 0]}, $maxDistance:50000}}});
};
}
if (Meteor.isServer) {
Meteor.startup(function () {
C.remove({});
var j = 0;
var x = [10, 2, 4, 3, 9, 1, 5, 4, 3, 1, 9, 11];
// every 3 seconds, insert a point at [i/1000, i/1000] for the next i in x.
var handle = Meteor.setInterval(function() {
var i = x[j++];
if (!i) {
console.log('Done');
Meteor.clearInterval(handle);
return;
}
C.insert({
name: i.toString(),
location: {
type: "Point",
coordinates: [i/1000, i/1000]
}
});
}, 3000);
});
}
What I see right after starting the application and opening the browser: numbers from the x array appear on the screen one by one. Every time a new number arrives, it appears in the correct spot, keeping the sequence sorted at all times.
Did you mean something else by '$near reactive sorting'?

Multiple entry select2 and angular model fetched by $resource

I am having some difficulty figuring out how to make it all work together. Here is what I would like to do:
The model is fetched using $resource from the rest API:
var itemResource = $resource('http://blabla.com/items/:id');
$scope.item = itemResource.get({id: '12345'});
The returned item has some fields among which is one array field that lists the ids of the categories:
{
"item_name": "some value",
"categories": ["cat_id1", "cat_id7", "cat_id8"]
}
In the UI I want these categories to be shown as an editable multi-select. The user should not work with ids; rather, they should see and be able to choose string representations, which come from a mapping within the application. So in HTML:
<input type="text" ui-select2="categoryOptions" ng-model="item.categories" />
and also in controller:
var categoryMapping = [
{id: "cat_id1", text: "CategoryAlpha"},
...
{id: "cat_id8", text: "CategoryOmega"},
...
];
$scope.categoryOptions = {
'multiple': true,
'placeholder': 'Choose categories',
'width': 'element',
'data': categoryMapping,
};
Obviously the pieces of code above are not working and I don't know how to make them work to do what I want. ui-select2 wants the model (item.categories) to be an array of objects {id, text} and I want it to store only the ids in the items in the database and have the mapping separate. I can't be the first one to do it, there must be a solution, please help.
Thanks
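The conversion described above, between the id-only array stored on the item and the {id, text} objects the widget expects, can be sketched as two plain functions. This is only an illustration: idsToOptions and optionsToIds are hypothetical names, the 'cat_id7' entry is made up for the example, and how the functions are wired into ui-select2 (for instance through a separate scope property converted on load and before save) is left open.

```javascript
// Hypothetical mapping, shaped like categoryMapping in the question.
var categoryMapping = [
  { id: 'cat_id1', text: 'CategoryAlpha' },
  { id: 'cat_id7', text: 'CategoryEta' },   // made-up entry for illustration
  { id: 'cat_id8', text: 'CategoryOmega' }
];

// ids -> [{id, text}]: the shape the widget is described as expecting.
function idsToOptions(ids, mapping) {
  return ids.map(function (id) {
    var match = mapping.filter(function (m) { return m.id === id; })[0];
    // Unknown ids fall back to showing the raw id as the text.
    return match || { id: id, text: id };
  });
}

// [{id, text}] -> ids: the shape stored through the REST API.
function optionsToIds(options) {
  return options.map(function (o) { return o.id; });
}

var selected = idsToOptions(['cat_id1', 'cat_id8'], categoryMapping);
console.log(selected.map(function (o) { return o.text; }));
console.log(optionsToIds(selected));
```
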

angularstrap typeahead with json object array is not working

I am using the AngularStrap typeahead directive. It's working fine with simple JSON values, but it's not working when I replace the JSON with my array of JSON objects.
Demo JSON:
typeahead= ["Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut","Delaware","Florida","Georgia"];
<input type="text" ng-model="typeaheadValue" bs-typeahead="typeahead">
The above code is working fine.
My JSON object array:
typeahead = [
{id: 1, name: 'name1', email: 'email1@domain.com'},
{id: 2, name: 'name2', email: 'email2@domain.com'},
{id: 3, name: 'name3', email: 'email3@domain.com'}
];
$scope.typeaheadFn = function(query) {
return $.map($scope.typeahead, function(contacts) {
return contacts;
});
}
<input type="text" ng-model="typeaheadValue" bs-typeahead="typeaheadFn">
Please give me some solution for this.
You want to map your items to a list of strings, I believe.
Try:
$scope.typeaheadFn = function(query) {
return $.map($scope.typeahead, function(contact) {
return contact.name;
});
}
(I should add that I am currently stumped by something similar)
If you have, for example:
items = [
{id: 1, name: 'name1', email: 'email1@domain.com'},
{id: 2, name: 'name2', email: 'email2@domain.com'},
{id: 3, name: 'name3', email: 'email3@domain.com'}
];
You will need:
<input type="text" bs-typeahead ng-model="selectedItem" ng-options="item.name for item in items|orderBy:'name'|filter:{name:$viewValue}:optionalCompareFn">
If you exclude filter from ng-options, matching will be done on every property of the item object, so if you want it done on a single property, add filter:{propName:$viewValue}. If you exclude optionalCompareFn, Angular's default comparison is applied, but you can supply your own (defined on your $scope) with the following signature, where actual is the value of the item property named in the filter, not the whole object, and expected is the text being matched:
$scope.optionalCompareFn = function(actual, expected) { /* compare and return true or false */ };
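As an illustration of such a comparator, here is a minimal sketch of a case-insensitive "starts with" match. The name startsWithIgnoreCase is hypothetical; it assumes AngularJS's documented argument order for filter comparators, (actual, expected), and is written as a plain function so it can be tried outside Angular before being attached to $scope.

```javascript
// Hypothetical comparator: case-insensitive "starts with" match.
// `actual` is the item's property value, `expected` is the text typed so far.
function startsWithIgnoreCase(actual, expected) {
  if (actual == null || expected == null) { return false; }
  var a = String(actual).toLowerCase();
  var e = String(expected).toLowerCase();
  return a.indexOf(e) === 0;
}

console.log(startsWithIgnoreCase('name1', 'NA')); // true
console.log(startsWithIgnoreCase('name1', 'me')); // false
```
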
Attempt 1
I finally got this semi-working after a huge amount of frustration.
An easy way to get your desired text appearing is for each item to have a toString method.
You might have something like
typeaheadData = [
{id: 1, text: "abc", toString: function() { return "abc"; }},
{id: 2, text: "def", toString: function() { return "def"; }}
]
Then you will see the correct text in the options that popup, but the matching won't yet work properly (the items shown by the widget won't match the text the user enters in the box).
To get this working I used the new filter option that's been added in the current git version of angular-strap. Note that it's not even in the pre-built dist/angular-strap.js file in the repository, you will need to rebuild this file yourself to get the new feature. (As of commit ce4bb9de6e53cda77529bec24b76441aeaebcae6).
If your bs-typeahead widget looks like this:
<input bs-typeahead ng-options="item for item in items" filter="myFilter" ng-model="myModel" />
Then the filter myFilter is called whenever the user enters a key. It's called with two arguments, the first being the entire list you passed to the typeahead, and the second being the text entered. You can then loop over the list and return the items you want, probably by checking whether the text matches one or more of the properties of an item. So you might define the filter like this:
var myApp = angular.module('myApp', ['mgcrea.ngStrap'])
.filter('myFilter', function() {
return function(items, text) {
var a = [];
angular.forEach(items, function(item) {
// Match an item if the entered text matches its `text` property.
if (item.text.indexOf(text) >= 0) {
a.push(item);
}
});
return a;
};
});
Unfortunately this still isn't quite right; if you select an item by clicking on it, then the text parameter will be the actual object from the items list, not the text.
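One way to guard against that could be to normalise the second argument before matching. The sketch below is a plain function rather than a registered Angular filter, matchItems is a hypothetical name, and it has not been verified against angular-strap itself; it only shows the idea of accepting either a string or a previously selected item.

```javascript
// Hypothetical variant of the filter body that accepts either the typed
// string or a previously selected item object as the second argument.
function matchItems(items, query) {
  // If an item object was passed in, fall back to its `text` property.
  var needle = (query && typeof query === 'object') ? query.text : query;
  if (typeof needle !== 'string') { return []; }
  return items.filter(function (item) {
    return item.text.indexOf(needle) >= 0;
  });
}

var sampleItems = [
  { id: 1, text: 'abc' },
  { id: 2, text: 'def' }
];
console.log(matchItems(sampleItems, 'ab'));           // matches the item with id 1
console.log(matchItems(sampleItems, sampleItems[1])); // matches the item with id 2
```
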
Attempt 2
I still found this too annoying so I made a fork of angular-strap (https://github.com/amagee/angular-strap) which lets you do this:
typeaheadData = [
{id: 1, text: "abc"},
{id: 2, text: "def"}
]
//...
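$scope.myFormatter = function(id) {
  if (id == null) { return; }
  // A return inside forEach only exits the inner callback, so use a plain loop.
  for (var i = 0; i < typeaheadData.length; i++) {
    if (typeaheadData[i].id === id) {
      return typeaheadData[i].text;
    }
  }
};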
<input bs-typeahead ng-options="item for item in items" ng-model="myModel"
key-field="id" text-field="text" formatter="myFormatter" />
With no need to fuss around with toString methods or filters. The text-field is what the user sees, and the key-field is what is written to the model.
(Nope! Still doesn't work if you update the model without going through the view).
The formatter option is needed so when you set the model value to just the key field, the widget can figure out the right text to display.

How to display JSONModel data using "sap.ui.table.DataTable"?

I'm trying to bring data from an internal table in an ABAP program into an SAPUI5 application.
On the ABAP side, I have converted the required data into JSON format and am sending it across as described in a guide.
I have written the following code in the 'Read' section of the controller:
$.ajax({
type: 'GET',
url: 'http://socw3s1er67.solutions.glbsnet.com:8000/sap/bc/Z_tets_json?sap-client=950',
success: function(data) {
alert(data[1].PROJECT);
alert(data[0].MANDT);
var oModel_Projects = new sap.ui.model.json.JSONModel();
oModel_Projects.setData({ modelData: data });
}
});
Sample JSON response from the server:
[
{
"MANDT": "PJ1",
"PROJECT": "Test Project1",
"DESCRIPTION": ""
},
{
"MANDT": "PJ2",
"PROJECT": "Test Project2",
"DESCRIPTION": ""
}
]
The alerts seem to be working fine and are returning expected data from the internal tables.
I want to know: how to bind this data into a particular table using models?
Well, your code looks OK, but other parts are missing, and the problem might be there.
How is your table constructed?
It should be:
var table = new sap.ui.table.DataTable({
title: 'My first table',
width: '100%'
});
Do you make the following calls to connect the table and the model?
table.setModel(oModel_Projects);
table.bindRows("/modelData");
Are you creating the table's columns properly?
var label = new sap.ui.commons.Label({ text: 'Client' });
var template = new sap.ui.commons.TextView({ text: '{MANDT}' });
var col = new sap.ui.table.Column({ label: label, template: template });
table.addColumn(col);
Is the table properly placed into the HTML using the placeAt method?
Updated the answer after your clarification:
Try this parser:
var html = "<table><tr><td>MANDT</td><td>PROJECT</td><td>DESCRIPTION</td></tr>";
for(var index in data){
html+="<tr><td>"+data[index].MANDT+"</td><td>"+data[index].PROJECT+" </td><td>"+data[index].DESCRIPTION+"</td></tr>";
}
html+="</table>";
// now you can insert this HTML into some div, for example: $("#div1").html(html);
Or search for a jQuery grid view plugin, as I suggested in the comment.
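Because the snippet above concatenates server values straight into markup, any '<' or '&' in the data would be interpreted as HTML. Here is a minimal sketch of the same loop with escaping added. escapeHtml and buildTableHtml are hypothetical helper names, and the column list is assumed from the sample response in the question.

```javascript
// Hypothetical helper: escapes characters that are special in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Builds the same table markup as the loop above, one row per record.
function buildTableHtml(rows) {
  var columns = ['MANDT', 'PROJECT', 'DESCRIPTION'];
  var html = '<table><tr>' + columns.map(function (c) {
    return '<td>' + c + '</td>';
  }).join('') + '</tr>';
  rows.forEach(function (row) {
    html += '<tr>' + columns.map(function (c) {
      // Missing fields are rendered as empty cells.
      return '<td>' + escapeHtml(row[c] == null ? '' : row[c]) + '</td>';
    }).join('') + '</tr>';
  });
  return html + '</table>';
}

var sampleRows = [
  { MANDT: 'PJ1', PROJECT: 'Test Project1', DESCRIPTION: '' },
  { MANDT: 'PJ2', PROJECT: 'Test <Project2>', DESCRIPTION: '' }
];
console.log(buildTableHtml(sampleRows));
```

The returned string can then be inserted into a div in the same way as in the comment above.
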