How to list objects in Google Cloud Storage from PHP - google-cloud-storage

I am trying to list objects in a folder within a Google Cloud Storage bucket. I can get a result with 1000 objects easily (or increase the number if I want) using the following code:
$names = [];
$bucket = $client->bucket('mybucketname');
$options = ['prefix' => 'myfoldername', 'fields' => 'items/name,nextPageToken'];
$objects = $bucket->objects($options);
foreach ($objects as $object) {
    $names[] = $object->name();
}
So far so good, but now I want to get the next 1000 objects (or whatever limit I set using maxResults and resultLimit), making use of the nextPageToken I requested in the fields. I know that I have to do this by specifying pageToken as an option; it's just that I have no idea how.
I expect my final code will look something like this - what I need is the line of code which retrieves the next page token.
$names = [];
$bucket = $client->bucket('mybucketname');
$options = ['prefix' => 'myfoldername', 'fields' => 'items/name,nextPageToken'];
while (true) {
    $objects = $bucket->objects($options);
    foreach ($objects as $object) {
        $names[] = $object->name();
    }
    $nextPageToken = $objects->getNextPageTokenSomehowOrOther(); // #todo Need help here!!!!!!!
    if (empty($objects) || empty($nextPageToken)) {
        break;
    }
    $options['pageToken'] = $nextPageToken;
}
Any ideas?

The nextPageToken is the name of the last object returned by the first request, encoded in Base64.
Here we have an example from the documentation:
{
  "kind": "storage#objects",
  "nextPageToken": "CgtzaGliYS0yLmpwZw==",
  "items": [
    objects Resource
    …
  ]
}
If you decode the value "CgtzaGliYS0yLmpwZw==", it reveals the value "shiba-2.jpg".
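You can verify this with a one-liner; note the decoded bytes carry two non-printable prefix bytes before the object name:
var_dump(base64_decode('CgtzaGliYS0yLmpwZw=='));
// string(13) - the last 11 bytes spell out "shiba-2.jpg"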
Here we have the definition of pageToken from the API documentation:
The pageToken is an encoded field that marks the name and generation of the last
object in the returned list. In a subsequent request using the pageToken, items
that come after the pageToken are shown (up to maxResults).
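For the PHP client specifically, note that the objects() call in the question already returns an iterator that follows nextPageToken behind the scenes, so the manual while loop is usually unnecessary. Here is a minimal sketch, assuming the official google/cloud-storage package and that your installed client version exposes nextResultToken() on the iterator (check your version):

<?php
require 'vendor/autoload.php';

use Google\Cloud\Storage\StorageClient;

$client = new StorageClient(); // project and credentials resolved from the environment
$bucket = $client->bucket('mybucketname');

$names = [];
$objects = $bucket->objects([
    'prefix'     => 'myfoldername',
    'maxResults' => 1000, // page size of each underlying API request
    'fields'     => 'items/name,nextPageToken',
]);

foreach ($objects as $object) {
    // The iterator requests the next page automatically once the
    // current one is exhausted, so this loop sees every object.
    $names[] = $object->name();
}

// If you need the raw token anyway (e.g. to resume in a later request),
// it is exposed on the iterator; pass it back in as 'pageToken'.
$token = $objects->nextResultToken(); // null once all pages are consumed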
References:
https://cloud.google.com/storage/docs/json_api/v1/objects/list#parameters
https://cloud.google.com/storage/docs/paginate-results#rest-paginate-results
See ya

Related

Iterate through CSV and create an array

I am a newbie to PowerShell and need logic for CSV automation. I have a CSV log file that contains a large number of API calls.
I need to go row by row and segregate the data; the output should be a summary with the sum of call counts and the average response time per scenario.
I have written complicated if/else conditions for the different types of API calls and am able to take the scenario name and other values from the CSV. My pain starts here: I am struggling to decide how to move forward. Can I create an array, store all the values, and do the calculations later, or should I write the values to another CSV and then do the calculations to find the count and average response time?
If I choose an array, scenarios should not be duplicated. It's really hard to take a decision without knowing the available cmdlets for arrays and CSV. Please throw some light.
Thanks in advance.
Here is an approach using a combination of C# classes available to PowerShell (which can be MUCH more efficient when handling larger files and data).
The first component: you need some consistent logic to isolate the API category each URL should be assigned to. From your screenshots, sometimes you seem to use the last segment of the URL, but other times it is some path in the middle of the resource.
Here is just a quick approach where you pass in an array of categories; if one can be matched to the URI in any way, that category is used. Otherwise, the URI stands as its own category. Replace this with whatever logic you want.
function Get-ApiCategory {
    param([string[]] $Categories, [string] $Text)
    foreach ($c in $Categories) {
        # -ge 0 so a category matching at the very start of the string counts too
        if ($Text.IndexOf($c) -ge 0) {
            return $c
        }
    }
    return $Text # Not found; the URI becomes its own category
}
Then, here is a method that (1) reads the large CSV file row by row, using basic parsing logic (since your source data seems simple enough) without loading the full file into memory, and then (2) exports a CSV file with the summary data.
function Write-SummaryToFile {
    param([string[]] $Categories, [string] $InputFile, [string] $Output)

    # Parse the file line-by-line (optimized for memory)
    $result = @{}
    $lineNum = 0
    Write-Host $InputFile
    foreach ($line in [System.IO.File]::ReadLines($InputFile)) {
        if ($lineNum++ -lt 1) { continue } # Skip header
        $cols = $line.Split(',')
        $category = Get-ApiCategory $Categories $cols[0]
        $new = @{
            Category = $category
            Count = [int]$cols[1]
            AvgResponse = [double]$cols[2]
        }
        if ($result.ContainsKey($category)) {
            # Maintain a running weighted average of the response time
            $weighted = $result[$category].AvgResponse * $result[$category].Count
            $result[$category].Count += $new.Count
            $result[$category].AvgResponse = ($weighted + $new.AvgResponse * $new.Count) / $result[$category].Count
        } else {
            $result[$category] = $new
        }
    }

    # Output to file
    if (Test-Path $Output) { Remove-Item $Output }
    try {
        $stream = [System.IO.StreamWriter] $Output
        $stream.WriteLine('Scenario,Count,Avg_Response_Time')
        $result.Values | ForEach-Object {
            $stream.WriteLine([string]::Format("{0},{1},{2}", $_.Category, $_.Count, $_.AvgResponse.ToString("0.##")))
        }
    }
    finally {
        $stream.Dispose()
    }
}
Then you are able to call these methods, for example like this:
$categories = @('MoveRequestQueue', 'DeliveryDate')
Write-SummaryToFile $categories 'c:\dev\scratch\ps1\test.csv' 'C:\dev\scratch\ps1\Output.csv'

TYPO3 ConnectionPool: find a file by the uid of the file reference and update data

The concept is that, after a successful save of my object, a hook should update a text field in the database. Let's call the field 'succText'. The table I would like to access is sys_file, but I only get the sys_file_reference id when I save the object. So I thought I could use the ConnectionPool to select the sys_file row for this file reference and then insert the data into the field 'succText'.
I tried this:
public function processDatamap_preProcessFieldArray(array &$fieldArray, $table, $id, \TYPO3\CMS\Core\DataHandling\DataHandler &$pObj) {
    $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)->getQueryBuilderForTable('sys_file_reference');
    $findItemsId = $queryBuilder
        ->select('*')
        ->from('sys_file_reference')
        ->join(
            'sys_file_reference',
            'sys_file',
            'reference',
            $queryBuilder->expr()->eq('reference.uid', $queryBuilder->quoteIdentifier('uid_local'))
        )
        ->where(
            $queryBuilder->expr()->eq('uid_local', $queryBuilder->createNamedParameter($fieldArray['downloads'], \PDO::PARAM_INT))
        )
        ->execute();
}
But this gives me back the sys_file_reference id, not the id and field values of the sys_file table.
As for the update, I haven't tried it yet because I haven't figured out how to get the row that needs to be updated. I guess it needs a subquery after the row is found; I don't really know.
The processDatamap_preProcessFieldArray hook is going to be renamed to the post-processing one; I only have it this way in order to see the results in the backend.
Thanks in advance,
You might want to make use of the FileRepository class here.
$fileRepository = GeneralUtility::makeInstance(\TYPO3\CMS\Core\Resource\FileRepository::class);
$fileObjects = $fileRepository->findByRelation('tablename', 'fieldname', $uid);
Where $uid is the ID of the record that the files are connected to via file reference.
You will get back an array of file objects to deal with.
I resolved my problem by removing the first code and adding a FileRepository instance instead.
$fileRepository = GeneralUtility::makeInstance(FileRepository::class);
$fileObjects = $fileRepository->findByRelation('targetTable', 'targetField', $uid);
VERY IMPORTANT!
If you are creating a new element, TYPO3 assigns a temporary UID that looks like NEW45643476. In order to get the real $uid inside processDatamap_afterDatabaseOperations, you need to add this code before you get the FileRepository instance:
if (GeneralUtility::isFirstPartOfStr($uid, 'NEW')) {
    $uid = $pObj->substNEWwithIDs[$uid];
}
Now, as far as the text is concerned, I extracted it from a PDF. First I had to get the basename of the file in order to find its storage location and its name. Since I have only one file I don't really need a foreach loop and could use [0] as well. So the code looked like this:
$fileID = $fileObjects[0]->getOriginalFile()->getProperties()['uid'];
$fullPath[] = [
    PathUtility::basename($fileObjects[0]->getOriginalFile()->getStorage()->getConfiguration()['basePath']),
    PathUtility::basename($fileObjects[0]->getOriginalFile()->getIdentifier())
];
This gives me back an array looking like this:
array(1 item)
   0 => array(2 items)
      0 => 'fileadmin' (9 chars)
      1 => 'MyPdf.pdf' (9 chars)
Now I need to save the text from every page in a variable, so the code looks like this:
$getPdfText = '';
foreach ($fullPath as $file) {
    $parser = new Parser();
    $pdf = $parser->parseFile(PATH_site . $file[0] . '/' . $file[1]);
    $pages = $pdf->getPages();
    foreach ($pages as $page) {
        $getPdfText .= $page->getText();
    }
}
Now that I have my text, I want to add it to the database so I can use it in my search action. I use the ConnectionPool again, this time to update the file row in sys_file.
$queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)->getQueryBuilderForTable('sys_file');
$queryBuilder
    ->update('sys_file')
    ->where(
        $queryBuilder->expr()->eq('uid', $queryBuilder->createNamedParameter($fileID))
    )
    ->set('pdf_text', $getPdfText)
    ->execute();
Now, every time I choose a PDF from my extension, its text is saved in the database.
EXTRA CONTENT
If you want to include the PDFParser as well and you are in composer mode, then add this to your composer.json:
"smalot/pdfparser" : "*"
and this to the autoload section:
"Smalot\\PdfParser\\" : "Packages/smalot/pdfparser/src/"
Then, under yourExtension/Classes/Hooks/DataHandler.php, add the namespace import:
use Smalot\PdfParser\Parser;
Now you are able to use the getPages() and getText() functions.
The Documentation
If I missed something, let me know and I will add it.

How to get images in the product list view with an API call in Shopware?

I'm working on a single-page application with a fully client-side rendered frontend (React) and Shopware acting as a headless CMS in the background. All product data will be pulled from the API, and checkout will also use the API to send the data.
My problem is the following:
When trying to render the product list page, I call the articles endpoint which returns the basic info of all my products. The problem is that I would also need to render the main image associated with each product and this list does not expose any data from the media attached to the products.
I can get the media item(s) for a product using the media endpoint; the problem is that it expects a media ID, which I cannot get from the listed representations of the articles.
So right now the only way I see to get the images is to first make a call that gets all the products, then loop through them and fetch each product's details by calling the article endpoint again with each product ID; then, once I have the media IDs, loop through those and get the images for each product using the media endpoint, because that's the only one that exposes the actual image path in the response. This seems way too complicated and slow.
Is there a smarter way to do this? Can I get Shopware to output, in the article list response, the path of the first image associated with each product?
Also I saw that the paths to the images look like this:
/media/image/f5/fb/95/03_200x200.jpg
The first part up to /media/image/ is fixed, that's straightforward, and I get the image's file name and extension in the media object of the article detail response.
The problem is that I don't know what the f5/fb/95/ part stands for. If I could get this info from the details, I could assemble the image URL programmatically and wouldn't need the call to the media endpoint.
You can see how the path is generated in the class \Shopware\Bundle\MediaBundle\Strategy\Md5Strategy of Shopware's source code. The two relevant methods are:
<?php
public function normalize($path)
{
    // remove filesystem directories
    $path = str_replace('//', '/', $path);

    // remove everything before /media/...
    preg_match("/.*((media\/(?:archive|image|music|pdf|temp|unknown|video|vector)(?:\/thumbnail)?).*\/((.+)\.(.+)))/", $path, $matches);

    if (!empty($matches)) {
        return $matches[2] . '/' . $matches[3];
    }

    return $path;
}

public function encode($path)
{
    if (!$path || $this->isEncoded($path)) {
        return $this->substringPath($path);
    }

    $path = $this->normalize($path);
    $path = ltrim($path, '/');
    $pathElements = explode('/', $path);
    $pathInfo = pathinfo($path);
    $md5hash = md5($path);

    if (empty($pathInfo['extension'])) {
        return '';
    }

    $realPath = array_slice(str_split($md5hash, 2), 0, 3);
    $realPath = $pathElements[0] . '/' . $pathElements[1] . '/' . join('/', $realPath) . '/' . $pathInfo['basename'];

    if (!$this->hasBlacklistParts($realPath)) {
        return $realPath;
    }

    foreach ($this->blacklist as $key => $value) {
        $realPath = str_replace($key, $value, $realPath);
    }

    return $realPath;
}
Step by step, this is what happens to an image path, for example https://example.com/media/image/my_image.jpg, upon passing it to the encode method (a condensed sketch follows this list):
1. The normalize method strips everything except for media/image/my_image.jpg.
2. An md5 hash is generated from the resulting string: 5dc18cdfa0...
3. The resulting string from step 1 is split at every /: ['media', 'image', 'my_image.jpg']
4. The first three character pairs of the md5 hash are stored in an array: ['5d', 'c1', '8c']
5. The final path is assembled from the arrays of steps 3 and 4: media/image/5d/c1/8c/my_image.jpg
6. Every occurrence of /ad/ is replaced by /g0/. This is done because some ad-blockers block requests to URLs containing /ad/.
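If you wanted to precompute those URLs yourself, a condensed sketch of the steps above in plain PHP could look like the following (illustration only; the real Md5Strategy also normalizes the incoming path and drives the /ad/ replacement from a class-level blacklist):

<?php
// Condensed re-implementation of the encode() steps described above.
function encodeMediaPath(string $path): string
{
    $path = ltrim($path, '/');                    // 'media/image/my_image.jpg'
    $pathElements = explode('/', $path);          // ['media', 'image', 'my_image.jpg']
    $hashDirs = array_slice(str_split(md5($path), 2), 0, 3); // first three hash pairs

    $realPath = $pathElements[0] . '/' . $pathElements[1] . '/'
        . implode('/', $hashDirs) . '/' . pathinfo($path, PATHINFO_BASENAME);

    // Ad-blocker workaround: rewrite /ad/ segments to /g0/
    return str_replace('/ad/', '/g0/', $realPath);
}

echo encodeMediaPath('media/image/my_image.jpg');
// media/image/5d/c1/8c/my_image.jpg (hash pairs as in the worked example above)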
Shopware computes the path for you.
There is actually no need to know about the hashed path; you will be redirected to the correct path.
Try the following:
/media/image/ + image['path'] + image['extension']
And for the thumbnails:
/media/image/thumbnail/ + image['path'] + '_200x200' + image['extension']
or
/media/image/thumbnail/ + image['path'] + '_600x600' + image['extension']
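As a rough sketch of that assembly (the $image values below are hypothetical, and depending on your Shopware version the path and extension fields may or may not include the separating dot, so adjust the concatenation accordingly):

// Hypothetical media fields taken from an article detail response.
$image = ['path' => 'f5/fb/95/03', 'extension' => 'jpg'];

$full  = '/media/image/' . $image['path'] . '.' . $image['extension'];
$thumb = '/media/image/thumbnail/' . $image['path'] . '_200x200.' . $image['extension'];

echo $full;  // /media/image/f5/fb/95/03.jpg
echo $thumb; // /media/image/thumbnail/f5/fb/95/03_200x200.jpg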

Finding bad words in a large list of email addresses using PHP - MongoDB

I have a large list of email addresses from a file; it comes to around 1 million email ids. I have a list of bad words like spam, junk etc.; it consists of 20,000+ bad words.
I need to validate the email ids. If a bad word is present anywhere in an email id, it is marked as invalid.
For example:
testspam@gmail.com - invalid
newuser@desspam.com - invalid
I would like to know which will be the fastest comparison method, as array looping takes time.
I tried the following methods.
// $keyword_list - array of bad words
// $check_key - the email id which needs to be validated
$arrays = array_chunk($keyword_list, 2000);
for ($i = 0; $i < count($arrays); $i++) {
    if (preg_match('/' . implode('|', $arrays[$i]) . '/', $check_key, $matches)) {
        return 1;
    }
}
The above method takes too long when comparing 1 million records.
Next we tried the following code, which also takes too long:
// $contain - bad words separated by '|'
// $str - the email id which needs to be validated
if (stripos($contain, "|") !== false) {
    $s = preg_split('/[|]+/i', $contain);
    $len = sizeof($s);
    for ($i = 0; $i < $len; $i++) {
        if (stripos($str, $s[$i]) !== false) {
            return true;
        }
    }
}
if (stripos($str, $contain) !== false) {
    return true;
}
return false;
Finally I tried MongoDB text search. It works fast, but with the following issue: if 'Hell' is a word in my bad list and my email id is something like head@e-hellinglysussex.sch.uk, then MongoDB text search won't match it.
Here is the code I used:
$ret = $db->command(array("text" => $section, "search" => $keyword_string, "limit" => $cnt_finalnonmatch));
where $section = the collection name,
$keyword_string = the keyword string to compare against, separated by spaces, e.g. "Hell Spam Junk",
$cnt_finalnonmatch = the total number of email ids being compared.
Please help me to solve this issue.
I am not entirely sure, but I suspect the problem is that 'Hell' is not equal to 'hell' when you search for text, since MongoDB is case-sensitive.
The solution should be to force all the strings and words to lowercase (or uppercase).
We used a MongoDB regex ('like') query to solve this issue:
$keywords = $key['keyword']; // keywords that need to be compared
$regexObj = new MongoRegex("/" . $keywords . "/i"); // case-insensitive MongoRegex
$where = array($section => $regexObj); // $section is the field being searched
$resultset = $info->find($where);
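One caveat with this regex approach, assuming the legacy PHP Mongo driver used above: escape each keyword before embedding it in the pattern, otherwise a bad word containing a regex metacharacter (a dot, for instance) will match more than intended:
$keywords = preg_quote($key['keyword'], '/'); // escape regex metacharacters
$regexObj = new MongoRegex('/' . $keywords . '/i'); // case-insensitive match
$where = array($section => $regexObj);
$resultset = $info->find($where);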

YouTube API - How to limit results for pagination?

I want to grab a user's uploads (e.g. the BBC's) and limit the output to 10 per page.
Whilst I can use the following URL:
http://gdata.youtube.com/feeds/api/users/bbc/uploads/?start-index=1&max-results=10
The above works okay.
I want to use the query method instead.
The Zend Framework docs:
http://framework.zend.com/manual/en/zend.gdata.youtube.html
state that I can retrieve videos uploaded by a user, but ideally I want to use the query method to limit the results for pagination.
The query method is in the Zend Framework docs (same page as before, under the title 'Searching for videos by metadata') and is similar to this:
$yt = new Zend_Gdata_YouTube();
$query = $yt->newVideoQuery();
$query->setTime('today');
$query->setMaxResults(10);
$videoFeed = $yt->getUserUploads(NULL, $query);

print '<ol>';
foreach ($videoFeed as $video):
    print '<li>' . $video->title . '</li>';
endforeach;
print '</ol>';
The problem is I can't do $query->setUser('bbc').
I tried setAuthor but this returns a totally different result.
Ideally, I want to use the query method to grab the results in a paginated fashion.
How do I use the $query method to set my limits for pagination?
Thanks.
I've decided just to use the user uploads feed as a way of getting pagination to work.
http://gdata.youtube.com/feeds/api/users/bbc/uploads/?start-index=1&max-results=10
If there is a way to use the query/search method to do a similar job, it would be interesting to explore.
I basically solved this in the same way as worchyld with a slight twist:
$username = 'ignite';
$limit = 30;  // YouTube will throw an exception if > 50
$offset = 1;  // The first video is 1 (silly non-programmers!)
$videoFeed = null;
$uploadCount = 0;

try {
    $yt = new Zend_Gdata_YouTube();
    $yt->setMajorProtocolVersion(2);

    $userProfile = $yt->getUserProfile($username);
    $uploadCount = $userProfile->getFeedLink('http://gdata.youtube.com/schemas/2007#user.uploads')->countHint;

    // The following code is a dirty hack to get pagination with the YouTube API
    // without always starting from the first result.
    // The snippet was copied from Zend_Gdata_YouTube->getUserUploads().
    $url = Zend_Gdata_YouTube::USER_URI . '/' . $username . '/' . Zend_Gdata_YouTube::UPLOADS_URI_SUFFIX;
    $location = new Zend_Gdata_YouTube_VideoQuery($url);
    $location->setStartIndex($offset);
    $location->setMaxResults($limit);

    $videoFeed = $yt->getVideoFeed($location);
} catch (Exception $e) {
    // Exception handling goes here!
    return;
}
The Zend YouTube API seems silly here: the included getUserUploads method never returns the VideoQuery instance before it actually fetches the feed, and while you can pass a location object as a second parameter, it's an either-or situation. It will only use the username parameter to construct a basic URI, or only use the location, in which case you have to construct the whole thing yourself (as above).