StackExchange Data Dump user tags

For my project I am using the StackExchange Data Dump that I found on https://archive.org/details/stackexchange.
To do my work, I need to know which tags a user has used most (these are the tags shown on a user's profile page). The dump contains a file called Users.xml, so I was hoping to find these tags there, but I did not.
Is there a way to gather these tags from the Data Dump? Or should I brute-force it by writing a small application that retrieves these tags by calling the StackExchange API with the UserId?
Thanks!
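One possible route that stays within the dump, sketched here under assumptions rather than as a confirmed answer: Posts.xml carries an OwnerUserId on every post and a Tags attribute on questions, so tags can be counted per user by reading questions directly and by following each answer's ParentId to its question. The function names split_tags and tags_per_user are hypothetical, the '|' delimiter handling is an assumption about newer dumps, and plain usage counts may differ from what the profile page shows.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

TAG_RE = re.compile(r"<([^<>]+)>")


def split_tags(raw):
    """Split a Tags attribute such as '<python><xml>' into a list of names."""
    if not raw:
        return []
    if raw.startswith("<"):
        return TAG_RE.findall(raw)
    # Assumption: some newer dumps delimit tags as '|python|xml|' instead.
    return [t for t in raw.split("|") if t]


def tags_per_user(source):
    """Return {OwnerUserId: Counter(tag -> number of posts)}.

    `source` is a path to Posts.xml or an open binary file object.
    Questions count their own tags; answers count the tags of the
    question they point to through ParentId.
    """
    question_tags = {}   # question Id -> list of tags
    answers = []         # (OwnerUserId, ParentId) pairs, resolved at the end
    counts = defaultdict(Counter)

    for _, row in ET.iterparse(source):
        if row.tag != "row":
            continue
        owner = row.get("OwnerUserId")
        post_type = row.get("PostTypeId")
        if post_type == "1":                      # question
            tags = split_tags(row.get("Tags"))
            question_tags[row.get("Id")] = tags
            if owner:
                counts[owner].update(tags)
        elif post_type == "2" and owner:          # answer
            answers.append((owner, row.get("ParentId")))
        row.clear()                               # free the parsed attributes

    for owner, parent_id in answers:
        counts[owner].update(question_tags.get(parent_id, []))
    return counts
```

Calling tags_per_user("Posts.xml")["1234"].most_common(5) would then list the five most used tags for the (made-up) user id 1234. For the largest sites the question_tags dictionary gets big, so this sketch suits smaller dumps best.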

How to upload files and attachments to the sobject record using REST API?

Salesforce has two different UIs and in accordance with it, it has the possibility to store attached files differently.
Two files were uploaded via the classic UI and they are marked as 'attachments'. Other files were uploaded through the new UI and they are marked as 'files'.
I want to upload all of these files using REST API. I cannot find the proper documentation. Can somebody help me with this?

That's not 100% true. In the SF Classic UI you were able to upload Files too. It's "just" about knowing the right API name of the table, and you'll find lots of examples online.
The Attachment and Document objects have exactly those API names (Attachment and Document); you can view their definitions in the SOAP API definition or in the REST API explorer (there was something which you can still see in a screenshot in here; it seems to be down now, maybe they're moving it to another area of the documentation...)
The Files (incl. "Chatter Files") are stored in the ContentDocument and ContentVersion objects. The names are unexpected because a long time ago SF purchased another company's product, which was called "Salesforce Content". In the beginning it was a bit of a mess; now it's better integrated into the whole platform, but some oddities still lurk: for example, File folders are sometimes called Libraries in the documentation, but the actual API name is ContentWorkspace. The entity relationship diagram can help a bit: https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_erd_content.htm
ContentDocument is a header to which many places in SF link (imagine a file taking up space on disk only once but being cross-linked from multiple records). It has at least one version, and if you need to update the document you'd upload a new version; the links in the org wouldn't change, they'd still point to the header.
So, how to use it?
REST API guide: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/dome_sobject_insert_update_blob.htm
or maybe Chatter API guide (you tagged it with chatter so chances are you already use it): https://developer.salesforce.com/docs/atlas.en-us.chatterapi.meta/chatterapi/connect_resources_files.htm
Some of my answers here might help (shameless plug). They're about uploading and reading data too, and one is even about Data Loader... but you might experiment with exporting files first, to get familiar with the structure before you load?
https://stackoverflow.com/a/48668673/313628
https://stackoverflow.com/a/56268939/313628
https://stackoverflow.com/a/60284736/313628
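As a rough illustration of the two object families described above, here is a minimal sketch that builds (but does not send) REST requests for inserting a ContentVersion (a "file") and an Attachment, with the binary content base64-encoded inside a JSON body. The function names, the API version v58.0, the instance URL, and the token are assumptions for illustration; the object and field names (Title, PathOnClient, VersionData, FirstPublishLocationId, ParentId, Name, Body) are the standard ones. Base64-in-JSON suits small files; the multipart approach in the REST API guide linked above is the documented route for larger ones.

```python
import base64
import json
import urllib.request


def _post_json(instance_url, access_token, sobject, payload, api_version):
    """Build a POST request that inserts one record of `sobject`."""
    url = f"{instance_url}/services/data/{api_version}/sobjects/{sobject}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_content_version_request(instance_url, access_token, file_name, data,
                                  record_id=None, api_version="v58.0"):
    """Request for a new File: inserting a ContentVersion creates its
    ContentDocument header automatically."""
    payload = {
        "Title": file_name,
        "PathOnClient": file_name,
        "VersionData": base64.b64encode(data).decode("ascii"),
    }
    if record_id:
        # Links the new file to a record at creation time.
        payload["FirstPublishLocationId"] = record_id
    return _post_json(instance_url, access_token, "ContentVersion",
                      payload, api_version)


def build_attachment_request(instance_url, access_token, file_name, data,
                             parent_id, api_version="v58.0"):
    """Request for a classic Attachment on the record `parent_id`."""
    payload = {
        "ParentId": parent_id,
        "Name": file_name,
        "Body": base64.b64encode(data).decode("ascii"),
    }
    return _post_json(instance_url, access_token, "Attachment",
                      payload, api_version)
```

Passing the returned object to urllib.request.urlopen() would perform the actual upload, given a valid session token; keeping the build step separate makes it easy to inspect the payload first.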

how to find what looks at a content Database in SharePoint 2010

There is plenty of documentation out there for looking up which content DB a site collection uses in SharePoint. However, I'm looking for the reverse. I have a specific DB, and I need to know where (at which URLs) its content is referenced or displayed.
We have a DB that has been partially corrupted and in need of restoring. It appears the only clean backup we have of it is relatively old. However, at first glance the library we know to be using it is lightly used. There has been no new content added to it since our backup was taken.
I am looking for a way to confirm that restoring from this backup won't unknowingly overwrite some critical data somewhere else.

In doing more digging, I did find another SO post that was able to get me the information I needed.
How to see all site collections in a specific content DB
Get-SPSite -ContentDatabase contentdbname | select url, @{label="Size";Expression={$_.usage.storage}}
In navigating to the returned URL, I found recently added data. So that now rules out the restore.

Extending RequestTracker tickets with external data

I'm thinking about how to extend RT (and also with the IR extension, but I don't think this makes a difference) in regards to retrieving files from external sources (e.g. sftp) and adding them as attachments to tickets. I'm asking for suggestions of how I might go about this, as I've not used RT much and never programmed in Perl before.
I'm thinking of adding an input and button in the ticket to allow the user to provide a unique ID for the file and for them to be able to click when they want to retrieve the file from the external source, so not an automatic retrieval, unless it only does it once.
I'm thinking of creating a MakeClicky (http://requesttracker.wikia.com/wiki/MakeClicky) which creates a link to a cgi script (something like 'getfile(abc.txt)'), providing the ticket ID and the UID for the file. This script would then retrieve the file and post it as a comment/reply to the ticket. A couple of things to ask:
Are comments and replies to tickets really the only way to add an attachment? I read this somewhere but can't find the source now.
How would I modify the existing ticket from a cgi script? It's on the same host; would I still need to use the REST API? Or can I just import the RT modules and add an attachment/comment/reply with the attachment without using the REST API?
The other option would be to create a scrip for on create/comment/reply that would search the contents of the ticket for an identifier for the file, retrieve the file and attach it.
I'm open to suggestions, unless one of these is a good way to do it!
TIA!

Find all <forms> used on a site

Is there a tool, e.g. a crawler, that can find all pages on my site that have forms (and list each form's action etc.)?
I'd like to log all pages with unique actions to then audit further.

Norconex HTTP Collector is an open source web crawler that can certainly help you. Its "Importer" module has a "TextBetweenTagger" feature to extract text between any start and end text and store it in a metadata field of your choice. You can then filter out those that have no such text extracted (look at the EmptyMetadataFilter option for this).
You can do this without writing code. As far as storing the results, the product uses "Committers". A few committers are readily available (including a filesystem one), but you may want to write your own to "commit" your crawled data wherever you like (e.g. in a database).
Check its configuration page for ideas.
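If writing a little code is acceptable, the extraction step itself is small. The sketch below is not how Norconex does it; it is a standalone alternative using only Python's standard library, and the names FormCollector, find_forms and unique_actions are made up for illustration. It assumes some crawler has already fetched each page's HTML and hands over (url, html) pairs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormCollector(HTMLParser):
    """Collect (absolute action URL, method) for every <form> on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag != "form":
            return
        attributes = dict(attrs)
        # A missing or empty action submits to the page itself.
        action = urljoin(self.page_url, attributes.get("action") or "")
        method = (attributes.get("method") or "get").lower()
        self.forms.append((action, method))


def find_forms(page_url, html):
    """Return the list of (action, method) pairs found in `html`."""
    collector = FormCollector(page_url)
    collector.feed(html)
    return collector.forms


def unique_actions(pages):
    """Map each distinct form action to the set of page URLs that use it.

    `pages` is an iterable of (page_url, html) pairs.
    """
    index = {}
    for page_url, html in pages:
        for action, _method in find_forms(page_url, html):
            index.setdefault(action, set()).add(page_url)
    return index
```

The dictionary returned by unique_actions gives one entry per distinct action, which matches the goal of logging pages with unique actions for a later audit. Forms that are built by JavaScript after page load would not be seen by this approach.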

tinymce file browsers multiple file source

I am doing some updates to a site I have developed over the last few years. It has grown rather erratically (I tried to plan ahead, but with this site it has taken some odd turns).
Anyway, the site has a community blog (blog.domain.com, which used to be domainblog.com) and users with personal areas (user1.domain.com, user2.domain.com, etc.).
The personal areas have standard page content that the user can use, or add snippets of text to partially customize. Now the owner wants the users to be able to create their own content.
Everything is done up to using a file browser.
I need a browser that will allow me to do the following:
the browser needs to be able to browse the common files at blog.domain.com/files and the user files at user_x.domain.com/files
the browser will also need to be able to differentiate between the two and generate the appropriate image url.
of course, the browser access to the user files will need to be dynamic and only show those files particular to the user (along with the common files)
I also need to be able to set a file size for images
the admin area is in a different directory than either the blog or the user subdomains.
general directory structure
--webdir--
|--client --
|--clientsite--
|--blog (blog.domain.com)
|--sites--
|--main site (domain.com)
|--admin (admin.domain.com)
|--users--
|--user1 (user1.domain.com)
|--user2 (user2.domain.com)
...etc.
I have tried several different file browsers with symlinks, but the browsers don't seem to be able to follow them. I am also having trouble even setting them to use a directory that isn't the default.
Which browser would you recommend, and what would I need to customize to make it work?
TIA

OK, since I have not had any responses to this question, I guess I will have to use a workaround and then see about writing a custom file browser down the road.
