Explaining English Wikipedia revision diffs

I found those files on the download page of Wikipedia.
Entries in those files look like this:
206430 12 0 'Anarchism' 1031579715 None True 3810 u'Bobdobbs1723' 11394:1:u'\n* [[Lysander Spooner]]\n* '
This is another line from the last dump (Number 15):
348524708 26470001 0 'Southeast air corps training center' 1268053721 u'[[WP:AES|\u2190]]Redirected page to [[Category:USAAF Southeast Training Center]]' False 803088 u'Bwmoll3' 0:1:u'#redirect [[Category:USAAF Southeast Training Center]]'
Those are tab-separated values, with the columns from the 10th onward being a list of changes committed on a specific Wikipedia page (the name of the page is in column 4).
I didn't find anything on the web about the structure of those logs. Particularly challenging is the format of the change list where multiple changes are documented.
Does anybody here know about the structure of those files?

Those dumps are from a tool called RevDiffSearch (formerly DiffDb), I believe. They are intended to be used with Lucene, and seem to be restructured versions of diffs from WikiHadoop.
The structure is:
rev_id page_id namespace title timestamp comment minor user_id user_text diff1_position:diff1_action:diff1_content diff2_position:diff2_action:diff2_content
Where (copy-pasted from here):
rev_id: The identifier of the revision being described PRIMARY KEY
page_id: The identifier of the page being revised
namespace: The identifier of the namespace of the page
title: The title of the page being revised
timestamp: The time the revision took place as a Unix epoch timestamp in seconds
comment: The edit summary left by the editor
minor: Minor status of the edit (boolean)
user_id: The identifier of the editor who saved the revision
user_text: The username of the editor who saved the revision
diffs: Tab-separated diff operations, one per column after user_text. Each diff operation has three parts (separated by colons):
position: The position in the article text at which the operation took place
action: Did the operation add or remove some text? ("1" for add, "-1" for remove)
content: The text operated on. For added text, this is the content to add. For removed text, this is the content that was removed.
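As a rough illustration of the layout above, here is a minimal Python sketch that splits one line into its named fields and diff operations. It assumes that each value is written as a Python literal (as in the sample lines, e.g. None, True, u'...') and that the columns really are separated by tabs; the names parse_revision_line and DiffOp are made up for this example and are not part of any Wikipedia tool.

```python
import ast
from typing import NamedTuple

# Column names taken from the field list above.
FIELDS = ["rev_id", "page_id", "namespace", "title", "timestamp",
          "comment", "minor", "user_id", "user_text"]


class DiffOp(NamedTuple):
    position: int
    action: int   # 1 = text added, -1 = text removed
    content: str


def parse_revision_line(line):
    """Split one dump line into a dict of fields plus a list of DiffOp."""
    columns = line.rstrip("\n").split("\t")
    # Assumption: every value is a Python literal (206430, None, True, u'...').
    record = {name: ast.literal_eval(raw)
              for name, raw in zip(FIELDS, columns[:len(FIELDS)])}
    diffs = []
    for raw in columns[len(FIELDS):]:
        if not raw:
            continue  # ignore empty trailing columns
        # The content may itself contain colons, so split only twice.
        position, action, content = raw.split(":", 2)
        diffs.append(DiffOp(int(position), int(action),
                            ast.literal_eval(content)))
    record["diffs"] = diffs
    return record


# The first sample line from the question, rebuilt with explicit tabs.
sample = "\t".join([
    "206430", "12", "0", "'Anarchism'", "1031579715", "None", "True",
    "3810", "u'Bobdobbs1723'", r"11394:1:u'\n* [[Lysander Spooner]]\n* '",
])
record = parse_revision_line(sample)
print(record["title"], record["diffs"])
```

Splitting each diff with a maximum of two splits keeps any colons inside the content intact, which matters because wiki text such as [[Category:...]] contains colons of its own.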

Word/Publisher email merge issues

I've been searching for days for an answer to this issue. I'm trying to append an Access field to a base URL to customize each email in my merge like so: http://www.example.com/myItems.asp?ItemID={field}.
I tried several approaches in Word 2007, then gave up and finally tried Publisher after coming across this post - MS Word: Mailmerge hyperlinks with query get URL string with a MERGEFIELD.
In Publisher, I got everything to merge properly including the custom links (according to preview), but when I hit "send email" it wasn't passing the emails to Outlook - said 0 message(s) sent. I tried again, using a blank email template and got it to pass the email, but the email showed field names rather than the merged data.
Coming across this article regarding the field names - http://msgroups.net/microsoft.public.publisher/emailmerge-not-working-in-publishe/213664 - I clicked outside the text box as suggested before sending email but still, the field names show and not the merged data.
I'm super frustrated and exhausted. This shouldn't be this difficult! Any ideas or suggestions would be appreciated.
This shouldn't be this difficult!
I agree. I can't help on the Publisher front, but this link should help for Windows Word.
To summarise, when you insert the HYPERLINK field, do it this way:
Use Ctrl-F9 to insert a field code brace pair { }
Type HYPERLINK between the braces
Select the field and update it once (F9)
Do not update this field code again. If you do, Word will always insert the same link text (i.e. the hyperlink target). People working with fields often press F9 quite a lot just to make sure things are up to date, so you have to try not to do that.
If you press Alt-F9, you should see that the display text is an error message (starting with "E" in the English-language version of Word).
Move the insertion point so it is immediately after the E. Type the display text that you want, or, if you want a variable display text built from text + MERGE fields etc., enter that text and those codes.
Carefully remove the "E" and the other part of the error text.
Use Alt-F9 again to display the HYPERLINK field code. Click after the K, type a space, then enter the following fields and text, assuming your variable text is coming from a MERGE field called fieldname:
"{ SET X 1 }http://www.example.com/myitems.asp?ItemID={ MERGEFIELD fieldname }"
(The SET field is there to stop Word doing something else wrong. If you have more than one HYPERLINK field, you will need to SET a different variable name (X1, X2 etc.) in each HYPERLINK). This is discussed in more detail here - interestingly enough, that question was also about merge to HTML email, but I think you also have to do the additional stuff I mention above to make it all work.

number representing text string

A web form collects data on students in a band organization at school. The form data is fed into a Google Sheet that then populates a merge template, and the merged forms are emailed to the recipient. A parent needs to print, sign and turn in the forms. There are hundreds of kids in this band, and at registration time, when the forms are turned in, it is easier to sort all the papers in the stack if you have a short sort number in the corner... Volunteer kids don't apply alphabetization well. I'm trying to create a formula that will give me that sorting number to merge onto the header of each page of the PDF they receive after submitting the form. I want it based on last name and then first name, and I want to be able to create that number (in the Google Sheet) on the fly because the merging happens almost instantly when the user submits the form. Hence, an Excel-type formula is desired that will result in a number representing the kid's name. I'd like each number to be unique, but some names are the same for the first few letters, and some names are only 2 characters long. I tried making A=10, B=11, Z=35 etc. (so all are 2 digits). So, using only the first 3 characters, Bob Jones would = 192423112411, which is hardly easy to sort the paper by at a glance, and it doesn't really differentiate it from Bob Janes either. 4 digits is preferable. I also looked at the =CODE() formula and it came out with long numbers too. Any advice is appreciated. Thanks!
Side note: What method do spreadsheets use to sort text? Do they weight the characters or what? Before I got the automerge thing to work I assigned each kid in the list a number higher than the one below and lower than above (on the sheet), then did the merge.
One option is to:
sort the name list alphabetically
add a sort number column, and put a =TEXT(row(),"0000") formula to generate a unique ID
on the merge spreadsheet, use a VLOOKUP function to retrieve the unique ID for that specific name.
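For illustration only, the same idea (sort first, then number) can be sketched in Python. The function name assign_sort_numbers and the sample roster are invented for this example, and the sketch assumes the whole roster is known up front, which is exactly what the spreadsheet approach above requires as well.

```python
def assign_sort_numbers(names):
    """Map each (first, last) pair to a zero-padded rank in alphabetical order."""
    # Sort by last name, then first name, ignoring case.
    ordered = sorted(set(names), key=lambda n: (n[1].lower(), n[0].lower()))
    # Ranks start at 1 and are padded to four digits, like TEXT(ROW(),"0000").
    return {name: f"{rank:04d}" for rank, name in enumerate(ordered, start=1)}


roster = [("Bob", "Jones"), ("Al", "Wu"), ("Bob", "Janes")]  # made-up names
numbers = assign_sort_numbers(roster)
print(numbers[("Bob", "Jones")])
```

Because the number depends on where a name falls in the complete list, adding a new student later can shift the numbers of everyone who sorts after them.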
First off, that wall of text was kind of hard to read through. Please try and do a little formatting so the people trying to help you can easily follow what you're trying to convey.
Personally I would suggest a hyphenated system. First initial of last name converted to a number, followed by a hyphen, followed by the first two letters of their first name converted to numbers.
Bob Jones becomes 19-1156 assuming you differentiate between upper and lower case, or 19-1124 if you convert everything to upper case, which I guess makes more sense.
You could use this VBA function to convert names to a system like that:
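Function ConvertToIndex(strInput As String) As String
    ' Expects "First Last"; maps A=10 ... Z=35 (Asc("A") = 65, so subtract 55).
    Dim strLast As String
    Dim arrName() As String
    Dim strFirst1 As String
    Dim strFirst2 As String

    arrName = Split(strInput, " ")
    strLast = Mid(arrName(1), 1, 1)      ' first letter of the last name
    strFirst1 = Mid(arrName(0), 1, 1)    ' first letter of the first name
    strFirst2 = Mid(arrName(0), 2, 1)    ' second letter of the first name

    ' Arithmetic binds tighter than &, so each letter is converted before joining.
    ConvertToIndex = Asc(UCase(strLast)) - 55 & "-" & Asc(UCase(strFirst1)) - 55 & Asc(UCase(strFirst2)) - 55
    'MsgBox ConvertToIndex
End Function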
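To make the mapping easier to check by hand, here is a rough Python rendering of the same scheme. The name name_index is hypothetical, and the sketch assumes a "First Last" input made of plain ASCII letters. It also shows the limitation raised in the question: names that share these three letters get the same index.

```python
def name_index(full_name):
    """Return 'L-FF': last initial, a hyphen, then the first two letters of the first name."""
    first, last = full_name.split(" ", 1)

    def code(ch):
        # A=10 ... Z=35, because ord("A") is 65.
        return ord(ch.upper()) - 55

    first_part = "".join(str(code(ch)) for ch in first[:2])
    return f"{code(last[0])}-{first_part}"


print(name_index("Bob Jones"))  # J=19, B=11, O=24
print(name_index("Bob Janes"))  # same three letters, so the same index
```

Run on Bob Jones this gives 19-1124, and Bob Janes gets the identical value, so the index narrows a stack down to a small group rather than identifying one student.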
Thank you Tim, Nutsch and Mad Tech for your responses. I appreciate your input. Sorry the paragraph was so long, I get wordy. Because the members get their merged PDF sheet immediately after submitting I need the number to be based on the name as soon as it's entered, not after the fact; so I was looking for a formula that would reside in the sheet. Interesting VBA function too though. I'll settle for numbering them afterwards, maybe when the sheets are turned in. By then I'll know all who are in the band and can assign numbers like before. Thanks again!

Is there a way to get the page number of the start of a paragraph with Applescript in Mac Word 2011?

I have a Word document and want to get the page number for any arbitrary paragraph within the document. I realise that paragraphs can span pages, so I actually need to ask about the start (or end) of the paragraph. Something like this pseudocode:
set the_page_number to page number of character 1 of paragraph 1 of my_document
I haven't been able to figure out how you link a range object with any kind of information about its rendering and am officially baffled.
Does anyone know the proper way?
I just found this question about dealing with this in C#: How do I find the page number for a Word Paragraph?
Poking around in the answer to that I found reference to range.get_Information(Word.WdInformation.wdActiveEndPageNumber)
It turns out there's a get range information command in the AppleScript dictionary for Word, so you can do this:
set the_range to text object of character 1 of paragraph 123 of the_document
set page_number to get range information the_range information type active end adjusted page number
That returns the page number as it would be printed (e.g. if you have set the document to start at page 42, this will produce the number you expect). Alternatively, you can get the number without adjustment, i.e. your document's page numbering is set to start at 42, but you want the page number as if numbering started at 1:
set the_range to text object of character 1 of paragraph 456 of the_document
set page_number to get range information the_range information type active end page number
Phew.

How do you control the order in which files appear in a GitHub gist

Is there a way to control the order in which files appear in a gist? They don't seem to be alphabetical or chronological. I'd like to have a README.md appear as the first file in a multi-file gist, but no amount of "deleting" a file and re-adding it seems to change anything.
Is there an order to these files that I'm not seeing, or does GitHub maintain an internal filetype priority list?
Since at least 2018, the order is alphabetical, with periods and numbers coming before letters.
That is, as mentioned in Andrew D. Bond's answer:
$
. (dot)
Numbers
Leading space (although the space doesn't appear after saving, the sort order is still updated)
(however, in Sept. 2020, IvanaGyro added in the comments that leading spaces no longer affect the order)
_ (underscore)
Letters (case insensitive)
Around 2013-2014 a different order was used. See Andrew D. Bond's answer for more.
They are ordered automatically by name, following the ASCII table.
Unfortunately, right now, it is not possible to order them by dragging, but there is a trick. You can control the order by adding one or more spaces before the name. The spaces will not be shown after editing, but the order will change.
E.g., let's say we have 3 files with the automatic order:
AFile.java
Readme.md
SomeFile.txt
We can invert the order by putting spaces like this:
(space)(space)SomeFile.txt
(space)Readme.md
AFile.java
Updating my answer from an earlier year with additional testing I did just now:
GitHub automatically sorts files in a gist according to:
#
$
. (period)
Numbers
_ (underscore)
Letters (case insensitive)
Leading spaces are dropped.
If additional characters' sort order is discovered, feel free to edit this answer.
(Added this answer because even after I improved another answer to this question last year, I still couldn't find the sort order of special characters anywhere.)
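As a side note, the order reported above happens to match a plain ASCII comparison of the lowercased names with leading spaces removed. The Python sketch below only reproduces that observed order; it is an assumption about how the sorting behaves, not GitHub's actual implementation, and gist_sort_key and the file names are invented for the example.

```python
def gist_sort_key(filename):
    """Approximate the observed gist order: drop leading spaces, ignore case."""
    return filename.lstrip(" ").lower()


files = ["Readme.md", "_notes.txt", "2-setup.sh", ".gitignore",
         "$vars", "#intro.md", "AFile.java"]
for name in sorted(files, key=gist_sort_key):
    print(name)
```

In ASCII, # and $ come before the period, the period before digits, digits before the underscore, and the underscore before lowercase letters, which is why lowercasing first yields the sequence listed above.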
As mentioned by @VonC in his answer, the order is asciibetical. A quick solution is to prefix all files with numbers indicating the order in which you want the files to appear, for example 0_, 1_, 2_, ... 9_. Note that this will not work beyond 10 files (0_ through 9_), as 10_ will appear before 2_. In that case, two digits need to be used: 00_, 01_, 02_, ..., 09_, 10_, 11_, ... This generalizes to any number of digits, depending on the number of files. It seems unlikely to me, though, that more than 10 files would be shared in a gist.
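A small Python sketch of the prefixing trick, for anyone who wants to generate the names rather than type them. The function add_order_prefixes and the file names are made up for illustration; the padding width is simply derived from the number of files.

```python
def add_order_prefixes(filenames):
    """Prefix names with zero-padded indices so text sorting keeps the given order."""
    # Width of the largest index: 1 digit for up to 10 files, 2 for up to 100, ...
    width = len(str(max(len(filenames) - 1, 0)))
    return [f"{i:0{width}d}_{name}" for i, name in enumerate(filenames)]


wanted = ["README.md"] + [f"part{i}.py" for i in range(1, 11)]  # 11 files
prefixed = add_order_prefixes(wanted)
print(prefixed[0], prefixed[-1])
print(sorted(prefixed) == prefixed)  # the padded names keep the intended order
```

Without the padding, a name starting with 10_ would sort ahead of one starting with 2_, because the comparison is done character by character.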

format a word document

I have written the index of my report in a Word document, but the page numbers are not properly aligned. I even tried using a table for it, but it is still not working.
TABLE OF CONTENTS
Chapter: 1 Introduction…………………………………………………………….…....……..1
1.1 Project Summary……………………………………………………….......………..2
1.2 Objective……………………………………………………….……….…….….........2
1.3 Scope…………………………………………………….…………………...........…...2
1.4 Technology and literature……………………………….……………………..2
My index looks like the above. In the Word document, the page numbers are not arranged in a line. Kindly help me.
Try using the Tab key instead of the space bar.
You can add dots (".") as well, and the numbers will then land in the right places.