I have a MongoDB instance which contains a translation of texts:
{
"_id" : ObjectId("57c68ba415f4d42b6ecd9ee7"),
"en" : "Adana (pronounced [aˈda.na]) is a major city in southern Turkey. The city is situated on the Seyhan river, 35 km (22 mi) inland from the Mediterranean Sea, in south-central Anatolia. It is the administrative seat of the Adana Province and has a population of 1.7 million,[1] making it the fifth most populous city in Turkey. Adana-Mersin polycentric metropolitan area, with a population of 3 million, stretches over 70 km (43 mi) east-west and 25 km (16 mi) north-south; encompassing the cities of Mersin, Tarsus and Adana.",
"sw" : "Adana (Kigiriki Άδανα) ni mji mkubwa katika nchi ya Uturuki. Kwa mujibu wa sensa iliyofanyika mwaka wa 2000, mji una wakazi wapatao 1,130,710 waishio huko,[2] na kuufanya kuwa mmoja kati ya miji mitano mikubwa ya Uturuku (baada ya Istanbul, Ankara, İzmir na Bursa). Mwaka wa 2006 mji wa Adana umekadiriwa kufikia iadadi ya wakazi wapatao 1,271,894. Huu ndiyo mji mkuu wa Mkoa wa Adana."
}
{
"_id" : ObjectId("57c68ba915f4d42b6ecd9eea"),
"en" : "Addis Ababa or Addis Abeba (the spelling used by the official Ethiopian Mapping Authority),(Amharic: አዲስ አበባ? Addis Abäba IPA: [adˈdis ˈabəba] ( listen), \"new flower\"; Oromo: Finfinne,[3][4] [fɪnˈfɪ́n.nɛ́] \"Natural Spring(s)\"), is the capital and largest city of Ethiopia. Finfinne is its Oromo name. It has a population of 3,384,569 according to the 2007 population census, with annual growth rate of 3.8%. This number has been increased from the originally published 2,738,248 figure and appears to be still largely underestimated.[2][5]",
"sw" : "Addis Ababa (pia Addis Abeba; kwa Kiamhara አዲስ አበባ, \"Ua Jipya\"; kwa Kioromo Finfinne) ni mji mkuu wa Ethiopia na wa Umoja wa Afrika."
}
{
"_id" : ObjectId("57c68bab15f4d42b6ecd9eec"),
"en" : "Adelaide of Italy (931 – 16 December 999), also called Adelaide of Burgundy, was the second wife of Holy Roman Emperor Otto the Great[2] and was crowned as the Holy Roman Empress with him by Pope John XII in Rome on February 2, 962. Empress Adelaide was perhaps the most prominent European woman of the 10th century; she was regent of the Holy Roman Empire as the guardian of her grandson in 991-995.[2]",
"sw" : "Adelaide wa Italia (takriban 931 – 16 Desemba, 999) alikuwa binti wa Rudolf II, mfalme wa Burgundia. Kwanza aliolewa na Lothar, mfalme wa Italia. Alipofariki Lothar, Adelaide aliolewa na Otto I, mfalme wa Ujerumani. Aliishi maisha matakatifu. Sikukuu yake ni 16 Desemba."
}
What I would like to do is to select one specific record. For example I expect to select the last record by doing this:
db.wiki.find({"sw": "Adelaide wa Italia"}).pretty();
But the mongo shell returns nothing.
Indeed, I know that I can create an index and do something like:
db.wiki.find({$text: {$search: "\"Adelaide wa Italia\""}}).pretty();
which indeed returns the record as expected.
What am I doing wrong in the search that does not use the index?
In this case you should search with a regular expression:
db.wiki.find({"sw": /Adelaide wa Italia/}).pretty();
The way you are doing it by:
db.wiki.find({"sw": "Adelaide wa Italia"}).pretty();
you tell MongoDB to return all documents where sw is exactly equal to "Adelaide wa Italia", whereas you want all documents whose sw field contains this phrase.
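The difference between the two filters can be sketched outside MongoDB. The Python snippet below only illustrates the matching semantics; it is not a pymongo call. The docs list and the helper names equals and contains are made up for the example, and the field values are shortened.

```python
import re

# Shortened stand-ins for the documents in the collection (illustrative only).
docs = [
    {"sw": "Adana ni mji mkubwa katika nchi ya Uturuki."},
    {"sw": "Adelaide wa Italia alikuwa binti wa Rudolf II, mfalme wa Burgundia."},
]

def equals(doc, field, value):
    # Mirrors {"sw": "..."}: the whole field must be exactly this string.
    return doc.get(field) == value

def contains(doc, field, pattern):
    # Mirrors {"sw": /.../}: the pattern may match anywhere in the field.
    return re.search(pattern, doc.get(field, "")) is not None

exact_hits = [d for d in docs if equals(d, "sw", "Adelaide wa Italia")]
regex_hits = [d for d in docs if contains(d, "sw", r"Adelaide wa Italia")]

print(len(exact_hits))  # no document equals the bare phrase
print(len(regex_hits))  # one document contains it
```

The equality test finds nothing because no field consists of the phrase alone, while the regular expression finds the one document that contains it.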
After executing a query on a huge ontology using Jena, I exported the results in JSON format in a MongoDB collection named items in a database named galileo.
Now I want to query the collection to find items by their names (names are in the title field). In particular, when searching for "Astrolabio", I want to retrieve all the objects that contain the word "Astrolabio" in the title field (e.g. "Astrolabio", "Astrolabio Piano" etc.). The objects that interest me are contained in the #graph array.
I tried
db.items.find({"#graph":{$elemMatch:{"title":{$regex: /Astrolabio$/}}}})
but it returns lots of objects that don't contain the searched word too.
I tried also
db.items.find({},{"#graph":{$elemMatch:{"title":{$regex: /Astrolabio$/}}}})
but, as I discovered only after trying it, it returns only the first object that matches the request.
So what's the correct query for what I'm trying to do?
To help, here is a small slice of the document:
{
"_id" : ObjectId("59e07632b5d295462b330c4c"),
"#graph" : [
{
"#id" : "http://minerva.atcult.it/rdf/000000016001",
"#type" : [
"bibo:Book",
"bibo:MultiVolumeBook"
],
"P1053" : [
"1 astrolabe",
"1 astrolabio"
],
"abstract" : [
"This astrolabe presently comprises two tympanums, for latitudes 30° and 33°, the other for latitudes 36° and 42° (corresponding to the regions between Persia and the Black Sea). There is an alidade, a rule, and a rete. The back carries a double shadow square and the zodiacal calendar. The instrument comes with a tooled black leather case (cover missing) containing a sixteenth-century manuscript note stating that the astrolabe was brought from Spain and dates from 1252. The astronomical data inscribed on the astrolabe suggest it may have been built before 1000. According to tradition, the instrument dates from the period of Charlemagne (9th C. ). A very similar Arab astrolabe is documented in a drawing by Antonio da Sangallo il Giovane [the Younger] (c. 1520?) at the Gabinetto dei Disegni e delle Stampe (Department of Drawings and Prints) of the Uffizi. Provenance: Medici collections",
"Questo astrolabio contiene attualmente due timpani, uno per le latitudini 30° e 33°, e l'altro 36° e 42° (corrispondenti alle regioni comprese tra la Persia e il Mar Nero). È completo di alidada, di regolo e di rete. Nel dorso presenta un doppio quadrato delle ombre e il calendario zodiacale. Lo strumento, proveniente dalle collezioni medicee, è completo di custodia di pelle nera lavorata (coperchio mancante) che porta all'interno una nota manoscritta del XVI secolo nella quale si ricorda che l'astrolabio fu portato dalla Spagna e che risale al 1252. I dati astronomici riportati sullo strumento suggeriscono di anticiparne la costruzione a prima del 1000. Secondo la tradizione si tratterebbe di uno strumento del tempo di Carlo Magno (IX secolo). Un astrolabio arabo molto simile a questo è documentato in un disegno di Antonio da Sangallo il Giovane (c. 1520?) conservato presso il Gabinetto dei Disegni e delle Stampe degli Uffizi. Proviene dalle collezioni medicee"
],
"contributor" : "http://minerva.atcult.it/rdf/ed494c3a-2ba6-3464-b34a-a57e4f70c5e0",
"creator" : [
"http://minerva.atcult.it/rdf/d481cbac-209b-3741-bba4-906590d805b3",
"http://minerva.atcult.it/rdf/36e6efa2-6c8f-350e-ae37-479dade48850",
"http://minerva.atcult.it/rdf/47734211-2637-3895-a690-4f33412931ec"
],
"identifier" : "000000016001",
"issued" : "sec. X",
"publisher" : "http://minerva.atcult.it/rdf/90310a84-1133-3356-bb3b-647ae1a7d14d",
"title" : "Astrolabio piano",
"numPages" : [
"1 astrolabio",
"1 astrolabe"
],
"placeOfPublication" : "Fattura araba",
"label" : "Astrolabio piano"
},
{
"#id" : "http://minerva.atcult.it/rdf/000000016002",
"#type" : [
"bibo:MultiVolumeBook",
"bibo:Book"
],
"P1053" : [
"1 astrolabe",
"1 astrolabio"
],
"abstract" : [
"Questo piccolo astrolabio contiene quattro timpani per le latitudini 24° e 30°, 31° e 35°, 32° e 36° (corrispondenti alla Persia) e per le latitudini 0° (cioè il circolo dell'equatore) e 66°. È completo di alidada e di rete. Il dorso della madre presenta il calendario lunare, secondo l'uso islamico, un quadrato delle ombre e un quadrante. Lo strumento reca la data 496 dell'Egira (1102-1103 dell'età Cristiana) ed è firmato dal suo artefice, Muhammad 'Ibn Abi'l Qasim 'Ibn Bakran, del quale non si hanno notizie. Fu donato al Museo di Storia della Scienza dal Principe fiorentino Tommaso Corsini",
"This small astrolabe carries four tympanums for latitudes 24°/30°, 31°/35°, and 32°/36° (corresponding to Persia), and for latitude 0° (i. e. , the circle of the equator) and 66°. There is an alidade and a rete. The back of the mater displays a lunar calendar, in accordance with Islamic use, a shadow square, and a quadrant. The instrument is dated 496 of the Hegira (1102-1103 of the Christian era) and is signed by its maker, Muhammad 'Ibn Abi'l Qasim 'Ibn Bakran, on whom we have no information. Donated to the Museo di Storia della Scienza by the Florentine Prince Tommaso Corsini"
],
"creator" : [
"http://minerva.atcult.it/rdf/c5738e64-fb77-354a-8fc8-47164105b5f7",
"http://minerva.atcult.it/rdf/3fa79916-cb7f-3574-a3fb-589ca42ebf17"
],
"identifier" : "000000016002",
"issued" : "1102-1103",
"publisher" : "http://minerva.atcult.it/rdf/90310a84-1133-3356-bb3b-647ae1a7d14d",
"title" : "Astrolabio piano",
"numPages" : [
"1 astrolabio",
"1 astrolabe"
],
"placeOfPublication" : "Fattura araba",
"label" : "Astrolabio piano"
},
If you need the result to contain only the elements of the #graph array that match the query (that is, whose title contains the word Astrolabio), you can achieve that with the following aggregation framework query:
db.items.aggregate([
{$match: {"#graph.title": /Astrolabio/}},
{$unwind: "$#graph"},
{$match: {"#graph.title": /Astrolabio/} },
{$group: {"_id": "$_id", "#graph": {"$push": "$#graph" } }}
]);
Your regex {$regex: /Astrolabio$/} only matches titles that end with 'Astrolabio', because $ anchors the pattern to the end of the string ("Astrolabio Piano" will not be included, as 'Piano' is the last word there).
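Both points, the effect of the $ anchor and keeping only the matching array elements, can be sketched in plain Python. This is an illustration of the logic, not the aggregation pipeline itself; the cut-down document, the _id value 1, the "Quadrante" title and the function name matching_elements are invented for the example.

```python
import re

# A cut-down document; only the fields needed for the illustration are kept.
doc = {
    "_id": 1,
    "#graph": [
        {"title": "Astrolabio piano"},
        {"title": "Astrolabio"},
        {"title": "Quadrante"},
    ],
}

anchored = re.compile(r"Astrolabio$")   # title must end with "Astrolabio"
unanchored = re.compile(r"Astrolabio")  # "Astrolabio" may appear anywhere

def matching_elements(document, pattern):
    # Similar in spirit to $unwind + $match + $group: keep only the array
    # elements whose title matches, still grouped under the same _id.
    kept = [e for e in document["#graph"] if pattern.search(e.get("title", ""))]
    return {"_id": document["_id"], "#graph": kept}

anchored_titles = [e["title"] for e in matching_elements(doc, anchored)["#graph"]]
unanchored_titles = [e["title"] for e in matching_elements(doc, unanchored)["#graph"]]

print(anchored_titles)
print(unanchored_titles)
```

With the anchored pattern only the bare "Astrolabio" title survives; without the anchor "Astrolabio piano" is kept as well, and unrelated elements are dropped in both cases.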
I tried to use
sed 's/ */:/' file | awk -F: '{ if (arr[$1":"$2]) print "\""$1"\":"$2; else { arr[$1":"$2]++; print $0 }}'
but I cannot get the desired output. Thanks.
The following is the file content and the desired output.
Text File:
Jon DeLoach:408-253-3122:123 Park St., San Jose, CA 04086:7/25/53:85100
Karen Evich:284-758-2857:23 Edgecliff Place, Lincoln, NB 92086:7/25/53:85100
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Lori Gortz:327-832-5728:3465 Mirlo Street, Peabody, MA 34756:10/2/65:35200
Paco Gutierrez:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
Paco Gutierrez:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
Jesse Neal:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
Jesse Neal:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
Zippy Pinhead:834-823-8319:2356 Bizarro Ave., Farmount, IL 84357:1/1/67:89500
Required output: Add stars indicating duplicated names
Jon DeLoach:408-253-3122:123 Park St., San Jose, CA 04086:7/25/53:85100
*Karen Evich*:284-758-2857:23 Edgecliff Place, Lincoln, NB 92086:7/25/53:85100
*Karen Evich*:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
*Karen Evich*:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
*Fred Fardbarkle*:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
*Fred Fardbarkle*:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Lori Gortz:327-832-5728:3465 Mirlo Street, Peabody, MA 34756:10/2/65:35200
*Paco Gutierrez*:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
*Paco Gutierrez*:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
*Jesse Neal*:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
*Jesse Neal*:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
Zippy Pinhead:834-823-8319:2356 Bizarro Ave., Farmount, IL 84357:1/1/67:89500
Give this a try; it seems to work.
$ awk -F":" 'NR==FNR{a[$1]++;next}(a[$1]>1){sub($1,"*" $1 "*")}1' file1 file1
Explanation:
This code reads the same file twice, which may carry a performance penalty depending on the file size.
-F":" : The input field delimiter is set to :
NR==FNR{a[$1]++;next} : The code in { } is executed only while NR==FNR, i.e. while awk is reading the first file.
a[$1]++ : Creates an array a indexed by $1 and increments the value by 1 each time that $1 is seen. So for record 1 we have a[Jon DeLoach]=1, for record 2 a[Karen Evich]=1, for record 3 a[Karen Evich]++ => 2, etc.
next : instructs awk to go to the next record and skip the rest of the script.
(a[$1]>1){sub($1,"*" $1 "*")}1 : This condition and action are applied during the second read of the file. For each record whose a[$1] is greater than 1 (the counts are final once the first read has finished), we insert * around $1 using the awk sub function. sub applies the substitution directly to $0, the whole record.
1 : prints the whole record during the second read.
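For comparison, the same two-pass idea (count first, then mark) can be sketched in Python. This is a rough equivalent under the assumption that the records are already in a list; the function name star_duplicates and the shortened sample records are made up for the example.

```python
from collections import Counter

# Sample records in the same colon-separated layout (shortened for the example).
lines = [
    "Jon DeLoach:408-253-3122:85100",
    "Karen Evich:284-758-2857:85100",
    "Karen Evich:284-758-2867:58200",
    "Lori Gortz:327-832-5728:35200",
]

def star_duplicates(records):
    # Pass 1: count how often each name (the first field) occurs.
    counts = Counter(rec.partition(":")[0] for rec in records)
    # Pass 2: wrap the name in stars when it occurred more than once.
    out = []
    for rec in records:
        name, sep, rest = rec.partition(":")
        out.append(f"*{name}*{sep}{rest}" if counts[name] > 1 else rec)
    return out

starred = star_duplicates(lines)
for line in starred:
    print(line)
```

Here the name is handled as a literal string. In the awk one-liner, the first argument of sub is interpreted as a regular expression, so a name containing regex metacharacters could behave differently there.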
SED question
I need to print any lines that contain 11 for November or 12 for December.
My two questions are:
How do I search for more than one item, i.e. print lines with the value 11 and 12?
How do I tell the search to look in column 4 which has the dates?
What I have so far:
sed -n -e '/11/,/12/p' datebook
File datebook:
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500
Igor Chevsky:385-375-8395:3567 Populus Place, Caldwell, NJ 23875:6/18/68:23400
Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
Jennifer Cowan:548-834-2348:583 Laurel Ave., Kingsville, TX 83745:10/1/35:58900
Jon DeLoach:408-253-3122:123 Park St., San Jose, CA 04086:7/25/53:85100
Karen Evich:284-758-2857:23 Edgecliff Place, Lincoln, NB 92086:7/25/53:85100
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Lori Gortz:327-832-5728:3465 Mirlo Street, Peabody, MA 34756:10/2/65:35200
Paco Gutierrez:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
Ephram Hardy:293-259-5395:235 CarltonLane, Joliet, IL 73858:8/12/20:56700
James Ikeda:834-938-8376:23445 Aster Ave., Allentown, NJ 83745:12/1/38:45000
Barbara Kertz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/1/46:268500
Lesley Kirstin:408-456-1234:4 Harvard Square, Boston, MA 02133:4/22/62:52600
William Kopf:846-836-2837:6937 Ware Road, Milton, PA 93756:9/21/46:43500
Sir Lancelot:837-835-8257:474 Camelot Boulevard, Bath, WY 28356:5/13/69:24500
Jesse Neal:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
Zippy Pinhead:834-823-8319:2356 Bizarro Ave., Farmount, IL 84357:1/1/67:89500
Arthur Putie:923-835-8745:23 Wimp Lane, Kensington, DL 38758:8/31/69:126000
Popeye Sailor:156-454-3322:945 Bluto Street, Anywhere, USA 29358:3/19/35:22350
Jose Santiago:385-898-8357:38 Fife Way, Abilene, TX 39673:1/5/58:95600
Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale, CA 94087:5/19/66:34200
Yukio Takeshida:387-827-1095:13 Uno Lane, Ashville, NC 23556:7/1/29:57000
Vinh Tranh:438-910-7449:8235 Maple Street, Wilmington, VM 29085:9/23/63:68900
How do I tell the search to look in column 4 which has the dates?
This is an indication that you should use awk because sed doesn't have the concept of fields. An awk solution would be
awk -v FS=":" '$4 ~ /^1[12]\/.*/{print}' datebook
Output
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
James Ikeda:834-938-8376:23445 Aster Ave., Allentown, NJ 83745:12/1/38:45000
Barbara Kertz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/1/46:268500
Deciphering the solution
FS=":" sets the field/column delimiter to a colon.
$4 represents column four of your input file, which is the date in the format mm/dd/yy.
The ~ in $4 ~ /^1[12]\/.*/ means we do a regex match in which
^ represents the beginning of the string
[12] matches either 1 or 2, so 1[12] matches 11 or 12.
Since the regex itself is delimited by /, you need to escape any literal / as \/
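The field-based approach can also be sketched in Python: split each record on the colon and test only the fourth field. This is an illustrative equivalent, not part of the awk answer; the function name november_or_december is made up, and the three records are copied from the datebook sample.

```python
import re

# Three records copied from the datebook sample.
records = [
    "Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500",
    "James Ikeda:834-938-8376:23445 Aster Ave., Allentown, NJ 83745:12/1/38:45000",
    "Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900",
]

# Same pattern as in the awk command: the field must start with 11/ or 12/.
month_11_or_12 = re.compile(r"^1[12]/")

def november_or_december(lines):
    selected = []
    for line in lines:
        fields = line.split(":")
        # fields[3] is column four (the date); shorter lines are skipped.
        if len(fields) >= 4 and month_11_or_12.search(fields[3]):
            selected.append(line)
    return selected

selected = november_or_december(records)
for line in selected:
    print(line)
```

Only the James Ikeda record is selected. The Fred Fardbarkle record has 12 as the day (4/12/23), which is why the pattern is anchored to the start of the field.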
It appears that you want to select lines where the first characters after the third colon on the line are 11/ or 12/ (since the data formats appear to be pre-Y2K-style US-format dates with mm/dd/yy notation). So you write:
$ sed -n '/^\([^:]*:\)\{3\}1[12]\//p' datebook
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
James Ikeda:834-938-8376:23445 Aster Ave., Allentown, NJ 83745:12/1/38:45000
Barbara Kertz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/1/46:268500
$
The ^ matches at the start of a line; the \([^:]*:\) part looks for a series of zero or more non-colons followed by a colon; the \{3\} requires 3 of them; the 1[12]\/ demands 11/ or 12/ after that; the p prints.
I observe that the initial statement says 'contain 11 for November or 12 for December', but your first numbered question says 'value 11 and 12'. These are contradictory; a given date field can only start with one or the other, not both. I've assumed that 'or' is what you intended.
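The same whole-line pattern can be tried out in Python to see what it accepts and rejects. This is only a translation of the sed expression into Python's regex syntax for experimentation; the variable names are made up, and the two records are copied from the datebook sample.

```python
import re

# Python translation of the sed pattern; (?:...) is a non-capturing group.
fourth_field_is_nov_or_dec = re.compile(r"^(?:[^:]*:){3}1[12]/")

samples = [
    "Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300",
    "Ephram Hardy:293-259-5395:235 CarltonLane, Joliet, IL 73858:8/12/20:56700",
]

matches = [s for s in samples if fourth_field_is_nov_or_dec.match(s)]
for line in matches:
    print(line)
```

Because [^:]* cannot cross a colon, the three repetitions consume exactly the first three fields, so the 12 in 8/12/20 is never considered and only the Steve Blenheim record matches.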
So far we have written this code to scrape the website: http://www.theft-alerts.com
The website contains a form, and in that form a frmSFair. We need all the information on the stolen artworks. Can someone help?
If we scrape the form by:
import urllib2
from bs4 import BeautifulSoup
connection = urllib2.urlopen('http://www.theft-alerts.com')
soup = BeautifulSoup(connection, "html.parser")
form = soup.find_all(span="table")
for form in soup.form.stripped_strings:
    print(str(form.encode('utf-7')))
Output:
Sign up for our newsletter
Add email address below
See a sample eSalvo
The code picks the newsletter table on the right side of the website, but we need the table in the middle, with this information:
STOLEN : CHERUB IN MARBLE, PART OF A FOUNTAIN
Stolen from Canterbury, Kent, UK on 8 February 2016
Item : A copy of Verrocchio's cupid - winged cherub standing on one leg holding a dolphin - in white marble which formed the top part of a fountain. approximately 3 foot high. Item has discoloured due to weathering with some lichen growth.
Any info to : PC 12994 Canterbury. Tel 01622 690690
Messages : Send a message
Crime Ref : ZY - 4370 - 16
No of items stolen : 1
images:
Location : UK > Kent
Category : STATUARY
ID : 93578
User : 53329 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 1 month)
Date Created : 10 Feb 2016 14:36:23
Date Modified : 11 Feb 2016 16:40:06;
To get the text from each:
for sp in soup.select("table div.itemspacingmodified"):
    for wd in sp.select("div.itemindentmodified"):
        text = wd.text
        if not text.startswith("Images :"):
            print(text)
Which gives you:
STOLEN : CHERUB IN MARBLE, PART OF A FOUNTAIN
Stolen from Canterbury, Kent, UK on 8 February 2016
Item : A copy of Verrocchio's cupid - winged cherub standing on one leg holding a dolphin - in white marble which formed the top part of a fountain. approximately 3 foot high. Item has discoloured due to weathering with some lichen growth.
Any info to : PC 12994 Canterbury. Tel 01622 690690
Messages : Send a message
Crime Ref : ZY - 4370 - 16
No of items stolen : 1
Location : UK > KentCategory : STATUARYID : 93578User : 53329 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 1 month)Date Created : 10 Feb 2016 14:36:23Date Modified : 11 Feb 2016 16:40:06;
STOLEN : OVER 70 ANTIQUE YORK STONE PAVING SLABS
Stolen from Steyning on 30th October 2015
Item : Antique York Stone paving slabs stolen from historic landscaped garden overnight. Truck driven through electric gates to gain access.
Any info to : PCSO Stewart Metcalfe. Sussex Police mob. 07912 894151
Messages : Send a message
Web URL : https://stmarysbramber.co.uk
Crime Ref : 47150140173
No of items stolen : 70
Recovered Details : None
Location : UK > West SussexCategory : FLAGSTONES & FLOOR TILESID : 92311User : 52866 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 3 months)Date Created : 05 Nov 2015 12:04:50Date Modified : 05 Nov 2015 12:15:10;
STOLEN : GARDEN STATUE OF BOY
Stolen from Bridgnorth, Shropshire on 20th / 21st Aug 2015
Item : Small lead,(I think ! ),statue of a boy standing on a stone plinth
Any info to : West Mercia Police - Crime No 22FJ59981W15
Messages : Send a message
Crime Ref : 22FJ59981W15
No of items stolen : 2
Recovered Details : NA
Location : UK > ShropshireCategory : STATUARYID : 91278User : 52457 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 6 months)Date Reset : 10 Sep 2015 00:30:01Date Modified : 26 Aug 2015 15:07:31; EL from 26 Aug 2015 to 09 Sep 2015 26Aug15;EL History : 26 Aug 2015 10:52:49;
STOLEN : YORKSTONE FLAGSTONES
Stolen from Nr Sevenoaks, Kent, UK on 26 Aug 2015
Item : Flagstones from St Mary's Church, Sundridge, Sevenoaks, Kent TN14 6EA. 70 in total. They are old Yorkstone flagstones approx. 2" thick, the sizes are as follows:
24 x 3'x2'
14 x 2'x1.5'
10 x 2'x2'
10 x 2'x1'
12 x 1'x1'
Any info to : Maidstone, Kent Police station Tel: 101
Messages : Send a message
Crime Ref : YY/17519/15
No of items stolen : 70
Location : UK > KentCategory : FLAGSTONES & FLOOR TILESID : 91428User : 52513 ; Churches and Memorial Custodians ; (Registered SalvoWEB user for 6 months)Date Created : 03 Sep 2015 14:48:45Date Modified : 04 Sep 2015 09:34:13;
STOLEN : RED STANDSTONE BIRDBATH
Stolen from Watlington, Oxfordshire UK on 16 July 2015
Item : A red sandstone bird bath with applied bronze decoration and a central bronze figure of a young girl on a dolphin by Richard Garbe
Overall height 4'3" (129.5 cm), Figure height 1' 3" (38 cm)
Square at bowl 1'3" (38 cm) Square at base 1'4" (40.4 cm)
Any info to : R McIntyre, PC 0200, Wallingford
Messages : Send a message
Crime Ref : 4315016713
No of items stolen : 1
Location : UK > OxfordshireCategory : STATUARYID : 89824User : 52054 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 6 months)Date Reset : 13 Jul 2015 00:30:02Date Modified : 21 Jun 2015 20:58:33; EL from 21 Jun 2015 to 12 Jul 2015 21Jun15;EL History : 21 Jun 2015 20:50:51;
STOLEN : STONE SCULPTURE - STATUE IS OF A PROPHET
Stolen from Belton, Loughbrough on 05/05/2015
Item : Statue is of a Prophet
5ft tall, this has been sympathetically restored by stonemason using a local stone which is very similar to the original stone.
The statue is 63" tall and weighs around ½ ton
Any info to : 3046515
Messages : Send a message
Crime Ref : 3046515
No of items stolen : 1
Location : UK > LeicestershireCategory : GARDENID : 89030User : 51750 ; Professional/Architect/Designer/Media/Film/TV ; (Registered SalvoWEB user for 6 months)Date Created : 06 May 2015 15:38:24Date Modified : 06 May 2015 16:18:28;
STOLEN : WOOL CARPET/RUG IN VARIOUS COLOURS GEOMETRIC DESIGN
Stolen from Kensal Green on 21 April 2015
Item : A knotted wool rug or carpet, approx 118 by 76 inches, with a geometric design in light and dark brown, cream, pink and black.
Any info to : Metropolitan Police. Tel: 101 and quote crime ref: 6518176/15
Messages : Send a message
Crime Ref : 6518176/15
No of items stolen : 1
Location : UK > London North WestCategory : FURNITURE & MIRRORSID : 88924User : 34 ; Antique/Reclamation/Salvage Trade ; (Salvo Code Dealer)Date Created : 29 Apr 2015 21:46:11Date Modified : 29 Apr 2015 22:24:45;
STOLEN : ANOTHER HISTORIC MILESTONE STOLEN AT REDBOURN, HERTS.
Stolen from Redbourn, Hertfordshire on 15th March 2015
Item : Very distinctive square section small Milepost, well-known to local residents and others, and featured in local publications. It was on the A5183 at St.Albans Road at Redbourn, opposite the Chequers Public House. It would originally have been installed by The Dunstable-St. Albans-London Turnpike Trust, established by Act of Parliament in 1722; after the abolition of this and other similar trusts responsibility would have fallen to the Parish Council, later passing to the County Council in the 20th Century.
This milestone has the Milestone Society Identity ref. HE_LH24. Its neighbour HE_LH23 was stolen two years ago.
Any info to : Hertfordshire Police 01707 354000
Messages : Send a message
Crime Ref : WCR/41/20187/15
No of items stolen : 1
Location : UK > WarwickshireCategory : Architectural STONE & TERRACOTTAID : 88606User : 46089 ; Charity/Government/Institution/Plc ; (Registered SalvoWEB user for 2 years or more)Date Created : 10 Apr 2015 14:15:35Date Modified : 28 Apr 2015 16:52:03;
STOLEN : COALBROOKDALE GARDEN BENCH
Stolen from near Lichfield, Staffordshire UK on 25 February 2015
Item : A beautiful (most likely Coalbrookdale) cast iron 'Oak and Ivy' pattern bench. White painted with wooden slatted seat - but could now be a different colour. Approximate size 155 cm wide. Taken from a garden near Lichfield but possibly now in the Kent or South East region or anywhere in the country. Unusual and striking pattern. A £100 reward for first information which leads to recovery.
Any info to : PC 3864 Lichfield Tel 0300 123 4455
Messages : Send a message
Crime Ref : 27th February 2015 No 644
No of items stolen : 1
Location : UK > North YorkshireCategory : GARDENID : 87846User : 51382 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 1 year)Date Reset : 19 Mar 2015 00:30:01Date Modified : 11 Mar 2015 18:00:05; EL from 03 Mar 2015 to 10 Mar 2015 03Mar15; EL from 11 Mar 2015 to 18 Mar 2015 11Mar15;EL History : 02 Mar 2015 10:07:07;11 Mar 2015 00:30:01;
STOLEN : MILESTONE TAKEN FROM THE SIDE OF OLD LONDON ROAD, MALDON, ESSEX
Stolen from Maldon, Essex, UK on 31/1/15 - 11/3/15
Item : This is a milestone approximately 30 cms square but only around 90 total height including below surface. It was set into a concrete socket and would probably have needed lifting equipment to extract. It had received damage to one corner and face about six years ago, probably by carelessly operated grass-cutting machinery. It was situated opposite the cemetery in Old London Road (Grid Ref TL83740712) and was registered bythe Milestone Society with the ID ref EX_MGMN37
Any info to : PS 214 Maldon District Neighbourhood Policing Sergeant Direct Dial: 101 Ext 412104
Messages : Send a message
Crime Ref : CF0205920315
No of items stolen : 1
Location : UK > EssexCategory : Architectural STONE & TERRACOTTAID : 88150User : 46089 ; Charity/Government/Institution/Plc ; (Registered SalvoWEB user for 2 years or more)Date Created : 16 Mar 2015 11:31:40Date Modified : 16 Mar 2015 11:45:20;
STOLEN : MILESTONE PLATE STOLEN. A420 JUST WEST OF CHIPPENHAM, WILTSHIRE
Stolen from Chippenham, Wiltshire on Before Christmas 2014
Item : This stone was involved in a major traffic accident sometime before Christmas 2014 and is now not only leaning over at a worse angle than ever but is in three or more pieces. The Cast iron plate on its front has disappeared, presumed stolen.
Any info to : Crime reference number 5410002796 reported by Wiltshire Council. Tel: 101 to report any news, quoting the ref. number.
Messages : Send a message
Crime Ref : 5410002796 Wiltshire
No of items stolen : 1
Location : UK > WiltshireCategory : Architectural METALWORKID : 86859User : 46089 ; Charity/Government/Institution/Plc ; (Registered SalvoWEB user for 2 years or more)Date Created : 10 Jan 2015 17:26:33Date Modified : 28 Apr 2015 16:54:46;
STOLEN : A LARGE TAYLORS OF LOUGHBOROUGH BELL
Stolen from Bromyard on 7 August 2014
Item : The bell has a diameter of 37 1/2" is approx 3' tall weighs just shy of half a ton and was made by Taylor's of Loughborough in 1902. It is stamped with the numbers 232 and 11.
The bell had come from Co-operative Wholesale Society's Crumpsall Biscuit Works in Manchester.
Any info to : PC 2361. Tel 0300 333 3000
Messages : Send a message
Crime Ref : 22EJ / 50213D-14
No of items stolen : 1
Location : UK > Hereford & WorcsCategory : Shop, Pub, Church, Telephone Boxes & BygonesID : 84377User : 1 ; Antique/Reclamation/Salvage Trade ; (Administrator)Date Created : 11 Aug 2014 15:27:57Date Modified : 11 Aug 2014 15:37:21;
Each section is contained in a div with the class itemspacingmodified, and all the info is inside the divs with the class itemindentmodified, so you just need to pull the text from each.
The only problem is the line breaks: you can see that 91278User is stuck together. We can replace the <br> tags with newlines before parsing:
connection = urllib2.urlopen('http://www.theft-alerts.com')
soup = BeautifulSoup(connection.read().replace("<br>","\n"), "html.parser")
for sp in soup.select("table div.itemspacingmodified"):
    for wd in sp.select("div.itemindentmodified"):
        text = wd.text
        if not text.startswith("Images :"):
            print(text)
So now we get:
STOLEN : CHERUB IN MARBLE, PART OF A FOUNTAIN
Stolen from Canterbury, Kent, UK on 8 February 2016
Item : A copy of Verrocchio's cupid - winged cherub standing on one leg holding a dolphin - in white marble which formed the top part of a fountain. approximately 3 foot high. Item has discoloured due to weathering with some lichen growth.
Any info to : PC 12994 Canterbury. Tel 01622 690690
Messages : Send a message
Crime Ref : ZY - 4370 - 16
No of items stolen : 1
Location : UK > Kent
Category : STATUARY
ID : 93578
User : 53329 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 1 month)
Date Created : 10 Feb 2016 14:36:23
Date Modified : 11 Feb 2016 16:40:06;
STOLEN : OVER 70 ANTIQUE YORK STONE PAVING SLABS
Stolen from Steyning on 30th October 2015
Item : Antique York Stone paving slabs stolen from historic landscaped garden overnight. Truck driven through electric gates to gain access.
Any info to : PCSO Stewart Metcalfe. Sussex Police mob. 07912 894151
Messages : Send a message
Web URL : https://stmarysbramber.co.uk
Crime Ref : 47150140173
No of items stolen : 70
Recovered Details : None
Location : UK > West Sussex
Category : FLAGSTONES & FLOOR TILES
ID : 92311
User : 52866 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 3 months)
Date Created : 05 Nov 2015 12:04:50
Date Modified : 05 Nov 2015 12:15:10;
STOLEN : GARDEN STATUE OF BOY
Stolen from Bridgnorth, Shropshire on 20th / 21st Aug 2015
Item : Small lead,(I think ! ),statue of a boy standing on a stone plinth
Any info to : West Mercia Police - Crime No 22FJ59981W15
Messages : Send a message
Crime Ref : 22FJ59981W15
No of items stolen : 2
Recovered Details : NA
Location : UK > Shropshire
Category : STATUARY
ID : 91278
User : 52457 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 6 months)
Date Reset : 10 Sep 2015 00:30:01
Date Modified : 26 Aug 2015 15:07:31; EL from 26 Aug 2015 to 09 Sep 2015 26Aug15;
EL History : 26 Aug 2015 10:52:49;
STOLEN : YORKSTONE FLAGSTONES
Stolen from Nr Sevenoaks, Kent, UK on 26 Aug 2015
Item : Flagstones from St Mary's Church, Sundridge, Sevenoaks, Kent TN14 6EA. 70 in total. They are old Yorkstone flagstones approx. 2" thick, the sizes are as follows:
24 x 3'x2'
14 x 2'x1.5'
10 x 2'x2'
10 x 2'x1'
12 x 1'x1'
Any info to : Maidstone, Kent Police station Tel: 101
Messages : Send a message
Crime Ref : YY/17519/15
No of items stolen : 70
Location : UK > Kent
Category : FLAGSTONES & FLOOR TILES
ID : 91428
User : 52513 ; Churches and Memorial Custodians ; (Registered SalvoWEB user for 6 months)
Date Created : 03 Sep 2015 14:48:45
Date Modified : 04 Sep 2015 09:34:13;
STOLEN : RED STANDSTONE BIRDBATH
Stolen from Watlington, Oxfordshire UK on 16 July 2015
Item : A red sandstone bird bath with applied bronze decoration and a central bronze figure of a young girl on a dolphin by Richard Garbe
Overall height 4'3" (129.5 cm), Figure height 1' 3" (38 cm)
Square at bowl 1'3" (38 cm) Square at base 1'4" (40.4 cm)
Any info to : R McIntyre, PC 0200, Wallingford
Messages : Send a message
Crime Ref : 4315016713
No of items stolen : 1
Location : UK > Oxfordshire
Category : STATUARY
ID : 89824
User : 52054 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 6 months)
Date Reset : 13 Jul 2015 00:30:02
Date Modified : 21 Jun 2015 20:58:33; EL from 21 Jun 2015 to 12 Jul 2015 21Jun15;
EL History : 21 Jun 2015 20:50:51;
STOLEN : STONE SCULPTURE - STATUE IS OF A PROPHET
Stolen from Belton, Loughborough on 05/05/2015
Item : Statue is of a Prophet
5ft tall, this has been sympathetically restored by stonemason using a local stone which is very similar to the original stone.
The statue is 63" tall and weighs around ½ ton
Any info to : 3046515
Crime Ref : 3046515
No of items stolen : 1
Location : UK > Leicestershire
Category : GARDEN
ID : 89030
User : 51750 ; Professional/Architect/Designer/Media/Film/TV ; (Registered SalvoWEB user for 6 months)
Date Created : 06 May 2015 15:38:24
Date Modified : 06 May 2015 16:18:28;
STOLEN : WOOL CARPET/RUG IN VARIOUS COLOURS GEOMETRIC DESIGN
Stolen from Kensal Green on 21 April 2015
Item : A knotted wool rug or carpet, approx 118 by 76 inches, with a geometric design in light and dark brown, cream, pink and black.
Any info to : Metropolitan Police. Tel: 101 and quote crime ref: 6518176/15
Crime Ref : 6518176/15
No of items stolen : 1
Location : UK > London North West
Category : FURNITURE & MIRRORS
ID : 88924
User : 34 ; Antique/Reclamation/Salvage Trade ; (Salvo Code Dealer)
Date Created : 29 Apr 2015 21:46:11
Date Modified : 29 Apr 2015 22:24:45;
STOLEN : ANOTHER HISTORIC MILESTONE STOLEN AT REDBOURN, HERTS.
Stolen from Redbourn, Hertfordshire on 15th March 2015
Item : Very distinctive square section small Milepost, well-known to local residents and others, and featured in local publications. It was on the A5183 at St.Albans Road at Redbourn, opposite the Chequers Public House. It would originally have been installed by The Dunstable-St. Albans-London Turnpike Trust, established by Act of Parliament in 1722; after the abolition of this and other similar trusts responsibility would have fallen to the Parish Council, later passing to the County Council in the 20th Century.
This milestone has the Milestone Society Identity ref. HE_LH24. Its neighbour HE_LH23 was stolen two years ago.
Any info to : Hertfordshire Police 01707 354000
Crime Ref : WCR/41/20187/15
No of items stolen : 1
Location : UK > Warwickshire
Category : Architectural STONE & TERRACOTTA
ID : 88606
User : 46089 ; Charity/Government/Institution/Plc ; (Registered SalvoWEB user for 2 years or more)
Date Created : 10 Apr 2015 14:15:35
Date Modified : 28 Apr 2015 16:52:03;
STOLEN : COALBROOKDALE GARDEN BENCH
Stolen from near Lichfield, Staffordshire UK on 25 February 2015
Item : A beautiful (most likely Coalbrookdale) cast iron 'Oak and Ivy' pattern bench. White painted with wooden slatted seat - but could now be a different colour. Approximate size 155 cm wide. Taken from a garden near Lichfield but possibly now in the Kent or South East region or anywhere in the country. Unusual and striking pattern. A £100 reward for first information which leads to recovery.
Any info to : PC 3864 Lichfield Tel 0300 123 4455
Crime Ref : 27th February 2015 No 644
No of items stolen : 1
Location : UK > North Yorkshire
Category : GARDEN
ID : 87846
User : 51382 ; Diyer/Homeowner/Private ; (Registered SalvoWEB user for 1 year)
Date Reset : 19 Mar 2015 00:30:01
Date Modified : 11 Mar 2015 18:00:05; EL from 03 Mar 2015 to 10 Mar 2015 03Mar15; EL from 11 Mar 2015 to 18 Mar 2015 11Mar15;
EL History : 02 Mar 2015 10:07:07;11 Mar 2015 00:30:01;
STOLEN : MILESTONE TAKEN FROM THE SIDE OF OLD LONDON ROAD, MALDON, ESSEX
Stolen from Maldon, Essex, UK on 31/1/15 - 11/3/15
Item : This is a milestone approximately 30 cms square but only around 90 total height including below surface. It was set into a concrete socket and would probably have needed lifting equipment to extract. It had received damage to one corner and face about six years ago, probably by carelessly operated grass-cutting machinery. It was situated opposite the cemetery in Old London Road (Grid Ref TL83740712) and was registered by the Milestone Society with the ID ref EX_MGMN37
Any info to : PS 214 Maldon District Neighbourhood Policing Sergeant Direct Dial: 101 Ext 412104
Crime Ref : CF0205920315
No of items stolen : 1
Location : UK > Essex
Category : Architectural STONE & TERRACOTTA
ID : 88150
User : 46089 ; Charity/Government/Institution/Plc ; (Registered SalvoWEB user for 2 years or more)
Date Created : 16 Mar 2015 11:31:40
Date Modified : 16 Mar 2015 11:45:20;
STOLEN : MILESTONE PLATE STOLEN. A420 JUST WEST OF CHIPPENHAM, WILTSHIRE
Stolen from Chippenham, Wiltshire before Christmas 2014
Item : This stone was involved in a major traffic accident sometime before Christmas 2014 and is now not only leaning over at a worse angle than ever but is in three or more pieces. The Cast iron plate on its front has disappeared, presumed stolen.
Any info to : Crime reference number 5410002796 reported by Wiltshire Council. Tel: 101 to report any news, quoting the ref. number.
Crime Ref : 5410002796 Wiltshire
No of items stolen : 1
Location : UK > Wiltshire
Category : Architectural METALWORK
ID : 86859
User : 46089 ; Charity/Government/Institution/Plc ; (Registered SalvoWEB user for 2 years or more)
Date Created : 10 Jan 2015 17:26:33
Date Modified : 28 Apr 2015 16:54:46;
STOLEN : A LARGE TAYLORS OF LOUGHBOROUGH BELL
Stolen from Bromyard on 7 August 2014
Item : The bell has a diameter of 37 1/2", is approx. 3' tall, weighs just shy of half a ton and was made by Taylor's of Loughborough in 1902. It is stamped with the numbers 232 and 11.
The bell had come from the Co-operative Wholesale Society's Crumpsall Biscuit Works in Manchester.
Any info to : PC 2361. Tel 0300 333 3000
Crime Ref : 22EJ / 50213D-14
No of items stolen : 1
Location : UK > Hereford & Worcs
Category : Shop, Pub, Church, Telephone Boxes & Bygones
ID : 84377
User : 1 ; Antique/Reclamation/Salvage Trade ; (Administrator)
Date Created : 11 Aug 2014 15:27:57
Date Modified : 11 Aug 2014 15:37:21;
I tried a script to mark the journal using the SCORE condition.
W{REGEXP("Journal",true)->MARK(ONLY_Journal)};
W{REGEXP("Retraction|Retracted")->MARK(RETRACT)};
W{REGEXP("Suppl")->MARK(SUPPLY)};
NUM {->MARK(VOLUMEISSUE,1,6)}LParen NUM SPECIAL?{REGEXP("-")} NUM? RParen;
Reference{CONTAINS(ONLY_Journal)->MARKSCORE(10,JOURNAL_MAYBE)};
Reference{CONTAINS(JournalVolumeMarker)->MARKSCORE(5,JOURNAL_MAYBE)};
Reference{CONTAINS(VOLUMEISSUE)->MARKSCORE(15,JOURNAL_MAYBE)};
Reference{CONTAINS(JOURNALNAME)->MARKSCORE(10,JOURNAL_MAYBE)};
Reference{CONTAINS(RETRACT)->MARKSCORE(10,JOURNAL_MAYBE)};
Reference{CONTAINS(SUPPLY)->MARKSCORE(5,JOURNAL_MAYBE)};
JOURNAL_MAYBE{SCORE(20,55)->MARK(JOURNAL)};
Sample Text
1.Lawrence RA. A review of the medical 342–340 benefits and contraindications to breastfeeding in the United States [Internet] . Arlington (VA): National Center for Education in Maternal and Child Health; 1997 Oct [cited 2000 Apr 24]. p. 40. Available from: www.ncemch.org/pubs/PDFs/Welcometojungle.pdf.
2.Shishido A. Retraction notice: Effect of platinum compounds on murine lymphocyte mitogenesis [Retraction of Alsabti EA, Ghalib ON, Salem MH. In: Jpn J Med Biol 1979 Apr; 32(2):53-65]. Jpn J Med Sci Biol 1980 Aug;33(4):235-237.
3.Leist TP, Zinkernagel RM. Effects of treatment with IL-2 receptor specific monoclonal antibody in mice [letter] [Retraction of Leist TP, Kohler M, Eppler M, Zinkernagel RM. In: J Immunol 1989 Jul 15; 143(2): 628-32]. J Immunol 1990 Apr 1;144(7):2847.
4.Chen, L., James, N., Barker, C., Busam, K., & Marghoob, A. (2013). Desmoplastic
melanoma: A review. Journal of the American Academy of Dermatology, 68(5), 825-833.
doi: 10.1016/j.jaad.2012.10.041.
But the above script is not working. Can anyone suggest a solution?
Thanks in advance.
This should work just fine, but it depends of course on whether annotations of the types ONLY_Journal, JournalVolumeMarker, and so on exist, and on how many of them there are.
Here's the test script for a simple Ruta project:
ENGINE utils.PlainTextAnnotator;
TYPESYSTEM utils.PlainTextTypeSystem;
Document{->EXEC(PlainTextAnnotator, {Paragraph})};
DECLARE Reference, ONLY_Journal, JOURNAL_MAYBE, JournalVolumeMarker, VOLUMEISSUE, JOURNALNAME, RETRACT, SUPPLY;
DECLARE JOURNAL;
Paragraph{-> Reference};
"Jpn J Med Biol" -> JOURNALNAME;
"32\\(2\\)" -> VOLUMEISSUE;
Reference{CONTAINS(ONLY_Journal)->MARKSCORE(10,JOURNAL_MAYBE)};
Reference{CONTAINS(JournalVolumeMarker)->MARKSCORE(5,JOURNAL_MAYBE)};
Reference{CONTAINS(VOLUMEISSUE)->MARKSCORE(15,JOURNAL_MAYBE)};
Reference{CONTAINS(JOURNALNAME)->MARKSCORE(10,JOURNAL_MAYBE)};
Reference{CONTAINS(RETRACT)->MARKSCORE(10,JOURNAL_MAYBE)};
Reference{CONTAINS(SUPPLY)->MARKSCORE(5,JOURNAL_MAYBE)};
JOURNAL_MAYBE{SCORE(20,55)->MARK(JOURNAL)};
Applied to the sample text, this annotates the second reference with JOURNAL.
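To see why only the second reference ends up as JOURNAL, the scoring can be replayed outside Ruta. The following is a minimal Python sketch, not Ruta code: it assumes each MARKSCORE rule adds its weight once per Reference and that the SCORE(20,55) bounds are inclusive. The names PATTERNS, journal_score and is_journal are made up for illustration, and the references are shortened excerpts of the sample text.

```python
import re

# The two literal patterns from the test script, with the weights of the
# matching MARKSCORE rules. The other types (ONLY_Journal, RETRACT, ...) are
# declared in the test script but never created, so they contribute nothing.
PATTERNS = {
    "JOURNALNAME": (re.compile(r"Jpn J Med Biol"), 10),
    "VOLUMEISSUE": (re.compile(r"32\(2\)"), 15),
}


def journal_score(reference: str) -> int:
    """Sum the weight of every pattern found at least once in the reference."""
    return sum(weight for regex, weight in PATTERNS.values() if regex.search(reference))


def is_journal(reference: str, low: int = 20, high: int = 55) -> bool:
    """Mimic SCORE(20,55); treating both bounds as inclusive is an assumption."""
    return low <= journal_score(reference) <= high


# Shortened excerpts of references 1 to 3 from the sample text.
REF_1 = "1.Lawrence RA. A review of the medical 342–340 benefits and contraindications to breastfeeding"
REF_2 = "2.Shishido A. Retraction notice: [Retraction of Alsabti EA. In: Jpn J Med Biol 1979 Apr; 32(2):53-65]."
REF_3 = "3.Leist TP, Zinkernagel RM. [Retraction of Leist TP. In: J Immunol 1989 Jul 15; 143(2): 628-32]."

for ref in (REF_1, REF_2, REF_3):
    print(ref[:12], journal_score(ref), is_journal(ref))
```

With only JOURNALNAME (10) and VOLUMEISSUE (15) present, the second reference scores 25, which falls inside the 20 to 55 interval. The other references match neither pattern, so they never reach the threshold. The same reasoning applies to the original script: if the rules that create ONLY_Journal, VOLUMEISSUE and the other types do not match anything, no reference can reach a score of 20.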
DISCLAIMER: I am a developer of UIMA Ruta.