I really hope this question hasn't been asked and answered before, but I can't find a single clear SO answer on it, so here goes.
I am trying to import a CSV file into a MongoDB collection. The CSV file contains word definitions, which often include commas. I want to store these values in MongoDB with their commas intact and be able to read them, commas and all, in a JavaScript app.
When I import the CSV into MongoDB, it treats these commas as field separators. I have tried putting double quotes around each of my values, but to no avail: the quotes import as three quotation marks (""") and MongoDB still treats a comma inside the value as a signal to move on to the next field.
Please help!!
(For clarity, I am using the simplified GUI in MongoDB Compass... but I'm happy to use the command line if there is a solution!)
Example of the CSV:
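(rows along these lines, where the quoted definitions contain commas)

word,definition
serendipity,"the occurrence of events by chance, in a happy or beneficial way"
ambivalence,"having mixed feelings, or contradictory ideas, about something"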
You might try the command-line tool mongoimport, which offers more options than the MongoDB Compass import interface and handles quoted fields containing commas.
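For example, assuming a database named dictionary and a collection named definitions (both names are placeholders):

mongoimport --db=dictionary --collection=definitions --type=csv --headerline --file=definitions.csv

mongoimport follows standard CSV quoting, so a double-quoted field containing commas is kept as a single value. If you still see tripled quotes, the quotes themselves were probably escaped and re-quoted when the file was saved; re-saving it as a plain UTF-8 CSV usually clears that up.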
Related
One of my input files is a CSV (comma-separated). One of the fields is an address, which contains newline characters, and this causes me considerable trouble when I read the file with Spark: one input record gets split into multiple records.
Has anyone been able to find a solution for this? The current workaround is to remove the newline characters from the data at the source before reading it into Spark.
I would like a general solution for this in Spark. I use the Scala DataFrame APIs.
You can try the multiLine option of the CSV reader, which lets a quoted field span physical lines:
spark.read.csv(file, multiLine=True)
(That is the PySpark form; in the Scala DataFrame API the equivalent is spark.read.option("multiLine", "true").csv(file).)
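A slightly fuller PySpark sketch (the file name, header, and escape settings are assumptions about how the file was written):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# multiLine lets a quoted field (e.g., an address) span physical lines;
# escape='"' matches the common convention of doubling quotes inside quoted fields
df = spark.read.csv("addresses.csv", header=True, multiLine=True, escape='"')
df.show(truncate=False)

Note that the field containing the newline must itself be quoted in the file for multiLine to reassemble it.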
I have been asked to do NLP on a folder of free-text documents in SAS. Normally I do this in Python or R, and I am not sure how to import the .txt files into SAS because there is no structure.
I have thought about using PROC IMPORT but don't know what I would use as a delimiter. How can one import free-text files with no structure into SAS? I suppose that once the text is in, I could use '%like%'-style matching to pull out what they want.
I would strongly recommend against this. Use the right tool for the job; in this case, that is not SAS.
OK, that being said, some basics you could do:
Import the text files and create n-grams; ideally 1-, 2-, and 3-word grams.
Use PROC FREQ to summarize the n-grams.
Find a part-of-speech corpus and merge it with the 1-grams to remove useless words.
Calculate word lengths and sentence lengths to create a document complexity score.
Those are all doable in Base SAS. (If you do end up doing a first pass in Python, as you normally would, see the sketch below.)
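A minimal Python sketch of the first two steps (the docs folder and whitespace tokenization are assumptions):

from collections import Counter
from pathlib import Path

def ngrams(tokens, n):
    # yield successive n-grams as tuples
    return zip(*(tokens[i:] for i in range(n)))

counts = {n: Counter() for n in (1, 2, 3)}
for path in Path("docs").glob("*.txt"):  # hypothetical folder of free-text files
    tokens = path.read_text(errors="ignore").lower().split()
    for n in counts:
        counts[n].update(ngrams(tokens, n))

# rough equivalent of PROC FREQ on the 2-grams
print(counts[2].most_common(10))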
Perhaps a stupid question, but I have a document with a large number of numerical values arranged in columns (although not in Word's actual column formatting), and I want to delete certain columns while leaving one intact. Here's a link to a part of my document.
Data
As can be seen, there are four columns and I only want to keep the third, but when I select any of this in Word, it selects the whole line. Is there a way to select data in Word as a column rather than as whole lines? If not, can this be done in other word-processing programs?
Generally, spreadsheet apps or subprograms are what you need for deleting and modifying data in column or row format.
Microsoft's spreadsheet equivalent is Excel, part of the same Microsoft Office suite as Word. I believe Google Docs has a free spreadsheet tool online as well.
I have not looked at the uploaded file, but if it is small enough, you might be able to paste one row of data at a time into a spreadsheet, and then do your operation on the column data all at once.
There may be other solutions to this problem, but that's a start.
We are working towards migrating data from MongoDB to Teradata (DW).
We feel that transformations on the data will be necessary.
Could you please help me answer the questions below, which will guide us in developing a migration solution:
1. Which format would be the most efficient for exporting data from MongoDB into Teradata (DW), considering the transformations involved: CSV, JSON, or something else? Transformations could include omitting lines from the exported file, omitting fields, aggregating (sum/count) across fields, etc.
2. If we develop an ETL framework, will Java be a good choice?
3. We noticed that '\n' (the newline character) appears inside some records, so we see blank lines in the CSV. Do we need to be concerned about the right line delimiter, or can the export format help us here?
4. Some records are getting truncated because their length exceeds 1024 characters; we get the 'Line too long' message in the vi editor, and we have no alternative editor on our system. Is there a way to handle the truncation?
CSV is not particularly well-specified: there are several variants of it in the wild with slightly different escaping behavior. I almost always prefer anything but CSV.
1. JSON.
2. Yes.
3. That is more an observation than a question, but OK: embedded newlines are exactly why line-delimited CSV breaks here, and one more reason to prefer JSON, which escapes them inside strings.
4. Don't edit the data with vi; 'Line too long' is purely a limitation of the editor, not truncation by the export format. Do the transformations programmatically, as sketched below.
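As a minimal sketch of "do it programmatically" (this assumes a JSON-lines dump such as mongoexport produces, one document per line; the address field is hypothetical):

import json

with open("dump.json") as src, open("teradata_load.json", "w") as dst:
    for line in src:
        doc = json.loads(line)
        doc.pop("_id", None)  # omit fields the warehouse doesn't need
        # flatten embedded newlines so line-oriented tools downstream behave
        if isinstance(doc.get("address"), str):
            doc["address"] = doc["address"].replace("\n", " ")
        dst.write(json.dumps(doc) + "\n")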
I am developing an iPhone app that will fetch data from a CSV file based on the keyword entered into a UITextField; e.g., if the user enters "london", all entries containing that keyword should be listed. I have tried CHCSVParser but am still not able to get any results. Can anyone tell me whether this is even feasible, and if so, help me through the initial steps?
Thanks.
If you can, using a plist instead of CSV will be much easier and more flexible.
Maybe you can import your data into a .sqlite resource file that contains all the elements from your CSV file.
Then, for listing 15,000 elements or a subset of them in a table view, NSFetchedResultsController will help you. Initializing it with a fetch request lets you filter your elements on one or more attribute names.
http://developer.apple.com/library/ios/#documentation/CoreData/Reference/NSFetchedResultsController_Class/Reference/Reference.html
Yeah, if you're going to repeatedly reference "random" elements of your CSV file, you should convert it to an SQLite DB (though that would be overkill if you're only referencing things once). For your app, as best I can understand the description, a DB of some sort is most definitely warranted.
As for reading the CSV file, there are no doubt tools available, but it's probably just as easy to use NSString's componentsSeparatedByString: to parse each line, so long as no field contains an embedded comma.