Selecting records with a huge "where data set"

Selecting records with a huge "where data set" - tsql

Background Info
C#
MS MVC 4
Sql Azure
Linq - Identities
Problem at hand:
Selecting records in an Items table where zip code is within a certain range of miles.
Items Table
id (PK)
Title
Body
ZipCode (Int)
Summary of Progress:
I have a class which uses the 2013 US Gazatteer zip code and tabulation areas to gather zip codes and assess distances between zip codes. It is basically a .csv/.txt file that I open into a stream and convert to POCOs in order to process distances. That much of the equation is working fine; however, selecting a list of Items from an Items table based on this list of zip codes is where I'm not sure what to do.
Scenario
User A wants to search for items within a 25 miles radius of area code 46324.
User A hits search and in the background my class returns a list of 124 zip codes within a 25 mile radius.
Question: What is the best way (performance wise) to retrieve items in my Item table using this list of zipcodes?
Possible Solutions
I thought about creating a dynamic query using the tsql in keyword within my where clause and simply supplying this list as the where parameters. This does not seem to be a very performance oriented way of doing this; however, considering my current architecture I do not see any other way.
I also thought about incorporating a sort of paging functionality that will only take the first 5 zip codes to return results followed by the next 5 and so on and so on. This will involve more work but it definitely would seem to be a better performance choice.
Any ideas?

I stumbled across your question purely by chance searching for something else, and also I see it's quite old, but I thought I'd give you a comment none the less:
What I would do in this case is actually allow the database to do the search and the C# to do the calcs. You have a class in C# which calculates the distances? Then why not save the distance from each zip code to each zip code in a "lookup table" in sql.
Doing it this way makes sure that the data is calculated once but you let the sql find the right data for you.
ie:
Create a table with from_zip, to_zip, distance fields
Calculate and populate table once at the beginning
Query by saying "select * from zip_lookup where zip_from = bla and distance between 0 and 100" or something like that

Related

Automating a data feed into a PostgreSQL table when the number of columns could change and there are duplicate names

My company uses a third-party vendor to get all of our NPS information. I'm trying to set up a data feed from this vendor into our data warehouse, which runs PostgreSQL.
The feed is in the form of 2 tab-separated text files: "question mapping" and the responses. The question map is one row per question, with columns for question id, question text, question label question type, etc - straightforward. The responses are one row per survey response, with a column for each question and stuff like user id, etc. Here are the 2 biggest problems:
The survey questions sometimes use the same question ID for different questions, resulting in multiple columns in the response data having the same name but not being the same question.
The number of questions could change, resulting in a different number of columns in the data.
Both of these things make it a real headache to automate a data feed into a single table.
I'm afraid I don't quite know how to phrase my real question other than, "Does anyone have any ideas how I can accomplish this?" If I think of something better than that, I'll come and update this, so for now:
Does anyone have any ideas at all about how I can efficiently set up my automated data feed without having to always drop and recreate everything?

If your data is a mess and doesn't really have well defined columns you can use the entity attribute value pattern, where you turn each fact into a set of rows with 4 columns - a unique row id, the same entity id for each row extracted from the map, an attribute column (where you put what would be the name of the column) you get from the key of the map, and a value column where you put the value from the map. It's not that neat but you can still query it and you won't have to drop it when you receive a map with a new column.

Adding a 'Heading 1' to a Word 2010/13 Doc in Alphabetical Order

I'm operating in Word 2013 and 2010, so I can use code that works in either. I'm trying to create a word document to keep track of my recipes. At its most basic I want to have a TOC that updates based on headings. I also want it to have any category I want (eg: Appetizers, Drinks, Entrée, etc...) ordered alphabetically. Under each category I have tables. Each recipe gets a table that has it's name, directions, notes, tags, and potentially a picture. The second cell has another two column table inside of it that contains the quantity and name of each ingredient necessary for the recipe.
I have all of that so far and I'd like to automate adding new categories and recipes. Currently, I have to find the category, then scroll down to find where the name goes alphabetically and insert a quick table I made. I then fill in the info.
I'd like to be able to search the document for each category name, then insert the new category wherever it belongs, with a space before and after it. I found that my tables give me trouble if I don't have a space between everything. It tries to pull anything it's touching into the table and merge them.
I wanted to give the backstory, so you'd know where I was going to go eventually and could provide help that fits better with what I need. After I can add a new category, I plan to use vba to organize each table alphabetically by the name in the first cell of each table. It will also help when I start adding sorts to it. Eventually, I'd like to be able to sort it to say, only display recipes from a certain person, or display my frequently used recipes. I'd then have it either hide all the others or create a new doc with just these. So thanks for the help. Below I'll post the code I most recently tried. I tried a few other variations of this same code and keep getting an 'expected end of statement. I've gotten other errors when trying other variations of it, but this is the best I can come up with on my own.
Private Sub UserForm_Initialize()
For Each cat In ActiveDocument.Styles = "Heading 1"
lstCat.AddItem (cat)
Next
End Sub
I have a form called frmAddCategory I'm using as a test. I was going to have a listbox lstCat to show every category with the style heading 1. I have a textbox called txtAdd to type new ones and a cmdAdd button to add it to the form.
Edit: I've been playing around with my macro recorder after finding out about outline view mode. I set it to show only 'heading 1' level and selected the ones I wanted, not selecting the appendix or reference. Then I went to the home tab and sorted paragraph by ascending alphabetical order. I got some code I believe I can use to get it to run in VBA. However, it's not a complete fix as I don't want to select the last two with heading 1. It also works if I manually select the tables under each heading 1, but I can't set the spacing before and after. I'd like each heading and the tables under them to have a space or two between each for looks and editing purposes.
Also, if someone is going to give my question a negative rating, then please post a comment explaining. As far as I can tell from the faq about the forum and the other questions I've seen, it is a well posed question. A clear title, a good explanation of the problem, code examples, research. So if I am doing something wrong, please inform me, so that I can correct it.

first thanks for your bit about macro recorder and outline mode, I have been trying for long time to fill a list box with selection.text between two HeadingLevel(1) headings.
now to yours, sorry I can't think of way to do in word. BUT it would be real easy in Access. one table of categories like called tblCategories another for recipes tblRecipes. To make real easy, when the use autoID on ALL tables. But to avoid LOTS of headaches for tblRecipes rename its autoID to RecipesID same for other tables. in table of recipes you can use a memo field to hold large amount of data. the spot for text in Heading would be put in one field of the tblRecipes. once you have tables looking to have a field for each item you want to track. hit save, then use wizard to create a form based on table. repeat for all tables you want to have it real easy to put info into any table.
1.reportTOC based on query of Every Heading you want, can preview or print as want. reportByCategory and so on reports are sorted a to z unless you want to sort by Owner, then recipe all auto sorted a - z.
report wizard to get hard copy. if you want to sort real easy, built in. also if want to be able to pick all recipes for a holiday real easy, one table tblHolidays. one tblHolidayRecipes fields autoId (not used by you anywhere but needed), fldRecipesId (holds RecipesId) , fldHolidayId (holds HolidayId). the wizard will show how to get only what you want. in access 2013 you can include pix of food or.

FullText Index - Searching values from another table

Is it possible, in SQL Server 2008, using the full text index syntax, to run a query such as this one?
SELECT *
FROM TABLE_TO_SEARCH S,
TABLE_WITH_STRINGS_TO_SEARCH SS
WHERE
CONTAINS(S.WHOLE_NAME,SS.FIRST_NAME)
OR CONTAINS(S.WHOLE_NAME,SS.LAST_NAME)
I need to search for the FIRST_NAME in table TABLE_TO_SEARCH, column WHOLE_NAME that has an full text index on it. It doesn't seem to be a valid query though... Is there any workaround to it by using the full text index search?
LATER EDIT:
Here is the business case: each night I am downloading from several websites information about "blacklisted" individuals and insert it into a table in this format: WholeName, LastName, FirstName, MiddleName. But the data is chaotic as WholeName does not necessarily contain either the last, first or middle name or the WholeName is null while the other 3 fields have values, or every of these 4 fields is null and so on. Also, the data may repeat itself as one blacklisted individual may come from 2+ of these websites. What I need to do is to compare this data, as chaotic as it is, against our customer data based on our customer's First and Last name and give it a matching score (rank) against the files we download from these websites.
First I tried with charindex or like operators but I couldn't create a scoring algorithm based on this and also it took 6 hours to compare just our customer's first and last name with only the WholeName column from the TABLE_TO_SEARCH table. I thought that perhaps implementing the full_text index it would get easier and faster but ... apparently I was wrong.
Has anyone dealt with a task like this? And if so, what was the best approach?

After skimming http://technet.microsoft.com/en-us/library/ms187787.aspx and http://technet.microsoft.com/en-us/library/ms142571.aspx I don't think it is possible to do your search in this way. Not only that, but it seems this type of index wouldn't work well with names anyway.
If you care about checking one name then all you have to do is set those values to variables. This method would allow you to use the full-text index.
Otherwise, I would suggest splitting the WHOLE_NAME column (if there is a space or unique character between the first and last name) and comparing each part to those other columns. If you are working with a huge data set, you may want to experiment with doing this at a temp table level and creating an index.
Good luck!

In a TFS 2010 Work Item Query can I add two field values as the "Value"?

I would like to create a team query for our TFS users that shows all task items where the sum of the [hours remaining] and [hours completed] fields exceed the [original estimate] value.
Now Whilst I can add a clause to the WIQL that compares one field to the value of ONE other field (Which I had to ask you kind folks in stack exchange how to do in How do I write a TFS 2010 Work Item query clause whose value is a field value? which was answered by PVitt with admirable politeness - Since I had simply failed to read the "operator" drop down properly!)
I am struggling to find a way of querying the sum of two fields.
For example:
This query clause works;
And Completed Work > [Field] Remaining Work
What I really want is something along the lines of;
And Completed Work > [Field] Remaining Work+Original Estimate
The problem is either this cannot be done, or my wild guesses as to the correct syntax for summing two field values have all been wrong.
Specifying two filenames separated with a + just yields a TF51005 error
Similarly guessing at a "macro" like Sum(Remaining Work+Original Estimate) or Sum(Remaining Work,Original Estimate) results in the same.
So is this even possible?
if it is how would I go about this?

What you want is not possible. Your options are to use:
1) the object model to write C# code that iterates over the work items from a query and do the math yourself
2) run the query in excel and do the calculations and filter in excel
3) use ssrs to create a report that does the math for you
If you think this is important, you can always post your suggestion on user voice: https://visualstudio.uservoice.com

Add a new hidden field to store the sum, install TFS Aggregator and set it up to update the hidden field when the workitem is updated.
Then perform your queries off the new field.
There may be a couple of minutes delay between updating the sum having the new value.
<AggregatorItem operationType="Numeric" operation="Sum" linkType="Self" workItemType="Task">
<TargetItem name="Total Work"/>
<SourceItem name="Remaining Work"/>
<SourceItem name="Original Estimate"/>
</AggregatorItem>

If you're soley focused on the results/don't mind a (short) trip outside TFS, what I do is run the TFS query that produces the individual results I need, then just select all/copy/paste the results grid into Excel. There you'll have access to all Excel's suming and auto (col headers)/advanced filtering capabilities. You can define criteria that reference other cells using this method as well. Can also use the "If()" fcn to (in/ex)clude cell values in straight formulas. W/ some Excel fancy footwork, should even be able to preserve the summing fcns btw runs/pastes.

Efficiently updating cosine similarity scores

My iPhone application is using a SQLite database with the following schema:
items(id, name, ...) -> this table contains 50 records
tags(id, name) -> this table contains 50 records
item_tags(id, item_id, tag_id, user_id)
similarities(id, item1_id, item2_id, score)
The items, tags, item_tags and similarities tables are populated with pre-defined records, hence also the similarities between different items have already been calculated offline (using cosine similarity algorithm based on the items' tags).
Users are able to add additional tags to items and to remove their custom tags later on. Whenever this happens the similarity scores between the items should be updated locally, i.e. without contacting the server application.
My question now is the following:
What is the most efficient way to do so? So far, on startup of the iPhone application, I compute a term-document matrix for all the items and tags (which reflects the tag frequencies for each item) and keep this matrix in memory as long as the application is running. Whenever a tag is added or removed, I use this matrix to update the similarities in the database. However, this is rather inefficient. Do you have any suggestions?
Thanks!

This presentation might help you:
http://www.slideshare.net/jnvms/incremental-itembased-collaborative-filtering-4095306

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse