I have a table with 2.44 million rows, and after loading it into server:
copy sample_table
from 'C:\sample_table.txt'
delimiter E'\t'
csv header
if I do
select count(*) from sample_table
pgAdmin 4 will return count as only 1.35 million rows
I found it odd, so I exported this table and looked at the number of rows in Notepad++, and it is still 2.44 million rows (in fact there is 1 row count difference and not sure why, but guess will worry about that later)
As recommended by Adrian in comments, I verified this in psql, and still only see 1.35M.
Any advice please? Thank you!
When using CSV format, literal newlines enclosed in quotes do not terminate a row.
"How now
Brown cow"
Is 2 "lines", but only 1 row.
If you re-export in the default text format, then the number of lines should match the number of rows, with the literal newlines turned into the two-character escape \n
Related
I use PostgreSQL 13 on windows 10. I have a table with text-type columns. I'm looking for a query to find all the rows that contain just a single word is not accurate in a specific column.
Any ideas?
My table in database the form of, I try to extract all the rows with a single word.
output:
alvi
alexander
elmar
mahmoud
you can try this :
SELECT Name
FROM your_table
WHERE Name ~ '^\w+$'
see the power of the regular expressions in the manual
I am converting the following T-SQL statement to Redshift. The purpose of the query is to convert a column in the table with a value containing a comma delimited string with up to 60 values into multiple rows with 1 value per row.
SELECT
id_1
, id_2
, value
into dbo.myResultsTable
FROM myTable
CROSS APPLY STRING_SPLIT([comma_delimited_string], ',')
WHERE [comma_delimited_string] is not null;
In SQL this processes 10 million records in just under 1 hour which is fine for my purposes. Obviously a direct conversation to Redshift isn't possible due to Redshift not having a Cross Apply or String Split functionality so I built a solution using the process detailed here (Redshift. Convert comma delimited values into rows) which utilizes split_part() to split the comma delimited string into multiple columns. Then another query that unions everything to get the final output into a single column. But the typical run takes over 6 hours to process the same amount of data.
I wasn't expecting to run into this issue just knowing the power difference between the machines. The SQL Server I was using for the comparison test was a simple server with 12 processors and 32 GB of RAM while the Redshift server is based on the dc1.8xlarge nodes (I don't know the total count). The instance is shared with other teams but when I look at the performance information there are plenty of available resources.
I'm relatively new to Redshift so I'm still assuming I'm not understanding something. But I have no idea what am I missing. Are there things I need to check to make sure the data is loaded in an optimal way (I'm not an adim so my ability to check this is limited)? Are there other Redshift query options that are better than the example I found? I've searched for other methods and optimizations but unless I start looking into Cross Joins, something I'd like to avoid (Plus when I tried to talk to the DBA's running the Redshift cluster about this option their response was a flat "No, can't do that.") I'm not even sure where to go at this point so any help would be much appreciated!
Thanks!
I've found a solution that works for me.
You need to do a JOIN on a number table, for which you can take any table as long as it has more rows that the numbers you need. You need to make sure the numbers are int by forcing the type. Using the funcion regexp_count on the column to be split for the ON statement to count the number of fields (delimiter +1), will generate a table with a row per repetition.
Then you use the split_part function on the column, and use the number.num column to extract for each of the rows a different part of the text.
SELECT comma_delimited_string, numbers.num, REGEXP_COUNT(comma_delimited_string , ',')+1 AS nfields, SPLIT_PART(comma_delimited_string, ',', numbers.num) AS field
FROM mytable
JOIN
(
select
(row_number() over (order by 1))::int as num
from
mytable
limit 15 --max num of fields
) as numbers
ON numbers.num <= regexp_count(comma_delimited_string , ',') + 1
How can I split a column value into two values in the output? I need have the numerals in one column and the alphabet in the other.
For Example 1
Existing
Column
========
678J
2345K
I need the output to be:
Column 1 Column 2
======== ========
678 J
2345 K
The existing column can have 4 or 5 characters, as shown in the example. There is no space.
Thanks in advance!!
You could convert all letters to spaces & strip them away, then do the opposite with digits in the other column:
SELECT trim(translate(mycol,repeat(' ',26),'ABCDEFGHIJKLMNOPQRSTUVWXYZ')) as col1,
trim(translate(mycol,repeat(' ',10),'0123456789')) as col2
FROM mytable
Adjust as necessary to translate additional characters.
I am not sure about the performance of WarrenT's solution, but it looks like very heavy solution. It does what it is supposed to be doing with little constraints on the the data. If you know more about the data, you can optimize.
String always ends with 1 and only one letter
select left(mycol, length(mycol)-1), right(mycol,1) from mytable
I am using mysql workbench (SQL Editor). I need copy the list of columns in each query as was existed in Mysql Query Browser.
For example
Select * From tb
I want have the list of fields like as:
id,title,keyno,......
You mean you want to be able to get one or more columns for a specified table?
1st way
Do SHOW COLUMNS FROM your_table_name and from there on depending on what you want have some basic filtering added by specifying you want only columns that data type is int, default value is null etc e.g. SHOW COLUMNS FROM your_table_name WHERE type='mediumint(8)' ANDnull='yes'
2nd way
This way is a bit more flexible and powerful as you can combine many tables and other properties kept in MySQL's INFORMATION_SCHEMA internal db that has records of all db columns, tables etc. Using the query below as it is and setting TABLE_NAME to the table you want to find the columns for
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='your_table_name';
To limit the number of matched columns down to a specific database add AND TABLE_SCHEMA='your_db_name' at the end of the query
Also, to have the column names appear not in multiple rows but in a single row as a comma separated list you can use GROUP_CONCAT(COLUMN_NAME,',') instead of only COLUMN_NAME
To select all columns in select statement, please go to SCHEMAS menu and right click ok table which you want to select column names, then select "Copy to Clipboard > Select All statement".
The solution accepted is fine, but it is limited to field names in tables. To handle arbitrary queries would be to standardize your select clause to be able to use regex to strip out only the column aliases. I format my select clause as "1 row per element" so
Select 1 + 1 as Col1, 1 + 2 Col2 From Table
becomes
Select 1 + 1 as Col1
, 1 + 2 Col2
From Table
Then I use simple regex on the "1 row per select element" version to replace "^.* " (excluding quotes) with nothing. The regex finds everything before the final space in the line, so it assumes your column aliases doesn't contain spaces (so replace spaces with underscore). Or if you don't like "1 row per element" then always use "as" keyword to give you a handle that regex can grasp.
i have a table [Company] with a column [Address3] defined as varchar(50)
i can not control the values entered into that table - but i need to extract the values without leading and trailing spaces. i perform the following query:
SELECT DISTINCT RTRIM(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
the column contain both rtl and ltr values
most of the data retrieved is retrieved correctly - but SOME (not all) RTL values are returned with leading and or trailing spaces
i attempted to perform the following query:
SELECT DISTINCT ltrim(rTRIM(ltrim(rTRIM([Address3])))) c, ltrim(rTRIM([Address3])) b, [Address3] a, rtrim(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
but it returned the same problem on all columns - anyone has any idea what could cause it?
The rows that return with extraneous spaces might have a kind of space or invisible character the trim functions don't know about. The documentation doesn't even mention what is considered "a blank" (pretty damn sloppy if you ask me). Try taking one of those rows and looking at the characters one by one to see what character they are.
since you are using varchar, just do this to get the ascii code of all the bad characters
--identify the bad character
SELECT
COUNT(*) AS CountOf
,'>'+RIGHT(LTRIM(RTRIM(Address3)),1)+'<' AS LastChar_Display
,ASCII(RIGHT(LTRIM(RTRIM(Address3)),1)) AS LastChar_ASCII
FROM Company
GROUP BY RIGHT(LTRIM(RTRIM(Address3)),1)
ORDER BY 3 ASC
do a one time fix to data to remove the bogus character, where xxxx is the ASCII value identified in the previous select:
--only one bad character found in previous query
UPDATE Company
SET Address3=REPLACE(Address3,CHAR(xxxx),'')
--multiple different bad characters found by previous query
UPDATE Company
SET Address3=REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')
if you have bogus chars in your data remove them from the data and not each time you select the data. you WILL have to add this REPLACE logic to all INSERTS and UPDATES on this column, to keep any new data from having the bogus characters.
If you can't alter the data, you can just select it this way:
SELECT
LTRIM(RTRIM(REPLACE(Address3,CHAR(xxxx),'')))
,LTRIM(RTRIM(REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')))
...