I need to create a table in Hive to hold data like the one below:
Column 1 -- account id String(11 characters)
Column 2 -- Age int
Column 3 -- duplicate account_id
The data is stored in a text file delimited by spaces, but the last column can hold multiple values. When querying, I will need to exclude a row if a given value is present in that column.
Example text file:
Thomsxx3125 25 Davidxx3125 Raghuxx3125 Vijayxx3125 Gracexx3125
Appreciate your help on this please.
You can't create duplicate column names.
Here is a query that will work:
create table if not exists name_of_table
(
account_id string comment '11 characters',
age int,
account_id2 string
)
row format delimited
fields terminated by ' '
stored as textfile;
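Note that with a single-space field delimiter and three columns, only the first of the trailing account ids will typically land in the third column. One possible way to keep the whole tail, sketched under the assumption that Hive's built-in RegexSerDe is available, is to capture everything after the second field as one string and split it at query time. The table name accounts_raw and the column name duplicate_ids are hypothetical.

```sql
-- Hypothetical table. Every column is declared as string and cast at query time
-- (assumption: this keeps the sketch usable with RegexSerDe versions that only accept strings).
create table if not exists accounts_raw
(
  account_id    string comment '11 characters',
  age           string,
  duplicate_ids string comment 'space-separated account ids'
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ("input.regex" = "([^ ]*) ([^ ]*) (.*)")
stored as textfile;

-- Skip rows whose trailing list contains a given account id.
select account_id, cast(age as int) as age
from accounts_raw
where not array_contains(split(duplicate_ids, ' '), 'Davidxx3125');
```

Lines that do not match the pattern (for example, a line with no trailing ids and no second space) come back as NULLs, so the regex may need adjusting for your data.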
You can also refer to the official documentation for Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
I have fixed-width files stored in an S3 location and need to create an external Hive table on top of them. Below are the options I tried:
Option 1: Create a table with a single column, then use SQL substring to split it into multiple columns based on length and index.
CREATE EXTERNAL TABLE `tbl`(
line string)
ROW FORMAT delimited
fields terminated by '/n'
stored as textfile
LOCATION 's3://bucket/folder/';
option 2: Use RegexSerDe to segregate the data into different columns:
CREATE EXTERNAL TABLE `tbl`(
col1 string ,
col2 string ,
col3 string ,
col4_date string ,
col5 string )
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})(.{16})(.{19})(.*)")
LOCATION 's3://bucket/folder/';
Neither of the above options returns any records.
select * from tbl;
OK
Time taken: 0.086 seconds
Background:
I am making a db for a reservations calendar. The reservations are hourly, so I need to insert many items into one column called "hours_reserved".
Example tables of what I need:
Table "Space"
Column / Values
id / 1
date / 5.2.2020
hours / { 8-10, 10-12 }
Table "reservation"
Column / Values
id / 1
space_id / 1
date / 5.2.2020
reserved_hours / 8-10
Table "reservation"
Column / Values
id / 2
space_id / 1
date / 5.2.2020
hours / 10-12
So I need to have multiple items inserted into "space" table "hours" column.
How do I do this in Postgres?
Also is there a better way to accomplish this?
There is more than one way to do this, depending on the type of the hours field (i.e. text[], json or jsonb). I'd go with jsonb, just because you can do a lot of things with it and you'll find this experience to be useful in the short term.
CREATE TABLE "public"."space"
("id" SERIAL, "date_schedule" date, "hours" jsonb, PRIMARY KEY ("id"))
Whenever you insert a manually crafted record into this table, write the value as text (a single-quoted JSON value) and cast it to jsonb:
insert into "space"
(date_schedule,hours)
values
('05-02-2020'::date, '["8-10", "10-12"]'::jsonb);
There is more than one way to match these available hours against the reservations and you can take a look at the docs, on the json and jsonb operations. For example, doing:
SELECT id,date_schedule, jsonb_array_elements(hours) hours FROM "public"."space"
would yield one row per array element, with each hours value still wrapped in these ugly double quotes (which is correct: since json can hold several kinds of scalars, that column is polymorphic :D)
However, you can perform a little transformation to remove them and be able to perform a join with reservations
with unnested as (
SELECT id,date_schedule, jsonb_array_elements(hours) hours FROM "public"."space"
)
select id,date_schedule,replace(hours::text, '"','') from unnested
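As an alternative to stripping the quotes with replace(), PostgreSQL 9.4 and later provide jsonb_array_elements_text, which returns each array element as plain text. A minimal sketch of the join against reservations, assuming the reservation table has the space_id and reserved_hours columns shown in the question (the aliases are illustrative only):

```sql
SELECT s.id,
       s.date_schedule,
       h.slot,
       r.id AS reservation_id        -- NULL when the slot is still free
FROM "public"."space" AS s
CROSS JOIN LATERAL jsonb_array_elements_text(s.hours) AS h(slot)
LEFT JOIN reservation AS r
       ON r.space_id = s.id
      AND r.reserved_hours = h.slot;
```

Because the elements come back as text, they can be compared with a text column directly, with no cast or replace step.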
The same can be achieved by defining the field as text[] (the insertion syntax is different but trivial). In that scenario each row holds the hours as a native array, which you can unwrap as:
SELECT id,date_schedule, unnest(hours) FROM "public"."space"
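For completeness, here is a rough sketch of the text[] variant end to end. The table name space_arr is hypothetical and is used only to keep it apart from the jsonb version; the date literal is copied from the jsonb insert.

```sql
-- Hypothetical table name, same columns as before but with a text[] hours column.
CREATE TABLE "public"."space_arr"
("id" SERIAL, "date_schedule" date, "hours" text[], PRIMARY KEY ("id"));

-- Array literal built with the ARRAY constructor.
INSERT INTO "public"."space_arr" (date_schedule, hours)
VALUES ('05-02-2020'::date, ARRAY['8-10', '10-12']);

-- One row per hour slot, with no quotes to strip.
SELECT id, date_schedule, unnest(hours) AS hours FROM "public"."space_arr";

-- Rows whose array contains a given slot.
SELECT id FROM "public"."space_arr" WHERE '8-10' = ANY (hours);
```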
Apparently
ALTER TABLE mytable
ADD COLUMN myarray text[];
Works fine.
I got the following problem when trying to PUT (update) into that column using Postman (create works fine):
{
"myarray": ["8-10"]
}
Results in:
"message": "error: invalid input syntax for type integer:
\"{\"myarray\":[\"8-10\"]}\""
Desired output-
Emp_name|Hobbies|age|DOB
______________________________
LOPEZ |Football , Swimming , Fishing |19| 1999-05-11
Here, the Hobbies column holds multiple comma-separated values, but I want them all in a single field.
All the hobbies for one employee should be a single record, i.e. multiple values in one record.
And the result should be displayed in one row.
Please help me creating a table and way to insert and fetch the record in postgres DB.
So we want to reformat your 'hobbies' string to an array. You can use the ARRAY_AGG() function:
CREATE TABLE new_hobbies AS
SELECT
name
, age
, dob
, ARRAY_AGG(hobbies) AS hobbies
FROM your_table -- placeholder: replace with your source table name
GROUP BY
name
, age
, dob
But yeah, I agree with sticky bit's answer that normalisation of this single table would be a good idea. It would also be better not to store an age value at all, to avoid issues with updates.
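A rough sketch of what such a normalised layout could look like in Postgres. All table and column names here (employee, employee_hobby, emp_id, hobby) are hypothetical, and the age is derived from dob at query time instead of being stored:

```sql
-- Hypothetical normalised layout: one row per employee, one row per hobby.
CREATE TABLE employee (
    emp_id   serial PRIMARY KEY,
    emp_name text NOT NULL,
    dob      date NOT NULL
);

CREATE TABLE employee_hobby (
    emp_id integer NOT NULL REFERENCES employee (emp_id),
    hobby  text    NOT NULL,
    PRIMARY KEY (emp_id, hobby)
);

INSERT INTO employee (emp_name, dob) VALUES ('LOPEZ', '1999-05-11');

-- One hobby row per array element for the employee just inserted.
INSERT INTO employee_hobby (emp_id, hobby)
SELECT emp_id, h
FROM employee, unnest(ARRAY['Football', 'Swimming', 'Fishing']) AS h
WHERE emp_name = 'LOPEZ';

-- Collapse the hobbies back into one comma-separated field per employee.
SELECT e.emp_name,
       string_agg(h.hobby, ' , ' ORDER BY h.hobby) AS hobbies,
       date_part('year', age(e.dob))              AS age,
       e.dob
FROM employee AS e
JOIN employee_hobby AS h USING (emp_id)
GROUP BY e.emp_id, e.emp_name, e.dob;
```

string_agg produces a single text value; ARRAY_AGG could be swapped in if an array is preferred.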
Hello, I am using Redshift, where I have a staging table and a base table. One of the columns (city) in my base table has data type varchar with length 100. When I insert the column value from the staging table into the base table, I want the value to be truncated to the first (leftmost) 100 characters. Is this possible in Redshift?
INSERT into base_table(org_city) select substring(city,0,100) from staging_table;
I tried using the above query but it failed. Any solutions please ?
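Try this! SUBSTRING positions in Redshift start at 1, not 0. With a start position of 0, substring(city,0,100) returns only the first 99 characters, so it does not give you the leftmost 100. Start at position 1 instead:
INSERT into base_table(org_city) select substring(city,1,100) from staging_table;
If the insert still fails, keep in mind that VARCHAR(100) in Redshift means 100 bytes, so a value with multibyte characters can overflow the column even after it is cut to 100 characters.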
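Since string positions in Redshift are 1-based, another way to take the leftmost 100 characters is the LEFT function. The sketch below also includes a diagnostic query, on the assumption that a failed insert may come from multibyte characters: Redshift counts VARCHAR length in bytes, while LEFT and SUBSTRING count characters.

```sql
-- LEFT(city, 100) keeps the leftmost 100 characters.
INSERT INTO base_table (org_city)
SELECT LEFT(city, 100)
FROM staging_table;

-- Diagnostic: rows whose first 100 characters still need more than 100 bytes.
SELECT city,
       LEN(LEFT(city, 100))          AS chars,
       OCTET_LENGTH(LEFT(city, 100)) AS bytes
FROM staging_table
WHERE OCTET_LENGTH(LEFT(city, 100)) > 100;
```

If the diagnostic query returns rows, those values would need a shorter cut or a wider target column.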
My table looks like below,
Table Name: Number_List
Columns Name: Num_ID INTEGER
First_Number VARCHAR(16)
Last_Number VARCHAR(16)
Num_ID is the PK, and the other two columns, First_Number and Last_Number, always hold an 8-digit number.
My requirement is to update those columns to 6-digit entries.
Suppose the entries in the two columns are 32659814 (First_Number) and 32659819 (Last_Number). I need to write an update query to change the entries in the table to 326598 (First_Number) and 326598 (Last_Number).
This table has 15K entries, and I need to update all of them with a single query in a single execution.
Please help me to resolve this.
TIA.
All you need is SUBSTR:
UPDATE SESSION.NUMBER_LIST
SET FIRST_NUMBER = SUBSTR(FIRST_NUMBER, 1,6)
,LAST_NUMBER = SUBSTR(LAST_NUMBER, 1,6)
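Before running the update on all 15K rows, it may be worth previewing the result. A small sketch, assuming SUBSTR and LENGTH behave in your database as they do in most SQL dialects; the NEW_FIRST and NEW_LAST aliases are illustrative only:

```sql
-- Preview the old and new values side by side.
SELECT NUM_ID,
       FIRST_NUMBER, SUBSTR(FIRST_NUMBER, 1, 6) AS NEW_FIRST,
       LAST_NUMBER,  SUBSTR(LAST_NUMBER, 1, 6)  AS NEW_LAST
FROM SESSION.NUMBER_LIST
WHERE LENGTH(FIRST_NUMBER) = 8
  AND LENGTH(LAST_NUMBER) = 8;

-- Count rows that do not hold 8 characters in both columns and may need a separate look.
SELECT COUNT(*)
FROM SESSION.NUMBER_LIST
WHERE LENGTH(FIRST_NUMBER) <> 8
   OR LENGTH(LAST_NUMBER) <> 8;
```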