updating a date deletes the row in sqlite android - android-sqlite

I have a problem: when I try to update a field that contains a date (as text), execSQL(sql) or rawQuery(sql) deletes the row.
I don't know why the row is deleted. If someone can help me, I'll be grateful.
CREATE TABLE [Study](
[IdMedicalStudy] INTEGER PRIMARY KEY ASC AUTOINCREMENT NOT NULL UNIQUE,
[Fecha] TEXT,
[IdDoctor] INTEGER NOT NULL REFERENCES [Doctors]([IdDoctor]) ON UPDATE CASCADE,
[IdTypeOfMedicalStudy] INTEGER NOT NULL REFERENCES [TypeOfMedicalStudy]([IdTTypeOfMedicalStudy]) ON UPDATE CASCADE,
[IdMedicalStudyPlace] INTEGER NOT NULL REFERENCES [MedicalStudyPlace]([IdMedicalStudyPlace]) ON UPDATE CASCADE);
public void updateRow (ContentValues contentValues) {
SQLiteDatabase db = this.getWritableDatabase();
String sql = "UPDATE " + ConstDB.TABLE_MEDICAL_STUDY + " SET "
+ ConstDB.TABLE_MEDICAL_STUDY_DATE + " = '"
+ contentValues.get(ConstDB.TABLE_MEDICAL_STUDY_DATE).toString().trim()
+ "', " + ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY _PLACE + " = "
+ contentValues.get(ConstDB.ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY _PLACE)
+ ", " + ConstDB.TABLE_MEDICAL_STUDY_ID_DOCTOR + " = "
+ contentValues.get(ConstDB.TABLE_MEDICAL_STUDY_ID_DOCTOR)
+ ", " + ConstDB.TABLE_TABLE_ID_TYPE OF MEDICAL STUDY + " = "
+ contentValues.get(ConstDB.TABLE_ID_TYPE OF MEDICAL STUDY)
+ " WHERE " + ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY + " = "
+ contentValues.get(ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY);
try {
db.execSQL(sql);
db.close();
Toast.makeText(context, "Row updated", Toast.LENGTH_SHORT).show();
db.close();
} catch (SQLException e) {
Toast.makeText(context, e.getMessage(), Toast.LENGTH_SHORT).show();
db.close();
}

There are a number of errors in the above code, such as spaces where spaces are not allowed. But after:
- recreating the code with the errors removed,
- creating some underlying code to create all the required tables,
- adding some methods to add data to the referenced tables and of course the Study table,
- adding some methods to dump the data in the tables:
  - logAll (sequentially retrieves all rows from each table into a cursor and dumps the cursor)
  - logStudy (specific for the Study table, used by logAll)
- changing to use the update convenience method rather than using the execSQL method,
- removing the all-too-confusing try/catch clause,
testing shows that the code in principle (with the above changes/corrections) works and doesn't lose data.
Working Example Code
ConstDB.java
This was constructed based upon your code but may well differ :-
public class ConstDB {
public static final String TABLE_MEDICAL_STUDY_TYPE = "[TypeOfMedicalStudy]";
public static final String TABLE_MEDICAL_STUDY_TYPE_ID = "[IdTTypeOfMedicalStudy]";
public static final String TABLE_MEDICAL_STUDY_TYPE_NAME = "[TypeName]";
public static final String TABLE_MEDICAL_STUDY_DOCTOR = "[Doctors]";
public static final String TABLE_MEDICAL_STUDY_DOCTOR_ID = "[IdDoctor]";
public static final String TABLE_MEDICAL_STUDY_DOCTOR_NAME = "[DoctorName]";
public static final String TABLE_MEDICAL_STUDY_STUDY_PLACE = "[MedicalStudyPlace]";
public static final String TABLE_MEDICAL_STUDY_STUDY_PLACE_ID = "[IdMedicalStudyPlace]";
public static final String TABLE_MEDICAL_STUDY_STUDY_PLACE_NAME = "[PlaceName]";
public static final String TABLE_MEDICAL_STUDY = "[Study]";
public static final String TABLE_MEDICAL_STUDY_DATE = "[fecha]";
public static final String TABLE_MEDICAL_STUDY_ID_DOCTOR = "[IdDoctor]";
public static final String TABLE_TABLE_ID_TYPE_OF_MEDICAL_STUDY = "[IdTypeOfMedicalStudy]";
public static final String TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY = "[IdMedicalStudy]";
public static final String TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY_PLACE = "[IdMedicalStudyPlace]";
//TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY_PLACE
/*
CREATE TABLE [Study](
[IdMedicalStudy] INTEGER PRIMARY KEY ASC AUTOINCREMENT NOT NULL UNIQUE,
[Fecha] TEXT,
[IdDoctor] INTEGER NOT NULL REFERENCES [Doctors]([IdDoctor]) ON UPDATE CASCADE,
[IdTypeOfMedicalStudy] INTEGER NOT NULL REFERENCES [TypeOfMedicalStudy]([IdTTypeOfMedicalStudy]) ON UPDATE CASCADE,
[IdMedicalStudyPlace] INTEGER NOT NULL REFERENCES [MedicalStudyPlace]([IdMedicalStudyPlace]) ON UPDATE CASCADE);
*/
public static final String TABLE_MEDICAL_STUDY_CREATESQL = "CREATE TABLE IF NOT EXISTS " +
TABLE_MEDICAL_STUDY +
"(" +
TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY + " INTEGER PRIMARY KEY ASC AUTOINCREMENT NOT NULL UNIQUE," +
TABLE_MEDICAL_STUDY_DATE + " TEXT," +
TABLE_MEDICAL_STUDY_ID_DOCTOR + " INTEGER NOT NULL REFERENCES " +TABLE_MEDICAL_STUDY_DOCTOR +
"(" + TABLE_MEDICAL_STUDY_DOCTOR_ID + ") ON UPDATE CASCADE," +
TABLE_TABLE_ID_TYPE_OF_MEDICAL_STUDY + " INTEGER NOT NULL REFERENCES " + TABLE_MEDICAL_STUDY_TYPE +
"(" + TABLE_MEDICAL_STUDY_TYPE_ID + ") ON UPDATE CASCADE," +
TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY_PLACE + "INTEGER NOT NULL REFERENCES " + TABLE_MEDICAL_STUDY_STUDY_PLACE +
"(" + TABLE_MEDICAL_STUDY_STUDY_PLACE_ID + ") ON UPDATE CASCADE" +
")";
public static final String TABLE_MEDICAL_STUDY_TYPE_CREATESQL = "CREATE TABLE IF NOT EXISTS " +
TABLE_MEDICAL_STUDY_TYPE +
"(" +
TABLE_MEDICAL_STUDY_TYPE_ID + " INTEGER PRIMARY KEY," +
TABLE_MEDICAL_STUDY_TYPE_NAME + " TEXT" +
")";
public static final String TABLE_MEDICAL_STUDY_DOCTOR_CREATESQL = "CREATE TABLE IF NOT EXISTS " +
TABLE_MEDICAL_STUDY_DOCTOR +
"(" +
TABLE_MEDICAL_STUDY_DOCTOR_ID + " INTEGER PRIMARY KEY," +
TABLE_MEDICAL_STUDY_DOCTOR_NAME + " TEXT" +
")";
public static final String TABLE_MEDICAL_STUDY_STUDY_PLACE_CREATESQL = "CREATE TABLE IF NOT EXISTS " +
TABLE_MEDICAL_STUDY_STUDY_PLACE + "(" +
TABLE_MEDICAL_STUDY_STUDY_PLACE_ID + " INTEGER PRIMARY KEY, " +
TABLE_MEDICAL_STUDY_STUDY_PLACE_NAME + " TEXT" +
")";
}
Note I found the naming conventions very confusing.
DBHelper.java
I didn't see any reference to the class that extends SQLiteOpenHelper (typically referred to as the Database Helper), so this is an equivalent to what you have.
public class DBHelper extends SQLiteOpenHelper {
SQLiteDatabase mDB;
Context mContext;
public static final String DBNAME = "ms";
public static final int DBVERSION = 1;
public DBHelper(Context context) {
super(context, DBNAME, null, DBVERSION);
this.mContext = context;
mDB = this.getWritableDatabase();
}
@Override
public void onCreate(SQLiteDatabase db) {
db.execSQL(TABLE_MEDICAL_STUDY_DOCTOR_CREATESQL);
db.execSQL(TABLE_MEDICAL_STUDY_TYPE_CREATESQL);
db.execSQL(TABLE_MEDICAL_STUDY_STUDY_PLACE_CREATESQL);
db.execSQL(TABLE_MEDICAL_STUDY_CREATESQL);
}
@Override
public void onConfigure(SQLiteDatabase db) {
super.onConfigure(db);
db.setForeignKeyConstraintsEnabled(true);
}
@Override
public void onUpgrade(SQLiteDatabase db, int i, int i1) {
}
public long addStudyType(String studyTypeName) {
ContentValues cv = new ContentValues();
cv.put(TABLE_MEDICAL_STUDY_TYPE_NAME,studyTypeName);
return mDB.insert(TABLE_MEDICAL_STUDY_TYPE,null,cv);
}
public long addDoctor(String doctorName) {
ContentValues cv = new ContentValues();
cv.put(TABLE_MEDICAL_STUDY_DOCTOR_NAME,doctorName);
return mDB.insert(TABLE_MEDICAL_STUDY_DOCTOR,null,cv);
}
public long addPlace(String placeName) {
ContentValues cv = new ContentValues();
cv.put(TABLE_MEDICAL_STUDY_STUDY_PLACE_NAME,placeName);
return mDB.insert(TABLE_MEDICAL_STUDY_STUDY_PLACE,null,cv);
}
public long addStudy(String date, long doctor, long type, long place) {
ContentValues cv = new ContentValues();
cv.put(TABLE_MEDICAL_STUDY_DATE,date);
cv.put(TABLE_MEDICAL_STUDY_ID_DOCTOR,doctor);
cv.put(TABLE_TABLE_ID_TYPE_OF_MEDICAL_STUDY,type);
cv.put(TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY_PLACE,place);
return mDB.insert(TABLE_MEDICAL_STUDY,null,cv);
}
public void updateRow (ContentValues contentValues) {
SQLiteDatabase db = this.getWritableDatabase();
String sql = "UPDATE " + ConstDB.TABLE_MEDICAL_STUDY + " SET "
+ ConstDB.TABLE_MEDICAL_STUDY_DATE + " = '"
+ contentValues.get(ConstDB.TABLE_MEDICAL_STUDY_DATE).toString().trim()
//<<<<<<<<<< COMMENTED OUT + "', " + ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY _PLACE + " = " //<<<<<<<<<< extra space
+ "', " + ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY_PLACE + " = "
//<<<<<<<<<< COMMENTED OUT+ contentValues.get(ConstDB.ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY _PLACE) //<<<<<<<<<< extra space
+ contentValues.get(ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY_PLACE)
+ ", " + ConstDB.TABLE_MEDICAL_STUDY_ID_DOCTOR + " = "
+ contentValues.get(ConstDB.TABLE_MEDICAL_STUDY_ID_DOCTOR)
//<<<<<<<<<< COMMENTED OUT + ", " + ConstDB.TABLE_TABLE_ID_TYPE OF MEDICAL STUDY + " = "
+ ", " + ConstDB.TABLE_TABLE_ID_TYPE_OF_MEDICAL_STUDY + " = "
//<<<<<<<<<< COMMENTED OUT+ contentValues.get(ConstDB.TABLE_ID_TYPE OF MEDICAL STUDY)
+ contentValues.get((ConstDB.TABLE_TABLE_ID_TYPE_OF_MEDICAL_STUDY))
+ " WHERE " + ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY + " = "
+ contentValues.get(ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY);
//<<<<<<<<<< try/catch block will not necessarily indicate that the row was not updated
//<<<<<<<<<< Better way is to use the update convenience method which returns the number of rows updated
/*
try {
db.execSQL(sql);
db.close();
Toast.makeText(mContext, "Row updated", Toast.LENGTH_SHORT).show();
db.close();
} catch (SQLException e) {
Toast.makeText(mContext, e.getMessage(), Toast.LENGTH_SHORT).show();
db.close();
}
*/
ContentValues cv = new ContentValues();
cv.put(ConstDB.TABLE_MEDICAL_STUDY_DATE, contentValues.get(ConstDB.TABLE_MEDICAL_STUDY_DATE).toString().trim());
cv.put(ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY_PLACE, contentValues.getAsLong(ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY_PLACE));
cv.put(ConstDB.TABLE_MEDICAL_STUDY_ID_DOCTOR, contentValues.getAsLong(ConstDB.TABLE_MEDICAL_STUDY_ID_DOCTOR));
cv.put(TABLE_TABLE_ID_TYPE_OF_MEDICAL_STUDY, contentValues.getAsLong(ConstDB.TABLE_TABLE_ID_TYPE_OF_MEDICAL_STUDY));
String whereclause = ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY + "=?";
String[] whereargs = new String[]{String.valueOf(contentValues.getAsLong(ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY))};
int rv = mDB.update(ConstDB.TABLE_MEDICAL_STUDY, cv, whereclause, whereargs);
if (rv < 1) {
Toast.makeText(mContext,"Row not updated.",Toast.LENGTH_SHORT);
} else {
Toast.makeText(mContext,"Updated " + String.valueOf(rv) + " rows",Toast.LENGTH_LONG);
}
}
public void logAll() {
Cursor c;
c = mDB.query(ConstDB.TABLE_MEDICAL_STUDY_TYPE,null,null,null,null,null,null);
DatabaseUtils.dumpCursor(c);
c = mDB.query(ConstDB.TABLE_MEDICAL_STUDY_DOCTOR,null,null,null,null,null,null);
DatabaseUtils.dumpCursor(c);
c= mDB.query(TABLE_MEDICAL_STUDY_STUDY_PLACE,null,null,null,null,null,null);
DatabaseUtils.dumpCursor(c);
logStudy();
}
public void logStudy() {
DatabaseUtils.dumpCursor(mDB.query(TABLE_MEDICAL_STUDY,null,null,null,null,null,null));
}
public ContentValues prepareCVForUpdate(long studyId, String studyDate, long doctorId, long type, long place) {
ContentValues rv = new ContentValues();
rv.put(ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY,studyId);
rv.put(ConstDB.TABLE_MEDICAL_STUDY_DATE,studyDate);
rv.put(ConstDB.TABLE_MEDICAL_STUDY_ID_DOCTOR,doctorId);
rv.put(ConstDB.TABLE_TABLE_ID_TYPE_OF_MEDICAL_STUDY,type);
rv.put(ConstDB.TABLE_MEDICAL_STUDY_ID_MEDICAL_STUDY_PLACE,place);
return rv;
}
}
MainActivity.java
This is a simple Activity that was used to test creating the tables, populating them with some testing data and finally updating the data with the modified/corrected updateRow method :-
public class MainActivity extends AppCompatActivity {
DBHelper mDBHlpr;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
mDBHlpr = new DBHelper(this);
mDBHlpr.addDoctor("Mary");
mDBHlpr.addDoctor("Sue");
mDBHlpr.addDoctor("Fred");
mDBHlpr.addDoctor("Tom");
mDBHlpr.addStudyType("Skeletal");
mDBHlpr.addStudyType("Cardial");
mDBHlpr.addStudyType("Cranial");
mDBHlpr.addStudyType("Abdominal");
mDBHlpr.addPlace("Home");
mDBHlpr.addPlace("St. Barts");
mDBHlpr.addPlace("Paddington");
mDBHlpr.addPlace("London School of Medicine");
mDBHlpr.addStudy("2019-01-01 10:30",2,3,3);
mDBHlpr.addStudy("2019-01-02 12:45",1,1,1);
mDBHlpr.addStudy("2019-01-03 15:25",3,2,4);
mDBHlpr.logAll();
mDBHlpr.updateRow(mDBHlpr.prepareCVForUpdate(2,"2019-01-02 16:45",4,4,2));
mDBHlpr.logStudy();
}
}
Results
With regard to the update, the row with the id of 2 was originally added as :-
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdMedicalStudy=2
2019-01-17 14:34:23.206 4595-4595/? I/System.out: fecha=2019-01-02 12:45
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdDoctor=1
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdTypeOfMedicalStudy=1
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdMedicalStudyPlace=1
After the update it becomes :-
2019-01-17 14:34:23.211 4595-4595/? I/System.out: IdMedicalStudy=2
2019-01-17 14:34:23.211 4595-4595/? I/System.out: fecha=2019-01-02 16:45
2019-01-17 14:34:23.211 4595-4595/? I/System.out: IdDoctor=4
2019-01-17 14:34:23.212 4595-4595/? I/System.out: IdTypeOfMedicalStudy=4
2019-01-17 14:34:23.212 4595-4595/? I/System.out: IdMedicalStudyPlace=2
Which tallies with
mDBHlpr.updateRow(mDBHlpr.prepareCVForUpdate(2,"2019-01-02 16:45",4,4,2));
Saying: update the Study row that has an ID of 2 to
- have a date of 2019-01-02 16:45
- reference the row from the Doctors table that has an ID of 4 (Tom)
- reference the row from the Type table that has an ID of 4 (Abdominal)
- reference the row from the Place table that has an ID of 2 (St. Barts)
Which, as can be seen, it has done.
The log in full was :-
2019-01-17 14:34:23.201 4595-4595/? I/System.out: >>>>> Dumping cursor android.database.sqlite.SQLiteCursor@fe54ec2
2019-01-17 14:34:23.201 4595-4595/? I/System.out: 0 {
2019-01-17 14:34:23.201 4595-4595/? I/System.out: IdTTypeOfMedicalStudy=1
2019-01-17 14:34:23.201 4595-4595/? I/System.out: TypeName=Skeletal
2019-01-17 14:34:23.201 4595-4595/? I/System.out: }
2019-01-17 14:34:23.201 4595-4595/? I/System.out: 1 {
2019-01-17 14:34:23.201 4595-4595/? I/System.out: IdTTypeOfMedicalStudy=2
2019-01-17 14:34:23.201 4595-4595/? I/System.out: TypeName=Cardial
2019-01-17 14:34:23.201 4595-4595/? I/System.out: }
2019-01-17 14:34:23.201 4595-4595/? I/System.out: 2 {
2019-01-17 14:34:23.201 4595-4595/? I/System.out: IdTTypeOfMedicalStudy=3
2019-01-17 14:34:23.202 4595-4595/? I/System.out: TypeName=Cranial
2019-01-17 14:34:23.202 4595-4595/? I/System.out: }
2019-01-17 14:34:23.202 4595-4595/? I/System.out: 3 {
2019-01-17 14:34:23.202 4595-4595/? I/System.out: IdTTypeOfMedicalStudy=4
2019-01-17 14:34:23.202 4595-4595/? I/System.out: TypeName=Abdominal
2019-01-17 14:34:23.202 4595-4595/? I/System.out: }
2019-01-17 14:34:23.202 4595-4595/? I/System.out: <<<<<
2019-01-17 14:34:23.202 4595-4595/? I/System.out: >>>>> Dumping cursor android.database.sqlite.SQLiteCursor@eadd4d3
2019-01-17 14:34:23.202 4595-4595/? I/System.out: 0 {
2019-01-17 14:34:23.202 4595-4595/? I/System.out: IdDoctor=1
2019-01-17 14:34:23.202 4595-4595/? I/System.out: DoctorName=Mary
2019-01-17 14:34:23.202 4595-4595/? I/System.out: }
2019-01-17 14:34:23.202 4595-4595/? I/System.out: 1 {
2019-01-17 14:34:23.203 4595-4595/? I/System.out: IdDoctor=2
2019-01-17 14:34:23.203 4595-4595/? I/System.out: DoctorName=Sue
2019-01-17 14:34:23.203 4595-4595/? I/System.out: }
2019-01-17 14:34:23.203 4595-4595/? I/System.out: 2 {
2019-01-17 14:34:23.203 4595-4595/? I/System.out: IdDoctor=3
2019-01-17 14:34:23.203 4595-4595/? I/System.out: DoctorName=Fred
2019-01-17 14:34:23.203 4595-4595/? I/System.out: }
2019-01-17 14:34:23.203 4595-4595/? I/System.out: 3 {
2019-01-17 14:34:23.203 4595-4595/? I/System.out: IdDoctor=4
2019-01-17 14:34:23.203 4595-4595/? I/System.out: DoctorName=Tom
2019-01-17 14:34:23.203 4595-4595/? I/System.out: }
2019-01-17 14:34:23.203 4595-4595/? I/System.out: <<<<<
2019-01-17 14:34:23.203 4595-4595/? I/System.out: >>>>> Dumping cursor android.database.sqlite.SQLiteCursor@50e1810
2019-01-17 14:34:23.204 4595-4595/? I/System.out: 0 {
2019-01-17 14:34:23.204 4595-4595/? I/System.out: IdMedicalStudyPlace=1
2019-01-17 14:34:23.204 4595-4595/? I/System.out: PlaceName=Home
2019-01-17 14:34:23.204 4595-4595/? I/System.out: }
2019-01-17 14:34:23.204 4595-4595/? I/System.out: 1 {
2019-01-17 14:34:23.204 4595-4595/? I/System.out: IdMedicalStudyPlace=2
2019-01-17 14:34:23.204 4595-4595/? I/System.out: PlaceName=St. Barts
2019-01-17 14:34:23.204 4595-4595/? I/System.out: }
2019-01-17 14:34:23.205 4595-4595/? I/System.out: 2 {
2019-01-17 14:34:23.205 4595-4595/? I/System.out: IdMedicalStudyPlace=3
2019-01-17 14:34:23.205 4595-4595/? I/System.out: PlaceName=Paddington
2019-01-17 14:34:23.205 4595-4595/? I/System.out: }
2019-01-17 14:34:23.205 4595-4595/? I/System.out: 3 {
2019-01-17 14:34:23.205 4595-4595/? I/System.out: IdMedicalStudyPlace=4
2019-01-17 14:34:23.205 4595-4595/? I/System.out: PlaceName=London School of Medicine
2019-01-17 14:34:23.205 4595-4595/? I/System.out: }
2019-01-17 14:34:23.205 4595-4595/? I/System.out: <<<<<
2019-01-17 14:34:23.205 4595-4595/? I/System.out: >>>>> Dumping cursor android.database.sqlite.SQLiteCursor@eb54f09
2019-01-17 14:34:23.205 4595-4595/? I/System.out: 0 {
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdMedicalStudy=1
2019-01-17 14:34:23.206 4595-4595/? I/System.out: fecha=2019-01-01 10:30
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdDoctor=2
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdTypeOfMedicalStudy=3
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdMedicalStudyPlace=3
2019-01-17 14:34:23.206 4595-4595/? I/System.out: }
2019-01-17 14:34:23.206 4595-4595/? I/System.out: 1 {
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdMedicalStudy=2
2019-01-17 14:34:23.206 4595-4595/? I/System.out: fecha=2019-01-02 12:45
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdDoctor=1
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdTypeOfMedicalStudy=1
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdMedicalStudyPlace=1
2019-01-17 14:34:23.206 4595-4595/? I/System.out: }
2019-01-17 14:34:23.206 4595-4595/? I/System.out: 2 {
2019-01-17 14:34:23.206 4595-4595/? I/System.out: IdMedicalStudy=3
2019-01-17 14:34:23.207 4595-4595/? I/System.out: fecha=2019-01-03 15:25
2019-01-17 14:34:23.207 4595-4595/? I/System.out: IdDoctor=3
2019-01-17 14:34:23.207 4595-4595/? I/System.out: IdTypeOfMedicalStudy=2
2019-01-17 14:34:23.207 4595-4595/? I/System.out: IdMedicalStudyPlace=4
2019-01-17 14:34:23.207 4595-4595/? I/System.out: }
2019-01-17 14:34:23.207 4595-4595/? I/System.out: <<<<<
2019-01-17 14:34:23.211 4595-4595/? I/System.out: >>>>> Dumping cursor android.database.sqlite.SQLiteCursor@adad2c5
2019-01-17 14:34:23.211 4595-4595/? I/System.out: 0 {
2019-01-17 14:34:23.211 4595-4595/? I/System.out: IdMedicalStudy=1
2019-01-17 14:34:23.211 4595-4595/? I/System.out: fecha=2019-01-01 10:30
2019-01-17 14:34:23.211 4595-4595/? I/System.out: IdDoctor=2
2019-01-17 14:34:23.211 4595-4595/? I/System.out: IdTypeOfMedicalStudy=3
2019-01-17 14:34:23.211 4595-4595/? I/System.out: IdMedicalStudyPlace=3
2019-01-17 14:34:23.211 4595-4595/? I/System.out: }
2019-01-17 14:34:23.211 4595-4595/? I/System.out: 1 {
2019-01-17 14:34:23.211 4595-4595/? I/System.out: IdMedicalStudy=2
2019-01-17 14:34:23.211 4595-4595/? I/System.out: fecha=2019-01-02 16:45
2019-01-17 14:34:23.211 4595-4595/? I/System.out: IdDoctor=4
2019-01-17 14:34:23.212 4595-4595/? I/System.out: IdTypeOfMedicalStudy=4
2019-01-17 14:34:23.212 4595-4595/? I/System.out: IdMedicalStudyPlace=2
2019-01-17 14:34:23.212 4595-4595/? I/System.out: }
2019-01-17 14:34:23.212 4595-4595/? I/System.out: 2 {
2019-01-17 14:34:23.212 4595-4595/? I/System.out: IdMedicalStudy=3
2019-01-17 14:34:23.212 4595-4595/? I/System.out: fecha=2019-01-03 15:25
2019-01-17 14:34:23.212 4595-4595/? I/System.out: IdDoctor=3
2019-01-17 14:34:23.212 4595-4595/? I/System.out: IdTypeOfMedicalStudy=2
2019-01-17 14:34:23.212 4595-4595/? I/System.out: IdMedicalStudyPlace=4
2019-01-17 14:34:23.212 4595-4595/? I/System.out: }
2019-01-17 14:34:23.212 4595-4595/? I/System.out: <<<<<

Related

Convert unix_timestamp to utc_timestamp using pyspark, unix_timestamp not working

I have a string column that has unix_tstamp in a pyspark dataframe.
unix_tstamp utc_stamp
1547741586462 2019-01-17 16:13:06:462
1547741586562 2019-01-17 16:13:06:562
1547741586662 2019-01-17 16:13:06:662
1547741586762 2019-01-17 16:13:06:762
1547741586862 2019-01-17 16:13:06:862
I want to perform the conversion in exactly the above format, but I'm getting null when I try the method below.
data.withColumn("utc_stamp", unix_timestamp('unix_tstamp',"yyyy-MM-dd'T'HH:mm:ss.SSSZ"))
Am I missing something or is there any other way?
You can specify the format like this:
df = df.withColumn('utc_stamp', F.from_unixtime('Timestamp', format="YYYY-MM-dd HH:mm:ss.SSS"))
df.show(truncate=False)
+----------+-----------------------+
|Timestamp |utc_stamp |
+----------+-----------------------+
|1579887004|2020-01-24 18:30:04.000|
|1579887004|2020-01-24 18:30:04.000|
+----------+-----------------------+
Sample Data
# today's datestamp
d = [[1579887004],
[1579887004],
]
df = spark.createDataFrame(d, ['Timestamp'])

writing spark dataframe by overwriting the values in key as redis list

I have a Redis key into which I have inserted data using the command below.
The format is like this:
lpush key_name 'json data'
lpush test4 '{"id":"358899055773504","start_lat":0,"start_long":0,"end_lat":26.075942,"end_long":83.179573,"start_interval":"2018-02-01 00:01:00","end_interval":"2018-02-01 00:02:00"}'
Now I did some processing in spark scala and got a dataframe like this
id end_interval start_interval start_lat start_long end_lat end_long
866561010400483 2018-02-01 00:02:00 2018-02-01 00:01:00 0 0 26.075942 83.179573
358899055773504 2018-08-02 04:57:29 2018-08-01 21:35:52 22.684658 75.909716 22.684658 75.909716
862304021520161 2018-02-01 00:02:00 2018-02-01 00:01:00 0 0 26.075942 83.179573
862304021470656 2018-08-02 05:25:11 2018-08-02 00:03:21 26.030764 75.180587 26.030764 75.180587
351608081284031 2018-08-02 05:22:10 2018-08-02 05:06:17 17.117284 78.269013 17.117284 78.269013
866561010407496 2018-02-01 00:02:00 2018-02-01 00:01:00 0 0 26.075942 83.179573
862304021504975 2018-02-01 00:02:00 2018-02-01 00:01:00 0 0 26.075942 83.179573
866561010407868 2018-02-01 00:02:00 2018-02-01 00:01:00 0 0 26.075942 83.179573
862304021483931 2018-02-01 00:02:00 2018-02-01 00:01:00 0 0 26.075942 83.179573
I want to insert this dataframe into the same key(test4) by overwriting it as a redis list(as it was before but now with the rows of dataframe)
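A minimal sketch of one way to do this (not from the original post): collect the dataframe as JSON strings on the driver and rewrite the list with the Jedis client. It assumes the dataframe (here called df) is small enough to collect, that a Jedis dependency is on the classpath, and that Redis is reachable at the placeholder host/port:
import redis.clients.jedis.Jedis
// serialize each dataframe row as a JSON string, one per future list element
val rows: Array[String] = df.toJSON.collect()
val jedis = new Jedis("localhost", 6379) // placeholder host/port
jedis.del("test4") // remove the old list so the key is overwritten
rows.foreach(row => jedis.lpush("test4", row)) // repopulate the key, mirroring the original lpush usage
jedis.close()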

how to split row into multiple rows on the basis of date using spark scala?

I have a dataframe that contains rows like below, and I need to split this data to get a month-wise series on the basis of pa_start_date and pa_end_date, creating new period start and end date columns.
i/p dataframe df is
p_id pa_id p_st_date p_end_date pa_start_date pa_end_date
p1 pa1 2-Jan-18 5-Dec-18 2-Mar-18 8-Aug-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18
p1 pa3 1-Jan-17 1-Dec-17 9-Feb-17 20-Apr-17
o/p is
p_id pa_id p_st_date p_end_date pa_start_date pa_end_date period_start_date period_end_date
p1 pa1 2-Jan-18 5-Dec-18 2-Mar-18 8-Aug-18 2-Mar-18 31-Mar-18
p1 pa1 2-Jan-18 5-Dec-18 2-Mar-18 8-Aug-18 1-Apr-18 30-Apr-18
p1 pa1 2-Jan-18 5-Dec-18 2-Mar-18 8-Aug-18 1-May-18 31-May-18
p1 pa1 2-Jan-18 5-Dec-18 2-Mar-18 8-Aug-18 1-Jun-18 30-Jun-18
p1 pa1 2-Jan-18 5-Dec-18 2-Mar-18 8-Aug-18 1-Jul-18 31-Jul-18
p1 pa1 2-Jan-18 5-Dec-18 2-Mar-18 8-Aug-18 1-Aug-18 31-Aug-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18 6-Mar-18 31-Mar-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18 1-Apr-18 30-Apr-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18 1-May-18 31-May-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18 1-Jun-18 30-Jun-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18 1-Jul-18 31-Jul-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18 1-Aug-18 31-Aug-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18 1-Sep-18 30-Sep-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18 1-Oct-18 31-Oct-18
p1 pa2 3-Jan-18 8-Dec-18 6-Mar-18 10-Nov-18 1-Nov-18 30-Nov-18
p1 pa3 1-Jan-17 1-Dec-17 9-Feb-17 20-Apr-17 9-Feb-17 28-Feb-17
p1 pa3 1-Jan-17 1-Dec-17 9-Feb-17 20-Apr-17 1-Mar-17 31-Mar-17
p1 pa3 1-Jan-17 1-Dec-17 9-Feb-17 20-Apr-17 1-Apr-17 30-Apr-17
I have done this by creating a UDF like below.
This UDF creates an array of dates (dates from all the months, inclusive of the start and end dates) from the pa_start_date and the number of months between pa_start_date and pa_end_date passed as parameters.
def udfFunc: ((Date, Long) => Array[String]) = {
(d, l) =>
{
var t = LocalDate.fromDateFields(d)
val dates: Array[String] = new Array[String](l.toInt)
for (i <- 0 until l.toInt) {
println(t)
dates(i) = t.toString("YYYY-MM-dd")
t = LocalDate.fromDateFields(t.toDate()).plusMonths(1)
}
dates
}
}
val my_udf = udf(udfFunc)
And the final dataframe is created as below.
val df = ss.read.format("csv").option("header", true).load(path)
.select($"p_id", $"pa_id", $"p_st_date", $"p_end_date", $"pa_start_date", $"pa_end_date",
my_udf(to_date(col("pa_start_date"), "dd-MMM-yy"), ceil(months_between(to_date(col("pa_end_date"), "dd-MMM-yy"), to_date(col("pa_start_date"), "dd-MMM-yy")))).alias("udf")) // gives array of dates from UDF
.withColumn("after_divide", explode($"udf")) // divide array of dates to individual rows
.withColumn("period_end_date", date_format(last_day($"after_divide"), "dd-MMM-yy")) // fetching the end_date for the particular date
.drop("udf")
.withColumn("row_number", row_number() over (Window.partitionBy("p_id", "pa_id", "p_st_date", "p_end_date", "pa_start_date", "pa_end_date").orderBy(col("after_divide").asc))) // just helper column for calculating `period_start_date` below
.withColumn("period_start_date", date_format(when(col("row_number").isin(1), $"after_divide").otherwise(trunc($"after_divide", "month")), "dd-MMM-yy"))
.drop("after_divide")
.drop("row_number") // dropping all the helper columns which is not needed in output.
And here is the output.
+----+-----+---------+----------+-------------+-----------+---------------+-----------------+
|p_id|pa_id|p_st_date|p_end_date|pa_start_date|pa_end_date|period_end_date|period_start_date|
+----+-----+---------+----------+-------------+-----------+---------------+-----------------+
| p1| pa3| 1-Jan-17| 1-Dec-17| 9-Feb-17| 20-Apr-17| 28-Feb-17| 09-Feb-17|
| p1| pa3| 1-Jan-17| 1-Dec-17| 9-Feb-17| 20-Apr-17| 31-Mar-17| 01-Mar-17|
| p1| pa3| 1-Jan-17| 1-Dec-17| 9-Feb-17| 20-Apr-17| 30-Apr-17| 01-Apr-17|
| p1| pa2| 3-Jan-18| 8-Dec-18| 6-Mar-18| 10-Nov-18| 31-Mar-18| 06-Mar-18|
| p1| pa2| 3-Jan-18| 8-Dec-18| 6-Mar-18| 10-Nov-18| 30-Apr-18| 01-Apr-18|
| p1| pa2| 3-Jan-18| 8-Dec-18| 6-Mar-18| 10-Nov-18| 31-May-18| 01-May-18|
| p1| pa2| 3-Jan-18| 8-Dec-18| 6-Mar-18| 10-Nov-18| 30-Jun-18| 01-Jun-18|
| p1| pa2| 3-Jan-18| 8-Dec-18| 6-Mar-18| 10-Nov-18| 31-Jul-18| 01-Jul-18|
| p1| pa2| 3-Jan-18| 8-Dec-18| 6-Mar-18| 10-Nov-18| 31-Aug-18| 01-Aug-18|
| p1| pa2| 3-Jan-18| 8-Dec-18| 6-Mar-18| 10-Nov-18| 30-Sep-18| 01-Sep-18|
| p1| pa2| 3-Jan-18| 8-Dec-18| 6-Mar-18| 10-Nov-18| 31-Oct-18| 01-Oct-18|
| p1| pa2| 3-Jan-18| 8-Dec-18| 6-Mar-18| 10-Nov-18| 30-Nov-18| 01-Nov-18|
| p1| pa1| 2-Jan-18| 5-Dec-18| 2-Mar-18| 8-Aug-18| 31-Mar-18| 02-Mar-18|
| p1| pa1| 2-Jan-18| 5-Dec-18| 2-Mar-18| 8-Aug-18| 30-Apr-18| 01-Apr-18|
| p1| pa1| 2-Jan-18| 5-Dec-18| 2-Mar-18| 8-Aug-18| 31-May-18| 01-May-18|
| p1| pa1| 2-Jan-18| 5-Dec-18| 2-Mar-18| 8-Aug-18| 30-Jun-18| 01-Jun-18|
| p1| pa1| 2-Jan-18| 5-Dec-18| 2-Mar-18| 8-Aug-18| 31-Jul-18| 01-Jul-18|
| p1| pa1| 2-Jan-18| 5-Dec-18| 2-Mar-18| 8-Aug-18| 31-Aug-18| 01-Aug-18|
+----+-----+---------+----------+-------------+-----------+---------------+-----------------+
Here is how I did it using RDD and UDF
kept data in a file
/tmp/pdata.csv
p_id,pa_id,p_st_date,p_end_date,pa_start_date,pa_end_date
p1,pa1,2-Jan-18,5-Dec-18,2-Mar-18,8-Aug-18
p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18
p1,pa3,1-Jan-17,1-Dec-17,9-Feb-17,20-Apr-17
spark scala code
import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import scala.collection.mutable.ListBuffer
import java.util.{GregorianCalendar, Date}
import java.util.Calendar
val ipt = spark.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("/tmp/pdata.csv")
val format = new java.text.SimpleDateFormat("dd-MMM-yy")
format.format(new java.util.Date()) // test date
def generateDates(startdate: Date, enddate: Date): ListBuffer[String] ={
var dateList = new ListBuffer[String]()
var calendar = new GregorianCalendar()
calendar.setTime(startdate)
val monthName = Array("Jan", "Feb","Mar", "Apr", "May", "Jun", "Jul","Aug", "Sept", "Oct", "Nov","Dec")
dateList +=(calendar.get(Calendar.DAY_OF_MONTH)) + "-" + monthName(calendar.get(Calendar.MONTH)) + "-" + (calendar.get(Calendar.YEAR)) +","+
(calendar.getActualMaximum(Calendar.DAY_OF_MONTH)) + "-" + monthName(calendar.get(Calendar.MONTH)) + "-" + (calendar.get(Calendar.YEAR))
calendar.add(Calendar.MONTH, 1)
while (calendar.getTime().before(enddate)) {
dateList +="01-" + monthName(calendar.get(Calendar.MONTH)) + "-" + (calendar.get(Calendar.YEAR)) +","+
(calendar.getActualMaximum(Calendar.DAY_OF_MONTH)) + "-" + monthName(calendar.get(Calendar.MONTH)) + "-" + (calendar.get(Calendar.YEAR))
calendar.add(Calendar.MONTH, 1)
}
dateList
}
val oo = ipt.rdd.map(x=>(x(0).toString(),x(1).toString(),x(2).toString(),x(3).toString(),x(4).toString(),x(5).toString()))
oo.flatMap(pp=> {
var allDates = new ListBuffer[(String,String,String,String,String,String,String)]()
for (x <- generateDates(format.parse(pp._5),format.parse(pp._6))) {
allDates += ((pp._1,pp._2,pp._3,pp._4,pp._5,pp._6,x))}
allDates
}).collect().foreach(println)
I did a flatMap, and within it the generateDates function is used to build the concatenated dates, with a ListBuffer to append the concatenated values.
I used monthName to get the month as per your output format.
The output came out as below:
(p1,pa1,2-Jan-18,5-Dec-18,2-Mar-18,8-Aug-18,2-Mar-2018,31-Mar-2018)
(p1,pa1,2-Jan-18,5-Dec-18,2-Mar-18,8-Aug-18,01-Apr-2018,30-Apr-2018)
(p1,pa1,2-Jan-18,5-Dec-18,2-Mar-18,8-Aug-18,01-May-2018,31-May-2018)
(p1,pa1,2-Jan-18,5-Dec-18,2-Mar-18,8-Aug-18,01-Jun-2018,30-Jun-2018)
(p1,pa1,2-Jan-18,5-Dec-18,2-Mar-18,8-Aug-18,01-Jul-2018,31-Jul-2018)
(p1,pa1,2-Jan-18,5-Dec-18,2-Mar-18,8-Aug-18,01-Aug-2018,31-Aug-2018)
(p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18,6-Mar-2018,31-Mar-2018)
(p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18,01-Apr-2018,30-Apr-2018)
(p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18,01-May-2018,31-May-2018)
(p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18,01-Jun-2018,30-Jun-2018)
(p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18,01-Jul-2018,31-Jul-2018)
(p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18,01-Aug-2018,31-Aug-2018)
(p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18,01-Sept-2018,30-Sept-2018)
(p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18,01-Oct-2018,31-Oct-2018)
(p1,pa2,3-Jan-18,8-Dec-18,6-Mar-18,10-Nov-18,01-Nov-2018,30-Nov-2018)
(p1,pa3,1-Jan-17,1-Dec-17,9-Feb-17,20-Apr-17,9-Feb-2017,28-Feb-2017)
(p1,pa3,1-Jan-17,1-Dec-17,9-Feb-17,20-Apr-17,01-Mar-2017,31-Mar-2017)
(p1,pa3,1-Jan-17,1-Dec-17,9-Feb-17,20-Apr-17,01-Apr-2017,30-Apr-2017)
I am happy to explain more if anyone has doubts, and I might also have read the file in a clumsy way; we can improve that as well.

Yii2-Send email as a batch

I want to send an email using yii2 mailer. I have configured the mailer and already tried to send a single record in a single email. Now I want to send multiple records but in a single email.
Below is my controller action:
$sql = "SELECT COUNT(DISTINCT od.`meter_serial`) AS 'OGP Created',
COUNT(DISTINCT mp.`meter_id`) AS 'Installed & Un-Verified Meters',
COUNT(DISTINCT ins.`meter_msn`) AS 'Installed & Verified',
sd.`sub_div_code` AS 'SD Code',sd.`name` AS 'SD-Name'
FROM `ogp_detail` od
INNER JOIN `survey_hesco_subdivision` sd ON od.`sub_div` =
sd.`sub_div_code`
LEFT JOIN `meter_ping` mp ON od.`meter_id` = mp.`meter_id`
LEFT JOIN `installations` ins ON od.`meter_serial` = ins.`meter_msn`
WHERE od.`meter_type` = '3-Phase'
GROUP BY sd.`name`";
$result = Yii::$app->db->createCommand($sql)->queryAll();
print_r($result);
exit();
The output is
Array ( [0] => Array ( [OGP Created] => 7 [Installed & Un-Verified Meters] => 2 [Installed & Verified] => 4 [SD Code] => 37153 [SD-Name] => ALLAMA IQBAL ) [1] => Array ( [OGP Created] => 68 [Installed & Un-Verified Meters] => 3 [Installed & Verified] => 0 [SD Code] => 37281 [SD-Name] => BADIN ) [2] => Array ( [OGP Created] => 13 [Installed & Un-Verified Meters] => 6 [Installed & Verified] => 0 [SD Code] => 37336 [SD-Name] => BHIT SHAH ) [3] => Array ( [OGP Created] => 6 [Installed & Un-Verified Meters] => 2 [Installed & Verified] => 0 [SD Code] => 37254 [SD-Name] => BULRISHAH KARIM ) [4] => Array ( [OGP Created] => 26 [Installed & Un-Verified Meters] => 18 [Installed & Verified] => 0 [SD Code] => 37144 [SD-Name] => CHAMBER ) [5] => Array ( [OGP Created] => 13 [Installed & Un-Verified Meters] => 3 [Installed & Verified] => 6 [SD Code] => 37182 [SD-Name] => CITIZEN COLONY ) [6] => Array ( [OGP Created] => 117 [Installed & Un-Verified Meters] => 0 [Installed & Verified] => 0 [SD Code] => 37314 [SD-Name] => DAUR ) [7] => Array ( [OGP Created] => 78 [Installed & Un-Verified Meters] => 26 [Installed & Verified] => 2 [SD Code] => 37421 [SD-Name] => DIGRI ) [8] => Array ( [OGP Created] => 15 [Installed & Un-Verified Meters] => 1 [Installed & Verified] => 7 [SD Code] => 37112 [SD-Name] => GARI KHATA ) [9] => Array ( [OGP Created] => 24 [Installed & Un-Verified Meters] => 10 [Installed & Verified] => 0 [SD Code] => 37283 [SD-Name] => GOLARCHI ) [10] => Array ( [OGP Created] => 7 [Installed & Un-Verified Meters] => 4 [Installed & Verified] => 0 [SD Code] => 37335 [SD-Name] => HALA ) [11] => Array ( [OGP Created] => 4 [Installed & Un-Verified Meters] => 0 [Installed & Verified] => 3 [SD Code] => 37151 [SD-Name] => HALI ROAD ) [12] => Array ( [OGP Created] => 21 [Installed & Un-Verified Meters] => 8 [Installed & Verified] => 1 [SD Code] => 37183 [SD-Name] => HIRABAD ) [13] => Array ( [OGP Created] => 85 [Installed & Un-Verified Meters] => 9 [Installed & Verified] => 0 [SD Code] => 37413 [SD-Name] => HIRABAD MPK ) [14] => Array ( [OGP Created] => 112 [Installed & Un-Verified Meters] => 28 [Installed & Verified] => 49 [SD Code] => 37111 [SD-Name] => HYD SADDAR ) [15] => Array ( [OGP Created] => 10 [Installed & Un-Verified Meters] => 10 [Installed & Verified] => 0 [SD Code] => 37243 [SD-Name] => ILYASABAD ) [16] => Array ( [OGP Created] => 70 [Installed & Un-Verified Meters] => 37 [Installed & Verified] => 6 [SD Code] => 37222 [SD-Name] => JAMSHORO ) [17] => Array ( [OGP Created] => 123 [Installed & Un-Verified Meters] => 28 [Installed & Verified] => 0 [SD Code] => 37342 [SD-Name] => JHOLE ) [18] => Array ( [OGP Created] => 16 [Installed & Un-Verified Meters] => 9 [Installed & Verified] => 0 [SD Code] => 37422 [SD-Name] => JHUDO ) [19] => Array ( [OGP Created] => 20 [Installed & Un-Verified Meters] => 20 [Installed & Verified] => 0 [SD Code] => 37345 [SD-Name] => KHIPRO ) [20] => Array ( [OGP Created] => 38 [Installed & Un-Verified Meters] => 6 [Installed & Verified] => 10 [SD Code] => 37221 [SD-Name] => KOTRI ) [21] => Array ( [OGP Created] => 18 [Installed & Un-Verified Meters] => 13 [Installed & Verified] => 0 [SD Code] => 37434 [SD-Name] => KUNRI ) [22] => Array ( [OGP Created] => 9 [Installed & Un-Verified Meters] => 1 [Installed & Verified] => 6 [SD Code] => 37115 [SD-Name] => LIAQAT COLONY ) [23] => Array ( [OGP Created] => 45 [Installed & Un-Verified Meters] => 19 [Installed & Verified] => 0 [SD Code] => 37273 [SD-Name] => MAKLI ) [24] => Array ( [OGP Created] => 49 [Installed & Un-Verified 
Meters] => 4 [Installed & Verified] => 0 [SD Code] => 37334 [SD-Name] => MATIARI ) [25] => Array ( [OGP Created] => 27 [Installed & Un-Verified Meters] => 7 [Installed & Verified] => 0 [SD Code] => 37253 [SD-Name] => MATLI ) [26] => Array ( [OGP Created] => 5 [Installed & Un-Verified Meters] => 1 [Installed & Verified] => 0 [SD Code] => 37244 [SD-Name] => MEMON HOSPITAL ) [27] => Array ( [OGP Created] => 20 [Installed & Un-Verified Meters] => 11 [Installed & Verified] => 0 [SD Code] => 37411 [SD-Name] => MIPURKHAS CITY ) [28] => Array ( [OGP Created] => 26 [Installed & Un-Verified Meters] => 12 [Installed & Verified] => 6 [SD Code] => 37155 [SD-Name] => MIRAN M.SHAH ) [29] => Array ( [OGP Created] => 64 [Installed & Un-Verified Meters] => 3 [Installed & Verified] => 0 [SD Code] => 37414 [SD-Name] => MIRWAH ) [30] => Array ( [OGP Created] => 84 [Installed & Un-Verified Meters] => 3 [Installed & Verified] => 0 [SD Code] => 37424 [SD-Name] => MITHI ) [31] => Array ( [OGP Created] => 11 [Installed & Un-Verified Meters] => 6 [Installed & Verified] => 0 [SD Code] => 37423 [SD-Name] => NAUKOT ) [32] => Array ( [OGP Created] => 59 [Installed & Un-Verified Meters] => 3 [Installed & Verified] => 6 [SD Code] => 37311 [SD-Name] => NAWAB SHAH-I ) [33] => Array ( [OGP Created] => 26 [Installed & Un-Verified Meters] => 0 [Installed & Verified] => 0 [SD Code] => 37312 [SD-Name] => NAWAB SHAH-II ) [34] => Array ( [OGP Created] => 43 [Installed & Un-Verified Meters] => 13 [Installed & Verified] => 0 [SD Code] => 37225 [SD-Name] => NOORIABAD ) [35] => Array ( [OGP Created] => 14 [Installed & Un-Verified Meters] => 9 [Installed & Verified] => 0 [SD Code] => 37333 [SD-Name] => ODERO LAL ) [36] => Array ( [OGP Created] => 3 [Installed & Un-Verified Meters] => 0 [Installed & Verified] => 0 [SD Code] => 37241 [SD-Name] => PARETABAD ) [37] => Array ( [OGP Created] => 9 [Installed & Un-Verified Meters] => 5 [Installed & Verified] => 0 [SD Code] => 37432 [SD-Name] => PITHORO ) [38] => Array ( [OGP Created] => 27 [Installed & Un-Verified Meters] => 3 [Installed & Verified] => 15 [SD Code] => 37181 [SD-Name] => QASIMABAD ) [39] => Array ( [OGP Created] => 66 [Installed & Un-Verified Meters] => 0 [Installed & Verified] => 0 [SD Code] => 37318 [SD-Name] => QAZI AHMED ) [40] => Array ( [OGP Created] => 5 [Installed & Un-Verified Meters] => 0 [Installed & Verified] => 0 [SD Code] => 37152 [SD-Name] => RIZVI HOSPITAL ) [41] => Array ( [OGP Created] => 189 [Installed & Un-Verified Meters] => 22 [Installed & Verified] => 0 [SD Code] => 37316 [SD-Name] => SAEEDABAD ) [42] => Array ( [OGP Created] => 130 [Installed & Un-Verified Meters] => 0 [Installed & Verified] => 0 [SD Code] => 37315 [SD-Name] => SAKRAND ) [43] => Array ( [OGP Created] => 21 [Installed & Un-Verified Meters] => 7 [Installed & Verified] => 0 [SD Code] => 37433 [SD-Name] => SAMARO ) [44] => Array ( [OGP Created] => 172 [Installed & Un-Verified Meters] => 117 [Installed & Verified] => 0 [SD Code] => 37341 [SD-Name] => SANGHAR ) [45] => Array ( [OGP Created] => 21 [Installed & Un-Verified Meters] => 2 [Installed & Verified] => 13 [SD Code] => 37113 [SD-Name] => SARFARAZ COLONY ) [46] => Array ( [OGP Created] => 55 [Installed & Un-Verified Meters] => 13 [Installed & Verified] => 0 [SD Code] => 37412 [SD-Name] => SATELLITE TOWN ) [47] => Array ( [OGP Created] => 94 [Installed & Un-Verified Meters] => 8 [Installed & Verified] => 2 [SD Code] => 37226 [SD-Name] => SEHWAN SHARIF ) [48] => Array ( [OGP Created] => 18 [Installed & Un-Verified Meters] => 2 [Installed & 
Verified] => 13 [SD Code] => 37154 [SD-Name] => SH:UMAID ALI KHAN ) [49] => Array ( [OGP Created] => 16 [Installed & Un-Verified Meters] => 0 [Installed & Verified] => 1 [SD Code] => 37223 [SD-Name] => SHAHBAZ ) [50] => Array ( [OGP Created] => 20 [Installed & Un-Verified Meters] => 20 [Installed & Verified] => 0 [SD Code] => 37343 [SD-Name] => SHAHDAD PUR-I ) [51] => Array ( [OGP Created] => 20 [Installed & Un-Verified Meters] => 20 [Installed & Verified] => 0 [SD Code] => 37344 [SD-Name] => SHAHDAD PUR-II ) [52] => Array ( [OGP Created] => 20 [Installed & Un-Verified Meters] => 20 [Installed & Verified] => 0 [SD Code] => 37346 [SD-Name] => SHAHPUR CHAKAR ) [53] => Array ( [OGP Created] => 20 [Installed & Un-Verified Meters] => 20 [Installed & Verified] => 0 [SD Code] => 37347 [SD-Name] => SINDHRI ) [54] => Array ( [OGP Created] => 111 [Installed & Un-Verified Meters] => 0 [Installed & Verified] => 0 [SD Code] => 37313 [SD-Name] => SOCIETY ) [55] => Array ( [OGP Created] => 37 [Installed & Un-Verified Meters] => 5 [Installed & Verified] => 0 [SD Code] => 37272 [SD-Name] => SUJAWAL ) [56] => Array ( [OGP Created] => 24 [Installed & Un-Verified Meters] => 4 [Installed & Verified] => 0 [SD Code] => 37251 [SD-Name] => T.MUHAMMAD KHAN-I ) [57] => Array ( [OGP Created] => 8 [Installed & Un-Verified Meters] => 5 [Installed & Verified] => 0 [SD Code] => 37252 [SD-Name] => T.MUHAMMAD KHAN-II ) [58] => Array ( [OGP Created] => 36 [Installed & Un-Verified Meters] => 11 [Installed & Verified] => 0 [SD Code] => 37282 [SD-Name] => TALHAR ) [59] => Array ( [OGP Created] => 4 [Installed & Un-Verified Meters] => 1 [Installed & Verified] => 0 [SD Code] => 37331 [SD-Name] => TANDO ADAM-I ) [60] => Array ( [OGP Created] => 18 [Installed & Un-Verified Meters] => 12 [Installed & Verified] => 0 [SD Code] => 37332 [SD-Name] => TANDO ADAM-II ) [61] => Array ( [OGP Created] => 97 [Installed & Un-Verified Meters] => 31 [Installed & Verified] => 20 [SD Code] => 37141 [SD-Name] => TANDO ALLAH YAR-I ) [62] => Array ( [OGP Created] => 82 [Installed & Un-Verified Meters] => 1 [Installed & Verified] => 11 [SD Code] => 37142 [SD-Name] => TANDO ALLAH YAR-II ) [63] => Array ( [OGP Created] => 22 [Installed & Un-Verified Meters] => 2 [Installed & Verified] => 0 [SD Code] => 37143 [SD-Name] => TANDO JAM ) [64] => Array ( [OGP Created] => 47 [Installed & Un-Verified Meters] => 27 [Installed & Verified] => 0 [SD Code] => 37271 [SD-Name] => THATTA ) [65] => Array ( [OGP Created] => 48 [Installed & Un-Verified Meters] => 19 [Installed & Verified] => 0 [SD Code] => 37431 [SD-Name] => UMERKOT ) )
As the above result shows, there are multiple records, and I want to send them all in a single email.
I also want to add up the 'OGP Created', 'Installed & Un-Verified Meters' and 'Installed & Verified' count values, i.e. sum each of them into a separate variable.
Update 1
As per the suggestion given, a CSV attachment would work. But there is something more I want to add. As I have said above, I want to sum up the count values of the first three columns. So I have tried to add them like this:
foreach ($result as $set)
{
$sum_OGP += $set['OGP Created'];
$sum_UnVerified += $set['Installed & Un-Verified Meters'];
$sum_Verified += $set['Installed & Verified'];
}
echo "Total OGP ";
print_r($sum_OGP);
echo "<br>";
echo "<br>";
echo "Total Un-Verified Meters ";
print_r($sum_UnVerified);
echo "<br>";
echo "<br>";
echo "Total Verified Meters ";
print_r($sum_Verified);
exit();
And the output is
Total OGP 2813
Total Un-Verified Meters 712
Total Verified Meters 181
The sum is correct but how can I send it in the attachment?
Any help would be highly appreciated.
If you want to send the output of the query in an email, then you should go for creating a CSV file and sending it as an attachment via email.
public function test() {
$sql = "SELECT COUNT(DISTINCT od.`meter_serial`) AS 'OGP Created',
COUNT(DISTINCT mp.`meter_id`) AS 'Installed & Un-Verified Meters',
COUNT(DISTINCT ins.`meter_msn`) AS 'Installed & Verified',
sd.`sub_div_code` AS 'SD Code',sd.`name` AS 'SD-Name'
FROM `ogp_detail` od
INNER JOIN `survey_hesco_subdivision` sd ON od.`sub_div` =
sd.`sub_div_code`
LEFT JOIN `meter_ping` mp ON od.`meter_id` = mp.`meter_id`
LEFT JOIN `installations` ins ON od.`meter_serial` = ins.`meter_msn`
WHERE od.`meter_type` = '3-Phase'
GROUP BY sd.`name`";
$results = Yii::$app->db->createCommand($sql)->queryAll();
//create a csv file
$filename = $this->getAttachment($results);
//send email
$this->sendEmail('omer@omer.com',$filename);
}
/**
*
* @param type $email
* @param type $filename
* @return type
*/
public function sendEmail($email,$filename)
{
return Yii::$app->mailer->compose()
->setTo($email)
->setFrom(['admin@domain.com' => 'Admin'])
->setSubject('Some Subject for the email')
->setTextBody('Text body of the email ')
->attach($filename,['filename'=>'information','contentType'=>'text/csv'])
->send();
}
/**
*
* @param type $results
* @return string $filename
*/
public function getAttachment($results) {
$filename = Yii::getAlias('#webroot') . DIRECTORY_SEPARATOR . 'my-attachment-' . time() . '.csv';
//open a csv file
$file = fopen($filename, "w");
$headerInjected = false;
$header = ['OGP Created', 'Installed & Un-Verified Meters', 'Installed & Verified', 'SD Code', 'SD-Name'];
//initialise the totals and write lines to the csv file
$sum_OGP = $sum_UnVerified = $sum_Verified = 0;
foreach ($results as $result) {
if (!$headerInjected) {
$headerInjected = true;
fputcsv($file, $header);
}
fputcsv($file, $result);
$sum_OGP += $result['OGP Created'];
$sum_UnVerified += $result['Installed & Un-Verified Meters'];
$sum_Verified += $result['Installed & Verified'];
}
//add the sum in the last row
fputcsv($file,[$sum_OGP,$sum_UnVerified,$sum_Verified]);
//close the file handle
fclose($file);
return $filename;
}

How to parse delimited fields with some (sub)fields empty?

I use Spark 2.1.1 and Scala 2.11.8 in spark-shell.
My input dataset is something like :
2017-06-18 00:00:00 , 1497769200 , z287570731_serv80i:7:175 , 5:Re
2017-06-18 00:00:00 , 1497769200 , p286274731_serv80i:6:100 , 138
2017-06-18 00:00:00 , 1497769200 , t219420679_serv37i:2:50 , 5
2017-06-18 00:00:00 , 1497769200 , v290380588_serv81i:12:800 , 144:Jo
2017-06-18 00:00:00 , 1497769200 , z292902510_serv83i:4:45 , 5:Re
2017-06-18 00:00:00 , 1497769200 , v205454093_serv75i:5:70 , 50:AK
It is saved as a CSV file which is read using sc.textFile("input path")
After a few transformations, this is the output of the RDD I have:
(String, String) = ("Re ",7)
I get this by executing
val tid = read_file.map { line =>
val arr = line.split(",")
(arr(3).split(":")(1), arr(2).split(":")(1))
}
My input RDD is:
( z287570731_serv80i:7:175 , 5:Re )
( p286274731_serv80i:6:100 , 138 )
( t219420679_serv37i:2:50 , 5 )
( v290380588_serv81i:12:800 , 144:Jo )
( z292902510_serv83i:4:45 , 5:Re )
As can be observed, in the first entry's column 2 I have
5:Re
of which I'm getting the output
("Re ",7)
However when I reach the second row, according to the format, column 2 is 138 which should be
138:null
but gives ArrayIndexOutOfBoundsException on executing
tid.collect()
How can I correct this so that null is displayed with 138 and 5 for the second and third rows respectively? I tried to do it this way:
tid.filter(x => x._1 != null )
The problem is that you expect at least two parts at that position while you may have only one.
The following is the line that causes the issue.
{var arr = line.split(","); (arr(3).split(":")(1),arr(2).split(":")(1))});
After you do line.split(",") you then do arr(3).split(":")(1) and also arr(2).split(":")(1).
There's certainly too much assumption about the format, and it got beaten by missing values.
but gives ArrayIndexOutOfBoundsException on executing
That's because you access the 3rd and 2nd elements but have only 2 (!)
scala> sc.textFile("input.csv").
map { line => line.split(",").toSeq }.
foreach(println)
WrappedArray(( z287570731_serv80i:7:175i , 5:Re ))
WrappedArray(( p286274731_serv80i:6:100 , 138 ))
The problem has almost nothing to do with Spark. It's a regular Scala problem where the data is not where you expect it.
scala> val arr = "hello,world".split(",")
arr: Array[String] = Array(hello, world)
Note that what's above is just pure Scala.
Solution 1 - Spark Core's RDDs
Given the following dataset...
2017-06-18 00:00:00 , 1497769200 , z287570731_serv80i:7:175 , 5:Re
2017-06-18 00:00:00 , 1497769200 , p286274731_serv80i:6:100 , 138
2017-06-18 00:00:00 , 1497769200 , t219420679_serv37i:2:50 , 5
2017-06-18 00:00:00 , 1497769200 , v290380588_serv81i:12:800 , 144:Jo
2017-06-18 00:00:00 , 1497769200 , z292902510_serv83i:4:45 , 5:Re
2017-06-18 00:00:00 , 1497769200 , v205454093_serv75i:5:70 , 50:AK
...I'd do the following:
val solution = sc.textFile("input.csv").
map { line => line.split(",") }.
map { case Array(_, _, third, fourth) => (third, fourth) }.
map { case (third, fourth) =>
val Array(_, a @ _*) = fourth.split(":")
val Array(_, right, _) = third.split(":")
(a.headOption.orNull, right)
}
scala> solution.foreach(println)
(Re,7)
(null,6)
(Re,4)
(null,2)
(AK,5)
(Jo,12)
Solution 2 - Spark SQL's DataFrames
I strongly recommend using Spark SQL for such data transformations. As you said, you are new to Spark, so why not start from the right place which is exactly Spark SQL.
val solution = spark.
read.
csv("input.csv").
select($"_c2" as "third", $"_c3" as "fourth").
withColumn("a", split($"fourth", ":")).
withColumn("left", $"a"(1)).
withColumn("right", split($"third", ":")(1)).
select("left", "right")
scala> solution.show(false)
+----+-----+
|left|right|
+----+-----+
|Re |7 |
|null|6 |
|null|2 |
|Jo |12 |
|Re |4 |
|AK |5 |
+----+-----+
If your data is as below in a file
( z287570731_serv80i:7:175 , 5:Re )
( p286274731_serv80i:6:100 , 138 )
( t219420679_serv37i:2:50 , 5 )
( v290380588_serv81i:12:800 , 144:Jo )
( z292902510_serv83i:4:45 , 5:Re )
Then you can use
val tid = sc.textFile("path to the input file")
.map(line => line.split(","))
.map(array => {
if (array(1).contains(":")) (array(1).split(":")(1).replace(")", "").trim, array(0).split(":")(1))
else (null, array(0).split(":")(1))
})
tid.foreach(println)
which should give you output as
(Re,7)
(null,6)
(null,2)
(Jo,12)
(Re,4)
But if you have data as
2017-06-18 00:00:00 , 1497769200 , z287570731_serv80i:7:175 , 5:Re
2017-06-18 00:00:00 , 1497769200 , p286274731_serv80i:6:100 , 138
2017-06-18 00:00:00 , 1497769200 , t219420679_serv37i:2:50 , 5
2017-06-18 00:00:00 , 1497769200 , v290380588_serv81i:12:800 , 144:Jo
2017-06-18 00:00:00 , 1497769200 , z292902510_serv83i:4:45 , 5:Re
2017-06-18 00:00:00 , 1497769200 , v205454093_serv75i:5:70 , 50:AK
2017-06-18 00:00:00 , 1497769200 , z287096299_serv80i:19:15000 , 39:Re
Then you need to do
val tid = sc.textFile("path to the input file")
.map(line => line.split(","))
.map(array => {
if (array(3).contains(":")) (array(3).split(":")(1).replace(")", "").trim, array(2).split(":")(1))
else (null, array(2).split(":")(1))
})
tid.foreach(println)
And you should have output as
(Re,7)
(null,6)
(null,2)
(Jo,12)
(Re,4)
(AK,5)
(Re,19)
ArrayIndexOutOfBounds is occurring because the element will not be there if no : is present in the second element of the tuple.
You can check if : is present in the second element of each tuple. And then use map to give you an intermediate RDD on which you can run your current query.
val rdd = sc.parallelize(Array(
( "z287570731_serv80i:7:175" , "5:Re" ),
( "p286274731_serv80i:6:100" , "138" ),
( "t219420679_serv37i:2:50" , "5" ),
( "v290380588_serv81i:12:800" , "144:Jo" ),
( "z292902510_serv83i:4:45" , "5:Re" )))
rdd.map { x =>
val idx = x._2.lastIndexOf(":")
if(idx == -1) (x._1, x._2+":null")
else (x._1, x._2)
}
There are obviously better (fewer lines of code) ways to do what you want to accomplish, but as a beginner it's good to lay out each step in a single command so it's easily readable and understandable, especially with Scala where you can stop global warming with a single line of code.
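For instance, one shorter variant of the same idea (just a sketch, reusing the rdd defined above) is to pad the missing sub-field first and then split exactly as in the original query:
// pad values that have no ":" so every value has two parts, then split as before
val padded = rdd.mapValues(v => if (v.contains(":")) v else v + ":null")
val result = padded.map { case (first, second) => (second.split(":")(1), first.split(":")(1)) }
result.foreach(println) // e.g. (Re,7), (null,6), (null,2), (Jo,12), (Re,4) - order may vary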