Pyspark string comparison with == for RDkit functions throws error

I have a PySpark UDF defined as below:
from rdkit import Chem
from pyspark.sql.functions import udf
input_smile = 'CCOC(=O)c1cc2cc(ccc2[nH]1)C(=O)O'
# Canonical form of the SMILES string to search for
converted_smile_in = Chem.MolToSmiles(Chem.MolFromSmiles(input_smile))
def convertSmile(smile):
    return Chem.MolToSmiles(Chem.MolFromSmiles(smile))
applyconvertSmileUdf = udf(convertSmile)
data_converted = data_converted.withColumn("converted_smile", applyconvertSmileUdf(data_filtered.smiles))
if __name__ == "__main__":
    # using the new approach
    data_converted.filter(data_converted.converted_smile == converted_smile_in).select("id", "smiles").show()
else:
    print("Cannot convert!")
The comparison between data_converted.converted_smile and converted_smile_in throws the error below. I printed about 20 values of converted_smile and they look fine. Can't strings be compared this way?
Boost.Python.ArgumentError: Python argument types in
rdkit.Chem.rdmolfiles.MolToSmiles(NoneType) did not match C++ signature:
MolToSmiles(RDKit::ROMol mol, bool isomericSmiles=True, bool kekuleSmiles=False, int rootedAtAtom=-1, bool canonical=True, bool allBondsExplicit=False, bool allHsExplicit=False, bool doRandom=False)

Replace
data_converted.filter(data_converted.converted_smile == converted_smile_in ).select("id","smiles").show()
with
from pyspark.sql.functions import lit
data_converted.filter(data_converted.converted_smile == lit(converted_smile_in) ).select("id","smiles").show()
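The traceback in the question shows MolToSmiles being called with NoneType, which is what RDKit's Chem.MolFromSmiles returns when it cannot parse a string. Spark also evaluates lazily, so printing 20 rows may never touch an unparseable row, while the filter has to scan all of them. If that is the cause here, guarding the conversion against None may help. A minimal sketch under that assumption follows; safe_canonicalize and the stand-in fake_parse/fake_serialize functions are hypothetical names, used so the example runs without RDKit or Spark:

```python
from typing import Callable, Optional


def safe_canonicalize(
    smile: Optional[str],
    parse: Callable[[str], object],
    serialize: Callable[[object], str],
) -> Optional[str]:
    """Return the canonical form of `smile`, or None if it cannot be parsed."""
    if smile is None:
        return None
    mol = parse(smile)
    if mol is None:
        # The parser rejected the input; do not pass None on to the serializer.
        return None
    return serialize(mol)


# Stand-ins for illustration only (hypothetical, not RDKit):
def fake_parse(smile: str):
    return list(smile) if smile.isalnum() else None


def fake_serialize(mol) -> str:
    return "".join(mol).upper()


print(safe_canonicalize("cco", fake_parse, fake_serialize))  # CCO
print(safe_canonicalize("c!!", fake_parse, fake_serialize))  # None
print(safe_canonicalize(None, fake_parse, fake_serialize))   # None
```

With RDKit, the same wrapper would be called as safe_canonicalize(smile, Chem.MolFromSmiles, Chem.MolToSmiles) inside convertSmile. Rows that fail to parse then yield null in converted_smile and simply do not match the filter.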

Related

Check variable run time Type in flutter with conditions like "123" is present as a String but is a int so how can i check this?

I need to check the runtime type of a value. For example:
String a = "abc";
int b = 123;
var c = "123"; // holds an int value, but the double quotes make it a String
a.runtimeType == String // true
b.runtimeType == int // true
c.runtimeType == String // true
c.runtimeType == int // false
a = "abc" // okay
b = 123 // okay
c = "123" // this is the issue
I have to call an API that accepts only a String body. In this case c is sent to the API because it is a String, but I know it is really an int value represented as a String, so I need to stop that call.
How can I check for this?
When I use try/catch, my program stops because of a FormatException.
Note: I don't know the real value of c. It may be "123", "65dev", or "strjf"; it changes every time, and parsing it as an int throws an error in many cases.

OK, I understand that you want to check "123" and pass it if it is an int. My question is what you will do if it is "123fe": will you pass it as a string, or pass nothing?

I don't know how you're passing it to the API, but if you want to pass an integer value from a quoted string variable, you can parse it to an integer like this:
int.parse(c);
You can either pass the result directly or store it in another variable and pass that variable.
Alternatively, if you have an int value and have to pass it as a string, convert it like this:
integerValue.toString();
or, using the variable from your code:
b.toString();
Edit:
String a = '20';
String b = 'a20';
try {
  int check = int.parse(a);
  // Call your API inside the try block, then inside the if.
  if (check.runtimeType == int) {
    print('parsed $check');
  }
} catch (e) {
  print('not parsed');
  // Handle your error here.
  throw (e);
}

This will definitely help you!
String name = "5Syed8Ibrahim";
final RegExp nameRegExp = RegExp(r'^[a-zA-Z ][a-zA-Z ]*[a-zA-Z ]$');
print(nameRegExp.hasMatch(name));
//output false
name = "syed ibrahim";
print(nameRegExp.hasMatch(name));
//output true
Check the output and invoke the API call based on that boolean value.
I hope this does the job.

Neatly parsing a date in "MMddyy" format along with other formats in dart

I guess it is not possible to parse a date in "MMddyy" format in dart.
void main() {
  String strcandidate = "031623";
  String format = "MMddyy";
  var originalFormat = DateFormat(format).parse(strcandidate);
}
Output:
Uncaught Error: FormatException: Trying to read dd from 031623 at position 6
The following works fine when parsing a date in "MM-dd-yy" format.
void main() {
  String strcandidate = "03-16-23";
  String format = "MM-dd-yy";
  var originalFormat = DateFormat(format).parse(strcandidate);
}
In my problem, the input date string can be in any of several formats, e.g. ['yyyy-MM-dd', 'MMM-yyyy', 'MM/dd/yy']. I parse the input string against these formats in a loop as follows:
dateFormatsList = ['yyyy-MM-dd', 'MMM-yyyy', 'MM/dd/yy'];
for (String format in dateFormatsList) {
  try {
    originalFormat = DateFormat(format).parse(strcandidate);
    dateFound = true;
  } catch (e) {}
}
Adding 'MMddyy' to dateFormatsList is not going to work, but a regular expression can be used to parse this format.
However, if all the other formats are parsed with the parse method and one additional format is parsed with a regular expression, the code becomes cluttered.
I would appreciate any insights on making the code efficient and clean while incorporating the 'MMddyy' format. Tysm!

See How do I convert a date/time string to a DateTime object in Dart? for how to parse various date/time strings to DateTime objects.
If you need to mix approaches, you can provide a unified interface. Instead of using a List<String> for your list of formats, you can use a List<DateTime Function(String)>:
import 'package:intl/intl.dart';
/// Parses a [DateTime] from [dateTimeString] using a [RegExp].
///
/// [re] must have named groups with names `year`, `month`, and `day`.
DateTime parseDateFromRegExp(RegExp re, String dateTimeString) {
  var match = re.firstMatch(dateTimeString);
  if (match == null) {
    throw FormatException('Failed to parse: $dateTimeString');
  }
  var year = match.namedGroup('year');
  var month = match.namedGroup('month');
  var day = match.namedGroup('day');
  if (year == null || month == null || day == null) {
    throw ArgumentError('Regular expression is malformed');
  }
  // In case we're parsing a two-digit year format, instead of
  // parsing the strings ourselves, reparse it with [DateFormat] so that it can
  // apply its -80/+20 rule.
  //
  // [DateFormat.parse] doesn't work without separators, which is why we
  // can't use it directly on the original string. See:
  // https://github.com/dart-lang/intl/issues/210
  return DateFormat('yy-MM-dd').parse('$year-$month-$day');
}
typedef DateParser = DateTime Function(String);
DateParser dateParserFromRegExp(String rePattern) =>
    (string) => parseDateFromRegExp(RegExp(rePattern), string);
var parserList = [
  DateFormat('yyyy-MM-dd').parse,
  DateFormat('MMM-yyyy').parse,
  DateFormat('MM/dd/yy').parse,
  dateParserFromRegExp(
    r'^(?<month>\d{2})(?<day>\d{2})(?<year>\d{4})$',
  )
];
void main() {
  var strcandidate = '12311776';
  DateTime? originalFormat;
  for (var tryParse in parserList) {
    try {
      originalFormat = tryParse(strcandidate);
      break;
    } on Exception {
      // Try the next format.
    }
  }
  print(originalFormat);
}

I think it's a bit hacky, but what about using a regular expression (RegExp) to find the date dividers and replace them with ""?

void main() {
  String strcandidate = "031623";
  String strYear = strcandidate.substring(4);
  // Assume the century is 20 (e.g. 2023), since the year has only two digits.
  String _newDateTime = '20' + strYear + strcandidate.substring(0, 4);
  var _originalFormat = DateTime.parse(_newDateTime);
  print(_originalFormat);
}

Add the intl package to pubspec.yaml, then write this code:
import 'package:intl/intl.dart';
void main() {
  var strcandidate = DateTime(2023, 3, 16);
  String format = "MMddyy";
  var originalFormat = DateFormat(format).format(strcandidate);
  print(originalFormat);
}

Firestore Security rules - checking if a string can be cast to an int

There is this page showing how to convert a string to an int:
int("2") == 2
int(2.0) == 2
But how can I know before doing the conversion whether it will work or throw an exception?
For instance how can I implement the following:
IF x can be cast to an integer THEN return int(x) < 10
ELSE IF y can be cast to an integer THEN return int(y) < 10
ELSE return false

You can use the cast_as_int function defined below:
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    function cast_as_int(x) {
      let pattern = '[+-]?([0-9]*[.])?[0-9]+';
      return (
        (x is float)
        || (x is int)
        || (x is string) && (x.matches(pattern))
      ) ? int(float(x)) : null;
    }
    // All gets will succeed
    match /{document=**} {
      allow get: if cast_as_int(1) == 1
        && cast_as_int('2') == 2
        && cast_as_int('3.14') == 3
        && cast_as_int(4.44) == 4
        && cast_as_int("5!") != 5;
    }
  }
}
The function takes a variable and returns either an integer or null.
Firestore does not allow statements to evaluate to multiple types, so the function can only return an integer or null (not false as requested).
The function assumes you want to convert from (Integer or Float) to Integer. If you only want to convert from Integer to Integer, then replace int(float(x)) with int(x).
The regex in let pattern = '[+-]?([0-9]*[.])?[0-9]+' was taken from this StackOverflow question.
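To see which strings the pattern accepts, the same logic can be tried outside Firestore. The following is a rough Python analogue, not Firestore code: it assumes that the rules function matches() tests the whole string (hence re.fullmatch), and the function name is simply reused from the answer for readability.

```python
import re

# Same pattern as in the rules function above.
PATTERN = re.compile(r"[+-]?([0-9]*[.])?[0-9]+")


def cast_as_int(x):
    """Return x converted to an int, or None if it does not look numeric."""
    if isinstance(x, bool):
        # bool is a subclass of int in Python; treat it as non-numeric here.
        return None
    if isinstance(x, (int, float)):
        return int(x)
    if isinstance(x, str) and PATTERN.fullmatch(x):
        return int(float(x))
    return None


for value in [1, "2", "3.14", 4.44, "5!", "-7", ".5", "5."]:
    print(repr(value), "->", cast_as_int(value))
```

Note that the pattern accepts ".5" but rejects "5.", because digits are required after an optional decimal point.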

Is there a way to sort string lists by numbers inside of the strings?

Is there a way to sort something like:
List<String> hi = ['1hi', '2hi','5hi', '3hi', '4hi'];
to this?
['1hi', '2hi','3hi', '4hi', '5hi']

Just calling List<String>.sort() by itself will do a lexicographic sort. That is, your strings will be sorted in character code order, and '10' will be sorted before '2'. That usually isn't expected.
A lexicographic sort will work if your numbers have leading 0s to ensure that all numbers have the same number of digits. However, if the number of digits is variable, you will need to parse the values of the numbers for sorting. A more general approach is to provide a callback to .sort() to tell it how to determine the relative ordering of two items.
Luckily, package:collection has a compareNatural function that can do this for you:
import 'package:collection/collection.dart';
List<String> hi = ['1hi', '2hi','5hi', '3hi', '4hi'];
hi.sort(compareNatural);
If your situation is a bit more complicated and compareNatural doesn't do what you want, a more general approach is to make the .sort() callback do parsing itself, such as via a regular expression:
/// Returns the integer prefix from a string.
///
/// Returns null if no integer prefix is found.
int? parseIntPrefix(String s) {
  var re = RegExp(r'(-?[0-9]+).*');
  var match = re.firstMatch(s);
  if (match == null) {
    return null;
  }
  // Group 1 always exists when the pattern matches, so `!` is safe here.
  return int.parse(match.group(1)!);
}
int compareIntPrefixes(String a, String b) {
  var aValue = parseIntPrefix(a);
  var bValue = parseIntPrefix(b);
  if (aValue != null && bValue != null) {
    return aValue - bValue;
  }
  if (aValue == null && bValue == null) {
    // If neither string has an integer prefix, sort the strings lexically.
    return a.compareTo(b);
  }
  // Sort strings with integer prefixes before strings without.
  if (aValue == null) {
    return 1;
  } else {
    return -1;
  }
}
void main() {
  List<String> hi = ['1hi', '2hi','5hi', '3hi', '4hi'];
  hi.sort(compareIntPrefixes);
}
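The difference between a lexicographic sort and a sort on the parsed number is easiest to see with a list that contains a multi-digit number. The sketch below shows the same idea in Python rather than Dart; int_prefix_key is a hypothetical helper, and unlike compareIntPrefixes above it breaks ties between equal prefixes by comparing the full strings.

```python
import re


def int_prefix_key(s: str):
    """Sort key: strings with an integer prefix first, ordered by its value."""
    m = re.match(r"-?[0-9]+", s)
    if m:
        return (0, int(m.group()), s)
    # No integer prefix: sort after all prefixed strings, lexically.
    return (1, 0, s)


items = ["10hi", "2hi", "1hi", "hi"]
print(sorted(items))                      # ['10hi', '1hi', '2hi', 'hi']
print(sorted(items, key=int_prefix_key))  # ['1hi', '2hi', '10hi', 'hi']
```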

You can sort the list like this:
hi.sort();
(this works here because digits sort before letters in character-code order and every number in the list has a single digit)

Lexical Analyzer not getting the next character

I am working on a project where we are building a small compiler, but before I can move on to the other parts I am having trouble getting the lexical analyzer to output anything after '\BEGIN'. When I debugged it, the value seemed to be stuck in a loop where the condition says the next character is always a newline. Is it because I haven't yet added pattern matching for the defined tokens?
Here is the code
import java.util
//import com.sun.javafx.fxml.expression.Expression.Parser.Token
/*Lexical analyzer will be responsible for the following:
- finds the lexemes
- Checks each given character determining the tokens
* */
class MyLexicalAnalyzer extends LexicalAnalyzer {
//Array full of the keywords
//val SpecialCharacters = List(']', '#', '*', '+', '\\', '[', '(',')', "![", '=')
val TEXT = "[a-z] | _ | 0-9 | [A-Z]:"
private var sourceLine: String = null
private val lexeme: Array[Char] = new Array[Char](999)
private var nextChar: Char = 0
private var lexLength: Int = 0
private var position: Int = 0
private val lexems: util.List[String] = new util.ArrayList[String]
def start(line: String): Unit = {
initializeLexems()
sourceLine = line
position = 0
getChar()
getNextToken()
}
// A helper method to determine if the current character is a space.
private def isSpace(c: Char) = c == ' '
//Defined and intialized tokens
def initializeLexems(): Any = {
lexems.add("\\BEGIN")
lexems.add("\\END")
lexems.add("\\PARAB")
lexems.add("\\DEF[")
lexems.add("\\USE[")
lexems.add("\\PARAE")
lexems.add("\\TITLE[")
lexems.add("]")
lexems.add("[")
lexems.add("\\")
lexems.add("(")
lexems.add(")")
lexems.add("![")
lexems.add("=")
lexems.add("+")
lexems.add("#")
}
//val pattern = new regex("''").r
def getNextToken() ={
lexLength = 0
// Ignore spaces and add the first character to the token
getNonBlank()
addChar()
getChar()
// Continue gathering characters for the token
while ( {
(nextChar != '\n') && (nextChar != ' ')
}) {
addChar()
getChar()
}
// Convert the gathered character array token into a String
val newToken: String = new String(lexeme)
if (lookup(newToken.substring(0, lexLength)))
MyCompiler.setCurrentToken(newToken.substring(0,lexLength))
}
// A helper method to get the next non-blank character.
private def getNonBlank(): Unit = {
while ( {
isSpace(nextChar)
}) getChar()
}
/*
Method of function that adds the current character to the token
after checking to make sure that length of the token isn't too
long, a lexical error in this case.
*/
def addChar(){
if (lexLength <= 998) {
lexeme({
lexLength += 1; lexLength - 1
}) = nextChar
lexeme(lexLength) = 0
}
else
System.out.println("LEXICAL ERROR - The found lexeme is too long!")
if (!isSpace(nextChar))
while ( {
!isSpace(nextChar)
})
getChar()
lexLength = 0
getNonBlank()
addChar()
}
//Reading from the file its obtaining the tokens
def getChar() {
  if (position < sourceLine.length)
    nextChar = sourceLine.charAt({
      position += 1;
      position - 1
    })
  else nextChar = '\n'
}
def lookup(candidateToken: String): Boolean ={
if (!(lexems.contains(candidateToken))) {
System.out.println("LEXICAL ERROR - '" + candidateToken + "' is not recognized.")
return false
}
return true
}
}
else nextChar = '\n' <- this is the branch execution reaches after the first token, '\BEGIN', has been read; from then on the debug console keeps showing the same thing.
Can anyone please tell me why that is? It still happens after I step into it many times.
Here is the driver class that uses the lexical analyzer
import scala.io.Source
object MyCompiler {
  //check the arguments
  //check file extensions
  //initialization
  //get first token
  //call start state
  var currentToken : String = ""
  def main(args: Array[String]): Unit = {
    //check if an input file provided
    if(args.length == 0) {
      //usage error
      println("USAGE ERROR: Must provide an input file. ")
      System.exit(0)
    }
    //read args(0) only after confirming that it exists
    val filename = args(0)
    if(!checkFileExtension(filename)) {
      println("USAGE ERROR: Extension name is invalid make sure its .gtx ")
      System.exit(0)
    }
    val Scanner = new MyLexicalAnalyzer
    val Parser = new MySyntaxAnalyzer
    //getCurrentToken(Scanner.getNextToken())
    //Parser.gittex()
    for (line <- Source.fromFile(filename).getLines()){
      Scanner.start(line)
      println()
    }
    //.......
    //If it gets here, it is compiled
    //post processing
  }
  //checks the file extension if valid and ends with .gtx
  def checkFileExtension(filename : String) : Boolean = filename.endsWith(".gtx")
  def getCurrentToken() : String = this.currentToken
  def setCurrentToken(t : String ) : Unit = this.currentToken = t
}

The code is operating as it is supposed to. The first line contains only the string \BEGIN so the lexical analyser is treating the end of the first line as an '\n' as shown in this method:
def getChar() {
  if (position < sourceLine.length)
    nextChar = sourceLine.charAt({
      position += 1;
      position - 1
    })
  else nextChar = '\n'
}
However, the comment directly above that method does not describe what the method actually does. This could be a hint as to where your confusion lies. If the comment says it should read from the file, but it is not reading from the file, maybe that's what you've forgotten to implement.
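The behaviour described above can be reproduced in a few lines. This is a hypothetical Python analogue of the getChar logic (LineScanner and get_char are illustrative names, not part of the original Scala code): once the position passes the end of the line, every further call reports a newline.

```python
class LineScanner:
    """Hypothetical Python analogue of the question's getChar logic."""

    def __init__(self, line: str):
        self.line = line
        self.position = 0

    def get_char(self) -> str:
        if self.position < len(self.line):
            ch = self.line[self.position]
            self.position += 1
            return ch
        # Past the end of the line: always report a newline.
        return "\n"


scanner = LineScanner("\\BEGIN")
chars = [scanner.get_char() for _ in range(8)]
print(chars)  # ['\\', 'B', 'E', 'G', 'I', 'N', '\n', '\n']
```

Because the scanner only ever sees one line at a time, nothing after the six characters of \BEGIN is available to it until start is called with the next line.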
