I have used DotNetZip in C# with large files with no problem. Now I have a requirement to zip up some large files from PowerShell, and I can see from the DotNetZip docs that I can use it in PS.
But I keep getting the error
Compressed or Uncompressed size, or offset exceeds the maximum value. Consider setting the UseZip64WhenSaving property on the ZipFile instance.
This is my PS code. How do I set UseZip64WhenSaving in PS?
[System.Reflection.Assembly]::LoadFrom("D:\\mybigfiles\\Ionic.Zip.dll");
$directoryToZip = "D:\\mybigfiles\\";
$zipfile = new-object Ionic.Zip.ZipFile;
#$zipfile.UseZip64WhenSaving;
$e= $zipfile.AddEntry("mybig.csv", "This is a zipfile created from within powershell.")
$e= $zipfile.AddDirectory($directoryToZip, "home")
$zipfile.Save("D:\\mybigfiles\\big.zip");
$zipfile.Dispose();
Working C# code.
using (ZipFile zip = new ZipFile())
{
zip.UseZip64WhenSaving = Zip64Option.AsNecessary;
zip.AddFile(compressedFileName);
zip.AddFile("\\\\server\\bigfile\\CM_Report_20220411200326.csv");
zip.AddFile("\\\\server\\bigfile\\PM_Report_20220411200326.csv");
zip.AddFile("\\\\server\\bigfile\\SCE_Report_20220411200326.csv");
}
Unlike C#, PowerShell loves implicit type conversions - and it'll implicitly parse and convert a string value to its cognate enum value when you assign it to an enum-typed property:
$zipfile.UseZip64WhenSaving = 'AsNecessary'
Alternatively, make sure you qualify the enum type name:
$zipfile.UseZip64WhenSaving = [Ionic.Zip.Zip64Option]::AsNecessary
It's also worth noting that all PowerShell string literals act like verbatim strings in C# - in other words, \ is not a special character that needs to be escaped:
$directoryToZip = "D:\mybigfiles\"
# ...
$e = $zipfile.AddDirectory($directoryToZip, "home")
$zipfile.Save("D:\mybigfiles\big.zip")
On page 74 of the ANTLR 4 book it says that any Unicode character can be used in a grammar simply by specifying its codepoint in this manner:
'\uxxxx'
where xxxx is the hexadecimal value for the Unicode codepoint.
So I used that technique in a token rule for an ID token:
grammar ID;
id : ID EOF ;
ID : ('a' .. 'z' | 'A' .. 'Z' | '\u0100' .. '\u017E')+ ;
WS : [ \t\r\n]+ -> skip ;
When I tried to parse this input:
Gŭnter
ANTLR throws an error, saying that it does not recognize ŭ. (The ŭ character is hex 016D, so it is within the range specified)
What am I doing wrong please?
ANTLR is ready to accept 16-bit characters but, by default, many locales will read in characters as bytes (8 bits). You need to specify the appropriate encoding when you read the file using the Java libraries. If you are using the TestRig, perhaps through the grun alias/script, then pass the argument -encoding utf-8 (or whichever encoding your input actually uses). If you look at the source code of that class, you will see the following mechanism:
InputStream is = new FileInputStream(inputFile);
Reader r = new InputStreamReader(is, encoding); // e.g., euc-jp or utf-8
ANTLRInputStream input = new ANTLRInputStream(r);
XLexer lexer = new XLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
...
Grammar:
NAME:
[A-Za-z][0-9A-Za-z\u0080-\uFFFF_]+
;
Java:
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.TokenStream;
import com.thalesgroup.dms.stimulus.StimulusParser.SystemContext;
final class RequirementParser {
static SystemContext parse( String requirement ) {
requirement = requirement.replaceAll( "\t", " " );
final CharStream charStream = CharStreams.fromString( requirement );
final StimulusLexer lexer = new StimulusLexer( charStream );
final TokenStream tokens = new CommonTokenStream( lexer );
final StimulusParser parser = new StimulusParser( tokens );
final SystemContext system = parser.system();
if( parser.getNumberOfSyntaxErrors() > 0 ) {
Debug.format( requirement );
}
return system;
}
private RequirementParser() {/**/}
}
Source:
Lexers and Unicode text
For those having the same problem using ANTLR 4 from Java code: ANTLRInputStream being deprecated, here is a working way to pass multi-char Unicode data from a String to a lexer (MyLexer below):
import java.nio.CharBuffer;
import org.antlr.v4.runtime.CodePointBuffer;
import org.antlr.v4.runtime.CodePointCharStream;
import org.antlr.v4.runtime.CommonTokenStream;
// CodePointCharStream is the non-deprecated replacement for ANTLRInputStream
String myString = "\u2013";
CharBuffer charBuffer = CharBuffer.wrap(myString.toCharArray());
CodePointBuffer codePointBuffer = CodePointBuffer.withChars(charBuffer);
CodePointCharStream cpcs = CodePointCharStream.fromBuffer(codePointBuffer);
MyLexer lexer = new MyLexer(cpcs);
CommonTokenStream tokens = new CommonTokenStream(lexer);
You can specify the encoding when actually reading the file.
For Kotlin/Java that could look like this; there is no need to specify the encoding in the grammar:
val inputStream: CharStream = CharStreams.fromFileName(fileName, Charset.forName("UTF-16LE"))
val lexer = BlastFeatureGrammarLexer(inputStream)
Supported Charsets by Java/Kotlin
I have an app that stores images in a Windows Azure block blob. I'm adding metadata to each blob that gets uploaded. The metadata may include some special characters, for instance the registered trademark symbol (®). How do I add such a value to the metadata in Windows Azure?
Currently, when I try, I get a 400 (Bad Request) error anytime I try to upload a file that uses a special character like this.
Thank you!
You might use HttpUtility to encode/decode the string:
blob.Metadata["Description"] = HttpUtility.HtmlEncode(model.Description);
Description = HttpUtility.HtmlDecode(blob.Metadata["Description"]);
http://lvbernal.blogspot.com/2013/02/metadatos-de-azure-vs-caracteres.html
The supported characters in blob metadata must be ASCII. To work around this you can escape the string (percent-encode it), Base64-encode it, etc.
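For example, a minimal Base64 round trip in C# could look like the sketch below; the blob and model objects and the "Description" key are carried over from the answer above, and the value has to be decoded again whenever the metadata is read back:
// Metadata values must be ASCII, so store the UTF-8 bytes of the text as Base64
blob.Metadata["Description"] = Convert.ToBase64String(Encoding.UTF8.GetBytes(model.Description));
// Reverse the encoding when reading the metadata back
string description = Encoding.UTF8.GetString(Convert.FromBase64String(blob.Metadata["Description"]));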
HttpUtility.HtmlEncode may not work: if Unicode characters such as ’ are in your string, it will fail. So far I have found that Uri.EscapeDataString does handle that edge case and others. However, it also encodes a number of characters unnecessarily, such as the space character (' ' = chr(32) = %20).
I mapped the ASCII characters that metadata will not accept and built this to restore the characters that did not need to be escaped:
static List<string> illegals = new List<string> { "%1", "%2", "%3", "%4", "%5", "%6", "%7", "%8", "%A", "%B", "%C", "%D", "%E", "%F", "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17", "%18", "%19", "%1A", "%1B", "%1C", "%1D", "%1E", "%1F", "%7F", "%80", "%81", "%82", "%83", "%84", "%85", "%86", "%87", "%88", "%89", "%8A", "%8B", "%8C", "%8D", "%8E", "%8F", "%90", "%91", "%92", "%93", "%94", "%95", "%96", "%97", "%98", "%99", "%9A", "%9B", "%9C", "%9D", "%9E", "%9F", "%A0", "%A1", "%A2", "%A3", "%A4", "%A5", "%A6", "%A7", "%A8", "%A9", "%AA", "%AB", "%AC", "%AD", "%AE", "%AF", "%B0", "%B1", "%B2", "%B3", "%B4", "%B5", "%B6", "%B7", "%B8", "%B9", "%BA", "%BB", "%BC", "%BD", "%BE", "%BF", "%C0", "%C1", "%C2", "%C3", "%C4", "%C5", "%C6", "%C7", "%C8", "%C9", "%CA", "%CB", "%CC", "%CD", "%CE", "%CF", "%D0", "%D1", "%D2", "%D3", "%D4", "%D5", "%D6", "%D7", "%D8", "%D9", "%DA", "%DB", "%DC", "%DD", "%DE", "%DF", "%E0", "%E1", "%E2", "%E3", "%E4", "%E5", "%E6", "%E7", "%E8", "%E9", "%EA", "%EB", "%EC", "%ED", "%EE", "%EF", "%F0", "%F1", "%F2", "%F3", "%F4", "%F5", "%F6", "%F7", "%F8", "%F9", "%FA", "%FB", "%FC", "%FD", "%FE" };
private static string MetaDataEscape(string value)
{
    // Example of what Uri.EscapeDataString produces before the clean-up below:
    // CDC%20Guideline%20for%20Prescribing%20Opioids%20Module%206%3A%20%0Ahttps%3A%2F%2Fwww.cdc.gov%2Fdrugoverdose%2Ftraining%2Fdosing%2F
    var sz = Uri.EscapeDataString(value.Trim());
    // Un-escape every byte value that metadata does accept, so only the illegal ones stay percent-encoded
    for (int i = 1; i < 255; i++)
    {
        var hex = "%" + i.ToString("X");
        if (!illegals.Contains(hex))
        {
            sz = sz.Replace(hex, Uri.UnescapeDataString(hex));
        }
    }
    return sz;
}
The result is:
Before ==> "1080x1080 Facebook Images"
Uri.EscapeDataString ==> "1080x1080%20Facebook%20Images"
After ==> "1080x1080 Facebook Images"
I am sure there is a more efficient way, but the hit seems negligible for my needs.
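Whichever escaping you use, remember to reverse it when the metadata is read back. For the MetaDataEscape approach above that is simply a percent-decode (a sketch, reusing the blob and key names from the earlier answers):
// Undo the percent-encoding that MetaDataEscape left on the illegal characters
string restored = Uri.UnescapeDataString(blob.Metadata["Description"]);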
My SQL Server 2008 R2 database has string columns (nvarchar), and some of the old data is showing up as ASCII. I need to show it to users on my site, and I would prefer to convert the data in the database to Unicode. Is there a quick way to do this? Are there downsides that I should be aware of?
Examples of my issue:
In the database I see special chars instead of regular chars. A user name that is supposed to be Amédée shows up as Am?d??.
In other cases I see an HTML-encoded entity instead of a quotation mark ("), or the chars &# appearing instead of the word "and".
Well if you have accented characters in your database, it's definitely not ASCII. Find out first what codepage you/they were using in the old DB, and convert that to UTF-8, and save to a new database.
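If the old data really is a codepage mix-up (and not already flattened to '?'), the re-decoding can be done in C# along these lines. This is only a sketch: Windows-1252 is an assumption, so substitute whatever codepage the old database actually used, and garbledText simply stands for a value read from the old column:
// Assumption: the text is UTF-8 data that was mis-read using Windows-1252.
// Recover the original bytes from the garbled string, then decode them as UTF-8.
// This cannot restore characters that were already replaced by '?'.
byte[] rawBytes = Encoding.GetEncoding(1252).GetBytes(garbledText);
string repaired = Encoding.UTF8.GetString(rawBytes);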
I built the following function:
public static string FixString(string textString)
{
    // Needs System.Text.RegularExpressions (Regex) and System.Web (HttpUtility)
    // Remove tabs and line breaks
    textString = Regex.Replace(textString, "[\t\r\n]+", String.Empty);
    // Collapse repeated spaces into one (optionally also .Replace("-", " "))
    textString = Regex.Replace(textString, "( )+", " ").Trim();
    // Strip parenthesised fragments
    textString = Regex.Replace(textString, "\\(.*?\\)", String.Empty);
    // Decode HTML-encoded entities back into real characters
    textString = HttpUtility.HtmlDecode(textString).Trim();
    // Strip any remaining HTML tags
    textString = Regex.Replace(textString, "<.*?>", String.Empty);
    return textString;
}
and that did the trick!!!