Why does the cmp instruction cost so much time? - x86-64

I am trying to profile with libunwind (using Linux perf). With perf top monitoring the target process, I get this assembly time-cost screen:
0.19 │ mov %rcx,0x18(%rsp)
│ trace_lookup():
1.54 │ mov 0x8(%r9),%rcx
│ _ULx86_64_tdep_trace():
0.52 │ and $0x1,%edx
0.57 │ mov %r14d,0xc(%rsp)
0.40 │ mov 0x78(%rsp),%r10
1.24 │ sub %rdx,%r15
│ trace_lookup():
0.35 │ shl %cl,%r12d
│ _ULx86_64_tdep_trace():
2.18 │ mov 0x90(%rsp),%r8
│ trace_lookup():
0.46 │ imul %r15,%r13
│ _ULx86_64_tdep_trace():
0.59 │ mov %r15,0x88(%rsp)
│ trace_lookup():
0.50 │ lea -0x1(%r12),%rdx
1.22 │ shr $0x2b,%r13
0.37 │ and %r13,%rdx
0.57 │177: mov %rdx,%rbp
0.43 │ shl $0x4,%rbp
1.33 │ add %rdi,%rbp
0.49 │ mov 0x0(%rbp),%rsi
24.40 │ cmp %rsi,%r15
│ ↓ jne 420
│ _ULx86_64_tdep_trace():
2.10 │18e: movzbl 0x8(%rbp),%edx
3.68 │ test $0x8,%dl
│ ↓ jne 370
1.27 │ mov %edx,%eax
0.06 │ shl $0x5,%eax
0.73 │ sar $0x5,%al
1.70 │ cmp $0xfe,%al
│ ↓ je 380
0.01 │ ↓ jle 2f0
0.01 │ cmp $0xff,%al
│ ↓ je 3a0
0.02 │ cmp $0x1,%al
│ ↓ jne 298
0.01 │ and $0x10,%edx
│ movl $0x1,0x10(%rsp)
│ movl $0x1,0x1c8(%rbx)
0.00 │ ↓ je 393
The corresponding source code is here: trace_lookup source code. If I read it correctly, the source line corresponding to this hot-path cmp instruction is line 296, but I don't know why this line is so slow and costs most of the time.

The cmp %rsi,%r15 instruction is marked as having huge overhead because it waits for the data loaded from cache or memory by the mov 0x0(%rbp),%rsi instruction. There is probably an L1 or even L2 cache miss on that load.
For the code fragment
│ trace_lookup():
0.50 │ lea -0x1(%r12),%rdx
1.22 │ shr $0x2b,%r13
0.37 │ and %r13,%rdx
0.57 │177: mov %rdx,%rbp
0.43 │ shl $0x4,%rbp
1.33 │ add %rdi,%rbp
0.49 │ mov 0x0(%rbp),%rsi
24.40 │ cmp %rsi,%r15
│ ↓ jne 420
you have 24% of the profiling events of the current function attributed to this cmp instruction. The default event for sampling profiling is "cycles" (a hardware event for CPU clock cycles) or "cpu-clock" (a software event for linear time), so around 24% of the sampling interrupts that hit this function were reported at the instruction address of this cmp.
There is a systematic skew possible when profiling modern out-of-order CPUs: the cost is reported not for the instruction that ran slowly, but for the instruction that could not finish its execution (retire) quickly. This cmp+jne pair (a fused uop) changes the instruction flow of the program if the %rsi register value is not equal to the %r15 register value. In ancient times such an instruction would just read two registers and compare their values, which is fast and should not take a quarter of the function's execution time. In a modern CPU, however, registers are not just 32- or 64-bit places to store a value; they have hidden flags (and renaming machinery) used by the out-of-order engine.
In your example there was a mov 0x0(%rbp),%rsi which changed the %rsi register. This instruction is a load from memory at address *%rbp. The CPU starts the load in the cache/memory subsystem, marks %rsi as "load pending from memory", and continues executing instructions. There is some chance that the next instructions will not require the result of that load, which takes some time (for example, on Haswell: 4 CPU cycles for an L1 hit, 12 for an L2 hit, 36-66 for an L3 hit, plus an additional 50-100 ns for a cache miss and RAM read). But in your case the next instruction was the cmp+jne, which reads %rsi, and it cannot finish until the data from memory is written into %rsi (the CPU may stall in the middle of the cmp+jne execution or restart it many times).
So the cmp shows 24% overhead because that mov missed the nearest caches. With more advanced counters you can estimate which cache it missed and which cache/memory layer served the request most often.
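For example (only a sketch; event availability and exact names vary by CPU, so check perf list first, and <pid> is a placeholder for your target process):
# sample memory loads, if your perf/CPU support "perf mem"
perf mem record -p <pid> -- sleep 10
perf mem report          # shows which level (L1/LFB/L2/L3/RAM) served the sampled loads
# or sample cache-miss events next to plain cycles
perf record -e cycles,L1-dcache-load-misses,LLC-load-misses -p <pid> -- sleep 10
perf report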
As for mapping this back to the source (you suggest line 296 of trace_lookup): with such a short asm fragment it can be hard to find the corresponding line in the source code of trace_lookup, or to determine which value was missing from the L1/L2 cache and why. You should try to write a shorter reproducible example.


Kustomize: how to apply the same patch in multiple overlays without LoadRestrictionsNone

I have a kustomize layout something like this:
├──release
│   ├──VariantA
│   │   └──kustomization.yaml
│   │      cluster_a.yaml
│   └──VariantB
│       └──kustomization.yaml
│          cluster_b.yaml
└──test
    ├──TestVariantA
    │   └──kustomization.yaml; resources=[VariantA]
    │      common_cluster_patch.yaml
    └──TestVariantB
        └──kustomization.yaml; resources=[VariantB]
           common_cluster_patch.yaml
My issue is the duplication of common_cluster_patch.yaml. It is a common patch which I need to apply to the different base cluster objects. I would prefer not to have to maintain identical copies of it for each test variant.
The 2 unsuccessful solutions I tried are:
A common patch resource
├──release
│   ├──VariantA
│   │   └──kustomization.yaml
│   │      cluster_a.yaml
│   └──VariantB
│       └──kustomization.yaml
│          cluster_b.yaml
└──test
    ├──TestVariantA
    │   └──kustomization.yaml; resources=[VariantA, TestPatch]
    ├──TestVariantB
    │   └──kustomization.yaml; resources=[VariantB, TestPatch]
    └──TestPatch
        └──kustomization.yaml
           common_cluster_patch.yaml
This fails with no matches for Id Cluster..., presumably because TestPatch is trying to patch an object it doesn't contain.
A common patch directory
├──release
│   ├──VariantA
│   │   └──kustomization.yaml
│   │      cluster_a.yaml
│   └──VariantB
│       └──kustomization.yaml
│          cluster_b.yaml
└──test
    ├──TestVariantA
    │   └──kustomization.yaml; resources=[VariantA]; patches=[../TestPatch/common_cluster_patch.yaml]
    ├──TestVariantB
    │   └──kustomization.yaml; resources=[VariantB]; patches=[../TestPatch/common_cluster_patch.yaml]
    └──TestPatch
        └──common_cluster_patch.yaml
This fails with: '/path/to/test/TestPatch/common_cluster_patch.yaml' is not in or below '/path/to/test/TestVariantA'.
I can work round this and successfully generate my templates with kustomize build --load-restrictor LoadRestrictionsNone, but this comes with dire warnings and portents. I am hoping that there is some better way of organising my resources which doesn't require either workarounds or duplication.
Thanks to criztovyl for this answer! The solution is kustomize components. Components are currently only defined in kustomize.config.k8s.io/v1alpha1 and the reference documentation is a stub, but they are included in current release versions of kustomize.
My solution now looks like:
├──release
│   ├──VariantA
│   │   └──kustomization.yaml
│   │      cluster_a.yaml
│   └──VariantB
│       └──kustomization.yaml
│          cluster_b.yaml
└──test
    ├──TestVariantA
    │   └──kustomization.yaml; resources=[VariantA]; components=[../TestCommon]
    ├──TestVariantB
    │   └──kustomization.yaml; resources=[VariantB]; components=[../TestCommon]
    └──TestCommon
        └──kustomization.yaml; patches=[common_cluster_patch.yaml]
           common_cluster_patch.yaml
where test/TestCommon/kustomization.yaml has the header:
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
The crucial difference between a component and a resource is that a component is applied after other processing. This means it can patch an object in the resource which included it.
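For concreteness, a minimal sketch of the two kustomization files (using the newer patches: syntax; older kustomize versions would use patchesStrategicMerge instead, and the relative path ../../release/VariantA is an assumption based on the layout above):
# test/TestCommon/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patches:
  - path: common_cluster_patch.yaml

# test/TestVariantA/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../release/VariantA
components:
  - ../TestCommon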

How to compile Unicode characters in LaTeX?

I'm trying to write a little piece of text with LaTeX in Overleaf. Everything works fine until I use Unicode characters.
I want, for example, to insert this Devanagari symbol: ऄ and make it visible after LaTeX compiles it.
This is an example of my document:
\documentclass[a4paper,12pt,openright,notitlepage,twoside]{book}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\begin{document}
symbol: ऄ
\end{document}
When I compile with LaTeX, the symbol doesn't appear and I get this error:
Package inputenc Error: Unicode character ऄ (U+0904) (inputenc)not set up for use with LaTeX.
When I compile with LuaLaTeX or XeLaTeX, the character still does not appear, but the error message disappears.
I tried all the methods described in this post: https://tex.stackexchange.com/questions/34604/entering-unicode-characters-in-latex but none of them work for me.
Does anyone have a solution to this problem?
If you compile with xelatex or lualatex, you'll need to select a font which contains this glyph.
If you work locally on your computer, you can run the command albatross ऄ from the command line to find out which of your fonts has it:
Unicode code point [904] mapping to ऄ
┌─────────────────────────────────────────────────────────────────────────────┐
│ Font name │
├─────────────────────────────────────────────────────────────────────────────┤
│ .LastResort │
├─────────────────────────────────────────────────────────────────────────────┤
│ Devanagari Sangam MN,देवनागरी संगम एम॰एन॰ │
├─────────────────────────────────────────────────────────────────────────────┤
│ ITF Devanagari Marathi,आई॰टी॰एफ़॰ देवनागरी मराठी │
├─────────────────────────────────────────────────────────────────────────────┤
│ ITF Devanagari,आई॰टी॰एफ़॰ देवनागरी │
├─────────────────────────────────────────────────────────────────────────────┤
│ Kohinoor Devanagari,कोहिनूर देवनागरी │
├─────────────────────────────────────────────────────────────────────────────┤
│ Lohit Devanagari │
├─────────────────────────────────────────────────────────────────────────────┤
│ Lohit Hindi │
├─────────────────────────────────────────────────────────────────────────────┤
│ Shobhika,Shobhika Regular │
├─────────────────────────────────────────────────────────────────────────────┤
│ Shree Devanagari 714,श्री देवनागरी ७१४ │
└─────────────────────────────────────────────────────────────────────────────┘
Or, if you are using Overleaf, consult this list of installed fonts: https://www.overleaf.com/latex/examples/fontspec-all-the-fonts/hjrpnxhrrtxc
So in my case, I can take e.g. the Shobhika font:
% !TeX TS-program = xelatex
\documentclass[a4paper,12pt,openright,notitlepage,twoside]{book}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{fontspec}
\setmainfont{Shobhika}
\begin{document}
symbol: ऄ
\end{document}
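If you want to keep your usual main font and switch to a Devanagari-capable font only for these glyphs, a variation of the above (still assuming the Shobhika font is available on your system or on Overleaf) is to declare a separate family with fontspec:
% !TeX TS-program = xelatex
\documentclass[a4paper,12pt,openright,notitlepage,twoside]{book}
\usepackage{fontspec}
% separate font family used only for Devanagari text
\newfontfamily\devanagarifont[Script=Devanagari]{Shobhika}
\begin{document}
symbol: {\devanagarifont ऄ}
\end{document}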

Is There a Way to do Binomial Regression in PySpark?

I'm working with a PySpark dataframe and I need to do a Binomial regression, with more than one trial in each row. For example, my table looks like this:
┌──────────┬──────────┬─────────────┬────────────┐
│ Features │ # Trials │ # Successes │ # Failures │
├──────────┼──────────┼─────────────┼────────────┤
│ ... │ 10 │ 4 │ 6 │
│ ... │ 7 │ 2 │ 5 │
│ ... │ 5 │ 4 │ 1 │
└──────────┴──────────┴─────────────┴────────────┘
I don't want to 'ungroup' the data. In statsmodels, it is possible to do a binomial regression directly on the grouped data with a patsy formula:
formula = '# Successes + # Failures ~ Features'
Is there a way to do so in PySpark as well?
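One approach worth sketching is the standard GLM equivalence of fitting the success proportion with the number of trials as a case weight; pyspark.ml's GeneralizedLinearRegression exposes a weightCol for this. The snippet below is only a sketch: it assumes fractional labels in [0, 1] are accepted for family="binomial" (as in R's glm with weights), and the feature column names f1 and f2 are made up; verify against your Spark version.
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import GeneralizedLinearRegression
from pyspark.sql import functions as F

# df is the grouped dataframe from the question, with columns "trials" and "successes".
assembled = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(
    df.withColumn("prop", F.col("successes") / F.col("trials"))
)

# Binomial GLM on the proportion, weighted by the number of trials per row.
glr = GeneralizedLinearRegression(
    family="binomial", link="logit",
    featuresCol="features", labelCol="prop", weightCol="trials"
)
model = glr.fit(assembled)
print(model.coefficients, model.intercept)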

Error building `LibCURL` after Pkg.add("SMTPClient") - is there a solution or an alternative?

I attempted the installation of SMTPClient by executing in the REPL:
using Pkg
Pkg.add("SMTPClient")
The last line did not run successfully - I got an error from Julia:
Error: Error building `LibCURL`:
│ ERROR: LoadError: Could not download https://github.com/bicycle1885/ZlibBuilder/releases/download/v1.0.4/build_Zlib.v1.2.11.jl to /home/jerzy/.julia/packages/LibCURL/lWJxD/deps/build_Zlib.v1.2.11.jl:
│ ErrorException("")
│ Stacktrace:
│ [1] error(::String) at ./error.jl:33
│ [2] #download#96(::Bool, ::Function, ::String, ::String) at /home/jerzy/.julia/packages/BinaryProvider/U2dKK/src/PlatformEngines.jl:619
│ [3] download(::String, ::String) at /home/jerzy/.julia/packages/BinaryProvider/U2dKK/src/PlatformEngines.jl:606
│ [4] top-level scope at /home/jerzy/.julia/packages/LibCURL/lWJxD/deps/build.jl:19
│ [5] include at ./boot.jl:317 [inlined]
│ [6] include_relative(::Module, ::String) at ./loading.jl:1044
│ [7] include(::Module, ::String) at ./sysimg.jl:29
│ [8] include(::String) at ./client.jl:392
│ [9] top-level scope at none:0
│ in expression starting at /home/jerzy/.julia/packages/LibCURL/lWJxD/deps/build.jl:14
│ --2020-06-14 20:26:37-- https://github.com/bicycle1885/ZlibBuilder/releases/download/v1.0.4/build_Zlib.v1.2.11.jl
│ Resolving github.com (github.com)... 140.82.118.4
│ Connecting to github.com (github.com)|140.82.118.4|:443... connected.
│ HTTP request sent, awaiting response... 302 Found
│ Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/126450947/3549c780-3c62-11e9-9144-67fac571e02a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200614%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200614T182643Z&X-Amz-Expires=300&X-Amz-Signature=fe638ea82a12f0d962d80996378f917f7d5ccd387197248e5a7ee9239f11f6d3&X-Amz-SignedHeaders=host&actor_id=0&repo_id=126450947&response-content-disposition=attachment%3B%20filename%3Dbuild_Zlib.v1.2.11.jl&response-content-?type=application%2Foctet-stream [following]
│ --2020-06-14 20:26:43-- https://github-production-release-asset-2e65be.s3.amazonaws.com/126450947/3549c780-3c62-11e9-9144-67fac571e02a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200614%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200614T182643Z&X-Amz-Expires=300&X-Amz-Signature=fe638ea82a12f0d962d80996378f917f7d5ccd387197248e5a7ee9239f11f6d3&X-Amz-SignedHeaders=host&actor_id=0&repo_id=126450947&response-content-disposition=attachment%3B%20filename%3Dbuild_Zlib.v1.2.11.jl&response-content-type=application%2Foctet-stream
│ Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... failed: Name or service not known.
│ wget: unable to resolve host address ‘github-production-release-asset-2e65be.s3.amazonaws.com’
└ # Pkg.Operations /build/julia-wJr69F/julia-1.0.3+dfsg/usr/share/julia/stdlib/v1.0/Pkg/src/Operations.jl:1097
Consistent with the feedback from the REPL, I then proceeded to run the following:
Pkg.build("LibCURL")
and got back from Julia the following:
Error: Error building `LibCURL`:
│ ERROR: LoadError: UndefVarError: products not defined
│ Stacktrace:
│ [1] getproperty(::Module, ::Symbol) at ./sysimg.jl:13
│ [2] top-level scope at /home/jerzy/.julia/packages/LibCURL/lWJxD/deps/build.jl:26
│ [3] include at ./boot.jl:317 [inlined]
│ [4] include_relative(::Module, ::String) at ./loading.jl:1044
│ [5] include(::Module, ::String) at ./sysimg.jl:29
│ [6] include(::String) at ./client.jl:392
│ [7] top-level scope at none:0
│ in expression starting at /home/jerzy/.julia/packages/LibCURL/lWJxD/deps/build.jl:14
└ # Pkg.Operations /build/julia-wJr69F/julia-1.0.3+dfsg/usr/share/julia/stdlib/v1.0/Pkg/src/Operations.jl:1097
I then restarted the REPL, to no avail.
Is there a fix to this issue?
My environment: Debian 10 LTS. I intend to send emails from my desktop using third-party SMTP servers. Are there perchance any other packages that would do this job?
Most likely the network connection failed in the middle of the installation process of LibCURL.
Try removing the package and adding again:
using Pkg
pkg"rm SMTPClient"
pkg"rm LibCURL"
pkg"add LibCURL"
Other possible reasons could be no disk space/quota left or a problem with write permissions on your .julia location.
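For reference, the same steps using the functional Pkg API, re-adding SMTPClient once LibCURL builds cleanly (this is just the usual retry sequence, not a guaranteed fix):
using Pkg
Pkg.rm("SMTPClient")
Pkg.rm("LibCURL")
Pkg.add("LibCURL")
Pkg.build("LibCURL")   # re-runs the build step that failed before
Pkg.add("SMTPClient")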

What is the exact difference between Windows-1252(1/3/4) and ISO-8859-1?

We are hosting PHP apps on a Debian based LAMP installation.
Everything is quite ok - performance, administrative and management wise.
However, being somewhat new devs (we're still in high school), we've run into some problems with character encoding for Western charsets.
After doing a lot of research I have come to the conclusion that the information online is somewhat confusing: it talks about Windows-1252 being ANSI and totally ISO-8859-1 compatible.
So anyway, what is the difference between Windows-1252(1/3/4) and ISO-8859-1?
And where does ANSI come into this anyway?
What encoding should we use on our Debian servers (and workstations) in order to ensure that clients get all information in the intended way and that we don't lose any chars on the way?
I'd like to answer this in a more web-like manner, and in order to answer it we need a little history. Joel Spolsky has written a very good introductory article on the absolute minimum every dev should know about Unicode character encoding.
Bear with me here, because this is going to be somewhat of a looong answer. :)
For the history I'll point to some quotes from there (thank you very much, Joel! :) )
The only characters that mattered were good old unaccented English letters, and we had a code for them called ASCII which was able to represent every character using a number between 32 and 127. Space was 32, the letter "A" was 65, etc. This could conveniently be stored in 7 bits. Most computers in those days were using 8-bit bytes, so not only could you store every possible ASCII character, but you had a whole bit to spare, which, if you were wicked, you could use for your own devious purposes.
And all was good, assuming you were an English speaker.
Because bytes have room for up to eight bits, lots of people got to thinking, "gosh, we can use the codes 128-255 for our own purposes." The trouble was, lots of people had this idea at the same time, and they had their own ideas of what should go where in the space from 128 to 255.
So now "OEM character sets" were distributed with PCs and these were still all different and incompatible. And to our contemporary amazement - it was all fine! They didn't have the Internet back than and people rarely exchanged files between systems with different locales.
Joel goes on saying:
In fact as soon as people started buying PCs outside of America all kinds of different OEM character sets were dreamed up, which all used the top 128 characters for their own purposes.
Eventually this OEM free-for-all got codified in the ANSI standard. In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up, depending on where you lived. These different systems were called code pages.
And this is how the "Windows code pages" were born, eventually. They were actually "parented" by the DOS code pages. And then Unicode was born! :) UTF-8 is "another system for storing your string of Unicode code points", in which "every code point from 0-127 is stored in a single byte", i.e. the same as ASCII. I will not go into any more specifics of Unicode and UTF-8, but you should read up on the BOM, endianness and character encoding in general.
On "the ANSI conspiracy", Microsoft actually admits the miss-labeling of Windows-1252 in a glossary of terms:
The so-called Windows character set (WinLatin1, or Windows code page 1252, to be exact) uses some of those positions for printable characters. Thus, the Windows character set is NOT identical with ISO 8859-1. The Windows character set is often called "ANSI character set", but this is SERIOUSLY MISLEADING. It has NOT been approved by ANSI.
So, "ANSI", when referring to Windows character sets, is not ANSI-certified! :)
As Jukka pointed out (credit goes to you for the nice answer):
Windows-1252 differs from ISO Latin 1, also known as ISO-8859-1 as a character encoding, in that the code range 0x80 to 0x9F is reserved for control characters in ISO-8859-1 (the so-called C1 Controls), whereas in Windows-1252 some of the codes there are assigned to printable characters (mostly punctuation characters) and others are left undefined.
However, my personal opinion and technical understanding is that neither Windows-1252 nor ISO-8859-1 is a web encoding! :) So:
For web pages please use UTF-8 as encoding for the content
So store data as UTF-8 and "spit it out" with the HTTP Header: Content-Type: text/html; charset=utf-8.
There is also a thing called the HTML content-type meta-tag:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Now, what browsers actually do when they encounter this tag is start again from the beginning of the HTML document, so that they can reinterpret the document in the declared encoding. This should happen only if there is no 'Content-Type' header.
Use other specific encodings if the users of your system need files generated from it.
For example, some Western users may need Excel-generated files or CSVs in Windows-1252. If this is the case, encode the text in that locale, store it on the filesystem, and serve it as a downloadable file.
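As a rough PHP sketch of that last point (the question is about PHP apps; iconv and header are standard PHP, while the data and filename are invented for illustration):
<?php
// Data is stored internally as UTF-8.
$csv = "Name;City\nZoë;Genève\n";

// Re-encode to Windows-1252 only for this legacy download.
$latin = iconv('UTF-8', 'Windows-1252//TRANSLIT', $csv);

// Declare exactly what the client is getting.
header('Content-Type: text/csv; charset=windows-1252');
header('Content-Disposition: attachment; filename="export.csv"');
echo $latin;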
There is another thing to be aware of in the design of HTTP:
The content-encoding negotiation mechanism should work like this.
I. The client requests a web page with specific content types and encodings via the 'Accept' and 'Accept-Charset' request headers.
II. Then the server (or web application) returns the content transcoded to that encoding and character set.
This is NOT THE CASE in most modern web apps. What actually happens is that web applications serve (force on the client) content as UTF-8. And this works because browsers interpret received documents based on the response headers and not on what they actually expected.
We should all go Unicode, so please, please, please use UTF-8 to distribute your content wherever possible and most of all applicable. Or else the elders of the Internet will haunt you! :)
P.S.
Some more nice articles on using MS Windows characters in Web Pages can be found here and here.
The most authoritative reference to meanings of character encoding names is the IANA registry Character Sets.
Windows-1252 is commonly known as Windows Latin 1 or as Windows West European or something like that. It differs from ISO Latin 1, also known as ISO-8859-1 as a character encoding, in that the code range 0x80 to 0x9F is reserved for control characters in ISO-8859-1 (the so-called C1 Controls), whereas in Windows-1252 some of the codes there are assigned to printable characters (mostly punctuation characters) and others are left undefined.
ANSI comes here as a misnomer. Microsoft once submitted Windows-1252 to American National Standards Institute (ANSI) to be adopted as a standard; the proposal was rejected, but Microsoft still calls their code “ANSI”. For further confusion, they may use “ANSI” for different encodings (basically, the “native 8-bit encoding” of a Windows installation).
In the web context, declaring ISO-8859-1 will be taken as if you declared Windows-1252. The reason is that C1 Controls are not used, or useful, on the web, whereas the added characters are often used, even on pages mislabelled as ISO-8859-1. So in practical terms it does not matter which one you declare.
There might still be some browsers that actually interpret data as ISO-8859-1 if declared so, but they must be very rare (the last I remember seeing was a version of Opera about ten years ago).
You do not describe what problems you have encountered. The most common cause of problems seems to be that data is actually UTF-8 encoded but declared as ISO-8859-1 (or Windows-1252), or vice versa. This becomes a real problem to web page authors if a server forces a Content-Type header declaring a character encoding and it is one that they cannot deal with in their authoring environment (or don’t know how to do that).
This table gives an overview of the differences. It shows all characters which are defined in Windows-1252 but not available in ISO-8859-1/ISO-8859-15:
│ …0 │ …1 │ …2 │ …3 │ …4 │ …5 │ …6 │ …7 │ …8 │ …9 │ …A │ …B │ …C │ …D │ …E │ …F │
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
8… │ € │ │ ‚ │ ƒ │ „ │ … │ † │ ‡ │ ˆ │ ‰ │ Š │ ‹ │ Œ │ │ Ž │ │
Unicode │ 20AC │ │ 201A │ 0192 │ 201E │ 2026 │ 2020 │ 2021 │ 02C6 │ 2030 │ 0160 │ 2039 │ 0152 │ │ 017D │ │
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
9… │ │ ‘ │ ’ │ “ │ ” │ • │ – │ — │ ˜ │ ™ │ š │ › │ œ │ │ ž │ Ÿ │
Unicode │ │ 2018 │ 2019 │ 201C │ 201D │ 2022 │ 2013 │ 2014 │ 02DC │ 2122 │ 0161 │ 203A │ 0153 │ │ 017E │ 0178 │
Unlike in Windows-1252, the range 0x80…0x9F is used for control codes in ISO-8859-1.
This table shows the differences between Windows-1252, ISO-8859-1 and ISO-8859-15:
Character │ € │ Š │ š │ Ž │ ž │ Œ │ œ │ Ÿ │ ¤ │ ¦ │ ¨ │ ´ │ ¸ │ ¼ │ ½ │ ¾ │
───────────────────────────────────────────────────────────────────────────────────────────────────────
ISO 8859-1 │ – │ – │ – │ – │ – │ – │ – │ – │ A4 │ A6 │ A8 │ B4 │ B8 │ BC │ BD │ BE │
ISO 8859-15 │ A4 │ A6 │ A8 │ B4 │ B8 │ BC │ BD │ BE │ – │ – │ – │ – │ – │ – │ – │ – │
Windows-1252 │ 80 │ 8A │ 9A │ 8E │ 9E │ 8C │ 9C │ 9F │ A4 │ A6 │ A8 │ B4 │ B8 │ BC │ BD │ BE │
Unicode │ 20AC │ 160 │ 161 │ 17D │ 17E │ 152 │ 153 │ 178 │ A4 │ A6 │ A8 │ B4 │ B8 │ BC │ BD │ BE │
In countries with an English/Latin alphabet (e.g. the UK/US/France/Germany and others), "ANSI" refers to the Windows-1252 encoding. https://web.archive.org/web/20170916200715/http://www.microsoft.com:80/resources/msdn/goglobal/default.mspx
Windows-1252 and ISO-8859-1 are very similar. They only differ in 32 characters.
In Windows-1252, the characters from 128 to 159 are used for some useful characters such as the euro symbol.
In ISO-8859-1 these characters are mapped to control characters, which are useless in HTML.
So, a suggestion: check whether code 128 is the euro symbol; if it is, it's Windows-1252.
The codes from 128 to 159 are not in use in ISO-8859-1, but many browsers will display the characters from the Windows-1252 character set instead of nothing.
These 2 links list them both.
http://www.w3schools.com/charsets/ref_html_ansi.asp
http://www.w3schools.com/charsets/ref_html_8859.asp
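As a quick way to see the 128-159 difference described above, here is a small Python check (purely illustrative):
# Byte 0x80 is the euro sign in Windows-1252 but a C1 control character in ISO-8859-1.
b = bytes([0x80])
print(repr(b.decode('cp1252')))   # '€'
print(repr(b.decode('latin-1')))  # '\x80' (unprintable control character)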
Some comments were very useful and I amended my post based on them.
Chenfeng points out:
On Windows, "ANSI" refers to the system codepage specified by the locale, whatever that is (Arabic/Chinese/Cyrillic/Vietnamese/...). It does not [necessarily] refer to Windows-1252. You can test this by changing your locale and then using notepad.exe to save a text file as "ANSI". According to this MS documentation, there are 14 different "ANSI" code pages: https://learn.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers
Wernfriend points out:
https://web.archive.org/web/20170916200715/http://www.microsoft.com:80/resources/msdn/goglobal/default.mspx shows that the USA codepage 437 is the 'OEM codepage' (see the OEM column), and the OEM codepage is the one used by the cmd prompt. He also points out, based on that webpage, that in many countries that do not use an English/Latin alphabet, ANSI is not Windows-1252. I notice that, for example, Hebrew ANSI uses 1255 (the Hebrew OEM codepage is 862).