Charset issues from Reporting: " ‘ " and " â€™ " showing in csv output instead of left and right single quotation mark


Berto2002

We have a report which correctly displays what I would call inverted commas in the data preview but when the data is output to CSV, some of the open and close inverted commas show as " ‘ " and " â€™ ". Example of data showing in the CSV: " ‘witches hat’ ". Example of the data in the UI:

image.png.322d7364a8e04945e846018ea6921172.png

I found this Stack Overflow question: encoding - "â€™" showing on page instead of " ' ", which states: "It's a ’ (RIGHT SINGLE QUOTATION MARK - U+2019) character which is being decoded as CP-1252 instead of UTF-8"
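To make the quoted explanation concrete, here is a minimal Python sketch of the same mix-up. It assumes the export is written as UTF-8 and the viewer reads it as Windows-1252 (CP-1252); neither is confirmed for Hornbill or for any particular viewer in this thread, so treat it as an illustration only.

```python
# Text as stored: curly single quotes (U+2018 and U+2019) around the phrase.
original = "\u2018witches hat\u2019"

# Assumption: the CSV is written as UTF-8, so each curly quote becomes three bytes.
utf8_bytes = original.encode("utf-8")

# A viewer that assumes CP-1252 turns each of those bytes into its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # â€˜witches hatâ€™

# Undoing the wrong decode recovers the original text, so no data was lost.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

The round trip suggests the bytes in the file are fine and only the interpretation differs.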

I have other examples of the data where this issue is not occurring such as this data in the CSV where the inverted comma is as it should be:

image.png.8e702b9177dc34b36620ab84fd661565.png

Can you please review why the reporting engine is spitting out these strange characters in some circumstances? I can log this with support if you need access to my instance.

@Steve Giller the main issue I am reporting is that the HB GUI is clearly handling these correctly (readably), as is the reporting data preview but the csv output is not. Would you not consider this an issue?

If not, how can I prevent such characters ending up in reporting?

I'll defer to any Developer who jumps on should they offer a different view, but my understanding is that Hornbill have control of the Hornbill GUI, and this is working - but Hornbill doesn't have control of whatever you're displaying the csv file in.

When displaying the character we can say "This should appear in this way." but when exporting data we cannot change it to a different value, as we do not alter Customer Data. It may be that exact character for a reason.

@Berto2002,

A specific technical point:

The apostrophe in the children's sample is an actual APOSTROPHE, neither a RIGHT nor a LEFT SINGLE QUOTATION MARK, both of which are visible in the ‘witches hat’ example. In most fonts they can be distinguished because the apostrophe is less fancy/swishy than the left single quotation mark, although some fonts use the exact same glyph for both (much as 1 (one), l (lowercase L) and I (uppercase i) look alike in some font sets). Similarly, the backtick/grave accent resembles the right single quotation mark.
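Since the glyphs can look identical, checking the code point is more reliable than eyeballing the font. A small Python sketch using the standard `unicodedata` module (the helper name `describe` is made up for this example):

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The four look-alike characters discussed above.
for ch in "'\u2018\u2019`":
    print(describe(ch))
# U+0027 APOSTROPHE
# U+2018 LEFT SINGLE QUOTATION MARK
# U+2019 RIGHT SINGLE QUOTATION MARK
# U+0060 GRAVE ACCENT
```

Only the first and last of these are ASCII, which is why they survive a wrong encoding guess while the two quotation marks do not.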

As @Steve Giller mentioned, some word processors automate the use of those quotation marks - i.e. when you press the apostrophe key on your keyboard, the software "decides" which quotation mark to use. Just type (don't copy-paste) 'D and, on another line, D' in MS Word - you will see the difference. And, somewhat naturally, those Unicode characters are copied along when someone copies and pastes.

If you were to switch/force the encoding of the .csv file to UTF-8, then the characters should show the way you expect to see them.

I would venture that whatever you are using to view the .csv file is treating it as an ASCII file (or at the very least as an 8-bit character encoding). Enhanced text editors (e.g. Notepad++) allow you to set the character encoding in which the text is viewed (in Notepad++ you will find this under a menu called "Encoding"). Force this to UTF-8 and the data should be visible as expected.
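If the viewer offers no encoding menu, one possible workaround is to re-save a copy of the export with a UTF-8 byte order mark, which many applications (Excel among them) treat as a hint that the file is UTF-8. The sketch below assumes the export really is UTF-8; the function name `add_utf8_bom` and the file names in the usage note are hypothetical.

```python
import codecs
from pathlib import Path

def add_utf8_bom(src, dst):
    """Copy a UTF-8 CSV to dst, prefixing a byte order mark if it has none."""
    data = Path(src).read_bytes()
    # Fail loudly if the file is not valid UTF-8 rather than silently mislabel it.
    data.decode("utf-8")
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    # Working in bytes leaves the original line endings and content untouched.
    Path(dst).write_bytes(data)
```

For example, `add_utf8_bom("report.csv", "report_bom.csv")` would write a copy whose content is identical apart from the three leading bytes, so the customer data itself is not altered.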
