Find Duplicate Function names in different files - merge

I have been merging all of source-code files used by various developers/CAD drafters for the past 15 or so years. It appears that everyone worked off the same code base until about 7 years ago, when everyone seems to have made a local copy of all the files and used/edited them locally.
I have successfully/painfully merged all of their files with the same names back together. However, I am finding that sometimes, files with different names contain functions with the same names and parameters. Tools that are expecting one implementation of a function may end up calling a different one depending on which files were loaded when.
Is there a simple way to search all of the files for repeated function names?
For Example, a function looks like this:
(defun MyInStr (SearchIn SearchFor)
...
)
How could I search all files for (defun MyInStr (SearchIn SearchFor)

I would suggest using ctags to generate the TAGS file, then searching it for duplicate lines:
$ ctags -R
$ sort TAGS -o - | uniq -c | grep -v '^ *1 '
The above will produce output like this:
...
3 defun MyInStr (SearchIn SearchFor)
...
which will tell you that MyInStr is re-defined 3 times in the codebase with the identical signature.
You can also extract just the function name using sed or do a more complicated processing of the TAGS file with perl or lisp or python any other scripting tool.

Related

Variable not being recognized after "read"

-- Edit : Resolved. See answer.
Background:
I'm writing a shell that will perform some extra actions required on our system when someone resizes a database.
The shell is written in ksh (requirement), the OS is Solaris 5.10 .
The problem is with one of the checks, which verifies there's enough free space on the underlying OS.
Problem:
The check reads the df -k line for root, which is what I check in this step, and prints it to a file. I then "read" the contents into variables which I use in calculations.
Unfortunately, when I try to run an arithmetic operation on one of the variables, I get an error indicating it is null. And a debug output line I've placed after that line verifies that it is null... It lost it's value...
I've tried every method of doing this I could find online, they work when I run it manually, but not inside the shell file.
(* The file does have #!/usr/bin/ksh)
Code:
df -k | grep "rpool/ROOT" > dftest.out
RPOOL_NAME=""; declare -i TOTAL_SIZE=0; USED_SPACE=0; AVAILABLE_SPACE=0; AVAILABLE_PERCENT=0; RSIGN=""
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
This is the result:
DBResize.sh[11]: TOTAL_SIZE=/1024: syntax error
I'm pulling hairs at this point, any help would be appreciated.
The code you posted cannot produce the output you posted. Most obviously, the error is signalled at line 11 but you posted fewer than 11 lines of code. The previous lines may matter. Always post complete code when you ask for help.
More concretely, the declare command doesn't exist in ksh, it's a bash thing. You can achieve the same result with typeset (declare is a bash equivalent to typeset, but not all options are the same). Either you're executing this script with bash, or there's another error message about declare, or you've defined some additional commands including declare which may change the behavior of this code.
None of this should have an impact on the particular problem that you're posting about, however. The variables created by read remain assigned until the end of the subshell, i.e. until the code hits a ), the end of a pipe (left-hand side of the pipe only in ksh), etc.
About the use of declare or typeset, note that you're only declaring TOTAL_SIZE as an integer. For the other variables, you're just assigning a value which happens to consist exclusively of digits. It doesn't matter for the code you posted, but it's probably not what you meant.
One thing that may be happening is that grep matches nothing, and therefore read reads an empty line. You should check for errors. Use set -e in scripts to exit at the first error. (There are cases where set -e doesn't catch errors, but it's a good start.)
Another thing that may be happening is that df is splitting its output onto multiple lines because the first column containing the filesystem name is too large. To prevent this splitting, pass the option -P.
Using a temporary file is fragile: the code may be executed in a read-only directory, another process may want to access the same file at the same time... Here a temporary file is useless. Just pipe directly into read. In ksh (unlike most other sh variants including bash), the right-hand side of a pipe runs in the main shell, so assignments to variables in the right-hand side of a pipe remain available in the following commands.
It doesn't matter in this particular script, but you can use a variable without $ in an arithmetic expression. Using $ substitutes a string which can have confusing results, e.g. a='1+2'; $((a*3)) expands to 7. Not using $ uses the numerical value (in ksh, a='1+2'; $((a*3)) expands to 9; in some sh implementations you get an error because a's value is not numeric).
#!/usr/bin/ksh
set -e
typeset -i TOTAL_SIZE=0 USED_SPACE=0 AVAILABLE_SPACE=0 AVAILABLE_PERCENT=0
df -Pk | grep "rpool/ROOT" | read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=TOTAL_SIZE/1024))
Strange...when I get rid of your "declare" line, your original code seems to work perfectly well (at least with ksh on Linux)
The code :
#!/bin/ksh
df -k | grep "/home" > dftest.out
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
print $TOTAL_SIZE
The result :
32962416 5732492 25552588 19% /home
5598
Which are the value a simple df -k is returning. The variables seem to last.
For those interested, I have figured out that it is not possible to use "read" the way I was using it.
The variable values assigned by "read" simply "do not last".
To remedy this, I have applied the less than ideal solution of using the standard "while read" format, and inside the loop, echo selected variables into a variable file.
Once said file was created, I just "loaded" it.
(pseudo code:)
LOOP START
echo "VAR_A="$VAR_A"; VAR_B="$VAR_B";" > somefile.out
LOOP END
. somefile.out

use perl to extract specific output lines

I'm endeavoring to create a system to generalize rules from input text. I'm using reVerb to create my initial set of rules. Using the following command[*], for instance:
$ echo "Bananas are an excellent source of potassium." | ./reverb -q | tr '\t' '\n' | cat -n
To generate output of the form:
1 stdin
2 1
3 Bananas
4 are an excellent source of
5 potassium
6 0
7 1
8 1
9 6
10 6
11 7
12 0.9999999997341693
13 Bananas are an excellent source of potassium .
14 NNS VBP DT JJ NN IN NN .
15 B-NP B-VP B-NP I-NP I-NP I-NP I-NP O
16 bananas
17 be source of
18 potassium
I'm currently piping the output to a file, which includes the preceding white space and numbers as depicted above.
What I'm really after is just the simple rule at the end, i.e. lines 16, 17 & 18. I've been trying to create a script to extract just that component and put it to a new file in the form of a Prolog clause, i.e. be source of(banans, potassium).
Is that feasible? Can Prolog rules contain white space like that?
I think I'm locked into getting all that output from reVerb so, what would be the best way to extract the desirable component? With a Perl script? Or maybe sed?
*Later I plan to replace this with a larger input file as opposed to just single sentences.
This seems wasteful. Why not leave the tabs as they are, and use:
$ echo "Bananas are an excellent source of potassium." \
| ./reverb -q | cut --fields=16,17,18
And yes, you can have rules like this in Prolog. See the answer by #mat. You need to know a bit of Prolog before you move on, I guess.
It is easier, however, to just make the string a a valid name for a predicate:
be_source_of with underscores instead of spaces
or 'be source of' with spaces, and enclosed in single quotes.
You can use probably awk to do what you want with the three fields. See for example the printf command in awk. Or, you can parse it again from Prolog directly. Both are beyond the scope of your current question, I feel.
sed -n 'N;N
:cycle
$!{N
D
b cycle
}
s/\(.*\)\n\(.*\)\n\(.*\)/\2 (\1,\3)/p' YourFile
if number are in output and not jsut for the reference, change last sed action by
s/\^ *[0-9]\{1,\} \{1,\}\(.*\)\n *[0-9]\{1,\} \{1,\}\(.*\)\n *[0-9]\{1,\} \{1,\}\(.*\)/\2 (\1,\3)/p
assuming the last 3 lines are the source of your "rules"
Regarding the Prolog part of the question:
Yes, Prolog facts can contain whitespace like this, with suitable operator declarations present.
For example:
:- op(700, fx, be).
:- op(650, fx, source).
:- op(600, fx, of).
Example query and its result, to let you see the shape of terms that are created with this syntax:
?- write_canonical(be source of(a, b)).
be(source(of(a,b))).
Therefore, with these operator declarations, a fact like:
be source of(a, b).
is exactly the same as stating:
be(source(of(a,b)).
Depending on use cases and other definitions, it may even be an advantage to create this kind of facts (i.e., facts of the form be/1 instead of source_of/2). If this is the only kind of facts you need, you can simply write:
source_of(a, b).
This creates no redundant wrappers and is easier to use.
Or, as Boris suggested, you can use single quotes as in 'be source of'/2.

Replace matches of one regex expression with matches from another, across two files

I am currently helping a friend reorganise several hundred images on a database driven website. I have generated a list of the new, reorganised image paths offline and would like to replace each matching image reference in the sql export of the database with the new paths.
EDIT: Here is an example of what I am trying to achieve
The new_paths_list.txt is a file that I generated using a batch script after I had organised all of the existing images into folders. Prior to this all of the images were in just a few folders. A sample of this generated list might be:
image/data/product_photos/telephones/snom/snom_xyz.jpg
image/data/product_photos/telephones/gigaset/giga_xyz.jpg
A sample of my_exported_db.sql (the database exported from the website) might be:
...
,(110,32,'data/phones/snom_xyz.jpg',3),(213,50,'data/telephones/giga_xyz.jpg',0),
...
The result I want is my_exported_db.sql to be:
...
,(110,32,'data/product_photos/telephones/snom/snom_xyz.jpg',3),(213,50,'data/product_photos/telephones/gigaset/giga_xyz.jpg',0),
...
Some pseudo code to illustrate:
1/ Find the first image name in my_exported_db.sql, such as 'snom_xyz.jpg'.
2/ Find the same image name in new_paths_list.txt
3/ If it is present, copy the whole line (the path and filename)
4/ Replace the whole path in in my_exported_db.sql of this image with the copied line
5/ Repeat for all other image names in my_exported_db.sql
A regex expression that appears to match image names is:
([^)''"/])+\.(?:jpg|jpeg|gif|png)
and one to match image names, complete with path (for relative or absolute) is:
\bdata[^)''"\s]+\.(?:jpg|jpeg|gif|png)
I have looked around and have seen that Sed or Awk may be capable of doing this, but some pointers would be greatly appreciated. I understand that this will only work accurately if there are no duplicated filenames.
You can use sed to convert new_paths_list.txt into a set of sed replacement commands:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt > rules.sed
The file rules.sed will look like this:
s#data/snom_xyz.jpg#image/data/product_photos/telephones/snom/snom_xyz.jpg#
s#data/giga_xyz.jpg#image/data/product_photos/telephones/gigaset/giga_xyz.jpg#
Then use sed again to translate my_exported_db.sql:
sed -i -f rules.sed my_exported_db.sql
I think in some shells it's possible to combine these steps and do without rules.sed:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt | sed -i -f - my_exported_db.sql
but I'm not certain about that.
EDIT<:
If the images are in several directories under data/, make this change:
sed "s|image/\(.*\(/[^/]*$\)\)|s#[^']*\2#\1#|" new_paths_list.txt > rules.sed

How to extract strings from plist files for translation (localization)?

I need to prepare list of strings for translation of my iPhone application.
I have extracted strings from *.m files using genstring and from the XIB files using ibtool command.
But I have also lots of texts to translate in plist files (String field types enclosed in string tag).
Is there a nice bash script / command to extract those strings into a flat txt file?
I could review and filter it so my translators can work with nice list but not with alien looking XML file.
I made a custom shell script which tries to figure out the values needed. You can then use the localize.py script in a modified way (see below) to automatically create the translation files. (The line break where somehow very important) If there more entities to be translated, the shell script can be modified accordingly
#!/bin/bash
rm -f $2
sed -n 'N;/<key>Title<\/key>/{N;/<string>.*<\/string>/{s/.*<string>\(.*\)<\/string>.*/\/* \1 *\/\
"\1" = "\1";\
/p;};}' $1 >> $2
sed -n 'N;/<key>FooterText<\/key>/{N;/<string>.*<\/string>/{s/.*<string>\(.*\)<\/string>.*/\/* \1 *\/\
\"\1" = "\1";\
/p;}
;}' $1 >> $2
sed -n 'N;/<key>Titles<\/key>/{N;/<array>/{:a
N;/<\/array>/!{
/<string>.*<\/string>/{s/.*<string>\(.*\)<\/string>.*/\/* \1 *\/\
\"\1" = "\1";\
/p;}
ba
;};};}' $1 >> $2
the localize.py script needed some modification. Therefore I created a small package containing the localizer for the source code and for the plist Files. The new script even supports Duplikates (meaning it will kick them)
We recently made a small online application to do that, please take a look on: http://www.icapps.be/plist-translator/
I can't think of any command off the top of my head. However, plists are glorified xml files and there are various parsers available for them.
It shouldn't be too difficult to create a simple python script to get all the strings from the file.
Does this help?
http://www.icanlocalize.com/site/tutorials/how-to-translate-plist-files/
We much prefer paying clients who use our translation system with our translators, but you can translate yourself in our GUI at no charge.

zsh filename globbling/substitution

I am trying to create my first zsh completion script, in this case for the command netcfg.
Lame as it may sound I have stuck on the first hurdle, disclaimer, I know how to do this crudely, however I seek the "ZSH WAY" to do this.
I need to list the files in /etc/networking but only the files, not the directory component, so I do the following.
echo $(ls /etc/network.d/*(.))
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
What I wanted was:
ethernet-dhcp wireless-wpa-config
So I try (excuse my naivity) :
echo ${(s/*\/)$(ls /etc/network.d/*(.))}
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
It seems that this doesn't work, I'm sure there must be some clever way of doing this by splitting into an array and getting the last part but as I say, I'm complete noob at this.
Any advice gratefully received.
General note: There is no need to use ls to generate the filenames. You might as well use echo some*glob. But if you want to protect the possible embedded newline characters even that is a bad idea. The first example below globs directly into an array to protect embedded newlines. The second one uses printf to generate NUL terminated data to accomplish the same thing without using a variable.
It is easy to do if you are willing to use a variable:
typeset -a entries
entries=(/etc/network.d/*(.)) # generate the list
echo ${entries#/etc/network.d/} # strip the prefix from each one
You can also do it without a variable, but the extra stuff to isolate individual entries is a bit ugly:
# From the inside, to the outside:
# * glob the entries
# * NUL terminate them into a single string
# * split at NUL
# * strip the prefix from each one
echo ${${(0)"$(printf '%s\0' /etc/network.d/*(.))"}#/etc/network.d/}
Or, if you are going to use a subshell anyway (i.e. the command substitution in the previous example), just cd to the directory so it is not part of the glob expansion (plus, you do not have to repeat the directory name):
echo ${(0)"$(cd /etc/network.d && printf '%s\0' *(.))"}
Chris Johnsen's answer is full of useful information about zsh, however it doesn't mention the much simpler solution that works in this particular case:
echo /etc/network.d/*(:t)
This is using the t history modifier as a glob qualifier.
Thanks for your suggestions guys, having done yet more reading of ZSH and coming back to the problem a couple of days later, I think I've got a very terse solution which I would like to share for your benefit.
echo ${$(print /etc/network.d/*(.)):t}
I'm used to seeing basename(1) stripping off directory components; also, you can use echo /etc/network/* to get the file listing without running the external ls program. (Running external programs can slow down completion more than you'd like; I didn't find a zsh-builtin for basename, but that doesn't mean that there isn't one.)
Here's something I hope will help:
haig% for f in /etc/network/* ; do basename $f ; done
if-down.d
if-post-down.d
if-pre-up.d
if-up.d
interfaces